Title | Lewis, Jamison_MCS_2023 |
Alternative Title | Effects of XAI in Decision Making |
Creator | Lewis, Jamison |
Collection Name | Master of Computer Science |
Description | The following Master of Computer Science thesis explores the impact of Explainable Artificial Intelligence (XAI) on decision making.
Abstract | This thesis explores the impact of Explainable Artificial Intelligence (XAI) on decision making. XAI is an AI (Artificial Intelligence) technology that provides human users with explanations of AI-based decisions, enabling them to better understand and trust the decisions made by AI systems. In this thesis I examine the current state of XAI and the ways it affects decision making and reliance in various contexts, including a user's AI background experience, a user's initial bias towards AI, the preference of rationale generation techniques, and the complexity of a situation. Finally, I evaluate the way each of these factors plays a role in the decision-making process. By understanding the benefits and risks of XAI, we can better use this technology to improve decision making in our society. My findings are that the AI background of an individual has a minimal effect on the perception of XAI rationale generation technique messages, specifically in terms of understandability. In addition, AI background and bias don't influence reliance on XAI suggestions in decision making; instead, it's the combination of preference of rationale generation technique and complexity of a situation that has the most influence on reliance on XAI in decision making.
Subject | Artificial Intelligence; Computer science; Technology; Problem solving |
Keywords | XAI; Reliance; Decision Making; Manipulation; Explainability Pitfalls; Critical Thinking; Rationale Generation Techniques |
Digital Publisher | Stewart Library, Weber State University, Ogden, Utah, United States of America |
Date | 2023 |
Medium | Thesis |
Type | Text |
Access Extent | 72-page PDF; 2.29 MB
Language | eng |
Rights | The author has granted Weber State University Archives a limited, non-exclusive, royalty-free license to reproduce their theses, in whole or in part, in electronic or paper form and to make it available to the general public at no charge. The author retains all other rights. |
Source | University Archives Electronic Records: Master of Computer Science. Stewart Library, Weber State University |
OCR Text | Effects of XAI In Decision Making
by Jamison Lewis

A thesis submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE OF COMPUTER SCIENCE
WEBER STATE UNIVERSITY
Ogden, Utah
February 7, 2023

Committee: Robert Ball (Committee Chair), Nicole Anderson (Committee Member), Sarah Herrmann (Committee Member)

A Thesis in the Field of Computer Science for the Degree of Master of Science in Computer Science, Weber State University
Copyright 2023 Jamison Lewis

Effects of XAI In Decision Making

Abstract

This thesis explores the impact of Explainable Artificial Intelligence (XAI) on decision making. XAI is an AI (Artificial Intelligence) technology that provides human users with explanations of AI-based decisions, enabling them to better understand and trust the decisions made by AI systems. In this thesis I examine the current state of XAI and the ways it affects decision making and reliance in various contexts, including a user's AI background experience, a user's initial bias towards AI, the preference of rationale generation techniques, and the complexity of a situation. Finally, I evaluate the way each of these factors plays a role in the decision-making process. By understanding the benefits and risks of XAI, we can better use this technology to improve decision making in our society. My findings are that the AI background of an individual has a minimal effect on the perception of XAI rationale generation technique messages, specifically in terms of understandability. In addition, AI background and bias don't influence reliance on XAI suggestions in decision making; instead, it's the combination of preference of rationale generation technique and complexity of a situation that has the most influence on reliance on XAI in decision making.

Table of Contents

Related Work
Research Design
    Recruitment
        AI Background Group
        Non-AI Background Group
    Screening Method
        Education Level/Subject of Study
        Knowledge Test
        Self-reported Knowledge Level
        AI Class Taken
        Initial Bias
Study Details
    Screening Results
Results
    Perception and AI Background
        Descriptive Analysis
        Chi-Square Analysis
    Preference and AI Background
        Descriptive Analysis
        Chi-Square Analysis
        Dependent T-Test
        Independent T-Test
    Complexity and Preference
        Descriptive Analysis
        Chi-Square Analysis
    Bias and Reliance
        Descriptive Analysis
        Correlational Analysis
    RGT Preference and Reliance
        Chi-Square Analysis
        One-Way ANOVA
        Two-Way ANOVA
        Linear Regression
    Complexity and Reliance
        Chi-Square Analysis
        Linear Regression
    AI Background and Reliance
        Chi-Square Analysis
        Linear Regression
Conclusion
Appendix 1. Questionnaire
Appendix 2. Study Variables
Appendix 3. Study Medium
References
Introduction

Artificial intelligence (AI) is what makes it possible for machines or programs/applications to learn from experience. AI works by combining large sets of data with intelligent, iterative processing algorithms to learn from patterns and features in the data that they analyze. The AI improves its expertise each time the system iterates through more datasets. In turn, this allows autonomous or semi-autonomous systems to be created. AI has been applied to numerous real-world domains such as education and transportation, some of which are domains containing critical systems such as health care [10,11], criminal justice [12-14], and finance [15,16]. AI-driven systems increasingly power high-stakes decision making in public domains, and their explainability is critical for end users to make informed and accountable actions.

Machine learning is a branch of artificial intelligence that involves the development of algorithms and models that allow machines to learn and improve from experience. Deep neural networks are a type of machine learning model inspired by the structure and function of the human brain. These networks consist of multiple layers of interconnected nodes, or "neurons," which process and transmit information. They are particularly well-suited for tasks such as image recognition, natural language processing, and speech recognition, and have been used to achieve state-of-the-art performance in many areas.

The explainability of AI systems comes from an area of research called Explainable AI (XAI). Explainable Artificial Intelligence (XAI) aims to provide human-understandable justifications for AI system behavior using various AI and machine learning techniques. Although there is a current lack of consensus on the meaning of explainability and related terms such as interpretability [24], XAI work shares a common goal of making AI systems' decisions or behaviors understandable by people [25]. Simpler models such as linear regression and decision trees are typically considered directly interpretable, but recent work (e.g., [22, 23]) focuses on developing new algorithms that open the black box and allow internal inspection without sacrificing performance. The term "black box" refers to models that are sufficiently complex that they are not straightforwardly interpretable to humans.

To explain the models that are not directly human understandable, such as deep neural networks, a combination of various AI and machine learning techniques referred to as explanation generation methods is used. Explanation generation methods are computational models that learn to translate an AI's internal state and action data representations into natural language. They are often post-hoc techniques [3] that can be applied after model building. Post-hoc techniques in XAI are methods used to explain the decision-making process of a machine learning model after it has been trained and deployed. Typically, these methods rely on distilling a simpler model from the input and output [26] or meta knowledge about the model [42] to generate explanations that approximate the model's behavior. The meta knowledge of a model in AI refers to the information about the model itself, including its structure, parameters, and performance metrics. These methods allow any model to become explainable and can make them geared more towards non-AI experts.
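To make the post-hoc idea concrete, the sketch below shows one common distillation approach: fitting a small, directly interpretable model to a black-box model's own predictions so that its rules approximate the black box's behavior. This is a minimal illustration using scikit-learn on synthetic data; the model choices and names here are illustrative assumptions, not taken from the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Synthetic tabular data standing in for any decision-making task.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# "Black box": a model whose internals are hard to inspect directly.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Post-hoc surrogate: a shallow decision tree distilled from the black box's
# predictions (not from the original labels), so its rules describe the model.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how closely the interpretable surrogate mimics the black box.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity to the black box: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(6)]))
```

Natural-language explanation generation methods, like the rationale generation techniques studied in this thesis, go a step further by translating such internal signals into human-readable justifications.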
This paper explores three existing modes of explanation generation, called Rationale Generation (RG), Action Declaring (AD), and Numerical Reasoning (NR), along with the proposition of another type called Counterfactual Generation (CG). These four will be referred to as Rationale Generation Techniques (RGTs) in this paper.

XAI is important in situations where human operators work alongside autonomous and semi-autonomous systems because it can help build rapport, confidence, and understanding between the system and its operator. In the event an autonomous system fails to complete a task or completes it in an unexpected way, explanations help the human worker understand the circumstances that led to the behavior, which also allows the worker to make an informed decision on how to address the behavior. Explainability has been an ongoing issue within AI [2], but the rapid increase of models that are difficult to interpret has led to growth in techniques that aim to help us understand what is going on inside the opaque AI box, called a black box as opposed to a white box. The term "white box" refers to models for which one can clearly explain how they behave, how they produce predictions, and what the influencing variables are [17].

While understanding what's going on inside the black or white box is important, who metaphorically opens the box also matters. In XAI there is an important question of "explainable to whom?" that is not directly explored. The "who" is the most important factor governing the most effective way to describe the "why" behind decisions. For this reason, it's important to understand how different users with different user characteristics matter in XAI. For example, one's AI background is an impactful user characteristic in XAI because there is often disparity in this characteristic between creators/developers and end users, which can lead to inequities [5]. Many end users are unlikely to have AI backgrounds comparable to the creators of the technology [7]. This issue is why it is imperative to study how Rationale Generation Techniques affect end users and how combinations of demographics (e.g., education, profession, bias) impact perceptions of explanations. With the increase of AI being utilized in high-stakes environments, we also need to understand how urgency can affect decision making in combination with various Rationale Generation Techniques, AI backgrounds, and different sets of end-user demographics. Urgency in the context of time and complexity is known to affect the decision making of a participant [4], making this another crucial variable to be explored.

This paper focuses on two groups: people with and people without an AI background. These two groups will be compared in various ways to understand how impactful this user characteristic is when working with XAI, and I seek to explain their similarities and differences.

I will explore how the preference of different Rationale Generation Techniques might change based not solely on one's AI background, but also on how the complexity of a situation can affect preference. As stated in [4], the complexity of a decision affects the decision of an individual. I want to understand how this may impact a user's interaction with XAI suggestions and see if it changes their predilection of Rationale Generation Technique based on the complexity of a situation.
In addition to preference, understanding how the perception of XAI Rationale Generation Techniques can change based on AI background is another way these two groups will be compared and contrasted. Perception will be measured along five dimensions: understandability, confidence, intelligence, friendliness, and second chance, which are grounded in related work around HCI (Human-Computer Interaction), HRI (Human-Robot Interaction), and XAI [18, 19, 20, 3, 21], and will be quantitatively explored within and between groups to see their similarities and differences.

My research question is the following: Do the Rationale Generation Techniques of an XAI, the complexity of a situation, initial bias towards AI, and the AI background of a participant influence a person's reliance on XAI for a particular situation, and in what ways do these variables affect their thought processes and overall decision?

I hypothesize that the more an XAI suggestion explains a scenario quantitatively and provides statistics concerning the most likely outcomes, the more inclined a participant will be to use and trust the XAI across multiple scenarios. When complexity is added to the picture, the more complex a situation is, the more weight the favored XAI suggestion will have in the participant's decision-making process, thus increasing reliance on the XAI. I believe participants with AI experience are more inclined to rely on the XAI suggestions because they are more familiar with how AI systems work and inherently trust AI systems more in general. Additionally, I believe participants without AI experience will be more skeptical initially while using the XAI suggestions, but their trust will increase if the XAI explains the AI's behavior.

Recent work has shown there is a dual process of cognition when people process AI explanations [8]. The dual-process theory says that users' cognitive processes follow two systems: System 1 often relies on heuristics (mental short-cuts, bias) that can be developed through past experiences and, if applied inappropriately, can lead to cognitive biases such as over-trust in XAI, whereas System 2 engages in deliberative and analytical thinking [2]. This leads me to think that the Counterfactual rationale generation technique will be the most favored XAI suggestion regardless of AI background. The reason is that this technique's explanations are statistical statements that measure what could happen, and participants will see the probability of outcomes based on variables within the current state, giving them a sense that the XAI is providing an analyzed and condensed view of a scenario. This ultimately leaves the decision to the participant, leading them to use more of a System 2 type of thinking, in contrast to rationale generation techniques that simply tell the participant which action they should take. The latter may lead the participant to listen to the XAI suggestion and simply do what it states, thus decreasing the amount of System 2 thinking used in their thought process.

Related Work

Ehsan and Riedl started research into a concept called Explainability Pitfalls (EPs). Explainability Pitfalls (EPs) are unanticipated and unintended negative downstream effects from AI explanations that can cause users to act against their own self-interests, align their decisions with a third party, or exploit their cognitive heuristics.
Examples of these downstream negative effects include user perceptions like misplaced trust, over- (or under-) estimating the AI's capabilities, and over-reliance on certain explanation forms [1]. This is an important concept that needs to be considered while creating and implementing advanced AI/XAI systems as the general population becomes increasingly involved with them.

Previous work done by Ehsan and Riedl has shown the AI background of an individual changes the way they perceive explanations from an XAI. Additionally, those without an AI background will trust a human-like explanation without much cognitive skepticism [1]. This is an issue because much XAI research focuses on developing algorithms that instill unwavering trust in users [6], more specifically, keeping the user's trust even when the XAI is wrong.

Some research has begun to examine users' cognitive process of interpreting AI explanations, which could help us understand user receptions of XAI. Recent work has also shown there is a dual process of cognition when people process AI explanations [8]. The dual-process theory says that users' cognitive processes follow two systems: System 1 often relies on heuristics (mental short-cuts, bias) that can be developed through past experiences and, if applied inappropriately, can lead to cognitive biases such as over-trust in XAI, whereas System 2 engages in deliberative and analytical thinking [2]. In XAI it's often assumed that users engage in System 2 thinking, but additional research has shown most use System 1 [8]. One way to mitigate these biases would be to use Cognitive Forcing Functions (CFFs), interventions that disrupt heuristic reasoning and promote System 2 analytical thinking [9]. This is where the suggestion from prior work of using a counterfactual rationale generation algorithm, to encourage users to use a combination of System 1 and System 2, comes to be integrated into this paper's experimental approach.

This paper is a continuation of the research into the concept of Explainability Pitfalls (EPs), with complexity added as an additional variable and the proposed counterfactual rationale generation technique used to see how users perceive it. With research focus being on developing more persuasive systems, and with the knowledge that user backgrounds affect the way they perceive AI explanations, unsuspected negative effects can emerge, such as creating a false sense of security and tricking users into over-trusting systems intentionally or unintentionally. Without the awareness of how to identify and avoid these pitfalls, there are increased risks for users who may never be aware they're being affected. This experiment aims to bring awareness to the unrealized intellectual blind spots in the development of high-stakes decision-making AI-driven systems.

Research Design

Recruitment

Prior research has shown the AI background of an individual affects the way they perceive explanatory messages and AI systems in general [2]. As perception has been shown to differ based on AI background [2,3], I decided to use AI background as the baseline differentiator for a high-contrast approach. This high-contrast approach sets up a baseline where both groups have undeniable differences in characteristics, which allows me to have an unadulterated understanding of the differences and similarities between both groups because at the core both groups are vastly different.
However, instead of exclusively looking into how both groups differ in perception of explanatory messages, I also wanted to see how they differed in terms of preference for, and use of, these explanatory messages, using the previous perception research as the foundation of this study.

AI Background Group

For the AI group, I recruited participants who were students enrolled in graduate- and undergraduate-level CS programs and had taken or were currently enrolled in AI courses, as well as participants who currently work with AI operationally. I chose students currently in graduate-level CS programs because part of the curriculum is exposure to AI. To be sure they had sufficient experience with AI, I chose those who had taken AI courses or were towards the end of such a course, with the expectation that they would understand the inner workings of AI and how it derives output from training data, as well as the various types of AI algorithms and their applications in real-world scenarios. This amount of training would be sufficient to make them statistically different from those without AI experience. I chose participants who work with AI operationally for the obvious reason that they work with AI regularly and understand its applications in real-world scenarios. These participants weren't required to have extensive formal education, as they have already been screened by companies and are paid for their knowledge of AI, implementing AI solutions for other companies.

Non-AI Background Group

For the non-AI group, I recruited participants willing to participate in the study, keeping in mind that the groups needed to be similar in terms of education level and different in terms of subject of study. I reasoned that a wide spectrum of thinking skills is sharpened during the course of education and training [31]. As AI background is the foundational separator for the two groups, I wanted to control the number of differences between them from a statistical standpoint. This does not mean all those in the AI group think the same; it simply means there's statistically higher potential for a wider range of thought with education and training [32], and I wanted that potential in both groups. From an education-level standpoint, I wanted each participant to have an Associate's degree as a minimum to participate in the study. As an additional requirement, these same individuals couldn't have an education or occupation in Engineering, Computer Science, or Data Science/Analysis.

Screening Method

The way I decided to create these two groups was in the form of a screening questionnaire. The questionnaire consisted of four parts:
- Education Level/Subject of Study
- Knowledge test
- Self-reported knowledge levels in:
  - Computer Programming
  - AI
- Confirmation of whether they have ever taken an AI class

In addition to these four, I also added an Initial Bias section consisting of two questions to initially understand how each participant felt about AI. These two questions asked how much they trusted AI and how useful they felt AI was. This section's data was used in the analysis portion of the study to understand how bias may affect reliance on XAI.

Education Level/Subject of Study

Education level was used to determine if this variable was similar between the AI group and non-AI group. My goal was to have the education level be similar between the groups.
Education level was measured in the following way: High School = 1, Trade School = 2, Some College = 3, Associate's Degree = 4, Bachelor's Degree = 5, Master's Degree = 6, Ph.D. or higher = 7.

Knowledge Test

This knowledge test was used to get a baseline understanding of the participants' programming and AI competency to ensure both groups were measurably different in terms of AI background knowledge. This test was created from previous work by U. Ehsan and the Georgia Institute of Technology to ensure the content within was relevant and that the questions became gradually more difficult. The test consisted of five increasingly difficult questions. If the participant had no comprehension of the subject a question referred to, they were allowed to indicate this and move on to the next one, which subtracted one point out of five total possible points. The cut-off points for each group were as follows.

Self-reported Knowledge Level

Participants self-disclosed their programming background knowledge, to be used in conjunction with the knowledge test scores to further indicate if a participant should be in the AI or non-AI group. For the AI group, participants were required to have a self-disclosed minimum level of "Moderate knowledge" or more [>= 4]. For the non-AI group, participants were required to have "No knowledge" [= 1]. Participants also self-disclosed their AI background knowledge, to be used in conjunction with the knowledge test scores to further indicate if a participant should be in the AI or non-AI group.

AI Class Taken

Participants were asked to indicate if they'd taken an AI course, because no participant who had taken an AI course could be placed in the non-AI group. However, participants who hadn't taken an AI course, passed the knowledge test with 4 or more, had at least "Moderate knowledge" of programming, and had at least "Some knowledge" of AI background knowledge could be placed into the AI group.

Initial Bias

Cognitive bias refers to a systematic, non-random and predictable deviation from rationality in judgment or decision-making [27, 28]. As decision-making is influenced and enabled by internal (e.g., perception) factors [29], different individuals may utilize biases and heuristics to different degrees, which means that strategic decisions are correlated with the use of biases and heuristics [30]. As part of my research question, I wanted to test how cognitive bias might play a role in someone's desire to use XAI and how much weight this factor has when the person needs help in the decision-making process. To gather data relevant to testing whether bias influences one's willingness to use XAI suggestions, I had each participant self-disclose in a questionnaire how much they trusted AI using a 7-point Likert scale (1: No trust, 7: Full trust) and how useful they felt AI was in everyday life (1: Not at all, 7: Extremely useful). I used AI as the focus point because the field of XAI is not well-known to non-technical individuals [2].

Study Details

The experiment took place in person in a one-on-one setting to avoid "groupthink". The in-person experiment allowed for a cognitive walkthrough approach. A cognitive walkthrough approach in research involves evaluating a subject through the perspective of a user, considering the cognitive processes and decision-making involved in the study. All participants had an easier time verbalizing their thoughts, rather than noting them in a text box, resulting in better qualitative data.
Audio from the study was recorded and used in the qualitative analysis portion of this study. No personally identifiable information was asked for or talked about during the study, as it was irrelevant to its purpose. This study approach was approved by the IRB.

The experiment consisted of the participants analyzing six real-world scenarios of varying complexity (each scenario was given an initial numerical complexity rating) and deciding how to react by selecting an option from a pre-determined list of choices per scenario. Scenarios had varying complexity to see if the complexity of a situation affected the level of reliance a participant had on XAI suggestions. In this study, reliance refers to a participant's level of dependence on or trust in XAI suggestions when making critical decisions. Did the participant simply do what the XAI suggested, or did they use the suggestion in their critical thinking process to produce the best solution? During the study each participant was discreetly timed to see how long each scenario took them.

While completing each scenario, participants had the use of four fabricated XAI suggestions, which appeared to be derived from a specialized AI (Fig 1). Although the suggestions were fabricated, the participants were told they were real. This made them feel more inclined to use the XAI suggestions in their decisions and helped build legitimacy. Each scenario had four different XAI suggestions, deriving from these rationale generation techniques: Rationale Generation (RG), Action Declaring (AD), Numerical Reasoning (NR), and Counterfactual Generation (CG), to aid the participant in their decisions. The order in which the XAI suggestions appeared in each scenario changed so the primacy effect wouldn't become a factor in participant preference of XAI suggestions. The rationale generation techniques were arbitrarily named Alpha, Bravo, Charlie, and Delta so participants didn't know which one they were working with, to avoid inspiring favoritism during the study. I wanted participants to know the technique based on its messages, not its name. The following describes the explanation mechanisms and attributes of each rationale generation technique (Fig 1):
- Rationale-Generating (RG): Explanations are natural language rationales explaining the "why" behind the suggestion.
- Action-Declaring (AD): Explanations state the action in natural language without any justification. This XAI simply states which option is best for the scenario.
- Numerical-Reasoning (NR): Simply outputs Q-values for the current state with no natural language component. Participants don't know what the Q-values mean. Q-values come from a Reinforcement Learning algorithm called tabular Q-learning. The algorithm attempts to learn the utility (Q-value, quality of the action) of different actions in different scenarios. Once learning is complete, the algorithm picks the action with the highest Q-value.
- Counterfactual-Generation (CG): Explanations are statistical statements that measure what could happen. Participants see the probability of outcomes based on variables within the current state.

After each scenario, participants answered and explained the following post-scenario questions:
- Did they use an XAI suggestion to help in their decision? This was quantitatively answered in a "Yes" or "No" format, followed by an explanation of their decision.
- How much was the XAI suggestion used in their decision?
This was quantitatively answered using a 7-point Likert scale (1: not at all, 7: it's all I used).
- Which XAI suggestion did they prefer? Participants selected their preferred rationale generation technique for the scenario. This data was used in the analysis of whether the complexity of a scenario can change a participant's explanation preference.
- How difficult was it for them to complete the scenario (i.e., select a choice)? This was quantitatively answered using a 7-point Likert scale (1: Easy, 7: Very Difficult). This data was used in determining whether urgency in the context of complexity plays a factor in reliance.
- Their thought process, and why they chose the decision they did. This was free response using a text box. This data, along with audio, was used in the qualitative analyses.
- Lastly, they were asked to rank each rationale generation technique in order of preference along 5 perception dimensions. The following lists the perception dimensions along with the description of what they meant for the study:
  - Understandability
  - Confidence
  - Intelligence
  - Friendliness
  - Second chance (when an XAI fails to give a reasonable analysis of a scenario)

While the participants ranked the rationale generation techniques, they were asked to explain their reasons for the way they ranked each category. The ranking order was defined as follows: 1st being their most preferred and 4th being least preferred for each category. See Appendix 3 for images of the study medium used in the experiment.

Fig 1. Example Scenario from the Study. In this scenario "Alpha", "Bravo", "Charlie", and "Delta" refer to the AD, RG, CG, and NR RGTs respectively.

Screening Results

The information collected from the questionnaire was used to create the between-subjects experiment:
- 8 participants with formal or informal education in the subject of AI (AI Experience)
- 8 participants without formal or informal education in the subject of AI (Without AI Experience)

All participants were screened and scored using the questionnaire (see Appendix 1) to ensure measurable differences between the distinct groups in the context of AI background.

The AI background group consisted of 8 adults residing in the US. On average, the study duration was 40.83 minutes. Participants received a US $20 Amazon gift card for their time. Participants reported an average education level of 5.0 (5 = Bachelor's Degree). For the screening criteria, this group scored an average of 4.625 (out of 5) [SD = 0.518] on the knowledge test, self-reported "Moderate knowledge" in programming (Mean = 4.375, SD = 0.518) [4 = "Moderate knowledge", 5 = "A lot of knowledge"], and "Some knowledge" in AI (Mean = 3.625, SD = 0.744) [3 = "Some knowledge", 4 = "Moderate knowledge"].

The non-AI background group consisted of 8 adults residing in the US. On average, the study duration was 39.53 minutes. Participants received a US $20 Amazon gift card for their time. Participants reported an average education level of 4.38 (4 = Associate's Degree, 5 = Bachelor's Degree). For the screening criteria, this group scored an average of 0.625 (out of 5) [SD = 0.518] on the knowledge test, self-reported "No knowledge" in programming (Mean = 1.0, SD = 0.0) [1 = "No knowledge"], and "No knowledge" in AI (Mean = 1.0, SD = 0.0) [1 = "No knowledge"].

To ensure these two groups were measurably different, I performed statistical tests. I chose the two-sample Mann-Whitney U test to show these groups were different based on the knowledge test scores.
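For readers who want to reproduce this kind of rank-based comparison, the sketch below shows what the test looks like in SciPy. The scores are hypothetical placeholders for illustration, not the study data, so the resulting statistic is illustrative only:

```python
from scipy.stats import mannwhitneyu

# Hypothetical knowledge-test scores (0-5) for two groups of 8; not the study data.
ai_group_scores = [5, 5, 4, 5, 4, 5, 4, 5]
non_ai_group_scores = [1, 0, 1, 0, 1, 0, 1, 1]

# Two-sided Mann-Whitney U test comparing the rank sums of two independent samples.
stat, p_value = mannwhitneyu(ai_group_scores, non_ai_group_scores,
                             alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")  # p < .05 would suggest the groups differ
```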
This test was used because the data doesn't need to be normally distributed and my tests were based on the rank sum difference of the knowledge test scores of both groups. With this test, my alternative hypothesis was that if the rank sum is significantly different between both groups, then the difference in AI knowledge between both groups is sufficient for them to be used in this study. I used a significance level of .05. After analysis I found a z-value of -3.361, and the p-value for the given z-value was .001 (.001 < .05), showing statistical significance in the difference of rank sums and thus supporting the alternative hypothesis. Additionally, no one in the AI group had a self-disclosed programming score lower than "Moderate knowledge" [= 4], nor did anyone have a self-disclosed AI score lower than "Some knowledge" [= 3]. Also, no one in the non-AI group had a self-disclosed programming score higher than "No knowledge" [= 1], nor did anyone have a self-disclosed AI score higher than "No knowledge" [= 1], and no one in that group had taken an AI class. These results indicate that the screening criteria had established two groups that were measurably different in terms of AI background. Thus, I am convinced that I can confidently compare the two groups against each other.

Results

Perception and AI Background

Descriptive Analysis

Initially, I did a descriptive statistical analysis of the overall perception data based on AI background. First, I chose to look into how both groups ranked each rationale generation technique, more specifically how the overall rankings were distributed to each rationale generation technique per rank, represented in Figures 2 through 6. Figures 2 through 6 provide an initial look into how similarly the rationale generation techniques were ranked. Secondly, I chose to investigate the distribution of votes each rationale generation technique received per perception dimension by participant group, represented in Figures 7 through 11. Figures 7 through 11 gave me a different look into what was going on in Figures 2 through 6, based on the RGT instead of the rank.

Commonly used acronyms in this section:
- RGT: Rationale Generation Technique
- AD: Action Declaring
- NR: Numerical Reasoning
- RG: Rationale Generating
- CG: Counterfactual Generation

For the understandability dimension (Fig 2 & 7), it appears both groups were similar in the sense they felt the NR RGT was the least understandable. This quantitative data was confirmed by the qualitative analysis, where most participants didn't quite understand what the data values from the NR RGT meant. The non-AI group was very similar in their votes for understandability, with the CG RGT being the most understandable and the NR RGT the least. The AI group appeared to understand the RG technique the most and the NR RGT the least. The AI group preferred the RG RGT because it stated an action and a justification for the action, letting them know how the XAI arrived at its decision. The non-AI group preferred the CG RGT because it gave them valuable information to help in their decision without making them feel dictated to by it.

For the confidence dimension (Fig 3 & 8), it appears both groups were very similar in their rankings. The CG RGT looks to have inspired the most confidence in both groups, followed by the RG RGT. Of note, the non-AI group still ranked the NR RGT last because, according to several statements made during the study, they did not fully know what the numerical values represented.
With the AI group, by contrast, the AD RGT was ranked 3rd and 4th a higher percentage of the time than the NR RGT; according to the qualitative analysis, the reason for this was that they didn't know how the XAI arrived at its answer because there was no justification.

For the intelligence dimension (Fig 4 & 9), both groups were similar in their top two rankings, with the CG RGT being first and the RG RGT being second. According to the qualitative data, both groups felt the statistical probabilities the CG RGT gave, versus the action with a justification the RG RGT gave, made the XAI seem more intelligent. The non-AI group voted the NR RGT the least intelligent because they didn't fully know what the numbers were; however, the AD RGT was close to last because it simply stated an action. The AI group voted the AD RGT last for the same reason as the non-AI group: there was no justification.

For the friendliness dimension (Fig 5 & 10), both groups were similar in deciding the NR RGT was the least friendly because there were only numbers, making it less human-like. The RG and CG RGTs were considered the friendliest because they were the most human-like; however, the AI group ranked the RG RGT highest because of its conciseness over the CG RGT. Both groups ranked the AD RGT 3rd because, even though it was human-readable, simply stating an action seemed harsh.

For the second chance dimension (Fig 6 & 11), both groups ranked the CG and RG RGTs 1st and 2nd. The CG RGT was first because participants had a better sense of where an error came from, since its explanations seemed more transparent than the RG RGT's. The non-AI group didn't trust the NR RGT at all because they didn't fully understand what the values represented, thus it was ranked last. The AI group ranked the AD RGT last because it only states an action without justification or context, whereas with the NR RGT one can derive what it's trying to say, giving more context into what it's "thinking".

Fig. 2. Understandability: Distributions of rankings for each RGT, in each dimension, per rank, separated by group (left: AI Group, right: Non-AI Group)
Fig. 3. Confidence: Distributions of rankings for each RGT, in each dimension, per rank, separated by group (left: AI Group, right: Non-AI Group)
Fig. 4. Intelligence: Distributions of rankings for each RGT, in each dimension, per rank, separated by group (left: AI Group, right: Non-AI Group)
Fig. 5. Friendliness: Distributions of rankings for each RGT, in each dimension, per rank, separated by group (left: AI Group, right: Non-AI Group)
Fig. 6. Second Chance: Distributions of rankings for each RGT, in each dimension, per rank, separated by group (left: AI Group, right: Non-AI Group)
Fig. 7. Understandability: Distributions of rankings per RGT, in each dimension, separated by group (left: AI group, right: non-AI group)
Fig. 8. Confidence: Distributions of rankings per RGT, in each dimension, separated by group (left: AI group, right: non-AI group)
Fig. 9. Intelligence: Distributions of rankings per RGT, in each dimension, separated by group (left: AI group, right: non-AI group)
Fig. 10. Friendliness: Distributions of rankings per RGT, in each dimension, separated by group (left: AI group, right: non-AI group)
Fig. 11. Second Chance: Distributions of rankings per RGT, in each dimension, separated by group (left: AI group, right: non-AI group)
Table 1. Rank summary of RGTs along the 5 perception dimensions. This table summarizes the ranking of RGTs based on the percentage of votes given per rank for each dimension, separated by group. (Rank-order similarities between groups are highlighted in yellow in the original.)

Rank | Understandability (AI / Non-AI) | Confidence (AI / Non-AI) | Intelligence (AI / Non-AI) | Friendliness (AI / Non-AI) | Second Chance (AI / Non-AI)
1 | RG / CG | CG / CG | CG / CG | RG / RG | CG / CG
2 | CG / RG | RG / RG | RG / RG | CG / CG | RG / RG
3 | AD / AD | NR / AD | NR / AD | AD / AD | NR / AD
4 | NR / NR | AD / NR | AD / NR | NR / NR | AD / NR

Chi-Square Analysis

I performed an analysis using the data taken from the perception rankings by each participant along the 5 perception dimensions (understandability, confidence, intelligence, friendliness, and second chance) and divided them by the established groups based on AI background. This data was then used to perform a chi-square analysis between the groups to see if there's a relationship between AI background and perceptions of rationale generation techniques. I did the chi-square analysis for each RGT per perception dimension; the number of votes received per rank in a given perception dimension was one dimension of the test and AI background was the other. I did this to understand whether the difference in perception of a given RGT in a given perception dimension was statistically significant enough to state that the AI background of a participant affects the way the RGT is perceived.

The null hypothesis for the chi-square analysis was that the difference in perception is not significant, meaning both groups perceived the given RGT the same in the given perception dimension and the AI background of a participant didn't affect the way the RGT was perceived. The alternative hypothesis was that the difference in perception is significant, meaning both groups perceived the given RGT differently in the given perception dimension and the AI background of a participant affected the way they perceived the RGT. For this test I used the critical chi-square value for a given alpha of .05 for the comparison. This test used a critical chi-square value of 7.815. The following paragraphs explain the results of the chi-square analysis for each rationale generation technique along the 5 perception dimensions, along with a qualitative analysis giving the reasons behind the results. Refer to Table 2 for the overall results of this analysis.

Action Declaring (AD)

For the understandability dimension the null hypothesis was rejected [9.87 > 7.815], thus accepting the alternative hypothesis. Both groups perceived the AD RGT differently in terms of understandability. The AI group understood it more because of its conciseness, whereas the non-AI group understood it less because there was no justification with the action. For the confidence dimension the null hypothesis was rejected [16.422 > 7.815], thus accepting the alternative hypothesis. The AI group had significantly less confidence in the AD RGT because there was no way to understand how the XAI came to its conclusion, and they felt it was more dire not to know how an XAI arrives at a conclusion than to trust in numbers one doesn't fully comprehend. The non-AI group had more confidence in the AD RGT because, even though there was no understanding of how the AD RGT came to its conclusion, they still knew what it meant, whereas they didn't fully understand what the NR RGT message meant.
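As an aside, each statistic in this section (and in Table 2 below) comes from a 2 x 4 contingency table pairing group membership with the number of times an RGT received each rank. A minimal SciPy sketch of that computation, using hypothetical counts rather than the study data, looks like this:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical rank counts for one RGT in one perception dimension (not study data):
# rows = group (AI, non-AI), columns = times the RGT was ranked 1st..4th.
counts = np.array([[ 4, 10, 16, 18],
                   [14, 12, 14,  8]])

stat, p_value, dof, expected = chi2_contingency(counts)
critical = chi2.ppf(0.95, df=dof)  # df = (2-1)*(4-1) = 3 -> 7.815, as in the text
print(f"chi2 = {stat:.3f}, critical value = {critical:.3f}, p = {p_value:.4f}")
```

With a 2 x 4 table there are 3 degrees of freedom, which is where the critical value of 7.815 at alpha = .05 comes from.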
For the intelligence dimension the null hypothesis was not rejected [6.298 < 7.815]. Both groups voted the AD RGT similarly, but for different reasons. The AI group voted this XAI the least intelligent because it simply stated an action without context, but the non-AI group voted it a little more intelligent simply because it was more humanlike than just numbers. For the friendliness dimension the null hypothesis was not rejected [2.506 < 7.815]. Both groups voted the AD RGT similarly and for the same reason: it was better than the NR RGT because it was more humanlike than numbers. Finally, for the second chance dimension the null hypothesis was rejected [17.633 > 7.815], thus accepting the alternative hypothesis. The AI group had significantly less trust in the AD RGT than the NR RGT because they didn't know how it arrived at its conclusion, making them less willing to give the XAI a second chance, whereas the non-AI group trusted it more than the NR RGT because it was more humanlike.

Numerical Reasoning (NR)

For the understandability dimension the null hypothesis was not rejected [4.995 < 7.815]. Both groups didn't fully understand what the numbers represented, making the vote distributions similar. For the confidence dimension the null hypothesis was rejected [14.153 > 7.815], thus accepting the alternative hypothesis; the reason is that the AI group had more faith in numbers than the non-AI group did. For the intelligence dimension the null hypothesis was not rejected [5.629 < 7.815]. In this case both groups were similar in not having confidence in a message they didn't fully comprehend. For the friendliness dimension the null hypothesis was not rejected [0.503 < 7.815]. Both groups were the same in stating numbers were not friendly, as they're not humanlike. For the second chance dimension the null hypothesis was rejected [10.996 > 7.815], thus accepting the alternative hypothesis. The reason for this is the AI group had significantly more faith in numbers than the non-AI group. The non-AI group was less willing to give something they didn't understand a chance, whereas the AI group was more willing to do so.

Rationale Generation (RG)

The null hypothesis was not rejected for any of the perception dimensions. The results were as follows: understandability [6.552 < 7.815], confidence [4.819 < 7.815], intelligence [0.687 < 7.815], friendliness [7.754 < 7.815], second chance [3.291 < 7.815]. Both groups felt the RG RGT was understandable because of its humanlike message, conciseness, and justification for the action it chose. This XAI was second only to the CG RGT, for the same reasons, in terms of confidence, intelligence, and second chance. However, the AI group perceived the RG RGT to be the friendliest because the CG RGT wouldn't specify an action, it would only give statistical probabilities, and they preferred the deterministic approach of the AD RGT.

Counterfactual Generation (CG)

For the understandability dimension the null hypothesis was rejected [16.071 > 7.815], thus accepting the alternative hypothesis. The non-AI group was consistent in voting this XAI the most understandable because it was the most humanlike among the rationale generation techniques. The AI group felt the CG RGT needed to be more concise and deterministic when it came to understandability. For all other perception dimensions the null hypothesis was not rejected.
The results were as follows: confidence [2.793 < 7.815], intelligence [2.291 < 7.815], friendliness [2.998 < 7.815], second chance [6.412 < 7.815]. Both groups had the most confidence in the CG RGT because of the statistical probabilities it gave for each outcome, and felt it was the most intelligent for the same reason. Both groups felt the CG RGT was too "wordy" at times, making it less friendly than the RG RGT. However, both groups felt the transparency and human likeness of the CG RGT made it more worthy of a second chance than the RG RGT.

Table 2. Perception and AI background chi-square results. (Cells where the null hypothesis was rejected are highlighted in yellow in the original.)

Perception Dimension | AD | NR | RG | CG
Understandability | 9.87 | 4.995 | 6.552 | 16.071
Confidence | 16.422 | 14.153 | 4.819 | 2.793
Intelligence | 6.298 | 5.629 | 0.687 | 2.291
Friendliness | 2.506 | 0.503 | 7.754 | 2.998
Second Chance | 17.633 | 10.996 | 3.291 | 6.412

This analysis showed AI background doesn't play a significant role in the way XAI messages are perceived. Out of the 20 pairwise comparisons made between groups, only 6 proved to be significantly different (refer to Table 2). Most of the time the XAI messages of these RGTs were perceived the same within and between groups, meaning they were thought of in the same way regardless of AI background.

Preference and AI Background

Commonly used acronyms in this section:
- RGT: Rationale Generation Technique
- AD: Action Declaring
- NR: Numerical Reasoning
- RG: Rationale Generating
- CG: Counterfactual Generation

Descriptive Analysis

The next thing I explored was whether the preference of rationale generation technique was affected by the AI background of a participant. The data used in this section was taken from the portion of the study where participants selected their most preferred RGT after each scenario. I performed a descriptive analysis to see what the distribution of votes looked like for each RGT (refer to Fig 12 for a visual representation of this information). Based on Fig 12, I can initially see that (1) the distribution of preference votes is very similar and (2) the CG RGT is the overwhelming favorite in both groups. This descriptive analysis leads me to believe AI background has no effect on the preference of RGT. The following section tests this hypothesis to see if there's a significant relationship between the two variables.

Fig 12. Distribution of Preference Votes. X-axis: Rationale Generation Technique, Y-axis: Percentage of Preference Votes Given

Chi-Square Analysis

I performed an analysis using data taken from the study where participants selected their preferred rationale generation technique after each scenario. I then counted how many times each RGT was selected as preferred and divided these numbers according to the group criteria. This left me with a data model
The alternative hypothesis for the chi-square analysis was the difference in preference is significant meaning preference of RGT is affected by a participants AI background. The results of the chi-square analysis did not reject the null hypothesis [5.296 < 7.815]. This proved my hypothesis was correct, the distributions were very similar. Qualitatively, both groups preferred the RGTs for the same reason per scenario. What this meant is preference depends on the scenario vs the AI background of the participant. In other words, AI background doesn’t influence the preference of RGT. The following section explores the RGT preference within each group to see how significant the preference was between each RGT to statistically determine if one RGT was preferred overall per group. Dependent T-Test I performed an analysis using data taken from the study where participants selected their preferred rationale generation technique after each scenario. I then separated all data by group to create two data sets consisting of preferential data. For each group, I divided the number of times each RGT was selected as preferred by scenario. For the individual groups, I performed a within group pairwise comparison of each RGT using a dependent samples t-test to see if there was statistical significance of an RGT being preferred over another and which RGT the group preferred between the two. In this test the left side 37 determined the test statistic sign, if the left mean was lesser then the right the test statistic would be negative and if the left mean was greater than the right the test statistic would be positive. In this case the greater mean signifies a higher preference since we’re dealing with counts per question summed by RGT. This test had an alpha of .05. The dependent samples t-test was performed because these within group comparisons consisted of data taken from the same participants answering the same question 6 times throughout the study. The following paragraphs quantitatively explain the results per group with a qualitative explanation to understand the reasons why. Table 3 visualizes the t-test results. AI Group - The AI group showed no significant preference between the AD and NR RGTs. The results were close to being significant because referring to Fig. 12 I can see the NR RGT was favored for a higher percentage of the time due to the AI groups faith in numbers. However, the RG RGT won over the AD RGT reasons being they didn’t trust the AD RGT because there was no justification with the action and the RG RGT proved to be more useful with the justification for the action giving the participants more insight into the scenario they were analyzing. The CG RGT won over the AD RGT as well for the same reasons as the RG RGT to won, but the group liked the CG RGT especially because it gave them statistical probabilities of multiple use cases giving them even more insight into the scenario they were analyzing. Between the NR and RG RGT there was no statistical significance for preference; it was close so it’s notable. The reason for this is most in the AI group didn’t know what the q-values meant initially but began to understand them as the study progressed and some liked how concise the NR RGT was for certain scenarios. The RG RGT still received a higher percentage of preference votes though (Fig 12). Between the NR and CG RGTs the CG RGT won for the same reasons it won over the AD RGT. There was no statistically significant preference detected between RG and CG. 
Non-AI Group - The non-AI group showed no significant preference between the AD and NR RGTs; neither was trusted by this group. The AD RGT claimed a higher percentage of the votes (Fig 12) simply because it was more humanlike. Between the AD and RG RGTs there was no statistically significant preference: while analyzing a scenario that dealt with classification, this group wanted an action in a concise format, and some liked how the AD RGT told them what to do without justification while others wanted the action with justification. The CG RGT won over the AD RGT because it was more helpful and provided more insight into the scenario the participants were analyzing. Between the NR and RG RGTs there was no statistically significant preference. Most in the non-AI group didn't know what the q-values meant initially but began to understand them as the study progressed, and some liked how concise the NR RGT was for certain scenarios; the RG RGT still received a higher percentage of preference votes (Fig 12) and was considered less intimidating than numbers. The CG RGT won over the NR RGT for the same reasons it won over the AD RGT. Between the RG and CG RGTs, the CG RGT won because it gave participants statistical probabilities for the whole scenario rather than a single view consisting of an action with a justification. This created more trust with the participants and caused them to gravitate toward these messages across multiple scenarios. According to Table 3, the CG RGT was the most preferred overall by the non-AI group. This data is supported by Fig 12 and by qualitative analysis performed on audio recorded during the study.

Table 3. Summary of p-values for within-group pairwise comparisons, showing which RGT was preferred (significant p-values were highlighted in yellow in the original, with the preferred RGT in parentheses)

RGT Preference Comparison   AI Background   Non-AI Background
AD vs NR                    0.051           0.245
AD vs RG                    0.011 (RG)      0.29
AD vs CG                    0.002 (CG)      0.004 (CG)
NR vs RG                    0.053           0.072
NR vs CG                    0.004 (CG)      0.004 (CG)
RG vs CG                    0.076           0.036 (CG)

This analysis explored the pairwise comparisons between RGTs to determine whether one RGT was preferred more than the others. Table 3 highlights which RGTs were significantly more preferred in the pairwise comparisons, and shows which RGT was preferred between the two being compared. Conclusively, the CG RGT was preferred overall by both groups, being the most preferred in all but one of its comparisons. Furthermore, this further drives the point that AI background doesn't play a significant factor in RGT preference, because of how similar the comparison results are between groups.
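Both the dependent t-tests above and the between-group test in the next section start from the same per-scenario preference counts. The sketch below shows one way that data model could be assembled from a long-format response table; the column names and values are hypothetical placeholders, not the actual study file.

import pandas as pd

# Hypothetical long-format responses: one row per participant per scenario,
# with the RGT that participant selected as preferred for that scenario.
responses = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "group":       ["AI", "AI", "AI", "AI", "non-AI", "non-AI"],
    "scenario":    [1, 2, 1, 2, 1, 2],
    "preferred":   ["CG", "RG", "CG", "CG", "AD", "CG"],
})

# Count how many times each RGT was selected as preferred, per group and scenario.
counts = (
    responses
    .groupby(["group", "scenario", "preferred"])
    .size()
    .unstack(fill_value=0)      # columns = RGTs, rows = (group, scenario)
)

# Per-scenario counts for one group feed the paired t-tests; summing over
# scenarios per group gives the group-by-RGT table used in the chi-square analysis.
print(counts)
print(counts.groupby(level="group").sum())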
Independent T-Test
I analyzed preference and AI background to understand whether there was a relationship between AI background and the preference for each RGT (e.g., whether the AI group preferred the AD RGT more than the non-AI group did). I performed this analysis using data taken from the study where participants selected their preferred rationale generation technique after each scenario. I separated the data by group to create two data sets of preferential data, and for each group I divided the number of times each RGT was selected as preferred by scenario. I then performed an independent samples t-test using the target RGT column from both data sets to see whether one group preferred the target RGT more than the other, and whether there is a difference of preference based on AI background. In this test the left side determined the sign of the test statistic: if the left mean was less than the right, the test statistic would be negative, and if the left mean was greater than the right, the test statistic would be positive. In this case the greater mean signifies a higher preference, since we're dealing with counts per question summed by the target RGT. This test had a significance level of .05. The independent samples t-test was used because this between-group comparison consisted of data taken from two different groups answering the same question 6 times throughout the study. The following paragraph quantitatively explains the results, with a qualitative explanation of the reasons why. Table 4 visualizes the t-test results.

Table 4. Summary of p-values, showing between-group preference of RGTs (significant p-values were highlighted in yellow in the original)

RGT   p-value   Group Preferred
AD    0.022     Non-AI
NR    0.36      -
RG    0.24      -
CG    0.436     -

The only RGT that showed a significant difference in the level of preference between groups was the AD RGT, which was significantly more preferred by the non-AI group than by the AI group. The reason is that the AI group found it less trustworthy without justification for the action, while the non-AI group found it useful for certain scenarios, creating a larger gap in preference. The NR, RG, and CG RGTs showed no significant difference in preference. This test showed the only difference in preference between the groups was with the AD RGT. Furthermore, it indicates that, overall, there's no meaningful difference in preference between groups, meaning AI background doesn't serve as a significant factor in RGT preference.
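A minimal sketch of how one of these between-group comparisons could be computed with an independent samples t-test follows; the per-scenario counts are hypothetical placeholders, and the study may have used different tooling.

import numpy as np
from scipy.stats import ttest_ind

# Hypothetical per-scenario counts of how often the AD RGT was selected as
# preferred in each group across the 6 scenarios (placeholder values).
ad_counts_ai     = np.array([1, 0, 2, 1, 0, 1])
ad_counts_non_ai = np.array([2, 3, 2, 3, 2, 3])

# Independent samples: the two groups consist of different participants.
stat, p_value = ttest_ind(ad_counts_ai, ad_counts_non_ai)

# A negative statistic means the left (AI group) mean is lower, i.e. the
# non-AI group preferred the AD RGT more; alpha = .05.
print(f"t = {stat:.3f}, p = {p_value:.3f}")
print("significant difference" if p_value < 0.05 else "no significant difference")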
Complexity and Preference

Commonly used acronyms in this section:
- RGT: Rationale Generation Technique
- AD: Action Declaring
- NR: Numerical Reasoning
- RG: Rationale Generating
- CG: Counterfactual Generation

The next item I explored was whether the complexity of a scenario influenced RGT preference. Initially I performed an overall descriptive analysis to see how each scenario was rated, and then I performed within-group and between-group analyses to see if there was a statistical relationship between complexity and RGT preference.

Descriptive Analysis
After completing each scenario, participants rated its difficulty from 1 to 7 [1 = easy, 7 = very difficult]; this is referred to as the complexity rating. Fig 13 shows the mean and Fig 14 shows the standard deviation of the complexity ratings for each scenario. The mean complexity ranges from 2.75 to 4.62, with a 26.71% variability between the two. Also note that the standard deviation of the ratings is large considering the scale runs from 1 to 7. This means the complexity ratings for each scenario had a lot of variability, which made overall complexity a difficult variable to measure. However, I continued with my analysis to see if a statistical relationship could be found between complexity and RGT preference.

Fig 13. Average Complexity Ratings per Scenario (X-axis: Scenario Number, Y-axis: Mean of difficulty ratings)
Fig 14. Standard Deviation of Complexity Ratings per Scenario (X-axis: Scenario Number, Y-axis: Standard deviation of difficulty ratings)

Chi-Square Analysis
I performed an analysis to see if there was a relationship between the complexity rating of a scenario and the preferred RGT, both within groups and between groups. This was done using data taken from the study, with the complexity rating given to each scenario as one column and the corresponding preferred RGT as the other; a cross table was generated from this data. The relationship was then tested using a chi-square analysis to determine statistical significance between the two variables, with the level of significance set to .05. The AI group didn't show a relationship between complexity and preferred RGT [chi = 23.451, p = 0.174]. The non-AI group didn't show a relationship between complexity and preferred RGT [chi = 11.709, p = 0.701]. The overall study also didn't show a relationship between complexity and preferred RGT [chi = 18.138, p = 0.447]. From the qualitative analysis, some scenarios were easier for some participants while being more difficult for others, explaining the variability shown in Figs 13 and 14. This test was unable to establish a relationship between the difficulty of a situation and preference of RGT. Because of the variability in the overall complexity scores, this test is considered inconclusive.

Bias and Reliance
The next variable I explored was whether the AI bias of a participant influenced their reliance on XAI. This was explored first overall, to see what the bias of participants looked like, and then I performed a correlational analysis both between and within groups to see if there was a statistical relationship between bias and reliance. The data for this section was taken from the questionnaire item asking participants to disclose how much they trusted AI, and from the self-disclosed numerical value participants selected after each scenario indicating how much they used the XAI in their decision.

Descriptive Analysis
Participants were asked to rate how much they trusted AI [1 = no trust, 7 = full trust] and how useful they felt it was [1 = not at all, 7 = extremely useful]. The overall AI trust scores had a mean of 4.062 with a standard deviation of 1.731. High trust scores made up 43.75% of the votes and low trust scores made up 56.25%, meaning this sample had less trust in AI. The overall AI usefulness scores had a mean of 4.375 with a standard deviation of 1.544. High usefulness scores made up 62.5% of the votes and low usefulness scores made up 37.5%. The sample used in this study has lower trust in AI but feels it is useful. These results indicate the sample represents a mixture of those that trust AI and feel it's useful along with those that don't trust AI and don't feel it's useful. This mixed variety of bias shows the sample represents multiple mindsets instead of just one that leans one way or the other.
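The summary figures above (means, standard deviations, and high/low percentages) are straightforward to compute. The small sketch below uses hypothetical ratings rather than the actual questionnaire responses, and it assumes, purely for illustration, that ratings of 5-7 count as "high" and 1-4 as "low".

import numpy as np

# Hypothetical 7-point trust ratings from 16 participants (placeholders).
trust = np.array([2, 5, 6, 3, 4, 7, 1, 4, 5, 2, 6, 3, 4, 5, 2, 6])

mean = trust.mean()
std = trust.std(ddof=1)          # sample standard deviation

# Assumed split: 5-7 = high, 1-4 = low (an assumption of this sketch).
high_pct = (trust >= 5).mean() * 100
low_pct = (trust <= 4).mean() * 100

print(f"mean = {mean:.3f}, sd = {std:.3f}")
print(f"high = {high_pct:.2f}%, low = {low_pct:.2f}%")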
Correlational Analysis
The data for this correlational analysis came from the XAI level of use (reliance) value from each scenario, along with each participant's self-disclosed AI level of trust (initial bias) taken from the questionnaire. This data was used to perform an independent samples t-test within groups and between groups, with initial bias as the independent variable and level of use per scenario as the dependent variable. The AI group showed bias had no effect on reliance [p = 0.696]. The non-AI group showed bias had no effect on reliance [p = 0.869]. The overall result showed bias had no effect on reliance [p = 0.879]. From observations during the study and from qualitative analysis of the audio, bias wasn't the reason any participant chose to use or not use the XAI to aid in their decisions.

RGT Preference and Reliance
I explored whether the preference for a rationale generation technique influences how much it is relied upon to make decisions. The data in this section comes from the portions of the study after each scenario asking the participant to indicate whether they used the XAI in their decision (XAIHelp), which XAI they preferred (XAIPreferred), and how much they used the XAI in their decision (XAIUse). I performed the following analyses: a chi-square analysis to see if there was a relationship between XAIPreferred and XAIHelp, a one-way ANOVA to see if an RGT inspired more reliance than the others and whether that is affected by background, a two-way ANOVA to understand the interaction between XAIPreferred and AI background on XAIUse, and lastly a linear regression to see if there's a relationship between XAIHelp, XAIPreferred, and XAIUse.

Commonly used acronyms in this section:
- RGT: Rationale Generation Technique
- AD: Action Declaring
- NR: Numerical Reasoning
- RG: Rationale Generating
- CG: Counterfactual Generation

Variable definitions:
- XAIHelp: value indicating whether the participant used the XAI suggestions in their decision (yes = 1, no = 0)
- XAIPreferred: the preferred rationale generation technique suggestion
- XAIUse: value indicating how much the XAI suggestions were used in the decision, from 1 to 7 (1: not at all, 7: it's all I used)

Chi-Square Analysis
The data used in this analysis came from the preferred RGT selected after each scenario, along with whether the XAI was used in the decision (yes = 1, no = 0). This was placed into a cross table indicating the number of times each RGT was selected as preferred and the number of times it was used or not used, and this table was used for the chi-square analysis. The significance level for this test was set to .05. The results are as follows: the AI group showed RGT preference had a significant effect on reliance [chi = 18.190, p = 0.0004], the non-AI group showed RGT preference had a significant effect on reliance [chi = 21.452, p < 0.0001], and overall the results showed RGT preference plays a significant role in reliance [chi = 8.616, p = 0.034]. This analysis shows a very strong relationship between the overall preference for an RGT and the level of reliance on it. The qualitative analysis supports this finding, as participants would heavily rely on the XAI they preferred the most across multiple scenarios. This analysis indicates that the more an RGT is liked, the more a person is willing to allow it to influence their decision, and they are significantly more willing to allow this to happen on multiple occasions.

One-Way ANOVA
The data used in this analysis came from four datasets (one for each RGT) containing, for each time the RGT was selected as preferred, the corresponding XAIUse value. These four datasets were concatenated together and used in a one-way ANOVA with an alpha set to 0.05.
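Before the results are discussed, the sketch below shows how this kind of one-way ANOVA, along with the pairwise follow-up comparisons reported in Table 5, could be computed. The XAIUse values per RGT are hypothetical placeholders, and the actual analysis may have been run with different software.

from itertools import combinations
import numpy as np
from scipy.stats import f_oneway

# Hypothetical XAIUse values (1-7) recorded whenever each RGT was the preferred one.
use_by_rgt = {
    "AD": np.array([1, 2, 2, 3, 1, 2]),
    "NR": np.array([2, 3, 3, 2, 4, 3]),
    "RG": np.array([4, 5, 3, 4, 5, 4]),
    "CG": np.array([6, 7, 5, 6, 7, 6]),
}

# Overall one-way ANOVA across the four RGTs.
stat, p_value = f_oneway(*use_by_rgt.values())
print(f"overall: F = {stat:.3f}, p = {p_value:.5f}")

# Pairwise comparisons between RGTs, as reported in Table 5.
for a, b in combinations(use_by_rgt, 2):
    stat, p_value = f_oneway(use_by_rgt[a], use_by_rgt[b])
    print(f"{a} vs {b}: F = {stat:.3f}, p = {p_value:.5f}")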
First, I tested whether the overall preference data influenced XAIUse. Second, I performed pairwise comparisons to see if an RGT inspired more reliance than the others. The overall one-way ANOVA found the means to be significantly different [p < 0.0001]. The pairwise one-way ANOVA comparisons are shown in Table 5. The analysis of the pairwise results is as follows: the means of the AD and NR RGTs are not significantly different. Using the AD vs NR comparison as a reference, the comparison between the AD and RG RGTs is significantly different, but its p-value of .00003 (on the order of 1e-05) is still far higher than the p-value for the AD and CG comparison (on the order of 1e-24). This shows how much more the CG RGT influenced reliance than the RG RGT did. Additionally, the comparison between the RG and CG RGTs produced a p-value on the order of 1e-07. What this means is that even though the RG RGT did influence a significant amount of reliance, the CG RGT still influenced reliance significantly more. In other words, there's a positive relationship between the preference for an RGT and the amount an individual is willing to rely upon its suggestions. Figs 15 and 16 support the findings of the one-way ANOVA in the sense that the more an RGT is preferred, the more it is relied upon. Fig 16 shows the CG RGT was preferred significantly more than the others, and Fig 15 shows the CG RGT had more density in the higher levels of use than the RG, AD, and NR RGTs; the CG RGT was the most preferred and was relied upon the most. In contrast, Fig 16 indicates the AD RGT was the least preferred, and the density chart (Fig 15) shows it had more density in the lowest levels of use than all other RGTs, so it was used the least. These findings suggest the more an RGT is preferred the more it will be relied upon and, most importantly, that preference is directly correlated with reliance.

Table 5. One-way ANOVA pairwise comparisons of preference and reliance (significant p-values were highlighted in yellow in the original)

RGT Comparison   p-value
AD vs NR         0.16147
AD vs RG         0.00003
AD vs CG         < 0.0001
NR vs RG         0.01029
NR vs CG         < 0.0001
RG vs CG         < 0.0001

Fig 15. Density chart, Level of Use by RGT (X-axis: XAI Level of Use, Y-axis: Kernel Density Estimate)
Fig 16. Distribution of Preference Votes by RGT (X-axis: Rationale Generation Technique, Y-axis: Percentage of Votes)

Two-Way ANOVA
Following the one-way ANOVA, I performed a two-way ANOVA to examine the interaction effect between XAIPreferred and the AI background of a participant on the level of XAIUse. This analysis used the same data as the one-way ANOVA, but with the group indicator [AI = 1, non-AI = 0] added as a dimension. With this data I performed a two-way ANOVA with XAIPreferred and AI background as the independent variables and XAIUse as the dependent variable, at a significance level of 0.05. The p-value for the effect of AI background on XAIUse was 0.136, meaning there is no significant relationship between AI background and reliance. Further, the p-value for the interaction effect of AI background and XAIPreferred on XAIUse was 0.150, meaning there is no significant interaction between AI background and the preferred RGT on the level of use.
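A minimal sketch of such a two-way ANOVA with an interaction term, using the statsmodels formula interface, is shown below. The data frame is a hypothetical stand-in for the study data, and the thesis's analysis may have been run with different tooling.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical rows: preferred RGT, group indicator (AI = 1, non-AI = 0),
# and the disclosed level of XAI use (1-7). Placeholder values only.
df = pd.DataFrame({
    "XAIPreferred": ["AD", "NR", "RG", "CG", "AD", "NR", "RG", "CG",
                     "CG", "RG", "CG", "CG", "AD", "CG", "RG", "NR"],
    "AIBackground": [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "XAIUse":       [2, 3, 4, 6, 1, 3, 5, 7, 6, 4, 7, 6, 2, 5, 4, 3],
})

# Two-way ANOVA: main effects of preferred RGT and AI background on XAIUse,
# plus their interaction term.
model = smf.ols("XAIUse ~ C(XAIPreferred) * C(AIBackground)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)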
Linear Regression
To understand the strength of the correlation between the XAIPreferred and XAIUse variables, I performed a linear regression with both variables added as dimensions in the model. The model consisted of data collected from both groups. The x-axis data came from the preferred RGT selected after each scenario (XAIPreferred) where the participant indicated they used the XAI in their decision (XAIHelp == 1). The y-axis data came from the disclosed level of XAI use in decision-making (XAIUse) where the participant indicated they used the XAI in their decision (XAIHelp == 1), matched to the corresponding XAIPreferred selection for each scenario. The model was trained on 40% of the raw data, which resulted in the most accurate model for this dataset. The linear regression model returned an R2 of 0.100 and an adjusted R2 of 0.037; the model was unsuccessful in fitting the data. Even though there's a strong correlation between XAIPreferred and XAIUse, it appears the model needs more data to build a successful predictive function. In addition to the model based on the XAIPreferred and XAIUse variables, I added AI background as a dimension to see if it would improve the model. For this three-dimensional model I added AI background to the x-axis of the model I trained previously. This linear regression model returned an R2 of 0.103 and an adjusted R2 of 0.017, so this model was also unsuccessful in fitting the data. My conclusion is that both models need more data before a successful predictive function can be generated.

Complexity and Reliance
For this portion of the study, I wanted to analyze how the complexity of a scenario might have influenced the amount a participant relied on XAI suggestions. First, I performed a chi-square analysis between complexity and reliance to see if the relationship was statistically significant. Second, I trained a linear regression model to see how strong the relationship between the two was. Additionally, I added AI background as a third dimension to the linear regression model to explore its relationship with complexity and reliance.

Chi-Square Analysis
The data used in this analysis came from the difficulty rating (QuestionDifficulty) and level of XAI use (XAIUse) values disclosed by participants after completing each scenario. A cross table was generated from this data; QuestionDifficulty ranged from 1 to 7 and XAIUse ranged from 1 to 7, and the cross table gave a pairwise count of each unique value combination. The data in the cross table was then used to perform a chi-square analysis with an alpha of 0.05. The test returned a chi-score of 161.010 with a p-value of < 0.0001. These results indicate there's a strong relationship between complexity and reliance. The qualitative analysis supports this finding: as participants analyzed difficult scenarios, more specifically scenarios involving uncertainty with dire consequences, they relied heavily upon XAI suggestions in decision-making. This showed that in difficult situations participants needed someone to support their decision, and the XAI played that role; difficult situations made participants rely on XAI suggestions even more. This analysis shows there's a positive correlation between the difficulty of a situation and reliance on XAI suggestions.
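Both the preference model above and the complexity model in the next section follow the same pattern: fit a linear regression on a portion of the data and report R2 and adjusted R2. A sketch of that pattern is below; the data, the 40% training split, and the one-hot encoding of XAIPreferred are illustrative assumptions of this sketch, not the study's actual pipeline.

import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Hypothetical data: preferred RGT and XAIUse (1-7), restricted to rows
# where the XAI was used in the decision (XAIHelp == 1). Placeholders only.
df = pd.DataFrame({
    "XAIPreferred": ["AD", "NR", "RG", "CG", "CG", "RG", "CG", "AD",
                     "NR", "CG", "RG", "CG", "CG", "NR", "RG", "CG"],
    "XAIUse":       [2, 3, 4, 6, 7, 5, 6, 1, 3, 7, 4, 6, 5, 2, 4, 7],
})

# Encode the categorical predictor and add an intercept term.
X = pd.get_dummies(df["XAIPreferred"], drop_first=True).astype(float)
X = sm.add_constant(X)
y = df["XAIUse"]

# Train on 40% of the data, as described in the text (an assumption here).
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.4, random_state=0)

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1); statsmodels reports it directly.
model = sm.OLS(y_train, X_train).fit()
print(f"R2 = {model.rsquared:.3f}, adjusted R2 = {model.rsquared_adj:.3f}")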
Linear Regression
The data used in this analysis came from the difficulty rating (QuestionDifficulty) and level of XAI use (XAIUse) values disclosed by participants after completing each scenario. The QuestionDifficulty variable was used for the x-axis and the XAIUse variable for the y-axis. This two-dimensional linear regression model returned an R2 of 0.785 and an adjusted R2 of 0.768 (Fig 17 visualizes the results). These findings indicate the predictive function fit the data successfully: a strong correlation was found between QuestionDifficulty and XAIUse, and a successful model was built. Fig 17 shows a positive correlation between QuestionDifficulty and XAIUse, meaning the more difficult participants found a scenario to be, the more XAI suggestions were relied upon. This model, along with the chi-square analysis done previously, shows how strong the correlation is between complexity and reliance. In addition to the model based on complexity and reliance, I added AI background as a third dimension to understand the relationship among all three. I used the same dataset as the two-dimensional model, with AI background added to the x-axis of the model I trained previously. This three-dimensional linear model returned an R2 of 0.821 and an adjusted R2 of 0.791 (Fig 18 visualizes the results). Between the two-dimensional model (QuestionDifficulty, XAIUse) and the three-dimensional model (QuestionDifficulty, XAIUse, AIBackground) the difference in R2 is 0.036. Even though the three-dimensional model proved successful, these results indicate AI background had minimal effect in terms of improving the model's accuracy, further showing that AI background is a very small factor in influencing people to use XAI suggestions and that complexity is a driving factor in reliance.

Fig 17. Linear Regression Model of QuestionDifficulty and XAIUse (X-axis: QuestionDifficulty, Y-axis: XAIUse)
Fig 18. Linear Regression Model of QuestionDifficulty, AIBackground, and XAIUse (X-axis: QuestionDifficulty, Y-axis: XAIUse)

AI Background and Reliance
The last variable I explored was the relationship between AI background and reliance on XAI (XAIUse) suggestions. To explore this quantitatively, I performed an analysis between the two to see if there was a relationship, and trained a linear regression model to test how strong the relationship was. The data for this section came from the XAIUse values disclosed by participants after completing each scenario as one dimension, along with their group indicator [AI = 1, non-AI = 0] as the other dimension. The results in this section found no statistical significance between AI background and reliance, indicating AI background doesn't influence reliance on XAI suggestions.

Chi-Square Analysis
The AI background and XAIUse data was placed into a cross table to give a pairwise count of each unique value combination, with AI background being 0 or 1 and XAIUse ranging from 1 to 7. This cross table was then used to perform a chi-square analysis with an alpha of 0.05. The chi-square returned a chi-score of 4.693 with a p-value of 0.454, indicating there is no statistically significant relationship between AI background and reliance. Qualitatively this is supported, as all participants relied more on XAI suggestions when scenarios were difficult, regardless of AI background.
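The two-dimensional and three-dimensional regression models described earlier in this section can be compared directly, so the R2 gain from adding AIBackground can be read off. The sketch below illustrates that comparison with hypothetical placeholder values, not the study data.

import pandas as pd
import statsmodels.api as sm

# Hypothetical per-scenario records: difficulty rating (1-7), group indicator
# (AI = 1, non-AI = 0), and disclosed level of XAI use (1-7).
df = pd.DataFrame({
    "QuestionDifficulty": [2, 3, 5, 6, 4, 7, 1, 3, 5, 6, 2, 4, 7, 5, 3, 6],
    "AIBackground":       [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "XAIUse":             [2, 4, 5, 6, 3, 7, 2, 3, 4, 6, 1, 4, 6, 5, 3, 7],
})

y = df["XAIUse"]

# Two-dimensional model: difficulty only.
X2 = sm.add_constant(df[["QuestionDifficulty"]].astype(float))
m2 = sm.OLS(y, X2).fit()

# Three-dimensional model: difficulty plus AI background.
X3 = sm.add_constant(df[["QuestionDifficulty", "AIBackground"]].astype(float))
m3 = sm.OLS(y, X3).fit()

print(f"difficulty only:         R2 = {m2.rsquared:.3f}, adj R2 = {m2.rsquared_adj:.3f}")
print(f"difficulty + background: R2 = {m3.rsquared:.3f}, adj R2 = {m3.rsquared_adj:.3f}")
print(f"R2 gain from AIBackground = {m3.rsquared - m2.rsquared:.3f}")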
Linear Regression
For this model the group indicator [AI = 1, non-AI = 0] (AIBackground) was used for the x-axis, and the XAIUse values were used for the y-axis. This two-dimensional linear regression model returned an R2 of 0.011 and an adjusted R2 of -0.065. These results strongly suggest there is no significant relationship between the AI background of a participant and the amount they rely upon XAI suggestions to make decisions.

Conclusion
My research question is whether the Rationale Generation Techniques (RGTs) of an XAI, the complexity of a situation, initial bias towards AI, and the AI background of a participant influence a person's reliance on XAI for a particular situation, and in what ways these variables affect their thought processes and overall decisions. The study I conducted consisted of 16 participants separated into two distinct groups, those with an AI background and those without (AI group, non-AI group), based on strict screening criteria to ensure the groups were statistically different. These participants analyzed 6 scenarios of varying themes and difficulty, for each of which they needed to make a decision by choosing from a pre-determined list of choices. For each scenario, participants had the help of four fabricated XAI messages derived from four different RGTs (Action-Declaring (AD), Numerical-Reasoning (NR), Rationale-Generating (RG), Counterfactual-Generation (CG)) that they could use to aid in their decision-making process. These XAI suggestions were arbitrarily named "Alpha", "Bravo", "Charlie", and "Delta", and the order of the suggestions changed for each scenario. After each scenario, participants indicated whether they used the XAI suggestions in their decision (yes or no), which rationale generation technique (RGT) they preferred (Alpha, Bravo, Charlie, Delta), and how much the XAI suggestion was used in their decision (1: not at all, 7: it's all I used). They also indicated how difficult the scenario was for them to complete (1: easy, 7: very difficult). Lastly, participants ranked the RGTs (1: top choice, 4: lowest choice) along 5 perception dimensions (understandability, confidence, intelligence, friendliness, second chance). The data from this study was then used to explore my research question.

Initially, I investigated how AI background may change the way an individual perceives XAI suggestions, and I found that it does, but only to a small degree. More specifically, the only statistical significance found between AI background and perception of XAI suggestions was in the understandability perception dimension pertaining to the AD RGT. The way the AD RGT was perceived was the most affected by AI background, because of the prior knowledge the AI group had about AI. The increased experience with AI also led those participants to have more trust in the NR RGT, because of their greater faith in numbers compared with the non-AI group. Additionally, AI background was explored to see how it may affect the preference of RGTs as well as the level of reliance on XAI suggestions. AI background proved to have no statistical significance on reliance on XAI suggestions or preference of RGTs; both AI and non-AI groups behaved similarly in terms of reliance and preference of RGTs.
These findings suggest AI and non-AI experts perceive XAI suggestions largely the same regardless of the technique used to generate the messages, and that both rely upon XAI suggestions the same.

The initial bias a participant had towards AI had no statistical significance in a participant's use of XAI suggestions either. All participants, regardless of initial bias, were willing to use the XAI suggestions when they needed the help. This leads me to conclude that the only thing bias affects is the amount of time it takes for participants to use the XAI suggestions, not the amount they use them. No matter the bias one holds towards AI, people will still rely upon XAI suggestions the same amount if they need the help.

The Rationale Generation Techniques of an XAI played a pivotal role in preference, with the Counterfactual Generation technique being the most preferred among the four used in this study. The CG RGT garnered a lot of trust from all participants because of the statistical probabilities of multiple use cases and the humanlike message it gave. This made participants feel more informed and less forced to either choose the action an XAI suggests or find another solution without the help of an XAI suggestion. In other words, they were given a wider view with the CG RGT rather than being pushed in a single direction by the AD, RG, and NR RGTs. Along with the increased trust, this preference also increased the willingness of participants to rely on it in decision making, creating a strong correlation between preference and reliance on RGTs. People prefer to use an XAI that gives them the probability of multiple outcomes rather than a definitive one. In addition, if people like the XAI they're working with, they will keep using it on multiple occasions and will rely heavily on its messages when help is needed in decision-making.

Even though the complexity of a scenario didn't influence the preferred RGT, it did heavily influence the level of reliance, in the sense that the more complex a scenario was, the more an XAI was relied upon. Keeping in mind that the most preferred and trusted RGT during the study was the CG RGT, we can deduce that the more complex a scenario was, the more heavily the most preferred and trusted XAI would be used, which is why the correlation between complexity and reliance was very strong and positive. This study found that the more complex a situation is, the more XAI will be used to aid in decision-making, and that the most preferred and trusted XAI will be the one used in that case. AI background and bias have little to no effect in these circumstances; it's purely situational and preference based. If a user is given an XAI suggestion like the CG RGT, it will be used for most situation types. In addition, no matter their AI background, people will perceive XAI messages similarly and will only use them if they prove useful for the situation. People will decide if they need the help in decision-making, and then will decide if an XAI suggestion is useful based on their perception of it. If the XAI suggestion proves useful, they will use it according to how difficult the situation is. This use case is purely situational, as situations differ in complexity and theme; one RGT may be useful for one scenario but prove inefficient for another. Complexity is the driving factor in people's reliance on XAI suggestions, and a trusted and preferred XAI will heavily influence their decisions.
These findings are here to further research and improve our understanding of how XAI is used in the decision-making process and which factors drive people to use it. This research can help AI experts create XAI that encourages people to use critical thinking, or to blindly follow what it wants. This research can also help non-AI experts understand the psychological factors involved in the use of XAI. No matter the use, because of the increasing use of XAI it must be studied in all aspects.

Appendix 1. Questionnaire

Initial Bias
In general, how much do you trust AI? (1: No Trust, 7: Full Trust) 1 2 3 4 5 6 7
In general, how useful do you feel AI is in everyday life? (1: Not at All, 7: Extremely Useful) 1 2 3 4 5 6 7

Education Level / Subject of Study
What is the highest degree or level of education you have completed?
a) Some High School
b) High School
c) Bachelor's Degree
d) Master's Degree
e) Ph.D. or higher
f) Trade School
If your education level is higher than high school, what was/is your subject of study?

Knowledge Test
(1) What would be the output of the following python program?
name = "Peter"
print("Hello " + name)
(a) Peter
(b) Hello Peter
(c) Hello + Peter
(d) "Hello" + name

(2) What would be the output of the following python program?
numbers = [2, 4]
for i in range(len(numbers)):
    print(numbers[i] + i)
(a) 2 5
(b) 2 5 8
(c) 2 4
(d) 2 4 10

(3) Which of the following is an unsupervised learning task?
(a) Distinguishing pictures containing cats from pictures not containing cats
(b) Flagging text messages as appropriate or inappropriate
(c) Dividing data points into different clusters without any labels available
(d) Predicting the value of a house after training on a dataset with house features and values

(4) What is the general goal of reinforcement learning?
(a) Maximize potential or expected punishment
(b) Maximize potential or expected reward
(c) Get to the goal as soon as possible
(d) Avoid the most obstacles in any given state

(5) In MDPs, the Markov assumption is that:
(a) The current state is independent of all other states
(b) The current state depends only on the history of previous states and actions
(c) The current state depends on the full sequence of states and actions (past and future)
(d) The current state only depends on the immediate previous state and action

Computer Programming Background Knowledge
When it comes to computer programming or coding, I believe I have
(1) No knowledge: I might be aware of computer programs, but have never coded before
(2) A little knowledge: I know basic concepts in programming, but have never applied them
(3) Some knowledge: I have applied programming concepts by coding at least once before
(4) Moderate knowledge: I apply programming concepts somewhat frequently for my work, class, or leisure
(5) A lot of knowledge: I apply programming concepts very frequently or create cutting edge software

AI Background Knowledge
When it comes to Artificial Intelligence (AI), I believe I have
(1) No knowledge: I might be aware of AI, but have no knowledge about it
(2) A little knowledge: I know basic concepts in AI, but have never applied them
(3) Some knowledge: I have applied AI concepts by coding at least once before
(4) Moderate knowledge: I apply AI concepts somewhat frequently for my work, class, or leisure
(5) A lot of knowledge: I apply AI concepts very frequently or create cutting edge software

AI Class
Have you ever taken or are currently taking any classes on Artificial Intelligence?
• Yes
• No

Appendix 2. Study Variables

Appendix 3.
Study Medium
|
Format | application/pdf |
ARK | ark:/87278/s6tqb1gx |
Setname | wsu_smt |
ID | 96895 |
Reference URL | https://digital.weber.edu/ark:/87278/s6tqb1gx |