Title | Tyler, Carson_MCS_2021 |
Alternative Title | Comparing Different Display Possibilities to Solve the Bubble-Up Problem in Recommendation Engines |
Creator | Tyler, Carson |
Collection Name | Master of Computer Science |
Description | Uncertainty is prevalent in most recommendation systems. The inability to effectively present results for objects with scarce data remains an opportunity for investigation. This thesis investigates how items within a system that lack user reviews can be recommended in a content-based recommendation engine. This research contributes to the body of existing knowledge on uncertainty, recommendation systems, and participant experience. The results of this research are measured by determining participant trust given unexpected results provided by this recommendation system. The following research question is presented, which this thesis aims to answer: How does introducing uncertain results in a content-based/collaborative filtering hybrid recommendation system impact a participant's trust towards a recommendation system? In other words, this thesis addresses how uncertain results affect the participant. The results of this thesis were not statistically significant. Each variable tested against the control showed results similar to one another. Participant scores were collected regarding their satisfaction with and trust in the recommendation system. The scores showed no statistically significant differences between the tests. Therefore, this thesis concludes that different ways of displaying uncertain results do not improve participant satisfaction or willingness to use the system. Since the particular way uncertain results are displayed appears not to matter, this suggests that simply showing uncertain results may be sufficient to overcome the bubble-up problem. |
Subject | Computer science |
Keywords | Lack of user input; Content-based recommendation engine; Participant experience; Uncertain results |
Digital Publisher | Stewart Library, Weber State University |
Date | 2021 |
Language | eng |
Rights | The author has granted Weber State University Archives a limited, non-exclusive, royalty-free license to reproduce their theses, in whole or in part, in electronic or paper form and to make it available to the general public at no charge. The author retains all other rights. |
Source | University Archives Electronic Records; Master of Science in Computer Science. Stewart Library, Weber State University
OCR Text | Comparing Different Display Possibilities to Solve the Bubble-Up Problem in Recommendation Engines by Carson Tyler. A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE, WEBER STATE UNIVERSITY, Ogden, Utah.
Faculty Advisor / Committee Chair: Robert Ball, robertball@weber.edu (signed Jan 5, 2021 13:17 MST)
Committee Member: Joshua Jensen, joshuajensen1@weber.edu
Committee Member: Hugo Valle, hugovalle1@weber.edu (signed Jan 5, 2021 13:44 MST)
Author: Carson Tyler, carsontyler@mail.weber.edu (signed Jan 5, 2021 13:19 MST)
A Thesis in the Field of Computer Science for the Degree of Master of Science in Computer Science, Weber State University, December 2020. Copyright 2020 Carson Tyler.
Comparing Different Display Possibilities to Solve the Bubble-Up Problem in Recommendation Engines
Abstract
Uncertainty is prevalent in most recommendation systems. The inability to effectively present results for objects with scarce data remains an opportunity for investigation. This thesis investigates how items within a system that lack user reviews can be recommended in a content-based recommendation engine. This research contributes to the body of existing knowledge on uncertainty, recommendation systems, and participant experience. The results of this research are measured by determining participant trust given unexpected results provided by this recommendation system. The following research question is presented, which this thesis aims to answer: How does introducing uncertain results in a content-based/collaborative filtering hybrid recommendation system impact a participant's trust towards a recommendation system? In other words, this thesis addresses how uncertain results affect the participant. The results of this thesis were not statistically significant. Each variable tested against the control showed results similar to one another. Participant scores were collected regarding their satisfaction with and trust in the recommendation system. The scores showed no statistically significant differences between the tests. Therefore, this thesis concludes that different ways of displaying uncertain results do not improve participant satisfaction or willingness to use the system. Since the particular way uncertain results are displayed appears not to matter, this suggests that simply showing uncertain results may be sufficient to overcome the bubble-up problem.
Author's Biographical Sketch
Carson L. Tyler is a graduate student at Weber State University in the Master of Science in Computer Science program, slated to graduate in December 2020. He received his Bachelor of Science in Computer Science in July 2019 from Weber State University, graduating Summa Cum Laude, alongside an Associate of Science, an Associate of Applied Science, and a Certificate of Proficiency. He served as Vice-Chair of the Association for Computing Machinery chapter as well as Student Vice President of The Honor Society of Phi Kappa Phi at Weber State. He has worked in the software development field since early 2018, working professionally with a variety of languages and frameworks including .NET, ASP.NET, VB.NET, React, and Ruby.
Dedication
Dedicated to Dr.
Robert Ball, my amazing thesis advisor, who encouraged me to pursue this degree and supported me through the program. Without him, this research would not have been started nor completed.
Acknowledgments
I would like to acknowledge the Info Vis Lab at Weber State, of which I was a part for this project, and which advised me weekly and offered support and direction.
Table of Contents
Author's Biographical Sketch (iii)
Dedication (iv)
Acknowledgments (v)
List of Tables (vii)
List of Figures (viii)
Introduction (9)
Related Work (22)
Gathering and Retrieving Data (31)
Creating a database (34)
Developing a Hybrid Recommendation System (37)
CITI Certification and IRB Approval (39)
Designing and Executing User Experiments (41)
Analysis of Results (50)
  Intermingling Results (58)
Conclusion (60)
References (62)
List of Tables
Table 1 Display Types (15)
Table 2 Sample data collected for the first seven columns (50)
List of Figures
Figure 1 Display technique #1 with control, both certain and uncertain recipes (16)
Figure 2 Display technique #2 with certain and uncertain recipes (16)
Figure 3 Display technique #3 with uncertain recipes (16)
Figure 4 Display technique #4 with certain recipes (17)
Figure 5 Survey questions (19)
Figure 6 ERD for the database (36)
Figure 7 Screenshot of the Informed Consent page (41)
Figure 8 The Directions Page (43)
Figure 9 The Scenario page (45)
Figure 10 The Recommended Recipes page (46)
Figure 11 Overall rating bar graph (51)
Figure 12 Related rating bar graph (52)
Figure 13 Reuse rating bar graph (53)
Figure 14 Satisfaction rating bar graph (54)
Figure 15 Trust rating bar graph (55)
Figure 16 Overall Rating ANOVA (56)
Figure 17 Related Rating ANOVA (56)
Figure 18 Reuse Rating ANOVA (57)
Figure 19 Satisfaction Rating ANOVA (57)
Introduction
Recommendation engines continue to rise in popularity as the applications for their use expand. Fields defines a recommendation engine as "a technique or method that presents a user with suggested objects for consumption based on past behavior" [1]. In other words, recommendation engines allow for a personalized, custom experience for each user in a given situation. The goal of such a system is to improve the user experience and increase the likelihood that users will continue to use and reuse the product utilizing the recommendation engine. For example, websites such as www.google.com and www.amazon.com are popular for their recommendation engines. Google's search engine utilizes a recommendation engine to provide accurate results given a user's search phrase. Similarly, Amazon's website has multiple applications of a recommendation engine. On the homepage, certain products are presented to the user. As the user browses the website and makes purchases, the homepage products change to better appeal to the user. The goal of this application is to increase user click-throughs and product sales. Additionally, there are different types of recommendation engines.
Specifically, content-based and collaborative filtering recommendation engines are the two most popular types and are both used in this research. A content-based recommendation engine uses metadata from an object to recommend it. For example, if one were designed for recipes, the metadata would include the title, ingredients, and directions (among other metadata) of the recipe. Results from this type of recommendation engine rely solely on the metadata for recommending. For instance, say a content-based recommendation engine is recommending recipes related to a chicken parmesan recipe. This system would take the title, ingredients, and directions and directly compare those to every other recipe's title, ingredients, and directions. It would determine which recipes have the most similarity to the original recipe and then return a certain number to the user in ranked order. On the other hand, a collaborative filtering recommendation engine utilizes user reviews of the object to be recommended. For example, if this type were designed for recipes on a website, user reviews such as each rating would be used to determine the best objects to recommend. For instance, say a collaborative filtering recommendation engine is recommending recipes related to a chicken parmesan recipe. Instead of comparing this recipe based on its metadata, it will rely simply on the reviews. Say this recipe has a 4.5-star rating and 50 reviews. This system may look at each user who rated this recipe with a 5-star rating and compile a list of all the recipes rated 5 stars by that user. This would be done for every user who reviewed the chicken parmesan recipe with a 5-star rating. Then, the compiled lists would be compared to one another, and any recipes that appear on multiple lists would be recommended. There are numerous ways that a collaborative filtering engine can function, and this is one such way.
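To make the user-overlap idea concrete, the following is a minimal sketch of that style of collaborative filtering. It is not the thesis's actual implementation; the recipe names, ratings data, and function name are illustrative, and ratings are assumed to be available as (user, recipe, stars) tuples.

```python
from collections import Counter

# Illustrative ratings data as (user, recipe, stars) tuples.
ratings = [
    ("alice", "chicken-parmesan", 5), ("alice", "baked-ziti", 5),
    ("bob", "chicken-parmesan", 5), ("bob", "baked-ziti", 5),
    ("bob", "garlic-bread", 5), ("carol", "chicken-parmesan", 4),
]

def collaborative_recommend(target, ratings, top_n=10):
    """Recommend recipes co-rated 5 stars by users who gave the target 5 stars."""
    # Every user who rated the target recipe 5 stars.
    fans = {user for user, recipe, stars in ratings
            if recipe == target and stars == 5}
    # Compile each fan's other 5-star recipes; recipes appearing on
    # multiple users' lists accumulate higher counts and rank higher.
    counts = Counter(recipe for user, recipe, stars in ratings
                     if user in fans and stars == 5 and recipe != target)
    return [recipe for recipe, _ in counts.most_common(top_n)]

print(collaborative_recommend("chicken-parmesan", ratings))
# ['baked-ziti', 'garlic-bread']
```

Ranking by how many fans' lists a recipe appears on mirrors the list-comparison step described above.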
In this research, a hybrid recommendation engine is used. This type combines both content-based and collaborative filtering, where both the metadata and user reviews are used to recommend the recipes. Combining these two types allows for more accurate and reliable results and is the most commonly used approach in industry. However, when incorporating collaborative filtering, an implicit bias is introduced, as objects with low to no user reviews will likely be excluded from the recommendation engine's results. Consequently, this Bubble-Up Problem makes it difficult for new or unpopular objects to be recommended. The Bubble-Up Problem describes a situation in a recommendation system in which new additions to the system are not likely to be recommended by the system. Most recommendation systems will only present recommendations with high viewership and/or high user ratings. Therefore, new additions (e.g., new recipes, new products) to the system will be at a disadvantage of being recommended to the user. The Bubble-Up Problem thus presents a unique opportunity to investigate different techniques by which new additions to a recommendation system may have a fairer chance of being recommended. This problem is referred to as the "Bubble-Up Problem" because it can be difficult for new or unpopular additions to "bubble up" to the top of recommendations and compete with popular, established objects. By investigating different solutions to this problem, new or unpopular additions can gain greater exposure and increase sales. This results in the potential for increased revenue for these types of objects if implemented correctly. If not, it can further increase the bias against them, decreasing their exposure and potential sales or views. An example of a recommendation system in which the Bubble-Up Problem is prevalent is a recipe recommendation system. New recipes are added to these sites daily, though they are unlikely to gain an audience organically through the recommendation system. For this reason, www.allrecipes.com, referred to henceforth as AllRecipes, and the recipes hosted by this site were chosen to test the Bubble-Up Problem. Mascia et al. and Zeeb research the effects of exposing rats to uncertainty, finding that exposure increases susceptibility to try the uncertain event and partake in risky behavior. Additionally, Merlhiot et al. conclude uncertainty generally leads to a negative perception of that event [2-3]. The Bubble-Up Problem relates to uncertainty about the unknown in a recommendation engine. When new or unpopular objects are recommended by the system, there is a chance the results will not be satisfactory. Because the object does not have user reviews, the quality of the object is unknown. In the context of this thesis, two types of recipes are described. First, certain recipes are those with greater than or equal to 100 ratings and greater than or equal to four stars. These recipes are regarded as certain recipes that are likely to be best received by participants. This is due to the quality and volume of ratings associated with the recipe. The recipe has already been established by other users; thus, the uncertainty is taken away. Next, uncertain recipes are those with fewer than 10 ratings and any number of stars, including zero. These describe recipes that are new or unpopular additions, as referenced above. The age of the recipe is irrelevant; because of the low number of ratings, its value has not been established by other users. Normally, recommendation engines will have a bias against uncertain recipes. They will not recommend these recipes for the reasons described. However, these recipes need exposure and user ratings to become more established and, therefore, qualify for recommendation. Thus, a paradox is presented, and the importance of the Bubble-Up Problem is reinforced. Uncertain recipes will be key in testing this problem.
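In code, the two categories reduce to simple thresholds. Below is a minimal sketch using the thresholds just stated; the function name and the "neither" label for recipes falling outside both categories are illustrative assumptions, not the thesis's implementation.

```python
def classify_recipe(ratings_count: int, stars: float) -> str:
    """Label a recipe using the certainty thresholds defined above."""
    if ratings_count >= 100 and stars >= 4:
        return "certain"    # established: many ratings, high average
    if ratings_count < 10:
        return "uncertain"  # new or unpopular: too few ratings to judge
    return "neither"        # outside both experimental categories

assert classify_recipe(250, 4.5) == "certain"
assert classify_recipe(3, 0.0) == "uncertain"
assert classify_recipe(50, 4.8) == "neither"
```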
One potential solution to this problem is to purposefully include uncertain recipes in the results with a particular display technique. For example, the uncertain recipes could be displayed with a different font color, placed in a separate column, and/or given some other identifier. By doing this, the user knows the recipe they are viewing may be uncertain and, therefore, may not necessarily be a quality result. The benefit is that uncertain recipes gain exposure and, as a result, more user reviews. To test and further investigate the Bubble-Up Problem, I took the following approach: an experiment was conducted to determine participants' understanding of, desire to reuse, and satisfaction with the recommendation system regarding certain and uncertain recipes. Different display and presentation effects were applied to the experiment to identify the best way to display uncertain results. These effects represent the variables in the experiment. (See Figures 1-4.) The control for this experiment was a recommendation system that does not recommend uncertain results, like most recommendation systems. Uncertain results are excluded from the control. Recommendations are based solely on the number of views and the number of stars a recipe has received, where higher-rated items are more likely to be recommended than lower-rated items. For example, if a participant were to input "chicken" into the recommendation system, the top ten rated items with chicken in the recipe would be recommended. The results are displayed in two columns side-by-side, numbered 1 through 10. The experiment focuses on a system that does recommend uncertain results. This system was a hybrid recommendation engine, incorporating both content-based and collaborative filtering. Using both the metadata of a recipe and the accompanying reviews and ratings, participants selected a predefined recipe, related to a given scenario, that interested them. Five to ten different recipes were then recommended that relate to the chosen recipe. The goal of this recommendation was to suggest meaningful recommendations, particularly for uncertain recipes. The key difference between this experiment and the control is that this one does suggest uncertain recipes, relying on the content along with reviews. For example, if a participant were to input "chicken" into this recommendation system, two columns would be returned: the first would be the same as the control, where reviews are a contributing factor in what is recommended; the second would be content-based-driven recommendations that include uncertain results. These are distinguished from the control recommendations, but both sets of recipes were surveyed. To better display uncertain recipes, different highlighting was applied to certain and uncertain recipes. Certain recipes were displayed with a white font color. This is the default font color of the online web application and, therefore, is not significantly different. Contrastingly, uncertain recipes were displayed with a yellow font color. This different font color was described to the participants so they understood the difference. The different effects applied to this experiment are as follows:
1. Uncertain recipes were displayed in a separate column from the collaborative recipes, totaling ten recipe results, though all recipes were displayed with a white font color. Uncertain recipes were indistinguishable from certain recipes. See Figure 1.
2. Uncertain recipes were displayed in a separate column from the collaborative recipes, totaling ten recipe results, and uncertain recipes were displayed with a yellow font color. Uncertain recipes were distinguishable from certain recipes. See Figure 2.
3. Only uncertain recipes were returned, excluding collaborative (certain) results, with the indication of being uncertain, totaling five recipe results. See Figure 3.
4. Only certain recipes were returned, excluding hybrid (uncertain) results, but displayed with the yellow indication of being uncertain, totaling five recipe results. This effect is included to test whether the uncertain yellow font color has any effect on the participant's experience. See Figure 4.
Table 1 Display Types
Display Type | Certain Recipes | Uncertain Recipes
1 | Displayed in white | Displayed in white
2 | Displayed in white | Displayed in yellow
3 | Excluded | Displayed in yellow
4 | Displayed in yellow | Excluded
Figure 1 Display technique #1 with control, both certain and uncertain recipes
Figure 2 Display technique #2 with certain and uncertain recipes
Figure 3 Display technique #3 with uncertain recipes
Figure 4 Display technique #4 with certain recipes
After participating in each effect, each participant was asked to complete a survey with the following questions. By asking these questions, we can gain a greater understanding of the effectiveness of the effects applied to the experiment. In parentheses is the name of the variable used to store the answer. See Figure 5.
• On a scale of 1-5, what was your overall rating of the recommendation system? (Overall Rating)
• On a scale of 1-5, how accurate were the recommendations related to your search? (Related Rating)
• On a scale of 1-5, please rate each recommendation and how accurate it was to your intended search. Each recipe recommendation will be listed for the participant to rate. (Accuracy 1-10)
• On a scale of 1-5, how unexpected was each recommendation? Each recipe recommendation will be listed for the participant to rate. (Unexpected Rating)
• On a scale of 1-5, how likely are you to reuse this recommendation engine for future recipe searches? (Reuse Rating)
• On a scale of 1-5, how much do you trust this recommendation system to provide you with useful, accurate results? (Trust Rating)
• On a scale of 1-5, what is your overall satisfaction with the recommendation system? (Satisfaction Rating)
Figure 5 Survey questions
The following milestones are presented as a method to track progress throughout this thesis. These are described in greater detail throughout this thesis.
1. Gather and retrieve data. This includes cleaning and preprocessing data through data wrangling.
2. Create a database. This is used to store the survey results collected from the participants after each effect, as well as compensation information if they wish to provide it.
3. Develop a hybrid recommendation system. This was done in Python, primarily utilizing the Pandas library for the content-based recommendation calculations.
4. Receive CITI Certification, submit the IRB application, and receive clearance for user experiments.
5. Design and perform the user experiments as described above, including distributing and collecting the surveys.
6. Analyze results. Determine useful aggregates of the collected data.
7. Write the thesis. Prepare the final documentation.
Trust has been defined by Deutsch as when an "individual is confronted with an ambiguous path, a path that can lead to an event perceived to be beneficial or to an event perceived to be harmful" [5]. In the context of this thesis, the "ambiguous path" is a participant interacting with the recommendation system, with the beneficial event being one that increases participant satisfaction towards the recommendation system and the willingness to reuse the system. On the contrary, the harmful event is one that decreases participant satisfaction and willingness to reuse the system. Trust is a key factor in this thesis. If a user trusts the recommendation engine, they are likely to reuse it and are likely to give exposure to uncertain results.
Additionally, the user may be more likely to try cooking an uncertain recipe and rating it, essentially helping that recipe to "bubble up." This project aims to determine participant trust in a recipe recommendation system and whether different effects of displaying recipes impact participant trust.
Related Work
In their article Music Recommendation Engine, Murali et al. investigate how a user's trust is impacted in a music recommendation engine. Users created a music profile that includes their music preferences, such as favorite genres or most-listened-to songs. The music recommendation engine would then recommend new music based on the user profile [6]. The system also recommends unexpected but useful songs to the user, which are intended to increase the user's trust in the system. The researchers conducted their experiments by having users create a music profile, then participate in an experiment. This experiment recommended music based on the profile. Several other researchers also investigate the effects of a user profile with uncertainty [7, 8, 9]. Additionally, users indicated how open they are to listening to new music. As such, the recommendation engine would recommend not only music similar to the profile but also uncertain music. Murali et al. measure their results through surveys that were conducted immediately following the user experiment. They state that a recommendation engine is only useful when the user is satisfied with and trusts the system. The reusability of the system is a significant indicator of its quality. They conclude that the usefulness of their engine was greater for users who were more inclined toward unexpected results than for those who were not. Ayata et al. also investigate music recommendation systems, using physiological sensors to recommend music [10]. Additionally, Pukkhem and Wu describe similar research using object recommendation with uncertainty [11, 12]. This thesis was heavily influenced by the design of the project by Murali et al. While this thesis does not deal with music or a user profile, it does focus on recommending certain and uncertain results. Future research for this thesis specifically could investigate the effect of implementing a user profile in terms of recipe recommendations. Additionally, this thesis did not survey the participant's inclination toward unexpected results, as Murali et al. did, which may have been an unintentional implicit bias in the design of this thesis. Furthermore, Worapat et al. investigate how a user profile affects a user's perception of a recommendation system. Their research focused on returning users and how satisfaction with recommendations improved as users continued to build their profile. They conclude it is difficult to provide sound recommendations that the user finds useful without a user profile, and that a user profile aids the early stages of a recommendation engine with insufficient additional information [13]. Additionally, Lipshitz et al. research different types of uncertainty and coping strategies associated with uncertainty. They suggest three types of uncertainty: inadequate understanding, incomplete information, and undifferentiated alternatives. The coping strategies associated with these are: assumption-based reasoning, weighing the pros and cons of competing alternatives, suppressing uncertainty, and forestalling [14]. The research of Lipshitz et al. is supported by Berkeley and Humphreys, who use their research to investigate bias in the way questions are asked.
They identify seven types of uncertainty in decision making. These types of uncertainty are significant when structuring the design of a question, as they claim presenting information in different ways can influence how a decision is made [15]. Allaire and Firsirotu, as well as Conrath, expound upon Lipshitz's research to investigate additional methods of coping with uncertainty and decision making, specifically in an organizational setting [16, 17]. Both papers find similar results: when multiple people are involved in making a decision, the uncertainty surrounding that decision is lessened or removed. In addition, Milliken supports Lipshitz's research with an investigation into different types of uncertainty and responses to each type, identifying state uncertainty, effect uncertainty, and response uncertainty [18]. Further investigating decision making, Bell researches the expectations of an outcome of a decision made under uncertainty. The greater the expectation for an uncertain outcome, the greater the emotional reward – or disappointment – is. Therefore, some people will be willing to have higher expectations while others will limit theirs. In the context of my thesis, Bell's research helps explain how receptive participants may be to uncertain recipes [19]. Hsu and Bhatt's research takes a neurological approach to understanding the brain patterns associated with uncertainty in decision making and the explanations of those patterns [20]. This research reaffirms Bell's conclusion that the expectation of a particular reward when making an uncertain decision will influence the level of risk taken by the person. Supporting Hsu and Bhatt's research, Platt and Huettel also investigate how the brain responds to uncertainty in decision making. They explain that the brain inherently has mechanisms for dealing with and understanding uncertainty; however, mental disorders, including addiction, can impair or destroy those mechanisms [21]. In addition to his previous research, Bell also investigates the outcomes of dealing with uncertainty. For example, placing a bet on a sports team to win a game is an uncertain decision. If the bet results in a payout for the bettor, they may have a positive feeling about their decision; however, if there was another bet that had a higher payout than what the bettor won, then they may have negative feelings such as regret upon learning of that outcome. Again, this can result in decreased risk taken by people when faced with an uncertain decision [22]. Prelec and Loewenstein also research uncertainty in decision making. They investigate how changing a decision from certain to uncertain makes the decision less important to the decision-maker [23]. Karl researches uncertainty in decision making when making travel-related decisions. Their research focuses on factors that play into choosing whether or not to travel and the travel destination. The results of this research show that education levels, travel frequencies, and age are indicators of the risk associated with uncertain travel-related decisions. This research relates to my thesis in that those with higher education or who bake often may be more willing to try uncertain recipes [24]. Zhao et al. also investigate uncertainty in decision making in a practical sense, explaining the uncertainty in highway development. In highway development, there are many uncertainties, such as traffic patterns, future usage, and future construction work on the highway.
In a situation such as this, a risk must be taken to achieve the desired goal, such as predicting traffic patterns and designing an optimal highway solution [25]. The concept of uncertainty can be captured in mathematical models. Wei and Wang investigate such a model using multi-attribute decision-making problems [26]. In this model, each attribute is assigned a separate weight. This weight is initially unknown; however, Wei and Wang propose a formula to calculate these weights by utilizing the uncertain linguistic weighted averaging (ULWA) operator. Additionally, Luo investigates multi-attribute decision-making problems and various related concepts for solving the uncertain weights [27]. Specifically, they research the theory of intuitionistic fuzzy sets and apply it to optimization models to efficiently and effectively predict the uncertain weights. Jin et al. also research uncertain decision-making models but use benefit and loss matrices for the model. Their approach also incorporates the chance of both benefit and loss. The matrices are composed of different weights which correspond to action weights. The researchers also incorporate fuzzy analytics and use related methods to calculate the weights. The weight with the highest number in either matrix corresponds to the best course of action [28]. On the subject of big data and the related uncertainty, Hariri et al. research the inherent uncertainty in big data due to noise, incompleteness, and inconsistency [29]. These researchers present strategies and solutions for processing this data and thereby reducing or removing the uncertainty. Primarily, this is done by utilizing artificial intelligence, which generally performs better than traditional data techniques. However, the researchers state uncertainties will likely still be present. Moffat proposes a method of analyzing uncertainty in the data and results of an experiment. By identifying and explaining uncertainties, both the user of the data and the recipient of the research can better interpret the results [30]. Next, I describe related work on recommendation engines. Ozsoy and Polat describe a trust-based recommendation system. This system, unlike a content-based one, utilizes trust data for giving recommendations. Trust data estimates the user's rating for each item; the items with the highest estimated ratings are returned. They claim this method has consistently delivered better results when compared to content-based recommendation engines [31]. Another alternative recommendation system is a context-aware recommendation engine. Zheng et al. investigate such a system and propose a Java-based open-source library. Context-aware recommendation engines differ from collaborative filtering and content-based ones because they rely on the situation. For example, a user searching for a dinner recipe would likely want a different recipe if they were dining with their significant other than if they were dining with their children. The context in this example would thus be who they are dining with. Content-based and collaborative filtering systems do not account for this, and Zheng et al. provide a context-aware system to solve this problem [32]. Similarly, Pathak et al. suggest a hybrid recommendation engine utilizing content, context, and collaborative algorithms for recommending movies. Their so-called ORBIT hybrid movie recommendation algorithm can be compared to the hybrid recommendation system used in my thesis, as both use content and collaborative algorithms [33].
In addition to this research, Pathak et al. also describe a hybrid recommendation engine for books they call NOVA. Similar to the ORBIT recommendation system, NOVA combines the collaborative, content, and context recommendation algorithms to provide the best, most efficient results to the user. This optimized algorithm first utilizes content-based filtering to return the most similar books. Then, it uses collaborative filtering to get highly rated books with the most ratings. Finally, it uses context-based filtering to narrow the results to the most recent, highest-rated items related to the search. As with their other recommendation system, Pathak et al. conclude that a hybrid recommendation engine utilizing these three algorithms will provide optimal results [34]. Schrage takes a broad approach to recommendation engines, describing the origins and evolution of these systems as well as how they are implemented by popular companies and applications. More specifically, he investigates the impact of machine and deep learning algorithms on users of systems that benefit from them, in addition to how these algorithms are implemented, both in the back-end processing and the front-end user experience [35]. Li et al. theorize a personal image recommendation engine based on web search engine use. When a user searches for an image on a search engine, the researchers claim it is in the user's best interest to personalize these results rather than have the same results for every user who searches the same keywords. Therefore, they suggest an algorithm that infers a user's general interest based on their previous search engine history and behavior and then estimates the user's interest in an image [36]. Furthermore, Wu et al. attempt to implement a collaborative filtering recommendation engine based on the ant colony algorithm to solve the problem of recommending items to new users. The ant colony algorithm mimics the behavior of ants looking for food. Initially, ants wander randomly until a food source is found, at which point they start a path that other ants can follow. This dynamic system can be modeled in an algorithm. The system developed by Wu et al. relies on K-Means clustering and nearest neighbors to recommend products based on other users who have searched for similar things [37]. Bhat et al. also describe a content-based recommendation engine for selecting employees to be included in the decision-making process on software development products [38]. Their research shows the practicality of a content-based recommendation engine without utilizing other recommendation algorithms such as collaborative filtering. They claim the results of using this recommendation system are promising, as it was able to identify users with the correct experience related to a project. Alternative and non-traditional recommendation engines are also an area of interest and useful to explore in my thesis. Sahu et al. propose personalized recommendation engines using the Hadoop platform. They claim that, due to an abundance of sellers on the internet for the same product, it can be difficult for a user to find the right item for the best price. By constructing personalized recommendation engines for each user, this problem can be solved. Due to the amount of data on the internet, the researchers claim utilizing Hadoop allows for the quickest and most accurate results [39]. Hossain and Uddin also propose an alternative recommendation system: a neural engine-based recommendation system.
This system utilizes a neural engine that builds a neural recommender using a neural network based on a user's selected interests and behavior in the system. The recommender constructs different classes for movie genres whose weighted grades change based on user behavior. They claim this approach provided better results than traditional methods, such as content-based and collaborative filtering systems [40]. In summary, the related works in uncertainty, decision making, and recommendation engines have been described in detail. My thesis uses many aspects of these topics and implements different parts of the related research. By providing a deeper understanding of these topics, I hope this thesis makes more sense, as well as helps with any future research done on this topic.
Gathering and Retrieving Data
The first milestone of this thesis was to retrieve the data to be tested. The data chosen for this thesis is recipe data from www.allrecipes.com. Recipe data was collected through a Python web scraping script that retrieves the metadata for each recipe from its respective page on AllRecipes. The base URL of a recipe on their site is www.allrecipes.com/recipe/, with the recipe ID as the endpoint. The IDs were automatically generated in a range from 0 to 300,000. Each visited web page was logged to prevent visiting it again. Then, an attempt to reach the webpage was performed. If there were too many redirects on the request, that ID was skipped. There are various subpages that the script would be directed to, given an ID, which would be parsed through individually to find any recipe available. If a valid recipe page was found, an output file was created to store the recipe metadata. The metadata was retrieved through the HTML of the webpage. Nearly every aspect of the recipe was collected, even the fields not used in the scope of this thesis. In addition to recipe data being web scraped, reviews and cooks (the person who posted the recipe) associated with a recipe were collected. These were retrieved in separate scripts and stored individually. This resulted in a variety of different directories being created with nearly all metadata associated with individual recipes. A recipe retrieved from this site features various metadata fields, though for my thesis only the following fields are used for each recipe: ID, URL, title, ratings, stars, image, ingredients, and directions. The ID is a simple unique integer identifier of the recipe. The URL is the endpoint at which the recipe can be found on AllRecipes. The title is a string value of the recipe's title. The ratings field is an integer value of the number of ratings a recipe has received. Stars is an integer value of the average rating a recipe has received. Image URL is a string value of a full URL to the recipe's main/first photo. If a recipe does not have an uploaded image, this URL defaults to a photo that states "No photos have been added!" Ingredients is a string array of individual ingredients for a recipe. Lastly, directions is a string array of individual directions, numbered, for a recipe. In addition to the fields retrieved from AllRecipes, I added two supplementary fields to each recipe: type and most similar recipes. Type is a string value that describes the role of the recipe. These roles, described later, determine how the recipe is tested by the participants. Most similar recipes is an integer array of recipe IDs that are most similar to the recipe. The method for determining these recipes is described in detail in Milestone 3. Essentially, a Jaccard Index was calculated between every recipe in the dataset and every other recipe, the scores were sorted in descending order, and the top ten recipes were returned as the most similar. The Jaccard Index is a calculation that determines which members of two sets are shared and which are distinct. The similarity, as described by Gupta and Sardana, is calculated as the size of the intersection of the two sets divided by the size of their union [41].
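In standard set notation, with A and B the word sets of two recipes, the index is:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad 0 \le J(A, B) \le 1
```

The second form is the numerator/denominator computation described under Milestone 3 below.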
However, the scope of the similar recipes needed to be more specific for this research. The original program calculated the highest scores with no regard for the number of reviews or ratings. As a result, the simple ten most similar recipes would not be acceptable, since this thesis focuses on recommending uncertain results, which have been described as ones with a low number of reviews and ratings. Therefore, the above program was modified. Instead of comparing a recipe to all other recipes in the dataset, it was only compared to recipes with fewer than ten ratings and any number of stars. This provided the "uncertain" characteristic we desired. The top five recipes calculated given this condition were stored in the Most Similar field and were assigned the Type "uncertain". Similarly, "certain" recipes were desired for this thesis project to test against the uncertain ones. To calculate certain recipes, a similar condition was applied to the above program. Instead of comparing a recipe to all other recipes in the dataset, it was only compared to recipes with at least 100 ratings and at least four stars. As before, the top five results from this calculation were compiled, added to the Most Similar field, and given the Type "certain". Finally, the recipe that was compared against these is given the Type "main", as this is the parent recipe from which participants choose in the experiments. This process was repeated 40 times, five times per scenario. There are eight scenarios utilized in this thesis, each with five recipes to choose from. Therefore, five main recipes, each with ten accompanying certain and uncertain recipes, were collected for each scenario. This totals 55 recipes per scenario, or 440 recipes total. The recipes were stored in JSON files, split by scenario.
Creating a database
A database was needed to store information about the user experiments and hold the gathered data. Various database types and hosting services were considered in this process. I selected MySQL as the database and decided to host it through Free MySQL Hosting (www.freemysqlhosting.net). MySQL was chosen for its popularity among web technologies, specifically with PHP and Node.JS. Free MySQL Hosting was chosen for its convenience, easy interface, and cheap pricing. In the beginning, three tables were needed. First, a data table was created to store the participant survey results, along with an identity column. This table initially stored just the survey results. However, it became apparent that this was not sufficient for this research. As such, additional fields were also collected, including the ID of the "main" (parent) recipe, the display type of the scenario, and the full list of recipes similar to the parent. Every column in this table, excluding the similar recipes column, was of type integer. The similar recipes column was an nvarchar column type. When a participant submitted a survey on the front-end, it was sent to this table.
The second table created was the compensation table. This table stored an identity column along with the email and name of the participant, if they wished to receive a $10 Amazon gift card for their participation. Receiving compensation was not required as part of the experiment. The third table created was a recipes table, which stored the recipe data collected in the first milestone. Each recipe field had its own column in the table (including the two appended fields), with an identity column for the primary key. However, a problem arose in identifying which scenario a data row belonged to. Three different solutions were considered. The first was adding a new column to the table to identify which scenario the row belonged to, in addition to adding a new table titled scenarios. This new table would contain the ID of the scenario as well as its title and/or a brief description. The second solution considered was to arrange the recipes in order of scenario. For example, the first scenario would occupy the IDs ranging from 0 through 54, since there are 55 recipes per scenario. Then, in the front-end code, there would be some sort of indicator of which indices belonged to each scenario. Finally, the third solution considered was to create individual tables for each scenario. In other words, there would be eight tables, each with the scenario title as the table name, containing the 55 recipes related to that scenario. The advantages of this solution were organization and ease of management. Ultimately, this was the solution selected. While it was initially more work, once set up it was easy to modify if needed. It allowed for easier access to the recipes and made displaying them on the front-end manageable. However, as the front-end was being developed, it became apparent there was no need for any of the recipe tables. The data was static and would be read-only. Since the data was already stored in JSON format from data wrangling, I decided to store the JSON files in the project directory and load them directly from there instead of going to the database to retrieve that information. This significantly reduced load times on the front-end and reduced data usage on the hosting platforms, both for the database hosting and the website hosting. It also resulted in less complex code, reducing the likelihood of errors and bugs when the experiments were performed. As a result, the scenario tables were removed from the database and converted into JSON files. This left two remaining tables: the data table and the compensation table. Figure 6 shows the entity-relationship diagram (ERD) of the database. Many of the fields collected in the data table were integers, ranging from 1-5, with one text field. It is important to note that there is no relationship between the data table and the compensation table. The two tables are independent of each other. This design was chosen to remove any identifiable information from the collected data, in compliance with the IRB, described later. The SessionIDs do not match between the two tables and, therefore, have no relationship.
Figure 6 ERD for the database
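For concreteness, the two surviving tables might look roughly like the sketch below, shown as MySQL DDL inside Python string constants to match the project's primary language. Column names beyond those mentioned in the text (a session identity column, the main recipe ID, the display type, the 1-5 survey ratings, the similar-recipes text column, and name/email) are assumptions, and the ten per-recipe Accuracy and Unexpected columns are elided for brevity.

```python
# Illustrative DDL -- a sketch of the schema implied by the text and the
# ERD in Figure 6, not the thesis's actual table definitions.
DATA_TABLE_DDL = """
CREATE TABLE data (
    SessionID          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    MainRecipeID       INT NOT NULL,      -- ID of the chosen "main" recipe
    DisplayType        INT NOT NULL,      -- 1-4, per Table 1
    OverallRating      INT,               -- 1-5 survey answers
    RelatedRating      INT,
    ReuseRating        INT,
    TrustRating        INT,
    SatisfactionRating INT,
    SimilarRecipes     NVARCHAR(500)      -- full list of similar recipe IDs
);
"""

COMPENSATION_DDL = """
CREATE TABLE compensation (
    SessionID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name      NVARCHAR(100),
    Email     NVARCHAR(255)
);
"""
# Deliberately no foreign key between the tables: survey responses and
# compensation identities must stay unlinkable, per the IRB design.
```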
Developing a Hybrid Recommendation System
As previously mentioned, the Most Similar field required the use of a recommendation system to determine which recipes were most related to a given recipe. The recommendation system used for this research was written in Python 3 in Visual Studio Code. First, a list of stop words was created, containing words that would be ignored in the calculation. Then, modified fields for the recipe metadata were compiled, ignoring any words that were in the stop words. The purpose of removing stop words is to increase the accuracy of the results. The stop words are words that give no significance to the recipe, such as "the", "if", "extra", or "about". By ignoring these words, I could make the recipe fields more specific and, therefore, more accurate in determining similarity to other recipes. After the recipe fields were parsed and stop words were removed, the recipe was compared to the other recipes in the dataset. The similarity was computed using the Jaccard Index on two arrays, each containing every parsed word from a recipe. The intersection of these two recipes was then calculated. The length of this intersection is the numerator in the final equation. The denominator was calculated by adding the lengths of the two word arrays and then subtracting the numerator. The numerator was then divided by the denominator. The result of the Jaccard Index was returned and placed in an array that stored the similarity scores for all recipes. This array was then sorted in descending order. A higher Jaccard Index score indicates higher similarity. This process was done for each recipe in the dataset. Once the dataset had been completely parsed, the top ten recipe IDs in the Jaccard Index array were set as the Most Similar field for that recipe. For example, in this thesis the recipe Easter Breakfast Casserole was used, whose ingredients include eggs, bacon, cheese, onions, and peppers, among others. It returned similar recipes such as Egg and Hash Brown Pie and Bacon Breakfast Casserole (Gluten-Free), which share similar ingredients, though not exactly.
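The following is a minimal sketch of the procedure just described — stop-word removal, word-set construction, and the intersection-over-union computation. It is not the thesis's actual script; the stop-word list is abbreviated, and the recipe dict fields mirror the metadata fields named earlier.

```python
# Abbreviated, illustrative stop-word list.
STOP_WORDS = {"the", "if", "extra", "about", "a", "an", "and", "of", "to"}

def recipe_words(recipe: dict) -> set:
    """Flatten title, ingredients, and directions into a stop-word-free word set."""
    text = " ".join([recipe["title"], *recipe["ingredients"], *recipe["directions"]])
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    numerator = len(a & b)
    denominator = len(a) + len(b) - numerator  # |A| + |B| - |A n B| = |A u B|
    return numerator / denominator if denominator else 0.0

def most_similar(target: dict, candidates: list, top_n: int = 10) -> list:
    """Score every candidate against the target and return the top-n recipe IDs."""
    target_words = recipe_words(target)
    scored = sorted(
        ((jaccard(target_words, recipe_words(c)), c["id"]) for c in candidates),
        reverse=True,  # a higher Jaccard score indicates higher similarity
    )
    return [recipe_id for _, recipe_id in scored[:top_n]]
```

Restricting `candidates` to recipes meeting the certain or uncertain thresholds reproduces the modified comparisons described in the data-gathering milestone.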
CITI Certification and IRB Approval
Because the experiments in this thesis involved people, Collaborative Institutional Training Initiative (CITI) Certification was required, along with IRB approval. This process began by completing the CITI Certification through www.citiprogram.org. This course provided guidelines for conducting experiments, including ways to handle and store user data, how to phrase questions, and how to collect the appropriate data from experiments. Due to the nature of this thesis, this course provided useful information for ensuring the data collected was non-identifiable, that any identifiable information collected was voluntary, and that participants were adequately informed on how their collected data would be handled. Upon completion of the CITI Certification, IRB approval was required before the experiments began. This process was done through the IRB Committee at Weber State University. Data collection was prohibited until approval was received from this committee. The IRB application included various questions regarding the target audience, the questions asked, the nature of the experiments and how they were conducted, as well as how data was stored. To align with the requirements of this application, the data collected could not be identifiable. Identifiable information, such as a full name, would have to be collected separately from the data collection. In other words, there would be no connection between the two sets of data. Therefore, because identifiable information was being collected for compensation (name and email), the data collected from the surveys and the data collected for compensation were stored independently of each other, with no connection between the two SQL tables of data. In addition to providing this information, an informed consent form was required to be distributed to the participants before the study. This form provided valuable information about the study to the participants, including what they would be doing in the experiments, any risks associated with the study, compensation, and confidentiality of data. Traditionally, this form would be distributed in person before the participant took part. However, the experiments for this study were strictly online, for a couple of reasons. First, this thesis was conducted during the COVID-19 pandemic of 2020, during which there were various group gathering restrictions and university-specific social distancing guidelines. Additionally, because the experiment was conducted exclusively through a website, participants could participate at any location with an internet connection. Therefore, the informed consent form was included at the beginning of the experiment on the website. Participants were required to read this document and accept it before beginning the experiment. The IRB application was approved shortly thereafter, and experiments were able to be conducted.
Designing and Executing User Experiments
I created a website specifically for this thesis, designed to perform the experiments and collect participant data. The website was created using Node.JS and React for the front-end, with much of the code written in TypeScript. In addition to this front-end project, a second back-end project was required to interact with the database. This was a separate project that used JavaScript and functioned as an API that could be called from the front-end when data needed to be passed to the database. The front-end project followed a basic pattern. First, the informed consent was presented to the participant, which they were required to agree to. This provided the necessary information regarding the thesis experiment and how the participant was to be treated. This was done in compliance with the IRB. See Figure 7.
Figure 7 Screenshot of the Informed Consent page
Next, a page with basic instructions on how the experiment was to be executed was displayed. This gave directions to the participant on what their responsibility was and how they were to participate. The following paragraphs were given to the user as directions: "In this experiment, you will be selecting recipes based on a given scenario and then rating recommendations based on your selection. You can select recipes by clicking on them. Eight (8) scenarios will be presented. You should select the recipe you think is the best for that scenario. There is no right answer, simply pick the one you think looks the best or would be the most delicious or is the easiest to cook - whatever you decide. Upon selecting a recipe, you will be presented with either five (5) or ten (10) recipes. Consider these recommendations and how they relate to the selected recipe. You can cycle through the recipes by clicking on them. There are two (2) types of recipes: certain and uncertain recipes. Uncertain recipes will have a yellow font color and will show up once you've initially selected a recipe. Some scenarios will not have uncertain recipes.
The front-end project followed a basic pattern. First, the informed consent was presented to the participant, which they were required to agree to. This provided the necessary information regarding the thesis experiment and how the participant was to be treated, in compliance with the IRB. See Figure 7.

Figure 7 Screenshot of the Informed Consent page.

Next, a page with basic instructions on how the experiment was to be executed was displayed. This gave the participant directions on what their responsibility was and how they were to participate. The following paragraphs were given to the user as directions:

"In this experiment, you will be selecting recipes based on a given scenario and then rating recommendations based on your selection. You can select recipes by clicking on them. Eight (8) scenarios will be presented. You should select the recipe you think is the best for that scenario. There is no right answer, simply pick the one you think looks the best or would be the most delicious or is the easiest to cook - whatever you decide. Upon selecting a recipe, you will be presented with either five (5) or ten (10) recipes. Consider these recommendations and how they related to the selected recipe. You can cycle through the recipes by clicking on them. There are two (2) types of recipes: certain and uncertain recipes. Uncertain recipes will have a yellow font color and will show up once you've initially selected a recipe. Some scenarios will not have uncertain recipes. You will then fill out a survey related to the recommended recipes and the recommendation system as a whole. You will rate these from 1-5 by clicking on the stars."

Participants were required to acknowledge and agree to these directions before continuing the experiment. Additionally, a phone number was provided for participants to contact if any errors arose in the project. See Figure 8.

Figure 8 The Directions Page

After the participant accepted these directions, the experiment began. As stated, there were eight scenarios for the participant to complete. Each scenario was similar to the next, with only minor differences. For ease of reference, each scenario will be referred to as scenario 1, scenario 2, and so on. The order of scenarios was the same for each participant; for example, scenario 1 was always breakfast recipes, scenario 2 was always dessert recipes, and so forth. The following eight scenarios were presented to the participant in the following order:

1. You are preparing breakfast for your family one morning. Of the options presented, choose the recipe that you are most likely to make.
2. You have been invited to a work party and asked to bring a dessert. Of the options presented, choose the recipe that you are most likely to bring.
3. You have been invited to a neighborhood block party and asked to bring a main dish for dinner. Of the options presented, choose the recipe that you are most likely to bring.
4. You are looking for a drink recipe to make for a treat one day. Of the options presented, choose the recipe that you are most likely to make.
5. You are hosting a get-together with your friends, one of whom is gluten-free. Of the options presented, choose the recipe that you are most likely to make.
6. You are attending a New Year's Eve party and are asked to bring a side dish. Of the options presented, choose the recipe that you are most likely to bring.
7. You are attending a family reunion potluck and are asked to bring a salad. Of the options presented, choose the recipe that you are most likely to make.
8. You are going on a picnic with your friends and want to bring sandwiches. Of the options presented, choose the recipe that you are most likely to make.

Each scenario page had the same layout: the names of five initially recommended recipes were displayed in the top left, the details of a single recipe were displayed on the right half of the screen, and the scenario was displayed on the left, beneath the list of five recipes. See Figure 9. Participants could click on each of the five recipes; when they did, the right-side recipe display would change to show the selected recipe, and the selected recipe would be bolded in the recipe list. When a participant decided on the best recipe for a given scenario, they would click a "Select Recipe" button, which brought them to the next page, the recommended recipes page.

Figure 9 The Scenario page

The recommended recipes page is a key part of this experiment. This page displays the following: the recommended recipes according to the "main" recipe the participant selected; the display type of the recommended recipes; the 26 survey questions related to this scenario; and a recipe display similar to the one on the previous page. On this page, participants were directed to click through each recommended recipe.
The selected recommended recipe would be shown in the display panel, where the participant could read the ingredients and directions and judge how closely the recommended recipe related to the main recipe. The participant would do this for each recommended recipe and then begin answering the questions in the survey. The questions, listed previously, were designed to gauge the participants' trust in and satisfaction with the recommendation system, given the resulting recommended recipes. After answering each question in the survey, the participant would submit the survey and continue to scenario 2, which would start at the scenario page described above. The initially recommended recipes changed depending on the scenario and, as a result, the recommended recipes for a given "main" recipe changed as well. See Figure 10, which displays five identifiable uncertain recipes.

Figure 10 The Recommended Recipes page

Another key aspect of this experiment is found on the recommended recipes page. As previously stated, there are four different display types for the recommended recipes:

1. Five certain and five uncertain recipes are displayed together with no distinction between the two. Both types have the same white font color.
2. Five certain and five uncertain recipes are displayed together with a distinction between the two. Certain recipes have a white font color while uncertain recipes have a yellow font color, indicating they are uncertain.
3. Only five certain recipes are displayed, but with a yellow font color, giving the appearance that these recipes are uncertain while actually being certain.
4. Only five uncertain recipes are displayed, with a yellow font color.

The order of these display types was randomized for each participant to reduce any implicit, accidental bias that may exist in the design of the experiment.
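The thesis does not state which randomization routine was used. A standard choice for randomizing a small, fixed list such as the four display types is the Fisher-Yates shuffle, sketched below; the function and variable names are illustrative only.

```typescript
// Fisher-Yates shuffle: returns a uniformly random ordering of the four display
// types for each participant. Names are illustrative, not from the thesis.
function shuffle<T>(items: T[]): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // random index from 0..i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const displayOrder = shuffle([0, 1, 2, 3]); // e.g., [2, 0, 3, 1]
```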
After the eighth scenario, the participant was thanked for their participation and given the opportunity to provide their name and email address to receive compensation, a $10 Amazon gift card.

The website and API project were both hosted on AWS using its free tier plan. The associated database was hosted using www.freemysqlhosting.net, which offers free or low-cost plans for MySQL database hosting. The experiment website domain was www.carsons-thesis-project.s3-website.us-east-2.amazonaws.com.

Before recruitment for the experiment began, a pilot test run was conducted with an individual who had no prior information about the experiment. They completed the experiment as a regular participant would, though they were vocal about their choices and what they were thinking, and asked any questions they had (a cognitive walkthrough). This was done to fine-tune the website before recruiting more people. The pilot run took place on the morning of September 15th, 2020; it was a success, and only a few things needed changing or fixing.

As a result, recruitment began later that day. Participants were contacted through social media, work email, and instant messaging. They were directed to the website, where they could take the experiment at their own pace at any time, and were asked to allot at least 20 minutes to complete it. The target participation number was 50 individuals; by the end of the second day of recruitment (September 16th), this number had been exceeded. The experiment was closed to further participation after the 56th individual participated. Fifty of these individuals elected to receive compensation for their participation.

The execution of this experiment worked nearly as intended. No errors, issues, or complaints were reported by the participants, and the data was successfully collected and stored without issue. However, a small issue did arise shortly after the experiments began: due to the amount of data being received, it quickly became apparent that the database did not have enough storage for 50 or more entries of data. The database briefly exceeded its storage limit, but this was quickly resolved by upgrading the database plan. As a result of the quick response, there was no loss of participant data and no interruption to active experiments.

Upon closing the experiment to further participation, the next task was to distribute the compensation. Fifty $10 Amazon gift cards were ordered through the Computer Science Master's program. There was a delay between the end of the experiments and the distribution of the gift cards due to complications caused by the COVID-19 pandemic. Upon receipt of the gift cards, however, they were distributed to each participant who elected to receive compensation. Participants were asked to fill out and return a short form simply stating that they had received the gift card and redeemed it successfully; this was done at the request of the University for auditing purposes.

Analysis of Results

Fifty-six individuals participated in this thesis project. Each participant was presented with eight surveys to complete, which resulted in 448 total surveys, or 448 rows of data in the data table. Additionally, there were four display types presented to each participant, with two scenarios per display type; in other words, there were 112 rows of data per display type. The following table is a sample of the data collected; it shows only the first seven columns of each row.

Table 2 Sample data collected for the first seven columns.

Session ID | Overall Rating | Related Rating | Accuracy 1 | Accuracy 2 | Accuracy 3 | Accuracy 4
193 | 5 | 5 | 5 | 5 | 3 | 3
194 | 4 | 4 | 2 | 5 | 4 | 5
195 | 5 | 5 | 5 | 5 | 5 | 5
196 | 4 | 4 | 5 | 5 | 5 | 5
197 | 4 | 4 | 5 | 2 | 4 | 4

The data was first visualized using bar graphs. The value measured in these graphs is the average of each rating (Overall, Related, etc.) for each display type. Each display type is referenced by its number. Display number 0 is the control, with both certain and uncertain recipes. Display number 1 has both certain and uncertain recipes, where the uncertain recipes have a yellow font color. Display number 2 has just uncertain recipes with a yellow font color. Display number 3 has just certain recipes with a yellow font color.
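Before turning to the charts, note that the averages they plot are simple per-display-type means over the survey rows. The thesis does not include its analysis code; the following TypeScript sketch, with hypothetical names and row shape, shows the aggregation:

```typescript
// Average one rating column per display type across the 448 survey rows.
// The SurveyRow shape and function names are hypothetical.
interface SurveyRow {
  displayType: number; // 0-3, as defined above
  overall: number;     // plus related, reuse, satisfaction, trust, ...
}

function averageByDisplayType(
  rows: SurveyRow[],
  rating: (row: SurveyRow) => number
): Map<number, number> {
  const sums = new Map<number, { total: number; count: number }>();
  for (const row of rows) {
    const entry = sums.get(row.displayType) ?? { total: 0, count: 0 };
    entry.total += rating(row);
    entry.count += 1; // 112 rows per display type in this dataset
    sums.set(row.displayType, entry);
  }
  const averages = new Map<number, number>();
  for (const [type, { total, count }] of sums) averages.set(type, total / count);
  return averages;
}
```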
The results are as follows:

Figure 11 Overall rating bar graph

Figure 11 shows the average overall rating collected from the survey for each display type. While display number 2 is slightly higher than the rest, each display type has an average overall rating near 4. This rating corresponds to the survey question, "On a scale of 1-5, what was your overall rating of the recommendation system?"

Figure 12 Related rating bar graph

Figure 12 shows the average related rating collected from the survey for each display type. While display number 2 is slightly higher than the rest, each display type has an average related rating near 4. This rating corresponds to the survey question, "On a scale of 1-5, how accurate were the recommendations related to your search?"

Figure 13 Reuse rating bar graph

Figure 13 shows the average reuse rating collected from the survey for each display type. While display number 2 is slightly higher than the rest, each display type has an average reuse rating near 4. This rating corresponds to the survey question, "On a scale of 1-5, how likely are you to reuse this recommendation engine for future recipe searches?"

Figure 14 Satisfaction rating bar graph

Figure 14 shows the average satisfaction rating collected from the survey for each display type. While display number 2 is slightly higher than the rest, each display type has an average satisfaction rating near 4. This rating corresponds to the survey question, "On a scale of 1-5, what is your overall satisfaction with the recommendation system?"

Figure 15 Trust rating bar graph

Figure 15 shows the average trust rating collected from the survey for each display type. While display number 2 is slightly higher than the rest, each display type has an average trust rating near 4. This rating corresponds to the survey question, "On a scale of 1-5, how much do you trust this recommendation system to provide you with useful, accurate results?"

As the previous five figures show, the mean ratings do not fluctuate much between the four display types. Additionally, the ratings are relatively consistent with one another, with each average rating staying near 4 for each display type.

After visualizing the results with bar charts, the data were analyzed using one-way ANOVA tests of whether display type affected each of the ratings.

Figure 16 Overall Rating ANOVA

Figure 16 shows the overall rating ANOVA for each display type, as well as the average rating that the four conditions received. The p-value (0.84) is greater than the a priori alpha of 0.05; therefore, no statistical significance was found for the overall rating, and we fail to reject the null hypothesis, meaning the display type does not appear to affect the overall rating. Figures 17-20 show the results for the other survey questions. As with Figure 16, none of them reached statistical significance at the a priori alpha of 0.05.

Figure 17 Related Rating ANOVA

Figure 17 shows the related rating ANOVA for each display type, as well as the average rating that the four conditions received, with a resulting p-value of 0.43.

Figure 18 Reuse Rating ANOVA

Figure 18 shows the reuse rating ANOVA for each display type, as well as the average rating that the four conditions received, with a resulting p-value of 0.79.

Figure 19 Satisfaction Rating ANOVA

Figure 19 shows the satisfaction rating ANOVA for each display type, as well as the average rating that the four conditions received, with a resulting p-value of 0.44.

Figure 20 Trust Rating ANOVA

Figure 20 shows the trust rating ANOVA for each display type, as well as the average rating that the four conditions received, with a resulting p-value of 0.44.

None of the ratings tested show any statistical significance; in other words, the display types do not appear to influence the ratings. This was expected after visualizing the data, as the average ratings appeared to be similar.
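The thesis reports only the resulting p-values. For reference, the F statistic underlying a one-way ANOVA can be computed as sketched below (a hypothetical helper, not the thesis's analysis code); the p-value then comes from the upper tail of the F distribution with k - 1 = 3 and N - k = 444 degrees of freedom, a step a statistics library would supply.

```typescript
// One-way ANOVA F statistic across groups (here, the four display types).
// F = (SS_between / (k - 1)) / (SS_within / (N - k)).
function anovaF(groups: number[][]): number {
  const all = groups.flat();
  const grandMean = all.reduce((sum, x) => sum + x, 0) / all.length;
  let ssBetween = 0; // variation of group means around the grand mean
  let ssWithin = 0;  // variation of observations around their group mean
  for (const group of groups) {
    const mean = group.reduce((sum, x) => sum + x, 0) / group.length;
    ssBetween += group.length * (mean - grandMean) ** 2;
    for (const x of group) ssWithin += (x - mean) ** 2;
  }
  const dfBetween = groups.length - 1;         // k - 1 = 3
  const dfWithin = all.length - groups.length; // N - k = 448 - 4 = 444
  return (ssBetween / dfBetween) / (ssWithin / dfWithin);
}
```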
Given these results, this thesis cannot reject the null hypothesis and concludes that displaying uncertain results differently from certain results does not improve the likelihood of their being chosen by the user. This is nonetheless a promising result for the Bubble-Up Problem: when presented with high-quality results, regardless of how they are displayed, the user will pick certain or uncertain results based on the quality of the recipe. The recipes used in this research were handpicked based on their rating, number of reviews, and subjective quality (at my discretion). This was done to ensure high-quality recipes were chosen for both the certain and uncertain conditions. As a result, there may be multiple explanations for the results we received.

Intermingling Results

There was no statistically significant difference between the control and the intermingled results. As mentioned above, this is probably due to the high quality of the recipes that were used, which may have been one cause of the lack of statistical significance. High-quality results, then, are likely to be chosen by the user regardless of user reviews. Future work in this research could include low-quality recipes to determine whether that impacts user satisfaction.

Building on the idea of intermingling certain and uncertain products (display number 1), recommendation engines could introduce uncertain or new products to the user using a form of multi-armed bandit algorithm. The idea is that once a new or uncertain product has been rated up to a certain threshold (e.g., a "magic number" that the developers agree on, such as 100), it would be released into the regular recommendation engine to stand on its own. This would allow new products to be exposed to users and rise to the top alongside already established recipes.
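The thesis describes this promotion rule only in prose. A minimal sketch, assuming a simple rating-count threshold and hypothetical type and field names, might look like this:

```typescript
// Hypothetical promotion rule for the proposed multi-armed-bandit-style scheme:
// once an uncertain item accumulates enough ratings (the agreed-upon "magic
// number," e.g. 100), it is released into the regular recommendation pool.
interface Item {
  id: number;
  ratingCount: number;
  certain: boolean; // false while the item is still being vetted
}

const PROMOTION_THRESHOLD = 100;

function promoteVettedItems(items: Item[]): Item[] {
  return items.map((item) =>
    !item.certain && item.ratingCount >= PROMOTION_THRESHOLD
      ? { ...item, certain: true } // now stands on its own in the engine
      : item
  );
}
```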
My suggested algorithm is strengthened by displays 2 and 3 (displaying uncertain and certain results as uncertain results, respectively). Displaying just these "uncertain" results was done to determine whether the yellow font display type carried any significance. As calculated in the ANOVAs, there is no statistical significance between any of the display types, regardless of rating. Because of this, we can expect that the proposed algorithm would allow only high-quality recipes to make it through the vetting process. As stated before, high-quality recipes will be high-quality to the user regardless of whether they have been vetted by crowdsourcing. For example, a newly posted recipe will start with zero reviews and ratings; however, the recipe may still be high-quality to the user it is suggested to.

Conclusion

We conclude there is no statistical significance in displaying uncertain results using different display techniques for high-quality recipes. Different methods of displaying these results do not appear to influence a user's trust in or satisfaction with them, and users are equally likely to choose uncertain results, regardless of how they are displayed, as they are to choose certain results. Based on the results, it appears that presenting users with appealing, high-quality uncertain products (recipes, in this case) can solve the Bubble-Up Problem.

Given the statistical insignificance of the data, the research question presented can be answered as follows: introducing uncertain results in a content-based/collaborative filtering hybrid recommendation system does not impact a participant's trust towards a recommendation system, either negatively or positively. Each variable tested against the control showed similar results to the others.

However, the results may be tainted by implicit, accidental bias in the design of the experiment. For example, the length of the experiment may have caused participants to become less attentive by the end, providing quick, less considered answers simply to finish. Additionally, using compensation to make the experiment attractive may have had the same effect, with participants moving quickly through the experiment without thinking critically about their answers. Lastly, there may be other aspects of the project's design that unintentionally affected the participants and their answers.

Therefore, there is an opportunity to improve on this experiment and further investigate the Bubble-Up Problem in a recommendation system. Future research into this subject may include redesigning the front-end user experience to remove any implicit bias, using data other than recipes to see if similar results are achieved, and collecting more detailed information from the surveys to better determine a user's trust in the recommendation system. Furthermore, as described earlier, introducing low-quality, unappealing recipes, both certain and uncertain, into each display technique could determine whether low-quality results affect the user's trust and satisfaction.

References

1. Fields, Benjamin, "Contextualize Your Listening: The Playlist as Recommendation Engine," University of London, 2011.
2. Mascia, P., Neugebauer, N.M., Brown, J. et al., "Exposure to conditions of uncertainty promotes the pursuit of amphetamine," Neuropsychopharmacology 44, 274-280 (2019). https://doi.org/10.1038/s41386-018-0099-4.
3. Zeeb, Fiona D. et al., "Uncertainty exposure causes behavioural sensitization and increases risky decision-making in male rats: toward modelling gambling disorder," Journal of Psychiatry & Neuroscience, vol. 42, no. 6 (2017): 404-413. doi:10.1503/jpn.170003.
4. Merlhiot, G., Mermillod, M., Le Pennec, J.-L., Dutheil, F., Mondillon, L., "Influence of uncertainty on framed decision-making with moral dilemma," PLoS ONE 13(5): e0197923, 2018. https://doi.org/10.1371/journal.pone.0197923.
5. Deutsch, M., "Cooperation and trust: Some theoretical notes," Nebraska Symposium on Motivation, University of Nebraska Press, 1962, 275-320.
6. Vidhya Murali, Kurt Jacobson, Edward Newett, Brian Whitman, and Romain Yon, "Music Personalization at Spotify," in Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16), Association for Computing Machinery, New York, NY, USA, 2016, 373. doi: https://doi.org/10.1145/2959100.2959120.
7. O. Nasraoui and C. Petenes, "An intelligent Web recommendation engine based on fuzzy approximate reasoning," The 12th IEEE International Conference on Fuzzy Systems (FUZZ '03), St. Louis, MO, USA, 2003, pp. 1116-1121 vol. 2, doi: 10.1109/FUZZ.2003.1206588.
8. W. Paireekreng, "Mobile content recommendation system for re-visiting user using content-based filtering and client-side user profile," 2013 International Conference on Machine Learning and Cybernetics, Tianjin, 2013, pp. 1655-1660, doi: 10.1109/ICMLC.2013.6890864.
9. H. Xue and D. Zhang, "A Recommendation Model Based on Content and Social Network," 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 2019, pp. 477-481, doi: 10.1109/ITAIC.2019.8785729.
10. D. Ayata, Y. Yaslan and M. E. Kamasak, "Emotion Based Music Recommendation System Using Wearable Physiological Sensors," IEEE Transactions on Consumer Electronics, vol. 64, no. 2, pp. 196-203, May 2018, doi: 10.1109/TCE.2018.2844736.
11. N. Pukkhem and W. Vatanawood, "An Evidential Reasoning Approach for Learning Object Recommendation with Uncertainty," 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC), Kaohsiung, 2009, pp. 262-265, doi: 10.1109/ICICIC.2009.84.
12. Z. Wu and H. Wu, "Uncertainty Management in Personalized Recommendation for E-commerce," 2009 21st IEEE International Conference on Tools with Artificial Intelligence, Newark, NJ, 2009, pp. 617-620, doi: 10.1109/ICTAI.2009.119.
13. Paireekreng, Worapat, Wong, Kok, & Fung, Chun, "A model for mobile content filtering on non-interactive recommendation systems," Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, 2011, 2822-2827, doi: 10.1109/ICSMC.2011.6084100.
14. Lipshitz, Raanan, & Strauss, Orna, "Coping with Uncertainty: A Naturalistic Decision-Making Analysis," Organizational Behavior and Human Decision Processes, vol. 69, no. 2, Feb 1997.
15. Berkeley, D., & Humphreys, P., "Structuring decision problems and the 'bias' heuristic," Acta Psychologica, 1982, 201-252.
16. Allaire, Y., & Firsirotu, M. E., "Coping with strategic uncertainty," Sloan Management Journal, 1989, 70-76.
17. Conrath, D., "Organizational decision-making behavior under varying conditions of uncertainty," Management Science, 1967, B487-B500.
18. Milliken, F. J., "Three types of perceived uncertainty about the environment: State, effect, and response uncertainty," Academy of Management Review, 1987, 133-143.
19. Bell, David E., "Disappointment in Decision Making under Uncertainty," Operations Research, vol. 33, no. 1, 1985, pp. 1-27. JSTOR.
20. Hsu, M., Bhatt, M., Adolphs, R., Tranel, D., Camerer, C. F., "Neural systems responding to degrees of uncertainty in human decision-making," Science, 9 Dec 2005, 310(5754): 1680-3, doi: 10.1126/science.1115327.
21. Platt, Michael L., and Scott A. Huettel, "Risky business: the neuroeconomics of decision making under uncertainty," Nature Neuroscience, vol. 11, no. 4 (2008): 398-403, doi: 10.1038/nn2062.
22. Bell, David E., "Regret in Decision Making under Uncertainty," Operations Research, 1987, issue 5, pp. 961-981.
23. Prelec, Drazen and Loewenstein, George, "Decision Making Over Time and Under Uncertainty: A Common Approach," Management Science, vol. 37, no. 7, 1991, pp. 770-786.
24. Karl, Marion, "Risk and Uncertainty in Travel Decision-Making: Tourist and Destination Perspective," Journal of Travel Research, vol. 57, iss. 1, pp. 129-146.
25. G. Jun, H. Tao, Y. Ting and Z. Hong-wei, "A Multi-attribute Group Decision-making Approach Based on Uncertain Linguistic Information," 2019 Chinese Control And Decision Conference (CCDC), Nanchang, China, 2019, pp. 4125-4128, doi: 10.1109/CCDC.2019.8832831.
26. G. Wei and X. Wang, "Maximal deviation model for multiple attribute decision making under uncertain linguistic environment," 2008 7th World Congress on Intelligent Control and Automation, Chongqing, 2008, pp. 7471-7475, doi: 10.1109/WCICA.2008.4594084.
27. Luo, Yujun, "Projection method for multiple attribute decision making with uncertain attribute weights under intuitionistic fuzzy environment," 2009 Chinese Control and Decision Conference, Guilin, 2009, pp. 2945-2948, doi: 10.1109/CCDC.2009.5191817.
28. J. Jin, R. Shen, M. Zhang, C. Zhou and Z. Pan, "Uncertain decision-making analysis method based on information entropy principles," 2009 Chinese Control and Decision Conference, Guilin, 2009, pp. 2241-2246, doi: 10.1109/CCDC.2009.5192192.
29. Hariri, R. H., Fredericks, E. M., & Bowers, K. M., "Uncertainty in big data analytics: survey, opportunities, and challenges," J Big Data 6, 44 (2019). https://doi.org/10.1186/s40537-019-0206-3.
30. Moffat, R. J., "Describing the uncertainties in experimental results," Experimental Thermal and Fluid Science, vol. 1, pp. 3-17, 1988.
31. M. G. Ozsoy and F. Polat, "Trust based recommendation systems," 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), Niagara Falls, ON, 2013, pp. 1267-1274, doi: 10.1145/2492517.2500276.
32. Y. Zheng, B. Mobasher and R. Burke, "CARSKit: A Java-Based Context-Aware Recommendation Engine," 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, 2015, pp. 1668-1671, doi: 10.1109/ICDMW.2015.222.
33. D. Pathak, S. Matharia and C. N. S. Murthy, "ORBIT: Hybrid movie recommendation engine," 2013 IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN), Tirunelveli, 2013, pp. 19-24, doi: 10.1109/ICE-CCN.2013.6528589.
34. D. Pathak, S. Matharia and C. N. S. Murthy, "NOVA: Hybrid book recommendation engine," 2013 3rd IEEE International Advance Computing Conference (IACC), Ghaziabad, 2013, pp. 977-982, doi: 10.1109/IAdCC.2013.6514359.
35. Michael Schrage, "Recommendation Engines," in Recommendation Engines, MIT Press, 2020, pp. 1-6.
36. Y. Li, J. Luo and T. Mei, "Personalized image recommendation for web search engine users," 2014 IEEE International Conference on Multimedia and Expo (ICME), Chengdu, 2014, pp. 1-6, doi: 10.1109/ICME.2014.6890327.
37. Y. Wu, Y. Du and L. Li, "A research of collaborative filtering recommendation based on ant colony algorithm," 2011 International Conference on Uncertainty Reasoning and Knowledge Engineering, Bali, 2011, pp. 58-61, doi: 10.1109/URKE.2011.6007907.
38. M. Bhat, K. Shumaiev, K. Koch, U. Hohenstein, A. Biesdorf and F. Matthes, "An Expert Recommendation System for Design Decision Making: Who Should be Involved in Making a Design Decision?" 2018 IEEE International Conference on Software Architecture (ICSA), Seattle, WA, 2018, pp. 85-8509, doi: 10.1109/ICSA.2018.00018.
39. U. Sahu, A. K. Tripathy, A. Chitnis, K. A. Corda and S. Rodrigues, "Personalized recommendation engine using HADOOP," 2015 International Conference on Technologies for Sustainable Development (ICTSD), Mumbai, 2015, pp. 1-6, doi: 10.1109/ICTSD.2015.7095901.
40. M. A. Hossain and M. N. Uddin, "A Neural Engine for Movie Recommendation System," 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), Dhaka, Bangladesh, 2018, pp. 443-448, doi: 10.1109/CEEICT.2018.8628128.
41. A. K. Gupta and N. Sardana, "Significance of Clustering Coefficient over Jaccard Index," 2015 Eighth International Conference on Contemporary Computing (IC3), Noida, 2015, pp. 463-466, doi: 10.1109/IC3.2015.7346726. |
Format | application/pdf |
ARK | ark:/87278/s6tbp1qb |
Setname | wsu_smt |
ID | 96835 |
Reference URL | https://digital.weber.edu/ark:/87278/s6tbp1qb |