Title | Hoggan, Ashlee_MCS_2021 |
Alternative Title | Gender Prediction Based on Food Selection |
Creator | Hoggan, Ashlee |
Collection Name | Master of Computer Science |
Description | The following Master of Science in Computer Science thesis explores gendered stereotypes in food by examining participants' answers to a series of recipe-preference questions and testing whether machine learning algorithms can determine whether an individual is male or female.
Abstract | The research presented in this paper investigates whether an individual's gender can be predicted based on their recipe preferences. Gender is commonly used in demographic recommendation engines to improve CTR (click-through rates). Previous research has found that there are gendered stereotypes in food: those stereotypes classify desserts as feminine and hearty meals like steaks as masculine. This thesis takes a more in-depth look at the subject by having participants answer a series of 200 questions to determine whether a gender stereotype phenomenon exists within recipe selections and whether the questions' results allow machine learning algorithms to determine whether an individual is male or female. The machine learning algorithms used for testing had only a 50% accuracy rate when determining someone's gender based on their recipe selections. Despite that, there was statistical significance in the meat versus dessert testing category: men were more likely to choose meat recipes over dessert recipes, and women were more likely to select dessert recipes over meat recipes.
Subject | Algorithms; Gender; Computer science |
Keywords | Demographic recommendation engines; gender; machine-learning algorithms; recipes |
Digital Publisher | Stewart Library, Weber State University, Ogden, Utah, United States of America |
Date | 2021 |
Medium | Thesis |
Type | Text |
Access Extent | 1.35 MB; 70-page PDF
Language | eng |
Rights | The author has granted Weber State University Archives a limited, non-exclusive, royalty-free license to reproduce their theses, in whole or in part, in electronic or paper form and to make it available to the general public at no charge. The author retains all other rights. |
Source | University Archives Electronic Records; Master of Computer Science. Stewart Library, Weber State University |
OCR Text | Gender Prediction Based on Food Selection

by

Ashlee Hoggan

A thesis submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE OF COMPUTER SCIENCE

WEBER STATE UNIVERSITY
Ogden, Utah
August 23, 2021

(signature) Robert Ball, Committee Chair
(signature) Sarah Herrmann, Committee Member
(signature) Nicole Anderson, Committee Member
(signature) Ashlee Hoggan

Ashlee Hoggan

A Thesis in the Field of Computer Science for the Degree of Master of Science in Computer Science

Weber State University
December 2021

Copyright 2021 Ashlee Hoggan

Gender Prediction Based on Food Selection

Abstract

The research presented in this paper investigates whether an individual's gender can be predicted based on their recipe preferences. Gender is commonly used in demographic recommendation engines to improve CTR (click-through rates). Previous research has found that there are gendered stereotypes in food: those stereotypes classify desserts as feminine and hearty meals like steaks as masculine. This thesis takes a more in-depth look at the subject by having participants answer a series of 200 questions to determine whether a gender stereotype phenomenon exists within recipe selections and whether the questions' results allow machine learning algorithms to determine whether an individual is male or female. The machine learning algorithms used for testing had only a 50% accuracy rate when determining someone's gender based on their recipe selections. Despite that, there was statistical significance in the meat versus dessert testing category: men were more likely to choose meat recipes over dessert recipes, and women were more likely to select dessert recipes over meat recipes.

Table of Contents

List of Tables ........ vi
List of Figures ........ viii
Introduction ........ 9
  The following are my hypotheses: ........ 10
  Study Outline ........ 11
  Paper Outline ........ 14
Related Work ........ 16
Gather Recipe Data and Information ........ 21
Create a Set of Questions ........ 22
Design of Recipe Flashcard Prototypes ........ 25
Creating the Study ........ 27
  Website Design ........ 28
  Database Structure ........ 28
Complete IRB ........ 30
Pilot Test ........ 32
Adjusted Study Based on Pilot Test ........ 34
  Recipe Flashcard Changes ........ 34
  New Additions ........ 35
  Database Changes ........ 37
Experiment Run ........ 39
Machine Learning Algorithms ........ 42
  Results ........ 43
Analysis of Results ........ 47
Conclusion ........ 61
References ........ 68

List of Tables

Table 1 Recipe Gender Classifiers ........ 11
Table 2 Recipe Selections by Gender ........ 40
Table 3 Dessert vs. Meat Question Category vs. Participant ........ 40
Table 4 Submitter Gender Question Category vs. Participant ........ 41
Table 5 Direction Length Question Category vs. Participant ........ 41
Table 6 Neural Nets ........ 44
Table 7 Decision Tree ........ 45
Table 8 SVM ........ 46
Table 9 Submitter Question Category ........ 48
Table 10 Meat vs. Dessert Question Category ........ 49
Table 11 Direction Length Question Category ........ 50
Table 12 Long vs. Short Time Allotted ........ 51
Table 13 Participant Group ........ 51
Table 14 Screen Position ........ 52
Table 15 Submitter Question Category ........ 53
Table 16 Meat vs. Dessert Question Category ........ 54
Table 17 Direction Length Question Category ........ 55
Table 18 Long vs. Short Time Allotted ........ 55
Table 19 Participant Group ........ 56
Table 20 Screen Position ........ 56
Table 21 Submitter Question Category ........ 57
Table 22 Meat vs. Dessert Question Category ........ 58
Table 23 Direction Length Question Category ........ 59
Table 24 Long vs. Short Time Allotted ........ 59
Table 25 Participant Group ........ 60
Table 26 Screen Position ........ 60
Table 27 Statistical Significance Between Tests ........ 66
List of Figures

Figure 1 Group A Halfway Page ........ 13
Figure 2 Group B Halfway Page ........ 13
Figure 3 Initial Recipe Flashcard Prototype ........ 25
Figure 4 Final Recipe Flashcard Prototype Design ........ 26
Figure 5 Website Wireframe ........ 28
Figure 6 Initial Database Structure Before Pilot Testing ........ 29
Figure 7 Final Recipe Flashcard Design ........ 34
Figure 8 Pause Page ........ 36
Figure 9 Tutorial Page ........ 36
Figure 10 Halfway Page ........ 37
Figure 11 Final Database Structure ........ 38
Figure 12 Participant Gender Breakdown ........ 39

Introduction

As defined by Adobe, recommendation engines are software systems that "personalize experiences by identifying the right offer, product, or content for the people on your website or mobile app, or for those who interact with you on any digital channel" [1]. More specific recommendation engines, such as demographic recommendation engines, focus on using demographic data, such as the location, age, and gender of an individual, to determine what content to recommend. This thesis looks at demographic recommendation engines surrounding recipe choices and whether the CTR (click-through rate) can be improved based on gender food stereotypes. The American Psychological Association defines gender in this context as "the condition of being male, female, or neuter ... gender implies the psychological, behavioral, social, and cultural aspects of being male or female (i.e., masculinity or femininity)" [2]. An example of a gender food stereotype is that meat (mainly red) is considered more masculine [3], whereas desserts tend to be considered more feminine [4]. Human Computer Interaction (HCI) is defined as "a field of study focusing on the design of computer technology, and in particular, the interaction between humans (the users) and computers" [5]. This thesis uses HCI concepts to answer the following research question: Can someone's gender be accurately predicted by machine learning algorithms based on their recipe selections?

The following are my hypotheses:

1. Predicting people's gender using machine learning algorithms will result in more than a 70% accuracy rate [6]. The data used to teach the machine learning algorithms will come from the study, which involves participants choosing recipes. The predictions will be verified using the study's data.
2. Females will be more likely to click on recipes that are desserts [4].
3. Females will be more likely to click on more complex recipes because they are more likely to have more experience cooking [7].
For a recipe to be considered more complex in the context of this study, it must have more directions than the recipe it is being compared to when presented to the participants.
4. Males will be more likely to click on recipes that are meat-based [3].
5. Males will be more likely to click on less complex recipes (i.e., fewer directions) because they have less experience cooking [7].
6. The perceived gender of the recipe submitter will play a part in whether an individual chooses a recipe. For example, if a man and a woman submitted the same recipe for ribs, the man's recipe would be selected over the woman's because ribs are considered masculine [3, 4].
7. Individuals will be more likely to choose a recipe that fits their gender stereotype if they have less time to process a recipe's information (the submitter, directions, image) [8].

Study Outline

Table 1 breaks down what is considered a masculine recipe versus a feminine recipe for this study's purposes. Feminine recipes are dessert-themed and have more directions, whereas masculine recipes are primarily meat-based and have fewer directions. Table 1 is based on the gender stereotypes in food [3, 4].

Table 1 Recipe Gender Classifiers

Masculine          Feminine
Meat               Desserts
Fewer directions   More directions

The results found in this thesis come from 524 participants recruited from the PSY 1010 course at Weber State University. The only piece of identifiable information collected from each participant was their gender, which they selected from the following options:

- Female
- Male
- Transgender
- Non-binary
- Non-conforming

Each participant was randomly assigned to a group, A or B. The A/B testing in this study was used to see whether time impacted the participant's recipe selection and to address the hypothesis that an individual will be more likely to select a recipe that fits their gender stereotype if they have less time to choose a recipe. The difference between the two groups is as follows:

- Group A
  - 5 seconds per question for the first 100 questions
  - 1.75 seconds per question for the last 100 questions
- Group B
  - 1.75 seconds per question for the first 100 questions
  - 5 seconds per question for the last 100 questions

For example, if Participant A is randomly assigned to Group A, they will be given 5 seconds for each of the first 100 questions. At the halfway mark, the participant is shown a splash screen stating that they have 1.75 seconds to answer each of the remaining 100 questions; this is demonstrated in Figure 1. If the next participant, Participant B, joins the study and is assigned to Group B, they will have 1.75 seconds to answer each of the first 100 questions, and halfway through the study they will be taken to the halfway page stating that they have 5 seconds to answer each of the remaining questions. An example of this can be seen in Figure 2.

Figure 1 Group A Halfway Page
Figure 2 Group B Halfway Page

The two different groups are a way to test system 1 (fast, unconscious) versus system 2 (slow, conscious) thinking. Daniel Kahneman's research has found that our brains have two operating systems: system 1 and system 2. System 1 makes up 98% of our thinking, whereas system 2 makes up only 2% [8].
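To make the timing scheme concrete: the per-question time limit is fully determined by a participant's group and the question number. The following minimal Python sketch (hypothetical function names; this is not the study's actual code) expresses that mapping:

import random

def assign_group():
    """Randomly assign a new participant to Group A or Group B."""
    return random.choice(["A", "B"])

def time_limit_seconds(group, question_number):
    """Return the time allotted for a question, numbered 1-200.

    Group A: 5 s for questions 1-100, then 1.75 s for questions 101-200.
    Group B: 1.75 s for questions 1-100, then 5 s for questions 101-200.
    """
    first_half = question_number <= 100
    if group == "A":
        return 5.0 if first_half else 1.75
    return 1.75 if first_half else 5.0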
By randomly assigning participants to a group, system 1 and system 2 thinking can be tested and used to address the hypothesis that an individual will be more likely to choose a recipe that fits their gender stereotype if they have less time to process a recipe's information. This study investigated three different question categories to address the various hypotheses:

- The meat vs. dessert question category addresses whether females are more likely to click on dessert recipes and males are more likely to click on meat recipes.
- The submitter gender question category addresses whether the perceived gender of the recipe submitter impacts a participant's recipe selection.
- The direction length question category addresses whether women are more likely to click on recipes with longer directions, whereas men are more likely to click on recipes with shorter directions.

Paper Outline

The remainder of this thesis walks through the milestones that led to the final result of the machine learning algorithms not accurately predicting someone's gender based on their recipe selections. The milestones were completed in the following order:

1. Gather Recipe Data and Information
2. Create a Set of Questions
3. Design Recipe Flashcard Prototypes
4. Create the Study
5. Complete IRB
6. Pilot Test
7. Adjust Study Based on Pilot Test
8. Run Experiment
9. Run Data Through Machine Learning Algorithms
10. Analyze Results

Milestones 1-3 cover the background knowledge needed to conduct a study based on recipe selections. The data for the recipes needs to be gathered, a set of questions to ask the participants needs to be written, and the design of the study flashcards needs to be finalized before web development starts. Milestone 4 is the development of the website used to host the study itself; the development process includes setting up the database, writing the application, and hosting the website. Milestones 5-7 are the finalization steps before the study goes live to participants: IRB approval is granted in these milestones, and the website is cleaned up for the optimal participant experience. Milestone 8 is when the study is open to participants; for this thesis, it also covers the breakdown of participant results. Milestones 9 and 10 use the final data to gather and analyze results.

Related Work

Recommendation engines, also known as recommendation systems, are tools used by many individuals and companies to personalize what offers, products, or content are shown and who sees them [1], and they can be used as an effective form of market targeting to increase profits [9]. A recommendation engine can be constructed using different techniques, and the four most common are collaborative filtering, content-based, knowledge-based, and demographic-based [10]. Recommendation engines built using collaborative filtering present recommendations based on the previous history of a user (i.e., previous likes, viewing history) [11]. Content-based recommendation engines recommend things to users based on similarities to other items they have liked [12]. The third technique is a knowledge-based system that uses knowledge about an item and how it might meet the user's needs [13]. Finally, the fourth most common technique is to use demographics.
This thesis focuses on a recommendation engine built using one of those four techniques: demographics. Demographic recommendation engines use data like gender, age, and location to recommend specific content to individuals [14]. For example, a demographic engine might conclude that women typically prefer to see ads for make-up products, whereas men prefer to see offers for the latest sports game. A demographic engine can draw these conclusions because it has found that women tend to click make-up ads at higher rates than men, and men tend to click on offers for sports games at higher rates than women. The study presented in this thesis looks at gender differences in recipe selections and whether the data collected could improve demographic recommendation engines [13].

People have been researching and testing ways to improve recommendation engines over the years. In 2002, M. Setten looked at how to improve the personalization of recommendation engines by using categories. He looked at a prediction technique called genre LMS, which provides recommendations based on data that can be categorized and learns from the user's interests. His approach was unique because it did not require other user information to provide a recommendation; however, his prediction technique only worked on data that could be easily categorized (i.e., movies) [15]. Nguyen et al. looked at the ability to recommend Reddit threads based on a user's Twitter profile, treating it as a genre classification problem. They introduced a new classifier, WordNet, which is based on genre classification; it had high precision like other well-known classifiers (SVM [17], Random Forests [18], Bayesian ensemble [19]) on larger datasets but had poor accuracy on smaller datasets. Nguyen et al.'s results indicate that when given a dataset that contains many tweets, their classifier can recommend a Reddit thread that may be of interest to the user [16].

One of the most significant problems with recommendation engines is called the cold-start problem [20], where the engine lacks information about users and items [21]. For example, when a new user signs up to a website like AllRecipes.com, they may not be given any recipe recommendations because there is no data about the user yet; the recommendation engine does not know what to recommend until it has more data about that specific user. Sahebi et al. found through their research that one way to help with the cold-start problem is to use user connections and ratings to detect communities of people and provide better recommendations that way. Their research found that the community-based approach led to better recommendations [22]. The cold-start problem is significant here because the study presented in this thesis does not start with any data besides the recipes gathered from AllRecipes.com; all other data is gathered at the time of the study.

The other part of this thesis looks at gender stereotypes with food. Gender stereotypes with food have been examined, and many papers have concluded that meats are typically viewed as masculine [4], whereas vegetables, dairy products, fish, fruits, and desserts are more commonly seen as feminine [3]. Men have generally recognized that a meatless diet is healthier, but in a study conducted by K. Sellaeg and G. E. Chapman, most men did not avoid meat [4], whereas E. Roos et al. found that women's diets tend to be more in line with meatless dietary guidelines [23].
According to L. A. Rudman et al., most people, both male and female, are likely to follow their gender stereotypes for fear of social backlash, so it would make sense that even though men see a meatless diet as healthier, they still choose meaty recipes over healthier or dessert recipes [24]. However, it is tough to say that participants in the study presented in this thesis would experience backlash, because Rudman et al. did not address whether people still enact gender stereotypes when no one is watching. According to a paper by S. Higgins et al., when examining click-through rates, younger men tend to have a higher CTR when a male is presented in a targeted advertisement on Facebook compared to older men and women. The fact that younger men have a higher CTR when presented with an ad containing a male shows that ads personalized to a user's age and gender create more engagement [25]. Further research conducted by Zhao et al. found that an individual's emotional state can affect their shopping, which indicates that recommendation engines should also consider emotional wants from their users to drive a higher CTR [26]. An individual's emotional state can affect their shopping by creating specific browsing history; for example, when someone is emotionally upset, they may be more likely to do retail shopping, which leads the recommendation engine to show more clothing items.

Websites like MovieLens and Flixster allow their users to rate movies, and in a previous study, U. Weinsberg et al. investigated whether those user ratings could be used to determine the gender of the user [28]. The gender inference algorithms used by U. Weinsberg et al. showed a 70%-80% accuracy rate, even though the websites allow users to obfuscate information about themselves. In a different study, C. Peersman et al. found that a support vector machine (SVM) algorithm yielded a 71.3% accuracy rate for age-based classification and an even better accuracy of 88.8% when the metadata was balanced with age and gender [6]. An SVM algorithm is a machine learning algorithm that uses a hyperplane as the decision surface and uses training data points classified as support vectors. The support vectors are the data points closest to the optimal hyperplane, and an optimal hyperplane minimizes the probability of a classification error [29].

Gender classification has also been examined using Twitter users' tweets. R. Hirt et al. used three different classifiers. The first was a text classifier that used natural language processing (NLP) to classify the gender of a user. The second was a name classifier that examined the user's Twitter handle and the name shown at the top of the user's profile. The final classifier was a third-party classifier that classified the faces appearing as the Twitter user's avatar. Together, these three classifiers achieved an 80% accuracy rate across 3,000 user profiles [30].

Other machine learning algorithms have been used to investigate whether gender can be predicted based on metadata. In a study conducted by C. Verma et al., classifiers such as Bayesian Network (BN), C5 Decision Tree (C5), Random Tree (RT), and Logistic Regression (LR) were used to determine whether the gender of European teachers could be predicted. Their results showed that RT had the highest accuracy at 96.7%, and LR had the lowest at 81.65% [31].
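To make the SVM description above concrete, the following is a minimal sketch of training a linear SVM on a binary-labeled feature table with scikit-learn. The data and encoding here are hypothetical toy values, not the thesis's code or data:

from sklearn.svm import SVC

# Hypothetical toy data: each row encodes one participant's recipe
# selections (1 = chose the "masculine" recipe in that question pair),
# and each label is the participant's gender (0 = female, 1 = male).
X = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1]]
y = [0, 0, 1, 1]

# A linear kernel learns a separating hyperplane; the support vectors
# are the training points closest to that hyperplane.
model = SVC(kernel="linear").fit(X, y)

print(model.support_vectors_)         # points defining the margin
print(model.predict([[0, 1, 1, 0]]))  # classify a new participant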
Some studies have shown issues with machine learning algorithms when the dataset is unbalanced. To overcome unbalanced data, some researchers have used cost-sensitive learning, resampling, and class balancing [32], which ultimately gives a more accurate picture of the machine learning algorithms' performance. This thesis uses the background knowledge of gender differences found in food (desserts being more feminine, meat being more masculine) and looks to see if an individual's gender can be predicted based on their recipe selections using the following machine learning algorithms:

- SVM
- Decision Tree
- Neural Nets

The SVM algorithm had high accuracy with just age-based classifiers and produced an even higher accuracy when the age-based classifiers were combined with gender data, so it could potentially have high accuracy on the data gathered during the study [6]. A different study showed that Decision Trees had a high accuracy of over 80%, so this thesis also uses that algorithm [31]. The last machine learning algorithm used in this thesis, Neural Nets, was not commonly used in other research but is a popular machine learning algorithm, so it would be interesting to see whether it has higher accuracy than the SVM and Decision Tree algorithms.

Gather Recipe Data and Information

The recipe data gathered for this study came from Dr. Robert Ball and his team. They scraped recipe data from AllRecipes.com and stored it in Google Drive. I manually went through that data to find 50 female recipes and 50 male recipes. Female recipes are defined as recipes that could be considered desserts, like ice cream, cupcakes, and cookies. In comparison, male recipes are defined as recipes that have meat as the focal point, like steaks, hamburgers, and BBQ. One of the main criteria I looked at when selecting recipes was the image associated with the recipe. Most male recipes had less flattering images than female recipes, so it was important to find recipes with appealing photos that fit within one of the gendered categories. The other necessary data I gathered was the submitter name information. An issue with the submitter names is that they could be usernames or the individual's actual name. To handle this issue, I found the top 100 female names and the top 100 male names and used those in place of the submitter names from AllRecipes.com. All of the names chosen were from the same cultural origin to remove the cultural bias that comes from names.

Create a Set of Questions

I created the set of questions with the intent of not swaying a participant in a particular direction. I made 13 questions, but only 11 of them were used for the study. The idea is that the questions would not take away from the instinctual choice of the participant. The participant should choose whichever recipe looks the most appealing to them, not which recipe is necessarily the "correct" answer. For example, if the participant is presented with the question "Which recipe do you prefer?" and shown a steak recipe and a cupcake recipe, I would want the participant to select their preference, as there is no correct answer. Each question had to fit into the three categories being examined (submitter name, meat vs. dessert, and the number of directions) and be easy to read.
The participants were only allowed a short amount of time to answer each question, so a question could not take too long to read but still had to make them think about their recipe preference. I created an initial set of 13 questions focusing on scenarios that could be used to compare recipes in the three categories being tested (i.e., meat versus dessert, submitter gender, and direction length). As stated above, the purpose of the questions was to get the participant to think about which recipe they would prefer in the scenario without drawing them away from their instinctual choice. The final set consisted of 11 of the 13 questions, because one question was too specific and another was too similar to an existing question. An important thing to note is that some of the questions specifically ask about desserts; if a question directly asks about desserts, the participants are only shown dessert options.

The following is the list of initial questions:

1. Which recipe are you more likely to make for dinner?
2. Which recipe are you more likely to eat for lunch?
3. Which recipe would you prefer to make?
4. Which recipe would you bring to a party?
5. Which dessert recipe are you more likely to make?
6. Which recipe are you more likely to bring to a summer party?
7. Which recipe do you prefer?
8. Which dessert are you more likely to bring to a party?
9. Which recipe are you more likely to take to a friend?
10. Which recipe are you more likely to make for a work party?
11. Which recipe are you more likely to eat?
12. Which recipe are you more likely to want for dessert?
13. Which recipe are you more likely to bring to a BBQ?

The following is the list of final questions:

1. Which recipe would you prefer to make?
2. Which recipe would you bring to a party?
3. Which dessert recipe are you more likely to make?
4. Which recipe are you more likely to bring to a summer party?
5. Which recipe are you more likely to want for dessert?
6. Which recipe are you more likely to make for dinner?
7. Which recipe are you more likely to take to a friend?
8. Which recipe do you prefer to eat?
9. Which recipe do you prefer?
10. Which recipe are you more likely to make for a work party?
11. Which recipe would you prefer to eat?

Another essential thing to include in the study was a couple of filler questions, which check whether the participant is just clicking through the study or is reading each question before selecting an answer. The filler questions were designed to be sparse, so I decided to include only one filler question in the first 100 questions and one in the last 100 questions. The following is the list of filler questions:

1. Please select the recipe on the left.
2. Please select the recipe on the right.

Design of Recipe Flashcard Prototypes

The recipe flashcard's initial prototype came from the thesis proposal, was the foundation for the recipe flashcards, and is shown in Figure 3. The flashcards needed to have the submitter's name, instructions, and recipe image laid out nicely to present all the information to the participant. The prototype has the question, recipe image, submitter, and instructions all available to the participant in a nicely laid-out format. The main issue with this prototype is that the submitter name is significantly smaller than it should be, given that the submitter name is one of the categories tested during the study.
Figure 3 Initial Recipe Flashcard Prototype

For the prototype after the thesis proposal, I thought it was essential to make the submitter name more visible and to tweak the image position and sizes. The calories per serving needed to be removed because they were no longer being tested. The updated flashcard can be seen in Figure 4.

Figure 4 Final Recipe Flashcard Prototype Design

Creating the Study

In my thesis proposal, I stated that my full stack (the backend and client-side software [33]) would consist of MySQL, Java, HTML5, CSS, and vanilla JavaScript. I wanted to keep the stack as lightweight as possible, so I did not use third-party libraries such as Bootstrap and jQuery for web page development. The idea for the website was that participants would go to the landing page, where there would be an Informed Consent that they would have to read through and accept before starting the study, along with a gender selection button. The participant could choose their gender from the following options: female, male, transgender, non-binary, and non-conforming. As soon as the participant accepted the informed consent and selected their gender, they were taken to the study's first question. Once in the study, the participant was assigned a group (A or B), which determined how long they got for the first 100 questions and the last 100 questions. The participant's start time needed to be recorded, along with their recipe selection and the position of the recipe (whether it was on the left or right side of the screen). To ensure that an extra variable was not being added to the testing, every participant got the questions in the same order; for instance, questions 37 and 114 were always the filler questions. Figure 5 shows the overall flow of the website.

Figure 5 Website Wireframe

Website Design

The website was made using Java for the backend and HTML5, CSS, and vanilla JavaScript for the frontend. For the server, I used Jetty, a Java web server, and Jersey, an open-source framework created by Eclipse that is used to build RESTful applications. REST stands for Representational State Transfer, and a RESTful application is an application that uses HTTP requests to transmit data.

Database Structure

A sophisticated database structure was necessary for this study because it would contain a lot of information about the participants' behavior within the study. The basic structure of the database needed the following tables: Person, Recipe, Question, and Answer. The database was hosted through a Relational Database Service (RDS) instance on Amazon Web Services (AWS). An overview of the database can be seen in Figure 6.

Figure 6 Initial Database Structure Before Pilot Testing

Once the website was completed, both the user interface (UI) and the backend, it was time to host the application. I put the JAR files for the website in a Docker container and hosted it on AWS Elastic Beanstalk, a service to quickly deploy and scale web applications. A Docker container is a standard unit of software that packages up code and all its dependencies so that the application runs reliably [34]. A JAR file is a file format used to aggregate Java classes and associated metadata [35].

Complete IRB

I completed the IRB in multiple steps. The first step was to complete the Social & Behavioral Research – Basic/Refresher course through citiprogram.com.
Once the Social & Behavioral Research course was complete, it was time to fill out the IRB form through Cayuse, Weber State University's IRB application software. I filled out the Weber State University Informed Consent Form; the consent form consists of everything the participant needs to know before going into the study, along with the benefits and risks of being a participant. For the IRB submission, the questions that participants would be asked had to be known in advance. The questions I submitted with the IRB application are in the Create a Set of Questions chapter above; I listed more questions for the IRB submission than were ultimately used during the study. One of the main concerns when submitting the IRB was how to recruit participants and the exact number of participants in the study. For this study, the participants were recruited through the PSY 1010 course at Weber State University and were awarded course credit for completing the study by submitting their email at the end. To ensure that students could access the study, it was connected to the Sona system, which hosts different psychological studies. The first submission of the IRB was returned to me because it was unclear what information would be gathered, when it would be collected, and how it would be kept anonymous. To address the first and second concerns, I stated that only the participants' genders, recipe selections, and the time it took to select their answers would be recorded. For the last concern, I ensured that all of the participants' data would be kept anonymous by tying only a Universally Unique Identifier (UUID) [36] to each participant; no other identifiable information besides the participant's gender was stored. The three concerns were addressed in the second submission of the IRB, and it was accepted.

Pilot Test

The pilot test was conducted informally by having family and friends be participants, for a total of 10 participants. I worked with the participants closely and walked them through the study to get their feedback and fix any glaring issues before the study went live. The initial feedback was that the questions were too long; it was too hard to read a question and answer it in the allotted time. During one participant's trial, most of the questions were left unanswered because the time needed to read the question and the two recipes far exceeded the time limit of two seconds. Another observation arose when participants were running through the 10-second time-limit portion of the study: they said the question time lasted too long. In the pilot test, two seconds was too short to read and answer a question, but 10 seconds was too long, so the 10-second time allotment was changed to five seconds. Multiple participants also mentioned that they were not prepared for the questions' format and that the timing felt strange, something they did not get used to until several questions into the study. Another timing issue was that some participants wanted to take a break during the study, because they had to use the restroom, attend to children, or do something else that took them away from the study; when a participant left the study for a moment, they were kicked out for missing too many questions. Another issue was the loading time between the questions.
The questions took too long to load, and the additional load time counted against the time allotted for the question, which meant that questions were being skipped and left unanswered because there was not enough time for users to see the question and answer before the time ran out. This issue was addressed by storing the name of each recipe's image in the database and storing all the photos on the server. Although this solution did not produce the fastest loading times, it was significantly quicker than pinging AllRecipes.com for the recipe photos. The final takeaway from the pilot test was that the images were taking away from the recipes themselves. During one participant's trial, we realized that they were selecting recipes based on the image and not the recipe content itself. It did not matter what the question was; the recipe was selected based on which image looked the best. The recipe images took away from the study's purpose by causing participants to look only at the pictures being presented and select a recipe based on the better-looking image.

Adjusted Study Based on Pilot Test

The pilot test brought to light some of the issues with the study's design and the website the participants would be using. Some of the most significant changes were to the recipe flashcards, and those changes can be seen in Figure 7.

Figure 7 Final Recipe Flashcard Design

Recipe Flashcard Changes

The images were removed from the flashcards and replaced with recipe titles, because the photos were taking away from the purpose of the study: the pilot test found that participants were picking recipes based on how good the picture looked, not on their actual preference. A few other changes shown in Figure 7 are the time remaining, the question count, and the help button. All the pilot test participants thought it was essential to include the question count so they would know where they were in the study, and they also wanted to see the time remaining before they had to select a recipe. The help button was added so that participants could go back to the tutorial at any time. Another significant change can be seen on the right in Figure 7: card highlighting. Whenever the time remaining drops below one second, the cards are highlighted in red to signify that time is running out. A green highlight appears when a card has been selected, indicating that the participant's response has been recorded.

New Additions

The pilot test results also indicated that a few new additions to the website were needed to improve the participants' study experience. Figure 8 shows the new pause feature, which allows the participant to pause twice throughout the study; the pause time has no limit. In addition, I added a tutorial, which can be seen in Figure 9. A tutorial was needed because most pilot test participants stated that they did not like being thrown right into the study while unfamiliar with how it worked. The tutorial steps a participant through the setup and can be accessed throughout the study by hitting the help button. Lastly, I added a break between the first 100 questions and the second 100 questions. This intermediate break, seen in Figure 10, was added because the pilot test participants said the dramatic difference in the time allotted between the first 100 questions and the last 100 questions was jarring.
Figure 8 Pause Page
Figure 9 Tutorial Page
Figure 10 Halfway Page

Database Changes

The database required a few changes once it was determined that images would no longer be used in the recipe flashcards and that more information needed to be collected. The Recipe table, used to track recipe information, was changed by adding a column for the recipe's title. Another change was to the Person table, which tracks a participant's gender, number of pauses used, how many questions they have answered, the time they started the study, and whether they were in group A or B. The Person table was changed so that the participant's pauses could be tracked, ensuring no more than two pauses were used during the study. The final change was to the Answer table, which tracks which answer was selected for each question, what position on the screen the answer was in, the chosen recipe, and the unselected recipe. The change to the Answer table was the addition of an entry date column to track when the answer was selected. The final structure of the database can be seen in Figure 11.

Figure 11 Final Database Structure

Experiment Run

The study ran from October to December 4th, the final day of the Fall 2020 semester. The hard cutoff was due to grades being due, after which there would be no new PSY 1010 students taking the study for class credit. For the two months that the study was available, a total of 593 participants started the study, and 526 participants answered all 200 questions (though they may or may not have answered the filler questions correctly). A breakdown of gender among the 526 participants who answered all 200 questions can be found in Figure 12. There were 341 female, 174 male, five non-conforming, three non-binary, and three transgender participants. Since there was such a low number of non-conforming, non-binary, and transgender participants, their data was removed.

Figure 12 Participant Gender Breakdown

From the male and female participants who completed the study, there are a total of 105,191 answers. A breakdown of the number of feminine recipes selected versus masculine recipes can be seen in Table 2, which shows that more feminine recipes were selected than masculine recipes. In Tables 3-5, the answers are broken down further to show the difference in feminine versus masculine recipes selected per category tested. Since the male-to-female ratio was not balanced, the number of female participants was reduced to equal the number of male participants in the datasets for the machine learning algorithms, which is discussed in the next chapter. The numbers had to be equalized because the machine learning algorithms kept classifying every individual as female, since there was more female data than male data. The last 167 female participants were removed from the data.

Table 2 Recipe Selections by Gender

                 Feminine Recipes Selected   Masculine Recipes Selected
Recipe Gender    57,058                      48,133

Table 3 Dessert vs. Meat Question Category vs. Participant

                 Female   Male
Dessert Recipe   6,130    5,536
Meat Recipe      5,159    5,865

Table 4 Submitter Gender Question Category vs. Participant

                   Female   Male
Female Submitter   6,310    6,397
Male Submitter     4,988    4,996
Table 5 Direction Length Question Category vs. Participant

        Female   Male
Short   6,142    5,152
Long    5,159    6,231

Machine Learning Algorithms

The three machine learning algorithms used for analysis are Neural Nets (NN), Decision Trees, and Support Vector Machine (SVM), and all were written in Python. For all three algorithms, the non-conforming, non-binary, and transgender identities were removed from the analysis because there were not enough participants who identified with those genders. There was also a significant difference in the number of female participants compared to male participants, which meant the data needed to be adjusted so that there was an equal number of male and female participants. The point of using these algorithms is to see whether the computer can distinguish male participants from female participants based on their recipe selections.

Each algorithm was run three times, and before the data was run through the algorithms, it had to be filtered. The first filter removed every non-available answer, meaning the question timed out and was left unanswered. Next, the data was cleaned to remove any participants who did not complete all 200 questions. Finally, because of the significant difference between the number of female and male participants, enough female participants were removed from the dataset to ensure an equal number of female and male participants; the last 167 female participants were removed from the datasets.

The first run through the algorithms used all the data collected from the study, filtered only as described above. The purpose of this run was to look at all the data as a whole and see if the algorithms could accurately predict female participants versus male participants. For the second run, the first 75 questions of each section were filtered out, and only the last 25 questions of each section were used, meaning that the second run only analyzed questions 75-100 and 175-200 for each participant. The purpose of this filtering was to see if there was any change from the whole dataset once a participant had had enough time to adjust to the question time limit. The final run analyzed only the participants who correctly answered both filler questions and took over five minutes to complete the study, indicating that they were not just clicking through the study as fast as they could. It was important to see if there was any change between this run and the first run because, in this last run, the participants were paying attention to the questions rather than rushing through to finish the study.

Results

When I started analyzing the data with the three machine learning algorithms (NN, Decision Trees, and SVM), it was apparent that the data had to be filtered so that there was an equal number of female and male participants, because otherwise the algorithms would always choose female. This led me to remove some female participants so that an equal number of female and male participants could be compared. The first dataset run through the algorithms contained all the answers from every participant who answered all 200 questions, the second dataset contained answers from the last 25 questions of each section (questions 75-100 and 175-200), and the last dataset contained only answers from participants who answered the filler questions correctly.
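Before presenting the results, the filtering, balancing, and evaluation steps described above can be sketched in Python. This is a minimal illustration assuming the answers live in a pandas DataFrame with hypothetical column names (a "gender" label and one encoded answer column per question); it is not the thesis's actual analysis code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Hypothetical layout: one row per participant, with a "gender" label
# and one encoded answer column per question (q1 ... q200); missing
# values correspond to timed-out (non-available) answers.
df = pd.read_csv("answers.csv").dropna()

# Balance the classes by trimming the larger group (the thesis removed
# the last 167 female participants to match the number of males).
n = min((df["gender"] == "female").sum(), (df["gender"] == "male").sum())
balanced = pd.concat([df[df["gender"] == "female"].head(n),
                      df[df["gender"] == "male"].head(n)])

X = balanced.drop(columns=["gender"])
y = balanced["gender"]

# 80% of the data trains each model; the held-out 20% is scored.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for model in (MLPClassifier(max_iter=500),   # Neural Net
              DecisionTreeClassifier(),      # Decision Tree
              SVC()):                        # SVM
    model.fit(X_train, y_train)
    # classification_report prints the per-class precision/recall/
    # F1-score/support layout shown in Tables 6-8.
    print(classification_report(y_test, model.predict(X_test)))

Note that Tables 6-8 report per-class precision, recall, F1-score, and support in the same format that scikit-learn's classification_report produces; whether the thesis actually used scikit-learn is an assumption.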
For this review, I have split up the results by machine learning algorithm rather than by dataset.

The first machine learning algorithm tested against the datasets was Neural Nets (NN). Table 6 shows the results from running the algorithm against each dataset, where 80% of the data was used for training and the remaining 20% was used for testing. The best this algorithm could do was 52% accuracy, on the first and third datasets.

Table 6 Neural Nets

All Data            Precision   Recall   F1-score   Support
Female              0.52        0.61     0.56       6655
Male                0.52        0.43     0.47       6553
Accuracy            -           -        0.52       13208
Macro Avg           0.52        0.52     0.51       13208
Weighted Avg        0.52        0.52     0.51       13208

Last 25 Questions
Female              0.50        0.60     0.54       1661
Male                0.51        0.41     0.46       1695
Accuracy            -           -        0.50       3356
Macro Avg           0.51        0.50     0.50       3356
Weighted Avg        0.51        0.50     0.50       3356

Passed Filter
Female              0.52        0.41     0.45       1462
Male                0.52        0.63     0.57       1485
Accuracy            -           -        0.52       2947
Macro Avg           0.52        0.52     0.51       2947
Weighted Avg        0.52        0.52     0.51       2947

The second machine learning algorithm tested against the datasets was the Decision Tree. Table 7 shows the results from running the algorithm against each dataset, where 80% of the data was used for training and the remaining 20% was used for testing. The best this algorithm could do was 55% precision at correctly selecting males when tested against the dataset that only used answers from participants who answered the filler questions correctly.

Table 7 Decision Tree

All Data            Precision   Recall   F1-score   Support
Female              0.50        0.73     0.60       6545
Male                0.52        0.29     0.37       6663
Accuracy            -           -        0.51       13208
Macro Avg           0.51        0.51     0.48       13208
Weighted Avg        0.51        0.51     0.48       13208

Last 25 Questions
Female              0.49        0.69     0.58       1655
Male                0.51        0.31     0.38       1701
Accuracy            -           -        0.50       3356
Macro Avg           0.50        0.50     0.48       3356
Weighted Avg        0.50        0.50     0.48       3356

Passed Filter
Female              0.52        0.48     0.50       1422
Male                0.55        0.59     0.56       1525
Accuracy            -           -        0.53       2947
Macro Avg           0.53        0.53     0.53       2947
Weighted Avg        0.53        0.53     0.53       2947

The last machine learning algorithm tested against the datasets was the Support Vector Machine (SVM). Table 8 shows the results from running the algorithm against each dataset, where 80% of the data was used for training and the remaining 20% was used for testing. The best this algorithm could do was 54% precision at correctly selecting males when tested against the dataset that only used answers from participants who answered the filler questions correctly.

Table 8 SVM

All Data            Precision   Recall   F1-score   Support
Female              0.52        0.56     0.53       6569
Male                0.52        0.48     0.50       6639
Accuracy            -           -        0.52       13208
Macro Avg           0.52        0.52     0.52       13208
Weighted Avg        0.52        0.52     0.52       13208

Last 25 Questions
Female              0.53        0.56     0.54       1677
Male                0.53        0.50     0.51       1679
Accuracy            -           -        0.53       3356
Macro Avg           0.53        0.53     0.53       3356
Weighted Avg        0.53        0.53     0.53       3356

Passed Filter
Female              0.53        0.53     0.53       1452
Male                0.54        0.53     0.54       1495
Accuracy            -           -        0.53       2947
Macro Avg           0.53        0.53     0.53       2947
Weighted Avg        0.53        0.53     0.53       2947

Analysis of Results

The purpose of the statistical review is to see whether there is any statistical significance in the data. A chi-squared test was performed on each of the three question categories. In the tables, female participants are denoted with an "F" and male participants with an "M." The first dataset is all the aggregated data from the participants who answered all 200 questions, a total of 105,191 answers. The second dataset looks at only the last 25 questions from both sections, meaning that only answers to questions 75-100 and 175-200 were used.
I used only the last twenty-five questions of each section because I wanted to see whether giving participants time to get used to the question time limit changed the statistical significance. It is important to note that testing the last 25 questions of each section may mean participants were somewhat fatigued, because these are the last questions of the sections. Finally, the last dataset contains only answers from participants who answered each of the filler questions correctly and took longer than five minutes to complete the study. In other words, the last dataset looked at participants who were paying attention to the questions being asked and did not rush through the study.

Each dataset was run through six tests covering the three question categories, screen position, participant group, and the long vs. short time allotment. Note that the question category results are split up by the time allotment for the question; it is important to see whether the time difference between questions caused any statistical significance. For example, if the question category is testing a meat recipe versus a dessert recipe, I want to see whether men and women are more likely to click on the recipe that fits their gender stereotype when they have less time to think versus more time.

The first round of statistical tests looked at all the aggregated data from every participant who completed all 200 questions. The first test, which tested the submitter question category, showed no statistical significance, as seen in Table 9. The submitter question category tested whether or not the perceived gender of the recipe submitter affected the participant's recipe selection. Something important to note is that the data collected for the submitter question category is wrong, and thus the test is invalid: the data collected only showed whether the recipe tested was considered a male or female recipe (meat vs. dessert), not the gender of the submitter.

Table 9 Submitter Question Category

Short time limit (1.75 seconds), p-value: 0.982
Submitter Gender   F      M      Total   Male to Female Ratio
female             5574   4399   9973    0.79
male               2904   2295   5199    0.79

Long time limit (5 seconds), p-value: 0.65
Submitter Gender   F      M      Total   Male to Female Ratio
female             6268   4945   11213   0.79
male               3195   2482   5677    0.78

The second test examined whether there was any significance in men selecting meat recipes over dessert recipes and women selecting dessert recipes over meat recipes. As seen in Table 10, there is statistical significance in the meat vs. dessert question category. The significance shows a relationship between the gender of the participant and the gender of the recipe they chose: female participants were more likely to select dessert recipes, and male participants were more likely to choose meat recipes. In other words, men are more likely to select masculine recipes and women are more likely to choose feminine recipes. Although this result indicates that women and men are more likely to choose recipes that fit their gender stereotype (i.e., desserts for women, meat-based meals for men), it does not mean that individuals are more likely to eat the recipe they selected. It would be interesting to see whether further research could determine whether individuals choose their gendered recipe due to upbringing, culture, and societal pressures.
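For reference, the chi-squared tests behind these p-values can be reproduced with SciPy. The following is a minimal sketch using the short-time-limit counts from Table 10 below (assuming SciPy is available; this is not the thesis's analysis script):

from scipy.stats import chi2_contingency

# 2x2 contingency table built from the short-time-limit rows of
# Table 10 below: rows are recipe type (dessert, meat), columns are
# participant gender (F, M).
observed = [[5380, 4687],
            [2598, 2619]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, dof = {dof}")
# p comes out well below .001, matching the significance reported
# for the meat vs. dessert category.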
The third question category was direction length, and as seen in Table 11, there was no statistical significance. The direction length category looked specifically at whether or not the length of the recipe's directions affected the participant's recipe selection.

Table 11
Direction Length Question Category

Short time limit (1.75 seconds), p-value: 0.87
Direction Length   F      M      Total   Male to Female Ratio
short              5682   4696   10378   0.83
long               2933   2439   5372    0.83

Long time limit (5 seconds), p-value: 0.41
Direction Length   F      M      Total   Male to Female Ratio
short              5982   5131   11113   0.86
long               3059   2553   5612    0.83

The remaining three tests examined whether the screen position, participant group, and time allotment had any significance. Table 12 shows that the time allotment did not have any significant effect on the recipe selection. In other words, it did not matter whether the participant had 5 seconds or 1.75 seconds to answer a question; the time allowed per question did not affect the participant's recipe selection.

Table 12
Long vs. Short Time Allotted

p-value: 0.97
Time Allowed   F       M       Total   Male to Female Ratio
5s             28029   23188   51217   0.83
1.75s          25532   21135   46667   0.83

However, Tables 13 and 14 show that screen position and participant group did have a slight significance in recipe selection. Table 13 looks explicitly at the order in which a participant's question time allotments were set. Participants in Group A started with 5 seconds for the first 100 questions and 1.75 seconds for the last 100 questions, whereas participants in Group B had 1.75 seconds for the first 100 questions and 5 seconds for the last 100 questions. The test results show that the participant's group did have a slight effect on their recipe selection, which means that participants in Group A may have had more time to process the questions because their first 100 questions had a 5-second time allotment.

Table 13
Participant Group

p-value: < .05
Group   F       M       Total   Male to Female Ratio
A       28400   23165   51565   0.82
B       25161   21158   46319   0.84

Table 14 looks at whether the screen position of the recipe affected the participant's recipe selection. The screen position did have a slight effect on the recipe selection, which means that participants could have been clicking recipes on one side of the screen more than the other.

Table 14
Screen Position

p-value: < .001
Position   F       M       Total   Male to Female Ratio
left       28607   23144   51751   0.81
right      24954   21179   46133   0.85

The second round of statistical tests looked at the last 25 questions of each section. Only the answers to questions 75-100 and 175-200 were used in this dataset. It is also important to note that the answers only came from participants who answered all 200 questions.
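The thesis does not show how the second and third datasets were extracted from the raw answers. Assuming the answers sit in a pandas DataFrame, a hypothetical sketch of the three subsets could look like the following; the column names question_number, passed_filler, and duration_minutes are illustrative assumptions, not taken from the thesis.

    # A hypothetical sketch of carving out the three datasets analyzed in
    # this chapter. All column names are assumptions for illustration.
    import pandas as pd

    def build_datasets(answers: pd.DataFrame):
        # Dataset 1: every answer from participants who finished all 200 questions.
        all_data = answers

        # Dataset 2: only the last 25 questions of each section
        # (questions 75-100 and 175-200).
        last_25 = answers[
            answers["question_number"].between(75, 100)
            | answers["question_number"].between(175, 200)
        ]

        # Dataset 3: participants who answered the filler questions correctly
        # and took longer than five minutes to complete the study.
        passed_filter = answers[
            answers["passed_filler"] & (answers["duration_minutes"] > 5)
        ]
        return all_data, last_25, passed_filter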
The first test, which tested the submitter question category, showed no statistical significance, as shown in Table 15. As with the first dataset, the data collected for the submitter question category is wrong and thus the test is invalid. The data collected only showed whether the recipe tested was considered a male or female recipe (meat vs. dessert) and not the gender of the submitter.

Table 15
Submitter Question Category

Short time limit (1.75 seconds), p-value: 0.90
Submitter Gender   F      M      Total   Male to Female Ratio
female             1506   1319   2825    0.88
male               790    685    1475    0.87

Long time limit (5 seconds), p-value: 0.21
Submitter Gender   F      M      Total   Male to Female Ratio
female             1591   1463   3054    0.92
male               843    715    1558    0.85

The second test for this dataset examined whether there was any significance in men selecting meat recipes over dessert recipes and women selecting dessert recipes over meat recipes. As seen in Table 16, there is statistical significance in the meat vs. dessert question category. As with the test using the whole dataset, this test also showed that female participants were more likely to select dessert recipes and male participants were more likely to choose meat recipes. As with the first dataset, the statistical significance for the meat vs. dessert question category shows that men are more likely to select masculine recipes and women are more likely to choose feminine recipes. The results also show that women and men are more likely to choose the recipe that fits their gender when given more time, which could indicate that when given more time to process the question and the two recipes, individuals are more likely to select the recipe that best fits their gender. This result is contrary to my hypothesis that men and women are more likely to select a recipe that fits their gender stereotype when given less time to process the question. A reason for this result could be that the 1.75-second time allotment was too short for participants to process anything on the screen before the time ran out.

Table 16
Meat vs. Dessert Question Category

Short time limit (1.75 seconds), p-value: 0.06
Recipe Type   F      M      Total   Male to Female Ratio
dessert       1174   1041   2215    0.89
meat          567    579    1146    1.02

Long time limit (5 seconds), p-value: 0.00011
Recipe Type   F      M      Total   Male to Female Ratio
dessert       1331   1070   2401    0.80
meat          590    624    1214    1.06

The third question category was direction length, and as seen in Table 17, there was no statistical significance.

Table 17
Direction Length Question Category

Short time limit (1.75 seconds), p-value: 0.65
Direction Length   F      M      Total   Male to Female Ratio
short              1577   1271   2848    0.81
long               806    670    1476    0.83

Long time limit (5 seconds), p-value: 0.39
Direction Length   F      M      Total   Male to Female Ratio
short              1693   1379   3072    0.81
long               880    678    1558    0.77

The remaining three tests examined whether the screen position, participant group, and time allotment had any significance. The results from these tests can be seen in Tables 18, 19, and 20, and all three tests showed no significance. Table 18 shows that the time allotment on a question did not affect the recipe selection.

Table 18
Long vs. Short Time Allotted

p-value: 0.62
Time Allowed   F      M      Total   Male to Female Ratio
5s             6928   5929   12857   0.86
1.75s          6420   5565   11985   0.87

Table 19 shows that for this dataset (answers to questions 75-100 and 175-200), the participant group did not affect the recipe selection.

Table 19
Participant Group

p-value: 0.53
Group   F      M      Total   Male to Female Ratio
A       6946   5934   12880   0.85
B       6402   5560   11962   0.87

Table 20 shows that for this dataset, the screen position did not affect the recipe selection.

Table 20
Screen Position

p-value: 0.18
Position   F      M      Total   Male to Female Ratio
left       7058   5979   13037   0.85
right      6290   5515   11805   0.88
The last round of statistical tests looked at the dataset that contained only the answers from participants who correctly answered the filler questions. The purpose of the filler questions is to check whether the participant is just clicking through the study or actually reading the question before selecting an answer. The first test, which tested the submitter question category, showed no statistical significance, as shown in Table 21. This means that the submitter category did not affect the recipe selection. As with the first two datasets, the data collected for the submitter question category is wrong and thus the test is invalid. The data collected only showed whether the recipe tested was considered a male or female recipe (meat vs. dessert) and not the gender of the submitter.

Table 21
Submitter Question Category

Short time limit (1.75 seconds), p-value: 0.20
Submitter Gender   F      M     Total   Male to Female Ratio
female             1153   900   2053    0.78
male               642    552   1194    0.86

Long time limit (5 seconds), p-value: 0.95
Submitter Gender   F      M      Total   Male to Female Ratio
female             1246   1031   2277    0.83
male               696    572    1268    0.82

The second test for this dataset examined whether there was any significance in men selecting meat recipes over dessert recipes and women selecting dessert recipes over meat recipes. As seen in Table 22, there is statistical significance in the meat vs. dessert question category. However, there is more significance when the participant was allowed a longer time to answer the question. This means that when a participant had a long time to read the question, they were more likely to pick the recipe that fit their gender (i.e., a female choosing a dessert recipe). As with the first two datasets, the statistical significance for the meat vs. dessert question category shows that men are more likely to select masculine recipes and women are more likely to choose feminine recipes. The results from this test also show that individuals who were paying attention to the questions being asked were more likely to select a recipe that fits their gender.

Table 22
Meat vs. Dessert Question Category

Short time limit (1.75 seconds), p-value: 0.09
Recipe Type   F      M     Total   Male to Female Ratio
dessert       1101   928   2029    0.84
meat          615    587   1202    0.95

Long time limit (5 seconds), p-value: < .001
Recipe Type   F      M      Total   Male to Female Ratio
dessert       1213   1113   2326    0.92
meat          581    696    1277    1.20

The third question category was direction length, and as seen in Table 23, there was no statistical significance. The results shown in Table 23 indicate that the recipe's direction length did not affect the recipe selection.

Table 23
Direction Length Question Category

Short time limit (1.75 seconds), p-value: 0.88
Direction Length   F      M     Total   Male to Female Ratio
short              1078   952   2030    0.88
long               643    560   1203    0.87

Long time limit (5 seconds), p-value: 0.41
Direction Length   F      M      Total   Male to Female Ratio
short              1264   1083   2347    0.86
long               714    576    1290    0.81

The remaining three tests examined whether the screen position, participant group, and time allotment had any significance. Table 24 shows that the time allotment did not affect the recipe selection.
Table 24
Long vs. Short Time Allotted

p-value: 0.18
Time Allowed   F      M      Total   Male to Female Ratio
5s             5824   5071   10895   0.87
1.75s          5342   4479   9821    0.84

Like the original dataset, which was tested first, the screen position and participant group show statistical significance. Tables 25 and 26 show the results from those two tests. Participants in Group A started with 5 seconds for the first 100 questions and 1.75 seconds for the last 100 questions, whereas participants in Group B had 1.75 seconds for the first 100 questions and 5 seconds for the last 100 questions. The results of the test shown in Table 25 show that the participant's group did have a slight effect on their recipe selection, which could indicate that participants in Group A had more time to get familiar with the format of the questions because they were allotted 5 seconds for the first 100 questions.

Table 25
Participant Group

p-value: 0.09
Group   F      M      Total   Male to Female Ratio
A       5863   4901   10764   0.84
B       5303   4649   9952    0.88

Table 26 looks at whether the screen position of the recipe affected the participant's recipe selection. The screen position did have a slight effect on the recipe selection, indicating that individuals were more likely to select a recipe on a specific side of the page.

Table 26
Screen Position

p-value: < .001
Position   F       M       Total   Male to Female Ratio
left       28607   23144   51751   0.81
right      24954   21179   46133   0.85

Conclusion

To answer the research question stated in the introduction, this study indicates that it is not possible for a machine to accurately predict an individual's gender based on their recipe selections given the three question categories tested. One of the issues with this study's setup was that it did not account for an individual's food preferences. A participant may have been vegetarian or vegan and automatically selected the recipe that did not seem to contain any animal products, so further research would need to be done to see if an individual's diet or food preferences would change the outcome. Another limitation of this study was that the participants were asked to select their gender at the beginning of the study, which could have primed them to look for differences in gender. It would have been better to ask the participants to select their gender at the end of the study to avoid any priming for gender differences.

The first hypothesis that I stated in the introduction is that predicting people's gender using machine learning algorithms would result in a 70% accuracy rate. The results from my study indicated that my hypothesis was wrong; instead, the machine learning algorithms resulted in accuracy rates in the range of 50%-55%.

Two of my hypotheses addressed the meat versus dessert question category. The first stated that females would be more likely to click on dessert recipes, and the second said that males would be more likely to click on meat recipes. Both of these hypotheses were true, and the chi-squared tests showed statistical significance for the meat versus dessert question category using all three datasets. Females were more likely to select dessert recipes, and men were more likely to choose meat recipes.

The second question category was the recipe direction length, and two of my hypotheses addressed that category.
The first hypothesis stated that females would be more likely to click on more complex recipes because they are more likely to have more experience cooking; for a recipe to be considered more complex, it must have more directions than the recipe it is being compared to. The second hypothesis stated that men would be more likely to click on less complex recipes (i.e., fewer directions) because they have less experience cooking. Neither of these hypotheses was supported, and the chi-squared tests showed no statistical significance when it came to the recipe direction length. A limitation of the direction length test presented in this thesis may be that by using the last 25 questions of each section, the participants were feeling fatigued from the number of questions asked. It may be better in the future to pull the answers from the middle 25 questions of each section instead of the last 25.

Another hypothesis I stated was that the perceived gender of the recipe submitter would play a part in whether an individual chooses a recipe. An example of this hypothesis is that if a man and a woman submitted the same recipe for ribs, the man's recipe would be selected over the woman's because ribs are considered masculine. It is unclear whether this hypothesis was true or not because the data collected for this specific test was invalid. A major issue with the data collected for this test was that the recipe's gender was recorded (was it a dessert or a meat-based recipe?) but the submitter's gender was not. This was an oversight in the data collection that can be corrected in further research, but for the purposes of this thesis it created an invalid test.

The last hypothesis I stated was that individuals would be more likely to choose a recipe that fits their gender stereotype if they have less time to process a recipe's information (the submitter, directions, and image). The results of this study indicate that this hypothesis is wrong: when participants were given more time to choose a recipe, they more often selected the recipe that fit their gender stereotype, which is contrary to my hypothesis. A limitation of the study could be that the 1.75-second time limit was too short. Feedback from the pilot test indicated that the original 2-second time limit was too short, but the study presented in this thesis kept the short time limit regardless of the feedback. The short time limit may have created a situation where the participant simply had to select an answer because there was not enough time to process all the information in front of them. Future research on this topic should do a more in-depth investigation into the right time limits to use.

One of the takeaways from this study is that the screen position and participant group affected the recipe selection. The first dataset tested using the chi-squared test was the whole dataset of every answer from participants who completed the study, and this dataset showed statistical significance for both the screen position and the participant's group. The other dataset that showed the same statistical significance was the last dataset tested: the filtered dataset that only contained answers from participants who accurately answered both filler questions. The dataset that included the answers to questions 75-100 and 175-200 did not show any statistical significance for the screen position or participant group.
Screen position and participant group could have an effect on recommendation engines in the future because it may be that someone is more likely to select an advertisement if it is on the right side of the page, or someone might be more willing to click on a recommendation if they have enough time to read the full description.

Another takeaway from this study is that men are more likely to select meat-based recipes, and women are more likely to choose dessert recipes. Each dataset run through the chi-squared test came to the same result of statistical significance, which indicates that women are more likely to select dessert recipes and men are more likely to choose meat-based recipes. For websites that use demographic engines to present information to users, it could be important to research the difference in selection based on one's gender (i.e., are women or men more likely to select shorts?).

Table 27 shows the breakdown of tests that had statistical significance versus the tests that did not; the p-values associated with each test are reported in the corresponding tables above. The datasets in the table are denoted by numbers, where dataset 1 is the dataset that contained all the answers for each participant who completed the study. Dataset 2 is the dataset that only contains answers to questions 75-100 and 175-200 for each participant. The last dataset, 3, only includes responses from participants who correctly answered the filler questions. Each dataset showed statistical significance when it came to the meat vs. dessert question category. The first dataset, which contained all of the answers from participants who answered all 200 questions, showed statistical significance in screen position and participant group. The last dataset, which only included the answers from participants who answered the filler questions correctly, also showed statistical significance in screen position and participant group.

Table 27
Statistical Significance Between Tests

Dataset   Test                Showed Statistical Significance
1         Submitter           No
1         Meat vs Dessert     Yes
1         Direction Length    No
1         Time Allotment      No
1         Participant Group   Yes
1         Screen Position     Yes
2         Submitter           No
2         Meat vs Dessert     Yes
2         Direction Length    No
2         Time Allotment      No
2         Participant Group   No
2         Screen Position     No
3         Submitter           No
3         Meat vs Dessert     Yes
3         Direction Length    No
3         Time Allotment      No
3         Participant Group   Yes
3         Screen Position     Yes

It is hard to say that the participant's group and the recipe's screen position affected the recipe selection because not all three datasets showed statistical significance for these two tests. The screen position may have shown statistical significance simply because participants were clicking recipes on one side of the screen more than the other. More data collection would be needed to see if participants were only clicking on one side of the screen.

Research on this topic could be improved by incorporating different diets to see if there is any significance and whether it could improve the accuracy rate of the machine learning algorithms when trying to determine an individual's gender. Another thing that might be important to look at is the age and location of the individual. The research in this paper did not collect age or location, and those demographic attributes may play a part in which recipe an individual is more likely to choose.
Older individuals may be more likely to fall into gender stereotypes than younger individuals because there has been a large social push to disregard gender stereotypes. Location is another important factor because someone from Utah may have more of an appetite for sweets, whereas someone from Texas may have more of an appetite for meat-based recipes, especially common BBQ foods.

References

1. [Online]. Available: https://business.adobe.com/glossary/recommendation-engine.html
2. American Psychological Association, Guidelines for Psychological Practice with Transgender and Gender Nonconforming People, 2015. [Online]. Available: https://www.apa.org/pi/lgbt/resources/sexuality-definitions.pdf
3. K. Sellaeg, G. E. Chapman. Masculinity and food ideals of men who live alone. Appetite. 2008.
4. N. Cavazza, A. R. Graziani, M. Guidetti. Impression formation via #foodporn: Effects of posting gender-stereotyped food pictures on Instagram profiles. Appetite. Department of Communication and Economics, University of Modena and Reggio Emilia, Italy. 2019.
5. "What is Human-Computer Interaction (HCI)?," The Interaction Design Foundation. [Online]. Available: https://www.interaction-design.org/literature/topics/human-computer-interaction
6. C. Peersman, W. Daelemans, L. Van Vaerenbergh. Predicting Age and Gender in Online Social Networks. Association for Computing Machinery, New York, NY, United States. 2011.
7. L. S. Taillie. Who's cooking? Trends in US home food preparation by gender, education, and race/ethnicity from 2003 to 2016. Nutrition Journal 17, 41 (2018).
8. [Online]. Available: https://suebehaviouraldesign.com/kahneman-fast-slow-thinking/; G. Linden, B. Smith, J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing 7(1):76-80. 2003.
9. Y. Wang, S. C. Chan, G. Ngai. Applicability of Demographic Recommender System to Tourist Attractions: A Case Study on TripAdvisor. IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Vol. 3 (WI-IAT '12). IEEE Computer Society, USA. 97-101. 2012.
10. J. B. Schafer, D. Frankowski, J. Herlocker, S. Sen. Collaborative filtering recommender systems. The Adaptive Web, 291-324. 2007.
11. G. Adomavicius, A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. 2005.
12. R. L. Rosa, G. M. Schwartz, W. V. Ruggiero, D. Z. Rodríguez. A Knowledge-Based Recommendation System That Includes Sentiment Analysis and Deep Learning. IEEE Transactions on Industrial Informatics, vol. 15, no. 4, pp. 2124-2135. 2019.
13. L. Safoury, A. Salah. Exploiting User Demographic Attributes for Solving Cold-Start Problem in Recommender System. Lecture Notes on Software Engineering, Vol. 1, No. 3. 2013.
14. B. Krulwich. Lifestyle Finder: Intelligent User Profiling Using Large-Scale Demographic Data. AI Magazine, Vol. 18, Issue 2. 1997.
15. M. van Setten. Experiments with a recommendation technique that learns category interests. In ICWI (pp. 722-725). 2002.
16. H. Nguyen, R. Richards, C. Chan, K. J. Liszka. RedTweet: recommendation engine for reddit. Journal of Intelligent Information Systems 47:247-265. 2016.
17. B. Stein, S. Meyer zu Eissen. Distinguishing topic from genre. Proceedings of the 6th International Conference on Knowledge Management (I-KNOW 06). Graz: Journal of Universal Computer Science. 2006.
18. L. Breiman. Random forests. Machine Learning, 45(1):5-32. 2001.
19. T. Dietterich. Ensemble Methods in Machine Learning. Multiple Classifier Systems, 1857, 1-15. 2000.
20. A. I. Schein, A. Popescul, L. H. Ungar, D. M. Pennock. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02). Association for Computing Machinery, New York, NY, USA, 253-260. 2002.
21. A. L. V. Pereira, E. R. Hruschka. Simultaneous co-clustering and learning to address the cold start problem in recommender systems. Knowledge-Based Systems, Vol. 82:11-19. 2015.
22. S. Sahebi, W. W. Cohen. Community-Based Recommendations: a Solution to the Cold Start Problem. Workshop on Recommender Systems and the Social Web (RSWEB), p. 60. 2011.
23. E. Roos, E. Lahelma, M. Virtanen, R. Prattala, P. Pietinen. Gender, Socioeconomic Status and Family Status as Determinants of Food Behavior. Helsinki, Finland. 1998.
24. L. A. Rudman, K. Fairchild. Reactions to Counterstereotypic Behavior: The Role of Backlash in Cultural Stereotype Maintenance. Journal of Personality and Social Psychology. Rutgers, The State University of New Jersey. 2004.
25. S. Higgins, M. Mulvenna, R. Bond, A. McCartan, S. Gallagher, D. Quinn. Multivariate Testing Confirms the Effect of Age-Gender Congruence on Click-Through Rates from Online Social Network Digital Advertisements. CyberPsychology, Behavior & Social Networking. 2018.
26. G. Zhao, S. Luo, J. He. Style Matching Model-Based Recommend System for Online Shopping. IEEE 10th International Conference on Computer-Aided Industrial Design & Conceptual Design. 2009.
27. U. Weinsberg, S. Bhagat, S. Ioannidis, N. Taft. BlurMe: Inferring and Obfuscating User Gender Based on Ratings. RecSys '12, Dublin, Ireland. 2012.
28. M. Cutajar, I. Grech, O. Casha, J. Micallef. Comparison of Different Multiclass SVM Methods for Speaker Independent Phoneme Recognition. Department of Microelectronics & Nanoelectronics, University of Malta, Msida, Malta. 2012.
29. R. Hirt, N. Kuhl, G. Satzger. Cognitive computing for customer profiling: meta classification for gender prediction. Institute of Applied Information at University of Leipzig. 2019.
30. C. Verma, A. S. Tarawneh, Z. Illes, V. Stoffova, S. Dahiya. Gender Prediction of the European School's Teachers Using Machine Learning: Preliminary Results. 2018 IEEE 8th International Advance Computing Conference, Greater Noida, India. 2018.
31. D. Duong, H. Tan, S. Pham. Customer Gender Prediction Based on E-Commerce Data. 8th International Conference on Knowledge and Systems Engineering. 2016.
32. "What is Full Stack?," W3Schools. [Online]. Available: https://www.w3schools.com/whatis/whatis_fullstack.asp
33. "What is a Container?," Docker. [Online]. Available: https://www.docker.com/resources/what-container
34. "Java Archive (JAR) Files," JDK 6 Java Archive (JAR)-related APIs & Developer Guides. [Online]. Available: https://docs.oracle.com/javase/6/docs/technotes/guides/jar/index.html
35. "UUID," UUID (Universally Unique Identifier) Definition. [Online]. Available: https://techterms.com/definition/uuid |
Format | application/pdf |
ARK | ark:/87278/s68nmvz9 |
Setname | wsu_smt |
ID | 96849 |
Reference URL | https://digital.weber.edu/ark:/87278/s68nmvz9 |