ChhetriSachinPaudel_MCS_2026

Title ChhetriSachinPaudel_MCS_2026
Alternative Title Using Custom Data to Train and Evaluate Chatbots with Existing AI Tools
Creator Chhetri, Sachin Paudel
Contributors Zhang, Yong (advisor)
Collection Name Master of Computer Science
Abstract This paper investigates whether a practical, domain-specific chatbot can be built effectively from custom institutional data under modest hardware constraints, and whether retrieval-based grounding or parameter-efficient fine-tuning is the more dependable strategy for this setting. The study uses a custom Weber State University Computer Science and MSCS corpus containing 851 normalized records, split at the document level into training, validation, and test partitions to prevent data leakage. Three approaches were implemented and evaluated on a locked 52-question benchmark containing 44 answerable questions and 8 unanswerable control questions: a dense Retrieval-Augmented Generation (RAG) system, a LoRA fine-tuned model, and a QLoRA fine-tuned model. The dense RAG system used Qwen/Qwen3-Embedding-0.6B for retrieval, Qwen/Qwen3-Reranker-0.6B for reranking, and Qwen/Qwen3-4B-Instruct-2507 for grounded answer generation. LoRA and QLoRA used Qwen/Qwen2.5-1.5B-Instruct as a shared base model and were trained on QA-aligned supervision derived strictly from the training and validation splits. Evaluation was conducted using a deterministic custom suite reporting Token F1 (word-level), Semantic Similarity, Factual Term Match, Abstention Accuracy, and runtime summaries rather than an LLM-as-a-judge framework. RAG achieved the strongest overall answer quality, with a Semantic Similarity of 0.8290 and Factual Term Match of 0.9091, substantially outperforming LoRA and QLoRA. LoRA and QLoRA achieved higher abstention accuracy overall, but that advantage mainly reflected their tendency to answer more often rather than stronger evidence calibration. These findings suggest that for small institutional knowledge bases, retrieval-based grounding remains the most reliable method for factual accuracy and source fidelity, while parameter-efficient fine-tuning offers a lighter-weight adaptation strategy with weaker grounding. 
The paper concludes with a practical recommendation: use RAG when evidence-backed answers matter most and treat LoRA or QLoRA as supporting baselines rather than replacements for retrieval.
Subject Universities and colleges--Data processing; Chatbots; Question-answering systems; Information retrieval; Machine learning; Natural language processing (Computer science)
Keywords Computer Science; Retrieval-Augmented Generation (RAG); LLMs; Chatbots
Digital Publisher Digitized by Special Collections & University Archives, Stewart Library, Weber State University.
Date 2026-05
Medium theses
Type Text
Access Extent 23-page PDF
Conversion Specifications Adobe Acrobat
Language eng
Rights The author has granted Weber State University Archives a limited, non-exclusive, royalty-free license to reproduce his or her thesis, in whole or in part, in electronic or paper form and to make it available to the general public at no charge. The author retains all other rights. For further information: IN COPYRIGHT - EDUCATIONAL USE PERMITTED
Source University Archives Electronic Records: Master of Computer Science. Stewart Library, Weber State University
Format application/pdf
ARK ark:/87278/s6mwj36p
Setname wsu_smt
ID 169753
Reference URL https://digital.weber.edu/ark:/87278/s6mwj36p