RAG-BASED ACADEMIC CHATBOT FOR STUDENT ACADEMIC INFORMATION
Abstract
This study aims to develop a Retrieval-Augmented Generation (RAG)-based academic chatbot that provides accurate, relevant academic information grounded in official documents to students of the Informatics Engineering Study Program, FTI UNISSULA. The method comprises collecting and preprocessing academic documents, splitting them into chunks, encoding the chunks as vector representations with a Sentence-BERT model, and storing these vectors in a FAISS index to support semantic search. The RAG system passes the retrieved document chunks to a Large Language Model (LLM), which uses them to generate contextual responses. The system was evaluated with the ROUGE-1 and BLEU-4 metrics on 50 questions divided into FAQ and non-FAQ categories. The system answered all of the questions. Performance was high in the FAQ category (ROUGE-1 of 0.957 and BLEU-4 of 0.877) and lower in the non-FAQ category because of paraphrasing variation in the academic documents. These results indicate that the RAG approach improves the accuracy and relevance of academic chatbot answers and can reduce the risk of misinformation compared with a purely generative approach.
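The abstract does not specify how documents are chunked (unit, chunk size, or overlap). As a minimal sketch, assuming word-based windows with overlap, the hypothetical helper `chunk_words` below splits a document into fixed-size windows. The defaults `chunk_size=200` and `overlap=50` are illustrative assumptions, not values reported by the study.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper: chunk_size and overlap are illustrative defaults,
    not parameters taken from the study.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end of the text, so the tail is not
        # emitted again as a shorter, fully overlapped chunk.
        if start + chunk_size >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e f g h i j", chunk_size=4, overlap=2)` yields `['a b c d', 'c d e f', 'e f g h', 'g h i j']`. The overlap reduces the chance that a passage cut at a chunk boundary loses its surrounding context.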
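The retrieval step can be sketched as a top-k cosine-similarity search over chunk vectors, followed by assembling the retrieved chunks into a prompt for the LLM. To keep the example self-contained, the hypothetical `embed` function is a bag-of-words stand-in for the Sentence-BERT encoder, and `retrieve` performs the exact brute-force search that a flat FAISS inner-product index performs over unit-normalized vectors. The template in `build_prompt` is likewise an assumption, since the abstract does not give the prompt used.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a Sentence-BERT encoder: an L2-normalized
    bag-of-words vector. It captures shared words only, not meaning."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Exact top-k search by cosine similarity (the role of a flat IP index)."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: list[tuple[float, str]]) -> str:
    """Assemble a grounded prompt; the wording is a hypothetical template."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, (_, chunk) in enumerate(hits))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

In an actual deployment the vectors would come from a Sentence-BERT model (for example `SentenceTransformer.encode(..., normalize_embeddings=True)` from the `sentence-transformers` package) and the search from `faiss.IndexFlatIP`. The bag-of-words stand-in only matches shared words, so it cannot capture the paraphrases that dense embeddings are meant to handle.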
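The abstract reports ROUGE-1 and BLEU-4 scores but does not state the tokenization, smoothing, or library used. The following is a minimal sketch of the standard definitions, assuming lowercase whitespace tokenization and a single reference answer per question: `rouge1_f1` computes the F1 score of clipped unigram overlap, and `bleu4` computes the geometric mean of clipped 1- to 4-gram precisions multiplied by a brevity penalty, without smoothing.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumed tokenization: lowercase, split on whitespace.
    return text.lower().split()


def ngrams(toks: list[str], n: int) -> Counter:
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge1_f1(candidate: str, reference: str) -> float:
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def bleu4(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-4 against one reference, uniform weights, no smoothing."""
    cand, ref = tokens(candidate), tokens(reference)
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_ngrams = ngrams(cand, n)
        total = sum(cand_ngrams.values())
        matches = sum((cand_ngrams & ngrams(ref, n)).values())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / 4)
```

For instance, `rouge1_f1("the cat sat", "the cat sat down")` gives 6/7 ≈ 0.857 (precision 1, recall 0.75). Because both metrics reward exact word overlap, a correct answer that rephrases the reference receives a lower score.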