ANALYSIS ENGAGEMENT OF INSTAGRAM VISITORS AT UNIVERSITY OF MULTI DATA PALEMBANG BASED ON TOPIC USING LDA

Social media has a big impact on everyday life, particularly as a means of communicating and obtaining information. As social media applications have developed, people increasingly use them to find information via the internet. Instagram is one of the most popular social media applications because it covers different topics through posts in the form of images or videos. This variety makes it very difficult to identify topics manually. One way to extract implicit information from social media is topic modeling. This research analyzes the application of the LDA method to identify which topics appear on the Instagram account of Multi Data Palembang University. The topics chosen in this study were obtained from LDA based on coherence values. This research uses two classification models, namely random forest and decision tree. Each model tested produces different accuracy, precision, recall, and f1-score values. Tests were carried out on the LDA-labeled dataset and on the manually labeled dataset. On the LDA-labeled dataset, the random forest model performed very well, with an accuracy of 78%, precision of 80%, recall of 66.66%, and f1-score of 72.72%.


INTRODUCTION
Over time, technology has had a big impact on everyday life, one aspect of which is the development of information and communication technology. This development leads people to use social media on their smartphones to search for information via the internet. Social media is a digital platform that offers every user the opportunity to engage in social activities. Several activities can be carried out on social media, such as communicating or interacting to provide information or content in the form of text, photos, and videos. One of the social media platforms most widely used to receive information is Instagram [1].
Instagram can also be described as an application for sharing photos and videos on social networks, allowing users to take photos and videos and add filters that give them interesting effects. Users can also give negative and positive opinions or comments when discussing a topic that is currently trending [2]. Instagram is also often used as an advertising medium because it has features that support product marketing very well. The size of this opportunity has fueled the rise of accounts and online stores that sell various products on Instagram. Online stores are virtual stores where buyers and sellers cannot meet in person, so customer engagement is needed [3]. Customer engagement is an emotional interaction between customers and companies, formed by motivation, referrals, and customer experiences through creative social media posts from brands that influence buyers [4]. User engagement on social media is therefore very influential: a post or account becomes popular and tends to attract more attention when it generates higher engagement indicators. For this reason, digital media play a very important role in higher education for recruiting prospective students and promoting campus facilities and activities.
There are several universities in Palembang, namely Sriwijaya University, Palembang PGRI University, Multi Data Palembang University, Musi Charitas Catholic University, Tamansiswa University, Kader Bangsa University, Tridinanti University 3 Palembang, IGM University, Bina Darma University, Palembang Muhammadiyah University, and State Islamic University [5]. Of the many universities in South Sumatra, the author chose Multi Data Palembang University as the subject for topic modeling on Instagram. The university was chosen because it was ranked second in the Webometrics university ranking for South Sumatra in July 2021 and its Instagram account had 11.7 thousand followers [6], yet the numbers of likes and comments on its Instagram posts were far out of proportion to its follower count. The researchers therefore chose the Instagram account of Multi Data Palembang University for topic modeling.
Topic modeling is a machine learning technique used to group documents into related topics or themes. Several topic modeling methods are available, such as Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LDA is an extended mixture model that improves on PLSA and LSA by capturing the exchangeability of words and documents, and it is the most popular topic modeling method today. LDA is a probabilistic algorithm that models each document as a mixture of different topics and each topic as a distribution over words. The algorithm tries to find the topics or themes that appear most frequently in the documents, then determines how often certain words appear in those topics. LDA can be applied to various types of text data, such as news articles, business documents, and even text messages or social media posts. Because LDA is widely used in document grouping or classification applications, sentiment analysis, and natural language processing, the researchers chose it as the model for this research [7].
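The description of LDA above can be made concrete by writing out its generative process. The notation below is the common textbook formulation and is an assumption of this sketch: the symbols K (number of topics), D (number of documents), N_d (number of words in document d), the Dirichlet parameters alpha and beta, the document-topic mixture theta, the topic-word distribution phi, the topic assignment z, and the observed word w do not appear in the original text.

```latex
\begin{align*}
\varphi_k &\sim \operatorname{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \operatorname{Categorical}(\varphi_{z_{d,n}})
\end{align*}
```

Read top to bottom: each topic is a distribution over words, each document is a mixture over topics, and every word in a document is produced by first drawing a topic from the document's mixture and then drawing a word from that topic. Fitting LDA means inferring theta and phi from the observed words alone.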
The application of the Latent Dirichlet Allocation (LDA) algorithm is demonstrated in research conducted by Akhsin Nurlayli and Morch. Ari Nasichuddin (2019) on Topic Modeling of Research by JPTEI UNY Lecturers on Google Scholar Using Latent Dirichlet Allocation. The research data was collected using a scraping technique. The scraping results were stored in .csv format, and the data obtained comprised 909 publication titles. The classification process was divided into 4 clusters or topics, and the algorithm used was Latent Dirichlet Allocation. Most research results fell in the 1st cluster and the fewest in the 4th cluster [8].
The application of the LDA algorithm is also demonstrated in research conducted by Alif Iffan Alfanzar, Khalid, and Indri Sudanawati Rozas on Thesis Topic Modeling Using the Latent Dirichlet Method. This research collected data using the scraping method, yielding 584 thesis abstracts. It conducted experiments on 5 different iteration tests, namely: 100, 500, 1000, and 5000. Each iteration test was run with a different number of topics, namely 2, 3, 4, and 5. The best topic cluster results were obtained with 3 topics [9].

Jurnal Rekayasa Sistem Informasi dan Teknologi, Volume 1, No 2, November 2023, e-ISSN: 3025-888X
The application of the LDA algorithm is demonstrated again in research conducted by Diandra Zakeshia Tiara Kannitha, Mustafid, and Puspita Kartikasari on Topic Modeling in Customer Complaints Using the Latent Dirichlet Allocation Algorithm in Twitter Social Media. This research collected data from the IndiHomeCare and FirstMediaCares Twitter accounts. The data obtained was 5000 tweets; after duplicates were removed, 4828 tweets remained for IndiHome and 4973 tweets for FirstMediaCares. Testing with LDA produced a total of 10 topics for the keyword FirstMediaCares and 11 topics for the keyword IndiHomeCare. Based on the trend results, the main FirstMediaCares topic is the internet shutting down while working, while the main IndiHomeCare topic is the internet frequently disconnecting and turning off. Based on the interpretation, the figures were 70% for FirstMediaCares and 81.81% for IndiHomeCare [10].
Furthermore, previous research by Bagus Wicaksono Arianto and Gangga Anuraga (2020) modeled Twitter user topics regarding the "Ruangguru" application. The research data was processed using Wordcloud tools, and the algorithm used was Latent Dirichlet Allocation. The LDA clustering method grouped Twitter data about the Ruangguru application into 28 topics, with the most frequently discussed topic being Ruangguru discounts [11].
The difference between this research and previous research is that other studies labeled the data manually, whereas this research uses the labels produced by LDA as the dataset, which is then classified. Based on this description, the Latent Dirichlet Allocation (LDA) algorithm is well suited to topic modeling. This research uses Instagram to view posts and compare the topics discussed on the Instagram account of Multi Data Palembang University. The aim is to find out which trending topics are currently being widely discussed on that account.

METHODOLOGY
In this implementation, several steps are taken to get the best results. Figure 1 shows the research stages.

Figure 1. Research Stages

A. Identification of problems
At this stage, the problem is identified: topic modeling on the Instagram account of Multi Data Palembang University using the Latent Dirichlet Allocation (LDA) algorithm.

B. Study of literature
At this stage, a literature study is carried out using several sources, for example journals and books related to the research topic. It aims to obtain information and data that can be used to conduct this research.

C. Dataset Collection
Data was collected from the Instagram site by a scraping process run in Jupyter Notebook. The scraping was carried out on the account at https://www.instagram.com/universitasmdp/. The dataset covers posts from 2021 to 2023. The data features used in the system were obtained from the Multi Data Palembang University Instagram account, and the data collected amounted to around 500 items.

D. Preprocessing Data
In the preprocessing stage, the dataset obtained by scraping the Multi Data Palembang University Instagram account goes through the case folding, tokenizing, stopword removal, and stemming steps. Preprocessing aims to clean the data by removing text that is not needed, so as to avoid problematic and inconsistent data.
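The four preprocessing steps named above can be sketched as a small pipeline. This is a minimal illustration using only the Python standard library, not the authors' implementation: the stopword set, the suffix list, and the function names are hypothetical, and the suffix stripper is a deliberately naive stand-in for a real Indonesian stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a full list.
STOPWORDS = {"dan", "di", "yang", "untuk", "ke", "dari"}

# Hypothetical suffixes for a naive stemmer; this is NOT a real Indonesian
# stemming algorithm, only a placeholder for the stemming step.
SUFFIXES = ("nya", "kan", "an", "i")


def case_fold(text):
    """Case folding: convert the whole caption to lowercase."""
    return text.lower()


def tokenize(text):
    """Tokenizing: keep runs of lowercase letters, dropping digits and punctuation."""
    return re.findall(r"[a-z]+", text)


def remove_stopwords(tokens):
    """Stopword removal: drop tokens that carry little topical meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Stemming placeholder: strip the first matching suffix if at least 4 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(caption):
    """Run the four steps in the order given in the text."""
    tokens = tokenize(case_fold(caption))
    tokens = remove_stopwords(tokens)
    return [naive_stem(t) for t in tokens]


print(preprocess("Pendaftaran Beasiswa dan Webinar untuk Mahasiswa!"))
```

The order matters: case folding comes first so that the stopword lookup and the stemmer only ever see lowercase tokens.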

E. Data Labeling
At the data labeling stage, 500 data items are labeled. The labels are divided into 8 topic categories, namely Achievement, Academic, Alumni, Tips, Graduation, Promotion, Event, and Commemoration. The labeling was carried out by the researchers and validated by the marketing department at Multi Data Palembang University. The manual labels are used for comparison with the labels produced by the LDA algorithm.

F. System Design
The Decision Tree and Random Forest methods are used to predict the topics of Instagram posts, based on the number of topics found on the Instagram account of Multi Data Palembang University. The overall system can be seen in Figure 4.

G. Validation
In this phase, model validation is carried out. Validation is the phase in which the results produced by the system are checked for correctness and the system is checked to confirm that it runs as intended.

H. Evaluation
The final step is to evaluate the system that has been created, with the aim of minimizing errors. At this stage, a confusion matrix is used to measure the quality of the clustering or grouping produced by the LDA topic modeling. The confusion matrix is built from the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts, from which Accuracy, Precision, Recall, and F1-Score are calculated [12]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
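The four metrics named above can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts in the usage example are hypothetical and are not the study's data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero, which is an
    assumption of this sketch rather than something stated in the text.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts chosen only for illustration.
metrics = classification_metrics(tp=4, tn=3, fp=1, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two and is pulled toward the lower one. This explains how a precision of 80% and a recall of 66.66% combine into an F1-score of about 72.7%.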

B. Visualization of LDA topic modeling
After LDA topic modeling, the model is saved in pyLDAvis format, which visualizes each topic and the words that appear most frequently in the documents as diagrams. Figure 6 shows this visualization.

C. Dataset based on LDA labels
At the LDA data labeling stage, 499 data items are labeled. The labels assigned by LDA are divided into 3 topic categories: topic 1 has the keywords competitions, webinars, and holidays; topic 2 has the keywords list of lectures and scholarships; and topic 3 has the keywords alumni achievement, academic, and performance. The number of categories, 3, was chosen based on the coherence value.
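Choosing the number of topics by coherence can be illustrated as follows. The text does not say which coherence measure was used, so this sketch assumes the UMass measure (a sum of log co-document frequencies over pairs of a topic's top words). The function names, the toy documents, and the coherence values per topic count are all hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic (assumed measure, summed over word pairs).

    top_words: the topic's top words, ordered from most to least probable.
    documents: list of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur, to avoid division by zero
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + 1) / d_j)
    return score


def pick_num_topics(coherence_by_k):
    """Return the topic count whose model has the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)


# Toy corpus (hypothetical).
docs = [
    ["webinar", "lomba"],
    ["webinar", "lomba", "libur"],
    ["beasiswa", "kuliah"],
]
print(umass_coherence(["webinar", "lomba"], docs))     # words that co-occur
print(umass_coherence(["webinar", "beasiswa"], docs))  # words that never co-occur

# Hypothetical coherence values for models trained with 2, 3 and 4 topics.
print(pick_num_topics({2: -1.9, 3: -1.2, 4: -1.5}))
```

A higher (less negative) score means that a topic's top words tend to appear in the same documents, which is why the topic count with the highest coherence is selected.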

D. Test Results
At this stage, the researcher compares the results of the random forest and decision tree models. Table 4 displays the results obtained from random forest and decision tree testing.
Figure 2 and Figure 3 are datasets obtained from scraping results.

Figure 10 and Figure 11 show the data labeling carried out by LDA.

Table 1 displays the labeling data carried out manually.

Table 2 displays the results of the coherence values in the study.