OPTIMASI HASIL PENCARIAN PADA WEB SCRAPPING MENGGUNAKAN PEMBOBOTAN KATA TF-IDF
(Optimizing Search Results in Web Scraping Using TF-IDF Term Weighting)

Authors

  • Edy Prayitno, Program Studi Sistem Informasi, STMIK AKAKOM Yogyakarta
  • Totok Suprawoto, Program Studi Sistem Informasi, STMIK AKAKOM Yogyakarta
  • Beny Fajar Riyanto, Program Studi Sistem Informasi, STMIK AKAKOM Yogyakarta

DOI:

https://doi.org/10.53625/jirk.v1i7.822

Keywords:

cURL, Simple HTML DOM, Search Information System, TF-IDF, Web Scraping

Abstract

This research is motivated by the large amount of information offered across websites. Because many websites are relevant to a given query, users must search each one individually for the information they want, which lengthens the time required. This research combines web scraping with the TF-IDF method. Web scraping is a technique for extracting information from web pages; here, cURL and Simple HTML DOM are used to retrieve and parse the scraped pages. TF-IDF supports search by measuring how similar the stored data is to the entered keywords, so the results returned are expected to match those keywords more closely. With web scraping, additional data can be added to the system without using a web service. Using TF-IDF yields better search results because the search compares the similarity of words between the data in the system and the search keywords.
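
The pipeline described in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, not the cURL and Simple HTML DOM tools named above, and it parses inline HTML strings instead of fetching live pages. The weighting used here (raw term frequency multiplied by ln(N/df), with a page's score taken as the sum of the weights of the query terms) is one common TF-IDF variant chosen for illustration; the abstract does not state the exact formula the authors used. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment (a stand-in for a DOM parser)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def html_to_tokens(html):
    """Strip the tags from an HTML string and return its tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return tokenize(" ".join(parser.chunks))


def tf_idf_vectors(docs_tokens):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def rank(query, pages):
    """Rank pages by the summed TF-IDF weight of the query terms.

    Returns (page_index, score) pairs, highest score first.
    """
    vectors = tf_idf_vectors([html_to_tokens(page) for page in pages])
    query_terms = tokenize(query)
    scored = [
        (index, sum(vector.get(term, 0.0) for term in query_terms))
        for index, vector in enumerate(vectors)
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up sample pages, used only to exercise the sketch.
    sample_pages = [
        "<h1>Cheap laptop deals</h1><p>Laptop prices and laptop reviews.</p>",
        "<h1>Phone deals</h1><p>Latest phone prices.</p>",
        "<h1>Garden tools</h1><p>Shovels and rakes.</p>",
    ]
    for index, score in rank("laptop prices", sample_pages):
        print(index, round(score, 3))
```

In this sketch a term that appears in every page receives a weight of zero, so only terms that distinguish pages from one another influence the ranking. A fuller system might instead compare the query and page vectors with cosine similarity, as several of the works on TF-IDF retrieval do.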

Published

2021-12-24

How to Cite

Edy Prayitno, Totok Suprawoto, & Beny Fajar Riyanto. (2021). OPTIMASI HASIL PENCARIAN PADA WEB SCRAPPING MENGGUNAKAN PEMBOBOTAN KATA TF-IDF. Journal of Innovation Research and Knowledge, 1(7), 241–246. https://doi.org/10.53625/jirk.v1i7.822

Section

Articles