The Implementation of Semantic Annotation Recognizing Technique in the Scraper Engine on the E-Publishing Website of the National Research and Innovation Agency (BRIN) Indonesia

Muhammad Izzun Ni'am, Muhammad Haris Frimansyah, Zikrie Pramudia Alfarhisi

Abstract


The growing need for rapid information dissemination, driven by modern technological advances, has increased the importance of data analysis and data processing as academic disciplines. Both depend on acquiring data from various sources, either by collecting it directly or by extracting it from existing systems. Web scraping is among the most important and widely used techniques for extracting data from the internet. It is particularly useful for gathering the data needed to maintain research records while multiple institutions are being consolidated into BRIN (National Research and Innovation Agency). Integrating existing research into a unified system is difficult without proper upkeep, because neglected maintenance can degrade the system and hinder access to stored research. Effective maintenance therefore requires a centralized repository of researchers' work. Implementing a semantic annotation recognizing technique in the web scraping feature of the E-Publishing website has the potential to speed up this process. Web scraping simplifies the collection of research data, while semantic annotation recognition simplifies the implementation because the Open Archives Initiative (OAI) system already exposes its records as XML. In the context of institutional mergers and research sustainability, web scraping and semantic annotation recognition therefore play a key role in addressing these challenges.
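The abstract attributes the ease of semantic annotation recognition to the XML foundation of the Open Archives Initiative (OAI) system. The paper's own scraper is not reproduced here. The following is a minimal Python sketch, assuming the source journals expose a standard OAI-PMH 2.0 endpoint that serves unqualified Dublin Core (`oai_dc`). It shows how a scraper can recognize the semantic tags (`dc:title`, `dc:creator`, and so on) in a `ListRecords` response. The endpoint URL, the function names `list_records_url` and `extract_records`, and the sample record are hypothetical and only illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and its oai_dc (Dublin Core) metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written ListRecords response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:creator>Author A</dc:creator>
          <dc:creator>Author B</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL (base_url is a hypothetical endpoint)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_records(xml_text: str) -> list[dict[str, list[str]]]:
    """Map each record's Dublin Core elements (title, creator, ...) to their text values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        fields: dict[str, list[str]] = {}
        for element in dc:
            # Drop the "{namespace}" prefix: "{http://purl.org/dc/...}title" -> "title".
            tag = element.tag.split("}", 1)[-1]
            if element.text and element.text.strip():
                fields.setdefault(tag, []).append(element.text.strip())
        records.append(fields)
    return records


if __name__ == "__main__":
    print(list_records_url("https://example.org/index.php/journal/oai"))
    print(extract_records(SAMPLE))
```

Because the element names come from a shared vocabulary (Dublin Core), the scraper can map fields by tag rather than by page layout, which is one plausible reading of why the XML basis of OAI eases the implementation. A real harvester would also need to follow `resumptionToken` paging, which this sketch omits.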



DOI: https://doi.org/10.18860/mat.v15i2.23755




Copyright (c) 2023 Muhammad Izzun Ni'am, Muhammad Haris Frimansyah, Zikrie Pramudia Alfarhisi

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


_______________________________________________________________________________________________________________

Editorial Office:
Informatics Engineering Department
Faculty of Science and Technology
Universitas Islam Negeri Maulana Malik Ibrahim Malang
Jalan Gajayana 50 Malang, Jawa Timur, Indonesia 65144
Email: matics@uin-malang.ac.id
_______________________________________________________________________________________________________________

© All rights reserved 2015. MATICS, ISSN: 1978-161X | e-ISSN: 2477-2550