Leveraging Web Data Harvesting for Product Recommendation Systems: A Comprehensive Review of Methodologies and Use Cases
Keywords:
Web data harvesting, product recommendation systems, web scraping, collaborative filtering
Abstract
Product recommendation systems have become essential tools for enhancing user engagement and driving sales across e-commerce platforms. With the proliferation of online data sources, web data harvesting offers powerful capabilities to enrich recommendation models with real-time, diverse, and contextually relevant information. This paper presents a comprehensive review of methodologies and use cases related to leveraging web data harvesting for product recommendation. It examines key techniques, including web scraping, API integration, and semantic enrichment, outlining their roles in collecting product metadata, user reviews, competitor pricing, and emerging trends. Additionally, it explores how harvested data can be integrated into collaborative filtering, content-based, and hybrid recommendation frameworks to improve personalization and accuracy. The review also discusses ethical considerations, legal compliance, data quality challenges, and strategies for scalable implementation. By synthesizing current practices and applications, this work aims to guide researchers and practitioners in developing more effective, data-driven recommendation systems.
License
Copyright (c) 2025 Prof. Elizabeth Schneider, Prof. Thomas J. Carter

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their articles published in this journal. All articles are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly cited.