Open Access

Leveraging Web Data Harvesting for Product Recommendation Systems: A Comprehensive Review of Methodologies and Use Cases

Department of Computer Science, University College London, London, United Kingdom
School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland

Abstract

Product recommendation systems have become essential tools for enhancing user engagement and driving sales across e-commerce platforms. As online data sources proliferate, web data harvesting can enrich recommendation models with real-time, diverse, and contextually relevant information. This paper presents a comprehensive review of methodologies and use cases for web data harvesting in product recommendation. It examines key techniques, including web scraping, API integration, and semantic enrichment, and outlines their roles in collecting product metadata, user reviews, competitor pricing, and emerging trends. It also explores how harvested data can be integrated into collaborative filtering, content-based, and hybrid recommendation frameworks to improve personalization and accuracy. The review further discusses ethical considerations, legal compliance, data quality challenges, and strategies for scalable implementation. By synthesizing current practices and applications, this work aims to guide researchers and practitioners in developing more effective, data-driven recommendation systems.

How to Cite

Prof. Elizabeth Schneider, & Prof. Thomas J. Carter. (2025). Leveraging Web Data Harvesting for Product Recommendation Systems: A Comprehensive Review of Methodologies and Use Cases. Frontiers in Emerging Computer Science and Information Technology, 2(05), 1–7. Retrieved from https://irjernet.com/index.php/fecsit/article/view/102
