Frontiers in Emerging Computer Science and Information Technology

Vol. 2 No. 05 (2025): Volume02 Issue05 April

Leveraging Web Data Harvesting for Product Recommendation Systems: A Comprehensive Review of Methodologies and Use Cases

Authors

  • Prof. Elizabeth Schneider Department of Computer Science, University College London, London, United Kingdom
  • Prof. Thomas J. Carter School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland

Keywords:

Web data harvesting, product recommendation systems, web scraping, collaborative filtering

Abstract

Product recommendation systems have become essential tools for enhancing user engagement and driving sales across e-commerce platforms. With the proliferation of online data sources, web data harvesting offers powerful capabilities to enrich recommendation models with real-time, diverse, and contextually relevant information. This paper presents a comprehensive review of methodologies and use cases related to leveraging web data harvesting for product recommendation. It examines key techniques, including web scraping, API integration, and semantic enrichment, outlining their roles in collecting product metadata, user reviews, competitor pricing, and emerging trends. Additionally, it explores how harvested data can be integrated into collaborative filtering, content-based, and hybrid recommendation frameworks to improve personalization and accuracy. The review also discusses ethical considerations, legal compliance, data quality challenges, and strategies for scalable implementation. By synthesizing current practices and applications, this work aims to guide researchers and practitioners in developing more effective, data-driven recommendation systems.
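
As an illustration of how harvested product metadata might be combined with collaborative filtering in a hybrid framework, the following is a minimal sketch and not the method of any reviewed work. It assumes metadata has already been collected and reduced to keyword weights per item. The function names, the `alpha` blending weight, and the sample data are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_scores(user, ratings, item_features, alpha=0.5):
    """Rank items the user has not rated by a blended similarity score.

    ratings:       {user: {item: rating}}
    item_features: {item: {keyword: weight}}, assumed to come from
                   harvested product metadata
    alpha:         weight on the collaborative part (hypothetical default)
    """
    # Build item -> {user: rating} vectors for item-based collaborative filtering.
    item_users = {}
    for u, items in ratings.items():
        for item, rating in items.items():
            item_users.setdefault(item, {})[u] = rating

    seen = ratings.get(user, {})
    scores = {}
    for candidate in item_features:
        if candidate in seen:
            continue
        total = 0.0
        for item, rating in seen.items():
            # Collaborative signal: do the same users rate both items?
            cf = cosine(item_users.get(candidate, {}), item_users.get(item, {}))
            # Content signal: do the items share harvested keywords?
            cb = cosine(item_features[candidate], item_features.get(item, {}))
            total += rating * (alpha * cf + (1 - alpha) * cb)
        scores[candidate] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data for illustration only.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 5},
    "u3": {"B": 2, "D": 4},
}
item_features = {
    "A": {"laptop": 1, "gaming": 1},
    "B": {"mouse": 1, "office": 1},
    "C": {"laptop": 1, "gaming": 1, "rgb": 1},
    "D": {"keyboard": 1, "office": 1},
}

ranked = hybrid_scores("u1", ratings, item_features)
print(ranked)
```

Setting `alpha` to 1 reduces the sketch to pure item-based collaborative filtering, and setting it to 0 reduces it to pure content-based scoring. The content term also gives a non-zero score to items with few or no ratings, which is one reason harvested metadata is often proposed as a remedy for cold-start items.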

Published

2025-05-01

How to Cite

Prof. Elizabeth Schneider, & Prof. Thomas J. Carter. (2025). Leveraging Web Data Harvesting for Product Recommendation Systems: A Comprehensive Review of Methodologies and Use Cases. Frontiers in Emerging Computer Science and Information Technology, 2(05), 1–7. Retrieved from https://irjernet.com/index.php/fecsit/article/view/102