Department of Information Technology, University of the Cumberlands, Williamsburg, KY, 40769, USA
Abstract
Large Language Models are increasingly used in enterprise software development for code generation, debugging, refactoring, documentation, and test creation. Their adoption has changed the practical workflow of software engineering by allowing developers to obtain implementation suggestions, boilerplate code, test scaffolds, configuration files, and architectural explanations directly from conversational systems. However, LLM-generated code may contain security vulnerabilities, hallucinated APIs, outdated dependencies, weak authorization logic, insecure default configurations, and implementation patterns that conflict with organizational policies. These risks are especially significant in enterprise environments where software systems are connected to sensitive data, identity platforms, regulated workflows, cloud infrastructure, and third-party integrations. Existing approaches such as secure prompting and manual code review can improve generated outputs, but prompt instructions alone are insufficient because general-purpose LLMs may not have access to current enterprise standards, approved libraries, secure coding rules, deployment constraints, or project-specific architecture.
This paper proposes RAG-SecCode, a retrieval-augmented secure coding framework that grounds LLM-assisted software development in authoritative and organization-specific security knowledge. The framework integrates retrieval-augmented generation, secure coding policies, vulnerability knowledge bases, dependency governance rules, static and dynamic validation checks, and human-in-the-loop review. RAG-SecCode is designed to improve code security by supplying the model with relevant secure coding context before generation and by validating the generated output after generation. The study conceptually evaluates whether retrieval-augmented secure context can improve generated code quality compared with baseline prompting and secure prompting alone. The evaluation design measures vulnerability count, CWE mapping, hallucinated API frequency, dependency risk, policy compliance, test inclusion, and human reviewer approval rate.
The paper contributes a practical framework for enterprise LLM software development by connecting recent research on vulnerability detection, LLM-based code generation, retrieval-augmented generation, secure repair, and static analysis. It argues that secure code generation should not be treated only as a prompting problem but as a governed software engineering workflow. RAG-SecCode provides a structured approach for reducing insecure and hallucinated LLM-generated code through contextual retrieval, rule-based validation, security-aware review, and continuous feedback.
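As a rough illustration of the retrieve-before-generation and validate-after-generation flow described above, the following minimal Python sketch may help. It is not the paper's implementation: the knowledge base entries, the rule identifiers, the approved import list, the word-overlap retriever, and all function names (`retrieve`, `build_prompt`, `validate`, `needs_human_review`) are assumptions made for this example, and the LLM call itself is left out.

```python
import ast
import re
from dataclasses import dataclass


@dataclass
class PolicySnippet:
    """One hypothetical entry in the secure coding knowledge base."""
    rule_id: str
    text: str


# Hypothetical organization-specific knowledge base (illustrative content only).
KNOWLEDGE_BASE = [
    PolicySnippet("SQL-01", "Use parameterized queries for all sql database access."),
    PolicySnippet("CRYPTO-02", "Hash passwords with an approved password hashing library."),
    PolicySnippet("DEP-03", "Import only libraries on the approved dependency list."),
]

# Hypothetical dependency governance rule: top-level modules allowed in generated code.
APPROVED_IMPORTS = {"sqlite3", "hashlib", "json"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(task: str, k: int = 2) -> list[PolicySnippet]:
    """Rank snippets by word overlap with the task (a stand-in for a real retriever)."""
    task_words = tokenize(task)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda s: len(task_words & tokenize(s.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task: str, snippets: list[PolicySnippet]) -> str:
    """Place the retrieved secure coding context in front of the task."""
    context = "\n".join(f"[{s.rule_id}] {s.text}" for s in snippets)
    return f"Secure coding context:\n{context}\n\nTask:\n{task}"


def validate(code: str) -> list[str]:
    """Post-generation checks: unapproved imports and f-string SQL passed to execute()."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level not in APPROVED_IMPORTS:
                findings.append(f"unapproved import: {top_level}")
    # Deliberately crude pattern check; a real deployment would use a static analyzer.
    if re.search(r"execute\(\s*f[\"']", code):
        findings.append("possible SQL injection: f-string passed to execute()")
    return findings


def needs_human_review(findings: list[str]) -> bool:
    """Route any output with findings to a human reviewer."""
    return len(findings) > 0
```

In this sketch, the prompt built from `retrieve` and `build_prompt` would be sent to whichever model the organization uses, and the returned code would pass through `validate` before any output with findings is routed to a reviewer. A production system would likely replace the word-overlap ranking with embedding-based retrieval and the regular expression with proper static and dynamic analysis.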
How to Cite
Kongari, S. S. R. (2026). RAG-SecCode: Retrieval-Augmented Secure Coding Guidance for Enterprise LLM Software Development. Frontiers in Emerging Artificial Intelligence and Machine Learning, 3(06), 08–23. https://doi.org/10.64917/feaiml/Volume03Issue06-01