Generative AI (GenAI) has rapidly entered mainstream software engineering, promising productivity boosts but raising serious concerns about security, quality, and long-term maintainability costs. Empirical studies show that a substantial share of AI-generated code is insecure: roughly 45–62% of solutions across multiple languages contain vulnerabilities, often due to missing input validation, insecure authentication, or misconfigured headers. Studies of large code corpora confirm that GenAI tends to produce code with higher cyclomatic complexity, duplicated fragments, and reduced readability, all factors known to increase maintenance costs and bug-proneness.
From a business perspective, early reports highlight that productivity gains are uneven: benefits materialize when AI is integrated across the software lifecycle, not just during initial coding. Hidden costs emerge in the total cost of ownership (TCO), including developer time spent debugging, reviewing, testing, repairing, and evolving GenAI-generated code.
Our research group addresses these challenges through active research in the following main areas.
Security and maintainability of GenAI code
How secure, maintainable, and change-prone is AI-generated code compared to human-authored code, and which risks dominate across languages and domains?
We collect paired datasets of GenAI and human implementations for benchmark tasks and open-source projects, and apply a range of code analysis techniques (SAST, DAST, SCA, secret scanning, and security linters, including taint analysis and header/auth/session checks) for CWE detection, measuring recall and false positive rate. We calculate maintainability metrics (cyclomatic complexity, code smells, change churn) and track bug-fix effort and defect density in evolving codebases. We also supplement the quantitative analysis with developer studies.
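As an illustration of how these measurements come together, the following Python sketch scores a tool's reported findings against a manually labeled ground truth of CWE instances and computes recall and false positive rate. The file names, labels, and the flat finding representation are hypothetical and not tied to any specific scanner's output format.

from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """One reported or ground-truth issue, identified by file, line, and CWE."""
    file: str
    line: int
    cwe: str  # e.g. "CWE-89" (SQL injection)

def recall_and_fpr(reported, ground_truth, total_negatives):
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN).
    total_negatives is the number of analyzed locations known to be clean."""
    tp = len(reported & ground_truth)
    fn = len(ground_truth - reported)
    fp = len(reported - ground_truth)
    tn = max(total_negatives - fp, 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr

if __name__ == "__main__":
    truth = {Finding("auth.py", 42, "CWE-89"), Finding("api.py", 10, "CWE-79")}
    tool = {Finding("auth.py", 42, "CWE-89"), Finding("util.py", 7, "CWE-20")}
    print(recall_and_fpr(tool, truth, total_negatives=100))  # -> (0.5, 0.01)

The same paired structure (GenAI and human implementations of the same task) lets us compare maintainability metrics and defect density under identical conditions.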
Vulnerability detection and Automated Program Repair (APR)
To what extent can GenAI tools reliably repair bugs and vulnerabilities, and how can the correctness and security of the generated patches be ensured?
We work on various techniques to automatically detect exploitable vulnerabilities. We leverage the source code embedding capabilities of LLMs and examine the effectiveness of RAG-enhanced prompting techniques. We use established APR benchmarks (Defects4J, QuixBugs, Vul4J) to evaluate LLM patch generation and compare LLM-generated patches with symbolic and search-based APR. We validate correctness using code similarity metrics, human evaluation, regression test suites, differential testing, and model checking. We also investigate hybrid pipelines in which LLMs propose candidate patches that are refined by formal/automated checkers.
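One step of such a pipeline is sketched below: a candidate patch proposed by an LLM is kept only if it applies cleanly and the project's regression test suite still passes. The repository layout, the unified-diff patch format, and the Maven test command are illustrative assumptions, not the interface of any particular benchmark or tool.

import subprocess
from pathlib import Path

def apply_patch(repo: Path, patch_file: Path) -> bool:
    """Apply a unified diff; return False if it does not apply cleanly."""
    if subprocess.run(["git", "apply", "--check", str(patch_file)], cwd=repo).returncode != 0:
        return False
    subprocess.run(["git", "apply", str(patch_file)], cwd=repo, check=True)
    return True

def regression_tests_pass(repo: Path) -> bool:
    """Run the project's test suite (a Maven build is assumed here)."""
    return subprocess.run(["mvn", "-q", "test"], cwd=repo).returncode == 0

def validate_candidate(repo: Path, patch_file: Path) -> bool:
    """Accept a candidate only if it applies and all regression tests pass."""
    if not apply_patch(repo, patch_file):
        return False
    try:
        return regression_tests_pass(repo)
    finally:
        # Roll back so the next candidate starts from a clean working tree.
        subprocess.run(["git", "checkout", "--", "."], cwd=repo, check=True)

Passing the regression suite is only a plausibility filter; the stronger checks listed above (differential testing, model checking, human evaluation) are chained after it.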
Unit test generation capabilities of LLMs
Can LLMs generate tests that achieve good coverage and fault/vulnerability detection in real-world projects?
We extend prior work on AI-generated tests for different programming languages. We evaluate coverage (statement/branch/mutation) and bug- and vulnerability-exposing power using historical bug and vulnerability datasets. We compare plain LLM prompts with RAG- and pipeline-based methods (e.g., UTGen) and construct our own LLM-based methods for generating security-relevant unit tests.
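The snippet below is a minimal sketch of the two prompting setups we compare, with the model hidden behind a plain callable so that no specific provider API is assumed; the retrieval step, which would query an index of existing tests and related project code, is omitted.

from typing import Callable, Optional, Sequence

def plain_prompt(focal_method: str) -> str:
    """Baseline: ask for tests with no project-specific context."""
    return ("Write JUnit 5 tests covering normal, boundary, and "
            "security-relevant inputs for the following Java method:\n\n"
            + focal_method)

def rag_prompt(focal_method: str, retrieved_context: Sequence[str]) -> str:
    """RAG variant: prepend retrieved tests and related code to the request."""
    context = "\n\n".join(retrieved_context)
    return ("Project context (similar existing tests and related code):\n\n"
            + context + "\n\n" + plain_prompt(focal_method))

def generate_tests(llm: Callable[[str], str], focal_method: str,
                   retrieved_context: Optional[Sequence[str]] = None) -> str:
    """Run either setup with the same underlying model for a fair comparison."""
    prompt = (rag_prompt(focal_method, retrieved_context)
              if retrieved_context else plain_prompt(focal_method))
    return llm(prompt)

The generated tests are then compiled and executed so that statement, branch, and mutation coverage can be measured on the same footing for both setups.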
Members of the group
- Dr. habil. Rudolf Ferenc, DSc - head of research group
- Dr. Péter Hegedűs - senior researcher
- Dr. István Siket - senior researcher
- Dr. Judit Jász - senior researcher
- Dr. Gábor Antal - researcher
- Dr. László Tóth - researcher
- Dr. Dénes Bán - researcher
- Dr. Edit Pengő - postdoctoral researcher
- Dr. Tamás Aladics - postdoctoral researcher
- István Kolláth - PhD student
- Norbert Vándor - PhD student
- Péter Hajnal - PhD student
- Zoltán Ságodi - PhD student
- Norbert Szolnoki - PhD student
- Vivienn Vörös - PhD student
Recent publications
[1] Hinrichs, T., Iannone, E., Aladics, T., Hegedűs, P., De Lucia, A., Palomba, F., & Scandariato, R. (2025). Back to the Roots: Assessing Mining Techniques for Java Vulnerability-Contributing Commits. ACM Transactions on Software Engineering and Methodology, 1, 1. https://doi.org/10.1145/3769105
[2] Pasquale, L., Sabetta, A., d’Amorim, M., Hegedűs, P., Mirakhorli, M. T., Okhravi, H., Payer, M., Rashid, A., Santos, J. C. S., Spring, J. M., Tan, L., & Tuma, K. (2025). Challenges to Using Large Language Models in Code Generation and Repair. IEEE Security & Privacy, 23(2), 81-88. https://doi.org/10.1109/MSEC.2025.3530488
[3] Viszkok, T., & Hegedűs, P. (2025). Unified SAST benchmark: Compare AI-driven and traditional static analyzers. SoftwareX, 31, Article 102336. https://doi.org/10.1016/j.softx.2025.102336
[4] Aladics, T., Hegedűs, P., & Ferenc, R. (2024). A Comparative Study of Commit Representations for JIT Vulnerability Prediction. Computers, 13(1), Article 22. https://doi.org/10.3390/computers13010022
[5] Antal, G., Vándor, N., Kolláth, I., Mosolygó, B., Hegedűs, P., & Ferenc, R. (2024). PyBugHive: A Comprehensive Database of Manually Validated, Reproducible Python Bugs. IEEE Access, 12, 123739-123756. https://doi.org/10.1109/ACCESS.2024.3449106
[6] Sabetta, A., Ponta, S., Cabrera Lozoya, R., Bezzi, M., Sacchetti, T., Greco, M., Balogh, G., Hegedűs, P., Ferenc, R., Paramitha, R., Pashchenko, I., Papotti, A., Milánkovich, Á., & Massacci, F. (2024). Known Vulnerabilities of Open Source Projects: Where Are the Fixes?. IEEE Security & Privacy, 22(2), 49-59. https://doi.org/10.1109/MSEC.2023.3343836
[7] Bagheri, A., & Hegedűs, P. (2024). Towards a Block-Level Conformer-Based Python Vulnerability Detection. Software, 3(3), 310-327. https://doi.org/10.3390/software3030016
[8] Bagheri, A., & Hegedűs, P. (2024). Towards a Block-Level ML-Based Python Vulnerability Detection Tool. Acta Cybernetica, 26(3), 323–371. https://doi.org/10.14232/actacyb.299667
[9] Rajkó, R., Siket, I., Hegedűs, P., & Ferenc, R. (2024). Development of partial least squares regression with discriminant analysis for software bug prediction. Heliyon, 10(15), Article e35045. https://doi.org/10.1016/j.heliyon.2024.e35045
[10] Ságodi, Z., Hegedűs, P., & Ferenc, R. (2024). Increased Software Security with Large Language Models. ERCIM News, 139, 11-13.
[11] Ságodi, Z., Siket, I., & Ferenc, R. (2024). Methodology for Code Synthesis Evaluation of LLMs Presented by a Case Study of ChatGPT and Copilot. IEEE Access, 12, 72303-72316. https://doi.org/10.1109/ACCESS.2024.3403858
[12] Ságodi, Z., Antal, G., Bogenfürst, B., Isztin, M., Hegedűs, P., & Ferenc, R. (2024). Reality Check: Assessing GPT-4 in Fixing Real-World Software Vulnerabilities. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (pp. 252-261). https://doi.org/10.1145/3661167.3661207
[13] Vándor, N., Antal, G., Hegedűs, P., & Ferenc, R. (2024). On the Usefulness of Python Structural Pattern Matching: An Empirical Study. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 (pp. 501-511). https://doi.org/10.1109/SANER60148.2024.00058
[14] Aladics, T., Hegedűs, P., & Ferenc, R. (2023). An AST-Based Code Change Representation and Its Performance in Just-in-Time Vulnerability Prediction. (pp. 169-186). https://doi.org/10.1007/978-3-031-37231-5_8
[15] Antal, G., Hegedűs, P., Herczeg, Z., Lóki, G., & Ferenc, R. (2023). Is JavaScript Call Graph Extraction Solved Yet? A Comparative Study of Static and Dynamic Tools. IEEE Access, 11, 25266-25284. https://doi.org/10.1109/ACCESS.2023.3255984
[16] Gazdag, A., Ferenc, R., & Buttyán, L. (2023). CrySyS dataset of CAN traffic logs containing fabrication and masquerade attacks. Scientific Data, 10(1), Article 903. https://doi.org/10.1038/s41597-023-02716-9
[17] Sun, Y., Brockhauser, S., Hegedűs, P., Plückthun, C., Gelisio, L., & Ferreira de Lima, D. E. (2023). Application of self-supervised approaches to the classification of X-ray diffraction spectra during phase transitions. Scientific Reports, 13(1), Article 9370. https://doi.org/10.1038/s41598-023-36456-y
[18] Aladics, T., Hegedűs, P., & Ferenc, R. (2022). A Vulnerability Introducing Commit Dataset for Java: An Improved SZZ based Approach. In Proceedings of the 17th International Conference on Software Technologies (pp. 68-78). https://doi.org/10.5220/0011275200003266
[19] Bagheri, A., & Hegedűs, P. (2022). Is Refactoring Always a Good Egg? Exploring the Interconnection Between Bugs and Refactorings. In Proceedings of the 2022 Mining Software Repositories Conference, MSR 2022 (pp. 117-121). https://doi.org/10.1145/3524842.3528034
[20] Bagheri, A., & Hegedűs, P. (2022, June). Towards a Block-Level ML-Based Python Vulnerability Detection Tool. In The 13th Conference of PhD Students in Computer Science : June 29 – July 1, 2022 Szeged, Hungary (pp. 17-20).
[21] Buttyán, L., & Ferenc, R. (2022). IoT Malware Detection with Machine Learning. ERCIM News, 129, 17-19.
[22] Ferenc, R. (2022). Deep learning in static, metric-based bug prediction (vol 6, 100021, 2020). Array, 13, Article 100125. https://doi.org/10.1016/j.array.2021.100125
[23] Hegedűs, P., & Ferenc, R. (2022). Static Code Analysis Alarms Filtering Reloaded: A New Real-World Dataset and its ML-Based Utilization. IEEE Access, 10, 55090-55101. https://doi.org/10.1109/ACCESS.2022.3176865
[24] Jász, J., Hegedűs, P., Milánkovich, Á., & Ferenc, R. (2022). An End-to-End Framework for Repairing Potentially Vulnerable Source Code. In 22nd IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2022 (pp. 242-247). https://doi.org/10.1109/SCAM55253.2022.00034
[25] Mosolygó, B., Vándor, N. R., Hegedűs, P., & Ferenc, R. (2022). A Line-Level Explainable Vulnerability Detection Approach for Java. Lecture Notes in Computer Science, 13380, Chapter 8. https://doi.org/10.1007/978-3-031-10542-5_8
[26] Rozsa, B., Antal, G., & Ferenc, R. (2022). Don't DIY: Automatically transform legacy Python code to support structural pattern matching. In 22nd IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2022 (pp. 164-169). https://doi.org/10.1109/SCAM55253.2022.00024
[27] Sun, Y., Brockhauser, S., & Hegedűs, P. (2022, June). Self-Supervised Relational Reasoning framework for Spectra Classification. In The 13th Conference of PhD Students in Computer Science : June 29 – July 1, 2022 Szeged, Hungary (pp. 187-191).
[28] Ságodi, Z., Pengő, E., Jász, J., Siket, I., & Ferenc, R. (2022). Static Call Graph Combination to Simulate Dynamic Call Graph Behavior. IEEE Access, 10, 131829-131840. https://doi.org/10.1109/ACCESS.2022.3229182
[29] Vándor, N. R., Mosolygó, B., & Hegedűs, P. (2022). Comparing ML-Based Predictions and Static Analyzer Tools for Vulnerability Detection. Lecture Notes in Computer Science, 13380, Chapter 7. https://doi.org/10.1007/978-3-031-10542-5_7
[30] Aladics, T., Jász, J., & Ferenc, R. (2021). Bug Prediction Using Source Code Embedding Based on Doc2Vec. Lecture Notes in Computer Science, 12955, 382-397. https://doi.org/10.1007/978-3-030-87007-2_27
[31] Antal, G., Tóth, Z. G., Hegedűs, P., & Ferenc, R. (2021). Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics. Technologies, 9, Article 3. https://doi.org/10.3390/technologies9010003
[32] Bagheri, A., & Hegedűs, P. (2021). A Comparison of Different Source Code Representation Methods for Vulnerability Prediction in Python. In Quality of Information and Communications Technology (pp. 267-281). https://doi.org/10.1007/978-3-030-85347-1_20
[33] Gyimesi, P., Vancsics, B., Stocco, A., Mazinanian, D., Beszédes, Á., Ferenc, R., & Mesbah, A. (2021). BugsJS: a Benchmark and Taxonomy of JavaScript Bugs. Software Testing, Verification and Reliability, 31(4), Article e1751. https://doi.org/10.1002/stvr.1751
[34] Mosolygó, B., Vándor, N. R., Antal, G., Hegedűs, P., & Ferenc, R. (2021). Towards a Prototype Based Explainable JavaScript Vulnerability Prediction Model. In 1st International Conference on Code Quality, ICCQ 2021 (pp. 15-25). https://doi.org/10.1109/ICCQ51190.2021.9392984
[35] Mosolygó, B., Vándor, N. R., Antal, G., & Hegedűs, P. (2021). On the Rise and Fall of Simple Stupid Bugs: a Life-Cycle Analysis of SStuBs. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories, MSR 2021 (pp. 495-499). https://doi.org/10.1109/MSR52588.2021.00061
[36] Sun, Y., Brockhauser, S., & Hegedűs, P. (2021). Comparing End-to-End Machine Learning Methods for Spectra Classification. Applied Sciences, 11(23), Article 11520. https://doi.org/10.3390/app112311520
[37] Sun, Y., Brockhauser, S., & Hegedűs, P. (2021, September). Machine learning applied for spectra classification. In International Conference on Computational Science and Its Applications (pp. 54-68). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-87013-3_5
[38] Szamosvölgyi, Z. J., Váradi, E. T., Tóth, Z. G., Jász, J., & Ferenc, R. (2021). Assessing Ensemble Learning Techniques in Bug Prediction. Lecture Notes in Computer Science, 12955, Chapter 26. https://doi.org/10.1007/978-3-030-87007-2_26
[39] Viszkok, T., Hegedűs, P., & Ferenc, R. (2021). Improving Vulnerability Prediction of JavaScript Functions Using Process Metrics. In Proceedings of the 16th International Conference on Software Technologies (ICSOFT 2021) (pp. 185-195). https://doi.org/10.5220/0010558501850195
