| 2024 | 21st IEEE/ACM International Conference on Mining Software Repositories, MSR 2024, Lisbon, Portugal, April 15-16, 2024. Diomidis Spinellis, Alberto Bacchelli, Eleni Constantinou |
| 2024 | A Dataset of Atoms of Confusion in the Android Open Source Project. Davi Tabosa, Oton Pinheiro, Lincoln S. Rocha, Windson Viana |
| 2024 | A Dataset of Microservices-based Open-Source Projects. Dario Amoroso d'Aragona, Alexander Bakhtin, Xiaozhou Li, Ruoyu Su, Lauren Adams, Ernesto Aponte, Francis Boyle, Patrick Boyle, Rachel Koerner, Joseph Lee, Fangchao Tian, Yuqing Wang, Jesse Nyyssölä, Ernesto Quevedo, Md Shahidur Rahaman, Amr S. Abdelfattah, Mika Mäntylä, Tomás Cerný, Davide Taibi |
| 2024 | A Four-Dimension Gold Standard Dataset for Opinion Mining in Software Engineering. Md. Rakibul Islam, Md. Fazle Rabbi, Youngeun Jo, Arifa I. Champa, Ethan Young, Camden Wilson, Gavin Scott, Minhaz Fahim Zibran |
| 2024 | A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges. Jiaqi Wu, Lingfeng Bao, Xiaohu Yang, Xin Xia, Xing Hu |
| 2024 | A Mutation-Guided Assessment of Acceleration Approaches for Continuous Integration: An Empirical Study of YourBase. Zhili Zeng, Tao Xiao, Maxime Lamothe, Hideaki Hata, Shane McIntosh |
| 2024 | A dataset of GitHub Actions workflow histories. Guillaume Cardoen, Tom Mens, Alexandre Decan |
| 2024 | AI Writes, We Analyze: The ChatGPT Python Code Saga. Md. Fazle Rabbi, Arifa I. Champa, Minhaz Fahim Zibran, Md. Rakibul Islam |
| 2024 | APIstic: A Large Collection of OpenAPI Metrics. Souhaila Serbout, Cesare Pautasso |
| 2024 | AW4C: A Commit-Aware C Dataset for Actionable Warning Identification. Zhipeng Liu, Meng Yan, Zhipeng Gao, Dong Li, Xiaohong Zhang, Dan Yang |
| 2024 | An Empirical Study on Just-in-time Conformal Defect Prediction. Xhulja Shahini, Andreas Metzger, Klaus Pohl |
| 2024 | An Investigation of Patch Porting Practices of the Linux Kernel Ecosystem. Xingyu Li, Zheng Zhang, Zhiyun Qian, Trent Jaeger, Chengyu Song |
| 2024 | Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects. Balreet Grewal, Wentao Lu, Sarah Nadi, Cor-Paul Bezemer |
| 2024 | Analyzing Developer-ChatGPT Conversations for Software Refactoring: An Exploratory Study. Soham Deo, Divya Hinge, Omkar Sandip Chavan, Yaxuan Olivia Wang, Mohamed Wiem Mkaouer |
| 2024 | Analyzing the Evolution and Maintenance of ML Models on Hugging Face. Joel Castaño, Silverio Martínez-Fernández, Xavier Franch, Justus Bogner |
| 2024 | AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis. Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein |
| 2024 | AndroZoo: A Retrospective with a Glimpse into the Future. Marco Alecci, Pedro Jesús Ruiz Jiménez, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein |
| 2024 | Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study. Triet Huynh Minh Le, Xiaoning Du, Muhammad Ali Babar |
| 2024 | Automating GUI-based Test Oracles for Mobile Apps. Kesina Baral, Jack Johnson, Junayed Mahmud, Sabiha Salma, Mattia Fazzini, Julia Rubin, Jeff Offutt, Kevin Moran |
| 2024 | Availability and Usage of Platform-Specific APIs: A First Empirical Study. Ricardo de Sousa Job, André C. Hora |
| 2024 | Bidirectional Paper-Repository Tracing in Software Engineering. Daniel Garijo, Miguel Arroyo, Esteban González, Christoph Treude, Nicola Tarocco |
| 2024 | Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources. Can Li, Jingxuan Zhang, Yixuan Tang, Zhuhang Li, Tianyue Sun |
| 2024 | BugsPHP: A dataset for Automated Program Repair in PHP. K. D. Pramod, W. T. N. De Silva, W. U. K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake |
| 2024 | Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation. Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati |
| 2024 | ChatGPT Chats Decoded: Uncovering Prompt Patterns for Superior Solutions in Software Development Lifecycle. Liangxuan Wu, Yanjie Zhao, Xinyi Hou, Tianming Liu, Haoyu Wang |
| 2024 | ChatGPT in Action: Analyzing Its Use in Software Development. Arifa I. Champa, Md. Fazle Rabbi, Costain Nachuma, Minhaz F. Zibran |
| 2024 | Chatting with AI: Deciphering Developer Conversations with ChatGPT. Suad Mohamed, Abdullah Parvin, Esteban Parra |
| 2024 | CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code. Martin Weyssow, Claudio Di Sipio, Davide Di Ruscio, Houari A. Sahraoui |
| 2024 | Cohort Studies for Mining Software Repositories. Nyyti Saarimäki, Sira Vegas, Valentina Lenarduzzi, Davide Taibi, Mikel Robredo |
| 2024 | Comparing Apples to Androids: Discovery, Retrieval, and Matching of iOS and Android Apps for Cross-Platform Analyses. Magdalena Steinböck, Jakob Bleier, Mikka Rainer, Tobias Urban, Christine Utz, Martina Lindorfer |
| 2024 | CrashJS: A NodeJS Benchmark for Automated Crash Reproduction. Philip Oliver, Jens Dietrich, Craig Anslow, Michael Homer |
| 2024 | Curated Email-Based Code Reviews Datasets. Mingzhao Liang, Wachiraphan Charoenwet, Patanamon Thongtanunam |
| 2024 | DATAR: A Dataset for Tracking App Releases. Yasaman Abedini, Mohammad Hadi Hajihosseini, Abbas Heydarnoori |
| 2024 | DRMiner: A Tool For Identifying And Analyzing Refactorings In Dockerfile. Emna Ksontini, Aycha Abid, Rania Khalsi, Marouane Kessentini |
| 2024 | Data Augmentation for Supervised Code Translation Learning. Binger Chen, Jacek Golebiowski, Ziawasch Abedjan |
| 2024 | Dataset: Copy-based Reuse in Open Source Software. Mahmoud Jahanshahi, Audris Mockus |
| 2024 | DevGPT: Studying Developer-ChatGPT Conversations. Tao Xiao, Christoph Treude, Hideaki Hata, Kenichi Matsumoto |
| 2024 | DistilKaggle: A Distilled Dataset of Kaggle Jupyter Notebooks. Mojtaba Mostafavi Ghahfarokhi, Arash Asgari, Mohammad Abolnejadian, Abbas Heydarnoori |
| 2024 | Does Generative AI Generate Smells Related to Container Orchestration?: An Exploratory Study with Kubernetes Manifests. Yue Zhang, Rachel Meredith, Wilson Reeves, Julia Coriolano, Muhammad Ali Babar, Akond Rahman |
| 2024 | Encoding Version History Context for Better Code Representation. Huy Nguyen, Christoph Treude, Patanamon Thongtanunam |
| 2024 | Enhancing Performance Bug Prediction Using Performance Code Metrics. Guoliang Zhao, Stefanos Georgiou, Ying Zou, Safwat Hassan, Derek Truong, Toby Corbin |
| 2024 | Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution. Saikat Mondal, Suborno Deb Bappon, Chanchal K. Roy |
| 2024 | Estimating Usage Of Open Source Projects. Sophia Vargas, Georg J. P. Link, Jayoung Lee |
| 2024 | Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot. Kei Koyanagi, Dong Wang, Kotaro Noguchi, Masanari Kondo, Alexander Serebrenik, Yasutaka Kamei, Naoyasu Ubayashi |
| 2024 | Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC). Mahi Begoug, Moataz Chouchen, Ali Ouni, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer |
| 2024 | GIRT-Model: Automated Generation of Issue Report Templates. Nafiseh Nikeghbal, Amir Hossein Kargaran, Abbas Heydarnoori |
| 2024 | GitBug-Java: A Reproducible Benchmark of Recent Java Bugs. André Silva, Nuno Saavedra, Martin Monperrus |
| 2024 | Global Prosperity or Local Monopoly? Understanding the Geography of App Popularity. Liu Wang, Conghui Zheng, Haoyu Wang, Xiapu Luo, Gareth Tyson, Yi Wang, Shangguang Wang |
| 2024 | Goblin: A Framework for Enriching and Querying the Maven Central Dependency Graph. Damien Jaime, Joyce El Haddad, Pascal Poizat |
| 2024 | Greenlight: Highlighting TensorFlow APIs Energy Footprint. Saurabhsingh Rajput, Maria Kechagia, Federica Sarro, Tushar Sharma |
| 2024 | GuiEvo: Automated Evolution of Mobile Application GUIs. Sabiha Salma, S M Hasan Mansur, Yule Zhang, Kevin Moran |
| 2024 | Hash4Patch: A Lightweight Low False Positive Tool for Finding Vulnerability Patch Commits. Simone Scalco, Ranindya Paramitha |
| 2024 | How Do Software Developers Use ChatGPT? An Exploratory Study on GitHub Pull Requests. Moataz Chouchen, Narjes Bessghaier, Mahi Begoug, Ali Ouni, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer |
| 2024 | How I Learned to Stop Worrying and Love ChatGPT. Piotr Przymus, Mikolaj Fejzer, Jakub Narebski, Krzysztof Stencel |
| 2024 | How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions. João Helis Bernardo, Daniel Alencar da Costa, Sérgio Queiroz de Medeiros, Uirá Kulesza |
| 2024 | How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations. Eman Abdullah AlOmar, Anushkrishna Venkatakrishnan, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni |
| 2024 | Improving Automated Code Reviews: Learning from Experience. Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Wachiraphan Charoenwet |
| 2024 | Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads. Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee |
| 2024 | Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study. Joy Krishan Das, Saikat Mondal, Chanchal K. Roy |
| 2024 | Keep Me Updated: An Empirical Study on Embedded JavaScript Engines in Android Apps. Elliott Wen, Jiaxiang Zhou, Xiapu Luo, Giovanni Russello, Jens Dietrich |
| 2024 | Large Language Model vs. Stack Overflow in Addressing Android Permission Related Challenges. Sahrima Jannat Oishwee, Natalia Stakhanova, Zadia Codabux |
| 2024 | Learning to Predict and Improve Build Successes in Package Ecosystems. Harshitha Menon, Daniel Nichols, Abhinav Bhatele, Todd Gamblin |
| 2024 | Leveraging GPT-like LLMs to Automate Issue Labeling. Giuseppe Colavito, Filippo Lanubile, Nicole Novielli, Luigi Quaranta |
| 2024 | MalwareBench: Malware samples are not enough. Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie A. Williams |
| 2024 | MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations. Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang |
| 2024 | MicroRec: Leveraging Large Language Models for Microservice Recommendation. Ahmed Saeed Alsayed, Hoa Khanh Dam, Chau Nguyen |
| 2024 | Mining Our Way Back to Incremental Builds for DevOps Pipelines. Shane McIntosh |
| 2024 | Multi-faceted Code Smell Detection at Scale using DesigniteJava 2.0. Tushar Sharma |
| 2024 | Not all Dockerfile Smells are the Same: An Empirical Evaluation of Hadolint Writing Practices by Experts. Giovanni Rosa, Simone Scalabrino, Gregorio Robles, Rocco Oliveto |
| 2024 | On the Anatomy of Real-World R Code for Static Analysis. Florian Sihler, Lukas Pietzschmann, Raphael Straub, Matthias Tichy, Andor Diera, Abdelhalim Hafedh Dahou |
| 2024 | On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study. Amir M. Mir, Mehdi Keshani, Sebastian Proksch |
| 2024 | On the Executability of R Markdown Files. Md. Anaytul Islam, Muhammad Asaduzzaman, Shaowei Wang |
| 2024 | On the Taxonomy of Developers' Discussion Topics with ChatGPT. Ertugrul Sagdic, Arda Bayram, Md. Rakibul Islam |
| 2024 | Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language. Anisha Islam, Kalvin Eng, Abram Hindle |
| 2024 | Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems. Georges Aaron Randrianaina, Djamel Eddine Khelladi, Olivier Zendra, Mathieu Acher |
| 2024 | P3: A Dataset of Partial Program Patches. Dirk Beyer, Lars Grunske, Matthias Kettl, Marian Lingsch Rosenfeld, Moeketsi Raselimo |
| 2024 | PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software. Wenxin Jiang, Jerin Yasmin, Jason Jones, Nicholas Synovic, Jiashen Kuo, Nathaniel Bielanski, Yuan Tian, George K. Thiruvathukal, James C. Davis |
| 2024 | PlayMyData: a curated dataset of multi-platform video games. Andrea D'Angelo, Claudio Di Sipio, Cristiano Politowski, Riccardo Rubei |
| 2024 | Predicting the Impact of Crashes Across Release Channels. Suhaib Mujahid, Diego Elias Costa, Marco Castelluccio |
| 2024 | Quality Assessment of ChatGPT Generated Code and their Use by Developers. Mohammed Latif Siddiq, Lindsay Roney, Jiahao Zhang, Joanna C. S. Santos |
| 2024 | Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows. Hassan Onsori Delicheh, Alexandre Decan, Tom Mens |
| 2024 | Questioning the Questions We Ask About the Impact of AI on Software Engineering: MSR 2024 Keynote. Margaret-Anne D. Storey |
| 2024 | RABBIT: A tool for identifying bot accounts based on their recent GitHub event history. Natarajan Chidambaram, Tom Mens, Alexandre Decan |
| 2024 | SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt. Edi Sutoyo, Andrea Capiluppi |
| 2024 | SensoDat: Simulation-based Sensor Dataset of Self-driving Cars. Christian Birchler, Cyrill Rohrbach, Timo Kehrer, Sebastiano Panichella |
| 2024 | Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models. Anamaria-Roberta Preda, Christoph Mayr-Dorn, Atif Mashkoor, Alexander Egyed |
| 2024 | TestDossier: A Dataset of Tested Values Automatically Extracted from Test Execution. André C. Hora |
| 2024 | The Impact of Code Ownership of DevOps Artefacts on the Outcome of DevOps CI Builds. Ajiromola Kola-Olawuyi, Nimmi Rashinika Weeraddana, Meiyappan Nagappan |
| 2024 | The PIPr Dataset of Public Infrastructure as Code Programs. Daniel Sokolowski, David Spielmann, Guido Salvaneschi |
| 2024 | The role of library versions in Developer-ChatGPT conversations. Rachna Raj, Diego Elias Costa |
| 2024 | Thirty-Three Years of Mathematicians and Software Engineers: A Case Study of Domain Expertise and Participation in Proof Assistant Ecosystems. Gwenyth Lincroft, Minsung Cho, Katherine Hough, Mahsa Bazzaz, Jonathan Bell |
| 2024 | TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs. Kaibo Liu, Yudong Han, Yiyang Liu, Jie M. Zhang, Zhenpeng Chen, Federica Sarro, Gang Huang, Yun Ma |
| 2024 | Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study. Rosalia Tufano, Antonio Mastropaolo, Federica Pepe, Ozren Dabic, Massimiliano Di Penta, Gabriele Bavota |
| 2024 | What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study. Nicolás E. Díaz Ferreyra, Mojtaba Shahin, Mansooreh Zahedi, Sodiq Quadri, Riccardo Scandariato |
| 2024 | Whodunit: Classifying Code as Human Authored or GPT-4 generated- A case study on CodeChef problems. Oseremen Joy Idialu, Noble Saji Mathews, Rungroj Maipradit, Joanne M. Atlee, Meiyappan Nagappan |
| 2024 | Write me this Code: An Analysis of ChatGPT Quality for Producing Source Code. Konstantinos Moratis, Themistoklis Diamantopoulos, Dimitrios-Nikitas Nastos, Andreas L. Symeonidis |
| 2024 | Zero-shot Learning based Alternatives for Class Imbalanced Learning Problem in Enterprise Software Defect Analysis. Sangameshwar Patil, Balaraman Ravindran |
| 2024 | gawd: A Differencing Tool for GitHub Actions Workflows. Pooya Rostami Mazrae, Alexandre Decan, Tom Mens |