| 2020 | 20-MAD: 20 Years of Issues and Commits of Mozilla and Apache Development. Maëlick Claes, Mika V. Mäntylä |
| 2020 | A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. Jiahao Fan, Yi Li, Shaohua Wang, Tien N. Nguyen |
| 2020 | A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits. Audris Mockus, Diomidis Spinellis, Zoe Kotti, Gabriel John Dusing |
| 2020 | A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits. Tanner Fry, Tapajit Dey, Andrey Karnauch, Audris Mockus |
| 2020 | A Dataset for GitHub Repository Deduplication. Diomidis Spinellis, Zoe Kotti, Audris Mockus |
| 2020 | A Dataset of Dockerfiles. Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas W. Reps |
| 2020 | A Dataset of Enterprise-Driven Open Source Software. Diomidis Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas |
| 2020 | A Large-Scale Comparative Evaluation of IR-Based Tools for Bug Localization. Shayan A. Akbar, Avinash C. Kak |
| 2020 | A Machine Learning Approach for Vulnerability Curation. Yang Chen, Andrew E. Santosa, Ming Yi Ang, Abhishek Sharma, Asankhaya Sharma, David Lo |
| 2020 | A Mixed Graph-Relational Dataset of Socio-technical Interactions in Open Source Systems. Usman Ashraf, Christoph Mayr-Dorn, Alexander Egyed, Sebastiano Panichella |
| 2020 | A Soft Alignment Model for Bug Deduplication. Irving Muller Rodrigues, Daniel Aloise, Eraldo Rezende Fernandes, Michel R. Dagenais |
| 2020 | A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub. Yaroslav Golubev, Maria Eliseeva, Nikita Povarov, Timofey Bryksin |
| 2020 | A Study on the Accuracy of OCR Engines for Source Code Transcription from Programming Screencasts. Abdulkarim Khormi, Mohammad Alahmadi, Sonia Haiduc |
| 2020 | AIMMX: Artificial Intelligence Model Metadata Extractor. Jason Tsay, Alan Braz, Martin Hirzel, Avraham Shinnar, Todd W. Mummert |
| 2020 | An Empirical Study of Build Failures in the Docker Context. Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang |
| 2020 | An Empirical Study of Method Chaining in Java. Tomoki Nakamaru, Tomomasa Matsunaga, Tetsuro Yamazaki, Soramichi Akiyama, Shigeru Chiba |
| 2020 | An Empirical Study on Regular Expression Bugs. Peipei Wang, Chris Brown, Jamie A. Jennings, Kathryn T. Stolee |
| 2020 | An Empirical Study on the Impact of Deimplicitization on Comprehension in Programs Using Application Frameworks. Jürgen Cito, Jiasi Shen, Martin C. Rinard |
| 2020 | An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset. Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou, Debasish Chakroborti, Banani Roy, Chanchal K. Roy, Kevin A. Schneider |
| 2020 | AndroZooOpen: Collecting Large-scale Open Source Android Apps for the Research Community. Pei Liu, Li Li, Yanjie Zhao, Xiaoyu Sun, John Grundy |
| 2020 | Automatically Granted Permissions in Android apps: An Empirical Study on their Prevalence and on the Potential Threats for Privacy. Paolo Calciati, Konstantin Kuznetsov, Alessandra Gorla, Andreas Zeller |
| 2020 | Behind the Intents: An In-depth Empirical Study on Software Refactoring in Modern Code Review. Matheus Paixão, Anderson G. Uchôa, Ana Carla Bibiano, Daniel Oliveira, Alessandro Garcia, Jens Krinke, Emilio Arvonio |
| 2020 | Beyond the Code: Mining Self-Admitted Technical Debt in Issue Tracker Systems. Laerte Xavier, Fabio Ferreira, Rodrigo Brito, Marco Túlio Valente |
| 2020 | Boa Views: Easy Modularization and Sharing of MSR Analyses. Che Shian Hung, Robert Dyer |
| 2020 | Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting? Nicole Novielli, Fabio Calefato, Davide Dongiovanni, Daniela Girardi, Filippo Lanubile |
| 2020 | Capture the Feature Flag: Detecting Feature Flags in Open-Source. Jens Meinicke, Juan Hoyos, Bogdan Vasilescu, Christian Kästner |
| 2020 | Challenges in Chatbot Development: A Study of Stack Overflow Posts. Ahmad Abdellatif, Diego Costa, Khaled Badran, Rabe Abdalkareem, Emad Shihab |
| 2020 | Characterizing and Identifying Composite Refactorings: Concepts, Heuristics and Patterns. Leonardo da Silva Sousa, Diego Cedrim, Alessandro Garcia, Willian Nalepa Oizumi, Ana Carla Bibiano, Daniel Oliveira, Miryung Kim, Anderson Oliveira |
| 2020 | Cheating Death: A Statistical Survival Analysis of Publicly Available Python Projects. Rao Hamza Ali, Chelsea Parlett-Pelleriti, Erik Linstead |
| 2020 | Dataset of Video Game Development Problems. Cristiano Politowski, Fábio Petrillo, Gabriel Cavalheiro Ullmann, Josias de Andrade Werly, Yann-Gaël Guéhéneuc |
| 2020 | Detecting Video Game-Specific Bad Smells in Unity Projects. Antonio Borrelli, Vittoria Nardone, Giuseppe A. Di Lucca, Gerardo Canfora, Massimiliano Di Penta |
| 2020 | Detecting and Characterizing Bots that Commit Code. Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus |
| 2020 | Determining the Intrinsic Structure of Public Software Development History. Antoine Pietri, Guillaume Rousseau, Stefano Zacchiroli |
| 2020 | Developer-Driven Code Smell Prioritization. Fabiano Pecorelli, Fabio Palomba, Foutse Khomh, Andrea De Lucia |
| 2020 | Did You Remember To Test Your Tokens? Danielle Gonzalez, Michael Rath, Mehdi Mirakhorli |
| 2020 | Do Explicit Review Strategies Improve Code Review Performance? Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider, Alberto Bacchelli |
| 2020 | Embedding Java Classes with code2vec: Improvements from Variable Obfuscation. Rhys Compton, Eibe Frank, Panos Patros, Abigail M. Y. Koay |
| 2020 | Empirical Study of Restarted and Flaky Builds on Travis CI. Thomas Durieux, Claire Le Goues, Michael Hilton, Rui Abreu |
| 2020 | Employing Contribution and Quality Metrics for Quantifying the Software Development Process. Themistoklis Diamantopoulos, Michail D. Papamichail, Thomas Karanikiotis, Kyriakos C. Chatzidimitriou, Andreas L. Symeonidis |
| 2020 | Ethical Mining: A Case Study on MSR Mining Challenges. Nicolas E. Gold, Jens Krinke |
| 2020 | Exploring the Security Awareness of the Python and JavaScript Open Source Communities. Gábor Antal, Márton Keleti, Péter Hegedüs |
| 2020 | Forking Without Clicking: on How to Identify Software Repository Forks. Antoine Pietri, Guillaume Rousseau, Stefano Zacchiroli |
| 2020 | From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies? Ang Jia, Ming Fan, Xi Xu, Di Cui, Wenying Wei, Zijiang Yang, Kai Ye, Ting Liu |
| 2020 | GitterCom: A Dataset of Open Source Developer Communications in Gitter. Esteban Parra, Ashley Ellis, Sonia Haiduc |
| 2020 | Hall-of-Apps: The Top Android Apps Metadata Archive. Laura Bello-Jiménez, Camilo Escobar-Velásquez, Anamaria Mojica-Hanke, Santiago Cortés-Fernández, Mario Linares-Vásquez |
| 2020 | How Often Do Single-Statement Bugs Occur?: The ManySStuBs4J Dataset. Rafael-Michael Karampatsis, Charles Sutton |
| 2020 | Improved Automatic Summarization of Subroutines via Attention to File Context. Sakib Haque, Alexander LeClair, Lingfei Wu, Collin McMillan |
| 2020 | Investigating Severity Thresholds for Test Smells. Davide Spadini, Martin Schvarcbacher, Ana-Maria Oprescu, Magiel Bruntink, Alberto Bacchelli |
| 2020 | JTeC: A Large Collection of Java Test Classes for Test Code Analysis and Processing. Federico Corò, Roberto Verdecchia, Emilio Cruciani, Breno Miranda, Antonia Bertolino |
| 2020 | Large-Scale Manual Validation of Bugfixing Changes. Steffen Herbold, Alexander Trautsch, Benjamin Ledel |
| 2020 | LogChunks: A Data Set for Build Log Analysis. Carolin E. Brandt, Annibale Panichella, Andy Zaidman, Moritz Beller |
| 2020 | MSR '20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, 29-30 June, 2020. Sunghun Kim, Georgios Gousios, Sarah Nadi, Joseph Hejderup |
| 2020 | Multi-language Design Smells: A Backstage Perspective. Mouna Abidi, Moses Openja, Foutse Khomh |
| 2020 | Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter. Hongbo Fang, Daniel Klug, Hemank Lamba, James D. Herbsleb, Bogdan Vasilescu |
| 2020 | On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems. Biruk Asmare Muse, Mohammad Masudur Rahman, Csaba Nagy, Anthony Cleve, Foutse Khomh, Giuliano Antoniol |
| 2020 | On the Relationship between User Churn and Software Issues. Omar El Zarif, Daniel Alencar da Costa, Safwat Hassan, Ying Zou |
| 2020 | On the Shoulders of Giants: A New Dataset for Pull-based Development Research. Xunhui Zhang, Ayushi Rastogi, Yue Yu |
| 2020 | PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning. Triet Huynh Minh Le, David Hin, Roland Croft, Muhammad Ali Babar |
| 2020 | Painting Flowers: Reasons for Using Single-State State Machines in Model-Driven Engineering. Nan Yang, Pieter J. L. Cuijpers, Ramon R. H. Schiffelers, Johan Lukkien, Alexander Serebrenik |
| 2020 | Polyglot and Distributed Software Repository Mining with Crossflow. Konstantinos Barmpis, Patrick Neubauer, Jonathan Co, Dimitris S. Kolovos, Nicholas Matragkas, Richard F. Paige |
| 2020 | RTPTorrent: An Open-source Dataset for Evaluating Regression Test Prioritization. Toni Mattis, Patrick Rein, Falco Dürsch, Robert Hirschfeld |
| 2020 | SoftMon: A Tool to Compare Similar Open-source Software from a Performance Perspective. Shubhankar Suman Singh, Smruti R. Sarangi |
| 2020 | Software-related Slack Chats with Disentangled Conversations. Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, Lori L. Pollock |
| 2020 | TestRoutes: A Manually Curated Method Level Dataset for Test-to-Code Traceability. András Kicsi, László Vidács, Tibor Gyimóthy |
| 2020 | The Impact of Dynamics of Collaborative Software Engineering on Introverts: A Study Protocol. Ingrid Nunes, Christoph Treude, Fabio Calefato |
| 2020 | The Impact of a Major Security Event on an Open Source Project: The Case of OpenSSL. James Walden |
| 2020 | The Scent of Deep Learning Code: An Empirical Study. Hadhemi Jebnoun, Houssem Ben Braiek, Mohammad Masudur Rahman, Foutse Khomh |
| 2020 | The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History. Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli |
| 2020 | The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub. Danielle Gonzalez, Thomas Zimmermann, Nachiappan Nagappan |
| 2020 | Traceability Support for Multi-Lingual Software Projects. Yalin Liu, Jinfeng Lin, Jane Cleland-Huang |
| 2020 | Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler. Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov |
| 2020 | Using Others' Tests to Identify Breaking Updates. Suhaib Mujahid, Rabe Abdalkareem, Emad Shihab, Shane McIntosh |
| 2020 | Visualization of Methods Changeability Based on VCS Data. Sergey Svitkov, Timofey Bryksin |
| 2020 | What constitutes Software?: An Empirical, Descriptive Study of Artifacts. Rolf-Helge Pfeiffer |
| 2020 | What is the Vocabulary of Flaky Tests? Gustavo Pinto, Breno Miranda, Supun Dissanayake, Marcelo d'Amorim, Christoph Treude, Antonia Bertolino |