| 2022 | 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022, Pittsburgh, PA, USA, May 23-24, 2022 |
| 2022 | A Deep Study of the Effects and Fixes of Server-Side Request Races in Web Applications. Zhengyi Qiu, Shudi Shao, Qi Zhao, Hassan Ali Khan, Xinning Hui, Guoliang Jin |
| 2022 | A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts. Konstantin Grotov, Sergey Titov, Vladimir Sotnikov, Yaroslav Golubev, Timofey Bryksin |
| 2022 | A Large-scale Dataset of (Open Source) License Text Variants. Stefano Zacchiroli |
| 2022 | A Time Series-Based Dataset of Open-Source Software Evolution. Bruno Luan de Sousa, Mariza A. S. Bigonha, Kecia A. M. Ferreira, Glaura C. Franco |
| 2022 | A Versatile Dataset of Agile Open Source Software Projects. Vali Tawosi, Afnan A. Al-Subaihin, Rebecca Moussa, Federica Sarro |
| 2022 | An Alternative Issue Tracking Dataset of Public Jira Repositories. Lloyd Montgomery, Clara Marie Lüders, Walid Maalej |
| 2022 | An Empirical Evaluation of GitHub Copilot's Code Suggestions. Nhan Nguyen, Sarah Nadi |
| 2022 | An Empirical Study on Maintainable Method Size in Java. Shaiful Alam Chowdhury, Gias Uddin, Reid Holmes |
| 2022 | An Empirical Study on the Survival Rate of GitHub Projects. Adem Ait, Javier Luis Cánovas Izquierdo, Jordi Cabot |
| 2022 | An Exploratory Study on Refactoring Documentation in Issues Handling. Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni |
| 2022 | AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information. Saurabh Kumar, Debadatta Mishra, Biswabandan Panda, Sandeep Kumar Shukla |
| 2022 | ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction. Hossein Keshavarz, Meiyappan Nagappan |
| 2022 | Automatically Prioritizing and Assigning Tasks from Code Repositories in Puzzle Driven Development. Yegor Bugayenko, Ayomide Bakare, Arina Cheverda, Mirko Farina, Artem V. Kruglov, Yaroslav Plaksin, Giancarlo Succi, Witold Pedrycz |
| 2022 | Between JIRA and GitHub: ASFBot and its Influence on Human Comments in Issue Trackers. Ambarish Moharil, Dmitrii Orlov, Samar Jameel, Tristan Trouwen, Nathan Cassee, Alexander Serebrenik |
| 2022 | Beyond Duplicates: Towards Understanding and Predicting Link Types in Issue Tracking Systems. Clara Marie Lüders, Abir Bouraffa, Walid Maalej |
| 2022 | Bot Detection in GitHub Repositories. Natarajan Chidambaram, Pooya Rostami Mazrae |
| 2022 | BotHunter: An Approach to Detect Software Bots in GitHub. Ahmad Abdellatif, Mairieli Santos Wessel, Igor Steinmacher, Marco Aurélio Gerosa, Emad Shihab |
| 2022 | CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning. Mohammad Reza Taesiri, Finlay Macklon, Cor-Paul Bezemer |
| 2022 | Challenges and Future Research Direction for Microtask Programming in Industry. Masanari Kondo, Shinobu Saito, Yukako Iimura, Eunjong Choi, Osamu Mizuno, Yasutaka Kamei, Naoyasu Ubayashi |
| 2022 | Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study. Tatiana Castro Vélez, Raffi Khatchadourian, Mehdi Bagherzadeh, Anita Raja |
| 2022 | Characterizing High-Quality Test Methods: A First Empirical Study. Victor Veloso, André C. Hora |
| 2022 | Code Review Practices for Refactoring Changes: An Empirical Study on OpenStack. Eman Abdullah AlOmar, Moataz Chouchen, Mohamed Wiem Mkaouer, Ali Ouni |
| 2022 | Comments on Comments: Where Code Review and Documentation Meet. Nikitha Rao, Jason Tsay, Martin Hirzel, Vincent J. Hellendoorn |
| 2022 | Complex Python Features in the Wild. Yi Yang, Ana L. Milanova, Martin Hirzel |
| 2022 | Constructing Dataset of Functionally Equivalent Java Methods Using Automated Test Generation Techniques. Yoshiki Higo, Shinsuke Matsumoto, Shinji Kusumoto, Kazuya Yasuda |
| 2022 | DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research. Keerthana Muthu Subash, Lakshmi Prasanna Kumar, Sri Lakshmi Vadlamani, Preetha Chatterjee, Olga Baysal |
| 2022 | DaSEA - A Dataset for Software Ecosystem Analysis. Petya Buchkova, Joakim Hey Hinnerskov, Kasper Olsen, Rolf-Helge Pfeiffer |
| 2022 | Dataset: Dependency Networks of Open Source Libraries Available Through CocoaPods, Carthage and Swift PM. Kristiina Rahkema, Dietmar Pfahl |
| 2022 | Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue. Rui Shu, Tianpei Xia, Laurie A. Williams, Tim Menzies |
| 2022 | Detecting Privacy-Sensitive Code Changes with Language Modeling. Gökalp Demirci, Vijayaraghavan Murali, Imad Ahmad, Rajeev Rao, Gareth Ari Aye |
| 2022 | Do Customized Android Frameworks Keep Pace with Android? Pei Liu, Mattia Fazzini, John C. Grundy, Li Li |
| 2022 | Do Small Code Changes Merge Faster? A Multi-Language Empirical Investigation. Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi |
| 2022 | Does Configuration Encoding Matter in Learning Software Performance? An Empirical Study on Encoding Schemes. Jingzhi Gong, Tao Chen |
| 2022 | Does This Apply to Me? An Empirical Study of Technical Context in Stack Overflow. Akalanka Galappaththi, Sarah Nadi, Christoph Treude |
| 2022 | ECench: An Energy Bug Benchmark of Ethereum Client Software. Jinyoung Kim, Misoo Kim, Eunseok Lee |
| 2022 | Empirical Standards for Repository Mining. Preetha Chatterjee, Tushar Sharma, Paul Ralph |
| 2022 | Evaluating the effectiveness of local explanation methods on source code-based defect prediction models. Yuxiang Gao, Yi Zhu, Qiao Yu |
| 2022 | Exploring Apache Incubator Project Trajectories with APEX. Anirudh Ramchandran, Likang Yin, Vladimir Filkov |
| 2022 | Extracting Corrective Actions from Code Repositories. Yegor Bugayenko, Kirill Daniakin, Mirko Farina, Firas Jolha, Artem V. Kruglov, Giancarlo Succi, Witold Pedrycz |
| 2022 | FaST: A linear time stack trace alignment heuristic for crash report deduplication. Irving Muller Rodrigues, Daniel Aloise, Eraldo Rezende Fernandes |
| 2022 | FixJS: A Dataset of Bug-fixing JavaScript Commits. Viktor Csuvik, László Vidács |
| 2022 | Geographic Diversity in Public Code Contributions: An Exploratory Large-Scale Study Over 50 Years. Davide Rossi, Stefano Zacchiroli |
| 2022 | GitDelver Enterprise Dataset (GDED): An Industrial Closed-source Dataset for Socio-Technical Research. Nicolas Riquet, Xavier Devroey, Benoît Vanderose |
| 2022 | GitRank: A Framework to Rank GitHub Repositories. Niranjan Hasabnis |
| 2022 | GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses. Wei Ma, Mengjie Zhao, Ezekiel O. Soremekun, Qiang Hu, Jie M. Zhang, Mike Papadakis, Maxime Cordy, Xiaofei Xie, Yves Le Traon |
| 2022 | How heated is it? Understanding GitHub locked issues. Isabella Ferreira, Bram Adams, Jinghui Cheng |
| 2022 | How to Improve Deep Learning for Software Analytics (a case study with code smell detection). Rahul Yedida, Tim Menzies |
| 2022 | Inspect4py: A Knowledge Extraction Framework for Python Code Repositories. Rosa Filgueira, Daniel Garijo |
| 2022 | Is Open Source Eating the World's Software? Measuring the Proportion of Open Source in Proprietary Software Using Java Binaries. Julius Musseau, John Speed Meyers, George P. Sieniawski, C. Albert Thompson, Daniel M. Germán |
| 2022 | Is Refactoring Always a Good Egg? Exploring the Interconnection Between Bugs and Refactorings. Amirreza Bagheri, Péter Hegedüs |
| 2022 | LAGOON: An Analysis Tool for Open Source Communities. Sourya Dey, Walt Woods |
| 2022 | LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries. Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, Dongmei Zhang |
| 2022 | LineVD: Statement-level Vulnerability Detection using Graph Neural Networks. David Hin, Andrey Kan, Huaming Chen, Muhammad Ali Babar |
| 2022 | LineVul: A Transformer-based Line-Level Vulnerability Prediction. Michael Fu, Chakkrit Tantithamthavorn |
| 2022 | Lupa: A Framework for Large Scale Analysis of the Programming Language Usage. Anna Vlasova, Maria Tigina, Ilya Vlasov, Anastasiia Birillo, Yaroslav Golubev, Timofey Bryksin |
| 2022 | METHODS2TEST: A dataset of focal methods mapped to test cases. Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy |
| 2022 | Maintenance and Evolution: GrimoireLab Graal. Willem Meijer, David Visscher, Erwin de Haan, Merijn Schröder, Leon Visscher, Andrea Capiluppi, Ioan Botez |
| 2022 | ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference. Kevin Jesse, Premkumar T. Devanbu |
| 2022 | Methods for Stabilizing Models Across Large Samples of Projects (with case studies on Predicting Defect and Project Health). Suvodeep Majumder, Tianpei Xia, Rahul Krishna, Tim Menzies |
| 2022 | Microsoft CloudMine: Data Mining for the Executive Order on Improving the Nation's Cybersecurity. Kim Herzig, Luke Ghostling, Maximilian Grothusmann, Sascha Just, Nora Huang, Alan Klimowski, Yashasvini Ramkumar, Myles McLeroy, Kivanç Muslu, Hitesh Sajnani, Varsha Vadaga |
| 2022 | Mining Code Review Data to Understand Waiting Times Between Acceptance and Merging: An Empirical Analysis. Gunnar Kudrjavets, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi |
| 2022 | Mining the Ethereum Blockchain Platform: Best Practices and Pitfalls (MSR 2022 Tutorial). Gustavo Ansaldi Oliva |
| 2022 | Mining the Usage of Reactive Programming APIs: A Study on GitHub and Stack Overflow. Carlos Zimmerle, Kiev Gama, Fernando Castor, José Murilo Mota Filho |
| 2022 | Multimodal Recommendation of Messenger Channels. Ekaterina Koshchenko, Egor Klimov, Vladimir Kovalenko |
| 2022 | Noisy Label Learning for Security Defects. Roland Croft, Muhammad Ali Babar, Huaming Chen |
| 2022 | On the Co-Occurrence of Refactoring of Test and Source Code. Nicholas Alexandre Nagy, Rabe Abdalkareem |
| 2022 | On the Naturalness of Fuzzer-Generated Code. Rajeswari Hita Kambhamettu, John Billos, Tomi Oluwaseun-Apo, Benjamin Gafford, Rohan Padhye, Vincent J. Hellendoorn |
| 2022 | On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models. Triet Huynh Minh Le, Muhammad Ali Babar |
| 2022 | On the Violation of Honesty in Mobile Apps: Automated Detection and Categories. Humphrey O. Obie, Idowu Ilekura, Hung Du, Mojtaba Shahin, John C. Grundy, Li Li, Jon Whittle, Burak Turhan |
| 2022 | OpenSSL 3.0.0: An exploratory case study. James Walden |
| 2022 | Operationalizing Threats to MSR Studies by Simulation-Based Testing. Johannes Härtel, Ralf Lämmel |
| 2022 | Painting the Landscape of Automotive Software in GitHub. Sangeeth Kochanthara, Yanja Dajsuren, Loek Cleophas, Mark van den Brand |
| 2022 | Problems and Solutions in Applying Continuous Integration and Delivery to 20 Open-Source Cyber-Physical Systems. Fiorella Zampetti, Vittoria Nardone, Massimiliano Di Penta |
| 2022 | Quid Pro Quo: An Exploration of Reciprocity in Code Review. Carlos Gavidia-Calderon, DongGyun Han, Amel Bennaceur |
| 2022 | ReCover: a Curated Dataset for Regression Testing Research. Francesco Altiero, Anna Corazza, Sergio Di Martino, Adriano Peron, Luigi L. L. Starace |
| 2022 | Real-World Clone-Detection in Go. Qinyun Wu, Huan Song, Ping Yang |
| 2022 | Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring. Anthony Peruma, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni |
| 2022 | Replicating Data Pipelines with GrimoireLab. Kalvin Eng, Hareem Sahar |
| 2022 | SECOM: Towards a convention for security commit messages. Sofia Reis, Rui Abreu, Hakan Erdogmus, Corina S. Pasareanu |
| 2022 | SLNET: A Redistributable Corpus of 3rd-party Simulink Models. Sohil Lal Shrestha, Shafiul Azam Chowdhury, Christoph Csallner |
| 2022 | SOSum: A Dataset of Stack Overflow Post Summaries. Bonan Kou, Yifeng Di, Muhao Chen, Tianyi Zhang |
| 2022 | Searching for High-Fidelity Builds Using Active Learning. Harshitha Menon, Konstantinos Parasyris, Tom Scogland, Todd Gamblin |
| 2022 | Senatus - A Fast and Accurate Code-to-Code Recommendation Engine. Fran Silavong, Sean J. Moran, Antonios Georgiadis, Rohan Saphal, Robert Otter |
| 2022 | Smelly Variables in Ansible Infrastructure Code: Detection, Prevalence, and Lifetime. Ruben Opdebeeck, Ahmed Zerouali, Coen De Roover |
| 2022 | SniP: An Efficient Stack Tracing Framework for Multi-threaded Programs. K. P. Arun, Saurabh Kumar, Debadatta Mishra, Biswabandan Panda |
| 2022 | SoCCMiner: A Source Code-Comments and Comment-Context Miner. Murali Sridharan, Mika Mäntylä, Maëlick Claes, Leevi Rantala |
| 2022 | Software Bots in Software Engineering: Benefits and Challenges. Mairieli Santos Wessel, Marco Aurélio Gerosa, Emad Shihab |
| 2022 | Starting the InnerSource Journey: Key Goals and Metrics to Measure Collaboration. Daniel Izquierdo-Cortazar, Jesús Alonso-Gutiérrez, Alberto Pérez García-Plaza, Gregorio Robles, Jesús M. González-Barahona |
| 2022 | Studying the Impact of Continuous Delivery Adoption on Bug-Fixing Time in Apache's Open-Source Projects. Carlos Diego Andrade de Almeida, Diego N. Feijó, Lincoln S. Rocha |
| 2022 | TSSB-3M: Mining single statement bugs at massive scale. Cedric Richter, Heike Wehrheim |
| 2022 | The General Index of Software Engineering Papers. Zeinab Abou Khalil, Stefano Zacchiroli |
| 2022 | The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories. Melanie Warrick, Samuel F. Rosenblatt, Jean-Gabriel Young, Amanda Casari, Laurent Hébert-Dufresne, James P. Bagrow |
| 2022 | The Unexplored Treasure Trove of Phabricator Code Reviews. Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi |
| 2022 | The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks. Kimberly Truong, Courtney Miller, Bogdan Vasilescu, Christian Kästner |
| 2022 | To Type or Not to Type? A Systematic Comparison of the Software Quality of JavaScript and TypeScript Applications on GitHub. Justus Bogner, Manuel Merkel |
| 2022 | To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set? Matteo Ciniselli, Luca Pascarella, Gabriele Bavota |
| 2022 | Tooling for Time- and Space-efficient git Repository Mining. Fabian Heseding, Willy Scheibel, Jürgen Döllner |
| 2022 | Towards Reliable Agile Iterative Planning via Predicting Documentation Changes of Work Items. Jirat Pasuksmit, Patanamon Thongtanunam, Shanika Karunasekera |
| 2022 | TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs. Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein |
| 2022 | TwinDroid: A Dataset of Android app System call traces and Trace Generation Pipeline. Asma Razgallah, Raphaël Khoury, Jean-Baptiste Poulet |
| 2022 | Using Bandit Algorithms for Selecting Feature Reduction Techniques in Software Defect Prediction. Masateru Tsunoda, Akito Monden, Koji Toda, Amjed Tahir, Kwabena Ebo Bennin, Keitaro Nakasai, Masataka Nagura, Kenichi Matsumoto |
| 2022 | Varangian: A Git Bot for Augmented Static Analysis. Saurabh Pujar, Yunhui Zheng, Luca Buratti, Burn L. Lewis, Alessandro Morari, Jim Laredo, Kevin Postlethwait, Christoph Görn |
| 2022 | Vul4J: A Dataset of Reproducible Java Vulnerabilities Geared Towards the Study of Program Repair Techniques. Quang-Cuong Bui, Riccardo Scandariato, Nicolás E. Díaz Ferreyra |
| 2022 | WeakSATD: Detecting Weak Self-admitted Technical Debt. Barbara Russo, Matteo Camilli, Moritz Mock |
| 2022 | Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset. Fatemeh Khoshnoud, Ali Rezaei Nasab, Zahra Toudeji, Ashkan Sami |
| 2022 | npm-filter: Automating the mining of dynamic information from npm packages. Ellen Arteca, Alexi Turcotte |