Literature
These are books and academic papers that have influenced the course.
| Reference | One-line summary [1] |
|---|---|
| [Chacon and Straub, 2014] | The book about git |
| [Jiménez et al., 2017] | Best practices in research software |
| [Kroll et al., 2013] | Best practices in software development |
| [Pastrana et al., 2025] | Literature review of best practices in Scrum and DevOps |
| [Perez-Riverol et al., 2016] | Recommendations on git and GitHub |
| [Ram, 2013] | Effect of git on reproducibility |
| [Serban et al., 2020] | Recommends some software engineering best practices, in the field of machine learning |
| [Stieler and Bauer, 2023] | Applies [Serban et al., 2020] to rate if a project follows the recommended practices |
| [Stodden and Miguez, 2014] | Best practices for a project |
| [Visser et al., 2016] | Ten best practices for effective software development |
| [Wilson et al., 2014] | Best practices for a project |
| [Wilson et al., 2017] | Good enough practices for a project |

[1] You can find more extensive summaries below.
Summaries
These are summaries of the books and papers.
[Jiménez et al., 2017]
These are the 4 recommendations:
- Make source code publicly accessible from day one
- Make software easy to discover by providing software metadata via a popular community registry
- Adopt a licence and comply with the licence of third-party dependencies
- Define clear and transparent contribution, governance and communication processes
[Kroll et al., 2013]
This paper is a literature review, tailored to best practices in
Follow-the-sun software development. Below is a table that shows
how many papers (n) recommend a specific practice.
| n | Best practice |
|---|---|
| 6 | Agile methods |
| 6 | Use of technology for knowledge sharing |
| 3 | Process documentation |
| 3 | Use of an FTP Server (or data repository) to exchange code and documents |
| 3 | Time window |
[Pastrana et al., 2025]
This is a literature review paper on Scrum and DevOps.
Box 11 shows the benefits of Scrum and DevOps practices. Here is an adapted version of box 11:
| Benefits | Improvement Observed |
|---|---|
| Scrum adoption | Actively involved stakeholders |
| . | Transparent communication channels |
| . | Increased team collaboration |
| . | Improved predictability |
| . | Creation of a collaborative culture |
| . | Continuous improvement |
| . | Constant quality measurement or concurrent testing |
| DevOps adoption | Early and continuous feedback |
| . | Productivity increased by 20% |
| . | Deployment time decreased by 30% |
| Faster release cycles | Time to market decreased by 25% |
| . | Incident resolution time decreased by 40% |
| . | Quality deliverable |
| . | Early and continuous feedback |
| Continuous integration | Quality deliverable |
| . | Time to market decreased by 25% |
| . | Incident resolution time decreased by 40% |
| . | Transparent communication channels |
| Automated testing | Test execution speed increased by 35% |
| . | Defect detection increased by 18% |
| Security automation | Security vulnerabilities decreased by 30% |
| . | Response time decreased by 50% |
| Agile transformation | The development cycle decreased by 25% |
| . | Project success rates increased by 18% |

I removed the conclusion attributed to [Sravani et al., 2023] ([117] in the paper),
as that paper does not supply these numbers at all.
I used the Doc2Lang image-to-table converter to convert the image to a table.
[Perez-Riverol et al., 2016]
This paper shares 10 simple rules for taking advantage of Git and GitHub:
- Rule 1: Use GitHub to Track Your Projects
- Rule 2: GitHub for Single Users, Teams, and Organizations
- Rule 3: Developing and Collaborating on New Features: Branching and Forking
- Rule 4: Naming Branches and Commits: Tags and Semantic Versions
- Rule 5: Let GitHub Do Some Tasks for You: Integrate
- Rule 6: Let GitHub Do More Tasks for You: Automate
- Rule 7: Use GitHub to Openly and Collaboratively Discuss, Address, and Close Issues
- Rule 8: Make Your Code Easily Citable, and Cite Source Code!
- Rule 9: Promote and Discuss Your Projects: Web Page and More
- Rule 10: Use GitHub to Be Social: Follow and Watch
[Ram, 2013]
This paper supplies these 8 use cases for Git in science:
- Lab notebook
- Facilitating collaboration
- Backup and failsafe against data loss
- Freedom to explore new ideas and methods
- Mechanism to solicit feedback and reviews
- Increase transparency and verifiability
- Managing large data
- Lowering barriers to reuse
[Serban et al., 2020]
This article shows the importance of each practice and how widely it is adopted, in the context of a machine learning project:

Note that the number 28 appears twice in the figure, while 29 is absent. I assume that the 28 on the right, with an orange circle and an importance of 0.2, should have been 29, as the 28 with a green triangle should indeed be a green triangle. This has been clearly annotated :-)
These are the top 10 most important practices, after which I show the full table:
| n | Title |
|---|---|
| 25 | Log Production Predictions with the Model's Version and Input Data |
| 27 | Work Against a Shared Backlog |
| 21 | Continuously Monitor the Behaviour of Deployed Models |
| 18 | Use Continuous Integration |
| 20 | Automate Model Deployment |
| 16 | Use Versioning for Data, Model, Configurations and Training Scripts |
| 26 | Use A Collaborative Development Platform |
| 29 | Enforce Fairness and Privacy |
| 17 | Run Automated Regression Tests |
| 12 | Enable Parallel Training Experiments |
Here is the full table:
| n | Title |
|---|---|
| 1 | Use Sanity Checks for All External Data Sources |
| 2 | Check that Input Data is Complete, Balanced and Well Distributed |
| 3 | Write Reusable Scripts for Data Cleaning and Merging |
| 4 | Ensure Data Labelling is Performed in a Strictly Controlled Process |
| 5 | Make Data Sets Available on Shared Infrastructure (private or public) |
| 6 | Share a Clearly Defined Training Objective within the Team |
| 7 | Capture the Training Objective in a Metric that is Easy to Measure and Understand |
| 8 | Test all Feature Extraction Code |
| 9 | Assign an Owner to Each Feature and Document its Rationale |
| 10 | Actively Remove or Archive Features That are Not Used |
| 11 | Peer Review Training Scripts |
| 12 | Enable Parallel Training Experiments |
| 13 | Automate Hyper-Parameter Optimisation and Model Selection |
| 14 | Continuously Measure Model Quality and Performance |
| 15 | Share Status and Outcomes of Experiments Within the Team |
| 16 | Use Versioning for Data, Model, Configurations and Training Scripts |
| 17 | Run Automated Regression Tests |
| 18 | Use Continuous Integration |
| 19 | Use Static Analysis to Check Code Quality |
| 20 | Automate Model Deployment |
| 21 | Continuously Monitor the Behaviour of Deployed Models |
| 22 | Enable Shadow Deployment |
| 23 | Perform Checks to Detect Skews between Models |
| 24 | Enable Automatic Roll Backs for Production Models |
| 25 | Log Production Predictions with the Model's Version and Input Data |
| 26 | Use A Collaborative Development Platform |
| 27 | Work Against a Shared Backlog |
| 28 | Communicate, Align, and Collaborate With Multidisciplinary Team Members |
| 29 | Enforce Fairness and Privacy |

I used the Doc2Lang image-to-table converter to convert the image to a table.
[Stieler and Bauer, 2023]
This paper applies [Serban et al., 2020] to a data-centric AI project called GW4AL.
It is not relevant for this course.
[Stodden and Miguez, 2014]
This paper suggests these best practices for setting up your infrastructure to achieve reproducible research:
- Open licensing should be used for data and code
- Workflow tracking should be carried out during the research process
- Data must be available and accessible
- Code and methods must be available and accessible
- All 3rd party data and software should be cited
[Visser et al., 2016]
This closed-access book has the following table of contents:
- Derive Metrics from Your Measurement Goals
- Make Definition of Done Explicit
- Control Code Versions and Development Branches
- Control Development, Test, Acceptance, and Production Environments
- Automate Tests
- Use Continuous Integration
- Automate Deployment
- Standardize the Development Environment
- Manage Usage of Third-Party Code
- Document Just Enough
[Wilson et al., 2014]
This paper summarizes best practices. Here is a (slightly adapted) version of box 1 from that paper:

| n | Theme | Recommendation |
|---|---|---|
| 1 | Write programs for people, not computers | A program should not require its readers to hold more than a handful of facts in memory at once. |
| . | . | Make names consistent, distinctive, and meaningful. |
| . | . | Make code style and formatting consistent. |
| 2 | Let the computer do the work | Make the computer repeat tasks. |
| . | . | Save recent commands in a file for re-use. |
| . | . | Use a build tool to automate workflows. |
| 3 | Make incremental changes | Work in small steps with frequent feedback and course correction. |
| . | . | Use a version control system. |
| . | . | Put everything that has been created manually in version control. |
| 4 | Don't repeat yourself (or others) | Every piece of data must have a single authoritative representation in the system. |
| . | . | Modularize code rather than copying and pasting. |
| . | . | Re-use code instead of rewriting it. |
| 5 | Plan for mistakes | Add assertions to programs to check their operation. |
| . | . | Use an off-the-shelf unit testing library. |
| . | . | Turn bugs into test cases. |
| . | . | Use a symbolic debugger. |
| 6 | Optimize software only after it works correctly | Use a profiler to identify bottlenecks. |
| . | . | Write code in the highest-level language possible. |
| 7 | Document design and purpose, not mechanics | Document interfaces and reasons, not implementations. |
| . | . | Refactor code in preference to explaining how it works. |
| . | . | Embed the documentation for a piece of software in that software. |
| 8 | Collaborate | Use pre-merge code reviews. |
| . | . | Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems. |
| . | . | Use an issue tracking tool. |
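
The recommendations under theme 5 ("Plan for mistakes") can be illustrated with a minimal sketch. It assumes Python and its standard `unittest` library as the off-the-shelf testing library. The function `mean` and the bug it guards against are hypothetical examples, not taken from the paper.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Assertion that checks an assumption the function relies on.
    assert len(values) > 0, "mean() needs at least one value"
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_is_rejected(self):
        # Hypothetical bug turned into a test case: suppose an empty
        # list once caused an unexplained ZeroDivisionError.
        with self.assertRaises(AssertionError):
            mean([])
```

Saved to a file, the tests could be run with `python -m unittest <filename>`. Note that Python skips `assert` statements when started with the `-O` flag, so assertions complement input validation rather than replace it.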
[Wilson et al., 2017]
This paper summarizes best practices that are good enough. Here is a (slightly adapted) version of box 1 from that paper:

| n | Theme | Recommendation |
|---|---|---|
| 1 | Data management | Save the raw data. |
| . | . | Ensure that raw data are backed up in more than one location. |
| . | . | Create the data you wish to see in the world. |
| . | . | Create analysis-friendly data. |
| . | . | Record all the steps used to process data. |
| . | . | Anticipate the need to use multiple tables, and use a unique identifier for every record. |
| . | . | Submit data to a reputable DOI-issuing repository so that others can access and cite it. |
| 2 | Software | Place a brief explanatory comment at the start of every program. |
| . | . | Decompose programs into functions. |
| . | . | Be ruthless about eliminating duplication. |
| . | . | Always search for well-maintained software libraries that do what you need. |
| . | . | Test libraries before relying on them. |
| . | . | Give functions and variables meaningful names. |
| . | . | Make dependencies and requirements explicit. |
| . | . | Do not comment and uncomment sections of code to control a program's behavior. |
| . | . | Provide a simple example or test data set. |
| . | . | Submit code to a reputable DOI-issuing repository. |
| 3 | Collaboration | Create an overview of your project. |
| . | . | Create a shared "to-do" list for the project. |
| . | . | Decide on communication strategies. |
| . | . | Make the license explicit. |
| . | . | Make the project citable. |
| 4 | Project organization | Put each project in its own directory, which is named after the project. |
| . | . | Put text documents associated with the project in the doc directory. |
| . | . | Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory. |
| . | . | Put project source code in the src directory. |
| . | . | Put external scripts or compiled programs in the bin directory. |
| . | . | Name all files to reflect their content or function. |
| 5 | Keeping track of changes | Back up (almost) everything created by a human being as soon as it is created. |
| . | . | Keep changes small. |
| . | . | Share changes frequently. |
| . | . | Create, maintain, and use a checklist for saving and sharing changes to the project. |
| . | . | Store each project in a folder that is mirrored off the researcher's working machine. |
| . | . | Add a file called CHANGELOG.txt to the project's docs subfolder. |
| . | . | Copy the entire project whenever a significant change has been made. |
| . | . | Use a version control system. |
| 6 | Manuscripts | Write manuscripts using online tools with rich formatting, change tracking, and reference management. |
| . | . | Write the manuscript in a plain text format that permits version control. |
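
The directory layout from theme 4 ("Project organization") could be set up with a short script. This is a minimal sketch, assuming Python. The function name `create_project_skeleton` and the project name in the usage note are hypothetical; only the subdirectory names come from the table above.

```python
from pathlib import Path

# Subdirectories named under the "Project organization" theme.
SUBDIRECTORIES = ("doc", "data", "results", "src", "bin")


def create_project_skeleton(parent, project_name):
    """Create a project directory, named after the project, with its subdirectories."""
    project_dir = Path(parent) / project_name
    for name in SUBDIRECTORIES:
        # exist_ok=True makes the function safe to run more than once.
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, `create_project_skeleton(".", "my_project")` would create `my_project/doc`, `my_project/data`, `my_project/results`, `my_project/src` and `my_project/bin` in the current directory.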
References
- [Chacon and Straub, 2014] Chacon, Scott, and Ben Straub. Pro Git. Springer Nature, 2014.
- [Jiménez et al., 2017] Jiménez, Rafael C., et al. "Four simple recommendations to encourage best practices in research software." F1000Research 6 (2017): ELIXIR-876.
- [Kroll et al., 2013] Kroll, Josiane, et al. "A systematic literature review of best practices and challenges in follow-the-sun software development." 2013 IEEE 8th International Conference on Global Software Engineering Workshops. IEEE, 2013.
- [Ordoñez-Pacheco et al., 2021] Ordoñez-Pacheco, Rodrigo, Karen Cortes-Verdin, and Jorge Octavio Ocharán-Hernández. "Best practices for software development: A systematic literature review." International Conference on Software Process Improvement. Springer, Cham, 2021. Note: this paper does not exist. It is not part of the book 'Advances in Intelligent Informatics', volume 320, ISBN 978-3-319-11217-6.
- [Pastrana et al., 2025] Pastrana, Manuel, et al. "Best Practices Evidenced for Software Development Based on DevOps and Scrum: A Literature Review." Applied Sciences 15.10 (2025): 5421.
- [Perez-Riverol et al., 2016] Perez-Riverol, Yasset, et al. "Ten simple rules for taking advantage of Git and GitHub." PLoS computational biology 12.7 (2016): e1004947.
- [Ram, 2013] Ram, Karthik. "Git can facilitate greater reproducibility and increased transparency in science." Source code for biology and medicine 8.1 (2013): 7.
- [Serban et al., 2020] Serban, Alex, et al. "Adoption and effects of software engineering best practices in machine learning." Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2020.
- [Sravani et al., 2023] Sravani, Diyyala, et al. "Python security in devOps: Best practices for secure coding, configuration management, and continuous testing and monitoring." 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, 2023.
- [Stieler and Bauer, 2023] Stieler, Fabian, and Bernhard Bauer. "Git workflow for active learning - a development methodology proposal for data-centric AI projects." (2023).
- [Stodden and Miguez, 2014] Stodden, Victoria, and Sheila Miguez. "Best practices for computational science: Software infrastructure and environments for reproducible and extensible research." (2014).
- [Visser et al., 2016] Visser, Joost, et al. Building software teams: Ten best practices for effective software development. O'Reilly Media, Inc., 2016.
- [Wilson et al., 2014] Wilson, Greg, et al. "Best practices for scientific computing." PLoS biology 12.1 (2014): e1001745.
- [Wilson et al., 2017] Wilson, Greg, et al. "Good enough practices in scientific computing." PLoS computational biology 13.6 (2017): e1005510.