Literature

These are books and academic papers that have influenced the course.

Reference PDF One-line summary [1]
[Chacon and Straub, 2014] PDF The book about git
[Jiménez et al., 2017] PDF Best practices in research software
[Kroll et al., 2013] PDF Best practices in software development
[Pastrana et al., 2025] PDF Literature review of best practices in Scrum and DevOps
[Perez-Riverol et al., 2016] PDF Recommendations on git and GitHub
[Ram, 2013] PDF Effect of git on reproducibility
[Serban et al., 2020] PDF Recommends software engineering best practices for machine learning projects
[Stieler and Bauer, 2023] PDF Applies [Serban et al., 2020] to rate whether a project follows the recommended practices
[Stodden and Miguez, 2014] PDF Best practices for reproducible research
[Visser et al., 2016] None Ten best practices for effective software development
[Wilson et al., 2014] PDF Best practices for scientific computing
[Wilson et al., 2017] PDF Good enough practices for scientific computing
  • [1] You can find more extensive summaries below

Summaries

These are more extensive summaries of the works above.

[Jiménez et al., 2017]

These are the 4 recommendations:

  • Make source code publicly accessible from day one
  • Make software easy to discover by providing software metadata via a popular community registry
  • Adopt a licence and comply with the licence of third-party dependencies
  • Define clear and transparent contribution, governance and communication processes
[Kroll et al., 2013]

This paper is a literature review of best practices in follow-the-sun software development. The table below shows how many papers (n) recommend each practice.

n Best practice
6 Agile methods
6 Use of technology for knowledge sharing
3 Process documentation
3 Use of an FTP Server (or data repository) to exchange code and documents
3 Time window
[Pastrana et al., 2025]

This is a literature review paper on Scrum and DevOps.

Box 11 shows the benefits of Scrum and DevOps practices. Here is an adapted version of box 11:

Benefits Improvement Observed
Scrum adoption Actively involved stakeholders
. Transparent communication channels
. Increased team collaboration
. Improved predictability
. Creation of a collaborative culture
. Continuous improvement
. Constant quality measurement or concurrent testing
DevOps adoption Early and continuous feedback
. Productivity increased by 20%
. Deployment time decreased by 30%
Faster release cycles Time to market decreased by 25%
. Incident resolution time decreased by 40%
. Quality deliverable
. Early and continuous feedback
Continuous integration Quality deliverable
. Time to market decreased by 25%
. Incident resolution time decreased by 40%
. Transparent communication channels
Automated testing Test execution speed increased by 35%
. Defect detection increased by 18%
Security automation Security vulnerabilities decreased by 30%
. Response time decreased by 50%
Agile transformation Development cycle time decreased by 25%
. Project success rates increased by 18%

I removed the entry attributed to [Sravani et al., 2023] ([117] in the paper), as that paper does not supply these numbers at all.

I used the Doc2Lang image-to-table converter to convert the image to a table.

[Perez-Riverol et al., 2016]

This paper shares 10 simple rules for taking advantage of Git and GitHub:

  • Rule 1: Use GitHub to Track Your Projects
  • Rule 2: GitHub for Single Users, Teams, and Organizations
  • Rule 3: Developing and Collaborating on New Features: Branching and Forking
  • Rule 4: Naming Branches and Commits: Tags and Semantic Versions
  • Rule 5: Let GitHub Do Some Tasks for You: Integrate
  • Rule 6: Let GitHub Do More Tasks for You: Automate
  • Rule 7: Use GitHub to Openly and Collaboratively Discuss, Address, and Close Issues
  • Rule 8: Make Your Code Easily Citable, and Cite Source Code!
  • Rule 9: Promote and Discuss Your Projects: Web Page and More
  • Rule 10: Use GitHub to Be Social: Follow and Watch
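Rule 4 recommends tagging releases with semantic versions (e.g. `v1.2.3`). Such tags can be compared programmatically; below is a minimal Python sketch (the `parse_semver` helper is my own illustration, not from the paper):

```python
def parse_semver(tag: str) -> tuple[int, int, int]:
    """Turn a tag like 'v1.2.3' into a comparable (major, minor, patch) tuple."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    return (major, minor, patch)

# Tuples compare element-wise, so the newest release sorts last. Note that a
# plain string comparison would wrongly put "v1.10.0" before "v1.2.3".
tags = ["v1.10.0", "v1.2.3", "v2.0.0"]
latest = max(tags, key=parse_semver)
print(latest)  # → v2.0.0
```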
[Ram, 2013]

This paper describes these 8 use cases for Git in science:

  • Lab notebook
  • Facilitating collaboration
  • Backup and failsafe against data loss
  • Freedom to explore new ideas and methods
  • Mechanism to solicit feedback and reviews
  • Increase transparency and verifiability
  • Managing large data
  • Lowering barriers to reuse
[Serban et al., 2020]

This article shows the importance of each practice and how widely it is adopted, in the context of a machine learning project:

Serban et al., 2020, figure 3, annotation mine

Note that the figure contains two points labelled 28, while 29 is absent. I assume that the 28 on the right, with an orange circle and an importance of 0.2, should be 29: the other 28 is plotted with a green triangle, which is indeed the correct marker for 28. This has been clearly annotated :-)

These are the top 10 most important practices, after which I show the full table:

n Title
25 Log Production Predictions with the Model's Version and Input Data
27 Work Against a Shared Backlog
21 Continuously Monitor the Behaviour of Deployed Models
18 Use Continuous Integration
20 Automate Model Deployment
16 Use Versioning for Data, Model, Configurations and Training Scripts
26 Use A Collaborative Development Platform
29 Enforce Fairness and Privacy
17 Run Automated Regression Tests
12 Enable Parallel Training Experiments

Here is the full table:

n Title
1 Use Sanity Checks for All External Data Sources
2 Check that Input Data is Complete, Balanced and Well Distributed
3 Write Reusable Scripts for Data Cleaning and Merging
4 Ensure Data Labelling is Performed in a Strictly Controlled Process
5 Make Data Sets Available on Shared Infrastructure (private or public)
6 Share a Clearly Defined Training Objective within the Team
7 Capture the Training Objective in a Metric that is Easy to Measure and Understand
8 Test all Feature Extraction Code
9 Assign an Owner to Each Feature and Document its Rationale
10 Actively Remove or Archive Features That are Not Used
11 Peer Review Training Scripts
12 Enable Parallel Training Experiments
13 Automate Hyper-Parameter Optimisation and Model Selection
14 Continuously Measure Model Quality and Performance
15 Share Status and Outcomes of Experiments Within the Team
16 Use Versioning for Data, Model, Configurations and Training Scripts
17 Run Automated Regression Tests
18 Use Continuous Integration
19 Use Static Analysis to Check Code Quality
20 Automate Model Deployment
21 Continuously Monitor the Behaviour of Deployed Models
22 Enable Shadow Deployment
23 Perform Checks to Detect Skews between Models
24 Enable Automatic Roll Backs for Production Models
25 Log Production Predictions with the Model's Version and Input Data
26 Use A Collaborative Development Platform
27 Work Against a Shared Backlog
28 Communicate, Align, and Collaborate With Multidisciplinary Team Members
29 Enforce Fairness and Privacy

I used the Doc2Lang image-to-table converter to convert the image to a table.
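Practices 7, 14 and 17 combine naturally: capture the training objective in one easy-to-measure metric and guard it with an automated regression test. A minimal Python sketch, where the `accuracy` helper, the hard-coded data and the baseline threshold are my own illustrations:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels: one simple, shared metric."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def test_model_meets_baseline():
    # In a real project these would come from the current model and a
    # held-out evaluation set, not hard-coded values.
    predictions = [1, 0, 1, 1, 0, 1, 0, 0]
    labels      = [1, 0, 1, 0, 0, 1, 0, 1]
    baseline = 0.70  # metric recorded for the previously released model
    assert accuracy(predictions, labels) >= baseline

test_model_meets_baseline()
```

Run in continuous integration, such a test fails the build whenever a change degrades the model below the recorded baseline.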

[Stieler and Bauer, 2023]

This paper applies [Serban et al., 2020] to a data-centric AI project called GW4AL. It is not directly relevant for us.

[Stodden and Miguez, 2014]

This paper suggests these best practices for how to set up your infrastructure to achieve reproducible research:

  • Open licensing should be used for data and code
  • Workflow tracking should be carried out during the research process
  • Data must be available and accessible
  • Code and methods must be available and accessible
  • All third-party data and software should be cited
[Visser et al., 2016]

This closed-access paper has the following table of contents:

  • Derive Metrics from Your Measurement Goals
  • Make Definition of Done Explicit
  • Control Code Versions and Development Branches
  • Control Development, Test, Acceptance, and Production Environments
  • Automate Tests
  • Use Continuous Integration
  • Automate Deployment
  • Standardize the Development Environment
  • Manage Usage of Third-Party Code
  • Document Just Enough
[Wilson et al., 2014]

This paper summarizes best practices. Here is (a slightly adapted) box 1 from that paper:

n Theme Recommendation
1 Write programs for people, not computers A program should not require its readers to hold more than a handful of facts in memory at once.
. . Make names consistent, distinctive, and meaningful.
. . Make code style and formatting consistent.
2 Let the computer do the work Make the computer repeat tasks.
. . Save recent commands in a file for re-use.
. . Use a build tool to automate workflows.
3 Make incremental changes Work in small steps with frequent feedback and course correction.
. . Use a version control system.
. . Put everything that has been created manually in version control.
4 Don't repeat yourself (or others) Every piece of data must have a single authoritative representation in the system.
. . Modularize code rather than copying and pasting.
. . Re-use code instead of rewriting it.
5 Plan for mistakes Add assertions to programs to check their operation.
. . Use an off-the-shelf unit testing library.
. . Turn bugs into test cases.
. . Use a symbolic debugger.
6 Optimize software only after it works correctly Use a profiler to identify bottlenecks.
. . Write code in the highest-level language possible.
7 Document design and purpose, not mechanics Document interfaces and reasons, not implementations.
. . Refactor code in preference to explaining how it works.
. . Embed the documentation for a piece of software in that software.
8 Collaborate Use pre-merge code reviews.
. . Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
. . Use an issue tracking tool.
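Theme 5 ("Plan for mistakes") can be illustrated in a few lines of Python: an assertion guards an assumption inside the code, and a bug found earlier becomes a regression test. The `mean` function and its bug history are my own illustrations, not from the paper:

```python
def mean(values):
    # Assertion per theme 5: make the function's assumption explicit.
    assert len(values) > 0, "mean() of an empty sequence is undefined"
    return sum(values) / len(values)

# "Turn bugs into test cases": suppose mean() once misbehaved on a
# one-element list; keep that case as a test so the bug cannot return.
def test_mean_single_element():
    assert mean([4.0]) == 4.0

def test_mean_rejects_empty_input():
    try:
        mean([])
    except AssertionError:
        pass  # expected: the assertion caught the invalid input
    else:
        raise RuntimeError("expected an AssertionError for empty input")

test_mean_single_element()
test_mean_rejects_empty_input()
```

With an off-the-shelf unit testing library such as pytest, the two `test_*` functions would be discovered and run automatically.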
[Wilson et al., 2017]

This paper summarizes best practices that are good enough. Here is (a slightly adapted) box 1 from that paper:

n Theme Recommendation
1 Data management Save the raw data.
. . Ensure that raw data are backed up in more than one location.
. . Create the data you wish to see in the world.
. . Create analysis-friendly data.
. . Record all the steps used to process data.
. . Anticipate the need to use multiple tables, and use a unique identifier for every record.
. . Submit data to a reputable DOI-issuing repository so that others can access and cite it.
2 Software Place a brief explanatory comment at the start of every program.
. . Decompose programs into functions.
. . Be ruthless about eliminating duplication.
. . Always search for well-maintained software libraries that do what you need.
. . Test libraries before relying on them.
. . Give functions and variables meaningful names.
. . Make dependencies and requirements explicit.
. . Do not comment and uncomment sections of code to control a program's behavior.
. . Provide a simple example or test data set.
. . Submit code to a reputable DOI-issuing repository.
3 Collaboration Create an overview of your project.
. . Create a shared "to-do" list for the project.
. . Decide on communication strategies.
. . Make the license explicit.
. . Make the project citable.
4 Project organization Put each project in its own directory, which is named after the project.
. . Put text documents associated with the project in the doc directory.
. . Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory.
. . Put project source code in the src directory.
. . Put external scripts or compiled programs in the bin directory.
. . Name all files to reflect their content or function.
5 Keeping track of changes Back up (almost) everything created by a human being as soon as it is created.
. . Keep changes small.
. . Share changes frequently.
. . Create, maintain, and use a checklist for saving and sharing changes to the project.
. . Store each project in a folder that is mirrored off the researcher's working machine.
. . Add a file called CHANGELOG.txt to the project's docs subfolder.
. . Copy the entire project whenever a significant change has been made.
. . Use a version control system.
6 Manuscripts Write manuscripts using online tools with rich formatting, change tracking, and reference management.
. . Write the manuscript in a plain text format that permits version control.
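The project layout from theme 4 (a `doc`, `data`, `results`, `src` and `bin` directory per project) can be scaffolded in a few lines; a minimal Python sketch, where `scaffold_project` and the temporary project name are my own illustrations:

```python
from pathlib import Path
import tempfile

def scaffold_project(root: Path) -> None:
    """Create the directory layout recommended by Wilson et al., 2017."""
    for subdir in ("doc", "data", "results", "src", "bin"):
        (root / subdir).mkdir(parents=True, exist_ok=True)

# Demonstrate on a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    scaffold_project(project)
    print(sorted(p.name for p in project.iterdir()))
    # → ['bin', 'data', 'doc', 'results', 'src']
```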

References

  • [Chacon and Straub, 2014] Chacon, Scott, and Ben Straub. Pro git. Springer Nature, 2014. Book homepage

  • [Jiménez et al., 2017] Jiménez, Rafael C., et al. "Four simple recommendations to encourage best practices in research software." F1000Research 6 (2017): ELIXIR-876. Paper homepage

  • [Kroll et al., 2013] Kroll, Josiane, et al. "A systematic literature review of best practices and challenges in follow-the-sun software development." 2013 IEEE 8th International Conference on Global Software Engineering Workshops. IEEE, 2013. Paper homepage

  • [Ordoñez-Pacheco et al., 2021] Ordoñez-Pacheco, Rodrigo, Karen Cortes-Verdin, and Jorge Octavio Ocharán-Hernández. "Best practices for software development: A systematic literature review." International Conference on Software Process Improvement. Springer, Cham, 2021. Note: this paper appears not to exist; it is not part of the book 'Advances in Intelligent Informatics', volume 320, ISBN 978-3-319-11217-6.

  • [Pastrana et al., 2025] Pastrana, Manuel, et al. "Best Practices Evidenced for Software Development Based on DevOps and Scrum: A Literature Review." Applied Sciences 15.10 (2025): 5421. Paper homepage

  • [Perez-Riverol et al., 2016] Perez-Riverol, Yasset, et al. "Ten simple rules for taking advantage of Git and GitHub." PLoS computational biology 12.7 (2016): e1004947. Paper homepage

  • [Ram, 2013] Ram, Karthik. "Git can facilitate greater reproducibility and increased transparency in science." Source code for biology and medicine 8.1 (2013): 7. Paper homepage

  • [Serban et al., 2020] Serban, Alex, et al. "Adoption and effects of software engineering best practices in machine learning." Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 2020. Paper homepage

  • [Sravani et al., 2023] Sravani, Diyyala, et al. "Python security in devOps: Best practices for secure coding, configuration management, and continuous testing and monitoring." 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, 2023.

  • [Stieler and Bauer, 2023] Stieler, Fabian, and Bernhard Bauer. "Git workflow for active learning-a development methodology proposal for data-centric AI projects." (2023). Paper homepage

  • [Stodden and Miguez, 2014] Stodden, Victoria, and Sheila Miguez. "Best practices for computational science: Software infrastructure and environments for reproducible and extensible research." (2014). Paper homepage

  • [Visser et al., 2016] Visser, Joost, et al. Building software teams: Ten best practices for effective software development. O'Reilly Media, Inc., 2016.

  • [Wilson et al., 2014] Wilson, Greg, et al. "Best practices for scientific computing." PLoS biology 12.1 (2014): e1001745. Paper homepage

  • [Wilson et al., 2017] Wilson, Greg, et al. "Good enough practices in scientific computing." PLoS computational biology 13.6 (2017): e1005510. Paper homepage