More about sharing

Open science

The Open Science movement encourages researchers to share research output beyond the contents of a published academic article (and possibly supplementary information).

Research comic
  • Open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared under defined terms and conditions.

Arguments in favor (from Wikipedia):

  • Open access publication of research reports and data allows for rigorous peer-review

  • Science is publicly funded so all results of the research should be publicly available

  • Open Science will make science more reproducible and transparent

  • Open Science has more impact

  • Open Science will help answer uniquely complex questions

Arguments against (from Wikipedia):

  • Too much unsorted information overwhelms scientists

  • Potential misuse

  • The public will misunderstand science data

  • Increasing the scale of science will make verification of any discovery more difficult

  • Low-quality science

_images/36-data-research-cycle.jpg

(This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. The image was obtained from https://zenodo.org/record/3332808.)

Read more

FAIR

“FAIR” is the current buzzword for data management. You may be asked about it in, for example, making data management plans for grants:

  • Findable

    • Will anyone else know that your data exists?

    • Solutions: put it in a standard repository, or at least a description of the data. Get a digital object identifier (DOI).

  • Accessible

    • Once someone knows that the data exists, can they get it?

    • Usually solved by being in a repository, but for non-open data, may require more procedures.

  • Inter-operable

    • Is your data in a format that can be used by others, like csv instead of PDF?

    • Or better than csv. Example: 5-star open data

  • Reusable

    • Is there a license allowing others to re-use?

Social coding

  • Our work depends on outputs from others. Research of others depends on our outputs.

  • Whether you can share your output depends on how you obtained your input.

  • A repository that is private today might become public one day.

  • Sometimes “OTHERS” are you yourself in the future in a different group/job.

  • Software licenses matter. And this is what we will discuss in the next episode.

Licensing

For more details

Derivative work: Changing, remixing, covering

Images of Bob Marley and Mona Lisa made out of Rubik cubes

Images of Bob Marley and Mona Lisa made out of Rubik cubes. Is this derivative work? “Distillery District 26” CC-BY.

If you build on something, you form a derivative work

  • The original creator may have rights to what you make

  • The whole point of this talk is to make sure that you can make and publish derivative works and others can make and publish derivative works from you

  • You cannot ignore licensing: default is “no one can make copies or derivative works”.

  • License your code very early in the project: ideally develop publicly accessible open source code from day one.

  • Start with a README.md and a LICENSE file.

  • Use GitHub recommendation or/and https://choosealicense.com/.

Licensing and ownership

Who can decide about or change a license?

  • The copyright holder if a separate “Contributor License Agreement” is signed. Otherwise copyright holder provided they secure express consent from all the contributors.

Who owns the copyright for software you write?

  • Intellectual property depends on the country and the employer!

  • So-called works made for hire.

If you own your software:

  • You can change the license.

  • You can dual-license (e.g. GPL for anyone, but you can pay for commercial non-GPL).

If you do not own your software, you can:

  • Request open-sourcing directly (preserves your rights!).

  • Request a transfer of ownership (check with your university).

If you accept contributions (pull requests), you may not be the only owner anymore!

  • Clarify licensing strategy before - otherwise you won’t have all rights to your code.

What is free software?

Software freedom is the freedom to …

  • … run the software for any purpose: new applications

  • study how the software works and to adapt it to your needs: new applications, less reinventing wheels

  • redistribute copies of the software: more users, more citations

  • improve the software and distribute your improvements to the public: fix bugs, new science

Typical confusion

  • Free software does not mean that software is for free

  • Open source license does not mean you need to share everything immediately (share main branch, put unpublished code on a fork)

  • Open source does not mean public domain: software in the public domain has no owner

  • Open source does not mean non-commercial: plenty of companies produce and support it

Taxonomy of software licenses

  • Copyleft is the legal technique of granting certain freedoms over copies of copyrighted works with the requirement that the same rights be preserved in derivative works.

1. Custom/closed/proprietary

  • Derivative work typically not possible

2. Permissive (MIT, BSD, Apache)

  • Derivative work does not have to be shared

  • Permissive: gives the public permission to use, modify, and share, without any condition for downstream licensing

  • If the licenses of components are permissive, one may use any open license they want.

Permissions, conditions, and limitations of the MIT license

Example: Permissions, conditions, and limitations of the MIT license. Unchanged from https://choosealicense.com/.

3. Weak copyleft share-alike (LGPL, MPL)

  • Derivative work is free software but is limited to the component

Permissions, conditions, and limitations of the LGPL license

Example: Permissions, conditions, and limitations of the LGPL license. Unchanged from https://choosealicense.com/.

4. Strong copyleft share-alike (GPL, AGPL)

  • Derivative work is free software and derivative work extends to the combined project

  • If the licenses of components are strong copyleft, one must use the same license

Permissions, conditions, and limitations of the GPL license

Example: Permissions, conditions, and limitations of the GPL license. Unchanged from https://choosealicense.com/.

Creative Commons licenses

Creative Commons
  • Creative Commons licenses give everyone a standardized way to grant the public permission to use their creative work under copyright law.

  • From the reuser’s perspective, the presence of a Creative Commons license on a copyrighted work answers the question, “What can I do with this work?” https://creativecommons.org/about/cclicenses/

  • More often used for images and text.

  • For software, Creative Commons includes three free licenses created by other institutions: the BSD License, the GNU LGPL, and the GNU GPL. See above.

4 rights:

CC-BY
  • BY – Attribution - Credit must be given to the creator

    • User may copy, distribute, display, perform and make derivative works and remixes based on it only if they give the author or licensor the credits

CC-SA
  • SA - Share alike - Allows remix culture

    • distribute derivative works only under a license identical to (“not more restrictive than”) the license that governs the original work.(Share alike)

CC-NC
  • NC - Non-commercial

    • Licensees may copy, distribute, display, perform the work and make derivative works and remixes based on it only for non-commercial purposes.

CC-ND
  • ND - No derivative work

    • Licensees may copy, distribute, display and perform only verbatim copies of the work, not derivative works and remixes based on it. Since version 4.0, derivative works are allowed but must not be shared.

CC0
  • CC0 (aka CC Zero) - No attribution required

    • not recommended to release software into the public domain because it lacks a patent grant.

If you would like to learn more about licenses, check out our slide deck “Software licensing and open source explained with cakes”.

Note

Also covered on the third day

Software Citation

  • Think the same as for a scientific paper

Our practical recommendations:

This is an example of a simple CITATION.cff file:

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/0000-0003-4925-7248
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11

Recommended format for software citation is to ensure the following information is provided as part of the reference Katz, Chue Hong, Clark, 2021:

  • Creator

  • Title

  • Publication venue

  • Date

  • Identifier

  • Version

  • Type

  • Digital object identifiers (DOI) are the backbone of the academic reference and metrics system.

  • CodeRefinery has an exercise to see how to make a GitHub repository citable by archiving it on the Zenodo archiving service. If you are interested, have a look here

Keypoints

  • Share your code! Eventually others will probably use it anyway.

  • Licence your software and do it early. Default is “no one can make copies or derivative works”.

  • Get DOI or at least state how to cite your software