Objectives:
“We can do that with MS Office / Google Docs / {other software}, right?”
git
Objectives:
- git on a local machine
- git for the first time
- git: add, commit, clone, push, and pull
git for the first time
Terminal
Open your terminal application, then go to the cloned directory.
You will find the following files listed:
We can check the log history:
Output
myfile.txt
Output
total 28K
drwxr-xr-x 3 lam lam 4.0K Mar 20 08:13 .
drwxr-xr-x 3 lam lam 4.0K Mar 20 07:37 ..
drwxr-xr-x 7 lam lam 4.0K Mar 20 08:10 .git
-rw-r--r-- 1 lam lam 355 Mar 20 07:33 index.html
-rw-r--r-- 1 lam lam 10 Mar 20 08:13 myfile.txt
-rw-r--r-- 1 lam lam 780 Mar 20 07:33 README.md
-rw-r--r-- 1 lam lam 256 Mar 20 07:33 styles.css
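The steps on these slides can be sketched as follows. This is a minimal, offline sketch, not the exact commands used in the lecture. A throwaway local repository stands in for the GitHub repository, and the names upstream and demo-repo and the content of myfile.txt are hypothetical. With a real repository you would clone its GitHub URL instead.

```shell
#!/usr/bin/env bash
set -e

# Work in a throwaway directory (hypothetical layout, not the course repository)
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the GitHub repository: a local repository with one commit
git init -q -b main upstream
git -C upstream -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"

# Clone it, then go to the cloned directory
git clone -q upstream demo-repo
cd demo-repo

# Create a new file that git does not track yet (hypothetical content)
echo "lorem ipsum" > myfile.txt

# List all files, including hidden ones such as .git, with human-readable sizes
ls -lah

# Short status: "??" marks untracked files
git status --short
```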
We can track myfile.txt in our repository, then recheck the status
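Tracking a file means adding it to the staging area with git add. The following is a sketch in a throwaway repository. The file content is hypothetical.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository containing one untracked file (hypothetical setup)
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
echo "lorem ipsum" > myfile.txt

git status --short   # ?? myfile.txt  (untracked)

# Stage the file so git starts tracking it
git add myfile.txt

git status --short   # A  myfile.txt  (new file in the staging area)
```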
We need to commit for the staged changes to be recorded in the history
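A commit records the staged snapshot together with a message. The following sketch reuses the commit message visible in the log below. The author name and e-mail are placeholders that git needs in order to create a commit.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main

# Author identity stored in this repository only (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"

echo "lorem ipsum" > myfile.txt
git add myfile.txt

# Record the staged snapshot with a descriptive message
git commit -q -m "Add a new file to describe lorem ipsum"

git status --short   # prints nothing: the working tree is clean
git log --oneline    # <short hash> Add a new file to describe lorem ipsum
```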
Checking the log
$ git log --all --decorate --oneline --graph
* 67802c8 (HEAD -> main) Add a new file to describe lorem ipsum
| * f439fc5 (origin/change-the-title) Update README.md
|/
| * 5806070 (origin/test-branch) Create test.md
|/
* d0dd1f6 (origin/main, origin/HEAD) Pointing to the guide for forking
* bb4cc8d Create styles.css and updated README
* a30c19e Created index page for future collaborative edits
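In the graph above, each * is a commit and each |/ marks where a branch forked off. The following sketch reproduces that shape locally with one side branch. The branch name test-branch is borrowed from the log above. The commits are empty placeholders.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Your Name"
git config user.email "you@example.com"

# Shared history on main
git commit -q --allow-empty -m "First commit"

# A side branch with one commit of its own
git checkout -q -b test-branch
git commit -q --allow-empty -m "Work on the side branch"

# Back on main, add a commit the side branch does not have
git checkout -q main
git commit -q --allow-empty -m "Work on main"

# Both tips descend from "First commit", so the graph shows a "|/" junction
git log --all --decorate --oneline --graph
```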
git and GitHub
10 rules by Perez-Riverol et al. (2016)
Use GitHub to track your project
The backbone of GitHub is the distributed version control system Git. Every change, from fixing a typo to a complete redesign of the software, is tracked and uniquely identified. Although Git has a complex set of commands and can be used for rather complex operations, learning to apply the basics requires only a handful of new concepts and commands and will provide a solid ground to efficiently track code and related content for research projects.
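To illustrate "tracked and uniquely identified", the following sketch records two commits on a hypothetical notes.txt. The second commit fixes a typo. Each commit gets its own hash, and the exact change can be shown or undone later.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Your Name"
git config user.email "you@example.com"

# First version of the file, with a typo
echo "Helo" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Even a one-letter fix becomes its own commit
echo "Hello" > notes.txt
git commit -q -am "Fix typo in notes"

# Every commit is identified by a unique hash
git log --format="%h %s"

# Show exactly what the typo fix changed relative to its parent
git diff HEAD~1 HEAD -- notes.txt
```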
Manage permission for repository access
Public projects on GitHub are visible to everyone, but write permission, i.e., the ability to directly modify the content of a repository, needs to be granted explicitly. As a repository owner, you can grant this right to other GitHub users. In addition to being owned by users, repositories can also be created and managed as part of teams and organizations.
Create SOP for branching and forking
Anyone with a GitHub account can fork any repository they have access to. This creates a complete copy of the repository's content while retaining a link to the original “upstream” version. Anyone can then develop and test new features on the existing code, and contribute new features, bug fixes, and documentation improvements back to the original upstream repository by opening a pull request, thereby becoming a contributor.
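The local side of a fork-and-branch workflow can be sketched offline. In this sketch, two local bare repositories play the roles of the original project and your fork; on GitHub the fork is created with the Fork button. The repository names and the branch name fix-typo are hypothetical.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for GitHub: upstream.git is the original project,
# fork.git is your fork of it
git init -q -b main seed
git -C seed -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git

# Clone your fork and keep a link to the original "upstream" repository
git clone -q "$workdir/fork.git" work
cd work
git config user.name "Your Name"
git config user.email "you@example.com"
git remote add upstream "$workdir/upstream.git"
git fetch -q upstream

# Develop on a feature branch rather than on main
git checkout -q -b fix-typo
echo "Hello" > notes.txt
git add notes.txt
git commit -q -m "Fix typo in notes"

# Publish the branch to your fork; the pull request itself is opened on GitHub
git push -q origin fix-typo

# origin points to your fork, upstream to the original project
git remote -v
```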
Use tags and semantic versions
Tags can be used to label versions during the development process. Version numbering should follow the “semantic versioning” practice, using the format X.Y.Z, where X is the major, Y the minor, and Z the patch version of the release, optionally extended with meta information. Correct labeling lets developers and users easily recover older versions, compare them, or simply use them to reproduce results described in publications.
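The following sketch tags two releases with semantic version numbers, then compares and recovers them. The file analysis.R and the version numbers are hypothetical.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Your Name"
git config user.email "you@example.com"

echo "v1" > analysis.R
git add analysis.R
git commit -q -m "First release of the analysis"
# Annotated tag in MAJOR.MINOR.PATCH form
git tag -a v1.0.0 -m "Version 1.0.0"

echo "v1, bug fixed" > analysis.R
git commit -q -am "Fix a bug in the analysis"
# A backwards-compatible bug fix bumps only the patch number
git tag -a v1.0.1 -m "Version 1.0.1"

git tag --list               # v1.0.0 and v1.0.1
git diff v1.0.0 v1.0.1       # compare two releases
git show v1.0.0:analysis.R   # recover a file exactly as it was in v1.0.0
```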
Continuous integration and automated code testing
Code testing is necessary to detect bugs introduced by new features or by changes in the code or its dependencies, and to detect wrong results, often called logic errors, where the source code produces a different result than intended. Continuous integration provides a way to automatically and systematically run a series of tests that check the integrity and performance of the code, a task that can be automated through GitHub.
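On GitHub, continuous integration is typically set up with GitHub Actions, which runs workflows defined in YAML files under .github/workflows/. The following sketch writes a minimal workflow that runs on every push and pull request. The test script run_tests.sh is a hypothetical name for whatever runs your tests.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"

# GitHub Actions reads workflow files from this folder
mkdir -p .github/workflows

# Minimal test workflow; run_tests.sh is a hypothetical script in your repository
cat > .github/workflows/tests.yml <<'EOF'
name: tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the test suite
        run: bash run_tests.sh
EOF

cat .github/workflows/tests.yml
```

Committing and pushing this file is enough for GitHub to start running the job. Its result then appears next to each commit and pull request.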
Automate more tasks via webhooks and GitHub actions
More than just code compilation and testing can be integrated into your software project: GitHub hooks can be used to automate numerous tasks that help improve the overall quality of your project. You might consider generating the documentation whenever code or documentation is modified, e.g., by using Quarto for R scripts or Sphinx for Python code.
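The following sketch shows a documentation workflow for the Sphinx case. It rebuilds the HTML docs whenever Python files or the docs/ folder change on main. The docs/ layout, which would need a Sphinx conf.py, the path filters, and the Python version are assumptions.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p .github/workflows

# Rebuild the docs when code or documentation changes on main (assumed layout)
cat > .github/workflows/docs.yml <<'EOF'
name: docs

on:
  push:
    branches: [main]
    paths:
      - "docs/**"
      - "**.py"

jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install sphinx
      - run: sphinx-build -b html docs docs/_build/html
EOF

cat .github/workflows/docs.yml
```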
Openly and collaboratively discuss, address, and close issues
GitHub issues are a great way to keep track of bugs, tasks, feature requests, and enhancements. While classical issue trackers are primarily intended to be used as bug trackers, GitHub issue trackers follow a different philosophy: each tracker has its own section in every repository and can be used to trace bugs, new ideas, and enhancements through a powerful tagging system. The main objective of issues in GitHub is to promote collaboration and to provide context through cross-references.
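Cross-references also work from commits. On GitHub, mentioning #12 in a commit message links the commit to issue 12. A closing keyword such as "Fixes" closes that issue once the commit reaches the default branch. The following sketch uses a hypothetical issue number and commit content.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Your Name"
git config user.email "you@example.com"

echo "Hello" > README.md
git add README.md

# Subject line plus a body line that references hypothetical issue #12
git commit -q -m "Fix greeting in README" -m "Fixes #12"

# Print the full commit message
git log -1 --format=%B
```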
Make your code easily citable
GitHub now integrates with archiving services such as Zenodo and Figshare, enabling DOIs to be assigned to code repositories. By default, Zenodo creates an archive of a repository each time a new release is created in GitHub, ensuring the cited code remains up to date. Once the DOI has been assigned, it can be added to literature information resources such as Europe PubMed Central.
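GitHub can also show citation information: it recognizes a CITATION.cff file (Citation File Format) in the repository root and offers a "Cite this repository" entry built from it. The following sketch writes such a file. All metadata values are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"

# All metadata below is hypothetical; replace it with your project's details
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "my-analysis"
version: 1.0.0
authors:
  - family-names: "Doe"
    given-names: "Jane"
# doi: add the DOI here once the archiving service has assigned one
EOF

cat CITATION.cff
```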
Promote and discuss your project
GitHub Pages are simple websites freely hosted by GitHub. Users can create and host blog websites, help pages, manuals, tutorials, and websites related to specific projects.
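One common way to publish a site is to keep it on a separate branch, conventionally named gh-pages, and select that branch as the source in the repository's Settings > Pages. The following sketch prepares such a branch locally. The page content is hypothetical, and pushing requires a GitHub remote, which this sketch does not have.

```shell
#!/usr/bin/env bash
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main
git config user.name "Your Name"
git config user.email "you@example.com"
echo "# My project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Start a branch with no shared history that holds only the website
git checkout -q --orphan gh-pages
git rm -rf --quiet .   # clear the files carried over from main

cat > index.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>My project</title></head>
  <body><h1>My project</h1></body>
</html>
EOF
git add index.html
git commit -q -m "Add project website"

# With a GitHub remote you would now run: git push origin gh-pages
git ls-tree --name-only gh-pages
```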
Social features in GitHub: follow and watch
In the same way that researchers follow developments in their field, scientific programmers can follow publicly available projects that might benefit their research. GitHub supports this by letting you follow other GitHub users (see also Rule 2) or watch the activity of projects, a common feature in many social media platforms.
Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, Git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, Git provides a powerful way to track and compare versions, retrace errors, explore new approaches in a structured manner, while maintaining a full audit trail. For larger collaborative efforts, Git and Git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, all the while maintaining a complete authorship trail. In this paper I provide an overview of Git along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.
Computational reproducibility is the ability to obtain identical results from the same data with the same computer code. It is a building block for transparent and cumulative science because it enables the originator and other researchers, on other computers and later in time, to reproduce and thus understand how results came about, while avoiding a variety of errors that may lead to erroneous reporting of statistical and computational results. In this tutorial, we demonstrate how the R package repro supports researchers in creating fully computationally reproducible research projects with tools from the software engineering community. Building upon this notion of fully automated reproducibility, we present several applications including the preregistration of research plans with code (Preregistration as Code, PAC). PAC eschews all ambiguity of traditional preregistration and offers several more advantages. Making technical advancements that serve reproducibility more widely accessible for researchers holds the potential to innovate the research process and to help it become more productive, credible, and reliable.
Researchers in ecology and evolutionary biology are increasingly dependent on computational code to conduct research. Hence, the use of efficient methods to share, reproduce, and collaborate on code as well as document research is fundamental. GitHub is an online, cloud-based service that can help researchers track, organize, discuss, share, and collaborate on software and other materials related to research production, including data, code for analyses, and protocols. Despite these benefits, the use of GitHub in ecology and evolution is not widespread. We outline features ranging from low to high technical difficulty, including storing code, managing projects, coding collaboratively, conducting peer review, writing a manuscript, and using automated and continuous integration to streamline analyses. Given that members of a research team may have different technical skills and responsibilities, we describe how the optimal use of GitHub features may vary among members of a research collaboration.
https://namakala.github.io/