Basics of `git`

Objectives:

Be able to install and use git on a local machine
Know how to configure git for the first time
Understand the five most-used git commands: add, commit, clone, push, and pull

What is `git`?

Getting started with `git`

Download `git`

Configure `git` for the first time

Terminal

$ git config --global user.name "Your Name"
$ git config --global user.email any.email.you.use@mail.com

You can check your configuration by issuing:

Terminal

$ git config --list

Which will give you:

Output

user.email=any.email.you.use@mail.com
user.name=Your Name
init.defaultbranch=master
merge.tool=vimdiff
credential.helper=cache
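
To see which value git actually uses, it can help to set a key at repository level and read it back. The sketch below assumes a throwaway repository created with mktemp; the name "Project Alias" and the example.org address are placeholders, not values from this tutorial.

```shell
# Create a throwaway repository so the real global settings stay untouched.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# --local stores the values in this repository's .git/config only.
git config --local user.name "Project Alias"       # placeholder name
git config --local user.email "alias@example.org"  # placeholder address

# --get prints the effective value of one key; local settings override global ones.
effective_name="$(git config --get user.name)"
echo "$effective_name"

# --show-origin prefixes each setting with the file it was read from.
git config --local --list --show-origin
```

A repository-level identity like this is one way to keep a separate name or email for a single project without changing the global configuration.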

Cloning a repository

Terminal

$ git clone https://github.com/octocat/Spoon-Knife

The command above clones a GitHub repository onto your local machine.
You should now have a new folder, Spoon-Knife, in your current directory.
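
git clone also accepts a target folder name as a second argument. The sketch below uses a small local repository as a stand-in for a GitHub URL so that it runs offline; the names seed and my-copy and the demo identity are hypothetical.

```shell
work="$(mktemp -d)"
cd "$work"

# Build a tiny source repository with one empty commit (stand-in for GitHub).
git init --quiet seed
git -C seed -c user.name="Demo User" -c user.email="demo@example.org" \
    commit --quiet --allow-empty -m "Initial commit"

# The second argument picks the folder name instead of the default one.
git clone --quiet seed my-copy
cd my-copy

# A clone remembers where it came from as the remote called "origin".
remote_names="$(git remote)"
commit_count="$(git rev-list --count HEAD)"
echo "remotes: $remote_names, commits: $commit_count"
```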

Checking on the cloned repository

Open your terminal application, then go to the cloned directory

Linux/macOS

Terminal

$ cd /path/to/Spoon-Knife # Change directory
$ ls -lah                 # List all the contents

Windows

CMD

C:\Users\Username>cd /d "C:\path\to\Spoon-Knife" & rem Change directory
C:\path\to\Spoon-Knife>dir /s /b /o:gn & rem List all the contents

You will find the following files listed:

Output

total 24K
drwxr-xr-x 3 lam lam 4.0K Mar 20 07:33 .
drwxr-xr-x 3 lam lam 4.0K Mar 20 07:37 ..
drwxr-xr-x 7 lam lam 4.0K Mar 20 07:33 .git
-rw-r--r-- 1 lam lam  355 Mar 20 07:33 index.html
-rw-r--r-- 1 lam lam  780 Mar 20 07:33 README.md
-rw-r--r-- 1 lam lam  256 Mar 20 07:33 styles.css

We can inspect the commit history:

Terminal

$ git log --oneline --graph --decorate --all

Output

* f439fc5 (origin/change-the-title) Update README.md
| * 5806070 (origin/test-branch) Create test.md
|/
* d0dd1f6 (HEAD -> main, origin/main, origin/HEAD) Pointing to the guide for forking
* bb4cc8d Create styles.css and updated README
* a30c19e Created index page for future collaborative edits

Create a file and make some changes

Let’s make a new file: myfile.txt
We can put some lorem ipsum in our new file
Now, our directory should contain:

Terminal

$ ls -lah

Output

total 28K
drwxr-xr-x 3 lam lam 4.0K Mar 20 08:13 .
drwxr-xr-x 3 lam lam 4.0K Mar 20 07:37 ..
drwxr-xr-x 7 lam lam 4.0K Mar 20 08:10 .git
-rw-r--r-- 1 lam lam  355 Mar 20 07:33 index.html
-rw-r--r-- 1 lam lam   10 Mar 20 08:13 myfile.txt
-rw-r--r-- 1 lam lam  780 Mar 20 07:33 README.md
-rw-r--r-- 1 lam lam  256 Mar 20 07:33 styles.css

Let’s check our repository status

Terminal

$ git status

Output

On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        myfile.txt

nothing added to commit but untracked files present (use "git add" to track)

We can stage myfile.txt so that git starts tracking it, then recheck the status

Terminal

$ git add myfile.txt
$ git status

Output

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   myfile.txt
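
The hint in the output above, git restore --staged, moves a file back out of the staging area without touching its contents. A minimal sketch in a scratch repository follows; the directory and the file contents are placeholders, and the empty first commit is only there because git restore --staged needs an existing commit to compare against.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "any.email.you.use@mail.com"
git commit --quiet --allow-empty -m "Initial commit"

echo "lorem ipsum" > myfile.txt

# Stage the file: the short status shows "A" (added to the index).
git add myfile.txt
staged="$(git status --porcelain)"
echo "$staged"

# Unstage it again: the file stays on disk but is untracked ("??").
git restore --staged myfile.txt
unstaged="$(git status --porcelain)"
echo "$unstaged"
```

Running git add myfile.txt once more returns the file to the staged state that the next step expects.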

We need to commit so that the staged changes are recorded in the history

Terminal

$ git commit myfile.txt -m "Add a new file to describe lorem ipsum"

Output

[main 67802c8] Add a new file to describe lorem ipsum
 1 file changed, 3 insertions(+)
 create mode 100644 myfile.txt

Checking on the log

Terminal

$ git log --all --decorate --oneline --graph

Output

* 67802c8 (HEAD -> main) Add a new file to describe lorem ipsum
| * f439fc5 (origin/change-the-title) Update README.md
|/
| * 5806070 (origin/test-branch) Create test.md
|/
* d0dd1f6 (origin/main, origin/HEAD) Pointing to the guide for forking
* bb4cc8d Create styles.css and updated README
* a30c19e Created index page for future collaborative edits
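
The objectives also list push and pull, which move commits between a local repository and a remote. Pushing to octocat/Spoon-Knife requires write access, so the sketch below assumes a local bare repository as a stand-in for GitHub; the folder names remote.git, laptop, and desktop are hypothetical.

```shell
work="$(mktemp -d)"
cd "$work"

# A bare repository stands in for the repository hosted on GitHub.
git init --quiet --bare remote.git

# "laptop" clone: commit locally, then publish the commit with push.
git clone --quiet remote.git laptop 2>/dev/null   # hides the empty-repository warning
cd laptop
git config user.name "Your Name"
git config user.email "any.email.you.use@mail.com"
echo "lorem ipsum" > myfile.txt
git add myfile.txt
git commit --quiet -m "Add a new file to describe lorem ipsum"
git push --quiet -u origin HEAD    # -u links the local branch to its remote counterpart

# "desktop" clone: a second working copy of the same remote.
cd "$work"
git clone --quiet remote.git desktop

# Back on the laptop: one more commit, pushed to the remote.
cd "$work/laptop"
echo "dolor sit amet" >> myfile.txt
git commit --quiet -am "Extend the lorem ipsum"
git push --quiet

# On the desktop: pull fetches the new commit and fast-forwards to it.
cd "$work/desktop"
git pull --quiet
desktop_head="$(git rev-parse HEAD)"
laptop_head="$(git -C "$work/laptop" rev-parse HEAD)"
desktop_commits="$(git rev-list --count HEAD)"
echo "desktop has $desktop_commits commits"
```

After the pull, both working copies point at the same commit, which is the usual way to keep two machines or two collaborators in sync.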

Take advantage of `git` and GitHub

10 rules by Perez-Riverol et al. (2016)

Rule 1: Use GitHub to track your project

The backbone of GitHub is the distributed version control system Git. Every change, from fixing a typo to a complete redesign of the software, is tracked and uniquely identified. Although Git has a complex set of commands and can be used for rather complex operations, learning to apply the basics requires only a handful of new concepts and commands and will provide a solid ground to efficiently track code and related content for research projects.

Rule 2: Manage permissions for repository access

Public projects on GitHub are visible to everyone, but write permission, i.e., the ability to directly modify the content of a repository, needs to be granted explicitly. As a repository owner, you can grant this right to other GitHub users. In addition to being owned by users, repositories can also be created and managed as part of teams and organizations.

Rule 3: Create an SOP for branching and forking

Anyone with a GitHub account can fork any repository they have access to. This creates a complete copy of the repository's content while retaining a link to the original “upstream” version. Forking allows anyone to develop and test novel features with existing code, and to contribute features, bug fixes, and documentation improvements back to the original upstream repository by opening a pull request, thereby becoming a contributor.
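
Forking itself happens on GitHub, but the branching half of such a procedure can be tried locally. The sketch below assumes a scratch repository; the branch name feature/update-readme and the file contents are hypothetical, and merging with --no-ff is only one possible convention.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "any.email.you.use@mail.com"

echo "# Project" > README.md
git add README.md
git commit --quiet -m "Add README"
base_branch="$(git symbolic-ref --short HEAD)"   # main or master, depending on settings

# Develop the change on its own branch so the base branch stays stable.
git switch --quiet -c feature/update-readme
echo "More detail." >> README.md
git commit --quiet -am "Expand README"

# Bring the finished work back; --no-ff keeps a visible merge commit.
git switch --quiet "$base_branch"
git merge --quiet --no-ff -m "Merge feature/update-readme" feature/update-readme
git log --oneline --graph
merged_count="$(git rev-list --count HEAD)"
```

On GitHub the merge step would typically be replaced by pushing the branch and opening a pull request for review.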

Rule 4: Use tags and semantic versions

Tags can be used to label versions during the development process. Version numbering should follow “semantic versioning” practice, with the format X.Y.Z, with X being the major, Y the minor, and Z the patch version of the release, including possible meta information. Correct labeling allows developers and users to easily recover older versions, compare them, or simply use them to reproduce results described in publications.
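
As a rough illustration of tagging, the sketch below labels three placeholder commits in a scratch repository. The version numbers are made up, and the "v" prefix on tag names is a common convention rather than part of the X.Y.Z format itself.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "any.email.you.use@mail.com"

# Three placeholder commits, each labelled with an annotated tag.
for version in 1.0.0 1.2.0 1.10.0; do
    git commit --quiet --allow-empty -m "Work leading up to $version"
    git tag -a "v$version" -m "Release $version"
done

# Plain `git tag` sorts names as text; version:refname compares the numbers.
tags_sorted="$(git tag --sort=version:refname)"
echo "$tags_sorted"

# Nearest tag reachable from the current commit.
latest="$(git describe --tags --abbrev=0)"
echo "latest: $latest"

# Check out a tagged state, e.g. to rerun an analysis as it was published.
git checkout --quiet v1.2.0
checked_out="$(git describe --tags --exact-match)"
```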

Rule 5: Continuous integration and automated code testing

Code testing is necessary to detect possible bugs introduced by new features or changes in the code or dependencies, as well as to detect wrong results, often known as logic errors, in which the source code produces a different result than what was intended. Continuous integration provides a way to automatically and systematically run a series of tests to check integrity and performance of code, a task that can be automated through GitHub.
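
CI services generally decide whether a run passed from the exit status of the commands they execute, so a test script mainly needs to report a non-zero status when a check fails. The sketch below is a hypothetical shell test; the function add and the helper check are illustrative names and do not come from any particular project or CI product.

```shell
# Function under test (illustrative).
add() {
    echo $(( $1 + $2 ))
}

failures=0

# Compare an actual value with the expected one and count mismatches.
check() {
    # $1 = description, $2 = expected value, $3 = actual value
    if [ "$2" = "$3" ]; then
        echo "ok   - $1"
    else
        echo "FAIL - $1 (expected $2, got $3)"
        failures=$(( failures + 1 ))
    fi
}

check "2 + 3 is 5" 5 "$(add 2 3)"
check "adding a negative number" -1 "$(add 2 -3)"

# A standalone CI script would typically end with: exit "$failures"
echo "failures: $failures"
```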

Rule 6: Automate more tasks via webhooks and GitHub Actions

More than just code compilation and testing can be integrated into your software project: GitHub hooks can be used to automate numerous tasks to help improve the overall quality of your project. You might consider generating the documentation whenever the code or documentation changes, e.g., by using Quarto for R scripts or Sphinx for Python code.

Rule 7: Openly and collaboratively discuss, address, and close issues

GitHub issues are a great way to keep track of bugs, tasks, feature requests, and enhancements. While classical issue trackers are primarily intended to be used as bug trackers, GitHub issue trackers follow a different philosophy: each tracker has its own section in every repository and can be used to trace bugs, new ideas, and enhancements by using a powerful tagging system. The main objective of issues in GitHub is promoting collaboration and providing context by using cross-references.

Rule 8: Make your code easily citable

GitHub now integrates with archiving services such as Zenodo and Figshare, enabling DOIs to be assigned to code repositories. By default, Zenodo creates an archive of a repository each time a new release is created in GitHub, ensuring the cited code remains up to date. Once the DOI has been assigned, it can be added to literature information resources such as Europe PubMed Central.

Rule 9: Promote and discuss your project

GitHub Pages are simple websites freely hosted by GitHub. Users can create and host blog websites, help pages, manuals, tutorials, and websites related to specific projects.

Rule 10: Social features in GitHub: follow and watch

In the same way researchers are following developments in their field, scientific programmers could follow publicly available projects that might benefit their research. GitHub enables this functionality by following other GitHub users (see also Rule 2) or watching the activity of projects, which is a common feature in many social media platforms.

Other articles to read

Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, Git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, Git provides a powerful way to track and compare versions, retrace errors, explore new approaches in a structured manner, while maintaining a full audit trail. For larger collaborative efforts, Git and Git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, all the while maintaining a complete authorship trail. In this paper I provide an overview of Git along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.

Computational reproducibility is the ability to obtain identical results from the same data with the same computer code. It is a building block for transparent and cumulative science because it enables the originator and other researchers, on other computers and later in time, to reproduce and thus understand how results came about, while avoiding a variety of errors that may lead to erroneous reporting of statistical and computational results. In this tutorial, we demonstrate how the R package repro supports researchers in creating fully computationally reproducible research projects with tools from the software engineering community. Building upon this notion of fully automated reproducibility, we present several applications including the preregistration of research plans with code (Preregistration as Code, PAC). PAC eschews all ambiguity of traditional preregistration and offers several more advantages. Making technical advancements that serve reproducibility more widely accessible for researchers holds the potential to innovate the research process and to help it become more productive, credible, and reliable.

Researchers in ecology and evolutionary biology are increasingly dependent on computational code to conduct research. Hence, the use of efficient methods to share, reproduce, and collaborate on code as well as document research is fundamental. GitHub is an online, cloud-based service that can help researchers track, organize, discuss, share, and collaborate on software and other materials related to research production, including data, code for analyses, and protocols. Despite these benefits, the use of GitHub in ecology and evolution is not widespread. We outline features ranging from low to high technical difficulty, including storing code, managing projects, coding collaboratively, conducting peer review, writing a manuscript, and using automated and continuous integration to streamline analyses. Given that members of a research team may have different technical skills and responsibilities, we describe how the optimal use of GitHub features may vary among members of a research collaboration.

