DEV Community

woss

Part 1: Rehosting git repositories on IPFS

Have you ever wanted a truly distributed way of hosting a git repository at a specific revision, tag, or branch, one that stays available even when the remote server is down? If the answer is yes, this post is for you.

In this post, you will learn how to rehost any git repository by revision, tag, or branch on IPFS.

Prerequisites:

  • the difference between a bare and a normal repository. More info on StackOverflow
  • how files are stored on IPFS and what a CID is
  • a running IPFS node
  • and of course git knowledge

Let's dig in! ⚒️

We will use a dummy repository, and its source code is located here https://github.com/woss/dummy

Basic rehosting

Clone the bare repository and enter the directory:

git clone --bare https://github.com/woss/dummy dummy.git
cd dummy.git

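Since we are going to use IPFS to store the data, we need to tell git to prepare the bare repo for dumb servers. IPFS is considered a dumb server because it only serves files as they are: it has no git-aware logic and cannot handle updates, only reads. The command below generates the auxiliary files info/refs and objects/info/packs, which clients use to discover the refs and packs of the repository. More info can be found here: update-server-info, dumb server, and git-repository-layout.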

git update-server-info
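To see what this step changes, here is a minimal sketch that runs it on a throwaway repository instead of the dummy one, so it needs no network access. The temporary directory, the file name, and the demo identity are assumptions made only for this illustration.

```shell
# Build a tiny source repository in a temporary directory (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
echo hello > "$work/src/README"
git -C "$work/src" add README
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# Same steps as in the post: bare clone, then prepare it for dumb servers.
git clone -q --bare "$work/src" "$work/dummy.git"
git -C "$work/dummy.git" update-server-info

# info/refs lists every ref with the object it points to.
cat "$work/dummy.git/info/refs"
# The pack list, when present, lives next to the objects.
ls "$work/dummy.git/objects/info"
```

Both files are static snapshots, so they have to be regenerated whenever the refs or packs change before uploading again.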

Now we are ready to upload this bare repo to IPFS.

### choose only one
ipfs add --cid-version=1 -r .             # this pins the CID by default 
ipfs add --pin=false --cid-version=1 -r . # this will not pin the CID
### choose only one

...
... 
... 
added bafybeihptjdt3maqy66vjwklpk6va3zavwziu6zhftqaedttux7dipzjxy .

And that's it! We have rehosted our git repository on the default branch (in our case it's main) with all tag references.

Now, you should be able to see your rehosted repository on your IPFS gateway or use it as a git dependency with your favorite package manager. 🎉🎉🎉
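As a rough sketch of the gateway route, the snippet below builds the repository URL from the CID and clones it over plain HTTP. It assumes a local IPFS gateway listening on the default address http://127.0.0.1:8080; the IPFS_GATEWAY variable and the dummy-from-ipfs directory name are made up for this example.

```shell
# CID printed by `ipfs add` above; replace it with your own.
cid="bafybeihptjdt3maqy66vjwklpk6va3zavwziu6zhftqaedttux7dipzjxy"
# Assumption: a local gateway on the default address; override with IPFS_GATEWAY.
gateway="${IPFS_GATEWAY:-http://127.0.0.1:8080}"
repo_url="$gateway/ipfs/$cid"
echo "$repo_url"

# info/refs is the file written by `git update-server-info`. A dumb HTTP
# client asks for it first, so it doubles as a quick reachability check.
if curl -fsS --max-time 10 "$repo_url/info/refs" >/dev/null 2>&1; then
  git clone "$repo_url" dummy-from-ipfs
else
  echo "gateway or CID not reachable, skipping clone" >&2
fi
```

If the check fails even though the node is running, a missing info/refs file usually means git update-server-info was not run before the upload.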

Advanced rehosting

In the previous section, we did the most basic rehosting. Here you will learn how to rehost a tag, a branch, and a revision. We will not focus on how the CID differs for the same revision, tag, or branch given a larger git history; we will cover that in Part 2 of this series.

Rehosting the revision

In a normal repository, we would switch to a specific revision with git checkout 20888c33cd0f6f897703198199f33369cba8639a, which puts the repository into the detached HEAD state, or with git reset --hard 20888c33cd0f6f897703198199f33369cba8639a, which moves the current branch to that revision instead. The detached HEAD state is quite useful for CI and for testing a specific PR or MR. It is also very useful for rehosting, since it makes sure that anybody who clones the repository gets the desired revision by default without knowing anything about the git history beforehand. This way you can be sure that the dependency behind the rehosted version will always produce the same code.

That said, the tricky part is that in a bare repository you cannot run git checkout or a hard [git reset](https://git-scm.com/docs/git-reset), since a bare repo doesn't contain a working tree. Here are two different ways to get the same result in a bare repository:

Detaching the HEAD

It is possible to force a bare repo into the detached state by manually changing the HEAD to the revision we need. This can be done like this:

❯ git log
2021-10-19 aa97502 (HEAD -> main, tag: v0.3.0, origin/main, origin/HEAD) Merge pull request #1 from woss/change-1  [Daniel Maricic]
2021-10-19 3a227d8 (origin/change-1) Merge branch 'main' into change-1  [Daniel Maricic]
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic] # <<<< we need this!!!
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

# get the full revision, short will not work
❯ git rev-parse 20888c3
20888c33cd0f6f897703198199f33369cba8639a

❯ echo 20888c33cd0f6f897703198199f33369cba8639a > HEAD

❯ cat HEAD
20888c33cd0f6f897703198199f33369cba8639a

❯ git log
2021-10-18 20888c3 (HEAD, tag: v0.2.0, main) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

And that's it! Now you can run git update-server-info and then ipfs add, as shown in Basic rehosting, to upload your repo to IPFS.
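If you would rather not write to the HEAD file by hand, git update-ref with the --no-deref option should give the same detached state. This is a sketch of an alternative, not the method used above, and it runs on a throwaway two-commit repository whose names are assumptions for the example.

```shell
# Throwaway repository with two commits (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
for n in first second; do
  echo "$n" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "$n commit"
done

git clone -q --bare "$work/src" "$work/dummy.git"
cd "$work/dummy.git"

# Resolve the full revision of the commit before the tip of main.
rev=$(git rev-parse 'main~1')

# --no-deref overwrites HEAD itself instead of the branch it points to.
git update-ref --no-deref HEAD "$rev"

cat HEAD
git symbolic-ref -q HEAD || echo "HEAD is detached"
```

The main branch keeps pointing at its original commit; only HEAD moves.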

Updating the reference to the specific revision with the default or custom branch name

If you are not comfortable with the detached HEAD state, you can use the following approach, which kind of cheats by pointing the main branch at a different revision. Either of the two commands below writes the revision identifier to the file called main (the default branch) under refs/heads; the second one works because HEAD is a symbolic reference to refs/heads/main.

❯ git log
2021-10-19 aa97502 (HEAD -> main, tag: v0.3.0, origin/main, origin/HEAD) Merge pull request #1 from woss/change-1  [Daniel Maricic]
2021-10-19 3a227d8 (origin/change-1) Merge branch 'main' into change-1  [Daniel Maricic]
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic] # <<<< we need this!!!
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ git rev-parse 20888c3
20888c33cd0f6f897703198199f33369cba8639a

❯ git update-ref refs/heads/main 20888c33cd0f6f897703198199f33369cba8639a
# or
❯ git update-ref HEAD 20888c33cd0f6f897703198199f33369cba8639a

❯ tree refs
refs
├── heads
│   └── main
└── tags

❯ cat refs/heads/main
20888c33cd0f6f897703198199f33369cba8639a

❯ git log
2021-10-18 20888c3 (HEAD -> main, tag: v0.2.0, origin/main, origin/HEAD) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

If you want to name the reference differently, let's say rehosted instead of main, to avoid confusion with the upstream main branch, you can do it like this:

❯ git log
2021-10-19 aa97502 (HEAD -> main, tag: v0.3.0, origin/main, origin/HEAD) Merge pull request #1 from woss/change-1  [Daniel Maricic]
2021-10-19 3a227d8 (origin/change-1) Merge branch 'main' into change-1  [Daniel Maricic]
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic] # <<<< we need this!!!
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ git rev-parse 20888c3
20888c33cd0f6f897703198199f33369cba8639a

❯ git update-ref refs/heads/rehosted 20888c33cd0f6f897703198199f33369cba8639a

❯ tree refs
refs
├── heads
│   └── rehosted
└── tags

❯ cat refs/heads/rehosted
20888c33cd0f6f897703198199f33369cba8639a

❯ git log
2021-10-18 20888c3 (HEAD, tag: v0.2.0, rehosted) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]
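Creating the rehosted reference does not by itself change what a clone checks out, because that is decided by HEAD. The sketch below, run on a throwaway repository with assumed names, additionally points HEAD at the new branch with git symbolic-ref and then clones the result to check which branch comes out by default.

```shell
# Throwaway repository with two commits (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
for n in first second; do
  echo "$n" > "$work/src/file.txt"
  git -C "$work/src" add file.txt
  git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "$n commit"
done

git clone -q --bare "$work/src" "$work/dummy.git"
cd "$work/dummy.git"

# Create the custom branch at the older revision.
rev=$(git rev-parse 'main~1')
git update-ref refs/heads/rehosted "$rev"

# Make it the default branch of the rehosted repository.
git symbolic-ref HEAD refs/heads/rehosted

# A fresh clone should now land on the rehosted branch.
git clone -q "$work/dummy.git" "$work/check"
git -C "$work/check" rev-parse --abbrev-ref HEAD
```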

Rehosting the branch

This approach is supported by the bare repo. It involves changing the refs using the [git symbolic-ref](https://git-scm.com/docs/git-symbolic-ref) command.

Let's say we want HEAD to point to the branch called change-1. We would do it like this:

❯ git log
2021-10-19 aa97502 (HEAD -> main, tag: v0.3.0, origin/main, origin/HEAD) Merge pull request #1 from woss/change-1  [Daniel Maricic]
2021-10-19 3a227d8 (origin/change-1) Merge branch 'main' into change-1  [Daniel Maricic] # <<<< we need this!!!
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ git symbolic-ref HEAD refs/heads/change-1

❯ git log
2021-10-19 3a227d8 (HEAD -> change-1) Merge branch 'main' into change-1  [Daniel Maricic]
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ cat HEAD
ref: refs/heads/change-1

And that's it! Now you can run git update-server-info and then ipfs add, as shown in Basic rehosting, to upload your repo to IPFS.
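One thing to watch out for: git symbolic-ref does not check that the target branch exists, so a typo leaves HEAD pointing at nothing. Here is a small guard, sketched on a throwaway repository; the helper name point_head_at_branch is hypothetical and not part of git.

```shell
# Throwaway repository with a main and a change-1 branch (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
echo hello > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
git -C "$work/src" branch change-1

git clone -q --bare "$work/src" "$work/dummy.git"
cd "$work/dummy.git"

# Hypothetical helper: switch HEAD only when the branch really exists.
point_head_at_branch() {
  if git show-ref --verify --quiet "refs/heads/$1"; then
    git symbolic-ref HEAD "refs/heads/$1"
  else
    echo "branch $1 not found, HEAD left untouched" >&2
    return 1
  fi
}

point_head_at_branch change-1
point_head_at_branch no-such-branch || true

# Show where HEAD points now.
git symbolic-ref HEAD
```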

Rehosting the tag

This approach is supported by the bare repo. It involves changing the refs using the [git symbolic-ref](https://git-scm.com/docs/git-symbolic-ref) command.

❯ git log
2021-10-19 aa97502 (HEAD -> main, tag: v0.3.0, origin/main, origin/HEAD) Merge pull request #1 from woss/change-1  [Daniel Maricic]
2021-10-19 3a227d8 (origin/change-1) Merge branch 'main' into change-1  [Daniel Maricic]
2021-10-18 7c17223 commit to change-1  [Daniel Maricic]
2021-10-18 20888c3 (tag: v0.2.0) third commit  [Daniel Maricic] # <<<< we need this!!!
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 7e962ca add changefile  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ git tag
v0.2.0
v0.3.0

❯ git symbolic-ref HEAD refs/tags/v0.2.0

❯ git log
2021-10-18 20888c3 (HEAD, tag: v0.2.0) third commit  [Daniel Maricic]
2021-10-18 25ad655 second commit  [Daniel Maricic]
2021-10-18 6365c84 first commit  [Daniel Maricic]

❯ cat HEAD
ref: refs/tags/v0.2.0

And that's it! Now you can run git update-server-info and then ipfs add, as shown in Basic rehosting, to upload your repo to IPFS.
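When the tag is annotated, HEAD resolves to the tag object first and only then to the commit behind it. The sketch below shows how to check both on a throwaway repository; the annotated tag and the demo identity are assumptions made for the example, and the dummy repository may well use lightweight tags instead.

```shell
# Throwaway repository with one commit and an annotated tag (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
echo hello > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com tag -a v0.2.0 -m "demo release"

git clone -q --bare "$work/src" "$work/dummy.git"
cd "$work/dummy.git"

git symbolic-ref HEAD refs/tags/v0.2.0

# Type of the object HEAD resolves to.
git cat-file -t HEAD
# The commit the tag peels to.
git rev-parse 'HEAD^{commit}'
```

The peeled value is the full revision to use if you prefer one of the Rehosting the revision approaches for a tag.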

Unpacking the git objects

It's hard to summarize what git objects are in a single sentence (or at least it is for me), but essentially they are the git database. All the history and file contents are there (except LFS content). Since git tries to occupy the least amount of space possible, it does a lot of optimizations, and one of them is packing the objects together into things called packfiles. The more your repository grows, the more packfiles are created and the bigger they get.

For example, at the time of writing, the Linux git repo has 8358230 objects, which is quite a large number, and it takes some time to clone. Now, imagine unpacking these objects just so you can leverage the IPFS deduplication feature! It sounds tempting on smaller repositories, like our dummy one, which has only 19 objects, but on the big ones, definitely not!

Unpacking the packfiles and uploading the resulting loose objects to IPFS has these major effects:

  • 👎🏽 significantly increases the rehosting time
  • 👎🏽 significantly increases the clone time
  • 👍🏽 IPFS deduplicates files, resulting in smaller disk usage

I suggest unpacking only after you have completed the steps from the relevant Rehosting section above.

For those who want to rehost smaller repositories and leverage the IPFS deduplication, here is how:

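# move the packfiles out of the object store first:
# git unpack-objects skips objects the repository already has
mv objects/pack/*.pack .

### choose only one
# input redirection may break when run through, for example, nodejs exec
for pack in *.pack; do git unpack-objects < "$pack"; done
# but piping will not
for pack in *.pack; do cat "$pack" | git unpack-objects; done
### choose only one

# remove the packfiles and their now useless indexes
rm -f *.pack
rm -f objects/pack/*.idx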

Now you can run git update-server-info and then ipfs add, as shown in Basic rehosting, to upload your repo to IPFS.
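Before uploading, it may be worth confirming that the unpacking really worked. The following sketch repeats the steps on a throwaway repository (all names are assumptions for the example) and uses git count-objects and git cat-file to check that no packs remain and that the history is still readable.

```shell
# Throwaway repository whose objects are packed with git gc (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/main
echo hello > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
git -C "$work/src" gc -q

git clone -q --bare "$work/src" "$work/dummy.git"
cd "$work/dummy.git"

# Unpack every packfile into loose objects, one pack at a time.
mv objects/pack/*.pack .
for pack in *.pack; do git unpack-objects -q < "$pack"; done
rm -f *.pack
rm -f objects/pack/*.idx

# "packs" should be 0 and "count" should be the number of loose objects.
git count-objects -v

# The tip commit must still be readable from the loose objects.
git cat-file -e HEAD && echo "HEAD is readable"

# Refresh the dumb-server files so they no longer mention the removed pack.
git update-server-info
```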

Kudos to and inspired by:

Thank you for reading. If you have found this post useful, please share and maybe subscribe.
