Git LFS (Large File Storage) hell
Managing Large Files with Git LFS
I recently faced a first-time-for-everything challenge while working on my portfolio (this site). I had some quite large .gif files that I decided to manage using Git LFS (Large File Storage). However, things didn’t go as smoothly as I anticipated. Here's how I worked through it and what I learned along the way.
"--distributed-even-if-your-workflow-isnt"
Git is a powerful version control system that we all use daily, but storing large files directly in it can significantly slow down operations like pulling, pushing, and cloning the repository. This can frustrate collaborators who rely on these operations to work efficiently.
When a large file is added to a Git repository, every collaborator on the repository must download the entire file, including all versions of it. This process can be time-consuming, especially for collaborators with slower internet connections. Additionally, storing large files on Git can result in a large repository size, making collaboration difficult.
That is where Git LFS comes into play. It is a Git extension that stores large files on a separate LFS server and keeps only a small text pointer for each file in the regular repository, pointing to the actual content on the remote server. Collaborators then download only the versions of the large files they actually check out, which reduces the size of the repository and improves collaboration.
For installation details, refer to the Git LFS site; note that it requires Git ≥ 1.8.2.
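To make the "text pointer" idea concrete, here is a minimal sketch that prints the pointer text Git LFS would store in place of a file. It assumes the three-line layout from the published Git LFS pointer spec (version, oid, size); the function name `lfs_pointer_for` is made up for illustration and is not part of Git LFS.

```shell
# Sketch: print the Git LFS pointer text that would stand in for a file.
# lfs_pointer_for is a hypothetical helper, not a Git LFS command.
lfs_pointer_for() {
  local file="$1" hash size
  # Use sha256sum where available (Linux), otherwise shasum (macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$file" | cut -d' ' -f1)
  else
    hash=$(shasum -a 256 "$file" | cut -d' ' -f1)
  fi
  # Size of the real content in bytes.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$hash" "$size"
}
```

Calling `lfs_pointer_for some-file.gif` (any local file works) prints three short lines, which is roughly all that lives in the regular repository for that file.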
Windows
Download and run the installer from git-lfs.github.com, then run:
$ git lfs install
macOS
$ brew install git-lfs
$ git lfs install
Linux (Debian/Ubuntu)
$ sudo apt-get install git-lfs
$ git lfs install
Running this prints output like the following:
Updated Git Hooks
Git LFS initialized
To track the .gif files in my repo, I just ran
$ git lfs track "*.gif"
$ git add .gitattributes
The `git lfs track` command tells Git LFS to track all .gif files in the repo directory. It also creates the .gitattributes file in the root of the repo (or updates it if it already exists), so it contains something like
*.gif filter=lfs diff=lfs merge=lfs -text
This is the Git mechanism that binds special behaviors to certain file patterns. Git LFS hooks into Git's filters for the file patterns listed in the .gitattributes file. After that, you can commit and push as usual:
$ git add .
$ git commit -m "gif files to lfs"
$ git push origin main
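If you want to double-check that the pattern in .gitattributes really applies to a given path, `git check-attr` can report the attributes Git resolves for it. The sketch below runs in a throwaway repository so it does not touch a real one; `demo.gif` and `notes.txt` are placeholder names and do not need to exist.

```shell
# Sketch: inspect which attributes Git resolves for a path.
demo_repo=$(mktemp -d)          # scratch repository, safe to delete afterwards
git init -q "$demo_repo"

# Same line that `git lfs track "*.gif"` writes into .gitattributes.
printf '*.gif filter=lfs diff=lfs merge=lfs -text\n' > "$demo_repo/.gitattributes"

# Ask Git about one path that matches the pattern and one that does not.
git -C "$demo_repo" check-attr filter diff merge text -- demo.gif notes.txt
```

For the .gif path the `filter` attribute is reported as `lfs`; for the .txt path it is reported as `unspecified`, meaning the file is stored as a regular Git blob.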
Now the .gif content no longer lives in the actual repo; each file there is just a pointer. When someone clones or pulls, Git retrieves the pointers, and the real content has to come from the LFS server. There are a few ways to ensure the LFS content is retrieved and available:
a) Before deploying, you can run `git lfs fetch --all` to download all LFS objects.
b) On-demand fetching: Some hosting platforms can fetch LFS content on demand when requested. GitHub Pages is not one of them: GitHub's documentation states that Git LFS cannot be used with GitHub Pages sites.
c) Custom server logic: You could implement server-side logic to fetch LFS content when requested.
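Whichever option you pick, it helps to verify before deploying that no file in the working tree is still a pointer. The following is a rough sketch, assuming pointer files begin with the version line defined in the Git LFS pointer spec; both function names are invented for this example.

```shell
# Sketch: detect files that are still Git LFS pointers instead of real content.
# A pointer file's first line is the spec's "version" line.
is_lfs_pointer() {
  head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}

# Hypothetical helper: print every pointer file under a directory (default: .),
# skipping Git's own metadata.
list_unresolved_pointers() {
  find "${1:-.}" -type f ! -path '*/.git/*' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then printf '%s\n' "$f"; fi
  done
}
```

Running `list_unresolved_pointers public/` (the directory name is only an example) in a build step and failing when it prints anything is one way to catch missing LFS content early.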
I could not get the content served in either dev or production, so I decided to compress the files to less than 50 MB each (GitHub warns about files larger than 50 MB and blocks files over 100 MB) and untrack them. The first step is to stop tracking the file type in Git LFS.
Next, pull the file contents from the LFS remote server to your local machine. This downloads the actual content for every LFS-tracked file referenced in your repository, so that each one sits in your working directory as it would for any other regular file.
To ensure the file type is completely removed from LFS tracking, you should also remove the files from the cache so that they can be added again as regular files.
Finally, make sure the files are no longer tracked by Git LFS and that the actual file content is present in your local working directory. You can check whether a file is still tracked by listing all the files Git LFS knows about.
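The steps above could be sketched roughly as follows. This is my reconstruction under assumptions, not a verified transcript of the exact commands used: it assumes the stock `git lfs` CLI, reads "the cache" as the Git index, and uses a made-up commit message. The helper names `run` and `move_gifs_out_of_lfs` are hypothetical. By default it only prints each step; set `APPLY=1` to execute them from inside the repository.

```shell
# Sketch only: prints each step by default; APPLY=1 really runs them.
# Assumes git and git-lfs are installed and the current directory is the repo.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

move_gifs_out_of_lfs() {
  run git lfs untrack '*.gif' &&                    # drop the pattern from .gitattributes
  run git lfs pull &&                               # download real content for LFS pointers
  run git rm --cached -q -- '*.gif' &&              # remove the pointer entries from the index
  run git add -- '*.gif' .gitattributes &&          # re-add the files as regular Git blobs
  run git commit -m 'Move gif files out of LFS' &&  # hypothetical commit message
  run git lfs ls-files                              # .gif files should no longer be listed
}

move_gifs_out_of_lfs
```

Note that `git lfs ls-files` inspects the checked-out commit, which is why the commit comes before the final check. It is worth trying this on a throwaway branch first.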
GitLab checks files on push to detect LFS pointers. If it detects LFS pointers, GitLab tries to verify that those files already exist in LFS. If you use a separate server for Git LFS and your pushes fail because of this check:
- Verify you have installed Git LFS locally.
- Consider a manual push with `git lfs push --all`.
If you store Git LFS files outside of GitLab, you can disable Git LFS on your project.
You can host LFS objects externally by setting a custom LFS URL:
$ git config -f .lfsconfig lfs.url <URL>
You might do this if you store LFS data on an appliance, like Nexus Repository. If you use an external LFS store, GitLab can’t verify the LFS objects. Pushes then fail if you have GitLab LFS support enabled. To stop push failures, you can disable Git LFS support in your project settings. However, this approach might not be desirable, because it also disables GitLab LFS features like:
- Verifying LFS objects.
- GitLab UI integration for LFS.
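For the custom LFS URL mentioned above, it can help to see what the `git config -f .lfsconfig` command actually writes. The sketch below uses a scratch directory and a placeholder URL (`https://lfs.example.com/my-project` is invented for illustration); in a real project the .lfsconfig file would live at the repository root and be committed.

```shell
# Sketch: write a custom LFS endpoint into .lfsconfig and read it back.
cfg_dir=$(mktemp -d)                                # scratch directory, not a real repo
lfs_url="https://lfs.example.com/my-project"        # placeholder URL (assumption)

# Same command as in the text, pointed at the scratch file.
git config -f "$cfg_dir/.lfsconfig" lfs.url "$lfs_url"

# Show the resulting file, then read the value back through git config.
cat "$cfg_dir/.lfsconfig"
git config -f "$cfg_dir/.lfsconfig" --get lfs.url
```

The file ends up with an `[lfs]` section containing a single `url` key, which Git LFS reads in place of the default endpoint derived from the remote.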
"The so-called crisis is because of the difficulty in replicating the work of co-workers or fellow scientists, threatening their ability to build on each others work or to share it with clients or to deploy production services. Since machine learning, and other forms of artificial intelligence software, are so widely used across both academic and corporate research, replicability or reproducibility is a critical problem."