Michael Aivaliotis

Managing large files within GIT repo size limitations.

I'm currently using Bitbucket, but I've also used GitHub and other cloud services. The ones I've used have limitations and typically cap your repo size at 2 GB. Recently I hit this limit on a large project, mainly because of all the support files. However, if you've worked on any large LabVIEW project that has been under constant development for a decade, you can probably hit this limit anyway, considering the binary nature of LabVIEW source.

So this question is mostly to get community feedback on what approach you use to handle this, if at all. I went down a rabbit hole recently on one big project. I decided to keep ONLY source code in Git and move all support files to Dropbox. I also tried putting the build output and other transitory files in Dropbox. However, I found an issue because Dropbox would interfere with the build process (any insight into this from others is welcome).

Thanks for your help.

We use our internal SVN services, so this might not be the answer you are looking for. But I generally refrain from adding the build executables to the repository and prefer to set tags for each release. Yes, this means that I might have to go back to a specific tag and rebuild the executable, which is time-consuming and might not produce exactly the same executable anymore, depending on driver installations that happened in the meantime for another project. But this happens very rarely, as most of the time you want the latest and greatest software version. Obviously, if you develop a product that is sold to different customers over the years, and you frequently need to reproduce old versions to support a specific client, this approach may be unusable.

And I definitely never ever add the Installer to the repository. That is going to mess with your repository big time, no matter what.

Edited by Rolf Kalbermatter

Yeah, if you have only source, 2 GB is plenty even for LabVIEW. Not sure about the build process issues, but I ran a little function post-build to upload the exe etc. to an internal server running an artifact repository. Given that you run LAVA, I'm guessing you could set something like that up and host it yourself, or maybe your workload is light enough that you could use the free tier of Amazon/MS/Oracle/Google's cloud services.

As to specifics: I used Artifactory, but a nice simple route would be to just set up a separate Git server with LFS support, and then your post-build step is a push to a different repo. Or you could even set up your /build directory as a Git submodule or subtree (I forget the difference; I think submodule is right for this), where the main repo points at GitHub while the submodule points at your server. GitLab, I think, has a free edition for self-hosting, Gitea is a free GitHub clone, and Amazon has a hosted version that's free up to a limit, as does Google.

Edit: or just use gitlab: https://about.gitlab.com/2015/04/08/gitlab-dot-com-storage-limit-raised-to-10gb-per-repo/

Edited by smithd

6 hours ago, Michael Aivaliotis said:

The ones I've used have limitations and typically cap your repo size at 2 GB. Recently I hit this limit on a large project.

Technically, no repository should ever grow to that size (as in "best practice"). Executables and installers in particular never change once built, so there is no reason to keep a history of them. I suggest putting them in a separate folder. It's the same reason LabVIEW lets you separate compiled code from source in the first place.

If you insist on keeping these files in the repository, consider making changes to the structure of your repository to keep it manageable.

6 hours ago, Michael Aivaliotis said:

So this question is mostly to get community feedback on what approach you use to handle this, if at all.

I've been in a similar situation in the past, where a repository exceeded a size of 10 GB. Very painful to pull and push, even on a local network. That repository contained installation scripts and of course the installation files that were bundled with it. Similar to your situation, these support files were the main reason for the size of the repository.

The solution was simple: split the repository into two repositories. One repository contains the installation scripts, the other the support files. The repository with the support files was made a submodule of the main repository (using git submodules), essentially keeping the original structure in place. To reduce the size of the main repository, history was simply rewritten in place using git filter-branch. Don't do that! First create a copy of the repository and then make the changes to the copy. That way the old repository can be archived for future reference (lessons learned 😅).

Of course, the repository for the support files also grows over time. The solution is to create a new empty repository for every major revision (or whatever fits your needs) and re-link the main repository to it. It is still possible to check out older revisions (the ones pointing to the "previous" repository), but doing so requires re-initializing the submodules because they point to a different repository.
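The submodule arrangement described above could be scripted roughly as follows. This is only a sketch, not the exact procedure that was used: the helper names, the default `support` path and the commit message are made up for illustration, and it assumes `git` is on the PATH. The two `*_plan` functions just return the Git commands as argument lists so they can be reviewed first; `run_plan` executes them.

```python
import subprocess


def link_support_plan(support_url, path="support"):
    """Commands that attach the support-files repo as a submodule."""
    return [
        ["git", "submodule", "add", support_url, path],
        ["git", "commit", "-m", f"Add {path} as a submodule"],
    ]


def checkout_plan(revision):
    """Commands that check out an older revision together with its submodules."""
    return [
        ["git", "checkout", revision],
        # Copy submodule URLs from .gitmodules into the local config, in case
        # this revision points at a different support repository.
        ["git", "submodule", "sync", "--recursive"],
        ["git", "submodule", "update", "--init", "--recursive"],
    ]


def run_plan(plan, repo_dir):
    """Run each command inside repo_dir and stop at the first failure."""
    for command in plan:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, `run_plan(checkout_plan("v1.0"), "path/to/main-repo")` would check out a tag named `v1.0` (a hypothetical tag and path) and bring the matching submodule content along.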

6 hours ago, Michael Aivaliotis said:

I also tried putting the build output and other transitory files in Dropbox. However, I found an issue because Dropbox would interfere with the build process (any insight into this from others is welcome).

I had similar issues on Windows 7, where the build fails if the output folder is open in Windows Explorer.

You should take a look at the application builder palette and automate the process by moving files after the build finished.

We did so recently with great results. With a simple click of a button it builds all files, puts everything in a ZIP file (named according to our naming standard) and moves it to a secure server location that is automatically shared with everyone who needs to know about it. The only things needed are to set the build version before pressing start and to commit (and tag) after it has finished.
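The ZIP-and-move step could look something like the sketch below, written here in Python rather than LabVIEW. The function name, the `<product>_<version>.zip` naming scheme and the folder arguments are assumptions for illustration, not the actual tool described above. The archive is built in a temporary folder first and only moved to the shared location once it is complete, and an existing release is never overwritten.

```python
import shutil
import tempfile
from pathlib import Path


def package_build(build_dir, dest_dir, product, version):
    """Zip build_dir as <product>_<version>.zip and move it into dest_dir."""
    build_dir = Path(build_dir)
    dest_dir = Path(dest_dir)
    if not build_dir.is_dir():
        raise FileNotFoundError(f"build output not found: {build_dir}")

    target = dest_dir / f"{product}_{version}.zip"
    if target.exists():
        # A released archive should never be silently replaced.
        raise FileExistsError(f"release already exists: {target}")

    dest_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / f"{product}_{version}"
        # make_archive appends ".zip" to base and returns the archive path.
        archive = shutil.make_archive(str(base), "zip", root_dir=str(build_dir))
        shutil.move(archive, str(target))
    return target
```

A call such as `package_build("builds/MyApp", "releases", "MyApp", "1.2.0")` (hypothetical folders) would leave `releases/MyApp_1.2.0.zip` behind; committing and tagging would still be a separate step.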

Hope that is of some use.

7 hours ago, Rolf Kalbermatter said:

But I generally refrain from adding the build executables to the repository and prefer to set tags for each release.

I use tags as well to mark where a new release version is built. This way I can go back to the tag if I need to branch, to fix a bug on that version or to track down a version related issue. I'm in agreement.

7 hours ago, smithd said:

Edit: or just use gitlab:

Whoa. 10 GB is definitely larger and would basically solve my problem. More than enough (famous last words). However, a lot of these decisions are also related to the surrounding ecosystem of tools. Bitbucket surrounds itself with Atlassian products, which I love and use. I previously switched from Kiln, and the main reason was not so much the repo management but that the surrounding tools were out of date and not getting any feature additions or development support. I just like having everything under one umbrella.

1 hour ago, LogMAN said:

You should take a look at the application builder palette and automate the process by moving files after the build finished.

Ya, I'm already aware of the tools and already have an automated build process in place. I was just hoping to skip this work. But in the end I will do this of course. What else can you do.

So it seems Bitbucket has a solution for this, actually. I think GitHub does as well. It's called Git LFS (Large File Storage), and it stores large files outside of the repository, keeping only small pointer files in the repo itself.

Here's their tutorial: https://www.atlassian.com/git/tutorials/git-lfs

You just have to specify 

git lfs track '<pattern>'

The pattern can be a folder, a file wildcard, etc., as explained in the docs.

I think released installers should not be versioned. It doesn't make sense, and is not very convenient, to revert your entire repo to a tag just to send someone the correct version of the installer. So I definitely think those files are candidates to live on Dropbox instead. Space on Dropbox is much cheaper than space on Bitbucket, even with LFS. But LFS is useful if you can't predict which files will be large, such as in a support folder that should be versioned, for example if it contains build support tools or parts of the installer build support. These need to be versioned in case they change during the development cycle.
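When it isn't obvious which files are the large ones, a quick survey of the working copy can suggest patterns to track. The following is a rough sketch under assumed names (`large_file_totals`, `lfs_track_commands`) and an arbitrary 10 MB default threshold; it only prints suggestions in the form of the `git lfs track` command shown above and does not run anything.

```python
from collections import defaultdict
from pathlib import Path


def large_file_totals(root, min_bytes=10 * 1024 * 1024):
    """Map extension (or bare file name) to total bytes of large files."""
    root = Path(root)
    totals = defaultdict(int)
    for path in root.rglob("*"):
        # Skip Git's own storage; only the working tree is of interest.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size >= min_bytes:
            totals[path.suffix or path.name] += size
    return dict(totals)


def lfs_track_commands(totals):
    """Suggest one `git lfs track` command per entry, largest total first."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return [
        f'git lfs track "*{key}"' if key.startswith(".") else f'git lfs track "{key}"'
        for key in ordered
    ]
```

Running `print("\n".join(lfs_track_commands(large_file_totals("."))))` from the repo root lists candidate commands; which of them are actually worth tracking is still a judgment call.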

1 hour ago, Michael Aivaliotis said:

I think released installers should not be versioned. It doesn't make sense, and is not very convenient, to revert your entire repo to a tag just to send someone the correct version of the installer. So I definitely think those files are candidates to live on Dropbox instead. Space on Dropbox is much cheaper than space on Bitbucket, even with LFS. But LFS is useful if you can't predict which files will be large, such as in a support folder that should be versioned, for example if it contains build support tools or parts of the installer build support. These need to be versioned in case they change during the development cycle.

For released installers, GitHub provides a very sensible feature called Releases (https://help.github.com/articles/about-releases/). Each GitHub Release is linked to a tag in your repo, and you can have multiple files within each Release. For example: one installer for your server PC, one installer for your client PCs, and one Linux RT disk image for your cRIO end node (created via the NI RAD Utility). Each file can be up to 2 GB, but there is no hard limit on the total disk space occupied by all your Releases. You can also write release notes for each release, which gives you a changelog for your whole project.

Bitbucket doesn't have something as comprehensive as GitHub Releases, but it does provide a Downloads feature where you can put your installers (https://bitbucket.org/blog/new-feature-downloads). It allows up to 2 GB per file.
