
Who uses Git Submodules?



I use git submodules. Do yourself a favor and don't use them. There are many pitfalls, here are just a few examples:

  • If a submodule is placed in a folder that previously existed in the repository (perhaps the one you moved into the submodule), checking out a commit from before the submodule existed will cause an error because "files would be overwritten". So you'd have to first delete the submodule and then check out the old commit. Of course, after you are finished you'd have to check out master again and update your submodule to get it back to the current version.
    rm -rf submodule/
    git checkout <old-commit>
    ...
    git checkout master
    git submodule update
    This is hard to remember and annoying to use. Not to mention the difficulties for novice git users and the lack of support by most UI tools.
     
  • Everyone on the team needs to remember to occasionally do git submodule update to pull the correct commit for all submodules. Just imagine shipping an application with outdated code because you forgot to run this command...
     
  • Branching submodules while branching the main repository, or as I call it: "when you spread the love".

    Imagine this situation:

    Colleague A creates a branch in the main repository, let's call it "branch_a". Now they also create a branch on the submodule, let's call it "submodule_branch_a". Of course they'll update the submodule in the main repository to follow their branch. Then they begin changing code and fixing bugs.

    In the meantime, colleague B works on master (because why not). Of course, master uses the master branch of the submodule. So they add some exciting new feature to the submodule and update the commit in the main repository. All of this happens in parallel to colleague A.

    Eventually colleague A will finish their work, which gets merged into master.

    Unfortunately, things aren't that easy... Although "branch_a" followed "submodule_branch_a", it will not do so after merging to master. Because colleague B changed the commit of the submodule in parallel to colleague A, whichever change happened last will be merged to master. This is an exciting game, where you spread the love, flip a coin and hope that you are the chosen one.
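For the second pitfall above (forgetting git submodule update), Git has a submodule.recurse setting that makes commands such as checkout and pull move the submodules along with the main repository. Here is a minimal sketch that shows the difference in a throwaway sandbox. The repository names lib and app are made up for illustration, and it assumes a reasonably recent Git.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; "lib" and "app" are hypothetical repository names.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A library with one commit.
git init -q lib
echo v1 > lib/version.txt
git -C lib add version.txt
git -C lib commit -q -m "lib v1"

# An application that embeds the library as a submodule (recorded at v1).
# Recent Git versions refuse local-path submodule clones unless allowed.
git init -q app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C app commit -q -m "add lib submodule"

# The library moves on and the application records the new commit.
echo v2 > lib/version.txt
git -C lib commit -q -am "lib v2"
git -C app/lib fetch -q origin
git -C app/lib checkout -q "$(git -C lib rev-parse HEAD)"
git -C app add lib
git -C app commit -q -m "bump lib to v2"

# Pitfall: going back one commit leaves the submodule files at v2 ...
git -C app checkout -q HEAD~1
stale=$(cat app/lib/version.txt)
# ... until someone remembers to run the update.
git -C app submodule --quiet update
updated=$(cat app/lib/version.txt)

# With submodule.recurse set, checkout moves the submodule along.
git -C app config submodule.recurse true
git -C app checkout -q -
recursed=$(cat app/lib/version.txt)

echo "before update: $stale, after update: $updated, with recurse: $recursed"
```

The setting is per clone (or per user with --global), so it still relies on every team member configuring it once.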

I actually changed our main repository (shared among multiple developers) to submodules a few years ago, only to realize that they introduce problems that we simply couldn't fix. So I had to pull the plug, delete all recent changes and go back to the state before submodules. That was a fun week...

That said, we now use subtrees. A subtree is somewhat similar to a submodule in that it allows you to maintain code in separate repositories. However, a subtree doesn't just store the commit hash but actually pulls the subtree into your repository, as if it were part of that repository in the first place (if you want, even with the entire history). With a subtree, you can simply clone the main repository and everything is just there. No additional commands. However, you'd want to avoid changing any files that belong to subtrees, so that you essentially only pull subtrees and never push (or merge). I simply put them in a "please don't change" directory (anyone who changes files in that directory will have to revert and do their work over).

Atlassian has a nice article on git subtrees if you are interested: https://www.atlassian.com/git/tutorials/git-subtree
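To give a rough idea of the pull-only workflow described above, here is a sketch that adds a component as a subtree and later updates it. The names lib, app and vendor/lib as well as the tags are made up, and it assumes the git subtree command is installed (some distributions package it separately).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; all repository names, paths and tags are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Component repository with a tagged release.
git init -q lib
echo v1 > lib/version.txt
git -C lib add version.txt
git -C lib commit -q -m "lib v1"
git -C lib tag 1.0.0

# Main repository; subtree add needs at least one existing commit.
git init -q app
echo "main application" > app/README
git -C app add README
git -C app commit -q -m "initial commit"

# Pull the component in as a subtree; --squash leaves its history out.
git -C app subtree add --prefix=vendor/lib "$work/lib" 1.0.0 --squash

# Later the component publishes a new release and the main repo pulls it.
echo v2 > lib/version.txt
git -C lib commit -q -am "lib v2"
git -C lib tag 1.1.0
git -C app subtree pull --prefix=vendor/lib "$work/lib" 1.1.0 --squash \
    -m "Update lib subtree to 1.1.0"

# A plain clone already contains the component files, no extra commands.
git clone -q app fresh-clone
cat fresh-clone/vendor/lib/version.txt
```

Dropping --squash would import the component's full history instead of a single commit per update.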

Link to comment

Typically, people tend to use 'repo' from Android to make a project from multiple git repos. It's a lot more flexible than submodules.

With 'repo', you create a 'manifest' (or multiple manifests) that lists all repositories to include, along with the branch or commit to use for each.

It's intuitive, and anyone who worked with Android already knows how to use it.
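As a rough illustration, a manifest could look like the one written by the snippet below. The server URL, project names and paths are placeholders I made up, not taken from a real project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a hypothetical manifest; every name, path and URL is a placeholder.
manifest_dir=$(mktemp -d)
cat > "$manifest_dir/default.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote name="origin" fetch="https://git.example.com/team/" />
  <default remote="origin" revision="master" />
  <!-- Released component pinned to a tag -->
  <project name="component-a" path="components/a" revision="refs/tags/1.0.0" />
  <!-- Component under development, follows the default revision -->
  <project name="component-b" path="components/b" />
</manifest>
EOF

# Once the manifest is committed to its own repository, a checkout would be:
#   repo init -u <manifest-repo-url> -m default.xml
#   repo sync
echo "wrote $manifest_dir/default.xml"
```

Keeping one manifest per station (or per release) is one way to record which component versions belong together.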

 

Link to comment
On 8/28/2020 at 3:40 PM, LogMAN said:

Everyone on the team needs to remember to occasionally do git submodule update to pull the correct commit for all submodules. Just imagine shipping an application with outdated code because you forgot to run this command...

In our case it's the opposite problem; shipping code for a test station that has not been tested for that station.  We have multiple stations, that cannot all be tested at the same time, so changes to a shared component, made in developing one station type, should only be pulled into another type explicitly, not be a general "update all" thing.  Admittedly, I would much rather update common components asap, and I'm hoping splitting the common components into submodules will allow most to be pulled often.  Currently, the different stations get branched, with all shared components, which is a merging nightmare.

Link to comment
On 8/28/2020 at 3:40 PM, LogMAN said:

Branching submodules while branching the main repository, or as I call it: "when you spread the love".

I am hoping that the need to branch the submodule repo will be a big improvement on the current situation.  Needing to make an explicit commit/branch in the submodule will give the developer a point to think if they really meant to change something in the common component.  If they do, they will be making a commit message describing that change.  And I (if I'm the prime developer for that submodule) will notice this commit by another developer and can go and talk to them and deal with the issue (I will follow an aggressive merge-branch-quickly policy in common components).

Link to comment
On 9/1/2020 at 11:42 AM, drjdpowell said:

We have multiple stations, that cannot all be tested at the same time, so changes to a shared component, made in developing one station type, should only be pulled into another type explicitly, not be a general "update all" thing.

What you describe sounds very similar to our situation, except that we only have a single top-level repository for all stations. If you look at a single station repository of yours, however, the structure is almost the same. There is a single top-level repository (station) which depends on code from multiple components, each of which may depend on other libraries (and so forth).

* Station
	+ Component A
		+ Library X
		+ Library Y
	+ Component B
		+ Library X
		+ Library Z
	+ ...

In our case, each component has its own development cycle and only stable code is pulled in the top-level repository. In your case there might be multiple branches for different stations, each of which I imagine will eventually be merged into their respective master branch and pulled by other stations.

* Station (master)
	+ Component A (master)
		+ Library X (dev-station)
		+ Library Y (master)
	+ Component B (dev-station)
		+ Library X (dev-station)
		+ Library Z (master)
	+ ...

In my opinion you should avoid linking development branches in top-level repositories at all costs. Stations should either point to master (for components that are under development) or a tag.

* Station A (released)
	+ Component A (tag: 1.0.0)
	+ Component B (tag: 3.4.7)
* Station B (in development)
	+ Component A (tag: 1.2.0)
	+ Component B (master) <-- under development
* Station C (released)
	+ Component A (tag: 2.4.1)
	+ Component B (tag: 0.1.0)
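
If the components end up as submodules, pinning one to a tag and checking what a station currently points to could look roughly like this. It is only a sketch in a throwaway sandbox; the names station and component-a and the tags are invented to mirror the layout above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; repository names and tags are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Component with two tagged releases.
git init -q component-a
echo 1.0.0 > component-a/version.txt
git -C component-a add version.txt
git -C component-a commit -q -m "release 1.0.0"
git -C component-a tag 1.0.0
echo 1.2.0 > component-a/version.txt
git -C component-a commit -q -am "release 1.2.0"
git -C component-a tag 1.2.0

# Station repository that pins the component to the older tag.
# Recent Git versions refuse local-path submodule clones unless allowed.
git init -q station
git -C station -c protocol.file.allow=always \
    submodule --quiet add "$work/component-a" component-a
git -C station/component-a checkout -q 1.0.0
git -C station add component-a
git -C station commit -q -m "pin component-a to 1.0.0"

# Report the tag (or commit) each submodule currently sits on.
pins=$(git -C station submodule --quiet foreach \
    'echo "$sm_path $(git describe --tags --always)"')
echo "$pins"
```

A report like this could be run before a release to spot components that still point to an untagged commit.
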
On 9/1/2020 at 11:48 AM, drjdpowell said:

I am hoping that the need to branch the submodule repo will be a big improvement on the current situation.

Not sure if I misunderstand your comment, but you don't actually have to branch a submodule. In fact, anyone could simply commit to master if they wanted to (and even force-push *sigh*). Please also keep in mind that submodules will greatly impact the git workflow and considerably increase the complexity of the entire repository structure. Especially if you have submodules inside submodules...

In my opinion there are only two reasons for using submodules:

  • To switch branches often (i.e. to test different branches of a component at station level).
  • To change code of a component from within the station repository.

Both are strong indicators of tightly coupled code and should therefore be avoided.

We decided to use subtrees instead. For every action on a subtree (pull, change branch, revert, etc.) there is a corresponding commit in the repo. We have a policy that changes to a component are made at component level first and later pulled into the top-level repository. Since the actual code of a subtree is included in the repository, there is no overhead for subtrees that include subtrees, and things like automated tests work the same as for regular repositories.

On 9/1/2020 at 11:48 AM, drjdpowell said:

Needing to make an explicit commit/branch in the submodule will give the developer a point to think if they really meant to change something in the common component.  If they do, they will be making a commit message describing that change.

You have the right intention, but if any developer is allowed to make any changes to any component, there will eventually be lots of tightly coupled rogue branches in every component, which is even worse than the current state. Not to forget that you also need to make sure that changes to a submodule are actually pushed.

This is where UI tools become handy as they provide features like pushing changes for all submodules when pushing the top-level repository (IIRC Sourcetree had a feature like that).

To be fair, subtrees don't prevent developers from doing those changes. However, since the code is contained in the top-level repository, it becomes the responsibility of the station owner instead of the component owner.
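On the point about making sure submodule changes are actually pushed: plain Git can also guard against this on the command line with push --recurse-submodules=check, which refuses to push the top-level repository while it references submodule commits that exist on no remote. A sketch with made-up repository and branch names, using local bare repositories as stand-ins for a server:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; repository and branch names are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared component, published as a bare "server" repository.
git init -q lib-src
echo v1 > lib-src/version.txt
git -C lib-src add version.txt
git -C lib-src commit -q -m "lib v1"
git clone -q --bare lib-src lib.git

# Station repository with its own bare "server" and the submodule.
git init -q --bare station.git
git init -q station
git -C station remote add origin "$work/station.git"
git -C station -c protocol.file.allow=always \
    submodule --quiet add "$work/lib.git" lib
git -C station commit -q -m "add lib submodule"

# A developer commits inside the submodule but forgets to push it.
echo fix >> station/lib/version.txt
git -C station/lib commit -q -am "local fix"
git -C station add lib
git -C station commit -q -m "use local fix in lib"

push_station() {
    git -C station push -q --recurse-submodules=check \
        origin HEAD:refs/heads/feature
}

# The check refuses the push because the submodule commit is unpublished.
if push_station 2>/dev/null; then first_try=pushed; else first_try=refused; fi

# After publishing the submodule commit, the same push goes through.
git -C station/lib push -q origin HEAD:refs/heads/fix
if push_station; then second_try=pushed; else second_try=refused; fi

echo "first: $first_try, second: $second_try"
```

Setting push.recurseSubmodules to check in the Git config makes this the default for every push from that clone.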

On 9/1/2020 at 11:48 AM, drjdpowell said:

And I (if I'm the prime developer for that submodule) will notice this commit by another developer and can go and talk to them and deal with the issue (I will follow an aggressive merge-branch-quickly policy in common components).

In my experience it's a good idea to assign a lead developer to each component, so that every change is verified by a single maintainer (or a group of maintainers). In theory there should only be a single branch with the latest version of the component (typically master). Users may either pull directly from master, or use a specific tag. You don't want rogue branches that are tightly coupled to a single station at component level.

Edited by LogMAN
Link to comment

I had the same issue with SVN and externals. One app pointed to a branch of another component. It was a mess to manage in the long run.

Now that we are transitioning to Git, I am trying to find an alternative solution. So far the most reasonable option is pulling component binaries into the project that needs them via a NuGet package or G Package Manager. From what I can see, NIPM installs packages globally for the PC, not per project.

Link to comment
