Thursday, May 11

History of Git

Git is one of those tools that is so simple to use that you often never learn its nuances. You wind up cloning a repository from the Internet and that’s about it. If you make changes, maybe you track them, and if you are really polite you might create a pull request to give back to the project. But there’s a lot more you can do. For example, did you know that Git can track collaborative Word documents? Or manage your startup files across multiple Linux boxes?

Git belongs to a family of software products that do revision (or version) control. The idea is that you can develop software (for example) and keep track of each revision. Good systems have provisions for allowing multiple people to work on a project at one time. There is also usually some way to split a project into different parts. For example, you might split off to develop a version of the product for a different market or to try an experimental feature without breaking the normal development. In some cases, you’ll eventually bring that split back into the main line.

Although I’ll give you some odd uses for Git you might find useful in the next installment, this post is mostly the story of how Git came to be. Open source development is known for flame wars, and there are at least a few in this tale. And in true hacker fashion, the hero of the story decides he doesn’t like the tools he’s using so… well, what would you do?

War of the Version Controllers

Historically, a lot of software that performed this function had a central-server mindset. That is, the code lived on the network somewhere. When you wanted to work on a file, you’d check it out. This only worked if no one else had it checked out. Of course, if you were successful, no one else could check out your files until you put them back. If you were away from the network and you wanted to work on something, too bad.

However, more modern tools relax some of these restrictions. Ideally, a tool gives you a local copy of a project and automatically keeps other copies updated as you release changes. This way there is no central copy to lose, you can work anywhere, and you don’t have to coordinate with teammates about who is working on what.

Closed Tool

A very large distributed team develops the Linux kernel. By late 1998 the team was struggling with revision management. A kernel developer, [Larry McVoy], had a company that produced a scalable distributed version control product called BitKeeper. Although it was a commercial product, there was a community license that allowed you to use it as long as you didn’t work on a competing tool while you were using the product and for a year thereafter. The restriction applied to both commercial and open source competition. Although the product kept most data on your machine, there was a server component, so the company could, in fact, track your usage of the product.

In 2002, the Linux kernel team adopted BitKeeper. [Linus Torvalds] was among the proponents of the new system. However, other developers (and interested parties like [Richard Stallman]) were concerned about using a proprietary tool to develop open source software. BitMover, the company behind BitKeeper, added some gateways so that developers who wanted to use a different system could do so, to some extent.

For the most part, things quieted down, with only occasional flame skirmishes erupting here and there. That is, until 2005, when [McVoy’s] company announced it would discontinue the free version of BitKeeper. Ostensibly, the reason was that a user had developed a client that added features from the commercial version to the free one.

New Tools

As a result, two projects spun up to develop a replacement. Mercurial was one and Git, of course, was the other. [McVoy] contacted a commercial customer demanding that their employee [Bryan O’Sullivan] stop contributing to Mercurial, which he did. Of course, both Mercurial and Git came to fruition, with Git becoming not only the kernel team’s version control system but the system for a lot of other people as well.

Birth of Git

[Linus] did look for another off-the-shelf system. None at the time had the performance or the features that would suit the kernel development team. He designed Git for speed and simplicity, and to avoid doing the things that CVS (a reviled version control program) did.

Initial development is said to have taken a few days. Since the version 1.0 release in late 2005, the software has spawned more than one major website and has become the system of choice for many developers, both open source and commercial.

Repo Man

The flow chart shows the secret of how Git handles lots of developers at one time: repositories, or repos. Each developer has a complete copy of the entire project (the local repository). In fact, if you don’t care about sharing, you don’t even need a remote repository. Your private repo is as much a full-featured Git project as anyone else’s, even the remote one, which is probably on GitHub or another network server. You make your changes in the working directory, stage what you are “done” with (for now), and commit it to your repo. When it is time, you push your changes up to the remote, where they get merged with other people’s changes.
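To make that flow concrete, here is a toy Python model of the four places a change passes through: working directory, staging area, local repo, and remote. It is only a sketch of the idea, not how Git is implemented; the `ToyRepo` and `ToyClone` names are made up for this illustration, and the push step assumes nobody else has pushed in the meantime.

```python
import hashlib
import json


class ToyRepo:
    """Hypothetical stand-in for a repository: an ordered list of snapshots."""

    def __init__(self):
        self.commits = []  # each commit is a dict with id, message, files

    def head_files(self):
        """Return a copy of the files in the latest commit (empty if none)."""
        return dict(self.commits[-1]["files"]) if self.commits else {}


class ToyClone:
    """One developer's copy: working directory, staging area, local repo."""

    def __init__(self):
        self.working = {}  # filename -> contents you are editing
        self.staged = {}   # filename -> contents you are "done" with for now
        self.local = ToyRepo()

    def stage(self, name):
        """Mark the current contents of one working file for the next commit."""
        self.staged[name] = self.working[name]

    def commit(self, message):
        """Record a new snapshot in the local repo only; no network involved."""
        files = self.local.head_files()
        files.update(self.staged)
        payload = json.dumps([message, files, len(self.local.commits)], sort_keys=True)
        commit_id = hashlib.sha1(payload.encode()).hexdigest()[:8]
        self.local.commits.append({"id": commit_id, "message": message, "files": files})
        self.staged = {}
        return commit_id

    def push(self, remote):
        """Send commits the remote lacks (assumes the remote has not diverged)."""
        remote.commits.extend(self.local.commits[len(remote.commits):])


clone = ToyClone()
clone.working["main.c"] = "int main(void) { return 0; }\n"
clone.working["notes.txt"] = "scratch, not ready yet\n"
clone.stage("main.c")          # only main.c is staged
clone.commit("Add main.c")     # the snapshot lands in the private repo

remote = ToyRepo()
clone.push(remote)             # now the remote has it too
```

Notice that the unstaged `notes.txt` never leaves the working directory, and that the commit exists in the local repo before any remote is involved.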

Interestingly, Git doesn’t only work on text files (I’ll show you more about that in the next installment). It does, however, work best on text files because it is smart enough to notice changes in files that don’t overlap and merge them automatically. So if I fix a bug involving a for loop in some code and you change some error messages, Git will sort it all out when our code merges.

That doesn’t always work, of course. When two changes do overlap, you get a conflict that you have to resolve manually. But unless you have two people touching the exact same parts of the code, Git usually does a nice job of reconciling the differences. Of course, binary files don’t generally get that luxury. You can’t exactly diff an icon to see that one person drew a mustache on it and another person turned the background green. However, technically, if you could figure out the algorithm, you could add it to Git.
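The rule behind that automatic merging can be sketched in a few lines of Python. This is a deliberately simplified three-way merge, not Git’s actual algorithm: it assumes every edit changes lines in place, so all three versions have the same number of lines, whereas a real merge first aligns the files with a diff. The function name `merge_lines` is invented for this example.

```python
def merge_lines(base, ours, theirs):
    """Simplified three-way merge of equal-length lists of lines.

    base is the common ancestor; ours and theirs are the two edited copies.
    Returns (merged_lines, conflicts), where conflicts lists the indexes of
    lines that both sides changed in different ways.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither touched the line)
            merged.append(o)
        elif o == b:      # only their side changed this line
            merged.append(t)
        elif t == b:      # only our side changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            conflicts.append(i)
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["for (i = 0; i <= n; i++)", "    total += a[i];", 'puts("eror");']
ours = ["for (i = 0; i < n; i++)", "    total += a[i];", 'puts("eror");']
theirs = ["for (i = 0; i <= n; i++)", "    total += a[i];", 'puts("error: bad input");']

merged, conflicts = merge_lines(base, ours, theirs)
```

Here my loop fix and your error-message fix touch different lines, so `merged` contains both and `conflicts` is empty. If we had both rewritten the loop line differently, its index would show up in `conflicts` and the result would carry conflict markers for someone to sort out by hand.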

Denouement

If you want to increase your knowledge of Git beyond just doing a clone, you could do worse than spend 15 minutes on this tutorial. If you already know the basics, you might find some new things in a more advanced tutorial or check out the video of a talk [Linus] gave on Git a while back.

In 2016, by the way, BitKeeper announced it would move to the Apache License, which, of course, is open source. Kind of ironic, isn’t it?

