There are tons of articles describing how you can rewrite history with git, but they do not answer "why should I do it?".
A similar question is "what are the tradeoffs / how do I apply this in my distributed workflow?".
Also, git developers strongly encourage/command you to write commit messages in the imperative present tense, but do not say why. So, why?
I'll try to answer these to the best of my abilities, largely based on how I see things. I won't get too detailed (there are enough manuals and tutorials for the exact concepts and commands).
Just like source code, git history is mostly read and relatively infrequently written.
You read history when you want to see what has changed, when hunting for a bug, when checking what the difference is between branches, and so on.
The argument "I want the history to look exactly like how it really happened" is flawed, because very often your history is suboptimal: you commit a feature and shortly afterwards commit a fix for that feature, or you make a commit that contains several separate logical changes or bugfixes.
This makes history more complicated to read than it should be, so for all the folks who will ever look back at your history (even if you think that will only be yourself) a clean history is easier to "get", just like clean source code.
Also, part of the awesomeness of git is that juggling features in your code (needed for debugging, trying things out, ...) is so flexible (see the git commit/branch model), but if you have logical changes spread over multiple commits, or one commit containing multiple logical changes, this gets painful very quickly.
Once you figure out history rewriting (and it's pretty easy to learn, really!) it only costs a little time to clean up your history, and that pays off many times over whenever you or somebody else wants to look at it or needs to work with it (again, just like source code itself!).
This also means that you don't need to spend much time thinking about commit messages for commits that are merely fixups or small additions to other logical changes, because those will be squashed into the other commits anyway. I usually commit frequently but end up squashing many commits together; my commit log easily gets compressed by a factor of two or more. The less history, the better (just like source code!).
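To make the squashing idea concrete, here is a toy sketch in Python. It is not how git is implemented; it only models the convention used by `git commit --fixup` and `git rebase -i --autosquash`, where a commit whose subject starts with `fixup! ` gets folded into the earlier commit it names. The `Commit` class, the `autosquash` function and the example subjects are all made up for illustration, and the matching is simplified to an exact subject match.

```python
from dataclasses import dataclass, field

FIXUP_PREFIX = "fixup! "


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a subject line plus a list of changes."""
    subject: str
    changes: list = field(default_factory=list)


def autosquash(commits):
    """Fold each 'fixup! <subject>' commit into the earlier commit with that subject.

    Simplification: the target is found by exact subject match, and a fixup
    without a matching target is kept as a normal commit.
    """
    result = []
    for commit in commits:
        if commit.subject.startswith(FIXUP_PREFIX):
            target_subject = commit.subject[len(FIXUP_PREFIX):]
            target = next((c for c in result if c.subject == target_subject), None)
            if target is not None:
                target.changes.extend(commit.changes)
                continue
        # Copy so the original (messy) history is left untouched.
        result.append(Commit(commit.subject, list(commit.changes)))
    return result


# A messy working history: two logical changes plus three quick fixups.
history = [
    Commit("Add login form", ["login.py"]),
    Commit("Add logout link", ["nav.py"]),
    Commit("fixup! Add login form", ["login.py: fix typo"]),
    Commit("fixup! Add logout link", ["nav.py: fix href"]),
    Commit("fixup! Add login form", ["login.py: handle empty password"]),
]

cleaned = autosquash(history)
for commit in cleaned:
    print(commit.subject, commit.changes)
```

Five commits go in and two come out, one per logical change, and the throwaway fixup messages never reach the final log.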
The commits you actually push (especially when pushing to a master branch) should of course be clean, accurately described and carry correct author information, for obvious reasons such as readability.
Note that there is some kind of paradox: you can only achieve "perfect history" if your commits are well-tested and every introduced feature has no bugs (has all bugfix commits squashed into it), but at the same time, you can only properly expose new code by making it public, and it only gets widely used and tested if it's in your main (master) branch.
This is one of the reasons why a workflow model such as one based on topic branches (aka feature branches) works. You see, git by default doesn't allow non-fast-forward pushes (pushes that would discard commits already on the remote branch), because you obviously don't want to break the history of other people following your stable (master branch) development. So once you push to master, it should usually be there for good.
As far as I can see, it is accepted in most projects (those run by folks with git expertise?) to push non-fast-forward to topic branches. The idea is that a topic branch is a "work in progress" branch: it is made public so multiple people can review and work on it. Based on that work and review, its history will often get rewritten through a non-fast-forward push. And if you're following or working on such a branch, you should be clever enough to deal with changed history.
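If "fast-forward" sounds fuzzy: a push is a fast-forward when the commit the remote branch currently points at is an ancestor of the commit you are pushing, so nothing gets thrown away. The sketch below models that check on a made-up commit graph. The commit names and the `parents` mapping are purely illustrative and git's real implementation is of course different.

```python
def ancestors(parents, commit):
    """Return the set holding `commit` and every commit reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def is_fast_forward(parents, old_tip, new_tip):
    """A push is a fast-forward if the old remote tip is an ancestor of the new tip."""
    return old_tip in ancestors(parents, new_tip)


# Hypothetical topic branch: A <- B <- C has been published.
# D is new work on top of C; C2 is a rewritten C (for example after squashing).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "C2": ["B"],
}

print(is_fast_forward(parents, "C", "D"))   # True: D only adds to what was published
print(is_fast_forward(parents, "C", "C2"))  # False: C would disappear from the branch
```

The second case is the one that git refuses unless you explicitly force it (`git push --force`), and it is exactly what happens when you rewrite the history of a topic branch that was already made public.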
So, a topic branch allows you to make changes public, get feedback, clean up the history of the patchset you (and maybe others) are working on, and when satisfied, you can push to master.
There is still a chance you'll later need to push bugfixes to master, but this will happen much less often. So while there is no perfect workflow model that combines perfect history (in master) with perfect usability (no need to handle non-fast-forward pushes), I find this model a pretty good compromise.
To summarize, I would say:
You should care about clean VCS history for the same reasons you should care about clean code.
Just like using git helps you progressively reach better software, git history rewriting helps you progressively reach a better git history. Version control on top of version control, if you will. A very crude form of version control, but I don't think it needs to be any more advanced than this.
Git developers command writing commit messages in the imperative present tense (at least for the git project), but they did not document why. Some commonly cited reasons: