Work on an open-source project from anywhere

From Bitpost wiki
Latest revision as of 16:41, 6 September 2010

Here are quick and easy steps to set up git to overlay your changes on an open-source project that uses svn.

We'll jump right in. If you want more of an introduction, or if the language just doesn't click, read through the overview, as well as this more-detailed approach, first.

Requirements

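Here's the software you need:

  • git (http://git-scm.com/)
  • Any current project accessible via subversion (http://subversion.tigris.org/)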

Make sure you have git configured properly, including assigning your name and defining your editor. It's probably worthwhile to grab the code directly from svn and make sure you can compile it, if you have the time or need.
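A minimal sketch of that configuration, using git's standard `git config` settings. The name, email address, and editor below are placeholders; substitute your own. `--global` stores them in your per-user config, so every repository on the machine picks them up.

```shell
# Identify yourself; git records this name and email in every commit.
# These values are placeholders: substitute your own.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# The editor git opens for commit messages and for "git rebase -i".
git config --global core.editor "vim"

# Review what is now set.
git config --global --list
```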

Here are our goals:

  • Set up one repository for interaction with svn:
    • Get latest svn code
    • Merge differences between new svn code and your overlay
    • Commit your changes to svn repository
  • Set up any number of additional repositories to work on your code at any location

Setting up your own copy of svn

NOTE: I'm going to use ampache in this example, because that's what I wanted to work on.

Create a local git repository from the project's subversion repository, using [git svn]. By default, git svn will pull down every revision ever made in svn, which can take a very long time. It's very likely you will only want the most recent (or at least a fairly recent) revision, so first browse to the project's svn repo and note its current "revision number", then pass that number to the -r option:

ssh client1
mkdir -p ampache_ext
cd ampache_ext
git svn clone -r2451 https://svn.ampache.org/ trunk
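If you'd rather not dig the revision number out of a web page, `svn info` reports it. The sketch below is a hypothetical helper (`svn_last_changed_rev` is my name, not a standard command) that pulls the "Last Changed Rev" field out of `svn info` output read on stdin. I've used that field rather than "Revision" on the assumption that you may be cloning a subdirectory such as trunk: asking git svn for a revision that never touched that path can leave it with nothing to fetch.

```shell
# Print the "Last Changed Rev:" value from `svn info` output on stdin.
# The line looks like "Last Changed Rev: 2451", so the number is field 4.
svn_last_changed_rev() {
  awk '/^Last Changed Rev:/ { print $4; exit }'
}

# Hypothetical usage (needs the svn client and network access):
#   rev=$(svn info https://svn.ampache.org/ | svn_last_changed_rev)
#   git svn clone -r"$rev" https://svn.ampache.org/ trunk
```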

We now have a repo of the project under git control. Let's immediately update it, so we know we have the latest revision:

git svn rebase

You can actually start editing on the project immediately, if you have an itch to scratch. Note that git has placed the code under the default "master" branch; that's going to be good enough for us in this approach.

As you edit, you're only changing local files, in what git calls your "working tree" (I'll call it your working set). git doesn't record anything in your working set until you ask it to, so you're free to abuse it as you see fit. If you want to keep changes you've made, you need to add each file you've changed or added to the git "index" (also called the "cache" or "staging area"). Then, when you're ready, you commit all the indexed changes to the repository in one single commit. So git provides the "index" as a space in which you can manage your next commit.
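To see the working set and the index in action, here's a throwaway sketch that runs entirely in a temporary directory (the file name and the demo identity are placeholders). `git status --short` shows each file's state in two columns: the left one is the index, the right one is the working set.

```shell
# Throwaway repository in a temporary directory.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > play.php
git add play.php
git commit -q -m "Initial import"

echo "v2" > play.php     # the edit exists only in the working set
git status --short       # prints " M play.php": modified, not staged
git add play.php         # copy the change into the index
git status --short       # prints "M  play.php": staged for the next commit
git diff --cached --stat # what the next commit would contain
```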

Let's say you've changed play.php. Here's how to commit your change to your git repo (but not svn yet):

git add play.php
git commit -m "I updated the play page."

The index is an extra layer that you don't usually have with cvs/svn. It's totally a benefit, but when you don't need it you can skip it by doing the add and commit in one step. The -a option stages every file that git already tracks and that has been modified or deleted, then commits, all in one step ("cvs/svn style"):

git commit -a -m "I changed this to that."

Note that this will not add new files you've created to the repo; you'll need to use [git add] explicitly for those.
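A quick way to convince yourself of that: the throwaway sketch below (file names are placeholders) modifies a tracked file and creates a new one, then runs the one-step commit. The new file stays untracked until it's added explicitly.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > play.php
git add play.php
git commit -q -m "Initial import"

echo "v2" > play.php    # modify a tracked file
echo "new" > extra.php  # create a brand-new, untracked file
git commit -q -a -m "I changed this to that."
git status --short      # prints "?? extra.php": -a skipped the new file

git add extra.php       # new files always need an explicit add
git commit -q -m "Add extra.php"
```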

So over time, you'll have a series of commits. Later, you'll want to get the latest svn changes, and combine them with your changes. Here's what is going to happen:

  • git saves off your changes, reverting the repository to the last svn revision you grabbed
  • git grabs the latest svn changes (from the last svn revision you grabbed to the latest available revision)
  • git drops the changes right on top of your repo - the changes will fit perfectly
  • git drops your changes back on top of the latest svn
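Under the hood, git svn rebase fetches the new svn revisions and then does an ordinary git rebase of your branch onto them. The sketch below reproduces just the rebase half with plain git, so it runs without an svn server; a local branch named svn (a hypothetical stand-in) plays the part of the svn tracking branch that git svn maintains for you.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > README
git add README
git commit -q -m "svn r1"
git branch svn              # stand-in for the svn tracking branch

echo "mine" > play.php      # your local work
git add play.php
git commit -q -m "I updated the play page."

git checkout -q svn         # meanwhile, svn moves ahead
echo "upstream" > NEWS
git add NEWS
git commit -q -m "svn r2"
git checkout -q -           # back to your branch

# Set your commits aside, move to the new svn tip, replay your commits on top.
git rebase -q svn
git log --oneline
```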

This is as smooth as it could possibly be. Commit your work first, since git svn rebase won't run while you have uncommitted changes. Here's the command - yes, it's really this simple:

git svn rebase

Now, that doesn't mean it will always be easy. If the latest svn changes and your changes touch the same part of the same file, git will stop and ask you to merge the two by hand. This is a basic truth.  :>
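When that happens, the rebase stops partway and you resolve the conflict the same way as in any git rebase: fix the file, git add it, and continue (or back out entirely with git rebase --abort). This throwaway sketch provokes a conflict with plain git, again with a local svn branch (a hypothetical stand-in for the real svn history), and resolves it by keeping both versions.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "original" > play.php
git add play.php
git commit -q -m "svn r1"
git branch svn

echo "my version" > play.php          # your change
git commit -q -a -m "My change"

git checkout -q svn                   # a conflicting svn change
echo "their version" > play.php
git commit -q -a -m "svn r2"
git checkout -q -

if ! git rebase -q svn; then
  git status --short                  # prints "UU play.php": both sides edited it
  # Resolve by hand; here we simply write the merged result.
  printf 'their version\nmy version\n' > play.php
  git add play.php                    # mark the conflict as resolved
  GIT_EDITOR=true git rebase --continue
fi
```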

Committing to the svn repo

Now you are keeping your personal masterpiece updated with the latest changes from the open-source developers. But we want to contribute, right? Well, once you have permission to do so, git makes it pretty easy, once again.

A minor bit of advice: no matter what language or tech you are using, try to extend others' code with grace. Try not to jump into the middle and start refactoring like a banshee. You will pay the piper when it comes time to merge. Try to extend what is there through derived or extension classes, try to keep your code in separate modules, etc. Anything you can do to separate your work will be worth the effort.

If you are like me, you will have a series of commits, some major, some just being fixes for something stupid you did. We all have our egos to deal with - that's partly why we're in this to begin with - so what can we do to clean up before we post everything for the world to see? git to the rescue once again. We're going to rebase our changes again, packaging them up into something more attractive. Viva la rebase!

First, make sure you have grabbed the latest svn revisions with [git svn rebase] (see the previous section), to keep things as simple as possible. You may still collide with changes coming in (which you can handle), but this will reduce the chances.

Next, we're going to check on the history of the repo. We're specifically looking at our most recent set of changes that have not yet been published. We want to find the exact point where they begin. Try this:

git log

You will see a list of commits, newest first. Navigate down the list until you find the first local commit for your latest set of changes. Copy the hash that identifies the commit JUST BEFORE your first new commit, that is, the last commit that came from svn (svn commits are easy to spot, since git svn adds a "git-svn-id:" line to each of their messages). Now paste it into this command:

git rebase -i #hash_number#
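If scrolling through git log gets old, you can let git find the last svn commit for you. This sketch (the function name is hypothetical) relies on the "git-svn-id:" line that git svn writes into every commit it imports; that line is missing if you cloned with --no-metadata, so treat it as an assumption about your setup.

```shell
# Print the hash of the newest commit whose message carries a git-svn-id line,
# i.e. the last commit that came from svn.
last_svn_commit() {
  git log --grep='git-svn-id:' -n 1 --format=%H
}

# Hypothetical usage in the svn-connected repository:
#   git log --oneline "$(last_svn_commit)"..HEAD   # your unpublished commits
#   git rebase -i "$(last_svn_commit)"
```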

This is going to work some magic for you. git opens your editor with a list of all your recent commits, one per line, each starting with the word "pick" followed by the commit hash and the first line of its message. Here's what you get to do in this kind of rebase. You can remove a commit by deleting its line (probably not a good idea). You can pick a commit (this just keeps the commit as-is). And, the coolest of all, you can squash a commit. Squashing a commit combines it with the commit on the line above it. So, before I drop my cruft on some other kind soul's svn repo, I like to squash all my local commits into one. To do so, change "pick" to "s" (short for "squash") on every line after the first one.

Once you save and exit, your editor will display all the messages for the commits that were squashed together. Now you can rewrite history! What a beautiful thing. Simply delete all the embarrassing messages (ha!) and rewrite the message as you need to sound like you knew what you were doing all along. How sweet is that. Once you save and exit the editor, git will squash the commits into one and apply the new commit message. You are now a hero, take a bow.
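If you always squash everything into one commit, you can skip the hand editing. The sketch below (hypothetical function name) sets GIT_SEQUENCE_EDITOR to a sed command that turns every "pick" after the first into "squash", and GIT_EDITOR=true to accept the combined message unchanged; drop GIT_EDITOR=true if you want to rewrite the message in your editor. It assumes GNU sed (BSD/macOS sed needs `-i ''`) and git's default long command names in the todo list.

```shell
# Squash every commit after the given base commit into a single commit.
squash_since() (
  base=$1
  # Line 1 of the todo list is the first "pick"; later picks become squashes.
  GIT_SEQUENCE_EDITOR="sed -i -e '2,\$ s/^pick /squash /'" \
    GIT_EDITOR=true git rebase -i "$base"
)

# Hypothetical usage:
#   squash_since <hash_of_last_svn_commit>
```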

The commit should now go smooth as butta. The commit and its message are already queued up, so this just seals the deal:

git svn dcommit
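Before dcommitting, it's worth making sure nothing is left uncommitted and that you're on top of the latest svn. Here's a minimal sketch of a wrapper (both function names are hypothetical). If you want to preview what will be sent, git svn dcommit --dry-run lists it without committing. Keep in mind that dcommit turns each of your git commits into a separate svn revision, which is one more reason to squash beforehand.

```shell
# Succeed only if neither the working set nor the index has uncommitted changes.
require_clean_tree() {
  git diff --quiet && git diff --cached --quiet
}

# Hypothetical wrapper; run it only in the svn-connected repository.
publish_to_svn() {
  require_clean_tree || { echo "Commit or stash your changes first." >&2; return 1; }
  git svn rebase && git svn dcommit
}
```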

Set up a second client

We now have powerful direct two-way interaction with someone else's svn repository. The client you have already configured will remain the primary means by which you rebase from and commit to svn. But you're not much of a developer if you're chained to one machine. git makes it very easy to clone your repo at other locations. Let's try it out. For this approach, I'm assuming you have ssh access to the other locations.

ssh client2
mkdir -p ampache_ext
cd ampache_ext
git clone client1:ampache_ext/trunk trunk

Wow, seriously that's it? Yep.

Another bit of advice to ignore at your own peril.  :> Always try to push changes on these secondary clients back to the primary repo after each session. You'll go crazy trying to track your changes if you don't. You can DO that with git, if you need to, but you're just making your life more difficult. Committing is as simple as this:

git commit -a -m "My changes"
git push
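One catch: git 1.7.0 and later refuse by default to push into the branch that is currently checked out in a non-bare repository, which is exactly what client1's master is. Here's a sketch of one workaround, using two local directories as stand-ins for client1 and client2: push to a separate branch (from-client2 is a hypothetical name) and fast-forward client1's branch to it. On git 2.3 and later, running git config receive.denyCurrentBranch updateInstead on client1 is another option.

```shell
# client1 stand-in: a normal (non-bare) repository with its branch checked out.
demo=$(mktemp -d)
git init -q "$demo/client1"
cd "$demo/client1"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > play.php
git add play.php
git commit -q -m "Initial import"

# client2 stand-in: a clone of client1 with a new commit.
git clone -q "$demo/client1" "$demo/client2"
cd "$demo/client2"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v2" > play.php
git commit -q -a -m "My changes"

# Push to a branch that is not checked out on client1, so it is accepted.
git push -q origin HEAD:from-client2

# Back on client1, fast-forward the checked-out branch to the pushed work.
cd "$demo/client1"
git merge -q --ff-only from-client2
```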

I'd make the commit-and-push step even easier by putting it in a script that you can fire off in an instant.
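Here's a minimal sketch of such a script, written as a shell function you could drop into your shell startup file (the name commit_and_push is hypothetical). It assumes the current branch already has an upstream to push to, which git clone sets up for you. As with any git commit -a, brand-new files still need a git add first.

```shell
# Commit every modified tracked file, then push to the branch's upstream.
# The message defaults to "My changes" when none is given.
commit_and_push() {
  git commit -a -m "${1:-My changes}" && git push
}
```

Usage: commit_and_push "Fixed the play page"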

There are a couple of other tools worth mentioning. On linux, the command line rules and git is your friend. Elsewhere (read Mac and Windows), I recommend setting up Eclipse (http://www.eclipse.org) and Egit (http://www.eclipse.org/egit/). Egit will provide you with a really nice gui to help you survive in those hostile environments :>, and Eclipse is actually a good IDE to use there, as well. All IMHO. Wait, that wasn't fair to OS X, git works just fine from the command line. And the Eclipse solution works just fine on linux. So, there you have it - choice. Yay.

That's the quick basic rundown; you'll learn as you go. The biggest challenge will be resolving those conflicts, but keep a cool head and you'll be a pro in no time.

Comments welcome on the blog (http://news.thedigitalmachine.com/2010/09/06/work-on-an-open-source-project-from-anywhere/)...