
This queue is for tickets about the Git-FastExport CPAN distribution.

Report information
The Basics
Id: 70695
Status: open
Priority: 0/
Queue: Git-FastExport

People
Owner: Nobody in particular
Requestors: jamesblackburn [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: git-stitch-repo creates unnecessary branches when joining two repositories
Date: Fri, 2 Sep 2011 12:20:14 +0100
To: bug-Git-FastExport [...] rt.cpan.org
From: James Blackburn <jamesblackburn [...] gmail.com>
Hi,

I have a test case of two repositories which, when stitched appears to have unnecessary / confusing branching.

Reproduction steps:

    git clone git://github.com/jamesblackburn/org.eclipse.cdt.dsf.git
    git clone git://github.com/jamesblackburn/org.eclipse.cdt.dsf.ui.git
    mkdir stitched
    cd stitched
    git init
    git-stitch-repo ../org.eclipse.cdt.dsf:dsf/org.eclipse.cdt.dsf ../org.eclipse.cdt.dsf.ui:dsf/org.eclipse.cdt.dsf.ui |git fast-import

If you look at these two repos with gitk --all, nearly all the commits are on the mainline master branch. There are a couple of side branches which are tagged. Importantly (for me) all the 3.X-X tags are on the master branch.

Having stitched the repo I end up with a more complicated history:
 - both masters are now on two different branches
 - the 3.X-X tags are on different branches

The result is I can't unify the tags / branches.

If you remove the side branches by removing the tags:

    cd ../org.eclipse.cdt.dsf.ui/
    for tag in `git tag -l v20*` ; do git tag -d $tag ; done
    git tag -d CDT_6_0_0 CDT_6_0_2
    cd ../org.eclipse.cdt.dsf/
    for tag in `git tag -l v20*` ; do git tag -d $tag ; done
    git tag -d CDT_6_0_0 CDT_6_0_2

Then restitching gives nicely linear history. (I just noticed that only the two tags: CDT_6_0_0 CDT_6_0_2 are needed to reproduce this odd behaviour.)

Any ideas on what might be the cause of this?

Cheers,
James
Subject: [rt.cpan.org #70695] AutoReply: git-stitch-repo creates unnecessary branches when joining two repositories
Date: Fri, 2 Sep 2011 12:27:01 +0100
To: bug-Git-FastExport [...] rt.cpan.org
From: James Blackburn <jamesblackburn [...] gmail.com>
Git-FastExport: 0.0.7 (and master from github: https://github.com/book/Git-FastExport)
Perl version: 5.8.8
Subject: Re: [rt.cpan.org #70695] AutoReply: git-stitch-repo creates unnecessary branches when joining two repositories
Date: Fri, 2 Sep 2011 13:32:13 +0100
To: bug-Git-FastExport [...] rt.cpan.org
From: James Blackburn <jamesblackburn [...] gmail.com>
This may just be user error... Perhaps I should just ignore the tags / branches marked -N. I was just confused by them appearing when otherwise the tags are re-joined in the stitched repo.
Subject: Re: [rt.cpan.org #70695] AutoReply: git-stitch-repo creates unnecessary branches when joining two repositories
Date: Fri, 2 Sep 2011 14:45:48 +0100
To: bug-Git-FastExport [...] rt.cpan.org
From: James Blackburn <jamesblackburn [...] gmail.com>
Hmm I think it definitely is producing broken output. Having done the steps above:

    git diff --dirstat master master-A
       3.8% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/internal/ui/actions/
      21.7% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/internal/ui/disassembly/
       4.0% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/internal/ui/viewmodel/
       5.8% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/internal/ui/
       6.6% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/breakpoints/
       3.5% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/expression/
       4.4% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/launch/
       5.4% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/numberformat/
       3.2% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/variable/
       4.7% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/viewmodel/
       3.8% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/debug/ui/
       4.0% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/ui/viewmodel/update/
       4.9% dsf/org.eclipse.cdt.dsf.ui/src/org/eclipse/cdt/dsf/ui/viewmodel/
       7.4% dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/
       9.6% dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/debug/service/
       3.7% dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/

So master doesn't contain all the changes from org.eclipse.cdt.dsf.ui and org.eclipse.cdt.dsf.
On Fri Sep 02 09:45:57 2011, jamesblackburn@gmail.com wrote:
> Hmm I think it definitely is producing broken output. Having done the
> steps above:
>
Thanks for the very complete report.

I haven't found time yet to reproduce your steps, and it's been a while since I coded this.

I will try to find some time, and form an opinion on the issue.

Thanks again,

-- 
BooK
On Tue Sep 06 18:13:49 2011, BOOK wrote:
> On Fri Sep 02 09:45:57 2011, jamesblackburn@gmail.com wrote:
> > Hmm I think it definitely is producing broken output. Having done the
> > steps above:
> >
>
> Thanks for the very complete report.
>
> I haven't found time yet to reproduce your steps, and it's been a
> while since I coded this.
>
> I will try to find some time, and form an opinion on the issue.
I finally took a little time to look at the issue.

The core of the issue is how git-stitch-repo selects where to attach a commit. At some point it picks the "wrong" branch (i.e. not master) on repo B to attach a commit from repo A's master. After that, it is impossible to reconnect both histories.

Right now, the algorithm for attaching commits is very coarse-grained. If there are several options, the current options are to pick the (chronologically) first or last commit from the available options, or to pick one at random.

Debugging your repositories with first and last, I get the wrong answer both times, because sometimes the "right" commit is the first, and sometimes it's the last.

Given how little information there is when making the decision, there's very little that can be done with the current system[*].

In the last few days, I've been exploring a way to improve the decision-making. This might lead to a proper fix.

-- 
BooK

[*] Actually, you could pick the "random" selection algorithm and repeat until the result looks right, but the likelihood of it making the right choice at every point gets smaller and smaller as the repositories grow.
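To make the first/last/random problem concrete, here is a minimal sketch in Python. It is not the Git-FastExport code: the function name `pick_attachment_point`, the `(timestamp, commit_id)` candidate format, and the toy commit names are all assumptions made for illustration. It only shows why a fixed "first" or "last" rule cannot be right at every decision point when the correct candidate is sometimes the oldest and sometimes the newest.

```python
import random


def pick_attachment_point(candidates, strategy, rng=None):
    """Pick one commit id from a list of (timestamp, commit_id) pairs.

    Hypothetical helper: mimics the three coarse-grained strategies
    described in the ticket (first, last, random).
    """
    ordered = sorted(candidates)  # chronological order by timestamp
    if strategy == "first":
        return ordered[0][1]
    if strategy == "last":
        return ordered[-1][1]
    if strategy == "random":
        return (rng or random).choice(ordered)[1]
    raise ValueError(f"unknown strategy: {strategy}")


# Two made-up decision points. In the first, the commit on master is the
# oldest candidate; in the second, it is the newest.
DECISIONS = [
    {"candidates": [(100, "A-master-1"), (120, "A-side-1")], "right": "A-master-1"},
    {"candidates": [(200, "A-side-2"), (230, "A-master-2")], "right": "A-master-2"},
]


def count_correct(strategy):
    """Count how many decision points a fixed strategy gets right."""
    return sum(
        pick_attachment_point(d["candidates"], strategy) == d["right"]
        for d in DECISIONS
    )


if __name__ == "__main__":
    for name in ("first", "last"):
        print(name, count_correct(name), "of", len(DECISIONS))
```

In this toy setup each fixed strategy gets exactly one of the two decisions right, and a single wrong choice is enough to make the two masters diverge.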
Subject: [rt.cpan.org #70695]
Date: Tue, 2 Jun 2015 15:07:39 -0500
To: <bug-Git-FastExport [...] rt.cpan.org>
From: <dag [...] cray.com>
I know it has been a few years since this was looked at but I just ran into this bug with some SVN repositories converted to git. They are huge repositories so a proper fix for this bug is essential. I don't mind if the stitch takes longer due to reading through the entire histories of the individual repositories first. It will still be faster than writing my own tool. :)

Was any progress ever made on this? If there's a prototype fix I'd be happy to test it.
Subject: Re: [rt.cpan.org #70695]
Date: Thu, 4 Jun 2015 12:22:17 +0200
To: "dag [...] cray.com via RT" <bug-Git-FastExport [...] rt.cpan.org>
From: "Philippe Bruhat (BooK)" <book [...] cpan.org>
On Tue, Jun 02, 2015 at 04:10:31PM -0400, dag@cray.com via RT wrote:
> Queue: Git-FastExport
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=70695 >
>
> I know it has been a few years since this was looked at but I just ran
> into this bug with some SVN repositories converted to git. They are
> huge repositories so a proper fix for this bug is essential. I don't
> mind if the stitch takes longer due to reading through the entire
> histories of the individual repositories first. It will still be faster
> than writing my own tool. :)
>
> Was any progress ever made on this? If there's a prototype fix I'd be
> happy to test it.
Not much progress. I had an idea on how to deal with this, but never followed through.

I'm still interested in fixing this, so we could try to work on this together. Are the SVN repositories public?

-- 
Philippe Bruhat (BooK)

The best of intentions must still have directions.
(Moral from Groo The Wanderer #95 (Epic))
Subject: [rt.cpan.org #70695]
Date: Thu, 4 Jun 2015 12:25:37 -0500
To: <bug-Git-FastExport [...] rt.cpan.org>
From: <dag [...] cray.com>
Unfortunately, no, the repositories are private and there is no hope of making them accessible. Hopefully fixing James' conversion would help with my conversion so I would start there if possible.

I would be happy to test things on our repositories if a candidate bugfix appears. I'm not sure how much I could report back other than "it works" or "it doesn't work" and possibly a general description of what happens.
Subject: Re: [rt.cpan.org #70695]
Date: Fri, 5 Jun 2015 01:17:27 +0200
To: "dag [...] cray.com via RT" <bug-Git-FastExport [...] rt.cpan.org>
From: "Philippe Bruhat (BooK)" <book [...] cpan.org>
On Thu, Jun 04, 2015 at 01:25:57PM -0400, dag@cray.com via RT wrote:
>
> Unfortunately, no, the repositories are private and there is no hope of
> making them accessible. Hopefully fixing James' conversion would help
> with my conversion so I would start there if possible.

I was expecting that, given your email domain. ;-)

> I would be happy to test things on our repositories if a candidate
> bugfix appears. I'm not sure how much I could report back other than
> "it works" or "it doesn't work" and possibly a general description of
> what happens.
So, the issue is that when there's a branch in repo A and we need to attach a commit from repo B (assuming the commit from B is part of the current "master" branch), the candidate from A is picked mostly at random.

It turns out that if the commit from B is on the master branch and the commit selected from A is on a dead branch (i.e. not merged into the final "master" for A), we end up with the master from B diverging from the master from A. Which is not the desired outcome.

At the moment, I use a 1-pass algorithm that reads the history from both repositories as a stream, starting from the oldest commits. But to make the right decision at those crossroads, one needs a global overview of the project: which branches exist in both projects (and will need to be "merged" together), and which commits should be part of that global history.

I guess the first step would be to build a small repository set for which we can reproduce the issue. The example repositories I was given are much too big for at-a-glance analysis.

In parallel, I need to work on the "second pass", which needs to record which branches each commit belongs to. This is only relevant for branches which exist in both repositories.

-- 
Philippe Bruhat (BooK)

All of life is a series of trades. And the more you exchange, the less you have to show for it.
(Moral from Groo The Wanderer #4 (Pacific))
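One way to picture the "second pass" idea is the Python sketch below. It is an illustration under stated assumptions, not a design from Git-FastExport: the history is a plain dict mapping each commit to its parents, branch tips are given by name, and the helpers `ancestors`, `branch_membership` and `pick_on_branch` are hypothetical. It records which branches each commit is reachable from, then prefers an attachment candidate that belongs to the same branch (here "master") over one on a dead side branch.

```python
def ancestors(tip, parents):
    """Return the set of commits reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def branch_membership(parents, tips):
    """Map each commit to the set of branch names whose tip can reach it."""
    membership = {}
    for branch, tip in tips.items():
        for commit in ancestors(tip, parents):
            membership.setdefault(commit, set()).add(branch)
    return membership


def pick_on_branch(candidates, membership, branch):
    """Prefer the first candidate that belongs to branch.

    Falls back to the first candidate if none is on that branch,
    and to None if there are no candidates at all.
    """
    for commit in candidates:
        if branch in membership.get(commit, ()):
            return commit
    return candidates[0] if candidates else None


# Toy repo A (made up): a1 <- a2 <- a3 is master; s1 forks from a2 and is
# only reachable from a side ref that is never merged back.
PARENTS_A = {"a1": [], "a2": ["a1"], "a3": ["a2"], "s1": ["a2"]}
TIPS_A = {"master": "a3", "side": "s1"}

if __name__ == "__main__":
    membership = branch_membership(PARENTS_A, TIPS_A)
    # A commit from repo B's master must be attached to either s1 or a3.
    print(pick_on_branch(["s1", "a3"], membership, "master"))
```

With this extra information the candidate on the dead branch (`s1`) is skipped in favour of the one on master (`a3`), whichever of them comes first chronologically. The price is reading the full history of each repository before stitching.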