Grauw’s blog

A colleague of mine was asking whether Mercurial supported shallow clones. The short answer is no. The slightly longer answer is, it’s under development.

But if you ask me, shallow clones aren’t really needed. Aside from being sort of against the point of having a DVCS, you don’t actually gain that much. I did some measurements (on Windows):

  • Mozilla-central repository:
    • Files: 41926
    • Repository size: 241 MB
    • Working copy size: 287 MB (87,4 MB zipped)
    • Clone time: 276 seconds (0,8 MB/sec)
  • Python trunk repository (going back to 1990!):
    • Files: 4199
    • Repository size: 97,1 MB
    • Working copy size: 55,7 MB (14,8 MB zipped)
    • Clone time: 82 seconds (1,2 MB/sec)
  • Mercurial repository:
    • Files: 1144
    • Repository size: 15,4 MB
    • Working copy size: 6,86 MB (2,08 MB zipped)
    • Clone time: 25 seconds (0,6 MB/sec)
  • Backbase 4 repository (converted from SVN):
    • Files: 4691
    • Repository size: 77,9 MB
    • Working copy size: 74,6 MB (33,6 MB zipped)
  • Backbase cobrowse repository (converted from SVN):
    • Files: 1703
    • Repository size: 79,6 MB
    • Working copy size: 39,3 MB (24,0 MB zipped) (was 87,2 MB 400 revisions ago!)

The repository size is roughly what will be transferred over the wire when cloning. The zipped size indicates the theoretical best case, where you would retrieve just the files and no history. Note that this does not include the metadata that would also be transmitted with a shallow clone; for example, Python’s changelog is ~10 MB. Finally, clone time is the time it takes to run hg clone -U, which skips updating the working directory and is therefore a reasonable approximation of the time spent downloading and creating the repository.
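As a quick check of the arithmetic, the sketch below recomputes the MB/sec figures (repository size divided by hg clone -U time) and the best-case fraction of a full clone’s transfer that an idealised shallow clone would still need (the zipped working copy, plus optional metadata). The figures are copied from the list above; the function names are my own, and treating the zipped size as the transfer size is the same simplification made in the paragraph.

```python
# Figures from the measurements above (sizes in MB, times in seconds).
# The Backbase repositories are left out because no clone time was recorded.
measurements = {
    "Mozilla-central": {"repo_mb": 241.0, "zipped_wc_mb": 87.4, "clone_s": 276},
    "Python trunk": {"repo_mb": 97.1, "zipped_wc_mb": 14.8, "clone_s": 82},
    "Mercurial": {"repo_mb": 15.4, "zipped_wc_mb": 2.08, "clone_s": 25},
}


def throughput_mb_per_s(repo_mb: float, clone_s: float) -> float:
    """Approximate rate: repository size divided by the 'hg clone -U' time."""
    return repo_mb / clone_s


def best_case_transfer_fraction(repo_mb: float, zipped_wc_mb: float,
                                metadata_mb: float = 0.0) -> float:
    """Share of a full clone's transfer that a shallow clone would still need:
    the zipped working copy plus any metadata (assumed 0 MB by default)."""
    return (zipped_wc_mb + metadata_mb) / repo_mb


for name, m in measurements.items():
    rate = throughput_mb_per_s(m["repo_mb"], m["clone_s"])
    frac = best_case_transfer_fraction(m["repo_mb"], m["zipped_wc_mb"])
    print(f"{name}: {rate:.2f} MB/s, shallow transfer >= {frac:.0%} of a full clone")
```

Passing metadata_mb=10.0 for Python (its ~10 MB changelog) raises its best case from about 15% to about 26% of the full clone.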

Also of note are some further measurements I did with the Mozilla repository. Simply copying it between two different hard disks takes a lot of time too: 174 seconds! (The Python repository took 23 seconds.) Creating a working copy from the repository also takes a long time: 241 seconds. My guess is that this is because of the large number of small files.

In other words, even if you cut the download time of the Mozilla repository by making the shallowest of clones, a lot of the clone time is still spent creating the large number of small files. A little math suggests that for this repository, you would only be able to bring down the clone time by some 20%.
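The math behind the 20% figure isn’t spelled out, so here is one plausible back-of-the-envelope model. It is my assumption, not necessarily the calculation used here: treat the 174-second disk-to-disk copy as the fixed cost of writing the repository’s files, attribute the rest of the 276-second hg clone -U time to the download, and assume download time scales with the bytes transferred (241 MB for a full clone versus ~87.4 MB for the zipped working copy, ignoring metadata).

```python
def shallow_clone_saving(clone_s: float, local_io_s: float,
                         full_mb: float, shallow_mb: float) -> float:
    """Fraction of clone time saved under an assumed model:
    clone time = fixed local file I/O + download time proportional to MB."""
    download_s = clone_s - local_io_s               # time attributed to the network
    shallow_download_s = download_s * shallow_mb / full_mb
    saved_s = download_s - shallow_download_s
    return saved_s / clone_s


# Mozilla-central figures from the post; the model itself is hypothetical.
saving = shallow_clone_saving(clone_s=276, local_io_s=174,
                              full_mb=241.0, shallow_mb=87.4)
print(f"Estimated saving: {saving:.0%}")
```

Under these assumptions the estimate comes out at roughly 24%, in the same ballpark as “some 20%”; adding the metadata a shallow clone would still transfer pushes it lower.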

Conclusion

Looking at these numbers, first of all, in most cases just thinking about how ‘shallow’ you want your clone to be will probably take more time than simply cloning the repository ;p. And what would you gain? A slightly shorter clone time perhaps, but you lose the ability to look at the full history. And how often do you make a complete clone? Only the first time.

For local clones, Mercurial and git create hard links, so making a local clone is much faster (74 seconds for Mozilla, 11 seconds for Python) and takes hardly any extra disk space. Clones over a local network will of course be faster as well. As for slow connections: because the repository ends up completely hosted locally and few operations require interaction with a central server, a DVCS is already very friendly towards them.
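To illustrate why hard-linked clones cost so little disk space, the sketch below creates a file and a hard link to it using Python’s standard library: both directory entries point at the same inode, so the data is stored only once. This demonstrates hard links in general, not Mercurial’s or git’s own cloning code; the file names are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def demo_hard_link() -> Tuple[bool, int]:
    """Create a file plus a hard link to it; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.i")        # hypothetical file name
        linked = os.path.join(tmp, "clone-data.i")    # the "cloned" entry
        with open(original, "wb") as f:
            f.write(b"x" * 1024 * 1024)               # 1 MiB of dummy data
        os.link(original, linked)                     # new name, same data blocks
        st_a, st_b = os.stat(original), os.stat(linked)
        same_inode = (st_a.st_ino, st_a.st_dev) == (st_b.st_ino, st_b.st_dev)
        return same_inode, st_a.st_nlink


print(demo_hard_link())
```

On a filesystem that supports hard links this prints (True, 2): one set of data blocks, two names.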

Finally, consider also that internet connections get faster every year, so even though repository size grows steadily over time, this does not necessarily have to become a problem. And hey, if it does, by that time Mercurial will have shallow cloning too :).

Grauw

Comments

Bundles to the rescue by C2H5OH at 2010-04-09 11:26

Using downloadable bundles (dailies or weeklies) can speed up clone time. For example, download a bundle, clone from it locally, then pull from the remote repository to get any new changesets not included in the bundle.
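A minimal sketch of that bundle workflow, assuming the bundle file has already been downloaded. Instead of cloning from the bundle, it takes the equivalent route of hg init followed by hg unbundle, then hg pull and hg update. The bundle name, URL, and directory are hypothetical, and by default the script only prints the commands.

```python
import shlex
import subprocess
from typing import List


def bundle_bootstrap_commands(bundle: str, remote: str, dest: str) -> List[List[str]]:
    """hg commands to seed a repository from a bundle, then catch up remotely."""
    return [
        ["hg", "init", dest],                     # create an empty repository
        ["hg", "-R", dest, "unbundle", bundle],   # apply the downloaded bundle
        ["hg", "-R", dest, "pull", remote],       # fetch changesets newer than the bundle
        ["hg", "-R", dest, "update"],             # check out a working copy
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))                # shlex.join needs Python 3.8+
        else:
            subprocess.run(cmd, check=True)


# Hypothetical bundle file and server URL.
run(bundle_bootstrap_commands("weekly.hg", "https://hg.example.org/repo", "repo"))
```

Only the final pull touches the network, and it transfers just the changesets added since the bundle was made.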

hg convert by Christopher Cabanne at 2012-06-30 02:27

One can use the “hg convert” extension to convert a large Mercurial repository into a version with less history. Note that all of the changeset hashes will change in the process. See http://mercurial.selenic.com/wiki/ConvertExtension.
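As a rough sketch of that approach, the function below builds an hg convert command line that enables the bundled convert extension through --config and uses the extension’s convert.hg.startrev option to begin conversion at a later revision, dropping the history before it. The paths and the revision number are hypothetical; check hg help convert for your Mercurial version before relying on it.

```python
import shlex
from typing import List


def truncated_history_convert(src: str, dest: str, start_rev: str) -> List[str]:
    """Build an 'hg convert' command that starts at start_rev, dropping older
    history. Global --config options are placed before the command name."""
    return [
        "hg",
        "--config", "extensions.convert=",                # enable the convert extension
        "--config", f"convert.hg.startrev={start_rev}",   # first revision to keep
        "convert",
        src,
        dest,
    ]


# Hypothetical repository paths and start revision.
print(shlex.join(truncated_history_convert("big-repo", "small-repo", "5000")))
```

The result is a new repository whose root is the chosen revision, so, as noted above, none of its changeset hashes match the original.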

by Joris at 2013-12-09 15:07

My clone of Python weighs in at 51GB. Does that mean that it has grown by a factor of 10 in the last three years?

Re: Python repository size by Grauw at 2014-01-05 17:12

@Joris I just made a fresh clone of Python, the repository is now 221 MB and the working copy is 75 MB. Took me a minute or so to download the entire thing.

I don’t know how your clone got to be the size it is now, but you can try making a clone of your local repository using the hg clone --pull option, then replacing the repository directory in your original clone with the new one.

by Seth Williams at 2014-08-28 12:11

Great article Lauren on Mercurial shallow clones and repository sizes.

by John Doe at 2015-07-18 01:14

Check your fast internet connection privilege.

I’m on a 1.5 Mbps connection (USA). Shallow clone can be much faster than deep clone depending on the size of the repository.

Example:
$ time git clone --depth 1 https://github.com/mpv-player/mpv
Cloning into 'mpv'...
remote: Counting objects: 644, done.
remote: Compressing objects: 100% (628/628), done.
remote: Total 644 (delta 50), reused 111 (delta 4), pack-reused 0
Receiving objects: 100% (644/644), 2.86 MiB | 205.00 KiB/s, done.
Resolving deltas: 100% (50/50), done.
Checking connectivity... done.

real 0m18.002s
user 0m0.543s
sys 0m0.293s

$ time git clone https://github.com/mpv-player/mpv
Cloning into 'mpv'...
remote: Counting objects: 240954, done.
remote: Compressing objects: 100% (110/110), done.
remote: Total 240954 (delta 61), reused 0 (delta 0), pack-reused 240844
Receiving objects: 100% (240954/240954), 68.88 MiB | 175.00 KiB/s, done.
Resolving deltas: 100% (189239/189239), done.
Checking connectivity... done.

real 6m56.116s
user 0m25.013s
sys 0m7.427s

I’ve yet to successfully clone a Mercurial repository because it takes so long.

shallow clones by Select at 2015-10-26 09:05

I can see that you are an expert in your field. What exactly are shallow clones, and where can I read more about them? Thanks for all your help.