Downloading source code from SVN/Git repository over HTTP

    Sites that host open source projects usually provide an online viewer for browsing the source code without checking it out with an SVN or Git client. Checking out an entire repository can take a long time. Git also has no straightforward way to check out only a particular directory: cloning downloads the full repository to the local machine, and a sparse checkout on its own still transfers the whole repository. When bandwidth is a concern, fetching the entire repository may not be practical.

    GNU Wget can recursively download files from online code repositories over HTTP. A Windows build can be downloaded from http://users.ugent.be/~bpuype/wget/

    The command below recursively downloads a directory, its child directories, and all the files in them, excluding index.html. It does not download parent directories or files from external sites.

wget --cut-dirs=2 --level=15 --include-directories=src/main/java --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber http://sourcesite.com/src/main/java

    Be careful with slashes: use forward slashes in the paths. The command did not work when I used backslashes.

--cut-dirs=N Skip the first N directory components of the URL path when creating local directories.
--level=N Follow links up to N levels deep. The default is 5 levels.
--include-directories=src/main/java Download only from this directory and its child directories.
--recursive Download recursively.
--no-parent Never ascend to the parent directory of the given URL.
--no-host-directories Do not create a top-level directory named after the server host, which wget does by default.
--reject=index.html Do not keep files named index.html.
-e robots=off Ignore the rules in robots.txt when crawling the site.
--no-clobber Do not download files that already exist locally.
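
    As a rough sketch, the command can be wrapped in a small shell function that derives the --include-directories and --cut-dirs values from the URL and prints the resulting command instead of running it. The function names url_path, cut_dirs_for and wget_cmd are hypothetical helpers introduced for illustration, and the sketch assumes that only the last directory of the URL path should be kept locally, as in the example above.

```shell
#!/bin/sh
# Sketch only: url_path, cut_dirs_for and wget_cmd are hypothetical helpers.

# Print the path part of an http(s) URL without leading or trailing slashes.
url_path() {
    path=${1#*://}                  # drop the scheme, e.g. "http://"
    case $path in
        */*) path=${path#*/} ;;     # drop the host name
        *)   path= ;;               # the URL has no path at all
    esac
    path=${path%/}                  # drop one trailing slash, if present
    printf '%s\n' "$path"
}

# Print the --cut-dirs value that keeps only the last directory of the path.
# This equals the number of slashes in the path (components minus one).
cut_dirs_for() {
    p=$(url_path "$1")
    echo $(( $(printf '%s' "$p" | tr -cd '/' | wc -c) ))
}

# Print the wget command for a URL without running it (dry run).
wget_cmd() {
    url=$1
    dir=$(url_path "$url")
    cut=$(cut_dirs_for "$url")
    printf 'wget --cut-dirs=%s --level=15 --include-directories=%s --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber %s\n' \
        "$cut" "$dir" "$url"
}

wget_cmd http://sourcesite.com/src/main/java
```

    For the example URL this prints the same command as shown earlier, with --cut-dirs=2. Review the printed line before running it.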


    Here are a few source code repository addresses.

http://svn.apache.org/repos/asf/
http://selenium.googlecode.com/git

    For Google Code sites, use projectname.googlecode.com/git for a Git repo or projectname.googlecode.com/svn for an SVN repo.
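
    The naming rule above can be sketched as a tiny shell helper. The function name googlecode_url is hypothetical, and it only assembles the address from the pattern described here; it does not check that the project or repository exists.

```shell
#!/bin/sh
# Sketch only: googlecode_url is a hypothetical helper.
# Usage: googlecode_url PROJECT git|svn
googlecode_url() {
    project=$1
    vcs=$2
    case $vcs in
        git|svn)
            # Pattern from the text: projectname.googlecode.com/git or /svn
            printf 'http://%s.googlecode.com/%s\n' "$project" "$vcs"
            ;;
        *)
            echo "unknown repository type: $vcs" >&2
            return 1
            ;;
    esac
}

googlecode_url selenium git
```

    The printed address can then be passed as the URL argument of the wget command.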

