Git Mirror

git_mirror

While creating a git mirror is as simple as git clone --mirror, unfortunately this git command does not support git submodules or lfs. The subcommands mirror, push and clone in this file, and associated functions, help in creating and subsequently cloning a mirror of a project with submodules and/or git lfs.

Example

Example usage:

  1. git_mirror mirror https://github.com/visionsystemsinc/vsi_common.git main - Mirror the repository and recursively create mirrors of all submodules currently in the main branch.

  2. Transfer vsi_common_prep/transfer_{date}.tgz to your destination

  3. On the destination, create a directory, e.g., vsi_common_prep, and move the archive into it

  4. Extract the archive (the archive will extract directly into this directory)

  5. Write an info.env file mapping each repository to its mirrored URL:

repos[.]=https://git-server.com/foobar/vsi_common.git
repos[docker/recipes]=https://git-server.com/foobar/recipes.git

  6. git_mirror push ./info.env ./transfer_extracted_dir/ - Push the mirrored repository and all submodules to a new git server as defined by info.env

  7. git_mirror clone ./info.env ./my_project_dir/ - Clone recursively from the new mirror

next_section

Change the color of the text output to stdout/stderr

Arguments:

[${@}] - An optional string (or list of strings) to herald the next section

Output:

A temporary file is created which stores the index into the COLORS array

Change the current text color. A temporary file is created to track the current index into the COLORS array; this index is updated with each call to this function. This temporary file is deleted automatically.
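The index-in-a-file mechanism described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the palette values, the COLOR_INDEX_FILE variable, and the function name next_section_sketch are all assumptions made for the example.

```bash
#!/usr/bin/env bash

# Assumed palette; the real COLORS array is not shown in this document
COLORS=($'\e[31m' $'\e[32m' $'\e[33m' $'\e[34m')

# Hypothetical temporary file holding the current index into COLORS
COLOR_INDEX_FILE="$(mktemp)"
trap 'rm -f "${COLOR_INDEX_FILE}"' EXIT   # delete the file automatically
echo 0 > "${COLOR_INDEX_FILE}"

next_section_sketch() {
  local index
  index="$(cat "${COLOR_INDEX_FILE}")"

  # Switch the text color, then print the optional heading
  printf '%s' "${COLORS[index]}"
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$*"
  fi

  # Advance the index, wrapping around at the end of the palette
  echo $(( (index + 1) % ${#COLORS[@]} )) > "${COLOR_INDEX_FILE}"
}
```

Calling next_section_sketch "Mirroring submodules" prints the heading in the current color and moves the stored index on by one.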

get_config_submodule_names

Get list of initialized submodules, non-recursive

Output:

A newline separated list of submodules for the current git repository

Get a list of submodules of the current git repository. This command is non-recursive, i.e., submodules of submodules, etc. are not returned. An implementation of this feature is provided for older versions of git (<2.6)

Example

$ get_config_submodule_names # from within ./vsi_common/
submodule.docker/recipes

Note

Unlike git submodule foreach --quiet 'echo "${name}"', this function works for submodules that have been init’d but not updated
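One way to obtain such a list on any git version is to read the submodule.<name>.url keys that git submodule init writes to the repository's config, and strip the values by hand. The sketch below assumes that approach; the function names are hypothetical, and submodule names containing spaces are not handled.

```bash
#!/usr/bin/env bash

# Hypothetical parser: reads "<key> <value>" lines, as printed by
# `git config --get-regexp`, and prints each key without its ".url" suffix
submodule_names_from_config() {
  local key value
  while read -r key value; do
    echo "${key%.url}"
  done
}

# Hypothetical wrapper: list the submodules initialized in the current
# repository (non-recursive). Avoids --name-only, which needs git >= 2.6
list_initialized_submodules() {
  git config --get-regexp '^submodule\..*\.url$' | submodule_names_from_config
}
```

Because it reads the config rather than the working tree, this style of lookup also reports submodules that have been init'd but not yet updated.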

git_mirror_has_lfs

Is git lfs available for the specified implementation of git

Output:

  • 0 - git lfs is available

  • 1 - git lfs is not available

Internal Use:

__git_mirror_has_lfs - A variable to save this state
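A check of this kind, with the result cached in a variable, might look like the sketch below. The names carry a _sketch suffix to mark them as illustrative, and the GIT variable used to select the git executable is an assumption.

```bash
#!/usr/bin/env bash

git_mirror_has_lfs_sketch() {
  # Probe only once; later calls reuse the cached result
  if [ -z "${__git_mirror_has_lfs_sketch+set}" ]; then
    # ${GIT:-git} is an assumed way of choosing the git implementation
    if "${GIT:-git}" lfs version > /dev/null 2>&1; then
      __git_mirror_has_lfs_sketch=0
    else
      __git_mirror_has_lfs_sketch=1
    fi
  fi
  return "${__git_mirror_has_lfs_sketch}"
}
```

Typical use would be if git_mirror_has_lfs_sketch; then ...; fi around any step that fetches or pushes lfs objects.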

clone_submodules

Mirror a submodule and all of its submodules (recursively)

This is a helper function to git_mirror_main. Create a mirror of each submodule in a git repository, recursively. Each mirror is stored according to its full relative path in the project repository in a unique temporary directory created in the GIT_MIRROR_PREP_DIR. Submodules of a submodule are processed recursively in a depth-first fashion.

WARNING: git submodule foreach runs commands via sh because git is weird; however I start bash and source this script for its vars and functions, so it’s really bash again.

Parameters:

  • GIT_MIRROR_PREP_DIR - The directory in which to mirror the repositories

  • [base_submodule_path] - The (relative) path from the project repository to a submodule with, potentially, submodules of its own that need to be mirrored/updated. This must also be the CWD. If unset, mirror the submodules of the parent repository.

Output:

A mirrored submodule located at GIT_MIRROR_PREP_DIR/{temp_dir}/base_submodule_path

Note

This function assumes base_submodule_path is also the PWD

update_submodules

Init/update a submodule and any of its submodules (recursively)

This is a helper function to git_clone_main. Recursively clone a submodule mirrored with git_mirror_main, and fixup the submodules’ remote URLs according to the mapping specified by $1.

WARNING: git submodule foreach runs commands via sh because git is weird; however I start bash and source this script for its vars and functions, so it’s really bash again.

Arguments:

  • $1 - A file specifying the mapping between each repository and its mirror URL; see _git_mirror_load_info

  • [base_submodule_path] - The (relative) path from the project repository to a submodule with, potentially, submodules of its own that need to be cloned/updated. This must also be the CWD. If unset, update the submodules of the parent repository.

  • [prefix] - A variable set when calling git submodule foreach (and, by necessity, re-exported by this function) to the submodule path (sm_path) from the .gitmodules file. After git v1.8.3, prefix was replaced with displaypath

  • [displaypath] - A variable set when calling git submodule foreach (and, by necessity, re-exported by this function) to the relative path from the current working directory to the submodule's root directory

Output:

The recursively cloned submodule

Note

This function assumes base_submodule_path is also the PWD

sync_submodules

Sync the submodules to the mirrored remote (non-recursive)

This is a helper function to update_submodules. Sync the submodules’ remote URLs (non-recursively), a la git submodule sync, however, instead of referring to the .gitmodules file, use the mapping specified by repo_paths and repo_urls.

Arguments:

$1 - The (relative) path from the project repository to a submodule with, potentially, submodules of its own that need to be sync’ed. This must also be the CWD. If unset, sync the submodules of the parent repository.

Parameters:

  • repo_paths - The array of (relative) paths from the root of the project repository to each submodule (recursively); see _git_mirror_load_info

  • repo_urls - The corresponding URL of each repo_path

Note

This function assumes $1 is also the PWD

git_mirror_main

Mirror the main repository and all submodules (recursively)

Downloads a mirror of a git repository and all of its submodules. The normal git clone --mirror command does not support submodules at all. This at least clones all the submodules available in the specified branch (git’s init.defaultBranch by default).

The script creates a directory, referred to as a prep directory (or prep_dir), which will contain all of the mirrored repositories plus a single transfer_{date}.tgz archive file containing all of these repositories, lfs objects, etc… Only this tgz file needs to be transferred to your destination.

Subsequent calls to git_mirror_main can use the existing prep directory as cache, updating faster than the first time.

Subsequent calls also create a second tgz file, transfer_{date1}_transfer_{date2}.tgz (on supported platforms). This is an incremental archive file. Instead of having to bring in an entire archive, only the incremental file is needed (plus the original full archive).

After you have moved the transfer archive to its destination, you can use git_push_main to push these mirrored repositories to a new git server.

Arguments:
  • $1 - URL of the main git repository. On subsequent calls to this function, the prep (cache) dir created by this function can be used in lieu of the repository’s URL

  • [$2] - The git branch from which to identify the submodules. Default: git’s init.defaultBranch

Parameters:

[GIT_MIRROR_PREP_DIR] - The output directory in which to mirror the repositories; Default: ${PWD}/{repo_name}_prep

Output:
  • A prep directory which will contain all of the repositories plus a single transfer_{date}.tgz

  • GIT_MIRROR_PREP_DIR - The path to the mirrored repositories

  • GIT_MIRROR_MAIN_REPO - The main repository’s name. Based off of $1; e.g., vsi_common if the URL is https://github.com/VisionSystemsInc/vsi_common.git
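The derivation of the repository name and the default prep directory can be illustrated with plain parameter expansion. This is a sketch under the naming rules stated above; the function names are hypothetical and the real script may handle more URL forms.

```bash
#!/usr/bin/env bash

# Hypothetical helper: derive a repository name from its URL
repo_name_from_url() {
  local name="${1%/}"   # drop a trailing slash, if any
  name="${name##*/}"    # keep only the last path component
  echo "${name%.git}"   # drop a trailing .git, if any
}

# Hypothetical helper: default prep dir, ${PWD}/{repo_name}_prep,
# unless GIT_MIRROR_PREP_DIR is already set
default_prep_dir() {
  echo "${GIT_MIRROR_PREP_DIR:-${PWD}/$(repo_name_from_url "$1")_prep}"
}

repo_name_from_url https://github.com/VisionSystemsInc/vsi_common.git
```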

Example

Mirror the vsi_common repository and all of its submodules in a directory called ./vsi_common_prep. Then, create an archive file that can be transferred to your destination.

git_mirror_main https://github.com/visionsystemsinc/vsi_common.git main
# produces ./vsi_common_prep/transfer_2020_03_02_14_16_09.tgz

Example

Calling git_mirror_main again will use the vsi_common_prep dir as a cache, and then create an incremental file.

git_mirror_main vsi_common_prep
# produces ./vsi_common_prep/transfer_2020_03_02_14_24_12_transfer_2020_03_02_14_16_09.tgz

Example

Both of these examples result in identical mirrors on your destination:

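# Option 1: extract the original full archive, then apply the incremental archive
tar zxf transfer_2020_03_02_14_16_09.tgz
tar --incremental -zxf transfer_2020_03_02_14_24_12_transfer_2020_03_02_14_16_09.tgz

# Option 2: extract only the newer full archive
tar zxf transfer_2020_03_02_14_24_12.tgz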

Note

git_mirror_main does not mirror all submodules that have ever been part of the repo, only those from a specific branch/SHA/tag you specify (git’s init.defaultBranch by default). This is because trying to mirror all submodules from the past could be very lengthy, and is very likely to include URLs that do not exist anymore.

Bugs

If the first argument is a local path to a git repository (as opposed to a URL or a path to an existing prep dir), it must be an absolute path.

git_mirror_repos

Mirror the main repository and all submodules (recursively)

Downloads a mirror of a git repository and all of its submodules. The normal git clone --mirror command does not support submodules at all. This at least clones all the submodules available in the specified branch (git’s init.defaultBranch by default).

Creates a directory, referred to as a prep directory (or prep_dir), which will contain all of the mirrored repositories. Subsequent calls to git_mirror_repos can use the existing prep directory as cache, updating faster than the first time.

Arguments:
  • $1 - URL of the main git repository. On subsequent calls to this function, the prep (cache) directory created by this function can be used in lieu of the repository’s URL

  • [$2] - The git branch from which to identify the submodules. Default: git’s init.defaultBranch

Parameters:

[GIT_MIRROR_PREP_DIR] - The output directory in which to mirror the repositories; Default: ${PWD}/{repo_name}_prep

Output:
  • A prep directory which will contain all of the repositories

  • GIT_MIRROR_PREP_DIR - The path to the mirrored repositories

  • GIT_MIRROR_MAIN_REPO - The main repository’s name. Based off of $1; e.g., vsi_common if the URL is https://github.com/VisionSystemsInc/vsi_common.git

Note

git_mirror_repos does not mirror all submodules that have ever been part of the repo, only those from a specific branch/SHA/tag you specify (git’s init.defaultBranch by default). This is because trying to mirror all submodules from the past could be very lengthy, and is very likely to include URLs that do not exist anymore.

Bugs

If the first argument is a local path to a git repository (as opposed to a URL or a path to an existing prep dir), it must be an absolute path.

See also

git_mirror_main

archive_mirrors

Create an in-place archive of a directory

Create a transfer_{date}.tgz archive file of a directory. This archive is created within the same directory.

Subsequent calls also create a second tgz file, transfer_{date1}_transfer_{date2}.tgz (on supported platforms). This is an incremental archive file. Only the incremental file is needed (plus the original full archive).

Arguments:
  • $1 - A directory to archive in-place

Output:

A transfer_{date}.tgz archive file and, on subsequent calls, an incremental transfer_{date1}_transfer_{date2}.tgz
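The full-plus-incremental scheme can be reproduced with GNU tar's snapshot files. The sketch below assumes GNU tar and its --listed-incremental option, which may or may not be how the script does it; the file names and the throwaway directory layout are invented for the demonstration, and the archives are written outside the archived directory for simplicity.

```bash
#!/usr/bin/env bash
set -eu

# Incremental archives are only attempted with GNU tar
tar --version 2> /dev/null | grep -q 'GNU tar' || { echo "GNU tar required"; exit 0; }

work="$(mktemp -d)"
trap 'rm -rf "${work}"' EXIT
mkdir -p "${work}/prep" "${work}/out" "${work}/dest"

echo "first" > "${work}/prep/a.txt"
# Full archive; tar records the directory state in the snapshot file
tar -czf "${work}/out/full.tgz" \
    --listed-incremental="${work}/out/state.snar" -C "${work}/prep" .

# Wait so the new file's timestamp is clearly newer than the snapshot
sleep 1
echo "second" > "${work}/prep/b.txt"
# Reusing the snapshot file makes tar store only what changed since then
tar -czf "${work}/out/incremental.tgz" \
    --listed-incremental="${work}/out/state.snar" -C "${work}/prep" .

# On the destination: the full archive first, then the incremental one
tar -xzf "${work}/out/full.tgz" -C "${work}/dest"
tar --incremental -xzf "${work}/out/incremental.tgz" -C "${work}/dest"
ls "${work}/dest"
```

After both extractions the destination directory holds the same files as the source directory did when the second archive was made.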

_git_mirror_load_info

Load the mapping between each repository and its mirror URL

Arguments:

$1 - A file specifying the mapping between each repository and its mirror URL

Output:

  • repo_paths - The array of (relative) paths from the root of the project repository to each submodule (recursively). (That is, a path as printed by echo . && git submodule foreach -q 'echo "${displaypath}"' assuming the CWD is the root of the project repository)

  • repo_urls - The corresponding URL of each repo_path

Example

$ cat info.env
repo_paths=(
  .
  docker/recipes
)
repo_urls=(
  https://git-server.com/foobar/vsi_common.git
  https://git-server.com/foobar/recipes.git
)

Or alternatively, using associative arrays:

repos[.]=https://git-server.com/foobar/vsi_common.git
repos[docker/recipes]=https://git-server.com/foobar/recipes.git

_git_mirror_get_url

Return the URL for a given submodule

Arguments:

$1 - A path that can be found in repo_paths

Parameters:

  • repo_paths - The array of (relative) paths from the root of the project repository to each submodule (recursively); see _git_mirror_load_info

  • repo_urls - The corresponding URL of each repo_path

Output:

The URL corresponding to $1
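Loading the mapping and looking up a URL could work roughly as sketched below. The function names are hypothetical, and the sketch assumes bash 4.2 or newer (for associative arrays and declare -g); it accepts either form of the info file and always ends up with the two parallel arrays.

```bash
#!/usr/bin/env bash

# Hypothetical loader: source the info file, then flatten an associative
# "repos" array (if the file used one) into repo_paths/repo_urls
load_info_sketch() {
  local path
  repo_paths=()
  repo_urls=()
  unset repos
  declare -gA repos   # must be associative before repos[.]=... is sourced
  source "$1"
  for path in "${!repos[@]}"; do
    repo_paths+=("${path}")
    repo_urls+=("${repos[${path}]}")
  done
}

# Hypothetical lookup: print the URL paired with the path given as $1
get_url_sketch() {
  local i
  for i in "${!repo_paths[@]}"; do
    if [ "${repo_paths[i]}" = "$1" ]; then
      echo "${repo_urls[i]}"
      return 0
    fi
  done
  return 1   # path not found in repo_paths
}

# Demonstration with a throwaway info file
info_file="$(mktemp)"
cat > "${info_file}" <<'EOF'
repos[.]=https://git-server.com/foobar/vsi_common.git
repos[docker/recipes]=https://git-server.com/foobar/recipes.git
EOF
load_info_sketch "${info_file}"
get_url_sketch docker/recipes
rm -f "${info_file}"
```

Keeping the lookup keyed on the path means the unspecified iteration order of an associative array does not matter.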

git_clone_main

Clone recursively from the new mirror

Once the repository has been mirrored to the new git server with git_push_main, it can be cloned. However, because the .gitmodules file will point to different URLs than the mirrors, and changing the .gitmodules file will change the repo, which we don’t want to do, we need to make a shallow clone of the repository, init the submodules, modify the submodules’ URLs, and then finally update the submodules. And all of this has to be done recursively for each submodule. As you can tell, this is very tedious, so this script will do it all for you.

Arguments:

  • $1 - A file specifying the mapping between each repository and its mirror URL; see _git_mirror_load_info

  • [$2] - The directory in which to clone the repo
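For a single submodule, the manual steps described above (clone, init, rewrite the URL, update) might look like the following sketch. The function name and its four arguments are hypothetical, it assumes the submodule's name equals its path, and it does not recurse into nested submodules as the real function does.

```bash
#!/usr/bin/env bash

# Hypothetical single-level helper:
#   $1 - mirror URL of the main repository
#   $2 - directory to clone into
#   $3 - submodule path (assumed to equal the submodule name)
#   $4 - mirror URL of that submodule
clone_from_mirror_sketch() {
  local main_url="$1" dest="$2" sm_path="$3" sm_url="$4"

  # Clone the main repository only; submodules are not fetched yet
  git clone "${main_url}" "${dest}" || return

  (
    cd "${dest}" || exit
    # Copy the submodule entry from .gitmodules into the local config
    git submodule init -- "${sm_path}" || exit
    # Point the local config at the mirror; .gitmodules stays untouched
    git config "submodule.${sm_path}.url" "${sm_url}" || exit
    # Fetch and check out the submodule from the mirror
    git submodule update -- "${sm_path}"
  )
}
```

Because only the local config is changed, the clone's history and its .gitmodules file remain identical to the original repository.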

Example

git_clone_main info.env ~/

git_push_main

Push the mirrored repository and all submodules to a new git server

After transferring the archive file created by git_mirror_main to a prep directory on your destination and extracting the archive into it, this function pushes all the mirrored repositories in the extracted archive to your own mirrors on a new git server.

Note

The mirrors must be initialized on the git server.

However, because the URLs for your mirrors will be different from the original repo URLs in .gitmodules, and modifying the URLs will change the git repo, which we do not want to do, you must create a file specifying the mapping between each repository and its mirror URL. The main repo is referred to as . while the rest of the repos are referred to by the relative path with respect to the main repo (e.g. external/vsi_common). These need to be stored in an associative array called repos.

Arguments:

  • $1 - A file specifying the mapping between each repository and its mirror URL; see _git_mirror_load_info

  • $2 - The mirrored repositories (extracted from the archive) created by git_mirror_main

Parameters:

[GIT_MIRROR_FORCE_PUSH_REFS] - A flag specifying whether to force push refs, should it be necessary. Default: 1 (force-push refs)

Example

In this example, the main repo’s mirror URL is https://git-server.com/foobar/vsi_common.git, and the submodule stored at ./docker/recipes has the URL https://git-server.com/foobar/recipes.git. This file is also used by git_clone_main. (Any valid git URL format can be used.)

$ cat info.env
repos[.]=https://git-server.com/foobar/vsi_common.git
repos[docker/recipes]=https://git-server.com/foobar/recipes.git

$ git_push_main info.env vsi_common_prep

git_tag_main

Tag the refs in the mirrored repository and all submodules

Before pushing the mirrored repository and all of its submodules to the new git server with git_push_main, tag the refs so that if, during the next transfer, they become dereferenced (due to a force push, which is sometimes necessary), they are not lost. For a ref of the form refs/heads/main, this (annotated) tag takes the (long) form of refs/tags/just_git_mirror/{datetime}/heads/main

Arguments:

  • $1 - A file specifying the mapping between each repository and its mirror URL; see _git_mirror_load_info

  • $2 - The mirrored repositories (extracted from the archive) created by git_mirror_main

Parameters:

[GIT_MIRROR_TAG_REFS_REGEX] - An extended regular expression specifying which refs to tag. The refs should be matched in their long form, e.g., refs/heads/main or refs/tags/v1.0.0. Default: ^refs/heads/.+
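The mapping from a ref to its backup tag name, including the regex filter, can be sketched in a few lines. The function name is hypothetical and the datetime format is an assumption (borrowed from the archive names above); only the name is computed here, no tag is created.

```bash
#!/usr/bin/env bash

# Hypothetical helper:
#   $1 - a ref in its long form, e.g. refs/heads/main
#   $2 - a datetime string (format assumed)
# Prints the tag name for refs that match the regex; returns 1 otherwise
tag_name_for_ref() {
  local ref="$1" datetime="$2"
  local regex="${GIT_MIRROR_TAG_REFS_REGEX:-^refs/heads/.+}"

  # Unquoted right-hand side, so it is treated as an extended regex
  if [[ ${ref} =~ ${regex} ]]; then
    # refs/heads/main -> refs/tags/just_git_mirror/{datetime}/heads/main
    echo "refs/tags/just_git_mirror/${datetime}/${ref#refs/}"
  else
    return 1
  fi
}

tag_name_for_ref refs/heads/main 2020_03_02_14_16_09
```

With the default regex, a ref such as refs/tags/v1.0.0 is skipped; setting GIT_MIRROR_TAG_REFS_REGEX='^refs/(heads|tags)/.+' would include it.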

See also

git_push_main

Note

Because this function creates a tag directory called just_git_mirror/ (note the trailing slash), a user can no longer create a tag called just_git_mirror