Due to some past choices (committing binaries and other generated files), our repo has grown to 1.1GB. A clone can take quite a while, and we're close to GH's soft limit.
For years we've wanted to take the cold shower and reduce the size of the repo. I've spent the past 2 days looking at our history to find out what we could clean up. Some stuff has migrated to dedicated repos (tutor-content and its history are now in rascal-website, java m3 has migrated to java-air, and so forth).
Based on that research I've constructed the following command:
../git-filter-repo --replace-refs update-and-add \
--invert-paths \
--path src/boot \
--path doc/ \
--path lib/ \
--path src/org/rascalmpl/library/experiments/RascalTutor/Courses \
--path src/org/rascalmpl/library/analysis/m3 \
--path test/org/rascalmpl/test/data/m3 \
--path src/org/rascalmpl/library/analysis/text/search \
--path src/org/rascalmpl/courses \
--path src/org/rascalmpl/library/demo/jhotdraw \
--path src/benchmark \
--path benchmark \
--path benchmarks \
--path-glob '*.bin' \
--path-glob '*/*RascalRascal.java' \
--path-glob 'flameGraph*.*' \
--path-glob '*.out' \
--path-glob '*.ods' \
--path-glob '*.rvm' \
--path-glob '*.orig' \
--path-glob '*.rvm.ser.gz' \
--path-glob '*.rvm.json.gz' \
--path-glob '*.html' \
--path-glob '*.tbl' \
--path-glob '*.tpl' \
--path-glob '*.xz' \
--path-glob '*.class' \
--path-glob '*.igr' \
--path-glob '*.rsc.bin' \
--path-glob '*.rsc.bin.cpgz'
This runs in 20s and takes the repo down to 86MB. The biggest change is that some older branches & tags won't compile to a working jar anymore (as we took out m3 & lucene, and very old (pre-2014) versions of the rascal parser). But we have jars for those.
Aside from everyone having to take a fresh clone, the biggest issue will be all the places where old commit references are stored, for example in Zenodo, but also closed PRs whose branches have been deleted. GH will not be able to reconnect those. I've enabled replacement refs, so there is information in the .git about the old refs, but the most up-to-date documentation suggests that GitHub ignores that.
So it would be a bit harder to use the GH interface to analyze old & closed PRs.
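For anyone who still has an old commit hash (from Zenodo, a closed PR, or a bookmark), a local lookup is possible in the rewritten clone. The sketch below assumes the `commit-map` file that git-filter-repo writes to `.git/filter-repo/commit-map`, with one `<old-sha> <new-sha>` pair per line after a header line. The helper name `new_commit_for` is made up for illustration, and the file is only present in the clone where the filter was run.

```shell
# Hypothetical helper: print the rewritten hash for an old commit hash.
# Usage: new_commit_for OLD_SHA [COMMIT_MAP_FILE]
# Assumes each data line of the map is "<old-sha> <new-sha>".
new_commit_for() {
    # Refuse an empty hash, which would otherwise match nothing useful.
    [ -n "$1" ] || return 2
    # Prefix match on the first column, so abbreviated hashes also work.
    # The "old new" header line can never match a hex hash.
    awk -v old="$1" '
        index($1, old) == 1 { print $2; found = 1; exit }
        END { exit found ? 0 : 1 }
    ' "${2:-.git/filter-repo/commit-map}"
}
```

Calling `new_commit_for <old-sha>` from the root of the rewritten clone prints the new hash, or returns a non-zero status if the old hash is not in the map. A commit that was dropped entirely by the filter should show up as an all-zero hash, if I read the filter-repo docs correctly.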