How to mirror CPAN
There are several ways to mirror CPAN depending upon what you want to achieve.
How do I create a private or offline mirror?
minicpan from CPAN::Mini is the best tool for this. Also look at CPAN::Mini::Inject which allows you to add your own modules into your private mirror.
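As a rough sketch of what a first private mirror run might look like: minicpan takes a local directory with -l and a remote mirror with -r. The directory $HOME/minicpan, the remote URL, and the RUN switch below are assumptions made for illustration, not recommended values.

```shell
#!/bin/sh
# Sketch only: LOCAL and REMOTE are assumed values, adjust them for your site.
LOCAL="${LOCAL:-$HOME/minicpan}"          # directory that will hold the mirror
REMOTE="${REMOTE:-http://www.cpan.org/}"  # upstream mirror to copy from

# -l names the local directory, -r the remote mirror.
MINICPAN_CMD="minicpan -l $LOCAL -r $REMOTE"

if [ "${RUN:-0}" = "1" ]; then
    # Real run: downloads the latest version of every distribution.
    minicpan -l "$LOCAL" -r "$REMOTE"
else
    # Default: only show the command that would be run.
    echo "$MINICPAN_CMD"
fi
```

Running the script as-is only prints the command; setting RUN=1 in the environment starts the actual download. The same two settings can also be kept in a ~/.minicpanrc file as "local:" and "remote:" entries.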
Requirements for a full / public mirror
- Good internet connectivity
- Around 1GB of storage space for just the current modules.
- Around 25GB of storage space for the full mirror.
It's highly recommended that you also subscribe to the announcements-only cpan-mirrors mailing list by emailing cpan-mirrors-subscribe at perl.org.
Tools
CPAN::Mini provides you with a minimal mirror of CPAN (only the latest version of each module). This makes working offline easy, and it is the best tool if you are running a private mirror.
New: rrr-client allows instant mirroring, and should be used on official public mirrors where possible. See instant mirroring instructions.
rsync is the best tool if you need to mirror the whole of CPAN or if you are providing a public mirror. Rsync Instructions.
Only use FTP if the other methods are absolutely impossible. Never mirror over HTTP: you will end up with around a million duplicate files taking up tens of gigabytes.
Which CPAN Mirror should I use?
You can find your nearest rsync-enabled site on http://www.cpan.org/SITES.html, or use mirrors.json, especially if you are building a tool that lets the user select a mirror.
You can also sync from rsync://cpan-rsync.perl.org/CPAN/ (the "tier 1 mirrors"), though you might currently get better performance from a "local" mirror.
Using rsync
Please limit syncs to once or twice a day. For more frequent updates please see Instant mirroring.
On Unix systems
/usr/bin/rsync -av --delete cpan-rsync.perl.org::CPAN /project/CPAN/
Using 'crontab' you can make rsync run once a day, for example
40 4 * * * sleep $(expr $RANDOM \% 7200); /usr/bin/rsync -a --delete cpan-rsync.perl.org::CPAN /project/CPAN/
The "sleep $(...);" statement makes the command delay up to 2 hours before
running rsync; the advantage of this is that you (and everybody else) won't
access the mirror at the same time.
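One caveat with the crontab entry above: $RANDOM is a bash feature, and cron often runs commands with /bin/sh, which may not provide it. A possible workaround, sketched here under the assumption that /dev/urandom and od are available, is a small wrapper script that computes the delay itself. The MAX_DELAY variable and the random_delay function are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper script for the daily cron job.
MAX_DELAY="${MAX_DELAY:-7200}"   # upper bound in seconds (2 hours)

# Pick a delay in [0, MAX_DELAY) without relying on bash's $RANDOM.
# Two random bytes give a number from 0 to 65535; the modulo is slightly
# biased, which does not matter for spreading out mirror load.
random_delay() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $(( $n % $1 ))
}

DELAY=$(random_delay "$MAX_DELAY")
echo "would sleep ${DELAY}s before running rsync"

# Uncomment to use from cron:
# sleep "$DELAY"
# /usr/bin/rsync -a --delete cpan-rsync.perl.org::CPAN /project/CPAN/
```

The crontab entry would then call this script directly instead of the inline sleep. Alternatively, many cron implementations accept a SHELL=/bin/bash line at the top of the crontab, which keeps the original one-liner working.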
Unless you are mirroring to an SSD you might get timeouts using --delete-after when many symlinks are being purged. Using --delete will work properly.
If you have a problem with permissions (files are created with mode
-rw-------), set umask in your cron job:
40 4 * * * umask 022 ; sleep ... ; /usr/bin/rsync ...
The umask 022 allows rsync to set proper permissions for
files and directories.
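To see what the umask changes, the following sketch creates two empty files in a throwaway directory, one under a restrictive umask and one under umask 022, and prints the resulting modes. It does not touch a real mirror; the directory and file names are placeholders.

```shell
#!/bin/sh
# Demonstration in a temporary directory; nothing here touches a real mirror.
DEMO_DIR=$(mktemp -d)

# With umask 077, a new file is created as -rw------- (mode 600).
( umask 077; : > "$DEMO_DIR/private.txt" )

# With umask 022, a new file is created as -rw-r--r-- (mode 644).
( umask 022; : > "$DEMO_DIR/public.txt" )

# Keep only the ten permission characters from the ls -l output.
PRIVATE_MODE=$(ls -l "$DEMO_DIR/private.txt" | cut -c1-10)
PUBLIC_MODE=$(ls -l "$DEMO_DIR/public.txt" | cut -c1-10)

echo "umask 077 -> $PRIVATE_MODE"
echo "umask 022 -> $PUBLIC_MODE"

rm -rf "$DEMO_DIR"
```

Each umask is set inside a subshell, so the change does not leak into the rest of the script. In a cron job the umask is set once at the start of the command line, as in the entry above.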
On Windows systems
"C:\Program Files\Rsync\rsync" -av --delete cpan-rsync.perl.org::CPAN /project/CPAN/
Using the 'AT' tool, you can schedule rsync to run daily, for example:
AT 20:00 /every:M,T,W,Th,F,S,Su "C:\Program Files\Rsync\rsync -a --delete cpan-rsync.perl.org::CPAN /project/CPAN/"
How do I create a public mirror?
- Consider Instant mirroring (required if you wish to be a tier 1 mirror), or
- rsync once a day
- Provide (in order of preference) rsync, HTTP and/or FTP public access
- To be added to http://www.cpan.org/SITES.html and mirrors.json, please complete the template confirming the publicly accessible URLs of your mirror (rsync, ftp, http) and email it to cpan@perl.org.
Instant mirroring
"Instant mirroring" keeps your CPAN mirror up to date by continuously tracking the CPAN master and picking up changes a short time (minutes) after they occur.
Instant mirroring is used for all Tier 1 mirrors (so cpan-rsync.perl.org stays in sync across mirrors).
To use "instant mirroring", you need a special client: "rrr-client" or "iim".
"rrr-client" is part of the File::Rsync::Mirror::Recent (also known as rrr) package; it is the official client, used on the CPAN master to get updates from PAUSE, the true heart and soul of "all things perl". See the setup guide for more details.
"iim" is an alternative to "rrr-client"; it does essentially the same thing, but it is more efficient on start-up and has some features that may be helpful to CPAN mirror operators.