Rclone
Summary and Version Information
Package | Rclone |
---|---|
Description | Command line utility for syncing files to/from various cloud storage providers |
Categories | Misc, Miscellaneous |

Version | Module tag | Availability* | GPU Ready | Notes |
---|---|---|---|---|
1.43.1 | rclone/1.43.1 | Non-HPC Glue systems RedHat6 | N | |
1.47.0 | rclone/1.47.0 | Non-HPC Glue systems RedHat6 | N | |
1.48-beta | rclone/1.48-beta | Non-HPC Glue systems RedHat6 | N | |
Notes:
*: A package labelled as "available" on an HPC cluster can be used on the
compute nodes of that cluster. Even software not listed as available on an
HPC cluster is generally available on the login nodes of the cluster
(assuming it is available for the appropriate OS version; e.g. RedHat Linux 6
for the two Deepthought clusters). This is because the compute nodes do not
use AFS and instead have their own copies of the AFS software tree, on which
we only install packages as requested. Contact us if you need a version
listed as not available on one of the clusters.
In general, you need to prepare your Unix environment to be able to use this software. To do this, use either:
tap TAPFOO
module load MODFOO
where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; this is needed in startup dot files); you can get a similar text with module help MODFOO. For more information, see the documentation on the tap and module commands.
For packages which are libraries which other codes get built against, see the section on compiling codes for more help.
Tap/module commands listed with a version of current will set up for what we considered the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new would represent a newer version of the package which is still being tested by users; if stability is not a primary concern you are encouraged to use it. Those with versions listed as old set up for an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.
In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc dependencies, if a compiler module or MPI library was previously loaded, the module command will try to load the build of the package that matches them. If you specify the compiler/MPI dependency, it will attempt to load the compiler/MPI library for you if needed.
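As a concrete illustration of the module route for this package, here is a minimal sketch. It assumes the rclone/1.47.0 tag from the version table above; the guard around module is an addition that only keeps the snippet harmless on a machine where the module command is not defined.

```shell
# Module tag taken from the version table above; substitute another listed tag if preferred.
tag="rclone/1.47.0"

if command -v module >/dev/null 2>&1; then
    module load "$tag"     # add this rclone build to the environment
    rclone version         # confirm which rclone is now found on the PATH
else
    echo "module command not found; run this on a system that provides it" >&2
fi
```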
Overview
Rclone is a command line utility that lets you synchronize files and directories with a wide range of Cloud and other storage providers. A full list of supported providers can be found on the rclone documentation site; supported Cloud providers include Box and Google Drive.
Before you can effectively start using rclone to copy files or synchronize
directories, you need to define one or more remotes, which
describe your storage backends. This is further explained below.
These remotes contain information about
what Cloud storage provider is being used as well as your authentication
and authorization information. This data is stored in your rclone configuration
file, normally ~/.config/rclone/rclone.conf.
The ~/.config/rclone/rclone.conf file
will contain access credentials to your cloud storage. BE SURE
TO PROTECT THIS FILE. Rclone will protect it by default,
but be sure not to unprotect it. Anyone with read access
to this file can access your cloud storage as you. You might
also want to consider encrypting this config file with a master password.
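One way to check that the config file is still protected is to reset its permissions explicitly and look at the result. The sketch below assumes the default location named above; the function name protect_rclone_conf and its optional path argument are illustrative and not part of rclone.

```shell
# Hypothetical helper: restrict the rclone config file to its owner.
# With no argument it uses the default location named above.
protect_rclone_conf() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "no config file at $conf" >&2
        return 1
    fi
    chmod 600 "$conf"    # read/write for the owner, nothing for group or others
    ls -l "$conf"        # the mode column should now start with -rw-------
}
```

Running protect_rclone_conf with no argument acts on the default file; pass a path if your configuration lives elsewhere.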
When you define a remote, you give it a name of your own choosing,
which you use with a colon suffix to refer to files
and directories on the associated Cloud storage provider. You can then issue
various rclone subcommands to move data back and forth. E.g., if you defined
a remote 'gdrive' to access your personal Google Drive, you could use a command
like rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt to copy
the local file 'MyFile.txt' to your Google Drive as 'CopyOfMyFile.txt'
(the copy subcommand treats its destination as a directory, so copyto is the
subcommand to use when the copy should get a different name).
This is elaborated in the usage section below.
Configuring rclone for a storage backend
Rclone has an interactive config subcommand which allows
you to define, delete, copy, and edit "remotes". It also allows you to
password-encrypt your rclone configuration file (which contains authorization
tokens to your storage backends), which is something you might wish to consider
if any data on your Cloud providers should have additional security. This
prevents anyone from using the information in the rclone config file to access
your data on the Cloud storage providers unless they know the password. Unfortunately,
that also means that you need to enter the password with every rclone invocation.
Rclone uses "remotes" to define the various Cloud storage or other storage backends
you wish to use with the rclone command, as well as authentication and authorization
information and other settings. When you define a remote, you give it a name of your
choosing, which is how you refer to files using the provider. For example, if you
defined a remote 'gdrive' to attach to your personal Google drive, then
gdrive:SomeFile.txt
would refer to a file named 'SomeFile.txt' on your
Google drive. You can define multiple 'remotes' attached to the same Cloud storage
provider backend, which can sometimes be useful (e.g. to make distinct remotes for your
personal and for team Google drives). Because you will be using the name of the
remote on the command line a lot, it is advised to keep it somewhat short and to
avoid characters which are not convenient to use in the shell (i.e. use only letters,
numbers, and maybe hyphens/underscores).
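The naming advice above can be expressed as a small check. This is only a sketch of that advice: the function name is_shell_friendly_remote is hypothetical, and rclone itself may accept names that this check rejects.

```shell
# Hypothetical helper: succeed only for names made up of letters, digits,
# hyphens and underscores, as recommended above.
is_shell_friendly_remote() {
    case "$1" in
        ""|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains some other character
        *) return 0 ;;
    esac
}

# Example: report on a candidate name before using it in rclone config.
if is_shell_friendly_remote "gdrive"; then
    echo "gdrive is convenient to type in the shell"
fi
```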
All management of your remotes is done with the interactive rclone config
command. Rclone supports many different storage backends, and many options on all of
these backends, so the configuration of backends can be somewhat complicated. The rclone
config command will step you through it, asking questions, most of which have reasonable
defaults. Because many backends primarily authenticate via the web, you are likely to
need a web browser to complete the authentication. Some backends will allow you to use
a web browser on your desktop while you ssh to the HPC login node to run the rclone config
command; others will only work if the web browser is running on the same system as the
rclone config command. In the latter case, things work best if you can set up an
X11 session between your desktop and the
HPC login node, and when needed run firefox or chrome on the login node to display on
your desktop (this will admittedly be sluggish, but you only need it briefly to get the
authentication tokens).
The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.
Configuring rclone for Box
Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.
- To set up rclone to access Box, the easiest way requires you to be able to start a web browser on the machine on which you will be running rclone; for the HPC clusters, that means one of the login nodes. Please see the page on using X for more information; you will basically need to start an X server on your desktop and connect to the login node with X forwarding enabled (e.g. use the '-X' flag on your ssh command). Then start your web browser of choice (both chrome and firefox are available).
- Start up another terminal on your desktop and ssh to the login node (this terminal does NOT require X forwarding). On this terminal, issue the rclone config command.
- Type 'n' at the rclone prompt to create a new remote.
- Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'box' or 'umdbox'.
- Select the type of backend, in this case Box (5).
- For most of the questions that follow, you can just hit return to accept the defaults:
  - Box App Client Id
  - Box App Client Secret
  - Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
  - Use auto config?: Type 'y' for Yes. Your browser running on the login node should be redirected to a URL that is printed on the terminal running the rclone command (something like http://127.0.0.1:53682/auth). If the browser running on the login node does not get redirected, copy the URL and paste it into that browser. You will be prompted to log in, and to allow rclone to access your account. After you agree to allow rclone access, the shell with the rclone config command should receive the token.
- Type y in the terminal running rclone to save this new remote, and q to exit the rclone config command.
More complete details on configuring rclone for use with Box can be found on the rclone documentation site. There are ways to configure rclone for use with Box without requiring you to be able to run a web browser on the login nodes, but these typically require you to have rclone installed on a system on which you can use a web browser.
You can now use the rclone commands with this remote as discussed below.
Configuring rclone for Google Drive
Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.
- Run the rclone config command. Type 'n' to create a new remote.
- Google Application Client Id
- Google Application Client Secret
- Scope that rclone should use when requesting access from drive: choose the appropriate scope. Typically people will want full access, but if you are pushing data to Google drive from elsewhere and just downloading to the Unix system, you might want to use read-only access.
- ID of the root folder
- Service Account Credentials JSON file path
- Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
- Use auto config?: Type 'n' for No. The rclone command should print out a long URL (starting 'https://accounts.google.com'). Copy and paste the entire URL into your web browser. It will then ask you to log into Google drive, and to authorize rclone to access your account. When you accept, you should get a long string returned on your web browser; copy and paste this string to the Enter verification code> prompt on your rclone terminal.
- Configure as a team drive? Answer yes or no depending on whether this should be a team drive or not.

More complete details on configuring rclone for use with Google drive can be found on the rclone documentation site.
You can now use the rclone commands with this remote as discussed below.
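Before moving real data, it can help to confirm that a newly defined remote is known to rclone and that its stored token works. The following is a minimal sketch using the listremotes and lsd subcommands; the function name check_remote is hypothetical, and the remote names in the usage line are the examples used in this document.

```shell
# Hypothetical helper: confirm that a remote is defined, then list its top-level folders.
check_remote() {
    name="$1"
    # listremotes prints one remote per line, each with a trailing colon
    if ! rclone listremotes | grep -qxF "${name}:"; then
        echo "remote '${name}' is not defined; run rclone config" >&2
        return 1
    fi
    rclone lsd "${name}:"    # fails if the stored credentials are not accepted
}
```

For example, check_remote box or check_remote gdrive.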
Using rclone
Rclone commands generally follow the format:
rclone SUBCOMMAND SOURCE DEST
although a few commands omit the DEST. The SUBCOMMAND tells rclone what it is you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or in a Cloud or other storage provider. To reference files or directories on a Cloud or storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined, followed by a colon (':') and optionally any additional path components needed.
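To make the REMOTE:PATH form concrete, here is a small sketch that builds a remote path from its parts, copies a local directory to it, and then lists the result. The function name upload_and_list and the example arguments are hypothetical; copy and ls are the rclone subcommands described in this section.

```shell
# Hypothetical helper illustrating the REMOTE:PATH form.
# $1 = local directory, $2 = remote name (without the colon), $3 = folder on the remote
upload_and_list() {
    target="${2}:${3}"               # e.g. gdrive:Backups/run1
    rclone copy "$1" "$target" &&    # SOURCE is local, DEST is on the remote
        rclone ls "$target"          # ls takes only a SOURCE, no DEST
}
```

A call such as upload_and_list ./results gdrive Backups/run1 would copy the contents of ./results into Backups/run1 on the 'gdrive' remote.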
A complete description of all of the rclone subcommands, etc. can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples below, we assume that 'gdrive' is a remote for Google drive, and 'box' is one for Box):
- rclone ls SOURCE: this will list files in SOURCE. E.g. rclone ls gdrive:MyFolder will list files in MyFolder in Google drive.
- rclone lsd SOURCE: this will list folders under SOURCE. E.g. rclone lsd gdrive: will list the folders in the root folder on Google drive.
- rclone copy SOURCE DEST: this will copy the file SOURCE, or the contents of the directory SOURCE, into DEST. E.g. rclone copy box:MyFile.txt . will copy MyFile.txt from Box to the current directory on the local system.
- rclone sync SOURCE DEST: this will modify the directory DEST to make it identical with SOURCE. E.g. rclone sync gdrive:ImportantStuff box:Copy will add/delete/copy files in the Copy folder of Box as needed to synchronize it with the ImportantStuff folder in Google drive.
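Because sync can delete files in DEST, it is worth previewing what it would do first. The sketch below uses rclone's --dry-run flag for the preview and asks for confirmation before the real run; the function name safe_sync and the prompt wording are illustrative, not part of rclone.

```shell
# Hypothetical helper: preview a sync, then apply it only after confirmation.
safe_sync() {
    src="$1"
    dest="$2"
    # First pass: report what would be copied or deleted, changing nothing.
    rclone sync --dry-run "$src" "$dest" || return 1
    printf 'Apply these changes to %s? [y/N] ' "$dest"
    read -r answer
    if [ "$answer" = "y" ]; then
        rclone sync "$src" "$dest"    # second pass: really modify DEST
    else
        echo "Nothing changed."
    fi
}
```

For example, safe_sync gdrive:ImportantStuff box:Copy shows the planned changes to the Copy folder on Box and applies them only if you answer y.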
Many more subcommands are available, see the man page or the rclone documentation site for more information.