Rclone: command line tool for managing cloud storage
Contents
- Overview of package
- Availability of package by cluster
- Configuring rclone for a storage backend
- Using rclone
Overview of package
Package: | Rclone |
---|---|
Description: | command line tool for managing cloud storage |
For more information: | https://rclone.org/ |
Categories: | |
License: | OpenSource (MIT) |
General usage information
Sometimes called "the Swiss army knife of cloud storage", Rclone is a command line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Rclone supports over 40 cloud storage products, including S3 object stores, business and consumer file storage services, and standard transfer protocols.
Rclone has powerful cloud equivalents to the unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. Its familiar syntax includes shell pipeline support, and --dry-run protection.
This module will add the rclone command to your path.
Before you can use rclone to copy files or synchronize directories, you
need to define one or more remotes, which describe your storage backends.
A remote records which Cloud storage provider is being used, together with
your authentication and authorization information. This data is stored in
your rclone configuration file, normally ~/.config/rclone/rclone.conf.
Defining remotes is explained further below.
The ~/.config/rclone/rclone.conf file will contain access credentials to
your cloud storage. BE SURE TO PROTECT THIS FILE. Rclone restricts access
to it by default, so be sure not to loosen its permissions. Anyone with
read access to this file can access your cloud storage as you. You might
also want to consider encrypting this config file with a master password.
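As a minimal sketch of checking this, the function below tightens the permissions on a config file so that only its owner can read or write it. The function name protect_rclone_conf is made up for illustration, and the default path is simply the usual location named above; adjust it if your config file lives elsewhere.

```shell
# Hypothetical helper: restrict an rclone config file to its owner.
# Usage: protect_rclone_conf [PATH]   (PATH defaults to the usual location)
protect_rclone_conf() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "no config file at $conf" >&2
        return 1
    fi
    # 600 = read/write for the owner, nothing for group or others
    chmod 600 "$conf" && ls -l "$conf"
}
```

Running protect_rclone_conf with no argument acts on the default location and prints the resulting permissions, which should begin with "-rw-------".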
When you define a remote, you give it a name of your own choosing,
which you use with a colon suffix to refer to files
and directories on the associated Cloud storage provider. You can then issue
various rclone subcommands to move data back and forth. E.g., if you defined
a remote 'gdrive' to access your personal Google Drive, you could use a command
like "rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt" to copy the local
file 'MyFile.txt' to your Google Drive as 'CopyOfMyFile.txt'. (With
"rclone copy", the destination is treated as a directory, so the file would
keep its original name inside it.)
This is elaborated in the usage section below.
Available versions of the package Rclone, by cluster
This section lists the available versions of the package Rclone on the different clusters.
Available versions of Rclone on the Deepthought2 cluster (RHEL8)
Version | Module tags | CPU(s) optimized for | GPU ready? |
---|---|---|---|
1.51.0 | rclone/1.51.0 | ivybridge | Y |
Available versions of Rclone on the Juggernaut cluster
Version | Module tags | CPU(s) optimized for | GPU ready? |
---|---|---|---|
1.51.0 | rclone/1.51.0 | x86_64 | Y |
Available versions of Rclone on the Deepthought2 cluster (RHEL6) [DEPRECATED]
Version | Module tags | CPU(s) optimized for | GPU ready? |
---|---|---|---|
1.52.2 | rclone/1.52.2 | x86_64 | N |
1.47.0 | rclone/1.47.0 | x86_64 | N |
1.43.1 | rclone/1.43.1 | x86_64 | N |
Configuring rclone for a storage backend
Rclone has an interactive config subcommand which allows you to define,
delete, copy, and edit "remotes". It also allows you to password-encrypt your
rclone configuration file (which contains authorization tokens for your storage
backends), which is something you might wish to consider if any data on your
Cloud providers needs additional security. Encryption prevents anyone from
using the information in the rclone config file to access your data on the
Cloud storage providers unless they know the password. Unfortunately, that
also means that you need to enter the password with every rclone invocation.
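One way to avoid retyping the password is to read it once per shell session. The sketch below assumes that your rclone version honours the RCLONE_CONFIG_PASS environment variable (described in the rclone documentation on configuration encryption). The function name rclone_unlock is made up for illustration, and "read -s" requires bash.

```shell
# Hypothetical helper (bash): prompt once for the rclone config password and
# export it so later rclone commands started from this shell can read it.
rclone_unlock() {
    # -r: keep backslashes literal; -s: do not echo the typed password
    IFS= read -r -s -p "rclone config password: " RCLONE_CONFIG_PASS
    echo
    export RCLONE_CONFIG_PASS
}
```

After calling rclone_unlock, the password is held in the environment of that shell and of the commands it starts; "unset RCLONE_CONFIG_PASS" removes it when you are done.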
Rclone uses "remotes" to define the various Cloud storage or other storage backends
you wish to use with the rclone command, as well as authentication and authorization
information and other settings. When you define a remote, you give it a name of your
choosing, which is how you refer to files using the provider. For example, if you
defined a remote 'gdrive' to attach to your personal Google drive, then
gdrive:SomeFile.txt would refer to a file named 'SomeFile.txt' on your
Google drive. You can define multiple 'remotes' attached to the same Cloud storage
provider backend, which can sometimes be useful (e.g. to make distinct remotes for your
personal and for team Google drives). Because you will be using the name of the
remote on the command line a lot, it is advised to keep it somewhat short and to
avoid characters which are not convenient to use in the shell (i.e. use only letters,
numbers, and maybe hyphens/underscores).
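The naming advice above can be sketched as a small shell check. This is only an illustration of the "letters, numbers, hyphens, underscores" rule of thumb; the function name valid_remote_name is made up here, and rclone itself may accept names that this check rejects.

```shell
# Hypothetical helper: succeed only if NAME is non-empty and consists solely
# of letters, digits, hyphens and underscores.
valid_remote_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few candidate names.
for name in gdrive umd-box "my drive" 'team:drive'; do
    if valid_remote_name "$name"; then
        echo "ok:  $name"
    else
        echo "bad: $name"
    fi
done
```

The first two names pass; the last two are rejected because of the space and the colon.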
All management of your remotes is done with the interactive rclone config
command. Rclone supports many different storage backends, and many options on all of
these backends, so the configuration of backends can be somewhat complicated. The rclone
config command will step you through it, asking questions, most of which have reasonable
defaults.
Because many backends primarily authenticate via the web, you are likely to
need a web browser to complete the authentication. The HPC cluster login nodes no
longer provide web browsers, so the procedures described below usually assume that
you are at your workstation with an ssh session from your workstation to the HPC
login node in one window, and a web browser (running on your workstation) in
another window. Some storage backends (e.g. Box) might require a third window on
your workstation running a local (workstation) command prompt, and might require
that a reasonably current version of rclone be installed on your workstation.
Binaries for installing rclone on various platforms are available from the
main Rclone site.
The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.
Configuring rclone for Box
Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.
To access Box via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need three windows on your workstation: one running your web browser of choice, one at the command prompt on your workstation, and one in which you have ssh-ed to the login node of the appropriate HPC cluster.
You will also need to have a recent version of rclone installed on your workstation in order to complete the configuration on the HPC cluster. Binaries for installing rclone on various platforms are available from the main Rclone site.
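To see which version, if any, is installed on a Linux or macOS workstation, a small check like the following can help. It is only a sketch: the function name is made up for illustration, and it relies on nothing beyond the "rclone version" subcommand.

```shell
# Hypothetical helper: print the installed rclone version, or a hint if
# rclone cannot be found on the PATH.
rclone_version_or_hint() {
    if command -v rclone >/dev/null 2>&1; then
        rclone version
    else
        echo "rclone is not installed or not on your PATH"
    fi
}

rclone_version_or_hint
```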
- On the login node, issue the command "module load rclone"
- [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
- [login node, rclone config prompt] Type 'n' to create a new remote.
- [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'box' or 'umdbox'.
- [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Box service, type "box".
- [login node, rclone config prompt] The system will then prompt for "Box App Client Id". Just hit return to accept the default.
- [login node, rclone config prompt] The system will then prompt for "Box App Client Secret". Just hit return to accept the default.
- [login node, rclone config prompt] The system will then prompt for "Box App config.json location". Just hit return to accept the default.
- [login node, rclone config prompt] The system will then prompt for "box_sub_type", with options "user" and "enterprise". UMD users should choose "enterprise" (which is not the default).
- [login node, rclone config prompt] The system will then prompt for "Edit advanced config?". Unless you know what you are doing, I would recommend 'n'.
- [login node, rclone config prompt] The system will then prompt for "Use auto config?". Type 'n' for No (not the default). Rclone will instruct you to run a command on your workstation and will prompt for the "result". Switch to your workstation window.
- [workstation, command prompt] Run the command "rclone authorize box".
- [workstation, browser] Your browser should show a new window with a "Grant access to Box" button. If that window does not appear in your browser, copy the URL printed at your workstation command prompt (something like "http://127.0.0.1:...") into the URL bar of your browser. You should then get the "Grant access to Box" button.
- [workstation, browser] Click on the "Grant access to Box" button.
- [workstation, command prompt] The "rclone authorize box" command should have printed out some text and finished. Copy the text between the "Paste the following into your remote machine --->" and "<--End paste" lines into the "result" prompt of rclone config on your login node and hit return.
- [login node, rclone config prompt] Rclone should parse the string you gave and print back some basic information (e.g. type is Box, box_sub_type is enterprise, etc.), and ask for confirmation. Type y at the prompt.
- [login node, rclone config prompt] Rclone should now list your configured remotes again, and your Box entry (under the name you chose when creating it) should appear. You can type q to quit.
See also:
- The main Rclone documentation site.
- Rclone Box documentation site.
- Rclone remote setup documentation
You can now use the rclone commands with this remote as discussed below.
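A quick way to confirm that the new remote works is to check that rclone lists it and can read its top-level folders. The following is a sketch only: check_remote is a made-up helper name, and it assumes rclone is already on your PATH (e.g. after "module load rclone") and that the "listremotes" and "lsd" subcommands behave as in the rclone documentation.

```shell
# Hypothetical helper: report whether REMOTE is defined and, if so, list the
# folders at its top level.
check_remote() {
    remote="$1"
    if ! command -v rclone >/dev/null 2>&1; then
        echo "rclone not found; try 'module load rclone'" >&2
        return 1
    fi
    # "rclone listremotes" prints one configured remote per line, each with a
    # trailing colon; -x matches whole lines, -F treats the name literally.
    if ! rclone listremotes | grep -qxF "${remote}:"; then
        echo "remote '${remote}' is not configured" >&2
        return 1
    fi
    rclone lsd "${remote}:"
}
```

For example, "check_remote box" should print your top-level Box folders if the configuration above succeeded (substitute whatever name you gave the remote).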
Configuring rclone for Google Drive
Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.
To access Google drive via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need two windows on your workstation: one running your web browser of choice, and one in which you have ssh-ed to the login node of the appropriate HPC cluster. You do not need to have rclone installed on your workstation.
Before starting the rclone configuration, it is advised that you create your own Google Drive client ID for rclone. If you do not, rclone can use its own internal client ID, but that ID is shared by everyone else using rclone who did not set up their own. Google uses the client ID when rate limiting access to Google Drive, so if you use rclone's internal client ID you may be rate limited because of what other users are doing. So while this step is optional, it is advised that you create your own client ID for rclone for best performance, as described below:
- From the web browser on your workstation, go to the Google API Console and log in.
- On the blue menu bar near the top, choose "Select a project", and either select an existing project (if you have any), or click the "NEW PROJECT" button in the upper right of the popup. If creating a new project, you need to give it a name (e.g. your_username-rclone) and a parent organization or folder (e.g. Self-Service Projects). Then hit the blue "CREATE" button at the bottom left.
- From the project dashboard, select the "Dashboard" option under "APIs & Services" in the menu on the left.
- Near the top, there should be an option to "ENABLE APIS AND SERVICES". Click that link.
- In the search field, enter "Drive", and click on the "Google Drive API" option. This should open up a "Google Drive API" page; click on the blue "ENABLE" button.
- You should now be on a page with a "CREATE CREDENTIALS" button near the upper right; click on that button.
- The page should have a drop-down labelled "Select an API"; select "Google Drive API". It will then ask "What data will you be accessing?"; you probably want to check "User data". Then click the "NEXT" button.
- You will now be in a section labelled "OAuth Consent Screen". Fill out the required fields:
  - App name: rclone is fine
  - User support email: Enter your email address
  - App logo: leave blank
  - Developer contact information: Enter your email address
  Then click the "SAVE AND CONTINUE" button.
- You will now be in a section labelled "Scopes (optional)". Just click the "SAVE AND CONTINUE" button.
- You will now be in a section labelled "OAuth Client ID". In the drop-down labelled "Application Type", select "Desktop app". You can leave the default "Name" or change it to something meaningful like "rclone". Click the blue "CREATE" button.
- You will now be in the "Your Credentials" section. It will display a "Client ID", basically a string of digits followed by a hyphen and a string of alphanumerics followed by ".apps.googleusercontent.com". There should also be a link to your "credentials page". Follow that link.
- You should now be on the Credentials page of your Google Drive API page. This should list all credentials you have created. Under "OAuth 2.0 Client IDs" you should see the rclone credential you just created. Click on the pencil icon to the right of it in order to edit it. This should display the client ID and client secret. You will need to provide these to the rclone config command in the process below. I recommend keeping that browser tab open, or at least copying the data somewhere. NOTE: the client secret should be carefully guarded. If you copy it, make sure you do so securely.
We now proceed with the configuration of the Google Drive remote for rclone.
- On the login node, issue the command "module load rclone"
- [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
- [login node, rclone config prompt] Type 'n' to create a new remote.
- [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'gdrive' or 'google'.
- [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Google Drive service, type "drive".
- [login node, rclone config prompt] The system will then prompt for "Google Application Client Id". If you created your own Google Application Client ID as discussed above, now is the time to enter it. This should be a string of digits followed by a hyphen and a string of alphanumerics followed by ".apps.googleusercontent.com". You should cut and paste it from the Google API Console Credentials page. If you do not want to bother with creating your own Client ID, you can just hit return for the default, but performance may be much lower.
- [login node, rclone config prompt] The system will then prompt for "Google Application Client Secret". If you created your own Google Application Client ID and used it in the previous step, cut and paste the Client secret from the Google API Console Credentials page. If you entered the default blank string at the previous prompt, just do so again.
- [login node, rclone config prompt] The system will then prompt for "Scope that rclone should use when requesting access from drive". It will offer different scopes for access. Common selections are:
  - Full access all files, "drive": rclone will have full access to all of your files on Google drive. This is probably the best option for most people.
  - Access to files created by rclone only, "drive.file": rclone can only access files that it placed on Google drive. This will prevent rclone from accessing files put on Google Drive by other means, which might be useful if you have some more sensitive data on Google Drive.
  - Read-only access to file metadata and file contents, "drive.readonly": this will allow you to list and download files from Google drive, but not upload or modify files.
- [login node, rclone config prompt] The system will then prompt for "ID of the root folder". It is recommended to leave this blank (the default).
- [login node, rclone config prompt] The system will then prompt for "Service Account Credentials JSON file path". It is recommended to leave this blank (the default).
- [login node, rclone config prompt] The system will then prompt for "Edit advanced config". Unless you know what you are doing, I would recommend 'n'.
- [login node, rclone config prompt] The system will then prompt for "Use auto config?". Type 'n' for No (not the default). Rclone will then print out a long URL. Copy that URL so you can paste it into the web browser running on your workstation.
- [workstation, web browser] Go to the URL listed by rclone on the login node. Authenticate as your @umd.edu account.
- [workstation, web browser] You will get a Google page stating that "rclone wants to access your Google account" (the application name "rclone" might vary). Click on the "Allow" button.
- [workstation, web browser] The Google page will then print an access code (a long string of characters). Copy this code.
- [login node, rclone config prompt] Paste the access code from Google into the "Enter verification code" prompt and hit return.
- [login node, rclone config prompt] Rclone will then ask whether you wish to configure this as a team drive (shared drive). Answer appropriately (if unsure, type No, the default). If you do configure it as a shared drive, you will be given a list of shared drives and prompted to select which one to use.
- [login node, rclone config prompt] Rclone will then print out a summary of the configuration for this new remote (including type=drive, the client ID you entered, the scope you selected, and a long string for token), and ask for confirmation. Enter y for Yes.
- [login node, rclone config prompt] Rclone should now list your configured remotes again, and your Google Drive entry (under the name you chose when creating it) should appear. You can type q to quit.
See also:
- The main Rclone documentation site.
- The Rclone Google Drive documentation.
- Rclone instructions on creating your own Google Client ID
- Google docs on different access scopes
You can now use the rclone commands with this remote as discussed below.
Using rclone
Rclone commands generally follow the format:
rclone SUBCOMMAND SOURCE DEST
although a few commands omit the DEST. The SUBCOMMAND tells rclone what it is you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or in a Cloud or other storage provider. To reference files or directories on a Cloud or storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined, followed by a colon (':') and optionally any additional path components needed.
A complete description of all of the rclone subcommands, etc. can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples, we assume that 'gdrive' is a remote for Google drive, and 'box' one for Box):
- rclone ls SOURCE: this will list files in SOURCE. E.g. "rclone ls gdrive:MyFolder" will list files in MyFolder in Google drive.
- rclone lsd SOURCE: this will list folders under SOURCE. E.g. "rclone lsd gdrive:" will list all folders under the root folder on Google drive.
- rclone copy SOURCE DEST: this will copy SOURCE into DEST. E.g. "rclone copy box:MyFile.txt ." will copy MyFile.txt from Box to the current directory on the local system.
- rclone sync SOURCE DEST: this will modify the directory DEST to make it identical with SOURCE. E.g. "rclone sync gdrive:ImportantStuff box:Copy" will add/delete/copy files in the Copy folder of Box as needed to synchronize it with the ImportantStuff folder in Google drive.
Many more subcommands are available, see the man page or the rclone documentation site for more information.
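Because sync can delete files in DEST, it is worth previewing a sync with the --dry-run option mentioned earlier before running it for real. The wrapper below is a sketch of that habit, not part of rclone: the name safe_sync is made up for illustration, and it assumes rclone is on your PATH and that both remotes are already configured.

```shell
# Hypothetical wrapper: preview a sync with --dry-run, then ask before
# performing it.
# Usage: safe_sync SOURCE DEST
safe_sync() {
    src="$1"
    dest="$2"
    # Show what would be copied or deleted without changing anything.
    rclone sync --dry-run "$src" "$dest" || return 1
    # Prompt on stderr so that stdout carries only rclone's own output.
    printf 'Apply these changes to %s? [y/N] ' "$dest" >&2
    read -r answer
    case "$answer" in
        y|Y) rclone sync "$src" "$dest" ;;
        *)   echo "not synced" ;;
    esac
}
```

For example, "safe_sync gdrive:ImportantStuff box:Copy" first reports what would change in the Copy folder on Box, and only performs the sync if you answer y.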