
Rclone: command line tool for managing cloud storage

Contents

  1. Overview of package
    1. General usage
  2. Availability of package by cluster
  3. Configuring rclone for a storage backend
    1. Configuring rclone for Box
    2. Configuring rclone for Google Drive
  4. Using rclone

Overview of package

General information about package
Package: Rclone
Description: command line tool for managing cloud storage
For more information: https://rclone.org/
Categories:
License: OpenSource (MIT)

General usage information

Sometimes called "The Swiss army knife of cloud storage", Rclone is a command-line program for managing files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products support rclone, including S3 object stores, business and consumer file storage services, and standard transfer protocols.

Rclone has powerful cloud equivalents to the Unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. Its familiar syntax includes shell pipeline support and --dry-run protection.

This module will add the rclone command to your path.

Before you can effectively start using rclone to copy files or synchronize directories, you need to define one or more remotes which define your storage backends. This is further explained below. These remotes contain information about what Cloud storage provider is being used as well as your authentication and authorization information. This data is stored in your rclone configuration file, normally ~/.config/rclone/rclone.conf.

WARNING
The ~/.config/rclone/rclone.conf file will contain access credentials to your cloud storage. BE SURE TO PROTECT THIS FILE. Rclone protects it by default (only you can read or write it), so be sure not to loosen its permissions. Anyone with read access to this file can access your cloud storage as you. You might also want to consider encrypting this config file with a master password.
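To check this on the cluster, here is a minimal sketch that assumes the default location given above. It does not look for a config file you may have moved elsewhere with rclone's --config option or the RCLONE_CONFIG environment variable (running rclone config file prints the path rclone actually uses). If the file exists, the script restricts it to owner read/write and shows the result.

```shell
#!/bin/sh
# Default rclone config location described above (assumption: not overridden
# with --config or the RCLONE_CONFIG environment variable).
conf="$HOME/.config/rclone/rclone.conf"

if [ -f "$conf" ]; then
    chmod 600 "$conf"    # owner read/write only; no access for group or others
    ls -l "$conf"        # the mode column should now read -rw-------
else
    echo "no rclone config file at $conf yet" >&2
fi
```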

When you define a remote, you give it a name of your own choosing, which you use with a colon suffix to refer to files and directories on the associated cloud storage provider. You can then issue various rclone subcommands to move data back and forth. For example, if you defined a remote 'gdrive' to access your personal Google Drive, the command rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt would copy the local file 'MyFile.txt' to your Google Drive as 'CopyOfMyFile.txt'. (With rclone copy instead, the destination is treated as a directory, so the file would end up as CopyOfMyFile.txt/MyFile.txt.) This is elaborated in the usage section below.

Available versions of the package Rclone, by cluster

This section lists the available versions of the package Rclone on the different clusters.

Available versions of Rclone on the Zaratan cluster

Version   Module tags     CPU(s) optimized for   GPU ready?
1.59.1    rclone/1.59.1   zen2                   Y
1.57.0    rclone/1.57.0   zen2                   Y

Configuring rclone for a storage backend

Rclone has an interactive config subcommand which allows you to define, delete, copy, and edit "remotes". It also allows you to encrypt your rclone configuration file (which contains authorization tokens for your storage backends) with a password, which you might wish to consider if any of your data on cloud providers needs additional security. Encryption prevents anyone who obtains your rclone config file from using it to access your data on the cloud storage providers unless they also know the password. Unfortunately, it also means that you need to enter the password every time you invoke rclone.

Rclone uses "remotes" to define the various cloud storage or other storage backends you wish to use with the rclone command, along with authentication and authorization information and other settings. When you define a remote, you give it a name of your choosing, which is how you refer to files stored with that provider. For example, if you defined a remote 'gdrive' to attach to your personal Google Drive, then gdrive:SomeFile.txt would refer to a file named 'SomeFile.txt' on your Google Drive. You can define multiple remotes attached to the same cloud storage backend, which is sometimes useful (e.g. to make distinct remotes for your personal and for team Google Drives). Because you will type the name of the remote on the command line a lot, keep it fairly short and avoid characters that are awkward to use in the shell (i.e. use only letters, numbers, and perhaps hyphens or underscores).
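For illustration, two remotes attached to the same Google Drive backend might look roughly like this in ~/.config/rclone/rclone.conf after running rclone config. The names gdrive and gdrive-team are hypothetical, the token values (which rclone config writes for you) are elided, and the shared drive ID is a placeholder; you would not normally edit this file by hand.

```ini
# Personal Google Drive
[gdrive]
type = drive
scope = drive
token = <JSON access/refresh token written by rclone config; elided here>

# Shared (team) Google Drive, using the same backend type
[gdrive-team]
type = drive
scope = drive
token = <JSON access/refresh token written by rclone config; elided here>
team_drive = <ID of the shared drive; placeholder>
```

With these two definitions, gdrive:SomeFile.txt and gdrive-team:SomeFile.txt would refer to files in two different drives.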

All management of your remotes is done with the interactive rclone config command. Rclone supports many different storage backends, and many options on all of these backends, so the configuration of backends can be somewhat complicated. The rclone config command will step you through it, asking questions, most of which have reasonable defaults.

Because many backends primarily authenticate via the web, you are likely to need a web browser to complete the authentication. The HPC cluster login nodes no longer provide web browsers, so the procedures described below usually assume that you are at your workstation with an ssh session from your workstation to the HPC login node in one window, and a web browser (running on your workstation) in another window. Some storage backends (e.g. Box) might require a third window on your workstation running a local (workstation) command prompt, and might require a reasonably current version of rclone to be installed on your workstation. Binaries for installing rclone on various platforms are available from the main Rclone site.

The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.

Configuring rclone for Box

Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.

To access Box via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need three windows on your workstation: one running your web browser of choice, one at the command prompt on your workstation, and one in which you have ssh-ed to the login node of the appropriate HPC cluster.

You will also need to have a recent version of rclone installed on your workstation in order to complete the configuration on the HPC cluster. Binaries for installing rclone on various platforms are available from the main Rclone site.

  1. On the login node, issue the command "module load rclone"
  2. [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
  3. [login node, rclone config prompt] Type 'n' to create a new remote
  4. [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'box' or 'umdbox'
  5. [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Box service, type "box"
  6. [login node, rclone config prompt] The system will then prompt for Box App Client Id. Just hit return to accept the default.
  7. [login node, rclone config prompt] The system will then prompt for Box App Client Secret. Just hit return to accept the default.
  8. [login node, rclone config prompt] The system will then prompt for Box App config.json location. Just hit return to accept the default.
  9. [login node, rclone config prompt] The system will then prompt for box_sub_type, with options "user" and "enterprise". For UMD users, you should be choosing "enterprise" (which is not the default).
  10. [login node, rclone config prompt] The system will then prompt for Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
  11. [login node, rclone config prompt] The system will then prompt for Use auto config?: Type 'n' for No (not the default). Rclone will instruct you to run a command on your workstation and prompt for the result. Switch to your workstation windows.
  12. [workstation, command prompt] Run the command rclone authorize box.
  13. [workstation, browser] Your browser should show a new window with a Grant access to Box button. If that window does not appear, copy the URL printed in your workstation command prompt (something like "http://127.0.0.1:...") into the URL bar of your browser. You should then get the Grant access to Box button.
  14. [workstation, browser]: Click on the Grant access to Box button
  15. [workstation, command prompt]: The rclone authorize box command should have printed out some text and finished. Copy the text between the Paste the following into your remote machine ---> and <---End paste lines, paste it at the result prompt of rclone config on your login node, and hit return.
  16. [login node, rclone config prompt]: Rclone should parse the string you gave and print back some basic information (e.g. type is Box, box_sub_type is enterprise, etc), and ask for confirmation. Type y at the prompt.
  17. [login node, rclone config prompt]: Rclone should now list your configured remotes again, and your box entry (whatever you named it in step 4) should be appearing. You can type q to quit.
Configuration is now complete.
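For reference, the Box remote created by the steps above would be stored in your rclone config file roughly as follows, assuming you named it box in step 4. The token value is elided; rclone config writes it for you, so there is no need to type any of this in.

```ini
[box]
type = box
box_sub_type = enterprise
token = <JSON access/refresh token written by rclone config; elided here>
```

A quick way to confirm that the remote works is rclone lsd box:, which lists the top-level folders in your Box account.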


You can now use the rclone commands with this remote as discussed below.

Configuring rclone for Google Drive

Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

To access Google Drive via rclone on a login node of one of the HPC clusters, you need to configure rclone first. This needs to be done once per cluster. You will need two windows on your workstation: one running your web browser of choice, and one in which you have ssh-ed to the login node of the appropriate HPC cluster. You also need to have rclone installed on your workstation, preferably the same version as is running on the cluster.

The configuration of Google Drive remote for rclone proceeds as follows:

  1. On the login node, issue the command "module load rclone"
  2. [login node] Issue the command "rclone config". This will list any remotes you currently have configured, and leave you at an interactive rclone config prompt.
  3. [login node, rclone config prompt] Type 'n' to create a new remote
  4. [login node, rclone config prompt] Enter the name you wish to use for this remote. This will prefix all paths for the remote system, so you will probably want something somewhat short and shell friendly, e.g. 'gdrive' or 'google'
  5. [login node, rclone config prompt] Select the type of backend you wish to configure. The available options are in alphabetical order, but the numbering changes between rclone versions. Since we are configuring for the Google Drive service, type "drive"
  6. [login node, rclone config prompt] The system will then prompt for Google Application Client Id. Just hit return for the default.
  7. [login node, rclone config prompt] The system will then prompt for Google Application Client Secret. If you created your own Google Application Client id and used it in the previous step, cut and paste the Client secret from the Google API Console Credentials page. If you entered the default blank string at the previous prompt, just do so again.
  8. [login node, rclone config prompt] The system will then prompt for Scope that rclone should use when requesting access from drive. It offers several access scopes; common selections are:
    • Full access to all files, "drive", meaning rclone will have full access to all of your files on Google Drive. This is probably the best option for most people.
    • Access to files created by rclone only. "drive.file" only allows rclone to access files that it placed on Google drive. This will prevent rclone from accessing files put on Google Drive by other means, which might be useful if you have some more sensitive data on Google Drive.
    • Read-only access to file metadata and file contents, "drive.readonly". This will allow you to list and download files from Google drive, but not upload or modify files.
  9. [login node, rclone config prompt] The system will then prompt for ID of the root folder. It is recommended to leave this blank (the default).
  10. [login node, rclone config prompt] The system will then prompt for Service Account Credentials JSON file path. It is recommended to leave this blank (the default).
  11. [login node, rclone config prompt] The system will then prompt for Edit advanced config. Unless you know what you are doing, I would recommend 'n'.
  12. [login node, rclone config prompt] The system will then prompt for Use auto config?: Type 'n' for No. (not the default). Rclone will then print out a command to run on your workstation, something like rclone authorize "drive" "...(long random string)...".
  13. On your workstation, issue the command reported above. That should cause a web browser to open on your workstation. Log in (using your @umd.edu account) and authorize rclone for access. The rclone command should then print a secret token, with instructions to paste it into your remote machine. Copy this into your clipboard.
  14. [login node, rclone config prompt]: Paste the secret token printed by the rclone command on your workstation (which you copied above) at the rclone prompt on the login node and hit return.
  15. [login node, rclone config prompt] Rclone will then ask whether you wish to configure this as a team drive. Answer appropriately (if unsure, type n for No, the default). If you configure it as a shared drive, you will be given a list of shared drives and prompted to select which one to use.
  16. [login node, rclone config prompt] Rclone will then print out a summary of the configuration for this new remote (including type=drive, the client id you selected, the scope you selected, and a long string for token), and ask for confirmation. Enter y for Yes.
  17. [login node, rclone config prompt]: Rclone should now list your configured remotes again, and your gdrive entry (whatever you named it in step 4) should be appearing. You can type q to quit.
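Before relying on the new remote in scripts or batch jobs, you may want a quick check that it works. The following is a minimal sketch that assumes you named the remote gdrive in step 4 (substitute your own name). It uses two standard rclone subcommands: listremotes, which prints each configured remote name followed by a colon, and about, which reports quota and usage. The have_remote function is a helper written for this example, not part of rclone.

```shell
#!/bin/sh
# Remote name chosen in step 4 ("gdrive" is an assumption; change to match).
remote="gdrive"

# Succeeds only if rclone is on PATH and "$1:" is one of your configured remotes.
have_remote() {
    command -v rclone >/dev/null 2>&1 &&
        rclone listremotes 2>/dev/null | grep -qx "$1:"
}

if have_remote "$remote"; then
    rclone about "${remote}:"    # total, used, and free space on the drive
    rclone lsd "${remote}:"      # top-level folders
else
    echo "remote '${remote}:' not found; run 'module load rclone' and check 'rclone config'" >&2
fi
```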


You can now use the rclone commands with this remote as discussed below.

Using rclone

Rclone commands generally follow the format:

rclone SUBCOMMAND SOURCE DEST

although a few commands omit the DEST. The SUBCOMMAND tells rclone what you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or on a cloud or other storage provider. To reference files or directories on a cloud or storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined, followed by a colon (':') and optionally any additional path components needed.

A complete description of all of the rclone subcommands can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples below, we assume that 'gdrive' is a remote for Google Drive and 'box' one for Box):
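The following sketch illustrates a few commonly used subcommands in the SOURCE/DEST form described above. It assumes remotes named gdrive and box; the local directory ./results and the remote folders project and project/results are hypothetical. Commands that would change data are run with --dry-run, which only reports what would be done; remove that flag to actually transfer or delete files. The have_remote function is a small helper written for this example, not an rclone subcommand.

```shell
#!/bin/sh
# Hypothetical paths; the remote names follow the examples in the text.
src="./results"
dest="gdrive:project/results"

# Succeeds only if rclone is on PATH and "$1:" is a configured remote.
have_remote() {
    command -v rclone >/dev/null 2>&1 &&
        rclone listremotes 2>/dev/null | grep -qx "$1:"
}

if have_remote gdrive && have_remote box; then
    rclone lsd gdrive:                                  # list top-level directories
    rclone ls box:project                               # list all files under box:project, with sizes
    rclone copy --dry-run "$src" "$dest"                # upload: copy the contents of ./results into DEST
    rclone copy --dry-run box:project ./project         # download a Box folder
    rclone copy --dry-run box:project gdrive:project    # cloud-to-cloud copy
    rclone sync --dry-run "$src" "$dest"                # make DEST match SOURCE, deleting extra files in DEST
    rclone check "$src" "$dest"                         # compare SOURCE and DEST without copying anything
else
    echo "this sketch needs remotes named 'gdrive:' and 'box:'" >&2
fi
```

Of these, sync deserves the most care: unlike copy, it deletes files in DEST that are not present in SOURCE, so running it with --dry-run first is a good habit.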

Many more subcommands are available, see the man page or the rclone documentation site for more information.