Rclone

Contents

  1. Summary and Version Information
  2. Overview
  3. Configuring rclone for a storage backend
    1. Configuring rclone for Box
    2. Configuring rclone for Google Drive
  4. Using rclone

Summary and Version Information

Package      Rclone
Description  Command line utility for syncing files to/from various cloud storage providers
Categories   Misc, Miscellaneous

Version     Module tag         Availability*                   GPU Ready   Notes
1.43.1      rclone/1.43.1      Non-HPC Glue systems, RedHat6   N
1.47.0      rclone/1.47.0      Non-HPC Glue systems, RedHat6   N
1.48-beta   rclone/1.48-beta   Non-HPC Glue systems, RedHat6   N

Notes:
*: Packages labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Even software not listed as available on an HPC cluster is generally available on the login nodes of the cluster (assuming it is available for the appropriate OS version; e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS; instead they have local copies of the AFS software tree, and we install packages there only on request. Contact us if you need a version listed as not available on one of the clusters.

In general, you need to prepare your Unix environment to be able to use this software. To do this, either:

  • tap TAPFOO
OR
  • module load MODFOO

where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; this is needed in startup dot files); you can get a similar text with module help MODFOO. For more information, see the documentation on the tap and module commands.

For packages which are libraries which other codes get built against, see the section on compiling codes for more help.

Tap/module commands listed with a version of current will set up for what we consider the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new represent a newer version of the package which is still being tested by users; if stability is not a primary concern you are encouraged to use it. Those with versions listed as old set up for an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.

In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc dependencies, if a compiler module or MPI library was previously loaded, it will try to load the correct build of the package for those packages. If you specify the compiler/MPI dependency, it will attempt to load the compiler/MPI library for you if needed.

Overview

Rclone is a command line utility that lets you synchronize files and directories with a wide range of Cloud and other storage providers. A full list of supported Cloud and other storage providers can be found on the rclone documentation site; supported Cloud providers include Box and Google Drive.

Before you can effectively start using rclone to copy files or synchronize directories, you need to define one or more remotes, which describe your storage backends (this is explained further below). A remote records which Cloud storage provider is being used, together with your authentication and authorization information. This data is stored in your rclone configuration file, normally ~/.config/rclone/rclone.conf.

WARNING
The ~/.config/rclone/rclone.conf file will contain access credentials to your cloud storage. BE SURE TO PROTECT THIS FILE. By default rclone creates it so that only you can read it; be sure not to loosen those permissions. Anyone with read access to this file can access your cloud storage as you. You might also want to consider encrypting this config file with a master password.
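If you want to double-check the protection on this file, the hypothetical shell function below (secure_rclone_conf is not part of rclone) removes any group or other access from it. It assumes the default location unless the RCLONE_CONFIG environment variable points elsewhere; rclone itself honors RCLONE_CONFIG for the config file location.

```shell
# Hypothetical helper (not part of rclone): make the rclone config file
# readable and writable only by its owner.
# Uses $RCLONE_CONFIG if it is set, otherwise rclone's default location.
secure_rclone_conf() {
    conf="${RCLONE_CONFIG:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "No rclone config file found at $conf" >&2
        return 1
    fi
    # Remove every group/other permission bit; the owner's bits are untouched.
    chmod go-rwx "$conf"
}
```

After running secure_rclone_conf, ls -l ~/.config/rclone/rclone.conf should normally show -rw------- for the file.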

When you define a remote, you give it a name of your own choosing; you then refer to files and directories on the associated Cloud storage provider by that name followed by a colon. You can then issue various rclone subcommands to move data back and forth. E.g., if you defined a remote 'gdrive' to access your personal Google Drive, the command rclone copyto ./MyFile.txt gdrive:CopyOfMyFile.txt would copy the local file 'MyFile.txt' to your Google Drive as 'CopyOfMyFile.txt' (with rclone copy instead of rclone copyto, the destination would be treated as a folder, and you would get 'CopyOfMyFile.txt/MyFile.txt'). This is elaborated in the usage section below.

Configuring rclone for a storage backend

Rclone has an interactive config subcommand which allows you to define, delete, copy, and edit "remotes". It also allows you to encrypt your rclone configuration file (which contains the authorization tokens for your storage backends) with a password, which you might wish to consider if any data on your Cloud providers needs additional security. This prevents anyone who obtains the rclone config file from using the information in it to access your data on the Cloud storage providers unless they know the password. Unfortunately, it also means that you need to enter the password with every rclone invocation.
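If you do encrypt the configuration file, rclone can read the password from the RCLONE_CONFIG_PASS environment variable instead of prompting on every invocation. The hypothetical functions below (rclone_unlock and rclone_lock are not part of rclone) prompt once per shell session and export that variable. This is a sketch under the assumption that you are comfortable with the password sitting in your shell's environment, where any program you start from that shell can see it.

```shell
# Hypothetical helpers (not part of rclone). rclone_unlock prompts for the
# config password once and exports RCLONE_CONFIG_PASS, which rclone reads
# instead of asking on every invocation. rclone_lock forgets it again.
rclone_unlock() {
    printf 'rclone config password: ' >&2
    stty -echo 2>/dev/null || true   # hide typing when stdin is a terminal
    read -r RCLONE_CONFIG_PASS
    stty echo 2>/dev/null || true    # restore normal echo
    printf '\n' >&2
    export RCLONE_CONFIG_PASS
}

rclone_lock() {
    unset RCLONE_CONFIG_PASS
}
```

Run rclone_unlock at the start of a session, use rclone as usual, and run rclone_lock (or simply exit the shell) when you are done.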

Rclone uses "remotes" to define the various Cloud storage or other storage backends you wish to use with the rclone command, as well as authentication and authorization information and other settings. When you define a remote, you give it a name of your choosing, which is how you refer to files using the provider. For example, if you defined a remote 'gdrive' to attach to your personal Google drive, then gdrive:SomeFile.txt would refer to a file named 'SomeFile.txt' on your Google drive. You can define multiple 'remotes' attached to the same Cloud storage provider backend, which can sometimes be useful (e.g. to make distinct remotes for your personal and for team Google drives). Because you will be using the name of the remote on the command line a lot, it is advised to keep it somewhat short and to avoid characters which are not convenient to use in the shell (i.e. use only letters, numbers, and maybe hyphens/underscores).
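The naming advice above can be checked mechanically. The hypothetical function below (is_shell_friendly_remote_name is not part of rclone) accepts only letters, digits, hyphens, and underscores, and also rejects a leading hyphen, which would look like a command-line option. It is deliberately stricter than the names rclone itself will accept.

```shell
# Hypothetical helper (not part of rclone): succeed only if the proposed
# remote name follows the conservative, shell-friendly naming advice.
is_shell_friendly_remote_name() {
    case "$1" in
        "" | -*) return 1 ;;           # empty, or would look like an option
        *[!A-Za-z0-9_-]*) return 1 ;;  # contains a character other than letters, digits, _ or -
        *) return 0 ;;
    esac
}
```

For example, is_shell_friendly_remote_name gdrive succeeds, while is_shell_friendly_remote_name "my drive" fails because of the space.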

All management of your remotes is done with the interactive rclone config command. Rclone supports many different storage backends, and many options on all of these backends, so the configuration of backends can be somewhat complicated. The rclone config command will step you through it, asking questions, most of which have reasonable defaults. Because many backends primarily authenticate via the web, you are likely to need a web browser to complete the authentication. Some backends will allow you to use a web browser on your desktop while you ssh to the HPC login node to run the rclone config command; others will only work if the web browser is running on the same system that the rclone config command runs on. In the latter case, things work best if you can set up an X11 session between your desktop and the HPC login node, and when needed run firefox or chrome on the login node to display on your desktop (this will admittedly be sluggish, but you only need it briefly to get the authentication tokens).

The rclone web documentation site details all of the backends it supports as well as how to configure remotes for these backends. We will discuss two of the Cloud providers most used at UMD below, but see the rclone documentation for more detail.

Configuring rclone for Box

Be sure to see the UMD Box service catalog entry for more information on this service, including restrictions on what data can be stored on Box and how to set up your account if you have not already done so.

  1. The easiest way to set up rclone to access Box requires you to be able to start a web browser on the machine on which you will be running rclone; for the HPC clusters, that means one of the login nodes. Please see the page on using X for more information; you will basically need to start an X server on your desktop and connect to the login node with X forwarding enabled (e.g. use the '-X' flag on your ssh command). Then start your web browser of choice (both chrome and firefox are available).
  2. Start up another terminal on your desktop and ssh to the same login node (this terminal does NOT require X forwarding, but it must be on the same node as the browser, because rclone's temporary authentication web server only accepts connections from that node). In this terminal, issue the rclone config command.
  3. Type 'n' at the rclone config prompt to create a new remote.
  4. Enter the name you wish to use for this remote. This will prefix all paths on the remote system, so you will probably want something short and shell friendly, e.g. 'box' or 'umdbox'.
  5. Select the type of backend, in this case Box (5; the menu numbers can differ between rclone versions, so check the list that is printed).
  6. For most of the following questions you can just hit return to accept the defaults:
    1. Box App Client Id
    2. Box App Client Secret
  7. Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
  8. Use auto config?: Type 'y' for Yes. Your browser running on the login node should be redirected to a URL that is printed on the terminal running the rclone command (something like http://127.0.0.1:53682/auth). If the browser running on the login node does not get redirected, copy the URL and paste it into that browser. You will be prompted to log in and to allow rclone to access your account. After you agree to allow rclone access, the terminal running the rclone config command should receive the token.
  9. Type y in the terminal running rclone to save this new remote, and q to exit the rclone config command.
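Once the remote is saved, you can confirm that it exists. The rclone listremotes command prints each configured remote name followed by a colon, one per line; the hypothetical helper below (remote_exists is not part of rclone) searches that output for an exact match.

```shell
# Hypothetical helper (not part of rclone): succeed if a remote with the
# given name appears in the output of `rclone listremotes`.
remote_exists() {
    # -q: quiet, -x: match the whole line, -F: treat the name as a fixed string
    rclone listremotes | grep -qxF "$1:"
}
```

For example, remote_exists box && rclone lsd box: lists your top-level Box folders only if the 'box' remote was saved.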

More complete details on configuring rclone for use with Box can be found on the rclone documentation site. There are ways to configure rclone for use with Box without requiring you to be able to run a web browser on the login nodes, see e.g. , but these typically require you to have rclone installed on a system on which you can use a web browser.

You can now use the rclone commands with this remote as discussed below.

Configuring rclone for Google Drive

Be sure to see the Google drive service catalog entry for more information on this service, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

  1. To set up rclone to access Google Drive, you need a web browser. The instructions below assume you will NOT be using "auto config", in which case the browser does NOT need to run on the system where you run the rclone command, so you can simply use a web browser on your desktop. (I believe most users will find it easier in this environment not to use "auto config". However, if you really want to use "auto config" mode, the web browser must run on the same system as the rclone command; see the section on configuring rclone for Box regarding the browser setup.)
  2. In a shell on the HPC login node, run rclone config. Type 'n' to create a new remote.
  3. Enter the name you wish to use for this remote. This will prefix all paths on the remote system, so you will probably want something short and shell friendly, e.g. 'gdrive' or 'gd'.
  4. Select the type of backend, in this case Google Drive (11; the menu numbers can differ between rclone versions, so check the list that is printed).
  5. For most of the following questions you can just hit return to accept the defaults:
    1. Google Application Client Id
    2. Google Application Client Secret
  6. Scope that rclone should use when requesting access from drive: choose the appropriate scope. Typically people will want full access, but if you are pushing data to Google Drive from elsewhere and only downloading it to the Unix system, you might want read-only access.
  7. For the next questions you can again just hit return to accept the defaults:
    1. ID of the root folder
    2. Service Account Credentials JSON file path
  8. Edit advanced config? Unless you know what you are doing, I would recommend 'n'.
  9. Use auto config?: Type 'n' for No. The rclone command should print out a long URL (starting 'https://accounts.google.com'). Copy and paste the entire URL into your web browser. It will then ask you to log into Google Drive and to authorize rclone to access your account. When you accept, your web browser should display a long string; cut and paste this string at the Enter verification code> prompt on your rclone terminal.
  10. Configure this as a team drive? Answer yes or no depending on whether this remote should access a team drive.
  11. Type y to save this new remote, and q to quit rclone config.

More complete details on configuring rclone for use with Google Drive can be found on the rclone documentation site.

You can now use the rclone commands with this remote as discussed below.

Using rclone

Rclone commands generally follow the format:

rclone SUBCOMMAND SOURCE DEST

although a few commands omit the DEST. The SUBCOMMAND tells rclone what you want to do, and the SOURCE and DEST, if given, are what is acted upon. SOURCE and DEST are paths to files or directories, either local or on a Cloud or other storage provider. To reference files or directories on a Cloud or storage provider, the SOURCE or DEST specification should start with the name of one of the remotes you defined, followed by a colon (':') and optionally any additional path components needed.
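To make the REMOTE:PATH convention concrete, the hypothetical function below (describe_path is not part of rclone) splits an argument the way this section describes. It is a simplified approximation for illustration: it treats the text before the first colon as a remote name only if that text is non-empty and contains no slash, and it does not reproduce every rule rclone uses when parsing paths.

```shell
# Hypothetical helper (not part of rclone): report whether an argument
# names a remote path or a local path, using a simplified rule.
describe_path() {
    arg=$1
    case "$arg" in
        *:*)
            prefix=${arg%%:*}          # everything before the first colon
            case "$prefix" in
                "" | */*) echo "local path: $arg"; return ;;
            esac
            echo "remote '$prefix', path '${arg#*:}'"
            ;;
        *) echo "local path: $arg" ;;
    esac
}
```

For example, describe_path gdrive:Folder/a.txt reports remote 'gdrive' with path 'Folder/a.txt', while describe_path ./MyFile.txt reports a local path.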

A complete description of all of the rclone subcommands can be found on the rclone documentation site. Below is a brief description of some of the more commonly used subcommands (in the examples below, we assume that 'gdrive' is a remote for Google Drive, and 'box' one for Box):

  • rclone ls SOURCE: lists the files in SOURCE, recursively, with their sizes. E.g. rclone ls gdrive:MyFolder will list the files in MyFolder on Google Drive.
  • rclone lsd SOURCE: lists the folders directly under SOURCE. E.g. rclone lsd gdrive: will list all folders under the root folder on Google Drive.
  • rclone copy SOURCE DEST: copies SOURCE (a file, or the contents of a directory) into the directory DEST, skipping files that are unchanged. E.g. rclone copy box:MyFile.txt . will copy MyFile.txt from Box to the current directory on the local system. To copy a single file under a different name, use rclone copyto SOURCE DEST instead.
  • rclone sync SOURCE DEST: modifies the directory DEST to make it identical to SOURCE. E.g. rclone sync gdrive:ImportantStuff box:Copy will add/delete/copy files in the Copy folder on Box as needed to synchronize it with the ImportantStuff folder on Google Drive. Because sync deletes files in DEST that are not in SOURCE, consider running it first with the --dry-run flag to see what would change.
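Because rclone sync can delete files in DEST, some users prefer to always preview it. The hypothetical wrapper below (careful_sync is not part of rclone) runs rclone sync with --dry-run, which reports what would be copied or deleted without changing anything, and then performs the real sync only if you answer y.

```shell
# Hypothetical wrapper (not part of rclone): preview a sync, then confirm.
# Usage: careful_sync SOURCE DEST
careful_sync() {
    src=$1
    dest=$2
    # --dry-run makes rclone report what it would do without doing it.
    rclone sync --dry-run "$src" "$dest" || return
    printf 'Apply these changes to %s? [y/N] ' "$dest" >&2
    read -r answer
    case "$answer" in
        [yY]*) rclone sync "$src" "$dest" ;;   # perform the real sync
        *) echo "Aborted; $dest left unchanged." ;;
    esac
}
```

For example, careful_sync gdrive:ImportantStuff box:Copy shows the planned changes to the Copy folder on Box before making them.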

Many more subcommands are available; see the man page or the rclone documentation site for more information.