Bucket Sync and Migration With Rclone

Posted on Jun 9, 2024

For a bucket migration or a backup sync, rclone is a very handy tool. Rclone is not limited to migrations; it is a comprehensive tool with many uses, but here we will focus on one of the most important S3 operations.

The important information to collect is the access information for the source and destination buckets: endpoint, region, Access Key and Secret.

Overall, the steps look like the following:

  1. Install rclone on a server/laptop that can access both endpoints.
  2. Configure the buckets on both endpoints.
  3. Start and monitor the sync.

⚠️ Make sure that the intermediary device performing the sync has enough system resources. Insufficient resources will significantly reduce sync performance.

Install rclone into your environment

Depending on the environment you are using, the installation method may differ. This documentation describes the installation on Ubuntu. For other environments, check the rclone website.

rclone can be installed on Ubuntu using the apt command

apt install rclone
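
To confirm that the installation worked, a quick check along these lines can help. This is only a minimal sketch: the helper name require_rclone is made up for illustration, while rclone version is the standard subcommand that prints the installed version.

```shell
# Hypothetical helper: fail early if rclone is not available on this machine.
require_rclone() {
  if ! command -v rclone >/dev/null 2>&1; then
    echo "rclone not found; install it first (e.g. apt install rclone)" >&2
    return 1
  fi
  # Print the installed version so it is visible in your notes or logs.
  rclone version
}
```

It could be called as `require_rclone || exit 1` at the top of a migration script.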

Configuring rclone

To be able to sync/migrate a bucket, we need to configure the access information for both the source and target buckets, including their credentials, endpoint and region.

Please keep in mind that the bucket on the target endpoint should be created in advance with the needed settings; do not rely on rclone to create it for you.

We will set up one profile (a "remote" in rclone terms) for each endpoint. This can be achieved using the rclone config command, first for the source and then for the destination.

user@server:~$ rclone config
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> source-bucket
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
...
Storage> 4
** See help for s3 backend at: https://rclone.org/s3/ **

Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
13 / Any other S3 compatible provider
   \ "Other"
provider> 13
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
 1 / Enter AWS credentials in the next step
   \ "false"
...
env_auth> 1
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> ***************
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> ***************
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Use this if unsure. Will use v4 signatures and an empty region.
   \ ""
...
region> region-1
Endpoint for S3 API.
Required when using an S3 clone.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
endpoint> https://source.example.com
Location constraint - must be set to match the Region.
Leave blank if not sure. Used when creating buckets only.
Enter a string value. Press Enter for the default ("").
location_constraint>
Canned ACL used when creating buckets and storing or copying objects.

This ACL is used for creating objects and if bucket_acl isn't set, for creating buckets too.

For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Note that this ACL is applied when server side copying objects as S3
doesn't copy the ACL from the source but rather writes a fresh one.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Owner gets FULL_CONTROL. No one else has access rights (default).
   \ "private"
...
acl> 1
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[source-bucket]
provider = Other
env_auth = false
access_key_id = ***************
secret_access_key = ***************
region = region-1
endpoint = https://source.example.com
acl = private
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
source-bucket  s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q



user@server:~$ rclone config
Current remotes:

Name                 Type
====                 ====
source-bucket  s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> destination-bucket
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
...
Storage> 4
** See help for s3 backend at: https://rclone.org/s3/ **

Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
...
13 / Any other S3 compatible provider
   \ "Other"
provider> 13
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
 1 / Enter AWS credentials in the next step
   \ "false"
...
env_auth> 1
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> ***************
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> ***************
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Use this if unsure. Will use v4 signatures and an empty region.
   \ ""
...
region> region-2
Endpoint for S3 API.
Required when using an S3 clone.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
endpoint> https://destination.example.com
Location constraint - must be set to match the Region.
Leave blank if not sure. Used when creating buckets only.
Enter a string value. Press Enter for the default ("").
location_constraint>
Canned ACL used when creating buckets and storing or copying objects.

This ACL is used for creating objects and if bucket_acl isn't set, for creating buckets too.

For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Note that this ACL is applied when server side copying objects as S3
doesn't copy the ACL from the source but rather writes a fresh one.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Owner gets FULL_CONTROL. No one else has access rights (default).
   \ "private"
...
acl> 1
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[destination-bucket]
provider = Other
env_auth = false
access_key_id = ***************
secret_access_key = ***************
region = region-2
endpoint = https://destination.example.com
acl = private
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
destination-bucket  s3
source-bucket  s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>

In the final state, both configs should look similar to the following

user@server:~$ rclone config show
[source-bucket]
provider = Other
env_auth = false
access_key_id = ***************
secret_access_key = ***************
region = region-1
endpoint = https://source.example.com
acl = private

[destination-bucket]
provider = Other
env_auth = false
access_key_id = ***************
secret_access_key = ***************
region = region-2
endpoint = https://destination.example.com
acl = private

user@server:~$
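
The interactive wizard ultimately writes an INI-style file, so the same two profiles can also be sketched as a file and passed to rclone with the --config flag. The following is a minimal sketch under assumptions: the path ./rclone-migration.conf and the REPLACE_ME placeholders are illustrative only, and the type = s3 line is included because rclone stores the backend type in each remote's section.

```shell
# Assumed location for a dedicated config file (illustrative path).
# rclone's default config path can be printed with `rclone config file`.
CONF="${CONF:-./rclone-migration.conf}"

# Write the file inside a subshell with a restrictive umask,
# because it will hold credentials.
(
  umask 077
  cat > "$CONF" <<'EOF'
[source-bucket]
type = s3
provider = Other
env_auth = false
access_key_id = REPLACE_ME
secret_access_key = REPLACE_ME
region = region-1
endpoint = https://source.example.com
acl = private

[destination-bucket]
type = s3
provider = Other
env_auth = false
access_key_id = REPLACE_ME
secret_access_key = REPLACE_ME
region = region-2
endpoint = https://destination.example.com
acl = private
EOF
)

echo "Wrote $CONF"
```

After replacing the placeholders, `rclone --config ./rclone-migration.conf listremotes` should list both remotes.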

With the above, both source and target endpoints can be accessed. This can be checked easily with the rclone lsf command

user@server:~$ rclone lsf --max-depth 1 source-bucket:testbucket
1.txt
2.txt
3.txt
logs/
csv_files/
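
Since the sync needs both sides to be reachable, the same listing can be repeated for the destination. A small sketch of that check follows; the helper name check_bucket is hypothetical, and it simply wraps the rclone lsf call shown above.

```shell
# Hypothetical helper: list the top level of a bucket on a given remote
# and report whether the listing succeeded.
check_bucket() {
  remote="$1"
  bucket="$2"
  if rclone lsf --max-depth 1 "${remote}:${bucket}" >/dev/null; then
    echo "OK: ${remote}:${bucket}"
  else
    echo "FAILED: ${remote}:${bucket}" >&2
    return 1
  fi
}
```

For the setup in this post it would be used as `check_bucket source-bucket testbucket && check_bucket destination-bucket testbucket`.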

Sync/Migrate Bucket Contents 

After verifying that the buckets can be accessed successfully, the sync/migration can be started with the rclone sync command. Note that sync makes the destination match the source, so objects that exist only in the destination are deleted.

First, run the command with the --dry-run option to check that everything is fine for the sync

user@server:~$ rclone --dry-run sync source-bucket:testbucket destination-bucket:testbucket
2024/04/19 13:06:40 NOTICE: csv_files/2024-06-01.csv: Skipped copy as --dry-run is set
2024/04/19 13:06:40 NOTICE: csv_files/2024-06-02.csv: Skipped copy as --dry-run is set
...

If everything looks OK, we can proceed with the actual run.

user@server:~$ rclone sync -P source-bucket:testbucket destination-bucket:testbucket | tee -a sync-progress.log
...
Checks:              3791 / 13807, 27%
Transferred:            3 / 3, 100%
Elapsed time:      1m43.5s
Checking:
 * 1.txt: checking
 * 2.txt: checking
 * 3.txt: checking
 * logs/access.log: checking
 * logs/error.log: checking
 * csv_files/2024-06-01.csv: checking
 * csv_files/2024-06-02.csv: checking
 * csv_files/2024-06-03.csv: checking
Transferred:             81.360k / 81.360 kBytes, 100%, 805 Bytes/s, ETA 0s
Checks:              3811 / 13827, 28%
Transferred:            3 / 3, 100%
Elapsed time:      1m44.0s
Checking:
 * csv_files/2024-06-04.csv: checking
 * csv_files/2024-06-05.csv: checking
 ...

⚠️ In the above example, the source and destination bucket names are kept the same because they are in different S3 clusters. You may need to change the bucket name in the destination depending on your setup.
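
How hard the sync pushes the intermediary machine can be adjusted with rclone's --transfers and --checkers flags, and the output can go to a log file instead of the terminal. The wrapper below is a sketch under assumptions: the function name sync_bucket, the TRANSFERS, CHECKERS and LOG_FILE variables, and the default values chosen are illustrative, not recommendations.

```shell
# Hypothetical wrapper around `rclone sync` with tunable parallelism.
#   TRANSFERS : number of parallel file transfers
#   CHECKERS  : number of parallel checkers comparing source and destination
#   LOG_FILE  : where rclone writes its log output
sync_bucket() {
  src="$1"
  dst="$2"
  rclone sync "$src" "$dst" \
    --transfers "${TRANSFERS:-4}" \
    --checkers "${CHECKERS:-8}" \
    --log-file "${LOG_FILE:-sync.log}" \
    --log-level INFO \
    --stats 30s
}
```

For example, `TRANSFERS=16 sync_bucket source-bucket:testbucket destination-bucket:testbucket` raises the transfer parallelism; higher values need more CPU, memory and bandwidth on the machine running rclone.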

Always consider running a final differential sync before the cutover to pick up any changes made since the previous run.
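
One way to sketch such a final pass is to rerun the sync and then compare both sides with rclone check, which reports differences without modifying anything. The function name final_pass is hypothetical, and the remote and bucket names are the ones used in this post.

```shell
# Hypothetical helper: rerun the sync to pick up recent changes,
# then verify that source and destination match.
final_pass() {
  src="$1"
  dst="$2"
  # Transfer anything that changed since the previous run.
  rclone sync "$src" "$dst" || return 1
  # Compare source and destination; a non-zero exit means differences were found.
  rclone check "$src" "$dst"
}
```

It would be called as `final_pass source-bucket:testbucket destination-bucket:testbucket`, ideally once writes to the source have stopped.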