LinuxCommandLibrary

aws-s3-sync

Synchronize files between local and S3 locations

TLDR

Sync files and directories from local to a bucket

$ aws s3 sync [path/to/file_or_directory] s3://[bucket_target_name]/[path/to/remote_location]

Sync files and directories from a bucket to local
$ aws s3 sync s3://[bucket_source_name]/[path/to/remote_location] [path/to/file_or_directory]

Sync objects between two buckets
$ aws s3 sync s3://[bucket_source_name]/[path/to/remote_location] s3://[bucket_target_name]/[path/to/remote_location]

Sync local files to S3 while excluding specific files or directories
$ aws s3 sync [path/to/file_or_directory] s3://[bucket_target_name]/[path/to/remote_location] --exclude "[path/to/file]" --exclude "[path/to/directory]/*"

Sync objects between buckets and delete destination files not in source
$ aws s3 sync s3://[bucket_source_name]/[path/to/remote_location] s3://[bucket_target_name]/[path/to/remote_location] --delete

Sync to S3 with advanced options (set ACL and storage class)
$ aws s3 sync [path/to/local_directory] s3://[bucket_name]/[path/to/remote_location] --acl [private|public-read] --storage-class [STANDARD_IA|GLACIER]

Sync files to S3 using file size as the only criterion for deciding what to transfer (modification time is ignored)
$ aws s3 sync [path/to/file_or_directory] s3://[bucket_name]/[path/to/remote_location] --size-only

Display help
$ aws s3 sync help

SYNOPSIS

aws s3 sync <SOURCE> <DESTINATION> [<OPTIONS>]

Examples of SOURCE/DESTINATION:
<LocalPath>
s3://<Bucket>[/Prefix]

PARAMETERS

<SOURCE>
    The path to the source directory or S3 prefix.

<DESTINATION>
    The path to the destination directory or S3 prefix.

--exclude <PATTERN>
    Excludes files or objects that match the specified pattern from the synchronization operation.

--include <PATTERN>
    Includes files or objects that match the specified pattern. Filters are applied in the order given and later filters take precedence, so an --include placed after an --exclude re-includes the files it matches.

--dryrun
    Performs a trial run without making any changes, showing which operations would be performed.

--delete
    Deletes files or objects in the destination that are not present in the source. Use with extreme caution.

--exact-timestamps
    When syncing from S3 to local, skips same-sized items only if their timestamps match exactly. By default, same-sized items are skipped unless the local version is newer than the S3 version.

--acl <ACL>
    Sets the access control list (ACL) for the uploaded objects. E.g., 'public-read', 'private'.

--storage-class <CLASS>
    Sets the storage class for the uploaded objects. E.g., 'STANDARD', 'INTELLIGENT_TIERING', 'GLACIER_IR', 'DEEP_ARCHIVE'.

--sse [AES256|aws:kms]
    Enables server-side encryption for the uploaded objects. If no value is given, AES256 (S3-managed keys) is used; aws:kms selects AWS KMS-managed keys.

--sse-kms-key-id <KEY_ID>
    Specifies the KMS key to use together with --sse aws:kms. If omitted, the AWS-managed KMS key for S3 is used.

--sse-c [AES256] --sse-c-key <KEY>
    Enables server-side encryption for the uploaded objects using a customer-provided encryption key. AES256 is the only valid algorithm, and the key itself must be supplied with --sse-c-key.

--profile <PROFILE>
    Uses a specific profile from the AWS credentials file to authenticate the command.
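
As a rough illustration of the encryption options, the sketch below wraps an upload in a small shell function that requests SSE-KMS when a key is supplied and falls back to S3-managed keys otherwise. The function name sync_encrypted, the bucket name, and the key ID in the usage comment are placeholders invented for this example; only the --sse and --sse-kms-key-id options come from the AWS CLI.

```shell
#!/bin/sh
# Sketch only: sync_encrypted is a hypothetical helper, not part of the AWS CLI.
sync_encrypted() {
  src=$1      # local directory to upload
  dst=$2      # destination S3 URI
  key_id=$3   # optional KMS key ID or ARN

  if [ -n "$key_id" ]; then
    # Encrypt with the given KMS key.
    aws s3 sync "$src" "$dst" --sse aws:kms --sse-kms-key-id "$key_id"
  else
    # No key given: use S3-managed keys (AES256).
    aws s3 sync "$src" "$dst" --sse AES256
  fi
}

# Usage (all names are placeholders):
#   sync_encrypted ./reports s3://example-bucket/reports 1234abcd-12ab-34cd-56ef-1234567890ab
```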

DESCRIPTION

The aws s3 sync command provides a robust and efficient way to synchronize files and directories between a local filesystem and an Amazon S3 bucket, or between two S3 locations.

It decides which files to copy by comparing file sizes and last modified timestamps; it does not compare checksums or file contents. Because only new or modified files are transferred, it is significantly more efficient than a plain copy for large datasets and repeated synchronization tasks.

Common use cases include backing up local data to S3, deploying website content, replicating data between S3 buckets, and keeping environments consistent. It supports options for filtering files, controlling access, and choosing storage classes, offering an rsync-like workflow for S3.
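
To make the filtering options concrete, here is a minimal sketch that syncs only files with one extension. It relies on filters being evaluated in order, with later filters taking precedence: the broad --exclude "*" comes first and the narrower --include follows it. The function name sync_only_ext and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: sync_only_ext is a hypothetical helper, not part of the AWS CLI.
sync_only_ext() {
  src=$1   # local directory
  dst=$2   # destination S3 URI
  ext=$3   # file extension without the dot, e.g. jpg

  # Quote the patterns so the shell passes them to aws unexpanded.
  aws s3 sync "$src" "$dst" --exclude "*" --include "*.$ext"
}

# Usage (placeholder names):
#   sync_only_ext ./site s3://example-bucket/site jpg
```

Reversing the two filters would exclude everything, because the later --exclude "*" would override the earlier --include.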

CAVEATS

The --delete option is powerful and can lead to irreversible data loss. Always perform a --dryrun before using --delete to understand the impact.
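
One way to follow this advice is to script the preview and the real run together. The sketch below, with a hypothetical function name sync_with_delete and placeholder paths, shows the planned operations with --dryrun and repeats the command without it only after an explicit confirmation.

```shell
#!/bin/sh
# Sketch only: sync_with_delete is a hypothetical helper, not part of the AWS CLI.
sync_with_delete() {
  src=$1   # source path or S3 URI
  dst=$2   # destination path or S3 URI

  # Step 1: preview. --dryrun lists the operations without performing them.
  aws s3 sync "$src" "$dst" --delete --dryrun || return 1

  # Step 2: ask before doing anything destructive (prompt goes to stderr).
  printf 'Apply these changes? [y/N] ' >&2
  read -r answer
  case $answer in
    y|Y) aws s3 sync "$src" "$dst" --delete ;;
    *)   echo "Aborted; nothing was changed." ;;
  esac
}

# Usage (placeholder names):
#   sync_with_delete ./public s3://example-bucket/public
```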

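Synchronization decisions are based on file size and last modified timestamp, not on file contents. A file whose content changes while its size stays the same and its timestamp does not advance can be skipped, and timestamp differences between local filesystems and S3 can cause unexpected re-transfers. For S3-to-local syncs, the --exact-timestamps option transfers same-sized files whenever the timestamps differ at all, which catches more changes but can increase the number of transfers.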

The command performs a one-way sync from source to destination. It does not support bidirectional synchronization or conflict resolution, and it does not manage S3 object versions.

COMPARISON LOGIC

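When synchronizing, aws s3 sync compares files and objects by size and last modified timestamp. For uploads, a local file is transferred if it does not exist under the destination prefix, if its size differs from the S3 object, or if it was modified more recently than the S3 object; otherwise it is skipped. The decision does not involve reading file contents or comparing ETags or checksums. The --size-only option restricts the comparison to size, and --exact-timestamps tightens the timestamp check for S3-to-local syncs.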
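
The size-and-timestamp rule for uploads can be modelled in a few lines of shell. This is an illustrative model under stated assumptions, not the CLI's actual implementation: the function name needs_upload is invented, sizes are in bytes, and timestamps are Unix epoch seconds.

```shell
#!/bin/sh
# Illustrative model only; needs_upload is a hypothetical function.
# Arguments: local_size local_mtime remote_size remote_mtime
# Pass empty strings for the remote values when the object does not exist.
# Returns 0 (true) when the file would be uploaded, 1 when it would be skipped.
needs_upload() {
  local_size=$1
  local_mtime=$2
  remote_size=$3
  remote_mtime=$4

  # Missing at the destination: always transfer.
  [ -z "$remote_size" ] && return 0
  # Sizes differ: transfer.
  [ "$local_size" -ne "$remote_size" ] && return 0
  # Same size, but the local copy is newer: transfer.
  [ "$local_mtime" -gt "$remote_mtime" ] && return 0
  # Same size and not newer: skip.
  return 1
}

# Usage:
#   if needs_upload 1024 1700000100 1024 1700000000; then echo "upload"; fi
```

Note that a same-sized file with an older or equal timestamp is skipped even if its contents differ, which is the gap that a content checksum would close.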

MULTIPART UPLOADS

For large files, aws s3 sync automatically uses S3's multipart upload capability once a file exceeds a size threshold (8 MB by default, set by the multipart_threshold configuration value). The file is split into parts that are uploaded concurrently, which improves throughput and resilience on unstable networks: if a part fails, only that part needs to be re-uploaded.
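
Multipart behavior can be tuned through the CLI's S3 configuration values. The sketch below sets three of them for the default profile; the function name tune_default_s3_transfers and the specific values (64MB, 16MB, 20) are arbitrary choices for illustration, not recommendations.

```shell
#!/bin/sh
# Sketch only: tune_default_s3_transfers is a hypothetical helper.
# The values below are examples; suitable settings depend on bandwidth and file sizes.
tune_default_s3_transfers() {
  # Files larger than this are uploaded in parts.
  aws configure set default.s3.multipart_threshold 64MB
  # Size of each uploaded part.
  aws configure set default.s3.multipart_chunksize 16MB
  # Maximum number of requests in flight at once.
  aws configure set default.s3.max_concurrent_requests 20
}

# Usage:
#   tune_default_s3_transfers
```

These commands write to the AWS CLI config file, so they affect later aws s3 commands run with the default profile, not only sync.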

HISTORY

The aws s3 sync command is part of the AWS Command Line Interface (CLI), which was first released in 2013. It gave developers and system administrators a native way to synchronize data with Amazon S3, a task previously handled by adapting general-purpose tools like rsync or by writing custom scripts. It has since been updated to support new S3 features and performance improvements.

SEE ALSO

aws s3 cp(1), aws s3 mv(1), aws s3 rm(1), rsync(1)
