aws-s3-sync
Synchronize files between local and S3 locations
TLDR
Sync files and directories from local to a bucket
Sync files and directories from a bucket to local
Sync objects between two buckets
Sync local files to S3 while excluding specific files or directories
Sync objects between buckets and delete destination files not in source
Sync to S3 with advanced options (set ACL and storage class)
Sync files to S3 and skip unchanged ones (compare size and modification time)
Display help
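The tasks listed above might map to commands like the following. This is a sketch, not the original examples: the bucket names and the ./data directory are placeholders, and run is a hypothetical helper that only prints each command unless RUN=1 is set, so the snippet can be read and tried without touching a real bucket.

```shell
# Placeholder locations: replace with your own directory and buckets.
DIR="./data"
BUCKET="s3://example-bucket/backup"
OTHER="s3://example-bucket-2/backup"

# Hypothetical helper: prints the command by default, executes it when RUN=1.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

run aws s3 sync "$DIR" "$BUCKET"        # local directory to bucket
run aws s3 sync "$BUCKET" "$DIR"        # bucket to local directory
run aws s3 sync "$BUCKET" "$OTHER"      # bucket to bucket

# Exclude specific files or directories
run aws s3 sync "$DIR" "$BUCKET" --exclude "*.tmp" --exclude "cache/*"

# Delete destination objects that are not in the source
run aws s3 sync "$BUCKET" "$OTHER" --delete

# Set an ACL and a storage class on uploaded objects
run aws s3 sync "$DIR" "$BUCKET" --acl private --storage-class STANDARD_IA

# Skipping unchanged files is the default behavior, so no extra flag is needed
run aws s3 sync "$DIR" "$BUCKET"

# Display help
run aws s3 sync help
```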
SYNOPSIS
aws s3 sync <SOURCE> <DESTINATION> [<OPTIONS>]
Examples of SOURCE/DESTINATION:
<LocalPath>
s3://<Bucket>[/<Prefix>]
PARAMETERS
<SOURCE>
The path to the source directory or S3 prefix.
<DESTINATION>
The path to the destination directory or S3 prefix.
--exclude <PATTERN>
Excludes files or objects that match the specified pattern from the synchronization operation.
--include <PATTERN>
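Includes files or objects that match the specified pattern. All files are included by default, so this option is mainly useful after an --exclude. Filters are evaluated in the order given and later filters take precedence, so an --include overrides any preceding --exclude for the files it matches.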
--dryrun
Performs a trial run without making any changes, showing which operations would be performed.
--delete
Deletes files or objects in the destination that are not present in the source. Use with extreme caution.
--exact-timestamps
When syncing from S3 to local, same-sized items are skipped only if their timestamps match exactly. By default, same-sized items are skipped unless the local version is newer than the S3 version.
--acl <ACL>
Sets the access control list (ACL) for the uploaded objects. E.g., 'public-read', 'private'.
--storage-class <CLASS>
Sets the storage class for the uploaded objects. E.g., 'STANDARD', 'INTELLIGENT_TIERING', 'GLACIER_IR', 'DEEP_ARCHIVE'.
--sse
Enables server-side encryption for the uploaded objects. Accepts 'AES256' (S3-managed keys) or 'aws:kms'; if no value is given, 'AES256' is used.
--sse-kms-key-id <KEY_ID>
Specifies the AWS KMS key to use for server-side encryption of the uploaded objects. Use together with '--sse aws:kms'.
--sse-c
Enables server-side encryption for the uploaded objects using customer-provided encryption keys. 'AES256' is the only valid value and is used if none is given. The key itself must be supplied with --sse-c-key.
--profile <PROFILE>
Uses the named profile from the AWS config and credentials files to authenticate the command.
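The ordering rule for --exclude and --include can be modeled with a few lines of shell. This is a simplified sketch of the "later filter wins" rule, not the CLI's own matching code; decide is a hypothetical function, and the patterns are matched with the shell's case globbing as a stand-in for the CLI's pattern matching.

```shell
# decide PATH [include|exclude PATTERN]...
# Everything is included by default; each matching filter overwrites the
# verdict, so the last filter that matches the path decides the outcome.
decide() {
  path=$1
  shift
  verdict=include
  while [ $# -ge 2 ]; do
    case $path in
      $2) verdict=$1 ;;
    esac
    shift 2
  done
  echo "$verdict"
}

# Models: aws s3 sync . s3://example-bucket --exclude "*" --include "*.html"
decide index.html exclude '*' include '*.html'
decide app.js     exclude '*' include '*.html'

# Reversed order: the trailing --exclude "*" wins for every file.
decide index.html include '*.html' exclude '*'
```

Under this model the first call yields include and the other two yield exclude, which is why the usual idiom for syncing only some files puts --exclude "*" first and the --include patterns after it.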
DESCRIPTION
The aws s3 sync command provides a robust and efficient way to synchronize files and directories between a local filesystem and an Amazon S3 bucket, or between two S3 locations.
It determines which files need to be copied by comparing file sizes and last modified timestamps; it does not compare checksums. Because only new or modified files are transferred, it is significantly more efficient than a plain copy, especially for large datasets or repeated synchronization tasks.
Common use cases include backing up local data to S3, deploying website content, replicating data between S3 buckets, and ensuring data consistency across various environments. It supports powerful options for filtering files, controlling access, and managing storage classes, offering a cloud-native equivalent to the traditional rsync utility for S3.
CAVEATS
The --delete option is powerful and can lead to irreversible data loss. Always perform a --dryrun before using --delete to understand the impact.
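One way to make the dry run a habit is to wrap both steps in a function. The following is a minimal sketch under stated assumptions: sync_with_delete and the AWS_CMD override are hypothetical names introduced here, and the paths in the usage comment are placeholders.

```shell
# AWS_CMD is a hypothetical override hook so the function can be exercised
# without the real CLI (for example AWS_CMD=echo); it defaults to aws.
AWS_CMD=${AWS_CMD:-aws}

# sync_with_delete SOURCE DESTINATION
# 1. Preview the transfers and deletions with --dryrun.
# 2. Ask for confirmation on standard input.
# 3. Only then run the real sync with --delete.
sync_with_delete() {
  "$AWS_CMD" s3 sync "$1" "$2" --delete --dryrun || return 1
  printf 'Apply these changes? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    "$AWS_CMD" s3 sync "$1" "$2" --delete
  else
    echo "Skipped; nothing was changed."
  fi
}

# Example (placeholder paths):
#   sync_with_delete ./site s3://example-bucket/site
```

Enabling bucket versioning on the destination is a separate safeguard worth considering, since it lets deleted objects be recovered.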
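Synchronization is based on file size and last modified timestamp, not on file content. A file whose content changed but whose size is the same and whose timestamp is not newer than the destination's is skipped. S3 records the upload time as an object's last modified time, so local and S3 timestamps rarely match exactly. When syncing from S3 to local, --exact-timestamps skips same-sized items only if the timestamps match exactly, which can increase the number of transfers.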
The command generally performs a one-way sync from source to destination. It does not handle bi-directional synchronization or advanced versioning logic automatically.
COMPARISON LOGIC
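When synchronizing, aws s3 sync compares files/objects based on their size and last modified timestamp. A local file is uploaded if it does not exist at the destination, if its size differs from the S3 object's, or if its last modified time is newer than the S3 object's; otherwise it is skipped. ETags and checksums are not compared, so a change that preserves both size and timestamp goes undetected. The --size-only option restricts the comparison to size, and --exact-timestamps tightens the timestamp check for S3-to-local syncs.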
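The size-and-timestamp rule for uploads can be written out as a small decision function. This is a sketch of the documented behavior for the local-to-S3 direction only, not the CLI's actual code; needs_upload is a hypothetical name, sizes are in bytes, and times are epoch seconds.

```shell
# needs_upload SRC_SIZE SRC_MTIME DEST_SIZE DEST_MTIME
# Pass an empty DEST_SIZE when the object does not exist at the destination.
# Prints "yes" if the file would be uploaded, "no" if it would be skipped.
needs_upload() {
  if [ -z "$3" ]; then
    echo yes            # missing at the destination
  elif [ "$1" -ne "$3" ]; then
    echo yes            # sizes differ
  elif [ "$2" -gt "$4" ]; then
    echo yes            # source is newer than the destination
  else
    echo no             # same size, destination is as new or newer
  fi
}

needs_upload 1024 1700000000 ""   ""          # new file
needs_upload 1024 1700000000 2048 1700000500  # size changed
needs_upload 1024 1700000900 1024 1700000500  # edited after the last upload
needs_upload 1024 1700000000 1024 1700000500  # unchanged since the last upload
```

The last case shows the blind spot: content edited without changing size, and without a newer timestamp, is treated as unchanged.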
MULTIPART UPLOADS
For large files, aws s3 sync automatically leverages S3's multipart upload capability. This breaks large files into smaller parts, uploading them concurrently, which significantly improves transfer speed and resilience, especially over unstable networks. If a part fails, only that part needs to be re-uploaded.
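To get a feel for how a file is split, the part count is a ceiling division of the file size by the part size. The sketch below assumes an 8 MiB part size, which matches the documented default for the CLI's multipart_chunksize setting as far as I know; part_count is a hypothetical helper, not a CLI command.

```shell
# part_count SIZE_BYTES CHUNK_BYTES
# Prints the number of parts needed (ceiling division).
part_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

MiB=$((1024 * 1024))

# A 100 MiB file uploaded in 8 MiB parts: twelve full parts plus one 4 MiB part.
part_count $((100 * MiB)) $((8 * MiB))

# The threshold and part size can be tuned per profile, for example:
#   aws configure set default.s3.multipart_threshold 64MB
#   aws configure set default.s3.multipart_chunksize 16MB
```

Larger parts mean fewer requests per file, while smaller parts mean less data to resend when a single part fails.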
HISTORY
The aws s3 sync command is an integral part of the AWS Command Line Interface (CLI), which was first released in 2013. It quickly became a crucial utility for developers and system administrators, providing a native and optimized way to interact with Amazon S3. Its development focused on addressing the common need for efficient, robust data synchronization that was previously handled by general-purpose tools like rsync or custom scripts, adapting them for the cloud-object storage paradigm. Over the years, it has been continually updated to support new S3 features, performance improvements, and user experience enhancements.