AWS sync is not reliable!

While migrating from s3cmd to the AWS CLI (aws s3), I noticed that changed files don't get synced when using the AWS CLI.

So far I have tested several versions, and they all showed the same behavior:

  • python2.7-awscli1.9.7
  • python2.7-awscli1.15.47
  • python3.6-awscli1.15.47

Test-Setup

  1. Set up the AWS CLI utility and configure your credentials
  2. Create a testing S3 bucket
  3. Set up some random files
    # create 10 random files of 10 MB each
    mkdir -p multi
    for i in {1..10}; do dd if=/dev/urandom of=multi/part-$i.out bs=1MB count=10; done
    # then copy the first 5 files over
    mkdir multi-changed
    cp -r multi/part-{1,2,3,4,5}.out multi-changed
    # and generate new content for the other 5 files
    for i in {6..10}; do dd if=/dev/urandom of=multi-changed/part-$i.out bs=1MB count=10; done
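
Before syncing, it can be worth confirming that the two directories really differ the way the setup intends. A minimal sketch, assuming bash and the multi and multi-changed directories from the steps above. The helper name changed_files is made up for illustration:

```shell
# changed_files DIR_A DIR_B
# Lists the files in DIR_B whose content differs from the same-named file
# in DIR_A, or that have no counterpart there. Compares bytes only,
# timestamps are ignored.
changed_files() {
  local dir_a=$1 dir_b=$2 f name
  for f in "$dir_b"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if ! cmp -s "$dir_a/$name" "$f" 2>/dev/null; then
      echo "$name"
    fi
  done
}

# Usage: changed_files multi multi-changed
```

With the setup above this should list five names, part-6.out through part-10.out.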

Testing S3 sync with aws cli

Cleanup

$ aws s3 rm s3://l3testing/multi --recursive 

Initial sync

$ aws s3 sync multi s3://l3testing/multi
upload: multi/part-1.out to s3://l3testing/multi/part-1.out         
upload: multi/part-3.out to s3://l3testing/multi/part-3.out      
upload: multi/part-2.out to s3://l3testing/multi/part-2.out      
upload: multi/part-4.out to s3://l3testing/multi/part-4.out      
upload: multi/part-10.out to s3://l3testing/multi/part-10.out    
upload: multi/part-5.out to s3://l3testing/multi/part-5.out      
upload: multi/part-6.out to s3://l3testing/multi/part-6.out      
upload: multi/part-8.out to s3://l3testing/multi/part-8.out      
upload: multi/part-7.out to s3://l3testing/multi/part-7.out      
upload: multi/part-9.out to s3://l3testing/multi/part-9.out  

Update files

Only 5 files should now be uploaded. Timestamps for all 10 files should be changed.

$ aws s3 sync multi-changed/ s3://l3testing/multi/

ERROR: No files synced!

Testing with s3cmd

Cleanup

$ aws s3 rm s3://l3testing/multi --recursive 

Initial sync

$ s3cmd sync -v --check-md5 multi-changed/  s3://l3testing/multi/
s3cmd sync --delete-removed multi/  s3://l3testing/multi/ 
upload: 'multi/part-1.out' -> 's3://l3testing/multi/part-1.out'  [1 of 10]
 10000000 of 10000000   100% in    1s     5.12 MB/s  done
upload: 'multi/part-10.out' -> 's3://l3testing/multi/part-10.out'  [2 of 10]
 10000000 of 10000000   100% in    1s     7.54 MB/s  done
upload: 'multi/part-2.out' -> 's3://l3testing/multi/part-2.out'  [3 of 10]
 10000000 of 10000000   100% in    1s     8.60 MB/s  done
upload: 'multi/part-3.out' -> 's3://l3testing/multi/part-3.out'  [4 of 10]
 10000000 of 10000000   100% in    1s     7.17 MB/s  done
upload: 'multi/part-4.out' -> 's3://l3testing/multi/part-4.out'  [5 of 10]
 10000000 of 10000000   100% in    1s     7.72 MB/s  done
upload: 'multi/part-5.out' -> 's3://l3testing/multi/part-5.out'  [6 of 10]
 10000000 of 10000000   100% in    1s     8.19 MB/s  done
upload: 'multi/part-6.out' -> 's3://l3testing/multi/part-6.out'  [7 of 10]
 10000000 of 10000000   100% in    1s     7.60 MB/s  done
upload: 'multi/part-7.out' -> 's3://l3testing/multi/part-7.out'  [8 of 10]
 10000000 of 10000000   100% in    1s     7.73 MB/s  done
upload: 'multi/part-8.out' -> 's3://l3testing/multi/part-8.out'  [9 of 10]
 10000000 of 10000000   100% in    1s     7.52 MB/s  done
upload: 'multi/part-9.out' -> 's3://l3testing/multi/part-9.out'  [10 of 10]
 10000000 of 10000000   100% in    1s     8.31 MB/s  done
Done. Uploaded 100000000 bytes in 12.9 seconds, 7.38 MB/s.

Now update the files

Only 5 files should now be uploaded. Timestamps for all 10 files should be changed.

$ s3cmd sync --delete-removed multi-changed/  s3://l3testing/multi/
upload: 'multi-changed/part-10.out' -> 's3://l3testing/multi/part-10.out'  [1 of 5]
 10000000 of 10000000   100% in    1s     5.97 MB/s  done
upload: 'multi-changed/part-6.out' -> 's3://l3testing/multi/part-6.out'  [2 of 5]
 10000000 of 10000000   100% in    1s     9.45 MB/s  done
upload: 'multi-changed/part-7.out' -> 's3://l3testing/multi/part-7.out'  [3 of 5]
 10000000 of 10000000   100% in    1s     9.18 MB/s  done
upload: 'multi-changed/part-8.out' -> 's3://l3testing/multi/part-8.out'  [4 of 5]
 10000000 of 10000000   100% in    1s     8.81 MB/s  done
upload: 'multi-changed/part-9.out' -> 's3://l3testing/multi/part-9.out'  [5 of 5]
 10000000 of 10000000   100% in    1s     8.79 MB/s  done
Done. Uploaded 50000000 bytes in 5.8 seconds, 8.17 MB/s.

Note: s3cmd also supports --dry-run.

SUCCESS: File content got updated…
WARNING: …but the timestamps did not.

Analysis

Summary

Using --debug and aws s3api list-objects --bucket l3testing reveals that objects are stored as storage-class=STANDARD and do have their hashes.
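
To see what S3 actually stores per object, the listing can be narrowed to the fields that matter for a sync decision. A sketch, assuming bash and the l3testing bucket and multi/ prefix from the test setup. list_sync_fields is a hypothetical helper name; --prefix, --query and --output are standard AWS CLI options:

```shell
# list_sync_fields BUCKET PREFIX
# Prints one line per object with key, size, last-modified time,
# ETag (the stored hash) and storage class.
list_sync_fields() {
  local bucket=$1 prefix=$2
  aws s3api list-objects \
    --bucket "$bucket" \
    --prefix "$prefix" \
    --query 'Contents[].[Key, Size, LastModified, ETag, StorageClass]' \
    --output text
}

# Usage: list_sync_fields l3testing multi/
```

The ETag column is the hash referred to above.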

Using the AWS CLI options --exact-timestamps and --delete, as well as the payload_signing_enabled setting, changed nothing.

Looking at the sync strategies (search for syncstrategy) within the AWS CLI sources reveals that they are really shitty and, as GitHub issues show, still do a lot of unnecessary things. Stack Overflow and GitHub show that there are several open issues, including when syncing files over 5 GB.
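
For context, the aws s3 sync documentation describes the default local-to-S3 rule roughly as: upload when the object is missing, when the sizes differ, or when the local file is newer than the S3 object. The sketch below mimics that rule with plain numbers to show why the test above would upload nothing: the changed files have the same size (10 MB), and they were written before the initial sync, so their local modification time is older than the object's upload time. The same documentation says --exact-timestamps only applies when syncing from S3 to local. The function would_upload and its arguments are hypothetical; this illustrates the documented rule and is not the CLI's actual code.

```shell
# would_upload LOCAL_SIZE LOCAL_MTIME REMOTE_SIZE REMOTE_MTIME
# Sizes in bytes, times as Unix epoch seconds. Pass an empty REMOTE_SIZE
# for an object that does not exist yet. Content is never compared.
would_upload() {
  local local_size=$1 local_mtime=$2 remote_size=$3 remote_mtime=$4
  [ -z "$remote_size" ] && return 0                   # missing remotely
  [ "$local_size" -ne "$remote_size" ] && return 0    # size differs
  [ "$local_mtime" -gt "$remote_mtime" ] && return 0  # local file is newer
  return 1                                            # treated as unchanged
}

# Same size, local file written before the first upload happened:
if would_upload 10000000 1000 10000000 2000; then
  echo "upload"
else
  echo "skip"
fi
# prints "skip"
```

By that rule, files whose modification time is newer than the upload (for example after a touch) should be picked up. That is an inference from the documented rule, not something tested here.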

AWS Default sync fails MD5 #facepalm

We also get this when checking with s3cmd after an initial AWS CLI sync:

$ s3cmd sync -v --dry-run  multi-changed/  s3://l3testing/multi/
INFO: No cache file found, creating it.
INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 10 files, this may take some time...
INFO: Retrieving list of remote files for s3://l3testing/multi/ ...
INFO: Found 10 local files, 10 remote files
INFO: Verifying attributes...
INFO: disabled md5 check for part-1.out
INFO: disabled md5 check for part-10.out
INFO: disabled md5 check for part-2.out
INFO: disabled md5 check for part-3.out
INFO: disabled md5 check for part-4.out
INFO: disabled md5 check for part-5.out
INFO: disabled md5 check for part-6.out
INFO: disabled md5 check for part-7.out
INFO: disabled md5 check for part-8.out
INFO: disabled md5 check for part-9.out
INFO: Summary: 0 local files to upload, 0 files to remote copy, 0 remote files to delete
INFO: Done. Uploaded 0 bytes in 1.0 seconds, 0.00 B/s.

Also, when we use s3cmd for the initial sync, the AWS CLI won't be able to sync either.

The AWS CLI internally uses botocore and s3transfer (the libraries boto3 is also built on), including s3transfer's CreateMultipartUploadTask, for multipart uploads. Inspecting the multipart uploads: MD5 checksums for the consolidated uploaded parts are correctly transferred but somehow not stored.
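
One likely explanation for the "disabled md5 check" lines: for multipart uploads, the ETag that S3 reports is generally not the MD5 of the whole file. It is commonly observed to be the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. The sketch below computes that value locally. It assumes bash, GNU md5sum and dd, and the AWS CLI's default part size of 8 MB (8388608 bytes). multipart_etag is a hypothetical helper, and the scheme is observed behavior rather than a documented guarantee (it does not hold for SSE-KMS encrypted objects, for example).

```shell
# multipart_etag FILE [PART_SIZE_BYTES]
# Prints the ETag S3 would typically report for FILE when it is uploaded
# in parts of PART_SIZE_BYTES (default: 8388608).
multipart_etag() {
  local file=$1 part_size=${2:-8388608}
  local size parts i hex digest
  size=$(wc -c < "$file")
  parts=$(( (size + part_size - 1) / part_size ))

  # Assumption: a file that fits into one part is sent as a plain
  # PutObject, so its ETag is simply the MD5 of the content.
  if [ "$parts" -le 1 ]; then
    md5sum < "$file" | cut -d' ' -f1
    return
  fi

  # Otherwise: MD5 over the concatenated *binary* part digests, plus "-N".
  digest=$(
    for ((i = 0; i < parts; i++)); do
      hex=$(dd if="$file" bs="$part_size" skip="$i" count=1 2>/dev/null \
              | md5sum | cut -d' ' -f1)
      # Turn "d41d..." into "\xd4\x1d..." so printf emits the raw bytes.
      printf "$(sed 's/../\\x&/g' <<< "$hex")"
    done | md5sum | cut -d' ' -f1
  )
  echo "$digest-$parts"
}

# Usage: multipart_etag multi/part-1.out
```

For the 10,000,000-byte files in this test that means two parts (8388608 + 1611392 bytes) and an ETag ending in -2, which a plain MD5 comparison can never match.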

Better solutions?

Tooling

Sure! My choice would be s4cmd, which does the sync perfectly and is currently as fast as node-s3-cli. The AWS CLI is currently just as fast but, well, has a faulty sync. node-s3-cli is based on Node.js, and it's said to still have some issues.

Performance

Activating the fast bucket option in the AWS console just gives you more reliable connections (less latency). The speed change can range over about [-7%, -1%, 1%, 1%, 2%, 3%, 7%] for some locations. I can sometimes observe that it hangs a bit when too many connections are used. Still, I do not recommend paying for that micro-option, since multipart uploads with files consolidated on the client side should be standard for the HTTPS S3 API.

Further notes

AWS only uses MD5, which should be sufficient for most files (yet I have had MD5 collisions in my life as a developer!).

From their documentation:

payload_signing_enabled: Refers to whether or not to SHA256 sign sigv4 payloads. By default, this is disabled for streaming uploads (UploadPart and PutObject) when using https.
