
A few more:

* Multipart uploads cannot be performed from multiple machines using instance credentials (each machine's principal is different, and principals don't have access to each other's multipart uploads). You need an actual IAM user if you want to assemble a multipart upload from multiple machines - see the sketch below.
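
A minimal sketch of that, assuming the same IAM user's credentials are configured on every machine (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")  # same IAM user everywhere, not instance creds

    # Machine A: initiate the upload and share the UploadId with the others
    upload = s3.create_multipart_upload(Bucket="my-bucket", Key="big.bin")
    upload_id = upload["UploadId"]

    # Machine B: upload a part under the same principal, so the UploadId is visible
    part = s3.upload_part(Bucket="my-bucket", Key="big.bin", UploadId=upload_id,
                          PartNumber=1, Body=b"\0" * (5 * 1024 * 1024))

    # Any machine: assemble the parts (part numbers must be ascending)
    s3.complete_multipart_upload(
        Bucket="my-bucket", Key="big.bin", UploadId=upload_id,
        MultipartUpload={"Parts": [{"PartNumber": 1, "ETag": part["ETag"]}]},
    )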

* LIST requests are not only slow, but also very expensive if done in large numbers. There are workarounds ("bucket inventory"), but they are neither convenient nor cheap.

* Bucket creation is not read-after-write consistent, because it uses DNS under the hood. So it is possible that you can't access a bucket right after creating it, or that you can't delete a bucket you just created until the changes have propagated - see the waiter sketch below and https://github.com/julik/talks/blob/master/euruko-2019-no-su...
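
A minimal sketch of waiting out the propagation, using boto3's built-in waiter, which polls HeadBucket until the new bucket is actually reachable (bucket name and region are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.create_bucket(Bucket="my-new-bucket",
                     CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})
    # Don't touch the bucket until DNS has caught up:
    s3.get_waiter("bucket_exists").wait(Bucket="my-new-bucket")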

* You can create an object called "foo" and an object called "foo/bar". This makes the data in your bucket unportable to a filesystem structure (one of them would be a file clobbering a directory).
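
A two-line illustration, with a placeholder bucket name - S3's key namespace is flat, so both objects coexist happily:

    import boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket="my-bucket", Key="foo", Body=b"a file")
    s3.put_object(Bucket="my-bucket", Key="foo/bar", Body=b"wants foo/ to be a directory")
    # Syncing this bucket to disk must drop or rename one of the two objects.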

* S3 is case-sensitive, meaning that you can create objects which will be unportable to a case-insensitive filesystem (Rails file storage assumed a case-sensitive storage system, which made it break badly on macOS - this was fixed by always using lowercase identifiers).

* Most S3 configurations will allow GETs but not HEADs. Apparently this is their way of preventing probing for object existence; I am not sure. Either way - cache-honoring flows involving, say, a HEAD request to determine how large an object is will not work (with presigned URLs for sure!). You have to work around this by doing a GET with a Range: of "very small" (say, the first byte only) - see the sketch below.
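
A minimal sketch of that workaround, assuming the URL was presigned for a GET (the URL here is a placeholder). The total object size comes back in the Content-Range header, e.g. "bytes 0-0/1048576":

    import requests

    url = "https://my-bucket.s3.amazonaws.com/key?X-Amz-..."  # presigned GET URL
    resp = requests.get(url, headers={"Range": "bytes=0-0"})
    # 206 Partial Content on success; the signature still matches, as the
    # method is GET and Range is not part of the signed request
    total_size = int(resp.headers["Content-Range"].split("/")[1])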

* If you do a lot of operations using pre-signed URLs, it is likely you can speed up the generation of these URLs by a factor of 10x-40x, largely by reusing the derived signing key (see https://github.com/WeTransfer/wt_s3_signer, and the sketch below).
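
wt_s3_signer is a Ruby gem; here is a hypothetical Python sketch of the underlying idea. The SigV4 signing key depends only on the date, region and service - not on the object key - so it can be derived once and reused for every URL signed that day. All credentials, bucket and key names below are placeholders:

    import hashlib
    import hmac
    from datetime import datetime, timezone
    from urllib.parse import quote

    def derive_signing_key(secret_key, date, region):
        # Key derivation chain from the SigV4 spec; the expensive part,
        # and the part that can be cached across URLs
        k = hmac.new(("AWS4" + secret_key).encode(), date.encode(), hashlib.sha256).digest()
        for part in (region, "s3", "aws4_request"):
            k = hmac.new(k, part.encode(), hashlib.sha256).digest()
        return k

    def presign_get(key, bucket, region, access_key, signing_key,
                    date, amz_date, expires=3600):
        host = bucket + ".s3." + region + ".amazonaws.com"
        scope = date + "/" + region + "/s3/aws4_request"
        params = {
            "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
            "X-Amz-Credential": access_key + "/" + scope,
            "X-Amz-Date": amz_date,
            "X-Amz-Expires": str(expires),
            "X-Amz-SignedHeaders": "host",
        }
        query = "&".join(quote(k, safe="") + "=" + quote(v, safe="")
                         for k, v in sorted(params.items()))
        canonical = "\n".join([
            "GET", "/" + quote(key), query,
            "host:" + host, "", "host", "UNSIGNED-PAYLOAD",
        ])
        string_to_sign = "\n".join([
            "AWS4-HMAC-SHA256", amz_date, scope,
            hashlib.sha256(canonical.encode()).hexdigest(),
        ])
        sig = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
        return "https://" + host + "/" + quote(key) + "?" + query + "&X-Amz-Signature=" + sig

    # Derive once, then sign thousands of URLs with two plain HMACs each:
    now = datetime.now(timezone.utc)
    date, amz_date = now.strftime("%Y%m%d"), now.strftime("%Y%m%dT%H%M%SZ")
    signing_key = derive_signing_key("SECRET", date, "eu-west-1")
    urls = [presign_get("uploads/%d.bin" % i, "my-bucket", "eu-west-1",
                        "AKIA-PLACEHOLDER", signing_key, date, amz_date)
            for i in range(10000)]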

* You still pay for storage of unfinished multipart uploads. If you are not careful and, say, these uploads can be initiated by your users, you will be paying to store them. There is a lifecycle setting for deleting unfinished multipart uploads automatically after some time (see below) - do enable it if you don't want to have a bad time.
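
A minimal sketch of enabling that via a lifecycle rule (the bucket name and the 7-day window are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={"Rules": [{
            "ID": "abort-stale-multipart-uploads",
            "Status": "Enabled",
            "Filter": {},  # apply to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]},
    )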

These are just off the top of my head :-) Paradoxically, S3 was revolutionary and still is, on multiple levels, a great product. But: plenty of features, plenty of caveats.



The one that caught me a couple of weeks ago is that multipart uploads have a minimum part size of 5 MiB for every part except the last (https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts...). I built a streaming CSV post-processing pipeline in Elixir that uses Stream.transform (https://hexdocs.pm/elixir/Stream.html#transform/3) to modify and inject columns. The Elixir AWS and CSV modules handle streaming data in, but the AWS module throws an error (from S3) if you stream "out" a total of less than 5 MiB, as it uses multipart uploads - which made me sad.


The last part can be any size, so with a few tweaks to the streaming code you should be fine - see the sketch below. Ready-made AWS SDKs handle this chunking for you. Truth be told, multipart upload on GCP is even worse :/
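
A hypothetical sketch of that tweak in Python/boto3 - buffer incoming chunks until a part reaches the 5 MiB minimum, and let only the final part be smaller (bucket and key names are placeholders):

    import boto3

    MIN_PART = 5 * 1024 * 1024

    def multipart_upload_stream(chunks, bucket, key):
        s3 = boto3.client("s3")
        upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
        parts, buf, n = [], bytearray(), 0
        try:
            for chunk in chunks:
                buf.extend(chunk)
                if len(buf) >= MIN_PART:  # flush only once we legally can
                    n += 1
                    etag = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                          PartNumber=n, Body=bytes(buf))["ETag"]
                    parts.append({"PartNumber": n, "ETag": etag})
                    buf.clear()
            if buf or not parts:  # the last part may be any size
                n += 1
                etag = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                      PartNumber=n, Body=bytes(buf))["ETag"]
                parts.append({"PartNumber": n, "ETag": etag})
            s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                         MultipartUpload={"Parts": parts})
        except Exception:
            s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
            raise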



