Here's my $700 surprise bill story. There are many like it, but this one is mine.
* An example CloudFormation from a Re:Invent (AWS conference) session silently failed to tear down some resources.
* Not trusting CloudFormation, I looked through each (known service, region) manually to make sure resources had been torn down. This failed to identify the running resources because a tutorial div opened in regions with no running resources and remained open if you switched to a region with running resources, hiding them.
* Not trusting my manual service tour, I kept a close eye on my daily costs until I saw several days pass with $0 spend. This failed because free tier credits were hiding substantial service usage.
* Not trusting any of the above, I had billing alerts set as a catch-all. They correctly triggered on an unrelated usage surge, but with such high latency that I incorrectly attributed their failure to reset to high latency rather than to a genuine underlying charge.
Bam, $700 charge next month. Amazon was quick to refund half of it. I was eventually able to get them to refund the other half by making waves in the support system of a high-spend business account.
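For context, the "billing alerts" in the list above are typically CloudWatch alarms on the AWS/Billing EstimatedCharges metric, which only exists in us-east-1, requires "Receive Billing Alerts" to be enabled in the billing preferences, and is only updated a few times a day, which is where the latency comes from. A minimal sketch (the threshold and SNS topic ARN below are made up):

```python
import boto3

# Hedged sketch of a "billing alert": a CloudWatch alarm on EstimatedCharges.
# Billing metrics live only in us-east-1 and update a few times a day.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-over-50-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=6 * 60 * 60,          # 6 hours; finer periods buy nothing given the metric's lag
    EvaluationPeriods=1,
    Threshold=50.0,              # made-up threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical SNS topic
)
```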
At the last re:invent session I went to, I surveyed a table of 6 people. After sharing my $700 figure, 3 of the 6 came forward with even bigger numbers, 1 of the 6 with a smaller number, and the remaining person was a newbie.
This sort of story probably fully explains the origins of the OP's lament (that AWS billing is so opaque that even internal Amazon teams can't figure it out). Some AWS middle manager knows that if he orders better billing tools to be implemented, AWS income will drop; not significantly, but ~$700 multiplied by the number of AWS users might make enough of a dent to be noticed in his performance report. To keep this omission plausible, even internal investigation tools (which would encourage infrastructural improvements that make user-facing tools easier to build) are neglected.
But they didn't actually get the $700; they refunded it, and spent who knows how many support hours along the way.
Maybe there are enough people out there who don't notice a $700 overage to offset that, but it seems like there would at least be a visible decrease in support costs if they had a better system.
I suspect that full refunds for user mistakes aren't entirely typical.
Not everyone has a high spend account and a rep who brags about their policy of refunding mistakes frequently enough to snipe them with "If you have a policy of refunding mistakes, then why didn't you?" in front of customers.
I would like to emphasize that I've chosen my words carefully: I don't know for certain that the rep intervened, but I made trouble, emailed him the information necessary to resolve my issue, and then my half refund became a full refund. Any connection between these dots is pure speculation on my part.
Part of retail sales is people not caring enough to spend the energy to return things / get a refund. Even if this story resulted in a refund, there are probably 100 others who didn't bother because it wasn't worth their time.
This is why it's crucial to review employees against each of the _project_ goals that they aimed to deliver. You cannot judge all work equally; sometimes the appropriate outcome is not just orthogonal to a performance metric "A" (zero benefit with respect to it) but actively detrimental (negative with respect to it). Failing to account for this is what creates these kinds of problems.
Not from re:Invent, but a similar bill-latency problem ...
Many government agencies have to go through 3rd parties ("lowest qualified bidder", which AWS often doesn't bid on) to pay our bill... And the contractors themselves use 3rd parties to figure out how much to bill us... so things like AWS bill alerts are not possible (imagine the whole billing section being permission denied while using the root account user). In addition, 3rd parties do not always provide great tools to set up bill alerts.
We have 40+ AWS accounts and can't track our spend without 10 button clicks per account, which can't be done in parallel because the app's browser caching locks us into working with one account at a time or it corrupts the cache. The process takes a minute per account, if the website doesn't crash trying to process our records (some accounts have more than 3 million line-item records).
All that said: we basically were screwed at monitoring our bill.
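For reference, this kind of per-account tour can in principle be scripted against the Cost Explorer API instead of the console; a minimal sketch, assuming ce:GetCostAndUsage isn't also denied under the reseller arrangement, and with placeholder dates:

```python
import boto3

# Sketch: pull last month's spend per linked account in one API call,
# instead of ten console clicks per account.
ce = boto3.client("ce", region_name="us-east-1")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2019-06-01", "End": "2019-07-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
)
for group in resp["ResultsByTime"][0]["Groups"]:
    account_id = group["Keys"][0]
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(account_id, amount)
```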
In the fun that is semi-serverless, we had a container running in ECS that logged directly to CloudWatch. It went into an infinite loop late one month. That didn't really take our bill out of the expected ballpark when it was processed by the third party and delivered to us 20 days after the billing cycle closed.
Next month, however, by the time we got the bill, it had been running in an infinite loop for the ENTIRE month, plus 75% of the month after that. It pushed 50,000 GB of infinite-loop log data at 50 cents per GB ingested in the first month (plus storage costs). (That's about $30,000 for month 1.)
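As a back-of-envelope check on that figure (the $0.50/GB ingestion rate is from the comment above; the storage rate is an assumption of roughly $0.03/GB-month for retained log data):

```python
# Rough check on the month-1 CloudWatch Logs charge described above.
ingested_gb = 50_000
ingest_cost = ingested_gb * 0.50   # $25,000 just to ingest (rate stated above)
storage_cost = ingested_gb * 0.03  # ~$1,500/month if all of it is retained (assumed rate)
print(ingest_cost + storage_cost)  # ~$26,500, in the "about $30,000" ballpark
```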
AWS did not provide assistance because it was paid for with credits, but it basically ate all of our remaining credits after the second month's bill came in.
After that, we got the contractor to simply give us a copy of the "detailed billing reports" daily and built a process to make our own bill monitors (the reports at that stage included markups from the contractor). We eventually got somewhat of a better monitoring system through the third-party app as well, but we were not aware that was possible because it was not accessible without the contractor setting it up for us (hidden menu options to combine accounts).
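A minimal sketch of that kind of home-grown bill monitor, assuming the legacy detailed-billing-report CSV layout (column names like LinkedAccountId, ProductName, and UnBlendedCost; adjust to whatever the contractor actually delivers):

```python
import csv
from collections import defaultdict

# Sum spend per (account, service) from a detailed billing report CSV.
# Streams the file row by row, so multi-million-line reports are fine.
def spend_by_account_and_service(path):
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                cost = float(row.get("UnBlendedCost") or 0)
            except ValueError:
                continue  # skip statement/summary rows without a numeric cost
            totals[(row.get("LinkedAccountId"), row.get("ProductName"))] += cost
    return totals

for (account, service), total in sorted(spend_by_account_and_service("billing.csv").items()):
    print(f"{account}\t{service}\t{total:,.2f}")
```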
Better, there is a plug-in that manages your account switching for AWS so you can stay in one container. I'm on mobile so I don't have the name right now.
This all being said, I'm a generally happy AWS admin... What happened really can't be blamed squarely on anyone... AWS can't show us a bill since they don't bill us... the 3rd party probably could have helped with the bill alerts before the incident, but they did help after... We also couldn't really expect AWS to provide more credits to cover lost credits they had already provided to help us get off the ground. Overall we really wouldn't be able to do what we do with any on-prem solution.
> What happened really can't be blamed squarely on anyone
Call this a "distributed denial of responsibility attack". It's very convenient that there's no one point of easy blame, that means that nobody has to change.
When I shut down my startup, I had to remove the AWS resources that were producing a ~$1000/mo bill. I removed those and followed the same practices as the parent did, along with regular Any.do alerts, to ensure I didn't miss anything. In subsequent months I found that something or other kept coming up in the charges, like a CloudWatch alarm or a log group somewhere. It took a while to remove all of them and bring the bill to zero.
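For anyone in the same spot, a read-only sweep along these lines can at least surface stragglers like alarms and log groups hiding in regions you forgot you ever touched (a sketch, not a full audit; pagination is omitted and deletion is left manual):

```python
import boto3

# List CloudWatch alarms and log groups in every region to spot leftovers.
session = boto3.Session()
for region in session.get_available_regions("cloudwatch"):
    cw = session.client("cloudwatch", region_name=region)
    logs = session.client("logs", region_name=region)
    try:
        for alarm in cw.describe_alarms()["MetricAlarms"]:
            print(region, "alarm:", alarm["AlarmName"])
        for group in logs.describe_log_groups()["logGroups"]:
            print(region, "log group:", group["logGroupName"])
    except Exception:
        continue  # region not enabled or not reachable for this account
```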
I'm not claiming it's something nefarious on AWS's part; I'm saying it's designed in a way such that someone using its various services can easily lose track of billing.
Yes, I could have closed the account altogether, but I didn't want to. Now I wonder: if AWS started charging for billing alerts themselves, would I catch it before I actually received a billing alert for the billing alerts?
Among the cloud providers, this seems to be unique to Amazon. There's a lot of malfeasance that's unique to Amazon, like HQ2 and the counterfeit and offensive products on Amazon.com. I don't feel like giving any part of their company money. For Twitch I donate on Stream Labs. It sucks not being counted as a subscriber or being able to use the emotes, but I prefer that over Amazon taking a cut.
Google's Firebase has had some absolute horror stories, especially when Firebase found out they were under-billing some people and then "adjusted" their bill to be more accurate, causing massive charges.
I recently had a surprise GCP bill from a personal kubernetes project I did. It wasn't anything crazy, but it made me realize that I'm always only one click away from seriously hitting my credit card.
What was the HQ2 malfeasance? I.e., if they committed unlawful acts, why not pursue that in court?
I know they selected a location in New York that they shouldn't have / that the community rejected, and after a big outcry they ended up going along with community pressure rather than against it.
I think the one in Virginia is going ahead; they just got permits there for a metro station. Are there protests in Virginia they are not listening to?
Actually the scale is larger than that. Amazon got over 100 cities to bid on the project. Entered into deeper negotiations with at least 20 of those. Got many cities to spend hundreds (or thousands) of hours putting proposals together, pitching teams, crunching numbers, offering deeper tax incentives than anyone else can get. In the end, they chose the location they were going to choose anyways.
So it was a colossal waste of time for 5,000+ people across North America. Burnt a lot of goodwill. Many cities learned a valuable lesson on dealing with these big multi-nationals as a result of it. So it won't likely happen like this again.
Genuinely curious - I thought it was strange when it first started - but surely every company must do this? Wouldn't making the investment of a large presence likely involve seeking concessions from each location as part of the scouting process?
What did Amazon do different here? Instead of in back rooms, it was all out in the open.
The posters here are misusing malfeasance terribly.
Other examples are federal grant programs: agencies are required to publicize them, and lots of folks / nonprofits spend lots of time applying, but the reality is that most awards just renew existing partners, etc.
I believe (from having read a fair amount of the process hullabaloo) that they were taking advantage of tax incentives available to all employers choosing to locate in that area. To the extent that’s the case, I see absolutely no malfeasance.
There were additional, negotiated real estate related credits that are also likely to be similarly negotiated by any other developer.
They were negotiating with the government - who writes the laws.
I was curious about the malfeasance - but if this is the claim of criminality - uh... not a good look for the folks yelling at Amazon.
I think MUCH stronger claims may be possible around just their fake product volume and consumer harm there, but good luck with these claims - are they being litigated anywhere?
What is the cost reporting situation on Azure and GCP?
We've gone with AWS because they're supposed to have good customer support, but opaque cost reporting and the inability to impose spending limits is a concern.
Does anyone know how azure/gcp:
1. Handle cost reporting
2. Handle spending limits (e.g. can I impose a hard spending limit per service/per user/globally?)
Used all 3 for complicated things, preferred AWS for better overall security/compliance support and more features. Azure is largely comparable to AWS for most shops and some services can be slightly cheaper. GCP seems to be way behind both but I'm sure lots of places could get away with using them.
1. Azure has basically comparable cost reporting to AWS, though it has a cost aggregator if you want to use both Azure and AWS. I personally thought it didn't bring all the nice features of AWS billing into Azure very well, so I'd not recommend it if your AWS usage is large and varied. I found GCP to have fewer billing features than either Azure or AWS.
2. None of the 3 providers has a hard spending limit feature, though the Google App Engine service (not GCP as a whole) lets you shut it down. Other than that, permission roles are generally the same; AWS wins slightly on features again, but Azure has a slightly nicer UI.
Anyways, you should do your own research on what cloud seems sane to you, and not let randos on HN make your business decisions.
One thing that bit me was that GCP spending limits could only be updated on a 24-hour timer (so it took one full day before a change kicked in).
This is absolutely terrible if you have a spike in legitimate traffic, try to increase the limit, and lose all that "front page of HN" traffic forever because your supposedly scalable system didn't scale.
I have no idea if they have fixed that issue yet but I doubt it.
2) No cloud has proper spending limits (aside from barebones compute like Linode). The model works on your bill continuing to get pricier even as their overall multi-tenant costs come down.
> An example CloudFormation from a Re:Invent (AWS conference) session silently failed to tear down some resources.
Something similar happened to me at a Kubernetes/GCP tutorial. We started with a $100 credit. The tutorial included setting up massive (for tutorial purposes) instances. Because I had played with my account and created an instance before, I hit account limits and my tutorial code failed to work. One frustrating experience richer, I was very busy at work when I returned from the conference.
When I got back to my tutorial code 3 weeks later, I noticed that less than $1 of my $100 was left. The n - 1 instances I had created during the tutorial had been running for 3 weeks. User error of course, but at which software job do no errors happen?
The only positive about GCP I remember is that they promised not to overflow from the credit into real money on my credit card. At least that's how I remember it. I did not (need to) test it, because I noticed the issue with $1 to spare. With AWS there is no such promise, as we know.