> Then everyone started copying them without knowing why
People tend to have a very bad sense of what constitutes large scale. It usually maps to "larger than the largest thing I've personally seen". So they hear "Use X instead of Y when operating at scale", and all of a sudden we have people implementing a distributed datastore for a few MB of data.
Having gone downward in scale over the last few years of my career it has been eye opening how many people tell me X won't work due to "our scale", and I point out I have already used X in prior jobs for scale that's much larger than what we have.
100% agree. I've also run across many cases where no one bothered to even attempt any benchmarks or load tests on anything (either old or new solutions), compare latency, optimize anything, etc.
Sometimes making 10+ million dollar decisions off that gut feel with literally zero data on what is actually going on.
It rarely works out well, but hey, have to leave that opening for competition somehow I guess?
And I'm not talking about 'why didn't they spend 6 months optimizing that one call which would save them $50 type stuff'. I mean literally zero idea what is going on, what actual performance issues are, etc.
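To be concrete about how little it takes to get "some data" instead of gut feel, here is a minimal latency-measurement sketch in Python. The `measure_latency` helper and the `candidate` workload are hypothetical stand-ins, not anything from a real project; you would swap `candidate` for the call you actually care about.

```python
import statistics
import time

def measure_latency(fn, runs=200):
    """Call fn() `runs` times and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style p99 on the sorted samples (clamped to the last index).
    p99_index = min(len(samples) - 1, int(len(samples) * 0.99))
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }

def candidate():
    # Hypothetical workload; replace with the query or request under test.
    sum(i * i for i in range(10_000))

stats = measure_latency(candidate)
print(stats)
```

Even something this crude, run against the old and the new solution, beats having literally zero numbers.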
Yep. I've personally been in the situation where I had to show someone that I could do their analysis in a few seconds using the proverbial awk-on-a-laptop when they were planning on building a hadoop cluster in the cloud because "BIG DATA". (Their Big Data was 50 gigabytes.)
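For anyone who hasn't seen the awk-on-a-laptop style, here is a rough Python equivalent: one pass over the file, keeping only a running total per key in memory. The tab-separated layout, the column positions, and the `sum_by_key` name are my assumptions for illustration, not the actual analysis from that job.

```python
import csv
import io
from collections import defaultdict

def sum_by_key(lines, key_col=0, value_col=1, delimiter="\t"):
    """Stream rows once and accumulate a per-key sum.

    Memory use grows with the number of distinct keys, not with file size,
    which is why this pattern handles files far larger than RAM.
    """
    totals = defaultdict(float)
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) <= max(key_col, value_col):
            continue  # skip blank or short rows
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)

# Tiny in-memory stand-in for a large file on disk.
sample = io.StringIO("a\t1.5\nb\t2\na\t3\n")
result = sum_by_key(sample)
print(result)
```

For a real file you would pass an open file handle instead of the `StringIO`; the loop itself does not change.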
I remember going to a PyData conference in... 2011 (maybe off by a year or two)... and one of the presenters making the point that if your data was less than about 10-100TB range, you were almost certainly better off running your code in a tight loop on one beefy server than trying to use Hadoop or a similar MapReduce cluster approach. He said that when he got a job, he'd often start by writing up the generic MapReduce code (one of the advantages is that it tends to be very simple to implement), starting the job running, and then writing a dedicated tight loop version while it ran. He almost always finished implementing the optimized version, got it loaded onto a server, and completed the analysis long before the MapReduce job had finished. The MapReduce implementation was just there as "insurance" if, eg, he hit 5pm on Friday without his optimized version quite done, he could go home and the MR job might just finish over the weekend.
It's self reinforcing too. All the "system design" questions I've seen have started from the perspective of "we're going to run this at scale". Really? You're going to build for 50 million users -from the beginning-? Without first learning some lessons from -actual- use? That...seems non-ideal.
The place I recently left had a 10M-record MongoDB collection without indexes, which would take tens of seconds to query. Celery was running in cron mode every 2 seconds or so, meaning jobs would just pile up and Redis eventually ran out of memory. No one understood why this was happening, so they'd just restart everything after the PagerDuty alert…
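The pile-up is just arithmetic: if a job is scheduled every 2 seconds but takes tens of seconds to finish, the queue can only grow. A back-of-the-envelope sketch, assuming a single worker and a fixed 20-second job (both numbers are my assumptions; the only figures given are "every 2 seconds" and "tens of seconds"). This models the queue depth only and does not use any Celery or Redis API.

```python
def backlog_after(seconds, enqueue_interval_s, job_duration_s, workers=1):
    """Approximate queue depth after `seconds`.

    Simplifications: jobs arrive on a perfectly regular schedule, every job
    takes exactly `job_duration_s`, and workers never sit idle.
    """
    enqueued = seconds // enqueue_interval_s
    completed = min(enqueued, workers * (seconds // job_duration_s))
    return enqueued - completed

# One job every 2 s, each taking an assumed 20 s, one worker, for one hour.
one_hour = backlog_after(3600, enqueue_interval_s=2, job_duration_s=20)
print(one_hour)  # 1620 jobs still waiting

# Same schedule if the query were fast (assumed 1 s per job): no backlog.
print(backlog_after(3600, enqueue_interval_s=2, job_duration_s=1))  # 0
```

Under those assumptions the backlog grows by roughly 1,600 jobs an hour, so running out of memory is a matter of when, not if.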
Yikes. Don’t get me wrong, it’s always been this way to some extent - not enough people who can look into a problem and understand what is happening to make many things actually work correctly, so iterate with new shiny thing.
It seems like the last 4-5 years though have really made it super common again. Bubble maybe?
Huge horde of newbs?
Maybe I’m getting crustier.
I remember it was SUPER bad before the dot-com crash, all the fake it ‘til you make it too. I even had someone claim 10 years of Java experience who couldn’t write out a basic class on a whiteboard at all, and tons of folks starting that literally couldn’t write a hello world in the language they claimed experience in, and this was before decent GUI IDEs.
> It seems like the last 4-5 years though have really made it super common again. Bubble maybe?
Cloud providers have successfully redefined the baseline performance of a server in the minds of a lot of developers. Many people don't understand just how powerful (and at the same time cheap) a single physical machine can be when all they've used is shitty overpriced AWS instances, so no wonder they have no confidence in putting a standard RDBMS in there when anything above 4GB of RAM will cost you an arm and a leg, therefore they're looking for "magic" workarounds, which the business often accepts - it's easier to get them to pay lots of $$$$ for running a "web-scale" DB than paying the same amount for a Postgres instance, or God forbid, actually opting for a bare-metal server outside of the cloud.
In my career I've seen a significant amount of time & effort being wasted on workarounds such as deferring very trivial tasks onto queues or building an insanely-distributed system where the proper solution would've been to throw more hardware at it (even expensive AWS instances would've been cost-effective if you count the amount of developer time spent working around the problem).
Just to give a reference for those that don't know, I rent a dedicated server that has 128 GB of RAM, a 16-core processor (32 threads), 2 TB of local SSD storage, and virtually unlimited traffic for $265 USD a month. A comparable VM on AWS would be around $750 a month (if you reserve it long term) and then of course you will pay out the nose for traffic.
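Spelling out the arithmetic on those two monthly prices (these are only the figures quoted above; traffic charges are left out because no number was given for them):

```python
DEDICATED_USD_PER_MONTH = 265      # dedicated server price quoted above
AWS_RESERVED_USD_PER_MONTH = 750   # approximate reserved AWS VM price quoted above

def yearly_difference(cheaper, pricier, months=12):
    """Difference in spend over `months`, ignoring traffic and other extras."""
    return (pricier - cheaper) * months

savings = yearly_difference(DEDICATED_USD_PER_MONTH, AWS_RESERVED_USD_PER_MONTH)
ratio = AWS_RESERVED_USD_PER_MONTH / DEDICATED_USD_PER_MONTH

print(savings)          # 5820 USD per year
print(round(ratio, 2))  # 2.83x
```

So before traffic is even counted, the gap is about $5,820 a year per box, or roughly 2.8x.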
the one of those most likely to be humming along fine is redis in my experience. once ssh'd to the redis box (ec2), which was hugely critical to business: 1 core, instance had been up for 853 days, just chilling and handling things like a boss.
This is funny, because I suffer from the opposite issue... every time I try to bring up scaling issues on forums like HN, everyone says I don't actually need to worry because it can scale up to size X... but my current work is with systems at 100X size.
I feel like sometimes the pendulum has swung too far the other way, where people deny that there ARE people dealing with actual scale problems.
In this case it might be helpful to mention the solutions you’ve already tried/evaluated and the reasons why they’re not suitable. Without those details you’re no different from the dreamers who think their 10GB database is FAANG-scale so it’s normal that you get the usual responses.
I mean what percentage of companies are web scale or at your scale? I would guess around 1% being really generous. So it makes sense that the starting advice would be to not worry about scaling.
I get it, and I can't even say I blame the people for responding like that.
I think it is the same frustration I get when I call my ISP for tech support and they tell me to reboot my computer. I realize that they are giving advice for the average person, but it sucks having to sit through it.
Nothing quite as anger inducing as knowing WHY it is that way, but also knowing you are stuck, it makes no sense for you, and it sucks ass.
My new fav rant is the voice phone systems for Kaiser, which makes me say 'Yes or No' constantly - but literally can only hear me somehow if I'm yelling. And they don't tell you to press a number to say yes or no until after you've failed several times with the voice system.
All human convos have zero issues, not even a little faint.
Probably true - hopefully you can prefix your question with 'Yes, this is 10 Exabytes - no, I'm not typo'ng it' to save some of us from foot-in-mouth syndrome?
That is probably a good idea, get that out of the way up front.
I feel similar frustrations with commenters saying I am doing it wrong by not moving everything to the cloud… I work for a CDN, we would go out of business pretty quickly if we moved everything to the cloud. Oh well.
Yes, exactly. When people cite scaling concerns and/or big data, I start by asking them what they mean by scale and/or big. It's a great way to get down to brass tacks quickly.
Now when dealing with someone convinced that their single TB of data is google scale, the harder issue is changing that belief. But at least you know where they stand.
That sounds like you're not giving enough detail. If you don't mention the approximate scale that you have right now, you can't expect people to glark it from context.
Same. I think there's this idea that 5 companies have more than 1PB of data and everyone else is just faking it. My field operates on many petabytes of data per customer.
Yes, the set of people truly operating "at scale" is more than FAANG and far, far less than the set of people believing they operate "at scale". This means there are still people in that middle ground.
One gotcha here is not all PBs are equal. My field also is a case where multi-PB datastores are common. However for the most part those data sit at rest in S3 or similar. They'll occasionally be pulled in chunks of a couple TB at most. But when you talk to people they'll flash their "do you know how big our storage budget is?" badge at the drop of a hat. It gets used to explain all sorts of compute patterns. Meanwhile, all they need is a large S3 footprint and a machine with a reasonable amount of RAM.