Before golang was a thing, there were highly scalable systems that handled way more traffic than anything written in golang today. Those systems were (and are) written in languages like C++ and Java and C#.
You're just seeing golang in articles because of hype.
Java and C# lag behind Golang on most performance metrics. Combine that with the awesome deployment story (a single binary) and you'd be hard-pressed to choose the former.
I'd like to see those performance metrics. Beyond that, this is not true in the slightest, not just from what I've observed, but also according to established performance people like Martin Thompson[1]. If you watch that talk, he mentions towards the end that they ported Aeron (originally Java) to C#, golang, and C++. The Java version was the fastest out of the box, but with some work they were able to get the C# version to be faster. I suspect this mainly has to do with value types, which are being developed for the JVM as well.
What you're probably referring to is GC pauses. The golang GC is tuned for latency, at the expense of throughput. The JVM has several GCs, and is gaining several more like Shenandoah and ZGC, which allows you to select the GC that best fits your use case. You can tune for latency or throughput.
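To make the "select the GC that fits your use case" point concrete, here's a minimal sketch: on the JVM the collector is picked with a startup flag (e.g. -XX:+UseParallelGC for throughput, or -XX:+UseZGC / -XX:+UseShenandoahGC for low pause times in recent JDKs), and the running JVM will report which collectors are active. The class name is made up for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: prints which collectors this JVM was started with.
// Launch with e.g. `java -XX:+UseParallelGC GcInfo` or `java -XX:+UseZGC GcInfo`
// and the reported names change accordingly.
public class GcInfo {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections so far");
        }
    }
}
```

Nothing in the application code changes between those runs; the tuning choice lives entirely in the launch flags.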
A lot of Java deployments these days are in the form of uber/shaded jars, which is basically one jar file that contains the entire app, and run with a single command, not much different than running a binary.
But you must understand that your Java app won't have a performance advantage from faster allocation speed in real life: the GC will take a lot of CPU, because Go's collector is much better at releasing memory. You can only see allocation advantage in micro benchmarks where the app stops before the GC will start.
> You can only see allocation advantage in micro benchmarks where the app stops before the GC will start.
The golang GC is tuned for latency at the expense of throughput, meaning that if you look at the total time spent in GC over the course of execution, it is actually longer than with a GC tuned for throughput.
If you have a use case that requires high throughput, you cannot change that behavior, unlike on the JVM, where you have several GCs to choose from. The JVM is also getting two new low-latency GCs for use cases that require low latency.
And it's not just microbenchmarks where Java does better than golang, it's especially longer running processes where the JVM's runtime optimizations really kick in. Not to mention that the JVM is getting value types as well to ease the load on the GC when required (it does an excellent job as it is even without value types).
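As a rough sketch of what "runtime optimizations kick in over longer runs" means: HotSpot starts out interpreting a method, and only JIT-compiles it once it has been invoked enough to be considered hot, so on a typical JVM the later rounds below usually run faster than the first. The class and method names are made up, and the timings are illustrative, not a benchmark.

```java
public class WarmupSketch {
    // A simple loop HotSpot can JIT-compile and optimize once it's "hot".
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    public static void main(String[] args) {
        // Early rounds typically run interpreted; once the method has been
        // invoked enough, the JIT compiles it and later rounds speed up.
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long result = sum(50_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

This is also why serious JVM benchmarks use a warmup phase (e.g. via JMH) rather than timing a single cold run.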
I did a dummy port of the C# version of the Havlak code here[1] to Java, preserving the same behavior and not making any data structure changes. On the machine I tested on, the C# version took over 70 seconds to run, while the Java version took ~17 seconds. In comparison, the C++ version took ~24 seconds, and the golang version took ~30 seconds.
Yes, you could most likely spend much more time tuning the C++ version, avoiding allocations, and so on, but at the expense of readability. This is what the JVM gives you: you write straightforward, readable code, and it does a lot of the optimization for you.
The brainfuck2 benchmark is especially interesting. Kotlin tops the list, but I was able to get Java to the same performance as Kotlin by writing it in a similar manner to the Kotlin code. Again, Java/Kotlin beat out even C++ when I tested them, and by quite a margin.
How much CPU GC takes for any given GC implementation is largely down to the design of the application, its data structures and allocation graph.
Request/response servers which keep caches and other allocations prone to middle-aged death out of the GC heap are consistent with the generational hypothesis, and ought to spend no more than a few (low single-digit) percent of CPU time in GC with a generational collector.
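One common way to keep a big, long-lived cache "out of the GC heap", sketched here with made-up names and sizes: store the data in a direct ByteBuffer, which is allocated outside the Java heap, so the collector never has to trace or copy its contents.

```java
import java.nio.ByteBuffer;

public class OffHeapCacheSketch {
    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap; the GC only
        // sees the tiny ByteBuffer wrapper object, not the 16 MiB of data.
        ByteBuffer cache = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        // Store and read back a value at a fixed offset, as a trivial
        // stand-in for a real serialized cache entry.
        cache.putLong(0, 42L);
        System.out.println(cache.getLong(0)); // prints 42
    }
}
```

Real off-heap caches (and libraries like Chronicle Map or Ehcache's off-heap tier) layer serialization and indexing on top of the same idea: the bulk of the data never becomes GC work.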
This benchmark isn't really useful as you pointed out. Microbenchmarks are always tricky, but check out the other two posts I just wrote here (about Martin Thompson and the benchmarks on GitHub) for hopefully more realistic benchmarks.
Because of the use case. Go wins if all you need is the easiest way to write services with high concurrency requirements. I expect this is true for Twitch's systems.
Crystal is still immature, Rust is more suited to use cases where you want to avoid garbage collection.
Hype is not the only factor but it makes hiring easier. And anything Google puts its weight behind will get hyped. More often than not it's better to choose a technology which suits your organisational (read: hiring) needs.
I'd argue that Java or C# would have worked out just fine for Twitch. There was a recent post on Twitch's early architecture, and it seems they started out with Ruby. Unsurprisingly, they had to switch from it once they needed performance (a similar story played out at Twitter).