Making Reasonable Use of Computer Resources

symmetricsaurus · on Aug 8, 2021

The author claims that making better use of the cache hierarchy and programming in a data oriented way will improve perceived performance for users. This isn't where we should start though.

Much of the time bad performance comes from excessive I/O, bad algorithms and data structures (e. g. accidentally quadratic code), or misuse of third party libraries or services. Eliminating problems like this needs to come first and only then is it reasonable to consider the cache-friendlyness of our code.

hsn915 · on Aug 8, 2021

A lot of poor performance in the wild comes from not batching operations that can be batched, thus increasing latency unnecessarily.

Utilization of the CPU's cache line is just one instance of this problem.

For example, imagine trying to parse a text file by literally reading one character at a time from the OS (literally issuing a 'read' command per character) vs reading the entire file upfront and then operating on the text in memory.

Even the if we assume OS does caching for disk data to minimize the actual number of reads from disk, you will still incurs at least an unnecessary cost from issuing all the sys calls (one per character).

The same can be said about, for example, issuing sql commands, and other things.

jhgb · on Aug 8, 2021

> vs reading the entire file upfront and then operating on the text in memory.

What if you just mmap it?

XorNot · on Aug 8, 2021

I'd argue most user experience problems come from misplaced priorities. For example IO to my HDD shouldn't result in desktop frame rate stutters - ever. Those two subsystems should be totally independent once the system is booted

kfprt · on Aug 8, 2021

obligatory re. quadratic https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

Ericson2314 · on Aug 8, 2021

Saying programmers should "just do more work" is incredible economic naivety.

Back in the day people dependencies were an obscure free software thing and everyone just rewrote the same shit against Microsoft's or AT&T's interfaces, with maybe some in-house libraries.

People sucked at making and using abstractions, and didn't do it.

Then the pendulum swung and people started being download all sorts of crap and everything began to depend on everything else. Every once and a while the pile caves in and people star all over.

People sucked at making and using abstractions, but did it.

There's no way to roll back the clock. Forgoing code reuse is too costly. But we could write better abstractions and think about what it means to maintain whole library ecosystems (planning the division of labor and separation of concerns) rather than just individual libraries.

jfoutz · on Aug 8, 2021

Ftfa

> They can perform more operations in one second than a human could in one hundred years.

Please identify the traffic lights.

We suck at abstractions, but I think we’re still winning there.

Ericson2314 · on Aug 8, 2021

I'm a bit confused about what you are saying. "Please identify where our abstractinos go wrong"? "Our abstractions are slowly getting better"? "Abstractions are bad but not for performance"?

jfoutz · on Aug 8, 2021

Ah, it was a late night quip with the intention of undermining the article.

Yes, computers are fast, but there's no underlying intention. Our abstractions aren't great, but I think our abstractions let us communicate intention in a way that computers are far far away from achieving.

Show a webpage with "identify all the traffic lights" to a child and, it might take a little while (especially if they can't read!) but the human will build up abstraction and leverage prior knowledge and solve the problem.

Show the same webpage to a supercomputer, and nothing will happen, ever. The supercomputer has access to all of the required information via the internet. Heck, the instructions are right there on the page. But there's no intention of solving the problem.

I guess, I'd say our abstractions do get better over time, but it can take a long time. an obvious example is Aristotle vs Newton vs Einstein. Computers haven't even started down that road. There are some vague ideas, and there has been concrete progress in narrow scopes.

I guess I had a somewhat emotional reaction to the criticism of "programs must be written for people to read, and only incidentally for machines to execute." Which lead me down a crazy tangent.

Performance matters, but only insofar as it helps people do stuff. But ultimately that's a people problem not a computer problem. The notion that computers are fast doesn't matter very much. We can use them more efficiently if we want to, but do we really want to? Who cares? Maybe it does make sense, maybe it's better to build a new feature.

Anyway, crazy tangent.

Ericson2314 · on Aug 9, 2021

> I guess I had a somewhat emotional reaction to the criticism...

I agree wtith the rest of your post there.

> I think our abstractions let us communicate intention in a way that computers are far far away from achieving. > > Show a webpage with "identify all the traffic lights" to a child and, it might take a little while (especially if they can't read!) but the human will build up abstraction and leverage prior knowledge and solve the problem.

Ah. I don't measure progress using computers relative to the human brain. So when I say programmers are using bad abstractions, I am not envious of human pattern recognition etc. at all. Likewise natural language abstractions are messy and culturally loaded and so much more I don't want to get into.

When I say abstractions I am just talking about the rigid formal ones we program.

I think the field is too focused on A.I. stuff because of old romantic notions about human vs computer, piss poor education making programming knowledge scarce, and general laziness among the non-technical powers that be about defining the requirements. I don't think should be the main thread of how we use computers in society.

the__alchemist · on Aug 8, 2021

This reminded me of Jai, the language by Jon Blow. In the Primer here: https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md, the section on Data-oriented design is relevant. Note that the bottom of the OP article mentions explicitly the games industry as caring about this; relevant given Blow's background.

hsn915 · on Aug 10, 2021

This link is misleading and contains very outdated information.

the__alchemist · on Aug 10, 2021

You're right - just noticed the note at the top. Do you know of a better and/or current synopsis?

chenglou · on Aug 10, 2021

The new way Jai does SoA is through just userland macros: https://www.youtube.com/watch?v=zgoqZtu15kI

Tldr SoA isn't a single transformation. Depending on your data access pattern, you might want to shape the structure differently. Kinda like GPU. So first-classing a particular pattern of AoS->SoA conversion (the one we often see in tutorials) wouldn't have been enough. In this case, a little bit of tasteful usage of macros covers the variations well enough.

hsn915 · on Aug 10, 2021

People focus too much on the SoA thing as if it was the whole point of the language, but it's really really not.

Jon does a lot of programming streams, and I think anyone who watches his programming streams will recognize that "structure of arrays" is not the default mode that he usually programs in.

That's not even the point of the language.

Yes the language does a lot of what you expect from a modern systems language: structs, explicit memory allocation and management, no header files, etc.

But as far as I can tell, this is just the bare minimum that you should expect from a modern systems language, and the ultimate goal of this language is not to just be that.

The best way I can think of to describe it is collapsing layers into just one: the language and its compiler.

This is why the language has meta programming and code generation built in.

I've seen talks where people boast about how they use a python script to generate a whole lot of C++ code!

I've seen projects written ostensibly in C++ but to build them you need download 10 different tools (including compilers or interpreters for other languages).

Big C++ projects are notorious for being a nightmare to build.

The point of this language is to make it so that you only need one thing: the compiler of this language.

You can write a large system program that builds very quickly and does not need any extra tools to build.

Ultimately even the linker would be deprecated: the compiler would output the executable file directly. But for practical reasons the language has to support linking in order to interop with C libraries.

chenglou · on Aug 13, 2021

Oh for sure, build is the main point and SoA isn't. The SoA feature discussions probably came from its deprecated interesting approach.

Collapsing into a single layer is great. As an aside, another language notorious for doing so would be Smalltalk, which is on the complete opposite end of the spectrum in terms of paradigm. It's kinda interesting to see the horseshoe theory in action in software philosophy.

the__alchemist · on Aug 11, 2021

Thanks for the great explanation. Do you know how it plans to handle dependencies? Is it expected to work on embedded platforms like Cortex-M?

It's nice seeing a mix of modern systems languages: Zig, Rust, Jai. All seem very well designed and promising.

hsn915 · on Aug 13, 2021

I don't really know .. these are just guesses:

- I think it will not support any platform that is less than 64 bits.

- I think the preferred way will be to explicitly host the dependency in your repository. What ever the language does to help with dependency management will probably be geared towards that direction. Code bases are expected to be self contained. Once you checkout the code from your version control system, you should not need to download an additional thing from the internet to build.

That said, there is a video where Jon talked at length about his plans for dependency management, but it was more of an information discussion than a concrete plan. It is from 3 years ago (2018.01)

https://www.youtube.com/watch?v=3TwEaRZ4H3w

jameshart · on Aug 8, 2021

It’s interesting the author frames cache optimization as a way to make ‘reasonable use’ of computing resources.

But cache locality is only relevant to your code if you have a monopoly on a CPU core. There are likely hundreds of other processes running on a machine apart from yours and to make ‘reasonable’ use of computing resources your program needs to be effective and perform at even if those other processes preempt your code from running.

Making your code really good at using ALL the compute resources on a machine is not the same thing as making your code make reasonable use of computer resources.

rrss · on Aug 8, 2021

If one process is using only 4 bytes out of every 64 byte cache line fill, the rest of the line is wasted regardless of whether there are hundreds of other processes loading different lines into the various levels of cache. It’s not using more of the cache, just using the same lines (or fewer) more effectively.

vzaliva · on Aug 8, 2021

The SCIP quote the author tries to dispute is still very true. If you ask developers of a decently sized modern mature software system to list the main problems they are encountering, they will probably name complexity, legacy code, lack of documentation, bugs, and not the performance.

As to making readable programs run fast, this is the job for compilers. Compilers should perform more and more sophisticated code transformations for high-level languages with well-defined semantics, generating faster and faster code. The application-level programmers should not be burdened with details like CPU cache size.

jcelerier · on Aug 8, 2021

> Compilers should perform more and more sophisticated code transformations

Maybe but they don't, and my users asked for more performance yesterday (and the year before, and the year before that too, and likely the decade before that too).

The problem with performance is that some people believe that it's a target that can be reached ; but the only acceptable target in most cases is "everything is instantaneous" - otherwise "power users" in your domain will complain and give you a bad reputation.

But because people seemingly don't care or are in a position of power, I'm at a friend's home who asked me desperately to "look at why their computer doesn't work well, it's full of viruses" ; computer from 2012 takes 2min30 to go from "power button" to MS Word, there's literally nothing else installed than office and chrome and definitely no viruses, just windows services hogging all the resources because some people at Redmond likely believes in magic optimizing compilers.

II2II · on Aug 8, 2021

> and my users asked for more performance yesterday (and the year before, and the year before that too, and likely the decade before that too).

That made me think of the traditional response of throwing bigger and faster computers at a problem. The thing is, much of the performance of those bigger and faster computers comes from architectural optimizations that can only be fully exploited if the program takes advantage of them. In some cases, putting the code through an optimizing compiler may be enough. In other cases, the program needs to be explicitly designed to take advantage of those optimizations. (The cache is an okay example of this. There was a time when caches did not exist. The introduction of caches would have sped up some programs more than others, depending upon the program's design.)

hsn915 · on Aug 8, 2021

Deferring responsibility to other people is not a suitable attitude for a professional software engineer.

There's a dangerous hidden assumption too in this mentality: the compiler is magic!

It's not.

The compiler mostly just does "micro optimizations" on the code.

The most trivial example to demonstrate this is looping on a grid by column vs by row. The by-column iteration will be slower due to bad utilization of the CPU cache, and no compiler will ever rearrange the loop. Not because compilers are being purposely stupid. There's just no way to prove that the semantics will stay the same (except in the most trivial situation).

rrss · on Aug 8, 2021

> the most trivial example to demonstrate this is looping on a grid by column vs by row. The by-column iteration will be slower due to bad utilization of the CPU cache, and no compiler will ever rearrange the loop

I think LLVM’s polyhedral optimization (Polly) probably will. I played with it a while back and found that if I handed it a naive matrix multiply with terrible locality, it was able to transform it to rearrange the iteration order and use tiling such that the naive w/ Polly ewas faster than any optimized version I could write. (my optimized versions probably weren’t great, but I tried the basics and got the normal speedups - Polly was just faster).

https://polly.llvm.org/ https://releases.llvm.org/12.0.1/tools/polly/docs/UsingPolly...

hsn915 · on Aug 9, 2021

> matrix multiply

probably one of the easy cases? I suppose the compiler must be able to somehow prove to itself that the code does not do anything else.

What if, for example, within the loop you "printf(column)" or something like that? In this case re-arranging the loop would produce a different result, wouldn't it? So the compiler probably will not optimize it in that case, would it?

tuatoru · on Aug 8, 2021

No. No. And maybe.

What developers think does not matter if you want your software to be used.

Responsiveness has always been the number one UI complaint.

If your program is sluggish, that's virtually always because it's badly designed. That's your fault, not the compiler's.

Application programmers don't need to know about CPU cache size because they should have designed their programs to perform on low-end hardware.

mlang23 · on Aug 8, 2021

I am working in HPC, and I beg to differ. Yes, compilers should do optimisation whereever possible, but that is only a small percentage of the story. If the programmer doesnt know which algorithms to choose, the compiler is unable to fix that. If the programmer does not understand the underlying architecture at least a bit, the compiler is also likely going to fail fixing that.

Take one of the most basic programming examples, fib. Depending on how your programming language might be implemented, the recursive definition is likely slower then necessary. So at the end of the lesson, you're being taught to make a loop out of the recursion, so save stack space and such. But usually, the lesson ends here. But this is sooooo far away from the best solution. Think 2x2 matrix and the power of n. No compiler will ever transform your naiv recursive fib to that. So no, the compiler is not the ultimate answer. If you dont manage to educate more programmers, we will continue to run slow software.

wiz21c · on Aug 8, 2021

yeah, but the more you optimize, the less your code is a way to communicate an idea on how/why (business wise) you did some things. I used to optimize some code. When I try to squeeze every CPU cycle, then I usually have to turn my algorithm upside down so that they "fit" correctly to the CPU architecture (cache, branch prediction, avoiding costly, instructions). In the end, algorithm cleverness, which is important to understand to read my code is totally muddied by optimization techniques here and there.

Having the right balance between performance and readability is tough...

Another example I cam across a while ago is matrix multiplication. See

https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

The naive algorithm is super simple to understand and corresponds to the actual math definition. But the optimized algorithm, that's something totally different :-)

II2II · on Aug 8, 2021

Your point that the application level programmer should not be burdened with details like CPU cache size is something the article's author may agree with. They mentioned that cramming 17 floats into a cache that can hold 16 is (premature) optimization.

On the other hand, the author is arguing that developers should take the presence of a CPU cache into consideration. The distinction is important. By failing to take the hardware into consideration it becomes much more difficult to solve performance issues down the road since an avenue to resolve it will likely involve rewriting disparate parts of code. Incidentally, this type of issue is unlikely to be addressed by optimizing compilers since it requires a knowledge of the runtime performance of disparate parts of code and may involve code that is not compiled at build time (e.g. the data structure may be passed to a third-party library).