First of all, this sounds like an amazing project. My workplace runs isort+flake8 in pre-commit hooks, so making these even 10x faster would be a huge quality of life improvement for us.
Personally, I'm interested to hear from you what the specific reasons are that you can't achieve this kind of performance with CPython.
Usually the major factors are: A) Python's generalised data structures (int, list, etc.); B) the extra overhead of common operations like reading a variable, calling a function, or iterating over a collection; C) no real multithreading (i.e. the GIL); D) lack of control over memory management.
I'd love to know if there's anything else that makes Python that much slower.
It's mostly the reasons you've hit on but I'll try to add some color to them based on my experience with Ruff.
1. The "fearless concurrency" that you get with Rust is a big one. Ruff has a really simple parallelism model right now (each file is a separate task), but even that goes a long way. I always found Python's multiprocessing really challenging -- hard to get right, and the performance characteristics were often confusing and unintuitive to me.
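A minimal sketch of that per-file parallelism model, using only `std::thread::scope`. The `check_file` rule here is invented for illustration (counting over-long lines), and Ruff's real scheduling is more involved; the point is just that with no shared mutable state, the borrow checker guarantees the parallel part is data-race free.

```rust
use std::thread;

// Hypothetical per-file check (a stand-in for a real lint pass):
// counts lines longer than 88 characters.
fn check_file(source: &str) -> usize {
    source.lines().filter(|line| line.len() > 88).count()
}

// Each file becomes its own scoped task. Scoped threads can borrow
// the input slices directly, with no Arc or copying needed.
fn check_files_in_parallel(files: &[&str]) -> usize {
    thread::scope(|s| {
        files
            .iter()
            .map(|f| s.spawn(move || check_file(f)))
            .collect::<Vec<_>>() // spawn all tasks before joining any
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum()
    })
}
```

Collecting the handles before joining is what makes the tasks actually run concurrently rather than one at a time.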
2. Ruff performs very few allocations, and Rust gives us the level of control to make that possible. (I'd like to perform even fewer...) We tokenize each file once, run some checks over that stream, parse it into an AST, run some checks over that AST, and with a few exceptions, the only allocations outside of that process are for the Violation structs themselves.
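A rough illustration of that allocation discipline; the `Violation` struct and the trailing-whitespace check below are simplified stand-ins, not Ruff's actual types. The check borrows line slices straight out of the source text, so the only heap allocations happen when a violation is actually found.

```rust
#[derive(Debug)]
struct Violation {
    line: usize,
    message: String, // the one allocation per finding
}

// Operates on &str slices borrowed from the original source:
// no intermediate Vec<String>, no copies of the file contents.
fn check_trailing_whitespace(source: &str) -> Vec<Violation> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.ends_with(' ') || line.ends_with('\t'))
        .map(|(i, _)| Violation {
            line: i + 1,
            message: "trailing whitespace".to_string(),
        })
        .collect()
}
```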
3. Related to the above (and this would be possible with CPython too), by shipping an integrated tool, we can consolidate a lot of work that would otherwise be duplicated in a more traditional setup. If you're using a bunch of disparate tools, and they all need a tokenized representation, or they all need the AST, then they're all going to repeat that work. With Ruff, we tokenize and parse once, and share that representation across the linter.
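A toy sketch of the "tokenize once, share everywhere" idea. The `Tok` type, the whitespace "lexer", and both checks are invented for illustration; the shape to notice is that every rule borrows the same token stream instead of re-lexing the file.

```rust
#[derive(Debug, PartialEq)]
enum Tok {
    Name(String),
    Number(f64),
}

// A deliberately naive "lexer": split on whitespace, classify each word.
fn tokenize(source: &str) -> Vec<Tok> {
    source
        .split_whitespace()
        .map(|word| match word.parse::<f64>() {
            Ok(n) => Tok::Number(n),
            Err(_) => Tok::Name(word.to_string()),
        })
        .collect()
}

// Each "rule" takes &[Tok]: a borrowed view of the shared stream.
fn count_names(tokens: &[Tok]) -> usize {
    tokens.iter().filter(|t| matches!(t, Tok::Name(_))).count()
}

fn count_numbers(tokens: &[Tok]) -> usize {
    tokens.iter().filter(|t| matches!(t, Tok::Number(_))).count()
}
```

Running N separate tools means tokenizing N times; here `tokenize` runs once and both checks share the result.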
4. Again possible with CPython, but in Ruff, we take a lot of care to only do the "necessary" work on a given invocation. So if you have the isort rules enabled, we'll do the work necessary to sort your imports; but if you don't, we skip that step entirely. It sounds obvious, but we try to extend this "all the way down": so if you have a subset of the pycodestyle rules enabled, we'll avoid running any of the expensive regexes that would be required to power the ignored rules.
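In miniature, that gating pattern might look something like this; the rule name and the "expensive" per-line check are hypothetical stand-ins for the real regex-backed pycodestyle rules.

```rust
use std::collections::HashSet;

// Stand-in for an expensive check, e.g. a regex match per line.
fn expensive_line_check(line: &str) -> bool {
    line.contains("TODO")
}

// Only pay for the expensive check when its rule is actually enabled;
// with the rule disabled, the per-line work is skipped entirely.
fn lint(source: &str, enabled: &HashSet<&str>) -> usize {
    let rule_enabled = enabled.contains("todo-comments");
    let mut count = 0;
    for line in source.lines() {
        if rule_enabled && expensive_line_check(line) {
            count += 1;
        }
    }
    count
}
```

Hoisting the `enabled` lookup out of the loop is the same "all the way down" idea: decide once, then do zero work per line for disabled rules.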