
It's useful if you want array-like semantics (e.g. O(1) lookup) on Unicode text strings, because you have a fixed size for every code point, unlike UTF-8. Python, for example, uses it internally.
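A quick sketch of what that buys you: Python 3 strings are sequences of code points (stored internally at a fixed width per string, per PEP 393), so indexing is constant-time, while the same text encoded as UTF-8 has variable-width characters:

```python
s = "héllo\N{SNOWMAN}"
assert s[5] == "\N{SNOWMAN}"  # O(1) code point lookup
assert len(s) == 6            # length in code points

# The same text in UTF-8 is variable-width (1+2+1+1+1+3 bytes):
b = s.encode("utf-8")
assert len(b) == 9
```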


And it compresses just as well as UTF-8 for transfer/storage purposes.


Except code point indexing simply isn’t useful.

In the words of the article: “The choice of UTF-32 (or Python 3-style code point sequences) arises from wanting the wrong thing.”


I think that's the issue here. People disagree on how useful or not useful it is. It's maybe not ideal, but I don't think it's anywhere near so bad as to be entirely not useful. Strings-are-sequences-of-bytes is worse in my opinion. Python literally used to have that. It was worse.
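To illustrate why strings-are-sequences-of-bytes was worse: with Python 3's `bytes` type (which is roughly what Python 2 strings were), indexing gives you integers rather than characters, and a naive slice can cut a multi-byte character in half:

```python
b = "héllo".encode("utf-8")
assert b[1] == 0xC3  # first byte of 'é', not 'é' itself

# Slicing at an arbitrary byte offset splits 'é' across the boundary:
try:
    b[:2].decode("utf-8")
except UnicodeDecodeError:
    pass  # the slice is not valid UTF-8 on its own
```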


The problem with what Python used to have is that the encoding wasn’t fixed.

I’ll agree with you that strings-are-sequences-of-bytes is bad. That’s painful compiler-flag, codepage, &c. territory.

But what’s not bad is strings-are-sequences-of-code-units. That’s what Rust has, for example. Rust strings aren’t sequences of bytes, but of UTF-8 code units, and the two are semantically very different.



