What were the use cases where you found it useful to index by code point (and th...

arp242 · on June 2, 2023

In many cases it's not very useful, but there are clearly cases where it is, e.g. if you want to normalize text, compose/change emojis, stuff like that.

A codepoint is the "smallest useful addressable unit" when dealing with Unicode text, so it makes sense that it's the default.

It's also comparatively expensive to address grapheme clusters.
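
To make the code point versus grapheme cluster distinction concrete, here is a minimal Python sketch (not from the thread). Python's `str` happens to be indexed by code point, and the standard `unicodedata` module provides normalization; the variable names are purely illustrative.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived
# character (grapheme cluster), but two code points.
decomposed = "e\u0301"
decomposed_len = len(decomposed)  # counts code points

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
composed_len = len(composed)
```

Code point indexing here is a constant-time array lookup in CPython, while finding grapheme cluster boundaries requires walking the text and applying the Unicode segmentation rules, which the standard library does not implement.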

lmm · on June 4, 2023

> In many cases it's not very useful, but there are clearly cases where it is, e.g. if you want to normalize text, compose/change emojis, stuff like that.

I can see that iterating through by codepoint could be useful for some of those cases, but I still can't see why you'd ever want to index by codepoint?

arp242 · on June 7, 2023

For the same reason you want to index anything: to slice, remove, or replace parts of it. E.g. to replace a skin tone in an emoji: "str[i] = 0x1f3ff", or to insert one: "str = str[:i] + 0x1f3ff + str[i:]".
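
The pseudocode above could be sketched in Python roughly as follows. This assumes a language whose strings index by code point (as Python's do); since Python strings are immutable, the "assignment" becomes a slice-and-rebuild. The helper names `replace_at` and `insert_at` are hypothetical, not from any library.

```python
DARK = "\U0001F3FF"  # EMOJI MODIFIER FITZPATRICK TYPE-6

def replace_at(s: str, i: int, cp: str) -> str:
    """Return s with the code point at index i replaced by cp."""
    return s[:i] + cp + s[i + 1:]

def insert_at(s: str, i: int, cp: str) -> str:
    """Return s with cp inserted before index i."""
    return s[:i] + cp + s[i:]

# WAVING HAND SIGN + light skin tone modifier: two code points.
wave_light = "\U0001F44B\U0001F3FB"
wave_dark = replace_at(wave_light, 1, DARK)

# A bare waving hand, with the modifier inserted after it.
wave_plain = "\U0001F44B"
wave_inserted = insert_at(wave_plain, 1, DARK)
```

Both operations assume the caller already knows the index `i` of the modifier (or of the base emoji), which is the scanning step questioned in the reply below.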

lmm · on June 12, 2023

But that's a pointlessly inefficient way to do it - surely what you want there is to iterate and transform rather than scan through and then slice? (And don't you need to group by extended grapheme cluster rather than codepoint anyway for that to make sense?)
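
For comparison, an "iterate and transform" version might look like the sketch below. It is an illustration under simplifying assumptions: it walks code points in a single pass and swaps any Fitzpatrick modifier (U+1F3FB to U+1F3FF), and it does not group by extended grapheme cluster, which Python's standard library cannot do on its own. The function name `retone` is hypothetical.

```python
# Code points U+1F3FB..U+1F3FF are the five emoji skin tone modifiers.
MODIFIERS = range(0x1F3FB, 0x1F400)

def retone(text: str, tone: str) -> str:
    """Replace every skin tone modifier in text with tone, in one pass."""
    return "".join(tone if ord(ch) in MODIFIERS else ch for ch in text)

greeting = "hi \U0001F44B\U0001F3FB!"
retoned = retone(greeting, "\U0001F3FF")
```

No index is ever computed here: the transformation is driven by what each code point is, not by where it sits in the string.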