Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There might be something a little imprecise here: code points vs code units vs character codes.

I'm open to being wrong but I would be very surprised if they defined text as a "series of code units" the count of which can vary by encoding even for the same character. IMO in this context 'character codes' would likely be far more consistent with 'code points' and they're just trying to differentiate between styled and un-styled text. Whereas the 1.3 definition appears to be trying to make an authoritative definition of 'text.'

If we read 2.2's "character codes" as code points, then that can be multiple code points as referenced in 1.3

[edit] I originally flipped 'units' and 'codes' - cleaned it up.



"Character code" is short for "character code point" or just code point. All Unicode algorithms and properties are defined in terms of the code point. UTF encodings are just a way of encoding a code point. From Unicode's perspective, you care about what is encoded (i.e. the code point) and not how it is encoded (i.e. UTF-8).

Unicode is one of the most poorly understood topics. I think the confusion stems from 1. most programming languages getting the abstraction wrong, and 2. programmers trying to reconcile their non-technical interpretation of what "character" means.


I agree with everything you said, I think I'm just trying to reconcile that with the top of thread saying python was the most correct because it was returning '7 code points' and that 'UTF-whatever is an implementation detail'

But 7 is not the number of code points/USVs - that's the number of UTF-16 code units. The string is 5 USVs. If UTF-whatever is an implementation detail, wouldn't the correct answer to length be 5?

What am I missing haha.


Python does return 5. JavaScript returns 7. Python is returning the number of code points, JavaScript is returning the number of UTF-16 code units.


There's my mistake. Thank you. Flipped them in my head, it's ben a long day.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: