On UTF-16 vs. UTF-32

It'd be more space-efficient [to use UTF-16] in most cases, and less time-efficient in all cases (either "somewhat so" or "grossly so".) Using UTF-16 internally is probably not nearly as far down the Bad Idea scale as... oh, Huffman-coded arrays would be. ("Newly created arrays are full of 0s! Why waste 31 or 63 extra bits? Of course, this makes (SETF AREF) hard, but you could have a bit somewhere!")

Gary Byers on openmcl-devel, discussing (in two messages) why variable-length encodings are a bad idea for the internal representation of Lisp strings. An enlightening read, as usual.
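
To make the time-efficiency point concrete, here is a minimal sketch in Python (not Lisp, and not anything taken from the mailing-list thread) of why indexing gets slower with a variable-length encoding. The helper names utf16_units, char_at_utf16 and char_at_utf32 are made up for this illustration, and the code assumes well-formed UTF-16 with no unpaired surrogates.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf32_units(s):
    """Return one integer per character, as a fixed-width encoding would store them."""
    return [ord(c) for c in s]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def char_at_utf16(units, index):
    """Return the code point of character number `index`.

    The position of a character depends on how many surrogate pairs
    precede it, so we have to walk the units from the start: O(n).
    """
    pos = 0
    for _ in range(index):
        # A high surrogate means this character occupies two code units.
        pos += 2 if is_high_surrogate(units[pos]) else 1
    unit = units[pos]
    if is_high_surrogate(unit):
        low = units[pos + 1]
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
    return unit


def char_at_utf32(units, index):
    """Fixed width: character number `index` is simply element `index`: O(1)."""
    return units[index]


# 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
text = "a\U0001D11Eb"
u16 = utf16_units(text)
u32 = utf32_units(text)
```

Here u16 holds four code units for three characters, so u16[2] is the second half of a surrogate pair rather than 'b', and char_at_utf16 has to scan to find the right position. The same mismatch is what makes (SETF AREF) awkward: storing a non-BMP character over a BMP one changes how many code units the string needs.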
