unicode articles

Unicode data file compression: achieving 40-70% reduction over gzip alone

A little story about how writing a domain-specific compression algorithm in a few days can yield big benefits, why it's sometimes worth giving it a shot, and how to tell when you should try. Note: this is about Unicode spec data files, not general-purpose text compression.

Unicode sorting is hard & why browsers added special emoji matching to regexp

As I work on Zorex, an omnipotent regexp engine, I have stumbled into a world of tales about why sorting Unicode text is so annoying today. Let's talk about that.