The Unicode Consortium celebrated its 25th anniversary last year. Yet despite all the work Unicode does to make text in languages from around the world work, most of us know Unicode as the group that approves new emojis.
What might not be so clear is why a large consortium is required, or how much hidden complexity there is in Unicode. Or how the vomit emojis shown in the XKCD cartoon above are already considered "valid (but not recommended)".
Above: Many think of Unicode in terms of emoji support. We do.
Mark Davis, co-founder and current president of Unicode, has sought to clarify how emoji fits into Unicode in this high-level overview, which looks at what Unicode is and how the Unicode Emoji Subcommittee ("Emoji SC") fits into it.
Davis notes that emojis make up just a fraction of the total number of characters in the Unicode Standard. You can barely make them out in this chart:
All images that follow are from this presentation.
Characters alone don't tell half the story. Some characters need to combine with their neighbors when they appear in certain orders, so that the sequence is displayed as a single glyph.
A combination that will be familiar to many is how emoji skin tones are implemented.
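As a minimal sketch in Python 3 (where a `str` is a sequence of code points), a skin tone variant is simply a base emoji followed by one of the five Emoji Modifier characters, U+1F3FB to U+1F3FF. The names `with_skin_tone`, `THUMBS_UP` and `MEDIUM_SKIN_TONE` are illustrative only, not from any library. The code prints code points rather than the glyph itself, so it behaves the same in any terminal.

```python
# A skin tone is applied by placing an Emoji Modifier right after a base emoji.
THUMBS_UP = "\U0001F44D"          # THUMBS UP SIGN
MEDIUM_SKIN_TONE = "\U0001F3FD"   # EMOJI MODIFIER FITZPATRICK TYPE-4


def with_skin_tone(base: str, modifier: str) -> str:
    """Return an emoji modifier sequence: base emoji + skin tone modifier."""
    return base + modifier


thumbs_up_medium = with_skin_tone(THUMBS_UP, MEDIUM_SKIN_TONE)

# Displayed as one glyph where supported, but it is still two code points.
print(len(thumbs_up_medium))                            # 2
print([f"U+{ord(c):04X}" for c in thumbs_up_medium])   # ['U+1F44D', 'U+1F3FD']
```

On a platform without support for the sequence, the reader would typically see the plain thumbs up followed by a separate color swatch.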
A more complicated implementation involves joining two or more emojis together into what is called an Emoji ZWJ Sequence.
A "ZWJ" (Zero Width Joiner, U+200D) character sits between each pair of emojis in the sequence. It acts as invisible glue, joining multiple emojis into one (where supported).
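A ZWJ sequence can be sketched in a few lines of Python: join the component emojis with U+200D. The helper `zwj_sequence` is a hypothetical name used only for illustration; the example uses Woman + ZWJ + Laptop, which forms the "woman technologist" emoji.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def zwj_sequence(*emojis: str) -> str:
    """Glue emojis together by placing a ZWJ between each adjacent pair."""
    return ZWJ.join(emojis)


WOMAN = "\U0001F469"   # WOMAN
LAPTOP = "\U0001F4BB"  # PERSONAL COMPUTER

woman_technologist = zwj_sequence(WOMAN, LAPTOP)
print([f"U+{ord(c):04X}" for c in woman_technologist])
# ['U+1F469', 'U+200D', 'U+1F4BB']
```

Where a platform does not support the sequence, the ZWJ stays invisible and the reader simply sees a woman followed by a laptop.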
And yes, you can combine modifiers and ZWJs to create a longer sequence.
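To make this concrete, here is a sketch of one such longer sequence, assuming "woman running: medium skin tone" as the example. The skin tone modifier attaches to the first element, and because ♀ Female Sign defaults to text presentation it is followed by VS-16 (U+FE0F) to request the emoji version. The variable names are illustrative only.

```python
ZWJ = "\u200D"    # ZERO WIDTH JOINER
VS16 = "\uFE0F"   # VARIATION SELECTOR-16: request emoji presentation

PERSON_RUNNING = "\U0001F3C3"
MEDIUM_SKIN_TONE = "\U0001F3FD"   # EMOJI MODIFIER FITZPATRICK TYPE-4
FEMALE_SIGN = "\u2640"            # defaults to text presentation

# (base + modifier) ZWJ (Female Sign + VS-16)
woman_running_medium = (PERSON_RUNNING + MEDIUM_SKIN_TONE) + ZWJ + (FEMALE_SIGN + VS16)

print(len(woman_running_medium))  # 5
print(" ".join(f"U+{ord(c):04X}" for c in woman_running_medium))
# U+1F3C3 U+1F3FD U+200D U+2640 U+FE0F
```

A single on-screen emoji here is five code points, which is one reason why counting "characters" in text is harder than it looks.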
Unicode doesn't control ZWJ Sequences in the same way as new emojis that require their own code point.
Unicode does recommend sequences that should be supported for cross-platform consistency. However, vendors are free to combine any emoji with any other as they see fit.
Astro Cat, for example, is valid (it uses a correct sequence structure) but is not recommended, unlike the profession and gender sequences.
XKCD suggested that vomit should be a modifier character to make a "Vomiting Cowboy".
Above: Vomiting suggestions from XKCD. Davis notes that no modifier is needed for these.
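The difference between "valid" and "recommended" can be sketched as follows. This is a simplified illustration under stated assumptions: `is_well_formed_zwj` only checks that non-empty parts are joined by single ZWJs (the real definition in Unicode Technical Standard #51 also requires each part to be an emoji element), and `RECOMMENDED_ZWJ` is a hypothetical one-entry stand-in for Unicode's published list of recommended sequences. The Vomiting Cowboy is assumed to be Cowboy Hat Face + ZWJ + Face Vomiting, and Astro Cat is Cat Face + ZWJ + Rocket.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical one-entry stand-in for Unicode's recommended ZWJ sequences.
RECOMMENDED_ZWJ = {
    "\U0001F469\u200D\U0001F4BB",  # woman technologist
}


def is_well_formed_zwj(seq: str) -> bool:
    """Simplified check: two or more non-empty parts joined by single ZWJs.

    Does NOT verify that each part is actually an emoji element.
    """
    parts = seq.split(ZWJ)
    return len(parts) >= 2 and all(parts)


def classify(seq: str) -> str:
    if not is_well_formed_zwj(seq):
        return "malformed"
    if seq in RECOMMENDED_ZWJ:
        return "recommended"
    return "valid (but not recommended)"


ASTRO_CAT = "\U0001F431" + ZWJ + "\U0001F680"        # cat face + rocket
VOMITING_COWBOY = "\U0001F920" + ZWJ + "\U0001F92E"  # cowboy hat face + face vomiting
WOMAN_TECHNOLOGIST = "\U0001F469" + ZWJ + "\U0001F4BB"
DANGLING_ZWJ = "\U0001F431" + ZWJ                    # ZWJ with nothing after it

for name, seq in [("woman technologist", WOMAN_TECHNOLOGIST),
                  ("Astro Cat", ASTRO_CAT),
                  ("Vomiting Cowboy", VOMITING_COWBOY),
                  ("dangling ZWJ", DANGLING_ZWJ)]:
    print(f"{name}: {classify(seq)}")
# woman technologist: recommended
# Astro Cat: valid (but not recommended)
# Vomiting Cowboy: valid (but not recommended)
# dangling ZWJ: malformed
```

In other words, no new modifier character is needed: the existing ZWJ mechanism already lets a vendor build such a sequence, whether or not Unicode recommends it.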
Other sequence types exist for emoji, including flag sequences, tag sequences and keycap sequences. You should check out the entire set of slides to see these in more detail.
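For a rough idea of how these three sequence types are built, here is a Python sketch. The helper names are hypothetical. Flags pair two Regional Indicator Symbols (U+1F1E6 is "A") from a two-letter region code; keycaps are a digit, `#` or `*`, then VS-16 (U+FE0F), then Combining Enclosing Keycap (U+20E3); tag sequences follow a base emoji with TAG characters that mirror ASCII (offset 0xE0000) and end with Cancel Tag (U+E007F), as in the flag of England.

```python
REGIONAL_INDICATOR_A = 0x1F1E6   # REGIONAL INDICATOR SYMBOL LETTER A
VS16 = "\uFE0F"                  # VARIATION SELECTOR-16
ENCLOSING_KEYCAP = "\u20E3"      # COMBINING ENCLOSING KEYCAP
BLACK_FLAG = "\U0001F3F4"        # WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"        # CANCEL TAG


def flag_sequence(region: str) -> str:
    """Two Regional Indicator Symbols built from a two-letter region code."""
    if len(region) != 2 or not (region.isascii() and region.isalpha()):
        raise ValueError("expected a two-letter ASCII region code")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in region.upper())


def keycap_sequence(key: str) -> str:
    """A keycap base (0-9, # or *) + VS-16 + Combining Enclosing Keycap."""
    if len(key) != 1 or key not in "0123456789#*":
        raise ValueError("keycap base must be one of 0-9, # or *")
    return key + VS16 + ENCLOSING_KEYCAP


def tag_sequence(base: str, tag_text: str) -> str:
    """Base emoji + TAG characters mirroring ASCII tag_text + Cancel Tag."""
    if not tag_text.isascii():
        raise ValueError("tag text must be ASCII")
    return base + "".join(chr(0xE0000 + ord(ch)) for ch in tag_text) + CANCEL_TAG


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(code_points(flag_sequence("GB")))   # U+1F1EC U+1F1E7
print(code_points(keycap_sequence("1")))  # U+0031 U+FE0F U+20E3
print(code_points(tag_sequence(BLACK_FLAG, "gbeng")))
# U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F
```

Whether a given flag or tag sequence actually renders as a single emoji is still up to each platform.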
Finally, a look at the (current, 2017) timeline for how a new emoji is born:
🚨 Update April 2020: the current timeline for how a new emoji is created has been significantly affected by the COVID-19 pandemic. You can read more about the revised schedule for 2020 and beyond here.
Of course Unicode still has plenty to do outside of emoji support:
"There are approximately 7,000 living human languages, with varying levels of vitality. Less than 100 of these languages are well-supported on computers, mobile phones, and other devices, while all the rest risk being digitally disadvantaged"
Unicode has an Adopt a Character program. Funds raised from adoptions go toward research to support these digitally disadvantaged languages.
Disclaimer: I am a member of the Unicode Emoji Subcommittee. ↩︎
VS-16 (U+FE0F) is an invisible character that tells the preceding character to use emoji presentation. In this example, ♀️ Female Sign has both a text and an emoji version. ZWJ Sequences should specify the emoji version when a character has both versions and its default is text. ↩︎