The Linguistic Secrets Found in Billions of Emoji - SXSW 2016 presentation


Every day, we send almost 6 billion emoji from our smartphones, but what kinds of patterns can you find when you look at all this data together? How do different cultures and nationalities use emoji differently? Are there hidden linguistic patterns in our quickly-dashed-off emoji utterances? Do emoji represent a fundamental shift away from old-fashioned word-based language or a return to a more flexible, pre-modern style of textual communication?

Join SwiftKey CTO Ben Medlock and internet linguist Gretchen McCulloch as they share never-before-seen insights based on billions of data points from people's real emoji use.

Find the full audio of this session on SoundCloud here:

  1. 1. Please find the full audio of this session at SXSW on SoundCloud here: interactive-2016 Hashtag: #EmojiLang
  2. 2. SwiftKey is the creator of SwiftKey Keyboard, one of the most popular keyboard apps in the world. We’re known for our AI-powered word and emoji predictions that adapt to your personal writing style. You also may have seen we were very recently acquired by Microsoft.
  3. 3. I’m an internet linguist – I write about linguistics for a general audience, especially internet language. I'm the resident linguist at The Toast, where you may remember me from my articles about the grammar of doge, and the synonyms for "Benedict Cumberbatch." I also have an upcoming book on internet language with Penguin.
  4. 4. SwiftKey has published several emoji data reports which look at individual emoji use across 30 languages and all 50 US states. This analysis is based on aggregate, anonymized data from users who sign into SwiftKey. We occasionally look at this data for pertinent trends and to make sure SwiftKey’s word & emoji predictions are accurate.
  6. 6. In 1999, mobile phone users in Japan were sending a lot of picture messages – which, due to early tech limitations, were massive files. The solution: instead of encoding images pixel-by-pixel, they encoded common images as "text", aka “emoji.” The first to do this was DoCoMo i-mode, a Japanese mobile phone provider, but others soon followed. The result was different sets of images from several Japanese carriers that didn't work on other platforms. The Unicode Consortium then stepped in to fix this by standardizing emoji, and also added more symbols from Wingdings, Webdings, AOL/MSN emoticons, and more.
  7. 7. Emoji became popular in the West after emoji keyboards were introduced on iOS in 2011 and Android in 2013. The Unicode Consortium is still working on adding more emoji in response to pressures from this expanded market, such as the more recent addition of the taco emoji and the skin tone modifiers.
  8. 8. But emoji is not the first way of expressing emotion in digital text! Punctuation, repeated letters, emoticons made from ASCII characters, & online slang/abbreviations all emerged in the digital age as a way of adding color & tone to texts, emails, IMs and more.
  9. 9. And we have data to show that people are using emoji similarly to emoticons. This graph shows usage of the standard ‘smile’ punctuation emoticon by SwiftKey Account users, compared with usage of the Tears of Joy emoji over the same time period. You can see these trends crossing paths in late Jan./early Feb. 2015.
  10. 10. Similarly, a study by Instagram shows that people use particular emoji in the same types of contexts where they use certain other types of slang. The Tears of Joy emoji is used like lol, haha, lmao; Heart emoji are used like xoxo, loveyou, muah; Loudly crying emoji is used like ughh, omg, omfg. Emoji are just a newer way of solving a problem that’s existed since the earliest days of the internet – how to efficiently represent emotion along with text.
  11. 11. By themselves, emoji don’t necessarily mean a lot. However, when we look at emoji usage on a massive scale, we can learn quite a bit about our society, our quirks, and ourselves.
  12. 12. Generally, we use emoji to communicate happiness, love & joy. This trend may be related to a similar phenomenon we see on social media, where positive posts significantly outweigh negative. While our lives aren’t necessarily 70% happy & joyful, we want to project that image to others digitally.
  13. 13. This year, Oxford Dictionaries named the “Tears of Joy” emoji the ‘word of the year’. An undeniably happy emoji, it was the most popular in terms of usage worldwide, making up 17% of all emoji used in the US and 20% of all the emoji used in the UK in 2015. This was a huge rise over 2014, when it made up 9% of all emoji in the US and 4% in the UK.
  14. 14. We see a lot of positivity in the top 10 emoji worldwide. Even the ‘sad’ emoji is overly dramatic; we’ve all probably seen it used in ways that aren’t necessarily actually ‘sad’. And all the top emoji are emotional, which is a theme that'll keep coming up.
  15. 15. Again, even if we back up and look at the bigger picture by organizing emoji by category, we still see a lot of positivity and romance, but a significant portion of sad/negative emoji as well. Here again, we see a lot of face, hand, and emotional emoji in the top 5 categories -- we'll be getting back to embodiment later.
  16. 16. A single person can also cause an emoji trend. The key emoji has taken off in popularity since DJ Khaled started using it often on his Snapchat & Twitter to flag ‘major key’ life tips and advice.
  17. 17. Here is SwiftKey data on usage of the key emoji: a 500% increase in use of the key emoji from December 2015 to January 2016. We have no idea whether DJ Khaled himself uses SwiftKey! But the point is it's not just him.
  18. 18. By examining large corpora of emoji data, we can look at how this data compares to existing assumptions. Emoji usage often reinforces these assumptions and stereotypes, sometimes in ways that are even supported by public data; other times it’s completely unexplainable. Best example: Hawaii ranks top or very highly for everything you might expect: palm tree, sunset, cocktail, surfer, pineapple – all of which are things you’d likely see/do, or at least associate with being in Hawaii.
  19. 19. And then there’s Vegas…. Anything surprising?
  20. 20. Louisiana ranks #1 for the use of the guitar emoji; the state is home of New Orleans, the birthplace of jazz.
  21. 21. Ah, the city of love, living up to its reputation!
  22. 22. Interesting fact: Finnish speakers, neighbors of Denmark, Norway and Sweden, UNDER-index for the Santa emoji, despite a Finnish town claiming the title of Santa’s birthplace.
  23. 23. Portugal Portuguese and Australian English are the #1 and #2 languages, respectively, for use of ‘drugs’ emoji. The countries in which these languages are spoken happen to have some of the most permissive drug laws in the world - Portugal was the 1st country in Europe to decriminalize drugs.
  24. 24. Sometimes the emoji data can be mysterious or very surprising…. For example, Montana, North Dakota and Wyoming are all in the top 5 for use of LGBT emoji (including men holding hands, women holding hands, rainbow). Wyoming and North Dakota are traditionally conservative states; Montana is a bit more split but still trends conservative. This is mainly due to these states’ overwhelming use of the men/women holding hands emoji, not so much the rainbow.
  25. 25. One hypothesis about why we see certain emoji used in states where it’s completely unexpected is the novelty factor. If something is very novel or uncommon, it may be more likely someone will talk/text about it. An example of this could be Idaho, which ranks #1 for the iPhone/mobile phone emoji – not California, despite it being the home of Silicon Valley, birthplace of Apple and Google. Example 2: Many sunny states do not rank for using sun/hot weather emoji. Some, in fact, significantly under-index for these emoji.
  26. 26. Arabic speakers primarily live in North Africa and the Middle East, both regions known for dry weather and desert environments.
  27. 27. Let's move on to the reason you've got a linguist on this panel: Are emoji actually a language? If you're inclined to moral panics, you could follow that with "Are emoji a universal language?", "Are emoji going to replace English?" and "Are we going to hell in a handbasket????" but let's start with the basic question.
  28. 28. People often see emoji and think of hieroglyphs, but they're not actually the same. In the history of our own alphabet, we've seen symbols move from a very concrete level, for example an ox's head literally standing for an ox, to a more abstract shape standing for the sound at the beginning of the WORD for ox, which was aleph, to getting transmitted to cultures that didn't even know that it was about oxen at all and just adopted it as the name for the letter, the Greek alpha. The thing is, it's way more useful to be able to write any word with an "a" sound than to only be able to talk about oxen - if you scale it up, linguistic abstraction is the difference between a zoo and the entire universe. So if emoji are language, they need to be abstract. They need to have a meaning that's more than what you can guess based on the literal object in the picture.
  29. 29. So are there emoji that mean more than their literal symbol? Well, kinda – for example, the eggplant emoji is a phallic symbol, the peach is also generally interpreted as raunchy, and the painted fingernails emoji is often used to be kind of dismissive -- all things that you wouldn't necessarily know from their pictures.
  30. 30. But having a non-literal meaning isn't enough to say that emoji are language. We have other non-verbal symbols that carry meaning, many of which are quite old. For example, the heart symbol doesn't look like your physical heart, and even the association between "love" and the physical heart is essentially arbitrary. Same goes for a lot of symbols: "stop" for example isn't inherently octagonal, and so on. For emoji to really be linguistic, they also have to mean different things in different combinations. In other words, they need GRAMMAR. There has to be a difference between "dog bites man" and "man bites dog."
  31. 31. So in order to figure out whether we can really talk about a GRAMMAR of emoji, we took a look at what people are actually doing when they create sequences of emoji.
  32. 32. These are the top 7 most commonly used emoji combination categories. Similar to individual emoji, when used in combination, it’s most often in a positive, affirming way, or to express love -- and we resoundingly see emotional, face and hand emoji.
  33. 33. If we drill down a bit, we see that the Tears of Joy emoji also dominates emoji combos like it does individual emoji use. It’s clear we love to laugh and communicate laughter digitally!
  34. 34. For Tears of Joy in particular, it's the most common emoji both singly and in combination with other emoji, appearing in 30% of combinations including 2, 3, or 4 emoji.
  35. 35. We also found a slight preference for people to use tears of joy before other emoji, indicating their stance before commenting on the topic. It's slight but significant, since we're talking billions of data points. This nicely agrees with earlier work by linguist Tyler Schnoebelen, who used a completely different dataset and also found that emotions tended to come before objects. But we want to see what else we can find.
  36. 36. Because there are no emoji to signify sex or sex acts, people get inventive! Surprisingly: there is no eggplant emoji in the top 200 emoji combinations.
  37. 37. There are a few other creative emoji combinations that we discovered through research about which emoji are used most often with certain words, phrases or names. People can be quite resourceful! 1-Beyonce/’Queen Bey’ 2-Snorting cocaine 3-Pope bars – a meme about the Pope rapping based on a photo that was captured while he was traveling & speaking
  38. 38. But just because we CAN create emoji stories, how much do we actually DO SO? Only 4.6% of all SwiftKey sessions include any emoji at all. If you put that the other way around, this means over 95% don't, so we definitely haven't stopped using words just because we have emoji.
  39. 39. Within the 4.6% of text messages including emoji, 15% are JUST emoji used alone, no words. The types of emoji use that make headlines are things like the translation of Moby Dick into emoji (called Emoji Dick). It's a cool art project, but it doesn't represent common usage. Most people use emoji WITH their words, not instead of them.
  40. 40. Digging into this 15% of emoji-only texts further, we see that most of the emoji-only messages are just one or two emoji. There are SO FEW extended sequences of emoji that could be thought of as emoji stories, fewer than 1 in 1000. But we have billions of data points, so let's see what some of this fraction of a fraction of a fraction of potential emoji stories look like.
  41. 41. Here are the most common sequences of three and four emoji. You’ll notice that there's LOTS of repetition -- in fact, they're all repetition. You have to get down to #23 on each of these lists before any non-repeating sequences of emoji show up at all. But the really cool thing here is, linguists have already calculated this same data for sequences of words, based on large amounts of written text, so let's see what that looks like.
  42. 42. Here are the 10 most common sequences of 3 and 4 words (3-grams and 4-grams) according to the Corpus of Contemporary American English by Mark Davies. This is from a data set of about half a billion words, and you'll notice that unlike emoji, words really don't repeat very much at all. The only cases where you get any repetition at all are “as well as”, “the end of the” and “the rest of the”.
  43. 43. Zooming out a bit and looking at the top 200 pairs of words and top 200 pairs of emoji (bigrams), you'll notice the same thing. None of the word bigrams are repeats, but over half of the emoji bigrams are -- and these numbers look essentially the same for trigrams and quadrigrams. And it's not that you CAN'T repeat words -- you can sometimes, like in "very very very" or "I love love love love it", just like you COULD use heterogeneous sequences of emoji. It's just that people DON'T.
  44. 44. And what do these non-repeating sequences of emoji even look like? Well, a lot of them are still pretty similar, like hearts of different colors, the three different monkey expressions, several faces, or lips plus kiss-face. There is no evidence emoji are commonly used to tell stories, or narrate complex sequences of events. A few people might do it for fun, like playing charades, but for practical purposes, words are better at this task.
  45. 45. So how do emoji fit into our systems of language? Traditionally, we've had both formal and informal spoken language, but written language has only been formal. With the rise of the internet, we're filling in that fourth quadrant: informal written language, our texts and tweets and quick messages to each other. And while formal language is disembodied, spoken from behind a podium or going through the second voice of editor and style guide, informal language uses gesture, faces, tone of voice and other things that let us quickly and efficiently communicate emotion -- something that emoji help us do when we start writing it.
  46. 46. Conclusion: Emoji themselves aren’t a new language, but they are a vital part of this new fourth quadrant, this new register of informal written language. They repeat because our emotive gestures (clapping, laughing) often repeat or last for the duration of whatever else we're saying. Emoji make it efficient for normal people to write emotions in real time. Early internet predictions suggested we were going to make avatars to embody ourselves online, but instead we’ve taken to profile pics and face/body/hand gesture emoji to embody our emotions.
  47. 47. Where is electronic communication headed next? GIFs, stickers and custom ‘emoji’ are all on the rise. This doesn't mean they'll replace language either - we tried this back in the silent film era, and guess what, we decided we liked it better when we also had words!
  48. 48. How do we keep emoji fresh & keep adding the new ones that people want, while at the same time keeping them easy to type? The Unicode Consortium is a small group of tech execs (of which SwiftKey is a part) and is in charge of adding new emoji, but it's a bit weird that a small committee can have so much power over the emotional range of the world. Language, on the other hand, is a grassroots, open source project, where anyone can write a new word using the same old 26 letters. And all you have to do is convince some fellow humans to use it, not an institution: dictionaries follow common use.
  49. 49. As the emoji set keeps growing, it's also getting harder to find the emoji you're looking for. Emoji organization on your keyboard imposes a theory of how to categorize the universe, but there's no universally sensible organizational system for the world -- linguists and philosophers have tried since the 1600s, but Leibniz eventually concluded there was no such thing as a universal taxonomy. So rather than deal with a confusing ordering system, people mostly do the fastest thing and re-use their most commonly used emoji.
  50. 50. So will emoji ever become a universal language? Are they one already? No! Sure, emoji are acquiring more abstract meanings and they could eventually acquire a grammar, but every additional level of abstraction makes them less universal. Someone had to tell you what the eggplant emoji means, because it's not obvious from the picture. It's a catch-22: emoji can be universal OR they can be linguistic (i.e. capable of complex, abstract meaning) but can’t be both at the same time, because that's just not how abstraction works. For example, we (very briefly!) considered doing this talk entirely in emoji but we realized that we couldn't say very much at all. Emoji are paralinguistic, words are linguistic, and they work better together than separately.
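
For readers who want to try the sequence measurements from slides 35 and 41 to 43 on their own data, here is a minimal Python sketch. It is not SwiftKey's actual pipeline, which the talk does not describe. It assumes each message has already been split into a list of emoji (or words), and it counts an n-gram as a "repeat" only when every token in it is identical, which is one possible reading of the slides. The function names and the toy messages are hypothetical and invented for illustration.

```python
from collections import Counter

TEARS_OF_JOY = "😂"

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def repeat_share(sequences, n=2, top_k=200):
    """Share of the top_k most frequent n-grams that are one token repeated."""
    counts = Counter(gram for seq in sequences for gram in ngrams(seq, n))
    top = [gram for gram, _ in counts.most_common(top_k)]
    if not top:
        return 0.0
    return sum(len(set(gram)) == 1 for gram in top) / len(top)

def first_position_share(sequences, target=TEARS_OF_JOY):
    """Among mixed pairs containing target, the share where target comes first."""
    first = second = 0
    for seq in sequences:
        for a, b in ngrams(seq, 2):
            if a == b:
                continue  # skip pure repeats such as two tears-of-joy in a row
            if a == target:
                first += 1
            elif b == target:
                second += 1
    total = first + second
    return first / total if total else None

# Toy data invented for illustration; not SwiftKey or COCA data.
emoji_messages = [
    ["😂", "😂", "😂"],
    ["😍", "😍"],
    ["😂", "👍"],
    ["🎉", "😂"],
    ["😂", "🔥"],
]
word_messages = [
    "as well as the rest of the".split(),
    "i love love love it".split(),
]

emoji_repeat_share = repeat_share(emoji_messages)
word_repeat_share = repeat_share(word_messages)
tears_first_share = first_position_share(emoji_messages)

print(f"emoji bigrams that repeat: {emoji_repeat_share:.0%}")
print(f"word bigrams that repeat: {word_repeat_share:.0%}")
print(f"tears of joy first in mixed pairs: {tears_first_share:.0%}")
```

On real text, splitting a message into individual emoji is harder than it looks, because skin tone modifiers and joined sequences span several code points; that is why the sketch starts from pre-split lists rather than raw strings.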
