Chirp 2010: Twitter International

Talk on Twitter International from the Twitter Chirp conference. Presented on 2010-04-15.

Usage Rights

© All Rights Reserved

  • The original file is a Keynote presentation. On Windows I'm guessing it thinks .key is a registry extension. Sorry about that. I'll upload PDFs in the future.
  • Holding pattern.
  • Title slide.
    “Let’s get to it.”
  • Who am I?
    Some business-y talk for the entrepreneurs in the group.
    Notes on how we’ve gone about translation.
    Engineering challenges for the coders in the group.
  • From Summize.
    Search, platform (you might remember the API Group).
    Original OAuth (sorry).
    International (translation, char counting, twitter-text).
  • Before the technical stuff, a little info on why you should be interested.
    The main reason: users.
  • You might have seen the blog post on international growth.
    Passed 50% not long after the team formed.
    In large part: Japan, translation.
  • Take advantage of these markets.
    Due to a slew of factors, dev is mainly US (Twitter, EN, etc.), but that does not mean it’s not looking outward.
  • Case in point: Chile.
    Translating alone was not a big jump.
    But we had set the stage. When the need for faster information arose, we were there.
  • You can see the earthquake effect clearly. Not shown here is that signups have remained higher than pre-quake levels.
    What’s great isn’t the users, but the utility [click].
    This tweet, for example.
    It’s not what someone had for breakfast, but solving a real communication problem.
  • Unlike the event-driven growth in Chile, Japan is long-term, steady growth.
    We’ve been dedicating resources and working on more local features.
  • Rather than users, I want to highlight daily unique ‘Tweeters’ (people who tweet).
      - We’ve been working as much on adding people as on increasing the utility to those people.
      - Done this via a new mobile site matching Japanese expectations, along with email/photo posting.
    The red dot here is the ‘follow me’ feature on the mobile site. It’s not the sole cause of the uptake, but it’s helped.
    I’d like to take a moment and explain that … [next]
  • We’ve done a bunch of features on the JP mobile site (Yoshi, Sean); one of those is the ‘follow me’ flow.
    This is something that people can learn from: we took advantage of existing user behavior, even though it’s not a behavior in the US. We use the QR code.
    QR codes are big in Japan [click] … like this one on a sign. It goes to the store site.
    People are so used to this that they use it for context, like [click] these real estate listings.
    We used this existing behavior [click] to let people share their ‘contact info’ in the form of their Twitter profile.
      - Like ‘Bump’ on the iPhone, but it works on all handsets in Japan and is immediately evident to users.
  • Translation is a big part of what we do, and we do it a little differently.
    Like all features, we turn to users for feedback. We could have paid, and it would have been cheaper, but we would not have had community feedback.
    Crowd-source, like open source for data. We had a great group … [next]
  • Of more than 2,600 translators.
      - Soon to send out more invites. Planning to make it open to anyone later this year.
  • Twitter isn’t just 200 labels. Settings, about pages, features, etc.
    [click] And more features every day.
  • Those 2,600 translators have been so passionate it just blows me away. As of today they’ve contributed 480k translations.
  • We augmented with a wonderful group in-house (shoutout).
    Built the tool into twitter.com: provides context (see pointer) for quality, social game dynamic in jump-around prompt (see counter).
    DB backed with cache, no-deploy launching.
    Multi-level voting.
  • We’ve released translations of the most common terms on the wiki so you can use them. We want to provide even more help; let us know how. New translation UI upcoming (not of too much interest, other than more data).
  • Engineering topics. Not complete, but most i18n topics boil down to things that are easy 99% of the time and very hard 1% of the time. We’ll cover parsing tweets, counting characters, and invalid tweet text.
  • Twitter-text libs.
      - Extract, autolink.
      - Open source Ruby and Java. Also following community ports to Python and PHP (though PHP could use some love). Look forward to more.
      - Conformance data: Unicode, YAML, assurance, non-EN test cases.
    A good example of the 1% issues we handle in the libs is Japanese Tweets … [next]
  • Punctuation: \s sucks in most languages. Full-width @ and # (if you want more info on this, let me know afterward).
    No spaces between words. Turns out, we assume a lot [click].
      - http://\S+ does not work.
  • When your product is char based, it matters. Some issues are obvious, some not.
    [click] Don’t count bytes. You knew that.
    [click] Don’t count code points. That’s news to many people.
    We try to count what a person would call a char, where possible. So, we [click] use the shortest.
  • Two types of things we don’t allow: on purpose, and technical limitation.
    On purpose: BOM (not UTF-16), reserved, dir change (security; layout is not at home in a Tweet).
    Limitations of MySQL (<v6) prevent some chars (small set of Kanji, musical symbols, ancient scripts).

Chirp 2010: Twitter International Presentation Transcript

  • Twitter International
  • Twitter International by Matt Sanford
  • What’s on Tap
    Agenda: * Who am I * Some business-y talk about popularity outside of the US * Some quick notes on our translation process * Technical details on what’s hard about non-English text
    Hashtag for Questions: #chirpintl
    • Who’s this guy?
    • Twitter’s popularity outside of the US
    • Twitter’s Current & Future Translation Tools
    • Non-English Tweet Handling
    • Extraction and Auto-linking with Twitter Text
    • Character Counting
    • Invalid Tweet Text
  • Matt Sanford / @mzsanford
    Short bio slide. Helpful when it comes to Q&A time.
    • Joined Twitter from Summize (Twitter Search)
    • Worked on Search and Platform
    • Search by language, search refresh bar
    • Original OAuth implementer at Twitter
    • Now tech lead of the International team
    • Working on translation tools and non-US features
    • Standardized character counting
    • Author of Open Source Twitter Text libraries
  • International Business. Why Bother With International?
    Before I cover any technical details I wanted to give a little information on why people using the Twitter Platform should be interested in International. The best way to do that is with numbers …
  • International: 60% & Growing
    [Chart: y-axis 0% to 100%; x-axis June 2009, September 2009, December 2009, March 2010]
    Bam. 60% of all Twitter accounts are non-US … We crossed the 50% mark in September of 2009.
    A big part of this is Japan, where we’re quite popular. Another big part was the new translation efforts we launched. Spanish especially has been well received.
  • Attendees vs. Users
    [Two pie charts: Chirp Attendees (US 83%, Non-US 17%) and Twitter Accounts (US, International)]
  • Case Study: Chile. We’re There When People Need Us.
    A good example of Twitter International is Chile. Translating didn’t create an explosion in Twitter usage. What created an explosion was a need for faster information.
  • Twitter Signups in Chile. We’re There When People Need Us.
    [Chart: x-axis February 21st, February 24th, February 27th, March 2nd]
    Tweet: “URGENTE en Constitución apareció IVAN LARA DE 8 AÑOS QUE ESTÁ ABANDONADO en esa ciudad...busca parientes en todo Chile favor copiar y pegar” (10:50 AM Mar 2nd via web)
    Translation: “Urgent. In Constitucion an eight-year old boy named Ivan Lara showed up alone. He’s looking for his family”
  • Case Study: Japan. Not Godzilla Big, But We’re Working On It.
    As opposed to the event inflection we saw in Chile, in Japan we’ve seen long-term, sustained growth. We’ve also been dedicating resources to some local-specific features.
  • Daily Tweeters in Japan. More Users Are Good. More Engaged Users Are Better.
    [Chart: x-axis July ‘09, October ‘09, January ‘10, April ‘10]
  • Japanese Mobile Follow Me. Take Advantage of Existing Behavior.
    Photos: flickr.com/cogdog, flickr.com/netwalkerz
  • Translation Tool: Present & Future
    Since translation is a big part of what we’re working on I want to cover that a little bit. Like all Twitter features we rely on user need to help define what we do. We could have paid translators but we felt like having users participate in the process was important. That led us to our current crowd-sourcing model …
  • 2,600 Participating Translators
    And we plan to more than double that number very soon when we send out more invites.
  • 3,500 Strings to Translate
  • 3,600 Strings to Translate
  • 480,000 Translations Staggering passion and participation from the community.
  • Translation Tools today
    On context: Point out the arrow versus the list-view of other sites. Also: suggestions. On deploy: unaided.
    Post-slide note: We’ll be rolling out changes very soon that focus on consensus over new translations.
    • Volunteer crowd-sourcing
    • Augmented by in-house people
    • Built-in to twitter.com
    • Provides context during translation
    • Significantly higher quality
    • Social game dynamics
    • Database backed and heavily cached
    • Edits are launched in ~2 hours
    • Multiple levels of voting
    • Helps prevent abuse
    • Built-in proofing system
  • Translation Tools tomorrow
    On releasing translations: We made this a goal and covered it in the translation agreement. Let me know after this talk what would help you.
    • We’ve released some common terms on the API wiki
    • So you can benefit from our translation work
    • To help with consistency across clients
    • We’re hoping to provide even more data in the future
    • More languages. More strings. More ease.
    • New translation UI changes coming soon
  • Engineering Topics. Yeah, It’s Complicated.
    Up until now we’ve covered more general Twitter topics. Now we’re going to talk about some of the more complicated topics. Most international issues boil down to things you think are simple turning out to be deceptively hard to get right. Things like:
    * Parsing tweets (and what’s so hard about it)
    * Counting characters (and why it’s not that simple)
    * Tweet text that we cannot accept (today)
  • Twitter Text Libraries
    Rather than re-implement these common features we recommend using the Open Source libraries we help maintain.
    If you’re not using Ruby or Java: We provide a cross-language test suite so you can implement the same rules in another language.
    • Provides extraction and auto-linking
    • @user, @user/list, #hashtag, URLs
    • Open Source*
    • Available in Ruby and Java from Twitter
    • Conformance Testing Data
    • Modeled after the Unicode conformance suite
    • YAML description of test cases for any language
    • Assurance that you meet the same standards
    • Many non-English test cases
    * http://twitter.com/about/opensource and on github
  • Twitter Text: Japanese Linking
    Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle. The lack of word spaces is a fundamental issue when it comes to parsing Tweets.
    Issues not encountered in English:
    • Additional punctuation characters
    • \s in many languages ignores U+3000 (‘　’)
    • Full-width punctuation forms:
    • @ versus ＠
    • # versus ＃
    • No spaces between words
    My homepage is http://twitter.com http://twitter.com
  • Character counting. Unicode FTW!
    Don’t count bytes. U+5473:
    • UTF-8: 0xE5 0x91 0xB3 (3 bytes)
    • UTF-16: 0x54 0x73 (2 bytes)
    • Human: 1 character
    Don’t even count Unicode code points. e (U+0065) + combining acute accent (U+0301) = é:
    • {U+0065, U+0301} OR
    • é U+00E9
    We count the shortest representation*
    * Unicode NFC form. See: http://unicode.org/reports/tr15/
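
The counting rule on this slide can be sketched in a few lines of Python using the standard `unicodedata` module. The function name `tweet_length` is an assumption for illustration, not Twitter's code. The sketch counts code points after NFC normalization, which matches the slide's "shortest representation" but is still not a full count of user-perceived characters, since some combining sequences have no precomposed form.

```python
import unicodedata


def tweet_length(text: str) -> int:
    """Count code points after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))


kanji = "\u5473"          # the U+5473 example from the slide
decomposed = "e\u0301"    # e + combining acute accent
precomposed = "\u00e9"    # é as a single code point

print(len(kanji.encode("utf-8")))      # byte count in UTF-8
print(len(kanji.encode("utf-16-be")))  # byte count in UTF-16 (no BOM)
print(len(decomposed))                 # raw code point count
print(tweet_length(decomposed), tweet_length(precomposed))
```

Both spellings of é count the same after normalization, so a user is not charged an extra character because of how their keyboard or client composed the text.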
  • Invalid Tweet Text. For a variety of reasons.
    Slide on characters that Twitter does not allow in a Tweet. We purposely disallow those that have no meaning in the context of a Tweet, or that have security implications. We also have a technical limitation in MySQL that disallows certain characters. It’s fixed in MySQL 6 but we’ll be moving to Cassandra.
    Disallowed on Purpose
    • Byte order Marks (not needed since we only accept UTF-8): U+FFFE & U+FEFF
    • Reserved Unicode Special: U+FFFF
    • Directional Change Characters (they allow complicated phishing attacks)*: U+202A, U+202B, U+202C, U+202D & U+202E
    Disallowed Due to Technical Limitations
    • Characters outside of the Basic Multilingual Plane (BMP)
    • That means all Unicode code points above U+FFFF
    • Some Unicode 5 Kanji, many ancient writing systems and things like musical symbols.
    • We’re actively working on the move from MySQL to Cassandra, which will solve this.
    * Unicode Security Considerations: http://www.unicode.org/reports/tr36
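
A client could pre-check text against the two lists on this slide before posting. The following Python sketch uses only the code points named on the slide; the function name, the return shape, and the reason strings are assumptions for illustration and do not describe Twitter's server-side validation.

```python
# Code points listed on the slide as disallowed on purpose.
DISALLOWED_ON_PURPOSE = {
    0xFFFE, 0xFEFF,                          # byte order marks
    0xFFFF,                                  # reserved Unicode special
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # directional change characters
}


def invalid_code_points(text):
    """Return (index, 'U+XXXX', reason) for each character the slide rules out."""
    problems = []
    for index, char in enumerate(text):
        code_point = ord(char)
        if code_point in DISALLOWED_ON_PURPOSE:
            problems.append((index, f"U+{code_point:04X}", "disallowed on purpose"))
        elif code_point > 0xFFFF:
            # Outside the Basic Multilingual Plane: the technical limitation.
            problems.append((index, f"U+{code_point:04X}", "outside the BMP"))
    return problems
```

Reporting the index and code point, rather than a bare true/false, lets a client point the user at the offending character instead of rejecting the whole Tweet without explanation.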
  • Questions & Answers Here To Help.