Twitter International
by Matt Sanford
What’s on Tap
• Hashtag for Questions: #chirpintl
• Who’s this guy?
• Twitter’s popularity outside of the US
• Twitter’s Current & Future Translation Tools
• Non-English Tweet Handling
   • Extraction and Auto-linking with Twitter Text
   • Character Counting
   • Invalid Tweet Text

Agenda:
* Who am I
* Some business-y talk about popularity outside of the US
* Some quick notes on our translation process
* Technical details on what’s hard about non-English text
Matt Sanford / @mzsanford
• Joined Twitter from Summize (Twitter Search)
• Worked on Search and Platform
   • Search by language, search refresh bar
   • Original OAuth implementer at Twitter
• Now tech lead of the International team
   • Working on translation tools and non-US features
   • Standardized character counting
   • Author of Open Source Twitter Text libraries

Short bio slide. Helpful when it comes to Q&A time.
International Business.
Why Bother With International?

Before I cover any technical details I wanted to give a little information on why people using the Twitter Platform should be interested in International. The best way to do that is with numbers …
International: 60% & Growing
[Chart: percentage axis from 0% to 100%; time axis from June 2009 to March 2010]

Bam. 60% of all Twitter accounts are non-US … We crossed the 50% mark in September of 2009.

A big part of this is Japan, where we’re quite popular. Another big part was the new translation efforts we launched. Spanish especially has been well received.
Attendees vs. Users
[Pie charts]
Chirp Attendees: US 83%, Non-US 17%
Twitter Accounts: US, International
Case Study: Chile
We’re There When People Need Us.

A good example of Twitter International is Chile. Translating didn’t create an explosion in Twitter usage. What created an explosion was a need for faster information.
Twitter Signups in Chile
We’re There When People Need Us.
[Chart: Twitter signups in Chile; time axis: February 21st, February 24th, February 27th, March 2nd]

URGENTE en Constitución apareció IVAN LARA DE 8 AÑOS QUE ESTÁ ABANDONADO en esa ciudad...busca parientes en todo Chile favor copiar y pegar
10:50 AM Mar 2nd via web

Urgent. In Constitucion an eight-year old boy named Ivan Lara showed up alone. He's looking for his family
Case Study: Japan
Not Godzilla Big, But We’re Working On It.

As opposed to the event inflection we saw in Chile, in Japan we’ve seen long-term, sustained growth. We’ve also been dedicating resources to some local-specific features.
Daily Tweeters in Japan
More Users Are Good. More Engaged Users Are Better.
[Chart: daily Tweeters in Japan; time axis: July ‘09, October ‘09, January ‘10, April ‘10]
Japanese Mobile Follow Me
Take Advantage of Existing Behavior
Photo: flickr.com/cogdog
Photo: flickr.com/netwalkerz
Translation Tool
Present & Future

Since translation is a big part of what we’re working on, I want to cover that a little bit.

Like all Twitter features, we rely on user need to help define what we do. We could have paid translators, but we felt that having users participate in the process was important.

That led us to our current crowd-sourcing model …
2,600
Participating Translators
And we plan to more than double that number very soon when we send out more invites.

3,500
Strings to Translate
3,600
Strings to Translate

480,000
Translations
Staggering passion and participation from the community.
Translation Tools
today
• Volunteer crowd-sourcing
  • Augmented by in-house people
• Built-in to twitter.com
• Provides context during translation
  • Significantly higher quality
  • Social game dynamics
• Database backed and heavily cached
  • Edits are launched in ~2 hours
• Multiple levels of voting
  • Helps prevent abuse
  • Built-in proofing system

On context: Point out the arrow versus the list-view of other sites. Also: suggestions
On deploy: unaided
Post-slide note: We’ll be rolling out changes very soon that focus on consensus over new translations.
Translation Tools
tomorrow
• We’ve released some common terms on the API wiki
   • So you can benefit from our translation work
   • To help with consistency across clients
• We’re hoping to provide even more data in the future
   • More languages. More strings. More ease.
• New translation UI changes coming soon

On releasing translations: We made this a goal and covered it in the translation agreement. Let me know after this talk what would help you.
Engineering Topics
Yeah, It’s Complicated.

Up until now we’ve covered more general Twitter topics. Now we’re going to talk about some of the more complicated topics. Most international issues boil down to things you think are simple turning out to be deceptively hard to get right. Things like:

* Parsing Tweets (and what’s so hard about it)
* Counting characters (and why it’s not that simple)
* Tweet text that we cannot accept (today)
Twitter Text Libraries
• Provides extraction and auto-linking
   • @user, @user/list, #hashtag, URLs
• Open Source*
• Available in Ruby and Java from Twitter
• Conformance Testing Data
   • Modeled after the Unicode conformance suite
   • YAML description of test cases for any language
   • Assurance that you meet the same standards
• Many non-English test cases

* http://twitter.com/about/opensource and on github

Rather than re-implement these common features we recommend using the Open Source libraries we help maintain.

If you’re not using Ruby or Java: We provide a cross-language test suite so you can implement the same rules in another language.
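
To make extraction, auto-linking and data-driven conformance cases a bit more concrete, here is a minimal Python sketch. It is not the Twitter Text implementation: the pattern is deliberately simplified and ASCII-only, the link URLs are placeholders, and CASES is a hypothetical stand-in for the YAML conformance data with assumed field names.

```python
import html
import re

# Simplified, ASCII-only pattern (an assumption for this sketch; the real
# libraries also handle lists, URLs and non-English text).
ENTITY = re.compile(r"@(?P<user>[A-Za-z0-9_]{1,20})|#(?P<tag>[A-Za-z0-9_]+)")


def extract_mentions(text):
    """Return mentioned screen names, without the leading '@'."""
    return [m.group("user") for m in ENTITY.finditer(text) if m.group("user")]


def extract_hashtags(text):
    """Return hashtags, without the leading '#'."""
    return [m.group("tag") for m in ENTITY.finditer(text) if m.group("tag")]


def auto_link(text):
    """Escape the text, then wrap mentions and hashtags in anchor tags."""
    def link(match):
        user, tag = match.group("user"), match.group("tag")
        if user:
            return '<a href="http://twitter.com/%s">@%s</a>' % (user, user)
        # Placeholder search URL; '%%23' renders as the URL-encoded '#'.
        return '<a href="http://twitter.com/search?q=%%23%s">#%s</a>' % (tag, tag)
    # quote=False escapes only &, < and >, so no '#' is introduced by escaping.
    return ENTITY.sub(link, html.escape(text, quote=False))


# Hypothetical conformance-style cases: description, input text, expected output.
CASES = [
    {"description": "mention at end of text",
     "text": "ask @mzsanford", "expected": ["mzsanford"]},
    {"description": "no mention present",
     "text": "hello world", "expected": []},
]


def failing_cases(cases, extractor):
    """Return the descriptions of all cases the extractor gets wrong."""
    return [c["description"] for c in cases if extractor(c["text"]) != c["expected"]]
```

Keeping the cases as plain data is what lets the same suite be run against an implementation in any language: only the small runner function has to be rewritten.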
Twitter Text: Japanese Linking
Issues not encountered in English:
  • Additional punctuation characters
     • \s in many languages ignores U+3000 (‘　’, ideographic space)
     • Full-width punctuation forms:
        • @ versus ＠
        • # versus ＃
  • No spaces between words
     My homepage is http://twitter.com
     http://twitter.com

Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle.

The lack of word spaces is a fundamental issue when it comes to parsing Tweets.
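
The three issues above can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the patterns are simplified and are not the ones used by the Twitter Text libraries, and the Japanese sample strings are made up for illustration.

```python
import re

# Accept the ASCII sigil and its full-width form (U+FF20 FULLWIDTH COMMERCIAL AT).
MENTION = re.compile(r"[@\uFF20]([A-Za-z0-9_]{1,20})")

# Naive URL pattern: runs until the next whitespace character.
NAIVE_URL = re.compile(r"https?://\S+")
# Stricter pattern: stops at the first character outside a small ASCII set.
STRICT_URL = re.compile(r"https?://[A-Za-z0-9./_~%?&=#+-]+")


def split_ascii_whitespace(text):
    """Split on ASCII-only \\s, which does not treat U+3000 as whitespace."""
    return re.split(r"\s+", text, flags=re.ASCII)


def split_unicode_whitespace(text):
    """Split on Unicode-aware \\s (Python 3 default for str patterns)."""
    return re.split(r"\s+", text)


# Hypothetical sample text: "My homepage is http://twitter.com" with no spaces.
homepage = "私のホームページはhttp://twitter.comです"
print(NAIVE_URL.findall(homepage))   # the URL swallows the trailing Japanese text
print(STRICT_URL.findall(homepage))  # only the URL itself

# Two words separated by an ideographic space (U+3000).
words = "東京\u3000大阪"
print(split_ascii_whitespace(words))    # one item: the space was not recognized
print(split_unicode_whitespace(words))  # two items

# A full-width at sign directly after Japanese text, with no space before it.
print(MENTION.findall("こんにちは\uFF20mzsanfordさん"))
```

The stricter URL pattern trades completeness for safety: it will miss URLs containing characters outside its set, which is the kind of edge case a shared conformance suite is meant to pin down.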
Character counting
Unicode FTW!

Don’t count bytes
   U+5473 (味)
   UTF-8: 0xE5 0x91 0xB3 (3 bytes)
   UTF-16: 0x54 0x73 (2 bytes)
   Human: 1 character

Don’t even count Unicode code points
   e (U+0065) + combining acute accent (U+0301) = é {U+0065, U+0301}
   OR
   é (U+00E9)

We count the shortest representation*
* Unicode NFC form. See: http://unicode.org/reports/tr15/
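
The numbers on this slide can be checked with Python's standard unicodedata module. This is a sketch of the counting rule as described here (normalize to NFC, then count code points); the function name tweet_length is made up for the example, and the 140-character limit is not modeled.

```python
import unicodedata


def tweet_length(text):
    """Count code points after normalizing to Unicode NFC."""
    return len(unicodedata.normalize("NFC", text))


kanji = "\u5473"  # 味

# Byte counts depend on the encoding; neither matches what a human sees.
utf8_bytes = kanji.encode("utf-8")       # 0xE5 0x91 0xB3
utf16_bytes = kanji.encode("utf-16-be")  # 0x54 0x73 (big-endian, no byte order mark)
print(len(utf8_bytes), len(utf16_bytes), len(kanji))

# The same visible character as two code points or as one.
decomposed = "e\u0301"  # e + combining acute accent
precomposed = "\u00e9"  # é
print(len(decomposed), len(precomposed))                    # raw code point counts differ
print(tweet_length(decomposed), tweet_length(precomposed))  # NFC counts agree
```

Clients that count with a different rule than the server will disagree about whether a Tweet fits, so normalizing before counting is worth doing on both sides.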
Invalid Tweet Text
For a variety of reasons

Disallowed on Purpose
• Byte order Marks (not needed since we only accept UTF-8): U+FFFE & U+FEFF
• Reserved Unicode Special: U+FFFF
• Directional Change Characters (they allow complicated phishing attacks)*: U+202A, U+202B, U+202C, U+202D & U+202E

Disallowed Due to Technical Limitations
• Characters outside of the Basic Multilingual Plane (BMP)
   • That means all Unicode code points above U+FFFF
   • Some Unicode 5 Kanji, many ancient writing systems and things like musical symbols.
• We’re actively working on the move from MySQL to Cassandra, which will solve this.

* Unicode Security Considerations: http://www.unicode.org/reports/tr36

Slide on characters that Twitter does not allow in a Tweet.

We purposely disallow those that have no meaning in the context of a Tweet, or that have security implications.

We also have a technical limitation in MySQL that disallows certain characters. It’s fixed in MySQL 6 but we’ll be moving to Cassandra.
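
A client could pre-check text against the lists above before posting. The following Python sketch encodes only the code points named on this slide; the function names are made up for the example, and the server's actual validation may differ.

```python
# Code points listed on the slide as disallowed on purpose.
DISALLOWED_ON_PURPOSE = {
    0xFFFE, 0xFEFF,                          # byte order marks
    0xFFFF,                                  # reserved Unicode special
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # directional change characters
}

# Highest code point in the Basic Multilingual Plane.
BMP_MAX = 0xFFFF


def invalid_code_points(text):
    """Return the code points in text that the slide lists as not accepted."""
    return [
        ord(ch)
        for ch in text
        if ord(ch) in DISALLOWED_ON_PURPOSE or ord(ch) > BMP_MAX
    ]


def is_valid_tweet_text(text):
    """True when text contains none of the disallowed code points."""
    return not invalid_code_points(text)


print(is_valid_tweet_text("hello"))                # plain ASCII
print(invalid_code_points("abc\u202edef"))         # right-to-left override
print(invalid_code_points("clef: \U0001D11E"))     # musical symbol above U+FFFF
```

Returning the offending code points, rather than only a yes or no, makes it possible to tell the user which character was rejected.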
Questions & Answers
Here To Help.

Chirp 2010: Twitter International

  • 3. Agenda: * Who am I * Some business-y talk about popularity outside of the US What’s on Tap * Some quick notes on our translation process * Technical details on what’s hard about non-English text • Hashtag for Questions: #chirpintl • Who’s this guy? • Twitter’s popularity outside of the US • Twitter’s Current & Future Translation Tools • Non-English Tweet Handling • Extraction and Auto-linking with Twitter Text • Character Counting • Invalid Tweet Text
  • 4. Matt Sanford / @mzsanford • Joined Twitter from Summize (Twitter Search) • Worked on Search and Platform Short bio slide. Helpful when it comes to Q&A time. • Search by language, search refresh bar • Original OAuth implementer at Twitter • Now tech lead of the International team • Working on translation tools and non-US features • Standardized character counting • Author of Open Source Twitter Text libraries
  • 5. Before I cover any technical details I wanted to give a little information on why people using the Twitter Platform should be interested in International. The best way to do that is with numbers … International Business. Why Bother With International?
  • 6. International: 60% & Growing. [Chart: y-axis 0% to 100%; x-axis June 2009, September 2009, December 2009, March 2010] Bam. 60% of all Twitter accounts are non-US … We crossed the 50% mark in September of 2009. A big part of this is Japan, where we’re quite popular. Another big part was the new translation efforts we launched. Spanish especially has been well received.
  • 7. Attendees vs. Users. [Pie charts: Chirp Attendees (US 83%, Non-US 17%) and Twitter Accounts (US vs. International)]
  • 8. A good example of Twitter International is Chile. Translating didn’t create an explosion in Twitter usage. What created an explosion was a need for faster information. Case Study: Chile We’re There When People Need Us.
  • 9. Twitter Signups in Chile: We’re There When People Need Us. [Chart: x-axis February 21st, February 24th, February 27th, March 2nd]
  • 10. Twitter Signups in Chile: We’re There When People Need Us. [Chart: x-axis February 21st, February 24th, February 27th, March 2nd] Tweet: “URGENTE en Constitución apareció IVAN LARA DE 8 AÑOS QUE ESTÁ ABANDONADO en esa ciudad...busca parientes en todo Chile favor copiar y pegar” (10:50 AM Mar 2nd via web). Translation: “Urgent. In Constitucion an eight-year old boy named Ivan Lara showed up alone. He's looking for his family”
  • 11. As opposed to the event inflection we saw in Chile, in Japan we’ve seen long term, sustained growth. We’ve also been dedicating resources to some local-specific features. Case Study: Japan. Not Godzilla Big, But We’re Working On It.
  • 12. Daily Tweeters in Japan: More Users Are Good. More Engaged Users Are Better. [Chart: x-axis July ‘09, October ‘09, January ‘10, April ‘10]
  • 13. Japanese Mobile Follow Me Take Advantage of Existing Behavior
  • 14. Japanese Mobile Follow Me Take Advantage of Existing Behavior Photo: flickr.com/cogdog
  • 15. Japanese Mobile Follow Me Take Advantage of Existing Behavior Photo: flickr.com/cogdog Photo: flickr.com/netwalkerz
  • 17. Translation Tool: Present & Future. Since translation is a big part of what we’re working on I want to cover that a little bit. Like all Twitter features we rely on user need to help define what we do. We could have paid translators but we felt like having users participate in the process was important. That led us to our current crowd-sourcing model …
  • 18. 2,600 Participating Translators. And we plan to more than double that number very soon when we send out more invites.
  • 21. 480,000 Translations Staggering passion and participation from the community.
  • 22. Translation Tools today. On context: Point out the arrow versus the list-view of other sites. Also: suggestions. On deploy: unaided. Post-slide note: We’ll be rolling out changes very soon that focus on consensus over new translations. • Volunteer crowd-sourcing • Augmented by in-house people • Built-in to twitter.com • Provides context during translation • Significantly higher quality • Social game dynamics • Database backed and heavily cached • Edits are launched in ~2 hours • Multiple levels of voting • Helps prevent abuse • Built-in proofing system
  • 23. Translation Tools tomorrow • We’ve released some common terms on the API wiki • So you can benefit from our translation work • To help with consistency across clients • We’re hoping to provide even more data in the future • More languages. More strings. More ease. • New translation UI changes coming soon On releasing translations: We made this a goal and covered it in the translation agreement. Let me know after this talk what would help you.
  • 24. Engineering Topics: Yeah, It’s Complicated. Up until now we’ve covered more general Twitter topics. Now we’re going to talk about some of the more complicated topics. Most international issues boil down to things you think are simple turning out to be deceptively hard to get right. Things like: * Parsing Tweets (and what’s so hard about it) * Counting characters (and why it’s not that simple) * Tweet text that we cannot accept (today)
  • 25. Twitter Text Libraries. Rather than re-implement these common features we recommend using the Open Source libraries we help maintain. If you’re not using Ruby or Java: We provide a cross-language test suite so you can implement the same rules in another language. • Provides extraction and auto-linking • @user, @user/list, #hashtag, URLs • Open Source* • Available in Ruby and Java from Twitter • Conformance Testing Data • Modeled after the Unicode conformance suite • YAML description of test cases for any language • Assurance that you meet the same standards • Many non-English test cases * http://twitter.com/about/opensource and on github
  • 26. Twitter Text: Japanese Linking. Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle. The lack of word spaces is a fundamental issue when it comes to parsing Tweets. Issues not encountered in English: • Additional punctuation characters • \s in many languages ignores U+3000 (‘　’) • Full-width punctuation forms: • @ versus ＠ • # versus ＃ • No spaces between words
  • 27. Twitter Text: Japanese Linking. Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle. The lack of word spaces is a fundamental issue when it comes to parsing Tweets. Issues not encountered in English: • Additional punctuation characters • \s in many languages ignores U+3000 (‘　’) • Full-width punctuation forms: • @ versus ＠ • # versus ＃ • No spaces between words • My homepage is http://twitter.com http://twitter.com
  • 29. Character counting: Unicode FTW! Don’t count bytes. 味 (U+5473) is UTF-8: 0xE5 0x91 0xB3 (3 bytes); UTF-16: 0x54 0x73 (2 bytes); Human: 1 character.
  • 30. Character counting: Unicode FTW! Don’t count bytes. 味 (U+5473) is UTF-8: 0xE5 0x91 0xB3 (3 bytes); UTF-16: 0x54 0x73 (2 bytes); Human: 1 character. Don’t even count Unicode code points: é is either e (U+0065) + combining acute accent (U+0301), i.e. {U+0065, U+0301}, OR the single code point é (U+00E9).
  • 31. Character counting: Unicode FTW! Don’t count bytes. 味 (U+5473) is UTF-8: 0xE5 0x91 0xB3 (3 bytes); UTF-16: 0x54 0x73 (2 bytes); Human: 1 character. Don’t even count Unicode code points: é is either e (U+0065) + combining acute accent (U+0301), i.e. {U+0065, U+0301}, OR the single code point é (U+00E9). We count the shortest representation* * Unicode NFC form. See: http://unicode.org/reports/tr15/
  • 32. Invalid Tweet Text: For a variety of reasons. Slide on characters that Twitter does not allow in a Tweet. We purposely disallow those that have no meaning in the context of a Tweet, or that have security implications. We also have a technical limitation in MySQL that disallows certain characters. It’s fixed in MySQL 6 but we’ll be moving to Cassandra. Disallowed on Purpose • Byte order Marks (not needed since we only accept UTF-8): U+FFFE & U+FEFF • Reserved Unicode Special: U+FFFF • Directional Change Characters (they allow complicated phishing attacks)*: U+202A, U+202B, U+202C, U+202D & U+202E Disallowed Due to Technical Limitations • Characters outside of the Basic Multilingual Plane (BMP) • That means all Unicode code points above U+FFFF • Some Unicode 5 Kanji, many ancient writing systems and things like musical symbols. • We’re actively working on the move from MySQL to Cassandra, which will solve this. * Unicode Security Considerations: http://www.unicode.org/reports/tr36

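The character-counting rule on slides 29 to 31 (count neither bytes nor raw code points, count the NFC form) can be sketched with Python's standard unicodedata module. The helper name tweet_length is hypothetical, and the sketch only shows the normalization step the slide describes, not Twitter's full counting logic.

```python
import unicodedata


def tweet_length(text: str) -> int:
    """Count code points after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))


kanji = "\u5473"                         # the U+5473 example from the slide
utf8_bytes = kanji.encode("utf-8")       # 3 bytes
utf16_bytes = kanji.encode("utf-16-be")  # 2 bytes, big-endian, no BOM

decomposed = "e\u0301"                   # e + combining acute accent
composed = "\u00e9"                      # precomposed é

print(len(utf8_bytes), len(utf16_bytes), len(kanji))
print(len(decomposed), len(composed))
print(tweet_length(decomposed), tweet_length(composed))
```

Both spellings of é count as one character after normalization, which is what "the shortest representation" means on the slide.
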
Editor's Notes

  1. Holding pattern.
  2. Title Slide. • “let’s get to it”
  3. • Who am I? • Some business-y talk for the entrepreneurs in the group • Notes on how we’ve gone about translation • Engineering challenges for the coders in the group
  4. • From Summize • Search, platform (might remember API Group) • Original OAuth (sorry) • International (translation, char counting, twitter-text)
  5. • Before the technical stuff a little info on why you should be interested. • the main reason: Users
  6. • You might have seen the blog post on international growth. • Passed 50% not long after the team formed • In large part: Japan, translation
  7. • Take advantage of these markets. • Due to a slew of factors dev is mainly US (Twitter, EN, etc) but that does not mean it’s not looking outward
  8. • Case in point, Chile • Translating alone was not a big jump • But we had set the stage. When the need for faster information arose we were there
  9. • You can see the Earthquake effect clearly. Not shown here is that signups have remained higher than pre-quake levels. • What’s great isn’t the users, but the utility [click] • This tweet for example. • It’s not what someone had for breakfast, but solving a real communication problem.
  11. • Unlike the event-driven growth in Chile, Japan is a long-term, steady growth • We’ve been dedicating resources and working on more local features
  12. • Rather than users I want to highlight daily unique ‘Tweeters’ (people who tweet) - We’ve been working as much on adding people as increasing the utility to those people - Done this via a new mobile site matching Japanese expectations, along with email/photoposting • The red dot here is the ‘follow me’ feature on the mobile site. It’s not the sole cause of the uptake but it’s helped. • I’d like to take a moment and explain that … [next]
  13. • We’ve done a bunch of features on the JP mobile site (Yoshi, Sean), one of those is the ‘follow me’ flow. • This is something that people can learn from: We took advantage of existing user behavior, even though it’s not a behavior in the US. We use the QR-code. • QR-codes are big in Japan [click] … like this one on a sign. Goes to the store site • People are so used to this they use it for context like [click] these real estate listings • We used this existing behavior [click] to let people share their ‘contact info’ in the form of their twitter profile. - Like ‘Bump’ on the iPhone but it works on all handsets in Japan and is immediately evident to users.
  18. • Translation is a big part of what we do, and we do it a little differently • Like all features we turn to users for feedback. Could have paid, would have been cheaper, but would not have had community feedback • Crowd-source, like open source for data. We had a great group … [next]
  19. Of more than 2,600 translators. - Soon to send out more invites. Planning to make it open to anyone later this year.
  20. Twitter isn’t just 200 labels. Settings, about pages, features, etc. [click] and more features every day.
  22. Those 2,600 translators have been so passionate it just blows me away. As of today they’ve contributed 480k translations.
  23. • We augmented with a wonderful group in-house (shoutout) • Built the tool into twitter.com, provides context (see pointer) for quality, social game dynamic in jump-around prompt (see counter) • DB backed with cache, no-deploy launching. • Multi-level voting
  24. We’ve released translations of the most common terms on the wiki so you can use them. We want to provide even more help, let us know how. New translation UI upcoming (not of too much interest, other than more data)
  25. Engineering topics. Not complete but most i18n topics boil down to things that are easy 99% of the time and very hard 1% of the time. We’ll cover parsing tweets, counting characters, and invalid tweet text
  26. Twitter-text libs. - Extract, autolink - Open Source Ruby and Java. Also following community ports to Python and PHP (though PHP could use some love). Look forward to more. - Conformance data: Unicode, YAML, assurance, non-EN test cases • A good example of the 1% issues we handle in the libs are Japanese Tweets …[next]
  27. • Punctuation: \s sucks in most languages. Full-width @ and # (if you want more info on this let me know afterward.) • No spaces between words. Turns out, we assume a lot [click] - http://\S+ does not work.
  28. • When your product is char based, it matters. Some issues are obvious, some not. • [click] Don’t count bytes. You knew that. • [click] Don’t count code points. That’s news to many people. • We try to count what a person would call a char, where possible. So, we [click] use the shortest.
  45. • Two types of things we don’t allow. On purpose, technical limitation. • On Purpose: BOM (not utf-16), reserved, dir change (security, layout is not at home in a Tweet) • Limitations of MySQL (<v6) prevent some chars. (small set of Kanji, musical symbols, ancient scripts)
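
The two categories of rejected characters (disallowed on purpose, and anything outside the BMP because of the MySQL limitation) can be checked with a small Python sketch. The function name find_invalid and its return format are hypothetical; the code point list is taken from the slide and reflects the rules as described in this talk, not necessarily current behavior.

```python
from typing import List

# Code points the slide lists as disallowed on purpose:
# byte order marks, the reserved special U+FFFF, and directional change characters.
DISALLOWED = {0xFFFE, 0xFEFF, 0xFFFF, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E}


def find_invalid(text: str) -> List[str]:
    """Return offending code points as 'U+XXXX' strings (hypothetical helper)."""
    bad = []
    for ch in text:
        cp = ord(ch)
        # cp > 0xFFFF means the character lies outside the Basic Multilingual Plane.
        if cp in DISALLOWED or cp > 0xFFFF:
            bad.append(f"U+{cp:04X}")
    return bad


print(find_invalid("hello"))
print(find_invalid("a\u202eb"))      # right-to-left override
print(find_invalid("\U0001D11E"))    # musical symbol G clef, outside the BMP
```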