SlideShare a Scribd company logo
1 of 54
Parlez-vous zwo-null, señor?
Providing Cross-National and Multi-Lingual
Web Communities


                                                October 22th, 2008
                                                    Andreas Ravn
                                              Senior Software Engineer
                                             namics (deutschland) gmbh
                                                              Hamburg




   1
Worth ladies and Mr.,
   I be pleased that nevertheless
or other one grants too to this early
that away was not afraid and to this
lecture with the title parlez vous two
      zero senor to appear here.



  3
With one another will we in
    the next grants a hopefully
completely stimulating trip the world
     from chances and tücken
  multilingual Web of communities
   to undertake and now geht' s
      also equivalent loosely.



   4
Parlez-vous zwo-null, señor?
Providing Cross-National and Multi-Lingual
Web Communities


                                                October 22th, 2008
                                                    Andreas Ravn
                                              Senior Software Engineer
                                             namics (deutschland) gmbh
                                                              Hamburg




   5
Web 1.0




   6
Web 1.0

»   Content centrally created by editorial staff
»   Internal Quality Assurance for content
»   Content inheritance
»   Control over
    – website structure
    – Languages used
    – Localized content




     7
8
9
10
Web 2.0 applications (and the like)

»   social networks
»   communities
»   content portals
»   grassroot
»   aggregation sites
    (lists, social
    bookmarking)
»   crowdsourcing sites
»   discussion groups
»   weblogs
»   wikis
»   …
    13
In Web 2.0

» User generated content
   –    content elements
   –    reviews
   –    comments
   –    profiles
   –    discussions


» User generated structure
   –    tags
   –    lists
   –    links
   –    rating, reputation


   14
Where does your content come from?




                                           wikis
                                                         Content portals
       weblogs         Social networks

                                                                           forums
                  aggregation                      Communities

 Centrally generated             content and structure               User-generated




  15
Web 2.0

» Provider holds less control over
   – Content creation
   – Categorization of content
   – Quality of content

» How to organize a multi-lingual community?
» How to organize a community that has members in
   different countries?




   16
Why bother?




 17
Why mix content and users in the first place?

» more members = more content, more exchange
» maybe higher quality level
» community model may require multi-country context (or
   benefit from)
» Yes, we’re international!




   18
Now what‘s so difficult about this?




  19
« Localization
     is a technical problem
           and will be solved
           by my developers
                in no time. »
                              — a customer
                (localization not online yet)




20
Localization issues.

»   Technical
»   Content
»   Organizational
»   Cultural
»   Marketing




    21
Issue: Pinpoint the user.

» Where does my user come from and what languages
   does he speak?
    –   browser language
    –   chosen domain
    –   IP address tables
    –   profiles
» If wrong, the user knows better, and can configure.




   22
Issue: Encoding, character sets, codepages

» Special characters
   – iso-8859-1 = latin
        iso-8859-5 = cyrillic
        iso-8859-9 = greek
        …
» Solution (mostly): UTF-8
   – Now standard; generally not a
        problem anymore
   – Check software components




   23
Issue: Content and structure

» User Generated Content, sure.
» Structure: e.g. tags




   24
artist bad email gift
 handy schmuck fick
     pain   chef garage mist



25
Can we translate user generated content?

» Automatically
   – Translation services exist
   – e.g. Google Language API (translation, language
        detection)
   – Problems:
         – General quality
         – special vocabulary
         – fixed terms, technical and functional terms
         – UGC: You speak on behalf of the user!
              – bad translation reduces the author’s credibility




   26
Can we translate user generated content? (2)

» Automatically
» Editorial
   – Lots of work!
   – Featured user articles, focus on „best rated“
» Community
   – Involve community
   – Reward advertising




   28
Localization Is Not Just Translation

» Simpler aspects: Date/Time Formats, Timezones,
   Currencies, Temperature, Measures
» Work & research: Audio, video
» Culture: Meaning, customs and taste (e.g. image types,
   colors) vary in in different countries
» Don‘t forget: legislative provisions, etc.




    29
Issue: Searching (Full Text Search)

» What language are the search terms in?
   – Use context (language, profile, country) or language
        detection
   – Spelling
» How to display the result?
   – Tokenizing, stemming, similarity/proximity
   – Result set: what to show? Mix languages?
   – Result structure: relevance!




   30
More issues.

» Geocoding: different standards
» Administration: e.g. user support




   32
Some Approaches.

Lösungsvorschlag.




          14. Juli 2006
     Vorname Name, Funktion



    33
Approach 1
„Web 1.0“




  34
1. „Web1.0“.

» How it works.
   – Separate communities for every language and/or country.
» Advantages
   – No localization problems
» Disadvantages
   – No intl. web community at all
   – Redundancy
   – Critical user mass for each community




   35
Approach 2
     „Laissez-faire“




36
2. Laissez-faire.

» How it works.
   – The user is free to use any language he likes.
   – The frontend is localized, thus giving the impression of a
        “web 1.0” site.
» Advantages
   – Easy to implement
   – Better language level in the individual language
» Disadvantages
   – Navigation, search, categorization diffuse – result quality
        suffers




   37
Approach 3
„Common Ground“




  38
3. „Common Ground“

» How it works
   – Offer the community platform only in the “lingua franca”
» Advantages
   – Includes most of your users.
» Disadvantages
   – Most common language still leaves out large relevant
        groups (e.g. French also a large community)
   – Language level altogether lower (in foreign language)




   39
Approach 4
„S.O.D.“




  40
4. „S.O.D.“

» How it works.
   – Provider chooses the language to be used.
   – Frontend is available only in this language
» Advantages
   – Easy to implement
» Disadvantages
   – Foreign users are not encouraged to contribute content




   41
Approach 5
„Babelfish“




  42
5. „Babelfish“

» How it works.
   – Contributors use their own language to enter content
   – All content is presented in the user’s own language
» Advantages
   – Maximum coverage of the community
   – No need to learn Esperanto or Volapük! ;-)
» Disadvantages
   – Automatic translation necessary. Quality issues.
   – Cultural problems




   43
What Others Do




  45
How much localization
do you need?




 46
47
http://der-mo.net/relationBrowser
             48
How much localization do you need?

» Check against your community needs
   – e.g.: local.ch in Switzerland
   – covers all of Switzerland, but
        – CH has strict(-ish) regional language zones
        – services are used mostly locally




   49
Controlled vocabulary




 50
Prefer „controlled vocabulary“.

» Structured vs. unstructured data
» Avoid free text (where you can):
   – Text analysis is complicated and still nonsatisfying
» What are your „core assets“?
» What data can be made structured?
   – data that is language-neutral (e.g. Names)
   – Prefer pre-defined options to free text
         – naturally pre-defined (e.g. male/female)
         – definable by yourself (e.g. categorizations at ebay)
   – easily translatable and probably already available data
        (e.g. City names)




   51
What data to show to the user?

» Be consistent.
» A good practice:
   – show all structured data in a translated form (e.g. in a
        user profile)
   – Let user enter his spoken languages in his profile (and
        show those)
   – offer to present other-language data
         – either in native language
         – or translated („Would you like to see…“)
         – shows awareness of multi-language problem




   53
Gentle integration of other language/country
content
» Priorize for relevancy (once again)
» You may offer automatic translations…
   – … but ask before doing so…
   – … and clearly label an automatic translation.
   – Check quality. Allow user feedback.
» Decide according to your community model, if country
   content can be mixed
   – also cultural issues




   54
Internationalization from the beginning


 » Every bit of content has an origin
        – For each tiny content element, save the language and
          region.
 » Some software systems offer multilanguage support
        – e.g. Drupal: user indicates his home country, the primary
          language, other languages he speaks
        – Automatic composition of regionalized pages
 » Use language-neutral page templating
        – Technical issue: Separate user interface and code, keep
          page templates language-neutral




   55
Translation Files




   MSG_NEW_MAIL_ALERT = „Good
   morning, {USER_FIRST_NAME}.
   You have {NUM_NEW_MAILS}
   new mails.“



   56
Grammar and special formats




 {USER_NAME} {START_DATE} {END_DATE}
 {TXT_IS_ON_VACATION}




  57
„Andreas Ravn from 18.10.2008 to
21.10.2008 is on vacation.“




 58
Cultural issues




    Welcome, Andreas!
   Welcome, Mr. Ravn!



   59
So what should I do?




   60
Thank you.                          Andreas Ravn

Danke.
Mahalo.
Merci villmah.                      namics (deutschland) gmbh
Ookini arigatou.                    andreas.ravn@namics.com
                                    http://www.namics.com
Néá'êshemeno.
Wokol a wala.
Dêkuji.
Bedankt.
Paljon kiitoksia.
Gracias.
Nagyon köszönöm.
Qujanarssuaq.
Rakux u kapamaxemaxes namen dimo.




  61

More Related Content

Similar to Parlez-vous zwei-null, señor?

APIs and SDKs: Breaking Into and Succeeding in a Specialty Market
APIs and SDKs: Breaking Into and Succeeding in a Specialty MarketAPIs and SDKs: Breaking Into and Succeeding in a Specialty Market
APIs and SDKs: Breaking Into and Succeeding in a Specialty MarketScott Abel
 
Northern Arizona State ACM talk (10/08)
Northern Arizona State ACM talk (10/08)Northern Arizona State ACM talk (10/08)
Northern Arizona State ACM talk (10/08)Joshua Drake
 
More than a 1000 words
More than a 1000 wordsMore than a 1000 words
More than a 1000 wordsTimothy Kunau
 
ILUG 2008 Templates, Templates Everywhere
ILUG 2008 Templates, Templates EverywhereILUG 2008 Templates, Templates Everywhere
ILUG 2008 Templates, Templates EverywhereKevin Pettitt
 
Building Scalable Backends with Go
Building Scalable Backends with GoBuilding Scalable Backends with Go
Building Scalable Backends with GoShiju Varghese
 
From Inspiration to Activation: Making Online Collaborative Communities Work
From Inspiration to Activation: Making Online Collaborative Communities WorkFrom Inspiration to Activation: Making Online Collaborative Communities Work
From Inspiration to Activation: Making Online Collaborative Communities WorkCommunitySense
 
Chapter 3_Multimedia Design.pdf
Chapter 3_Multimedia Design.pdfChapter 3_Multimedia Design.pdf
Chapter 3_Multimedia Design.pdfHeryMach1
 
TypeScript - Javascript done right
TypeScript - Javascript done rightTypeScript - Javascript done right
TypeScript - Javascript done rightWekoslav Stefanovski
 
Introducing Joost Widgets (2007 talk)
Introducing Joost Widgets (2007 talk)Introducing Joost Widgets (2007 talk)
Introducing Joost Widgets (2007 talk)Dan Brickley
 
FOSDEM 2009 Thunderbird 3 talk
FOSDEM 2009 Thunderbird 3 talkFOSDEM 2009 Thunderbird 3 talk
FOSDEM 2009 Thunderbird 3 talkdavidascher
 
Living in a multiligual world: Internationalization for Web 2.0 Applications
Living in a multiligual world: Internationalization for Web 2.0 ApplicationsLiving in a multiligual world: Internationalization for Web 2.0 Applications
Living in a multiligual world: Internationalization for Web 2.0 ApplicationsLars Trieloff
 
ALOE - Combining User Generated Content and Traditional Metadata
ALOE - Combining User Generated Content and Traditional MetadataALOE - Combining User Generated Content and Traditional Metadata
ALOE - Combining User Generated Content and Traditional MetadataMartin Memmel
 
How to get started in Open Source!
How to get started in Open Source!How to get started in Open Source!
How to get started in Open Source!Pradeep Singh
 
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...adunne
 
Domain Specific Languages
Domain Specific LanguagesDomain Specific Languages
Domain Specific LanguagesLakshan Perera
 
Evergreen Documentation Lightning Talk
Evergreen Documentation Lightning TalkEvergreen Documentation Lightning Talk
Evergreen Documentation Lightning TalkEvergreen ILS
 

Similar to Parlez-vous zwei-null, señor? (20)

APIs and SDKs: Breaking Into and Succeeding in a Specialty Market
APIs and SDKs: Breaking Into and Succeeding in a Specialty MarketAPIs and SDKs: Breaking Into and Succeeding in a Specialty Market
APIs and SDKs: Breaking Into and Succeeding in a Specialty Market
 
Northern Arizona State ACM talk (10/08)
Northern Arizona State ACM talk (10/08)Northern Arizona State ACM talk (10/08)
Northern Arizona State ACM talk (10/08)
 
Web1 2
Web1 2Web1 2
Web1 2
 
More than a 1000 words
More than a 1000 wordsMore than a 1000 words
More than a 1000 words
 
ILUG 2008 Templates, Templates Everywhere
ILUG 2008 Templates, Templates EverywhereILUG 2008 Templates, Templates Everywhere
ILUG 2008 Templates, Templates Everywhere
 
Tel Vortrag
Tel VortragTel Vortrag
Tel Vortrag
 
Building Scalable Backends with Go
Building Scalable Backends with GoBuilding Scalable Backends with Go
Building Scalable Backends with Go
 
From Inspiration to Activation: Making Online Collaborative Communities Work
From Inspiration to Activation: Making Online Collaborative Communities WorkFrom Inspiration to Activation: Making Online Collaborative Communities Work
From Inspiration to Activation: Making Online Collaborative Communities Work
 
Chapter 3_Multimedia Design.pdf
Chapter 3_Multimedia Design.pdfChapter 3_Multimedia Design.pdf
Chapter 3_Multimedia Design.pdf
 
TypeScript - Javascript done right
TypeScript - Javascript done rightTypeScript - Javascript done right
TypeScript - Javascript done right
 
Introducing Joost Widgets (2007 talk)
Introducing Joost Widgets (2007 talk)Introducing Joost Widgets (2007 talk)
Introducing Joost Widgets (2007 talk)
 
FOSDEM 2009 Thunderbird 3 talk
FOSDEM 2009 Thunderbird 3 talkFOSDEM 2009 Thunderbird 3 talk
FOSDEM 2009 Thunderbird 3 talk
 
Web 2.0
Web 2.0Web 2.0
Web 2.0
 
Web 2.0
Web 2.0Web 2.0
Web 2.0
 
Living in a multiligual world: Internationalization for Web 2.0 Applications
Living in a multiligual world: Internationalization for Web 2.0 ApplicationsLiving in a multiligual world: Internationalization for Web 2.0 Applications
Living in a multiligual world: Internationalization for Web 2.0 Applications
 
ALOE - Combining User Generated Content and Traditional Metadata
ALOE - Combining User Generated Content and Traditional MetadataALOE - Combining User Generated Content and Traditional Metadata
ALOE - Combining User Generated Content and Traditional Metadata
 
How to get started in Open Source!
How to get started in Open Source!How to get started in Open Source!
How to get started in Open Source!
 
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...
Living in a Multi-lingual World: Internationalization in Web and Desktop Appl...
 
Domain Specific Languages
Domain Specific LanguagesDomain Specific Languages
Domain Specific Languages
 
Evergreen Documentation Lightning Talk
Evergreen Documentation Lightning TalkEvergreen Documentation Lightning Talk
Evergreen Documentation Lightning Talk
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Parlez-vous zwei-null, señor?

  • 1. Parlez-vous zwo-null, señor? Providing Cross-National and Multi-Lingual Web Communities October 22th, 2008 Andreas Ravn Senior Software Engineer namics (deutschland) gmbh Hamburg 1
  • 2. Worth ladies and Mr., I be pleased that nevertheless or other one grants too to this early that away was not afraid and to this lecture with the title parlez vous two zero senor to appear here. 3
  • 3. With one another will we in the next grants a hopefully completely stimulating trip the world from chances and tücken multilingual Web of communities to undertake and now geht' s also equivalent loosely. 4
  • 4. Parlez-vous zwo-null, señor? Providing Cross-National and Multi-Lingual Web Communities October 22th, 2008 Andreas Ravn Senior Software Engineer namics (deutschland) gmbh Hamburg 5
  • 6. Web 1.0 » Content centrally created by editorial staff » Internal Quality Assurance for content » Content inheritance » Control over – website structure – Languages used – Localized content 7
  • 7. 8
  • 8. 9
  • 9. 10
  • 10. Web 2.0 applications (and the like) » social networks » communities » content portals » grassroot » aggregation sites (lists, social bookmarking) » crowdsourcing sites » discussion groups » weblogs » wikis » … 13
  • 11. In Web 2.0 » User generated content – content elements – reviews – comments – profiles – discussions » User generated structure – tags – lists – links – rating, reputation 14
  • 12. Where does your content come from? wikis Content portals weblogs Social networks forums aggregation Communities Centrally generated content and structure User-generated 15
  • 13. Web 2.0 » Provider holds less control over – Content creation – Categorization of content – Quality of content » How to organize a multi-lingual community? » How to organize a community that has members in different countries? 16
  • 15. Why mix content and users in the first place? » more members = more content, more exchange » maybe higher quality level » community model may require multi-country context (or benefit from) » Yes, we’re international! 18
  • 16. Now what‘s so difficult about this? 19
  • 17. « Localization is a technical problem and will be solved by my developers in no time. » — a customer (localization not online yet) 20
  • 18. Localization issues. » Technical » Content » Organizational » Cultural » Marketing 21
  • 19. Issue: Pinpoint the user. » Where does my user come from and what languages does he speak? – browser language – chosen domain – IP address tables – profiles » If wrong, the user knows better, and can configure. 22
  • 20. Issue: Encoding, character sets, codepages » Special characters – iso-8859-1 = latin iso-8859-5 = cyrillic iso-8859-9 = greek … » Solution (mostly): UTF-8 – Now standard; generally not a problem anymore – Check software components 23
  • 21. Issue: Content and structure » User Generated Content, sure. » Structure: e.g. tags 24
  • 22. artist bad email gift handy schmuck fick pain chef garage mist 25
  • 23. Can we translate user generated content? » Automatically – Translation services exist – e.g. Google Language API (translation, language detection) – Problems: – General quality – special vocabulary – fixed terms, technical and functional terms – UGC: You speak on behalf of the user! – bad translation reduces the author’s credibility 26
  • 24. Can we translate user generated content? (2) » Automatically » Editorial – Lots of work! – Featured user articles, focus on „best rated“ » Community – Involve community – Reward advertising 28
  • 25. Localization Is Not Just Translation » Simpler aspects: Date/Time Formats, Timezones, Currencies, Temperature, Measures » Work & research: Audio, video » Culture: Meaning, customs and taste (e.g. image types, colors) vary in in different countries » Don‘t forget: legislative provisions, etc. 29
  • 26. Issue: Searching (Full Text Search) » What language are the search terms in? – Use context (language, profile, country) or language detection – Spelling » How to display the result? – Tokenizing, stemming, similarity/proximity – Result set: what to show? Mix languages? – Result structure: relevance! 30
  • 27. More issues. » Geocoding: different standards » Administration: e.g. user support 32
  • 28. Some Approaches. Lösungsvorschlag. 14. Juli 2006 Vorname Name, Funktion 33
  • 30. 1. „Web1.0“. » How it works. – Separate communities for every language and/or country. » Advantages – No localization problems » Disadvantages – No intl. web community at all – Redundancy – Critical user mass for each community 35
  • 31. Approach 2 „Laissez-faire“ 36
  • 32. 2. Laissez-faire. » How it works. – The user is free to use any language he likes. – The frontend is localized, thus giving the impression of a “web 1.0” site. » Advantages – Easy to implement – Better language level in the individual language » Disadvantages – Navigation, search, categorization diffuse – result quality suffers 37
  • 34. 3. „Common Ground“ » How it works – Offer the community platform only in the “lingua franca” » Advantages – Includes most of your users. » Disadvantages – Most common language still leaves out large relevant groups (e.g. French also a large community) – Language level altogether lower (in foreign language) 39
  • 36. 4. „S.O.D.“ » How it works. – Provider chooses the language to be used. – Frontend is available only in this language » Advantages – Easy to implement » Disadvantages – Foreign users are not encouraged to contribute content 41
  • 38. 5. „Babelfish“ » How it works. – Contributors use their own language to enter content – All content is presented in the user’s own language » Advantages – Maximum coverage of the community – No need to learn Esperanto or Volapük! ;-) » Disadvantages – Automatic translation necessary. Quality issues. – Cultural problems 43
  • 40. How much localization do you need? 46
  • 41. 47
  • 43. How much localization do you need? » Check against your community needs – e.g.: local.ch in Switzerland – covers all of Switzerland, but – CH has strict(-ish) regional language zones – services are used mostly locally 49
  • 45. Prefer „controlled vocabulary“. » Structured vs. unstructured data » Avoid free text (where you can): – Text analysis is complicated and still nonsatisfying » What are your „core assets“? » What data can be made structured? – data that is language-neutral (e.g. Names) – Prefer pre-defined options to free text – naturally pre-defined (e.g. male/female) – definable by yourself (e.g. categorizations at ebay) – easily translatable and probably already available data (e.g. City names) 51
  • 46. What data to show to the user? » Be consistent. » A good practice: – show all structured data in a translated form (e.g. in a user profile) – Let user enter his spoken languages in his profile (and show those) – offer to present other-language data – either in native language – or translated („Would you like to see…“) – shows awareness of multi-language problem 53
  • 47. Gentle integration of other language/country content » Priorize for relevancy (once again) » You may offer automatic translations… – … but ask before doing so… – … and clearly label an automatic translation. – Check quality. Allow user feedback. » Decide according to your community model, if country content can be mixed – also cultural issues 54
  • 48. Internationalization from the beginning » Every bit of content has an origin – For each tiny content element, save the language and region. » Some software systems offer multilanguage support – e.g. Drupal: user indicates his home country, the primary language, other languages he speaks – Automatic composition of regionalized pages » Use language-neutral page templating – Technical issue: Separate user interface and code, keep page templates language-neutral 55
  • 49. Translation Files MSG_NEW_MAIL_ALERT = „Good morning, {USER_FIRST_NAME}. You have {NUM_NEW_MAILS} new mails.“ 56
  • 50. Grammar and special formats {USER_NAME} {START_DATE} {END_DATE} {TXT_IS_ON_VACATION} 57
  • 51. „Andreas Ravn from 18.10.2008 to 21.10.2008 is on vacation.“ 58
  • 52. Cultural issues Welcome, Andreas! Welcome, Mr. Ravn! 59
  • 53. So what should I do? 60
  • 54. Thank you. Andreas Ravn Danke. Mahalo. Merci villmah. namics (deutschland) gmbh Ookini arigatou. andreas.ravn@namics.com http://www.namics.com Néá'êshemeno. Wokol a wala. Dêkuji. Bedankt. Paljon kiitoksia. Gracias. Nagyon köszönöm. Qujanarssuaq. Rakux u kapamaxemaxes namen dimo. 61