Behavior-driven Machine Translation at eBay
Asim Mathur, Irina Borisova
Outline
• Intro
o Why Is eBay Investing in Language Technology?
o Machine Translation Experience at eBay
o Key Data Challenges
• Machine Translation Training Process
o Data Selection
o Evaluation
• Measuring Language Performance at Scale
Why Is eBay Investing in Language Technology?
E-Commerce Growth by Region
Source: Forrester Research
Why Is Machine Translation Important for eBay?
• Cross-border trade is growing twice as fast as domestic trade!
• It’s already big: almost 25% of eBay Inc. business
• 61% of eBay GMV (gross merchandise volume) is international
Static Content: translated by the localization team
Dynamic Content: requires machine translation
Inventory eligible for the Russian market: 60M listings
Average # of characters per listing: 3,000
Sentence duplication: 50%
# of human translators: 1,000
It would take more than 5 years!
And this is for one language pair only!
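The “more than 5 years” estimate can be sanity-checked with a back-of-the-envelope calculation. This Python sketch uses the slide’s inventory numbers; the throughput figures (6 characters per word including the space, 2,500 words per translator per day, 250 working days per year) are assumptions, not values from the deck, and the 50% duplication is assumed to remove half of all characters.

```python
def translator_years(listings: int, chars_per_listing: int, duplication: float,
                     translators: int, chars_per_word: float = 6.0,
                     words_per_translator_day: int = 2_500,
                     working_days_per_year: int = 250) -> float:
    """Estimate how many years a human translation team needs for the inventory.

    chars_per_word, words_per_translator_day and working_days_per_year are
    assumed throughput figures, not numbers from the slide.
    """
    unique_chars = listings * chars_per_listing * (1 - duplication)  # drop duplicated sentences
    words = unique_chars / chars_per_word
    team_days = words / (translators * words_per_translator_day)
    return team_days / working_days_per_year


years = translator_years(listings=60_000_000, chars_per_listing=3_000,
                         duplication=0.5, translators=1_000)
print(f"{years:.1f} years")  # 24.0 years under these assumptions
```

Even doubling the assumed throughput to 5,000 words per day leaves about 12 years, so the slide’s lower bound holds comfortably, and only for one language pair.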
Solution: Statistical Machine Translation
• Statistical machine translation started about 20 years ago and is now very competitive
• It aims to teach a machine to translate from one language to another using examples of human-translated documents
Diagram: Training (Data → Model); Translation (Source sentence → MT engine → Translated sentence)
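To make “learning from human-translated examples” concrete, here is a minimal Python sketch of IBM Model 1, the classic expectation-maximization algorithm that estimates word-translation probabilities t(e|f) from sentence pairs alone. It is a textbook illustration (no NULL word, no phrases, no language model), not a description of eBay’s engine; the three-pair German-English corpus is the standard toy example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def train_ibm_model1(corpus: List[Tuple[List[str], List[str]]],
                     iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t(e|f) for (foreign, english) sentence pairs with IBM Model 1 EM."""
    english_vocab = {e for _, es in corpus for e in es}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)  # start with a uniform t(e|f)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for fs, es in corpus:
            for e in es:
                # E-step: share e's alignment mass among the foreign words of its sentence
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return dict(t)


def best_translation(t: Dict[Tuple[str, str], float], f: str) -> str:
    """Most probable English word for the foreign word f."""
    candidates = {e: p for (e, g), p in t.items() if g == f}
    return max(candidates, key=candidates.get)


corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm_model1(corpus)
for f in ["das", "haus", "buch", "ein"]:
    print(f, "->", best_translation(t, f))
```

Because “das” co-occurs with “the” in every pair that contains it, EM shifts probability mass toward t(the|das) on each iteration. Phrase-based toolkits such as Moses start from word alignments produced by models of this family.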
Machine Translation Experience at eBay
Translation Flow
1. A user issues a query in a foreign language
2. The engine translates the foreign query into English: “lunettes de soleil homme” → “men’s sunglasses”
3. The translated query (“men’s sunglasses”) is issued against the eBay English search engine to retrieve English inventory
4. The engine translates the English inventory into the foreign language
5. The translated inventory is served to the user
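The five steps can be sketched as a small pipeline. Everything here is hypothetical: `toy_translate` is an exact-match phrase lookup standing in for the MT engine, `toy_search` is an in-memory token match standing in for the English search engine, and the inventory titles are invented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the MT engine and the English search index.
PHRASE_TABLE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("fr", "en"): {"lunettes de soleil homme": "men's sunglasses"},
    ("en", "fr"): {"Ray-Ban men's sunglasses": "Lunettes de soleil homme Ray-Ban"},
}
ENGLISH_INVENTORY = ["Ray-Ban men's sunglasses", "Women's sunglasses", "Men's watch"]


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Exact-match lookup; unknown text passes through unchanged."""
    return PHRASE_TABLE.get((src, tgt), {}).get(text, text)


def toy_search(english_query: str) -> List[str]:
    """Return titles containing every query token (whole tokens, case-insensitive)."""
    terms = english_query.lower().split()
    return [t for t in ENGLISH_INVENTORY if all(w in t.lower().split() for w in terms)]


def cross_border_search(query: str, user_lang: str,
                        translate: Callable[[str, str, str], str] = toy_translate,
                        search: Callable[[str], List[str]] = toy_search) -> List[str]:
    english_query = translate(query, user_lang, "en")   # step 2: query -> English
    english_titles = search(english_query)              # step 3: search English inventory
    # steps 4-5: translate each result back and serve it
    return [translate(t, "en", user_lang) for t in english_titles]


print(cross_border_search("lunettes de soleil homme", "fr"))  # step 1: the user's query
```

Whole-token matching matters even in a toy: a substring test would let “men's” match “Women's”.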
Machine Translation Experience at eBay
Types of MT at eBay:
• Search query translation
• Item title translation
• Item descriptions (planned)
• Member-to-member communication (planned)
Supported languages:
• Russian
• German
• Spanish
• Italian
• Portuguese (Brazil)
• French
• Hindi (planned)
• Chinese (planned)
Operational statistics:
➢ Average daily translation calls:
Queries: ~90 million
Item titles: ~180 million
➢ Translation latency (99th percentile):
Queries: within ~10 ms
Item titles: within ~80 ms
➢ Service availability: ~99.95%
Key Data Challenges
eBay Scale
A pair of shoes sells every 2 seconds
Women’s accessories sell every 2.5 seconds
A woman’s dress sells every 2 seconds
A cell phone sells every 4 seconds
Headphones sell every 12 seconds
A major appliance sells every 19 seconds
A car or truck sells every 5 minutes
A Harley-Davidson sells every 38 minutes
An iPad sells every 10 seconds
A boat sells every 35 minutes
Very Diverse Data
A tiny sample from 15,000 categories (images). 800 million listings are live at any given time.
A Wide Range of Inputs
The translation engine must accommodate a wide range of inputs, such as:
• Foreign queries: require translation
• English queries: don’t try to translate!
• Ambiguous queries (no article to signal the language): “figure” (English or French?), “time” (English or Portuguese?)
• Misspelled queries: e.g. 334 likely spelling variants of “Samsung” in 10M queries:
sansung samsug samsumg samung amsung samnsung smsung samsuns samaung sumsung smasung samsng samsing samusng sammlung ssamsung samdung
samusung sasmung sasung samsugn samgung samsum samsuung samsubg samsnug samsunng samsunf ssmsung samsunh samasung samnsug damsung
sampsung sanmsung samssung sammsung samsund saamsung aamsung samsyng samsungs samsong samsungg samsang samsungh samsunga sqmsung
sambung hamsun sasmsung samsumng samsaung samsunv samsunsg samnung salmsung samsunt samnsun sammung asmsung samsjng samunsg samsungn
samsunge salsung samyoung samusug samsui samsnung sampung samgun samesung samcung isamsung gamsung zamsung xsamsung samxung samsuny
samsunfg samsuing sameung ùsamsung szmsung swmsung smausng samumg samsusng samsunug samsunb samsoung samsiung samsdung samiang asamsung
sumsumg sumsong somsung smasun smamsung samzung samusun samsungù samsungo samsungf samsums samsujg samsuhg samsin samshng sampsun
saksung saamsug rsamsung lsamsung eamsung xamsung …
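One common way to fold spelling variants back onto a brand is edit distance against a brand dictionary. This Python sketch uses plain Levenshtein distance with an assumed threshold of 2 edits and a hypothetical three-brand dictionary; the deck does not say how eBay handles misspellings.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalize_brand(token: str, brands: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Map a possibly misspelled token to the closest known brand, or None."""
    token = token.lower()
    best = min(brands, key=lambda brand: levenshtein(token, brand))
    return best if levenshtein(token, best) <= max_distance else None


BRANDS = ["samsung", "apple", "sony"]  # hypothetical brand dictionary
for q in ["sansung", "samsnug", "headphones", "sammlung"]:
    print(q, "->", normalize_brand(q, BRANDS))
```

Note that “sammlung” appears in the variant list above but is also an ordinary German word (“collection”) two edits from “samsung”. A pure distance rule would rewrite it, so in practice it needs context such as language identification or query frequency.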
Brand Preservation
[Figure: preserved brand name vs. machine-translated brand name]
Machine Translation Training Process
Machine Translation Training
Parallel training files, aligned line by line:
Text.en-es: This is a line
Text.es-en: Esta es una línea (Spanish: “This is a line”)
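The two files suggest the usual parallel-corpus layout, where line i of one file is the translation of line i of the other (an assumption; the slide shows a single line). A minimal Python sketch that pairs the lines and rejects misaligned inputs:

```python
from itertools import zip_longest
from typing import Iterable, List, Tuple


def align_parallel(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line i of the source side with line i of the target side.

    Raises ValueError if one side is longer, since a single missing line
    would shift every later pair out of alignment.
    """
    pairs = []
    for n, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"sides differ in length at line {n}")
        pairs.append((src.rstrip("\n"), tgt.rstrip("\n")))
    return pairs


# With files on disk (file names as on the slide):
# with open("Text.en-es", encoding="utf-8") as en, open("Text.es-en", encoding="utf-8") as es:
#     corpus = align_parallel(en, es)
print(align_parallel(["This is a line\n"], ["Esta es una línea\n"]))
```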
Data Selection
Why Choose Data
• Bilingual open-source data sets are available (legal texts, subtitles, etc.), but language is diverse and ambiguous:
“case” (in court) vs. “case” (for a cell phone) vs. “case” (for a watch)
• Data genre is essential for domain-specific machine translation
• We need human translations of (some) eBay data to train on
How to Choose Data
Data Extraction: Sample Relevant Data
• Key buyer-interest signals from clickstream logs:
o Queries: search frequency
o Titles: search page impressions
o Descriptions: product page views
• Rank by popularity to exclude the long tail and outliers
• Sample proportionally by category weight
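A minimal Python sketch of these extraction steps, assuming each candidate segment carries one popularity number (search frequency, impressions or page views, depending on segment type) and that “category weight” means a category’s share of total popularity. The `Segment` type, cut-off and seed are hypothetical, and outlier filtering (for example, bot traffic) is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    text: str
    category: str
    popularity: int  # e.g. search frequency, search-page impressions or page views


def sample_segments(segments: List[Segment], budget: int, min_popularity: int,
                    seed: int = 0) -> List[Segment]:
    """Drop the low-popularity tail, then sample each category by its popularity share."""
    # Rank by popularity, keeping only segments at or above the cut-off.
    head = sorted((s for s in segments if s.popularity >= min_popularity),
                  key=lambda s: s.popularity, reverse=True)
    if not head:
        return []
    by_category: Dict[str, List[Segment]] = defaultdict(list)
    for s in head:
        by_category[s.category].append(s)
    total = sum(s.popularity for s in head)
    rng = random.Random(seed)
    sample: List[Segment] = []
    for members in by_category.values():
        weight = sum(s.popularity for s in members) / total  # category weight
        quota = min(len(members), round(budget * weight))
        sample.extend(rng.sample(members, quota))
    return sample


segments = [
    Segment("iphone case", "Cell Phones", 100),
    Segment("galaxy case", "Cell Phones", 100),
    Segment("phone charger", "Cell Phones", 100),
    Segment("acoustic guitar", "Musical Instruments", 100),
    Segment("guitar strng", "Musical Instruments", 1),  # tail item, dropped
]
for s in sample_segments(segments, budget=4, min_popularity=10):
    print(s.category, "|", s.text)
```

Rounding each quota independently can leave the sample a segment or two off the budget; a largest-remainder allocation would fix that.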
Data Selection: Maximize Language Coverage
• Ranking: compare candidate data against existing training data
Parameters:
o Unknown words: selfie stick, x67df-25 …
o Phrase overlap: most similar or most dissimilar data
o User popularity metric
• Selection: minimize redundancy across ranked segments
• Send the selected segments for human translation/post-editing
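One way to combine these parameters is a greedy coverage loop. Each candidate is scored by the number of n-grams it adds that the training data (and earlier picks) do not already cover, weighted by popularity. Scores are recomputed after every pick, so redundant segments lose their credit. This is a generic greedy selection technique offered as an illustration, not eBay’s actual ranking function. It favors dissimilar data, one of the two options on the slide.

```python
from typing import List, Set, Tuple


def ngrams(sentence: str, n_max: int = 2) -> Set[Tuple[str, ...]]:
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def greedy_select(candidates: List[Tuple[str, int]], training: List[str],
                  budget: int, n_max: int = 2) -> List[str]:
    """Pick up to `budget` candidates that add the most popularity-weighted new n-grams."""
    covered: Set[Tuple[str, ...]] = set()
    for sentence in training:
        covered |= ngrams(sentence, n_max)  # what the model has already seen
    pool = [(s, pop, ngrams(s, n_max)) for s, pop in candidates]
    selected: List[str] = []
    while pool and len(selected) < budget:
        best = max(range(len(pool)), key=lambda i: pool[i][1] * len(pool[i][2] - covered))
        sentence, _, grams = pool.pop(best)
        if not grams - covered:
            break  # nothing left adds new coverage
        selected.append(sentence)
        covered |= grams  # later candidates get no credit for these n-grams
    return selected


training = ["phone case", "watch case"]
candidates = [("phone case", 100), ("selfie stick", 5), ("selfie stick phone", 3)]
print(greedy_select(candidates, training, budget=3))
```

The popular but already covered “phone case” is never sent for translation, while the unknown “selfie stick” is picked first.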
Evaluation
Pre-Launch Automatic Metrics: Traditional Approach
• Traditional metrics compare machine translation output to a human translation through n-gram overlap (BLEU) and edit distance (WER, PER, TER)
Italian source: strumenti musicali usati chitarra classica
English human translation: used musical instruments classical guitar
English machine translation: musical instruments classical guitar used
BLEU: 70.71%
WER: 40%
TER: 20%
PER: 0%
• Require human translation
• Do not scale well and give only limited insights
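The scores for the guitar example can be reproduced with short implementations. This Python sketch uses unsmoothed sentence-level BLEU (one reference, n-grams up to 4), word-level WER, and the common bag-of-words definition of PER. TER is omitted because it also allows moving a block (“used”) as a single edit, which requires a shift search. That single shift is why TER is 1/5 = 20% while WER counts a deletion plus an insertion (2/5 = 40%).

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU: brevity penalty x geometric mean of clipped precisions."""
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matches = sum(min(c, r[g]) for g, c in h.items())  # clip by reference counts
        if matches == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_sum += math.log(matches / sum(h.values()))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_sum / max_n)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def wer(hyp: List[str], ref: List[str]) -> float:
    return edit_distance(hyp, ref) / len(ref)


def per(hyp: List[str], ref: List[str]) -> float:
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)


ref = "used musical instruments classical guitar".split()
mt = "musical instruments classical guitar used".split()
print(f"BLEU {sentence_bleu(mt, ref):.2%}  WER {wer(mt, ref):.0%}  PER {per(mt, ref):.0%}")
# prints: BLEU 70.71%  WER 40%  PER 0%
```

The example also shows the limitation the slide points out. The translation is a perfectly usable reordering, yet it costs 40% WER and nearly 30 BLEU points.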
Pre-Launch Automatic Metrics: eBay Extension
• Minimize the % of unknown words across all categories
• Minimize the % of falsely untranslated words
• Maximize brand preservation
• Expect fewer null SRPs (search results pages with no results) for machine-translated vs. untranslated queries
• Expect similar category distributions for machine-translated and human-translated queries
• Meet SLAs and CPU requirements
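Several of these checks reduce to simple rates that need no human reference. This Python sketch assumes whitespace tokenization, a lowercase training vocabulary, brands given as lowercase single tokens, and per-search result counts. All inputs below are hypothetical, and the definitions are illustrative rather than eBay’s exact formulas.

```python
from typing import Iterable, Set, Tuple


def unknown_word_rate(sentences: Iterable[str], vocab: Set[str]) -> float:
    """Share of tokens never seen in training (out-of-vocabulary)."""
    tokens = [t for s in sentences for t in s.lower().split()]
    return sum(t not in vocab for t in tokens) / len(tokens) if tokens else 0.0


def brand_preservation_rate(pairs: Iterable[Tuple[str, str]], brands: Set[str]) -> float:
    """Share of brand mentions in the source that survive verbatim in the translation."""
    mentions = preserved = 0
    for source, translation in pairs:
        src_tokens = set(source.lower().split())
        mt_tokens = set(translation.lower().split())
        for brand in brands & src_tokens:
            mentions += 1
            preserved += brand in mt_tokens
    return preserved / mentions if mentions else 1.0


def null_srp_rate(result_counts: Iterable[int]) -> float:
    """Share of searches whose results page came back empty (a "null SRP")."""
    counts = list(result_counts)
    return sum(c == 0 for c in counts) / len(counts) if counts else 0.0


mt_pairs = [("apple charger", "chargeur apple"), ("apple case", "étui pomme")]
print(unknown_word_rate(["selfie stick case"], {"case", "phone"}))
print(brand_preservation_rate(mt_pairs, {"apple"}))
print(null_srp_rate([3, 0, 7, 9]), "vs", null_srp_rate([0, 12, 0, 5]))  # translated vs untranslated
```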
Pre-launch Human Evaluation
Professional linguists judge the machine-translated output given the original segment
• Query translation:
o Acceptability
o Search result relevance
• Title translation: measure translation adequacy for the purchasing decision
o Rate the translation on a continuous 1-5 scale
o Emphasize product name translation and brand preservation
Example query: Seigneur des Anneaux (French)
Translation “Master of the Rings”: acceptable? Yes; relevant results? No
Translation “Lord of the Rings”: acceptable? Yes; relevant results? Yes
Pre-Launch Human Evaluation: eBay Extension
On the website, users see item images alongside translations, and English titles themselves are often hard to understand, e.g. “Fisherman Hunter Equipment Fishing Travel Bag Pack Tackle Storage Outdoor Gear” (shown next to the item image)
Title clarity is therefore evaluated against the item image, not the English title
Post-Launch Linguistic Quality Assurance
Manual QA checks seasonal queries/categories and how translations appear online
Example: handling swear words in translation
****
****
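The slide does not say how swear words were handled. A common QA safeguard is to scan translations for terms on a target-language blocklist and mask or flag them for review. This Python sketch assumes a hypothetical blocklist and whole-word, case-insensitive matching.

```python
import re
from typing import Iterable, Tuple


def mask_blocklisted(text: str, blocklist: Iterable[str]) -> Tuple[str, bool]:
    """Mask whole-word, case-insensitive blocklist matches with asterisks.

    Returns the masked text and whether anything was masked, so the segment
    can also be routed to a human reviewer.
    """
    terms = sorted(set(blocklist), key=len, reverse=True)  # prefer longer matches first
    if not terms:
        return text, False
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    masked, hits = pattern.subn(lambda m: "*" * len(m.group(0)), text)
    return masked, hits > 0


BLOCKLIST = {"darn"}  # hypothetical placeholder term
print(mask_blocklisted("Darn good deal", BLOCKLIST))
```

A stricter check would flag only terms that appear in the translation but have no offensive counterpart in the source, since those are the ones the MT engine introduced.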
Post-launch Evaluation: User Surveys
Survey statements (chart):
• “Machine translated item titles improved my shopping experience on eBay”
• “Translation is of high/highest quality”
Post-launch Evaluation: User Surveys
• Question: Please rate the quality of the machine-translated item title
User comment: “It would be better if they weren't automated, but at any rate, they are sufficiently good.”
Crowdsourcing Human Evaluation: Explicit User Feedback
Item title translations are accompanied by a hover window that shows the original title and a rating scale
Crowdsourcing Human Evaluation: Explicit User Feedback
It does not have to be bad to be rated!
Cross-validation with professional human evaluation:
➢ High agreement for highly rated translations (4-5)
➢ Translations that users rate low are more likely to receive an average rating from a professional linguist
User ratings are sensitive to poorly expressed grammatical relations
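The cross-validation can be expressed as an agreement rate per user-rating band. This Python sketch assumes “agreement” means the user and linguist ratings differ by at most one point on the 1-5 scale. The tolerance, the band boundaries and the rating pairs are hypothetical; the toy data merely mirrors the pattern the slide reports.

```python
from typing import Dict, List, Tuple


def agreement_rate(pairs: List[Tuple[float, float]], tolerance: float = 1.0) -> float:
    """Share of items where user and linguist ratings differ by at most `tolerance`."""
    if not pairs:
        return float("nan")
    return sum(abs(user - linguist) <= tolerance for user, linguist in pairs) / len(pairs)


def agreement_by_user_band(pairs: List[Tuple[float, float]],
                           tolerance: float = 1.0) -> Dict[str, float]:
    """Split (user_rating, linguist_rating) pairs by the user's rating."""
    high = [p for p in pairs if p[0] >= 4]
    low = [p for p in pairs if p[0] <= 2]
    return {"high (4-5)": agreement_rate(high, tolerance),
            "low (1-2)": agreement_rate(low, tolerance)}


ratings = [(5, 4.5), (4, 4), (5, 5), (1, 3), (2, 3.5)]  # hypothetical (user, linguist) pairs
print(agreement_by_user_band(ratings))
```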
Measuring Language Performance at Scale
Machine Translation A / B Testing
• Intuition vs. reality
• Data-driven
• Reduce risk
• Critical for measuring feature performance
• Assess financial impact and user engagement on site
Machine Translation A / B Testing
• Launched multiple tests in 2014
• Conducted deep dives into test data after the tests were wired off
• Focused on specific signals, by language and product category, comparing the control group (no translation) with the treatment group (translation enabled):
❏ Site exits
❏ Language abandonment
❏ User engagement
❏ Vocabulary loss
❏ Untranslated/unknown words
❏ Search recall
❏ Hover response
❏ Conversion velocity
❏ Revenue per visit
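For rate signals such as site exits or conversion, a standard way to compare control and treatment is a pooled two-proportion z-test. The deck does not say which test eBay used, so treat this Python sketch as one reasonable option; the counts are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(x_control: int, n_control: int,
                          x_treatment: int, n_treatment: int) -> Tuple[float, float, float]:
    """Compare a rate (e.g. site exits per visit) between control and treatment.

    Returns (treatment_rate - control_rate, z statistic, two-sided p-value).
    """
    p_c = x_control / n_control
    p_t = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se == 0:
        return p_t - p_c, 0.0, 1.0  # both groups all-0 or all-1: no evidence of a difference
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_t - p_c, z, p_value


# Hypothetical: 1,000 exits in 10,000 control visits vs 1,100 in 10,000 treatment visits.
diff, z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
print(f"diff={diff:+.3f} z={z:.2f} p={p:.3f}")
```

Continuous signals such as revenue per visit need a different test, for example a t-test or a bootstrap over visits.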
Title Translation A / B Test – Deep Dive
• 2 problematic categories: Specialty Services and Musical Instruments & Gear
• Automatic MT metrics below average: more unknown words
• Samples sent for human evaluation scored below the original release-candidate set
• Hover feedback scores were lower (< 3) in these 2 categories
• Increased opt-out behavior in the treatment group vs. the control group
Product Health Monitoring
• Daily jobs mine unstructured behavioral clickstream data
• Targeted attribution approach: analyze demand and supply data within search blocks
• Events processed/day: ~7.5 billion
• Size of data processed/day: ~10 TB
• Ability to react quickly and identify issues
• Intuitive visualizations leveraged by PM and PD
➢ Example KPI: Language Abandonment Rate
➢ Identify visitors who switch from searching in their native language to searching in English
➢ These visitors do not revert to their native language during subsequent search activity within a given window
➢ Strong indicator of translation quality:
poor translations → null-to-low search recall → poor search experience → abandoning the native language
[Chart by market: RU, BR, LATAM]
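The KPI definition can be sketched over per-visitor search events. This Python sketch assumes each query already carries a language code (for example from language identification) and that the window starts at the switch to English. The events are hypothetical, and visitors whose observation period ends inside the window are not handled specially.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, language code of the search query)


def is_abandoner(events: List[Event], native: str, window: float) -> bool:
    """True if the visitor switches from `native` to English and does not return within `window`."""
    events = sorted(events)
    seen_native = False
    for i, (ts, lang) in enumerate(events):
        if lang == native:
            seen_native = True
        elif lang == "en" and seen_native:
            reverted = any(other == native and ts < t <= ts + window
                           for t, other in events[i + 1:])
            if not reverted:
                return True
    return False


def language_abandonment_rate(visitors: Dict[str, List[Event]], native: str,
                              window: float) -> float:
    """Share of native-language searchers who abandon their language for English."""
    eligible = [evs for evs in visitors.values() if any(lang == native for _, lang in evs)]
    if not eligible:
        return 0.0
    return sum(is_abandoner(evs, native, window) for evs in eligible) / len(eligible)


visitors = {
    "a": [(0, "ru"), (60, "en"), (120, "en")],  # switches to English and stays: abandoner
    "b": [(0, "ru"), (60, "en"), (90, "ru")],   # returns to Russian within the window
    "c": [(0, "ru"), (30, "ru")],               # never switches
    "d": [(0, "en")],                           # never searched in Russian: not counted
}
print(language_abandonment_rate(visitors, native="ru", window=1800))
```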
Translation Caching Strategy
• Improve latency by serving pre-cached translations
• Leverage inventory and clickstream data to define the caching strategy
• Identify product categories where:
o over time, more existing inventory than new inventory is seen
o the rate of decay is fastest
$$b = 1 - \sqrt[x]{\frac{y}{a}}, \qquad \text{i.e. } y = a\,(1 - b)^{x}$$
a: initial pool of product listings
y: final pool yet to be viewed
x: time period
b: percent decrease
1 − b: decay factor
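The decay formula can be applied per category to rank where inventory turns over fastest. The slide does not spell out how b feeds into the caching decision, so ranking by b is one reading. The category names and pool sizes in this Python sketch are hypothetical.

```python
from typing import Dict, List, Tuple


def decay_rate(a: float, y: float, x: float) -> float:
    """Per-period percent decrease b, solving y = a * (1 - b) ** x for b."""
    return 1 - (y / a) ** (1 / x)


def rank_categories_by_decay(pools: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float]]:
    """pools maps category -> (a, y, x); returns categories with the fastest decay first."""
    rates = {cat: decay_rate(a, y, x) for cat, (a, y, x) in pools.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pools: listings still not viewed after x = 2 periods.
pools = {"Cell Phones": (1_000, 250, 2), "Boats": (1_000, 810, 2)}
for category, b in rank_categories_by_decay(pools):
    print(f"{category}: b = {b:.2f}, decay factor 1 - b = {1 - b:.2f}")
```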
Technologies: Moses
Conclusion
• Make your data work for your use case!
• Analyze data in multiple ways!
• Avoid analysis paralysis!
“If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.”
N. Mandela
COMMERCE WITHOUT LANGUAGE BARRIERS

More Related Content

Viewers also liked (17)

โครงงานคอม (“computer project”)
Resume vivek 015
โครงงานคอม (“computer project”)
Accomplishments take out STD TIS
Welcome to 205
6cw95kf4spaybsbhz5na 140618171029-phpapp02
Bachelor Australia
Law6 050258-7
2558 project (บันทึกอัตโนมัติ, “autosaved”)
KenB2016
Gameproject
Kia Aki Encouraging Māori Values in the Workplace
CV28022015
Retail_Rocket_ita
Finance 320 Research Project
Junk kills 1
FOT Catalog 2016
 

Similar to Strata - Final_IB_02_17

Good Applications of Bad Machine Translation (bdonaldson)
Language Quality Management: Models, Measures, Methodologies (Sajan)
Search analytics what why how - By Otis Gospodnetic (lucenerevolution)
[Taipei.py] improving user experience with text mining and deep learning in Uber (Paul Lo)
Search analytics what why how - By Otis Gospodnetic (lucenerevolution)
Global Search Engine Marketing (Bill Hunt)
LavaCon Kinetic, Mangaing the Translation Process-A Peek Behind the Curtain (Scott Carothers)
LavaCon2014 Kinetic, Mangaing the Translation Process- A Peek Behind the Cu... (Scott Carothers)
Product Internationalization Strategies by Amazon Alexa Sr PM (Product School)
MiTiN 2013 Keynote in Detroit Michigan (Kirti Vashee)
Top Trans Survey Translation Issues (Raya Wasser)
International Digital Marketing Mistakes, Opportunities & Future (Will Cecil)
Gala Webminar September 2013 (pangeanic)
Improve Your Customer Experience with Machine Translation (AIM321) - AWS re:I... (Amazon Web Services)
Bigit 2018 - data and nlp for content recommendation & personalized experience (Kim Ming Teh)
How to Improve Translation Productivity (kantanmt)
Machine Translation Master Class at the EUATC Conference by Diego Bartolome (tauyou)
 


Strata - Final_IB_02_17
