Saša Hassan presented recent launches of machine translation (MT) on eBay sites for several locales, e.g. the Russian and Latin American markets. MT enables buyers to shop in their native languages and fosters cross-border trade. The core MT stack is based on Moses and is embedded in a larger orchestration layer.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, go to http://www.statmt.org/mosescore/
or follow us on Twitter: #MosesCore
3. Who We Are
– One of the world’s largest online marketplaces
– Connect buyers and sellers globally
– 152M+ active users
– 800M+ total listings (80% new)
– Enabled Commerce Volume (ECV) in Q3 2014: $63 billion
– Cross-border trade grew 27%, representing $14 billion of ECV
– Technical scale:
• 150 PB of data storage (Hadoop)
• 10k nodes / 150k+ cores
• Log aggregation: 8-10 TB per day
• 300M+ user queries per day
• 8.6B pages served per day
eBay MT 3
14. Teams
– MT Engineering:
• Orchestration layer
• Deployment
• Monitoring
– MT Science:
• Data acquisition
• Engine training & analytics
• Quality improvements
– L10N / MTLS (localization / MT language specialists):
• Human translations
• Post-editing
• Evaluation & feedback
15. Technology stack
– Orchestration layer (Java)
– Core MT based on Moses (served via XML-RPC):
• Phrase-based decoder with out-of-the-box features
• Tuned heavily for translation throughput:
– 20 msec per user search query (online, “realtime”)
– <500 msec per item title (offline, “near-realtime”)
– Caching (MongoDB)
– Translated queries:
• 72M per day
• 99th percentile latency: 19 msec
– Translated titles:
• 30M per day
• 99th percentile latency: 100 msec
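The latency targets above rest on two ideas: a cache in front of the decoder, and tail latency reported as a 99th percentile. The following is a minimal sketch of both, not eBay's implementation. It assumes an in-memory dict in place of the MongoDB cache and a caller-supplied `decode` function in place of the XML-RPC call to the Moses server; `CachedTranslator`, `decode`, and `percentile` are hypothetical names. The percentile uses the nearest-rank definition, and other interpolation schemes give slightly different values.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample list")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


class CachedTranslator:
    """Cache-aside translation: repeated inputs are served from the cache,
    and only cache misses reach the (slow) decoder."""

    def __init__(self, decode: Callable[[str], str]) -> None:
        # Hypothetical stand-in for the Moses server call (assumption).
        self._decode = decode
        self._cache: Dict[str, str] = {}  # stand-in for the MongoDB cache
        self.latencies_ms: List[float] = []
        self.hits = 0
        self.misses = 0

    def translate(self, text: str) -> str:
        start = time.perf_counter()
        key = " ".join(text.split())  # collapse whitespace so trivial variants share an entry
        result = self._cache.get(key)
        if result is None:
            self.misses += 1
            result = self._decode(key)
            self._cache[key] = result
        else:
            self.hits += 1
        self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        return result

    def p99_ms(self) -> float:
        return percentile(self.latencies_ms, 99)


if __name__ == "__main__":
    toy = CachedTranslator(lambda q: {"vitesse": "speed"}.get(q, q))  # toy decoder
    toy.translate("vitesse")    # miss: calls the decoder
    toy.translate(" vitesse ")  # hit: served from the cache
    print(toy.hits, toy.misses, round(toy.p99_ms(), 3))
```

Because query traffic repeats heavily, most requests never reach the decoder. This is one plausible way a decoder that is slower per call can still meet a low tail-latency target.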
16. Technology stack (cont’d)
– In-domain data (Teradata):
• Item titles sampled from data warehouse
• Relevant categories based on #impressions
– User behavioral data (Hadoop):
• Analytics based on click-through statistics and other user engagement signals
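Selecting "relevant categories based on #impressions" and computing click-through statistics can be sketched as follows. This is a minimal illustration under stated assumptions. Impressions and clicks are assumed to be already aggregated per category, for example out of the data warehouse or Hadoop. Categories are assumed to be kept in order of impressions until a coverage threshold is reached; 0.9 is a hypothetical value. `select_categories` and `click_through_rate` are illustrative names, and the category counts are toy numbers, not eBay data.

```python
from typing import Dict, List


def select_categories(impressions: Dict[str, int], coverage: float = 0.9) -> List[str]:
    """Pick the most-viewed categories until they cover `coverage` of all impressions."""
    total = sum(impressions.values())
    selected: List[str] = []
    covered = 0
    for category, count in sorted(impressions.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered / total >= coverage:
            break
        selected.append(category)
        covered += count
    return selected


def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks per impression; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


if __name__ == "__main__":
    # Toy numbers for illustration only.
    views = {"fashion": 50, "motors": 30, "toys": 15, "stamps": 5}
    print(select_categories(views))    # ['fashion', 'motors', 'toys']
    print(click_through_rate(5, 100))  # 0.05
```

Item titles for training would then be sampled only from the selected categories, so that the in-domain data follows what users actually view.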
17. Quality assurance
– Post-edited eBay-specific data sets for training (100k+)
– Human-translated eBay-specific data sets for tuning and testing (1k+)
– Automatic evaluations:
• Search recall
• Brand name preservation
• English query preservation (e.g. for XEN)
• Out-of-vocabulary (OOV) rate, position-independent error rate (PER), BLEU
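Two of the automatic metrics listed above are simple enough to show directly. This is a minimal sketch. The PER formula is one common formulation (PER = 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref) over bags of words). The toy vocabulary is hypothetical. BLEU is usually computed with an existing scorer, such as the `multi-bleu.perl` script that ships with Moses, and is not reimplemented here.

```python
from collections import Counter
from typing import List, Set


def oov_rate(tokens: List[str], vocabulary: Set[str]) -> float:
    """Share of input tokens that never occurred in the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok not in vocabulary) / len(tokens)


def position_independent_error_rate(hypothesis: List[str], reference: List[str]) -> float:
    """PER: compares bags of words, so word order is ignored (unlike WER)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # Multiset intersection counts words shared by hypothesis and reference.
    matches = sum((Counter(hypothesis) & Counter(reference)).values())
    surplus = max(0, len(hypothesis) - len(reference))  # penalize extra words
    return 1.0 - (matches - surplus) / len(reference)


if __name__ == "__main__":
    ref = "genuine leather bag".split()
    print(position_independent_error_rate("bag genuine leather".split(), ref))  # 0.0
    print(oov_rate("сумки из натуральной кожи".split(), {"из", "кожи"}))  # 0.5 (toy vocabulary)
```

PER fits short, order-insensitive inputs such as search queries, where "leather bag genuine" and "genuine leather bag" retrieve much the same items.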
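Query MT examples (source query → MT output):
– Query translations:
• FR: vitesse → speed
• PT: duplo → double
• PT: rei → king
– Ambiguous queries:
• PT: e cigarette → and cigarette
• PT: cobra → snake
• FR: car → because
– Search recall (RU query "сумки из натуральной кожи"):
• "bags of genuine leather" → 161 search results
• "genuine leather bag" → 47,097 search results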
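The Russian example can be turned into a single number by comparing result counts. This is a minimal sketch, and it assumes the query with more results ("genuine leather bag") serves as the reference. The slide does not say how search recall is scored, so `search_recall_ratio`, `is_low_recall`, and the 0.5 threshold are all hypothetical.

```python
def search_recall_ratio(mt_query_hits: int, reference_query_hits: int) -> float:
    """Result count of the MT query relative to that of a reference query."""
    if reference_query_hits <= 0:
        raise ValueError("reference query must return at least one result")
    return mt_query_hits / reference_query_hits


def is_low_recall(mt_query_hits: int, reference_query_hits: int, threshold: float = 0.5) -> bool:
    """Flag translations that retrieve far fewer items than the reference (hypothetical rule)."""
    return search_recall_ratio(mt_query_hits, reference_query_hits) < threshold


if __name__ == "__main__":
    ratio = search_recall_ratio(161, 47097)  # "bags of genuine leather" vs "genuine leather bag"
    print(f"{ratio:.4f}", is_low_recall(161, 47097))  # 0.0034 True
```

Both candidate translations are fluent English. Only the ratio exposes that the literal rendering finds well under 1% of the matching items, which is why a search-oriented metric complements BLEU-style scores for queries.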
18. Quality assurance (cont’d)
– Human evaluations:
• Acceptability (QT), 1-5 ratings (ITT), internal release criteria
– Item title examples (EN→PT):
Item title: Authentic Coach Ladies Purse MEDIUM
Bing translation: Autêntico treinador senhoras bolsa médio
eBay translation: Authentic Coach Senhoras Bolsa Médio

Item title: Triumph Stag Wind Deflector
Bing translation: Defletor de vento veado de triunfo
eBay translation: Triumph Stag defletor do vento
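The two title examples make "brand name preservation" easy to check automatically. This is a minimal sketch. It assumes a hypothetical brand dictionary (`BRANDS`) and exact, case-sensitive whole-token matching, and `contains_phrase` and `brand_preservation_rate` are illustrative names, not eBay's code.

```python
from typing import Iterable


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as a contiguous run of whole tokens (case-sensitive)."""
    tokens, target = text.split(), phrase.split()
    n = len(target)
    return n > 0 and any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def brand_preservation_rate(source: str, translation: str, brands: Iterable[str]) -> float:
    """Share of brand names found in the source that survive verbatim in the translation."""
    in_source = [b for b in brands if contains_phrase(source, b)]
    if not in_source:
        return 1.0  # nothing to preserve
    kept = sum(1 for b in in_source if contains_phrase(translation, b))
    return kept / len(in_source)


if __name__ == "__main__":
    BRANDS = ["Coach", "Triumph Stag"]  # hypothetical brand dictionary
    src = "Authentic Coach Ladies Purse MEDIUM"
    print(brand_preservation_rate(src, "Autêntico treinador senhoras bolsa médio", BRANDS))  # 0.0
    print(brand_preservation_rate(src, "Authentic Coach Senhoras Bolsa Médio", BRANDS))      # 1.0
```

Returning 1.0 when the source contains no known brand keeps titles without brands from lowering an averaged score. Averaging over a test set is left out of this sketch.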
19. Summary
– eBay Machine Translation:
• Moses core
• Complex orchestration layer
• Optimized for speed & quality
• eBay-specific evaluation criteria
– Coordination among 3 teams:
• MT Engineering
• MT Applied Science
• Localization, MT language specialists
– Analytics and monitoring:
• User behavioral data
• System health