Bicrawler, Gema Ramírez (Prompsit)


Data, monolingual and bilingual, are essential to our industry. So, what do you do when you need more? Come and discover Bicrawler, a modern web-based tool to get high-quality parallel corpora from the Internet.

Published in: Technology
  1. Bicrawler: create bitexts from multilingual websites. By Gema Ramírez/Prompsit
  2. Translations are our best friends: we learn from them, reuse them, exploit them… But we lack translations!! More data? REALLY???
  3. Deep learning is data-hungry!!
  4. Not enough for some languages and domains… Automotive in en-ar UN corpus? No translations found
  5. But translations are… …out there
  6. There is plenty of multilingual content lurking in the world wide web. Let’s get it!!!
  7. All I want is to get translations from a website!!!
  8. Thanks! Bicrawler: create bitexts from multilingual websites. Gema Ramírez/Prompsit