
Real Time Fuzzy Matching with Spark and Elastic Search (Sonal Goyal, Nube)



Presentation at Spark Summit 2015

Published in: Data & Analytics


  1. © Nube Technologies: Real Time Fuzzy Matching With Spark and ElasticSearch
  2. About Us: "The only way to do great work is to love what you do." - Steve Jobs
  3. The problem - lake or swamp?
  4. Duplicates
  5. Challenges
     ● Quadratic problem
     ● No standard notion of similarity
     ● Omissions, typos and other issues
     ● Different languages
  6. Use Case - Customer Record Dedup
  7. Use Case - Customer Record Dedup
  8. Use Case - Shopping Site Comparison
  9. Use Case - Shopping Site Comparison
  10. Other Use Cases
     ● Cross selling
     ● Financial Credit Ratings
     ● Fraud Analytics
     ● Catalog and inventory management
     ● Household and individual level analytics
  11. Let's start wishing...
     ● Data variety
     ● Scalable
     ● No manual configuration of rules or algorithms
     ● Multi language
     ● Real time
  12. Reifier - learn
  13. Reifier - learn
  14. Reifier - learn
  15. Reifier - learn
  16. Real Time Spark + ElasticSearch
  17. Spark Benefits
     ● Distributed
     ● Scalable
     ● Fast
     ● Machine Learning
     ● Sampling
     ● No need to orchestrate multiple jobs
  18. Thank You! www.nubetech.co sonal@nubetech.co
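
The "Quadratic problem" and "Omissions, typos" points on the Challenges slide can be illustrated with a small sketch. This is not how Reifier works (the slides say it learns matching without manually configured rules); it is a minimal, hand-written illustration that assumes a hypothetical blocking key (the first letter of the name) and uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the threshold of 0.8, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name):
    # Hypothetical blocking rule: group records by lowercased first letter.
    # Real systems typically use richer or learned keys.
    return name.strip().lower()[:1]


def similarity(a, b):
    # Stand-in similarity score in [0, 1]; tolerant of small typos.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, threshold=0.8):
    # Compare records only within the same block, not all pairs,
    # which avoids the full n * (n - 1) / 2 comparisons.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


records = ["Jon Smith", "John Smith", "Jane Doe", "Jayne Doe", "Sonal Goyal"]
matches = candidate_duplicates(records)
for a, b, score in matches:
    print(f"{a!r} ~ {b!r} (score {score:.2f})")
```

With these five sample records, comparing all pairs would take 10 comparisons, while the blocked version performs 6. The trade-off is that a crude key like this one misses duplicates whose first letter differs, which is one reason hand-written rules are hard to get right.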
