Daurum: Introduction

Introduction to Daurum, a deduplication and fusion technology by Sparsity-Technologies.

Published in: Technology

  1. Deduplication & Fusion
     info@sparsity-technologies.com
  2. Index
       • Introduction
       • Process
       • Success stories
       • Demo
  8. Introduction: Benefits
       • Identification of suspected duplicate records within a database
       • Merging of data from several databases with different formats, detecting duplicate records
       • Validation tools for the detected similarities
  9. Introduction: Deduplication
  10. Introduction: Deduplication
       • Configuration
       • Automatic execution
       • Validation of results
       • Personalized export
  12. Introduction: Fusion
  13. Introduction: Fusion
       • Configuration
       • Automatic execution
       • Validation of results
       • Personalized export
  15. Introduction: Features
  19. Process: Configurations
       • Input data file format: CSV
       • Select the relevant columns used to link records
       • Map columns between the different data sources (only when merging)
       • Assign types to columns so that the most suitable automatic filters are applied
     Pipeline diagram: CSV input, then Configurations, Execution, Validation and Export, with output as Excel, PDF, XML or CSV
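The column selection and column mapping described on this slide could be sketched as follows. This is only an illustration using Python's standard `csv` module; the column names, the `COLUMN_MAP` dictionary and the `load_records` helper are assumptions and are not part of Daurum.

```python
import csv
import io

# Hypothetical configuration: the columns used to link records, and a mapping
# from the column names of a second source to those of the first source.
RELEVANT_COLUMNS = ["name", "surname1"]
COLUMN_MAP = {"nombre": "name", "apellido1": "surname1"}


def load_records(csv_text, column_map=None):
    """Parse CSV text, rename mapped columns and keep only the relevant ones."""
    column_map = column_map or {}
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        renamed = {column_map.get(key, key): value for key, value in row.items()}
        records.append({col: renamed.get(col, "") for col in RELEVANT_COLUMNS})
    return records


# Two sources with different layouts end up with the same record shape.
source_a = "name,surname1,phone\nAnna,Puig,555\n"
source_b = "nombre,apellido1\nAnna,Puig\n"
records_a = load_records(source_a)
records_b = load_records(source_b, COLUMN_MAP)
```

After loading, both sources expose the same columns, which is what allows records from different databases to be compared during fusion.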
  23. Process: Configurations
       • Comparison type: exact value, text estimation, numerical estimation
       • Percentage weight of each column in the similarity computation (example: 30% + 35% + 35% = 100%)
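A weighted similarity of this kind could look like the sketch below. It is not Daurum's actual algorithm: the three comparison functions, the column names and the use of `difflib.SequenceMatcher` for the text estimation are assumptions chosen for illustration. Only the 30% / 35% / 35% weights come from the slide.

```python
from difflib import SequenceMatcher


def exact(a, b):
    """Exact value comparison: 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def text_estimate(a, b):
    """Text estimation: case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_estimate(a, b):
    """Numerical estimation: 1.0 for equal numbers, decreasing with relative distance."""
    a, b = float(a), float(b)
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


# Hypothetical columns: column name -> (comparison function, weight).
COLUMNS = {
    "id_card": (exact, 0.30),
    "surname1": (text_estimate, 0.35),
    "birth_year": (numeric_estimate, 0.35),
}

# The weights must add up to 100%.
assert abs(sum(weight for _, weight in COLUMNS.values()) - 1.0) < 1e-9


def similarity(rec_a, rec_b):
    """Weighted sum of the per-column similarities, between 0.0 and 1.0."""
    return sum(
        weight * compare(rec_a[column], rec_b[column])
        for column, (compare, weight) in COLUMNS.items()
    )
```

With these weights, two records that differ only in `id_card` score 0.70, because the other two columns still contribute 35% each.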
  25. Process: Configurations
       • Specific percentage for records with null-valued columns
       • Use filters to standardize values
       • Automatic and specific filters are available for values such as names, dates, addresses, etc.
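As an illustration of what such filters might do, the sketch below standardizes names and dates and applies a fixed score when a column is empty. The function names, the accepted date formats and the value of `NULL_SCORE` are all assumptions; the slides do not describe how Daurum implements its filters.

```python
import unicodedata
from datetime import datetime


def normalize_name(value):
    """Strip accents, collapse repeated whitespace and upper-case a name."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split()).upper()


# Assumed input formats, tried in order.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")


def normalize_date(value):
    """Return the date as ISO 8601 text, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Assumed score for a column that is empty in at least one of the two records.
NULL_SCORE = 0.5


def column_score(a, b, compare):
    """Use NULL_SCORE when a value is missing, otherwise the given comparison."""
    if not a or not b:
        return NULL_SCORE
    return compare(a, b)
```

Standardizing values before comparison means that `José García` and `JOSE  GARCIA` are treated as the same name, and `03/02/2011` and `2011-02-03` as the same date.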
  28. Process: Configurations
       • Edit filters (create new filters, delete or update existing ones)
       • Use of dictionaries: name-converter dictionary (e.g. Pepe → Jose)
       • Similarity computation algorithm configuration:
           • Sliding window size: number of records compared at the same time
           • Order-by column
           • Threshold of accepted similarity
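A name-converter dictionary can be as simple as a lookup applied before comparison. The sketch below is a guess at the idea, not Daurum's implementation: `NAME_DICTIONARY` and `canonical_name` are hypothetical names, and only the Pepe / Jose pair comes from the slide.

```python
# Hypothetical dictionary mapping familiar forms to a canonical given name.
# "pepe" -> "jose" is the slide's example; "paco" -> "francisco" is added
# here purely as a second illustration.
NAME_DICTIONARY = {
    "pepe": "jose",
    "paco": "francisco",
}


def canonical_name(name):
    """Return the canonical form of a given name, lower-cased and trimmed."""
    key = name.strip().lower()
    return NAME_DICTIONARY.get(key, key)
```

Applying `canonical_name` to both records before comparing them lets "Pepe" and "Jose" count as an exact match rather than as two unrelated strings.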
  34. Process: Execution
       • Order by Surname 1
       • Sliding window = 2
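Ordering by a column and comparing only the records that fall inside a sliding window resembles the sorted-neighbourhood approach to duplicate detection. The sketch below assumes that a window of 2 means each record is compared with the one that follows it after sorting; the slides do not spell this out, and `candidate_pairs` is a hypothetical helper.

```python
def candidate_pairs(records, order_by, window):
    """Sort by one column and pair each record with the next window - 1 records."""
    ordered = sorted(records, key=lambda record: record[order_by])
    pairs = []
    for index, record in enumerate(ordered):
        for neighbour in ordered[index + 1 : index + window]:
            pairs.append((record, neighbour))
    return pairs


# Example: order by the first surname with a window of 2.
people = [{"surname1": s} for s in ["Puig", "Ferrer", "Pey", "Baleta"]]
pairs = candidate_pairs(people, order_by="surname1", window=2)
```

The window keeps the number of comparisons roughly proportional to the number of records instead of comparing every record with every other one, at the cost of missing duplicates that sort far apart.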
  36. Process: Execution
       • Similarities detected (window = 2), each with its similarity degree
  38. Process: Execution
       • List of detected similarities
  39. Process: Execution
       • List of detected similarities with a percentage greater than the 50% threshold
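Filtering by threshold could be sketched as below. The tuple layout `(record_a, record_b, score)` and the function name are assumptions; the 50% value and the strict "greater than" comparison follow the slide's wording.

```python
THRESHOLD = 0.50  # the 50% threshold from the slide


def above_threshold(scored_pairs, threshold=THRESHOLD):
    """Keep pairs scoring strictly above the threshold, highest score first."""
    kept = [item for item in scored_pairs if item[2] > threshold]
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Each item is (record_a, record_b, similarity score between 0.0 and 1.0).
scored = [("r1", "r2", 0.50), ("r3", "r4", 0.92), ("r5", "r6", 0.61)]
suspects = above_threshold(scored)
```

Only the pairs that survive this filter would be passed on to the validation step.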
  40. Process: Validation
       • Validation of results (including only those above the threshold)
       • Visualize by similarity or by group
       • Bulk validation
       • Share validation among several supervisors
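One way to move from a "by similarity" view (pairs) to a "by group" view is to merge pairs that share a record into connected groups. The slides do not say how Daurum builds its groups, so the union-find sketch below is an assumption, with hypothetical names throughout.

```python
def group_duplicates(pairs):
    """Merge pairs of record ids that share a member into sorted groups."""
    parent = {}

    def find(item):
        # Register unseen items as their own root, then follow parents to the root.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # shorten the path as we go
            item = parent[item]
        return item

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Records 1, 2 and 3 are linked through two pairs; 7 and 8 form another group.
groups = group_duplicates([(1, 2), (2, 3), (7, 8)])
```

A supervisor could then accept or reject a whole group at once, which is one possible reading of the bulk validation mentioned on the slide.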
  44. Process: Export
       • Select output format: Excel, PDF, XML or CSV
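Two of the four output formats can be produced with Python's standard library alone, as in the sketch below. It covers only CSV and XML, since Excel and PDF output would need third-party libraries; the function names and the XML element names (`records`, `record`) are assumptions and do not describe Daurum's actual export layout.

```python
import csv
import io
import xml.etree.ElementTree as ET


def export_csv(rows, fieldnames):
    """Serialize a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_xml(rows):
    """Serialize a list of dictionaries as XML, one <record> element per row."""
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(record, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


validated = [{"name": "Anna", "score": 0.9}]
csv_text = export_csv(validated, ["name", "score"])
xml_text = export_xml(validated)
```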
  48. Success stories: Health Service
       Who?       Health Service
       Objective  Detect repeated health ID cards
       Solution   Detect repeated records in the database and delete them (deduplication with DAURUM)
       Result     Health ID card database cleaned of repetitions
  49. Success stories: Beer Manufacturer
       Who?       Beer manufacturer
       Objective  Detect dealers that deliver to centers not previously assigned to them
       Solution   Identify duplicates in each dealer's delivery database and delete them (deduplication with DAURUM);
                  detect deliveries to centers shared between different dealers (fusion with DAURUM)
       Result     Master database clean of repetitions, and dealers with wrong deliveries detected
  53. Demo
  54. Thanks for your attention. Any questions?
       Pere Baleta Ferrer, CEO, pbaleta@sparsity-technologies.com
       Josep Lluís Larriba Pey, Founder, larri@sparsity-technologies.com
       SPARSITY-TECHNOLOGIES
       Jordi Girona, 1-3, Edifici K2M, 08034 Barcelona
       info@sparsity-technologies.com
       http://www.sparsity-technologies.com
