
Tamr | cdo-summit



Enterprise data unification in practice, with Ihab Ilyas, co-founder of Tamr and professor at the University of Waterloo

Published in: Technology


  1. Enterprise Data Unification in Practice. Ihab Ilyas, Professor, University of Waterloo; Co-founder, Tamr, Inc. @ihabilyas
  2. Top-Down Data Integration Limits Data Quality and Connectedness. <10%. Enterprise data is siloed ... and expensive to connect & curate. [Chart axes: # of sources, $] The Consequences: • Limited data available • Missed opportunity • Ballooning costs
  3. Hiring More Data Experts Is Not the Answer. [Diagram: Enterprise Reality vs. Goal] • Manual data collection and preparation • Long lead time to analyses • Limited individual view on variety of data • Extensive rework • No cohesive view of data efforts • Expertise across organization is underutilized
  4. Data Curation: Many Definitions and One Goal: Extract Value from Data. “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” (The New York Times, August 2014)
  5. Exploding Big Data Variety Will Make the Problem Worse. [Chart: Radical Increase in Data Variety, 2000 to 2011; labels: Corporate databases; Semi-structured data; JSON; Sources; Increasingly valuable. Source: IDC 2011 Digital Universe Study] Missing Capability: Connecting and curating in an automated way
  6. The Core of Tamr: Machine Learning with Human Insight. Identify sources, understand relationships and curate the massive variety of siloed data. [Diagram labels: Structured and Semi-structured Data Sources; Collaborative Curation; Data Experts (Source owners); Data Stewards and Curators; Data Inventory; APIs; Systems; Tools; Data Scientists; Advanced Algorithms & Machine Learning; Expert Input; Integrated Data & Metadata; Expert Directory]
  7. Demo. Example Use Cases
  8. Solution Overview: Sourcing & Supply Chain Spend Optimization. The Problem: • Part/supplier data in ERPs, life cycle management systems, and catalogs across departments • Inaccurate data / incongruent naming conventions. The Solution: • Create a unified schema that leverages all relevant data sources, including parts, procurement, logistical, and vendor data. Benefit: • Discover opportunities to optimize purchases across different suppliers and lines of business. [Diagram labels: Tamr Unified View; Hundreds of Potential Sources]
  9. Solution Overview: Customer Data Integration. The Problem: • Customer data stored in CRMs, data warehouses, back-office applications, and other enriching sources • Complexity of unifying personal data / incongruent naming conventions / data sparseness / manual entry. The Solution: • Create a holistic and adaptive customer view by unifying disparate data sources across the enterprise. Benefits: • Apply a unified and enriched customer view across multiple channels / lines of business • Discover hidden opportunities to improve upsell / cross-sell, reduce churn, and identify key opinion leaders (KOL) via enhanced segmentation/targeting
  10. Solution Overview: Clinical Trials. The Problem: • Clinical trial data is reported in a wide variety of formats, ontologies and standards • Underspecified attribute names, varying qualities of annotation, duplicate data, etc. The Solution: • Unify attribute names to build a common clinical trial data model. Benefit: • Ability to cluster clinical trials based on drug, target or investigator • Easier way to aggregate and report ongoing trial data • Simplified reporting for various agency ontologies
  11. Solution Overview: Medical Instruments. The Problem: • Instruments perform experiments at thousands of labs and hospitals across the world • Data stored in inconsistent formats and standards across various labs and hospitals. The Solution: • Build a unified view of instruments leveraging all available internal/external data sources. Benefit: • Ability to cluster analysis based on instrument, location and other attributes
  12. Tamr Architecture: A Data Curation Stack
  13. Demo. Questions? @Tamr_Inc
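
The solution slides repeatedly call for unifying attribute names across sources, and slide 6 pairs machine suggestions with expert input. The following is a minimal sketch of that pattern, not Tamr's actual method: it assumes plain string similarity from Python's standard difflib module as a stand-in for the "advanced algorithms" the deck mentions, and the column names, the suggest_mappings helper, and the 0.85 threshold are all hypothetical. Confident matches are accepted automatically; uncertain ones are queued for a human expert.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a column name and drop separators, so 'Vendor_ID' matches 'vendor id'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized attribute names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_mappings(source_attrs, unified_attrs, auto_accept=0.85):
    """Map each source attribute to its closest unified attribute.

    Matches scoring at least `auto_accept` are accepted automatically;
    the rest are queued for a human expert to confirm or correct.
    """
    accepted, needs_review = {}, []
    for attr in source_attrs:
        best = max(unified_attrs, key=lambda u: similarity(attr, u))
        score = similarity(attr, best)
        if score >= auto_accept:
            accepted[attr] = best
        else:
            needs_review.append((attr, best, score))
    return accepted, needs_review


# Hypothetical column names from two procurement systems.
unified = ["supplier_name", "vendor_id", "part_number"]
accepted, needs_review = suggest_mappings(
    ["Supplier_Name", "vendor id", "PartNo"], unified
)
print("auto-accepted:", accepted)
print("for expert review:", needs_review)
```

In this sketch, names that differ only in case or separators are mapped without human effort, while an abbreviation such as "PartNo" falls below the threshold and is routed to a reviewer along with the best machine guess. A production system would presumably also learn from the data values and from past expert decisions, which plain name similarity does not capture.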