Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta

As Hadoop became mainstream, the need to simplify and speed up analytics processes grew rapidly. Data wrangling emerged as a necessary step in any analytical pipeline and is often considered its crux, taking as much as 80% of an analyst's time. In this presentation we will discuss how data wrangling tools can streamline and strengthen data analytics initiatives on Hadoop, drawing on use cases from Trifacta customers.

Bio: Olivier is EMEA Solutions Lead at Trifacta. He has 7 years of experience in analytics, with prior roles as technical lead for business analytics at Splunk and as a quantitative analyst at Accenture and Aon.

  1. Hadoop User Group London: Data Wrangling on Hadoop. September 8, 2016. Olivier de Garrigues, EMEA Solutions Lead
  2. Creating radical productivity for people who analyze data. Jeffrey Heer, Co-Founder & CXO (Visualization); Joe Hellerstein, Co-Founder & CSO (Big Data); Sean Kandel, Co-Founder & CTO (Human-Computer Interaction)
  3. 3,000+ companies, 10,000+ users
  4. What is Data Wrangling? Analysis cycle: Question, Analyze, Insight. Wrangling steps: Discover, Structure, Cleanse, Enrich, Validate, Publish
  5. The Bridge Between Raw Data & Analysis. Layers: Ingestion, Storage, Processing, Analysis & Visualization. End-user capabilities (LOB): Discovery, Structuring, Cleaning, Enrichment, Distillation. Technical capabilities (IT): Governance, Integration, Availability, Scalability, Security
  6. Conventional Approaches Inhibit User Empowerment: Hand-Coding, Technical Workflow Mapping
  7. Trifacta Approach: It’s All About the Experience. Interact, Predict, Preview
  8. Data Wrangling for Financial Fraud
  9. Trifacta Data Wrangling Workflow. Stages: (1) Predictive Interaction, (2) Consume, (3) Schedule. Steps shown: Identify/Register Data, Sample, Refine Sample, Results, Scale Up, Schedulers, Monitor and Adjust, Visualization & Analysis, Secure Access. (Trifacta. Confidential & Proprietary.)
  10. Financial Services use case: Trader Fraud. Pipeline: Ingestion, Processing, Storage, Analysis & Consumption. Wrangling steps (LOB and IT): Discover, Structure, Clean, Enrich, Distill. Data sources: News (topics, time); Trades (tickers, date, $); eMails (recipients, topics); Phone Logs (call details, recipients); Corporations (company relations, individuals)
  11. Data Wrangling Benefits
      ➔ Empower the people who know the data best
      ➔ Accelerate time to value
      ➔ Lower business risk with more accurate data
      ➔ Unlock innovation using a wider variety of data
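As a rough illustration of the wrangling steps named on slide 4 (Discover, Structure, Cleanse, Enrich, Validate, Publish), here is a minimal Python sketch over a tiny, made-up CSV extract. The data, the `DESKS` lookup table and every function name are hypothetical; this is plain standard-library code, not Trifacta's product or API, and on Hadoop the same logic would run as a distributed job rather than a local loop.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw extract (illustrative data only): stray whitespace,
# inconsistent casing, a blank amount and a non-ISO date.
RAW = """trader,ticker,amount,date
 alice ,ACME,1000,2016-09-01
BOB,acme,,2016-09-02
carol,INIT,250,02/09/2016
"""

# Hypothetical reference table used by the enrich step.
DESKS = {"alice": "equities", "bob": "equities", "carol": "rates"}


def structure(text):
    """Structure: parse delimited text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def discover(rows):
    """Discover: count blank values per column to see what needs cleaning."""
    return {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}


def cleanse(rows):
    """Cleanse: trim whitespace, normalise case, drop rows with no amount."""
    cleaned = []
    for r in rows:
        if not r["amount"].strip():
            continue  # incomplete record
        cleaned.append({
            "trader": r["trader"].strip().lower(),
            "ticker": r["ticker"].strip().upper(),
            "amount": float(r["amount"]),
            "date": r["date"].strip(),
        })
    return cleaned


def enrich(rows):
    """Enrich: add each trader's desk from the reference table."""
    return [dict(r, desk=DESKS.get(r["trader"], "unknown")) for r in rows]


def validate(rows):
    """Validate: keep rows whose date is YYYY-MM-DD; set the rest aside."""
    good, rejected = [], []
    for r in rows:
        try:
            datetime.strptime(r["date"], "%Y-%m-%d")
            good.append(r)
        except ValueError:
            rejected.append(r)
    return good, rejected


rows = structure(RAW)
profile = discover(rows)
good, rejected = validate(enrich(cleanse(rows)))
# Publish: `good` would be handed on to analysis; `rejected` goes back for review.
```

Keeping each step as a separate function mirrors the slide's framing of wrangling as a sequence of distinct, inspectable stages.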
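Slide 9 describes working interactively on a sample, refining the result, and then scaling up to the full dataset. A minimal sketch of that pattern, assuming a "recipe" is simply an ordered list of row-level functions (an illustrative simplification; the function names and data are hypothetical, and this is not how Trifacta represents recipes):

```python
import random


# Hypothetical recipe steps: each takes a row dict and returns a new one.
def tidy_name(row):
    return {**row, "name": row["name"].strip().title()}


def parse_amount(row):
    return {**row, "amount": round(float(row["amount"]), 2)}


def run(recipe, rows):
    """Apply every step of the recipe, in order, to every row."""
    out = []
    for row in rows:
        for step in recipe:
            row = step(row)
        out.append(row)
    return out


# Stand-in for a large dataset (on Hadoop this would live in HDFS).
full = [{"name": f"  trader {i} ", "amount": str(i * 1.5)} for i in range(10_000)]

rng = random.Random(42)         # fixed seed so the sample is reproducible
sample = rng.sample(full, 100)  # work on a small random sample first

recipe = [tidy_name]
preview = run(recipe, sample)   # inspect the preview...
recipe.append(parse_amount)     # ...then refine the recipe
preview = run(recipe, sample)

result = run(recipe, full)      # scale up: same recipe, every row
```

In a distributed setting the scale-up step would be a batch job over all partitions rather than a Python loop; the point of the sketch is only that the recipe refined on the sample is reused unchanged on the full data.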
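The trader-fraud use case on slide 10 combines trade data with communications data such as phone logs. As a hedged illustration of the enrich step only, the sketch below joins hypothetical trade and phone-log records and flags trades that the same trader made shortly after a call. The field names, the 15-minute window and the rule itself are assumptions for illustration, not the detection logic presented in the talk.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified records standing in for two of the sources.
trades = [
    {"trader": "alice", "ticker": "ACME", "time": "2016-09-01 10:05"},
    {"trader": "bob", "ticker": "INIT", "time": "2016-09-01 11:00"},
]
calls = [
    {"caller": "alice", "recipient": "broker-x", "time": "2016-09-01 09:58"},
    {"caller": "bob", "recipient": "broker-y", "time": "2016-09-01 08:00"},
]

WINDOW = timedelta(minutes=15)  # assumed look-back window


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def flag_trades(trades, calls, window=WINDOW):
    """Enrich each trade with calls its trader made within `window` before it."""
    calls_by_trader = {}
    for c in calls:
        calls_by_trader.setdefault(c["caller"], []).append(c)
    flagged = []
    for t in trades:
        t_time = parse(t["time"])
        recent = [
            c["recipient"]
            for c in calls_by_trader.get(t["trader"], [])
            if timedelta(0) <= t_time - parse(c["time"]) <= window
        ]
        if recent:
            flagged.append({**t, "recent_calls": recent})
    return flagged


flagged = flag_trades(trades, calls)  # only alice's trade falls inside the window
```

Grouping calls by trader first keeps the join to one dictionary lookup per trade instead of scanning every call for every trade.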
