SlideShare a Scribd company logo
1 of 11
Download to read offline
Hadoop User Group London: Data Wrangling on Hadoop
September 8 2016
Olivier de Garrigues, EMEA Solutions Lead
Creating radical productivity
for people who analyze data.
JEFFREY HEER
Co-Founder & CXO
VISUALIZATION
JOE HELLERSTEIN
Co-Founder & CSO
BIG DATA
SEAN KANDEL
Co-Founder & CTO
HUMAN-COMPUTER INTERACTION
3
3,000+ Companies 10,000+ Users
What is Data Wrangling?
4
QUESTION ANALYZE INSIGHTDISCOVER STRUCTURE CLEANSE ENRICH VALIDATE PUBLISH
The Bridge Between Raw Data & Analysis
5
v
Ingestion Storage Processing
ANALYSIS & VISUALIZATION
LOBCLEANING ENRICHMENT DISTILLATIONSTRUCTURINGDISCOVERY
End-User Capabilities
IT
GOVERNANCE INTEGRATION AVAILABILTIYSCALABILITYSECURITY
Technical Capabilities
Conventional Approaches Inhibit User Empowerment
Hand-Coding Technical Workflow Mapping
Trifacta Approach: It’s All About The Experience
Interact Predict
Preview
Data Wrangling
for Financial Fraud
TRIFACTA
DATA WRANGLING WORKFLOW
Trifacta. Confidential & Proprietary.
Sample Scale Up
Refine
Sample
Results
Identify/Register Data
1.
Predictive Interaction
2
.
Consume
Schedulers
Monitor and Adjust
3
.
Schedule
Visualization & Analysis
Secure Access
Ingestion Processing Storage
ANALYSIS & CONSUMPTION
v
Discover Structure Clean Enrich Distill
LOB
IT
News
Topics
Time
Trades
Tickers
Date
$
eMails
Recipients
Topics
Phone Logs
Call Details
Recipients
Corporations
Company Relations
Individuals
Financial Services use case: Trader Fraud
Data Wrangling Benefits
➔  Empower the people who know the data best
➔  Accelerate time to value
➔  Lower business risk with more accurate data
➔  Unlock innovation using a wider variety of data

More Related Content

What's hot

Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
Vaticle
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
magda3695
 

What's hot (20)

Apouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-dataApouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-data
 
Unified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphUnified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge Graph
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
Data Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk ManagementData Science Application in Business Portfolio & Risk Management
Data Science Application in Business Portfolio & Risk Management
 
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jScalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
 
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye...
 
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Combining a Knowledge Graph and Graph Algorithms to Find Hidden Skills at NASA
Combining a Knowledge Graph and Graph Algorithms to Find Hidden Skills at NASACombining a Knowledge Graph and Graph Algorithms to Find Hidden Skills at NASA
Combining a Knowledge Graph and Graph Algorithms to Find Hidden Skills at NASA
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Webinar - Fighting Bank Fraud with Real-time Graph Database
Webinar - Fighting Bank Fraud with Real-time Graph Database Webinar - Fighting Bank Fraud with Real-time Graph Database
Webinar - Fighting Bank Fraud with Real-time Graph Database
 
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep LearningRisk Analytics Using Knowledge Graphs / FIBO with Deep Learning
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning
 

Viewers also liked

Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Kai Wähner
 

Viewers also liked (8)

Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Talend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech OverviewTalend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech Overview
 
Role of Analytics in Consumer Packaged Goods Industry
Role of Analytics in Consumer Packaged Goods IndustryRole of Analytics in Consumer Packaged Goods Industry
Role of Analytics in Consumer Packaged Goods Industry
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Préparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec TrifactaPréparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec Trifacta
 
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHow PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop components
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 

Similar to Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta

SIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess QlikSIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess Qlik
Bardess Group
 

Similar to Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta (20)

Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry
 
Top 10 renowned big data companies
Top 10 renowned big data companiesTop 10 renowned big data companies
Top 10 renowned big data companies
 
Insights success the 10 best hadoop solution provider companies nov 2017
Insights success the 10 best hadoop solution provider companies nov 2017Insights success the 10 best hadoop solution provider companies nov 2017
Insights success the 10 best hadoop solution provider companies nov 2017
 
Alitora Innovation Networks
Alitora Innovation NetworksAlitora Innovation Networks
Alitora Innovation Networks
 
Seminario OpenStack - Universita' Tor Vergata - Cattedra Cloud Computing
Seminario OpenStack - Universita'  Tor Vergata - Cattedra Cloud Computing Seminario OpenStack - Universita'  Tor Vergata - Cattedra Cloud Computing
Seminario OpenStack - Universita' Tor Vergata - Cattedra Cloud Computing
 
World’s 10 Best Data Integration Solution Providers 2022.pdf
World’s 10 Best Data Integration Solution Providers 2022.pdfWorld’s 10 Best Data Integration Solution Providers 2022.pdf
World’s 10 Best Data Integration Solution Providers 2022.pdf
 
The Long Road of IT Systems Management Enters the Domain of AIOps-Fueled Auto...
The Long Road of IT Systems Management Enters the Domain of AIOps-Fueled Auto...The Long Road of IT Systems Management Enters the Domain of AIOps-Fueled Auto...
The Long Road of IT Systems Management Enters the Domain of AIOps-Fueled Auto...
 
AI is moving from its academic roots to the forefront of business and industry
AI is moving from its academic roots to the forefront of business and industryAI is moving from its academic roots to the forefront of business and industry
AI is moving from its academic roots to the forefront of business and industry
 
GraphTalks Frankfurt - Graph Database Überblick
GraphTalks Frankfurt - Graph Database ÜberblickGraphTalks Frankfurt - Graph Database Überblick
GraphTalks Frankfurt - Graph Database Überblick
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
Hitachi Data Systems Big Data Roadmap
Hitachi Data Systems Big Data RoadmapHitachi Data Systems Big Data Roadmap
Hitachi Data Systems Big Data Roadmap
 
Record manager 8.0 presentation
Record manager 8.0  presentationRecord manager 8.0  presentation
Record manager 8.0 presentation
 
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri...
 
HP Software Performance Tour 2014 - Vincere i Big Data con HP HAVEn
HP Software Performance Tour 2014 - Vincere i Big Data con HP HAVEnHP Software Performance Tour 2014 - Vincere i Big Data con HP HAVEn
HP Software Performance Tour 2014 - Vincere i Big Data con HP HAVEn
 
How HudsonAlpha Transforms Hybrid Cloud Deployment Complexity Into a Manageme...
How HudsonAlpha Transforms Hybrid Cloud Deployment Complexity Into a Manageme...How HudsonAlpha Transforms Hybrid Cloud Deployment Complexity Into a Manageme...
How HudsonAlpha Transforms Hybrid Cloud Deployment Complexity Into a Manageme...
 
Revolution in Business Analytics-Zika Virus Example
Revolution in Business Analytics-Zika Virus ExampleRevolution in Business Analytics-Zika Virus Example
Revolution in Business Analytics-Zika Virus Example
 
SIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess QlikSIMPosium presentation_Bardess Qlik
SIMPosium presentation_Bardess Qlik
 
All data accessible to all my organization - Presentation at OW2con'19, June...
 All data accessible to all my organization - Presentation at OW2con'19, June... All data accessible to all my organization - Presentation at OW2con'19, June...
All data accessible to all my organization - Presentation at OW2con'19, June...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 

More from huguk

More from huguk (20)

ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaming
 
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
Sean Kandel - Data profiling: Assessing the overall content and quality of a ...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta

  • 1. Hadoop User Group London: Data Wrangling on Hadoop September 8 2016 Olivier de Garrigues, EMEA Solutions Lead
  • 2. Creating radical productivity for people who analyze data. JEFFREY HEER Co-Founder & CXO VISUALIZATION JOE HELLERSTEIN Co-Founder & CSO BIG DATA SEAN KANDEL Co-Founder & CTO HUMAN-COMPUTER INTERACTION
  • 4. What is Data Wrangling? 4 QUESTION ANALYZE INSIGHTDISCOVER STRUCTURE CLEANSE ENRICH VALIDATE PUBLISH
  • 5. The Bridge Between Raw Data & Analysis 5 v Ingestion Storage Processing ANALYSIS & VISUALIZATION LOBCLEANING ENRICHMENT DISTILLATIONSTRUCTURINGDISCOVERY End-User Capabilities IT GOVERNANCE INTEGRATION AVAILABILTIYSCALABILITYSECURITY Technical Capabilities
  • 6. Conventional Approaches Inhibit User Empowerment Hand-Coding Technical Workflow Mapping
  • 7. Trifacta Approach: It’s All About The Experience Interact Predict Preview
  • 9. TRIFACTA DATA WRANGLING WORKFLOW Trifacta. Confidential & Proprietary. Sample Scale Up Refine Sample Results Identify/Register Data 1. Predictive Interaction 2 . Consume Schedulers Monitor and Adjust 3 . Schedule Visualization & Analysis Secure Access
  • 10. Ingestion Processing Storage ANALYSIS & CONSUMPTION v Discover Structure Clean Enrich Distill LOB IT News Topics Time Trades Tickers Date $ eMails Recipients Topics Phone Logs Call Details Recipients Corporations Company Relations Individuals Financial Services use case: Trader Fraud
  • 11. Data Wrangling Benefits ➔  Empower the people who know the data best ➔  Accelerate time to value ➔  Lower business risk with more accurate data ➔  Unlock innovation using a wider variety of data