Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Data Wrangling and the Art of Big Data Discovery
Twitter Tag: #briefr The Briefing Room
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
- Reveal the essential characteristics of enterprise software, good and bad
- Provide a forum for detailed analysis of today's innovative technologies
- Give vendors a chance to explain their products to savvy analysts
- Allow audience members to pose serious questions... and get answers!
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
Should I Bring My Tools?
- Hammers aren’t good for plumbing!
- Big Data requires a new set of tools
- Preparing and Exploring are very different
- Don’t throw out your old tool box!
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
Trifacta and Zoomdata
Trifacta offers a platform for data transformation and preparation.
- The interface is rich in visualization and provides previews and recommendations.
- The platform also includes a learning layer that employs machine learning algorithms to facilitate automation and self-learning.
Zoomdata is a Big Data exploration, visualization and analytics platform.
- The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics.
- Its pre-built connectors allow the Zoomdata server to connect directly to data sources.
Guests:
Russ Cosentino is Vice President of Marketing & Business Development at Zoomdata. Throughout his career he has focused on developing solutions that leverage technology to solve business problems. His experience includes application development for mission-critical systems for the DoD, automated recruitment programs for the intelligence community, and the application of text analytics to commercial voice-of-the-customer (VOC) programs.
Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review included his Bloom language for cloud computing on its TR10 list of the 10 technologies “most likely to change our world.”
Data Wrangling and the Art of Big Data Discovery
Dr. Joe Hellerstein
Professor, EECS Computer Science Division, UC Berkeley
Co-founder & Chief Strategy Officer, Trifacta
DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY
Russ Cosentino
Vice President, Marketing & Business Development, Zoomdata
Founded in 2012, from Berkeley/Stanford research roots
dp = data to the people
“facilitating interactions between people and data throughout the analytic lifecycle”
Stanford Visualization Group’s “Data Wrangler”
Elegant solutions for a messy world:
The 80% problem of preparing data for exploratory analytics
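To make "preparing data" concrete, here is a minimal Python sketch of typical wrangling steps: splitting a delimited field, trimming whitespace, normalizing case, casting types, and setting aside rows that fail to parse. The sample records and the `wrangle` function are hypothetical; this illustrates the general idea only and is not how Trifacta or Data Wrangler is implemented.

```python
from datetime import datetime

# Hypothetical messy export: "name | city | signup date | spend",
# with inconsistent spacing, inconsistent case, and one bad date.
RAW = [
    "  alice smith| new york |2015-03-02| 120.50",
    "BOB JONES|Boston| 2015-03-05 |80",
    "carol white|chicago|not a date|45.00",
]


def wrangle(lines):
    """Split, trim, normalize case and cast each record.

    Returns (clean, rejected): parsed records, plus raw lines that failed
    to parse so a person can review them instead of losing them silently.
    """
    clean, rejected = [], []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            rejected.append(line)
            continue
        name, city, date_text, spend_text = fields
        try:
            record = {
                "name": name.title(),
                "city": city.title(),
                "signup": datetime.strptime(date_text, "%Y-%m-%d").date(),
                "spend": float(spend_text),
            }
        except ValueError:
            # Unparseable date or number: set the row aside rather than guess.
            rejected.append(line)
            continue
        clean.append(record)
    return clean, rejected


clean, rejected = wrangle(RAW)
for row in clean:
    print(row["name"], row["city"], row["signup"].isoformat(), row["spend"])
print("rejected:", len(rejected))
```

Keeping the rejected rows, rather than dropping them, reflects the exploratory setting: a bad value is often the most interesting thing in the file.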
TRADITIONAL APPROACH TO DATA MANAGEMENT
[Diagram: traditional data management pipeline. Labels shown: Data Sources (structured); Ingest; ETL; Storage #1, #2 … #N; ELT; Store & Process; Enterprise Data Warehouse (EDW); Archive; Access Data; Analyze Data (Search, Statistical, Machine Learning, SQL); Serve; Optimize; Implement (Custom Application, Point Solution).]
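The diagram uses both ETL (transform, then load) and ELT (load raw data, then transform inside the target store). A minimal Python sketch of the difference, using the standard-library `sqlite3` module as a stand-in for a warehouse; the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order id, amount in cents as text, region with stray spaces/case).
SOURCE = [("1", "1250", " east"), ("2", "990", "WEST "), ("3", "300", "East")]


def etl(conn):
    """ETL: clean and convert in application code, then load the finished rows."""
    conn.execute("CREATE TABLE orders_etl (id INTEGER, amount REAL, region TEXT)")
    rows = [(int(i), int(c) / 100, r.strip().lower()) for i, c, r in SOURCE]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", rows)


def elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL inside the store."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, cents TEXT, region TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", SOURCE)
    conn.execute(
        """CREATE TABLE orders_elt AS
           SELECT CAST(id AS INTEGER) AS id,
                  CAST(cents AS INTEGER) / 100.0 AS amount,
                  LOWER(TRIM(region)) AS region
           FROM orders_raw"""
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
query = "SELECT region, SUM(amount) FROM {} GROUP BY region ORDER BY region"
print(conn.execute(query.format("orders_etl")).fetchall())
print(conn.execute(query.format("orders_elt")).fetchall())
```

Both paths produce the same answer; the design difference is where the transformation runs and whether the raw data is kept in the store for later reprocessing (ELT keeps `orders_raw`).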
  
MANY PEOPLE INVOLVED IN THE PROCESS
DATA ARCHITECT, DATABASE ADMINISTRATOR, SYSTEM ADMINISTRATOR, BUSINESS ANALYST, BI ADMINISTRATOR, SYSTEM ADMINISTRATOR
IT COULD BE SIMPLER
DATABASE ADMINISTRATOR, BUSINESS ANALYST
MODERN DATA AND VISUALIZATION ENVIRONMENT
[Diagram: modern data and visualization environment. Labels shown: Data Sources (structured and unstructured); Ingest; Store & Process; Data Preparation; Serve; Visualization.]
REAL BENEFITS OF A SELF-SERVICE APPROACH
Real-Time: +15% Cash Increase, +26% Pipeline Growth, -67% Cost Reduction
Big Data: +48% Speed of Delivery, +42% Self-Service Access, +40% Decision Quality
Interactive: +70% Collaboration, +64% Decision Speed, +61% User Adoption
DEMONSTRATION
MODERN DATA ARCHITECTURE FOR SELF-SERVICE INTELLIGENCE
Perceptions & Questions
Analyst: Robin Bloor
I am not a number!
To Round-Up & Wrangle
Robin Bloor, PhD
The Flow of Data
The movement of data from ACQUISITION through PREPARATION to ANALYSIS is not necessarily simple…
The General Picture
[Diagram: the general picture. Labels shown: Data Sources; Data Streams; Staging Area (Hadoop), marked ROUND-UP | WRANGLING; ETL (twice); Data Warehouse or other location; Analytics; MetaData Discovery; MetaData Mgt; MDM; Data Cleansing; Data Lineage; Service Mgt; Life Cycle Mgt.]
Immediate Analytics & the Rest
IMMEDIATE ANALYTICS
- Metadata discovery
- Metadata management
- Data cleansing
- Data lineage
DOWNSTREAM
- MDM
- Service mgt
- Lifecycle mgt
- ETL
[Diagram: the general picture from the previous slide, repeated.]
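Two of the immediate-analytics functions listed above, metadata discovery and data lineage, can be illustrated in a few lines. A minimal Python sketch, assuming rows arrive as string-valued dictionaries: it infers a type per column (a simple form of metadata discovery) and logs each transformation with its row counts (a simple form of lineage). The function names and inference rules are hypothetical and do not describe any product's behavior.

```python
# Hypothetical raw rows, every value still a string as read from a file.
RAW_ROWS = [
    {"id": "1", "price": "9.99", "sku": "A-1"},
    {"id": "2", "price": "", "sku": "B-7"},
    {"id": "3", "price": "4", "sku": "C-2"},
]


def infer_type(values):
    """Guess a column type from its non-empty values: 'int', 'float' or 'str'."""
    non_empty = [v for v in values if v != ""]
    for type_name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue  # at least one value failed; try the next, looser type
    return "str"


def discover_schema(rows):
    """Metadata discovery: map each column name to an inferred type."""
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


def apply_step(rows, lineage, description, transform):
    """Run a dataset-level transform and append a lineage entry describing it."""
    result = transform(rows)
    lineage.append({"step": description, "rows_in": len(rows), "rows_out": len(result)})
    return result


schema = discover_schema(RAW_ROWS)
print(schema)

lineage = []
rows = apply_step(RAW_ROWS, lineage, "drop rows with missing price",
                  lambda rs: [r for r in rs if r["price"] != ""])
rows = apply_step(rows, lineage, "cast price to float",
                  lambda rs: [{**r, "price": float(r["price"])} for r in rs])
for entry in lineage:
    print(entry)
```

The lineage log answers a question analysts are often asked after the fact: where did row 2 go? Here it was removed by the "drop rows with missing price" step.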
The Analytics Business Process
- The main point to note is that it is iterative.
- It has morphed because of:
  - Data availability
  - Parallel technology
  - Scalable software
  - Open source tools
  - Machine learning
[Diagram: iterative cycle of Data Access, Data Prep, Model, Analyze, Deploy, Execute.]
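The slide's point is that the process loops. A minimal Python sketch of that loop, assuming a deliberately trivial "model" (the mean) and a hypothetical dataset containing two data-entry errors: when analysis shows the error is too high, control returns to data preparation, which tightens its filter, and only an acceptable result moves on to deployment. The stage functions, tolerance and cap rule are illustrative assumptions.

```python
import statistics

# Hypothetical daily sales figures; 9999 and 8888 are data-entry errors.
RAW = [102, 98, 105, 97, 101, 9999, 99, 103, 100, 8888]


def access():
    """Data access: fetch a fresh copy of the source data."""
    return list(RAW)


def prepare(data, cap):
    """Data prep: keep only values at or below the current cap."""
    return [x for x in data if x <= cap]


def build_model(data):
    """Model: a deliberately trivial forecast, the mean of the prepared data."""
    return statistics.mean(data)


def analyze(data, forecast):
    """Analyze: mean absolute error of the forecast on the prepared data."""
    return statistics.mean(abs(x - forecast) for x in data)


def run(tolerance=5.0, max_rounds=5):
    """Loop prep -> model -> analyze until the error is acceptable."""
    data = access()
    cap = float("inf")
    for round_no in range(1, max_rounds + 1):
        prepared = prepare(data, cap)
        forecast = build_model(prepared)
        error = analyze(prepared, forecast)
        if error <= tolerance:
            return round_no, forecast, error  # acceptable: hand off to deploy/execute
        # Too much error: go back to data prep with a tighter cap around the median.
        cap = 2 * statistics.median(prepared)
    raise RuntimeError("no acceptable model within max_rounds")


print(run())  # (2, 100.625, 2.125)
```

The first pass is distorted by the two bad entries; the second pass, with better-prepared data, is accepted. Real cycles differ mainly in how expensive each trip around the loop is.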
Analytical Latencies
1. Data access
2. Data preparation
3. Model development
4. Execution
5. Implementation
6. Model audit & update
This is where the rubber meets the road: Speed = Value
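End-to-end latency is the sum of the stage latencies, so the slowest stage bounds how much speeding up any other stage can help (the same reasoning as Amdahl's law). A minimal Python sketch; the stage timings are invented for illustration, not measurements.

```python
# Hypothetical hours spent in each stage of one analytics cycle (invented for illustration).
LATENCIES_HOURS = {
    "Data access": 8,
    "Data preparation": 40,
    "Model development": 16,
    "Execution": 4,
    "Implementation": 8,
    "Model audit & update": 4,
}


def end_to_end(latencies):
    """Total cycle time is the sum of the per-stage latencies."""
    return sum(latencies.values())


def with_speedup(latencies, stage, factor):
    """Total cycle time if one stage alone runs `factor` times faster."""
    faster = dict(latencies)
    faster[stage] = faster[stage] / factor
    return end_to_end(faster)


total = end_to_end(LATENCIES_HOURS)
bottleneck = max(LATENCIES_HOURS, key=LATENCIES_HOURS.get)
print(total, bottleneck)  # 80 Data preparation
# A 100x faster execution stage saves under 4 hours...
print(round(with_speedup(LATENCIES_HOURS, "Execution", 100), 2))  # 76.04
# ...while merely halving data preparation saves 20.
print(with_speedup(LATENCIES_HOURS, "Data preparation", 2))  # 60.0
```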
The Impending Reality
Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE (on the IT side).
This is changing analytics.
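A worked example of the arithmetic, with hypothetical timings. Two orders of magnitude is a factor of 100. The qualifier "on the IT side" matters: if a cycle spends t_IT on IT processing and t_bus on business-side work that is not accelerated, the overall speedup S is limited by t_bus.

```latex
\[
  t'_{\text{IT}} = \frac{t_{\text{IT}}}{10^{2}}, \qquad
  S = \frac{t_{\text{IT}} + t_{\text{bus}}}{t_{\text{IT}}/10^{2} + t_{\text{bus}}}
\]
\[
  t_{\text{IT}} = 10\,\text{h},\; t_{\text{bus}} = 10\,\text{h}
  \;\Rightarrow\;
  t'_{\text{IT}} = 0.1\,\text{h} = 6\,\text{min},\qquad
  S = \frac{20}{10.1} \approx 1.98
\]
```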
u  Is your capability only relevant to analytics or
does it have broader areas of application?
u  Technically, what makes it fast?
u  Please comment on analytical workloads:
- What do you see as the natural IT bottlenecks?
- What do you see as the natural business
bottlenecks?
u  Do we want business analysts to become ersatz
data scientists?
u  In respect to scale, what is your largest
implementation by data volume, and what was
the industry sector/problem space?
u  Who do you partner with?
u  What do you see as the largest barrier to
adoption of Trifacta?
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons and Wikipedia, including:
"Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg
