SlideShare a Scribd company logo
© 2014 Datameer, Inc. All rights reserved.
Analyzing Unstructured Data in
Hadoop!
View Recording !!

You can view the recording of this webinar
at:

http://info.datameer.com/Online-Slideshare-
Analyzing-Unstructured-Data-in-Hadoop-
On-Demand.html
© 2013 Datameer, Inc. All rights reserved.
Matt Schumpert @datameer
Senior Director, Solutions Engineering

Matt has been working in the enterprise
infrastructure software space for over 14 years in
various capacities, including sales engineering,
strategic alliances and consulting.

Matt currently runs the pre-sales engineering team at
Datameer, supporting all technical aspects of
customer engagement from initial contact through
roll-out of customers into production.

Matt holds a BS in Computer Science from the
University of Virginia. 
#datameer @datameer
About Our Speaker!
Agenda!
•  Market & Data trends
•  Tuning into new channels
•  The good news
•  The rise of wrangling
•  Analytics requirements
•  Bringing order to chaos
•  Use Cases
What we learned in 2010… (or before)!
Market & Data Trends!
•  Data volumes will grow 800% in 5 years
•  Unstructured data is growing 62% faster
•  80% of all data will be unstructured in 2019
•  “Big Unstructured Data” requires new tech.
•  85% of the Fortune 500 will be unable to exploit
Big Data for competitive advantage through 2015
Source: Gartner
Market & Data Trends!
•  ‘Multi-structured’ is the word of the day
•  Mainstream IT tools broadening the base
•  Competitive advantage lies outside your firewall!
S U
Tuning Into New Channels!
Tuning Into New Channels!
•  Public & social data is available by the firehose
•  The new discipline: connecting, filtering, switching
•  Find the right keywords, dictionaries, segments
•  Learn from, but don’t emulate search engines
•  Beware of point solutions
The Good News!
•  All data has structure
•  Storage is cheap (Hadoop ~= $300 / TB)
•  Processing is cheap (“free”)
•  Unstructured data compresses well
•  Data APIs abound
•  Public data blossoming (data.gov, etc.)
The Rise of Wrangling!
•  A ‘record’ is no longer a record
•  Event streams need different angles of attack
•  Explode, project, align, window, search
•  New companies/technologies specializing in it
Source: Gartner
Analytics Requirements (1)!
•  A scalable Big Data foundation (Hadoop)
•  Schema-on-read
•  Data profiling & cleansing
•  Fast, visual iteration over samples
Source: Gartner
Analytics Requirements (2)!
•  Text mining, without programing
•  Helper functions for semi/un-structured formats
•  Data connectors, new visualizations
•  Patience, and a an culture of data discovery
Datameer:!
End-to-End Big Data Analytics!
Enterprise Integration!
Bringing Order to Chaos!
•  ‘Big Data Visualization’ is an oxymoron
•  Rich, detailed summaries are the goal
•  ‘It’s the analytics, stupid’
Industry Use Cases!
•  Retail: Competitive pricing through web scraping
•  MFG: Product sentiment through Twitter
•  FSI: Brand preferences from Facebook “likes”
•  Gov: Nefarious behavior through email seizure!
For more information!

http://www.datameer.com



" @datameer
" mschumpert@datameer.com

Learn more
Contact
#datameer @datameer

More Related Content

What's hot

Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera, Inc.
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing Meetup
Caserta
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from Scratch
Klaas Bosteels
 
Objectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
InfiniteGraph
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
BigDataEverywhere
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
Harvinder Atwal
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
DeZyre
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
EMC
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
Inside Analysis
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
Cloudera, Inc.
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
GetInData
 
Platfora Girl Geek Dinner
Platfora Girl Geek DinnerPlatfora Girl Geek Dinner
Platfora Girl Geek Dinner
Platfora
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
Caserta
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
Dataiku
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Caserta
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big Data
Halo BI
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Technologies
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
Harvinder Atwal
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
Cloudera, Inc.
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAP
SAP Analytics
 

What's hot (20)

Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing Meetup
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from Scratch
 
Objectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
 
Platfora Girl Geek Dinner
Platfora Girl Geek DinnerPlatfora Girl Geek Dinner
Platfora Girl Geek Dinner
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big Data
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Analytics Solutions from SAP
Analytics Solutions from SAPAnalytics Solutions from SAP
Analytics Solutions from SAP
 

Viewers also liked

Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
Seth Grimes
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?
Datameer
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
George Roth
 
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Peter Wren-Hilton
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
Great Wide Open
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured Data
Marco Gralike
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehouse
phanleson
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the Data
Health Catalyst
 
Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015
FSJU AUJF
 
Windows Azure Mobile Services
Windows Azure Mobile ServicesWindows Azure Mobile Services
Windows Azure Mobile ServicesJan Hentschel
 
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Salesforce Deutschland
 
Model-Driven Software Development 2.0
Model-Driven Software Development 2.0Model-Driven Software Development 2.0
Model-Driven Software Development 2.0
Etienne Juliot
 
Datameer
DatameerDatameer
Datameer
Chris Morrison
 
Model Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model EvolutionModel Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model Evolution
Sander Vermolen
 
iPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneiPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhone
Heiko Behrens
 
IN4308 1
IN4308 1IN4308 1
IN4308 1
Eelco Visser
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BI
Monaheng Diaho
 
APEX 5.0, und sonst?
APEX 5.0, und sonst?APEX 5.0, und sonst?
APEX 5.0, und sonst?
Niels de Bruijn
 
Agile MDD
Agile MDDAgile MDD
Agile MDD
fntnhd
 

Viewers also liked (20)

Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
 
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
 
Hotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured DataHotsos 2013 - Creating Structure in Unstructured Data
Hotsos 2013 - Creating Structure in Unstructured Data
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehouse
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the Data
 
Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015Israel redefining innovation at International CES 2015
Israel redefining innovation at International CES 2015
 
Windows Azure Mobile Services
Windows Azure Mobile ServicesWindows Azure Mobile Services
Windows Azure Mobile Services
 
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012Service Cloud für Fortgeschrittene – Die Roadmap für 2012
Service Cloud für Fortgeschrittene – Die Roadmap für 2012
 
Model-Driven Software Development 2.0
Model-Driven Software Development 2.0Model-Driven Software Development 2.0
Model-Driven Software Development 2.0
 
Datameer
DatameerDatameer
Datameer
 
Model Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model EvolutionModel Driven Software Development - Data Model Evolution
Model Driven Software Development - Data Model Evolution
 
iPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneiPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhone
 
IN4308 1
IN4308 1IN4308 1
IN4308 1
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BI
 
APEX 5.0, und sonst?
APEX 5.0, und sonst?APEX 5.0, und sonst?
APEX 5.0, und sonst?
 
Agile MDD
Agile MDDAgile MDD
Agile MDD
 

Similar to Analyzing Unstructured Data in Hadoop Webinar

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
Caserta
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
Caserta
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
Caserta
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
Caserta
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
Caserta
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
Gary Allemann
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Devon Ziegenfuss
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Julian Tong
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
itnewsafrica
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of Data
Matt Turner
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09
Hortonworks
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
ANAND PRAKASH
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
Databricks
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
Precisely
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
Caserta
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
Hagar Alaa el-din
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
Manish Chopra
 

Similar to Analyzing Unstructured Data in Hadoop Webinar (20)

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Data In Action: Business Value of Data
Data In Action: Business Value of DataData In Action: Business Value of Data
Data In Action: Business Value of Data
 
Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09Zementis hortonworks-webinar-2014-09
Zementis hortonworks-webinar-2014-09
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 

More from Datameer

Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data
Datameer
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Datameer
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?
Datameer
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics Webinar
Datameer
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of Analysis
Datameer
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
Datameer
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
Datameer
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
Datameer
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
Datameer
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use Case
Datameer
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on Hadoop
Datameer
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
Datameer
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the Scientist
Datameer
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited Data
Datameer
 

More from Datameer (14)

Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-End
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics Webinar
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of Analysis
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use Case
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on Hadoop
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the Scientist
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited Data
 

Recently uploaded

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

Analyzing Unstructured Data in Hadoop Webinar

  • 1. © 2014 Datameer, Inc. All rights reserved. Analyzing Unstructured Data in Hadoop!
  • 2. View Recording !! You can view the recording of this webinar at: http://info.datameer.com/Online-Slideshare- Analyzing-Unstructured-Data-in-Hadoop- On-Demand.html
  • 3. © 2013 Datameer, Inc. All rights reserved. Matt Schumpert @datameer Senior Director, Solutions Engineering Matt has been working in the enterprise infrastructure software space for over 14 years in various capacities, including sales engineering, strategic alliances and consulting. Matt currently runs the pre-sales engineering team at Datameer, supporting all technical aspects of customer engagement from initial contact through roll-out of customers into production. Matt holds a BS in Computer Science from the University of Virginia.  #datameer @datameer About Our Speaker!
  • 4. Agenda! •  Market & Data trends •  Tuning into new channels •  The good news •  The rise of wrangling •  Analytics requirements •  Bringing order to chaos •  Use Cases
  • 5. What we learned in 2010… (or before)!
  • 6. Market & Data Trends! •  Data volumes will grow 800% in 5 years •  Unstructured data is growing 62% faster •  80% of all data will be unstructured in 2019 •  “Big Unstructured Data” requires new tech. •  85% of the Fortune 500 will be unable to exploit Big Data for competitive advantage through 2015 Source: Gartner
  • 7. Market & Data Trends! •  ‘Multi-structured’ is the word of the day •  Mainstream IT tools broadening the base •  Competitive advantage lies outside your firewall! S U
  • 8. Tuning Into New Channels!
  • 9. Tuning Into New Channels! •  Public & social data is available by the firehose •  The new discipline: connecting, filtering, switching •  Find the right keywords, dictionaries, segments •  Learn from, but don’t emulate search engines •  Beware of point solutions
  • 10. The Good News! •  All data has structure •  Storage is cheap (Hadoop ~= $300 / TB) •  Processing is cheap (“free”) •  Unstructured data compresses well •  Data APIs abound •  Public data blossoming (data.gov, etc.)
  • 11. The Rise of Wrangling! •  A ‘record’ is no longer a record •  Event streams need different angles of attack •  Explode, project, align, window, search •  New companies/technologies specializing in it Source: Gartner
  • 12. Analytics Requirements (1)! •  A scalable Big Data foundation (Hadoop) •  Schema-on-read •  Data profiling & cleansing •  Fast, visual iteration over samples Source: Gartner
  • 13. Analytics Requirements (2)! •  Text mining, without programing •  Helper functions for semi/un-structured formats •  Data connectors, new visualizations •  Patience, and a an culture of data discovery
  • 16. Bringing Order to Chaos! •  ‘Big Data Visualization’ is an oxymoron •  Rich, detailed summaries are the goal •  ‘It’s the analytics, stupid’
  • 17. Industry Use Cases! •  Retail: Competitive pricing through web scraping •  MFG: Product sentiment through Twitter •  FSI: Brand preferences from Facebook “likes” •  Gov: Nefarious behavior through email seizure!
  • 18.
  • 19. For more information! http://www.datameer.com " @datameer " mschumpert@datameer.com Learn more Contact #datameer @datameer