© 2014 Datameer, Inc. All rights reserved.
Analyzing Unstructured Data in Hadoop!
View Recording !!

You can view the recording of this webinar at:

http://info.datameer.com/Online-Slideshare-Analyzing-Unstructured-Data-in-Hadoop-On-Demand.html
© 2013 Datameer, Inc. All rights reserved.
About Our Speaker!

Matt Schumpert @datameer
Senior Director, Solutions Engineering

Matt has been working in the enterprise
infrastructure software space for over 14 years in
various capacities, including sales engineering,
strategic alliances and consulting.

Matt currently runs the pre-sales engineering team at
Datameer, supporting all technical aspects of
customer engagement from initial contact through
roll-out of customers into production.

Matt holds a BS in Computer Science from the
University of Virginia.
#datameer @datameer
Agenda!
•  Market & Data trends
•  Tuning into new channels
•  The good news
•  The rise of wrangling
•  Analytics requirements
•  Bringing order to chaos
•  Use Cases
What we learned in 2010… (or before)!
Market & Data Trends!
•  Data volumes will grow 800% in 5 years
•  Unstructured data is growing 62% faster
•  80% of all data will be unstructured in 2019
•  “Big Unstructured Data” requires new tech.
•  85% of the Fortune 500 will be unable to exploit
Big Data for competitive advantage through 2015
Source: Gartner
Market & Data Trends!
•  ‘Multi-structured’ is the word of the day
•  Mainstream IT tools broadening the base
•  Competitive advantage lies outside your firewall!
Tuning Into New Channels!
•  Public & social data is available by the firehose
•  The new discipline: connecting, filtering, switching
•  Find the right keywords, dictionaries, segments
•  Learn from, but don’t emulate search engines
•  Beware of point solutions
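
The "connecting, filtering, switching" discipline and the keyword dictionaries mentioned above could be sketched roughly as follows. This is a minimal illustration, not something shown in the webinar: the `KEYWORDS` dictionary, the segment names, and the sample messages are all made up for the example.

```python
import re

# Hypothetical keyword dictionary: segment name -> terms to watch for.
KEYWORDS = {
    "pricing": {"price", "discount", "sale"},
    "support": {"broken", "refund", "help"},
}

def tag_segments(message: str) -> set[str]:
    """Return the segments whose keywords appear in the message."""
    tokens = set(re.findall(r"[a-z0-9']+", message.lower()))
    return {segment for segment, terms in KEYWORDS.items() if tokens & terms}

def filter_stream(messages):
    """Yield (message, segments) only for messages that match a segment."""
    for message in messages:
        segments = tag_segments(message)
        if segments:
            yield message, segments

# Made-up sample of a public message feed.
sample = [
    "Big SALE today, every price slashed",
    "Lovely weather this morning",
    "My unit arrived broken, need help",
]
matches = list(filter_stream(sample))
```

Because `filter_stream` is a generator, it handles one message at a time and could sit in front of a much larger feed; a real deployment would likely need stemming, phrase matching, and language handling that this sketch leaves out.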
The Good News!
•  All data has structure
•  Storage is cheap (Hadoop ~= $300 / TB)
•  Processing is cheap (“free”)
•  Unstructured data compresses well
•  Data APIs abound
•  Public data blossoming (data.gov, etc.)
The Rise of Wrangling!
•  A ‘record’ is no longer a record
•  Event streams need different angles of attack
•  Explode, project, align, window, search
•  New companies/technologies specializing in it
Source: Gartner
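
Three of the wrangling verbs above (explode, project, window) can be illustrated with a small sketch in plain Python. The event layout, field names, and the 60-unit window width are assumptions chosen for the example; "align" and "search" are not covered here.

```python
from collections import defaultdict

# Hypothetical raw events: one "record" holds several nested actions.
events = [
    {"user": "a", "ts": 5, "actions": ["view", "click"]},
    {"user": "b", "ts": 42, "actions": ["view"]},
    {"user": "a", "ts": 65, "actions": ["view", "buy"]},
]

def explode(records, field, as_name):
    """Emit one flat row per element of a nested list field."""
    for rec in records:
        for item in rec[field]:
            row = {k: v for k, v in rec.items() if k != field}
            row[as_name] = item
            yield row

def project(rows, columns):
    """Keep only the named columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}

def window_counts(rows, width):
    """Count rows per fixed-width (tumbling) time window, keyed by window start."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["ts"] // width * width] += 1
    return dict(counts)

flat = list(project(explode(events, "actions", "action"), ["ts", "action"]))
per_window = window_counts(flat, width=60)
```

Here three input records become five flat rows, which is one way to read "a 'record' is no longer a record": the unit of analysis is decided during wrangling, not at ingest.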
Analytics Requirements (1)!
•  A scalable Big Data foundation (Hadoop)
•  Schema-on-read
•  Data profiling & cleansing
•  Fast, visual iteration over samples
Source: Gartner
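
Schema-on-read can be pictured as storing raw text untouched and attaching column names and types only when the data is read. The sketch below is a simplified illustration under that assumption; the sample rows, schema lists, and the `to_float` helper are hypothetical and do not represent any particular Hadoop tool's API.

```python
import csv
import io

# Raw data is stored as-is; no schema is enforced at write time.
RAW = "2014-01-03,widget,19.99\n2014-01-04,gadget,n/a\n"

def to_float(text):
    """Parse a number, returning None for dirty values instead of failing."""
    try:
        return float(text)
    except ValueError:
        return None

def read_with_schema(raw, schema):
    """Apply (name, parser) pairs to each raw row at read time."""
    for fields in csv.reader(io.StringIO(raw)):
        yield {name: parse(value) for (name, parse), value in zip(schema, fields)}

# Two different schemas applied to the same stored bytes.
sales_schema = [("day", str), ("product", str), ("price", to_float)]
text_schema = [("day", str), ("product", str), ("price", str)]

sales = list(read_with_schema(RAW, sales_schema))
as_text = list(read_with_schema(RAW, text_schema))
```

The same raw rows yield a numeric `price` under one schema and the original string under the other, so a dirty value such as `n/a` surfaces as `None` at read time, where profiling and cleansing can deal with it.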
Analytics Requirements (2)!
•  Text mining, without programming
•  Helper functions for semi/un-structured formats
•  Data connectors, new visualizations
•  Patience, and a culture of data discovery
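
As one example of what a helper function for semi-structured formats might look like, the sketch below flattens a nested JSON document into a single row with dotted column names. The `flatten` function and the sample document are illustrative assumptions, not functions from Datameer or any specific product.

```python
import json

def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat

# Made-up semi-structured input.
doc = json.loads('{"user": {"name": "matt", "tags": ["hadoop", "etl"]}, "likes": 3}')
row = flatten(doc)
```

The result is a flat mapping such as `user.name` and `user.tags.0`, which fits a spreadsheet-style or tabular view more easily than the nested original. Empty objects and arrays produce no columns in this sketch.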
Datameer: End-to-End Big Data Analytics!
Enterprise Integration!
Bringing Order to Chaos!
•  ‘Big Data Visualization’ is an oxymoron
•  Rich, detailed summaries are the goal
•  ‘It’s the analytics, stupid’
Industry Use Cases!
•  Retail: Competitive pricing through web scraping
•  MFG: Product sentiment through Twitter
•  FSI: Brand preferences from Facebook “likes”
•  Gov: Nefarious behavior through email seizure!
For more information!

Learn more: http://www.datameer.com
Contact: @datameer, mschumpert@datameer.com
