Rob peglar introduction_analytics _big data_hadoop
Ghassan Al-Yafie
1.
Introduction to Analytics and Big Data - Hadoop
Rob Peglar, EMC Isilon
2.
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions:
• Any slide or slides used must be reproduced in their entirety without modification
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
3.
BIG DATA AND HADOOP
• Data Challenges
• Why Hadoop
4.
Customer Challenges: The Data Deluge
• In 2010 the digital universe was 1.2 zettabytes
• In a decade the digital universe will be 35 zettabytes
• 90% of the digital universe is unstructured
• In 2011 the digital universe is 300 quadrillion files
(The Economist, Feb 25, 2010)
5.
Big Data Is Different than Business Intelligence
“Traditional BI”: GBs to 10s of TBs; operational; structured; repetitive
“Big Data Analytics”: 10s of TBs to 100s of PBs; external + operational; mostly semi-structured; experimental, ad hoc
6.
Questions from Businesses will Vary (from past to future)
• Reporting, Dashboards: What happened?
• Forensics & Data Mining: Why did it happen?
• Real-Time Analytics: What is happening?
• Real-Time Data Mining: Why is it happening?
• Predictive Analytics: What is likely to happen?
• Prescriptive Analytics: What should I do about it?
7.
Web 2.0 is “Data-Driven”
“The future is here, it’s just not evenly distributed yet.” William Gibson
8.
The world of Data-Driven Applications
9.
Attributes of Big Data
• Volume: Terabytes, Transactions, Tables, Records, Files
• Velocity: Batch, Near Time, Real Time, Streams
• Variety: Structured, Unstructured, Semistructured
10.
Ten Common Big Data Problems
1. Modeling true risk
2. Customer churn analysis
3. Recommendation engine
4. Ad targeting
5. PoS transaction analysis
6. Analyzing network data to predict failure
7. Threat analysis
8. Trade surveillance
9. Search quality
10. Data “sandbox”
11.
The Big Data Opportunity
• Financial Services
• Healthcare
• Retail
• Web/Social/Mobile
• Manufacturing
• Government
12.
Industries Are Embracing Big Data
Retail
• CRM – Customer Scoring
• Store Siting and Layout
• Fraud Detection / Prevention
• Supply Chain Optimization
Advertising & Public Relations
• Demand Signaling
• Ad Targeting
• Sentiment Analysis
• Customer Acquisition
Financial Services
• Algorithmic Trading
• Risk Analysis
• Fraud Detection
• Portfolio Analysis
Media & Telecommunications
• Network Optimization
• Customer Scoring
• Churn Prevention
• Fraud Prevention
Manufacturing
• Product Research
• Engineering Analytics
• Process & Quality Analysis
• Distribution Optimization
Energy
• Smart Grid
• Exploration
Government
• Market Governance
• Counter-Terrorism
• Econometrics
• Health Informatics
Healthcare & Life Sciences
• Pharmaco-Genomics
• Bio-Informatics
• Pharmaceutical Research
• Clinical Outcomes Research
13.
Why Hadoop?
Answer: Big Datasets!
14.
Why Hadoop?
Big Data analytics and the Apache Hadoop open source project are rapidly emerging as the preferred solution to address business and technology trends that are disrupting traditional data management and processing.
Enterprises can gain a competitive advantage by being early adopters of big data analytics.
15.
Storage & Memory B/W lagging CPU
• CPU B/W requirements out-pacing memory and storage
• Disk & memory getting “further” away from CPU
• Large sequential transfers better for both memory & disk
Annual bandwidth improvement (all milestones): CPU 1.5, DRAM 1.27, LAN 1.39, Disk 1.28
Annual latency improvement (all milestones): CPU 1.17, DRAM 1.07, LAN 1.12, Disk 1.11
Figure labels: Memory Wall, Storage Chasm
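As a rough illustration of the gap on this slide, the annual bandwidth figures can be compounded over several years. The sketch below assumes each figure is a multiplicative year-over-year factor (1.5 meaning 50% better each year) and that the rates stay constant. The ten-year horizon and the function names are illustrative choices, not from the slide.

```python
# Annual bandwidth improvement factors as listed on the slide.
ANNUAL_BANDWIDTH_GAIN = {"CPU": 1.5, "DRAM": 1.27, "LAN": 1.39, "Disk": 1.28}


def compounded_gain(annual_factor: float, years: int) -> float:
    """Total improvement after `years` of constant multiplicative growth (assumed model)."""
    return annual_factor ** years


def gap_versus_cpu(years: int) -> dict:
    """Ratio of CPU bandwidth growth to each other component's growth after `years`."""
    cpu = compounded_gain(ANNUAL_BANDWIDTH_GAIN["CPU"], years)
    return {
        name: cpu / compounded_gain(factor, years)
        for name, factor in ANNUAL_BANDWIDTH_GAIN.items()
        if name != "CPU"
    }


if __name__ == "__main__":
    for name, ratio in gap_versus_cpu(10).items():
        print(f"After 10 years, CPU bandwidth growth leads {name} by about {ratio:.1f}x")
```

Under these assumptions the CPU pulls several times ahead of both DRAM and disk within a decade, which is one way to read the "Memory Wall" and "Storage Chasm" labels.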
16.
Commodity Hardware Economics
For $1000, one computer can:
• Process ~32 GB
• Store ~15 TB
99.9% of data is underutilized
17.
Enterprise + Big Data = Big Opportunity
18.
WHAT IS HADOOP
• Hadoop Adoption
• HDFS
• MapReduce
• Ecosystem Projects
19.
Hadoop Adoption in the Industry
[Figure: Hadoop adoption by year, 2007 to 2010. The Datagraph Blog. Source: Hadoop Summit Presentations]
20.
What is Hadoop?
• A scalable fault-tolerant distributed system for data storage and processing
• Core Hadoop has two main components:
  • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage. Reliable, redundant, distributed file system optimized for large files
  • MapReduce: fault-tolerant distributed processing. Programming model for processing sets of data: mapping inputs to outputs and reducing the output of multiple Mappers to one (or a few) answer(s)
• Operates on unstructured and structured data
• A large and active ecosystem
• Open source under the friendly Apache License
• http://wiki.apache.org/hadoop/
21.
HDFS 101
The Data Set System
22.
HDFS Concepts
• Sits on top of a native (ext3, xfs, etc.) file system
• Performs best with a ‘modest’ number of large files
• Files in HDFS are ‘write once’
• HDFS is optimized for large, streaming reads of files
23.
HDFS
Hadoop Distributed File System
– Data is organized into files & directories
– Files are divided into blocks, distributed across cluster nodes
– Block placement is known at runtime by MapReduce, so computation is co-located with data
– Blocks are replicated to handle failure
– Checksums are used to ensure data integrity
Replication: the one and only strategy for error handling, recovery and fault tolerance
– Self healing
– Make multiple copies
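The block, replication, and checksum ideas on this slide can be sketched in a few lines. This is a toy model, not HDFS code: the 8-byte block size, the round-robin placement policy, the node names, and every function name are assumptions made for illustration, and real HDFS uses far larger blocks and its own placement rules.

```python
import zlib

BLOCK_SIZE = 8    # bytes; deliberately tiny so the example is easy to follow
REPLICATION = 3   # assumed number of copies kept per block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replication=REPLICATION) -> dict:
    """Hypothetical round-robin placement; requires replication <= len(nodes)."""
    placement = {}
    for i, block in enumerate(blocks):
        targets = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        placement[i] = {"checksum": zlib.crc32(block), "nodes": targets}
    return placement


def verify(block: bytes, checksum: int) -> bool:
    """Detect corruption by recomputing the block's CRC32."""
    return zlib.crc32(block) == checksum


def re_replicate(placement, failed, nodes, replication=REPLICATION) -> dict:
    """'Self healing': drop the failed node and copy blocks to other live nodes."""
    live = [n for n in nodes if n != failed]
    for info in placement.values():
        info["nodes"] = [n for n in info["nodes"] if n != failed]
        for candidate in live:
            if len(info["nodes"]) >= replication:
                break
            if candidate not in info["nodes"]:
                info["nodes"].append(candidate)
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    blocks = split_into_blocks(b"a large file, in miniature")
    placement = re_replicate(place_blocks(blocks, nodes), "node1", nodes)
    for block_id, info in placement.items():
        print(block_id, info["nodes"], verify(blocks[block_id], info["checksum"]))
```

Because a scheduler can look up which nodes hold each block, it can send the computation to one of those nodes instead of moving the data.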
24.
Hadoop Server Roles
• Masters: Name Node, Secondary Name Node, Job Tracker
• Slaves (up to 4K nodes): each runs a Task Tracker and a Data Node
• Clients
25.
Hadoop Cluster
[Figure: cluster of DN,TT (Data Node, Task Tracker) nodes, up to 4K nodes, together with the NN, JT, and SNN, connected over 1GbE/10GbE links to core switches, with a client attached]
26.
HDFS File Write Operation
27.
HDFS File Read Operation
28.
MapReduce 101
Functional Programming meets Distributed Processing
29.
MapReduce Provides
• Automatic parallelization and distribution
• Fault tolerance
• Status and monitoring tools
• A clean abstraction for programmers
Google Technology RoundTable: MapReduce
30.
What is MapReduce?
• A method for distributing a task across multiple nodes
• Each node processes data stored on that node
• Consists of two developer-created phases:
  1. Map
  2. Reduce
• In between Map and Reduce is the Shuffle and Sort
31.
Key MapReduce Terminology Concepts
• A user runs a client program on a client computer
• The client program submits a job to Hadoop
• The job is sent to the JobTracker process on the Master Node
• Each Slave Node runs a process called the TaskTracker
• The JobTracker instructs TaskTrackers to run and monitor tasks
• A task attempt is an instance of a task running on a slave node
• There will be at least as many task attempts as there are tasks which need to be performed
32.
MapReduce: Basic Concepts
• Each Mapper processes a single input split from HDFS
• Hadoop passes the developer’s Map code one record at a time
• Each record has a key and a value
• Intermediate data is written by the Mapper to local disk
• During the shuffle and sort phase, all values associated with the same intermediate key are transferred to the same Reducer
• The Reducer is passed each key and a list of all its values
• Output from the Reducers is written to HDFS
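The flow described on this slide can be mimicked in a single process. The following is a minimal sketch, not the Hadoop API: `run_mapreduce`, the "region,amount" record format, and the sample data are all hypothetical, and in-memory lists stand in for input splits, local disk, and HDFS.

```python
from collections import defaultdict


def run_mapreduce(splits, mapper, reducer):
    """Single-process stand-in for the Map, Shuffle and Sort, and Reduce phases."""
    # Map: each split is handled independently, one (key, value) record at a time.
    intermediate = []  # a real Mapper would write this to local disk
    for split in splits:
        for key, value in split:
            intermediate.extend(mapper(key, value))

    # Shuffle and sort: every value for a given intermediate key ends up together.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: each key arrives with the list of all its values.
    output = []  # a real job would write this to HDFS
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def sales_mapper(offset, line):
    """Hypothetical record format: 'region,amount'. The input key (offset) is unused."""
    region, amount = line.split(",")
    yield region, int(amount)


def sum_reducer(region, amounts):
    yield region, sum(amounts)


# Two input splits, each a list of (key, value) records.
splits = [
    [(0, "east,10"), (1, "west,5")],
    [(0, "east,7"), (1, "north,3")],
]
result = run_mapreduce(splits, sales_mapper, sum_reducer)
print(result)  # [('east', 17), ('north', 3), ('west', 5)]
```

Only the mapper and reducer are written by the developer; everything inside `run_mapreduce` corresponds to work the framework does on the developer's behalf.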
33.
MapReduce Operation
What was the max/min temperature for the last century?
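One way to picture this question as a MapReduce job is sketched below. The slide gives no data, so the (year, temperature) readings are made up for illustration, and keying by year is just one possible design. Sorting followed by `itertools.groupby` stands in for the shuffle and sort.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (year, temperature in degrees C) readings; not from the slides.
readings = [
    (1901, -12.5), (1901, 31.0),
    (1950, 38.2), (1950, -20.1),
    (1999, 35.4), (1999, -3.0),
]


def mapper(record):
    """Emit the year as the intermediate key and the temperature as the value."""
    year, temperature = record
    return year, temperature


# Map, then sort by key so that groupby sees each year's values together.
mapped = sorted(map(mapper, readings), key=itemgetter(0))

# Reduce: one (min, max) pair per year.
per_year = {}
for year, group in groupby(mapped, key=itemgetter(0)):
    temps = [t for _, t in group]
    per_year[year] = (min(temps), max(temps))

# A final pass over the small per-year result answers the whole-period question.
overall_min = min(low for low, _ in per_year.values())
overall_max = max(high for _, high in per_year.values())
print(per_year)
print(overall_min, overall_max)
```

Min and max are convenient here because partial results can be combined in any order, so each year (or each node) can be reduced independently.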
34.
Sample Dataset
The requirement: count how many customers of each type are in each country, with the country name listed in countries.dat shown in the final result (and not the two-character country code).
Each record has a key and a value.
To do this you need to:
• Join the data sets
• Key on country
• Count type of customer per country
• Output the results
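The four steps above can be sketched as a reduce-side join. The slide does not show the actual file layouts, so the customer tuples, the country tuples, and all function names below are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical records; the real customer file and countries.dat layouts are not shown.
customers = [
    ("Alice", "gold", "US"),
    ("Bob", "silver", "US"),
    ("Chen", "gold", "CN"),
    ("Dara", "gold", "US"),
]
countries = [("US", "United States"), ("CN", "China")]


def map_customer(record):
    """Key on country code; tag the value so the reducer knows its origin."""
    _name, customer_type, code = record
    yield code, ("customer", customer_type)


def map_country(record):
    code, country_name = record
    yield code, ("country", country_name)


def shuffle(pairs):
    """Group all tagged values by country code, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_join(code, values):
    """Join on the code, then count customer types under the full country name."""
    country_name = next((v for tag, v in values if tag == "country"), code)
    counts = defaultdict(int)
    for tag, value in values:
        if tag == "customer":
            counts[value] += 1
    for customer_type, count in sorted(counts.items()):
        yield country_name, customer_type, count


pairs = [p for r in customers for p in map_customer(r)]
pairs += [p for r in countries for p in map_country(r)]
results = [row for code, values in shuffle(pairs).items() for row in reduce_join(code, values)]
for row in results:
    print(row)
```

Tagging each value with its source is what lets a single reducer see both the country name and the customers for a given code.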
35.
MapReduce Paradigm
Input → Map → Shuffle and Sort → Reduce → Output
Unix analogy: cat | grep | sort | uniq > output
[Figure: several Map tasks feeding Reduce tasks]
36.
MapReduce Example
Problem: Count the number of times that each word appears in the following paragraph:
John has a red car, which has no radio. Mary has a red bicycle. Bill has no car or bicycle.
Map
• Server 1: "John has a red car, which has no radio." gives John: 1, has: 2, a: 1, red: 1, car: 1, which: 1, no: 1, radio: 1
• Server 2: "Mary has a red bicycle." gives Mary: 1, has: 1, a: 1, red: 1, bicycle: 1
• Server 3: "Bill has no car or bicycle." gives Bill: 1, has: 1, no: 1, car: 1, or: 1, bicycle: 1
Reduce
• Input, grouped by word: John: 1; has: 2, has: 1, has: 1; a: 1, a: 1; red: 1, red: 1; car: 1, car: 1; which: 1; no: 1, no: 1; radio: 1; Mary: 1; bicycle: 1, bicycle: 1; Bill: 1; or: 1
• Output: John: 1, has: 4, a: 2, red: 2, car: 2, which: 1, no: 2, radio: 1, Mary: 1, bicycle: 2, Bill: 1, or: 1
Both the Map and the Reduce phase run across Server 1, Server 2 and Server 3.
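The word count above can be reproduced with a short Python sketch. It runs in a single process, so the three "servers" are just three list entries; the function names and the simple letters-only tokenizer are assumptions, not part of the original example.

```python
import re
from collections import Counter


def map_words(text):
    """Map: count the words in one server's share of the input."""
    return Counter(re.findall(r"[A-Za-z]+", text))


def reduce_counts(partials):
    """Reduce: add up the per-server counts for each word."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


server_inputs = [
    "John has a red car, which has no radio.",  # Server 1
    "Mary has a red bicycle.",                  # Server 2
    "Bill has no car or bicycle.",              # Server 3
]
partials = [map_words(text) for text in server_inputs]
totals = reduce_counts(partials)
```

Server 1 already reports `has: 2` because its map step counts locally before anything is shuffled, which matches the slide.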
37.
Putting it all Together: MapReduce and HDFS
[Diagram: Client/Dev; Large Data Set (Log files, Sensor Data); Job Tracker; three Task Trackers, each running Map Jobs and Reduce Jobs; Hadoop Distributed File System (HDFS); steps numbered 1 to 4]
38.
Hadoop Ecosystem Projects
• Hadoop is a ‘top-level’ Apache project
  • Created and managed under the auspices of the Apache Software Foundation
• Several other projects exist that rely on some or all of Hadoop
  • Typically either both HDFS and MapReduce, or just HDFS
• Ecosystem Projects Include
  • Hive
  • Pig
  • HBase
  • Many more…
http://hadoop.apache.org/
39.
Hadoop, SQL & MPP Systems
Hadoop | Traditional SQL Systems | MPP Systems
Scale-Out | Scale-Up | Scale-Out
Key/Value Pairs | Relational Tables | Relational Tables
Functional Programming | Declarative Queries | Declarative Queries
Offline Batch Processing | Online Transactions | Online Transactions
40.
Comparing RDBMS and MapReduce
Attribute | Traditional RDBMS | MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
Access | Interactive and Batch | Batch
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
DBA Ratio | 1:40 | 1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
41.
Hadoop Use Cases
42.
Diagnostics and Customer Churn
Issues
• What make and model systems are deployed?
• Are certain set top boxes in need of replacement based on system diagnostic data?
• Is there a correlation between make, model or vintage of set top box and customer churn?
• What are the most expensive boxes to maintain?
• Which systems should we proactively replace to keep customers happy?
Big Data Solution
• Collect unstructured data from set top boxes (multiple terabytes)
• Analyze system data in Hadoop in near real time
• Pull data into Hive for interactive query and modeling
Analytics with Hadoop increases customer satisfaction
43.
Pay Per View Advertising
Issues
• Fixed inventory of ad space is provided by national content providers. For example, 100 ads offered to provider for 1 month of programming
• Provider can use this space to advertise its products and services, such as pay per view
• Do we advertise “The Longest Yard” in the middle of a football game or in the middle of a romantic comedy?
• 10% increase in pay per view movie rentals = $10M in incremental revenue
Big Data Solution
• Collect programming data and viewer rental data in a large data repository
• Develop models to correlate proclivity to rent with programming format
• Find the most productive time slots and programs to advertise pay per view inventory
Improve ad placement and pay-per-view conversion with Hadoop
44.
Risk Modeling
Risk Modeling
– Bank had customer data across multiple lines of business (LOBs) and needed to develop a better risk picture of its customers. E.g., if direct deposits stop coming into a checking account, it is likely that the customer lost his/her job, which impacts creditworthiness for other products (credit card, mortgage, etc.)
– Data existed in silos across multiple LOBs and acquired bank systems
– Data size approached 1 petabyte
Why do this in Hadoop?
– Ability to cost-effectively integrate 1+ PB of data from multiple data sources: data warehouse, call center, chat and email
– Platform for more analysis with poly-structured data sources; e.g., combining bank data with credit bureau data, Twitter, etc.
– Offload intensive computation from the data warehouse (DW)
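The direct-deposit signal mentioned above lends itself to a key-and-aggregate pattern. The sketch below is purely illustrative: the `(customer, month)` record shape, the integer month index, the two-month threshold, and the function name are all assumptions and do not describe the bank's actual model.

```python
from collections import defaultdict


def flag_stopped_deposits(deposits, current_month, max_gap_months=2):
    """Return customers whose latest direct deposit is at least max_gap_months old."""
    last_seen = defaultdict(int)
    for customer, month in deposits:
        # Key on customer (map) and keep the latest deposit month (reduce).
        last_seen[customer] = max(last_seen[customer], month)
    return sorted(
        customer
        for customer, month in last_seen.items()
        if current_month - month >= max_gap_months
    )


# Hypothetical deposits as (customer, month index) pairs, months counted from 1.
deposits = [("alice", 1), ("alice", 2), ("alice", 6), ("bob", 1), ("bob", 3)]
flagged = flag_stopped_deposits(deposits, current_month=6)
```

The flagged list would then be one input among many to a broader risk picture, combined with data from the other lines of business.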
45.
Sentiment Analysis
Sentiment Analysis
– Hadoop is used frequently to monitor what customers think of a company’s products or services
– Data is loaded from social media sources (Twitter, blogs, Facebook, emails, chats, etc.) into a Hadoop cluster
– Map/Reduce jobs run continuously to identify sentiment (e.g., Acme Company’s rates are “outrageous” or a “rip off”)
– Negative/positive comments can be acted upon (special offer, coupon, etc.)
Why Hadoop
– Social media/web data is unstructured
– Amount of data is immense
– New data sources arise weekly
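A very small keyword-based version of such a job might look like the sketch below. The term lists, sample comments, and function names are hypothetical, and real sentiment analysis generally needs far more than substring matching; the point is only to show the map (label each comment) and reduce (count labels) split.

```python
from collections import Counter

# Hypothetical term lists; the negative terms echo the slide's example.
NEGATIVE_TERMS = ("outrageous", "rip off")
POSITIVE_TERMS = ("great", "love")


def map_sentiment(comment):
    """Map: label one comment as negative, positive, or neutral."""
    text = comment.lower()
    if any(term in text for term in NEGATIVE_TERMS):
        return "negative"
    if any(term in text for term in POSITIVE_TERMS):
        return "positive"
    return "neutral"


def reduce_sentiment(labels):
    """Reduce: count how many comments received each label."""
    return Counter(labels)


comments = [
    "Acme Company's rates are outrageous",
    "What a rip off",
    "I love the new plan",
    "Called support today",
]
summary = reduce_sentiment(map_sentiment(comment) for comment in comments)
```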
46.
Resources to enable the Big Data Conversation
• World Economic Forum: “Personal Data: The Emergence of a New Asset Class” 2011
• McKinsey Global Institute: Big Data: The next frontier for innovation, competition, and productivity
• Big Data: Harnessing a game-changing asset
• IDC: 2011 Digital Universe Study: Extracting Value from Chaos
• The Economist: Data, Data Everywhere
• Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field
• O’Reilly – What is Data Science?
• O’Reilly – Building Data Science Teams
• O’Reilly – Data for the public good
• Obama Administration “Big Data Research and Development Initiative.”
47.
Q&A / Feedback
Please send any questions or comments on this presentation to the SNIA at this address: trackanalyticsbigdata@snia.org
Many thanks to the following individuals for their contributions to this tutorial.
• SNIA Education Committee
• Denis Guyadeen
• Rob Peglar