SlideShare a Scribd company logo
1 of 47
Download to read offline
Introduction to Analytics
and Big Data - Hadoop
Rob Peglar
EMC Isilon
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA.
Member companies and individual members may use this material in
presentations and literature under the following conditions:
Any slide or slides used must be reproduced in their entirety without
modification
The SNIA must be acknowledged as the source of any material used in the
body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this
presentation is intended to be, or should be construed as legal advice or an
opinion of counsel. If you need legal advice or a legal opinion please
contact your attorney.
The information presented herein represents the author's personal opinion
and current understanding of the relevant issues involved. The author, the
presenter, and the SNIA do not assume any responsibility or liability for
damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
2
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
BIG DATA AND HADOOP
Data Challenges
Why Hadoop
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
IN 2010 THE DIGITAL UNIVERSE WAS
1.2 ZETTABYTES
IN A DECADETHE DIGITAL UNIVERSEWILL BE
35 ZETTABYTES
90% OF THE DIGITAL UNIVERSE IS
UNSTRUCTURED
IN 2011 THE DIGITAL UNIVERSE IS
300 QUADRILLIONFILES
Customer Challenges: The Data Deluge
The Economist, Feb 25, 2010
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Big Data Is Different than Business Intelligence
“BIG DATA ANALYTICS”
“TRADITIONAL BI”
GBs to 10s of TBs
Operational
Structured
Repetitive
10s of TB to 100’s of PB’s
External + Operational
Mostly Semi-Structured
Experimental,Ad Hoc
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Questions from Businesses will Vary
Reporting,
Dashboards
Forensics & Data
Mining
What
happened?
Why did it
happen?
Real-Time
Analytics
Real-Time
Data Mining
What is
happening?
Why is it
happening?
Predictive
Analytics
Prescriptive
Analytics
What is likely to
happen?
What should I do
about it?
Past Future
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Web 2.0 is “Data-Driven”
“The future is here, it’s just not evenly distributed yet.”
William Gibson
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
The world of Data-Driven Applications
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Attributes of Big Data
Volume
Velocity Variety
Batch
NearTime
RealTime
Streams
Structured
Unstructured
Semistructured
Terabytes
Transactions
Tables
Records
Files
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Ten Common Big Data Problems
1. Modeling true risk
2. Customer churn
analysis
3. Recommendation
engine
4. Ad targeting
5. PoS transaction
analysis
6. Analyzing network
data to predict
failure
7. Threat analysis
8. Trade surveillance
9. Search quality
10.Data “sandbox”
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
The Big Data Opportunity
Financial Services Healthcare
Retail Web/Social/Mobile
Manufacturing Government
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Industries Are Embracing Big Data
Retail
• CRM – Customer Scoring
• Store Siting and Layout
• Fraud Detection / Prevention
• Supply Chain Optimization
Advertising & Public Relations
• Demand Signaling
• Ad Targeting
• Sentiment Analysis
• Customer Acquisition
Financial Services
• Algorithmic Trading
• Risk Analysis
• Fraud Detection
• Portfolio Analysis
Media &Telecommunications
• Network Optimization
• Customer Scoring
• Churn Prevention
• Fraud Prevention
Manufacturing
• Product Research
• Engineering Analytics
• Process & Quality Analysis
• Distribution Optimization
Energy
• Smart Grid
• Exploration
Government
• Market Governance
• Counter-Terrorism
• Econometrics
• Health Informatics
Healthcare & Life Sciences
• Pharmaco-Genomics
• Bio-Informatics
• Pharmaceutical Research
• Clinical Outcomes Research
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Why Hadoop?
Answer: Big Datasets!
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Why Hadoop?
Big Data analytics and the Apache Hadoop open source
project are rapidly emerging as the preferred solution to
address business and technology trends that are
disrupting traditional data management and processing.
Enterprises can gain a competitive advantage by
being early adopters of big data analytics.
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Storage & Memory B/W lagging CPU
CPU B/W requirements out-pacing memory and
storage
Disk & memory getting “further” away from CPU
Large sequential transfers better for both memory &
disk
CPU DRAM LAN Disk
Annual bandwidth improvement (all milestones)
1.5 1.27 1.39 1.28
Annual latency improvement (all milestones) 1.17 1.07 1.12 1.11
MemoryWall Storage Chasm
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Commodity Hardware Economics
For $1000
One computer can
Process
~32GB
Store
~15TB
99.9%
Of data is Underutilized
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Enterprise + Big Data = Big Opportunity
17
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
WHAT IS HADOOP
Hadoop Adoption
HDFS
MapReduce
Ecosystem Projects
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop Adoption in the Industry
2007 2008 2009 2010
The Datagraph Blog
Source: Hadoop Summit Presentations
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
What is Hadoop?
A scalable fault-tolerant distributed system for data storage and
processing
Core Hadoop has two main components
Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered
storage
Reliable, redundant, distributed file system optimized for large files
MapReduce: fault-tolerant distributed processing
Programming model for processing sets of data
Mapping inputs to outputs and reducing the output of multiple Mappers to
one (or a few) answer(s)
Operates on unstructured and structured data
A large and active ecosystem
Open source under the friendly Apache License
http://wiki.apache.org/hadoop/
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
HDFS 101
The Data Set System
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
HDFS Concepts
 Sits on top of a native (ext3, xfs, etc..) file system
 Performs best with a ‘modest’ number of large files
 Files in HDFS are ‘write once’
 HDFS is optimized for large, streaming reads of files
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
HDFS
 Hadoop Distributed File System
– Data is organized into files & directories
– Files are divided into blocks, distributed across
cluster nodes
– Block placement known at runtime by map-
reduce = computation co-located with data
– Blocks replicated to handle failure
– Checksums used to ensure data integrity
 Replication: one and only strategy for error
handling, recovery and fault tolerance
– Self Healing
– Make multiple copies
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop Server Roles
Slave
Task
Tracker
Data
Node
Slave
Task Tracker
Data
Node
Slave
Task Tracker
Data
Node
Master
Name
Node
Master
Secondary
Node
Job
Tracker
Client Client Client Client Client Client Client Client
Slave
Task
Tracker
Data
Node
Slave
Task Tracker
Data
Node
Slave
Task
Tracker
Data
Node
Up to 4K
Nodes
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop Cluster
DN,TT
Up to 4K
Nodes
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
NN
1GbE/10GbE
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
JT
1GbE/10GbE
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
SNN
1GbE/10GbE
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
DN,TT
1GbE/10GbE
CORE SWITCH
DN,TT
CORE SWITCH Client
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
HDFS File Write Operation
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
HDFS File Read Operation
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce 101
Functional Programming meets
Distributed Processing
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce Provides
Automatic parallelization and distribution
Fault Tolerance
Status and Monitoring Tools
A clean abstraction for programmers
Google Technology RoundTable: MapReduce
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
What is MapReduce?
A method for distributing a task across multiple nodes
Each node processes data stored on that node
Consists of two developer-created phases
1. Map
2. Reduce
In between Map and Reduce is the Shuffle and Sort
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Key MapReduce Terminology Concepts
A user runs a client program on a client computer
The client program submits a job to Hadoop
The job is sent to the JobTracker process on the
Master Node
Each Slave Node runs a process called the
TaskTracker
The JobTracker instructs TaskTrackers to run and
monitor tasks
A task attempt is an instance of a task running on a
slave node
There will be at least as many task attempts as there
are tasks which need to be performed
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce: Basic Concepts
Each Mapper processes single input split from HDFS
Hadoop passes developer’s Map code one record at a
time
Each record has a key and a value
Intermediate data written by the Mapper to local disk
During shuffle and sort phase, all values associated
with same intermediate key are transferred to same
Reducer
Reducer is passed each key and a list of all its values
Output from Reducers is written to HDFS
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce Operation
What was the max/min temperature for the last century?
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Sample Dataset
The requirement:
you need to find out grouped by type of customer how
many of each type are in each country with the name of the
country listed in the countries.dat in the final result
(and not the 2 digit country name). Each record has a key
and a value
To do this you need to:
Join the data sets
Key on country
Count type of customer per country
Output the results
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce Paradigm
Input Map Shuffle and Sort Reduce Output
Map
Reduce
cat grep sort uniq output
Map
Map
Reduce
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
MapReduce Example
Problem: Count the number of times that each word appears in the following paragraph:
John has a red car, which has no radio. Mary has a red
bicycle. Bill has no car or bicycle.
Map
Server 1: John has a red car, which has no radio.
John: 1
has: 2
a: 1
red: 1
car: 1
which: 1
no: 1
radio: 1
Server 2: Mary has a red bicycle.
Mary: 1
has: 1
a: 1
red: 1
bicycle: 1
Server 3: Bill has no car or bicycle.
Bill: 1
has: 1
no: 1
car: 1
or: 1
biclycle:1
Reduce
John:
1
has 2
has: 1
has: 1
a: 1
a: 1
red: 1
red: 1
car: 1
car: 1
which: 1
no: 1
no: 1
radio: 1
Mary: 1
bicycle: 1
bicycle: 1
Bill: 1
or: 1
John: 1
has 4
a: 2
red: 2
car: 2
which: 1
no: 2
radio: 1
Mary: 1
bicycle: 2
Bill: 1
or: 1
Server 1 Server 2 Server 3 Server 1 Server 2 Server 3
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Putting it all Together:
MapReduce and HDFS
Task Tracker
Task Tracker
Task Tracker
Job Tracker
Hadoop Distributed File System (HDFS)
Client/Dev
Large Data Set
(Log files, Sensor Data)
Map Job
Reduce Job
Map Job
Reduce Job
Map Job
Reduce Job
Map Job
Reduce Job
Map Job
Reduce Job
Map Job
Reduce Job
2
1
3
4
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop Ecosystem Projects
• Hadoop is a ‘top-level’ Apache project
• Created and managed under the auspices of the Apache Software Foundation
• Several other projects exist that rely on some or all of Hadoop
• Typically either both HDFS and MapReduce, or just HDFS
• Ecosystem Projects Include
• Hive
• Pig
• HBase
• Many more…..
http://hadoop.apache.org/
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop, SQL & MPP Systems
Hadoop Traditional SQL
Systems
MPP Systems
Scale-Out Scale-Up Scale-Out
Key/Value Pairs RelationalTables RelationalTables
Functional
Programming
Declarative Queries Declarative Queries
Offline Batch
Processing
Online Transactions Online Transactions
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Comparing RDBMS and MapReduce
Traditional RDBMS MapReduce
Data Size Gigabytes (Terabytes) Petabytes (Exabytes)
Access Interactive and Batch Batch
Updates Read / Write many times Write once, Read many times
Structure Static Schema Dynamic Schema
Integrity High (ACID) Low
Scaling Nonlinear Linear
DBA Ratio 1:40 1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Hadoop Use Cases
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Diagnostics and Customer Churn
Issues
What make and model systems are deployed?
Are certain set top boxes in need of replacement based on system
diagnostic data?
Is the a correlation between make, model or vintage of set top box and
customer churn?
What are the most expensive boxes to maintain?
Which systems should we pro-actively replace to keep customers happy?
Big Data Solution
Collect unstructured data from set top boxes—multiple terabytes
Analyze system data in Hadoop in near real time
Pull data in to Hive for interactive query and modeling
Analytics with Hadoop increases customer satisfaction
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Pay Per View Advertising
Issues
Fixed inventory of ad space is provided by national content providers. For
example, 100 ads offered to provider for 1 month of programming
Provider can use this space to advertise its products and services, such as
pay per view
Do we advertise “The Longest Yard” in the middle of a football game or in
the middle of a romantic comedy?
10% increase in pay per view movie rentals = $10M in incremental revenue
• Big Data Solution
Collect programming data and viewer rental data in a large data repository
Develop models to correlate proclivity to rent to programming format
Find the most productive time slots and programs to advertise pay per
view inventory
Improve ad placement and pay-per-view conversion with Hadoop
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Risk Modeling
 Risk Modeling
– Bank had customer data across multiple lines of business and needed to
develop a better risk picture of its customers. i.e, if direct deposits stop
coming into checking acct, it’s likely that customer lost his/her job, which
impacts creditworthiness for other products (CC, mortgage, etc.)
– Data existing in silos across multiple LOB’s and acquired bank systems
– Data size approached 1 petabyte
 Why do this in Hadoop?
– Ability to cost-effectively integrate + 1 PB of data from multiple data
sources: data warehouse, call center, chat and email
– Platform for more analysis with poly-structured data sources; i.e.,
combining bank data with credit bureau data; Twitter, etc.
– Offload intensive computation from DW
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Sentiment Analysis
 Sentiment Analysis
– Hadoop used frequently to monitor what customers think of
company’s products or services
– Data loaded from social media sources (Twitter, blogs,
Facebook, emails, chats, etc.) into Hadoop cluster
– Map/Reduce jobs run continuously to identify sentiment (i.e.,
Acme Company’s rates are “outrageous” or “rip off”)
– Negative/positive comments can be acted upon (special offer,
coupon, etc.)
 Why Hadoop
– Social media/web data is unstructured
– Amount of data is immense
– New data sources arise weekly
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Resources to enable the Big Data Conversation
World Economic Forum: “Personal Data: The Emergence of a New Asset
Class” 2011
McKinsey Global Institute: Big Data: The next frontier for innovation,
competition, and productivity
Big Data: Harnessing a game-changing asset
IDC: 2011 Digital Universe Study: Extracting Value from Chaos
The Economist: Data, Data Everywhere
Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New
Field
O’Reilly – What is Data Science?
O’Reilly – Building Data Science Teams?
O’Reilly – Data for the public good
Obama Administration “Big Data Research and Development Initiative.”
Introduction to Analytics and Big Data – Hadoop
© 2012 Storage Networking Industry Association. All Rights Reserved.
Q&A / Feedback
Please send any questions or comments on this
presentation to the SNIA at this address:
trackanalyticsbigdata@snia.org
47
Many thanks to the following individuals
for their contributions to this tutorial.
SNIA Education Committee
Denis Guyadeen
Rob Peglar

More Related Content

What's hot

Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.bizIntroduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.bizITJobZone.biz
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopFebiyan Rachman
 
Hardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project RhinoHardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project RhinoAmazon Web Services
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle TechnologiesOleksii Movchaniuk
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core conceptsMaryan Faryna
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyonddatasalt
 

What's hot (20)

Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.bizIntroduction to Big Data Hadoop Training Online by www.itjobzone.biz
Introduction to Big Data Hadoop Training Online by www.itjobzone.biz
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Hardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project RhinoHardening Hadoop for Healthcare with Project Rhino
Hardening Hadoop for Healthcare with Project Rhino
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 

Viewers also liked

Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To AnalyticsAlex Meadows
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...WSO2
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...WSO2
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin NUI Galway
 
Introduction about analytics with sas+r programming.
Introduction about analytics with sas+r programming.Introduction about analytics with sas+r programming.
Introduction about analytics with sas+r programming.Ravi Mandal, MBA
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormDataStax
 
Introducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorIntroducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorWSO2
 
Introduction to analytics
Introduction to analyticsIntroduction to analytics
Introduction to analyticsKRD Pravin
 
India's UID Project: Biometrics Vulnerabilities & Exploits
India's UID Project: Biometrics Vulnerabilities & ExploitsIndia's UID Project: Biometrics Vulnerabilities & Exploits
India's UID Project: Biometrics Vulnerabilities & ExploitsAnivar Aravind
 
Biometrics
BiometricsBiometrics
Biometricssenejug
 
Analyzing a Soccer Game with WSO2 CEP
Analyzing a Soccer Game with WSO2 CEPAnalyzing a Soccer Game with WSO2 CEP
Analyzing a Soccer Game with WSO2 CEPSrinath Perera
 
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...BigData AAI
 
Introduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasIntroduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasBasra University, Iraq
 
Introduction To Biometrics
Introduction To BiometricsIntroduction To Biometrics
Introduction To BiometricsAbdul Rehman
 
Introduction to Analytics
Introduction to AnalyticsIntroduction to Analytics
Introduction to AnalyticsMichael Bloom
 
Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?Srinath Perera
 

Viewers also liked (20)

Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To Analytics
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com...
 
2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin 2013.11.14 Big Data Workshop Bruno Voisin
2013.11.14 Big Data Workshop Bruno Voisin
 
Introduction about analytics with sas+r programming.
Introduction about analytics with sas+r programming.Introduction about analytics with sas+r programming.
Introduction about analytics with sas+r programming.
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
Introducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event ProcessorIntroducing the WSO2 Complex Event Processor
Introducing the WSO2 Complex Event Processor
 
Introduction to analytics
Introduction to analyticsIntroduction to analytics
Introduction to analytics
 
Power Of Analytics
Power Of AnalyticsPower Of Analytics
Power Of Analytics
 
India's UID Project: Biometrics Vulnerabilities & Exploits
India's UID Project: Biometrics Vulnerabilities & ExploitsIndia's UID Project: Biometrics Vulnerabilities & Exploits
India's UID Project: Biometrics Vulnerabilities & Exploits
 
Biometrics
BiometricsBiometrics
Biometrics
 
Analyzing a Soccer Game with WSO2 CEP
Analyzing a Soccer Game with WSO2 CEPAnalyzing a Soccer Game with WSO2 CEP
Analyzing a Soccer Game with WSO2 CEP
 
Biometrics ppt
Biometrics pptBiometrics ppt
Biometrics ppt
 
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
25 June 2013 - Advanced Data Analytics - an Introduction - Paul kennedy Power...
 
Introduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.AbbasIntroduction to Biometric lectures... Prepared by Dr.Abbas
Introduction to Biometric lectures... Prepared by Dr.Abbas
 
Introduction To Biometrics
Introduction To BiometricsIntroduction To Biometrics
Introduction To Biometrics
 
Biometrics
BiometricsBiometrics
Biometrics
 
Introduction to Analytics
Introduction to AnalyticsIntroduction to Analytics
Introduction to Analytics
 
Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?
 

Similar to Rob peglar introduction_analytics _big data_hadoop

Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseJeff Kelly
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
A Big Data Journey: Bringing Open Source to Finance
A Big Data Journey: Bringing Open Source to FinanceA Big Data Journey: Bringing Open Source to Finance
A Big Data Journey: Bringing Open Source to FinanceSlim Baltagi
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesDataWorks Summit
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataPrakalp Agarwal
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Big data and you
Big data and you Big data and you
Big data and you IBM
 
Tdwi austin simplifying big data delivery to drive new insights final
Tdwi austin   simplifying big data delivery to drive new insights finalTdwi austin   simplifying big data delivery to drive new insights final
Tdwi austin simplifying big data delivery to drive new insights finalSal Marcus
 
Hadoop Twelve Predictions for 2012
Hadoop Twelve Predictions for 2012Hadoop Twelve Predictions for 2012
Hadoop Twelve Predictions for 2012Cloudera, Inc.
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter AnalyticsAdrian Turcu
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And HadoopEdureka!
 

Similar to Rob peglar introduction_analytics _big data_hadoop (20)

Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouse
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
A Big Data Journey: Bringing Open Source to Finance
A Big Data Journey: Bringing Open Source to FinanceA Big Data Journey: Bringing Open Source to Finance
A Big Data Journey: Bringing Open Source to Finance
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Oh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG DataOh! Session on Introduction to BIG Data
Oh! Session on Introduction to BIG Data
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Big data and you
Big data and you Big data and you
Big data and you
 
Tdwi austin simplifying big data delivery to drive new insights final
Tdwi austin   simplifying big data delivery to drive new insights finalTdwi austin   simplifying big data delivery to drive new insights final
Tdwi austin simplifying big data delivery to drive new insights final
 
Hadoop Twelve Predictions for 2012
Hadoop Twelve Predictions for 2012Hadoop Twelve Predictions for 2012
Hadoop Twelve Predictions for 2012
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Rob peglar introduction_analytics _big data_hadoop

  • 1. Introduction to Analytics and Big Data - Hadoop Rob Peglar EMC Isilon
  • 2. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2
  • 3. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. BIG DATA AND HADOOP Data Challenges Why Hadoop
  • 4. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. IN 2010 THE DIGITAL UNIVERSE WAS 1.2 ZETTABYTES IN A DECADETHE DIGITAL UNIVERSEWILL BE 35 ZETTABYTES 90% OF THE DIGITAL UNIVERSE IS UNSTRUCTURED IN 2011 THE DIGITAL UNIVERSE IS 300 QUADRILLIONFILES Customer Challenges: The Data Deluge The Economist, Feb 25, 2010
  • 5. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Big Data Is Different than Business Intelligence “BIG DATA ANALYTICS” “TRADITIONAL BI” GBs to 10s of TBs Operational Structured Repetitive 10s of TB to 100’s of PB’s External + Operational Mostly Semi-Structured Experimental,Ad Hoc
  • 6. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Questions from Businesses will Vary Reporting, Dashboards Forensics & Data Mining What happened? Why did it happen? Real-Time Analytics Real-Time Data Mining What is happening? Why is it happening? Predictive Analytics Prescriptive Analytics What is likely to happen? What should I do about it? Past Future
  • 7. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Web 2.0 is “Data-Driven” “The future is here, it’s just not evenly distributed yet.” William Gibson
  • 8. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. The world of Data-Driven Applications
  • 9. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Attributes of Big Data Volume Velocity Variety Batch NearTime RealTime Streams Structured Unstructured Semistructured Terabytes Transactions Tables Records Files
  • 10. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Ten Common Big Data Problems 1. Modeling true risk 2. Customer churn analysis 3. Recommendation engine 4. Ad targeting 5. PoS transaction analysis 6. Analyzing network data to predict failure 7. Threat analysis 8. Trade surveillance 9. Search quality 10.Data “sandbox”
  • 11. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. The Big Data Opportunity Financial Services Healthcare Retail Web/Social/Mobile Manufacturing Government
  • 12. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Industries Are Embracing Big Data Retail • CRM – Customer Scoring • Store Siting and Layout • Fraud Detection / Prevention • Supply Chain Optimization Advertising & Public Relations • Demand Signaling • Ad Targeting • Sentiment Analysis • Customer Acquisition Financial Services • Algorithmic Trading • Risk Analysis • Fraud Detection • Portfolio Analysis Media &Telecommunications • Network Optimization • Customer Scoring • Churn Prevention • Fraud Prevention Manufacturing • Product Research • Engineering Analytics • Process & Quality Analysis • Distribution Optimization Energy • Smart Grid • Exploration Government • Market Governance • Counter-Terrorism • Econometrics • Health Informatics Healthcare & Life Sciences • Pharmaco-Genomics • Bio-Informatics • Pharmaceutical Research • Clinical Outcomes Research
  • 13. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Why Hadoop? Answer: Big Datasets!
  • 14. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Why Hadoop? Big Data analytics and the Apache Hadoop open source project are rapidly emerging as the preferred solution to address business and technology trends that are disrupting traditional data management and processing. Enterprises can gain a competitive advantage by being early adopters of big data analytics.
  • 15. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Storage & Memory B/W lagging CPU CPU B/W requirements out-pacing memory and storage Disk & memory getting “further” away from CPU Large sequential transfers better for both memory & disk CPU DRAM LAN Disk Annual bandwidth improvement (all milestones) 1.5 1.27 1.39 1.28 Annual latency improvement (all milestones) 1.17 1.07 1.12 1.11 MemoryWall Storage Chasm
  • 16. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Commodity Hardware Economics For $1000 One computer can Process ~32GB Store ~15TB 99.9% Of data is Underutilized
  • 17. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Enterprise + Big Data = Big Opportunity 17
  • 18. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. WHAT IS HADOOP Hadoop Adoption HDFS MapReduce Ecosystem Projects
  • 19. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop Adoption in the Industry 2007 2008 2009 2010 The Datagraph Blog Source: Hadoop Summit Presentations
  • 20. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. What is Hadoop? A scalable fault-tolerant distributed system for data storage and processing Core Hadoop has two main components Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage Reliable, redundant, distributed file system optimized for large files MapReduce: fault-tolerant distributed processing Programming model for processing sets of data Mapping inputs to outputs and reducing the output of multiple Mappers to one (or a few) answer(s) Operates on unstructured and structured data A large and active ecosystem Open source under the friendly Apache License http://wiki.apache.org/hadoop/
  • 21. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. HDFS 101 The Data Set System
  • 22. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. HDFS Concepts  Sits on top of a native (ext3, xfs, etc..) file system  Performs best with a ‘modest’ number of large files  Files in HDFS are ‘write once’  HDFS is optimized for large, streaming reads of files
  • 23. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. HDFS  Hadoop Distributed File System – Data is organized into files & directories – Files are divided into blocks, distributed across cluster nodes – Block placement known at runtime by map- reduce = computation co-located with data – Blocks replicated to handle failure – Checksums used to ensure data integrity  Replication: one and only strategy for error handling, recovery and fault tolerance – Self Healing – Make multiple copies
  • 24. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop Server Roles Slave Task Tracker Data Node Slave Task Tracker Data Node Slave Task Tracker Data Node Master Name Node Master Secondary Node Job Tracker Client Client Client Client Client Client Client Client Slave Task Tracker Data Node Slave Task Tracker Data Node Slave Task Tracker Data Node Up to 4K Nodes
  • 25. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop Cluster DN,TT Up to 4K Nodes DN,TT DN,TT DN,TT DN,TT DN,TT NN 1GbE/10GbE DN,TT DN,TT DN,TT DN,TT DN,TT DN,TT JT 1GbE/10GbE DN,TT DN,TT DN,TT DN,TT DN,TT DN,TT SNN 1GbE/10GbE DN,TT DN,TT DN,TT DN,TT DN,TT DN,TT 1GbE/10GbE CORE SWITCH DN,TT CORE SWITCH Client
  • 26. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. HDFS File Write Operation
  • 27. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. HDFS File Read Operation
  • 28. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce 101 Functional Programming meets Distributed Processing
  • 29. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce Provides Automatic parallelization and distribution Fault Tolerance Status and Monitoring Tools A clean abstraction for programmers Google Technology RoundTable: MapReduce
  • 30. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. What is MapReduce? A method for distributing a task across multiple nodes Each node processes data stored on that node Consists of two developer-created phases 1. Map 2. Reduce In between Map and Reduce is the Shuffle and Sort
  • 31. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Key MapReduce Terminology Concepts A user runs a client program on a client computer The client program submits a job to Hadoop The job is sent to the JobTracker process on the Master Node Each Slave Node runs a process called the TaskTracker The JobTracker instructs TaskTrackers to run and monitor tasks A task attempt is an instance of a task running on a slave node There will be at least as many task attempts as there are tasks which need to be performed
  • 32. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce: Basic Concepts Each Mapper processes single input split from HDFS Hadoop passes developer’s Map code one record at a time Each record has a key and a value Intermediate data written by the Mapper to local disk During shuffle and sort phase, all values associated with same intermediate key are transferred to same Reducer Reducer is passed each key and a list of all its values Output from Reducers is written to HDFS
  • 33. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce Operation What was the max/min temperature for the last century?
  • 34. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Sample Dataset The requirement: you need to find out grouped by type of customer how many of each type are in each country with the name of the country listed in the countries.dat in the final result (and not the 2 digit country name). Each record has a key and a value To do this you need to: Join the data sets Key on country Count type of customer per country Output the results
  • 35. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce Paradigm Input Map Shuffle and Sort Reduce Output Map Reduce cat grep sort uniq output Map Map Reduce
  • 36. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. MapReduce Example Problem: Count the number of times that each word appears in the following paragraph: John has a red car, which has no radio. Mary has a red bicycle. Bill has no car or bicycle. Map Server 1: John has a red car, which has no radio. John: 1 has: 2 a: 1 red: 1 car: 1 which: 1 no: 1 radio: 1 Server 2: Mary has a red bicycle. Mary: 1 has: 1 a: 1 red: 1 bicycle: 1 Server 3: Bill has no car or bicycle. Bill: 1 has: 1 no: 1 car: 1 or: 1 biclycle:1 Reduce John: 1 has 2 has: 1 has: 1 a: 1 a: 1 red: 1 red: 1 car: 1 car: 1 which: 1 no: 1 no: 1 radio: 1 Mary: 1 bicycle: 1 bicycle: 1 Bill: 1 or: 1 John: 1 has 4 a: 2 red: 2 car: 2 which: 1 no: 2 radio: 1 Mary: 1 bicycle: 2 Bill: 1 or: 1 Server 1 Server 2 Server 3 Server 1 Server 2 Server 3
  • 37. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Putting it all Together: MapReduce and HDFS Task Tracker Task Tracker Task Tracker Job Tracker Hadoop Distributed File System (HDFS) Client/Dev Large Data Set (Log files, Sensor Data) Map Job Reduce Job Map Job Reduce Job Map Job Reduce Job Map Job Reduce Job Map Job Reduce Job Map Job Reduce Job 2 1 3 4
  • 38. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop Ecosystem Projects • Hadoop is a ‘top-level’ Apache project • Created and managed under the auspices of the Apache Software Foundation • Several other projects exist that rely on some or all of Hadoop • Typically either both HDFS and MapReduce, or just HDFS • Ecosystem Projects Include • Hive • Pig • HBase • Many more….. http://hadoop.apache.org/
  • 39. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop, SQL & MPP Systems Hadoop Traditional SQL Systems MPP Systems Scale-Out Scale-Up Scale-Out Key/Value Pairs RelationalTables RelationalTables Functional Programming Declarative Queries Declarative Queries Offline Batch Processing Online Transactions Online Transactions
  • 40. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Comparing RDBMS and MapReduce Traditional RDBMS MapReduce Data Size Gigabytes (Terabytes) Petabytes (Exabytes) Access Interactive and Batch Batch Updates Read / Write many times Write once, Read many times Structure Static Schema Dynamic Schema Integrity High (ACID) Low Scaling Nonlinear Linear DBA Ratio 1:40 1:3000 Reference: Tom White’s Hadoop: The Definitive Guide
  • 41. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Hadoop Use Cases
  • 42. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Diagnostics and Customer Churn Issues What make and model systems are deployed? Are certain set top boxes in need of replacement based on system diagnostic data? Is the a correlation between make, model or vintage of set top box and customer churn? What are the most expensive boxes to maintain? Which systems should we pro-actively replace to keep customers happy? Big Data Solution Collect unstructured data from set top boxes—multiple terabytes Analyze system data in Hadoop in near real time Pull data in to Hive for interactive query and modeling Analytics with Hadoop increases customer satisfaction
  • 43. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Pay Per View Advertising Issues Fixed inventory of ad space is provided by national content providers. For example, 100 ads offered to provider for 1 month of programming Provider can use this space to advertise its products and services, such as pay per view Do we advertise “The Longest Yard” in the middle of a football game or in the middle of a romantic comedy? 10% increase in pay per view movie rentals = $10M in incremental revenue • Big Data Solution Collect programming data and viewer rental data in a large data repository Develop models to correlate proclivity to rent to programming format Find the most productive time slots and programs to advertise pay per view inventory Improve ad placement and pay-per-view conversion with Hadoop
  • 44. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Risk Modeling  Risk Modeling – Bank had customer data across multiple lines of business and needed to develop a better risk picture of its customers. i.e, if direct deposits stop coming into checking acct, it’s likely that customer lost his/her job, which impacts creditworthiness for other products (CC, mortgage, etc.) – Data existing in silos across multiple LOB’s and acquired bank systems – Data size approached 1 petabyte  Why do this in Hadoop? – Ability to cost-effectively integrate + 1 PB of data from multiple data sources: data warehouse, call center, chat and email – Platform for more analysis with poly-structured data sources; i.e., combining bank data with credit bureau data; Twitter, etc. – Offload intensive computation from DW
  • 45. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Sentiment Analysis  Sentiment Analysis – Hadoop used frequently to monitor what customers think of company’s products or services – Data loaded from social media sources (Twitter, blogs, Facebook, emails, chats, etc.) into Hadoop cluster – Map/Reduce jobs run continuously to identify sentiment (i.e., Acme Company’s rates are “outrageous” or “rip off”) – Negative/positive comments can be acted upon (special offer, coupon, etc.)  Why Hadoop – Social media/web data is unstructured – Amount of data is immense – New data sources arise weekly
  • 46. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Resources to enable the Big Data Conversation World Economic Forum: “Personal Data: The Emergence of a New Asset Class” 2011 McKinsey Global Institute: Big Data: The next frontier for innovation, competition, and productivity Big Data: Harnessing a game-changing asset IDC: 2011 Digital Universe Study: Extracting Value from Chaos The Economist: Data, Data Everywhere Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field O’Reilly – What is Data Science? O’Reilly – Building Data Science Teams? O’Reilly – Data for the public good Obama Administration “Big Data Research and Development Initiative.”
  • 47. Introduction to Analytics and Big Data – Hadoop © 2012 Storage Networking Industry Association. All Rights Reserved. Q&A / Feedback Please send any questions or comments on this presentation to the SNIA at this address: trackanalyticsbigdata@snia.org 47 Many thanks to the following individuals for their contributions to this tutorial. SNIA Education Committee Denis Guyadeen Rob Peglar