© COPYRIGHT 2016 GLASSBEAM INC. CONFIDENTIAL. DO NOT DISTRIBUTE 1
Mohammed Guller
Oct 02, 2016
Introduction to Big Data
Agenda
• Big Data
• Big Data Technologies
• Kafka
• Hadoop
• Spark
About Me
• Engineering Manager / Principal Architect at Glassbeam
• Founded two startups
• Passionate about building products, big data analytics, and
machine learning
• www.linkedin.com/in/mohammedguller
• @MohammedGuller
Big Data Analytics with Spark
• Hands-on guide with lots of examples
• Covers both fundamental and advanced topics such as machine learning
• Includes a primer on functional programming and Scala
• Introduces other important Big Data technologies such as HDFS, Parquet, Kafka, HBase, Cassandra, Mesos, and YARN
Available on Amazon
About Glassbeam
Glassbeam brings structure and meaning to data from any connected machine or device while providing actionable intelligence
Cloud-based analytics platform that helps organizations turn raw machine data into insights
Making sense of multi-structured machine data
• Data center devices
• Medical devices
• Sensors
• ATMs
• Automobiles
• Data from any machine
Providing a comprehensive set of apps & tools for machine data analysis
• 50,000+ systems being tracked today
• 1,500+ different software rev codes
• 1.2 billion sensor readings per day
• 1+ trillion sensor readings tracked
Data Growing At a Faster Pace Than Ever
Internet of Things (IoT)
• Network of objects embedded with software for collecting and sending data over the Internet
• 5x more connected things than people by 2020
Industrial IoT
• Manufacturing
• Automotive
• Medical
• Data Center
• EVC
• Smart Meter
Glassbeam target market is focused on driving operational & business analytics value for connected product companies in Industrial IoT market
• IT & Networks
• Medical & Health Care
• Transportation
• EV Chargers & Smart Grid
• Industrial & Mfg
Key Attributes of Big Data
Volume
Scale of Data
Variety
Diversity of Data
Velocity
Speed of Data
Big Data Comes with Big Challenges
• Storage
• Processing
• Value
Storage Challenges
• Legacy SAN / NAS storage devices are expensive
• Traditional RDBMS were not designed for Big Data
• Cannot handle volume, velocity, variety of Big Data
Processing Challenges
• Diverse processing
• Organizations want to do more than just BI / traditional analytics
• Go beyond SQL queries
• Timeliness
• Process data in a reasonable amount of time
• Value of data decreases over time
How Much Data Can a Standard Server Process
Scale-up with Powerful High-end Server
• Large number of CPUs / cores
• Faster cores
• Large amount of memory
• Faster memory bus
• High-performance architecture
Disadvantages of Scale-up Architecture
• Proprietary
• Expensive
• Limited scalability
Scale-out Architecture
• Cluster of servers
• Commodity machines
• Pool together resources
• CPU
• Memory
• Disk
Benefits of Scale-out Architecture
• Relatively inexpensive
• Economical to scale
• No huge upfront investment
• Start small and expand cluster as workload increases
Challenges With Scale-out Architecture
• Writing distributed applications is very hard
• Split job into chunks that can be distributed across a cluster
• Schedule compute resources among different jobs
• Manage inter-node communication
• Handle network and node failures
• Hardware failures are more common at a cluster level
• Probability of a single node failing is low
• Probability of any one node in a large cluster failing is high
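The last two bullets can be checked with a little arithmetic. This is a minimal sketch, assuming node failures are independent and using a hypothetical per-node failure probability of 0.1% over some fixed period (neither assumption comes from the slides). The function name prob_any_failure is illustrative only.

```python
def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n independent nodes fails.

    It is the complement of "every node survives": 1 - (1 - p)^n.
    """
    return 1.0 - (1.0 - p_node) ** n_nodes


# Hypothetical per-node failure probability for the period of interest.
P_NODE = 0.001

for n in (1, 10, 100, 1000):
    print(f"{n:>5} nodes: {prob_any_failure(P_NODE, n):.1%}")
```

With these assumed numbers, a single node almost never fails, while a cluster of 1,000 nodes is more likely than not to see at least one failure in the same period.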
Getting Value Out of Big Data
• Traditional analytics / BI
• Custom processing
• Machine Learning
• Predictive analytics
• Automate complex tasks
• Stream processing
• Analyze in real-time/near real-time
• React in real-time
Traditional Analytics / BI
• What
• Customer growth for the last month/quarter/year
• Segmentation of customers by demographics
• Average time spent by mobile app users
• Why
• Sales growth slowed
• regional issue
• supply issue
• Profit dropped
• revenue dropped
• expenses increased
Custom Processing
• Index web pages
• Google
• Bing
• Process genome data
• Identify mutations linked to cancer, Alzheimer's, and other diseases
• Click analysis
• Log analysis
• 360-degree real-time view of a customer
Predictive Analytics
• Advertisements that a visitor will most likely click
• Movies / songs / news that a customer will like
• Products that a customer will buy
• Whether a patient will have a heart attack
Automate Complex Tasks
• Virtual assistant
• Siri
• Google Now
• Autonomous machine
• Self-driving car
• Robots
• Tag Images
• Facebook
• Flickr
• Expert System
• Medical diagnosis
• Personalized medicine
• Security
• Fraud detection
• Network Security
• Music recognition
• Shazam
• SoundHound
File Formats
• Text
• CSV
• JSON
• XML
• Binary
• Sequence File
• Avro
• Parquet
• Optimized Row Columnar (ORC)
Distributed SQL Query Engine
• Hive
• Spark SQL
• Impala
• Presto
• Drill
• Phoenix
• HAWQ
• Tajo
[Diagram labels: Data Warehouse; Distributed Storage; Distributed Query Engine]
Publish – Subscribe / Messaging Systems
• Kafka
• RabbitMQ
• ActiveMQ
• ZeroMQ
Big Data Computing Frameworks
• Batch
• Hadoop MapReduce
• HPCC
• Stream
• Kafka Streams
• Heron
• Storm
• Samza
• Batch and Stream
• Spark
• Flink
• Beam
• Apex
• Ignite
Kafka
• Distributed publish-subscribe messaging system
• Partitioned and replicated commit log service for building distributed datastore
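To make the idea of a partitioned commit log concrete, here is a toy in-memory sketch. It is not Kafka's API: the class ToyLog and its methods are hypothetical names, replication is omitted, and the CRC32-based key-to-partition mapping is an assumption chosen for illustration. It only shows that records with the same key land in the same append-only partition and that readers address records by offset.

```python
import zlib
from typing import List, Tuple


class ToyLog:
    """Toy partitioned, append-only log (illustration only, not Kafka)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an ordered list; records are only ever appended.
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset)."""
        # Assumed partitioning rule: a stable hash of the key picks the partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> List[str]:
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


log = ToyLog(num_partitions=3)
p, first = log.append("device-1", "temp=70")
_, second = log.append("device-1", "temp=71")
print(p, first, second)
print(log.read(p, first))
```

Because a reader only tracks an offset per partition, several subscribers can consume the same log independently, which is the publish-subscribe behavior the slide describes.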
Hadoop
Hadoop is Not a Single Product
Hadoop Core Components
Adoption of Spark is Growing Rapidly
Spark
Fast, easy-to-use, general-purpose cluster computing framework for processing large datasets using a simpler programming model
Benefits
• Scale
• Fault-tolerance
• Abstracts distributed computing
• Hides the messy details of writing distributed applications
• Allows developers to just focus on the data processing logic
• Same code works on a laptop or a cluster of servers
• Ease-of-use
• Speed
• Flexibility
Easy To Use
• Library with an expressive API
• Scala, Java, Python, R
• RDD API with 80+ operators (MapReduce has only two)
• Dataset/DataFrame API
• Interactive development
• spark-shell
• notebooks
Integrated Libraries For a Variety of Data Processing Tasks
• Batch processing
• Interactive analytics
• Stream analysis
• Machine learning
• Graph analytics
[Diagram: Spark SQL, Spark Streaming, MLlib, and GraphX on top of Spark Core]
Benefits of a Unified Platform
• Solve a variety of problems with a single toolkit
• No need to learn different tools for each use case
• Avoid code and data duplication
• Achieve operational simplicity
Why is Spark Fast
• Advanced job execution engine
• Allows applications to cache data in memory
Advanced Job Execution Engine
• Directed Acyclic Graph (DAG) of stages
• simple job can contain just one stage
• complex job can contain many stages
• eliminates expensive operations between multiple jobs
• synchronization
• serialization/deserialization
• disk I/O
• Lazy operator evaluation
• Pipelined operations
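Lazy operator evaluation and pipelining can be illustrated without Spark. The sketch below is plain Python, not Spark's API: LazyPipeline, parse, and is_even are hypothetical names. Calling map or filter only records the operator. Nothing runs until collect is called, and then each element flows through all operators before the next element is read, with no intermediate dataset materialized.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyPipeline:
    """Records operators without running them; work happens only in collect()."""

    def __init__(self, source: Iterable[Any],
                 ops: Tuple[Tuple[str, Callable], ...] = ()) -> None:
        self._source = source
        self._ops = ops

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Transformation: extend the recorded plan, do no work.
        return LazyPipeline(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyPipeline":
        return LazyPipeline(self._source, self._ops + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: chain lazy iterators so each element passes through
        # every operator in turn (pipelining), then materialize the result.
        stream = iter(self._source)
        for kind, fn in self._ops:
            stream = map(fn, stream) if kind == "map" else filter(fn, stream)
        return list(stream)


trace = []


def parse(text):
    trace.append(("parse", text))
    return int(text)


def is_even(number):
    trace.append(("is_even", number))
    return number % 2 == 0


pipeline = LazyPipeline(["1", "2", "3"]).map(parse).filter(is_even)
print(trace)               # still empty: nothing has run yet
print(pipeline.collect())  # the action triggers the work
print(trace)               # parse and is_even calls alternate per element
```

Deferring execution until an action is what lets an engine see the whole chain of operators and plan them together as stages, rather than running each operator as a separate job.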
Allows Applications to Cache Data in Memory
• Minimize disk I/O
• Reading data from memory is orders of magnitude faster than reading from disk
• In-memory data sharing across DAGs
• different jobs can work with the same cached data
Why Caching Makes Applications Run Faster
[Figure labels: read throughput of 100 MB/s, 500 MB/s, and 10 GB/s]
Read Latency Comparison
[Chart: time in minutes (axis 0 to 200) to read 1 TB of data from HDD, SSD, and RAM]
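A rough back-of-the-envelope sketch of this comparison follows. It assumes the three throughput figures shown on the previous slide (100 MB/s, 500 MB/s, 10 GB/s) correspond to HDD, SSD, and RAM in that order, uses decimal units (1 TB = 10^12 bytes), and ignores seek time and parallel reads. The names below are illustrative only.

```python
TB = 10**12  # bytes, decimal units (assumption)

# Assumed mapping of the slide's throughput labels to storage media.
RATES_BYTES_PER_SEC = {
    "HDD": 100 * 10**6,
    "SSD": 500 * 10**6,
    "RAM": 10 * 10**9,
}


def read_minutes(num_bytes: int, bytes_per_second: int) -> float:
    """Minutes needed to read num_bytes sequentially at a constant rate."""
    return num_bytes / bytes_per_second / 60


for medium, rate in RATES_BYTES_PER_SEC.items():
    print(f"{medium}: {read_minutes(TB, rate):.1f} min to read 1 TB")
```

Under these assumptions the HDD read takes roughly 167 minutes, the SSD read about 33 minutes, and the RAM read under 2 minutes, which is consistent with the 0 to 200 minute axis of the chart.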
Spark Does Not Provide Storage
• Works with a variety of data sources
• No need to import data into Spark
• Scale compute and storage clusters independently
Process Data From a Variety Of Data Sources
And Many More
Spark Does Not Replace Hadoop
Hadoop is Optional
Ideal Applications
• Complex data processing
• multi-step pipeline
• Iterative algorithm
• Machine Learning
• Graph analytics
• Ad hoc analysis
• Interactive
© COPYRIGHT 2016 GLASSBEAM INC. CONFIDENTIAL. DO NOT DISTRIBUTE 110110

More Related Content

What's hot

Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresDATAVERSITY
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionDataStax
 
Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?SnapLogic
 
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...In-Memory Computing Summit
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Technologies
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016StampedeCon
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Enterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataEnterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataCloudera, Inc.
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchCloudera, Inc.
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 

What's hot (19)

Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Top 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data SolutionTop 5 Considerations for a Big Data Solution
Top 5 Considerations for a Big Data Solution
 
Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?
 
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...
IMCSummit 2015 - Day 2 IT Business Track - Real-time Interactive Big Data Ana...
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016How to get started in Big Data without Big Costs - StampedeCon 2016
How to get started in Big Data without Big Costs - StampedeCon 2016
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Enterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataEnterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big Data
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with Search
 
Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 

Viewers also liked

Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with SparkMohammed Guller
 
Wikibon Big Data Capital Markets Day 2014
Wikibon Big Data Capital Markets Day 2014Wikibon Big Data Capital Markets Day 2014
Wikibon Big Data Capital Markets Day 2014Jeff Kelly
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseJeff Kelly
 
Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value ChainPRELIDA Project
 
Becoming a Data Driven Organisation
Becoming a Data Driven OrganisationBecoming a Data Driven Organisation
Becoming a Data Driven OrganisationWizdee
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Usama Fayyad talk at IIT Madras on March 27, 2015: BigData, AllData, Old Dat...
Usama Fayyad talk at IIT Madras on March 27, 2015:  BigData, AllData, Old Dat...Usama Fayyad talk at IIT Madras on March 27, 2015:  BigData, AllData, Old Dat...
Usama Fayyad talk at IIT Madras on March 27, 2015: BigData, AllData, Old Dat...Usama Fayyad
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Lecture on Data Science in a Data-Driven Culture
Lecture on Data Science in a Data-Driven Culture Lecture on Data Science in a Data-Driven Culture
Lecture on Data Science in a Data-Driven Culture Johan Himberg
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
How to reach a Data Driven culture
How to reach a Data Driven cultureHow to reach a Data Driven culture
How to reach a Data Driven cultureMark Beekman
 
The big data value chain r1-31 oct13
The big data value chain r1-31 oct13The big data value chain r1-31 oct13
The big data value chain r1-31 oct13Rei Lynn Hayashi
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015 Den Reymer
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 

Viewers also liked (15)

Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
Wikibon Big Data Capital Markets Day 2014
Wikibon Big Data Capital Markets Day 2014Wikibon Big Data Capital Markets Day 2014
Wikibon Big Data Capital Markets Day 2014
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouse
 
Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value Chain
 
Becoming a Data Driven Organisation
Becoming a Data Driven OrganisationBecoming a Data Driven Organisation
Becoming a Data Driven Organisation
 
#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Usama Fayyad talk at IIT Madras on March 27, 2015: BigData, AllData, Old Dat...
Usama Fayyad talk at IIT Madras on March 27, 2015:  BigData, AllData, Old Dat...Usama Fayyad talk at IIT Madras on March 27, 2015:  BigData, AllData, Old Dat...
Usama Fayyad talk at IIT Madras on March 27, 2015: BigData, AllData, Old Dat...
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Lecture on Data Science in a Data-Driven Culture
Lecture on Data Science in a Data-Driven Culture Lecture on Data Science in a Data-Driven Culture
Lecture on Data Science in a Data-Driven Culture
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
How to reach a Data Driven culture
How to reach a Data Driven cultureHow to reach a Data Driven culture
How to reach a Data Driven culture
 
The big data value chain r1-31 oct13
The big data value chain r1-31 oct13The big data value chain r1-31 oct13
The big data value chain r1-31 oct13
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 

Similar to Introduction to Big Data

Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer
 
C1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategyC1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategyDr. Wilfred Lin (Ph.D.)
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data AnalyticsCynthia Saracco
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?DATAVERSITY
 
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020
Couchbase Cloud No Equal (Rick Jacobs, Couchbase) Kafka Summit 2020HostedbyConfluent
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
How Cloud Providers are Playing with Traditional Data Center
How Cloud Providers are Playing with Traditional Data CenterHow Cloud Providers are Playing with Traditional Data Center
How Cloud Providers are Playing with Traditional Data CenterHostway|HOSTING
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Integrating Hyper-converged Systems with Existing SANs
Integrating Hyper-converged Systems with Existing SANs Integrating Hyper-converged Systems with Existing SANs
Integrating Hyper-converged Systems with Existing SANs DataCore Software
 
Journey to the Cloud: What I Wish I Knew Before I Started
 Journey to the Cloud: What I Wish I Knew Before I Started Journey to the Cloud: What I Wish I Knew Before I Started
Journey to the Cloud: What I Wish I Knew Before I StartedDatavail
 
Building the Glue for Service Discovery & Load Balancing Microservices
Building the Glue for Service Discovery & Load Balancing MicroservicesBuilding the Glue for Service Discovery & Load Balancing Microservices
Building the Glue for Service Discovery & Load Balancing MicroservicesSargun Dhillon
 
Journey to analytics in the cloud
Journey to analytics in the cloudJourney to analytics in the cloud
Journey to analytics in the cloudSaama
 
Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analyticsWei-Chiu Chuang
 
DC/OS 1.8 Container Networking
DC/OS 1.8 Container NetworkingDC/OS 1.8 Container Networking
DC/OS 1.8 Container NetworkingSargun Dhillon
 
Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018Amazon Web Services
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
Linthicum next generation-iaa s-paas-and-database-as-a-service
Linthicum next generation-iaa s-paas-and-database-as-a-serviceLinthicum next generation-iaa s-paas-and-database-as-a-service
Linthicum next generation-iaa s-paas-and-database-as-a-serviceDavid Linthicum
 
The Cloud and Microsoft Windows Azure - A Walk through the clouds
The Cloud and Microsoft Windows Azure - A Walk through the cloudsThe Cloud and Microsoft Windows Azure - A Walk through the clouds
The Cloud and Microsoft Windows Azure - A Walk through the cloudsMark Rodseth
 
Introduction To IPaaS: Drivers, Requirements And Use Cases
Introduction To IPaaS: Drivers, Requirements And Use CasesIntroduction To IPaaS: Drivers, Requirements And Use Cases
Introduction To IPaaS: Drivers, Requirements And Use CasesSynerzip
 

Introduction to Big Data

  • 1. Introduction to Big Data • Mohammed Guller • Oct 02, 2016 • © COPYRIGHT 2016 GLASSBEAM INC. CONFIDENTIAL. DO NOT DISTRIBUTE
  • 2. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 3. About Me • Engineering Manager / Principal Architect at Glassbeam • Founded two startups • Passionate about building products, big data analytics, and machine learning • www.linkedin.com/in/mohammedguller • @MohammedGuller
  • 4. Big Data Analytics with Spark (available on Amazon) • Hands-on guide with lots of examples • Covers both fundamental and advanced topics such as machine learning • Includes a primer on functional programming and Scala • Introduces other important Big Data technologies such as HDFS, Parquet, Kafka, HBase, Cassandra, Mesos, and YARN
  • 5. About Glassbeam • Glassbeam brings structure and meaning to data from any connected machine or device while providing actionable intelligence • Cloud-based analytics platform that helps organizations turn raw machine data into insights • Making sense of multi-structured machine data: data center devices, medical devices, sensors, ATMs, automobiles, data from any machine • Providing a comprehensive set of apps & tools for machine data analysis • 50,000+ systems being tracked today • 1,500+ different software rev codes • 1.2 billion sensor readings per day • 1+ trillion sensor readings tracked
  • 6. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 7. Data Growing At a Faster Pace Than Ever
  • 8. Internet of Things (IoT) • Network of objects embedded with software for collecting and sending data over the Internet • 5x more connected things than people by 2020
  • 9. Industrial IoT • Manufacturing • Automotive • Medical • Data Center • EVC • Smart Meter • Glassbeam's target market is focused on driving operational & business analytics value for connected product companies in the Industrial IoT market • IT & Networks • Medical & Health Care • Transportation • EV Chargers & Smart Grid • Industrial & Mfg
  • 10. Key Attributes of Big Data • Volume: Scale of Data • Variety: Diversity of Data • Velocity: Speed of Data
  • 11. Big Data Comes with Big Challenges • Storage • Processing • Value
  • 12. Storage Challenges • Legacy SAN / NAS storage devices are expensive • Traditional RDBMS were not designed for Big Data • Cannot handle volume, velocity, variety of Big Data
  • 13. Processing Challenges • Diverse processing • Organizations want to do more than just BI / traditional analytics • Go beyond SQL queries • Timeliness • Process data in a reasonable amount of time • Value of data decreases over time
  • 14. How Much Data Can a Standard Server Process
  • 16. Scale-up with Powerful High-end Server • Large number of CPUs / cores • Faster cores • Large amount of memory • Faster memory bus • High-performance architecture
  • 17. Disadvantages of Scale-up Architecture • Proprietary • Expensive • Limited scalability
  • 18. Scale-out Architecture • Cluster of servers • Commodity machines • Pool together resources • CPU • Memory • Disk
  • 19. Benefits of Scale-out Architecture • Relatively inexpensive • Economical to scale • No huge upfront investment • Start small and expand cluster as workload increases
  • 20. Challenges With Scale-out Architecture • Writing distributed applications is very hard • Split job into chunks that can be distributed across a cluster • Schedule compute resources among different jobs • Manage inter-node communication • Handle network and node failures • Hardware failures are more common at a cluster level • Probability of a single node failing is low • Probability of any one node in a large cluster failing is high
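
The last two points on the slide above can be checked with a little arithmetic. The sketch below assumes node failures are independent and uses a made-up per-node failure probability of 0.1% per day; neither assumption comes from the deck. Under those assumptions, the chance that at least one of n nodes fails is 1 - (1 - p)^n.

```python
def cluster_failure_probability(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes fails."""
    return 1.0 - (1.0 - p_node) ** nodes


# Hypothetical figure for illustration only: each node has a 0.1% chance
# of failing on a given day.
P_NODE = 0.001

for n in (1, 100, 1000, 5000):
    probability = cluster_failure_probability(P_NODE, n)
    print(f"{n:>5} nodes: {probability:.1%} chance of at least one failure")
```

With these numbers a single node almost never fails on a given day, while a cluster of a few thousand nodes should expect a failure nearly every day, which is why frameworks for scale-out clusters treat failure handling as a core feature.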
  • 21. Getting Value Out of Big Data • Traditional analytics / BI • Custom processing • Machine Learning • Predictive analytics • Automate complex tasks • Stream processing • Analyze in real-time/near real-time • React in real-time
  • 22. Traditional Analytics / BI • What • Customer growth for the last month/quarter/year • Segmentation of customers by demographics • Average time spent by mobile app users • Why • Sales growth slowed • regional issue • supply issue • Profit dropped • revenue dropped • expenses increased
  • 23. Custom Processing • Index web pages • Google • Bing • Process genome data • Identify mutations linked to cancer, Alzheimer's, and other diseases • Click analysis • Log analysis • 360-degree real-time view of a customer
  • 24. Predictive Analytics • Advertisements that a visitor will most likely click • Movies / songs / news that a customer will like • Products that a customer will buy • Whether a patient will have a heart attack
  • 25. Automate Complex Tasks • Virtual assistant • Siri • Google Now • Autonomous machine • Self-driving car • Robots • Tag Images • Facebook • Flickr • Expert System • Medical diagnosis • Personalized medicine • Security • Fraud detection • Network Security • Music recognition • Shazam • SoundHound
  • 26. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 29. File Formats • Text • CSV • JSON • XML • Binary • Sequence File • Avro • Parquet • Optimized Row Columnar (ORC)
  • 30. Distributed SQL Query Engine • Hive • Spark SQL • Impala • Presto • Drill • Phoenix • HAWQ • Tajo • Diagram labels: Data Warehouse, Distributed Storage, Distributed Query Engine
  • 34. Publish – Subscribe / Messaging Systems • Kafka • RabbitMQ • ActiveMQ • ZeroMQ
  • 35. Big Data Computing Frameworks • Batch • Hadoop MapReduce • HPCC • Stream • Kafka Streams • Heron • Storm • Samza • Batch and Stream • Spark • Flink • Beam • Apex • Ignite
  • 36. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 37. Kafka • Distributed publish-subscribe messaging system • Partitioned and replicated commit log service for building distributed datastore
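
To make "partitioned commit log" concrete, here is a toy in-memory model in plain Python. It is not the Kafka client API: the `ToyLog` class, its method names, and the use of CRC32 to pick a partition are all illustrative assumptions, and replication is not modeled at all. It only shows the two ideas named on the slide: records are appended, never modified, and a record's key decides which partition it lands in.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]


class ToyLog:
    """Toy model of a partitioned, append-only log (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        # Assumption: a simple deterministic hash picks the partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return every record in a partition from `offset` onward."""
        return self.partitions[partition][offset:]


log = ToyLog(num_partitions=3)
first = log.append("sensor-42", "temp=71")
second = log.append("sensor-42", "temp=73")
# Same key, same partition, consecutive offsets: per-key order is preserved.
print(first, second)
print(log.read(first[0], 0))
```

Because a reader only needs to remember a partition and an offset, many independent subscribers can consume the same log at their own pace without the log tracking them.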
  • 41. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 42. Hadoop
  • 48. Hadoop is Not a Single Product
  • 49. Hadoop Core Components
  • 50. Agenda • Big Data • Big Data Technologies • Kafka • Hadoop • Spark
  • 53. Adoption of Spark is Growing Rapidly
  • 54. Spark • Fast, easy-to-use, general-purpose cluster computing framework for processing large datasets using a simpler programming model
  • 55. Benefits • Scale • Fault-tolerance • Abstracts distributed computing • Hides the messy details of writing distributed applications • Allows developers to just focus on the data processing logic • Same code works on a laptop or a cluster of servers • Ease-of-use • Speed • Flexibility
  • 56. Easy To Use • Library with an expressive API • Scala, Java, Python, R • RDD API with 80+ operators (MR has only two) • Dataset/DataFrame API • Interactive development • spark-shell • notebooks
  • 57. Integrated Libraries For a Variety of DP Tasks • Batch processing • Interactive analytics • Stream analysis • Machine learning • Graph analytics • Components: Spark Core, Spark SQL, GraphX, Spark Streaming, MLlib
  • 58. Benefits of a Unified Platform • Solve a variety of problems with a single toolkit • No need to learn different tools for each use case • Avoid code and data duplication • Achieve operational simplicity
  • 59. Why is Spark Fast • Advanced job execution engine • Allows applications to cache data in memory
  • 60. Advanced Job Execution Engine • Directed Acyclic Graph (DAG) of stages • simple job can contain just one stage • complex job can contain many stages • eliminates expensive operations between multiple jobs • synchronization • serialization/deserialization • disk I/O • Lazy operator evaluation • Pipelined operations
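
Lazy evaluation and pipelining can be illustrated without a cluster. The sketch below uses plain Python generators as an analogy; it is not Spark code, and the function names `parse` and `keep_even` are made up for the example. Building the pipeline does no work, and once a result is requested each record flows through both steps before the next record is read, with no intermediate collection written out between steps.

```python
from typing import Iterable, Iterator, List

trace: List[str] = []  # records the order in which steps actually run


def parse(lines: Iterable[str]) -> Iterator[int]:
    for line in lines:
        trace.append(f"parse {line}")
        yield int(line)


def keep_even(numbers: Iterable[int]) -> Iterator[int]:
    for n in numbers:
        trace.append(f"filter {n}")
        if n % 2 == 0:
            yield n


# Describing the pipeline runs nothing yet (lazy evaluation).
pipeline = keep_even(parse(["1", "2", "3", "4"]))
steps_before_action = len(trace)

# Asking for a result pulls records through both steps one at a time.
total = sum(pipeline)

print(steps_before_action)
print(total)
print(trace)
```

The trace alternates between the parse and filter steps for each record, which is the pipelining the slide refers to: two logical operations executed in a single pass over the data.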
  • 61. Allows Applications to Cache Data in Memory • Minimize disk I/O • Reading data from memory is orders of magnitude faster than reading from disk • In-memory data sharing across DAGs • different jobs can work with the same cached data
  • 62. Why Caching Makes Applications Run Faster • 100 MB/s • 500 MB/s • 10 GB/s
  • 63. Read Latency Comparison • [Chart: time in minutes (axis 0 to 200) to read 1 TB of data from HDD, SSD, and RAM]
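
The read-latency chart can be reproduced from the three throughput figures on the caching slide. This sketch assumes the figures map to HDD (100 MB/s), SSD (500 MB/s), and RAM (10 GB/s) in that order, uses decimal units, and assumes a single sequential reader; the transcript does not state any of this explicitly.

```python
MB = 10**6   # decimal megabyte, in bytes
GB = 10**9
TB = 10**12

# Assumed mapping of the slide's throughput figures to storage media.
THROUGHPUT_BYTES_PER_SECOND = {
    "HDD": 100 * MB,
    "SSD": 500 * MB,
    "RAM": 10 * GB,
}


def minutes_to_read(size_bytes: int, bytes_per_second: int) -> float:
    """Time in minutes to read `size_bytes` sequentially at a fixed rate."""
    return size_bytes / bytes_per_second / 60


for medium, rate in THROUGHPUT_BYTES_PER_SECOND.items():
    print(f"{medium}: {minutes_to_read(TB, rate):.1f} min to read 1 TB")
```

Under these assumptions the HDD figure lands between 150 and 200 minutes, consistent with the chart's axis range, while RAM takes under two minutes.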
  • 64. Spark Does Not Provide Storage • Works with a variety of data sources • No need to import data into Spark • Scale compute and storage cluster independently
  • 65. Process Data From a Variety Of Data Sources • And many more
  • 66. Spark Does Not Replace Hadoop
  • 67. Hadoop is Optional
  • 68. Ideal Applications • Complex data processing • multi-step pipeline • Iterative algorithm • Machine Learning • Graph analytics • Ad hoc analysis • Interactive