Hadoop and object stores – can we do it better?
Gil Vernik, Trent Gray-Donald
IBM
Speakers
§ Gil Vernik
- IBM Research from 2010
- Architect, 25+ years of development experience
- Active in open source
- Recent interest: Big Data engines and object stores
§ Trent
- IBM Distinguished Engineer
- Architect on Watson Data Platform
- Historically worked on the IBM Java VM
Twitter: @vernikgil
Agenda
§ Storage for unstructured data
§ Introduction to object storage – why is it needed and what is it?
§ HDFS and object stores – differences
§ Real world usage
- How Hadoop accesses object stores
- Understanding the issues
- An alternative approach
§ SETI usage
Storage for unstructured data
§ HDFS or a similar distributed file system
§ Object storage
- On premise, cloud based, hybrid, etc.
- IBM Cloud Object Storage
- Amazon S3
- OpenStack Swift, Azure Blob Storage, etc.
§ NoSQL databases / key-value stores
(Diagram: raw data is ingested into unstructured data storage and later read back with a schema applied.)
HDFS - Summary
§ Hadoop Distributed File System (distributed, and Hadoop-native).
§ Stores large amounts of unstructured data in arbitrary formats.
§ Default block size is large: 64MB in Hadoop 1.x, 128MB in Hadoop 2.x and later.
§ Blocks are replicated.
§ Write once – read many (append allowed)
§ (Often) collocated with compute capacity.
§ Need an HDFS client to work with HDFS.
§ Hadoop FS shell is widely used with HDFS.
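As a minimal illustration of the “HDFS client” point above, the sketch below (the NameNode address and path are hypothetical, not from the talk) uses the Hadoop FileSystem API from Scala to list a directory:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Connect to a (hypothetical) NameNode and list the entries under /data.
val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
fs.listStatus(new Path("/data")).foreach(s => println(s"${s.getPath} ${s.getLen} bytes"))
fs.close()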
What is an object store?
§ An object store is well suited to storing files, which it treats as data objects.
§ Each data object carries rich metadata alongside the data itself.
§ Capable of storing huge amounts of unstructured data.
§ On premise, cloud based, hybrid, etc.
Good things about object stores
§ Resilient store: data will not be lost.
§ Fault tolerant: object stores are designed to keep operating during failures.
§ Various security models – data is safe.
§ Can be easily accessed for write or read flows.
§ (effectively) infinitely scalable – EB and beyond.
§ Low cost, long term storage solution.
Organize data in the object store
§ Data objects are organized inside buckets (S3) or containers (Swift).
§ An object name may contain delimiters, usually “/”.
§ Conceptual grouping via delimiters allows hierarchical organization, analogous to directories in a file
system but without the overhead or scalability limits of many directories (a listing sketch follows the example below).
mytalks/year=2016/month=5/day=24/data-palooza.pdf
mytalks/year=2017/month=5/day=24/hadoop-strata.pdf
mytalks/year=2017/month=6/day=07/spark-summit.pdf
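To make the prefix/delimiter grouping concrete, here is a sketch (not from the talk) that lists objects one “directory” level at a time with the AWS SDK for Java, called from Scala; the bucket name and prefix are assumptions:

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.ListObjectsV2Request
import scala.collection.JavaConverters._

// Credentials come from the default provider chain (environment, profile, etc.).
val s3 = AmazonS3ClientBuilder.defaultClient()
// Treat "/" as a pseudo-directory separator: list what is "inside" mytalks/year=2017/.
val request = new ListObjectsV2Request()
  .withBucketName("mybucket")            // hypothetical bucket
  .withPrefix("mytalks/year=2017/")
  .withDelimiter("/")
val result = s3.listObjectsV2(request)
result.getObjectSummaries.asScala.foreach(o => println(o.getKey))  // objects at this level
result.getCommonPrefixes.asScala.foreach(println)                  // pseudo-subdirectories one level down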
Object storage is not a file system
§ Write once – no append in place
§ Usually eventually consistent
§ Accessed via RESTful API, SDKs available for many languages.
§ Each data object has a unique URI.
§ Rename in an object store is not an atomic operation (unlike in file systems).
- Rename = GET and PUT/COPY and DELETE (see the sketch after this list).
§ Object creation is atomic.
- Writing a file is not.
§ Examples
- Store raw data for archive, raw IoT sensor data.
- Export old data from a database and store it as objects.
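Because rename is not a native object-store operation, connectors emulate it, as the rename bullet above notes. The sketch below (bucket and key names are hypothetical; it is illustrative, not the code of any particular connector) shows the copy-then-delete pattern using the AWS SDK for Java from Scala:

import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

// Emulate "rename": copy to the new key, then delete the old one.
// The two steps are not atomic together, so a failure in between can leave both objects visible.
def rename(s3: AmazonS3, bucket: String, srcKey: String, dstKey: String): Unit = {
  s3.copyObject(bucket, srcKey, bucket, dstKey)  // server-side copy (a PUT/COPY request)
  s3.deleteObject(bucket, srcKey)                // remove the original (a DELETE request)
}

val s3 = AmazonS3ClientBuilder.defaultClient()
rename(s3, "mybucket", "result/_temporary/part-00001", "result/part-00001")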
The usual dilemma
(Diagram: three workers with collocated HDFS versus three workers reading from remote object storage.)

Data locality (HDFS collocated with the workers):
• Potentially faster, but higher cost and less versatile.
• Impossible to scale storage without scaling compute.
• Difficult to share HDFS data more globally.

No data locality (remote object storage):
• Lower cost, more versatile, and fast enough.
• Storage is separated from the compute nodes, so it can be scaled independently of compute.
• Data is easily shared and can be accessed from different locations.
Choose your storage
(Diagram: Big Data engines on top, with interchangeable storage options underneath.)
Hadoop ecosystem
§ The Hadoop FileSystem interface is the common way to interact with the underlying storage
§ Hadoop ships with various storage connectors that implement the FileSystem interface
§ Many Big Data engines utilize the Hadoop storage connectors
(Diagram: connectors for HDFS and for object storage – S3 API, Swift API, Azure API.)
Apache Spark
§ Apache Spark is a fast and general engine for large-scale data processing
§ Written in Scala, Python, Java, R
§ Very active Big Data project
§ Apache Spark combines Spark SQL, streaming, machine learning, graph processing and complex
analytics (MapReduce plus) in a single engine and is able to optimize programs across all of these
paradigms
§ Spark can handle multiple object stores as a data source
§ Spark depends on Hadoop connectors to interact with objects
Example: persist a collection as an object
val data = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
val myData = sc.parallelize(data, 9)
myData.saveAsTextFile("s3a://mybucket/data.txt")

REST calls issued against the object store:
Connector      Store   GET   HEAD   PUT   DELETE
Hadoop (s3a)   S3      158   361    26    16
A deep dive into the numbers
§ What is wrong?
- We observed that some Hadoop components are highly inefficient when working with
object stores
- Two major reasons
- The existing algorithms used by Hadoop for persisting distributed data sets
are not optimized for object stores.
- The cost of supporting FS shell operations and treating the object store as a file
system. This has a negative effect on the Hadoop connectors.
We can make it much better
We did it better
It doesn’t have to be like this
Fault tolerance algorithms in the write flows
§ Output committers are the Hadoop components responsible for persisting data sets
generated by MapReduce jobs. They are designed to be fault tolerant.
..result/_temporary/0/_temporary/attempt_201702221313_0000_m_000000_0/part-0000
..result/_temporary/0/task_201702221313_0000_m_000000/part-00000
..result/part-00001
(Diagram: a wordcount job over an input data set persists its result as the object “result”; the temporary and final part paths are shown above. A minimal sketch of such a job follows.)
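A minimal Spark job of the kind the diagram describes might look like the sketch below (bucket and paths are hypothetical); while it runs, the output committer creates the temporary attempt/task paths shown above before “renaming” them to their final names:

// Wordcount over an input data set, persisting the result as the object "result".
val counts = sc.textFile("s3a://mybucket/input/")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
// The committer writes result/_temporary/.../part-NNNNN first and then renames it to
// result/part-NNNNN, which on an object store turns into copy + delete requests.
counts.saveAsTextFile("s3a://mybucket/result")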
Output committers and object stores
§ Output committers use temporary files and folders for every write operation and then rename them.
§ The committer algorithms rely on temporary files to make the write flow fault tolerant. Hadoop ships
FileOutputCommitter algorithm versions 1 and 2.
§ File systems support atomic rename, which fits this paradigm perfectly.
§ Object stores do not support rename natively; use copy and delete instead.
..result/_temporary/0/_temporary/attempt_201702221313_0000_m_000000_0/part-0000
..result/_temporary/0/task_201702221313_0000_m_000000/part-00000
..result/part-00001
This leads to dozens of expensive requests targeted to the object
store
Hadoop FS shell operations and Hadoop connectors
§ To be 100% compliant with the Hadoop ecosystem, every Hadoop connector must support the FS shell
operations.
§ FS shell operations are frequently used with HDFS
§ FS shell operations are not object store friendly
- not native object store operations: operations on files/directories such as copy, rename, etc.
- not optimized for object stores: uploading an object first creates a temporary object and then renames it to
the final name
§ Object store vendors provide CLI tools that are preferable over Hadoop FS shell commands.
./bin/hadoop fs -mkdir -p hdfs://myhdfs/a/b/c/
./bin/hadoop fs -put mydata.txt hdfs://myhdfs/a/b/c/data.txt
Hadoop FS shell operations and analytic flows
§ The code that enables the FS shell indirectly hurts the analytic flows in the Hadoop connectors by
performing operations that are not inherent to those flows
- Recursive directory creation (empty objects), checking whether a directory exists, etc.
- Supporting move, rename, recursive listing of directories, etc.
§ Analytic flows such as Spark or MapReduce do not directly need these FS shell operations
§ What do analytic flows need?
- Object listing
- Create new objects (object name may contain “/” to indicate pseudo-directory)
- Read objects
- Get data partitions (data partition is the unit of data parallelism for each MapReduce task)
- Delete
Analytic flows need only a small subset of the FileSystem functionality (sketched below)
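As an illustration only – this is not Stocator’s or Hadoop’s actual API – the small subset above could be captured by an interface along these lines:

// Hypothetical sketch of the narrow storage contract analytic flows actually need.
trait AnalyticObjectStore {
  def list(prefix: String): Seq[String]                  // object listing under a prefix
  def create(name: String, data: Array[Byte]): Unit      // create a new object ("/" allowed in the name)
  def read(name: String): Array[Byte]                    // read an object
  def partitions(name: String, splitSize: Long): Seq[(Long, Long)]  // (offset, length) per MapReduce task
  def delete(name: String): Unit                         // delete an object
}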
Why does supporting FS shell affect analytic flows?
/data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000000_0/part-0000
/data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000001_1/part-0001
……
/data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000008_8/part-0008
(Diagram: nine partitions, numbered 0–8, of a distributed data set are persisted as a single object; each task writes its own temporary part file under the attempt paths shown above.)
Why does supporting FS shell affect analytic flows?
1. Spark Driver (SD): make directories recursively
   ..data.txt/_temporary/0
2. Spark Executor (SE): make directories recursively
   ..data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000001_1
3. SE: write task temporary object
   ..data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000001_1/part-00001
4. SE: list directory
   ..data.txt/_temporary/0/_temporary/attempt_201702221313_0000_m_000001_1
5. SE: rename task temporary object to job temporary object
   ..data.txt/_temporary/0/task_201702221313_0000_m_000001/part-00001
6. SD: list job temporary directories recursively
   ..data.txt/_temporary/0/task_201702221313_0000_m_000001
7. SD: rename job temporary object to final name
   ..data.txt/part-00001
8. SD: write the _SUCCESS object
   ..data.txt/_SUCCESS
Certain Hadoop components are designed to work with file systems, not object stores.
An opinionated object store connector for Spark can provide significant gains.
Stocator – the next-gen object store connector
§ An advanced connector designed for object stores. It doesn’t create temporary files and folders for write
operations and still provides fault tolerance, including speculative execution.
§ Doesn’t use the Hadoop storage modules and interacts with the object store directly. This makes Stocator
significantly faster on write flows and generates far fewer REST calls.
§ Supports analytic flows, not shell commands.
§ Implements the Hadoop FileSystem interface.
§ No need to modify Spark or Hadoop – just register the connector (see the sketch after this slide).
§ Stocator doesn’t need local HDFS
Stocator adapted for analytic flows
https://github.com/SparkTC/stocator
Released under Apache License 2.0
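As a usage sketch (not taken from the talk), registering Stocator follows the standard Hadoop fs.<scheme>.impl convention; the bucket/service names below are hypothetical, and the credential properties, which vary by object store, should be taken from the Stocator README:

// Register Stocator's FileSystem implementation for the swift2d:// scheme.
sc.hadoopConfiguration.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
// ...credential properties for the target object store service go here (see the README)...

// Writes now go through Stocator instead of the default Hadoop connector.
val data = sc.parallelize(1 to 9, 9)
data.saveAsTextFile("swift2d://mybucket.service/data.txt")   // container/service names are hypothetical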
Hadoop and objects
(Diagram: Big Data engines accessing both HDFS and object storage – S3 API, Swift API, Azure API – through the Hadoop FileSystem interface.)
Where to find Stocator
§ IBM Cloud Object Storage
§ Based on the open source stocator
§ Bluemix Spark as a Service
§ IBM Data Science Experience
§ Open source - https://github.com/SparkTC/stocator
- Stocator-core module, stocator-openstack-swift connector
- Apache License 2.0
Example: persist a collection as an object
val data = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
val myData = sc.parallelize(data, 9)
myData.saveAsTextFile("s3d://mybucket/data.txt")

REST calls issued against the object store:
Connector      Store   GET   HEAD   PUT   DELETE
Stocator       S3        1     2    11     0
Hadoop (s3a)   S3      158   361    26    16
Compare performance of Stocator*
(Chart: wall-clock seconds for Teragen, Copy, Terasort, Wordcount, Read (50GB), Read (500GB) and TPC-DS, comparing Stocator, Hadoop Swift and S3a. Relative gains for Stocator: 18x, 10x, 9x, 2x, 1x, 1x and 1x** respectively.)
§ Stocator is much faster for write-intensive workloads.
§ Stocator is as good for read-intensive workloads.
* 40Gbps in accesser tier
** Comparing Stocator to S3a
S3a connector is improving*
(Chart: wall-clock seconds for Teragen, Copy, Terasort, Wordcount, Read (50 GB), Read (500 GB) and TPC-DS, comparing Stocator, S3a, S3a CV2 and S3a CV2+FU. Relative gains for Stocator: 1.5x, 1.3x, 1.3x, 1.1x, 1x, 1x and 1x** respectively.)
§ File Output Committer Algorithm 2 halves the number of renames (CV2).
§ Fast Upload introduces streaming on output (FU). (Both settings are sketched after this slide.)
§ Stocator is still faster for write-intensive workloads and as good for read-intensive ones.
** Comparing Stocator to S3a with CV2 and FU
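For reference, CV2 and FU in the chart correspond to standard Hadoop properties; a hedged sketch of enabling them for a Spark job (property names as they existed in the Hadoop 2.7/2.8 era) is:

// CV2: FileOutputCommitter algorithm version 2 – fewer renames at job commit.
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
// FU: S3a "fast upload" – stream output to the object store as it is produced.
sc.hadoopConfiguration.set("fs.s3a.fast.upload", "true")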
Compare number of REST operations*
(Chart: total RESTful operations for Teragen, Copy, Terasort, Wordcount, Read (50GB), Read (500GB) and TPC-DS, comparing Stocator, Hadoop Swift and S3a. Reductions for Stocator: 21x, 16x, 15x, 16x, 2x, 2x and 2x** respectively.)
§ Stocator issues far fewer REST operations.
§ Fewer operations mean
• Lower overhead
• Lower cost
* 40Gbps in accesser tier
** Comparing Stocator to S3a with CV2 and FU
IBM Spark@SETI
§ Headquartered in Mountain View, CA. Founded 1984. 150 Scientists, researchers and staff.
§ The mission of the SETI Institute is to explore the potential for extra-terrestrial life….
§ Allen Telescope Array (ATA): 42 receiving dishes, each 6m in diameter, covering 1GHz to 10GHz.
(Photo: the Allen Telescope Array.)
The Spark@SETI Project – By the Numbers
§ 200 million signal events
§ 14 million complex amplitude files in Object Store
- Signal of interest
- Each binary file contains a 90-second ‘snapshot’ of raw antenna voltages
- 14M files = 1TB of raw signal data
- Feature extraction for clustering takes ~12 hours
§ Long duration observations = 2 beams @ 2.5TB each
- Wide-band analysis…. 5TB processed for wideband detection in approximately 13.5 hours
wall time.
Visit our joint talk with SETI at Spark Summit San Francisco, Wednesday, June 7 5:40 PM – 6:10 PM
“Very large data files, object stores, and deep learning –
lessons learned while looking for signs of extra-terrestrial life”
Lessons learned
Object storage provides a good alternative to HDFS
The existing Hadoop ecosystem doesn’t work efficiently with object stores
Nothing is fundamentally wrong with object stores; the inefficiency comes from software
components that are not adapted for object stores
We demonstrated Stocator – an object store connector
Gil Vernik (gilv@il.ibm.com), Trent Gray-Donald (trent@ca.ibm.com)