SlideShare a Scribd company logo
HDFS Federation


Suresh Srinivas
@suresh_m_s




                  Page 1
Agenda

• HDFS Background

• Current Limitations

• Federation Architecture

• Federation Details

• Next Steps
HDFS Architecture

                                                        Two main layers
                                                        •   Namespace
Block Storage Namespace




                                                            ›   Consists of dirs, files and blocks
                            Namenode
                                        NS                  ›   Supports create, delete, modify and list files or
                                                                dirs operations
                                Block Management


                                                        •   Block Storage
                           Datanode     …    Datanode       ›   Block Management
                                      Storage                   • Datanode cluster membership
                                                                • Supports create/delete/modify/get block location
                                                                  operations
                                                                • Manages replication and replica placement
                                                            ›   Storage - provides read and write access to
                                                                blocks



                                                                   3
HDFS Architecture…

                                                        Implemented as
Block Storage Namespace




                           Namenode                     • Single Namespace Volume
                                        NS                 › Namespace Volume = Namespace +
                                Block Management             Blocks
                                                        • Single namenode with a namespace
                           Datanode     …    Datanode      › Entire namespace is in memory
                                      Storage              › Provides Block Management
                                                        • Datanodes store block replicas
                                                           › Block files stored on local file system




                                                                4
Limitation - Scalability

 Scalability
 •   Storage scales horizontally - namespace doesn’t
 •   Limited number of files, dirs and blocks
     ›   250 million files and blocks at 64GB Namenode heap size
           •   Still a very large cluster
           •   Facebook clusters are sized at ~70 PB storage



 Performance
 •   File system operations throughput limited by a single node
     ›   120K read ops/sec and 6000 write ops/sec
           •   Easily scalable to 20K write ops/sec by code improvements



                                              5
Limitation - Isolation

 Poor Isolation
 •   All the tenants share a single namespace
     ›   Separate volume for tenants is desirable

 •   Lacks separate namespace for different application
     categories or application requirements
     ›   Experimental apps can affect production apps

     ›   Example - HBase could use its own namespace




                                       6
Limitation – Tight Coupling

Namespace and Block Management are distinct services
• Tightly coupled due to co-location
• Scaling block management independent of namespace is simpler
• Simplifies Namespace and scaling it


Block Storage could be a generic service
• Namespace is one of the applications to use the service
• Other services can be built directly on Block Storage
   › HBase
   ›   Foreign namespaces



                                    7
Isolation is a problem for even small
                 clusters




                   8
HDFS Federation
         Namespace       NN-1                    NN-k                   NN-n

                                                                               Foreig
                                NS1                     NS k                   n NS n
                                           ...                    ...


                                  Pool 1             Pool k              Pool n
         Block Storage




                                                    Block Pools




                          Datanode 1               Datanode 2            Datanode m
                                ...                     ...                    ...
                                                 Common Storage


•   Multiple independent Namenodes and Namespace Volumes in a cluster
     ›   Namespace Volume = Namespace + Block Pool
•   Block Storage as generic storage service
     ›   Set of blocks for a Namespace Volume is called a Block Pool
     ›   DNs store blocks for all the Namespace Volumes – no partitioning
Key Ideas & Benefits

•   Distributed Namespace: Partitioned across namenodes
                                                                                        Alternate NN
     ›     Simple and Robust due to independent masters                                Implementation   HBase
                                                                           HDFS
         • Each master serves a namespace volume                         Namespace                      MR tmp

         • Preserves namenode stability – little namenode code change

     ›     Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files

•   Block Pools enable generic storage service                                       Storage Service


     ›     Enables Namespace Volumes to be independent of each other
     ›     Fuels innovation and Rapid development
            • New implementations of file systems and Applications on top of block storage possible
            • New block pool categories – tmp storage, distributed cache, small object storage

•   In future, move Block Management out of namenode to separate set of nodes
     ›     Simplifies namespace/application implementation
            • Distributed namenode becomes significantly simpler
HDFS Federation Details
• Simple design
   › Little change to the Namenode, most changes in Datanode, Config and Tools
   › Core development in 4 months
   › Namespace and Block Management remain in Namenode
      • Block Management could be moved out of namenode in the future

• Little impact on existing deployments
   › Single namenode configuration runs as is

• Datanodes provide storage services for all the namenodes
   › Register with all the namenodes
   › Send periodic heartbeats and block reports to all the namenodes
   › Send block received/deleted for a block pool to corresponding namenode



                                           11
HDFS Federation Details…
• Cluster Web UI for better manageability
    › Provides cluster summary
    › Namenode list and summary of namenode status
    › Decommissioning status
• Tools
    › Decommissioning works with multiple namespace
    › Balancer works with multiple namespaces
       • Both Datanode storage or Block Pool storage can be balanced

• Namenode can be added/deleted in Federated cluster
    › No need to restart the cluster
• Single configuration for all the nodes in the cluster
Managing Namespaces
•   Federation has multiple namespaces – Don’t                                 Client-side
    you need a single global namespace?                                    /
                                                                               mount-table
     ›   Key is to share the data and the names used
         to access the data

•   A global namespace is one way to do that
                                                               data project home     tmp
•   Client-side mount table is another way to
    share.
     ›   Shared mount-table => “global” shared view
     ›   Personalized mount-table => per-application                                  NS4
         view
         • Share the data that matter by mounting it

•   Client-side implementation of mount tables
                                                         NS1       NS2         NS3
     ›   No single point of failure
     ›   No hotspot for root and top level directories
Next Steps

• Complete separation of namespace and block
  management layers
  › Block storage as generic service

• Partial namespace in memory for further scalability
• Move partial namespace from one namenode to
  another
  › Namespace operation - no data copy
Next Steps…
                          • Namenode as a container for namespaces
                             › Lots of small namespaces
                               •   Chosen per user/tenant/data feed

                               •   Mount tables for unified namespace
                                   •    Can be managed by a central volume server
                 …

     Namenodes               › Move namespace from one container to
                               another for balancing
Datanode   …   Datanode
                          • Combined with partial namespace
                             › Choose number of namenodes to match
                               •   Sum of (Namespace working set)

                               •   Sum of (Namespace throughput)




                                   15
Thank You

More information
1. HDFS-1052 – HDFS Scalability with multiple namenodes
2. An Introduction to HDFS Federation –
  https://hortonworks.com/an-introduction-to-hdfs-federation/

More Related Content

What's hot

Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
Rkrishna Mishra
 
Basics of Cloud Computing
Basics of Cloud ComputingBasics of Cloud Computing
Basics of Cloud Computing
Pranav Vashistha
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
ateeq ateeq
 
Hadoop
HadoopHadoop
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
Fabio Fumarola
 
Cloud Computing Architecture
Cloud Computing ArchitectureCloud Computing Architecture
Cloud Computing Architecture
Animesh Chaturvedi
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
Eric Wendelin
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
Rahul Agarwal
 
Redis-SGX: Dmitrii Kuvaiskii
Redis-SGX: Dmitrii KuvaiskiiRedis-SGX: Dmitrii Kuvaiskii
Redis-SGX: Dmitrii Kuvaiskii
Redis Labs
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
MongoDB
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
Introduction to CloudStack
Introduction to CloudStack Introduction to CloudStack
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
Karan Singh
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
Suvradeep Rudra
 
Servlets
ServletsServlets
Directory services
Directory servicesDirectory services
Directory services
Christalin Nelson
 
01. 03.-introduction-to-infrastructure
01. 03.-introduction-to-infrastructure01. 03.-introduction-to-infrastructure
01. 03.-introduction-to-infrastructure
Muhammad Ahad
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overview
nomathjobs
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
Norberto Leite
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Manuel Correa
 

What's hot (20)

Introduction of Cloud computing
Introduction of Cloud computingIntroduction of Cloud computing
Introduction of Cloud computing
 
Basics of Cloud Computing
Basics of Cloud ComputingBasics of Cloud Computing
Basics of Cloud Computing
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Hadoop
HadoopHadoop
Hadoop
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
Cloud Computing Architecture
Cloud Computing ArchitectureCloud Computing Architecture
Cloud Computing Architecture
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Redis-SGX: Dmitrii Kuvaiskii
Redis-SGX: Dmitrii KuvaiskiiRedis-SGX: Dmitrii Kuvaiskii
Redis-SGX: Dmitrii Kuvaiskii
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Introduction to CloudStack
Introduction to CloudStack Introduction to CloudStack
Introduction to CloudStack
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Servlets
ServletsServlets
Servlets
 
Directory services
Directory servicesDirectory services
Directory services
 
01. 03.-introduction-to-infrastructure
01. 03.-introduction-to-infrastructure01. 03.-introduction-to-infrastructure
01. 03.-introduction-to-infrastructure
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overview
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

Viewers also liked

Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Edureka!
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
Adam Kawa
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High Availability
DataWorks Summit
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
Hortonworks
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
Edureka!
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Adam Kawa
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
DataWorks Summit/Hadoop Summit
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
Adam Kawa
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
Arvind Prabhakar
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
GetInData
 
HCatalog
HCatalogHCatalog
HCatalog
GetInData
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
GetInData
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
Hanborq Inc.
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Adam Kawa
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
Adam Kawa
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
GetInData
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Desing Pathshala
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
Arvind Prabhakar
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Sameer Tiwari
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
Subhas Kumar Ghosh
 

Viewers also liked (20)

Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High Availability
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
HCatalog
HCatalogHCatalog
HCatalog
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
 
Building Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion PipelinesBuilding Continuously Curated Ingestion Pipelines
Building Continuously Curated Ingestion Pipelines
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 

Similar to HDFS Federation

HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
Hortonworks
 
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentOct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Yahoo Developer Network
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
Yahoo Developer Network
 
Inexpensive storage
Inexpensive storageInexpensive storage
Inexpensive storage
Manfred Furuholmen
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
Plamen Jeliazkov
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011
Open Stack
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
Atanu Chatterjee
 
Tailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual MachinesTailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual Machines
The Linux Foundation
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptx
SwarnaSLcse
 
Index file
Index fileIndex file
Index file
SushantGote
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
Rutvik Bapat
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
Sisimon Soman
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
Aisha Siddiqa
 
Future of cloud storage
Future of cloud storageFuture of cloud storage
Future of cloud storage
GlusterFS
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++
Hortonworks
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012
StampedeCon
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Yahoo Developer Network
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFS
GlusterFS
 

Similar to HDFS Federation (20)

HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
 
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentOct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Inexpensive storage
Inexpensive storageInexpensive storage
Inexpensive storage
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Tailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual MachinesTailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual Machines
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptx
 
Index file
Index fileIndex file
Index file
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Future of cloud storage
Future of cloud storageFuture of cloud storage
Future of cloud storage
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFS
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
chetankumar9855
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
CEPTES Software Inc
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 

Recently uploaded (20)

Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...Amul milk launches in US: Key details of its new products ...
Amul milk launches in US: Key details of its new products ...
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 

HDFS Federation

  • 2. Agenda • HDFS Background • Current Limitations • Federation Architecture • Federation Details • Next Steps
  • 3. HDFS Architecture Two main layers • Namespace Block Storage Namespace › Consists of dirs, files and blocks Namenode NS › Supports create, delete, modify and list files or dirs operations Block Management • Block Storage Datanode … Datanode › Block Management Storage • Datanode cluster membership • Supports create/delete/modify/get block location operations • Manages replication and replica placement › Storage - provides read and write access to blocks 3
  • 4. HDFS Architecture… Implemented as Block Storage Namespace Namenode • Single Namespace Volume NS › Namespace Volume = Namespace + Block Management Blocks • Single namenode with a namespace Datanode … Datanode › Entire namespace is in memory Storage › Provides Block Management • Datanodes store block replicas › Block files stored on local file system 4
  • 5. Limitation - Scalability Scalability • Storage scales horizontally - namespace doesn’t • Limited number of files, dirs and blocks › 250 million files and blocks at 64GB Namenode heap size • Still a very large cluster • Facebook clusters are sized at ~70 PB storage Performance • File system operations throughput limited by a single node › 120K read ops/sec and 6000 write ops/sec • Easily scalable to 20K write ops/sec by code improvements 5
  • 6. Limitation - Isolation Poor Isolation • All the tenants share a single namespace › Separate volume for tenants is desirable • Lacks separate namespace for different application categories or application requirements › Experimental apps can affect production apps › Example - HBase could use its own namespace 6
  • 7. Limitation – Tight Coupling Namespace and Block Management are distinct services • Tightly coupled due to co-location • Scaling block management independent of namespace is simpler • Simplifies Namespace and scaling it Block Storage could be a generic service • Namespace is one of the applications to use the service • Other services can be built directly on Block Storage › HBase › Foreign namespaces 7
  • 8. Isolation is a problem for even small clusters 8
  • 9. HDFS Federation Namespace NN-1 NN-k NN-n Foreig NS1 NS k n NS n ... ... Pool 1 Pool k Pool n Block Storage Block Pools Datanode 1 Datanode 2 Datanode m ... ... ... Common Storage • Multiple independent Namenodes and Namespace Volumes in a cluster › Namespace Volume = Namespace + Block Pool • Block Storage as generic storage service › Set of blocks for a Namespace Volume is called a Block Pool › DNs store blocks for all the Namespace Volumes – no partitioning
  • 10. Key Ideas & Benefits • Distributed Namespace: Partitioned across namenodes Alternate NN › Simple and Robust due to independent masters Implementation HBase HDFS • Each master serves a namespace volume Namespace MR tmp • Preserves namenode stability – little namenode code change › Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files • Block Pools enable generic storage service Storage Service › Enables Namespace Volumes to be independent of each other › Fuels innovation and Rapid development • New implementations of file systems and Applications on top of block storage possible • New block pool categories – tmp storage, distributed cache, small object storage • In future, move Block Management out of namenode to separate set of nodes › Simplifies namespace/application implementation • Distributed namenode becomes significantly simpler
  • 11. HDFS Federation Details • Simple design › Little change to the Namenode, most changes in Datanode, Config and Tools › Core development in 4 months › Namespace and Block Management remain in Namenode • Block Management could be moved out of namenode in the future • Little impact on existing deployments › Single namenode configuration runs as is • Datanodes provide storage services for all the namenodes › Register with all the namenodes › Send periodic heartbeats and block reports to all the namenodes › Send block received/deleted for a block pool to corresponding namenode 11
  • 12. HDFS Federation Details… • Cluster Web UI for better manageability › Provides cluster summary › Namenode list and summary of namenode status › Decommissioning status • Tools › Decommissioning works with multiple namespace › Balancer works with multiple namespaces • Both Datanode storage or Block Pool storage can be balanced • Namenode can be added/deleted in Federated cluster › No need to restart the cluster • Single configuration for all the nodes in the cluster
  • 13. Managing Namespaces • Federation has multiple namespaces – Don’t Client-side you need a single global namespace? / mount-table › Key is to share the data and the names used to access the data • A global namespace is one way to do that data project home tmp • Client-side mount table is another way to share. › Shared mount-table => “global” shared view › Personalized mount-table => per-application NS4 view • Share the data that matter by mounting it • Client-side implementation of mount tables NS1 NS2 NS3 › No single point of failure › No hotspot for root and top level directories
  • 14. Next Steps • Complete separation of namespace and block management layers › Block storage as generic service • Partial namespace in memory for further scalability • Move partial namespace from one namenode to another › Namespace operation - no data copy
  • 15. Next Steps… • Namenode as a container for namespaces › Lots of small namespaces • Chosen per user/tenant/data feed • Mount tables for unified namespace • Can be managed by a central volume server … Namenodes › Move namespace from one container to another for balancing Datanode … Datanode • Combined with partial namespace › Choose number of namenodes to match • Sum of (Namespace working set) • Sum of (Namespace throughput) 15
  • 16. Thank You More information 1. HDFS-1052 – HDFS Scalability with multiple namenodes 2. An Introduction to HDFS Federation – https://hortonworks.com/an-introduction-to-hdfs-federation/