NameNode Analytics
Who Am I?
Bachelor of Science in Computer Science from UC San Diego (Eleanor Roosevelt College).
I have been fortunate to work alongside Konstantin Shvachko, one of the original architects of the HDFS NameNode from
Yahoo!, for several years.
I have spent 6 years working on HDFS internals and related projects at eBay, WANdisco, and now PayPal.
Hadoop open source contributor:
• HDFS-3107: Introduce truncate to HDFS.
• HDFS-4456: Add concat to HttpFS and WebHDFS.
• HADOOP-10641: Introduce coordination / consensus interface to HDFS.
• MAPREDUCE-2669: Add StandardDev, Mean, and Mode examples to MapReduce.
• Various bug fixes.
Work on NameNode internals and distributed File System design.
• Giraffa File System: https://github.com/GiraffaFS/giraffa
• GeoDistributed File System (WANdisco Patent): https://patents.justia.com/patent/20150278244
©2015 PayPal Inc. Confidential and proprietary. 3
Plamen Jeliazkov.
Background
HDFS was created as a means of storing data on the order of petabytes safely (through replication).
By virtue of being a distributed file system, HDFS is seen as a safe haven for any type of data.
However, HDFS does have its own scaling limitations:
• “Limits are around 10,000 clients working on around 200 million files and directories, totaling around 500 million file
system objects (inodes and blocks). Typically capping out around 20 PBs, though larger clusters do exist.”
• Source: Konstantin Shvachko, https://www.usenix.org/publications/login/april-2010-volume-35-number-2/hdfs-scalability-limits-growth
Therefore, HDFS is best used as a system for storing large single files of data.
• Best case scenario is large files with large block sizes, so that the NameNode has to store less metadata per unit of raw
storage.
Because its files are large and sequential, HDFS is also best used as a system for processing batch analytics or by
applications that benefit from sequential reads / writes.
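A rough way to see why large files help: the sketch below counts NameNode objects (file inodes plus blocks) for the same amount of data stored as large or small files. The function name and the equal-sized-file simplification are assumptions made for illustration; directories are ignored and each block is counted once.

```python
def namespace_objects(total_bytes, file_bytes, block_bytes):
    """Rough count of NameNode objects (file inodes + blocks) for a dataset
    of total_bytes stored as equal-sized files of file_bytes each.
    Assumption: every file is non-empty, so it has at least one block."""
    files = -(-total_bytes // file_bytes)            # ceiling division
    blocks_per_file = -(-file_bytes // block_bytes)  # ceiling division
    return files + files * blocks_per_file

MiB = 2**20
TiB = 2**40

# 1 TiB stored as 128 MiB files vs. 1 MiB files, both with a 128 MiB block size.
large = namespace_objects(TiB, 128 * MiB, 128 * MiB)
small = namespace_objects(TiB, 1 * MiB, 128 * MiB)
print(large, small, small // large)  # 16384 2097152 128
```

Under these assumptions the same terabyte costs 128 times more NameNode objects when written as 1 MiB files.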
The Hadoop Distributed File System.
HDFS @ PayPal
Customers tend to see HDFS as a giant black box. Dump and forget.
Customers just want to store their data in the easiest manner, with no thought for storage optimization or security.
• They do not like to build any sort of “clean-up” or TTL mechanisms into their applications.
When space issues arise, Hadoop Management lacks context:
• What took up that space? (RCA required)
• Who took up that space? (RCA required)
• What targets can we look at for deleting quickly? (Small files, old files, empty files, specific user, etc.)
Even in the event we catch wind of a data issue:
• Difficult to determine which team or person is responsible.
• Difficult to determine which datasets were affected.
• Damage is already done. (Cluster performance degraded; quota hit; application deployed; etc. It’s already too late…)
• Difficult to be proactive, so we end up being reactive instead, and often very late to react.
My observations of HDFS data management pain points.
Previous Architecture(s)
The Old World
[Diagram. Components: Active NN, Standby NN, FsImage, Legacy FsImage, Offline Image Viewer, Processed Image, Kibana / Elastic Search. Stage timings shown: 3 mins, 90 mins, 30 mins.]
* This assumes a large enterprise Hadoop environment where the FsImage is larger than 20 GB. For smaller image sizes, this is trivial.
* This architecture usually leads to generation of daily reports. This diagram is representative of the fastest possible report generation.
HDFS Usage Analytics Today
Standby NameNode is forced to create a legacy FSImage.
• This requires additional work by Standby NameNode to achieve.
• This legacy image is created in addition to the regular, Protobuf’d, FSImage created for the active NN.
• Storage redundancy solely for the purpose of performing analytics later.
(We end up creating 2 FSImages per checkpoint – double storage cost, double IO cost, no instant benefit).
• Legacy image retains less metadata than the Protobuf image. (No XAttrs, tokens, storage policies).
Legacy-format FSImage is parsed and uploaded to Kibana or ElasticSearch.
• This process typically happens once a day.
• It takes approximately 15 to 20 minutes to fully parse a 25 GB FSImage, which is about the current size of a large cluster's FSImage.
We have seen FSImages of over 30 GB when things are bad.
• Requires pulling the FSImage off the Standby NameNode. The network cost is not very high, however.
Making this process more frequent will increase network cost on the Standby. RPC issues are seen if bandwidth is saturated.
• Image dump -> Parsing -> Processing can take anywhere between 2 and 3 hours. Only about 4-6 reports per day at best.
Other third party solutions tend to follow the architecture described on this slide.
My observations on the current “standard”.
Engineering A New Solution
To query in near real time you need something like a constantly updating NameNode.
• Attempting to do so in any distributed manner involves solving distributed atomic rename or coordination.
(Think HBase region transitions).
• We cannot rely on parsing the FSImage and EditLogs as that adds too much processing time.
  • 15-30 minutes to parse a legacy FSImage and 1-2 minutes per large EditLog.
  • Protobuf parsing means loading the entire INode set into memory.
To filter or query effectively requires parallel processing.
• Assuming we can’t utilize a distributed system effectively, can we work with a single node? Yes.
We can also utilize multiple CPU cores…
• Java 8 Stream API allows simple filters, maps, reduces, collections on large parallelized in-memory data structures.
• A single NameNode stores the entire metadata set in-memory already in such a structure.
Do we need to build a whole new system? No.
• We need to write some custom query engine logic but can re-use most HDFS data structures and logic.
• We can keep our “NameNode” up to date using live cluster Journal Nodes.
• We can simplify further by removing the RPC Server. No need for DataNodes or clients to connect to our
“NameNode”.
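The single-node, multi-core idea can be sketched as "split the in-memory set into chunks, filter each chunk, combine the partial results". The Python below is only an analogy for what a Java 8 parallel stream does; it is not NNA code, the record layout and function name are assumptions, and CPython threads illustrate the shape of the computation rather than real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matching(inodes, predicate, workers=4):
    """Count records matching predicate by filtering chunks independently
    and summing the per-chunk counts (a filter followed by a reduce)."""
    chunk_size = max(1, -(-len(inodes) // workers))  # ceiling division
    chunks = [inodes[i:i + chunk_size] for i in range(0, len(inodes), chunk_size)]

    def count_chunk(chunk):
        return sum(1 for inode in chunk if predicate(inode))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_chunk, chunks))

# Hypothetical in-memory records standing in for file INodes.
inodes = [{"size": s} for s in (0, 5, 0, 7, 0)]
print(count_matching(inodes, lambda inode: inode["size"] == 0))  # 3
```

Because each chunk is filtered independently and the partial counts are simply added, no shared state needs locking during the query.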
Combining old knowledge and new ideas.
Inspiration from Dr. Elephant
Dr. Elephant is a tool from LinkedIn that provides “self-help” suggestions on how to tune various YARN applications so that
they free up queue capacity and perform better. NNA was also conceived as a “self-help” tool.
Ideas inspiring other ideas.
NameNode Analytics
It can best be described as:
“A modified, isolated, read-only, Standby NameNode, with no RPC Server,
but with a Web Server and custom query engine embedded inside it.”
Architecture
Basic high-level view.
[Diagram. Components: Client; NameNode Analytics (off the cluster; isolated and read-only NN); JournalNodes (on the cluster); NameNode (on the cluster).]
(0*) One-time Bootstrap Call (Fetch remote FsImage)
(1) Query
(2) Processing
(3) Response
(*) Writes editLog to JournalNodes
(*) EditLog Tailing
* = conditional or “in the background”
Architecture
Deep dive view into NNA.
[Diagram. Layers inside NameNode Analytics, with a Query entering and a Response leaving:]
• Rest API (Spark Java Web Server)
• Java 8 Stream API (Query Processing)
• NameNode FSNamesystem (Image loading; editLog tailing / updating; and in-memory set)
• NameNode In-Memory Metadata Set (INode Tree) (GSet), which receives EditLog-Tailer updates
NNA @ PayPal
NNA provides the information; an internal TICK stack keeps the historical data, visualizes it, and takes action.
(The TICK stack is: Telegraf, InfluxDB, Chronograf, Kapacitor)
How do we utilize this?
NNA @ PayPal
Who is creating the most empty files?
Who is creating the most empty directories?
Who are the biggest users of the file system in terms of file count or space usage?
What are the largest directories in terms of file count or space usage?
Who is creating small files? (Greater than 0 bytes but much smaller than one block.)
Who has the most “open permission” files? (chmod 777 abusers).
What is the average file size under a particular directory?
What files are open / being written to right now?
NNA @ PayPal
Tracking of quota usage.
Tracking of old files.
Tracking of small files / areas for archival or compression and compaction.
Tracking of user last delegation token issued date.
Tracking of File types (extensions).
Per user usage reports and suggestions.
Query against any dimension available in the HDFS INode(s).
(In progress) AUTOMATED HDFS DATA MANAGEMENT.
First detect, then fix.
NNA is your detection tool.
Understanding NNA API
NNA first asks you to define a set to work with; either the set of all files, or the set of all directories.
Depending on which set you pick, different options are available to you.
From there you build a list of filters to apply to that set, and finally choose the result you want to reduce to, called the sum.
• Take this example: /filter?set=files&filters=fileSize:eq:0&sum=count
• "Starting with the set of all files, get all those that have a file size equal to zero, and count how many there are."
• Or this example: /filter?set=files&filters=modTime:olderThanYears:1&sum=diskspaceConsumed
• "Starting with the set of all files, get all those with a modification time older than 1 year, and sum up their diskspace
usage."
From there we allow even more complex groupings via a /histogram endpoint:
• For example: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
• "Starting with the set of all files, get all those that have a file size equal to zero, group them by user, and count how
many there are."
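To make the semantics of the two endpoints concrete, here is a toy model of the empty-file examples above. It is a sketch only: the records, paths, owners, and function names are hypothetical, and it says nothing about how NNA implements the queries internally.

```python
from collections import Counter

# Hypothetical toy records standing in for file INodes: (path, owner, size in bytes).
FILES = [
    ("/data/a/_SUCCESS", "alice", 0),
    ("/data/a/part-0", "alice", 4096),
    ("/data/b/_SUCCESS", "bob", 0),
    ("/tmp/empty", "alice", 0),
]

def filter_count(files, predicate):
    """Model of /filter with sum=count: keep matching files, then count them."""
    return sum(1 for f in files if predicate(f))

def histogram_count(files, predicate, key):
    """Model of /histogram with sum=count: filter, group by key, count per group."""
    return dict(Counter(key(f) for f in files if predicate(f)))

def is_empty(f):
    return f[2] == 0  # corresponds to filters=fileSize:eq:0

print(filter_count(FILES, is_empty))                         # 3
print(histogram_count(FILES, is_empty, key=lambda f: f[1]))  # {'alice': 2, 'bob': 1}
```

The only difference between the two calls is the grouping step, which is what the type parameter adds on the /histogram endpoint.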
What do queries look like?
Some Pictures
For example…
• Graphing: users by # of empty files they own
  /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
• Graphing: users by # of empty directories they own
  /histogram?set=dirs&filters=dirNumChildren:eq:0&type=user&sum=count
• Graphing: users by # of small files
  /histogram?set=files&filters=fileSize:lte:1024&type=user&sum=count
• Dumping: files currently being written to
  /filter?set=files&filters=isUnderConstruction:eq:true&limit=1000
• Histogram binning: size of files vs. disk space consumed by files
• Histogram binning: disk space consumed by different replication factors
• Histogram binning: file type extensions
Story Time!
HDFS-11419
Slow addBlock operation on NameNode due to users writing into WARM StoragePolicy directories.
Difficult to find all the WARM directories; impossible from legacy FsImage alone; very simple on NNA.
Dump all WARM directory paths from the API: /filter?set=dirs&filters=storageType:eq:WARM
NameNode Pushing Scalability Limits
We were pushing the limits of the NameNode and close to going full GC. 400+ million files. 800+ million total file system objects.
Difficult to find datasets to delete and little time.
Find old datasets to delete: /histogram?set=files&filters=accessTime:olderThanYears:2&type=parentDir&sum=count
Small File Prevention
Midway through an initiative to find and clean up small files from HDFS, we found users were creating small files as fast as we were
compressing and cleaning them.
Difficult to find which users are creating small files.
Find users by small files: /histogram?set=files&filters=fileSize:lte:1048576,accessTime:hoursAgo:24&type=user&sum=count
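The three queries in these stories can be assembled programmatically. The helper below is a hypothetical convenience, not part of NNA: only the endpoints, parameter names, and filter strings come from the slides, and the host and port of an NNA instance are deliberately left out because they are not given here.

```python
from urllib.parse import urlencode

def nna_query(endpoint, set_name, filters=(), group_by=None, sum_type=None, limit=None):
    """Build an NNA-style query path from (field, operator, value) filter triples."""
    params = {"set": set_name}
    if filters:
        # Each filter is field:op:value; multiple filters are comma-separated.
        params["filters"] = ",".join(":".join(str(part) for part in f) for f in filters)
    if group_by is not None:
        params["type"] = group_by
    if sum_type is not None:
        params["sum"] = sum_type
    if limit is not None:
        params["limit"] = limit
    # Keep ':' and ',' literal so the path matches the form shown on the slides.
    return "/" + endpoint + "?" + urlencode(params, safe=":,")

warm_dirs = nna_query("filter", "dirs", [("storageType", "eq", "WARM")])
old_data = nna_query("histogram", "files", [("accessTime", "olderThanYears", 2)],
                     group_by="parentDir", sum_type="count")
small_files = nna_query("histogram", "files",
                        [("fileSize", "lte", 1048576), ("accessTime", "hoursAgo", 24)],
                        group_by="user", sum_type="count")
print(warm_dirs)
print(old_data)
print(small_files)
```

Building the filter string from triples keeps multi-filter queries, such as the small-file one, readable and less error-prone than hand-written strings.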
When has NNA saved us?
Successes
Near real-time analysis.
‘nough said.
For anyone wondering - the magic is in skipping the FSNamesystem lock and introducing multi-core processing.
Easy to install and maintain.
NNA’s Gradle build can construct RPM packages.
Difficulty is about equal to that of bringing up a new, additional, Standby NameNode.
Scalable?
While NNA is not a distributed system, it is a replicated read-only copy.
If you require more analytical throughput you could spin up multiple NNA instances.
The Journal Nodes can handle many readers.
Where has NNA won?
Flaws
It is still a NameNode.
NNA is subject to all the faults and flaws of a regular HDFS NameNode.
If you have too many files and blocks, your NNA instance will operate slower as a result.
Interactive queries that don’t reduce the working set are not great for NNA.
It is not a distributed system.
While NNA can serve cached reports very frequently, it cannot handle many interactive queries at the same time.
Queries are best used by admins while reports are best used by end users.
It is “one of those” single-person projects.
While I had assistance in coding, NNA was mostly a one-person show.
I have been fixing bugs and adding features for nearly a year and a half now.
There is plenty of work still to do and things to improve.
NNA is not Perfect.
Future Work
Where can NNA go from here?
HDFS-6382 : TTL In HDFS
There is discussion about TTL living outside the NameNode, driven by a desire not to introduce TTL management on the active NameNode
because of the additional thread resources it would require. NNA could be extended to provide a routine TTL service on top of it.
HDFS-13150 : Faster Tailing of Edits from Journal Nodes
Part of the work to make Standby NameNode(s) service reads is to reduce the latency between when an EditLog transaction is applied on the
Active vs on the Standby. Reducing this latency means NNA queries become even closer to real time as well.
HDFS Cluster Management Integration
NNA is trivial enough to install that it should be easy to create an Ambari package, Cloudera Parcel, or other integration package for
your flavor of management console.
Web & Security
NNA supports LDAP only at the moment. Uses JSON Web Tokens to maintain sessions. Would any Security experts like to lend a hand?
Support for Kerberos authentication would be great!
Demo
Example Local Cluster from Code
END
(Q & A?)
©2015 PayPal Inc. Confidential and proprietary. 34

More Related Content

Similar to NameNode Analytics - Querying HDFS Namespace in Real Time

Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoopAdam Muise
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingSam Ng
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopStefano Paluello
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysis44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysisMichael Boman
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 

Similar to NameNode Analytics - Querying HDFS Namespace in Real Time (20)

Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoop
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop programming
Hadoop programmingHadoop programming
Hadoop programming
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Introduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data ProcessingIntroduction to Hadoop and Big Data Processing
Introduction to Hadoop and Big Data Processing
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Hadoop
HadoopHadoop
Hadoop
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysis44CON 2014: Using hadoop for malware, network, forensics and log analysis
44CON 2014: Using hadoop for malware, network, forensics and log analysis
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 

Recently uploaded

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 

Recently uploaded (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

NameNode Analytics - Querying HDFS Namespace in Real Time

  • 3. Who Am I? Plamen Jeliazkov.
      Bachelor of Science in Computer Science from UC San Diego (Eleanor Roosevelt College).
      I have been fortunate to work alongside Konstantin Shvachko, one of the original architects of the HDFS NameNode from Yahoo!, for several years.
      I have spent 6 years working on HDFS internals and related projects at eBay, WANdisco, and now PayPal.
      Hadoop open source contributor:
      • HDFS-3107: Introduce truncate to HDFS.
      • HDFS-4456: Add concat to HttpFS and WebHDFS.
      • HADOOP-10641: Introduce coordination / consensus interface to HDFS.
      • MAPREDUCE-2669: Add StandardDev, Mean, and Mode examples to MapReduce.
      • Various bug fixes.
      Work on NameNode internals and distributed file system design:
      • Giraffa File System: https://github.com/GiraffaFS/giraffa
      • GeoDistributed File System (WANdisco patent): https://patents.justia.com/patent/20150278244
      ©2015 PayPal Inc. Confidential and proprietary.
  • 4. Background: The Hadoop Distributed File System.
      HDFS was created as a means of storing data on the order of petabytes securely (through replication).
      By virtue of being a distributed file system, HDFS is seen as a safe haven for any type of data.
      However, HDFS does have its own scaling limitations:
      • “Limits are around 10,000 clients working on around 200 million files and directories, totaling around 500 million file system objects (inodes and blocks). Typically capping out around 20 PBs, though larger clusters do exist.”
        Konstantin Shvachko, https://www.usenix.org/publications/login/april-2010-volume-35-number-2/hdfs-scalability-limits-growth
      Therefore, HDFS is best used as a system for storing large single files of data.
      • The best-case scenario is large files with large block sizes, so that the NameNode has to store less metadata per unit of raw storage.
      Because its files are large and sequential, HDFS is also best used for batch analytics or by applications that benefit from sequential reads / writes.
  • 5. HDFS @ PayPal: My observations of HDFS data management pain points.
      Customers tend to see HDFS as a giant black box. Dump and forget.
      Customers just want to store their data in the easiest manner. No storage optimization or security.
      • They do not like to build any sort of “clean-up” or TTL mechanisms into their applications.
      When space issues arise, Hadoop Management lacks context:
      • What took up that space? (RCA required)
      • Who took up that space? (RCA required)
      • What targets can we look at for deleting quickly? (Small files, old files, empty files, specific user, etc.)
      Even in the event we catch wind of a data issue:
      • Difficult to determine which team or person is responsible.
      • Difficult to determine which datasets were affected.
      • Damage is already done. (Cluster performance degraded; quota hit; application deployed; etc. It’s already too late…)
      • Difficult to be proactive, so we end up being reactive instead, and oftentimes very late to react.
  • 6. Previous Architecture(s): The Old World
      [Diagram: Active NN, Standby NN, FsImage, Legacy FsImage, Offline Image Viewer, Processed Image, Kibana / Elasticsearch. Stage timings shown: 3 mins, 90 mins, 30 mins.]
      * This assumes a large enterprise Hadoop environment where the FsImage is larger than 20 GB. For smaller image sizes, this is trivial.
      * This architecture usually leads to the generation of daily reports. This diagram is representative of the fastest possible report generation.
  • 7. HDFS Usage Analytics Today: My observations on the current “standard”.
      The Standby NameNode is forced to create a legacy FSImage.
      • This requires additional work by the Standby NameNode.
      • This legacy image is created in addition to the regular, Protobuf’d FSImage created for the active NN.
      • Storage redundancy solely for the purpose of performing analytics later. (We end up creating 2 FSImages per checkpoint: double storage cost, double IO cost, no instant benefit.)
      • The legacy image retains less metadata than the Protobuf image. (No XAttrs, tokens, storage policies.)
      The legacy-format FSImage is parsed and uploaded to Kibana or ElasticSearch.
      • This process typically happens once a day.
      • It takes approximately 15 to 20 minutes to fully parse a 25 GB FSImage, about the current size of a large cluster's FSImage. We have seen FSImages of over 30 GB when things are bad.
      • Requires pulling the FSImage off the Standby NameNode. The network cost is not very high; however, making this process more frequent will increase network cost on the Standby. RPC issues are seen if bandwidth is saturated.
      • Image dump -> Parsing -> Processing can take anywhere between 2-3 hours. Only about 4-6 reports per day at best.
      Other third-party solutions tend to follow the architecture described on this slide.
  • 8. Engineering A New Solution: Combining old knowledge and new ideas.
      To query in near real time, you need something like a constantly updating NameNode.
      • Attempting to do so in any distributed manner involves solving the distributed atomic rename or coordination. (Think HBase region transitions.)
      • We cannot rely on parsing the FSImage and EditLogs, as that adds too much processing time.
        - 15-30 minutes to parse a legacy FSImage and 1-2 minutes per large EditLog.
        - Protobuf parsing means loading the entire INode set into memory. Filtering or querying effectively requires parallel processing.
      • Assuming we can’t utilize a distributed system effectively, can we work with a single node?
      Yes. We can also utilize multiple CPU cores…
      • The Java 8 Stream API allows simple filters, maps, reduces, and collections on large parallelized in-memory data structures.
      • A single NameNode already stores the entire metadata set in memory in such a structure.
      Do we need to build a whole new system? No.
      • We need to write some custom query engine logic but can re-use most HDFS data structures and logic.
      • We can keep our “NameNode” up to date using live cluster Journal Nodes.
      • We can simplify further by removing the RPC Server. No need for DataNodes or clients to connect to our “NameNode”.
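The Stream API point above can be illustrated with a minimal sketch. It assumes a hypothetical FileMeta class standing in for in-memory file metadata (it is not the real HDFS INode class), and it assumes disk space consumed can be approximated as file size times replication factor. The class and method names are illustrative only and are not taken from the NNA code base.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for per-file metadata; NOT the real HDFS INode class. */
final class FileMeta {
    final String path;
    final String owner;
    final long fileSize;    // logical size in bytes
    final int replication;  // replication factor

    FileMeta(String path, String owner, long fileSize, int replication) {
        this.path = path;
        this.owner = owner;
        this.fileSize = fileSize;
        this.replication = replication;
    }

    /** Assumption: raw disk usage is approximated as size times replication. */
    long diskspaceConsumed() {
        return fileSize * replication;
    }
}

final class StreamQuerySketch {
    /** Counts empty files; parallelStream() spreads the scan across CPU cores. */
    static long countEmptyFiles(List<FileMeta> files) {
        return files.parallelStream()
                .filter(f -> f.fileSize == 0L)
                .count();
    }

    /** Sums the approximate disk usage of all files larger than minBytes. */
    static long diskspaceOfFilesLargerThan(List<FileMeta> files, long minBytes) {
        return files.parallelStream()
                .filter(f -> f.fileSize > minBytes)
                .mapToLong(FileMeta::diskspaceConsumed)
                .sum();
    }

    /** Small made-up data set for demonstration. */
    static List<FileMeta> sample() {
        return Arrays.asList(
                new FileMeta("/a/empty1", "alice", 0L, 3),
                new FileMeta("/a/big", "bob", 1000L, 3),
                new FileMeta("/b/empty2", "alice", 0L, 3),
                new FileMeta("/b/mid", "carol", 500L, 2));
    }

    public static void main(String[] args) {
        List<FileMeta> files = sample();
        System.out.println("empty files: " + countEmptyFiles(files));
        System.out.println("diskspace of non-empty files: "
                + diskspaceOfFilesLargerThan(files, 0L));
    }
}
```

The same filter / map / reduce shape applies whether the collection holds four entries or hundreds of millions; only the backing data structure changes.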
  • 9. Inspiration from Dr. Elephant: Ideas inspiring other ideas.
      Dr. Elephant is a tool from LinkedIn for providing “self-help” suggestions on how to tune various YARN applications in order to free up more capacity queue space and perform better.
      NNA was also conceived as a “self-help” tool.
  • 10. Inspiration from Dr. Elephant: Ideas inspiring other ideas.
  • 11. NameNode Analytics. It can best be described as:
      “A modified, isolated, read-only, Standby NameNode, with no RPC Server, but with a Web Server and custom query engine embedded inside it.”
  • 12. Architecture: Basic high-level view.
      Components:
      • Client
      • NameNode Analytics (off the cluster; isolated and read-only NN)
      • JournalNodes (on the cluster)
      • NameNode (on the cluster)
      Flows:
      • (0*) One-time bootstrap call (fetch remote FsImage)
      • (1) Query
      • (2) Processing
      • (3) Response
      • (*) EditLog tailing
      • (*) NameNode writes editLog to JournalNodes
      * = conditional or “in the background”
  • 13. Architecture: Deep dive view into NNA.
      Components inside NameNode Analytics:
      • REST API (Spark Java Web Server)
      • Java 8 Stream API (query processing)
      • NameNode FSNamesystem (image loading; editLog tailing / updating; and in-memory set)
      • NameNode In-Memory Metadata Set (INode Tree) (GSet)
      Flows: Query; Response; EditLog-Tailer updates.
  • 14. NNA @ PayPal: How do we utilize this?
      NNA provides the information, and an internal TICK stack keeps the historical data, visualizes it, and takes action. (The TICK stack is Telegraf, InfluxDB, Chronograf, and Kapacitor.)
  • 15. NNA @ PayPal: How do we utilize this?
  • 16. NNA @ PayPal: How do we utilize this?
  • 17. NNA @ PayPal: How do we utilize this?
  • 18. NNA @ PayPal: How do we utilize this?
      • Who is creating the most empty files?
      • Who is creating the most empty directories?
      • Who are the biggest users of the file system in terms of file count or space usage?
      • What are the largest directories in terms of file count or space usage?
      • Who is creating small files? (Greater than 0 bytes but much less than 1 block size.)
      • Who has the most “open permission” files? (chmod 777 abusers.)
      • What is the average file size under a particular directory?
      • What files are open / being written to right now?
  • 19. NNA @ PayPal: How do we utilize this?
      • Tracking of quota usage.
      • Tracking of old files.
      • Tracking of small files / areas for archival or compression and compaction.
      • Tracking of user last delegation token issued date.
      • Tracking of file types (extensions).
      • Per-user usage reports and suggestions.
      • Query against any dimension available in the HDFS INode(s).
      • (In progress) AUTOMATED HDFS DATA MANAGEMENT.
  • 20. First detect, then fix. NNA is your detection tool.
  • 21. Understanding NNA API: What do queries look like?
      NNA first asks you to define a set to work with: either the set of all files or the set of all directories. Depending on which set you pick, different options are available to you.
      From there you build a set of filters to apply to that set, and finally choose the result you want to reduce to: the sum.
      • Take this example: /filter?set=files&filters=fileSize:eq:0&sum=count
      • "Starting with the set of all files, get all those that have a file size equal to zero, and count how many there are."
      • Or this example: /filter?set=files&filters=modTime:olderThanYears:1&sum=diskspaceConsumed
      • "Starting with the set of all files, get all those with a modification time older than 1 year, and sum up their diskspace usage."
      From there we allow even more complex groupings via a /histogram endpoint:
      • For example: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
      • "Starting with the set of all files, get all those that have a file size equal to zero, group them by user, and count how many there are."
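To make the set / filters / sum model concrete, here is a small sketch of how a `field:op:value` filter such as `fileSize:eq:0` could be turned into a predicate and applied for a count or a per-user histogram. Everything in it is hypothetical: the FileEntry class, the parser, and the method names are illustrative and are not NNA's actual implementation. Only the `fileSize` field and the `eq` and `lte` operators that appear in the slides are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical file record; not an HDFS or NNA class. */
final class FileEntry {
    private final String user;
    private final long fileSize;

    FileEntry(String user, long fileSize) {
        this.user = user;
        this.fileSize = fileSize;
    }

    String user() { return user; }
    long fileSize() { return fileSize; }
}

final class FilterQuerySketch {
    /** Parses one "field:op:value" triple, e.g. "fileSize:eq:0", into a predicate. */
    static Predicate<FileEntry> parseFilter(String spec) {
        String[] parts = spec.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected field:op:value, got: " + spec);
        }
        if (!"fileSize".equals(parts[0])) {
            throw new IllegalArgumentException("unsupported field in this sketch: " + parts[0]);
        }
        final long value = Long.parseLong(parts[2]);
        switch (parts[1]) {
            case "eq":
                return f -> f.fileSize() == value;
            case "lte":
                return f -> f.fileSize() <= value;
            default:
                throw new IllegalArgumentException("unsupported op in this sketch: " + parts[1]);
        }
    }

    /** Mirrors the shape of /filter?set=files&filters=<spec>&sum=count. */
    static long filterCount(List<FileEntry> files, String spec) {
        return files.parallelStream().filter(parseFilter(spec)).count();
    }

    /** Mirrors the shape of /histogram?set=files&filters=<spec>&type=user&sum=count. */
    static Map<String, Long> histogramByUser(List<FileEntry> files, String spec) {
        return files.parallelStream()
                .filter(parseFilter(spec))
                .collect(Collectors.groupingBy(
                        FileEntry::user, TreeMap::new, Collectors.counting()));
    }

    /** Small made-up data set for demonstration. */
    static List<FileEntry> sample() {
        return Arrays.asList(
                new FileEntry("alice", 0L),
                new FileEntry("alice", 0L),
                new FileEntry("bob", 0L),
                new FileEntry("bob", 2048L),
                new FileEntry("carol", 100L));
    }

    public static void main(String[] args) {
        System.out.println(filterCount(sample(), "fileSize:eq:0"));
        System.out.println(histogramByUser(sample(), "fileSize:eq:0"));
    }
}
```

The histogram is the same pipeline as the filter with a grouping collector in place of the final count, which matches the way the slide describes /histogram as a grouped version of /filter.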
  • 22. Some Pictures. For example… Graphing: Users by # of empty files they own.
      /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count
  • 23. Some Pictures. For example… Graphing: Users by # of empty directories they own.
      /histogram?set=dirs&filters=dirNumChildren:eq:0&type=user&sum=count
  • 24. Some Pictures. For example… Graphing: Users by # of small files.
      /histogram?set=files&filters=fileSize:lte:1024&type=user&sum=count
  • 25. Some Pictures. For example… Dumping: Files currently being written to.
      /filter?set=files&filters=isUnderConstruction:eq:true&limit=1000
  • 26. Some Pictures. For example… Histogram Binning: Size of Files vs Disk space consumed by Files.
  • 27. Some Pictures. For example… Histogram Binning: Disk space consumed by different replication factors.
  • 28. Some Pictures. For example… Histogram Binning: File Type Extensions.
  • 29. Story Time! When has NNA saved us?
      HDFS-11419
      • Slow addBlock operation on NameNode due to users writing into WARM StoragePolicy directories.
      • Difficult to find all the WARM directories; impossible from the legacy FsImage alone; very simple on NNA.
      • Dump all WARM directory paths from the API: /filter?set=dirs&filters=storageType:eq:WARM
      NameNode Pushing Scalability Limits
      • We were pushing the limits of the NameNode and close to going into full GC. 400+ million files. 800+ million total file system objects.
      • Difficult to find datasets to delete, and little time.
      • Find old datasets to delete: /histogram?set=files&filters=accessTime:olderThanYears:2&type=parentDir&sum=count
      Small File Prevention
      • Midway through an initiative to find and clean up small files from HDFS, we found users were creating small files at the rate we were compressing and cleaning them.
      • Difficult to find which users are creating small files.
      • Find users by small files: /histogram?set=files&filters=fileSize:lte:1048576,accessTime:hoursAgo:24&type=user&sum=count
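The "find old datasets" query above groups old files by parent directory. A rough sketch of that idea follows, under stated assumptions: AccessedFile is a hypothetical record rather than an HDFS class, a year is treated as 365 days, and the current time is passed in explicitly so the result is reproducible. How NNA itself evaluates `olderThanYears` is not described in the slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical file record with a path and last access time; not an HDFS class. */
final class AccessedFile {
    final String path;
    final long accessTimeMillis;

    AccessedFile(String path, long accessTimeMillis) {
        this.path = path;
        this.accessTimeMillis = accessTimeMillis;
    }

    /** Returns the directory containing this file; "/" for files directly under root. */
    String parentDir() {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}

final class OldDatasetSketch {
    /** Assumption: one year is approximated as 365 days. */
    static final long YEAR_MILLIS = TimeUnit.DAYS.toMillis(365);

    /**
     * Counts, per parent directory, the files whose access time is more than
     * the given number of years before nowMillis.
     */
    static Map<String, Long> oldFilesByParentDir(
            List<AccessedFile> files, int years, long nowMillis) {
        final long cutoff = nowMillis - years * YEAR_MILLIS;
        return files.parallelStream()
                .filter(f -> f.accessTimeMillis < cutoff)
                .collect(Collectors.groupingBy(
                        AccessedFile::parentDir, TreeMap::new, Collectors.counting()));
    }

    /** Made-up data; access times are expressed as multiples of YEAR_MILLIS. */
    static List<AccessedFile> sample() {
        return Arrays.asList(
                new AccessedFile("/data/logs/a", 1 * YEAR_MILLIS),
                new AccessedFile("/data/logs/b", 7 * YEAR_MILLIS),
                new AccessedFile("/data/tmp/c", 9 * YEAR_MILLIS),
                new AccessedFile("/data/tmp/d", 2 * YEAR_MILLIS),
                new AccessedFile("/e", 0L));
    }

    public static void main(String[] args) {
        long now = 10 * YEAR_MILLIS;
        System.out.println(oldFilesByParentDir(sample(), 2, now));
    }
}
```

Directories with the highest counts are the natural first candidates for review, which is how the slide describes using the histogram when there is little time to find data to delete.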
  • 30. Successes: Where has NNA won?
      Near real-time analysis.
      • ’Nough said. For anyone wondering: the magic is in skipping the FSNamesystem lock and introducing multi-core processing.
      Easy to install and maintain.
      • NNA’s Gradle build can construct RPM packages. Difficulty is about equal to that of bringing up a new, additional Standby NameNode.
      Scalable?
      • While NNA is not a distributed system, it is a replicated read-only copy. If you require more analytical throughput, you could spin up multiple NNA instances. The Journal Nodes can handle many readers.
  • 31. Flaws: NNA is not perfect.
      It is still a NameNode.
      • NNA is subject to all the faults and flaws of a regular HDFS NameNode. If you have too many files and blocks, your NNA instance will operate slower as a result. Interactive queries that don’t reduce the working set are not great for NNA.
      It is not a distributed system.
      • While NNA can serve cached reports very frequently, it cannot handle many interactive queries at the same time. Queries are best used by admins, while reports are best used by end users.
      It is “one of those” single-person projects.
      • While I had assistance in coding, NNA was mostly a one-person show: fixing bugs and adding features over a period of nearly a year and a half now. There is plenty of work still to do and things to improve.
  • 32. Future Work: Where can NNA go from here?
      HDFS-6382: TTL in HDFS
      • Discussion about TTL living outside the NameNode. There is a desire not to introduce TTL management because of the additional thread resource requirements on the active NameNode. NNA could be extended to provide a routine TTL service on top of it.
      HDFS-13150: Faster Tailing of Edits from Journal Nodes
      • Part of the work to make Standby NameNode(s) service reads is to reduce the latency between when an EditLog transaction is applied on the Active vs on the Standby. Reducing this latency means NNA queries become even closer to real time as well.
      HDFS Cluster Management Integration
      • NNA is trivial enough to install that it should be easy to create an Ambari package, Cloudera Parcel, or other integration package for your flavor of management console.
      Web & Security
      • NNA supports LDAP only at the moment. It uses JSON Web Tokens to maintain sessions. Would any security experts like to lend a hand? Support for Kerberos authentication would be great!
  • 33. Demo: Example Local Cluster from Code
  • 34. END (Q & A?)