The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS's master-slave architecture with a single NameNode master and multiple DataNode slaves. The NameNode manages filesystem metadata and data placement, while DataNodes store data blocks. The document outlines HDFS components like the SecondaryNameNode, DataNodes, and how files are written and read. It also discusses high availability solutions, operational tools, and the future of HDFS.
4. What is HDFS?
• Hadoop Distributed File System
• Good for:
  – Large files
  – Streaming data access
• Not for:
  x Lots of small files
  x Random access
  x Low-latency access
5. Design of HDFS
• GFS-like
  – http://research.google.com/archive/gfs.html
• Master-slave design
  – Master
    • A single NameNode managing the filesystem metadata
  – Slaves
    • Multiple DataNodes storing the data
  – One more:
    • A SecondaryNameNode for checkpointing
7. HDFS Storage
• HDFS files are broken into Blocks
  – The basic unit of reading/writing, like a disk block
  – Defaults to 64 MB; often set larger in production environments
  – Makes HDFS well suited to large files & high throughput
• A block may have multiple Replicas
  – One block is stored at multiple locations
  – Makes HDFS storage fault tolerant
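The arithmetic implied above can be sketched as follows (a toy illustration, not Hadoop code; the 64 MB block size and a replication factor of 3 are the defaults mentioned in these slides):

```python
BLOCK_SIZE = 64 * 1024 * 1024   # default block size from the slide
REPLICATION = 3                 # default replication factor

def num_blocks(file_size: int) -> int:
    """Number of HDFS blocks a file of file_size bytes occupies."""
    return max(1, -(-file_size // BLOCK_SIZE))  # ceiling division

def raw_storage(file_size: int) -> int:
    """Raw bytes consumed across the cluster, counting all replicas."""
    return file_size * REPLICATION

# A 1 GB file spans 16 blocks of 64 MB each.
print(num_blocks(1024 * 1024 * 1024))  # → 16
```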
15. Load FSImage
• Name directory
  – dfs.name.dir: can list multiple dirs
  – Checks the consistency of all name dirs
  – Loads the fsimage file
  – Loads the edit logs
  – Saves the namespace
• Mainly sets up dirs & files properly
16. Check Safemode
• Safemode
  – The fsimage is loaded, but the locations of blocks are not yet known!
  – Exits when the minimal replication condition is met
    • dfs.safemode.threshold.pct
    • dfs.replication.min
    • Default case: 99.9% of blocks have 1 replica
  – Starts SafeModeMonitor to periodically check whether to leave safe mode
  – Leave safe mode manually:
    • hadoop dfsadmin -safemode leave
    • (or enter it / get status via: hadoop dfsadmin -safemode enter/get)
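The exit condition above can be sketched as follows (an illustrative toy, not NameNode code; the parameter names mirror the configuration keys on this slide):

```python
def can_leave_safemode(blocks_meeting_min_repl: int,
                       total_blocks: int,
                       threshold_pct: float = 0.999) -> bool:
    """Leave safe mode once the fraction of blocks that already have
    dfs.replication.min replicas reaches dfs.safemode.threshold.pct."""
    if total_blocks == 0:
        return True
    return blocks_meeting_min_repl / total_blocks >= threshold_pct

print(can_leave_safemode(9990, 10000))  # → True (exactly at 99.9%)
print(can_leave_safemode(9989, 10000))  # → False (still in safe mode)
```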
17. Start Daemons
• HeartbeatMonitor
  – Detects lost DNs & schedules the necessary replication
• LeaseManager
  – Checks for lost leases
• ReplicationMonitor
  – computeReplicationWork
  – computeInvalidateWork
  – dfs.replication.interval, defaults to 3 secs
• DecommissionManager
  – Checks for and marks decommissioned nodes
18. Trash Emptier
• /user/{user.name}/.Trash
  – fs.trash.interval > 0 to enable
  – On delete, files are moved to .Trash
• Trash.Emptier
  – Runs every fs.trash.interval minutes
  – Deletes checkpoints older than fs.trash.interval minutes
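The Emptier's age check can be sketched like this (illustrative only; the 60-minute interval is an assumed fs.trash.interval value, not a stated default):

```python
from datetime import datetime, timedelta

TRASH_INTERVAL_MIN = 60  # assumed fs.trash.interval, in minutes

def checkpoints_to_delete(checkpoints, now):
    """Return the trash checkpoints older than the interval, as the
    Trash.Emptier would on one of its periodic runs."""
    cutoff = now - timedelta(minutes=TRASH_INTERVAL_MIN)
    return [ts for ts in checkpoints if ts < cutoff]

now = datetime(2024, 1, 1, 12, 0)
old = datetime(2024, 1, 1, 10, 0)   # two hours old  -> expired
new = datetime(2024, 1, 1, 11, 30)  # 30 minutes old -> kept
print(checkpoints_to_delete([old, new], now) == [old])  # → True
```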
20. SecondaryNameNode
• Not a Standby/Backup NameNode
  – Only for checkpointing
  – Though it does hold a NON-realtime copy of the fsimage
• Needs as much memory as the NN to do the checkpointing
  – Estimate: 1 GB for every one million blocks
21. SecondaryNameNode
• Does the checkpointing
  – Copies the NN's fsimage & editlogs
  – Merges them into a new fsimage
  – Replaces the NN's fsimage with the new one & cleans the editlogs
• Timing
  – Size of the editlog > fs.checkpoint.size (polled every 5 min)
  – Every fs.checkpoint.period secs
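The two triggers can be folded into one small sketch (illustrative; the default values used here are assumptions, not taken from these slides):

```python
def should_checkpoint(editlog_bytes: int, secs_since_last: int,
                      checkpoint_size: int = 64 * 1024 * 1024,
                      checkpoint_period: int = 3600) -> bool:
    """Checkpoint when the editlog outgrows fs.checkpoint.size, or when
    fs.checkpoint.period seconds have passed since the last checkpoint."""
    return (editlog_bytes > checkpoint_size
            or secs_since_last >= checkpoint_period)

print(should_checkpoint(128 * 1024 * 1024, 60))  # → True (editlog too big)
print(should_checkpoint(1024, 3600))             # → True (period elapsed)
print(should_checkpoint(1024, 60))               # → False
```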
23. DataNode
• Stores data blocks
  – Has no knowledge of the filesystem namespace
• Receives blocks from clients
• Receives blocks from DataNode peers
  – Replication
  – Pipelined writing
• Receives delete commands from the NameNode
24. Block Placement Policy
On the cluster level
• replication = 3
  – First replica local to the Client
  – Second & third on two nodes of the same remote rack
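The default policy can be sketched roughly as follows (a toy model; the rack/node names and the dict-based topology are hypothetical):

```python
def place_replicas(client_node: str, racks: dict) -> list:
    """First replica on the writer's own node; second and third on two
    different nodes of one other (remote) rack, per the slide above."""
    local_rack = next(r for r, nodes in racks.items() if client_node in nodes)
    remote_rack = next(r for r in racks if r != local_rack)
    second, third = racks[remote_rack][:2]
    return [client_node, second, third]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas("n1", racks))  # → ['n1', 'n3', 'n4']
```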
25. Block Placement Policy
On one single node
• Writes each disk in turn
  – No balancing is considered!
• Skips a disk when it is almost full or has failed
• A DataNode may go offline when disks fail
  – dfs.datanode.failed.volumes.tolerated
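The per-node disk choice can be sketched like this (illustrative only; the dict layout and the 1 GB free-space floor are assumptions, not HDFS values):

```python
def pick_disk(disks: list, last_idx: int, min_free: int = 1024**3) -> int:
    """Round-robin over the data dirs, skipping disks that are almost
    full or have failed, as the per-node policy above describes."""
    n = len(disks)
    for step in range(1, n + 1):
        i = (last_idx + step) % n
        if disks[i]["ok"] and disks[i]["free"] >= min_free:
            return i
    raise IOError("no usable disk")

disks = [{"free": 10 * 1024**3, "ok": True},
         {"free": 0, "ok": True},             # almost full -> skipped
         {"free": 5 * 1024**3, "ok": False}]  # failed      -> skipped
print(pick_disk(disks, last_idx=0))  # → 0 (wraps past disks 1 and 2)
```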
26. DataNode Startup
• On DN startup:
  – Loads the data dirs
  – Registers itself with the NameNode
  – Starts the IPC server
  – Starts DataXceiverServer
    • Transfers blocks
  – Runs the main loop …
    • Starts the BlockScanner
    • Sends heartbeats
    • Processes commands from the NN
    • Sends block reports
31. Write File
• DFSClient.create
  – NameNode.create
    • Checks existence
    • Checks permissions
    • Checks for and gets the Lease
    • Adds a new INode to rootDir
32. Write File
• outputStream.write
  – Gets the DNs to write to from the NN
  – Breaks the bytes into packets
  – Writes packets to the first DataNode's DataXceiver
  – Each DN mirrors packets to the downstream DNs (pipeline)
  – When complete, confirms blockReceived to the NN
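The packetization step can be sketched as follows (a toy; the 64 KB packet size is an assumption for illustration, not a value from these slides):

```python
PACKET_SIZE = 64 * 1024  # assumed packet size, for illustration only

def to_packets(data: bytes) -> list:
    """Break an outgoing byte stream into fixed-size packets before they
    are pushed down the DataNode pipeline."""
    return [data[i:i + PACKET_SIZE] for i in range(0, len(data), PACKET_SIZE)]

packets = to_packets(b"x" * (150 * 1024))  # a 150 KB payload
print(len(packets))      # → 3 (64 KB + 64 KB + 22 KB)
print(len(packets[-1]))  # → 22528
```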
35. Lease
• What is a lease?
  – A write lock for file modification
  – No lease is needed for reading files
• Avoids concurrent writes to the same file
  – Those would cause inconsistent & undefined behavior
36. Lease
• LeaseManager
  – Leases are managed in the NN
  – When a file is created (or appended), a lease is added
• DFSClient.LeaseChecker
  – The client starts a thread to renew its lease periodically
37. Lease Expiration
• Soft limit
  – No renewal for 1 min
  – Other clients may compete for the lease
• Hard limit
  – No renewal for 60 min (60 * soft limit)
  – No competition for the lease
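The two limits can be folded into one small classifier (illustrative only; the function and state names are mine, not HDFS identifiers):

```python
SOFT_LIMIT_SEC = 60        # 1 minute
HARD_LIMIT_SEC = 60 * 60   # 60 minutes (60 * soft limit)

def lease_state(secs_since_renewal: int) -> str:
    """Classify a lease by how long ago its holder last renewed it."""
    if secs_since_renewal >= HARD_LIMIT_SEC:
        return "hard-expired"  # no competition; the NN reclaims the lease
    if secs_since_renewal >= SOFT_LIMIT_SEC:
        return "soft-expired"  # other clients may compete for the lease
    return "valid"

print(lease_state(30))    # → valid
print(lease_state(120))   # → soft-expired
print(lease_state(7200))  # → hard-expired
```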
40. Read File
• DFSClient.open
  – Creates an FSDataInputStream
    • Gets the block locations of the file from the NN
• FSDataInputStream.read
  – Reads data from the DNs block by block
    • Reads the data
    • Verifies the checksum
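The read-side verification can be sketched like this (a simplification: HDFS actually checksums 512-byte chunks with CRC32, while this toy uses a single CRC32 over the whole block):

```python
import zlib

def read_and_verify(block: bytes, stored_checksum: int) -> bytes:
    """Recompute the checksum over the received block and compare it with
    the stored one; on mismatch the client would try another replica."""
    if zlib.crc32(block) != stored_checksum:
        raise IOError("checksum mismatch, try another replica")
    return block

data = b"hello hdfs"
print(read_and_verify(data, zlib.crc32(data)))  # → b'hello hdfs'
```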
41. Decrease Replication
• Code sample
  DFSClient dfsclient = …;
  dfsclient.setReplication(…, 2);
• Or use the CLI:
  hadoop fs -setrep -w 2 /path/to/file
43. Decrease Replication
• Changes the replication factor in the namespace
• Chooses the excess replicas
  – The number of racks must not decrease
  – Prefers the replica on the node with the least available disk space
• Adds them to invalidateSets (the to-be-deleted block set)
• ReplicationMonitor computes the blocks to be deleted for each DN
• On the DN's next heartbeat, the NN sends the delete-block command to the DN
• The DN deletes the specified blocks
• blocksMap is updated when the DN sends its blockReport
44. One DN Down
• The DataNode stops sending heartbeats
• NameNode
  – The HeartbeatMonitor finds the DN dead during its heartbeat check
  – Removes all blocks belonging to that DN
  – Updates neededReplications (the set of blocks needing one or more replicas)
  – The ReplicationMonitor computes the blocks to be replicated by each DN
  – On a DN's next heartbeat, the NameNode sends the replicate-block command
• DataNode
  – Replicates the blocks
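The re-replication bookkeeping can be sketched as follows (a toy model of neededReplications; the block and node names are hypothetical):

```python
def blocks_needing_replication(blocks_map: dict, dead_dn: str,
                               target_repl: int = 3) -> dict:
    """After a DataNode dies, recount live replicas per block and collect
    the blocks that are now under-replicated, with the missing count."""
    needed = {}
    for block, holders in blocks_map.items():
        live = [dn for dn in holders if dn != dead_dn]
        if len(live) < target_repl:
            needed[block] = target_repl - len(live)
    return needed

blocks = {"blk_1": ["dn1", "dn2", "dn3"],
          "blk_2": ["dn2", "dn3", "dn4"]}
print(blocks_needing_replication(blocks, dead_dn="dn1"))  # → {'blk_1': 1}
```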
46. High Availability
• The NameNode is a SPOF
  – The NameNode holds all the metadata
  – If the NN crashes, the whole cluster is unavailable
• Though the fsimage can be recovered from the SNN
  – It is not an up-to-date fsimage
• HA solutions are needed
48. HA - DRBD
• DRBD (http://www.drbd.org)
  – Block devices designed as a building block for high availability (HA) clusters
  – Like network-based RAID-1
• Use DRBD to back up the NN's fsimage & editlogs
  – A cold backup for the NN
  – Restarting the NN costs no more than 10 minutes
49. HA - DRBD
• Mirror one of the NN's name dirs to a remote node
  – All name dirs hold the same contents
• When the NN fails
  – Copy the mirrored name dir to all name dirs
  – Restart the NN
  – All of this is done in no more than 20 mins
51. HA - AvatarNode
• Complete hot standby
  – NFS for storage of the fsimage and editlogs
  – The standby node consumes transactions from the editlogs on NFS continuously
  – DataNodes send messages to both the primary and the standby node
• Fast switchover
  – Less than a minute
52. HA - AvatarNode
• Active-standby pair
  – Coordinated via ZooKeeper
  – Failover in a few seconds; clients retrieve block locations from the primary
  – Each AvatarNode is a wrapper over a NameNode (active or standby)
• Active AvatarNode
  – Writes the transaction log to an NFS filer
• Standby AvatarNode
  – Reads/consumes transactions from the NFS filer
  – Processes all block-location messages from the DataNodes
  – Keeps the latest metadata in memory
53. HA - AvatarNode
• Four steps to failover
  – Wipe the ZooKeeper entry. Clients will know the failover is in progress. (0 seconds)
  – Stop the primary NameNode. The last bits of data are flushed to the transaction log, and it dies. (seconds)
  – Switch the standby to primary. It consumes the rest of the transaction log and leaves SafeMode, ready to serve traffic. (seconds)
  – Update the entry in ZooKeeper. All the clients waiting for failover pick up the new connection. (0 seconds)
• Afterwards: start the first node in standby mode
  – Takes a while, but the cluster is up and running
56. HA - BackupNode
• The NN synchronously streams its transaction log to the BackupNode
• The BackupNode applies the transaction log to its in-memory and on-disk image
• The BN always commits to disk before reporting success to the NN
• If the BN restarts, it has to catch up with the NN
• (Diagram: clients retrieve block locations from the NN; DataNodes send block-location messages to the NN; the NN streams transactions synchronously to the BN)
58. Tools - Balancer
• Re-balancing is needed
  – when a new node is added to the cluster
• bin/start-balancer.sh
  – Moves blocks from over-utilized nodes to under-utilized nodes
• dfs.balance.bandwidthPerSec
  – Controls the impact on production traffic
• -t <threshold>
  – Defaults to 10%
  – Stops when each node's deviation from the average utilization is less than the threshold
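The threshold test can be sketched like this (illustrative only; the node names and utilization percentages are hypothetical):

```python
def needs_balancing(utilizations: dict, threshold: float = 10.0) -> dict:
    """Utilization is given in percent. A node is out of balance when it
    deviates from the cluster average by more than the threshold
    (default 10 points, matching the balancer's -t default above)."""
    avg = sum(utilizations.values()) / len(utilizations)
    return {node: u for node, u in utilizations.items()
            if abs(u - avg) > threshold}

cluster = {"dn1": 90, "dn2": 60, "dn3": 60}  # average utilization: 70%
print(needs_balancing(cluster))  # → {'dn1': 90}
```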
60. Tools - Distcp
• Inter-cluster copy
  – hadoop distcp -i -pp -log /logdir hdfs://srcip/srcpath/ /destpath
  – Uses MapReduce (actually map-only jobs) to run the copy in a distributed fashion
• Also a fast copy within the same cluster
62. Hadoop Future
• Short-circuit local reads
– dfs.client.read.shortcircuit = true
– Available in hadoop-1.x or cdh3u4
• Native checksums (HDFS-2080)
• BlockReader keepalive to DN (HDFS-941)
• “Zero-copy read” support (HDFS-3051)
• NN HA (HDFS-3042)
• HDFS Federation
• HDFS RAID
63. References
• Tom White, Hadoop: The Definitive Guide
• http://hadoop.apache.org/hdfs/
• Hadoop Wiki – HDFS
  – http://wiki.apache.org/hadoop/HDFS
• Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System"
  – http://research.google.com/archive/gfs.html
• Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System"
  – http://storageconference.org/2010/Papers/MSST/Shvachko.pdf