* The file size is 1664 MB.
* The HDFS block size is 128 MB by default in Hadoop 2.0.
* To calculate the number of blocks required: file size / block size.
* 1664 MB / 128 MB = 13 blocks.
* 8 blocks have been uploaded successfully.
* So remaining blocks = total blocks - uploaded blocks = 13 - 8 = 5.
If another client tries to access or read the data while the upload is still in progress, it will only be able to read the 8 blocks that have been uploaded so far. The remaining 5 blocks will not be available or visible to other clients until the full upload is completed. HDFS follows write-once semantics, so partially written blocks are not exposed to readers.
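The block arithmetic above can be sketched in a few lines; `math.ceil` handles files that do not divide evenly into blocks (the function name `hdfs_block_count` is mine, not an HDFS API):

```python
import math

def hdfs_block_count(file_size_mb: float, block_size_mb: float = 128) -> int:
    """Number of HDFS blocks needed to store a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

total_blocks = hdfs_block_count(1664)   # 1664 / 128 = 13
remaining = total_blocks - 8            # 8 blocks already uploaded
print(total_blocks, remaining)          # 13 5
```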
The data management industry has matured over the last three decades, built primarily on relational database management system (RDBMS) technology. As the amount of data collected and analyzed in enterprises has grown severalfold in volume, variety, and velocity of generation and consumption, organisations have started struggling with the architectural limitations of traditional RDBMS designs. As a result, a new class of systems had to be designed and implemented, giving rise to the phenomenon of "Big Data". In this paper we trace the origin of one such system, Hadoop, built to handle Big Data.
In this session you will learn:
1. History of Hadoop
2. Hadoop Ecosystem
3. Hadoop Animal Planet
4. What is Hadoop?
5. Distinctions of Hadoop
6. Hadoop Components
7. The Hadoop Distributed Filesystem
8. Design of HDFS
9. When Not to Use Hadoop?
10. HDFS Concepts
11. Anatomy of a File Read
12. Anatomy of a File Write
13. Replication & Rack Awareness
14. MapReduce Components
15. Typical MapReduce Job
2. Big Data Platforms
• Hadoop
• Architecture
• Storage
• Resource Negotiator
• Computations
• Ecosystems
• HBASE
• HIVE
• ZOOKEEPER
• MOSES
• … Etc.
• Spark
• Architecture
• Concept of RDDs
• Spark Streaming
• Spark MLlib
• Spark SQL
• Ecosystems
In this presentation: Hadoop Architecture, Storage, Resource Negotiator, and Computations.
Yet to come: Spark.
3. Introduction
• In the "distributed data" world, the terms Spark, Hadoop, and Kafka should sound familiar.
• However, with numerous big data solutions available, it may be unclear exactly what they are, how they differ, and which is better suited to a given task.
• We will determine what kinds of applications, such as machine learning, distributed streaming, and data storage, you can expect to make effective and efficient by using Hadoop, Spark, and Kafka.
4. What is Hadoop?
Hadoop is open-source software that stores massive amounts of data across large numbers of commodity-grade computers and uses them to tackle tasks that are too large for a single computer to process on its own.
Store and compute:
Hadoop can be used to write software that stores data or runs computations across hundreds or thousands of machines without needing to know the details of what each machine can do, or how the machines communicate.
Error handling:
Failures are handled within the framework itself, which significantly reduces the amount of error handling necessary within your solution.
Key components:
At its most basic level of functionality, Hadoop includes the standard libraries shared between its modules, the file system HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator), and its implementation of MapReduce.
5. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
The project includes these modules:
• Hadoop Common: The common utilities that support the other Hadoop modules.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
6. Hadoop-related Apache Projects
• Ambari™: A web-based tool for provisioning, managing, and monitoring Hadoop clusters. It also
provides a dashboard for viewing cluster health and ability to view MapReduce, Pig and Hive
applications visually.
• Avro™: A data serialization system.
• Cassandra™: A scalable multi-master database with no single points of failure.
• Chukwa™: A data collection system for managing large distributed systems.
• HBase™: A scalable, distributed database that supports structured data storage for large tables.
• Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
• Mahout™: A scalable machine learning and data mining library.
7. Hadoop-related Apache Projects
• Pig™: A high-level data-flow language and execution framework for parallel computation.
• Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and
expressive programming model that supports a wide range of applications, including ETL,
machine learning, stream processing, and graph computation.
• Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a
powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch
and interactive use-cases.
• ZooKeeper™: A high-performance coordination service for distributed applications.
11. Common Use Cases for Big Data in Hadoop
• Log Data Analysis
The most common use case; it fits the HDFS scenario perfectly: write once, read often.
• Data Warehouse Modernization
• Fraud Detection
• Risk Modeling
• Social Sentiment Analysis
• Image Classification
• Graph Analysis
• Beyond
13. Data Storage Operations on HDFS
• Hadoop is designed to work best with a modest number of
extremely large files.
• Average file sizes ➔ larger than 500MB.
• Write Once, Read Often model.
• Content of individual files cannot be modified, other than
appending new data at the end of the file.
What we can do:
• Create a new file
• Append content to the end of a file
• Delete a file
• Rename a file
• Modify file attributes like owner
15. HDFS Daemons
NameNode
Keeps the metadata of all files/blocks in the file system, and tracks where across the cluster the file data is kept.
It does not store the data of these files itself; it is a kind of block lookup dictionary (an index or address book of blocks).
Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file.
The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.
16. HDFS Daemons
DataNode
A DataNode stores data in the Hadoop file system.
A functional filesystem has more than one DataNode, with data replicated across them.
On startup, a DataNode connects to the NameNode, spinning until that service comes up.
It then responds to requests from the NameNode for filesystem operations.
Client applications can talk directly to a DataNode once the NameNode has provided the location of the data.
17. HDFS Daemons
Secondary NameNode
Not a failover NameNode.
The only purpose of the secondary NameNode is to perform periodic checkpoints. It periodically downloads the current NameNode image and edits log files, joins them into a new image, and uploads the new image back to the (primary and only) NameNode.
The default checkpoint interval is one hour. It can be set to one minute on highly busy clusters where lots of write operations are being performed.
18. HDFS blocks
• A file is divided into blocks (default: 64 MB in Hadoop 1.x, 128 MB in Hadoop 2.0) and duplicated in multiple places.
• Dividing into blocks is normal for a file system; e.g., the default block size in Linux is 4 KB.
• The difference with HDFS is the scale: Hadoop was designed to operate at the petabyte scale.
• Every data block stored in HDFS has its own metadata and needs to be tracked by a central server.
19. HDFS Blocks
When HDFS stores the replicas of the original blocks across the
Hadoop cluster, it tries to ensure that the block replicas are stored
in different failure points.
21. Data Node
• Data replication:
• HDFS is designed to handle large-scale data in a distributed environment.
• Hardware failures, software failures, and network partitions do occur.
• Replication is therefore needed for fault tolerance.
• Replica placement:
• Replicating to all machines would have a high initialization time.
• An approximate solution: only 3 replicas
One replica resides on the current node
One replica resides in the current rack
One replica resides in another rack
24. Data Replication
Re-replicating missing replicas
• Missing heartbeats signify lost nodes
• The NameNode consults metadata and finds the affected data
• The NameNode consults the rack-awareness script
• The NameNode tells a DataNode to re-replicate
25. NameNode Failure
• The NameNode is the single point of failure in the cluster in Hadoop 1.0.
• If the NameNode is down due to a software glitch, restart the machine.
• If the original NameNode can be restored, the secondary can re-establish the most current metadata snapshot.
• If the machine doesn't come up, the metadata for the cluster is irretrievable. In this situation, create a new NameNode, use the secondary to copy metadata to the new primary, and restart the whole cluster.
26. • Before Hadoop 2.0, the NameNode was a single point of failure and an operational limitation.
• Before Hadoop 2, few Hadoop clusters were able to scale beyond 3,000 or 4,000 nodes.
• Multiple NameNodes can be used in Hadoop 2.x (the HDFS High Availability feature: one NameNode is in an Active state, the other in a Standby state).
27. High Availability of the NameNodes
Standby NameNode – keeps the state of the block locations and block metadata in memory.
JournalNode – if a failure occurs, the Standby Node reads all completed journal entries to ensure the new Active NameNode is fully consistent with the state of the cluster.
Zookeeper – provides coordination and configuration services for distributed systems.
28. Several useful commands for HDFS
All Hadoop commands are invoked by the bin/hadoop script.
% hadoop fsck / -files -blocks
➔ lists the blocks that make up each file in HDFS.
For HDFS the scheme name is hdfs, and for the local file system the scheme name is file.
A file or directory in HDFS can be specified in a fully qualified way, such as:
hdfs://namenodehost/parent/child or hdfs://namenodehost
The HDFS file system shell commands are similar to Linux file commands, with the following general syntax: hdfs dfs -file_cmd
For instance, mkdir runs as:
$ hdfs dfs -mkdir /user/directory_name
30. Calculating HDFS nodes storage
• Key factors in computing HDFS node storage:
• H = HDFS storage size
• C = Compression ratio. It depends on the type of compression used and the size of the data. When no compression is used, C = 1.
• R = Replication factor. It is usually 3 in a production cluster.
• S = Initial size of data to be moved to Hadoop. This could be a combination of historical data and incremental data.
• i = Intermediate data factor. It is usually 1/3 or 1/4. This is Hadoop's intermediate working space, dedicated to storing intermediate results of Map tasks and any temporary storage used in Pig or Hive. This is a common guideline for many production applications; Cloudera, for example, has recommended 25% for intermediate results.
31. Calculating Initial Data
• This could be a combination of historical data and incremental data.
• We also need to consider the growth rate of the initial data, at least for the next 3-6 months.
• For example: we have 500 TB of data now, 50 TB is expected to be ingested in the next three months, and output files from MR jobs may create at least 10% of the initial data, so we need to consider 600 TB as the initial data size.
• i.e., 500 TB + 50 TB + 500 * 10/100 = 600 TB initial size
• Now, if we have nodes with 8 TB of storage each, how many nodes will be needed? Number of data nodes (n): n = H/d (d = disk space available per node) = 600/8 = 75 (without considering replication and intermediate data factors, or any compression techniques that may be employed).
• Question: Is it feasible to use 100% of the disk space?
32. Estimating size for Hadoop storage based on initial data
• Suppose you have to upload X GB of data into HDFS (Hadoop 2.0), with no compression, a replication factor of 3, and an intermediate factor of 0.25 = 1/4. Compute how many times Hadoop's storage will be increased with respect to the initial data, i.e., X GB.
• H = (3 + 1/4) * X = 3.25 * X
With the assumptions above, the Hadoop storage is estimated to be 3.25 times the size of the initial data.
H = HDFS storage size
C = Compression ratio. When no compression is used, C = 1.
R = Replication factor. It is usually 3 in a production cluster.
S = Initial size of data to be moved to Hadoop. This could be a combination of historical data and incremental data.
i = Intermediate data factor. It is usually 1/3 or 1/4. When no information is given, assume it is zero.
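The worked examples in these slides imply the formula H = S * (R + i) / C; a minimal sketch under that assumption (the function name `hdfs_storage` is mine, not a Hadoop API):

```python
def hdfs_storage(s_tb: float, r: float = 3, i: float = 0.25, c: float = 1) -> float:
    """Estimate HDFS storage H (TB) for initial data S, replication factor R,
    intermediate-data factor i, and compression ratio C (C = 1: no compression)."""
    return s_tb * (r + i) / c

# X GB with R = 3, i = 1/4, no compression -> 3.25 * X
print(hdfs_storage(1))               # 3.25
# 600 TB with R = 3, i = 1 -> 2400 TB (as in the rough calculation below)
print(hdfs_storage(600, r=3, i=1))   # 2400.0
```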
33. If 8 TB is the available disk space per node (10 disks of 1 TB each; 2 disks for the operating system etc. were excluded), and assuming an initial data size of 600 TB, how will you estimate the number of data nodes (n)?
• Estimating the hardware requirement is always challenging in a Hadoop environment, because we never know when data storage demand will increase for a business.
• We must understand the following factors in detail to come to a conclusion about adding the right number of nodes to the cluster:
• The actual size of data to store – 600 TB
• At what pace the data will increase in the future (per day/week/month/quarter/year) – data trending analysis or business requirement justification (prediction)
• We are in the Hadoop world, so the replication factor plays an important role – default 3x replicas
• Hardware machine overhead (OS, logs etc.) – 2 disks were considered
• Intermediate mapper and reducer data output on hard disk – 1x
• Space utilization between 60% and 70% – as careful designers, we never want our hard drives to be full to capacity.
• Compression ratio
34. Calculation to find the number of data nodes required to store 600 TB of data
• Rough calculation:
• Data size – 600 TB
• Replication factor – 3
• Intermediate data – 1
• Total storage requirement – (3 + 1) * 600 = 2400 TB
• Available disk size for storage – 8 TB
• Total number of required data nodes (approx.): n = H/d = 2400/8 = 300 machines
The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node. This is typically a temporary directory location which can be set up in the configuration by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.
35. Calculation to find the number of data nodes required to store 600 TB of data
• Actual calculation:
• Disk space utilization – 65% (differs from business to business)
• Compression ratio – 2.3
• Total storage requirement – 2400/2.3 = 1043.5 TB
• Available disk size for storage – 8 * 0.65 = 5.2 TB
• Total number of required data nodes (approx.): 1043.5/5.2 = 201 machines
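The rough and actual node counts above can be reproduced in a few lines, rounding up since machines come in whole units (the helper name `data_nodes` is mine):

```python
import math

def data_nodes(storage_tb: float, disk_tb: float = 8, utilization: float = 1.0,
               compression: float = 1.0) -> int:
    """Number of data nodes for a total raw storage requirement."""
    usable_per_node = disk_tb * utilization
    return math.ceil(storage_tb / compression / usable_per_node)

# Rough: (3 + 1) * 600 = 2400 TB raw, full 8 TB per node
print(data_nodes(2400))                                      # 300
# Actual: 2.3x compression, 65% utilization -> 5.2 TB usable per node
print(data_nodes(2400, utilization=0.65, compression=2.3))   # 201
```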
36. Case: The business has predicted a 20% data increase per quarter, and we need to predict the new machines to be added in a year.
• Data increase – 20% per quarter
• 1st quarter: 1043.5 * 0.2 = 208.7 TB
• 2nd quarter: 1043.5 * 1.2 * 0.2 = 250.44 TB
• 3rd quarter: 1043.5 * (1.2)^2 * 0.2 = 300.5 TB
• 4th quarter: 1043.5 * (1.2)^3 * 0.2 = 360.6 TB
• Additional data node requirement (approx.):
• 1st quarter: 208.7/5.2 = 41 machines
• 2nd quarter: 250.44/5.2 = 49 machines
• 3rd quarter: 300.5/5.2 = 58 machines
• 4th quarter: 360.6/5.2 = 70 machines
Compound interest formula: data added in quarter n = P * (1 + R/100)^(n-1) * R/100
Here, P = 1043.5 TB and R = 20% per quarter.
37. Thought Question
• Imagine that you are uploading a file of 1664 MB into HDFS (Hadoop 2.0).
• 8 blocks have been successfully uploaded into HDFS. Find how many blocks are remaining.
• Another client wants to work with or read the uploaded data while the upload is still in progress, i.e., the data already written to the 8 blocks. What will happen in such a scenario: will the 8 blocks of data that have been uploaded be displayed or available for use?