Managing Big data with Hadoop

The data management industry has matured over the last three decades, primarily on the basis of relational database management system (RDBMS) technology. Because the data collected and analyzed in enterprises has increased severalfold in volume, variety and velocity of generation and consumption, organisations have started struggling with the architectural limitations of traditional RDBMSs. As a result, a new class of systems had to be designed and implemented, giving rise to the phenomenon of “Big Data”. In this paper we trace the origin of Hadoop, a new class of system built to handle Big Data.

MANAGING BIG DATA WITH HADOOP
Presented by: Nalini Mehta
Student (MLVTEC Bhilwara)
Email: nalinimehta52@gmail.com
Introduction
Big Data:
• Big data is a term used to describe voluminous amounts of unstructured and semi-structured data.
• It is data that would take too much time and cost too much money to load into a relational database for analysis.
• Big data doesn't refer to any specific quantity; the term is often used when speaking about petabytes and exabytes of data.
General framework of Big Data
• The driving force behind a Big Data implementation is both infrastructure and analytics, which together constitute the software.
• Hadoop is the Big Data management software used to distribute, catalogue, manage and query data across multiple, horizontally scaled server nodes.
Overview of Hadoop
• Hadoop is a platform for processing large amounts of data in a distributed fashion.
• It provides a scheduling and resource management framework to execute the map and reduce phases in a cluster environment.
• The Hadoop Distributed File System (HDFS) is Hadoop’s data storage layer, designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
Hadoop Cluster
• DataNode: The DataNodes are the repositories for the data and consist of multiple smaller database infrastructures.
• Client: The client represents the user interface to the big data implementation and query engine. It could be a server or a PC with a traditional user interface.
• NameNode: The NameNode acts as the address router, keeping track of the location of every DataNode.
• Job Tracker: The job tracker is the software tracking mechanism that distributes search queries across multiple nodes and aggregates the results for ultimate client analysis.
Apache Hadoop
• Apache Hadoop is an open source distributed software platform for storing and processing data.
• It is a framework for running applications on large clusters built of commodity hardware.
• A common way of avoiding data loss is replication: the system keeps redundant copies of the data so that, in the event of failure, another copy is available. The Hadoop Distributed File System (HDFS) takes care of this problem.
• MapReduce is a simple programming model for processing and generating large data sets.
What is MapReduce?
• MapReduce is a programming model.
• Programs written in this model are automatically parallelized and executed on a large cluster of commodity machines.
• Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
The Programming Model of MapReduce
• Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the Reduce function.
• The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key. It merges these values together to form a possibly smaller set of values.
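The map, group and reduce steps described above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not Hadoop's API: the names `map_fn`, `reduce_fn` and `run_mapreduce` and the two sample documents are assumptions made for this example, and a real cluster would run the map and reduce calls in parallel on many machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word in the input."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all values for one intermediate key into a single total."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical driver: run map, group values by key, then run reduce."""
    # Grouping step: collect every intermediate value under its key,
    # which is the job the MapReduce library does between the two phases.
    groups = defaultdict(list)
    for key, value in inputs:
        for intermediate_key, intermediate_value in map_fn(key, value):
            groups[intermediate_key].append(intermediate_value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


# Sample input pairs (document id, document text); purely illustrative.
docs = [("d1", "big data with hadoop"), ("d2", "hadoop stores big data")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The user only writes `map_fn` and `reduce_fn`; everything in `run_mapreduce` stands in for work the framework would do.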
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
• Apache Hadoop comes with a distributed file system called HDFS, which stands for Hadoop Distributed File System.
• HDFS is designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to this information.
• HDFS is designed for scalability and fault tolerance, and provides APIs that let MapReduce applications read and write data in parallel.
• The capacity and performance of HDFS can be scaled by adding DataNodes, while a single NameNode manages data placement and monitors server availability.
Assumptions and Goals
1. Hardware Failure
• An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data.
• There is a huge number of components, and each component has a non-trivial probability of failure.
• Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
2. Streaming Data Access
• Applications that run on HDFS need streaming access to their data sets.
• HDFS is designed for batch processing rather than interactive use by users.
• The emphasis is on high throughput of data access rather than low latency of data access.
3. Large Data Sets
• A typical file in HDFS is gigabytes to terabytes in size.
• Thus, HDFS is tuned to support large files.
• It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster.
4. Simple Coherency Model
• HDFS applications need a write-once-read-many access model for files.
• A file, once created, written and closed, need not be changed.
• This assumption simplifies data coherency issues and enables high-throughput data access.
5. “Moving Computation is Cheaper than Moving Data”
• A computation requested by an application is much more efficient if it is executed near the data it operates on, especially when the data set is huge.
• This minimizes network congestion and increases the overall throughput of the system.
6. Portability across Heterogeneous Hardware and Software Platforms
• HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
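Goal 5 above, running a computation near its data, can be illustrated with a small placement sketch. This is not Hadoop's scheduler: the function `place_task`, the `free_slots` map and the node and block names are all hypothetical, chosen only to show the preference for a node that already stores the block.

```python
def place_task(block_id, block_locations, free_slots):
    """Pick a node for the task reading block_id, preferring a node that stores it.

    Returns (node, is_local). All inputs are plain dicts invented for this sketch.
    """
    # First choice: a node that holds a copy of the block and has a free slot.
    local = [
        node for node in block_locations.get(block_id, [])
        if free_slots.get(node, 0) > 0
    ]
    if local:
        return local[0], True   # data-local: the block does not cross the network
    # Fallback: any node with a free slot; the block must be shipped to it.
    for node, slots in free_slots.items():
        if slots > 0:
            return node, False
    raise RuntimeError("no free task slots in the cluster")


# Illustrative cluster state: which nodes store which block, and spare capacity.
block_locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn3"]}
free_slots = {"dn1": 0, "dn2": 1, "dn3": 0, "dn4": 2}

local_choice = place_task("blk_1", block_locations, free_slots)
remote_choice = place_task("blk_2", block_locations, free_slots)
print(local_choice, remote_choice)
```

In the second call the only node holding `blk_2` is busy, so the task falls back to a remote node and the block has to travel, which is exactly the network cost the goal tries to avoid.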
NameNode and DataNodes
• An HDFS cluster has two types of node operating in a master-slave pattern: a NameNode (the master) and a number of DataNodes (slaves).
• The NameNode manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree.
• Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes.
• The NameNode executes file system namespace operations like opening, closing, and renaming files and directories.
• DataNodes store and retrieve blocks when they are told to (by clients or the NameNode), and they report back to the NameNode periodically with lists of the blocks they are storing.
• The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
• Without the NameNode, the file system cannot be used. In fact, if the machine running the NameNode were destroyed, all the files on the file system would be lost, since there would be no way of knowing how to reconstruct the files from the blocks on the DataNodes.
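The division of labour described above (metadata on the NameNode, block contents on the DataNodes) can be sketched with a toy in-memory model. Everything here is an assumption for illustration: the class `ToyNameNode`, the 4-byte `BLOCK_SIZE`, the round-robin placement and the node names are not taken from HDFS, and the sketch folds the client's data transfer into the `write` method for brevity.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration, not a real HDFS value


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where each block lives."""

    def __init__(self, datanodes):
        self.datanodes = datanodes    # node name -> {block id: bytes}
        self.file_blocks = {}         # file path -> ordered list of block ids
        self.block_location = {}      # block id -> node name

    def write(self, path, data):
        """Split data into blocks and spread them over the DataNodes."""
        names = sorted(self.datanodes)
        block_ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#{offset // BLOCK_SIZE}"
            # Round-robin placement, an assumption made for this sketch.
            node = names[len(self.block_location) % len(names)]
            self.datanodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
            self.block_location[block_id] = node
            block_ids.append(block_id)
        self.file_blocks[path] = block_ids

    def read(self, path):
        """Reassemble a file by looking up each block's location in order."""
        return b"".join(
            self.datanodes[self.block_location[block_id]][block_id]
            for block_id in self.file_blocks[path]
        )


datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
namenode = ToyNameNode(datanodes)
namenode.write("/logs/a.txt", b"hello hadoop")
restored = namenode.read("/logs/a.txt")
print(restored)
```

If `namenode.file_blocks` were lost, the DataNodes would still hold every byte, but nothing would record which blocks belong to which file or in what order, which is the failure mode the last bullet above describes.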
File System Namespace
• HDFS supports a traditional hierarchical file organization. A user or an application can create and remove files, move a file from one directory to another, rename a file, create directories and store files inside these directories.
• The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode.
• An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
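A minimal sketch of the namespace bookkeeping described above, assuming a purely in-memory model. The class `ToyNamespace`, its method names, the paths and the default replication value of 3 are illustrative assumptions and not the HDFS client API; the point is only that each file carries a replication factor and that every namespace change is recorded.

```python
class ToyNamespace:
    """In-memory stand-in for the namespace metadata a NameNode would keep."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = {}       # file path -> replication factor
        self.edit_log = []    # every namespace change is appended here

    def mkdir(self, path):
        self.dirs.add(path)
        self.edit_log.append(("mkdir", path))

    def create(self, path, replication=3):
        # The parent directory must already exist in the hierarchy.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        self.files[path] = replication
        self.edit_log.append(("create", path, replication))

    def rename(self, src, dst):
        # Moving a file between directories is just a metadata update.
        self.files[dst] = self.files.pop(src)
        self.edit_log.append(("rename", src, dst))

    def delete(self, path):
        del self.files[path]
        self.edit_log.append(("delete", path))


ns = ToyNamespace()
ns.mkdir("/data")
ns.mkdir("/archive")
ns.create("/data/events.log", replication=2)
ns.rename("/data/events.log", "/archive/events.log")
print(ns.files, len(ns.edit_log))
```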
Data Replication
• The blocks of a file are replicated for fault tolerance.
• The block size and replication factor are configurable per file.
• The NameNode makes all decisions regarding replication of blocks.
• A block report contains a list of all blocks on a DataNode.
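How block reports could feed a replication decision is sketched below. The function `under_replicated`, the report layout and the block and node names are hypothetical, and the replication factor of 3 is just an example value; the real NameNode logic is more involved (for instance, it also weighs where to place new copies).

```python
from collections import defaultdict


def under_replicated(block_reports, replication_factor):
    """Return {block id: missing copies} for blocks with too few live replicas.

    block_reports maps a DataNode name to the list of block ids it reports holding.
    """
    # Invert the reports: for each block, which DataNodes currently hold it?
    replicas = defaultdict(set)
    for node, blocks in block_reports.items():
        for block_id in blocks:
            replicas[block_id].add(node)
    return {
        block_id: replication_factor - len(nodes)
        for block_id, nodes in replicas.items()
        if len(nodes) < replication_factor
    }


# Illustrative block reports from three DataNodes.
reports = {
    "dn1": ["blk_1", "blk_2"],
    "dn2": ["blk_1", "blk_3"],
    "dn3": ["blk_1", "blk_2"],
}
missing = under_replicated(reports, replication_factor=3)
print(missing)
```

Here `blk_1` is fully replicated, while `blk_2` and `blk_3` would need one and two extra copies respectively to reach the target.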
Hadoop as a Service in the Cloud (HaaS)
• Hadoop is economical for large-scale, data-driven companies like Yahoo or Facebook.
• The ecosystem around Hadoop nowadays offers various tools, like Hive and Pig, that make Big Data processing accessible by focusing on what to do with the data and avoiding the complexity of programming.
• Consequently, a minimal Hadoop as a Service provides a managed Hadoop cluster that is ready to use, without the need to configure or install any Hadoop services, such as the JobTracker, TaskTracker, NameNode or DataNode, on any cluster node.
• Depending on the level of service, abstraction and tools provided, HaaS can be placed in the cloud stack as a Platform as a Service or Software as a Service solution, between infrastructure services and cloud clients.
Limitations
Hadoop places several requirements on the network.
Data locality:
• The distributed Hadoop nodes running jobs in parallel cause east-west network traffic that can be adversely affected by suboptimal network connectivity.
• The network should provide high bandwidth, low latency and any-to-any connectivity between the nodes for optimal Hadoop performance.
Scale out:
• Deployments might start with a small cluster and then scale out over time as the customer realizes initial success and its needs grow.
• The underlying network architecture should also scale seamlessly with Hadoop clusters and should provide predictable performance.
Conclusion
• The growth of communication and connectivity has led to the emergence of Big Data. Apache Hadoop is an open source framework that has become a de facto standard for big data platforms deployed today.
• To sum up, promising progress has been made in the area of Big Data, but much remains to be done. Almost all proposed approaches have been evaluated only at a limited scale, and further research is required for large-scale evaluations.