SlideShare a Scribd company logo
1 of 11
Download to read offline
April 2014, HAPPIEST MINDS TECHNOLOGIES
Big Data - Infrastructure Considerations
Author
Anand Veeramani / Deepak Shivamurthy
SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.
2
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Copyright Information
This document is an exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended
for limited circulation.
3
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Contents
Copyright Information ......................................................................................................................... 2
Abstract............................................................................................................................................... 4
Introduction......................................................................................................................................... 4
The Current Big Data Adoption Process................................................................................................ 6
Big Data Adoption Process - Recommended......................................................................................... 7
Conclusion: ....................................................................................................................................10
References:....................................................................................................................................11
4
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Abstract
Big Data is a much talked about technology across businesses today. A vast majority of organizations
spanning across industries are convinced of its usefulness, but the implementation focus is primarily
application oriented than infrastructure oriented. However, the infrastructure architecture for any
Big Data cluster is of critical importance because it affects the performance of the cluster. Modeling
the infrastructure architecture for Big Data essentially requires balancing cost and efficiency to meet
the specific needs of businesses.
This paper takes a closer look at the Big Data concept with the Hadoop framework as an example. We
look at the architecture and methods of implementing a Hadoop cluster, how it relates to server and
network infrastructure and the typical storage requirements for a Big Data cluster. We also look at
Information Security in the context of Big Data at a high level. The content presented here is largely
based on academic work, experiments conducted within Happiest Minds Technologies labs and
experiences derived from implementations for our customers.
Introduction
The volume of data generated globally is growing at a phenomenal scale and pace. The variety of
data generated further additions to its complexity. Data is continuously being generated by sensors
and humans; and volumes will grow exponentially over time. Cellular Networks and Social
Networking applications are some of the major contributors to data generation. Big Data has opened
up a completely new avenue for organizations to leverage these growing information assets to better
understand and compete in the market.
Big Data can be defined as any data repository with the following characteristics:
 Handles large amounts (a petabyte or more) of data
 Has distributed redundant data storage
 Processes tasks in parallel
 Provides data processing (MapReduce or equivalent) capabilities
 Centrally managed and orchestrated
 Is relatively inexpensive
 Accessible —easy to use and available
 Extensible — basic capabilities can be augmented and altered
The current focus of the development community globally is driving the creation of best practices
and learnings for Big Data adoption. Within Happiest Minds, we have experienced the pressing need
for a Big Data specific architecture framework. Based on our understanding of the Big Data
architecture design process and its limitations, this paper recommends a robust approach to address
these limitations. The process is adaptive and can be extended to adopt best practices, as it evolves.
Going forward, we shall use Hadoop as an example of a Big Data product. The most important
components of Hadoop are the Hadoop Distributed File System (HDFS) which provides storage and
MapReduce, for parallel processing of large data sets.
5
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Hadoop Cluster Node-level architecture
NameNode:
The NameNode is the master of the HDFS that directs slave DataNodes to perform low level
input/output tasks.
DataNode:
Each slave machine, referred to as DataNode, reads and writes from HDFS blocks to actual files on
the local file system.
Secondary NameNode:
Secondary NameNode (SNN) assists in monitoring the state of the HDFS cluster.
JobTracker:
JobTracker is the master overseeing the overall execution of a MapReduce job. It acts as a liaison
between Hadoop and the application.
TaskTracker:
TaskTracker manages the execution of individual tasks on each slave node.
6
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
The Current Big Data Adoption Process
Huge growth in volumes, the growing variety and the pace at which data is generated are making a
big impact on organizations’ business decisions. Data storage requirements are becoming difficult to
predict and provide for. The adoption of Big Data may be driven by two perspectives: the application
perspective of analytics or the infrastructure perspective of storage.
Presently, the global Big Data adoption trend focuses primarily on application and not infrastructure.
Hence, there is scope for improvement in taking a holistic view of the application and infrastructure
requirements, while designing a cluster. It is imperative to understand that the underlying data
processing algorithm (MapReduce or equivalent) will produce efficient output only if the data storage
cluster is designed well.
The overall Big Data adoption process, as it stands today, is depicted in below diagram.
Following is a detailed explanation of the current Big Data adoption process:
1. Cluster Design: Application requirements are analyzed in terms of workload, volume and
other associated parameters based on which the cluster is designed. Cluster design is not an
iterative process. The initial setup is verified and validated with a sample application and
sample data before being rolled out. Although Big Data cluster design allows flexibility in fine-
tuning the configuration parameters, the large number of parameters and their cross-impacts
introduce additional complexity.
2. Hardware Architecture: The key success factor for Hadoop clusters is the usage of high
quality commodity equipment. Most Hadoop users are cost conscious and as clusters grow,
their cost can be significant. In the present scenario, the hardware architecture requirements
for the NameNode are higher RAM and moderate HDD. If the JobTracker is a physically
separate server, it will have higher RAM and CPU speed. DataNodes are standard low-end
server class machines.
7
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
3. Network Architecture: Currently, network architecture is not specifically designed for Big
Data, i.e., inputs from cluster design and application requirement are not always mapped to
it. Standard network setup within the existing data center is used as the backbone. In most
cases, this may result in overestimated network deployment and, at times, have a negative
effect on the MapReduce data processing algorithm. Hence there is significant scope for
creating concrete guidelines related to designing network architecture for Big Data.
4. Storage Architecture: Most enterprises have huge investments in NAS and SAN devices.
When implementing Big Data, they attempt to re-use this existing storage infrastructure even
though DAS is the recommended storage for Big Data clusters. Parameters like type of disk,
shared-nothing vs shared something, are often not taken into account.
5. Information Security Architecture: General examination of different Big Data
implementations shows that security features are sparse and aftermarket security offerings
are not fully tailored to these clusters. Findings show these deployments to be largely
insecure and wholly reliant on network and perimeter security support.
Big Data Adoption Process - Recommended
While designing the Big Data architecture for an enterprise setup, it is necessary to take a
comprehensive approach.
 Application requirements should drive the overall cluster design activity including cluster
sizing, hardware architecture, network architecture, storage architecture and information
security architecture.
 Hardware architecture should be based on application requirements and cluster sizing.
 Network architecture should also be derived from application requirements and cluster
sizing. This can be worked out in parallel to hardware architecture design.
 Storage architecture should depend upon cluster sizing, hardware architecture and network
architecture. Application requirements should help fine tune the storage architecture.
 Information security architecture should depend upon hardware architecture, network
architecture and storage architecture. Application requirements should validate the security
architecture.
8
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
A view of the recommended adoption process is depicted in below diagram. The darker boxes
indicate the complex steps.
1. Application Requirements and Cluster Sizing: There are a number of important cluster
configuration parameters to be derived from the application requirements.
2. Cluster Design and Hardware Architecture: Hadoop is built to handle component failure well and
to scale out on low cost gear; thereby eliminating the need for RAID cards, redundant power supplies
and other per-component reliability features. Error-correcting RAM and SATA drives with good MTBF
numbers should be used, as this assures reliable computations. Hard drives are the largest source of
failures. Therefore, care must be taken in making the right choice of hard drives with focus on
reliability and efficiency. Hardware architecture for each cluster component should be decided in line
with the application requirements and cluster design.
3. Network Architecture: The underlying network architecture impacts the efficiency of the
MapReduce algorithm. In fact, each of the networking components affects the performance of a
9
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
cluster and this becomes the toughest variable to nail down. Since Hadoop workloads vary a lot, the
key is to use adequate network capacity to allow all nodes in the cluster to communicate with each
other at reasonable speed and cost.
4. Storage Architecture: While considering storage for Big Data, the disk size is more important than
the seek time. The architectural parameters to be considered for storage are as follows:
Characteristics of Big Data Storage
Scalable: Storage should be scalable in terms of size, throughput and speed of access.
Provides tiered storage: It is critical for the storage system to be able to manage the “tiering” of data
across the range of media types: flash, fast disk, slower disk and tape.
Makes content widely accessible: Storage should distribute data geographically so that it is closer to
users.
Supports workflow automation: Big Data must be delivered to users in the context of a workflow. For
this reason, Big Data storage architecture must support easy integration of workflow.
Supports integration with legacy systems, and analytical and content applications: A well designed
Big Data storage system should be heterogeneous and flexible. It should offer interfaces that allow
direct access to the Big Data storage functionality.
Supports integration with cloud ecosystems: An ideal Big Data storage system must be built from the
ground up, to be cloud enabled.
Self-managing: The Big Data storage system should have the built-in ability to handle failures; it must
accommodate component failures and repair itself without intervention.
5. Information Security Architecture: To handle information security of Big Data, the architectural
and operational security aspects need to be looked into. By its very nature, Big Data security has
inherent challenges to tackle. And many of these challenges still need appropriately devised
mechanisms to address them. The top 10 challenges of handling Security within Big Data
environment are illustrated below.
10
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Big Data clusters share most of the same vulnerabilities as web applications and traditional data
warehouses. Concerns over how nodes and client applications are vetted before joining the cluster,
how data at rest is protected from unwanted inspection, privacy of network communications and
how nodes are managed, remain in focus. The security of the web applications that front end Big
Data clusters is equally important.
As many clusters are being deployed within virtual and cloud environments, they can leverage vendor
supplied management tools to address operational security issues. While these measures cannot
provide fail-proof security, a reasonable amount of effort can make it considerably more difficult to
subvert systems or steal information.
Big Data in Cloud
With the cloud computing and Big Data trends converging, the era of Big Data in the cloud
commences. From technology perspective, focus will shift away from the software powering Big Data
projects towards the infrastructure necessary to support it.
For many Big Data scenarios, information comes from outside the company, for e.g., social media,
demographic data, web data, events, feeds, etc. The elasticity of the cloud makes it ideal for Big Data
analytics -- the practice of rapidly processing large volumes of unstructured data to identify patterns
and improve business strategies.
Making the storage perform at a level that enables the kind of data analysis Big Data needs is
important. Also, the cloud service provider’s inability to provide infrastructure from the same
physical location results in network latency and proves detrimental to the overall performance of
MapReduce. These factors prove to be the biggest detriments to the use of cloud for Big Data
processing.
Conclusion:
Looking at architecture design for Big Data only from the application perspective gives an isolated
view. Similarly, taking the infrastructure perspective alone into consideration may negate the
advantages of Big Data. A comprehensive strategy covering application and infrastructure aspects
while architecting Big Data is most desirable. The commonly followed process for Big Data
implementation today still has scope for improvement in this regard.
Big Data implementation is a multi-skilled discipline. It has major dependency on the underlying
infrastructure and the deployment architecture. Once a decision to use Big Data has been made by
an organization, the network, storage and security architecture associated with the deployment
architecture must be worked out in an iterative manner before finalizing it. The rule of thumb is to
exploit the benefit of low total cost of ownership (TCO) by using commodity hardware and get almost
real-time application performance.
11
© 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
References
[1] Apache Foundation, Hadoop overview,
http://hadoop.apache.org/
[2] MapReduce.org, information about MapReduce Framework,
http://www.mapreduce.org/
[3] Forbes, Ten properties of the Perfect Big Data storage architecture,
http://www.forbes.com/sites/danwoods/2012/07/23/ten-properties-of-the-perfect-big-data-
storage-architecture/2/
[4] Securosis, Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments,
https://securosis.com/Research/Publication/securing-big-data-security-recommendations-for-
hadoop-and-nosql-environment
[5] Forbes, BigData meets cloud,
http://www.forbes.com/sites/forrester/2012/08/15/big-data-meets-cloud/
[6] Virginia State University, Evaluating MapReduce System Performance: A Simulation Approach,
http://scholar.lib.vt.edu/theses/available/etd-08282012-152556/unrestricted/Wang_G_D_2012.pdf

More Related Content

What's hot

What is the Point of Hadoop
What is the Point of HadoopWhat is the Point of Hadoop
What is the Point of HadoopDataWorks Summit
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Building a data warehouse of call data records
Building a data warehouse of call data recordsBuilding a data warehouse of call data records
Building a data warehouse of call data recordsDavid Walker
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...oj08
 
20 point checklist : why move backup and disaster recovery to the cloud - druva
20 point checklist : why move backup and disaster recovery to the cloud - druva20 point checklist : why move backup and disaster recovery to the cloud - druva
20 point checklist : why move backup and disaster recovery to the cloud - druvaDruva
 
Struggling with data management
Struggling with data managementStruggling with data management
Struggling with data managementDavid Walker
 
White Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseWhite Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseEMC
 
Intel life sciences_personalizedmedicine_stanford biomed 052214 dist
Intel life sciences_personalizedmedicine_stanford biomed 052214 distIntel life sciences_personalizedmedicine_stanford biomed 052214 dist
Intel life sciences_personalizedmedicine_stanford biomed 052214 distKetan Paranjape
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessInside Analysis
 
Intel precision medicine apr 2015
Intel precision medicine apr 2015Intel precision medicine apr 2015
Intel precision medicine apr 2015Ketan Paranjape
 
Cloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkCloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkKetan Paranjape
 
Intel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicineIntel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicineKetan Paranjape
 
Protecting your data against cyber attacks in big data environments
Protecting your data against cyber attacks in big data environmentsProtecting your data against cyber attacks in big data environments
Protecting your data against cyber attacks in big data environmentsat MicroFocus Italy ❖✔
 
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014pietvz
 
Cio data center definition and solutions
Cio   data center definition and solutionsCio   data center definition and solutions
Cio data center definition and solutionsrami20122
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergencekvnnrao
 

What's hot (20)

What is the Point of Hadoop
What is the Point of HadoopWhat is the Point of Hadoop
What is the Point of Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Building a data warehouse of call data records
Building a data warehouse of call data recordsBuilding a data warehouse of call data records
Building a data warehouse of call data records
 
E018142329
E018142329E018142329
E018142329
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
 
What Is A Datacenter?
What Is A Datacenter?What Is A Datacenter?
What Is A Datacenter?
 
20 point checklist : why move backup and disaster recovery to the cloud - druva
20 point checklist : why move backup and disaster recovery to the cloud - druva20 point checklist : why move backup and disaster recovery to the cloud - druva
20 point checklist : why move backup and disaster recovery to the cloud - druva
 
Struggling with data management
Struggling with data managementStruggling with data management
Struggling with data management
 
Hadoop and-cisco-ucs
Hadoop and-cisco-ucsHadoop and-cisco-ucs
Hadoop and-cisco-ucs
 
White Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseWhite Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum Database
 
Intel life sciences_personalizedmedicine_stanford biomed 052214 dist
Intel life sciences_personalizedmedicine_stanford biomed 052214 distIntel life sciences_personalizedmedicine_stanford biomed 052214 dist
Intel life sciences_personalizedmedicine_stanford biomed 052214 dist
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Intel precision medicine apr 2015
Intel precision medicine apr 2015Intel precision medicine apr 2015
Intel precision medicine apr 2015
 
Cloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talkCloud computing in biomedicine intel talk
Cloud computing in biomedicine intel talk
 
Intel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicineIntel big data analytics in health and life sciences personalized medicine
Intel big data analytics in health and life sciences personalized medicine
 
Protecting your data against cyber attacks in big data environments
Protecting your data against cyber attacks in big data environmentsProtecting your data against cyber attacks in big data environments
Protecting your data against cyber attacks in big data environments
 
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014
Take the Big Data Challenge - Take Advantage of ALL of Your Data 16 Sept 2014
 
Cio data center definition and solutions
Cio   data center definition and solutionsCio   data center definition and solutions
Cio data center definition and solutions
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergence
 

Viewers also liked

Whitepaper: Unified Communications Solution on Communication Enabled Business...
Whitepaper: Unified Communications Solution on Communication Enabled Business...Whitepaper: Unified Communications Solution on Communication Enabled Business...
Whitepaper: Unified Communications Solution on Communication Enabled Business...Happiest Minds Technologies
 
Whitepaper: Network Virtualization - Happiest Minds
Whitepaper: Network Virtualization - Happiest MindsWhitepaper: Network Virtualization - Happiest Minds
Whitepaper: Network Virtualization - Happiest MindsHappiest Minds Technologies
 
Whitepaper: Omni-Channel Retail - Happiest Minds
Whitepaper: Omni-Channel Retail - Happiest MindsWhitepaper: Omni-Channel Retail - Happiest Minds
Whitepaper: Omni-Channel Retail - Happiest MindsHappiest Minds Technologies
 
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest Minds
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest MindsWhitepaper: Flow Document – A solution to text readers in .Net - Happiest Minds
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest MindsHappiest Minds Technologies
 
Whitepaper: Evolution of the Software Defined Data Center - Happiest Minds
Whitepaper: Evolution of the Software Defined Data Center - Happiest MindsWhitepaper: Evolution of the Software Defined Data Center - Happiest Minds
Whitepaper: Evolution of the Software Defined Data Center - Happiest MindsHappiest Minds Technologies
 
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...Happiest Minds Technologies
 
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...Happiest Minds Technologies
 

Viewers also liked (9)

Whitepaper: Unified Communications Solution on Communication Enabled Business...
Whitepaper: Unified Communications Solution on Communication Enabled Business...Whitepaper: Unified Communications Solution on Communication Enabled Business...
Whitepaper: Unified Communications Solution on Communication Enabled Business...
 
Whitepaper: Network Virtualization - Happiest Minds
Whitepaper: Network Virtualization - Happiest MindsWhitepaper: Network Virtualization - Happiest Minds
Whitepaper: Network Virtualization - Happiest Minds
 
Whitepaper: Omni-Channel Retail - Happiest Minds
Whitepaper: Omni-Channel Retail - Happiest MindsWhitepaper: Omni-Channel Retail - Happiest Minds
Whitepaper: Omni-Channel Retail - Happiest Minds
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest Minds
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest MindsWhitepaper: Flow Document – A solution to text readers in .Net - Happiest Minds
Whitepaper: Flow Document – A solution to text readers in .Net - Happiest Minds
 
Whitepaper: Evolution of the Software Defined Data Center - Happiest Minds
Whitepaper: Evolution of the Software Defined Data Center - Happiest MindsWhitepaper: Evolution of the Software Defined Data Center - Happiest Minds
Whitepaper: Evolution of the Software Defined Data Center - Happiest Minds
 
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...
Whitepaper: DEVELOPER ENGAGEMENT SOLUTION KEY TO SUCCESS OF YOUR PLATFORM - H...
 
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...
Whitepaper: AirAudit: Intelligent Audit Solution for Airlines - Reducing Reve...
 
Cyber Security Needs and Challenges
Cyber Security Needs and ChallengesCyber Security Needs and Challenges
Cyber Security Needs and Challenges
 

Similar to Whitepaper: Big Data - Infrastructure Considerations - Happiest Minds

DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEijsptm
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET Journal
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
 
IRJET- Survey of Big Data with Hadoop
IRJET-  	  Survey of Big Data with HadoopIRJET-  	  Survey of Big Data with Hadoop
IRJET- Survey of Big Data with HadoopIRJET Journal
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Surveyijeei-iaes
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET Journal
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET Journal
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...IRJET Journal
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAijp2p
 

Similar to Whitepaper: Big Data - Infrastructure Considerations - Happiest Minds (20)

DOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCEDOCUMENT SELECTION USING MAPREDUCE
DOCUMENT SELECTION USING MAPREDUCE
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Big Data
Big DataBig Data
Big Data
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
IRJET- Survey of Big Data with Hadoop
IRJET-  	  Survey of Big Data with HadoopIRJET-  	  Survey of Big Data with Hadoop
IRJET- Survey of Big Data with Hadoop
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Survey
 
big data
big databig data
big data
 
BigData Analytics
BigData AnalyticsBigData Analytics
BigData Analytics
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATAREAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
REAL-TIME INTRUSION DETECTION SYSTEM FOR BIG DATA
 

More from Happiest Minds Technologies

Largest Electricity provider in the US- Case Study
Largest Electricity provider in the US- Case StudyLargest Electricity provider in the US- Case Study
Largest Electricity provider in the US- Case StudyHappiest Minds Technologies
 
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & Insurance
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & InsuranceExploring the Potential of ChatGPT in Banking, Financial SERVICES & Insurance
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & InsuranceHappiest Minds Technologies
 
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0Happiest Minds Technologies
 
Automating SOC1/2 Compliance- For a leading Software solution company in UK
Automating SOC1/2 Compliance- For a leading Software solution company in UKAutomating SOC1/2 Compliance- For a leading Software solution company in UK
Automating SOC1/2 Compliance- For a leading Software solution company in UKHappiest Minds Technologies
 
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...Happiest Minds Technologies
 
Complete Guide to General Data Protection Regulation (GDPR)
Complete Guide to General Data Protection Regulation (GDPR)Complete Guide to General Data Protection Regulation (GDPR)
Complete Guide to General Data Protection Regulation (GDPR)Happiest Minds Technologies
 
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)Happiest Minds Technologies
 
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITHappiest Minds Technologies
 
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDITHappiest Minds Technologies
 
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITHappiest Minds Technologies
 

More from Happiest Minds Technologies (20)

Largest Electricity provider in the US- Case Study
Largest Electricity provider in the US- Case StudyLargest Electricity provider in the US- Case Study
Largest Electricity provider in the US- Case Study
 
BFSI GLOBAL TRENDS FY 24
BFSI GLOBAL TRENDS FY 24BFSI GLOBAL TRENDS FY 24
BFSI GLOBAL TRENDS FY 24
 
ARTIFICIAL INTELLIGENCE IN DIGITAL BANKING
ARTIFICIAL INTELLIGENCE IN DIGITAL BANKINGARTIFICIAL INTELLIGENCE IN DIGITAL BANKING
ARTIFICIAL INTELLIGENCE IN DIGITAL BANKING
 
DIGITAL MANUFACTURING
DIGITAL MANUFACTURINGDIGITAL MANUFACTURING
DIGITAL MANUFACTURING
 
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & Insurance
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & InsuranceExploring the Potential of ChatGPT in Banking, Financial SERVICES & Insurance
Exploring the Potential of ChatGPT in Banking, Financial SERVICES & Insurance
 
AN OVERVIEW OF THE METAVERSE
AN OVERVIEW OF THE METAVERSEAN OVERVIEW OF THE METAVERSE
AN OVERVIEW OF THE METAVERSE
 
VMware to AWS Cloud Migration
VMware to AWS Cloud MigrationVMware to AWS Cloud Migration
VMware to AWS Cloud Migration
 
Digital-Content-Monetization-DCM-Platform-2.pdf
Digital-Content-Monetization-DCM-Platform-2.pdfDigital-Content-Monetization-DCM-Platform-2.pdf
Digital-Content-Monetization-DCM-Platform-2.pdf
 
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0
AUTOMATING CYBER RISK DETECTION AND PROTECTION WITH SOC 2.0
 
Cloud Reshaping Banking
Cloud Reshaping BankingCloud Reshaping Banking
Cloud Reshaping Banking
 
Automating SOC1/2 Compliance- For a leading Software solution company in UK
Automating SOC1/2 Compliance- For a leading Software solution company in UKAutomating SOC1/2 Compliance- For a leading Software solution company in UK
Automating SOC1/2 Compliance- For a leading Software solution company in UK
 
PAMaaS- Powered by CyberArk
PAMaaS- Powered by CyberArkPAMaaS- Powered by CyberArk
PAMaaS- Powered by CyberArk
 
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...
GUIDE TO KEEP YOUR END-USERS CONNECTED TO THE DIGITAL WORKPLACE DURING DISRUP...
 
SECURING THE CLOUD DATA LAKES
SECURING THE CLOUD DATA LAKESSECURING THE CLOUD DATA LAKES
SECURING THE CLOUD DATA LAKES
 
Complete Guide to General Data Protection Regulation (GDPR)
Complete Guide to General Data Protection Regulation (GDPR)Complete Guide to General Data Protection Regulation (GDPR)
Complete Guide to General Data Protection Regulation (GDPR)
 
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)
Azure bastion- Remote desktop RDP/SSH in Azure using Bastion Service as (PaaS)
 
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
 
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN CPG THROUGH INTELLIGENT FREIGHT AUDIT
 
How to Approach Tool Integrations
How to Approach Tool IntegrationsHow to Approach Tool Integrations
How to Approach Tool Integrations
 
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDITREDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
REDUCING TRANSPORTATION COSTS IN RETAIL THROUGH INTELLIGENT FREIGHT AUDIT
 

Recently uploaded

2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

Whitepaper: Big Data - Infrastructure Considerations - Happiest Minds

  • 1. April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY.
  • 2. 2 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Copyright Information This document is an exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended for limited circulation.
  • 3. 3 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Contents Copyright Information ......................................................................................................................... 2 Abstract............................................................................................................................................... 4 Introduction......................................................................................................................................... 4 The Current Big Data Adoption Process................................................................................................ 6 Big Data Adoption Process - Recommended......................................................................................... 7 Conclusion: ....................................................................................................................................10 References:....................................................................................................................................11
  • 4. 4 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Abstract Big Data is a much talked about technology across businesses today. A vast majority of organizations spanning across industries are convinced of its usefulness, but the implementation focus is primarily application oriented than infrastructure oriented. However, the infrastructure architecture for any Big Data cluster is of critical importance because it affects the performance of the cluster. Modeling the infrastructure architecture for Big Data essentially requires balancing cost and efficiency to meet the specific needs of businesses. This paper takes a closer look at the Big Data concept with the Hadoop framework as an example. We look at the architecture and methods of implementing a Hadoop cluster, how it relates to server and network infrastructure and the typical storage requirements for a Big Data cluster. We also look at Information Security in the context of Big Data at a high level. The content presented here is largely based on academic work, experiments conducted within Happiest Minds Technologies labs and experiences derived from implementations for our customers. Introduction The volume of data generated globally is growing at a phenomenal scale and pace. The variety of data generated further additions to its complexity. Data is continuously being generated by sensors and humans; and volumes will grow exponentially over time. Cellular Networks and Social Networking applications are some of the major contributors to data generation. Big Data has opened up a completely new avenue for organizations to leverage these growing information assets to better understand and compete in the market. Big Data can be defined as any data repository with the following characteristics:  Handles large amounts (a petabyte or more) of data  Has distributed redundant data storage  Processes tasks in parallel  Provides data processing (MapReduce or equivalent) capabilities  Centrally managed and orchestrated  Is relatively inexpensive  Accessible —easy to use and available  Extensible — basic capabilities can be augmented and altered The current focus of the development community globally is driving the creation of best practices and learnings for Big Data adoption. Within Happiest Minds, we have experienced the pressing need for a Big Data specific architecture framework. Based on our understanding of the Big Data architecture design process and its limitations, this paper recommends a robust approach to address these limitations. The process is adaptive and can be extended to adopt best practices, as it evolves. Going forward, we shall use Hadoop as an example of a Big Data product. The most important components of Hadoop are the Hadoop Distributed File System (HDFS) which provides storage and MapReduce, for parallel processing of large data sets.
  • 5. 5 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Hadoop Cluster Node-level architecture NameNode: The NameNode is the master of the HDFS that directs slave DataNodes to perform low level input/output tasks. DataNode: Each slave machine, referred to as DataNode, reads and writes from HDFS blocks to actual files on the local file system. Secondary NameNode: Secondary NameNode (SNN) assists in monitoring the state of the HDFS cluster. JobTracker: JobTracker is the master overseeing the overall execution of a MapReduce job. It acts as a liaison between Hadoop and the application. TaskTracker: TaskTracker manages the execution of individual tasks on each slave node.
  • 6. 6 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved The Current Big Data Adoption Process Huge growth in volumes, the growing variety and the pace at which data is generated are making a big impact on organizations’ business decisions. Data storage requirements are becoming difficult to predict and provide for. The adoption of Big Data may be driven by two perspectives: the application perspective of analytics or the infrastructure perspective of storage. Presently, the global Big Data adoption trend focuses primarily on application and not infrastructure. Hence, there is scope for improvement in taking a holistic view of the application and infrastructure requirements, while designing a cluster. It is imperative to understand that the underlying data processing algorithm (MapReduce or equivalent) will produce efficient output only if the data storage cluster is designed well. The overall Big Data adoption process, as it stands today, is depicted in below diagram. Following is a detailed explanation of the current Big Data adoption process: 1. Cluster Design: Application requirements are analyzed in terms of workload, volume and other associated parameters based on which the cluster is designed. Cluster design is not an iterative process. The initial setup is verified and validated with a sample application and sample data before being rolled out. Although Big Data cluster design allows flexibility in fine- tuning the configuration parameters, the large number of parameters and their cross-impacts introduce additional complexity. 2. Hardware Architecture: The key success factor for Hadoop clusters is the usage of high quality commodity equipment. Most Hadoop users are cost conscious and as clusters grow, their cost can be significant. In the present scenario, the hardware architecture requirements for the NameNode are higher RAM and moderate HDD. If the JobTracker is a physically separate server, it will have higher RAM and CPU speed. DataNodes are standard low-end server class machines.
  • 7. 7 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved 3. Network Architecture: Currently, network architecture is not specifically designed for Big Data, i.e., inputs from cluster design and application requirement are not always mapped to it. Standard network setup within the existing data center is used as the backbone. In most cases, this may result in overestimated network deployment and, at times, have a negative effect on the MapReduce data processing algorithm. Hence there is significant scope for creating concrete guidelines related to designing network architecture for Big Data. 4. Storage Architecture: Most enterprises have huge investments in NAS and SAN devices. When implementing Big Data, they attempt to re-use this existing storage infrastructure even though DAS is the recommended storage for Big Data clusters. Parameters like type of disk, shared-nothing vs shared something, are often not taken into account. 5. Information Security Architecture: General examination of different Big Data implementations shows that security features are sparse and aftermarket security offerings are not fully tailored to these clusters. Findings show these deployments to be largely insecure and wholly reliant on network and perimeter security support. Big Data Adoption Process - Recommended While designing the Big Data architecture for an enterprise setup, it is necessary to take a comprehensive approach.  Application requirements should drive the overall cluster design activity including cluster sizing, hardware architecture, network architecture, storage architecture and information security architecture.  Hardware architecture should be based on application requirements and cluster sizing.  Network architecture should also be derived from application requirements and cluster sizing. This can be worked out in parallel to hardware architecture design.  Storage architecture should depend upon cluster sizing, hardware architecture and network architecture. Application requirements should help fine tune the storage architecture.  Information security architecture should depend upon hardware architecture, network architecture and storage architecture. Application requirements should validate the security architecture.
  • 8. 8 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved A view of the recommended adoption process is depicted in below diagram. The darker boxes indicate the complex steps. 1. Application Requirements and Cluster Sizing: There are a number of important cluster configuration parameters to be derived from the application requirements. 2. Cluster Design and Hardware Architecture: Hadoop is built to handle component failure well and to scale out on low cost gear; thereby eliminating the need for RAID cards, redundant power supplies and other per-component reliability features. Error-correcting RAM and SATA drives with good MTBF numbers should be used, as this assures reliable computations. Hard drives are the largest source of failures. Therefore, care must be taken in making the right choice of hard drives with focus on reliability and efficiency. Hardware architecture for each cluster component should be decided in line with the application requirements and cluster design. 3. Network Architecture: The underlying network architecture impacts the efficiency of the MapReduce algorithm. In fact, each of the networking components affects the performance of a
  • 9. 9 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved cluster and this becomes the toughest variable to nail down. Since Hadoop workloads vary a lot, the key is to use adequate network capacity to allow all nodes in the cluster to communicate with each other at reasonable speed and cost. 4. Storage Architecture: While considering storage for Big Data, the disk size is more important than the seek time. The architectural parameters to be considered for storage are as follows: Characteristics of Big Data Storage Scalable: Storage should be scalable in terms of size, throughput and speed of access. Provides tiered storage: It is critical for the storage system to be able to manage the “tiering” of data across the range of media types: flash, fast disk, slower disk and tape. Makes content widely accessible: Storage should distribute data geographically so that it is closer to users. Supports workflow automation: Big Data must be delivered to users in the context of a workflow. For this reason, Big Data storage architecture must support easy integration of workflow. Supports integration with legacy systems, and analytical and content applications: A well designed Big Data storage system should be heterogeneous and flexible. It should offer interfaces that allow direct access to the Big Data storage functionality. Supports integration with cloud ecosystems: An ideal Big Data storage system must be built from the ground up, to be cloud enabled. Self-managing: The Big Data storage system should have the built-in ability to handle failures; it must accommodate component failures and repair itself without intervention. 5. Information Security Architecture: To handle information security of Big Data, the architectural and operational security aspects need to be looked into. By its very nature, Big Data security has inherent challenges to tackle. And many of these challenges still need appropriately devised mechanisms to address them. The top 10 challenges of handling Security within Big Data environment are illustrated below.
  • 10. 10 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Big Data clusters share most of the same vulnerabilities as web applications and traditional data warehouses. Concerns over how nodes and client applications are vetted before joining the cluster, how data at rest is protected from unwanted inspection, privacy of network communications and how nodes are managed, remain in focus. The security of the web applications that front end Big Data clusters is equally important. As many clusters are being deployed within virtual and cloud environments, they can leverage vendor supplied management tools to address operational security issues. While these measures cannot provide fail-proof security, a reasonable amount of effort can make it considerably more difficult to subvert systems or steal information. Big Data in Cloud With the cloud computing and Big Data trends converging, the era of Big Data in the cloud commences. From technology perspective, focus will shift away from the software powering Big Data projects towards the infrastructure necessary to support it. For many Big Data scenarios, information comes from outside the company, for e.g., social media, demographic data, web data, events, feeds, etc. The elasticity of the cloud makes it ideal for Big Data analytics -- the practice of rapidly processing large volumes of unstructured data to identify patterns and improve business strategies. Making the storage perform at a level that enables the kind of data analysis Big Data needs is important. Also, the cloud service provider’s inability to provide infrastructure from the same physical location results in network latency and proves detrimental to the overall performance of MapReduce. These factors prove to be the biggest detriments to the use of cloud for Big Data processing. Conclusion: Looking at architecture design for Big Data only from the application perspective gives an isolated view. Similarly, taking the infrastructure perspective alone into consideration may negate the advantages of Big Data. A comprehensive strategy covering application and infrastructure aspects while architecting Big Data is most desirable. The commonly followed process for Big Data implementation today still has scope for improvement in this regard. Big Data implementation is a multi-skilled discipline. It has major dependency on the underlying infrastructure and the deployment architecture. Once a decision to use Big Data has been made by an organization, the network, storage and security architecture associated with the deployment architecture must be worked out in an iterative manner before finalizing it. The rule of thumb is to exploit the benefit of low total cost of ownership (TCO) by using commodity hardware and get almost real-time application performance.
  • 11. 11 © 2014 Happiest Minds Technologies Pvt. Ltd. All Rights Reserved References [1] Apache Foundation, Hadoop overview, http://hadoop.apache.org/ [2] MapReduce.org, information about MapReduce Framework, http://www.mapreduce.org/ [3] Forbes, Ten properties of the Perfect Big Data storage architecture, http://www.forbes.com/sites/danwoods/2012/07/23/ten-properties-of-the-perfect-big-data- storage-architecture/2/ [4] Securosis, Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments, https://securosis.com/Research/Publication/securing-big-data-security-recommendations-for- hadoop-and-nosql-environment [5] Forbes, BigData meets cloud, http://www.forbes.com/sites/forrester/2012/08/15/big-data-meets-cloud/ [6] Virginia State University, Evaluating MapReduce System Performance: A Simulation Approach, http://scholar.lib.vt.edu/theses/available/etd-08282012-152556/unrestricted/Wang_G_D_2012.pdf