SlideShare a Scribd company logo
Cisco and Big Data – Hadoop World 2011
Unified Fabric  optimized for Big Data infrastructures  with seamless integration with current data models Traditional Database RDBMS Storage SAN/NAS “ Big Data” Store And Analyze “ Big Data” Real-Time Capture, Read & Update  NoSQL Application Virtualized, Bare-Metal, Cloud Sensor Data Logs Social Media Click Streams Mobility Trends Event Data Cisco Unified Fabric
 
[object Object],[object Object],The Setup Lab environment overview ,[object Object],[object Object]
[object Object],[object Object],… .. … .. Note: Two topologies were tested to examine the benefits of providing an integrated solution that can support multiple technologies, such as Traditional RDBMS, SAN/NAS, Virtualization, etc. N7K N7K N7K N3k N3k N3k N3k N3k N3k N7K N7K N7K N5k N5k N5k N5k … N2k … N2k UCS
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
A general characteristic of an optimally configured cluster is the ability to decrease job completion times by scaling out the nodes. Test results from ETL-like Workload (Yahoo Terasort) using 1TB data set.
The complexity of the functions used in Map and/or Reduce has a large impact on the type of job and network traffic. ,[object Object],[object Object],[object Object]
Network Graph of all Traffic received on an single node (80 Node run) Reducers Start Maps Finish Job Complete Maps Start These symbols represent a node sending traffic to HPC064 Note: Shortly after the Reducers start Map tasks are finishing and data is being shuffled to reducers As Maps completely finish the network is no loner used as Reducers have all the data they need to finish the job The red line is the total amount of traffic received by hpc064
Network Graph of all Traffic received on an single node (80 Node run) ,[object Object],[object Object],[object Object],Note: If output replication is enabled, then the end of the terasort, must store additional copies. For a 1TB sort, 2TB will need to be replicated across the network.
Network Graph of all Traffic received on an single node (80 node run) Reducers Start Maps Finish Job Complete Maps Start The red line is the total amount of traffic received by hpc064 These symbols represent a node sending traffic to HPC064 Note: Due the the combination of the length of the Map phase and the reduced data set being shuffled, the network is being utilized throughout the job, but by a limited amount.
Given the same MapReduce Job, the larger the input dataset, the longer the job will take.  Note: It is important to note that as dataset sizes increase completion times may not scale linearly as many jobs can hit the ceiling of I/O and/or Compute power. Test results from ETL-like Workload (Yahoo Terasort) using varying data set sizes.
The I/O capacity, CPU and memory of the Data Node can have a direct impact on performance of a cluster. Note: A 2RU Server with 16 disks gives the node more storage, but trading off CPU per RU. On the other hand a 1RU server gives more CPU per rack.
Data Locality – The ability to process data where it is locally stored. Note: During the Map Phase, the JobTracker attempts to use data locality to schedule map tasks where the data is locally stored. This is not perfect and is dependent on a data nodes where the data is located. This is a consideration when choosing the replication factor. More replicas tend to create higher probability for data locality. Map Tasks: Initial spike for non-local data. Sometimes a task may be scheduled on a node that does not have the data available locally.
Hadoop clusters are generally multi-use. The effect of background use can effect any single jobs completion. Note: A given Cluster, is generally running may different types of Jobs, Importing into HDFS, Etc.  Example View of 24 Hour Cluster Use Large ETL Job Overlaps with medium and small ETL Jobs and many small BI Jobs (Blue lines are ETL Jobs and purple lines are BI Jobs) Importing Data into HDFS
[object Object],Impact of Network Characteristics on Job Completion times
The failure of a networking device can affect multiple data nodes of a Hadoop cluster with a range of effects. Note: The tasks on affected nodes need to be rescheduled, schedule maintenance activities such as data rebalancing, increasing load on the cluster. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Several HDFS operations and phases of MapReduce jobs are very bursty in nature Note: The extent of bursts largely depend on the type of job (ETL vs. BI). Bursty phases can include replication of data (either importing into HDFS or output replication) and the output of the mappers during the shuffle phase. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Buffer being used during shuffle phase Buffer being used during output replication ,[object Object],[object Object],Note: The Aggregation switch buffer remained flat as the bursts were absorbed at the Top of Rack layer
Buffer being used during shuffle phase Buffer being used during output replication Note: The Fabric Extender Buffer utilization was roughly equivalent to that of the N3k, but has 32MB of buffer vs. 9MB of N3k ,[object Object],[object Object]
In a multi-use cluster described previously, multiple job types (ETL, BI, etc.) and Importing data into HDFS can be happening at the same time Note: Usage may vary depending on job scheduling options
In the largest workloads, multi-terabytes can be transmitted across the network Note: Data taken from multi-use workload (Multi-ETL + Multi-BI + HDFS Import).
Generally 1GE is being used largely due to the cost/performance trade-offs. Though 10GE can provide benefits depending on workload. Note: Multiple 1GE links can be bonded together to not only increase bandwidth, but increase bandwidth.
Moving from 1GE to 10GE actually lowers the buffer requirement on the switching layer.  Note: By moving to 10GE, the data node has a larger input to receive data lessening the need for buffers on the network as the total aggregate speed or amount of data does not increase substantially. This is due, in part, to limits of I/O and Compute capabilities
Generally network latency, while consistent latency being important, does not represent a significant factor for Hadoop Clusters. Note: There is a difference in network latency vs. application latency. Optimization in the application stack can decrease application latency that can potentially have a significant benefit.
For more information: www.cisco.com/go/bigdata
 

More Related Content

What's hot

Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
Omid Vahdaty
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
Subhas Kumar Ghosh
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
Apache Apex
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
Ted Dunning
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
mcsrivas
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
Antonios Katsarakis
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
Antonios Katsarakis
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
Jason Shao
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
Nicola Cadenelli
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
Subhas Kumar Ghosh
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
Kyong-Ha Lee
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
Cloudera, Inc.
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
MapR Technologies
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
Subhas Kumar Ghosh
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
Romain Jacotin
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
GERARDO BARBERENA
 
Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
Shift into High Gear: Dramatically Improve Hadoop & NoSQL PerformanceShift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
MapR Technologies
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshooting
mapr-academy
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
Pietro Michiardi
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduce
Frane Bandov
 

What's hot (20)

Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Distributed Processing Frameworks
Distributed Processing FrameworksDistributed Processing Frameworks
Distributed Processing Frameworks
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
Shift into High Gear: Dramatically Improve Hadoop & NoSQL PerformanceShift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
Shift into High Gear: Dramatically Improve Hadoop & NoSQL Performance
 
70a monitoring & troubleshooting
70a monitoring & troubleshooting70a monitoring & troubleshooting
70a monitoring & troubleshooting
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduce
 

Viewers also liked

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Implementing the Hybrid Data Center
Implementing the Hybrid Data CenterImplementing the Hybrid Data Center
Implementing the Hybrid Data Center
Cisco Canada
 
Wprowadzenie do technologii Big Data
Wprowadzenie do technologii Big DataWprowadzenie do technologii Big Data
Wprowadzenie do technologii Big Data
Sages
 
Scalable midsize data center designs
Scalable midsize data center designsScalable midsize data center designs
Scalable midsize data center designs
Jing Bai
 
Tweeting hadoop
Tweeting hadoopTweeting hadoop
Tweeting hadoop
Zizon Qiu
 
Hadoop
HadoopHadoop
Hadoop
Zizon Qiu
 
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
Alice Chan
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
pramodbiligiri
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
Kathirvel Ayyaswamy
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
Gluster.org
 
Performing Network & Security Analytics with Hadoop
Performing Network & Security Analytics with HadoopPerforming Network & Security Analytics with Hadoop
Performing Network & Security Analytics with Hadoop
DataWorks Summit
 
ahepburn MDES PRES2 Production Tech Its only a Comic
ahepburn MDES PRES2 Production Tech Its only a Comicahepburn MDES PRES2 Production Tech Its only a Comic
ahepburn MDES PRES2 Production Tech Its only a Comic
Andrew Hepburn
 
TRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOPTRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOP
Kirthan S Holla
 
Kkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summitKkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summit
Gluster.org
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
 
Hands-On LinkedIn for Beginners
Hands-On LinkedIn for BeginnersHands-On LinkedIn for Beginners
Hands-On LinkedIn for Beginners
Paula Battalia Brand
 
Chistesvarios8
Chistesvarios8Chistesvarios8
Chistesvarios8
josemorales
 
How to Add an App to Your LinkedIn Profile
How to Add an App to Your LinkedIn ProfileHow to Add an App to Your LinkedIn Profile
How to Add an App to Your LinkedIn Profile
NET:101
 
ExcelCertificate18122014
ExcelCertificate18122014ExcelCertificate18122014
ExcelCertificate18122014
Peter Garces
 

Viewers also liked (20)

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Implementing the Hybrid Data Center
Implementing the Hybrid Data CenterImplementing the Hybrid Data Center
Implementing the Hybrid Data Center
 
Wprowadzenie do technologii Big Data
Wprowadzenie do technologii Big DataWprowadzenie do technologii Big Data
Wprowadzenie do technologii Big Data
 
Scalable midsize data center designs
Scalable midsize data center designsScalable midsize data center designs
Scalable midsize data center designs
 
Tweeting hadoop
Tweeting hadoopTweeting hadoop
Tweeting hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
Monitoring and Analyzing Big Traffic Data of a Large-Scale Cellular Network w...
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
Deploying pNFS over Distributed File Storage w/ Jiffin Tony Thottan and Niels...
 
Performing Network & Security Analytics with Hadoop
Performing Network & Security Analytics with HadoopPerforming Network & Security Analytics with Hadoop
Performing Network & Security Analytics with Hadoop
 
ahepburn MDES PRES2 Production Tech Its only a Comic
ahepburn MDES PRES2 Production Tech Its only a Comicahepburn MDES PRES2 Production Tech Its only a Comic
ahepburn MDES PRES2 Production Tech Its only a Comic
 
TRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOPTRAFFIC DATA ANALYSIS USING HADOOP
TRAFFIC DATA ANALYSIS USING HADOOP
 
Kkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summitKkeithley ufonfs-gluster summit
Kkeithley ufonfs-gluster summit
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
Hands-On LinkedIn for Beginners
Hands-On LinkedIn for BeginnersHands-On LinkedIn for Beginners
Hands-On LinkedIn for Beginners
 
Chistesvarios8
Chistesvarios8Chistesvarios8
Chistesvarios8
 
How to Add an App to Your LinkedIn Profile
How to Add an App to Your LinkedIn ProfileHow to Add an App to Your LinkedIn Profile
How to Add an App to Your LinkedIn Profile
 
ExcelCertificate18122014
ExcelCertificate18122014ExcelCertificate18122014
ExcelCertificate18122014
 

Similar to Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - Jacob Rapp, Cisco

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
sreehari orienit
 
Hadoop
HadoopHadoop
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Yahoo Developer Network
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
AmirReza Mohammadi
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
Cisco Canada
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
Cisco Canada
 
Hadoop Research
Hadoop Research Hadoop Research
Hadoop Research
Shreyansh Ajit kumar
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
Ericsson
 
H04502048051
H04502048051H04502048051
H04502048051
ijceronline
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
Vibrant Technologies & Computers
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-training
Geohedrick
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
Varun Narang
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
Jinwon Lee
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
Ryousei Takano
 
MapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine LearningMapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine Learning
butest
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 

Similar to Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - Jacob Rapp, Cisco (20)

Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Hadoop
HadoopHadoop
Hadoop
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Hadoop Research
Hadoop Research Hadoop Research
Hadoop Research
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
 
H04502048051
H04502048051H04502048051
H04502048051
 
Hadoop - Introduction to HDFS
Hadoop - Introduction to HDFSHadoop - Introduction to HDFS
Hadoop - Introduction to HDFS
 
Hadoop online-training
Hadoop online-trainingHadoop online-training
Hadoop online-training
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
MapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine LearningMapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine Learning
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
E031201032036
E031201032036E031201032036
E031201032036
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 

Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - Jacob Rapp, Cisco

  • 1. Cisco and Big Data – Hadoop World 2011
  • 2. Unified Fabric optimized for Big Data infrastructures with seamless integration with current data models Traditional Database RDBMS Storage SAN/NAS “ Big Data” Store And Analyze “ Big Data” Real-Time Capture, Read & Update NoSQL Application Virtualized, Bare-Metal, Cloud Sensor Data Logs Social Media Click Streams Mobility Trends Event Data Cisco Unified Fabric
  • 3.  
  • 4.
  • 5.
  • 6.
  • 7. A general characteristic of an optimally configured cluster is the ability to decrease job completion times by scaling out the nodes. Test results from ETL-like Workload (Yahoo Terasort) using 1TB data set.
  • 8.
  • 9. Network Graph of all Traffic received on an single node (80 Node run) Reducers Start Maps Finish Job Complete Maps Start These symbols represent a node sending traffic to HPC064 Note: Shortly after the Reducers start Map tasks are finishing and data is being shuffled to reducers As Maps completely finish the network is no loner used as Reducers have all the data they need to finish the job The red line is the total amount of traffic received by hpc064
  • 10.
  • 11. Network Graph of all Traffic received on an single node (80 node run) Reducers Start Maps Finish Job Complete Maps Start The red line is the total amount of traffic received by hpc064 These symbols represent a node sending traffic to HPC064 Note: Due the the combination of the length of the Map phase and the reduced data set being shuffled, the network is being utilized throughout the job, but by a limited amount.
  • 12. Given the same MapReduce Job, the larger the input dataset, the longer the job will take. Note: It is important to note that as dataset sizes increase completion times may not scale linearly as many jobs can hit the ceiling of I/O and/or Compute power. Test results from ETL-like Workload (Yahoo Terasort) using varying data set sizes.
  • 13. The I/O capacity, CPU and memory of the Data Node can have a direct impact on performance of a cluster. Note: A 2RU Server with 16 disks gives the node more storage, but trading off CPU per RU. On the other hand a 1RU server gives more CPU per rack.
  • 14. Data Locality – The ability to process data where it is locally stored. Note: During the Map Phase, the JobTracker attempts to use data locality to schedule map tasks where the data is locally stored. This is not perfect and is dependent on a data nodes where the data is located. This is a consideration when choosing the replication factor. More replicas tend to create higher probability for data locality. Map Tasks: Initial spike for non-local data. Sometimes a task may be scheduled on a node that does not have the data available locally.
  • 15. Hadoop clusters are generally multi-use. The effect of background use can effect any single jobs completion. Note: A given Cluster, is generally running may different types of Jobs, Importing into HDFS, Etc. Example View of 24 Hour Cluster Use Large ETL Job Overlaps with medium and small ETL Jobs and many small BI Jobs (Blue lines are ETL Jobs and purple lines are BI Jobs) Importing Data into HDFS
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21. In a multi-use cluster described previously, multiple job types (ETL, BI, etc.) and Importing data into HDFS can be happening at the same time Note: Usage may vary depending on job scheduling options
  • 22. In the largest workloads, multi-terabytes can be transmitted across the network Note: Data taken from multi-use workload (Multi-ETL + Multi-BI + HDFS Import).
  • 23. Generally 1GE is being used largely due to the cost/performance trade-offs. Though 10GE can provide benefits depending on workload. Note: Multiple 1GE links can be bonded together to not only increase bandwidth, but increase bandwidth.
  • 24. Moving from 1GE to 10GE actually lowers the buffer requirement on the switching layer. Note: By moving to 10GE, the data node has a larger input to receive data lessening the need for buffers on the network as the total aggregate speed or amount of data does not increase substantially. This is due, in part, to limits of I/O and Compute capabilities
  • 25. Generally network latency, while consistent latency being important, does not represent a significant factor for Hadoop Clusters. Note: There is a difference in network latency vs. application latency. Optimization in the application stack can decrease application latency that can potentially have a significant benefit.
  • 26. For more information: www.cisco.com/go/bigdata
  • 27.