Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny" Sundstrom - SGI, Inc.
 


SGI has been a leading commercial vendor of Hadoop clusters since 2008. Leveraging SGI's experience with high performance clusters at scale, SGI has delivered individual Hadoop clusters of up to 4000 nodes. Integration, performance, and management all become issues at scale, and Hadoop clusters scale! In this presentation, SGI will discuss representative customer use cases, major design considerations for performance and power optimization, how integrated Hadoop solutions leveraging CDH, SGI Rackable clusters, and SGI Management Center best meet customer needs, and how SGI envisions the needs of enterprise customers evolving as Hadoop continues to move into mainstream adoption.

Edit your comment
  • The value proposition of the Rackable product line is compelling. Its configuration flexibility is industry-leading, allowing customers to meet their exact needs for any given application. With up to 2,208 cores per cabinet, Rackable servers deliver high density, allowing data centers to conserve precious space. With power increasingly at a premium, our Eco-Logical™ technology is of particular value to customers. Rackable servers typically consume significantly less power than competitive offerings, lowering OpEx cost per server while allowing a larger number of servers to fit in the facility’s power budget. Whether it is done at the application level or in the server hardware, Rackable configurations are capable of delivering high reliability, availability, serviceability and manageability. And finally, SGI offers Rackable servers as part of complete solutions, fully integrated from both a hardware and software point of view.
  • only one to support full height PCI cards. The Rackable cabinet line accommodates either form factor but is especially suited to half-depth, back-to-back mounted servers. A variety of sizes are available in both 24” and 26” rack widths. Both feature cable troughs for exceptionally clean cabling and serviceability. As mentioned previously, one of the big advantages of the Rackable line is that cabinets can ship fully integrated with servers, networking and storage. This allows servers to be up and running extraordinarily quickly, assuming the data center has been prepared in advance to receive them. There was a new 24 in. Foundation design, but here we’re talking about the 24 in. Destination Rack, which has the following dimensions:
    • With front and rear doors, air cooled: 24” wide x 44.80” deep x 78.74” tall
    • With front door, water cooled: 24” wide x 50.38” deep x 78.74” tall
    • Bare frame without doors: 24” wide x 40” deep x 78.74” tall
  • Software Engineering is an enablement organization for platform, software and services sales through software differentiation and integration for end users, developers and IT. As a product company, most of our R&D goes into silicon design, board and system integration, and packaging and cooling. The main goal for the SW stack is to expose the HW differentiations at the application level, where our end users can benefit from them, and to add more value through SW differentiations. This is a key element of our SW strategy: we invest in development only where we add value (MPT and tools, NUMA tools, RAS features). If there is a good third-party or open source solution, we integrate it into our SW stack. We can say we have an RDI org (Research, Develop and Integrate).
    1st animation: End users buy our platform to solve their business problems, and these are addressed by applications. Customers first select, or already have in use, the application(s) needed in their workflow, and that application drives the platform procurement. We do platform enablement, integrate third-party or open source OS, middleware and libraries, and have experts in application domains such as CSM, CFD and CEM. Our SW development resources apply mostly to both ends of the stack: the system SW level (years of contribution to Linux to support scalability to the level that our SM architecture is capable of, and SFS, which is a key component for implementing SGI differentiations on a standard Linux distribution) and the application level.
    2nd animation: But there are two other categories of customers: IT, the system administrators who make and keep the system operational, and developers, the people who do SW development on our platforms.
    3rd animation: For them we offer two products: SMS and SPS. The slide captures all our SW products as they align with the new platforms.
    4th animation: Summarizing, there are three areas of focus defining the core of SW engineering. For IT: intelligent system monitoring and management software. For code writers: development tools for performance. For customers: deliver results through applications. (3 min)
  • The Rackable line is unmatched in its configurability. Starting with the choice of an Intel or AMD based architecture, Rackable servers are available in a wide variety of form factors that enable customers to tailor the server to their data center and applications. Half-depth servers mount back to back, delivering high density with hot aisle containment within the rack. Standard-depth servers are designed for heterogeneous deployments in hot-aisle/cold-aisle based data centers. Beyond the form factor, a wide variety of hard drive, memory and networking infrastructure choices are available. And if that still doesn’t yield the desired configuration, the SGI Engineering team can engineer a customized, Design To Order solution that is an exact fit.

Presentation Transcript

  • I Want to Be Big: Lessons Learned at Scale. Dave “Sunny” Sundstrom, Director, Software Products
  • Our Strategy: The Trusted Leader in Technical Computing. Business Applications, Big Data, Technical Applications. Business Computing, Technical Computing. Redundancy, Workload Optimized, Scale & Speed. ©2011 SGI
  • The Opportunity: Providing Customers with Trusted Technical Computing Solutions
    • HPC: Commercial, Scientific, Modeling & Simulation, Government
    • Big Data: Hadoop, In-memory, Analytics, Archive
    • Cloud: Public, Private
  • Solutions That Are Our Customers’ Business
    • Content distribution and fraud reduction
    • NASDAQ and Frequency Firms trading billions of shares a day
    • Motorola delivering a new generation of content to the palm of your hand
    • NBA keeps score with their content
  • SGI Leadership at Speed and Scale
    • First and only:
      • server to surpass both 256 cores and 4 TB
      • x86 server to surpass 80 cores and 2 TB
      • server to surpass 500K SPECjbb2005 BOPS/JVM, 25M SPECjbb2005 throughput
      • server to surpass 5000 STREAM GB/s (5X nearest competition)
      • 1PF+ peak IB cluster using only general purpose CPUs
    • World’s fastest distributed memory system as measured by SPEC MPI 2007. (Published results on spec.org as of 7-25-11.)
    • Largest disk-based archive solution (1.44 PB usable in a single rack)
    • Densest storage system (2.7 PB raw per cabinet)
  • SGI Hadoop Background
    • SGI has been one of the leading commercial suppliers of Hadoop servers since the technology was introduced
    • Leading technology users deploy on SGI Hadoop Clusters
    • SGI supplies customer optimized Hadoop Clusters to key US government agencies
    • SGI has sourced Hadoop installations as large as 40,000 nodes and individual clusters as large as 4,000 nodes
  • Terasort Scaling: SGI Hadoop Cluster
    • Terasort Scaling: SGI Rackable C2005-TY6 Hadoop Cluster, 100 GB job size. Terasort @ 100GB scales super linearly on a 20-node SGI Rackable C2005-TY6 cluster running Cloudera distribution of Apache Hadoop (CDH3u0). [Chart: Terasort scaling vs. linear scaling; x-axis: number of nodes (1 to 20); y-axis: scaling]
    • Terasort Scaling on SGI vs. Sun Hadoop Cluster, 100 GB input data size. SGI Rackable C2005-TY6 cluster running Cloudera Hadoop is 81% faster than Sun X2270 cluster of a similar size. [Chart: Terasort scaling for the SGI Rackable C2005-TY6 cluster and the Sun X2270 M2 cluster vs. linear scaling; x-axis: number of nodes]
    • Sources: http://sun.systemnews.com/articles/152/1/server/23549 and SGI internal measurements
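The "super linear" claim on this slide can be made concrete with a small calculation. This is a minimal sketch, assuming that scaling means the single-node runtime divided by the n-node runtime; the slide does not state its definition. The runtimes below are hypothetical placeholders chosen for illustration, not SGI or Sun measurements.

```python
def speedup(t_single: float, t_cluster: float) -> float:
    """Speedup of an n-node run relative to a single-node run."""
    if t_single <= 0 or t_cluster <= 0:
        raise ValueError("runtimes must be positive")
    return t_single / t_cluster


def is_super_linear(nodes: int, t_single: float, t_cluster: float) -> bool:
    """True when the speedup exceeds the node count (better than linear)."""
    return speedup(t_single, t_cluster) > nodes


# Hypothetical runtimes in seconds, for illustration only (not from the slide).
hypothetical_runtimes = {1: 9000.0, 5: 1500.0, 10: 600.0, 20: 250.0}

for n, t in hypothetical_runtimes.items():
    s = speedup(hypothetical_runtimes[1], t)
    print(f"{n:>2} nodes: speedup {s:.1f}x, super-linear: {s > n}")
```

With measured runtimes substituted for the placeholders, a speedup above the node count at every cluster size would correspond to a curve that sits above the linear-scaling line in the chart.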
  • Hadoop matures… Chasm
  • from experiment…
  • to technology…
  • to system…
  • …to platform!
  • What Hath Scale Wrought
    • Large scale cluster TCO is dominated by infrastructure and management costs
    • Power and cooling costs are trending to exceed capital costs
    • Time to production is a major wasted cost
    • Management of large scale platforms is difficult, expensive and reduces availability
  • SGI Hadoop Clusters – Key Customer Values
    • Optimized to specific customer requirements:
      • High performance
      • Power
      • Density
      • Price
    • Unprecedented speed and scale
      • World record benchmarks
      • Manageability of large configurations
      • Scale at lower datacenter TCO
      • High performance networking
      • Data management solutions for data ingest and archive
    • Start Up and GO!
      • Factory integrated Hadoop solutions
      • Cloudera’s Distribution (CDH) and Cloudera Enterprise
      • Direct delivery to production
  • SGI Hadoop Clusters – Have It Your Way
    • Flexible, optimized and specific to customer requirements. Designed to Order (DTO):
      • Performance
      • Power: dynamically managed; AC or DC
      • Density: half depth back-to-back; full depth; Standard and CloudRack
      • Cooling: hot zone/cold zone; central
      • Storage options
      • Price
    • SGI Hadoop Cluster Reference Implementations
  • Introducing SGI Hadoop Cluster Reference Implementations
    • 1/2 Rack: 128 TB usable capacity
    • 1 Rack: 256 TB usable capacity
    • Multi-Rack: petabytes of usable capacity
    • 10GigE
    • Highest density
    • Power optimized
    • Highest data capacity
    • Factory integrated
    • Cloudera certified
    • Import, Export, Search, Mine, Predict & Visualize data for Business Intelligence
  • SGI Hadoop Cluster Reference Implementation
    • Rack layout (positions 1 to 22, servers mounted back to back):
      • 3 x LG-Ericsson ES4548 48-port GigE switches
      • Secondary NameNode / SGI-MC head node: C2005-TY2, 2x5645, 6x8GB, 4x1TB
      • Application Node: C2005-TY2, 2x5675, 12x8GB, 4x1TB
      • NameNode: C2005-TY2, 2x5645, 6x8GB, 4x1TB
      • JobTracker: C2005-TY2, 2x5645, 6x8GB, 4x1TB
      • 16 x Data/TaskTracker Node: C2005-TY6, 2x5645, 6x8GB, 10x1TB
    • SGI Hadoop Cluster (half rack building block/20-node cluster shown)
    • SGI Rackable half-depth, back to back servers (shown) or flow-through
    • NameNode, JobTracker, Secondary NameNode, Application Node, 16 DataNodes/TaskTrackers
    • 128 Terabyte capacity
    • World Record ssj_ops/watt per DataNode
    • Dynamic power management through SGI Management Center
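The node counts and disk configuration in this rack layout can be tallied with a short script. This is a sketch that only restates figures from the slide; the `Node` record and its field names are illustrative, not part of any SGI tooling. It counts raw disk on the data nodes, taking each drive as 1 TB as labeled. The slide's 128 TB capacity figure is quoted as-is, since the transcript does not say how it is derived from the raw total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative record for one server type in the rack (names are assumptions)."""
    role: str
    model: str
    drives: int      # number of 1 TB drives, per the slide's "Nx1TB" label
    count: int = 1


# Half-rack building block as listed on the slide (20 nodes in total).
half_rack = [
    Node("Secondary NameNode / SGI-MC head node", "C2005-TY2", drives=4),
    Node("Application Node", "C2005-TY2", drives=4),
    Node("NameNode", "C2005-TY2", drives=4),
    Node("JobTracker", "C2005-TY2", drives=4),
    Node("DataNode/TaskTracker", "C2005-TY6", drives=10, count=16),
]

total_nodes = sum(n.count for n in half_rack)
data_nodes = sum(n.count for n in half_rack if n.role.startswith("DataNode"))
raw_data_tb = sum(n.count * n.drives for n in half_rack if n.role.startswith("DataNode"))

print(f"nodes: {total_nodes}, data nodes: {data_nodes}, raw data-node disk: {raw_data_tb} TB")
```

The tally confirms the 20-node, 16-DataNode building block described in the bullets; the raw disk total on the data nodes is larger than the stated 128 TB capacity.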
  • SGI Hadoop Cluster: Power Optimized and Managed
    • Industry first: power optimized Hadoop cluster
    • Fully tunable Hadoop operations/watt: more ops per watt
    • Fine grained power management
    • Dynamic, policy driven, power envelope management
  • SGI/Cloudera Software Environment: complete, factory installed, production Hadoop environment
    • File System Mount: FUSE-DFS
    • UI Framework: HUE
    • SDK: HUE SDK
    • Workflow: Apache Oozie
    • Scheduling: Apache Oozie
    • Metadata: Apache Hive
    • Languages / Compilers: Apache Pig, Apache Hive
    • Data Integration: Apache Flume, Apache Sqoop
    • Fast Read/Write Access: Apache HBase
    • Coordination: Apache ZooKeeper
    • SGI® Management Suite: SGI® Management Center – Premium Edition, SGI® Management Center – Standard Edition, SGI® Management Center – Power Option
    • SGI® Foundation Software
    • Commercial and Community Linux Distributions
    • SGI Hadoop Cluster Reference Implementation or DTO
  • Stand Up and Go: SGI Services for Minimal Time to Production
    • Design/Plan (Sales/Consulting Services):
      • Solution design across hardware, software, and datacenter
      • Site readiness
      • Assessments
      • Integration of 3rd party solutions
      • Architecture design and development
      • Packaged offerings
    • Deploy/Run (Managed Services):
      • In-factory integration and testing
      • Installation and production services
      • Performance tuning
      • System administration
      • Onsite personnel
      • Education
      • Managed services
    • Support/Maintain (Customer Support Services):
      • Global 7x24 Call Centers
      • eServices and online support
      • Extended warranty and response times
      • Onsite and remote technical support
      • Materials management
    • SGI Partner Ecosystem
  • Hadoop Analytical Players: SGI Analytics Ecosystem. Expanding partnerships with industry leading analytics, visualization and tool vendors
    • Big Data Import, Export, Rich Analytics
    • Big Data Content Search and Interactive Analytics
    • Big Data ETL, Business Intelligence on Hadoop with Hive QL
    • Data Modeling, and Visualization for Interactive Business Intelligence on Hadoop
  • SGI Hadoop Clusters
    • Design to order and reference implementations with industry leading speed and scale:
      • Highest performance Intel & AMD processors, including both low- and high-wattage CPUs
      • Integrated scale-up and scale out systems customized to analytics workloads and data ingest requirements
      • High performance networking: GigE, 10 GigE, 40GigE, IB with acceleration
      • Broadest variety of storage options: transfer rate, density, capacities, speeds
  • SGI Business Intelligence Solutions: Integrated Solutions
    • Data ingest, Compute, Storage, Network, Analytics, Contracting, Archive
    • SGI Uniquely Positioned to Deliver:
      • Highest speed and scale
      • Integrated solutions
      • Proven best-of-breed technologies
  • SGI Business Intelligence Solutions: Integrated Analytics Cluster
    • Actionable, time-critical Business Intelligence applications:
      • Social Media (click stream analytics, user search patterns, behavioral analysis)
      • Financials (risk analysis, detecting frauds in credit transactions/insurance claims)
      • Federal & Defense (fraud detection, predictive demographics, security analysis)
      • Telecom (customer trend analysis, network usage patterns, fraud detection)
    • Pentaho: Import, Analytics, BI, Dashboard, Mining, Export
    • Hadoop BI Appliance:
      • Datameer: Import, Analytics, Dashboard, Export
      • Kitenga: Content Search, Analytics
      • Quantum4D: Visual Insight, Data Modeling, Interactive BI
    • Hive QL
    • HBase
    • Cloudera Enterprise
    • Red Hat Enterprise Linux
    • SGI Integrated Cluster
  • SGI Business Intelligence Solutions: In-memory Transactional Database with Hadoop
    • VoltDB relational database on an SGI Rack C1001-TY3 cluster
    • Single-node throughput: 120K ACID transactions/sec
    • 30-node cluster throughput: 3.4M ACID transactions/sec
    • Scaling: linear, scale factor 0.86/node
    • Real-time inserts up to 10/sec having no impact on the throughput
    • Constant 10ms latency across the 30-node cluster
    • Price/Performance: $0.08 TPS
    • [Chart: VoltDB transactions/sec (TPS, 0 to 4,000,000) vs. number of server nodes (5 to 30)]
    • Refer to: http://www.sgi.com/pdfs/4238.pdf
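The throughput figures quoted on this slide allow a quick check of how close the cluster comes to ideal linear scaling. This is a minimal sketch, assuming efficiency is cluster throughput divided by node count times single-node throughput. That definition is an assumption made here; it need not match the slide's "scale factor 0.86/node", whose definition the transcript does not give. The inputs are the slide's rounded figures.

```python
SINGLE_NODE_TPS = 120_000      # single-node throughput quoted on the slide
CLUSTER_TPS = 3_400_000        # 30-node cluster throughput quoted on the slide
NODES = 30


def scaling_efficiency(cluster_tps: float, single_tps: float, nodes: int) -> float:
    """Fraction of ideal linear throughput actually achieved by the cluster."""
    if nodes <= 0 or single_tps <= 0:
        raise ValueError("nodes and single_tps must be positive")
    return cluster_tps / (nodes * single_tps)


ideal_tps = NODES * SINGLE_NODE_TPS
efficiency = scaling_efficiency(CLUSTER_TPS, SINGLE_NODE_TPS, NODES)
print(f"ideal: {ideal_tps:,} TPS, achieved: {CLUSTER_TPS:,} TPS, efficiency: {efficiency:.1%}")
```

Under this definition the 30-node result reaches roughly 94% of 30 times the single-node rate.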
  • SGI Business Intelligence Solutions: Integration of structured and unstructured data
    • Actionable, time-critical Business Intelligence applications:
      • Social Media (click stream analytics, user search patterns, behavioral analysis)
      • Financials (risk analysis, detecting frauds in credit transactions/insurance claims)
      • Federal & Defense (fraud detection, predictive demographics, security analysis)
      • Telecom (customer trend analysis, network usage patterns, fraud detection)
    • Online transaction applications: Online Social Media Transactions, Online Financial Transactions, Online Federal & Defense Transactions, Online Telecommunications Transactions
    • Data Integration: SGI Hadoop Solution for “Deep Data” Analytics
      • Pentaho: Import, Analytics, BI, Dashboard, Mining, Export
      • Hadoop BI Appliance: Datameer (Import, Analytics, Dashboard, Export), Kitenga (Content Search, Analytics), Quantum4D (Visual Insight, Data Modeling, Interactive BI)
      • Hive QL, HBase, Cloudera Enterprise, Red Hat Enterprise Linux, SGI Integrated Cluster
    • Data Ingest: Transactional Database/Trading systems for “Fast Data” Ingestion
      • Transactional Database for Data Ingestion (VoltDB)
      • SGI Integrated Cluster
    • SGI BI solution combines data ingestion and data integration capabilities to provide a best-in-class solution for fast ingestion and deep analytics
    • Built on SGI Hadoop Cluster using Rackable or CloudRack Server, commercial Linux distributions, SGI Management Center, and Cloudera Enterprise
  • See More SGI High Performance Solutions at:
    • November 14-18, Seattle
    • December 5-8, 2011, Las Vegas
  • SGI Hadoop Clusters – Key Customer Values
    • Optimized to specific customer requirements:
      • High performance
      • Power
      • Density
      • Price
    • Unprecedented speed and scale
      • World record benchmarks
      • Manageability of large configurations
      • Scale at lower datacenter TCO
      • High performance networking
      • Data management solutions for data ingest and archive
    • Start Up and GO!
      • Factory integrated Hadoop solutions
      • Cloudera’s Distribution (CDH) and Cloudera Enterprise
      • Direct delivery to production
  • !!%$@#*#)
  • I Want to Be Big
  • BIG! BIG!
  • Q&A: Dave “Sunny” Sundstrom, dsundstrom@sgi.com