I Want to Be Big: Lessons Learned at Scale. Dave “Sunny” Sundstrom, Director, Software Products

Hadoop World 2011: I Want to Be BIG - Lessons Learned at Scale - David "Sunny" Sundstrom - SGI, Inc.


SGI has been a leading commercial vendor of Hadoop clusters since 2008. Leveraging SGI's experience with high performance clusters at scale, SGI has delivered individual Hadoop clusters of up to 4000 nodes. Integration, performance, and management all become issues at scale, and Hadoop clusters scale! In this presentation, SGI will discuss representative customer use cases, major design considerations for performance and power optimization, how integrated Hadoop solutions leveraging CDH, SGI Rackable clusters, and SGI Management Center best meet customer needs, and how SGI envisions the needs of enterprise customers evolving as Hadoop continues to move into mainstream adoption.

Published in: Technology
  • The value proposition of the Rackable product line is truly compelling. Its configuration flexibility is industry-leading, allowing customers to meet their exact needs for any given application. With up to 2,208 cores per cabinet, Rackable servers deliver high density, allowing data centers to conserve precious space. With power increasingly at a premium, our Eco-Logical™ technology is of particular value to customers. Rackable servers typically consume significantly less power than competitive offerings, lowering OpEx per server while allowing a larger number of servers to fit in the facility’s power budget. Whether it is done at the application level or in the server hardware, Rackable configurations are capable of delivering high reliability, availability, serviceability and manageability. And finally, SGI offers Rackable servers as part of complete solutions, fully integrated from both a hardware and software point of view.
  • Only one to support full-height PCI cards. The Rackable cabinet line accommodates either form factor but is especially suited to half-depth, back-to-back mounted servers. A variety of sizes are available in both 24” and 26” rack widths. Both feature cable troughs for exceptionally clean cabling and serviceability. As mentioned previously, one of the big advantages of the Rackable line is the ability of cabinets to ship fully integrated with servers, networking and storage. This allows servers to be up and running extraordinarily quickly, assuming the data center has been prepared in advance to receive them. There was a new 24 in. Foundation design, but here we’re talking about the 24 in. Destination Rack, which has the following dimensions. With front and rear doors, air cooled: 24” wide × 44.80” deep × 78.74” tall. With front door, water cooled: 24” wide × 50.38” deep × 78.74” tall. Bare frame without doors: 24” wide × 40” deep × 78.74” tall.
  • Software Engineering is an enablement organization for platform, software and services sales through software differentiation and integration for end users, developers and IT. As a product company, most of our R&D goes into silicon design, board and system integration, as well as packaging and cooling. The main goal of the SW stack is to expose the HW differentiation at the application level, where our end users can benefit from it, and to add more value through SW differentiation. This is a key element of our SW strategy: we invest in development only where we add value (MPT and tools, NUMA tools, RAS features). If there is a good third-party or open source solution, we integrate it into our SW stack. We can say we have an RDI org (Research, Develop and Integrate). 1st animation: End users buy our platform to solve their business problems, and these are addressed by applications. Customers first select, or already have in use, the application(s) needed in their workflow, and that application drives the platform procurement. We do platform enablement, integrate third-party or open source OS, middleware and libraries, and have experts in application domains like CSM, CFD, CEM, ... Our SW development resources apply mostly to both ends of the stack: the system SW level (years of contribution to Linux to support scalability to the level that our SM architecture is capable of) and SFS, which is a key component for implementing SGI differentiation on a standard Linux distribution, and the application level. 2nd animation: But there are two other categories of customers: IT (system administrators who make and keep the system operational) and developers (people who do SW development on our platforms). 3rd animation: For them we offer two products: SMS and SPS. The slide captures all our SW products as they align with the new platforms. 4th animation: Summarizing the three areas of focus defining the core of SW engineering. For IT: intelligent system monitoring and management software. For code writers: development tools for performance. For customers: deliver results through applications. 3 min
  • The Rackable line is unmatched in its configurability. Starting with the choice of an Intel or AMD based architecture, Rackable servers are available in a wide variety of form factors that enable customers to tailor the server to their data center and applications. Half-depth servers mount back to back, delivering high density with hot aisle containment within the rack. Standard-depth servers are designed for heterogeneous deployments in hot-aisle/cold-aisle based data centers. Beyond the form factor, a wide variety of hard drive, memory and networking infrastructure choices are available. And if that still doesn’t yield the desired configuration, the SGI Engineering team can engineer a customized, Design To Order solution that is an exact fit.

    1. I Want to Be Big: Lessons Learned at Scale. Dave “Sunny” Sundstrom, Director, Software Products
    2. Our Strategy: The Trusted Leader in Technical Computing
        • Business Applications, Big Data, Technical Applications
        • Business Computing, Technical Computing
        • Redundancy, Workload Optimized, Scale & Speed
        ©2011 SGI
    3. The Opportunity
        • HPC: Commercial, Scientific, Modeling & Simulation, Government
        • Big Data: Hadoop, In-memory, Analytics, Archive
        • Cloud: Public, Private
        Providing Customers with Trusted Technical Computing Solutions
    4. Solutions That Are Our Customers’ Business
        • Content distribution and fraud reduction
        • NASDAQ and Frequency Firms trading billions of shares a day
        • Motorola delivering a new generation of content to the palm of your hand
        • NBA keeps score with their content
    5. SGI Leadership at Speed and Scale
        • First and only:
            • server to surpass both 256 cores and 4 TB
            • x86 server to surpass 80 cores and 2 TB
            • server to surpass 500K SPECjbb2005 BOPS/JVM, 25M SPECjbb2005 throughput
            • server to surpass 5000 STREAM GB/s (5X nearest competition)
            • 1PF+ peak IB cluster using only general purpose CPUs
        • World's fastest distributed memory system as measured by SPEC MPI 2007. (Published results on spec.org as of 7-25-11.)
        • Largest disk-based archive solution (1.44 PB usable in a single rack)
        • Densest storage system (2.7 PB raw per cabinet)
    6. SGI Hadoop Background
        • SGI has been one of the leading commercial suppliers of Hadoop servers since the technology was introduced
        • Leading technology users deploy on SGI Hadoop Clusters
        • SGI supplies customer optimized Hadoop Clusters to key US government agencies
        • SGI has sourced Hadoop installations as large as 40,000 nodes and individual clusters as large as 4,000 nodes
    7. Terasort Scaling: SGI Hadoop Cluster
        • [Chart: Terasort scaling on an SGI Rackable C2005-TY6 Hadoop cluster, 100 GB job size. Axes: scaling vs. number of nodes (1 to 20). Series: Terasort scaling, linear scaling.]
        • Terasort at 100 GB scales super-linearly on a 20-node SGI Rackable C2005-TY6 cluster running the Cloudera distribution of Apache Hadoop (CDH3u0)
        • [Chart: Terasort scaling on SGI vs. Sun Hadoop cluster, 100 GB input data size. Axes: scaling vs. number of nodes. Series: SGI Rackable C2005-TY6 cluster, Sun X2270 M2 cluster, linear scaling.]
        • The SGI Rackable C2005-TY6 cluster running Cloudera Hadoop is 81% faster than a Sun X2270 cluster of a similar size
        Sources: http://sun.systemnews.com/articles/152/1/server/23549 and SGI internal measurements
    8. Hadoop matures… (Chasm)
    9. from experiment…
    10. to technology…
    11. to system…
    12. …to platform!
    13. What Hath Scale Wrought
        • Large scale cluster TCO is dominated by infrastructure and management costs
        • Power and cooling costs are trending to exceed capital costs
        • Time to production is a major wasted cost
        • Management of large scale platforms is difficult, expensive and reduces availability
    14. SGI Hadoop Clusters – Key Customer Values
        • Optimized to specific customer requirements:
            • High performance
            • Power
            • Density
            • Price
        • Unprecedented speed and scale
            • World record benchmarks
            • Manageability of large configurations
            • Scale at lower datacenter TCO
            • High performance networking
            • Data management solutions for data ingest and archive
        • Start Up and GO!
            • Factory integrated Hadoop solutions
            • Cloudera’s Distribution (CDH) and Cloudera Enterprise
            • Direct delivery to production
    15. SGI Hadoop Clusters – Have It Your Way
        • Flexible, optimized and specific to customer requirements. Designed to Order (DTO):
            • Performance
            • Power
                • Dynamically managed
                • AC or DC
            • Density
                • Half depth back-to-back
                • Full depth
                • Standard and CloudRack
            • Cooling
                • Hotzone/cold zone
                • Central
            • Storage options
            • Price
        • SGI Hadoop Cluster Reference Implementations
    16. Introducing SGI Hadoop Cluster Reference Implementations
        • 1/2 Rack: 128 TB usable capacity
        • 1 Rack: 256 TB usable capacity
        • Multi-Rack: Petabytes usable capacity
        • 10GigE
        • Import, Export, Search, Mine, Predict & Visualize data for Business Intelligence
        • Highest density
        • Power optimized
        • Highest data capacity
        • Factory integrated
        • Cloudera certified
    17. SGI Hadoop Cluster Reference Implementation
        Rack layout, top to bottom (positions 22 down to 1, servers shown back to back in two columns):
        • 21–22: LG-Ericsson ES4548 48-port GigE switches
        • 19–20: C2005-TY2, 2x5645, 6x8GB, 4x1TB (Secondary NameNode/SGI-MC head node) and C2005-TY2, 2x5675, 12x8GB, 4x1TB (Application Node)
        • 17–18: C2005-TY2, 2x5645, 6x8GB, 4x1TB (NameNode) and C2005-TY2, 2x5645, 6x8GB, 4x1TB (JobTracker)
        • 1–16: sixteen C2005-TY6, 2x5645, 6x8GB, 10x1TB (Data/TaskTracker nodes)
        Notes:
        • SGI Hadoop Cluster (half rack building block/20-node cluster shown)
        • SGI Rackable half-depth, back to back servers (shown) or flow-through
        • NameNode, JobTracker, Secondary NameNode, Application Node, 16 DataNodes/TaskTrackers
        • 128 Terabyte capacity
        • World Record ssj_ops/watt per DataNode
        • Dynamic power management through SGI Management Center
    18. SGI Hadoop Cluster: Power Optimized and Managed
        • Industry first: power optimized Hadoop cluster
        • Fully tunable Hadoop operations/watt: more ops per watt
        • Fine grained power management
        • Dynamic, policy driven, power envelope management
    19. SGI/Cloudera Software Environment
        Complete, factory installed, production Hadoop environment
        • File System Mount: FUSE-DFS
        • UI Framework: HUE
        • SDK: HUE SDK
        • Workflow: Apache Oozie
        • Scheduling: Apache Oozie
        • Metadata: Apache Hive
        • Languages / Compilers: Apache Pig, Apache Hive
        • Data Integration: Apache Flume, Apache Sqoop
        • Fast Read/Write Access: Apache HBase
        • Coordination: Apache ZooKeeper
        • SGI® Management Center – Premium Edition
        • SGI® Management Center – Standard Edition
        • SGI® Management Center – Power Option
        • SGI® Management Suite
        • SGI® Foundation Software
        • Commercial and Community Linux Distributions
        • SGI Hadoop Cluster Reference Implementation or DTO
    20. Stand Up and Go: SGI Services for Minimal Time to Production
        • Design/Plan (Sales/Consulting Services)
            • Solution design across hardware, software, and datacenter
            • Site readiness
            • Assessments
            • Integration of 3rd party solutions
            • Architecture design and development
            • Packaged offerings
        • Deploy/Run (Managed Services)
            • In-factory integration and testing
            • Installation and production services
            • Performance tuning
            • System administration
            • Onsite personnel
            • Education
            • Managed services
        • Support/Maintain (Customer Support Services)
            • Global 7x24 Call Centers
            • eServices and online support
            • Extended warranty and response times
            • Onsite and remote technical support
            • Materials management
        SGI Partner Ecosystem
    21. Hadoop Analytical Players: SGI Analytics Ecosystem
        Expanding partnerships with industry leading analytics, visualization and tool vendors
        • Big Data Import, Export, Rich Analytics
        • Big Data Content Search and Interactive Analytics
        • Big Data ETL, Business Intelligence on Hadoop with Hive QL
        • Data Modeling, and Visualization for Interactive Business Intelligence on Hadoop
    22. SGI Hadoop Clusters
        • Design to order and reference implementations with industry leading speed and scale:
            • Highest performance Intel & AMD processors, including both low- and high-wattage CPUs
            • Integrated scale-up and scale-out systems customized to analytics workloads and data ingest requirements
            • High performance networking: GigE, 10 GigE, 40 GigE, IB with acceleration
            • Broadest variety of storage options: transfer rate, density, capacities, speeds
    23. SGI Business Intelligence Solutions: Integrated Solutions
        • Data ingest, Compute, Storage, Network, Analytics, Contracting, Archive
        • SGI Uniquely Positioned to Deliver:
            • Highest speed and scale
            • Integrated solutions
            • Proven best-of-breed technologies
    24. SGI Business Intelligence Solutions: Integrated Analytics Cluster
        • Actionable, Time-critical Business Intelligence Applications
            • Social Media (click stream analytics, user search patterns, behavioral analysis)
            • Financials (risk analysis, detecting fraud in credit transactions/insurance claims)
            • Federal & Defense (fraud detection, predictive demographics, security analysis)
            • Telecom (customer trend analysis, network usage patterns, fraud detection)
        • Hadoop BI Appliance
            • Pentaho: Import, Analytics, BI, Dashboard, Mining, Export
            • Datameer: Import, Analytics, Dashboard, Export
            • Kitenga: Content Search, Analytics
            • Quantum4D: Visual Insight, Data Modeling, Interactive BI
        • Hive QL, HBase
        • Cloudera Enterprise
        • Red Hat Enterprise Linux
        • SGI Integrated Cluster
    25. SGI Business Intelligence Solutions: In-memory Transactional Database with Hadoop
        • VoltDB Relational Database
        • Single-node throughput: 120K ACID transactions/sec
        • 30-node cluster throughput: 3.4M ACID transactions/sec
        • Scaling: linear, scale factor 0.86/node
        • Real-time inserts up to 10/sec having no impact on the throughput
        • Constant 10ms latency across the 30-node cluster
        • Price/Performance: $0.08 TPS
        • [Chart: Transactions/Sec: VoltDB on a SGI Rack C1001-TY3 Cluster. Axes: TPS (0 to 4,000,000) vs. number of server nodes (5 to 30).]
        Refer to: http://www.sgi.com/pdfs/4238.pdf
    26. SGI Business Intelligence Solutions: Integration of structured and unstructured data
        • SGI Hadoop Solution for “Deep Data” Analytics (SGI Integrated Cluster)
            • Actionable, Time-critical Business Intelligence Applications
                • Social Media (click stream analytics, user search patterns, behavioral analysis)
                • Financials (risk analysis, detecting fraud in credit transactions/insurance claims)
                • Federal & Defense (fraud detection, predictive demographics, security analysis)
                • Telecom (customer trend analysis, network usage patterns, fraud detection)
            • Hadoop BI Appliance
                • Pentaho: Import, Analytics, BI, Dashboard, Mining, Export
                • Datameer: Import, Analytics, Dashboard, Export
                • Kitenga: Content Search, Analytics
                • Quantum4D: Visual Insight, Data Modeling, Interactive BI
            • Hive QL, HBase
            • Cloudera Enterprise
            • Red Hat Enterprise Linux
        • Transactional Database/Trading systems for “Fast Data” Ingestion (SGI Integrated Cluster)
            • Online Transaction Applications: Online Social Media Transactions, Online Financial Transactions, Online Federal & Defense Transactions, Online Telecommunications Transactions
            • Data Ingest, Data Integration
            • Transactional Database for Data Ingestion (VoltDB)
        • SGI BI solution combines data ingestion and data integration capabilities to provide a best-in-class solution for fast ingestion and deep analytics
        • Built on SGI Hadoop Cluster using Rackable or CloudRack Server, commercial Linux distributions, SGI Management Center, and Cloudera Enterprise
    27. See More SGI High Performance Solutions at:
        • November 14-18, Seattle
        • December 5-8, 2011, Las Vegas
    28. SGI Hadoop Clusters – Key Customer Values
        • Optimized to specific customer requirements:
            • High performance
            • Power
            • Density
            • Price
        • Unprecedented speed and scale
            • World record benchmarks
            • Manageability of large configurations
            • Scale at lower datacenter TCO
            • High performance networking
            • Data management solutions for data ingest and archive
        • Start Up and GO!
            • Factory integrated Hadoop solutions
            • Cloudera’s Distribution (CDH) and Cloudera Enterprise
            • Direct delivery to production
    29. !!%$@#*#)
    30. I Want to Be Big
    31. BIG! BIG!
    32. Q&A. Dave “Sunny” Sundstrom, dsundstrom@sgi.com
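
The scaling claims on slides 7 and 25 can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the throughput figures are the ones quoted on slide 25, and the ratio it computes (cluster throughput divided by node count times single-node throughput) is one common definition of scaling efficiency, not necessarily how the 0.86/node scale factor on that slide was derived. Under this definition, 1.0 is linear scaling and anything above 1.0 corresponds to the super-linear behavior described on slide 7.

```python
def speedup(baseline: float, scaled: float) -> float:
    """Throughput of the scaled-out run relative to the baseline run."""
    return scaled / baseline


def scaling_efficiency(baseline: float, scaled: float, nodes: int) -> float:
    """Speedup divided by node count: 1.0 is linear, above 1.0 is super-linear."""
    return speedup(baseline, scaled) / nodes


# Figures quoted on slide 25 (VoltDB, ACID transactions per second).
SINGLE_NODE_TPS = 120_000
CLUSTER_TPS = 3_400_000
NODES = 30

voltdb_speedup = speedup(SINGLE_NODE_TPS, CLUSTER_TPS)
voltdb_efficiency = scaling_efficiency(SINGLE_NODE_TPS, CLUSTER_TPS, NODES)

print(f"speedup: {voltdb_speedup:.1f}x on {NODES} nodes")
print(f"efficiency: {voltdb_efficiency:.2f}")
```

With the slide 25 figures this gives a speedup of roughly 28x on 30 nodes, an efficiency of about 0.94 under the definition used here.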
