101 ab 1415-1445



  1. The Infrastructure of Tomorrow, Today: Integrating Supermicro, Greenplum and SAS to Enable Big Data Analytics. Jeff Tsai (蔡穎碩), Solution Manager. © Supermicro 2012
  2. 2. AgendaBig Data Analytics Platform & InfrastructureEMC+Supermicro  1,000 Nodes Hadoop Cluster
  3. The Era of Big Data Is Here…
     • "Big Data Is Less About Size, And More About Freedom" (TechCrunch)
     • "Findings: 'Big Data' Is More Extreme Than Volume" (Gartner)
     • "Total data: 'bigger' than big data" (451 Group)
     • "Big Data! It's Real, It's Real-time, and It's Already Changing Your World" (IDB)
  4. Data Sources Are Expanding: The Digital Universe Will Grow 44x in the Next 10 Years. Source: 2011 IDC Digital Universe Study
  5. Is Big Data Just a Bunch of Data to Store? [Chart: growth of Big Data sources, 2009 to 2014] File-based: 60.7% CAGR; block-based: 21.8% CAGR. By 2012, 80% of all storage capacity sold will be for file-based data. Source: IDC
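Both growth figures above are compound rates. As a rough cross-check (our own arithmetic, not part of the IDC data), a CAGR r sustained for n years multiplies a quantity by (1 + r)^n, and a total multiplier m over n years implies r = m^(1/n) - 1. The sketch below assumes the chart's 2009 to 2014 window counts as five compounding years; the function names are illustrative only.

```python
def growth_multiplier(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` (e.g. 0.607) for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(multiplier: float, years: int) -> float:
    """Annual rate that compounds to `multiplier` over `years` years."""
    return multiplier ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # IDC: the digital universe grows 44x over 10 years.
    print(f"44x over 10 years ~ {implied_cagr(44, 10):.1%} per year")
    # Assumption: the 2009-2014 chart window is 5 compounding years.
    print(f"File-based at 60.7% CAGR:  x{growth_multiplier(0.607, 5):.1f}")
    print(f"Block-based at 21.8% CAGR: x{growth_multiplier(0.218, 5):.1f}")
```

Under these assumptions, 44x over ten years works out to roughly 46% a year, and the file-based rate compounds to about 10.7x over the window versus about 2.7x for block-based storage.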
  6. How Do You Create Significant Value for Your Business?
  7. Make Big Data Accessible
     • Identify the data sources
     • Store the data
     • Connect applications and users
     • Utilize the data in different views
  8. EMC UAP Solutions: Analytics Platform. "This is what my analytics environment looks like…"
  9. Building the Big Data Analytics "Stack"
     • Analytic toolsets (business analytics, BI, statistics, etc.)
     • Greenplum Chorus: enterprise collaboration platform for data
     • Greenplum Data Computing Appliances: purpose-built for Big Data analytics
     • Greenplum Database: world's most scalable MPP database platform
     • Greenplum HD (Hadoop, Enterprise & Community Editions): enterprise analytics platform for unstructured data
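An MPP database such as Greenplum Database spreads each table's rows across many segments, typically by hashing a distribution key declared with `DISTRIBUTED BY`. The Python sketch below only illustrates that idea: CRC32 is a stand-in for Greenplum's internal hash, and the segment count, table and column names are hypothetical.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment index.

    CRC32 is a stand-in; Greenplum uses its own internal hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows: list[dict], key_column: str, num_segments: int) -> dict[int, list[dict]]:
    """Group rows by the segment their distribution key hashes to."""
    segments: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_column]), num_segments)].append(row)
    return dict(segments)


if __name__ == "__main__":
    # Hypothetical order rows, distributed on customer_id across 4 segments.
    orders = [
        {"order_id": 1, "customer_id": "c42", "amount": 19.99},
        {"order_id": 2, "customer_id": "c7", "amount": 5.00},
        {"order_id": 3, "customer_id": "c42", "amount": 42.50},
    ]
    for seg, seg_rows in sorted(distribute(orders, "customer_id", 4).items()):
        print(seg, [r["order_id"] for r in seg_rows])
```

Because equal keys always hash to the same segment, all rows for one customer land together, so each segment can aggregate its share of the data on that key in parallel without exchanging rows.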
  10. Greenplum Becomes the Foundation of EMC's Data Computing Division: EMC acquired Greenplum in July 2010. "For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant…" (Gartner)
  11. SAS at a Glance
     • Founded 1976; 11,000+ employees in 400+ offices
     • 2010 worldwide revenue: $2.43B
     • IDC: SAS is the leader in analytics, with a 34.5% market share (Analytics and Reporting)
     • 4.5 million users worldwide
     • 50,000+ sites in 114 countries
     • From tools to vertical solutions
     [Pie chart: revenue by industry across Financial Services, Government, Manufacturing, Retail, Healthcare & Life Sciences, Communications, Services, Education, Energy & Utilities and Other; segment values 42%, 14%, 11%, 8%, 8%, 6%, 4%, 3%, 2%, 2%]
  12. Supermicro Overview
     • Locations: SMC Inc. HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan
     • Founded in 1993; HQ in San Jose, CA; NASDAQ: SMCI (listed 2007)
     • Revenues: FY09 $500M, FY10 $721M, FY11 ~$1B
     • Global footprint: >100 countries
     • Production: facilities in the US, EU and Asia
     • Engineering: 70% of workforce in engineering (30% growth through the recession)
     • Market share: #1 in the server channel (SMCI enables ~10% of the global server market)
     • Brand equity: growing public profile since the 2007 IPO
     • Corporate focus: energy efficiency, earth-friendly design, green technology innovation
  13. Product Family: Resource Optimized (WIO/UIO); Twin Architecture; GPU SuperComputing; Data Center Optimized; Embedded; Application Optimized; Multi I/O; SuperBlade; Workstation; Mainstream Business Solutions; Storage Server
  14. In-House Design and Server Building Block Solutions®
     • Inputs: technology partners, customer requirements, OEM specs, Tri-Lab
     • Outputs: application-optimized and data-center-optimized systems
     [Diagram: building blocks (systems, motherboards, chassis, power supplies, cooling, CPU/memory, modules, operating systems/applications) with counts of >350, >550, >1,300 and >140, as of Q2 2009]
  15. Big Data Analytics on Hadoop: Internet companies are not built on SQL; they are building analytics on Hadoop/NoSQL.
     [Diagram: existing Hadoop users (Internet): "This is what I think my analytics environment looks like…" Layers from top: BI & ETL tools, reporting, web apps; management & coordination; Pig, Hive, HBase; Hadoop MapReduce layer; Hadoop storage system. Example users: web portals, social networks]
  16. Hadoop Components (hadoop.apache.org)
     • HDFS: Hadoop Distributed File System
     • MapReduce: framework for writing scalable data applications
     • Pig: procedural language that abstracts lower-level MapReduce
     • ZooKeeper: highly reliable distributed coordination
     • Hive: data warehouse infrastructure built on top of Hadoop
     • HBase: database for random, real-time read/write access
     • Oozie: workflow/coordination to manage jobs
     • Mahout: scalable machine learning libraries
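To make the MapReduce entry concrete, the classic word-count job can be simulated in plain Python: a map step emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce step sums each group. This is an in-process sketch of the programming model only, not Hadoop's Java API; on a real cluster the framework runs map and reduce tasks across nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

Higher-level tools such as Pig and Hive generate jobs of this shape from scripts or SQL-like queries, so users rarely write the map and reduce functions by hand.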
  17. What Can Hadoop Do for You?
     • Financial services: better knowing customers; risk analysis and management; fraud detection and security
     • Web & e-tailing: web usage and click-stream behavior; market & customer segmentation; ad and customer targeting analytics; online fraud detection
     • Telecommunications: customer churn prevention; price optimization and marketing; network analysis and optimization; customer experience management
     • Government: fraud detection; compliance and regulatory analytics
     • Retail: market and consumer segmentation; merchandising and cross-selling; promotion and campaign analysis
     • Healthcare: patient care quality; drug development
     Data source: Cloudera
  18. Hadoop Use Cases
     • LinkedIn: "People You May Know" and other facts
     • Yahoo!: Hadoop supports ad systems and web search
     • Visa: credit card fraud detection and analysis
     • T-Mobile: churn analysis, user experience
     • Amazon, Baidu, AOL, eBay, Facebook, Twitter, …
     Data source: Cloudera
  19. Hadoop Cluster HW Selection: What is the right HW configuration for a Hadoop cluster? It depends; workloads matter.
     • CPU-intensive workloads: machine learning, natural language processing, complex data mining, feature extraction, decoding/decompressing
     • I/O-intensive workloads: data importing and exporting, indexing, searching, grouping
     • General configuration: 2 quad-core CPUs, 16-96 GB memory, 2 x GbE, 1-2 TB disks x n, 1U/2U rack mount
     • Considerations: data storage capacity, number of data replicas (mirroring), TCO, rack space, power consumption, different workloads
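The storage side of this sizing can be roughed out with simple arithmetic: raw disk must hold every replica of the data (what the slide calls data mirroring; HDFS defaults to a replication factor of 3) plus working space for intermediate output. The sketch assumes a hypothetical 25% headroom and nodes with 12 x 2 TB disks, within the slide's 1-2 TB x n range; the function name and defaults are illustrative and should be replaced with your own figures.

```python
import math


def data_nodes_needed(
    data_tb: float,
    replication: int = 3,        # HDFS default replication factor
    overhead: float = 0.25,      # hypothetical headroom for temp/intermediate data
    disks_per_node: int = 12,    # hypothetical node layout
    disk_tb: float = 2.0,        # slide: 1-2 TB per disk
) -> int:
    """Estimate how many data nodes are needed to store `data_tb` of source data."""
    raw_tb = data_tb * replication * (1.0 + overhead)
    per_node_tb = disks_per_node * disk_tb
    return math.ceil(raw_tb / per_node_tb)


if __name__ == "__main__":
    print(data_nodes_needed(100))  # 100 TB of source data
```

For example, 100 TB of source data needs 100 x 3 x 1.25 = 375 TB of raw capacity, or 16 nodes at 24 TB each; whether those nodes lean toward more cores or more spindles then depends on the CPU- versus I/O-intensive mix above.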
  20. Proven at Scale with Worldwide Support
     • Production-scale testing of Apache trunk and a hosted environment for customer POCs
     • Industry's largest Hadoop support team
     • Industry's most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.)
     • Tested at scale on the Greenplum Analytics Workbench: a 1,000-node, 24-petabyte cluster; a multi-million-dollar investment by EMC and partners
     • Reduced risk for EMC customers
     • Bringing rapid innovation to Hadoop
     • Certification of partner products
  21. Supermicro Server Functions in the Cluster
     • Data nodes: Supermicro 2U storage servers
     • Infrastructure nodes: Supermicro 2U Twin² servers
     • 1,000+ physical Supermicro server nodes (10k virtual nodes)
     • 12,000 processor cores
     • 24 petabytes of storage capacity (6 Gbps SATA)
     • 48 terabytes of RAM
     • 56 Gbps InfiniBand connectivity
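Dividing the cluster totals by the node count gives a feel for the average server. This is plain arithmetic on the slide's figures, assuming exactly 1,000 physical nodes (the slide says 1,000+, so true averages are somewhat lower) and decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB); it also blends data nodes and infrastructure nodes, which are configured differently. The dictionary keys are illustrative names.

```python
# Cluster totals from the slide; exactly 1,000 physical nodes is an assumption.
CLUSTER = {
    "physical_nodes": 1_000,
    "virtual_nodes": 10_000,
    "cores": 12_000,
    "storage_pb": 24,
    "ram_tb": 48,
}


def per_node(cluster: dict) -> dict:
    """Average resources per physical node, using decimal units."""
    n = cluster["physical_nodes"]
    return {
        "vms_per_node": cluster["virtual_nodes"] / n,
        "cores_per_node": cluster["cores"] / n,
        "storage_tb_per_node": cluster["storage_pb"] * 1000 / n,
        "ram_gb_per_node": cluster["ram_tb"] * 1000 / n,
    }


if __name__ == "__main__":
    for name, value in per_node(CLUSTER).items():
        print(f"{name}: {value:g}")
```

Under these assumptions the average physical server hosts about 10 virtual nodes with 12 cores, 24 TB of disk and 48 GB of RAM.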
  22. Supermicro Multi-Node Server Solutions: Switch Data Center, Las Vegas, NV
  23. Initial Benchmark Data (minutes): results before fine-tuning. World-record performance results are expected to be announced before 2013.
  24. Other Testing Programs: Supermicro & Intel CPU Benchmark
  25. Supermicro Advantages: Why Supermicro?
     • Building blocks for different workloads & requirements: meet any Hadoop workload by model (I/O, CPU, disks, density); customize for specific workload requirements
     • High efficiency, high quality: Green IT; high-efficiency power; high quality for the highest system availability and best utilization
     • Proven solutions: EMC Greenplum proven solutions; 100% Apache Hadoop compatible; benchmark and testing programs with partners
     • TCO: solutions for cost-effective Hadoop clusters; best choice of Hadoop hardware platforms
  26. Turnkey Hadoop: Supermicro Complete Rack Solutions
     • One-stop shop for hardware; end-to-end total solutions
     • Faster deployment with ready-to-run rack systems
     • Single source: consistent build quality and delivery time
     • Multi-vendor compatibility testing: zero compatibility issues
     • Premium service with competitive pricing
     • Shipped directly from the US, NL and TW
  27. Broad Product Portfolios and Building Blocks: The Best Platform for Your Hadoop Cluster
  28. Q&A. Thank You. SMC Inc. HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan