101 ab 1415-1445

  • 1. The Infrastructure of Tomorrow, Today: Integrating Supermicro, Greenplum and SAS to Enable Big Data Analytics. Jeff Tsai (蔡穎碩), Solution Manager. © Supermicro 2012
  • 2. AgendaBig Data Analytics Platform & InfrastructureEMC+Supermicro  1,000 Nodes Hadoop Cluster
  • 3. The Era of Big Data Is Here: "Big Data Is Less About Size, and More About Freedom" (TechCrunch); "Findings: 'Big Data' Is More Extreme Than Volume" (Gartner); "Big Data! It's Real, It's Real-time, and It's Already Changing Your World" (IDB); "Total data: 'bigger' than big data" (451 Group)
  • 4. Data Sources Are Expanding: The Digital Universe Will Grow 44x in the Next 10 Years. Source: 2011 IDC Digital Universe Study
  • 5. Big Data: Just a Bunch of Data to Store...? Or... [Chart: storage capacity, 2009 to 2014, with big data sources highlighted] File-based: 60.7% CAGR; block-based: 21.8% CAGR. By 2012, 80% of all storage capacity sold will be for file-based data. Source: IDC
  • 6. To create significant value for your business... How?
  • 7. Make Big Data Accessible: identify the data source; store the data; connect applications and users; utilize the data in different views
  • 8. EMC UAP Solutions: Analytics Platform. "This is what my analytics environment looks like..."
  • 9. Building the Big Data Analytics "Stack": analytic toolsets (business analytics, BI, statistics, etc.); Greenplum Chorus: enterprise collaboration platform for data; Greenplum Data Computing Appliances: purpose-built for Big Data analytics; Greenplum Database: world's most scalable MPP database platform; Greenplum HD (Hadoop Enterprise & Community Editions): enterprise analytics platform for unstructured data
  • 10. Greenplum Becomes the Foundation of EMC's Data Computing Division. EMC acquired Greenplum in July 2010. "For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant...." (Gartner)
  • 11. SAS at a Glance. Company highlights: founded 1976; 11,000+ employees in 400+ offices; 2010 worldwide revenue $2.43B; IDC: SAS is the leader in analytics, with a 34.5% market share (analytics and reporting); 4.5 million users worldwide; 50,000+ sites in 114 countries; from tools to vertical solutions. [Chart: revenue by industry, covering financial services, government, communications, healthcare & life sciences, manufacturing, retail, services, education, energy & utilities, and other]
  • 12. Overview. Locations: SMC Inc., HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan. Founded in 1993, HQ in San Jose, CA; NASDAQ: SMCI (IPO 2007). Revenues: FY09 $500M, FY10 $721M, FY11 ~$1B. Global footprint: >100 countries. Production: facilities in the US, EU and Asia. Engineering: 70% of workforce in engineering (30% growth through recession). Market share: #1 server channel (SMCI enables ~10% of global server market). Brand equity: growing public profile since 2007 IPO. Corporate focus: energy efficiency, earth-friendly, green technology innovation
  • 13. Product Family: Resource Optimized (WIO/UIO); Twin Architecture; GPU SuperComputing; Data Center Optimized; Embedded; Application Optimized: Multi I/O; SuperBlade; Workstation; Mainstream Business Solutions; Storage Server
  • 14. In-House Design and Server Building Block Solutions®: technology partners and customer requirements (application optimized, OEM specs, Tri-Lab, optimized data center) feed in-house design of Server Building Block Solutions®. [Diagram: building-block categories (motherboards, chassis, power supplies, cooling, CPU/memory modules, operating systems, open applications) with counts of >350, >550, >1,300 and >140 (1)] (1) As of Q2, 2009
  • 15. Big Data Analytics on Hadoop: Internet companies are not built on SQL but are building analytics on Hadoop/NoSQL. Existing Hadoop users (Internet): "This is what I think my analytics environment looks like..." [Diagram: stack of BI & ETL tools, reporting and web apps (web portal, social networks); management & coordination; Pig, Hive, HBase; Hadoop MapReduce layer; Hadoop storage system]
  • 16. Hadoop Components: HDFS: Hadoop Distributed File System; MapReduce: framework for writing scalable data applications; Pig: procedural language that abstracts lower-level MapReduce; ZooKeeper: highly reliable distributed coordination; Hive: data warehouse infrastructure built on top of Hadoop; HBase: database for random, real-time read/write access; Oozie: workflow/coordination to manage jobs; Mahout: scalable machine learning libraries
  • 17. What Can Hadoop Do for You? Financial services: better knowing customers; risk analysis and management; fraud detection and security analytics. Web & e-tailing: web usage and click-stream behavior; market & customer segmentation; ad customer targeting; online fraud detection. Telecommunications: customer churn prevention; price optimization and marketing; network analysis and optimization; customer experience management. Government: fraud detection; compliance and regulatory analytics. Retail: market and consumer segmentation; merchandising and cross-selling; promotion and campaign analysis. Healthcare: patient care quality; drug development. Data source: Cloudera
  • 18. Hadoop Use Cases Linkedin – “People You May Know” and other facts Yahoo! – Hadoop to support AdSystems and web search Visa – Credit card fraud detection and analysis T-Mobile – Churn analysis, user experience Amazon, Baidu, AOL, eBay, Facebook, Twitter, … Data Source: Cloudera
  • 19. Hadoop Cluster HW Selection: What's the HW configuration for Hadoop clusters? It depends: workloads matter. CPU-intensive: machine learning, natural language processing, complex data mining, feature extraction, decoding/decompressing. I/O-intensive: data importing and exporting, indexing, searching, grouping. Sizing factors: data storage capacity, number of data copies (mirroring), TCO, rack space, power consumption, different workloads. General configuration: 2 quad-core CPUs, 16 to 96 GB memory, 2 x GbE, 1 to 2 TB disks x n, 1U/2U rack mount
  • 20. Proven at Scale with Worldwide Support: production-scale testing of Apache trunk and hosted environment for customer POCs; industry's largest Hadoop support team; industry's most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.); tested at scale on the Greenplum Analytics Workbench: a 1,000-node, 24-petabyte cluster; multi-million dollar investment by EMC and partners; reduced risk for EMC customers; bringing rapid innovation to Hadoop; certification of partner products
  • 21. Supermicro Server Functions in the Cluster: data nodes on Supermicro 2U storage servers; infrastructure nodes on Supermicro 2U Twin2 servers. 1,000+ physical Supermicro server nodes (10k virtual nodes); 12,000 processor cores; 24 petabytes of storage capacity (6 Gbps SATA); 48 terabytes of RAM; 56 Gbps InfiniBand connectivity
  • 22. Supermicro Multi-Node Server Solutions: Switch data center, Las Vegas, NV
  • 23. Initial Benchmark Data: results before fine-tuning. [Chart: benchmark run times, in minutes] World-record performance results expected to be announced before 2013.
  • 24. Other Testing Programs: Supermicro & Intel CPU Benchmark
  • 25. Supermicro Advantages Why Supermicro… Building Blocks for different High Efficiency, High Quality Workloads & Requirement -Green IT -Meet any Hadoop workloads by models -High Efficiency Power -I/O, CPU, Disks, Density -High Quality for highest system availability and - Customize by specific workload requirement best utilization Proven solutions TCO -EMC Greenplum proven solutions Solutions to Cost-Effective Hadoop Clusters -100% Apache Hadoop Compatible Best choice of Hadoop Hardware platforms -Benchmark and testing programs with partners
  • 26. Turnkey Hadoop: Supermicro Complete Rack Solutions. One-stop shop for hardware, end-to-end total solutions; speed up deployment with ready-to-run rack systems; single source, consistent build quality and delivery time; multi-vendor compatibility testing, zero compatibility issues; premium service with competitive pricing; shipped directly from US, NL, TW
  • 27. Broad Product Portfolios and Building Blocks: the best platform for your Hadoop cluster
  • 28. Q&A. Thank You
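Slide 16 describes MapReduce as a framework for writing scalable data applications. The sketch below illustrates only the programming model: it simulates the map, shuffle and reduce phases in memory for a word count, assuming simple whitespace tokenization. It does not use Hadoop's Java API or Hadoop Streaming, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Hadoop sorts intermediate keys before they reach a reducer; sorted() mimics that order here.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data on Hadoop", "Hadoop stores big data in HDFS"]
    print(word_count(docs))
```

On a real cluster the mapper and reducer run as separate tasks across many nodes, with HDFS supplying the input splits; Pig and Hive let users express such jobs as scripts and queries instead of hand-written map and reduce functions.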
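Slides 19 and 21 tie hardware selection to storage capacity and the number of data copies kept. A rough sizing sketch, under stated assumptions: HDFS keeps 3 replicas of each block (the default `dfs.replication`), 25% of raw disk is reserved for temporary and intermediate data (an illustrative figure, not from the slides), and units are decimal (1 PB = 1,000 TB). The helper names and the 12 x 2 TB example node are hypothetical; the 1,000-node figures come from slide 21.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    nodes: int
    cores: int
    raw_storage_tb: float  # decimal terabytes
    ram_tb: float


def per_node(spec: ClusterSpec) -> dict:
    """Average resources per physical node, assuming an even spread across nodes."""
    return {
        "cores": spec.cores / spec.nodes,
        "storage_tb": spec.raw_storage_tb / spec.nodes,
        "ram_gb": spec.ram_tb * 1000 / spec.nodes,
    }


def usable_capacity_tb(raw_tb: float, replication: int = 3,
                       scratch_fraction: float = 0.25) -> float:
    """Logical data that fits after reserving scratch space and storing `replication` copies."""
    return raw_tb * (1 - scratch_fraction) / replication


def nodes_needed(data_tb: float, disks_per_node: int, disk_tb: float,
                 replication: int = 3, scratch_fraction: float = 0.25) -> int:
    """Smallest node count whose combined usable capacity holds `data_tb` of logical data."""
    usable_per_node = usable_capacity_tb(disks_per_node * disk_tb, replication, scratch_fraction)
    return math.ceil(data_tb / usable_per_node)


if __name__ == "__main__":
    # Figures from slide 21: 1,000 nodes, 12,000 cores, 24 PB raw storage, 48 TB RAM.
    workbench = ClusterSpec(nodes=1000, cores=12000, raw_storage_tb=24000, ram_tb=48)
    print(per_node(workbench))
    print(usable_capacity_tb(workbench.raw_storage_tb))
    # Hypothetical: 100 TB of logical data on nodes with 12 x 2 TB disks each.
    print(nodes_needed(100, disks_per_node=12, disk_tb=2))
```

Under these assumptions each workbench node averages 12 cores, 24 TB of disk and 48 GB of RAM, and the 24 PB of raw capacity holds roughly 6 PB of logical data. Changing `replication` or `scratch_fraction` shows how strongly the "number of data copies" factor on slide 19 drives disk count.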
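Slides 4 and 5 express growth in two forms: a total multiple (44x in 10 years) and compound annual growth rates (60.7% file-based, 21.8% block-based). The standard compound-growth formulas convert between them; this sketch applies them, assuming the slide 5 rates hold for the five years from 2009 to 2014 covered by its chart.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1 / years) - 1


def growth_multiple(rate: float, years: float) -> float:
    """Total growth factor after compounding `rate` per year for `years`."""
    return (1 + rate) ** years


if __name__ == "__main__":
    print(f"44x over 10 years -> {cagr(44, 10):.1%} per year")
    print(f"file-based, 60.7% CAGR over 5 years -> {growth_multiple(0.607, 5):.1f}x")
    print(f"block-based, 21.8% CAGR over 5 years -> {growth_multiple(0.218, 5):.1f}x")
```

Growing 44x in ten years works out to about 46% per year, while five years at 60.7% compounds to roughly 10.7x versus about 2.7x at 21.8%, which is why the file-based share of capacity rises so quickly.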