THE ERA OF BIG DATA IS HERE…
• "Big Data Is Less About Size, And More About Freedom" (TechCrunch)
• "Findings: 'Big Data' Is More Extreme Than Volume" (Gartner)
• "Big Data! It's Real, It's Real-time, and It's Already Changing Your World" (IDB)
• "Total data: 'bigger' than big data" (451 Group)
Data Sources Are Expanding: THE DIGITAL UNIVERSE WILL GROW 44X IN THE NEXT 10 YEARS. Source: 2011 IDC Digital Universe Study
BIG Data is Just a Bunch of Data to Store…? OR
[Chart "Big Data Sources", 2009 to 2014: File Based: 60.7% CAGR; Block Based: 21.8% CAGR]
By 2012, 80% of all storage capacity sold will be for file-based data. Source: IDC
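The growth figures on these two slides can be put on a common footing with a little compound-growth arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes the 2009 to 2014 chart window represents five compounding periods, which the slide does not state.

```python
def implied_cagr(growth_multiple: float, years: int) -> float:
    """Annual growth rate that compounds to `growth_multiple` over `years`."""
    return growth_multiple ** (1.0 / years) - 1.0


def compound_multiple(cagr: float, years: int) -> float:
    """Total growth multiple after `years` of growth at rate `cagr`."""
    return (1.0 + cagr) ** years


# "44X in the next 10 years" expressed as a yearly rate.
digital_universe_cagr = implied_cagr(44.0, 10)

# Assumption: 2009 -> 2014 is treated as five compounding periods.
file_multiple = compound_multiple(0.607, 5)
block_multiple = compound_multiple(0.218, 5)

print(f"Digital universe: about {digital_universe_cagr:.1%} per year")
print(f"File-based capacity: about {file_multiple:.1f}x over 5 years")
print(f"Block-based capacity: about {block_multiple:.1f}x over 5 years")
```

Under those assumptions, 44X over ten years works out to roughly 46% per year, and file-based capacity grows about four times as much as block-based capacity over the chart window.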
To Create Significant value to your business… HOW?...
Make BIG Data Accessible
• Identify the data source
• Store the data
• Connect applications and users
• Utilize the data in different views
EMC UAP Solutions – Analytics Platform: "This is what my analytics environment looks like…"
Building The Big Data Analytics "Stack"
• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics
• Greenplum Database: World's Most Scalable MPP Database Platform
• Greenplum HD (Hadoop Enterprise & Community Editions): Enterprise Analytics Platform for Unstructured Data
Greenplum Becomes the Foundation of EMC's Data Computing Division
EMC ACQUIRES GREENPLUM IN JULY 2010
"For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant…." (Gartner)
SAS at a Glance
Company Highlight:
• Founded 1976: 11,000+ employees in 400+ offices
• 2010 worldwide revenue: $2.43 B
• IDC: SAS is the leader in Analytics and Reporting with a 34.5% market share
• 4.5 million users worldwide
• 50,000+ sites in 114 countries
• From Tools to Vertical Solutions
[Pie chart by industry: Financial Services, Government, Services, Retail, Manufacturing, Healthcare & Life Sciences, Communications, Education, Energy & Utilities, Other]
Overview
Locations: SMC Inc., HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan
• Founded in 1993; HQ: San Jose, CA; NASDAQ: SMCI (2007 IPO)
• Revenues: FY09 $500M, FY10 $721M, FY11 ~$1B
• Global Footprint: >100 countries
• Production: US, EU and Asia production facilities
• Engineering: 70% of workforce in engineering (30% growth through recession)
• Market Share: #1 Server Channel (SMCI enables ~10% of global server market)
• Brand Equity: Growing public profile since 2007 IPO
• Corporate Focus: Energy Efficiency, Earth-friendly, Green Technology Innovation
Product Family
• Resource Optimized (WIO/UIO)
• Twin Architecture
• GPU SuperComputing
• Data Center Optimized
• Embedded
• Application Optimized: Multi I/O
• SuperBlade
• Workstation
• Mainstream Business Solutions
• Storage Server
In-House Design and Server Building Block Solutions®
[Diagram labels: Technology Partners, Customer Requirements, OEM Specs, In-House Design, Server Building Block Solutions®, Application Optimized, Tri-Lab Optimized, Data Center. Building blocks: Motherboards, Chassis, Power Supplies, Cooling Modules, CPU/Memory, Open Operating Systems/Applications. Counts shown: >350, >550, >1300, >140.]
(1) As of Q2, 2009
Big Data Analytics on Hadoop
Internet companies are not built on SQL but are building analytics on Hadoop/NoSQL.
Existing Hadoop Users (Internet): "This is what I think my analytics environment looks like…"
• BI & ETL Tools, Web Apps, Reporting
• Management & Coordination
• Pig, Hive, HBase
• Hadoop System: MapReduce Layer
• Hadoop Storage
• Data sources: Web Portal, Social Networks
Hadoop Components (hadoop.apache.org)
• HDFS: Hadoop Distributed File System
• MapReduce: Framework for writing scalable data applications
• Pig: Procedural language that abstracts lower-level MapReduce
• Zookeeper: Highly reliable distributed coordination
• Hive: Data warehouse infrastructure built on top of Hadoop
• HBase: Database for random, real-time read/write access
• Oozie: Workflow/coordination to manage jobs
• Mahout: Scalable machine learning libraries
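To make the MapReduce entry concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using the classic word-count example. It is plain Python, not the Hadoop API: the function names are hypothetical, and a real Hadoop job would distribute each phase across the cluster and read its input from HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data is big", "data is here"])
print(counts)
```

Because the map and reduce steps only see one line or one key at a time, the same logic can be spread over many machines, which is the property the framework relies on for scale.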
What can Hadoop do for you?
• Financial Services: Better knowing customers; Risk analysis and management; Fraud detection and security
• Web & e-Tailing: Web usage, click stream behavior; Market & customer segmentation; Ad customer targeting analytics; On-line fraud detection
• Telecommunications: Customer churn prevention; Price optimization and marketing; Network analysis and optimization; Customer experience management
• Government: Fraud detection; Compliance and regulatory analytics
• Retail: Market and consumer segmentation; Merchandizing and cross-selling; Promotion and campaign analysis
• Healthcare: Patient care quality; Drug development
Data Source: Cloudera
Hadoop Use Cases
• LinkedIn: "People You May Know" and other facts
• Yahoo!: Hadoop to support AdSystems and web search
• Visa: Credit card fraud detection and analysis
• T-Mobile: Churn analysis, user experience
• Amazon, Baidu, AOL, eBay, Facebook, Twitter, …
Data Source: Cloudera
Hadoop Cluster HW selection
What's the HW configuration for Hadoop clusters?... It depends, workloads matter.
• CPU Intensive: Machine learning; Natural language processing; Complex data mining; Feature extraction
• I/O Intensive: Data importing and exporting; Indexing; Searching; Grouping; Decoding/decompressing
• Considerations: Data storage capacity; # of data mirroring; TCO; Rack space; Power consumption; Different workloads
• General Configuration: 2 Quad Core CPUs; 16-96GB Memory; 2 x GE; 1TB-2TB Disk x n; 1U/2U Rack mount
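The storage side of this trade-off can be roughed out in a few lines. The sketch below is a back-of-the-envelope estimate only, not a vendor sizing tool. The function name and all defaults are assumptions: a replication factor of 3 (the HDFS default) stands in for "# of data mirroring", 12 disks per node is a hypothetical value for the slide's "x n", and the 75% usable fraction is a hypothetical allowance for temporary data and free-space headroom.

```python
import math


def data_nodes_needed(
    raw_data_tb: float,
    replication: int = 3,           # assumption: HDFS default replication factor
    disk_tb: float = 2.0,           # slide range: 1TB-2TB per disk
    disks_per_node: int = 12,       # hypothetical value for "Disk x n"
    usable_fraction: float = 0.75,  # hypothetical headroom for temp data
) -> int:
    """Estimate how many data nodes are needed to hold `raw_data_tb` of data."""
    stored_tb = raw_data_tb * replication
    usable_tb_per_node = disk_tb * disks_per_node * usable_fraction
    return math.ceil(stored_tb / usable_tb_per_node)


# 100 TB of raw data on 2 TB disks versus 1 TB disks.
nodes_2tb = data_nodes_needed(100)
nodes_1tb = data_nodes_needed(100, disk_tb=1.0)
print(nodes_2tb, nodes_1tb)
```

This covers capacity only. A CPU-intensive workload such as machine learning may need more nodes, or more cores per node, than the storage estimate suggests.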
Proven at Scale with Worldwide Support
• Production-scale testing of Apache Trunk & hosted environment for customer POCs
• Industry's largest Hadoop support team
• Industry's most accomplished Hadoop talents (from Yahoo!, LinkedIn, Talend, etc.)
• Tested at scale on the Greenplum Analytics Workbench: 1,000-node, 24-petabyte cluster
• Multi-million dollar investment by EMC and partners
• Reduced risk for EMC customers
• Bringing Rapid Innovation to Hadoop
• Certification of partner products
Supermicro Server Functions in the Cluster
• Supermicro Data Nodes: 2U Storage Server
• Supermicro Infrastructure Nodes: 2U Twin2 Server
• 1,000+ Physical Supermicro Server Nodes (10k virtual nodes)
• 12,000 Processor Cores
• 24 Petabytes of Storage Capacity (6Gbps SATA)
• 48 Terabytes RAM
• 56 Gbps Infiniband Connectivity
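Dividing the cluster totals by the node count gives a feel for what a single server contributes. This is a rough sketch under stated assumptions: it treats "1,000+" as exactly 1,000 nodes, uses decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB), and averages over all nodes even though data nodes and infrastructure nodes are different server models.

```python
# Cluster totals as listed on the slide.
NODES = 1_000      # assumption: "1,000+" treated as exactly 1,000
CORES = 12_000
STORAGE_PB = 24
RAM_TB = 48

# Per-node averages (decimal units assumed).
cores_per_node = CORES / NODES
storage_tb_per_node = STORAGE_PB * 1_000 / NODES
ram_gb_per_node = RAM_TB * 1_000 / NODES

print(f"{cores_per_node:.0f} cores, "
      f"{storage_tb_per_node:.0f} TB disk, "
      f"{ram_gb_per_node:.0f} GB RAM per node on average")
```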
Supermicro Multi-Node Server Solutions: Switch Data Center, Las Vegas, NV
Initial Benchmark Data… Results before fine-tuning. World record performance results expected to be announced before 2013. [Chart axis: Minutes]
Other testing programs – Supermicro & Intel CPU Benchmark
Supermicro Advantages: Why Supermicro…
• Building Blocks for Different Workloads & Requirements: Meet any Hadoop workload by model; I/O, CPU, Disks, Density; Customize by specific workload requirement
• High Efficiency, High Quality: Green IT; High Efficiency Power; High Quality for highest system availability and best utilization
• Proven Solutions: EMC Greenplum proven solutions; 100% Apache Hadoop Compatible; Benchmark and testing programs with partners
• TCO: Solutions to Cost-Effective Hadoop Clusters; Best choice of Hadoop Hardware platforms
Turnkey Hadoop: Supermicro Complete Rack Solutions
• One Stop Shop for Hardware, End to End Total Solutions
• Speed Up Deployment With Ready to Run Rack Systems
• Single Source, Consistent Build Quality and Delivery Time
• Multi-Vendor Compatibility Test, Zero Compatibility Issues
• Premium Service With Competitive Pricing
• Shipped Directly From US, NL, TW
Broad Product Portfolios and Building Blocks: Best platform for your Hadoop cluster
Q&A. Thank You
SMC Inc., HQ, San Jose, CA; SMC BV, The Netherlands; SMC TW, Taiwan