Greenplum hadoop

Published in: Technology
Transcript of "Greenplum hadoop"

  1. © Copyright 2012 EMC Corporation. All rights reserved.
  2. Integrated Analysis of Structured and Unstructured Data, with Application Cases
     Greenplum Enables Big Data Analytics
     Jimmy Chiu (邱垂吉), Technical Consultant, EMC Greenplum Taiwan
  3. Volume, Variety, Velocity, Value + Complexity
     [Diagram labels: Big Data; Volume, Velocity, Variety, Complexity; new insights on customers, products, and operations; contextual and location-aware delivery to any device. Data types shown: documents, transactional data, smart grid, images, audio, text, video.]
     • Volume: data volumes approaching multiple petabytes
     • Velocity: data being generated and ingested for analysis in real time
     • Variety: tabular, documents, e-mail, metering, network, video, image, audio
     • Complexity: different standards, domain rules, and storage formats per data type
     Source: Gartner, March 2011
     © Copyright 2010 EMC Corporation. All rights reserved.
  4. Sample Big Data Scenarios
     • Loan processing in banking
     • Auto insurance in P&C insurance
     • Smart grid analytics in utilities/energy
     • Proactive emergency response in healthcare
     • Video analytics in retail
     • Real-time statistical process control in manufacturing
  5. Big Data Analytics for Competitive Advantage
     [Two diagrams: "Today's Business Model" and "Big Data Analytics Business Model". Labels shown: Suppliers, Manufacturing, Inventory, Physical Assets, Distribution, Services, Mass Marketing, Personal Marketing, Additional Profits, Customers.]
     • Who are my most valuable customers?
     • What are my most important products?
     • What are my most successful campaigns?
  6. Big Data meets Fast Data
     Social and personal, every minute:
     • Google gets more than 2 million search queries
     • About 47,000 people download an app
     • Some 100,000 tweets hit Twitter
     • Almost 300,000 people log on to Facebook
     Business and transactional:
     • CERN (European Organization for Nuclear Research) generates 40 TB/sec of scientific data
     • Wal-Mart: 1 million transactions per hour
     • World's top systems currently trade at faster than 50 microseconds
     • New York Stock Exchange generates 1 TB of new trading data daily
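The figures on this slide use different time windows (per minute, per hour, per day), which makes them hard to compare. The short sketch below converts them to per-second rates. The quantities are the ones quoted on the slide and are themselves approximations ("more than", "about", "almost"); the 1 TB = 10^12 bytes convention is an assumption made here for the arithmetic.

```python
# Convert the slide's per-minute / per-hour / per-day figures to per-second
# rates so they can be compared on one scale.
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

# (label, quantity, window length in seconds); quantities come from the slide
slide_figures = [
    ("Google search queries", 2_000_000, SECONDS_PER_MINUTE),
    ("App downloads", 47_000, SECONDS_PER_MINUTE),
    ("Tweets", 100_000, SECONDS_PER_MINUTE),
    ("Facebook logins", 300_000, SECONDS_PER_MINUTE),
    ("Wal-Mart transactions", 1_000_000, SECONDS_PER_HOUR),
]


def per_second(quantity, window_seconds):
    """Average rate per second over the given window."""
    return quantity / window_seconds


rates = {label: per_second(q, w) for label, q, w in slide_figures}

# NYSE: 1 TB of new trading data per day, expressed in MB per second.
# Assumption: 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
nyse_mb_per_second = per_second(10**12, SECONDS_PER_DAY) / 10**6

if __name__ == "__main__":
    for label, rate in rates.items():
        print(f"{label}: about {rate:,.0f} per second")
    print(f"NYSE trading data: about {nyse_mb_per_second:.1f} MB per second")
```

Seen this way, Wal-Mart's one million transactions per hour works out to a few hundred per second, while the search-query figure is in the tens of thousands per second.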
  7. Working together, they enable entirely New Business Models
     • Big Data allows you to find opportunities you didn't know you had.
     • Fast Data allows you to respond to opportunities before they are gone.
     In the financial services industry, large quantities of historical data need to be processed against a growing number of fast-moving data feeds. Batch processing is no longer a suitable solution!
  8. Effective Customer Segmentation is all about blending Structured and Unstructured Data
     – Transaction data (structured data) tells you what the customer did.
     – Unstructured data can tell you why they did it, why some others did not, what else they need or want, and what problems they may have.
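A minimal sketch of what "blending" can look like in practice: spend totals from transaction rows (the "what") are combined with issue keywords found in free-text feedback (the "why"). All data, customer IDs, the keyword list, and the 150.0 threshold are hypothetical illustrations, and simple keyword matching stands in for real text analytics.

```python
from collections import defaultdict

# Hypothetical structured data: (customer_id, purchase_amount)
transactions = [("c1", 120.0), ("c1", 80.0), ("c2", 15.0), ("c3", 300.0)]

# Hypothetical unstructured data: free-text feedback keyed by customer
feedback = {
    "c1": "Love the product, but delivery was late again.",
    "c2": "Too expensive, I cancelled my order.",
}

# Hypothetical keyword list standing in for real text analytics
COMPLAINT_WORDS = {"late", "expensive", "cancelled", "broken"}


def total_spend(rows):
    """Structured side: sum purchase amounts per customer."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def complaint_terms(text):
    """Unstructured side: complaint keywords that appear in the text."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return sorted(words & COMPLAINT_WORDS)


def segment(rows, texts, high_value=150.0):
    """Blend both sides into one (value, risk, issues) tuple per customer."""
    segments = {}
    for customer_id, spend in total_spend(rows).items():
        issues = complaint_terms(texts.get(customer_id, ""))
        value = "high-value" if spend >= high_value else "low-value"
        risk = "at-risk" if issues else "no-known-issues"
        segments[customer_id] = (value, risk, issues)
    return segments


segments = segment(transactions, feedback)

if __name__ == "__main__":
    for customer_id, info in sorted(segments.items()):
        print(customer_id, info)
```

Neither source alone separates a high-value customer who is content from one who is about to leave; the combined tuple does.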
  9. Big Data Architecture
     "Solving Big Data challenge involves more than just managing volumes of data." (Gartner)
     Requirements:
     • Multiple data types: structured, semi-structured, unstructured
     • Integrated data stores: real-time, traditional, data warehouse
     • Modern development tools: Java, lightweight messages, mobile-enabled
     • Cloud-enabled: elastic scale, self-healing
     Beware point solutions: integration is critical!
  10. Greenplum Overview
  11. Greenplum Product Line
  12. Architecture of Greenplum
     Flexible framework for processing large datasets
     [Diagram labels: SQL, MapReduce, Master, Master]
     • Process large datasets with support for both SQL and MapReduce
     • Master servers optimize queries for the most efficient query execution
     • Interconnect for continuous pipelining of data processing
     • Segment servers process queries close to the data in parallel
     • MPP Scatter/Gather streaming for fast loading of data
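The master/segment split described on this slide can be illustrated with a toy model. This is a sketch of the general shared-nothing pattern only, not Greenplum's implementation: rows are hash-distributed to a hypothetical number of segments, each segment aggregates only its own rows, and a "master" function merges the partial results. The names `NUM_SEGMENTS`, `scatter`, `segment_aggregate`, and `master_query` are invented for this example.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def segment_for(key):
    """Pick a segment from a stable hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def scatter(rows):
    """Distribute (region, amount) rows so each segment owns a disjoint slice."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for region, amount in rows:
        segments[segment_for(region)].append((region, amount))
    return segments


def segment_aggregate(local_rows):
    """Work done on one segment: sum amounts per region, local rows only."""
    partial = Counter()
    for region, amount in local_rows:
        partial[region] += amount
    return partial


def master_query(rows):
    """Master: fan the work out to the segments in parallel, then merge."""
    segments = scatter(rows)
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(segment_aggregate, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds the per-region sums together
    return dict(total)


sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3)]
totals = master_query(sales)

if __name__ == "__main__":
    print(totals)
```

Because no segment reads another segment's rows, adding segments adds both storage and processing capacity, which is the intuition behind the scalability claims made for this architecture.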
  13. Greenplum MPP Share-Nothing Architecture
     Three architectures compared:
     • Share everything, e.g. Unix server
     • Share disk, e.g. Oracle RAC
     • MPP share nothing, e.g. Greenplum
     [Diagram labels: Intranet, Master, DB, SAN/FC, SAN, Disk]
  14. Benefits of the Greenplum Database Architecture
     • Simplicity
       – Parallelism is automatic: no manual partitioning required
       – No complex tuning required: just load and query
       – HA
       – Best-of-breed x86 and Ethernet networking technologies
     • Scalability
       – Linear scalability
       – Each node adds storage, query performance, and loading performance
     • Flexibility
       – Full parallelism for SQL92, SQL99, SQL2003 OLAP, MapReduce
       – Any schema (star, snowflake, 3NF, hybrid, etc.)
       – Rich extensibility and language support (Perl, Python, R, C, etc.)
       – Structured, semi-structured, and unstructured data
  15. Greenplum and Hadoop
     [Diagram labels: Analytics; Structured; Semi-Structured; Unstructured; ERP/CRM; Machine Data; Logs; Images/Sound; Ad-hoc analysis; batch reporting on static data; Dynamic Data]
  16. Big Data Analytics: The Power of Data Co-Processing
     • Greenplum Chorus: analytic productivity and tool integration
     • Greenplum Commander: end-to-end platform management and control
     • Data access and query: SQL, MapReduce, SAS, MADlib, Mahout, R, and others
     • Greenplum Database: SQL engine for structured data
       – In-database advanced analytics
       – Extreme performance on commodity hardware
     • Greenplum Hadoop: MapReduce engine for unstructured data
       – Enterprise-ready Apache Hadoop
       – Faster, more dependable, and easier to use
     • Parallel data exchange between the two engines
     • Network: parallel loading of all data types
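For readers unfamiliar with the MapReduce side of this picture, the sketch below shows the programming model in miniature: a map phase emits key/value pairs from raw text, a shuffle groups them by key, and a reduce phase folds each group. It runs in a single process and is an illustration of the model only; it uses no Hadoop or Greenplum API, and the sample log lines are invented.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one raw text record."""
    for word in document.lower().split():
        yield word.strip(".,;:!?"), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were punctuation only
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: fold the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical unstructured input: raw log lines
logs = ["ERROR disk full", "WARN disk slow", "ERROR network down"]
counts = word_count(logs)

if __name__ == "__main__":
    print(counts)
```

In a real cluster the map and reduce calls run on many nodes and the shuffle moves data across the network; the structure of the computation is the same.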
  17. Greenplum Hadoop
     • Greenplum HD
       – Enterprise-ready Apache Hadoop
       – Proven at scale in 1,000-node Analytics Workbench
       – Single product with 2 storage options (Isilon & HDFS)
     • Enterprise Edition becomes Greenplum MR:
       – Advanced features
       – 100% API compatible
       – Software-only product
  18. AWB Update
     Analytics Workbench operational!
     • 1025 nodes operational
     • 1011 nodes with GPHD installed
     • 8 total projects have been onboarded, from university collaboration to partner technology evaluation
     Proposals accepted by customer engagement team: info@analyticsworkbench.com
     • Engagement team will learn project objectives
     • JEDI council approves/disapproves project based on technical feasibility and alignment with company goals
     • Projects informed of decisions and timelines
     Cluster access via http://portal.analyticsworkbench.com/
  19. Apache Hadoop Pain Points
     • Monitoring
       – Poor job and application monitoring solution
       – Non-existent performance monitoring
     • Operability and manageability
       – Complex system configuration and manageability
       – No data format interoperability and storage abstractions
     • Performance
       – Poor dimensional lookup performance
       – Very poor random access and serving performance
  20. Greenplum MR: Enterprise Edition Stack
     [Stack diagram labels: 100% Apache interface; Enhanced Monitoring; Hive; Pig; HBase; Zookeeper; MapReduce Framework (MapRed); Distributed File System]
  21. Greenplum MR: Enterprise Edition
     Enterprise-ready Hadoop platform for unstructured data
     • Faster
       – 2–5x faster than Apache Hadoop
     • Reliable
       – High availability
       – Mirroring
     • Easier to use
       – NFS mountable
       – Graphical system management
  22. Greenplum MR Simple Management
     • Health monitoring
     • Cluster administration
     • Application provisioning
  23. Rack Level Monitoring
  24. Greenplum MR Delivers True Return on Investment
     • NFS direct access to simply load and access data directly in a Hadoop cluster
     • Enables standard tools and utilities to work directly on data contained in Hadoop
     • Heatmap user interface provides full cluster visibility and control
     • Eliminates all single points of failure
     • High availability for JobTracker, NameNode & NFS
     • Snapshots allow point-in-time data protection and recovery
     • Mirroring for business continuity includes wide-area replication support
     • Speeds jobs by 2X–5X
     • Provides faster performance with ½ the hardware
     • Substantial capital and operating expense savings
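The practical point of the NFS bullets is that once cluster storage is mounted as an ordinary directory, code that knows nothing about Hadoop can read it with plain file I/O. The sketch below is a hedged illustration of that idea: the function uses only the Python standard library, the mount point named in the comment is hypothetical, and the demo runs against a temporary directory standing in for such a mount.

```python
import os
import tempfile


def count_lines(mount_root):
    """Count lines per file under a directory using ordinary file I/O.

    Nothing here is Hadoop-specific. If cluster storage were exposed at a
    mount point (for example a hypothetical /mnt/hadoop_cluster), the same
    function would work on it unchanged.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8") as handle:
                line_count = sum(1 for _ in handle)
            results[os.path.relpath(path, mount_root)] = line_count
    return results


# Demo: a temporary directory stands in for the mounted cluster path.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "events.log"), "w", encoding="utf-8") as out:
        out.write("login\npurchase\nlogout\n")
    demo_counts = count_lines(demo_root)

if __name__ == "__main__":
    print(demo_counts)
```

The same reasoning applies to shell utilities and existing ETL scripts, which is what the slide means by standard tools working directly on data contained in Hadoop.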
  25. EMC Greenplum
     [Header labels: Fastest data loading; Advanced analytics]
     • DATA IN: Scatter/Gather Streaming technology for the world's fastest data loading
       – Eliminate data load bottlenecks
       – Clean and integrate new data
       – Several loading options, ranging from bulk load updates to micro-batching for near real-time processing
     • IN-DATABASE ANALYTICS: optimized for fast query execution and linear scalability
       – Move processing closer to data
       – Shared-nothing, massively parallel processing (MPP) scale-out architecture
       – Computing is automatically optimized and distributed across resources
       – Provides the best concurrent multi-workload performance
     • DECISIONS OUT: unified data access for greater insight and value from data
       – Enable parallel analysis across the enterprise
       – Open platform with broad language support
       – Certified enterprise connectivity and integration with most business intelligence; extract, transform, and load (ETL); and management products
  26. EMC Big Data Analytics Reference Architecture
     [Architecture diagram; the labels recoverable from the slide are listed below, and their exact grouping in the original layout is not recoverable.]
     • Stage headings: Data Sources; Data Integration; Input; Data Stores and Access; Data Analysis; Presentation & Delivery
     • Bottom row: Structured data sources; Traditional data integration; Traditional data warehousing; Big data analytics ramifications
     • Source labels: Documents, Mobile, Machine, Multimedia, Spreadsheets, Web/Social, LOB data, ERP, CRM, POS
     • Store and integration labels: Hadoop (Map-Reduce, HDFS, Ecosystem*); NoSQL Stores (Key Values, Documents, Other NoSQL); SQL Stores; parallel data exchange; Data Marts (BU 1, BU 2, BU 3); Enterprise Data Warehouse; Federated Data Warehouse; MDM; ETL; Data Quality
     • Analysis labels: Statistics, Genetic Algorithms, Data Mining, OLAP, Operations Research, Neural Nets, Data Visualization
     • Delivery labels: Alerts, Dashboards, Mobile, Reports, BI as a Service
     *Hadoop Ecosystem includes: Hive, Pig, Mahout, HBase, ZooKeeper, Oozie, Sqoop, Avro
  27. Architecture for Business Value
     [Architecture diagram; the labels recoverable from the slide are listed below, and their exact grouping in the original layout is not recoverable.]
     • Business Value; Chorus for Collaboration
     • Analytics; Self-develop app; Java API; JDBC; ODBC
     • Analytics tools (Mahout); Analytics tools (SAS, R, MADlib and more)
     • HBase; SAS & MADlib; GPDB; In Memory
     • MapRFS (GPMR); MapRFS: C++; MR: C++; Load performance: 2~5X
     • ETL; DBs; Files (.csv, .txt)
     • High Availability; Stable
  28. Big Data and EMC
     1. Petabyte Scale Data Storage
     2. Unified Analytics Platform
     3. Data Science
     4. New Analytic Applications
  29. SAS / Greenplum Product Overview
     SAS High Performance Computing
     • SAS Access for Integration
       – Provides integration capability to a number of databases
       – Allows for increased performance of Base SAS Procs
       – Products: SAS Access for Greenplum
     • SAS In-Database Processing
       – Requires SAS Enterprise Miner in order to be of value
       – Will lead to significant improvement in performance
       – Products: SAS Access for Greenplum, SAS Grid Manager, SAS Enterprise Miner, SAS Scoring Accelerator for Greenplum
     • SAS In-Memory Analytics
       – New functionality from SAS that requires dedicated database appliance
       – Very high performance for business users that can significantly increase revenues or decrease costs as a result of improved performance
       – Products: SAS Access for Greenplum, SAS Grid Manager, SAS High Performance Analytics
  30. SAS and Greenplum UAP Integrated Architecture
     [Layered diagram labels:]
     • Users: Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User (Data Science Team)
     • SAS Business Intelligence
     • Greenplum Chorus: Analytic Productivity Layer
     • SAS Analytics
     • Data Access & Query Layer (SAS ACCESS, SQL, MapReduce)
     • Greenplum Database; Greenplum Hadoop
     • Private/Hybrid Cloud Infrastructure or Appliance
     • Data Platform Admin
     • SAS Information Management
  31. In a Single Unified Analytics Platform
     • Self-service
     • Iterative, agile
     • Transparent, real-time collaboration
     • Structured & unstructured data
     • Analyze petabytes of current data
     • Virtual, scale-out architecture