Big Data Appliance: Innowacje w centrach przetwarzania danych - Maciej Tomkiewicz, Oracle


Published on

Social Media oraz Big Data przez pryzmat Oracle Applications, Oracle Fusion Middleware i Oracle Engineered Systems, Warszawa - 30.03.2012 r.

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Big Data Appliance: Innowacje w centrach przetwarzania danych - Maciej Tomkiewicz, Oracle

  1. 1. Big Data Appliance - Innowacje w centrachprzetwarzania danychMaciej
  2. 2. The following is intended to outline our general product direction.It is intended for information purposes only, and may not beincorporated into any contract. It is not a commitment to deliverany material, code, or functionality, and should not be relied uponin making purchasing decisions. The development, release, andtiming of any features or functionality described for Oracle’sproducts remains at the sole discretion of Oracle.
  3. 3. Agenda• The Business of Big Data• ORACLE HW strategy• Inside the Big Data Appliance • Overview • Applications• Summary• Q&A
  4. 4. <Insert Picture Here>The Business of Big Data
  5. 5. Big Data: Acting on New Data “I found it” Stores Looking back “PAST” Catalog/ Call 60% Web Center Potential increase in Retail Decisions retailers’ operating margins possible with Big Data Social Search Looking ahead Networks McKinsey Global Institute: “FUTURE” Big DataThe next frontier for innovation, competition and productivity (May 2011) “I think” “I want”
  6. 6. Tapping into Diverse Data Sets Video and Images Big Data:Decisions based Documentson all your data Social Data Machine-Generated Data Information Architectures Today: Transactions Decisions based on database data
  7. 7. Case: On-line Ads and Content Real-time: Determine Low Lookup user best ad to placeLatency profile on page for this user Add user NoSQL Expert if not present DB Input into System HDFS Predictions Actual on browsing ads Web served logs High scale Batch BI and Billing NoSQL DB data reductions Analytics Profiles
  8. 8. Case: On-line Adds and Content • Dynamic and rapidly changing schemaNoSQL DB • Scalable single record lookup • Low cost, high scale storage HDFS • Write once, read many times • High scale batch processing Hadoop • Highly customizable infrastructure • Deep analytics and BI value add RDBMS • Reporting for large user community
  9. 9. Big Data: Infrastructure Requirements Acquire Organize Analyze• Low, predictable Latency• High Transaction Volume • Deep Analytics• Flexible Data Structures • Agile Development • High Throughput • Massive Scalability • In-Place Preparation • Real Time Results • All Data Sources/Structures
  10. 10. Oracle Big Data Platform Big Data Exadata Exalytics ApplianceACQUIRE ORGANIZE ANALYZE DECIDE
  11. 11. Why Build A Hadoop Appliance? Time to Build Optimizations Maintenance
  13. 13. Engineered TogetherTested Together Certified Together Deployed Together Upgraded Together Managed Together Supported Together 13
  15. 15. <Insert Picture Here>Inside the Big Data ApplianceApplications
  16. 16. Oracle NoSQL DB A distributed, scalable key-value database• Simple Data Model • Key-value pair with major+sub-key paradigm • Read/insert/update/delete operations Application Application• Scalability NoSQLDB Driver NoSQLDB Driver • Dynamic data partitioning and distribution • Optimized data access via intelligent driver• High availability • One or more replicas • Disaster recovery through location of replicas • Resilient to partition master failures • No single point of failure• Transparent load balancing Storage Nodes Storage Nodes • Reads from master or replicas Data Center A Data Center B • Driver is network topology & latency aware• Elastic (Planned for Release 2) • Online addition/removal of Storage Nodes • Automatic data redistribution
  17. 17. Big Data ApplianceSystem Layout Master NodeNoSQL DB Replicas
  18. 18. Oracle NoSQL DB Differentiation• Commercial Grade Software and Support • General-purpose • Reliable – Based on proven Berkeley DB JE HA • Easy to install and configure• Scalable throughput, bounded latency• Simple Programming and Operational Model • Simple Major + Sub key and Value data structure • ACID transactions • Configurable consistency & durability• Easy Management • Web-based console, API accessible • Manages and Monitors: Topology; Load; Performance; Events; Alerts• Completes Oracle large scale data storage offerings
  19. 19. Oracle Loader for Hadoop Partition and transform into Oracle ready format QueryInput Load . . . . TableInput Oracle Loader for Hadoop
  20. 20. Streaming Access to HDFS HDFSDatafile_part_1 Oracle Database External Query HDFS TableDatafile_part_2 View FUSE HDFS OrDatafile_part_m Table Function HDFSDatafile_part_n Reduce Map HDFS Datafile_part_x
  21. 21. Oracle Data Integrator Easily integrate data from any source Expanded functionality: => Construct Hadoop jobs to transform and load data into Oracle => Leverage Oracle Loader for Hadoop and/or Hive
  22. 22. Oracle Big Data ApplianceSoftware Overview
  23. 23. Oracle Big Data Appliance Pre-Installed, Optimized • Oracle Linux 5.6 • Java Hotspot VM • Cloudera CDH • Cloudera Manager • Open Source R Distribution • Oracle NoSQL Database CE • Oracle Big Data Connectors ** Optional. Available and licensed separately.
  24. 24. Why Cloudera CDH?• Fast evolution in critical features – Built by the Hadoop experts in the community – Practical instead of esoteric• Proven at very large scale – In production at all the large consumers of Hadoop – Extremely stable in those environments• Managed and Tested by Cloudera – Managed Open Source components – Contains a rich management GUI tool
  25. 25. Cloudera CDH Apache Hadoop Apache Sqoop Apache Hive Apache Mahout Apache Pig Apache Whirr Apache HBase Apache Oozie Apache Zookeeper Fuse-DFS Apache Flume HueLatest details at:
  26. 26. Hadoop Software Layout (Masters) • Node 1: – M: Name Node, Balancer & HBase Master – S: HDFS Data Node, NoSQL Database Storage Node* • Node 2: – M: Secondary Name Node, Cloudera Manager, Zookeeper, MySQL Slave – S: HDFS Data Node, NoSQL Database Storage Node* • Node 3: 3 – M: JobTracker, MySQL Master, ODI Agent, Hive Server 2 1 – S: HDFS Data Node, NoSQL Database Storage Node** Optional
  27. 27. Oracle Big Data Connectors*Software Overview
  28. 28. Oracle Data IntegratorSimplifying MapReduce Oracle Data Integrator Automatically generates MapReduce code Oracle Loader for Manages the process Hadoop Loads into Data Warehouse
  29. 29. Oracle Loader for HadoopUse The Cluster ORACLE LOADER FOR HADOOP MAP REDUCE MAP Last stage in MapReduce MAP SHUFFLE /SORT REDUCE workflow Partitioned and non- MAP REDUCE partitioned tables MAP REDUCE SHUFFLE MAP /SORT REDUCE Online and offline loads
  30. 30. Oracle Direct Connector for HDFSDirect Access from Oracle Database HDFS Oracle Database SQL Query SQL access to HDFS External Table External table view Data query or import DCH DCH HDFS Infini Band DCH Client
  31. 31. Oracle R Hadoop Connector Native R Access to HadoopClient Host Oracle Big Data Oracle Exadata Appliance R Engine R Engine ORE ORHC ORHC Native R MapReduce Hadoop MapReduce R Engine Native R HDFS access Cluster Nodes ORE Software HDFS
  32. 32. Oracle Big Data Appliance• Optimized and Complete – All you need to store and integrate big data• Integrated with Oracle Exadata – Analyze all your data• Easy to Deploy – Risk Free, Quick Installation and Setup• Single Vendor Support – Full Oracle support for all hardware and software