Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre Dijcks - Oracle
 



Analyzing new and diverse digital data streams can reveal new sources of economic value, provide fresh insights into customer behavior and identify market trends early on. But this influx of new data can create challenges for IT departments. To derive real business value from Big Data, you need the right tools to capture and organize a wide variety of data types from different sources, and to be able to easily analyze it within the context of all your enterprise data. Attend this session to learn how Oracle’s end-to-end value chain for Big Data can help you unlock the value of Big Data.

Statistics

Views
  • Total views: 3,749
  • Views on SlideShare: 3,301
  • Embed views: 448

Actions
  • Likes: 10
  • Downloads: 0
  • Comments: 0

4 Embeds (448 views)
  • http://www.cloudera.com: 388
  • http://www.scoop.it: 54
  • http://www.samsung.net: 5
  • http://cloudera.matt.dev: 1


Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Benefits of Online Mode: no need to write to disk after the Hadoop job; simpler management for use cases with many nodes generating output files. Benefits of Offline Mode (Data Pump files): the import operation can be parallelized in the database; it is the fastest option for external tables.
  • Direct HDFS: access data on HDFS through the external table mechanism. Benefits: data on HDFS can be queried from the database and imported into the database as needed.

Hadoop World 2011: Unlocking the Value of Big Data with Oracle - Jean-Pierre Dijcks - Oracle (Presentation Transcript)

  • Oracle Big Data Appliance and Solutions. Jean-Pierre Dijcks. Hadoop World, Nov 8th, 2011
  • The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remain at the sole discretion of Oracle.
  • Case: Online Ads and Content (diagram)
    • Real-time, low-latency path: look up the user profile in the NoSQL DB (adding the user if not present) and feed it as input into an expert system to determine the best ad to place on the page for this user; the actual ads are served to the browser
    • High-scale batch path: web browsing logs land in HDFS; batch data reductions produce predictions and profiles that flow back into the NoSQL DB and feed BI and billing analytics
  • Agenda
    • Big Data Technology
    • Oracle Big Data Appliance
    • Big Data Applications
    • Summary
    • Q&A
  • Big Data Technology
  • Big Data: Infrastructure Requirements
    • Acquire: low, predictable latency; high transaction volume; flexible data structures
    • Organize: high throughput; in-place preparation; all data sources and structures
    • Analyze: deep analytics; agile development; massive scalability; real-time results
  • Divided Solution Spectrum (diagram: data variety vs. Acquire, Organize, Analyze)
    • Dynamic-schema side (flexible, specialized, developer centric): distributed file systems, NoSQL (key-value) stores, MapReduce, specialized solutions
    • Schema side (trusted, secure, administered, transactional): SQL DBMS (OLTP), DBMS (DW), ETL, advanced analytics
  • Oracle Integrated Software Solution Stack (diagram: data variety vs. Acquire, Organize, Analyze)
    • Acquire: HDFS, Oracle NoSQL DB, Oracle Database (OLTP)
    • Organize: Hadoop, Oracle Loader for Hadoop, Oracle Data Integrator
    • Analyze: in-database analytics ("R", Mining, Text, Graph, Spatial), Oracle Database (DW), Oracle BI EE
  • Oracle Engineered Solutions (diagram: data variety vs. Acquire, Organize, Analyze)
    • Big Data Appliance: Hadoop, NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator
    • Exadata: OLTP and DW, Data Mining and Oracle R, Semantics, Spatial
    • Exalytics: speed-of-thought analytics (Oracle BI EE)
  • Big Data Appliance Batch Usage Model (diagram): Oracle Big Data Appliance (Acquire) connects over InfiniBand to Oracle Exadata (Organize), which connects over InfiniBand to Oracle Exalytics (Analyze)
  • Why build a Hadoop Appliance?
    • Time to build?
    • Required expertise?
    • Cost and difficulty of maintaining?
  • Oracle Big Data Appliance Hardware
    • 18 Sun X4270 M2 servers
      – 48 GB memory per node = 864 GB memory
      – 12 Intel cores per node = 216 cores
      – 24 TB storage per node = 432 TB storage
    • 40 Gb/sec InfiniBand
    • 10 Gb/sec Ethernet
  • Big Data Appliance: cluster of industry-standard servers for Hadoop and NoSQL Database, with a focus on scalability and availability at low cost
    • InfiniBand network
      – Redundant 40 Gb/s switches
      – InfiniBand connectivity to Exadata
    • 10GigE network
      – 8 10GigE ports
      – Datacenter connectivity
    • Compute and storage
      – 18 high-performance, low-cost servers acting as Hadoop nodes
      – 24 TB capacity per node
      – Two 6-core CPUs per node
      – Hadoop triple replication
      – NoSQL Database triple replication
  • Scale Out to Infinity
    • Scale out by connecting racks to each other using InfiniBand
    • Expand up to eight racks without additional switches
    • Scale beyond eight racks by adding an additional switch
  • Oracle Big Data Appliance Software
    • Oracle Linux 5.6
    • Java HotSpot VM
    • Apache Hadoop Distribution v0.20.x
    • R Distribution
    • Oracle NoSQL Database Enterprise Edition
    • Oracle Data Integrator Application Adapter for Hadoop
    • Oracle Loader for Hadoop
  • Why Open-Source Apache Hadoop?
    • Fast evolution in critical features
      – Built by the Hadoop experts in the community
      – Practical instead of esoteric
      – Focus on what is needed for large clusters
    • Proven at very large scale
      – In production at all the large consumers of Hadoop
      – Extremely stable in those environments
      – Well understood by practitioners
  • Software Layout
    • Node 1
      – M: Name Node, Balancer, HBase Master
      – S: HDFS Data Node, NoSQL DB Storage Node
    • Node 2
      – M: Secondary Name Node, Management, ZooKeeper, MySQL Slave
      – S: HDFS Data Node, NoSQL DB Storage Node
    • Node 3
      – M: JobTracker, MySQL Master, ODI Agent, Hive Server
      – S: HDFS Data Node, NoSQL DB Storage Node
    • Nodes 4 – 18
      – S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
      – Your MapReduce runs here! (see the sketch below)
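The data nodes are where application MapReduce code executes. As a concrete but deliberately generic illustration, the sketch below is a minimal Hadoop 0.20.x MapReduce job of the kind that would run on nodes 4 – 18; it simply counts records per key, and the class name and input/output paths are invented for the example rather than part of the appliance software.

    // Minimal Hadoop 0.20.x MapReduce sketch: counts occurrences of the first
    // field of each tab-delimited input line. Names and paths are illustrative.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCount {

      public static class KeyMapper
          extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          String key = line.toString().split("\t", 2)[0]; // first field is the key
          ctx.write(new Text(key), ONE);
        }
      }

      public static class SumReducer
          extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
            throws IOException, InterruptedException {
          long total = 0;
          for (LongWritable c : counts) total += c.get();
          ctx.write(key, new LongWritable(total));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "key-count");   // 0.20.x-style job construction
        job.setJarByClass(KeyCount.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS log directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }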
  • Big Data Appliance: Big Data for the Enterprise
    • Optimized and complete: everything you need to store and integrate your lower-information-density data
    • Integrated with Oracle Exadata: analyze all your data
    • Easy to deploy: risk-free, quick installation and setup
    • Single-vendor support: full Oracle support for the entire system and software set
  • Oracle NoSQL Database
  • Key-Value Store Workloads
    • Large, dynamic-schema-based data repositories
    • Data capture
      – Web applications
      – Online retail
      – Sensor/statistics/network capture, mobile devices
    • Data services
      – Scalable authentication
      – Real-time communication (MMS, SMS, routing)
      – Personalization and localization
      – Social networks
  • Oracle NoSQL DB: a distributed, scalable key-value database (diagram: applications with NoSQL DB drivers talking to storage nodes in Data Centers A and B)
    • Simple data model
      – Key-value pairs with a major+sub-key paradigm
      – Read/insert/update/delete operations
    • Scalability
      – Dynamic data partitioning and distribution
      – Optimized data access via an intelligent driver
    • High availability
      – One or more replicas
      – Disaster recovery through location of replicas
      – Resilient to partition master failures
      – No single point of failure
    • Transparent load balancing
      – Reads from master or replicas
      – Driver is network topology and latency aware
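To make the major+sub-key data model concrete, here is a minimal sketch against the Oracle NoSQL DB Java API (oracle.kv) that writes and reads back a single user-profile attribute. The store name, helper host, key components, and value are invented for the example; check the API documentation of your release for the exact signatures.

    // Sketch: store and read one value in Oracle NoSQL DB using a major+sub-key.
    // Store name, helper host, and key components below are illustrative only.
    import java.util.Arrays;

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;

    public class ProfileExample {
      public static void main(String[] args) {
        // Connect through the driver; it learns the topology from the helper host.
        KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "bda-node4:5000"));

        // Major key identifies the user; sub-key identifies the attribute.
        Key key = Key.createKey(
            Arrays.asList("users", "u42"),            // major path
            Arrays.asList("profile", "lastAdSeen"));  // minor (sub) path

        // Insert or update the value.
        store.put(key, Value.createValue("sports-banner-17".getBytes()));

        // Read it back (served by master or replica, per the consistency policy).
        ValueVersion vv = store.get(key);
        if (vv != null) {
          System.out.println(new String(vv.getValue().getValue()));
        }

        store.close();
      }
    }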
  • Resolving a Request (diagram)
    • Client sends: operation + key [M, m] + value + transaction policy
    • Hash the major key to determine the partition id
    • Use the partition map to map the partition id to a replication group
    • Use the state table to determine the eligible storage node(s) within the replication group
    • Use the load balancer to select the best eligible rep node
    • Contact the rep node directly
    • Returned to the client: operation result, new partition map, rep node state table information
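Expressed as code, the routing flow above is a handful of lookups on the driver side. The sketch below is purely illustrative: every interface and method name in it is hypothetical, invented to show the sequence of steps, and does not correspond to the actual Oracle NoSQL DB driver classes.

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch of the driver-side routing flow described above.
    // All interfaces and method names here are invented for illustration.
    public final class RequestRouter {

      public interface Operation { boolean isWrite(); }
      public interface Result { }
      public interface RepNode { Result send(Operation op); }

      public interface PartitionMap {            // partition id -> replication group
        int partitionCount();
        int repGroupFor(int partitionId);
      }
      public interface StateTable {              // rep group -> eligible nodes
        List<RepNode> eligibleNodes(int repGroup, Operation op);
      }
      public interface LoadBalancer {            // picks the "best" eligible node
        RepNode choose(List<RepNode> eligible);
      }

      private final PartitionMap partitionMap;
      private final StateTable stateTable;
      private final LoadBalancer loadBalancer;

      public RequestRouter(PartitionMap pm, StateTable st, LoadBalancer lb) {
        this.partitionMap = pm;
        this.stateTable = st;
        this.loadBalancer = lb;
      }

      public Result execute(Operation op, byte[] majorKey) {
        // 1. Hash the major key to a partition id.
        int partitionId = Math.floorMod(Arrays.hashCode(majorKey),
                                        partitionMap.partitionCount());
        // 2. Map the partition id to its replication group via the partition map.
        int repGroup = partitionMap.repGroupFor(partitionId);
        // 3. Use the state table to find eligible nodes in that group
        //    (e.g. master only for writes, any current replica for reads).
        List<RepNode> eligible = stateTable.eligibleNodes(repGroup, op);
        // 4. Let the load balancer pick the best eligible rep node.
        RepNode target = loadBalancer.choose(eligible);
        // 5. Contact the rep node directly and return its result.
        return target.send(op);
      }
    }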
  • ACID Transactions
    • Write durability (transaction policy)
      – Configurable per operation; the application can set defaults
      – Write transaction durability consists of both:
        a) Sync policy (on master and replica): Sync (force to disk); Write No Sync (force to OS buffer); No Sync (write to local log buffer, flush when convenient)
        b) Replica acknowledgement policy: All; Simple Majority; None
    • Read consistency (transaction policy)
      – Configurable per operation; the application can set defaults
      – Read consistency specified as Absolute, Time-based, Version, or None
        – Absolute: read from the master
        – Time-based: read from any replica that is within <time-interval> of the master, or better
        – Version: read from any replica that is current with <transaction-token> or higher
        – None: read from any replica
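These policies can be supplied per operation or installed as store-wide defaults. The sketch below shows one plausible way to do the latter with the oracle.kv Java API: a durable write policy (sync on the master, simple-majority acknowledgement) and a time-based read policy. The store name and host are invented, and the exact constructors and setter names should be treated as assumptions to verify against the API documentation for your release.

    // Sketch: choosing write durability and read consistency defaults for an
    // Oracle NoSQL DB connection. Store name and host are illustrative; verify
    // the exact API calls against the release you are running.
    import java.util.concurrent.TimeUnit;

    import oracle.kv.Consistency;
    import oracle.kv.Durability;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;

    public class PolicyExample {
      public static void main(String[] args) {
        // Durability: force writes to disk on the master, to the OS buffer on
        // replicas, and wait for a simple majority of replicas to acknowledge.
        Durability durability = new Durability(
            Durability.SyncPolicy.SYNC,              // master sync policy
            Durability.SyncPolicy.WRITE_NO_SYNC,     // replica sync policy
            Durability.ReplicaAckPolicy.SIMPLE_MAJORITY);

        // Consistency: allow reads from any replica no more than 2 seconds
        // behind the master, waiting up to 1 second for one to catch up.
        Consistency consistency = new Consistency.Time(
            2, TimeUnit.SECONDS,     // permissible lag behind the master
            1, TimeUnit.SECONDS);    // timeout waiting for a replica to catch up

        KVStoreConfig config = new KVStoreConfig("kvstore", "bda-node4:5000");
        config.setDurability(durability);    // default for writes
        config.setConsistency(consistency);  // default for reads

        KVStore store = KVStoreFactory.getStore(config);
        // ... reads and writes now use these defaults unless overridden per call.
        store.close();
      }
    }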
  • Oracle NoSQL DB Differentiation
    • Commercial-grade software and support
      – General purpose
      – Reliable: based on proven Berkeley DB JE HA
      – Easy to install and configure
    • Scalable throughput, bounded latency
    • Simple programming and operational model
      – Simple major + sub-key and value data structure
      – ACID transactions
      – Configurable consistency and durability
    • Easy management
      – Web-based console, API accessible
      – Manages and monitors: topology, load, performance, events, alerts
    • Completes Oracle's large-scale data storage offerings
  • Try Oracle NoSQL Database on OTN
    • Community Edition is available as a software-only distribution
    • Enterprise Edition is available as a separately licensable product or as part of the Big Data Appliance
  • Oracle Loader for Hadoop
  • Oracle Loader for Hadoop Features
    • Load data into a partitioned or non-partitioned table
      – Single-level, composite, or interval partitioned tables
      – Support for scalar datatypes of Oracle Database
      – Load into Oracle Database 11g Release 2
    • Runs as a Hadoop job and supports standard options
    • Pre-partitions and sorts data on Hadoop
    • Online and offline load modes
  • Oracle Loader for Hadoop (diagram: two input sets flow through map, shuffle/sort, and reduce stages inside Oracle Loader for Hadoop)
  • Oracle Loader for Hadoop: Online Option (diagram)
    • Read target table metadata from the database
    • Perform partitioning, sorting, and data conversion in map and reduce tasks
    • Connect to the database from reducer nodes and load data into database partitions in parallel
  • Oracle Loader for Hadoop: Offline Option (diagram)
    • Read target table metadata from the database
    • Perform partitioning, sorting, and data conversion in map and reduce tasks
    • Write from reducer nodes to Oracle Data Pump files
    • Import into the database in parallel using the external table mechanism
  • Oracle Loader for Hadoop Advantages
    • Offload database server processing to Hadoop:
      – Convert input data to the final database format
      – Compute the table partition for each row
      – Sort rows by primary key within a table partition
    • Generate binary Data Pump files
    • Balance partition groups across reducers
  • Input and Output Formats
    • Input formats
      – Delimited text
      – Hive tables (managed and external tables; native and non-native tables)
      – Write your own input format
    • Output formats, online mode
      – Load directly from Hadoop nodes to the Oracle database: JDBC or parallel direct path
    • Output formats, offline mode
      – Data Pump format: create binary files for external tables; import data into the database from the external table with a SQL statement
      – CSV, delimited text: load through SQL*Loader or the external table mechanism
  • Selecting an Output Option for the Use Case
    • Online load with JDBC: the simplest use case, for non-partitioned tables
    • Online load with direct path: fast online load for partitioned tables
    • Offline load with Data Pump files: fastest load method for external tables
    • Direct HDFS (on Oracle Big Data Appliance): leave data on HDFS, access it in parallel from the database, and import into the database when needed
  • Invoking Oracle Loader for Hadoop (command line)

    $ hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader \
        -libjars <library jar files> \
        -D <configuration properties>

    $HADOOP_HOME/bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader \
        -libjars avro-1.4.1.jar,commons-math-2.2.jar \
        -conf connection.xml \
        -D mapreduce.inputformat.class=oracle.hadoop.loader.lib.input.DelimitedTextInputFormat \
        -D mapreduce.outputformat.class=oracle.hadoop.loader.lib.output.JDBCOutputFormat
  • Automate Usage of Oracle Loader for Hadoop with Oracle Data Integrator (ODI)
    • ODI has knowledge modules to:
      – Generate data transformation code to run on Hive/Hadoop
      – Invoke Oracle Loader for Hadoop
    • Use the drag-and-drop interface in ODI to include an invocation of Oracle Loader for Hadoop in any ODI packaged flow
  • Summary
  • Big Data Appliance: Big Data for the Enterprise
    • Optimized and complete: everything you need to store and integrate your lower-information-density data
    • Integrated with Oracle Exadata: analyze all your data
    • Easy to deploy: risk-free, quick installation and setup
    • Single-vendor support: full Oracle support for the entire system and software set
  • Big Data Appliance and Exadata: Big Data for the Enterprise (NoSQL DB, HDFS, Hadoop, RDBMS)
  • Questions