Oracle In-Database Hadoop:
When MapReduce Meets RDBMS
Kuassi Mensah | db360.blogspot.com @kmensah
Director Product Management | kuassi.mensah@oracle.com
The following is intended to outline our general product direction. It
is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in
making purchasing decisions. The development, release, and
timing of any features or functionality described for Oracle's
products remains at the sole discretion of Oracle.




Hadoop Summit 2012, June 13-14, San Jose, California, USA   Oracle In-Database Hadoop: When MapReduce Meets RDBMS. Kuassi Mensah
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle In-Database Hadoop
             Agenda

            •  In-Database MapReduce
                  •  Why
                  •  Previous Initiatives and Limitations
            •  Oracle In-Database Hadoop
            •  Integration with Oracle’s Big Data solution
            •  Summary




In-Database MapReduce




MapReduce Paradigm
             You All Know This Stuff!


            Map:      <K1,V1> → {<K2,V2>,…}

            Shuffle:  {<K2,V2>,…} → {<K2,{V2,…,V2}>,…}

            Reduce:   <K2,{V2,…,V2}> → {<K3,V3>,…}
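The three signatures above can be illustrated with a minimal word-count sketch in plain Java. It uses no Hadoop or Oracle classes; the class name `MapReduceSketch` and its method names are hypothetical and chosen only for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: Map, Shuffle and Reduce for word counting, in plain Java. */
final class MapReduceSketch {

    // Map: <K1,V1> -> {<K2,V2>,...}; here <lineNumber, line> -> {<word, 1>,...}
    static List<Map.Entry<String, Integer>> map(int lineNumber, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Shuffle: {<K2,V2>,...} -> {<K2,{V2,...,V2}>,...}; group all values by key
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: <K2,{V2,...,V2}> -> <K3,V3>; here the values for one word are summed
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return new SimpleEntry<>(word, sum);
    }

    // Runs the three phases in sequence over an in-memory list of lines
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            mapped.addAll(map(i, lines.get(i)));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            Map.Entry<String, Integer> reduced = reduce(group.getKey(), group.getValue());
            result.put(reduced.getKey(), reduced.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("map reduce map", "reduce shuffle map")));
    }
}
```

In a real deployment the three phases run in parallel on different nodes; the sketch runs them sequentially only to make the data flow visible.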




In-Database MapReduce
             Why?

            •  Avoid shipping data residing in RDBMS to a
               separate infrastructure.
                  •  Many initiatives
            •  Address top two issues preventing broader
               adoption of Hadoop in the enterprise
                  •  Lack of development and/or administration skills
                  •  Lack of enterprise-class security




In-Database MapReduce
             Previous Efforts and Limitations


            •  SQL-MapReduce, HadoopDB (Hadapt), etc.
            •  PL/SQL User-defined pipelined table functions
               and aggregation objects
            •  Limitations
                  •  Lack of compatibility with Hadoop
                  •  Loose integration with Hadoop
                  •  Dependency on Hadoop infrastructure




Oracle In-Database Hadoop




Oracle In-Database Hadoop is a prototype (not a feature of
Oracle products), built on current Oracle products.




Oracle In-Database Hadoop
             Goals

            •  Avoid shipping data residing in Oracle database to
               Hadoop clusters.
            •  Preserve Hadoop programming model
            •  Reduce dependency on Hadoop infrastructure
            •  Get enterprise developers up to speed with minimal
               training
            •  Get enterprise administrators (DBAs) up to speed
               with minimal training
            •  Reduce deployment time
            •  Bring enterprise class security to MapReduce
            •  Seamless integration with Oracle’s Big Data
               solution

Oracle In-Database Hadoop
             Compatibility & Minimal Dependency on Hadoop Infra
             [Diagram: Mapping Processes on Node 1, Node 2, and Node 3, each a pipelined
              table function with a Java implementation. Their output is redistributed
              through the PARTITION BY and CLUSTER BY clauses to Reducing Processes on
              Node 1, Node 2, and Node 3, each also a pipelined table function with a
              Java implementation.]
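The redistribution step between mappers and reducers can be sketched in plain Java. This is an illustration of the general idea only, not the prototype's code: the slide does not say which partitioning method is used, so hash partitioning is an assumption here, and `PartitionSketch`, `partitionFor`, and `distribute` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: route map output to reducers by key, then cluster values by key. */
final class PartitionSketch {

    // Assumed hash partitioning: the same key always maps to the same reducer.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static int partitionFor(String key, int reducers) {
        return Math.floorMod(key.hashCode(), reducers);
    }

    // Returns one map per reducer; within a reducer, values are clustered by key.
    static List<Map<String, List<Integer>>> distribute(
            List<Map.Entry<String, Integer>> mapOutput, int reducers) {
        List<Map<String, List<Integer>>> perReducer = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            perReducer.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> row : mapOutput) {
            int target = partitionFor(row.getKey(), reducers);
            perReducer.get(target)
                      .computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                      .add(row.getValue());
        }
        return perReducer;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1), Map.entry("c", 1));
        List<Map<String, List<Integer>>> parts = distribute(mapOutput, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("Reducer " + i + ": " + parts.get(i));
        }
    }
}
```

Partitioning guarantees that every value for a given key reaches the same reducing process; clustering guarantees that the reducer sees those values together, which is what the Reduce signature `<K2,{V2,…,V2}>` requires.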




Oracle In-Database Hadoop
             Preserve Hadoop Programming Model
            •  Source-compatibility
            •  Job configuration
            •  Invocation through Java interface: job.run()
            •  Direct table access: TableReader and TableWriter




Oracle In-Database Hadoop
             SQL and MapReduce Integration

            •  Mix SQL and MapReduce processing for flexibility and
               efficiency.
            •  MapReduce steps as pipelined table functions.


                  INSERT INTO OutTable
                  SELECT * FROM TABLE
                    (Word_Count_Reduce(:ConfKey,
                      CURSOR(SELECT * FROM TABLE
                        (Word_Count_Map(:ConfKey,
                          CURSOR(SELECT * FROM InTable))))))
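The nesting in the statement above, where each stage consumes the rows produced by the stage inside it, can be mimicked with Java streams. This is an analogy only and not how the prototype is implemented: `PipelineSketch`, `wordCountMap`, and `wordCountReduce` are hypothetical stand-ins for the table functions, and the in-memory list stands in for `InTable`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Illustrative only: chained stages that each consume the previous stage's rows. */
final class PipelineSketch {

    // Stand-in for Word_Count_Map: consumes a "cursor" of lines and lazily emits words
    static Stream<String> wordCountMap(Stream<String> inputRows) {
        return inputRows
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty());
    }

    // Stand-in for Word_Count_Reduce: consumes the map output and emits (word, count) rows
    static Map<String, Integer> wordCountReduce(Stream<String> mapOutput) {
        Map<String, Integer> outputRows = new TreeMap<>();
        mapOutput.forEach(word -> outputRows.merge(word, 1, Integer::sum));
        return outputRows;
    }

    public static void main(String[] args) {
        List<String> inTable = List.of("a b a", "", "b c");

        // A relational-style filter ahead of the map step, analogous to adding
        // a WHERE clause to the innermost SELECT
        Stream<String> nonEmptyRows = inTable.stream().filter(line -> !line.trim().isEmpty());

        // Nested calls mirror the nested TABLE(... CURSOR(...)) expressions
        Map<String, Integer> outTable = wordCountReduce(wordCountMap(nonEmptyRows));
        System.out.println(outTable);
    }
}
```

The map stage is lazy: it produces words only as the reduce stage pulls them, which loosely resembles how a pipelined table function returns rows as they are produced rather than materializing the full result first.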



Oracle In-Database Hadoop
               SQL and Java interfaces




             SQL interface:

                  SELECT * FROM TABLE
                    (Reduce_VARCHAR2_NUMBER(:ConfKey,
                      CURSOR(SELECT * FROM TABLE
                        (Map_VARCHAR2_NUMBER(:ConfKey,
                          CURSOR(SELECT * FROM InTable))))))

             Java interface:

                  public class WordCount {
                    public static void main() throws Exception {
                      /* Set up the parameters and run the job */
                      ……
                      job.init();
                      job.run();
                    }
                  }
  


Oracle In-Database Hadoop
               Leverage Enterprise Skills
            •  Get database developers up to speed, with minimal
               training, on developing MapReduce jobs by reusing
               Hadoop Mappers and Reducers
            •  Get DBAs up to speed on deploying and managing
               MapReduce jobs with minimal training




Oracle Database Security
             Bringing Enterprise Class Security to MapReduce


            •  Auditing and Monitoring
                  •  Database Activity Auditing
                  •  Database Firewall Monitoring
                  •  Centralized Audit Data Warehouse
            •  Encryption and Masking
                  •  Transparent Data Encryption
                  •  Network Encryption/Strong Auth
                  •  Data Masking for Non-Production
            •  Privileged User Access Control and Contextual
               Authorization
                  •  Separation of Duties for DBAs
                  •  Protection Realms & Rules
                  •  Label Based Access Control



Seamless Integration with Oracle’s Big Data Solution




Oracle’s Big Data solution
             [Diagram: Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics,
              linked by InfiniBand, together with Endeca Information Discovery and
              Oracle Real-Time Decisions. Stages shown: Acquire, Organize & Discover,
              Analyze, Decide.]




Oracle Direct Connector for HDFS

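             Direct Access from Oracle Database

            •  SQL access to HDFS
            •  External table view
            •  Data query or import

             [Diagram: a SQL query in Oracle Database reads an external table, which is
              served by DCH processes using an HDFS client; Oracle Database and HDFS
              are connected over InfiniBand.]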




Oracle In-Database Analytics

             Oracle Advanced Analytics

            •  Statistical
            •  Data Mining
            •  Text
            •  Graph
            •  Spatial
            •  Semantic




What Have We Done?




Oracle In-Database Hadoop
             Summary
             A prototype:
            •  Apply MapReduce processing to data in Oracle RDBMS without the
               need for a separate infrastructure.
            •  Compatibility with Hadoop while minimizing dependency on the
               Apache Hadoop infrastructure.
            •  Reduce training and deployment time.
            •  Integration with Oracle SQL, allowing MapReduce steps to be
               mixed with sophisticated SQL queries.
            •  Bring enterprise-class security to Hadoop MapReduce.
            •  Seamless integration with Oracle’s Big Data solution.

Demo




Thank You!


