SlideShare a Scribd company logo
There and back again.
or how to connect Oracle and Big Data.
SESSION ID
GLEB OTOCHKIN
Gleb Otochkin
Started to work with data in 1992
At Pythian since 2008
Area of expertise:
● Data Integration
● Oracle RAC
● Oracle engineered systems
● Virtualization
● Performance tuning
● Big Data
otochkin@pythian.com
@sky_vst
Principal Consultant
© The Pythian Group Inc., 2017 2
ABOUT PYTHIAN
Pythian’s 400+ IT professionals
help companies adopt and
manage disruptive technologies
to better compete
© 2016 Pythian. Confidential 3
Systems currently
managed by Pythian
EXPERIENCED
Pythian experts
in 35 countries
GLOBAL
Millennia of experience
gathered and shared
over 19 years
EXPERTS
11,800 2400
© The Pythian Group Inc., 2017 4
Journey of our data begins here.
© The Pythian Group Inc., 2017 5
AGENDA
© The Pythian Group Inc., 2017 6
• What do we call by name “Big Data”?
• Business cases.
• Real time replication from Oracle to Big Data.
• How to use Big Data in Oracle.
• What is next.
• QA
Big Data?
© The Pythian Group Inc., 2017 7
CLICK TO ADD
TITLE
Big Data ecosystem
Big Data?
Here in Texas we call it
just Data.
CLICK TO ADD
TITLE
SUBTITLE
© The Pythian Group Inc., 2017
Big Data ecosystem
VOLUME
ANALYSIS
ANY
STRUCTURE
PROCESSINGGROWS
COMPLEX
DISCOVERY
BIG
Some Big Data tools and terms.
Some terms and tools used in Big DATA:
• HDFS - Hadoop Distributed File System.
• Apache Hadoop - framework for distributed storage and processing.
• HBase - non-relational, distributed database.
• Kafka - streaming processing data platform.
• Flume - streaming, aggregation data framework.
• Cassandra - open-source distributed NoSQL database.
• Mongodb - open-source cross-platform document-oriented database.
• Hive - query and analysis data framework on top of Hadoop.
• Avro - data format widely used in BD (JSON+binary)
© The Pythian Group Inc., 2017 10
Business cases
© The Pythian Group Inc., 2017 11
Why do we need replication?
Business cases.
Business cases.
Operation activity on DB with rest on a Data Lake.
© The Pythian Group Inc., 2017 13
RDBMS
OLTP
RDBMS
OLTP
BD platform
Data
Preparation
Engine
BI &
Analysis
Business cases.
Operation activity on DB with Data in Big Data and BI on a RDBMS.
© The Pythian Group Inc., 2017 14
RDBMS
OLTP
RDBMS
OLTP
BD platform
Data
Preparation
Engine
BI &
AnalysisRDBMS
Business cases.
Operation and BI activity on DB with main Data body in a Big Data platform.
© The Pythian Group Inc., 2017 15
RDBMS
OLTP
RDBMS
OLTP
BD platform
BI &
Analysis
ODI & BD SQL
Business cases.
Operation and BI activity on DB with main Data body in a Big Data platform.
© The Pythian Group Inc., 2017 16
RDBMS
OLTP
RDBMS
OLTP
BD platforms
BI &
Analysis
Kafka KafkaKafka
RDBMS
Business cases.
Short summary
• How we connect different platforms:
•Data replication tools.
▪Oracle GoldenGate.
▪DBVisit.
▪Shareplex.
▪ELT tools
•Batch load tools.
▪Sqoop.
▪Oracle loader for Hadoop
•Data Integration tools.
▪Oracle Data Integrator.
•Presenting Big Data to RDBMS:
▪Oracle Big Data SQL.
▪Gluent.
• Reasons:
• Growing Data Volume.
• Unstructured data.
• Data retention policy.
• Cost.
© The Pythian Group Inc., 2017 17
Oracle to big data.
Real time replication OLTP data to a Big Data platform.
And what is wrong with Sqoop?
Batch offloading from Oracle to Big Data.
To HDFS.
• Sqoop one of the oldes.
• Works in both ways.
• Gluent is more than a just
batch tool.
© The Pythian Group Inc., 2017 20
SQOOP HDFS
HDFS
HDFS
HDFS
HDFS
Database
Replication to Big Data.
Sqoop import.
© The Pythian Group Inc., 2017 21
App
App
App
App Database
App
Sqoop HDFS
HDFS
HDFS
HDFS
HDFS
MapReduceJDBC
Hive
Generate MapReduce Jobs.
Connects through JDBC.
Good for batch processing.
Replication to Big Data.
Oracle Goldengate:
• Proc.
•Real time streaming.
•No or minimal impact to the source.
•Supports most of the data types.
•Different formats and conversion.
•Enterprise support
• Cons.
•Licensing fee.
•Closed source code.
© The Pythian Group Inc., 2017 22
Why Goldengate?
Replication to Big Data.
Replication to HDFS by Oracle GoldenGate.
© The Pythian Group Inc., 2017 23
App
App
App
App Database Tran
Log
App
Oracle
Goldengate
Oracle
Goldengate
HDFS
HDFS
HDFS
HDFS
HDFS
OGG Trail
Replication to Big Data.
Source side.
• Oracle 11.2.0.4 and up
• Archivelog mode.
• Supplemental logging:
•Minimal od DB level.
•On schema or table level.
• OGG user in database.
© The Pythian Group Inc., 2017 24
App
App
App
App Database Tran
Log
App
Replication to Big Data.
Oracle GoldenGate.
• OGG 12.2.0.1 and up.
• BD adapters with replicat.
• Supported BD targets:
•HDFS
•Kafka
•Flume
•HBase
•Mongodb (from 12.3)
•Cassandra (from 12.3)
• Different formats.
© The Pythian Group Inc., 2017 25
Tran
Log
Manager
OGG Trail
Extract
OGG Trail
Data Pump
Manager
Replicat
Replication to Big Data.
To HDFS.
• OGG 12.2.0.1 and up.
• BD adapters with replicat.
• Different formats.
• DML and DDL.
© The Pythian Group Inc., 2017 26
OGG Trail
Manager
Replicat
HDFS Client
HDFS
HDFS
HDFS
HDFS
HDFS
Replication to Big Data.
Test two.
• New columns:
•Operation type.
•Table name
•Local and UTC timestamp
orcl> select * from ggtest.test_tab_2;
ID RND_STR USE_DATE
1 BGBXRKJL 02/13/16 08:34:19
2 FNMCEPWE 08/17/15 04:50:18
© The Pythian Group Inc., 2017 27
hive> select * from BDTEST.TEST_TAB_2;
OK
I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:21.186000 00000000120000004759 1 BGBXRKJL
2016-02-13:08:34:19 NULL
I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:22.827000 00000000120000004921 2
FNMCEPWE 2015-08-17:04:50:18 NULL
Replication to Big Data.
To Kafka.
© The Pythian Group Inc., 2017 28
OGG Trail
Manager
Replicat
Kafka
producer
Kafka
topic
Kafka
topic
Zookeeper
Kafka
Consumer
Kafka
Consumer
Kafka
Consumer
Broker
Stream
Replication to Big Data.
Some notes about source:
• JSON,Text,Avro(different types) and
XML formats.
• Some strange behaviour with Avro.
• Hive support through text format.
• Trimming out leading or trailing
whitespaces.
• Truncates only as DML.
• Proper path to Java classes.
• We are replicating a log of changes.
• Only committed transactions.
• “Passive” commit.
• Different levels of supplemental
logging may lead to different
captured data (compressed DML).
• DDL only with following DML.
• CTAS is not supported.
© The Pythian Group Inc., 2017 29
and BD destination:
Replication to Big Data.
Some notes about Kafka
handler:
• Different topologies for Kafka.
• Topic partitioning to table.
• Topics are created automatically.
• Operation vs transactional mode.
• Blocking vs Non-Blocking.
• Version 0.9+.
• Schema definition can go to a
different topic(Avro).
© The Pythian Group Inc., 2017 30
and others:
Kafka connect and Kafka stream.
Confluent team and their additions.
• Kafka connect.
• Frame to build connector.
• https://www.confluent.io/product
/connectors/
• Main purpose is copy data to
and from Kafka.
© The Pythian Group Inc., 2017 31
• Kafka Stream.
• Incorporated to Kafka and works
together.
• https://www.confluent.io/product/kaf
ka-streams/
• Input data are transformed to output
data.
Replication to Big Data.
To Flume.
© The Pythian Group Inc., 2017 32
OGG Trail
Manager
Replicat
Flume
source
Flume
channel
HDFSFlume sink
Flume agent
Replication to Big Data.
Some notes about Flume handler:
• Configuration file path.
• Failover and load balancing (from
1.6.x for Avro).
• Kerberos security (from 1.6.x for
Thrift).
• Version 1.4+. (1.6.x better)
• Avro and Thrift formats.
• Schema changes go to a different
Flume event.
© The Pythian Group Inc., 2017 33
and others:
Replication to Big Data.
To HBase. MongoDB. Cassandra.
© The Pythian Group Inc., 2017 34
OGG Trail
Manager
Replicat
Hbase client HDFS
HBASE
Replication to Big Data.
Primary key for a source table in HBase.
orcl> alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id);
Table altered.
orcl> insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null);
© The Pythian Group Inc., 2017 35
• Having supplemental logging for all columns:
•Row id as concatenation of values for all columns.
• Adding a primary key and supplemental logging for keys:
•Row id is primary key.
Replication to Big Data.
Two cases. All columns supplemental logging and PK supplemental logging.
© The Pythian Group Inc., 2017 36
hbase(main):012:0> scan 'BDTEST:TEST_TAB_2'
ROW COLUMN+CELL
7|IJWQRO7T|2013-07-07:08:13:52 column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52
7|IJWQRO7T|2013-07-07:08:13:52 column=cf:PK_ID, timestamp=1459275116849, value=7
7|IJWQRO7T|2013-07-07:08:13:52 column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:PK_ID, timestamp=1459278884047, value=8
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1
8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER
9 column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19
9 column=cf:PK_ID, timestamp=1462473865704, value=9
9 column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST
9 column=cf:TEST_COL, timestamp=1462473865704, value=NULL
Replication to Big Data.
Some notes about HBase handler:
• Hbase-site.xml in the classpath.
• GROUPTRANSOPS batching for
performance.
• Kerberos security (from 1.6.x for
Thrift).
• Version 1.1+. (0.98 with compat
parameter)
• HBase row key mapping.
• Only single column family.
© The Pythian Group Inc., 2017 37
and others:
Replication to Big Data.
Some notes about Cassandra
handler:
• Does not allow the _id column to be
modified.
• Undo for bulk transactions (requires
full before image for deletes in OGG).
• Primary key updates are deletes and
inserts (watch compressed updates in
OGG).
• Does not automatically create
keyspaces (but creates tables)
• Primary keys are immutable.
• Inserts and Updates are different
than in traditional databases.
• Primary key updates are deletes and
inserts (watch compressed updates
in OGG).
© The Pythian Group Inc., 2017 38
and Mongodb:
Other tools.
What can be used as an alternative.
• DBvisit.
• Replicate connector to Kafka.
• Confluent version of Kafka.
• http://www.dbvisit.com/replicate
_connector_for_kafka
• Has similar functionality to
Oracle GoldenGate.
© The Pythian Group Inc., 2017 39
• Dell Shareplex.
• Replication to Kafka.
• Supports most of datatypes.
• http://documents.software.dell.com/
SharePlex/8.6.5/release-notes/syst
em-requirements
• https://documents.software.dell.com
/shareplex/8.6.4/administration-guid
e/configure-replication-to-open-targ
et-targets/configure-replication-to-a-
kafka-target
Back again
Using data from Big Data in Oracle
From Big Data to Oracle.
© The Pythian Group Inc., 2017 41
Tool Filtering Data
Conversion
Query Offload
query
Parallel
Sqoop Oracle Oracle Yes
ODBC gateway Hadoop Oracle Yes Yes
Oracle loader
for HDFS
Oracle Hadoop Yes
Oracle SQL
Connector for
HDFS
Oracle Oracle Yes Yes
BD SQL Hadoop Hadoop Yes Yes Yes
ODI KM KM Yes Yes
Gluent Hadoop Hadoop Yes Yes Yes
Oracle Data Integrator.
ODI vs traditional ETL.
Intermediate staging
and transformation
Source I
Source II
Target
Extract Transform Load
Load engine
Source I
Source II Target
Extract TransformLoad
Transform
© The Pythian Group Inc., 2017 42
Oracle Data Integrator.
How it looks.
Weblogic
Staging
schema
Oracle DB I
HDFS
Work
repository
Oracle DB II
CSV text
ODI Agent
Target
schema
Master
repository
ODI Studio
© The Pythian Group Inc., 2017 43
Oracle Data Integrator.
ODI workflow.
Data model
Project I
Designer
Project II
Data model
Data model
Data model
Logical
schema I
Logical view
Logical
schema II
Prod Batch
Context
Prod
Stream
PROD
Physical
PROD
Test
PROD
Stream
PROD
Stream
Test env
Test env
© The Pythian Group Inc., 2017 44
Oracle Data Integrator.
ODI logical mapping
• Logical mapping is
separated from
physical
implementation.
© The Pythian Group Inc., 2017 45
Oracle Data Integrator.
ODI physical mapping.
• Different physical
mapping for the
same logical map
• Different
knowledge
modules.
© The Pythian Group Inc., 2017 46
ODI
Oracle Data Integrator.
• Knowledge modules for most
platforms.
• Uses filters, mappings, joins and
constraints.
• Batch or event(stream) oriented
integration.
• ELT:
• Extract.
• Load.
• Transform.
• Logical and physical models separation.
• Knowledge modules:
• RKM - reverse engineering.
• LKM - load.
• IKM - integration.
© The Pythian Group Inc., 2017 47
Oracle Big Data SQL
Architecture.
© The Pythian Group Inc., 2017 48
Oracle Database Hadoop
CDH or HDP
management server
Hadoop
BD SQL
agent/service
BD SQL
Exadata technology for Big Data
Oracle Big Data SQL.
What it can do for us.
• Smart scan:
• Storage indexes.
• Bloom filters.
• Works with:
• Apache Hive.
• HDFS
• Oracle NoSQL Database.
• Apache HBase
• Fully supports SQL syntax.
• Predicate push down.
© The Pythian Group Inc., 2017 49
Oracle Big Data SQL.
Views and packages to use.
• Smart scan:
• Storage
indexes.
• Bloom filters.
© The Pythian Group Inc., 2017 50
orcl> select cluster_id,database_name, owner, table_name from all_hive_tables where database_name='bdtest';
CLUSTER_ID DATABASE_NAME OWNER TABLE_NAME
bigdatalite bdtest oracle test_tab_1
bigdatalite bdtest oracle test_tab_2
PROCEDURE CREATE_EXTDDL_FOR_HIVE
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
CLUSTER_ID VARCHAR2 IN
DB_NAME VARCHAR2 IN
HIVE_TABLE_NAME VARCHAR2 IN
HIVE_PARTITION PL/SQL BOOLEAN IN
TABLE_NAME VARCHAR2 IN
PERFORM_DDL PL/SQL BOOLEAN IN DEFAULT
TEXT_OF_DDL CLOB OUT
Oracle Big Data SQL.
Building the external table.
• Hive table can be queried in
Oracle now.
© The Pythian Group Inc., 2017 51
CREATE TABLE test_tab_2 ( tran_flag VARCHAR2(4000), tab_name VARCHAR2(4000), ………..)
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ACCESS PARAMETERS
( com.oracle.bigdata.cluster=bigdatalite
com.oracle.bigdata.tablename=bdtest.test_tab_2) ) PARALLEL 2 REJECT LIMIT UNLIMITED
TRAN_FLAG ID RND_STR USE_DATE
---------- ---------- ---------- --------------------
I 1 BGBXRKJL 2016-02-13:08:34:19
I 2 FNMCEPWE 2015-08-17:04:50:18
I 1 BGBXRKJL 2016-02-13:08:34:19
Oracle Big Data SQL.
What it can do for us.
• External table FULL access.
© 2016 Pythian. Confidential 52
Oracle Big Data SQL.
Options for BD SQL.
• Oracle Copy to Hadoop utility.
•Using Data Pump format.
•ORACLE_DATAPUMP access driver.
•CTAS to external table.
•hadoop fs -put …
•Hive “INPUTFORMAT
'oracle.hadoop.hive.datapump.DPInputFormat'”
•Oracle Shell for Hadoop Loaders
• ACCESS PARAMETERS :
• com.oracle.bigdata.overflow
• com.oracle.bigdata.tablename
• com.oracle.bigdata.colmap
• com.oracle.bigdata.erroropt
• Directly to HDFS :
• TYPE oracle_hdfs.
© The Pythian Group Inc., 2017 53
Other tools.
What can be used as an alternative.
• Gluent Data Platform.
• Can offload data to Hadoop.
• Allows query data of any data
source in Hadoop.
• Offload engine .
• Supports Cloudera, Hortonworks or
any Hadoop with Impala or or Hive
SQL engine installed.
https://gluent.com/
© The Pythian Group Inc., 2017 54
And here we have returned back.
© The Pythian Group Inc., 2017 55
QA
© The Pythian Group Inc., 2017 56
THANK YOU
Email: otochkin@pythian.com
https://pythian.com/blog
@sky_vst
© The Pythian Group Inc., 2017 57

More Related Content

What's hot

Сергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL databaseСергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL databaseVolha Banadyseva
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
 
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataGruter
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178Kai Sasaki
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance DataWorks Summit/Hadoop Summit
 
Oracle GoldenGate for MySQL Overview
Oracle GoldenGate for MySQL OverviewOracle GoldenGate for MySQL Overview
Oracle GoldenGate for MySQL OverviewJinyu Wang
 
NewSQL Database Overview
NewSQL Database OverviewNewSQL Database Overview
NewSQL Database OverviewSteve Min
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanMichael Stack
 
Power JSON with PostgreSQL
Power JSON with PostgreSQLPower JSON with PostgreSQL
Power JSON with PostgreSQLEDB
 
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkDvir Volk
 
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin ChengPinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin ChengCeph Community
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightHBaseCon
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
 

What's hot (20)

Сергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL databaseСергей Сверчков и Виталий Руденя. Choosing a NoSQL database
Сергей Сверчков и Виталий Руденя. Choosing a NoSQL database
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
 
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
Oracle GoldenGate for MySQL Overview
Oracle GoldenGate for MySQL OverviewOracle GoldenGate for MySQL Overview
Oracle GoldenGate for MySQL Overview
 
NewSQL Database Overview
NewSQL Database OverviewNewSQL Database Overview
NewSQL Database Overview
 
SQL on Azure
SQL on AzureSQL on Azure
SQL on Azure
 
HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
 
Power JSON with PostgreSQL
Power JSON with PostgreSQLPower JSON with PostgreSQL
Power JSON with PostgreSQL
 
HBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to ContributeHBaseCon 2015: State of HBase Docs and How to Contribute
HBaseCon 2015: State of HBase Docs and How to Contribute
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
 
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin ChengPinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng
Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 

Similar to [db tech showcase Tokyo 2017] C13:There and back again or how to connect Oracle and Big Data by The Pythian Group Inc. - Gleb Otochkin

[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...
[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...
[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...Insight Technology, Inc.
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolEDB
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...Amazon Web Services
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutionssolarisyougood
 
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018Gleb Otochkin
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...tdc-globalcode
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data IntegrationJeffrey T. Pollock
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
EDB: Power to Postgres
EDB: Power to PostgresEDB: Power to Postgres
EDB: Power to PostgresAshnikbiz
 
Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaEdelweiss Kammermann
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source DatabaseAll Things Open
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolEDB
 

Similar to [db tech showcase Tokyo 2017] C13:There and back again or how to connect Oracle and Big Data by The Pythian Group Inc. - Gleb Otochkin (20)

[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...
[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...
[db tech showcase Tokyo 2017] C24:Taking off to the clouds. How to use DMS in...
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
Novinky v Oracle Database 18c
Novinky v Oracle Database 18cNovinky v Oracle Database 18c
Novinky v Oracle Database 18c
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
TDC2017 | POA Trilha BigData - IBM BigSQL - Engine de consulta de dados de al...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Virtualization and Containers
Virtualization and ContainersVirtualization and Containers
Virtualization and Containers
 
Meetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management TrendsMeetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management Trends
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
EDB: Power to Postgres
EDB: Power to PostgresEDB: Power to Postgres
EDB: Power to Postgres
 
Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and Kafka
 
Choosing the Right Open Source Database
Choosing the Right Open Source DatabaseChoosing the Right Open Source Database
Choosing the Right Open Source Database
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
 

More from Insight Technology, Inc.

グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?Insight Technology, Inc.
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Insight Technology, Inc.
 
事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明するInsight Technology, Inc.
 
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーンInsight Technology, Inc.
 
MBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとMBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとInsight Technology, Inc.
 
グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?Insight Technology, Inc.
 
DBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームDBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームInsight Technology, Inc.
 
SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門Insight Technology, Inc.
 
db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 Insight Technology, Inc.
 
db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也Insight Technology, Inc.
 
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー Insight Technology, Inc.
 
難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?Insight Technology, Inc.
 
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Insight Technology, Inc.
 
そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?Insight Technology, Inc.
 
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...Insight Technology, Inc.
 
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 Insight Technology, Inc.
 
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Insight Technology, Inc.
 
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]Insight Technology, Inc.
 

More from Insight Technology, Inc. (20)

グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?
 
Docker and the Oracle Database
Docker and the Oracle DatabaseDocker and the Oracle Database
Docker and the Oracle Database
 
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
Great performance at scale~次期PostgreSQL12のパーティショニング性能の実力に迫る~
 
事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する事例を通じて機械学習とは何かを説明する
事例を通じて機械学習とは何かを説明する
 
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
仮想通貨ウォレットアプリで理解するデータストアとしてのブロックチェーン
 
MBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごとMBAAで覚えるDBREの大事なおしごと
MBAAで覚えるDBREの大事なおしごと
 
グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?グラフデータベースは如何に自然言語を理解するか?
グラフデータベースは如何に自然言語を理解するか?
 
DBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォームDBREから始めるデータベースプラットフォーム
DBREから始めるデータベースプラットフォーム
 
SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門SQL Server エンジニアのためのコンテナ入門
SQL Server エンジニアのためのコンテナ入門
 
Lunch & Learn, AWS NoSQL Services
Lunch & Learn, AWS NoSQL ServicesLunch & Learn, AWS NoSQL Services
Lunch & Learn, AWS NoSQL Services
 
db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉 db tech showcase2019オープニングセッション @ 森田 俊哉
db tech showcase2019オープニングセッション @ 森田 俊哉
 
db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也db tech showcase2019 オープニングセッション @ 石川 雅也
db tech showcase2019 オープニングセッション @ 石川 雅也
 
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
db tech showcase2019 オープニングセッション @ マイナー・アレン・パーカー
 
難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?難しいアプリケーション移行、手軽に試してみませんか?
難しいアプリケーション移行、手軽に試してみませんか?
 
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
Attunityのソリューションと異種データベース・クラウド移行事例のご紹介
 
そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?そのデータベース、クラウドで使ってみませんか?
そのデータベース、クラウドで使ってみませんか?
 
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
コモディティサーバー3台で作る高速処理 “ハイパー・コンバージド・データベース・インフラストラクチャー(HCDI)” システム『Insight Qube』...
 
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。 複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
複数DBのバックアップ・切り戻し運用手順が異なって大変?!運用性の大幅改善、その先に。。
 
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
Attunity社のソリューションの日本国内外適用事例及びロードマップ紹介[ATTUNITY & インサイトテクノロジー IoT / Big Data フ...
 
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
レガシーに埋もれたデータをリアルタイムでクラウドへ [ATTUNITY & インサイトテクノロジー IoT / Big Data フォーラム 2018]
 

Recently uploaded

Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 

Recently uploaded (20)

Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 

[db tech showcase Tokyo 2017] C13:There and back again or how to connect Oracle and Big Data by The Pythian Group Inc. - Gleb Otochkin

  • 1. There and back again. or how to connect Oracle and Big Data. SESSION ID GLEB OTOCHKIN
  • 2. Gleb Otochkin Started to work with data in 1992 At Pythian since 2008 Area of expertise: ● Data Integration ● Oracle RAC ● Oracle engineered systems ● Virtualization ● Performance tuning ● Big Data otochkin@pythian.com @sky_vst Principal Consultant © The Pythian Group Inc., 2017 2
  • 3. ABOUT PYTHIAN Pythian’s 400+ IT professionals help companies adopt and manage disruptive technologies to better compete © 2016 Pythian. Confidential 3
  • 4. Systems currently managed by Pythian EXPERIENCED Pythian experts in 35 countries GLOBAL Millennia of experience gathered and shared over 19 years EXPERTS 11,800 2400 © The Pythian Group Inc., 2017 4
  • 5. Journey of our data begins here. © The Pythian Group Inc., 2017 5
  • 6. AGENDA © The Pythian Group Inc., 2017 6 • What do we call by name “Big Data”? • Business cases. • Real time replication from Oracle to Big Data. • How to use Big Data in Oracle. • What is next. • QA
  • 7. Big Data? © The Pythian Group Inc., 2017 7
  • 8. CLICK TO ADD TITLE Big Data ecosystem Big Data? Here in Texas we call it just Data.
  • 9. CLICK TO ADD TITLE SUBTITLE © The Pythian Group Inc., 2017 Big Data ecosystem VOLUME ANALYSIS ANY STRUCTURE PROCESSINGGROWS COMPLEX DISCOVERY BIG
  • 10. Some Big Data tools and terms. Some terms and tools used in Big DATA: • HDFS - Hadoop Distributed File System. • Apache Hadoop - framework for distributed storage and processing. • HBase - non-relational, distributed database. • Kafka - streaming processing data platform. • Flume - streaming, aggregation data framework. • Cassandra - open-source distributed NoSQL database. • Mongodb - open-source cross-platform document-oriented database. • Hive - query and analysis data framework on top of Hadoop. • Avro - data format widely used in BD (JSON+binary) © The Pythian Group Inc., 2017 10
  • 11. Business cases © The Pythian Group Inc., 2017 11
  • 12. Why do we need replication? Business cases.
  • 13. Business cases. Operation activity on DB with rest on a Data Lake. © The Pythian Group Inc., 2017 13 RDBMS OLTP RDBMS OLTP BD platform Data Preparation Engine BI & Analysis
  • 14. Business cases. Operation activity on DB with Data in Big Data and BI on a RDBMS. © The Pythian Group Inc., 2017 14 RDBMS OLTP RDBMS OLTP BD platform Data Preparation Engine BI & AnalysisRDBMS
  • 15. Business cases. Operation and BI activity on DB with main Data body in a Big Data platform. © The Pythian Group Inc., 2017 15 RDBMS OLTP RDBMS OLTP BD platform BI & Analysis ODI & BD SQL
  • 16. Business cases. Operation and BI activity on DB with main Data body in a Big Data platform. © The Pythian Group Inc., 2017 16 RDBMS OLTP RDBMS OLTP BD platforms BI & Analysis Kafka KafkaKafka RDBMS
  • 17. Business cases. Short summary • How we connect different platforms: •Data replication tools. ▪Oracle GoldenGate. ▪DBVisit. ▪Shareplex. ▪ELT tools •Batch load tools. ▪Sqoop. ▪Oracle loader for Hadoop •Data Integration tools. ▪Oracle Data Integrator. •Presenting Big Data to RDBMS: ▪Oracle Big Data SQL. ▪Gluent. • Reasons: • Growing Data Volume. • Unstructured data. • Data retention policy. • Cost. © The Pythian Group Inc., 2017 17
  • 18. Oracle to big data. Real time replication OLTP data to a Big Data platform.
  • 19. And what is wrong with Sqoop?
  • 20. Batch offloading from Oracle to Big Data. To HDFS. • Sqoop one of the oldes. • Works in both ways. • Gluent is more than a just batch tool. © The Pythian Group Inc., 2017 20 SQOOP HDFS HDFS HDFS HDFS HDFS Database
  • 21. Replication to Big Data. Sqoop import. © The Pythian Group Inc., 2017 21 App App App App Database App Sqoop HDFS HDFS HDFS HDFS HDFS MapReduceJDBC Hive Generate MapReduce Jobs. Connects through JDBC. Good for batch processing.
  • 22. Replication to Big Data. Oracle Goldengate: • Proc. •Real time streaming. •No or minimal impact to the source. •Supports most of the data types. •Different formats and conversion. •Enterprise support • Cons. •Licensing fee. •Closed source code. © The Pythian Group Inc., 2017 22 Why Goldengate?
  • 23. Replication to Big Data. Replication to HDFS by Oracle GoldenGate. © The Pythian Group Inc., 2017 23 App App App App Database Tran Log App Oracle Goldengate Oracle Goldengate HDFS HDFS HDFS HDFS HDFS OGG Trail
  • 24. Replication to Big Data. Source side. • Oracle 11.2.0.4 and up • Archivelog mode. • Supplemental logging: •Minimal od DB level. •On schema or table level. • OGG user in database. © The Pythian Group Inc., 2017 24 App App App App Database Tran Log App
  • 25. Replication to Big Data. Oracle GoldenGate. • OGG 12.2.0.1 and up. • BD adapters with replicat. • Supported BD targets: •HDFS •Kafka •Flume •HBase •Mongodb (from 12.3) •Cassandra (from 12.3) • Different formats. © The Pythian Group Inc., 2017 25 Tran Log Manager OGG Trail Extract OGG Trail Data Pump Manager Replicat
  • 26. Replication to Big Data. To HDFS. • OGG 12.2.0.1 and up. • BD adapters with replicat. • Different formats. • DML and DDL. © The Pythian Group Inc., 2017 26 OGG Trail Manager Replicat HDFS Client HDFS HDFS HDFS HDFS HDFS
  • 27. Replication to Big Data. Test two. • New columns: •Operation type. •Table name •Local and UTC timestamp orcl> select * from ggtest.test_tab_2; ID RND_STR USE_DATE 1 BGBXRKJL 02/13/16 08:34:19 2 FNMCEPWE 08/17/15 04:50:18 © The Pythian Group Inc., 2017 27 hive> select * from BDTEST.TEST_TAB_2; OK I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:21.186000 00000000120000004759 1 BGBXRKJL 2016-02-13:08:34:19 NULL I BDTEST.TEST_TAB_2 2016-10-25 01:09:16.000168 2016-10-24T21:09:22.827000 00000000120000004921 2 FNMCEPWE 2015-08-17:04:50:18 NULL
  • 28. Replication to Big Data. To Kafka. © The Pythian Group Inc., 2017 28 OGG Trail Manager Replicat Kafka producer Kafka topic Kafka topic Zookeeper Kafka Consumer Kafka Consumer Kafka Consumer Broker Stream
  • 29. Replication to Big Data. Some notes about source: • JSON,Text,Avro(different types) and XML formats. • Some strange behaviour with Avro. • Hive support through text format. • Trimming out leading or trailing whitespaces. • Truncates only as DML. • Proper path to Java classes. • We are replicating a log of changes. • Only committed transactions. • “Passive” commit. • Different levels of supplemental logging may lead to different captured data (compressed DML). • DDL only with following DML. • CTAS is not supported. © The Pythian Group Inc., 2017 29 and BD destination:
  • 30. Replication to Big Data. Some notes about Kafka handler: • Different topologies for Kafka. • Topic partitioning to table. • Topics are created automatically. • Operation vs transactional mode. • Blocking vs Non-Blocking. • Version 0.9+. • Schema definition can go to a different topic(Avro). © The Pythian Group Inc., 2017 30 and others:
  • 31. Kafka connect and Kafka stream. Confluent team and their additions. • Kafka connect. • Frame to build connector. • https://www.confluent.io/product /connectors/ • Main purpose is copy data to and from Kafka. © The Pythian Group Inc., 2017 31 • Kafka Stream. • Incorporated to Kafka and works together. • https://www.confluent.io/product/kaf ka-streams/ • Input data are transformed to output data.
  • 32. Replication to Big Data. To Flume. © The Pythian Group Inc., 2017 32 OGG Trail Manager Replicat Flume source Flume channel HDFSFlume sink Flume agent
  • 33. Replication to Big Data. Some notes about Flume handler: • Configuration file path. • Failover and load balancing (from 1.6.x for Avro). • Kerberos security (from 1.6.x for Thrift). • Version 1.4+. (1.6.x better) • Avro and Thrift formats. • Schema changes go to a different Flume event. © The Pythian Group Inc., 2017 33 and others:
  • 34. Replication to Big Data. To HBase. MongoDB. Cassandra. © The Pythian Group Inc., 2017 34 OGG Trail Manager Replicat Hbase client HDFS HBASE
  • 35. Replication to Big Data. Primary key for a source table in HBase. orcl> alter table ggtest.test_tab_2 add constraint pk_test_tab_2 primary key (pk_id); Table altered. orcl> insert into ggtest.test_tab_2 values(9,'PK_TEST',sysdate,null); © The Pythian Group Inc., 2017 35 • Having supplemental logging for all columns: •Row id as concatenation of values for all columns. • Adding a primary key and supplemental logging for keys: •Row id is primary key.
  • 36. Replication to Big Data. Two cases. All columns supplemental logging and PK supplemental logging. © The Pythian Group Inc., 2017 36 hbase(main):012:0> scan 'BDTEST:TEST_TAB_2' ROW COLUMN+CELL 7|IJWQRO7T|2013-07-07:08:13:52 column=cf:ACC_DATE, timestamp=1459275116849, value=2013-07-07:08:13:52 7|IJWQRO7T|2013-07-07:08:13:52 column=cf:PK_ID, timestamp=1459275116849, value=7 7|IJWQRO7T|2013-07-07:08:13:52 column=cf:RND_STR_1, timestamp=1459275116849, value=IJWQRO7T 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:ACC_DATE, timestamp=1459278884047, value=2016-03-29:15:14:37 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:PK_ID, timestamp=1459278884047, value=8 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:RND_STR_1, timestamp=1459278884047, value=TEST_INS1 8|TEST_INS1|2016-03-29:15:14:37|TEST_ALTER column=cf:TEST_COL, timestamp=1459278884047, value=TEST_ALTER 9 column=cf:ACC_DATE, timestamp=1462473865704, value=2016-05-05:14:44:19 9 column=cf:PK_ID, timestamp=1462473865704, value=9 9 column=cf:RND_STR_1, timestamp=1462473865704, value=PK_TEST 9 column=cf:TEST_COL, timestamp=1462473865704, value=NULL
  • 37. Replication to Big Data. Some notes about HBase handler: • Hbase-site.xml in the classpath. • GROUPTRANSOPS batching for performance. • Kerberos security (from 1.6.x for Thrift). • Version 1.1+. (0.98 with compat parameter) • HBase row key mapping. • Only single column family. © The Pythian Group Inc., 2017 37 and others:
  • 38. Replication to Big Data. Some notes about Cassandra handler: • Does not allow the _id column to be modified. • Undo for bulk transactions (requires full before image for deletes in OGG). • Primary key updates are deletes and inserts (watch compressed updates in OGG). • Does not automatically create keyspaces (but creates tables) • Primary keys are immutable. • Inserts and Updates are different than in traditional databases. • Primary key updates are deletes and inserts (watch compressed updates in OGG). © The Pythian Group Inc., 2017 38 and Mongodb:
  • 39. Other tools. What can be used as an alternative. • DBvisit. • Replicate connector to Kafka. • Confluent version of Kafka. • http://www.dbvisit.com/replicate _connector_for_kafka • Has similar functionality to Oracle GoldenGate. © The Pythian Group Inc., 2017 39 • Dell Shareplex. • Replication to Kafka. • Supports most of datatypes. • http://documents.software.dell.com/ SharePlex/8.6.5/release-notes/syst em-requirements • https://documents.software.dell.com /shareplex/8.6.4/administration-guid e/configure-replication-to-open-targ et-targets/configure-replication-to-a- kafka-target
  • 40. Back again Using data from Big Data in Oracle
  • 41. From Big Data to Oracle. © The Pythian Group Inc., 2017 41 Tool Filtering Data Conversion Query Offload query Parallel Sqoop Oracle Oracle Yes ODBC gateway Hadoop Oracle Yes Yes Oracle loader for HDFS Oracle Hadoop Yes Oracle SQL Connector for HDFS Oracle Oracle Yes Yes BD SQL Hadoop Hadoop Yes Yes Yes ODI KM KM Yes Yes Gluent Hadoop Hadoop Yes Yes Yes
  • 42. Oracle Data Integrator. ODI vs traditional ETL. Intermediate staging and transformation Source I Source II Target Extract Transform Load Load engine Source I Source II Target Extract TransformLoad Transform © The Pythian Group Inc., 2017 42
  • 43. Oracle Data Integrator. How it looks. Weblogic Staging schema Oracle DB I HDFS Work repository Oracle DB II CSV text ODI Agent Target schema Master repository ODI Studio © The Pythian Group Inc., 2017 43
  • 44. Oracle Data Integrator. ODI workflow. Data model Project I Designer Project II Data model Data model Data model Logical schema I Logical view Logical schema II Prod Batch Context Prod Stream PROD Physical PROD Test PROD Stream PROD Stream Test env Test env © The Pythian Group Inc., 2017 44
  • 45. Oracle Data Integrator. ODI logical mapping • Logical mapping is separated from physical implementation. © The Pythian Group Inc., 2017 45
  • 46. Oracle Data Integrator. ODI physical mapping. • Different physical mapping for the same logical map • Different knowledge modules. © The Pythian Group Inc., 2017 46
  • 47. ODI Oracle Data Integrator. • Knowledge modules for most platforms. • Uses filters, mappings, joins and constraints. • Batch or event(stream) oriented integration. • ELT: • Extract. • Load. • Transform. • Logical and physical models separation. • Knowledge modules: • RKM - reverse engineering. • LKM - load. • IKM - integration. © The Pythian Group Inc., 2017 47
  • 48. Oracle Big Data SQL Architecture. © The Pythian Group Inc., 2017 48 Oracle Database Hadoop CDH or HDP management server Hadoop BD SQL agent/service BD SQL Exadata technology for Big Data
  • 49. Oracle Big Data SQL. What it can do for us. • Smart scan: • Storage indexes. • Bloom filters. • Works with: • Apache Hive. • HDFS • Oracle NoSQL Database. • Apache HBase • Fully supports SQL syntax. • Predicate push down. © The Pythian Group Inc., 2017 49
  • 50. Oracle Big Data SQL. Views and packages to use. • Smart scan: • Storage indexes. • Bloom filters. © The Pythian Group Inc., 2017 50 orcl> select cluster_id,database_name, owner, table_name from all_hive_tables where database_name='bdtest'; CLUSTER_ID DATABASE_NAME OWNER TABLE_NAME bigdatalite bdtest oracle test_tab_1 bigdatalite bdtest oracle test_tab_2 PROCEDURE CREATE_EXTDDL_FOR_HIVE Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- CLUSTER_ID VARCHAR2 IN DB_NAME VARCHAR2 IN HIVE_TABLE_NAME VARCHAR2 IN HIVE_PARTITION PL/SQL BOOLEAN IN TABLE_NAME VARCHAR2 IN PERFORM_DDL PL/SQL BOOLEAN IN DEFAULT TEXT_OF_DDL CLOB OUT
  • 51. Oracle Big Data SQL. Building the external table. • Hive table can be queried in Oracle now. © The Pythian Group Inc., 2017 51 CREATE TABLE test_tab_2 ( tran_flag VARCHAR2(4000), tab_name VARCHAR2(4000), ………..) ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ACCESS PARAMETERS ( com.oracle.bigdata.cluster=bigdatalite com.oracle.bigdata.tablename=bdtest.test_tab_2) ) PARALLEL 2 REJECT LIMIT UNLIMITED TRAN_FLAG ID RND_STR USE_DATE ---------- ---------- ---------- -------------------- I 1 BGBXRKJL 2016-02-13:08:34:19 I 2 FNMCEPWE 2015-08-17:04:50:18 I 1 BGBXRKJL 2016-02-13:08:34:19
  • 52. Oracle Big Data SQL. What it can do for us. • External table FULL access. © 2016 Pythian. Confidential 52
  • 53. Oracle Big Data SQL. Options for BD SQL. • Oracle Copy to Hadoop utility. •Using Data Pump format. •ORACLE_DATAPUMP access driver. •CTAS to external table. •hadoop fs -put … •Hive “INPUTFORMAT 'oracle.hadoop.hive.datapump.DPInputFormat'” •Oracle Shell for Hadoop Loaders • ACCESS PARAMETERS : • com.oracle.bigdata.overflow • com.oracle.bigdata.tablename • com.oracle.bigdata.colmap • com.oracle.bigdata.erroropt • Directly to HDFS : • TYPE oracle_hdfs. © The Pythian Group Inc., 2017 53
  • 54. Other tools. What can be used as an alternative. • Gluent Data Platform. • Can offload data to Hadoop. • Allows query data of any data source in Hadoop. • Offload engine . • Supports Cloudera, Hortonworks or any Hadoop with Impala or or Hive SQL engine installed. https://gluent.com/ © The Pythian Group Inc., 2017 54
  • 55. And here we have returned back. © The Pythian Group Inc., 2017 55
  • 56. QA © The Pythian Group Inc., 2017 56