SlideShare a Scribd company logo
1 of 59
Download to read offline
1© The Pythian Group Inc., 2018
DATE
Oracle GoldenGate for Big Data - deep dive.
One bridge to connect them all
Gleb Otochkin
Started to work with data in 1992
At Pythian since 2008
Area of expertise:
● Data Integration
● Oracle RAC
● Oracle engineered systems
● Virtualization
● Performance tuning
● Big Data
otochkin@pythian.com
@sky_vst
Principal Consultant
© 2016 Pythian. Confidential 2
© 2017 Pythian. Confidential 3
Years in Business
20 Pythian Experts
in 35 Countries
400+ Current Clients
Globally
350+
4© The Pythian Group Inc., 2018
AGENDA
4
● Big Data introduction.
● Why we need to stream data.
● What we are streaming.
● Replicat.
● Different targets.
● Cloud.
● QA.
5© The Pythian Group Inc., 2018
Big Data intro.
Why and what.
6© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
CLICK TO ADD
TITLE
Big Data intro.
Big Data?
In Texas we call it just data.
7© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Something about data.
Why do we talk about it and where it is coming from.
7
Col A Col B Col C Col A
Col ACol A Col B
Col A Col B Col A
Col ACol A Col D
Col A Col A Col B
Col ACol A Col B Col C
ROW format
8© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Something about data.
Why do we talk about it and where it is coming from.
8
Col A Col B Col C Col A
Col ACol A Col B Col C
Col A Col B Col A
Col ACol A Col D
Col A Col B Col A
Col ACol A Col B
Columnar format
9© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Something about data and how we store it.
Why do we talk about it and where it is coming from.
9
Col A Col B Col C Col A
Col ACol A Col B
Col A Col B Col A Col B1
Col ACol A Col D
Col B2
Col A Col A Col B
Col ACol A Col B Col C1
Unstructured
ColC2
10© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Something about data and how we store it.
Why do we talk about it and where it is coming from.
10
Block Block Block Block
Block Block Block Block
Block Block Block Block
Block Block Block Block
Buffer - Data Processing Processed data
Easy for a table of several MB or GB.
What about TB or PB ?
11© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Big Data intro.
Terms and tools for Big Data:
• HDFS - Hadoop Distributed File System.
• Apache Hadoop - framework for distributed storage and processing.
• HBase - non-relational, distributed database.
• Kafka - streaming processing data platform.
• Cassandra - open-source distributed NoSQL database.
• Mongodb - open-source cross-platform document-oriented database.
• Hive - query and analysis data framework on top of Hadoop.
• Avro,Parquet,ORC - data formats widely used in BD
11
12© The Pythian Group Inc., 2018
Why we need to stream data.
Topology and data layout.
13© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Business cases.
Operation activity on DB with rest on a Data Lake.
13
RDBMS
OLTP
RDBMS
OLTP
BD platform
Data
Preparation
Engine
BI &
Analysis
14© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Business cases.
Operation activity on DB with Data in Big Data and BI on a RDBMS.
14
RDBMS
OLTP
RDBMS
OLTP
BD platform
Data
Preparation
Engine
BI &
AnalysisRDBMS
15© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Business cases.
Operation and BI activity on DB with main Data body in a Big Data platform.
15
RDBMS
OLTP
RDBMS
OLTP
BD platforms
BI &
Analysis
OGG KafkaKafka
RDBMS
16© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Business cases.
Operation and BI activity on DB with main Data body in a Big Data platform.
16
RDBMS
OLTP
RDBMS
OLTP
BD platforms
BI &
Analysis
OGG KafkaOGG
Stream
processing
17© The Pythian Group Inc., 2018
What we are streaming.
Source site - extract.
18© 2018 Pythian. Confidential
Extract
Source
Data Pump
Trail files
Manager
GoldenGate architecture.
Redo logs
19© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 19
Supplemental logging.
● No supplemental logging on DB or table level.
UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4;
update "UNKNOWN"."OBJ# 103768"
set
"COL 1" = HEXTORAW('76616C3134')
where
"COL 1" = HEXTORAW('76616C34')
ROWID = 'AAAZVYAAGAAAADfAAE';
20© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 20
Supplemental logging.
● Min supplemental logging on DB level only.
UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4;
update "SCOTT"."TEST_TAB_1"
set
"STR1" = 'val14'
where
"STR1" = 'val4' and
ROWID = 'AAAZVYAAGAAAADfAAE';
21© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 21
Supplemental logging.
● Min supplemental logging on DB level and supplemental logging
for primary keys:
UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4;
update "SCOTT"."TEST_TAB_1"
set
"STR1" = 'val14'
where
"ID" = 4 and
"STR1" = 'val4' and
ROWID = 'AAAZVYAAGAAAADfAAE';
22© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 22
Metadata changes and DDL replication.
● The metadata changes are captured by default to the trail.
● They will be applied with the following DML (insert or update).
● Add “GETTRUNCATES” clause to parameters for extract and
replicat to replicate truncate operations..
● “DDL include” doesn’t work with “GETTRUNCATES”.
To work properly with truncate we need to set them up as a DML
operations in the GoldenGate.
23© 2018 Pythian. Confidential
Source Target
Data Pump
Manager
Collector
Manager
Trail
Extract Replicat
Trail
GoldenGate architecture.
24© The Pythian Group Inc., 2018
Replicat.
How it works.
25© 2018 Pythian. Confidential
Target
Collector
Manager
Replicat
Trail
Replicat.
26© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 26
Trail file.
● File Header.
Trail file
File header
Logdump 98 >open /u01/ogg_ora121/var/lib/data/od000000000
Current LogTrail is /u01/ogg_ora121/var/lib/data/od000000000
Logdump 99 >ghdr on
Logdump 100 >detail on
Logdump 101 >detail data
Logdump 102 >usertoken on
Logdump 103 >n
2017/11/27 15:59:57.582.051 FileHeader Len 1409 RBA 0
Name: *FileHeader*
3000 030b 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
0006 3200 0004 2000 0000 3300 0008 02f2 8530 a414 | ..2... ...3......0..
bee3 3400 0002 0000 3500 0036 3500 0032 0030 7572 | ..4.....5..65..2.0ur
693a 6269 6764 6174 616c 6974 653a 6c6f 6361 6c64 | i:bigdatalite:locald
6f6d 6169 6e3a 3a75 3031 3a6f 6767 6d73 3a62 696e | omain::u01:oggms:bin
3a45 5854 3031 3600 000d 000b 6f64 3030 3030 3030 | :EXT016.....od000000
3030 3037 0000 0101 3800 0004 0000 0000 3900 0008 | 0007....8.......9...
27© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 27
Trail file.
● Record Header Area.
Trail file
File header
Record Header Area
Logdump 104 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x00)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 91 (x005b) IO Time : 2017/11/27 10:58:08.788.835
IOType : 170 (xaa) OrigNode : 1 (x01)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
DDR/TDR Idx: (001, 000) AuditPos : 26645008
Continued : N (x00) RecCount : 1 (x01)
2017/11/27 10:58:08.788.835 Metadata Len 91 RBA 1417
Name: ORCL
*
DDR Version: 1
Database type: ORACLE
Character set ID: UTF-8
National character set ID: UTF-16
Locale: neutral
Case sensitivity: 14 14 14 14 14 14 14 14 14 14 14 14 11 14 14 14
TimeZone: GMT
Global name: ORCL.LOCALDOMAIN
*
28© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 28
Trail file.
● Metadata.
Trail file
File header
Record Header Area
Metadata
Logdump 105 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : @ (x40)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 1558 (x0616) IO Time : 2017/11/27 15:59:57.588.540
IOType : 170 (xaa) OrigNode : 2 (x02)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
DDR/TDR Idx: (001, 001) AuditPos : 26645008
Continued : N (x00) RecCount : 1 (x01)
2017/11/27 15:59:57.588.540 Metadata Len 1558 RBA 1563
Name: ORCL.C##OGG.GG_HEARTBEAT_SEED
*
1)Name 2)Data Type 3)External Length 4)Fetch Offset 5)Scale
…..
*
TDR version: 11
Definition for table ORCL.C##OGG.GG_HEARTBEAT_SEED
Record Length: 9872
Columns: 17
LOCAL_DATABASE 64 512 0 0 0 1 0 512 512 0 0 0 0 0 1
0 1 0 1 0 0 0 0
29© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 29
Trail file.
● Record.
Trail file
File header
Record Header Area
Metadata
Record
Logdump 106 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : L (x4c)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 854 (x0356) IO Time : 2017/11/27 10:56:20.000.887
IOType : 135 (x87) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 2689 AuditPos : 26645008
Continued : N (x00) RecCount : 1 (x01)
2017/11/27 10:56:20.000.887 GGSUnifiedPKUpdate Len 854 RBA 3201
Name: ORCL.C##OGG.GG_HEARTBEAT_SEED (TDR Index: 1)
After Image: Partition 76 G s
a901 0000 0000 0700 0000 0300 4344 4201 001f 0000 | ............CDB.....
0032 3031 372d 3131 2d32 373a 3135 3a35 353a 3230 | .2017-11-27:15:55:20
2e31 3136 3037 3730 3030 0200 0400 ffff 0000 0300 | .116077000..........
0400 ffff 0000 0400 0400 ffff 0000 0500 0400 ffff | ....................
0000 0600 1f00 ffff 0000 0000 0000 0000 0000 0000 | ....................
0000 0000 0000 0000 0000 0000 0000 0000 0007 001f | ....................
00ff ff00 0000 0000 0000 0000 0000 0000 0000 0000 | ....................
...
30© 2018 Pythian. Confidential
BD Target
Collector
Manager Replicat
Trail
Replicat standard architecture.
Property file
Parameter file
31© 2018 Pythian. Confidential
BD Target
Collector
Manager
Replicat
Trail
Replicat with staging file.
Event Handler
Staging file
32© The Pythian Group Inc., 2018
Targets.
Different handlers.
33© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 33
Handlers.
● List of target handlers.
● BigQuery Handler.
● Cassandra Handler.
● Elasticsearch Handler.
● File Writer Handler.
■ HDFS Event Handler.
■ ORC Event Handler.
■ Oracle Cloud Event Handler.
■ Parquet Event Handler.
■ S3 Event Handler.
● Flume Handler.
● HBase Event Handler.
● HDFS Handler.
● JDBC Handler.
● List of target handlers (continued).
● Kafka.
● Kafka connect.
● Kafka REST Proxy Handler.
● Kinesis Streams Handler.
● MongoDB Handler.
● Oracle NoSQL Handler.
● Microsoft Azure Data Lake.
● Elasticsearch Handler.
● File Writer Handler.
● Source handler.
● Oracle GoldenGate Capture for
Cassandra.
34© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 34
HDFS handler.
● OGG for BD + hadoop client.
RHDFS
replicat process
./dirrpt/RHDFS.rpt
./dirrpt/RHDFS.log
./dirpcs/RHDFS.pcr
./dirchk/RHDFS.cpr
Trail file:
./dirdat/or*
./dirrpt/RHDFS.dsc
Property file:
./dirprm/hdfs.prop
Parameter file:
./dirprm/rhdfs.prm
-- Report file
-- Log file
-- Process file
-- Checkpoint file
-- Discard file
Hadoop client:
*.jar + conf files
To HDFS
35© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 35
HDFS handler.
● HDFS properties file.
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.pathMappingTemplate=/user/oracle/ogghdfs
gg.handler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${groupName
}_${currentTimestamp}.txt
gg.handler.hdfs.format=delimitedtext
gg.handler.hdfs.mode=tx
gg.log.level=INFO
gg.report.time=30sec
gg.classpath=/usr/lib/hadoop/client/*:/usr/lib/hadoop/etc/hadoop/:
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
36© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 36
HDFS handler.
● HDFS replicat parameter file.
REPLICAT rhdfs
-- Trail file for this example is located in "AdapterExamples/trail" directory
-- Command to add REPLICAT
-- add replicat rhdfs, exttrail dirdat/or
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
GETTRUNCATES
MAP SCOTT.*, TARGET SCOTT.*;
37© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 37
HDFS handler.
● Hadoop client.
● Install the hadoop client using yum or previously downloaded packages:
$ yum install hadoop-client
● Prepare configuration files in /etc/hadoop/conf directory:
core-site.xml
hdfs-site.xml
ssl-client.xml
topology.map
● Test hadoop:
$hadoop fs -ls
38© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
HDFS handler.
Journal of changes instead of a state
38
Database HDFS
Insert row 1
Row 1
I | Row 1
Insert row 2
Row 2
I | Row 2
Insert row 3
Row 3
I | Row 3
Update row 3
Row 3
U | Row 3
Delete row 2
D | Row 2
39© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 39
HDFS handler.
● What we have on the hadoop site.
● The file has been created according our mapping:
$ [oracle@vm130-179 oggbd]$ hadoop fs -ls ogghdfs
Found 1 items
-rw-r--r-- 3 oracle oinstall 142 2018-11-26 17:41 ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt
[oracle@vm130-179 oggbd]$
● Insert operation:
[oracle@vm130-179 oggbd]$ hadoop fs -cat ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt
ISCOTT.TEST_TAB_12018-11-26
16:46:14.0013192018-11-26T17:36:35.21600000000000000000004833ID1STR11CJ1LEUJUS
E_DATE2018-11-26 16:46:06
40© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 40
HDFS handler.
● What we have on the hadoop site.
● The file has been created according our mapping:
$ [oracle@vm130-179 oggbd]$ hadoop fs -ls ogghdfs
Found 1 items
-rw-r--r-- 3 oracle oinstall 142 2018-11-26 17:41 ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt
[oracle@vm130-179 oggbd]$
● Delete operation:
[oracle@vm130-179 oggbd]$ hadoop fs -cat ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt
DSCOTT.TEST_TAB_12018-11-27
15:18:05.0582092018-11-27T15:18:08.45500000000000000000010410ID14STR17VOYOFUIU
SE_DATE2018-11-27 15:17:32
41© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 41
HDFS handler.
● What we have on the hadoop site.
● INSERT,UPDATE,DELETE and TRUNCATE in the file:
ISCOTT.TEST_TAB_12018-11-27
16:33:17.9996292018-11-27T16:33:22.21100000000000000000001966ID21STR1QK5UJVURU
SE_DATE2018-11-27 16:33:15
USCOTT.TEST_TAB_12018-11-27
16:34:54.0005962018-11-27T16:34:56.84800000000000000000002135ID21STR1QNI5ZHNEU
SE_DATE
DSCOTT.TEST_TAB_12018-11-27
16:35:40.0014242018-11-27T16:35:43.91200000000000000000002331ID21STR1QNI5ZHNEU
SE_DATE2018-11-27 16:33:15
TSCOTT.TEST_TAB_12018-11-27
16:36:17.0049062018-11-27T16:36:20.96800000000000000000002497IDSTR1USE_DATE
42© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 42
Hbase handler.
● OGG for BD + hadoop client.
RHDFS
replicat process
./dirrpt/RHDFS.rpt
./dirrpt/RHDFS.log
./dirpcs/RHDFS.pcr
./dirchk/RHDFS.cpr
Trail file:
./dirdat/or*
./dirrpt/RHDFS.dsc
Property file:
./dirprm/hbase.prop
Parameter file:
./dirprm/rhbase.prm
-- Report file
-- Log file
-- Process file
-- Checkpoint file
-- Discard file
HBase client:
*.jar + conf files
To HDFS
43© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 43
HBase handler.
● HBase and supplemental logging enabled for all columns.
orcl> insert into scott.part_tab_1 (str1,use_date)
values (dbms_random.string('x',8),sysdate);
hbase(main):007:0> list
TABLE
SCOTT:PART_TAB_1
1 row(s) in 0.0150 seconds
=> ["SCOTT:PART_TAB_1"]
hbase(main):008:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
2|EROAEDND|2018-11-28 13:50:04 column=cf:ID, timestamp=1543431340770, value=2
2|EROAEDND|2018-11-28 13:50:04 column=cf:STR1, timestamp=1543431340770, value=EROAEDND
2|EROAEDND|2018-11-28 13:50:04 column=cf:USE_DATE, timestamp=1543431340770,
value=2018-11-28 13:50:04
1 row(s) in 0.2080 seconds
44© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 44
HBase handler.
● Supplemental logging enabled for the primary key.
OGG (http://localhost:9001 dep_01 as c_ogg) 4> delete trandata scott.part_tab_1
OGG (http://localhost:9001 dep_01 as c_ogg) 5> add trandata scott.part_tab_1
OGG (http://localhost:9001 dep_01 as c_ogg) 6> info trandata scott.part_tab_1
Logging of supplemental redo log data is enabled for table SCOTT.PART_TAB_1.
Columns supplementally logged for table SCOTT.PART_TAB_1:
- "ID"
Prepared CSN for table SCOTT.PART_TAB_1: 4301403009
45© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 45
HBase handler.
● HBase and supplemental logging enabled for the primary key.
orcl> insert into scott.part_tab_1 (str1,use_date)
values (dbms_random.string('x',8),sysdate);
hbase(main):020:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
3 column=cf:ID, timestamp=1543432623211, value=3
3 column=cf:STR1, timestamp=1543432623211, value=5AV3SKKU
3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29
1 row(s) in 0.0390 seconds
46© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 46
HBase handler.
● HBase handles updates.
orcl>update scott.part_tab_1 set str1='UPDATED' where ID=3;
hbase(main):010:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
3 column=cf:ID, timestamp=1543941858433, value=3
3 column=cf:STR1, timestamp=1543941858433, value=UPDATED
3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29
4 column=cf:ID, timestamp=1543941787306, value=4
4 column=cf:STR1, timestamp=1543941787306, value=VRT582TI
4 column=cf:USE_DATE, timestamp=1543941787306, value=2018-12-04 11:42:46
2 row(s) in 0.0600 seconds
47© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 47
HBase handler.
● HBase handles deletes.
orcl>delete from scott.part_tab_1 where ID=4;
hbase(main):013:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
3 column=cf:ID, timestamp=1543941858433, value=3
3 column=cf:STR1, timestamp=1543941858433, value=UPDATED
3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29
1 row(s) in 0.0530 seconds
48© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 48
HBase handler.
● HBase and truncates.
orcl> truncate table scott.part_tab_1;
hbase(main):020:0> list
TABLE
SCOTT:PART_TAB_1
1 row(s) in 0.0270 seconds
=> ["SCOTT:PART_TAB_1"]
hbase(main):021:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
0 row(s) in 0.0370 seconds
49© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 49
HBase handler.
● HBase and delete-insert.
orcl> select * from scott.part_tab_1;
ID STR1 USE_DATE
---------------- ---------- -----------------
8 XL7UCVIG 12/04/18 12:10:55
6 QX81W9C1 12/04/18 12:08:04
orcl> delete from scott.part_tab_1 where ID=8;
1 row deleted.
orcl> insert into scott.part_tab_1 (id,str1,use_date) values
(8,dbms_random.string('x',8),sysdate);
1 row created.
orcl> commit;
50© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 50
HBase handler.
● HBase and delete-insert.
hbase(main):031:0> scan 'SCOTT:PART_TAB_1'
ROW COLUMN+CELL
6 column=cf:ID, timestamp=1543943294808, value=6
6 column=cf:STR1, timestamp=1543943294808, value=QX81W9C1
6 column=cf:USE_DATE, timestamp=1543943294808,
value=2018-12-04 12:08:04
1 row(s) in 0.0370 seconds
Workaround :
set the property inthe hbase.props file
gg.handler.name.setHbaseOperationTimestamp=true
51© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 51
File Writer handler.
● Staging + processing.
RFileH
replicat process
Parquet event handler
ORC event handler
Oracle Cloud Infra
AWS S3
Staging file:
../dirout/or*
Oracle Cloud Classic
Reformatted file:
../dirout/par*
Final destination:
AWS S3 data
--Complete to write to the file
--Parquet(or another) event handler
--S3 (or another) event handler
52© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Oracle Big Data Lite VM.
You can try the most of the tools on a VM.
• The Oracle BD Lite VM image.
• Has Oracle database and Hadoop.
• ODI, GoldenGate and BD SQL.
• Oracle connectors .
• Oracle NoSQL and R distribution.
• Sample schemas and scenarios.
https://www.oracle.com/technetwork/data
base/bigdata-appliance/oracle-bigdatalite-
2104726.html
52
53© The Pythian Group Inc., 2018
In the cloud
DIPC and OGG service
54© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Oracle Goldengate Cloud Service
• Easy setup.
• Integrated with other Oracle Cloud
services.
• Secure.
• Highly customizable and flexible.
• Supports multiple types of targets.
OGG Cloud service.
55© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Oracle Cloud.
GoldenGate Cloud Service.
55
App
App
App
App Database Tran
Log
App
Oracle
Goldengate
Oracle
Goldengate
KAFKA
KAFKA
KAFKA
KAFKA
KAFKA
OGG Trail
56© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Oracle Data Integration Platform Cloud
• Connect to data sources and
prepare, transform, replicate,
analyze, govern, and monitor data
• Analyze data streams.
• Set up policies based on metrics to
receive notifications.
• Manage all your data sources from a
single platform.
Data transformation, integration, replication, analysis, and governance..
• On-premises to Cloud.
• Cloud to Cloud.
• Cloud to On-premises.
• On-premises to On-premises.
57© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
Oracle Cloud.
GoldenGate Cloud Service or Data Integration Platform Cloud (DIPC)
57
App
App
App
App Database Tran
Log
App
DIPC AgentOracle
Goldengate
Oracle
Goldengate
KAFKA
KAFKA
KAFKA
KAFKA
KAFKA
OGG Trail
ODI
Data
Preparation
58© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
QA
58
59© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018
THANK YOU
Email: otochkin@pythian.com
https://pythian.com/blog
@sky_vst
59

More Related Content

What's hot

Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsThe HDF-EOS Tools and Information Center
 
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...Yann Pauly
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceHigh Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceDataWorks Summit
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsChristophe Debruyne
 
Flink Forward Europe 2019 - Berlin
Flink Forward Europe 2019 - BerlinFlink Forward Europe 2019 - Berlin
Flink Forward Europe 2019 - BerlinDavid Morin
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceDataWorks Summit/Hadoop Summit
 

What's hot (20)

Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...
OVH-Change Data Capture in production with Apache Flink - Meetup Rennes 2019-...
 
HDF Update 2016
HDF Update 2016HDF Update 2016
HDF Update 2016
 
MODIS Land and HDF-EOS
MODIS Land and HDF-EOSMODIS Land and HDF-EOS
MODIS Land and HDF-EOS
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
HDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSSHDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSS
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
 
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open SourceHigh Performance and Scalable Geospatial Analytics on Cloud with Open Source
High Performance and Scalable Geospatial Analytics on Cloud with Open Source
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
 
Flink Forward Europe 2019 - Berlin
Flink Forward Europe 2019 - BerlinFlink Forward Europe 2019 - Berlin
Flink Forward Europe 2019 - Berlin
 
HDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF ConverterHDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF Converter
 
Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5
 
Benefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business IntelligenceBenefits of an Agile Data Fabric for Business Intelligence
Benefits of an Agile Data Fabric for Business Intelligence
 
NEON HDF5
NEON HDF5NEON HDF5
NEON HDF5
 

Similar to One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018

There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9Gleb Otochkin
 
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...Insight Technology, Inc.
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryThuyen Ho
 
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfDataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfMiguel Angel Fajardo
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesBobby Curtis
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Databricks
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteCarlos Andrés García
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteVMware Tanzu
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDeepak Chandramouli
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQLPingCAP
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItGustavo Rene Antunez
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Kevin Xu
 
Implementing MySQL Database-as-a-Service using open source tools
Implementing MySQL Database-as-a-Service using open source toolsImplementing MySQL Database-as-a-Service using open source tools
Implementing MySQL Database-as-a-Service using open source toolsAll Things Open
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformMariaDB plc
 

Similar to One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018 (20)

There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9There and back_again_oracle_and_big_data_16x9
There and back_again_oracle_and_big_data_16x9
 
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
[db tech showcase Tokyo 2017] C13:There and back again or how to connect Orac...
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQuery
 
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfDataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail Files
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events an...
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
 
Huge pages why-what-how
Huge pages why-what-howHuge pages why-what-how
Huge pages why-what-how
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQL
 
Why You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get ItWhy You Need Manageability Now More than Ever and How to Get It
Why You Need Manageability Now More than Ever and How to Get It
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 
Implementing MySQL Database-as-a-Service using open source tools
Implementing MySQL Database-as-a-Service using open source toolsImplementing MySQL Database-as-a-Service using open source tools
Implementing MySQL Database-as-a-Service using open source tools
 
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
VoltDB and Flytxt Present: Building a Single Technology Platform for Real-Tim...
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud Platform
 

Recently uploaded

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 

Recently uploaded (20)

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018

  • 1. 1© The Pythian Group Inc., 2018 DATE Oracle GoldenGate for Big Data - deep dive. One bridge to connect them all
  • 2. Gleb Otochkin Started to work with data in 1992 At Pythian since 2008 Area of expertise: ● Data Integration ● Oracle RAC ● Oracle engineered systems ● Virtualization ● Performance tuning ● Big Data otochkin@pythian.com @sky_vst Principal Consultant © 2016 Pythian. Confidential 2
  • 3. © 2017 Pythian. Confidential 3 Years in Business 20 Pythian Experts in 35 Countries 400+ Current Clients Globally 350+
  • 4. 4© The Pythian Group Inc., 2018 AGENDA 4 ● Big Data introduction. ● Why we need to stream data. ● What we are streaming. ● Replicat. ● Different targets. ● Cloud. ● QA.
  • 5. 5© The Pythian Group Inc., 2018 Big Data intro. Why and what.
  • 6. 6© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 CLICK TO ADD TITLE Big Data intro. Big Data? In Texas we call it just data.
  • 7. 7© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Something about data. Why do we talk about it and where it is coming from. 7 Col A Col B Col C Col A Col ACol A Col B Col A Col B Col A Col ACol A Col D Col A Col A Col B Col ACol A Col B Col C ROW format
  • 8. 8© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Something about data. Why do we talk about it and where it is coming from. 8 Col A Col B Col C Col A Col ACol A Col B Col C Col A Col B Col A Col ACol A Col D Col A Col B Col A Col ACol A Col B Columnar format
  • 9. 9© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Something about data and how we store it. Why do we talk about it and where it is coming from. 9 Col A Col B Col C Col A Col ACol A Col B Col A Col B Col A Col B1 Col ACol A Col D Col B2 Col A Col A Col B Col ACol A Col B Col C1 Unstructured ColC2
  • 10. 10© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Something about data and how we store it. Why do we talk about it and where it is coming from. 10 Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Buffer - Data Processing Processed data Easy for a table of several MB or GB. What about TB or PB ?
  • 11. 11© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Big Data intro. Terms and tools for Big Data: • HDFS - Hadoop Distributed File System. • Apache Hadoop - framework for distributed storage and processing. • HBase - non-relational, distributed database. • Kafka - streaming processing data platform. • Cassandra - open-source distributed NoSQL database. • Mongodb - open-source cross-platform document-oriented database. • Hive - query and analysis data framework on top of Hadoop. • Avro,Parquet,ORC - data formats widely used in BD 11
  • 12. 12© The Pythian Group Inc., 2018 Why we need to stream data. Topology and data layout.
  • 13. 13© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Business cases. Operation activity on DB with rest on a Data Lake. 13 RDBMS OLTP RDBMS OLTP BD platform Data Preparation Engine BI & Analysis
  • 14. 14© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Business cases. Operation activity on DB with Data in Big Data and BI on a RDBMS. 14 RDBMS OLTP RDBMS OLTP BD platform Data Preparation Engine BI & AnalysisRDBMS
  • 15. 15© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Business cases. Operation and BI activity on DB with main Data body in a Big Data platform. 15 RDBMS OLTP RDBMS OLTP BD platforms BI & Analysis OGG KafkaKafka RDBMS
  • 16. 16© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Business cases. Operation and BI activity on DB with main Data body in a Big Data platform. 16 RDBMS OLTP RDBMS OLTP BD platforms BI & Analysis OGG KafkaOGG Stream processing
  • 17. 17© The Pythian Group Inc., 2018 What we are streaming. Source site - extract.
  • 18. 18© 2018 Pythian. Confidential Extract Source Data Pump Trail files Manager GoldenGate architecture. Redo logs
  • 19. 19© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 19 Supplemental logging. ● No supplemental logging on DB or table level. UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4; update "UNKNOWN"."OBJ# 103768" set "COL 1" = HEXTORAW('76616C3134') where "COL 1" = HEXTORAW('76616C34') ROWID = 'AAAZVYAAGAAAADfAAE';
  • 20. 20© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 20 Supplemental logging. ● Min supplemental logging on DB level only. UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4; update "SCOTT"."TEST_TAB_1" set "STR1" = 'val14' where "STR1" = 'val4' and ROWID = 'AAAZVYAAGAAAADfAAE';
  • 21. 21© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 21 Supplemental logging. ● Min supplemental logging on DB level and supplemental logging for primary keys: UPDATE scott.test_tab_1 SET str1='val14' WHERE id=4; update "SCOTT"."TEST_TAB_1" set "STR1" = 'val14' where "ID" = 4 and "STR1" = 'val4' and ROWID = 'AAAZVYAAGAAAADfAAE';
  • 22. 22© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 22 Metadata changes and DDL replication. ● The metadata changes are captured by default to the trail. ● They will be applied with the following DML (insert or update). ● Add “GETTRUNCATES” clause to parameters for extract and replicat to replicate truncate operations.. ● “DDL include” doesn’t work with “GETTRUNCATES”. To work properly with truncate we need to set them up as a DML operations in the GoldenGate.
  • 23. 23© 2018 Pythian. Confidential Source Target Data Pump Manager Collector Manager Trail Extract Replicat Trail GoldenGate architecture.
  • 24. 24© The Pythian Group Inc., 2018 Replicat. How it works.
  • 25. 25© 2018 Pythian. Confidential Target Collector Manager Replicat Trail Replicat.
  • 26. 26© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 26 Trail file. ● File Header. Trail file File header Logdump 98 >open /u01/ogg_ora121/var/lib/data/od000000000 Current LogTrail is /u01/ogg_ora121/var/lib/data/od000000000 Logdump 99 >ghdr on Logdump 100 >detail on Logdump 101 >detail data Logdump 102 >usertoken on Logdump 103 >n 2017/11/27 15:59:57.582.051 FileHeader Len 1409 RBA 0 Name: *FileHeader* 3000 030b 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1... 0006 3200 0004 2000 0000 3300 0008 02f2 8530 a414 | ..2... ...3......0.. bee3 3400 0002 0000 3500 0036 3500 0032 0030 7572 | ..4.....5..65..2.0ur 693a 6269 6764 6174 616c 6974 653a 6c6f 6361 6c64 | i:bigdatalite:locald 6f6d 6169 6e3a 3a75 3031 3a6f 6767 6d73 3a62 696e | omain::u01:oggms:bin 3a45 5854 3031 3600 000d 000b 6f64 3030 3030 3030 | :EXT016.....od000000 3030 3037 0000 0101 3800 0004 0000 0000 3900 0008 | 0007....8.......9...
  • 27. 27© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 27 Trail file. ● Record Header Area. Trail file File header Record Header Area Logdump 104 >n ___________________________________________________________________ Hdr-Ind : E (x45) Partition : . (x00) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 91 (x005b) IO Time : 2017/11/27 10:58:08.788.835 IOType : 170 (xaa) OrigNode : 1 (x01) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) DDR/TDR Idx: (001, 000) AuditPos : 26645008 Continued : N (x00) RecCount : 1 (x01) 2017/11/27 10:58:08.788.835 Metadata Len 91 RBA 1417 Name: ORCL * DDR Version: 1 Database type: ORACLE Character set ID: UTF-8 National character set ID: UTF-16 Locale: neutral Case sensitivity: 14 14 14 14 14 14 14 14 14 14 14 14 11 14 14 14 TimeZone: GMT Global name: ORCL.LOCALDOMAIN *
  • 28. 28© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 28 Trail file. ● Metadata. Trail file File header Record Header Area Metadata Logdump 105 >n ___________________________________________________________________ Hdr-Ind : E (x45) Partition : @ (x40) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 1558 (x0616) IO Time : 2017/11/27 15:59:57.588.540 IOType : 170 (xaa) OrigNode : 2 (x02) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) DDR/TDR Idx: (001, 001) AuditPos : 26645008 Continued : N (x00) RecCount : 1 (x01) 2017/11/27 15:59:57.588.540 Metadata Len 1558 RBA 1563 Name: ORCL.C##OGG.GG_HEARTBEAT_SEED * 1)Name 2)Data Type 3)External Length 4)Fetch Offset 5)Scale ….. * TDR version: 11 Definition for table ORCL.C##OGG.GG_HEARTBEAT_SEED Record Length: 9872 Columns: 17 LOCAL_DATABASE 64 512 0 0 0 1 0 512 512 0 0 0 0 0 1 0 1 0 1 0 0 0 0
  • 29. 29© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 29 Trail file. ● Record. Trail file File header Record Header Area Metadata Record Logdump 106 >n ___________________________________________________________________ Hdr-Ind : E (x45) Partition : L (x4c) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 854 (x0356) IO Time : 2017/11/27 10:56:20.000.887 IOType : 135 (x87) OrigNode : 255 (xff) TransInd : . (x03) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 2689 AuditPos : 26645008 Continued : N (x00) RecCount : 1 (x01) 2017/11/27 10:56:20.000.887 GGSUnifiedPKUpdate Len 854 RBA 3201 Name: ORCL.C##OGG.GG_HEARTBEAT_SEED (TDR Index: 1) After Image: Partition 76 G s a901 0000 0000 0700 0000 0300 4344 4201 001f 0000 | ............CDB..... 0032 3031 372d 3131 2d32 373a 3135 3a35 353a 3230 | .2017-11-27:15:55:20 2e31 3136 3037 3730 3030 0200 0400 ffff 0000 0300 | .116077000.......... 0400 ffff 0000 0400 0400 ffff 0000 0500 0400 ffff | .................... 0000 0600 1f00 ffff 0000 0000 0000 0000 0000 0000 | .................... 0000 0000 0000 0000 0000 0000 0000 0000 0007 001f | .................... 00ff ff00 0000 0000 0000 0000 0000 0000 0000 0000 | .................... ...
  • 30. 30© 2018 Pythian. Confidential BD Target Collector Manager Replicat Trail Replicat standard architecture. Property file Parameter file
  • 31. 31© 2018 Pythian. Confidential BD Target Collector Manager Replicat Trail Replicat with staging file. Event Handler Staging file
  • 32. 32© The Pythian Group Inc., 2018 Targets. Different handlers.
  • 33. 33© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 33 Handlers. ● List of target handlers. ● BigQuery Handler. ● Cassandra Handler. ● Elasticsearch Handler. ● File Writer Handler. ■ HDFS Event Handler. ■ ORC Event Handler. ■ Oracle Cloud Event Handler. ■ Parquet Event Handler. ■ S3 Event Handler. ● Flume Handler. ● HBase Event Handler. ● HDFS Handler. ● JDBC Handler. ● List of target handlers (continued). ● Kafka. ● Kafka connect. ● Kafka REST Proxy Handler. ● Kinesis Streams Handler. ● MongoDB Handler. ● Oracle NoSQL Handler. ● Microsoft Azure Data Lake. ● Elasticsearch Handler. ● File Writer Handler. ● Source handler. ● Oracle GoldenGate Capture for Cassandra.
  • 34. 34© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 34 HDFS handler. ● OGG for BD + hadoop client. RHDFS replicat process ./dirrpt/RHDFS.rpt ./dirrpt/RHDFS.log ./dirpcs/RHDFS.pcr ./dirchk/RHDFS.cpr Trail file: ./dirdat/or* ./dirrpt/RHDFS.dsc Property file: ./dirprm/hdfs.prop Parameter file: ./dirprm/rhdfs.prm -- Report file -- Log file -- Process file -- Checkpoint file -- Discard file Hadoop client: *.jar + conf files To HDFS
  • 35. 35© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 35 HDFS handler. ● HDFS properties file. gg.handlerlist=hdfs gg.handler.hdfs.type=hdfs gg.handler.hdfs.pathMappingTemplate=/user/oracle/ogghdfs gg.handler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${groupName }_${currentTimestamp}.txt gg.handler.hdfs.format=delimitedtext gg.handler.hdfs.mode=tx gg.log.level=INFO gg.report.time=30sec gg.classpath=/usr/lib/hadoop/client/*:/usr/lib/hadoop/etc/hadoop/: javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
  • 36. 36© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 36 HDFS handler. ● HDFS replicat parameter file. REPLICAT rhdfs -- Trail file for this example is located in "AdapterExamples/trail" directory -- Command to add REPLICAT -- add replicat rhdfs, exttrail dirdat/or TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 GETTRUNCATES MAP SCOTT.*, TARGET SCOTT.*;
  • 37. 37© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 37 HDFS handler. ● Hadoop client. ● Install the hadoop client using yum or previously downloaded packages: $ yum install hadoop-client ● Prepare configuration files in /etc/hadoop/conf directory: core-site.xml hdfs-site.xml ssl-client.xml topology.map ● Test hadoop: $hadoop fs -ls
  • 38. 38© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 HDFS handler. Journal of changes instead of a state 38 Database HDFS Insert row 1 Row 1 I | Row 1 Insert row 2 Row 2 I | Row 2 Insert row 3 Row 3 I | Row 3 Update row 3 Row 3 U | Row 3 Delete row 2 D | Row 2
  • 39. 39© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 39 HDFS handler. ● What we have on the hadoop site. ● The file has been created according our mapping: $ [oracle@vm130-179 oggbd]$ hadoop fs -ls ogghdfs Found 1 items -rw-r--r-- 3 oracle oinstall 142 2018-11-26 17:41 ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt [oracle@vm130-179 oggbd]$ ● Insert operation: [oracle@vm130-179 oggbd]$ hadoop fs -cat ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt ISCOTT.TEST_TAB_12018-11-26 16:46:14.0013192018-11-26T17:36:35.21600000000000000000004833ID1STR11CJ1LEUJUS E_DATE2018-11-26 16:46:06
  • 40. 40© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 40 HDFS handler. ● What we have on the hadoop site. ● The file has been created according our mapping: $ [oracle@vm130-179 oggbd]$ hadoop fs -ls ogghdfs Found 1 items -rw-r--r-- 3 oracle oinstall 142 2018-11-26 17:41 ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt [oracle@vm130-179 oggbd]$ ● Delete operation: [oracle@vm130-179 oggbd]$ hadoop fs -cat ogghdfs/SCOTT.TEST_TAB_1_RHDFS_2018-11-26_17-36-35.222.txt DSCOTT.TEST_TAB_12018-11-27 15:18:05.0582092018-11-27T15:18:08.45500000000000000000010410ID14STR17VOYOFUIU SE_DATE2018-11-27 15:17:32
  • 41. 41© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 41 HDFS handler. ● What we have on the hadoop site. ● INSERT,UPDATE,DELETE and TRUNCATE in the file: ISCOTT.TEST_TAB_12018-11-27 16:33:17.9996292018-11-27T16:33:22.21100000000000000000001966ID21STR1QK5UJVURU SE_DATE2018-11-27 16:33:15 USCOTT.TEST_TAB_12018-11-27 16:34:54.0005962018-11-27T16:34:56.84800000000000000000002135ID21STR1QNI5ZHNEU SE_DATE DSCOTT.TEST_TAB_12018-11-27 16:35:40.0014242018-11-27T16:35:43.91200000000000000000002331ID21STR1QNI5ZHNEU SE_DATE2018-11-27 16:33:15 TSCOTT.TEST_TAB_12018-11-27 16:36:17.0049062018-11-27T16:36:20.96800000000000000000002497IDSTR1USE_DATE
  • 42. 42© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 42 Hbase handler. ● OGG for BD + hadoop client. RHDFS replicat process ./dirrpt/RHDFS.rpt ./dirrpt/RHDFS.log ./dirpcs/RHDFS.pcr ./dirchk/RHDFS.cpr Trail file: ./dirdat/or* ./dirrpt/RHDFS.dsc Property file: ./dirprm/hbase.prop Parameter file: ./dirprm/rhbase.prm -- Report file -- Log file -- Process file -- Checkpoint file -- Discard file HBase client: *.jar + conf files To HDFS
  • 43. 43© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 43 HBase handler. ● HBase and supplemental logging enabled for all columns. orcl> insert into scott.part_tab_1 (str1,use_date) values (dbms_random.string('x',8),sysdate); hbase(main):007:0> list TABLE SCOTT:PART_TAB_1 1 row(s) in 0.0150 seconds => ["SCOTT:PART_TAB_1"] hbase(main):008:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 2|EROAEDND|2018-11-28 13:50:04 column=cf:ID, timestamp=1543431340770, value=2 2|EROAEDND|2018-11-28 13:50:04 column=cf:STR1, timestamp=1543431340770, value=EROAEDND 2|EROAEDND|2018-11-28 13:50:04 column=cf:USE_DATE, timestamp=1543431340770, value=2018-11-28 13:50:04 1 row(s) in 0.2080 seconds
  • 44. 44© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 44 HBase handler. ● Supplemental logging enabled for the primary key. OGG (http://localhost:9001 dep_01 as c_ogg) 4> delete trandata scott.part_tab_1 OGG (http://localhost:9001 dep_01 as c_ogg) 5> add trandata scott.part_tab_1 OGG (http://localhost:9001 dep_01 as c_ogg) 6> info trandata scott.part_tab_1 Logging of supplemental redo log data is enabled for table SCOTT.PART_TAB_1. Columns supplementally logged for table SCOTT.PART_TAB_1: - "ID" Prepared CSN for table SCOTT.PART_TAB_1: 4301403009
  • 45. 45© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 45 HBase handler. ● HBase and supplemental logging enabled for the primary key. orcl> insert into scott.part_tab_1 (str1,use_date) values (dbms_random.string('x',8),sysdate); hbase(main):020:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 3 column=cf:ID, timestamp=1543432623211, value=3 3 column=cf:STR1, timestamp=1543432623211, value=5AV3SKKU 3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29 1 row(s) in 0.0390 seconds
  • 46. 46© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 46 HBase handler. ● HBase handles updates. orcl>update scott.part_tab_1 set str1='UPDATED' where ID=3; hbase(main):010:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 3 column=cf:ID, timestamp=1543941858433, value=3 3 column=cf:STR1, timestamp=1543941858433, value=UPDATED 3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29 4 column=cf:ID, timestamp=1543941787306, value=4 4 column=cf:STR1, timestamp=1543941787306, value=VRT582TI 4 column=cf:USE_DATE, timestamp=1543941787306, value=2018-12-04 11:42:46 2 row(s) in 0.0600 seconds
  • 47. 47© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 47 HBase handler. ● HBase handles deletes. orcl>delete from scott.part_tab_1 where ID=4; hbase(main):013:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 3 column=cf:ID, timestamp=1543941858433, value=3 3 column=cf:STR1, timestamp=1543941858433, value=UPDATED 3 column=cf:USE_DATE, timestamp=1543432623211, value=2018-11-28 14:11:29 1 row(s) in 0.0530 seconds
  • 48. 48© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 48 HBase handler. ● HBase and truncates. orcl> truncate table scott.part_tab_1; hbase(main):020:0> list TABLE SCOTT:PART_TAB_1 1 row(s) in 0.0270 seconds => ["SCOTT:PART_TAB_1"] hbase(main):021:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 0 row(s) in 0.0370 seconds
  • 49. 49© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 49 HBase handler. ● HBase and delete-insert. orcl> select * from scott.part_tab_1; ID STR1 USE_DATE ---------------- ---------- ----------------- 8 XL7UCVIG 12/04/18 12:10:55 6 QX81W9C1 12/04/18 12:08:04 orcl> delete from scott.part_tab_1 where ID=8; 1 row deleted. orcl> insert into scott.part_tab_1 (id,str1,use_date) values (8,dbms_random.string('x',8),sysdate); 1 row created. orcl> commit;
  • 50. 50© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 50 HBase handler. ● HBase and delete-insert. hbase(main):031:0> scan 'SCOTT:PART_TAB_1' ROW COLUMN+CELL 6 column=cf:ID, timestamp=1543943294808, value=6 6 column=cf:STR1, timestamp=1543943294808, value=QX81W9C1 6 column=cf:USE_DATE, timestamp=1543943294808, value=2018-12-04 12:08:04 1 row(s) in 0.0370 seconds Workaround : set the property inthe hbase.props file gg.handler.name.setHbaseOperationTimestamp=true
  • 51. 51© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 51 File Writer handler. ● Staging + processing. RFileH replicat process Parquet event handler ORC event handler Oracle Cloud Infra AWS S3 Staging file: ../dirout/or* Oracle Cloud Classic Reformatted file: ../dirout/par* Final destination: AWS S3 data --Complete to write to the file --Parquet(or another) event handler --S3 (or another) event handler
  • 52. 52© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Oracle Big Data Lite VM. You can try the most of the tools on a VM. • The Oracle BD Lite VM image. • Has Oracle database and Hadoop. • ODI, GoldenGate and BD SQL. • Oracle connectors . • Oracle NoSQL and R distribution. • Sample schemas and scenarios. https://www.oracle.com/technetwork/data base/bigdata-appliance/oracle-bigdatalite- 2104726.html 52
  • 53. 53© The Pythian Group Inc., 2018 In the cloud DIPC and OGG service
  • 54. 54© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Oracle Goldengate Cloud Service • Easy setup. • Integrated with other Oracle Cloud services. • Secure. • Highly customizable and flexible. • Supports multiple types of targets. OGG Cloud service.
  • 55. 55© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Oracle Cloud. GoldenGate Cloud Service. 55 App App App App Database Tran Log App Oracle Goldengate Oracle Goldengate KAFKA KAFKA KAFKA KAFKA KAFKA OGG Trail
  • 56. 56© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Oracle Data Integration Platform Cloud • Connect to data sources and prepare, transform, replicate, analyze, govern, and monitor data • Analyze data streams. • Set up policies based on metrics to receive notifications. • Manage all your data sources from a single platform. Data transformation, integration, replication, analysis, and governance.. • On-premises to Cloud. • Cloud to Cloud. • Cloud to On-premises. • On-premises to On-premises.
  • 57. 57© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 Oracle Cloud. GoldenGate Cloud Service or Data Integration Platform Cloud (DIPC) 57 App App App App Database Tran Log App DIPC AgentOracle Goldengate Oracle Goldengate KAFKA KAFKA KAFKA KAFKA KAFKA OGG Trail ODI Data Preparation
  • 58. 58© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 QA 58
  • 59. 59© The Pythian Group Inc., 2018© The Pythian Group Inc., 2018 THANK YOU Email: otochkin@pythian.com https://pythian.com/blog @sky_vst 59