SlideShare a Scribd company logo
1 of 40
Real Time Big Data Analytic Platform with
Oracle GoldenGate Big Data Adapters
Rajit Saha Principal BigData Engineer
Vengata(Venky) Guruswamy Principal Database Administrator
Confidential
 Introduction - LendingClub
 Real-time Big Data Platform Use cases
 Lambda Architecture – On-Premise / AWS
 Data Processing Flow/Algorithm
 GoldenGate Big Data Adapter Implementation Architecture,
Configuration and Troubleshooting Scenarios
Agenda
2
Confidential 3
LendingClub
LendingClub is
America’s
largest online
marketplace
connecting
borrowers and
investors.
 Headquartered in San Francisco, CA
 Office in Westborough, MA
Confidential 4
Product Lines
Personal Loans
Loans up to $40K
600+ FICO
36 and 60 mo. terms
Small Business
Business Loans up to
$300K
At least $75,000 in annual
sales
At least 2 years in
business
Patient
Financing
Extended plans up to
$50K; no-interest
plans up to $32K
Auto Refinance
Must have an
outstanding balance
of $5K-$55K, initiated
in the last 3 months
with 24 months of
remaining payments
Confidential
 Maintain Reporting Databases
 Implement transaction history for key
tables(Change Data capture)
 Enable real-time feed to Big data platform
5
Use Cases of
Oracle
GoldenGate
@
LendingClub
Confidential
 LendingClub’s Big Data Platform is
responsible for generating thousands of
reports for the company – Daily, Monthly,
Quarterly
 Investor Reports, Financial Reports, Collection
Reports, Risk Reports, Marketing Reports etc.
• Near Real-time Availability of OLTP Data in
Hadoop Based OLAP warehouse
• Thousands of Tables and rapidly increasing
data volumes towards petabytes
• Thousands of internal adhoc Users
6
Why Big Data
Platform ?
Why Real-time
Data Needed ?
Confidential
 Typical changes are:
 Positive Response from customers
 Indicated DON’T CALL on a particular phone number
from previous contacts.
 Under certain circumstance, it can also be limited on
how many times we can contact borrower by legal or
regulatory reason.
 “NO CALL” list needs be uploaded and fed into dialer
in a very timely fashion, and ideally it should be near
real-time at the event/change occurred.
7
NO Call list is a list of
phone numbers we
upload to dialer
system in order to
suppress Collections
calls due to loan
and/or customer
status changes
throughout the day.
Real-time Use Cases NO CALL List Generation
Confidential 8
What is it?
Why are we using
Real-time?
Who is it for?
Real-time Use Cases Marketing Dashboard
Track the progress of all the marketing channels with
respect to quarterly forecast.
Business needs numbers every hour to change the
strategy in case any channel performance compared to
target.
Marketing Team
Confidential 9
Lambda Architecture On-Premise HortonWorks Cluster
EXTRACT
TRAIL
REPLICAT
INCREMENTAL
TRANSACTIONS
Data
Science
CONSUMERS
TRAIL
TRAIL
Confidential 10
Lambda Architecture AWS S3/EMR Clusters
EXTRACT
TRAIL
REPLICAT
INCREMENTAL
TRANSACTIONS
Data
Science
CONSUMERS
TRAIL
TRAIL
Confidential
SPEED LAYER - DATA FLOW DIAGRAM
Confidential
 DDL Verification from Oracle Source to GG Table before
processing starts
 Data Inserted fromAvro GG Table to CDC partition table
 Hive Dynamic Partition on OP_TS
 ORC (Optimized Row Columnar) Format
 Nightly Snapshot Creation –
 Latest Updates from CDC tables and rest from previous
day Snapshot
 Create Hive and Presto Real Time View with CDC and Nightly
Snapshot
 Data Validation : Full table Checksum with Sqoop Snapshot
12
Few Important
Steps in Speed
Layer Processing
Process Flow ..
Confidential
How are we getting latest update
– For a Table , within a transaction for a row
CREATE EXTERNAL TABLE
SCHEMA_GG.FOO (
TABLE STRING,
OP_TYPE STRING,
OP_TS STRING,
CURRENT_TS STRING ,
POS STRING ,
PRIMARY_KEYS ARRAY<STRING> ,
TOKENS MAP<STRING,STRING> ,
PKEY STRING ,
..
..
..
{"TKN-OPTYPE":"INSERT","TKN-RECORDLENGTH":"1400","TKN-LOGRBA":"1774157","TKN-
OBJECTNAME":”SCHEMA.FOO","TKN-FILERBA":"","TKN-LAGMSEC":"54420","TKN-
USERNAME":”XXX","TKN-SCN":"7633890649912","TKN-RSN":"7633890649838","TKN-
TRANSACTIONINDICATOR":"BEGIN","TKN-RECORDTIMESTAMP":"2017-07-30 23:49:14","TKN-
LOGPOSITION":"19784208","TKN-FILESEQNO":"","TKN-ROWID":"AAA46FACCAAD4CtAAS","TKN-
XID":"184.13.1380795","TKN-COMMITTIMESTAMP":"2017-07-30 23:49:15.000000","TKN-REDOTHREAD":"1"}
SELECT TOKENS FROM SCHEMA_GG.FOO LIMIT 1;Input Table From GG Adaptor
All the transaction records goes to Hive Partitioned ORC
CDC table- SCHEMA_RT_CDC.FOO
SELECT DISTINCT VW.PKEY,.. FROM (SELECT FOO.ID, .. RANK() OVER (PARTITION BY
FOO.PKEY ORDER BY FOO.TKN-SCN DESC, FOO.TKN-RSN DESC, FOO.POS DESC, FOO
.CURRENT_TS DESC) AS DEDUP_RANK FROM SCHEMA_RT_CDC.FOO WHERE
FOO.OP_TS_DATE >='2017-07-30') VW WHERE VW.DEDUP_RANK = 1
Confidential
Verification of Real Time View definition
SELECT PKEY, OP_TS, TKN_SCN, TKN_RSN, POS, CURRENT_TS, MODIFIED_D,
OP_TYPE FROM SCHEMA_RT_CDC.FOO WHERE PKEY = 2181322736 ORDER BY
TKN_SCN DESC,TKN_RSN DESC,POS DESC,CURRENT_TS DESC
Records from CDC
SELECT PKEY, MODIFIED_D FROM SCHEMA_RT.FOO WHERE PKEY = 2181322736Records from Real Time View
PKEY OP_TS TKN_SCN TKN_RSN POS CURRENT_TS MODIFIED_D OP_TYPE
2181322736
2017-06-29
02:32:33.0
7630245451950 7630245451909 00004193750007692575 2017-06-28 19:33:53.778 2017-06-28 19:32:33.0 U
2181322736
2017-06-29
02:32:33.0
7630245451950 7630245451909 00004193750007692575 2017-06-28 19:33:53.778 2017-06-28 19:32:33.0 U
2181322736
2017-06-29
00:39:16.001
7630236411132 7630236411130 00004191270019251264 2017-06-28 17:41:09.519 2017-06-28 17:39:16.0 U
2181322736
2017-06-29
00:39:16.001
7630236411132 7630236411130 00004191270019251264 2017-06-28 17:41:09.519 2017-06-28 17:39:16.0 U
2181322736
2017-06-28
17:02:59.002
7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I
2181322736
2017-06-28
17:02:59.002
7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I
2181322736
2017-06-28
17:02:59.002
7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I
2181322736
2017-06-28
17:02:59.002
7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I
2181322736 2017-06-28 19:32:33.0
Confidential
SLAs
Transaction -> Import Process Start
Avg ~5mins
Transaction -> Import Process End
Avg ~ 5 – 15 mins
Spikes are for NIGHTLY SNAPSHOT Creation
1. Full table creation
2. TEZ is not good for Skew JOIN
3. Cluster Busy
Confidential
GoldenGate Big Data Adapter Implementation
Architecture
Confidential 17
GoldenGate BigData Adapter Pipeline
Db Node 1
DB
Extract
Data
Pump
Golden
Gate
Adapter
Servers
(HA)
Replicat for
HDFS Hive
Db Node 1 TRAIL FILE 1
TRAIL FILE 2
TRAIL FILE 3
TRAIL FILE 1
TRAIL FILE 2
TRAIL FILE 3
Confidential 18
GoldenGate BigData Infrastructure Architecture
Confidential 19
GoldenGate BigData Infrastructure Architecture – Site Failover
Confidential
GoldenGate Big Data Adapter
Configuration
Confidential
Source Extract Parameter - eprod.prm
NOCOMPRESSUPDATES
GETUPDATEBEFORES
TABLE <schema name>.*, tokens (
TKN-COMMITTIMESTAMP = @GETENV('GGHEADER', 'COMMITTIMESTAMP'),
TKN-FILESEQNO = @GETENV('RECORD', 'FILESEQNO'),
TKN-FILERBA = @GETENV('RECORD', 'FILERBA'),
TKN-LAGMSEC = @GETENV('LAG', 'MSEC'),
TKN-LOGPOSITION = @GETENV('GGHEADER', 'LOGPOSITION'),
TKN-OBJECTNAME = @GETENV('GGHEADER', 'OBJECTNAME'),
TKN-OPTYPE = @GETENV('GGHEADER', 'OPTYPE'),
TKN-RECORDLENGTH = @GETENV('GGHEADER', 'RECORDLENGTH'),
TKN-RECORDTIMESTAMP = @GETENV('RECORD', 'TIMESTAMP'),
TKN-ROWID = @GETENV('RECORD', 'ROWID'),
TKN-RSN = @GETENV('RECORD', 'RSN'),
TKN-SCN = @GETENV('TRANSACTION', 'CSN'),
TKN-TRANSACTIONINDICATOR = @GETENV('GGHEADER', 'TRANSACTIONINDICATOR'),
TKN-USERNAME = @GETENV('TRANSACTION', 'USERNAME'),
TKN-XID = @GETENV('TRANSACTION', 'XID')
)
Confidential
Source Data Pump Extract Parameter - pprod.prm
EXTRACT PPROD
PASSTHRU
RMTHOST <Goldengate adapter server>, MGRPORT <port no>
-- Trail file written by Pump to the remote host
RMTTRAIL /filesystem/pumptrail_ALL/pt
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
-- Specify tables to be captured:
TABLE *.*;
Confidential
Target – Replicat Parameter - rhdfs.prm
REPLICAT RHDFS
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
-Schemas
MAP SCHEMA1.*, TARGET SCHEMA1_GG.*;
MAP SCHEMA2.*, TARGET SCHEMA2_GG.*;
-Heartbeat
MAP DBASCHEMA.HEARTBEAT,TARGET DBASCHEMA_GG.HEARTBEAT;
Confidential
gg.handler.hdfs.type=hdfs
##performance
gg.handler.hdfs.maxFileSize=256m
gg.handler.hdfs.fileRollInterval=5m
gg.handler.hdfs.inactivityRollInterval=5m
#config
gg.handler.hdfs.fileSuffix=.avro
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.format=avro_row_ocf
## hive jdbc config
gg.handler.hdfs.schemaFilePath=/ogg/schema
gg.handler.hdfs.rootFilePath=/ogg/data
gg.handler.hdfs.hiveJdbcUrl=jdbc:hive2://@Hive_Server_Name:Hive_Port
Target – Replicat Parameters - hdfs.props
Confidential
Oracle GoldenGate to AWS S3 Connector
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.rootFilePath=s3://LCBUCKET/OGG
s3replicat.props
Custom Built Hadoop Client from Trunk
hadoop-3.0.0-alpha3-SNAPSHOT
<property>
<name>fs.s3a.access.key</name>
<value><< AWS KEY >></value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value><<AWS SECRET KEY >></value>
</property>
<property>
<name>fs.s3a.proxy.host</name>
<value> << PROXY SERVER >> </value>
</property>
<property>
<name>fs.s3a.proxy.port</name>
<value><< PROXY PORT >></value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.server-side-encryption-algorithm</name>
<value>SSE-KMS</value>
</property>
<property>
<name>fs.s3a.server-side-encryption-key</name>
<value>arn:aws:kms:<< KEY >></value>
</property>
Hadoop Client Talks to s3 via Hadoop-AWS
package via s3a protocol
EMR only understand s3 , so s3a to s3
mapping is done by
Confidential
GGSCI - OutputSecurity - Authentication
### Kerberos params
gg.handler.name.authType=kerberos
gg.handler.name.kerberosKeytabFile=/etc/security/keytabs/lcapp.headless.keytab
gg.handler.name.kerberosPrincipal=osuser@PROD-DOMAIN.COM
Confidential
GGSCI - OutputSecurity - Username/password Encryption
ORACLEWALLETUSERNAME ggadalias ggadapters
ORACLEWALLETPASSWORD ggadalias ggadapters
gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[ggadalias ggadapters]
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[ggadalias ggadapters]
gg.handler.hdfs.hiveJdbcUserName=unencrypted_username
gg.handler.hdfs.hiveJdbcPassword=unencrypted_password
Confidential
GGSCI - Output
Password encryption steps:
1. Add credential store:
GGSCI > ADD CREDENTIALSTORE
Credential store created in ./dircrd/.
2. Add the credential for these users.
ALTER CREDENTIALSTORE ADD USER <Hive_username>, password
<Hive_password> alias ggadalias domain ggadapters
Security-Credential Store
Confidential
How HDFS Handler handles the Data Flow ?
• The output format chosen is Avro row ocf format for three reasons
• Integration with Hive
• Seamless Schema Evolution
• More compact than Avro op ocf
• There are two files produced namely Avro and Avsc files.
Confidential
GGSCI - OutputConfigure High Availability for GoldenGate Resource
As ROOT user.
agctl add goldengate gg1 --gg_home /bdggvol/ggaws --instance_type dual
--nodes node1,node2 -network 1 --ip 99.99.99.99 --user appuser --group oinstall
--filesystems ora.registry.acfs --environment_vars
"LD_LIBRARY_PATH=/opt/java/default/jre/lib/amd64/server:/lib,
PATH=/opt/java/default/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin,
JAVA_HOME=/opt/java/default"
$GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u user:oracle:rwx
$GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u group:oinstall:rwx
$GRID_HOME/bin/crsctl setperm resource xag.gg1.goldengate -u group:oinstall:rwx
$ GRID_HOME/bin/crsctl setperm resource ora.net1.network -u user:appuser:rwx
Confidential
GGSCI - OutputSchema Regex
Avro Doesn’t like ($) symbol ! – No problem :
gg.schemareplaceregex=[$:]
gg.schemareplacestring=_
Confidential
 Upgrade from 12.2 to 12.3 has changed the timestamp format from [YYYY-MM-
DD:HH24:MI:SS.FFF] to [YYYY-MM-DD HH24:MI:SS.FFF]
 To avoid the dreaded ERROR OGG-15050 Error loading Java VM runtime library: (2 No
such file or directory) ensure to set the following parameter JAVA_HOME,
LD_LIBRARY_PATH and PATH properly. Remember manager passes the ENV variables.
 It’s a multi-threaded process and ensure to allocate sufficient Unix resources like NPROC ..
 Please monitor the heart beat table from Hive.
 Design the application to be Idempotent.
 Run the HDFS handler from dedicated hadoop client node.
32
Goldengate Big Data Adapter – Things To Remember
Confidential
Troubleshooting Scenarios
Confidential
GGSCI - OutputRewind – Functionality
Confidential
GGSCI - OutputBefore Patch - 17995064 
Confidential
GGSCI - OutputAfter Patch - 17995064 
Confidential
GGSCI - OutputLobs-Clobs-Blobs
• Data team needed the CLOBS even though the CLOB is not updated.
• GETUPDATEBEFORES doesn’t work for CLOB
• Lobs :Always Include LOBS In Trail File (Doc ID 1639717.1)
• Include the extract parameters NOCOMPRESSUPDATES and
FETCHCOLS option in the TABLE parameter, for example:
NOCOMPRESSUPDATES
TABLE schema.mytable, fetchcols (mylobs1, mylob2);
Confidential
 SR 3-15688859001 : Data loss after a glitch between goldengate
adapter and namenode
 Please test the rewind functionality and ability to handle duplicate data
 Please upgrade to latest release - OGGBD 12.3.1.1.0 which contains the fix for
the abend issue.
 Monitor heart beat table in Hive.
 Monitor the ERROR messages from logfiles - RHDFS_info_log4j.log [Splunk]
 java.io.EOFException: Premature EOF: no length prefix available
38
Ability to handle hadoop failures
Confidential
GoldenGate Big Data Adapter - Monitoring [Wavefront]
Q & A
Thank You

More Related Content

What's hot

Oracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim WilliamsOracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim WilliamsMarkus Michalewicz
 
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Amazon Web Services
 
Oracle Goldengate training by Vipin Mishra
Oracle Goldengate training by Vipin Mishra Oracle Goldengate training by Vipin Mishra
Oracle Goldengate training by Vipin Mishra Vipin Mishra
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSChristian Gohmann
 
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...TigerGraph
 
Migrating Oracle database to Cassandra
Migrating Oracle database to CassandraMigrating Oracle database to Cassandra
Migrating Oracle database to CassandraUmair Mansoob
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Sandesh Rao
 
Best New Features of Oracle Database 12c
Best New Features of Oracle Database 12cBest New Features of Oracle Database 12c
Best New Features of Oracle Database 12cPini Dibask
 
Oracle Multitenant meets Oracle RAC - IOUG 2014 Version
Oracle Multitenant meets Oracle RAC - IOUG 2014 VersionOracle Multitenant meets Oracle RAC - IOUG 2014 Version
Oracle Multitenant meets Oracle RAC - IOUG 2014 VersionMarkus Michalewicz
 
Oracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACOracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACMarkus Michalewicz
 
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and OptimizationPgDay.Seoul
 
Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil Nair
 
Sql Objects And PL/SQL
Sql Objects And PL/SQLSql Objects And PL/SQL
Sql Objects And PL/SQLGary Myers
 
My SYSAUX tablespace is full - please help
My SYSAUX tablespace is full - please helpMy SYSAUX tablespace is full - please help
My SYSAUX tablespace is full - please helpMarkus Flechtner
 
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングしばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングオラクルエンジニア通信
 
Oracle Databaseを用いて学ぶ RDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016
Oracle Databaseを用いて学ぶRDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016 Oracle Databaseを用いて学ぶRDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016
Oracle Databaseを用いて学ぶ RDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016 Ryota Watabe
 
MySQLのソース・ターゲットエンドポイントとしての利用
MySQLのソース・ターゲットエンドポイントとしての利用MySQLのソース・ターゲットエンドポイントとしての利用
MySQLのソース・ターゲットエンドポイントとしての利用QlikPresalesJapan
 
DBA 3 year Interview Questions
DBA 3 year Interview QuestionsDBA 3 year Interview Questions
DBA 3 year Interview QuestionsNaveen P
 
Oracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイドOracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイドオラクルエンジニア通信
 

What's hot (20)

Oracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim WilliamsOracle Flex ASM - What’s New and Best Practices by Jim Williams
Oracle Flex ASM - What’s New and Best Practices by Jim Williams
 
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
 
Oracle Goldengate training by Vipin Mishra
Oracle Goldengate training by Vipin Mishra Oracle Goldengate training by Vipin Mishra
Oracle Goldengate training by Vipin Mishra
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTS
 
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
 
Migrating Oracle database to Cassandra
Migrating Oracle database to CassandraMigrating Oracle database to Cassandra
Migrating Oracle database to Cassandra
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
 
Best New Features of Oracle Database 12c
Best New Features of Oracle Database 12cBest New Features of Oracle Database 12c
Best New Features of Oracle Database 12c
 
Chapter 3 stored procedures
Chapter 3 stored proceduresChapter 3 stored procedures
Chapter 3 stored procedures
 
Oracle Multitenant meets Oracle RAC - IOUG 2014 Version
Oracle Multitenant meets Oracle RAC - IOUG 2014 VersionOracle Multitenant meets Oracle RAC - IOUG 2014 Version
Oracle Multitenant meets Oracle RAC - IOUG 2014 Version
 
Oracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RACOracle Extended Clusters for Oracle RAC
Oracle Extended Clusters for Oracle RAC
 
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
[Pgday.Seoul 2021] 2. Porting Oracle UDF and Optimization
 
Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016Anil nair rac_internals_sangam_2016
Anil nair rac_internals_sangam_2016
 
Sql Objects And PL/SQL
Sql Objects And PL/SQLSql Objects And PL/SQL
Sql Objects And PL/SQL
 
My SYSAUX tablespace is full - please help
My SYSAUX tablespace is full - please helpMy SYSAUX tablespace is full - please help
My SYSAUX tablespace is full - please help
 
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングしばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
 
Oracle Databaseを用いて学ぶ RDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016
Oracle Databaseを用いて学ぶRDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016 Oracle Databaseを用いて学ぶRDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016
Oracle Databaseを用いて学ぶ RDBMSの基本 (抜粋版) - JPOUG Oracle Database入学式 2016
 
MySQLのソース・ターゲットエンドポイントとしての利用
MySQLのソース・ターゲットエンドポイントとしての利用MySQLのソース・ターゲットエンドポイントとしての利用
MySQLのソース・ターゲットエンドポイントとしての利用
 
DBA 3 year Interview Questions
DBA 3 year Interview QuestionsDBA 3 year Interview Questions
DBA 3 year Interview Questions
 
Oracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイドOracle GoldenGate Veridata 12cR2 セットアップガイド
Oracle GoldenGate Veridata 12cR2 セットアップガイド
 

Similar to LendingClub RealTime BigData Platform with Oracle GoldenGate

EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?confluent
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with GimelAlluxio, Inc.
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelDeepak Chandramouli
 
Aioug ha day oct2015 goldengate- High Availability Day 2015
Aioug ha day oct2015 goldengate- High Availability Day 2015Aioug ha day oct2015 goldengate- High Availability Day 2015
Aioug ha day oct2015 goldengate- High Availability Day 2015aioughydchapter
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Tom Diederich
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
Data integration
Data integrationData integration
Data integrationBallerina
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13DECK36
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeDatabricks
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Building Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaBuilding Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaGuido Schmutz
 
HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware AccelerationHTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware AccelerationEDB
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and ArchitectureSidney Chen
 

Similar to LendingClub RealTime BigData Platform with Oracle GoldenGate (20)

126 Golconde
126 Golconde126 Golconde
126 Golconde
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with Gimel
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | Gimel
 
Aioug ha day oct2015 goldengate- High Availability Day 2015
Aioug ha day oct2015 goldengate- High Availability Day 2015Aioug ha day oct2015 goldengate- High Availability Day 2015
Aioug ha day oct2015 goldengate- High Availability Day 2015
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Data integration
Data integrationData integration
Data integration
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Building Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaBuilding Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache Kafka
 
HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware AccelerationHTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
HTAP By Accident: Getting More From PostgreSQL Using Hardware Acceleration
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 

Recently uploaded

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 

LendingClub RealTime BigData Platform with Oracle GoldenGate

  • 1. Real Time Big Data Analytic Platform with Oracle GoldenGate Big Data Adapters Rajit Saha Principal BigData Engineer Vengata(Venky) Guruswamy Principal Database Administrator
  • 2. Confidential  Introduction - LendingClub  Real-time Big Data Platform Use cases  Lambda Architecture – On-Premise / AWS  Data Processing Flow/Algorithm  GoldenGate Big Data Adapter Implementation Architecture, Configuration and Troubleshooting Scenarios Agenda 2
  • 3. Confidential 3 LendingClub LendingClub is America’s largest online marketplace connecting borrowers and investors.  Headquartered in San Francisco, CA  Office in Westborough, MA
  • 4. Confidential 4 Product Lines Personal Loans Loans up to $40K 600+ FICO 36 and 60 mo. terms Small Business Business Loans up to $300K At least $75,000 in annual sales At least 2 years in business Patient Financing Extended plans up to $50K; no-interest plans up to $32K Auto Refinance Must have an outstanding balance of $5K-$55K, initiated in the last 3 months with 24 months of remaining payments
  • 5. Confidential  Maintain Reporting Databases  Implement transaction history for key tables(Change Data capture)  Enable real-time feed to Big data platform 5 Use Cases of Oracle GoldenGate @ LendingClub
  • 6. Confidential  LendingClub’s Big Data Platform is responsible for generating thousands of reports for the company – Daily, Monthly, Quarterly  Investor Reports, Financial Reports, Collection Reports, Risk Reports, Marketing Reports etc. • Near Real-time Availability of OLTP Data in Hadoop Based OLAP warehouse • Thousands of Tables and rapidly increasing data volumes towards petabytes • Thousands of internal adhoc Users 6 Why Big Data Platform ? Why Real-time Data Needed ?
  • 7. Confidential  Typical changes are:  Positive Response from customers  Indicated DON’T CALL on a particular phone number from previous contacts.  Under certain circumstance, it can also be limited on how many times we can contact borrower by legal or regulatory reason.  “NO CALL” list needs be uploaded and fed into dialer in a very timely fashion, and ideally it should be near real-time at the event/change occurred. 7 NO Call list is a list of phone numbers we upload to dialer system in order to suppress Collections calls due to loan and/or customer status changes throughout the day. Real-time Use Cases NO CALL List Generation
  • 8. Confidential 8 What is it? Why are we using Real-time? Who is it for? Real-time Use Cases Marketing Dashboard Track the progress of all the marketing channels with respect to quarterly forecast. Business needs numbers every hour to change the strategy in case any channel performance compared to target. Marketing Team
  • 9. Confidential 9 Lambda Architecture On-Premise HortonWorks Cluster EXTRACT TRAIL REPLICAT INCREMENTAL TRANSACTIONS Data Science CONSUMERS TRAIL TRAIL
  • 10. Confidential 10 Lambda Architecture AWS S3/EMR Clusters EXTRACT TRAIL REPLICAT INCREMENTAL TRANSACTIONS Data Science CONSUMERS TRAIL TRAIL
  • 11. Confidential SPEED LAYER - DATA FLOW DIAGRAM
  • 12. Confidential  DDL Verification from Oracle Source to GG Table before processing starts  Data Inserted fromAvro GG Table to CDC partition table  Hive Dynamic Partition on OP_TS  ORC (Optimized Row Columnar) Format  Nightly Snapshot Creation –  Latest Updates from CDC tables and rest from previous day Snapshot  Create Hive and Presto Real Time View with CDC and Nightly Snapshot  Data Validation : Full table Checksum with Sqoop Snapshot 12 Few Important Steps in Speed Layer Processing Process Flow ..
  • 13. Confidential How are we getting latest update – For a Table , within a transaction for a row CREATE EXTERNAL TABLE SCHEMA_GG.FOO ( TABLE STRING, OP_TYPE STRING, OP_TS STRING, CURRENT_TS STRING , POS STRING , PRIMARY_KEYS ARRAY<STRING> , TOKENS MAP<STRING,STRING> , PKEY STRING , .. .. .. {"TKN-OPTYPE":"INSERT","TKN-RECORDLENGTH":"1400","TKN-LOGRBA":"1774157","TKN- OBJECTNAME":”SCHEMA.FOO","TKN-FILERBA":"","TKN-LAGMSEC":"54420","TKN- USERNAME":”XXX","TKN-SCN":"7633890649912","TKN-RSN":"7633890649838","TKN- TRANSACTIONINDICATOR":"BEGIN","TKN-RECORDTIMESTAMP":"2017-07-30 23:49:14","TKN- LOGPOSITION":"19784208","TKN-FILESEQNO":"","TKN-ROWID":"AAA46FACCAAD4CtAAS","TKN- XID":"184.13.1380795","TKN-COMMITTIMESTAMP":"2017-07-30 23:49:15.000000","TKN-REDOTHREAD":"1"} SELECT TOKENS FROM SCHEMA_GG.FOO LIMIT 1;Input Table From GG Adaptor All the transaction records goes to Hive Partitioned ORC CDC table- SCHEMA_RT_CDC.FOO SELECT DISTINCT VW.PKEY,.. FROM (SELECT FOO.ID, .. RANK() OVER (PARTITION BY FOO.PKEY ORDER BY FOO.TKN-SCN DESC, FOO.TKN-RSN DESC, FOO.POS DESC, FOO .CURRENT_TS DESC) AS DEDUP_RANK FROM SCHEMA_RT_CDC.FOO WHERE FOO.OP_TS_DATE >='2017-07-30') VW WHERE VW.DEDUP_RANK = 1
  • 14. Confidential Verification of Real Time View definition SELECT PKEY, OP_TS, TKN_SCN, TKN_RSN, POS, CURRENT_TS, MODIFIED_D, OP_TYPE FROM SCHEMA_RT_CDC.FOO WHERE PKEY = 2181322736 ORDER BY TKN_SCN DESC,TKN_RSN DESC,POS DESC,CURRENT_TS DESC Records from CDC SELECT PKEY, MODIFIED_D FROM SCHEMA_RT.FOO WHERE PKEY = 2181322736Records from Real Time View PKEY OP_TS TKN_SCN TKN_RSN POS CURRENT_TS MODIFIED_D OP_TYPE 2181322736 2017-06-29 02:32:33.0 7630245451950 7630245451909 00004193750007692575 2017-06-28 19:33:53.778 2017-06-28 19:32:33.0 U 2181322736 2017-06-29 02:32:33.0 7630245451950 7630245451909 00004193750007692575 2017-06-28 19:33:53.778 2017-06-28 19:32:33.0 U 2181322736 2017-06-29 00:39:16.001 7630236411132 7630236411130 00004191270019251264 2017-06-28 17:41:09.519 2017-06-28 17:39:16.0 U 2181322736 2017-06-29 00:39:16.001 7630236411132 7630236411130 00004191270019251264 2017-06-28 17:41:09.519 2017-06-28 17:39:16.0 U 2181322736 2017-06-28 17:02:59.002 7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I 2181322736 2017-06-28 17:02:59.002 7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I 2181322736 2017-06-28 17:02:59.002 7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I 2181322736 2017-06-28 17:02:59.002 7630205764444 7630205764072 00004187440038392981 2017-06-28 10:04:13.76 2017-06-28 10:02:58.0 I 2181322736 2017-06-28 19:32:33.0
  • 15. Confidential SLAs Transaction -> Import Process Start Avg ~5mins Transaction -> Import Process End Avg ~ 5 – 15 mins Spikes are for NIGHTLY SNAPSHOT Creation 1. Full table creation 2. TEZ is not good for Skew JOIN 3. Cluster Busy
  • 16. Confidential GoldenGate Big Data Adapter Implementation Architecture
  • 17. Confidential 17 GoldenGate BigData Adapter Pipeline Db Node 1 DB Extract Data Pump Golden Gate Adapter Servers (HA) Replicat for HDFS Hive Db Node 1 TRAIL FILE 1 TRAIL FILE 2 TRAIL FILE 3 TRAIL FILE 1 TRAIL FILE 2 TRAIL FILE 3
  • 18. Confidential 18 GoldenGate BigData Infrastructure Architecture
  • 19. Confidential 19 GoldenGate BigData Infrastructure Architecture – Site Failover
  • 20. Confidential GoldenGate Big Data Adapter Configuration
  • 21. Confidential Source Extract Parameter - eprod.prm NOCOMPRESSUPDATES GETUPDATEBEFORES TABLE <schema name>.*, tokens ( TKN-COMMITTIMESTAMP = @GETENV('GGHEADER', 'COMMITTIMESTAMP'), TKN-FILESEQNO = @GETENV('RECORD', 'FILESEQNO'), TKN-FILERBA = @GETENV('RECORD', 'FILERBA'), TKN-LAGMSEC = @GETENV('LAG', 'MSEC'), TKN-LOGPOSITION = @GETENV('GGHEADER', 'LOGPOSITION'), TKN-OBJECTNAME = @GETENV('GGHEADER', 'OBJECTNAME'), TKN-OPTYPE = @GETENV('GGHEADER', 'OPTYPE'), TKN-RECORDLENGTH = @GETENV('GGHEADER', 'RECORDLENGTH'), TKN-RECORDTIMESTAMP = @GETENV('RECORD', 'TIMESTAMP'), TKN-ROWID = @GETENV('RECORD', 'ROWID'), TKN-RSN = @GETENV('RECORD', 'RSN'), TKN-SCN = @GETENV('TRANSACTION', 'CSN'), TKN-TRANSACTIONINDICATOR = @GETENV('GGHEADER', 'TRANSACTIONINDICATOR'), TKN-USERNAME = @GETENV('TRANSACTION', 'USERNAME'), TKN-XID = @GETENV('TRANSACTION', 'XID') )
  • 22. Confidential Source Data Pump Extract Parameter - pprod.prm EXTRACT PPROD PASSTHRU RMTHOST <Goldengate adapter server>, MGRPORT <port no> -- Trail file written by Pump to the remote host RMTTRAIL /filesystem/pumptrail_ALL/pt -- Pass data through without mapping, filtering, conversion: PASSTHRU -- Specify tables to be captured: TABLE *.*;
  • 23. Confidential Target – Replicat Parameter - rhdfs.prm REPLICAT RHDFS TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 -Schemas MAP SCHEMA1.*, TARGET SCHEMA1_GG.*; MAP SCHEMA2.*, TARGET SCHEMA2_GG.*; -Heartbeat MAP DBASCHEMA.HEARTBEAT,TARGET DBASCHEMA_GG.HEARTBEAT;
  • 25. Confidential Oracle GoldenGate to AWS S3 Connector gg.handlerlist=hdfs gg.handler.hdfs.type=hdfs gg.handler.hdfs.rootFilePath=s3://LCBUCKET/OGG s3replicat.props Custom Built Hadoop Client from Trunk hadoop-3.0.0-alpha3-SNAPSHOT <property> <name>fs.s3a.access.key</name> <value><< AWS KEY >></value> </property> <property> <name>fs.s3a.secret.key</name> <value><<AWS SECRET KEY >></value> </property> <property> <name>fs.s3a.proxy.host</name> <value> << PROXY SERVER >> </value> </property> <property> <name>fs.s3a.proxy.port</name> <value><< PROXY PORT >></value> </property> <property> <name>fs.s3a.connection.ssl.enabled</name> <value>true</value> </property> <property> <name>fs.s3.impl</name> <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value> </property> <property> <name>fs.s3a.server-side-encryption-algorithm</name> <value>SSE-KMS</value> </property> <property> <name>fs.s3a.server-side-encryption-key</name> <value>arn:aws:kms:<< KEY >></value> </property> Hadoop Client Talks to s3 via Hadoop-AWS package via s3a protocol EMR only understand s3 , so s3a to s3 mapping is done by
  • 26. Confidential GGSCI - OutputSecurity - Authentication ### Kerberos params gg.handler.name.authType=kerberos gg.handler.name.kerberosKeytabFile=/etc/security/keytabs/lcapp.headless.keytab gg.handler.name.kerberosPrincipal=osuser@PROD-DOMAIN.COM
  • 27. Confidential GGSCI - OutputSecurity - Username/password Encryption ORACLEWALLETUSERNAME ggadalias ggadapters ORACLEWALLETPASSWORD ggadalias ggadapters gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[ggadalias ggadapters] gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[ggadalias ggadapters] gg.handler.hdfs.hiveJdbcUserName=unencrypted_username gg.handler.hdfs.hiveJdbcPassword=unencrypted_password
  • 28. Confidential GGSCI - Output Password encryption steps: 1. Add credential store: GGSCI > ADD CREDENTIALSTORE Credential store created in ./dircrd/. 2. Add the credential for these users. ALTER CREDENTIALSTORE ADD USER <Hive_username>, password <Hive_password> alias ggadalias domain ggadapters Security-Credential Store
  • 29. Confidential How HDFS Handler handles the Data Flow ? • The output format chosen is Avro row ocf format for three reasons • Integration with Hive • Seamless Schema Evolution • More compact than Avro op ocf • There are two files produced namely Avro and Avsc files.
  • 30. Confidential GGSCI - OutputConfigure High Availability for GoldenGate Resource As ROOT user. agctl add goldengate gg1 --gg_home /bdggvol/ggaws --instance_type dual --nodes node1,node2 -network 1 --ip 99.99.99.99 --user appuser --group oinstall --filesystems ora.registry.acfs --environment_vars "LD_LIBRARY_PATH=/opt/java/default/jre/lib/amd64/server:/lib, PATH=/opt/java/default/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin, JAVA_HOME=/opt/java/default" $GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u user:oracle:rwx $GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u group:oinstall:rwx $GRID_HOME/bin/crsctl setperm resource xag.gg1.goldengate -u group:oinstall:rwx $ GRID_HOME/bin/crsctl setperm resource ora.net1.network -u user:appuser:rwx
  • 31. Confidential GGSCI - OutputSchema Regex Avro Doesn’t like ($) symbol ! – No problem : gg.schemareplaceregex=[$:] gg.schemareplacestring=_
  • 32. Confidential  Upgrade from 12.2 to 12.3 has changed the timestamp format from [YYYY-MM- DD:HH24:MI:SS.FFF] to [YYYY-MM-DD HH24:MI:SS.FFF]  To avoid the dreaded ERROR OGG-15050 Error loading Java VM runtime library: (2 No such file or directory) ensure to set the following parameter JAVA_HOME, LD_LIBRARY_PATH and PATH properly. Remember manager passes the ENV variables.  It’s a multi-threaded process and ensure to allocate sufficient Unix resources like NPROC ..  Please monitor the heart beat table from Hive.  Design the application to be Idempotent.  Run the HDFS handler from dedicated hadoop client node. 32 Goldengate Big Data Adapter – Things To Remember
  • 35. Confidential GGSCI - OutputBefore Patch - 17995064 
  • 36. Confidential GGSCI - OutputAfter Patch - 17995064 
  • 37. Confidential GGSCI - OutputLobs-Clobs-Blobs • Data team needed the CLOBS even though the CLOB is not updated. • GETUPDATEBEFORES doesn’t work for CLOB • Lobs :Always Include LOBS In Trail File (Doc ID 1639717.1) • Include the extract parameters NOCOMPRESSUPDATES and FETCHCOLS option in the TABLE parameter, for example: NOCOMPRESSUPDATES TABLE schema.mytable, fetchcols (mylobs1, mylob2);
  • 38. Confidential  SR 3-15688859001 : Data loss after a glitch between goldengate adapter and namenode  Please test the rewind functionality and ability to handle duplicate data  Please upgrade to latest release - OGGBD 12.3.1.1.0 which contains the fix for the abend issue.  Monitor heart beat table in Hive.  Monitor the ERROR messages from logfiles - RHDFS_info_log4j.log [Splunk]  java.io.EOFException: Premature EOF: no length prefix available 38 Ability to handle hadoop failures
  • 39. Confidential GoldenGate Big Data Adapter - Monitoring [Wavefront]
  • 40. Q & A Thank You