This document provides an overview of big data concepts from the perspective of an enterprise data architect. It discusses the data journey from acquisition to analytics and highlights best practices for data quality, sandboxing data on the cloud, balancing real-time and batch analytics, and putting these concepts together under a tiered data architecture. The architecture proposes investing more in fast data sources and less in cold data, using data virtualization to provide a unified view, and keeping sensitive customer data localized to comply with regulations.
1. Big Data from the trenches
Advice from the FSI industry
By: Azrul Madisa
2. About me…
• VP – Enterprise Data Architect @ Maybank
• Take care of Maybank’s data worldwide
• Nuts about data, analytics and software dev.
• Very hands-on, love to read
• Teach aikido to kids
3. Big Data landscape today
https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
4. Too many big data tech?
Wait … what? I have to know ALL that?
8. Example: credit scoring and loan origination
[Diagram: the data journey for loan origination – application screens feed acquisition; raw data is dumped into a data staging area; tidy data lands in the data warehouse; a sandbox serves the data scientist and the score card builder, whose analytical model drives real-time analytics and decisioning]
10. Acquisition with quality
• Manage data quality up front
• Human-factor data quality
[Diagram: Data Entry → Application → overnight batch → Data Staging]
11. Acquisition with quality
• Manage data quality up front
• Human-factor data quality
[Diagram: Data Entry → Application → overnight batch → Data Staging, with a weekly audit trail]
12. Acquisition with quality
• Non-human error
• Use the PEWMA algorithm (see the sketch below)
https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
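For context, here is a minimal sketch of the PEWMA (Probabilistic Exponentially Weighted Moving Average) update in Python. It is illustrative only: it assumes the standard PEWMA formulation (a probability-weighted EWMA), does not reproduce the AWS IoT / Lambda wiring from the linked post, and the warm-up length and 3-sigma threshold are arbitrary choices rather than anything from the slide.

```python
import math

def pewma(stream, alpha=0.98, beta=0.5, warmup=30, z_thresh=3.0):
    """Flag anomalies in a numeric stream with PEWMA.

    PEWMA dampens the usual EWMA update when an observation is
    improbable under the current estimate, so a one-off spike is
    flagged without dragging the running mean after it.
    Parameter values here are illustrative, not tuned."""
    s1, s2 = stream[0], stream[0] ** 2  # running first and second moments
    anomalies = []
    for t, x in enumerate(stream):
        std = math.sqrt(max(s2 - s1 ** 2, 1e-12))
        z = (x - s1) / std
        # Likelihood of x under the current Gaussian estimate
        p = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        # Plain running average during warm-up, then probability-weighted
        a = 1 - 1.0 / (t + 1) if t < warmup else alpha * (1 - beta * p)
        s1 = a * s1 + (1 - a) * x
        s2 = a * s2 + (1 - a) * x * x
        if t >= warmup and abs(z) > z_thresh:
            anomalies.append(t)
    return anomalies
```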
14. Creating a sandbox on the cloud
• Why cloud:
– Scale data discovery as needed
– Merging private with public data
– Less bureaucratic
• But…
– Customer data on the cloud is a no-no
15. Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
16. Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
What if there were a way to mask numerical data while keeping its statistical properties intact?
Easier for the regulators to digest.
17. Creating a sandbox on the cloud
• Random projection
• Usually used for dimension reduction
Original data (M × N) × Random matrix (N × N) = Masked data (M × N)
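As a minimal sketch, assuming the random matrix is drawn orthogonal (the slide does not specify how it is generated), the masking might look like this in Python; the function name and seed handling are illustrative:

```python
import numpy as np

def mask_numeric(data: np.ndarray, seed: int = 42) -> np.ndarray:
    """Mask an (M x N) numeric matrix by right-multiplying with an
    (N x N) random orthogonal matrix.

    An orthogonal rotation preserves Euclidean distances and inner
    products between rows, so distance-based analytics still work
    on the masked data while the original column values are hidden."""
    _, n = data.shape
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix gives a random orthogonal Q
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return data @ q

# Quick check: pairwise distances survive the masking
original = np.random.default_rng(0).normal(size=(5, 4))
masked = mask_numeric(original)
d_before = np.linalg.norm(original[0] - original[1])
d_after = np.linalg.norm(masked[0] - masked[1])
assert abs(d_before - d_after) < 1e-9
```

The seed (in effect, the random matrix) has to stay on-premise: anyone who holds Q can invert the masking with a single transpose.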
19. Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: User → Application → overnight batch → data warehouse → descriptive and predictive analytics → analytical model, refreshed monthly]
20. Fast real-time analytics
• ‘Batch’ analytics:
[Diagram: the same batch flow, with the monthly-refreshed output now feeding real-time decisioning]
21. Fast real-time analytics
• So what is real time analytics:
[Diagram: User → Application → real-time decisioning analytics, with the analytical model updated in real time]
22. Fast real-time analytics
• So what is real time analytics:
[Diagram: User → Application → real-time analytics and decisioning; predictive analytics maintains both a batch analytical model and a real-time analytical model]
23. Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: location and user info flow into a real-time analytical marketing system, which sends out the SMS campaign]
24. Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
[Diagram: the user changes behaviour (e.g. buys something else) and the real-time analytical marketing system learns the new behaviour]
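To make the slide concrete, below is a minimal, hypothetical sketch of tabular Q-learning for campaign selection. The states, actions, and reward signal are invented for illustration; a real system would derive them from live location and user data:

```python
import random
from collections import defaultdict

# Hypothetical setup: choose which SMS campaign to send for a given
# user context, and learn from whether the user responds.
ACTIONS = ["concert_promo", "movie_promo", "sports_promo"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

q_table = defaultdict(float)  # (state, action) -> estimated value

def choose_campaign(state):
    """Epsilon-greedy: mostly exploit the best-known campaign, but
    keep exploring so that shifting interests are picked up."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def learn(state, action, reward, next_state):
    """Standard Q-learning update from one observed interaction
    (reward = 1.0 if the user responded to the SMS, else 0.0)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)])

# One example interaction with an invented context and reward
state = "weekend:near_stadium"
action = choose_campaign(state)
learn(state, action, reward=1.0 if action == "sports_promo" else 0.0,
      next_state=state)
```

Because the table is updated per interaction, a change in behaviour shows up as falling rewards for the old campaign, and the policy shifts accordingly, which is what slide 25 illustrates.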
25. Fast real-time analytics: real-time analytics in action
[Chart: a user’s interest in concerts, movies, and sports shifting over time]
29. Data architecture
• Some difficult questions around big data and analytics
– How can I invest in big data while managing cost?
– How can I “experiment” with big data while mitigating risks?
– How can I create a 360 view of data without boiling the ocean?
– How can I use overseas data without violating regulations?
30. Tiered data architecture
[Diagram: data sources load the data warehouse (staging, SQL access) in batch; real-time sources feed a real-time store on big data infrastructure (e.g. Hadoop); master/reference data, social/cloud public data, and overseas data (loaded in batch from overseas sources and social networks) sit in further tiers]
31. Tiered data architecture
[Diagram: the same tiers, now fronted by a data virtualization layer that exposes the official data model to data consumers over SQL / REST / SOAP / MQ]
32. Tiered data architecture
• Investment / level of support
Tier                  Investment      Level of support
Master data           CPU / memory    Level 1
Fast data             CPU / memory    Level 1
Hot data              storage         Level 2
Cold data             storage         Level 3
Data virtualization   –               Level 1
33. Tiered data architecture
• Invest where it matters
– Defer investment if needed
– Refocus investment without disrupting business
• Data virtualization (see the sketch after this list)
– Create a façade for data access
– Provide a standard interface for data
– Single data model, single access, single quality checkpoint
• Allow ‘experimentation’
– E.g. cut-off point for hot / cold
• Overseas data access
– Data stays where it is; only aggregated data is transferred back
– More palatable to regulators
• 360 view
– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed
• Single place to check for data quality
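To make the façade idea concrete, here is a minimal, hypothetical sketch of a data virtualization layer in Python. The source names, loaders, and join key are invented, and a real deployment would use a virtualization product rather than hand-rolled code, but the shape is the same: one access point, one data model, sources joined on demand with no ETL:

```python
from typing import Callable, Dict
import pandas as pd

class DataVirtualizer:
    """Toy façade: registers data sources behind a single interface
    and joins them at read time, so consumers never touch the
    underlying warehouse, Hadoop cluster, or overseas systems."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], pd.DataFrame]] = {}

    def register(self, name: str, loader: Callable[[], pd.DataFrame]) -> None:
        # A loader could wrap a SQL query, a REST call, or an MQ fetch
        self._sources[name] = loader

    def query(self, left: str, right: str, on: str) -> pd.DataFrame:
        # 'Join' across tiers on demand - no ETL pipeline required
        return self._sources[left]().merge(self._sources[right](), on=on)

dv = DataVirtualizer()
# Hypothetical sources: local warehouse rows and overseas aggregates
dv.register("warehouse", lambda: pd.DataFrame(
    {"cust_id": [1, 2], "segment": ["retail", "sme"]}))
dv.register("overseas_agg", lambda: pd.DataFrame(
    {"cust_id": [1, 2], "total_spend": [120.0, 890.0]}))
print(dv.query("warehouse", "overseas_agg", on="cust_id"))
```

Note how the overseas source exposes only aggregates, matching the slide’s point that detailed data stays in-country and only aggregated data crosses the border.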
34. That’s all folks…
• LinkedIn:
– https://www.linkedin.com/in/azrul-madisa-6052419