Big Data from the trenches
Advice from the FSI industry
By: Azrul MADISA
About me…
• VP – Enterprise Data Architect @ Maybank
• Take care of Maybank’s data worldwide
• Nuts about data, analytics and software dev.
• Very hands-on, love to read
• Teach aikido to kids
Big Data landscape today
https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
Too many big data tech?
Wait … what?
I have to know ALL that?
Let’s change the game a bit…
Use case
The data journey
• Acquisition
• Dumping
• Tidy data
• Real-time analytics
• Analytical model
• Sandbox
Example: credit scoring and loan origination
• Journey stages: acquisition, dumping, tidy data, real-time analytics, analytical model, sandbox
• Components in this example: screens, data staging area, data warehouse, score card builder, decisioning, sandbox, data scientist
Acquisition with quality
• Manage data quality up front
• Human-factor data quality
Diagram: Data Entry, Application, Data Staging (loaded over-night), Audit trail (weekly)
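Managing human-factor quality up front can mean rejecting a bad entry at the screen, before the over-night load. The sketch below is only an illustration: the record fields, the `validate_entry` helper and the rule that the ID number starts with a YYMMDD date are assumptions, not something the slides specify.

```python
import re
from datetime import date

# Assumed ID layout: six digits, two digits, four digits, separated by dashes.
IC_PATTERN = re.compile(r"^(\d{2})(\d{2})(\d{2})-\d{2}-\d{4}$")


def is_valid_date(yy, mm, dd):
    """Accept the date if it exists in either the 1900s or the 2000s."""
    for century in (1900, 2000):
        try:
            date(century + yy, mm, dd)
            return True
        except ValueError:
            continue
    return False


def validate_entry(record):
    """Return a list of problems found in one data-entry record (hypothetical rules)."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is empty")
    match = IC_PATTERN.match(record.get("id_number", ""))
    if not match:
        problems.append("id_number does not match NNNNNN-NN-NNNN")
    elif not is_valid_date(*(int(g) for g in match.groups())):
        problems.append("id_number does not start with a valid date")
    return problems
```

An application screen could call `validate_entry` before saving and show the returned problems to the person keying in the data; an empty list means the record passes these checks.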
Acquisition with quality
• Non-human error
• Use PEWMA algorithm
https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
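A minimal sketch of a PEWMA (probabilistic exponentially weighted moving average) detector for non-human errors such as a faulty feed. It follows the general shape of the algorithm: a running mean and standard deviation whose smoothing weight is adjusted by how probable each new point is. The class name, the parameter values (`alpha`, `beta`, `training`, `threshold`) and the sample feed are assumptions chosen for illustration.

```python
import math


class Pewma:
    """Streaming anomaly detector; parameter defaults are illustrative assumptions."""

    def __init__(self, alpha=0.98, beta=0.5, training=30, threshold=3.0):
        self.alpha = alpha          # base smoothing weight on the past
        self.beta = beta            # how strongly probability adjusts the weight
        self.training = training    # warm-up points that are never flagged
        self.threshold = threshold  # flag points further than this many std devs
        self.t = 0
        self.s1 = 0.0               # weighted average of x
        self.s2 = 0.0               # weighted average of x squared

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the estimates."""
        self.t += 1
        mean = self.s1
        std = math.sqrt(max(self.s2 - self.s1 ** 2, 0.0))
        if self.t <= self.training:
            a = 1.0 - 1.0 / self.t  # plain running average during warm-up
            anomaly = False
        else:
            z = (x - mean) / std if std > 0 else 0.0
            p = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
            a = self.alpha * (1.0 - self.beta * p)  # unlikely points move the mean less
            anomaly = std > 0 and abs(x - mean) > self.threshold * std
        self.s1 = a * self.s1 + (1 - a) * x
        self.s2 = a * self.s2 + (1 - a) * x * x
        return anomaly
```

Each incoming value is passed to `update`; a steady feed stays unflagged, while a sudden jump far outside the recent range returns `True`.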
Data sandbox
Creating a sandbox on the cloud
• Why cloud:
– Scale data discovery as needed
– Merging private with public data
– Less bureaucratic
• But…
– Customer data on the cloud is a no no
Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
What if there is a way to mask numerical data while keeping the statistical properties intact?
Easier for the regulators to digest
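The non-numerical masking shown in the examples can be sketched with a keyed hash. This is one possible approach, not necessarily the one used in the talk: `SECRET_KEY`, `mask_text` and `mask_digits` are hypothetical names, and the scheme is a one-way substitution rather than reversible format-preserving encryption.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would stay on premises.
SECRET_KEY = b"demo-key-not-for-production"


def mask_text(value, length=16):
    """Replace a text value (e.g. a name) with a keyed, repeatable token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_digits(value):
    """Replace every digit with a keyed pseudo-random digit, keeping the layout."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)  # separators such as "-" are kept as they are
    return "".join(out)
```

Because the same input always yields the same token, masked records can still be joined on the masked key inside the sandbox.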
Creating a sandbox on the cloud
• Random projection
• Usually used for dimension reduction
Original data (M x N) × Random matrix (N x N) = Masked data (M x N)
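A minimal sketch of the multiplication above, assuming the random N x N matrix is made orthogonal (the slide only says "random"). With an orthogonal matrix, the Euclidean distance between any two rows is the same before and after masking, which is one sense in which statistical properties stay intact. The function names and sample values are hypothetical.

```python
import math
import random


def random_orthogonal(n, seed=0):
    """Build an n x n orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def mask(data, r):
    """Multiply the M x N data by the N x N matrix r."""
    n = len(r)
    return [[sum(row[k] * r[k][j] for k in range(n)) for j in range(n)]
            for row in data]


def distance(a, b):
    """Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical customer rows: age, monthly income, number of loans.
data = [[35.0, 4200.0, 2.0], [52.0, 8800.0, 0.0], [41.0, 5100.0, 3.0]]
masked = mask(data, random_orthogonal(3, seed=7))
```

The matrix (or its seed) stays on premises, and only `masked` is uploaded. Distance-based analysis such as clustering gives the same result on `masked` as on `data`, although individual columns no longer carry their original meaning.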
Fast real-time vs. batch analytics
Fast real-time analytics
• ‘Batch’ analytics:
Diagram: User, Application, over-night batch, Data warehouse, descriptive analytics, predictive analytics, analytical model (monthly), real-time decisioning
Fast real-time analytics
• So what is real-time analytics:
Diagram: User, Application, real-time analytics and decisioning, analytical model updated in real time; predictive analytics, batch analytical model, real-time analytical model
Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
Diagram: the real-time analytical marketing system receives location and user info and sends an SMS campaign; the user changes behaviour (e.g. buys something else) and the system learns the new behaviour
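A minimal tabular Q-learning sketch for the SMS campaign example. The locations, the hidden response rates and the hyperparameters (`lr`, `gamma`, `epsilon`) are assumptions made up for the simulation; a real system would observe actual responses instead of simulating them.

```python
import random

ACTIONS = ["sports", "concerts", "movies"]

# Hypothetical response rates per location, unknown to the learner.
RESPONSE_RATE = {
    "stadium": {"sports": 0.7, "concerts": 0.1, "movies": 0.05},
    "cinema": {"sports": 0.05, "concerts": 0.1, "movies": 0.7},
}


def train(steps=20000, lr=0.02, gamma=0.3, epsilon=0.1, seed=42):
    """Learn which campaign to send in each location from simulated feedback."""
    rng = random.Random(seed)
    states = list(RESPONSE_RATE)
    q = {s: {a: 0.0 for a in ACTIONS} for s in states}
    state = rng.choice(states)
    for _ in range(steps):
        # Epsilon-greedy: mostly send the best-known campaign, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        # Reward is 1 when the (simulated) user responds to the SMS.
        reward = 1.0 if rng.random() < RESPONSE_RATE[state][action] else 0.0
        next_state = rng.choice(states)
        # Q-learning update towards reward plus discounted best future value.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += lr * (target - q[state][action])
        state = next_state
    return q


def best_campaign(q, state):
    """Return the campaign with the highest learned value for a location."""
    return max(ACTIONS, key=lambda a: q[state][a])
```

Because the table is updated after every message, a shift in user behaviour gradually changes the learned values without waiting for a monthly batch rebuild.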
Fast real-time analytics: Real-time analytics in action
Figure: interest in sports, concerts and movies (y-axis: interest, 0 to 0.5) plotted against the number of messages (x-axis: 1 to beyond 9,862), showing real-time analytical tracking and learning of people’s interest over time
Putting it all together under one architecture
Data architecture
• Some difficult questions around big data and analytics
– How can I invest in big data while managing cost?
– How can I “experiment” with big data while mitigating risks?
– How can I create a 360 view of data without boiling the ocean?
– How can I use overseas data without violating regulations?
Tiered data architecture
Diagram components:
– Data sources, loaded in batch and in real time (real-time store)
– Overseas data sources and social networks, loaded in batch
– Data warehouse (staging, SQL access)
– Big Data infra (e.g. Hadoop)
– Master / reference data, social / cloud public data, overseas data
– Data virtualization with the official data model, serving data consumers over SQL / REST / SOAP / MQ
Tiered data architecture
• Investment / level of support
– Master data: Level 1
– Fast data: Level 1
– Hot data: Level 2
– Cold data: Level 3
– Data virtualization: Level 1
– Axes: investment in CPU / memory, investment in storage, level of support
Tiered data architecture
• Invest where it matters
– Defer investment if needed
– Refocus investment without disrupting business
• Data virtualization
– Create a façade for data access
– Provide standard interface for data
– Single data model, single access, single quality checkpoint
• Allow ‘experimentation’
– E.g. cut-off point for hot / cold
• Overseas data access
– Data stays where it is; only aggregated data is transferred back
– More palatable to regulators
• 360 view
– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed
• Single place to check for data quality
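The data virtualization points above can be sketched as a small façade: sources are registered behind one access point, every row passes the same quality check, and a 360 view is produced by joining at query time instead of through ETL. The class, method names and sample rows are hypothetical and stand in for whatever virtualization product is used.

```python
from collections import defaultdict


class DataVirtualization:
    """Single access point over several registered sources (hypothetical API)."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        """Register a source; fetch is a function returning a list of dict rows."""
        self._sources[name] = fetch

    def query(self, name):
        """Read a source through the single quality checkpoint."""
        rows = self._sources[name]()
        # Quality rule (an assumption): drop rows without a customer_id.
        return [r for r in rows if r.get("customer_id") is not None]

    def customer_360(self, names):
        """Join the named sources on customer_id without copying them anywhere."""
        view = defaultdict(dict)
        for name in names:
            for row in self.query(name):
                view[row["customer_id"]].update(row)
        return dict(view)


dv = DataVirtualization()
dv.register("core_banking", lambda: [
    {"customer_id": 1, "name": "A"},
    {"customer_id": None, "name": "bad row"},
])
dv.register("cards", lambda: [{"customer_id": 1, "card_limit": 5000}])
view = dv.customer_360(["core_banking", "cards"])
```

Consumers only see `query` and `customer_360`, so a source can move between tiers, or the hot / cold cut-off can change, without the consumers noticing.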
That’s all folks…
• Linkedin:
– https://www.linkedin.com/in/azrul-madisa-6052419

More Related Content

What's hot

Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
Cloudera, Inc.
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
Andrei Savu
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
MapR Technologies
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
DataWorks Summit
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
InSemble
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Dataconomy Media
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
Jen Stirrup
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
Rob Winters
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Dr. Mohan K. Bavirisetty
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI Strategy
AtScale
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
DataStax
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemmagda3695
 
Introduction: Architecting for Scale
Introduction: Architecting for ScaleIntroduction: Architecting for Scale
Introduction: Architecting for Scale
DataStax
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
Keshav Tripathy
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Formant
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
Aseem Bansal
 
BigData
BigDataBigData
BigData
Shankar R
 
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
Phillip Delaney
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Ivo Vachkov
 

What's hot (20)

Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0 Polyglot Processing - An Introduction 1.0
Polyglot Processing - An Introduction 1.0
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI Strategy
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Introduction: Architecting for Scale
Introduction: Architecting for ScaleIntroduction: Architecting for Scale
Introduction: Architecting for Scale
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
 
BigData
BigDataBigData
BigData
 
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
AURIN Data Hubs Supporting Smarter Cities - Phil Delaney, Locate14
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 

Similar to Big data from the trenches

Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Lecture1
Lecture1Lecture1
Lecture1
Manish Singh
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
datastack
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
RojaT4
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
Bob Hardaway
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
Amazon Web Services
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
AbhishekKumarAgrahar2
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
Dunn Solutions Group
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
XanGwaps
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
HostedbyConfluent
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
VMware Tanzu Korea
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
AIMLSEMINARS
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
DataStax
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
DATAVERSITY
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
MapR Technologies
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
Nicolas Morales
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Denodo
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia Dubuisson
Catalina Arango
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
DATAVERSITY
 

Similar to Big data from the trenches (20)

Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
When Streaming Becomes Strategic
When Streaming Becomes StrategicWhen Streaming Becomes Strategic
When Streaming Becomes Strategic
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
It's All About the Data - Tia Dubuisson
It's All About the Data - Tia DubuissonIt's All About the Data - Tia Dubuisson
It's All About the Data - Tia Dubuisson
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 

Recently uploaded

做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Big data from the trenches

  • 1. Big Data from the trenches Advice from the FSI industry By: Azrul MADISA
  • 2. About me… • VP – Enterprise Data Architect @ Maybank • Take care of Maybank’s data world wide • Nuts about data, analytics and software dev. • Very hands on, love to read • Teach aikido to kids
  • 3. Big Data landscape today https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
  • 4. Too many big data tech? Wait … what? I have to know ALL that?
  • 5. Let’s change the game a bit… Usecase
  • 7. The data journey Acquisition Dumping Tidy data Real Time Analytics Analytical model Sandbox
  • 8. Example: credit scoring and loan origination Acquisition Dumping Tidy data Real Time Analytics Analytical model Screens Data staging area Data warehouse Score card builder Decisioning Sandbox Data scientist
  • 10. Acquisition with quality • Manage data quality up front • Human-factor data quality Data Entry Data StagingApplication Over-night
  • 11. Acquisition with quality • Manage data quality up front • Human-factor data quality Data Entry Data Staging Application Over-night Audit trail Weekly
  • 12. Acquisition with quality • Non-human error • Use PEWMA algorithm https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
  • 14. Creating a sandbox on the cloud • Why cloud: – Scale data discovery as needed – Merging private with public data – Less bureaucratic • But… – Customer data on the cloud is a no no
  • 15. Creating a sandbox on the cloud • Masking – Non-numerical data => No sweat! – E.g. • En. Abdul Jalil => 837x2unxy237e832!@ • 720324-03-8891 => 472376-84-8732 • Masking numerical data?
  • 16. Creating a sandbox on the cloud • Masking – Non-numerical data => No sweat! – E.g. • En. Abdul Jalil => 837x2unxy237e832!@ • 720324-03-8891 => 472376-84-8732 • Masking numerical data? What if there is a way to mask numerical data while keeping the statistical properties intact Easier for the regulators to digest
  • 17. Creating a sandbox on the cloud • Random projection • Usually used for dimension reduction • Original data (M x N) × Random matrix (N x N) = Masked data (M x N)
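
A minimal sketch of the masking step on slide 17, under one stated assumption: the random N x N matrix is made orthogonal (here via Gram-Schmidt on Gaussian vectors), which the slide does not specify. With that choice the Euclidean distance between any two rows is preserved, so distance-based analysis still works on the masked data, while per-column statistics such as means do change. The function names and the customer figures are hypothetical.

```python
import math
import random


def random_orthogonal(n, seed=None):
    """Build a random n x n orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            basis.append([vi / norm for vi in v])
    return basis


def mask(data, r):
    """Multiply data (M x N) by the random matrix r (N x N)."""
    n = len(r)
    return [[sum(row[k] * r[k][j] for k in range(n)) for j in range(n)]
            for row in data]


def distance(a, b):
    """Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: monthly income, loan amount, age.
customers = [
    [5200.0, 30000.0, 34.0],
    [8100.0, 120000.0, 45.0],
    [3000.0, 15000.0, 27.0],
]
R = random_orthogonal(3, seed=42)  # R stays on premises; only masked rows leave
masked = mask(customers, R)
```

The matrix R acts as the key: anyone holding it can reverse the masking, so in this sketch it is assumed never to be uploaded with the data.
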
  • 18. Fast real-time vs. batch analytics
  • 19. Fast real-time analytics • ‘Batch’ analytics: User Application Over-night batch Data warehouse Predictive analytics Descriptive analytics Analytical model Monthly
  • 20. Fast real-time analytics • ‘Batch’ analytics: User Application Over-night batch Data warehouse Predictive analytics Descriptive analytics Real time decisioning Monthly
  • 21. Fast real-time analytics • So what is real time analytics: User Application Real time decisioning analytics Analytical model updated in real time
  • 22. Fast real-time analytics • So what is real time analytics: User Application Real time analytics and decisioning Analytical model updated in real time Predictive analytics Batch analytical model Real-time analytical model
  • 23. Fast real-time analytics • Q-learning • E.g. SMS advertisement campaign Real-time Analytical Marketing System Location, user info SMS campaign
  • 24. Fast real-time analytics • Q-learning • E.g. SMS advertisement campaign Real-time Analytical Marketing System Change behaviour (E.g. buy something else) Learn new behaviour
  • 25. Fast real-time analytics: Real-time analytics in action Over time Interest in concerts Interest in movies Interest in sports
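
A minimal sketch of how the SMS campaign on slides 23 and 24 could be framed as tabular Q-learning. It assumes the state is a coarse location label, the actions are the three campaign topics shown in the later charts, and the reward is 1 when the user responds. The class name, parameter values and the "mall" state are hypothetical, not taken from the talk.

```python
import random
from collections import defaultdict

ACTIONS = ["sports", "concerts", "movies"]  # campaign topics (assumed action set)


class CampaignLearner:
    """Tabular Q-learning with epsilon-greedy action selection (sketch)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount on future reward
        self.epsilon = epsilon  # probability of trying a random topic
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Pick the topic to send: explore occasionally, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        """Apply the Q-learning update after observing the user's response."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical interaction: a user at a mall responds twice to a movie SMS.
learner = CampaignLearner(alpha=0.5, gamma=0.9, epsilon=0.0)
learner.learn("mall", "movies", 1.0, "mall")
learner.learn("mall", "movies", 1.0, "mall")
next_topic = learner.choose("mall")
```

Because every response updates the table immediately, a change in the user's behaviour shifts the estimates without waiting for a batch run, which is the point the slides make about real-time models.
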
  • 26. Fast real-time analytics: Real time analytics in action [Chart: interest (0 to 0.5) against number of messages (1 to over 10,000), one series each for sports, concerts and movies]
  • 27. Fast real-time analytics: Real time analytics in action [Chart: same as slide 26] Real time analytical tracking and learning of people’s interest
  • 28. Putting it all together under one architecture
  • 29. Data architecture • Some difficult questions around big data and analytics – How can I invest in big data while managing cost? – How can I “experiment” with big data while mitigating risks? – How can I create a 360 view of data without boiling the ocean? – How can I use overseas data without violating regulations?
  • 30. Tiered data architecture Data warehouse - Staging - SQL access Big Data Infra (E.g. Hadoop) Data sources Batch Real-time Real-time store Master / Reference Data Social / Cloud Public Data Overseas Data Overseas data sources Social network Batch
  • 31. Tiered data architecture Data consumer Data virtualization SQL / Rest / SOAP / MQ Data warehouse - Staging - SQL access Big Data Infra (E.g. Hadoop) Data sources Batch Real-time Real-time store Master / Reference Data Social / Cloud Public Data Overseas Data Overseas data sources Social network Batch Official data model
  • 32. Tiered data architecture • Investment / level of support Master data Fast data Hot data Cold data Investment in CPU / memory Investment in storage Level 1 Level 1 Level 2 Level 3 Data virtualization Level 1 Level of support
  • 33. Tiered data architecture • Invest where it matters – Defer investment if needed – Refocus investment without disrupting business • Data virtualization – Create a façade for data access – Provide standard interface for data – Single data model, single access, single quality checkpoint • Allow ‘experimentation’ – E.g. cut-off point for hot / cold • Overseas data access – Data stays where it is, only aggregated data is transferred back – More palatable to regulators • 360 view – Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed • Single place to check for data quality
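
A minimal sketch of the façade idea on slide 33, assuming the hot / cold split is decided by record age and that each tier can be treated as a simple key-value store. The class name, the 90-day default and the sample records are hypothetical; a real data virtualization layer would sit in front of the warehouse and the big data infrastructure rather than two dictionaries.

```python
from datetime import date, timedelta


class TieredFacade:
    """Single access point that hides which tier holds a record (sketch)."""

    def __init__(self, hot_store, cold_store, hot_days=90):
        self.hot_store = hot_store    # stands in for the higher-investment tier
        self.cold_store = cold_store  # stands in for the cheaper storage tier
        self.hot_days = hot_days      # cut-off that can be tuned as an experiment

    def tier_for(self, record_date, today):
        """Classify a record as hot or cold by its age in days."""
        age = today - record_date
        return "hot" if age <= timedelta(days=self.hot_days) else "cold"

    def get(self, key, record_date, today):
        """Fetch a record without the consumer knowing where it lives."""
        if self.tier_for(record_date, today) == "hot":
            return self.hot_store.get(key)
        return self.cold_store.get(key)


# Hypothetical stores and records.
hot = {"txn-2": "recent transaction"}
cold = {"txn-1": "archived transaction"}
facade = TieredFacade(hot, cold, hot_days=90)
today = date(2016, 6, 30)
recent = facade.get("txn-2", date(2016, 6, 1), today)
archived = facade.get("txn-1", date(2015, 1, 1), today)
```

Moving the cut-off is a one-line change to `hot_days`, and consumers keep calling the same `get` method, which is what makes the hot / cold boundary cheap to experiment with.
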
  • 34. That’s all folks… • LinkedIn: – https://www.linkedin.com/in/azrul-madisa-6052419