Fintech case study on Automated Discovery, Control and Monitoring Data
SPEAKERS
ALBERTO BALAJI
AGENDA
▸Introduction
▸Challenges with big data
▸G-Research overview
▸Journey with HDP
▸Challenges and learnings
▸Recommendation
PARTNERS
GLOBAL
SOLUTIONS FOR DISCOVERING AND
MANAGING SENSITIVE DATA
ABOUT PRIVACERA
SOLUTIONS FOR MANAGING SENSITIVE DATA
▸DISCOVER: what type of data is stored, and where?
▸CONTROL: anonymize data / restrict access
▸DETECT: malicious or accidental use
▸REPORT: security and compliance reporting
CHALLENGES WITH BIG DATA
▸High volume of data
▸Stringent security and privacy requirements
▸Multi-tenant environments
▸Multiple methods for accessing data
▸Business users using newer tools
▸Traditional tools for security and governance “retrofitted” for the modern data architecture
4 STEPS FOR MANAGING SENSITIVE DATA
▸Data discovery
▸Access control
▸Anonymization
▸Monitoring
G-RESEARCH
CASE STUDY
ABOUT G-RESEARCH
▸Quantitative research technology company
▸Statistical analysis and Big Data pipelines to recognise patterns and extract insights from very large market datasets
▸Forecasting analytics to predict variances in financial markets
▸Clients operate in capital markets globally
▸Undergoing very aggressive growth and adoption of cutting-edge technology
DATASETS
▸Market Data
  ▸Level 1 (top of book)
    ▸Basic market data (instrument, bid price, bid size, ask price, ask size)
  ▸Level 2 (order book or market depth)
    ▸Richer data (highest bid prices, lowest ask prices)
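The relationship between the two levels can be sketched in a few lines of Python. This is only an illustration: the class names `Level1Quote` and `BookLevel`, the function `top_of_book`, the instrument "XYZ" and all prices are assumptions made for the example, not G-Research's schema. It shows that a Level 1 quote is the best bid (highest bid price) and best ask (lowest ask price) taken from the Level 2 depth.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level1Quote:
    """Top of book: the fields listed on the slide for Level 1 data."""
    instrument: str
    bid_price: float
    bid_size: int
    ask_price: float
    ask_size: int


@dataclass(frozen=True)
class BookLevel:
    """One price level of Level 2 depth (hypothetical representation)."""
    price: float
    size: int


def top_of_book(instrument: str, bids: List[BookLevel],
                asks: List[BookLevel]) -> Level1Quote:
    """Derive a Level 1 quote from Level 2 depth.

    The best bid is the highest bid price; the best ask is the lowest ask price.
    """
    best_bid = max(bids, key=lambda level: level.price)
    best_ask = min(asks, key=lambda level: level.price)
    return Level1Quote(instrument, best_bid.price, best_bid.size,
                       best_ask.price, best_ask.size)


# Illustrative values only.
bids = [BookLevel(12.25, 120), BookLevel(12.24, 300)]
asks = [BookLevel(12.27, 60), BookLevel(12.28, 200)]
quote = top_of_book("XYZ", bids, asks)
```

Level 2 carries every price level, so Level 1 can always be recomputed from it, but not the other way round.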
DATASETS
▸Market Data
  ▸Level 1 (top of book)
  ▸Level 2 (order book or market depth)
    ▸Often represented as incremental updates at nanosecond granularity
    ▸HUGE dataset!

Time      Quantity  Bid    Ask    Quantity  Time
12:32:16  120       12.25  12.25  150       12:32:55
12:31:01  50        12.26  12.27  60        12:31:19
12:33:45  150       12.25  12.27  100       12:34:27
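To make "incremental updates" concrete, here is a minimal Python sketch of replaying updates into an in-memory book. The update format `(side, price, quantity)`, the convention that a quantity of zero removes a price level, and the sample values are all assumptions for illustration; real feeds define their own message formats.

```python
from typing import Dict, List, Tuple

# One side of the book: price -> resting quantity at that price.
# Float keys are used only for brevity in this sketch.
BookSide = Dict[float, int]
Update = Tuple[str, float, int]  # (side, price, quantity), hypothetical format


def apply_update(side: BookSide, price: float, quantity: int) -> None:
    """Set the quantity at a price level; zero removes the level (assumed)."""
    if quantity == 0:
        side.pop(price, None)
    else:
        side[price] = quantity


def replay(updates: List[Update]) -> Tuple[BookSide, BookSide]:
    """Rebuild both sides of the book by applying updates in order."""
    bids: BookSide = {}
    asks: BookSide = {}
    for side, price, quantity in updates:
        apply_update(bids if side == "bid" else asks, price, quantity)
    return bids, asks


# Illustrative update stream.
updates = [
    ("bid", 12.25, 120),
    ("ask", 12.27, 60),
    ("bid", 12.26, 50),
    ("ask", 12.28, 100),
    ("bid", 12.26, 0),  # the 12.26 bid level is removed again
]
bids, asks = replay(updates)
best_bid = max(bids)  # highest remaining bid price
best_ask = min(asks)  # lowest remaining ask price
```

Because every change to every price level is a separate message, the stream grows much faster than a top-of-book feed, which is consistent with the slide's "HUGE dataset" remark.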
DATASETS
▸Other datasets include
  ▸Datapacks for additional enrichment
  ▸Risk analysis on positions, portfolios and strategies
  ▸Reference data about market structures and corporate entities
  ▸News feeds
  ▸…
HADOOP DATA PLATFORM – REASONS
▸High volume
▸Fast processing
▸Multi-tenant
▸Flexibility
TECHNOLOGY STACK
CORE REQUIREMENTS FOR DATA PLATFORM
Security
▸Protect intellectual property of the company
  ▸Datasets processed through the pipeline
  ▸...but also code
Integration
▸Integration with existing security systems and policies
  ▸Authentication
  ▸Encryption
  ▸Strict and flexible authorization
CORE REQUIREMENTS (CONTINUED)
Governance
▸Governance
  ▸Ability to find data easily, enabling collaboration
  ▸Track data changes, impact, lineage and maintain consistency
Multi Tenancy and Scalability
▸Multi-tenancy
  ▸A variety of development teams need to work on the same platform and share data and resources
▸Scalability
  ▸Explosive data growth
IMPORTANCE OF GOVERNANCE
“Metadata is the new Data”
- Someone who’s been quoted about a billion times
GOVERNANCE AND SECURITY - SOLUTION DESIGN
GOVERNANCE AND SECURITY – SECURITY FOUNDATION
▸Authentication: Kerberos + Knox
▸Authorization: Ranger + Knox
▸Auditing: Ranger
▸Data Protection: HDFS Encryption
GOVERNANCE AND SECURITY – MANAGING RESTRICTED DATA
▸Data Discovery: Privacera + Atlas
▸Access Control: Ranger tag-based policies
▸Anonymization: Ranger Dynamic Masking
▸Monitoring: Privacera
▸Custom Spark Lineage
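The idea behind tag-based access control combined with masking can be sketched in plain Python. This is not Ranger's API or policy format. The tag name `SENSITIVE`, the column names, the group names and the hash-based `mask` function are all hypothetical, chosen only to show how a decision can be driven by tags instead of by individual tables or columns.

```python
import hashlib
from typing import Dict, Set

# Hypothetical tags attached to columns by a discovery step.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "trader_email": {"SENSITIVE"},
    "instrument": set(),
}

# Hypothetical policy: groups allowed to read values carrying a tag unmasked.
TAG_POLICY: Dict[str, Set[str]] = {
    "SENSITIVE": {"compliance"},
}


def mask(value: str) -> str:
    """Replace a value with a short one-way hash (one possible masking choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def read_column(column: str, value: str, user_groups: Set[str]) -> str:
    """Return the value in clear text or masked, based on the column's tags."""
    for tag in COLUMN_TAGS.get(column, set()):
        allowed_groups = TAG_POLICY.get(tag, set())
        if not (user_groups & allowed_groups):
            return mask(value)
    return value


masked = read_column("trader_email", "a@example.com", {"research"})
clear = read_column("trader_email", "a@example.com", {"compliance"})
untagged = read_column("instrument", "XYZ", {"research"})
```

The policy never mentions a column by name. When discovery tags a new column as `SENSITIVE`, the same rule applies to it without any policy change, which is the main appeal of managing access through classification.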
ARCHITECTURE
CUSTOM METADATA IN ATLAS
▸Custom datasets → Atlas metamodel definitions
[Diagram: Atlas type hierarchy for the custom Spark metamodel]
▸Type spark_job, with attributes id, operations, input_data and output_data (each annotated with cardinality and indexable)
▸Other types shown: Entity, spark_entity, spark_operation, spark_join_operation, spark_dataset, Dataset, Process, Referenceable, string
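A type such as spark_job could be expressed as an Atlas entity definition along the following lines. The sketch builds the JSON in Python. The field names follow the Atlas v2 typedef format as commonly documented and should be checked against the Atlas version in use. The supertype, the attribute types and the cardinalities are assumptions, since the slide only names the attributes.

```python
import json
from typing import Any, Dict


def attribute(name: str, type_name: str, cardinality: str = "SINGLE",
              indexable: bool = False) -> Dict[str, Any]:
    """Build one attribute definition (helper name is hypothetical)."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": cardinality,
        "isIndexable": indexable,
        "isOptional": True,
        "isUnique": False,
    }


spark_job = {
    "name": "spark_job",
    "superTypes": ["Process"],  # assumption: the slide does not state the parent
    "attributeDefs": [
        attribute("id", "string", indexable=True),
        # Assumed element types and cardinalities, for illustration only.
        attribute("operations", "array<spark_operation>", cardinality="LIST"),
        attribute("input_data", "array<spark_dataset>", cardinality="SET"),
        attribute("output_data", "array<spark_dataset>", cardinality="SET"),
    ],
}

typedefs = {"entityDefs": [spark_job]}
payload = json.dumps(typedefs, indent=2)
```

The resulting `payload` is the kind of document that would be registered with Atlas before any spark_job entities are created. The referenced types (spark_operation, spark_dataset) would need their own definitions in the same request or earlier.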
OUR JOURNEY IN DATA SECURITY
[Timeline diagram: t=0, t=1, t=2, t=3]
▸Ranger policies: Standard Ranger Policies, Tag-based Ranger Policies, Comprehensive Tag-based Ranger Policies, Privacera Advanced Security Policies
▸Metadata: Atlas OoB, Custom Atlas metamodels, Privacera
▸Capabilities
  ▸Basic Ranger policies
  ▸Access management through data tags
  ▸Custom lineage applied on our own data types
  ▸Advanced data discovery and tagging
  ▸Datazone definition to capture data movement
OUR JOURNEY IN SECURITY MANAGEMENT - MONITORING
[Diagram: tables A to D and folders A to D split across two datazones]
▸PUBLIC DATAZONE: “Sensitive tags not allowed”
▸RESTRICTED DATAZONE: “Sensitive tags are OK”
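The datazone rule lends itself to a simple monitoring check, sketched here in Python. The `Resource` class, the `SENSITIVE` tag name and the placement of tables and folders in zones are assumptions for illustration. The slide gives only the two zone rules, not which resource sits where.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    """A table or folder with its datazone and discovered tags (hypothetical)."""
    name: str
    datazone: str
    tags: Set[str] = field(default_factory=set)


# Zone rules as stated on the slide.
ZONE_ALLOWS_SENSITIVE: Dict[str, bool] = {
    "PUBLIC": False,      # "Sensitive tags not allowed"
    "RESTRICTED": True,   # "Sensitive tags are OK"
}


def find_violations(resources: List[Resource]) -> List[str]:
    """Return names of resources whose tags break their datazone's rule.

    Unknown zones are treated as not allowing sensitive data (an assumption).
    """
    return [
        r.name
        for r in resources
        if "SENSITIVE" in r.tags
        and not ZONE_ALLOWS_SENSITIVE.get(r.datazone, False)
    ]


# Assumed placement, for illustration only.
resources = [
    Resource("TABLE A", "PUBLIC", {"SENSITIVE"}),
    Resource("FOLDER B", "PUBLIC"),
    Resource("TABLE C", "RESTRICTED", {"SENSITIVE"}),
]
violations = find_violations(resources)
```

Run after each discovery pass, a check like this flags sensitive data that has moved into the public zone, which is the "capture data movement" purpose of datazones.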
KEY CHALLENGES AND LEARNINGS
▸Technical challenges
  ▸GraphDB does not scale as a metastore
    ▸Hundreds of thousands of entities tagged per week
    ▸Back-end rewritten to use only HBase + Solr
  ▸Open source flexibility can be a two-sided coin
▸Business challenges
  ▸Business process integration
SUMMARY
▸Understand your data before expanding your data lake
▸Invest in automated classification and centralized metadata
▸Manage user access through data classification
▸Anonymise data to reduce exposure
▸Monitor the use of data - “trust but verify”
WE’RE

G-Research - Privacera - Dataworks Summit 2018
