Harnessing Big Data Tools in
    Financial Services
          Chris Swan
          @cpswan
Big Data – a little analysis




Overview
Based on a blog post from April 2012 – http://is.gd/swbdla



[Figure: "Problem Types" chart plotting Data Volume (vertical axis) against Algorithm Complexity (horizontal axis), with three regions: Simple, Quant and Big Data.]
Simple problems
Low data volume, low algorithm complexity



Quant Problems
Any data volume, high algorithm complexity



Big Data Problems
High data volume, low algorithm complexity



Types of Big Data Problem:
1. Inherent
2. More data gives better result than more complex algorithm
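The three problem types can be sketched as a small classifier. This is only an illustration of the chart: the talk gives no numeric boundaries, so the `HIGH_VOLUME_BYTES` and `HIGH_COMPLEXITY` cut-offs, the 0.0 to 1.0 complexity score and the `Problem` type are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; the talk gives no numbers.
HIGH_VOLUME_BYTES = 10 * 1024**4  # assume "high volume" starts at 10 TiB
HIGH_COMPLEXITY = 0.5             # assume complexity is scored from 0.0 to 1.0


@dataclass
class Problem:
    name: str
    data_bytes: int
    complexity: float  # assumed score: 0.0 (simple aggregation) to 1.0 (heavy modelling)


def classify(problem: Problem) -> str:
    """Place a problem in one of the three regions of the Problem Types chart."""
    if problem.complexity >= HIGH_COMPLEXITY:
        return "Quant"     # any data volume, high algorithm complexity
    if problem.data_bytes >= HIGH_VOLUME_BYTES:
        return "Big Data"  # high data volume, low algorithm complexity
    return "Simple"        # low data volume, low algorithm complexity
```

Complexity is checked first because the Quant region spans any data volume, so volume only matters once a problem is known to be algorithmically simple.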
The good, the bad and the ugly of Big Data


Good
  – Lots of new tools, mostly open source




Bad
  – Term being abused by marketing departments

Ugly
  – Can easily lead to over-reliance on systems that lack transparency and ignore specific data
    points
         'Computer says no', but nobody can explain why
Misquoting Roger Needham



Whoever thinks their analytics problem is solved by big data, doesn't understand their analytics problem and doesn't understand big data
Security and Governance




The priesthood of storage and the cult of the DBA

Enterprise storage systems (mostly) have their own interconnect and their own special
people to look after it, along with any changes (weekends only) and backups
– The priesthood of storage
Relational Database Management Systems (RDBMS) are about more than just SQL
– Backup and recovery
– Access control
  – Identity management
    – Integration with enterprise directories
– Data security
  – Encryption
– Schema management
  – Glossaries and data dictionaries
Database Administrators (DBAs) have become the guardians of all this
– The cult of the DBA
Anything not under the management of the cult doesn't count as being part of the official
'books and records of the firm'
– Or at least that's what they'll tell you


NOSQL as a hack around corporate governance
Many 'Big Data' tools also fly under the banner of 'NOSQL'

NOSQL allows for the escape from the clutches of the priesthood of storage and the cult of
the DBA

  The reason for choosing Cassandra (or whatever) for a project might have nothing to do
  with 'Big Data'

  Security is often viewed as an optional non-functional requirement
  – Big Data security controls may be less mature than those of a traditional RDBMS
    – So compensating controls must be used for whatever is missing out of the box
  – The third-party tools market is still nascent
    – So there is less choice for bolt-on security

  NOSQL hasn't yet become an integral part of organisational structure/culture
Data Centre implications




Simple problems
Low data volume, low algorithm complexity


This is the type of problem that has traditionally worked a single machine (the database server) really hard.
• Reliability has always been a concern for single box designs (though this is a solved problem where synchronous replication is used).
  • This is what makes SAN attractive
• No special considerations for network and storage
Quant Problems – the easy part
Any data volume, high algorithm complexity


[Figure: "Problem Types" chart with an added HPC label.]

High Performance Compute (HPC) impact is well understood:
• Lots of machines at the optimum CPU/$ price point
  • Previously optimised for CAPEX
  • Present trend is to optimise for TCO (especially energy)
• No real challenges around storage or interconnect
  • Though some local caching using a 'data grid' may improve duty cycle over a pure stateless design
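The caching point can be illustrated with a minimal sketch. A real data grid is a distributed cache product; here a plain in-process dictionary stands in for it, and `CachingWorker`, `slow_market_data` and the `USD_CURVE` key are hypothetical names invented for this example.

```python
class CachingWorker:
    """Compute worker that keeps fetched inputs in local memory (data grid stand-in)."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable standing in for a remote data source
        self._cache = {}
        self.remote_fetches = 0    # counts how often the worker waited on the network

    def get(self, key):
        if key not in self._cache:
            self.remote_fetches += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def slow_market_data(key):
    # Hypothetical remote lookup; a real one would cross the network.
    return {"USD_CURVE": [0.010, 0.012, 0.015]}[key]


worker = CachingWorker(slow_market_data)
total = 0.0
for notional in range(1, 1001):
    curve = worker.get("USD_CURVE")   # only the first call reaches the remote source
    total += notional * curve[-1]     # placeholder calculation, not a real pricing model

print(worker.remote_fetches)  # prints 1
```

A purely stateless worker would fetch the same input for every one of the 1,000 tasks. With the cache, the CPU spends more of its time calculating and less waiting on data, which is the duty cycle improvement the slide refers to.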
Quant Problems – the hard part
Any data volume, high algorithm complexity


[Figure: "Problem Types" chart with an added "Data intensive HPC" label.]

Data intensive HPC shifts the focus to interconnect and storage:
• Fast network (>1 Gb Ethernet) may be needed to get data where it's needed
  • 10 Gb Ethernet (or faster)
  • InfiniBand if latency is an issue
• SANs don't work at this scale (and are too expensive anyway)
  • Data needs to be sharded across inexpensive local discs
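As a rough illustration of sharding across local discs, the sketch below assigns each record key to a node by hashing it. This is a generic hash-modulo scheme chosen for brevity, not the placement policy of any particular product; the node names and keys are hypothetical.

```python
import hashlib
from typing import Sequence


def shard_for(key: str, nodes: Sequence[str]) -> str:
    """Pick the node whose local disc should hold the record for this key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster of commodity servers with local discs.
nodes = ["node-a", "node-b", "node-c", "node-d"]

placement = {}
for trade_id in range(1000):
    key = f"trade-{trade_id}"
    placement.setdefault(shard_for(key, nodes), []).append(key)

for node in nodes:
    print(node, len(placement.get(node, [])))
```

The same key always maps to the same node, so a compute job can be sent to where its data already lives. One known weakness of plain modulo sharding is that changing the number of nodes remaps most keys; schemes such as consistent hashing exist to reduce that.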
Big Data Problems – look easy now
High data volume, low algorithm complexity



Typically less demanding on interconnect than data intensive HPC workloads:
• Ethernet likely to be sufficient

Many things that wear the 'big data' label are in fact solutions for sharding large data sets across inexpensive local disc
• E.g. this is what the Hadoop Distributed File System (HDFS) does
The role of SSD

At least for the time being this is a delicate balance between capacity and speed
Applications that become I/O bound with traditional disc need to make a value judgement
on scaling the storage element (switch to SSD) versus scaling the entire solution (buy
more servers and electricity).
– Falling prices will tilt balance towards SSD
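The value judgement above can be sketched as simple arithmetic: size each option for both speed and capacity, then compare total cost. All figures below (IOPS, capacity and cost per server) are invented for illustration and are not measurements or prices from the talk.

```python
import math


def servers_needed(required_iops, required_tb, iops_per_server, tb_per_server):
    """Servers required to satisfy both the speed and the capacity requirement."""
    for_speed = math.ceil(required_iops / iops_per_server)
    for_capacity = math.ceil(required_tb / tb_per_server)
    return max(for_speed, for_capacity)


def total_costs(required_iops, required_tb, options):
    """options maps a name to (iops_per_server, tb_per_server, cost_per_server)."""
    return {
        name: servers_needed(required_iops, required_tb, iops, tb) * cost
        for name, (iops, tb, cost) in options.items()
    }


# Hypothetical per-server figures; cost is assumed to include energy.
options = {
    "spinning disc": (2_000, 8, 5_000),
    "ssd": (20_000, 1, 12_000),
}

small_hot = total_costs(40_000, 4, options)    # I/O bound, modest data size
large_cold = total_costs(40_000, 20, options)  # same I/O, five times the data

print(min(small_hot, key=small_hot.get))    # prints "ssd"
print(min(large_cold, key=large_cold.get))  # prints "spinning disc"
```

With these made-up numbers the small data set is cheaper on SSD (4 servers instead of 20), while the larger one forces the SSD option to buy servers for capacity alone. A lower SSD cost per terabyte shifts that crossover point.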
Worth noting that many traditional databases will now fit into RAM (especially if spread
across a number of machines), which leaves an emerging SSD sweet spot across the
middle of the chart.
Attention needs to be paid to the 'impedance mismatch' between contemporary workloads
(like Cassandra) and contemporary storage (like SSD). This is not handled well by
decades-old file systems (and for a long time the RDBMS vendors have cheated by having
their own file systems).

SSD will hit the feature size scaling wall at the same time as CPU
– Spinning disc (and other technologies) will not
– Enjoy the ride whilst it lasts (perhaps not too much longer)
  – Interesting things will happen when things we've become accustomed to growing
    exponentially flatten out whilst other growth curves continue
The future of block storage
SAN/NAS stops being a category in its own right and becomes part of the software
defined data centre
– SAN (and especially dedicated Fibre Channel networks) goes away altogether
– NAS folds into the commodity server space – looks like DAS at the hardware layer but
  behaves like NAS from a software perspective
– Dedicated puddles of software defined storage will be aligned to 'big data', but the overall
  capacity management should ultimately be defined by the first exhausted commodity (CPU,
  RAM, I/O, disc)
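The idea of the first exhausted commodity can be sketched as follows: work out utilisation for each resource and report the one closest to its limit. The resource names, capacities and demand figures are hypothetical.

```python
def first_exhausted(capacity, demand):
    """Return the resource with the highest utilisation, and that utilisation."""
    utilisation = {name: demand[name] / capacity[name] for name in capacity}
    bottleneck = max(utilisation, key=utilisation.get)
    return bottleneck, utilisation[bottleneck]


# Hypothetical figures for one commodity server.
capacity = {"cpu_cores": 64, "ram_gb": 512, "io_mbps": 2000, "disc_tb": 48}
demand = {"cpu_cores": 16, "ram_gb": 256, "io_mbps": 1800, "disc_tb": 12}

resource, used = first_exhausted(capacity, demand)
print(resource, used)  # prints: io_mbps 0.9
```

In this example I/O runs out long before CPU, RAM or disc, so adding identical servers buys I/O at the price of leaving the other three resources mostly idle.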




Data Centre impact - Summary

More:  simple energy efficient servers with local disc
Fewer: big boxes connected to SAN



Everything looks the same (less diversity in hardware)
Everything uses the minimum possible energy
'Big Data' is a part of the overall capacity management problem
Data centre automation will solve for optimal equipment/energy use



Wrapping up




Conclusions


'Big Data' is a label used to describe an emerging category of tools that are useful for
problems with large data volume and low algorithmic complexity

The technical and organisational means to provide security and governance for these
tools are less mature than for traditional databases

Data centres will fill up with more low-end servers using local storage (and these will likely
be the designs emerging from hyperscale operators that are optimised for manufacturing
and energy efficiency)
Questions?




             22

More Related Content

Viewers also liked

Docker - a lot changed in a year
Docker - a lot changed in a yearDocker - a lot changed in a year
Docker - a lot changed in a yearChris Swan
 
Big data debunking some of the myths
Big data debunking some of the mythsBig data debunking some of the myths
Big data debunking some of the mythsChris Swan
 
Security protocols in constrained environments
Security protocols in constrained environments Security protocols in constrained environments
Security protocols in constrained environments Chris Swan
 
Where is my big data: security, privacy and jurisdictions in the cloud
Where is my big data: security, privacy and jurisdictions in the cloudWhere is my big data: security, privacy and jurisdictions in the cloud
Where is my big data: security, privacy and jurisdictions in the cloudChris Swan
 
IPexpo - What is DevOps, and why should infrastructure operations care?
IPexpo - What is DevOps, and why should infrastructure operations care?IPexpo - What is DevOps, and why should infrastructure operations care?
IPexpo - What is DevOps, and why should infrastructure operations care?Chris Swan
 
CloudCamp London 15 Sep 2016 - WebVR
CloudCamp London 15 Sep 2016 - WebVRCloudCamp London 15 Sep 2016 - WebVR
CloudCamp London 15 Sep 2016 - WebVRChris Swan
 
Deploying Security at Scale
Deploying Security at ScaleDeploying Security at Scale
Deploying Security at ScaleChris Swan
 
Digital Banking Creates Opportunity for Customer-Focused Finance
Digital Banking Creates Opportunity for Customer-Focused FinanceDigital Banking Creates Opportunity for Customer-Focused Finance
Digital Banking Creates Opportunity for Customer-Focused FinanceChris Swan
 
How do I do DevOps when all I have is Ops?
How do I do DevOps when all I have is Ops?How do I do DevOps when all I have is Ops?
How do I do DevOps when all I have is Ops?Chris Swan
 

Viewers also liked (9)

Docker - a lot changed in a year
Docker - a lot changed in a yearDocker - a lot changed in a year
Docker - a lot changed in a year
 
Big data debunking some of the myths
Big data debunking some of the mythsBig data debunking some of the myths
Big data debunking some of the myths
 
Security protocols in constrained environments
Security protocols in constrained environments Security protocols in constrained environments
Security protocols in constrained environments
 
Where is my big data: security, privacy and jurisdictions in the cloud
Where is my big data: security, privacy and jurisdictions in the cloudWhere is my big data: security, privacy and jurisdictions in the cloud
Where is my big data: security, privacy and jurisdictions in the cloud
 
IPexpo - What is DevOps, and why should infrastructure operations care?
IPexpo - What is DevOps, and why should infrastructure operations care?IPexpo - What is DevOps, and why should infrastructure operations care?
IPexpo - What is DevOps, and why should infrastructure operations care?
 
CloudCamp London 15 Sep 2016 - WebVR
CloudCamp London 15 Sep 2016 - WebVRCloudCamp London 15 Sep 2016 - WebVR
CloudCamp London 15 Sep 2016 - WebVR
 
Deploying Security at Scale
Deploying Security at ScaleDeploying Security at Scale
Deploying Security at Scale
 
Digital Banking Creates Opportunity for Customer-Focused Finance
Digital Banking Creates Opportunity for Customer-Focused FinanceDigital Banking Creates Opportunity for Customer-Focused Finance
Digital Banking Creates Opportunity for Customer-Focused Finance
 
How do I do DevOps when all I have is Ops?
How do I do DevOps when all I have is Ops?How do I do DevOps when all I have is Ops?
How do I do DevOps when all I have is Ops?
 

Similar to IET harnessing big data tools in financial services

Chris swan big data - a little analysis - cloud camp london 24.10.12
Chris swan   big data - a little analysis - cloud camp london 24.10.12Chris swan   big data - a little analysis - cloud camp london 24.10.12
Chris swan big data - a little analysis - cloud camp london 24.10.12Chris Purrington
 
The big data_computing_architecture-graph500
The big data_computing_architecture-graph500The big data_computing_architecture-graph500
The big data_computing_architecture-graph500Accenture
 
The big data_computing_architecture-graph500
The big data_computing_architecture-graph500The big data_computing_architecture-graph500
The big data_computing_architecture-graph500Accenture
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsMichael Kopp
 
Big data introduction
Big data introductionBig data introduction
Big data introductionChirag Ahuja
 
4 Ways To Save Big Money in Your Data Center and Private Cloud
4 Ways To Save Big Money in Your Data Center and Private Cloud4 Ways To Save Big Money in Your Data Center and Private Cloud
4 Ways To Save Big Money in Your Data Center and Private Cloudtervela
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Big data movement webcast
Big data movement webcastBig data movement webcast
Big data movement webcasttervela
 
Big data management
Big data managementBig data management
Big data managementzeba khanam
 
How Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data WarehouseHow Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data Warehousemark madsen
 
Big Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & VirtualizationBig Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & Virtualizationtervela
 
Building a business intelligence architecture fit for the 21st century by Jon...
Building a business intelligence architecture fit for the 21st century by Jon...Building a business intelligence architecture fit for the 21st century by Jon...
Building a business intelligence architecture fit for the 21st century by Jon...Mark Tapley
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Cloud Computing for Utilities
Cloud Computing for UtilitiesCloud Computing for Utilities
Cloud Computing for UtilitiesEsri
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 

Similar to IET harnessing big data tools in financial services (20)

Chris swan big data - a little analysis - cloud camp london 24.10.12
Chris swan   big data - a little analysis - cloud camp london 24.10.12Chris swan   big data - a little analysis - cloud camp london 24.10.12
Chris swan big data - a little analysis - cloud camp london 24.10.12
 
The big data_computing_architecture-graph500
The big data_computing_architecture-graph500The big data_computing_architecture-graph500
The big data_computing_architecture-graph500
 
The big data_computing_architecture-graph500
The big data_computing_architecture-graph500The big data_computing_architecture-graph500
The big data_computing_architecture-graph500
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
4 Ways To Save Big Money in Your Data Center and Private Cloud
4 Ways To Save Big Money in Your Data Center and Private Cloud4 Ways To Save Big Money in Your Data Center and Private Cloud
4 Ways To Save Big Money in Your Data Center and Private Cloud
 
Big Data
Big DataBig Data
Big Data
 
Big data movement webcast
Big data movement webcastBig data movement webcast
Big data movement webcast
 
Big data management
Big data managementBig data management
Big data management
 
How Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data WarehouseHow Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data Warehouse
 
DCIM
DCIMDCIM
DCIM
 
Big Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & VirtualizationBig Data: Movement, Warehousing, & Virtualization
Big Data: Movement, Warehousing, & Virtualization
 
Building a business intelligence architecture fit for the 21st century by Jon...
Building a business intelligence architecture fit for the 21st century by Jon...Building a business intelligence architecture fit for the 21st century by Jon...
Building a business intelligence architecture fit for the 21st century by Jon...
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Cloud Computing for Utilities
Cloud Computing for UtilitiesCloud Computing for Utilities
Cloud Computing for Utilities
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 

More from Chris Swan

LNETM - Atsign - Privacy with Personal Data Services
LNETM - Atsign - Privacy with Personal Data ServicesLNETM - Atsign - Privacy with Personal Data Services
LNETM - Atsign - Privacy with Personal Data ServicesChris Swan
 
SOOCon24 - Showing that you care about security - OpenSSF Scorecards
SOOCon24 - Showing that you care about security - OpenSSF ScorecardsSOOCon24 - Showing that you care about security - OpenSSF Scorecards
SOOCon24 - Showing that you care about security - OpenSSF ScorecardsChris Swan
 
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdf
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdfAll Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdf
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdfChris Swan
 
Fluttercon Berlin 23 - Dart & Flutter on RISC-V
Fluttercon Berlin 23 - Dart & Flutter on RISC-VFluttercon Berlin 23 - Dart & Flutter on RISC-V
Fluttercon Berlin 23 - Dart & Flutter on RISC-VChris Swan
 
QConNY 2023 - Implementing OSSF Scorecards Across an Organisation
QConNY 2023 - Implementing OSSF Scorecards Across an OrganisationQConNY 2023 - Implementing OSSF Scorecards Across an Organisation
QConNY 2023 - Implementing OSSF Scorecards Across an OrganisationChris Swan
 
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and Flutter
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and FlutterFlutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and Flutter
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and FlutterChris Swan
 
QConSF 2022 - Backends in Dart
QConSF 2022 - Backends in DartQConSF 2022 - Backends in Dart
QConSF 2022 - Backends in DartChris Swan
 
London IoT Meetup Sep 2022 - End to end encrypted IoT
London IoT Meetup Sep 2022 - End to end encrypted IoTLondon IoT Meetup Sep 2022 - End to end encrypted IoT
London IoT Meetup Sep 2022 - End to end encrypted IoTChris Swan
 
Flutter Vikings 2022 - End to end IoT with Dart and Flutter
Flutter Vikings 2022 - End to end IoT with Dart and FlutterFlutter Vikings 2022 - End to end IoT with Dart and Flutter
Flutter Vikings 2022 - End to end IoT with Dart and FlutterChris Swan
 
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?EMFcamp2022 - What if apps logged into you, instead of you logging into apps?
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?Chris Swan
 
Devoxx UK 2022 - Application security: What should the attack landscape look ...
Devoxx UK 2022 - Application security: What should the attack landscape look ...Devoxx UK 2022 - Application security: What should the attack landscape look ...
Devoxx UK 2022 - Application security: What should the attack landscape look ...Chris Swan
 
Flutter Festival London 2022 - End to end IoT with Dart and Flutter
Flutter Festival London 2022 - End to end IoT with Dart and FlutterFlutter Festival London 2022 - End to end IoT with Dart and Flutter
Flutter Festival London 2022 - End to end IoT with Dart and FlutterChris Swan
 
Full Stack Squared 2022 - Power of Open Source
Full Stack Squared 2022   - Power of Open SourceFull Stack Squared 2022   - Power of Open Source
Full Stack Squared 2022 - Power of Open SourceChris Swan
 
Flutter Vikings 2022 - Full Stack Dart
Flutter Vikings 2022  - Full Stack DartFlutter Vikings 2022  - Full Stack Dart
Flutter Vikings 2022 - Full Stack DartChris Swan
 
Droidcon London 2021 - Full Stack Dart
Droidcon London 2021   - Full Stack DartDroidcon London 2021   - Full Stack Dart
Droidcon London 2021 - Full Stack DartChris Swan
 
Keeping a project going
Keeping a project goingKeeping a project going
Keeping a project goingChris Swan
 
Dart on Arm - Flutter Bangalore June 2021
Dart on Arm - Flutter Bangalore June 2021Dart on Arm - Flutter Bangalore June 2021
Dart on Arm - Flutter Bangalore June 2021Chris Swan
 
TMS9995 on RC2014
TMS9995 on RC2014TMS9995 on RC2014
TMS9995 on RC2014Chris Swan
 
CloudCamp London Nov 2019 Intro
CloudCamp London Nov 2019 IntroCloudCamp London Nov 2019 Intro
CloudCamp London Nov 2019 IntroChris Swan
 
DevSecOps Days London - Teaching 'Shift Left on Security'
DevSecOps Days London - Teaching 'Shift Left on Security'DevSecOps Days London - Teaching 'Shift Left on Security'
DevSecOps Days London - Teaching 'Shift Left on Security'Chris Swan
 

More from Chris Swan (20)

LNETM - Atsign - Privacy with Personal Data Services
LNETM - Atsign - Privacy with Personal Data ServicesLNETM - Atsign - Privacy with Personal Data Services
LNETM - Atsign - Privacy with Personal Data Services
 
SOOCon24 - Showing that you care about security - OpenSSF Scorecards
SOOCon24 - Showing that you care about security - OpenSSF ScorecardsSOOCon24 - Showing that you care about security - OpenSSF Scorecards
SOOCon24 - Showing that you care about security - OpenSSF Scorecards
 
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdf
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdfAll Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdf
All Day DevOps 2023 - Implementing OSSF Scorecards Across an Organisation.pdf
 
Fluttercon Berlin 23 - Dart & Flutter on RISC-V
Fluttercon Berlin 23 - Dart & Flutter on RISC-VFluttercon Berlin 23 - Dart & Flutter on RISC-V
Fluttercon Berlin 23 - Dart & Flutter on RISC-V
 
QConNY 2023 - Implementing OSSF Scorecards Across an Organisation
QConNY 2023 - Implementing OSSF Scorecards Across an OrganisationQConNY 2023 - Implementing OSSF Scorecards Across an Organisation
QConNY 2023 - Implementing OSSF Scorecards Across an Organisation
 
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and Flutter
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and FlutterFlutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and Flutter
Flutter SV Meetup Oct 2022 - End to end encrypted IoT with Dart and Flutter
 
QConSF 2022 - Backends in Dart
QConSF 2022 - Backends in DartQConSF 2022 - Backends in Dart
QConSF 2022 - Backends in Dart
 
London IoT Meetup Sep 2022 - End to end encrypted IoT
London IoT Meetup Sep 2022 - End to end encrypted IoTLondon IoT Meetup Sep 2022 - End to end encrypted IoT
London IoT Meetup Sep 2022 - End to end encrypted IoT
 
Flutter Vikings 2022 - End to end IoT with Dart and Flutter
Flutter Vikings 2022 - End to end IoT with Dart and FlutterFlutter Vikings 2022 - End to end IoT with Dart and Flutter
Flutter Vikings 2022 - End to end IoT with Dart and Flutter
 
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?EMFcamp2022 - What if apps logged into you, instead of you logging into apps?
EMFcamp2022 - What if apps logged into you, instead of you logging into apps?
 
Devoxx UK 2022 - Application security: What should the attack landscape look ...
Devoxx UK 2022 - Application security: What should the attack landscape look ...Devoxx UK 2022 - Application security: What should the attack landscape look ...
Devoxx UK 2022 - Application security: What should the attack landscape look ...
 
Flutter Festival London 2022 - End to end IoT with Dart and Flutter
Flutter Festival London 2022 - End to end IoT with Dart and FlutterFlutter Festival London 2022 - End to end IoT with Dart and Flutter
Flutter Festival London 2022 - End to end IoT with Dart and Flutter
 
Full Stack Squared 2022 - Power of Open Source
Full Stack Squared 2022   - Power of Open SourceFull Stack Squared 2022   - Power of Open Source
Full Stack Squared 2022 - Power of Open Source
 
Flutter Vikings 2022 - Full Stack Dart
Flutter Vikings 2022  - Full Stack DartFlutter Vikings 2022  - Full Stack Dart
Flutter Vikings 2022 - Full Stack Dart
 
Droidcon London 2021 - Full Stack Dart
Droidcon London 2021   - Full Stack DartDroidcon London 2021   - Full Stack Dart
Droidcon London 2021 - Full Stack Dart
 
Keeping a project going
Keeping a project goingKeeping a project going
Keeping a project going
 
Dart on Arm - Flutter Bangalore June 2021
Dart on Arm - Flutter Bangalore June 2021Dart on Arm - Flutter Bangalore June 2021
Dart on Arm - Flutter Bangalore June 2021
 
TMS9995 on RC2014
TMS9995 on RC2014TMS9995 on RC2014
TMS9995 on RC2014
 
CloudCamp London Nov 2019 Intro
CloudCamp London Nov 2019 IntroCloudCamp London Nov 2019 Intro
CloudCamp London Nov 2019 Intro
 
DevSecOps Days London - Teaching 'Shift Left on Security'
DevSecOps Days London - Teaching 'Shift Left on Security'DevSecOps Days London - Teaching 'Shift Left on Security'
DevSecOps Days London - Teaching 'Shift Left on Security'
 

IET harnessing big data tools in financial services

  • 1. Harnessing Big Data Tools in Financial Services Chris Swan @cpswan
  • 2. Big Data – a little analysis
  • 3. Overview: based on a blog post from April 2012 – http://is.gd/swbdla
     [Figure: 'Problem Types' chart plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data]
  • 4. Simple problems: low data volume, low algorithm complexity
  • 5. Quant Problems: any data volume, high algorithm complexity
  • 6. Big Data Problems: high data volume, low algorithm complexity
     Types of Big Data problem:
       1. Inherent
       2. More data gives a better result than a more complex algorithm
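The three problem types on slides 3 to 6 can be read as a simple decision rule over the two chart axes. The sketch below is only an illustration of that reading: the names `ProblemType` and `classify` are hypothetical, and reducing each axis to a high/low flag is an assumption, since the slides give no numeric thresholds.

```python
from enum import Enum


class ProblemType(Enum):
    SIMPLE = "simple"
    QUANT = "quant"
    BIG_DATA = "big data"


def classify(high_data_volume: bool, high_algorithm_complexity: bool) -> ProblemType:
    """Place a problem on the chart (hypothetical helper, not from the talk)."""
    # Quant: any data volume, high algorithm complexity.
    if high_algorithm_complexity:
        return ProblemType.QUANT
    # Big Data: high data volume, low algorithm complexity.
    if high_data_volume:
        return ProblemType.BIG_DATA
    # Simple: low data volume, low algorithm complexity.
    return ProblemType.SIMPLE
```

Checking algorithm complexity first reflects the slides' wording that quant problems span "any data volume".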
  • 7. The good, the bad and the ugly of Big Data
     Good – Lots of new tools, mostly open source
     Bad – Term being abused by marketing departments
     Ugly – Can easily lead to over-reliance on systems that lack transparency and ignore specific data points: 'Computer says no', but nobody can explain why
  • 8. Misquoting Roger Needham: Whoever thinks their analytics problem is solved by big data doesn’t understand their analytics problem and doesn’t understand big data
  • 10. The priesthood of storage and the cult of the DBA
     Enterprise storage systems have (mostly) their own interconnect and their own special people to look after that, any changes (weekends only) and backups
       – The priesthood of storage
     Relational Database Management Systems (RDBMS) are about more than just SQL
       – Backup and recovery
       – Access control
       – Identity management
       – Integration with enterprise directories
       – Data security
       – Encryption
       – Schema management
       – Glossaries and data dictionaries
     Database Administrators (DBAs) have become the guardians of all this
       – The cult of the DBA
     Anything not under the management of the cult doesn't count as being part of the official 'books and records of the firm'
       – Or at least that's what they'll tell you
  • 11. NOSQL as a hack around corporate governance
     Many 'Big Data' tools also fly under the banner of 'NOSQL'
     NOSQL allows for the escape from the clutches of the priesthood of storage and the cult of the DBA
     The reason for choosing Cassandra (or whatever) for a project might have nothing to do with 'Big Data'
     Security is often viewed as an optional non-functional requirement
       – Big Data security controls may be less mature than traditional RDBMS
       – So compensating controls must be used for whatever is missing out of the box
       – 3rd party tools market still nascent
       – So less choice for bolt-on security
     NOSQL hasn't yet become an integral part of organisation structure/culture
  • 13. Simple problems: low data volume, low algorithm complexity
     This is the type of problem that has traditionally worked a single machine (the database server) really hard.
       – Reliability has always been a concern for single box designs (though this is a solved problem where synchronous replication is used)
       – This is what makes SAN attractive
       – No special considerations for network and storage
  • 14. Quant Problems – the easy part: any data volume, high algorithm complexity
     High Performance Compute (HPC) impact is well understood:
       – Lots of machines at the optimum CPU/$ price point
       – Previously optimised for CAPEX
       – Present trend is to optimise for TCO (especially energy)
       – No real challenges around storage or interconnect
       – Though some local caching using a 'data grid' may improve duty cycle over a pure stateless design
     [Figure: 'Problem Types' chart with the Quant region labelled HPC]
  • 15. Quant Problems – the hard part: any data volume, high algorithm complexity
     Data intensive HPC shifts the focus to interconnect and storage:
       – Fast network (faster than 1 Gb Ethernet) may be needed to get data where it's needed
       – 10 Gb Ethernet (or faster)
       – InfiniBand if latency is an issue
       – SANs don't work at this scale (and are too expensive anyway)
       – Data needs to be sharded across inexpensive local discs
     [Figure: 'Problem Types' chart with the high data volume part of the Quant region labelled Data intensive HPC]
  • 16. Big Data Problems – look easy now: high data volume, low algorithm complexity
     Typically less demanding on interconnect than data intensive HPC workloads:
       – Ethernet likely to be sufficient
     Many things that wear the 'big data' label are in fact solutions for sharding large data sets across inexpensive local disc
       – E.g. this is what the Hadoop Distributed File System (HDFS) does
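As a rough illustration of what "sharding large data sets across inexpensive local disc" means, the sketch below assigns record keys to shards with a stable hash. It is a generic hash-based scheme and not HDFS's own mechanism (HDFS splits files into blocks and replicates them across nodes); the function names and the `trade-N` keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in a stable, repeatable way."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a cryptographic digest rather than hash(), which Python
    # randomises per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys into num_shards lists, one per (hypothetical) local disc."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    shards = distribute([f"trade-{i}" for i in range(1000)], 4)
    print([len(s) for s in shards])
```

A modulo scheme like this reshuffles most keys when the shard count changes, which is one reason real systems tend to use block placement or consistent hashing instead.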
  • 17. The role of SSD
     At least for the time being this is a delicate balance between capacity and speed
     Applications that become I/O bound with traditional disc need to make a value judgement on scaling the storage element (switch to SSD) versus scaling the entire solution (buy more servers and electricity).
       – Falling prices will tilt the balance towards SSD
     Worth noting that many traditional databases will now fit into RAM (especially if spread across a number of machines), which leaves an emerging SSD sweet spot across the middle of the chart.
     Attention needs to be paid to the 'impedance mismatch' between contemporary workloads (like Cassandra) and contemporary storage (like SSD). This is not handled well by decades-old file systems (and for a long time the RDBMS vendors have cheated by having their own file systems).
     SSD will hit the feature size scaling wall at the same time as CPU
       – Spinning disc (and other technologies) will not
       – Enjoy the ride whilst it lasts (perhaps not too much longer)
       – Interesting things will happen when things we've become accustomed to having exponential growth flatten out whilst other growth curves continue
  • 18. The future of block storage
     SAN/NAS stops being a category in its own right and becomes part of the software defined data centre
       – SAN (and especially dedicated fibre channel networks) goes away altogether
       – NAS folds into the commodity server space: looks like DAS at the hardware layer but behaves like NAS from a software perspective
       – Dedicated puddles of software defined storage will be aligned to 'big data', but the overall capacity management should ultimately be defined by the first exhausted commodity (CPU, RAM, I/O, disc)
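The idea that capacity is "defined by the first exhausted commodity" can be sketched as a minimum over per-resource limits. Everything in the example is an assumption made for illustration: the function name, the resource keys and the server and workload figures are hypothetical and do not come from the talk.

```python
def first_exhausted(capacity: dict, demand_per_unit: dict):
    """Return (units_that_fit, limiting_resource) for one server.

    capacity and demand_per_unit map resource names to integer amounts.
    """
    limits = {}
    for resource, needed in demand_per_unit.items():
        if needed <= 0:
            continue  # the workload does not consume this resource
        limits[resource] = capacity[resource] // needed
    if not limits:
        raise ValueError("workload consumes no resources")
    # The resource that allows the fewest workload units runs out first.
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


if __name__ == "__main__":
    # Hypothetical commodity server and per-workload-unit demand.
    server = {"cpu_cores": 32, "ram_gb": 256, "io_kiops": 100, "disc_tb": 24}
    workload = {"cpu_cores": 2, "ram_gb": 8, "io_kiops": 5, "disc_tb": 3}
    print(first_exhausted(server, workload))  # (8, 'disc_tb')
```

With these made-up figures disc runs out after 8 units while CPU, RAM and I/O still have headroom, so disc is the commodity that sets capacity.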
  • 19. Data Centre impact – Summary
     More simple, energy efficient servers with local disk
     Fewer big boxes connected to SAN
     Everything looks the same (less diversity in hardware)
     Everything uses the minimum possible energy
     'Big Data' is a part of the overall capacity management problem
     Data centre automation will solve for optimal equipment/energy use
  • 21. Conclusions
     'Big Data' is a label that is used to describe an emerging category of tools that are useful for problems with large data volume and low algorithmic complexity
     The technical and organisational means to provide security and governance for these tools are less mature than for traditional databases
     Data centres will fill up with more low end servers using local storage (and these will likely be the designs emerging from hyperscale operators that are optimised for manufacturing and energy efficiency)