Harnessing Big Data Tools in Financial Services
Chris Swan
@cpswan
Big Data – a little analysis
Overview
Based on a blog post from April 2012 – http://is.gd/swbdla

[Figure: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data]

Simple problems
Low data volume, low algorithm complexity

Quant Problems
Any data volume, high algorithm complexity

Big Data Problems
High data volume, low algorithm complexity

Types of Big Data Problem:
1. Inherent
2. More data gives a better result than a more complex algorithm

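The three problem types can be made concrete as a small classifier over the two axes of the chart. This is a minimal sketch: the numeric "complexity score" and both thresholds (`volume_threshold_tb`, `complexity_threshold`) are hypothetical illustrations, not values from the talk.

```python
from enum import Enum


class ProblemType(Enum):
    SIMPLE = "Simple"      # low data volume, low algorithm complexity
    QUANT = "Quant"        # any data volume, high algorithm complexity
    BIG_DATA = "Big Data"  # high data volume, low algorithm complexity


def classify(data_volume_tb: float, complexity_score: float,
             volume_threshold_tb: float = 10.0,
             complexity_threshold: float = 5.0) -> ProblemType:
    """Place a workload on the data volume / algorithm complexity chart.

    Both thresholds are hypothetical; where the boundaries sit in
    practice depends on the hardware and tools available.
    """
    if complexity_score >= complexity_threshold:
        # High complexity counts as 'Quant' whatever the data volume
        return ProblemType.QUANT
    if data_volume_tb >= volume_threshold_tb:
        return ProblemType.BIG_DATA
    return ProblemType.SIMPLE


if __name__ == "__main__":
    print(classify(0.5, 1.0))    # ProblemType.SIMPLE
    print(classify(0.5, 8.0))    # ProblemType.QUANT
    print(classify(500.0, 1.0))  # ProblemType.BIG_DATA
```

Checking complexity first mirrors the slides: a Quant problem stays Quant at any data volume, so only low-complexity work is split by volume.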
The good, the bad and the ugly of Big Data


Good
  – Lots of new tools, mostly open source




Bad
  – Term being abused by marketing departments

Ugly
  – Can easily lead to over-reliance on systems that lack transparency and ignore specific data points
    – 'Computer says no', but nobody can explain why

Misquoting Roger Needham

Whoever thinks their analytics problem is solved by big data, doesn’t understand their analytics problem and doesn’t understand big data

Security and Governance

The priesthood of storage and the cult of the DBA

Enterprise storage systems (mostly) have their own interconnect, and their own special
people to look after it, make any changes (weekends only) and run backups
– The priesthood of storage
Relational Database Management Systems (RDBMS) are about more than just SQL
– Backup and recovery
– Access control
  – Identity management
    – Integration with enterprise directories
– Data security
  – Encryption
– Schema management
  – Glossaries and data dictionaries
Database Administrators (DBAs) have become the guardians of all this
– The cult of the DBA
Anything not under the management of the cult doesn't count as being part of the official
'books and records of the firm'
– Or at least that's what they'll tell you

NOSQL as a hack around corporate governance
Many 'Big Data' tools also fly under the banner of 'NOSQL'

NOSQL allows escape from the clutches of the priesthood of storage and the cult of
the DBA

  The reason for choosing Cassandra (or whatever) for a project might have nothing to do
  with 'Big Data'

  Security is often viewed as an optional non-functional requirement
  – Big Data security controls may be less mature than those of traditional RDBMS
    – So compensating controls must be used for whatever is missing out of the box
  – Third-party tools market still nascent
    – So less choice for bolt-on security

  NOSQL hasn't yet become an integral part of organisation structure/culture
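Where a NOSQL store lacks built-in access control or auditing, one compensating control is to enforce both in the application layer in front of it. A minimal sketch, assuming a plain dict stands in for the store's client; `GuardedStore`, the ACL layout and the audit record format are hypothetical, not features of Cassandra or any other product.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Dict, Set

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")


class GuardedStore:
    """Hypothetical wrapper adding access control and auditing in the
    application layer, in front of a store that lacks them."""

    def __init__(self, backend: Dict[str, Any], acl: Dict[str, Set[str]]):
        self._backend = backend  # stands in for a NOSQL client
        self._acl = acl          # user -> set of permitted actions

    def _check(self, user: str, action: str, key: str) -> None:
        allowed = action in self._acl.get(user, set())
        # Record every attempt, allowed or not, for later review
        audit_log.info("%s user=%s action=%s key=%s allowed=%s",
                       datetime.now(timezone.utc).isoformat(),
                       user, action, key, allowed)
        if not allowed:
            raise PermissionError(f"{user} may not {action} {key}")

    def get(self, user: str, key: str) -> Any:
        self._check(user, "read", key)
        return self._backend[key]

    def put(self, user: str, key: str, value: Any) -> None:
        self._check(user, "write", key)
        self._backend[key] = value


if __name__ == "__main__":
    store = GuardedStore({}, {"alice": {"read", "write"}, "bob": {"read"}})
    store.put("alice", "trade:1", {"notional": 1_000_000})
    print(store.get("bob", "trade:1"))
    try:
        store.put("bob", "trade:1", {})
    except PermissionError as exc:
        print(f"denied: {exc}")
```

A wrapper like this only helps if every client goes through it, so it complements rather than replaces controls at the storage and network layers.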

Data Centre implications

Simple problems
Low data volume, low algorithm complexity

This is the type of problem that has traditionally worked a single machine (the database server) really hard.
• Reliability has always been a concern for single-box designs (though this is a solved problem where synchronous replication is used)
  • This is what makes SAN attractive
• No special considerations for network and storage

Quant Problems – the easy part
Any data volume, high algorithm complexity

High Performance Compute (HPC) impact is well understood:
• Lots of machines at the optimum CPU/$ price point
  • Previously optimised for CAPEX
  • Present trend is to optimise for TCO (especially energy)
• No real challenges around storage or interconnect
  • Though some local caching using a 'data grid' may improve duty cycle over a pure stateless design

Quant Problems – the hard part
Any data volume, high algorithm complexity

Data-intensive HPC shifts the focus to interconnect and storage:
• A fast network (faster than 1Gb Ethernet) may be needed to get data where it's needed
  • 10Gb Ethernet (or faster)
  • InfiniBand if latency is an issue
• SANs don't work at this scale (and are too expensive anyway)
  • Data needs to be sharded across inexpensive local discs

Big Data Problems – look easy now
High data volume, low algorithm complexity

Typically less demanding on interconnect than data-intensive HPC workloads:
• Ethernet likely to be sufficient
Many things that wear the 'big data' label are in fact solutions for sharding large data sets across inexpensive local disc
• E.g. this is what the Hadoop Distributed File System (HDFS) does

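In its simplest form, sharding assigns each record to a node by hashing its key, so each commodity server only stores and reads its own slice on local disc. A minimal sketch; the node names and replica count are hypothetical, and HDFS itself works differently (it splits files into large blocks and places replicas of each block on several DataNodes, tracked by a NameNode), so this shows the general idea rather than HDFS's placement algorithm.

```python
import hashlib
from collections import Counter
from typing import List

# Hypothetical cluster of commodity servers, each with local disc
NODES = ["node-01", "node-02", "node-03", "node-04"]


def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def nodes_for(key: str, nodes: List[str], replicas: int = 2) -> List[str]:
    """Primary node chosen by hash, replicas on the following nodes."""
    first = shard_for(key, len(nodes))
    return [nodes[(first + i) % len(nodes)] for i in range(replicas)]


if __name__ == "__main__":
    keys = [f"trade-{i}" for i in range(10_000)]
    load = Counter(nodes_for(k, NODES, replicas=1)[0] for k in keys)
    # Each node should receive a similar share of the keys
    print(load)
```

Plain modulo placement remaps most keys when a node is added or removed; consistent hashing is the usual alternative when cluster membership changes often.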
The role of SSD

At least for the time being, SSD involves a delicate balance between capacity and speed
Applications that become I/O bound with traditional disc need to make a value judgement
on scaling the storage element (switch to SSD) versus scaling the entire solution (buy
more servers and electricity).
– Falling prices will tilt the balance towards SSD
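The value judgement between scaling the storage element and scaling the entire solution can be framed as a cost comparison at a target I/O rate. A minimal sketch: the `Option` fields, the cost model (purchase price plus energy only) and every number in it are hypothetical placeholders, not figures from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    iops_per_unit: float   # I/O operations per second each unit adds
    capex_per_unit: float  # purchase price per unit
    watts_per_unit: float  # average power draw per unit


def total_cost(opt: Option, target_iops: float, years: float,
               price_per_kwh: float) -> float:
    """CAPEX plus energy cost of enough units to reach target_iops."""
    units = math.ceil(target_iops / opt.iops_per_unit)
    hours = 24 * 365 * years
    energy_kwh = units * opt.watts_per_unit / 1000 * hours
    return units * opt.capex_per_unit + energy_kwh * price_per_kwh


if __name__ == "__main__":
    # Placeholder figures only; replace with real quotes and measurements
    options = [
        Option("scale the storage element (SSD)", 50_000, 3_000, 10),
        Option("scale the entire solution (more servers)", 1_000, 5_000, 300),
    ]
    for opt in options:
        cost = total_cost(opt, target_iops=200_000, years=3,
                          price_per_kwh=0.15)
        print(f"{opt.name}: {cost:,.0f}")
```

A fuller model would also count rack space, cooling and the extra CPU and RAM that more servers bring, which may matter if the workload is not purely I/O bound.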
Worth noting that many traditional databases will now fit into RAM (especially if spread
across a number of machines), which leaves an emerging SSD sweet spot across the
middle of the chart.
Attention needs to be paid to the 'impedance mismatch' between contemporary workloads
(like Cassandra) and contemporary storage (like SSD). This is not handled well by
decades-old file systems (and for a long time the RDBMS vendors have cheated by having
their own file systems).

SSD will hit the feature-size scaling wall at the same time as CPUs
– Spinning disc (and other technologies) will not
– Enjoy the ride whilst it lasts (perhaps not too much longer)
  – Interesting things will happen when things we've become accustomed to seeing grow
    exponentially flatten out, whilst other growth curves continue

The future of block storage
SAN/NAS stops being a category in its own right and becomes part of the software-defined
data centre
– SAN (and especially dedicated Fibre Channel networks) goes away altogether
– NAS folds into the commodity server space: it looks like DAS at the hardware layer but
  behaves like NAS from a software perspective
– Dedicated puddles of software-defined storage will be aligned to 'big data', but overall
  capacity management should ultimately be defined by the first exhausted commodity (CPU,
  RAM, I/O, disc)
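Being "defined by the first exhausted commodity" means the number of servers a workload needs is set by whichever resource runs out first. A minimal sketch, with a hypothetical server specification and workload demand expressed in cores, GB of RAM, IOPS and TB of disc.

```python
import math
from typing import Dict, Tuple


def servers_needed(per_server: Dict[str, float],
                   demand: Dict[str, float]) -> Tuple[int, str]:
    """Servers required and the resource that dictates that number.

    The binding resource is whichever one would be exhausted first.
    """
    needed = {r: math.ceil(demand[r] / per_server[r]) for r in per_server}
    binding = max(needed, key=needed.get)
    return needed[binding], binding


if __name__ == "__main__":
    # Hypothetical commodity server and workload (cores, GB RAM, IOPS, TB disc)
    per_server = {"CPU": 32, "RAM": 256, "I/O": 20_000, "disc": 12}
    demand = {"CPU": 900, "RAM": 10_000, "I/O": 300_000, "disc": 400}
    count, resource = servers_needed(per_server, demand)
    print(f"{count} servers, limited by {resource}")  # 40 servers, limited by RAM
```

Here RAM binds first, so the spare CPU, I/O and disc on those 40 servers is capacity that a pooled, software-defined approach could offer to other workloads.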

Data Centre impact - Summary

More: simple, energy-efficient servers with local disk
Fewer: big boxes connected to SAN

Everything looks the same (less diversity in hardware)
Everything uses the minimum possible energy
'Big Data' is a part of the overall capacity management problem
Data centre automation will solve for optimal equipment/energy use

Wrapping up

Conclusions


'Big Data' is a label used to describe an emerging category of tools that are useful for
problems with large data volume and low algorithmic complexity

The technical and organisational means to provide security and governance for these
tools are less mature than for traditional databases

Data centres will fill up with more low-end servers using local storage (and these will likely
be the designs emerging from hyperscale operators that are optimised for manufacturing
and energy efficiency)


Questions?
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

IET: Harnessing Big Data Tools in Financial Services

  • 1. Harnessing Big Data Tools in Financial Services. Chris Swan (@cpswan)
  • 2. Big Data: a little analysis
  • 3. Overview. Based on a blog post from April 2012 (http://is.gd/swbdla). [Chart: Problem Types, plotting Data Volume against Algorithm Complexity, with regions labelled Simple, Quant and Big Data]
  • 4. Simple problems: low data volume, low algorithm complexity. [Chart: Problem Types, as on slide 3]
  • 5. Quant Problems: any data volume, high algorithm complexity. [Chart: Problem Types, as on slide 3]
  • 6. Big Data Problems: high data volume, low algorithm complexity. [Chart: Problem Types, as on slide 3] Types of Big Data problem: (1) inherent; (2) more data gives a better result than a more complex algorithm.
  • 7. The good, the bad and the ugly of Big Data. Good: lots of new tools, mostly open source. Bad: the term is being abused by marketing departments. Ugly: it can easily lead to over-reliance on systems that lack transparency and ignore specific data points; 'Computer says no', but nobody can explain why.
  • 8. Misquoting Roger Needham: 'Whoever thinks their analytics problem is solved by big data doesn't understand their analytics problem and doesn't understand big data.'
  • 10. The priesthood of storage and the cult of the DBA. Enterprise storage systems (mostly) have their own interconnect and their own specialist staff to look after it, make any changes (weekends only) and run backups: the priesthood of storage. Relational Database Management Systems (RDBMS) are about more than just SQL: backup and recovery, access control, identity management, integration with enterprise directories, data security, encryption, schema management, and glossaries and data dictionaries. Database Administrators (DBAs) have become the guardians of all this: the cult of the DBA. Anything not under the management of the cult doesn't count as part of the official 'books and records of the firm' (or at least that's what they'll tell you).
  • 11. NOSQL as a hack around corporate governance. Many 'Big Data' tools also fly under the banner of 'NOSQL'. NOSQL offers an escape from the clutches of the priesthood of storage and the cult of the DBA. The reason for choosing Cassandra (or whatever) for a project might have nothing to do with 'Big Data'. Security is often treated as an optional non-functional requirement: Big Data security controls may be less mature than those of a traditional RDBMS, so compensating controls must be used for whatever is missing out of the box; the third-party tools market is still nascent, so there is less choice for bolt-on security. NOSQL hasn't yet become an integral part of organisation structure and culture.
  • 13. Simple problems: low data volume, low algorithm complexity. [Chart: Problem Types, as on slide 3] This is the type of problem that has traditionally worked a single machine (the database server) really hard. Reliability has always been a concern for single-box designs (though this is a solved problem where synchronous replication is used), and this is what makes SAN attractive. No special considerations for network and storage.
  • 14. Quant Problems: the easy part. Any data volume, high algorithm complexity. [Chart: Problem Types, with an HPC region marked] The impact of High Performance Compute (HPC) is well understood: lots of machines at the optimum CPU/$ price point; previously optimised for CAPEX, the present trend is to optimise for TCO (especially energy); no real challenges around storage or interconnect, though some local caching using a 'data grid' may improve the duty cycle over a pure stateless design.
  • 15. Quant Problems: the hard part. Any data volume, high algorithm complexity. [Chart: Problem Types, with a Data intensive HPC region marked] Data intensive HPC shifts the focus to interconnect and storage: a fast network (faster than 1 Gb Ethernet) may be needed to get data where it's needed, such as 10 Gb Ethernet (or faster), or InfiniBand if latency is an issue; SANs don't work at this scale (and are too expensive anyway); data needs to be sharded across inexpensive local discs.
  • 16. Big Data Problems: look easy now. High data volume, low algorithm complexity. [Chart: Problem Types, as on slide 3] Typically less demanding on interconnect than data intensive HPC workloads: Ethernet is likely to be sufficient. Many things that wear the 'big data' label are in fact solutions for sharding large data sets across inexpensive local disc; for example, this is what the Hadoop Distributed File System (HDFS) does.
  • 17. The role of SSD. At least for the time being, this is a delicate balance between capacity and speed. Applications that become I/O bound with traditional disc need to make a value judgement between scaling the storage element (switching to SSD) and scaling the entire solution (buying more servers and electricity); falling prices will tilt the balance towards SSD. It is worth noting that many traditional databases will now fit into RAM (especially if spread across a number of machines), which leaves an emerging SSD sweet spot across the middle of the chart. Attention needs to be paid to the 'impedance mismatch' between contemporary workloads (like Cassandra) and contemporary storage (like SSD), which is not handled well by decades-old file systems (for a long time the RDBMS vendors have cheated by having their own file systems). SSD will hit the feature-size scaling wall at the same time as CPU, whereas spinning disc (and other technologies) will not: enjoy the ride whilst it lasts (perhaps not too much longer). Interesting things will happen when things we've become accustomed to seeing grow exponentially flatten out whilst other growth curves continue.
  • 18. The future of block storage. SAN/NAS stops being a category in its own right and becomes part of the software-defined data centre: SAN (and especially dedicated Fibre Channel networks) goes away altogether; NAS folds into the commodity server space, looking like DAS at the hardware layer but behaving like NAS from a software perspective; dedicated puddles of software-defined storage will be aligned to 'big data', but overall capacity management should ultimately be defined by the first exhausted commodity (CPU, RAM, I/O, disc).
  • 19. Data Centre impact: summary. More simple, energy-efficient servers with local disk; fewer big boxes connected to SAN. Everything looks the same (less diversity in hardware). Everything uses the minimum possible energy. 'Big Data' is part of the overall capacity management problem. Data centre automation will solve for optimal equipment/energy use.
  • 21. Conclusions. 'Big Data' is a label used to describe an emerging category of tools that are useful for problems with large data volume and low algorithmic complexity. The technical and organisational means to provide security and governance for these tools are less mature than for traditional databases. Data centres will fill up with more low-end servers using local storage (and these will likely be the designs emerging from hyperscale operators that are optimised for manufacturing and energy efficiency).
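Slides 3 to 6 and 13 to 16 place workloads on a chart of data volume against algorithm complexity. The Python sketch below expresses that chart as a simple classifier. The numeric cut-offs, the classify function and the ProblemType enum are hypothetical, because the talk does not quantify 'high' volume or complexity. Placing 'Data intensive HPC' in the high-volume, high-complexity corner is one reading of slide 15.

```python
from enum import Enum


class ProblemType(Enum):
    """Regions of the 'Problem Types' chart (one reading of the slides)."""
    SIMPLE = "Simple"
    QUANT_HPC = "Quant (HPC)"
    QUANT_DATA_INTENSIVE_HPC = "Quant (data intensive HPC)"
    BIG_DATA = "Big Data"


# Hypothetical cut-offs for illustration only; the talk gives no numbers.
HIGH_VOLUME_BYTES = 10 * 2**40           # assume 10 TiB or more is "high data volume"
HIGH_COMPLEXITY_OPS_PER_RECORD = 10_000  # assume 10k+ operations per record is "complex"


def classify(data_volume_bytes: int, ops_per_record: float) -> ProblemType:
    """Place a workload on the data volume / algorithm complexity chart."""
    high_volume = data_volume_bytes >= HIGH_VOLUME_BYTES
    high_complexity = ops_per_record >= HIGH_COMPLEXITY_OPS_PER_RECORD
    if high_complexity:
        # Quant covers any data volume; at high volume it becomes data intensive HPC.
        if high_volume:
            return ProblemType.QUANT_DATA_INTENSIVE_HPC
        return ProblemType.QUANT_HPC
    # Low complexity: data volume alone separates Simple from Big Data.
    return ProblemType.BIG_DATA if high_volume else ProblemType.SIMPLE


if __name__ == "__main__":
    examples = [(50 * 2**30, 100), (200 * 2**40, 100), (50 * 2**30, 1e6), (200 * 2**40, 1e6)]
    for volume, ops in examples:
        print(volume, ops, classify(volume, ops).value)
```

The point of the sketch is that the axes are independent. Only algorithm complexity decides whether a problem is Quant, and data volume then decides which part of the chart it lands in.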
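Slides 15 and 16 say that data needs to be sharded across inexpensive local discs, and that HDFS does this for many 'big data' tools. The following is a minimal sketch of the idea, assuming fixed-size blocks that are each replicated on several distinct nodes. The 128 MiB block size and the replication factor of 3 echo HDFS defaults. The Node class, the shard_blocks function and the round-robin placement are illustrative assumptions and do not reflect how HDFS actually chooses locations.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical commodity server with inexpensive local disc."""
    name: str
    blocks: list = field(default_factory=list)


def shard_blocks(data_size: int, block_size: int, nodes: list, replication: int = 3) -> dict:
    """Split data_size bytes into fixed-size blocks and put each block's
    replicas on `replication` distinct nodes.

    Placement here is simple round-robin for illustration; HDFS itself
    chooses replica locations with a rack-aware policy in its NameNode.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    num_blocks = -(-data_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Offset the starting node per block so load spreads across all discs;
        # consecutive offsets guarantee the replicas land on distinct nodes.
        chosen = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for node in chosen:
            node.blocks.append(block_id)
        placement[block_id] = [node.name for node in chosen]
    return placement


if __name__ == "__main__":
    cluster = [Node(f"node{i}") for i in range(5)]
    # 1,000 MiB of data in 128 MiB blocks.
    layout = shard_blocks(1000 * 2**20, 128 * 2**20, cluster)
    for node in cluster:
        print(node.name, "holds blocks", node.blocks)
```

Because each block lives on several cheap nodes, losing a single disc does not lose data. This is the reliability that slide 13 attributes to SAN in the single-box world.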
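Slide 18 argues that capacity management should ultimately be defined by the first exhausted commodity (CPU, RAM, I/O, disc). Assuming a fleet of identical commodity servers, one way to sketch this is to compute the server count that each resource requires on its own and then take the maximum. The capacity figures, the demand figures and the servers_needed function below are hypothetical.

```python
import math

# Hypothetical capacity of one commodity server; illustrative numbers only.
SERVER_CAPACITY = {"cpu_cores": 32, "ram_gib": 256, "io_kiops": 200, "disc_tib": 24}


def servers_needed(demand: dict, capacity: dict = SERVER_CAPACITY) -> tuple:
    """Return (server_count, binding_resource).

    Each commodity implies its own server count; the fleet must satisfy all
    of them, so the largest count wins and its resource is the one that is
    exhausted first.
    """
    per_resource = {r: math.ceil(demand.get(r, 0) / cap) for r, cap in capacity.items()}
    binding = max(per_resource, key=per_resource.get)
    return per_resource[binding], binding


if __name__ == "__main__":
    # Hypothetical aggregate demand for a mix of 'big data' and other workloads.
    demand = {"cpu_cores": 400, "ram_gib": 2048, "io_kiops": 1500, "disc_tib": 600}
    count, resource = servers_needed(demand)
    print(f"{count} servers, limited by {resource}")  # prints: 25 servers, limited by disc_tib
```

In this made-up example, disc is the binding commodity, so CPU, RAM and I/O sit partly idle. That imbalance is the trade-off that slide 17 describes between scaling storage (for example with SSD) and scaling the whole solution.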