HPC Meets Big Data in Financial Services
Philip Filleul – Global Lead FS
Agenda
● Who is Cray
● Cray Vision and Products
● Breakthrough analytic technologies
● For each – Spark, Graph, Machine Learning
● What is it
● FS Use cases
● Technology needs to successfully deliver
● How Cray enables
● Key takeaways
Cray: The Myth vs. Reality
Myths
• They are huge
• They are proprietary
• They are complex
• They are expensive
Vs. Reality!
• They can be – but they start at less than a rack
• No: Intel, Linux, open standards, Hadoop
• Simpler and more productive than a grid
• No – cost competitive, lower TCO, higher value
Three Focus Areas
• Computation
• Storage & Data Management
• Analytics
Continuing Financial Strength and R&D Investment
Year   Revenue   Gross R&D   Headcount
2011   $236M     $77M        860
2012   $421M     $86M        929
2013   $526M     $92M        1,042
2014   $562M     $105M       1,138
2015   $725M     >$130M*     1,200*
Copyright 2016 Cray Inc
Where does Cray add value in FS?
BUSINESS LINES
• Asset Management: Search for alpha; Strategy confidence
• Wealth Management: Roboadvice; Scaling mass affluent
• Securities: Massive regulation; Compliance burden; Stress tests; Commoditization
• Insurance: Telematics; Fraud
CROSS INDUSTRY
• Technology: Commoditization and open source; Cloud; Big data analytics; Cybersecurity
Cray’s Vision:
The Fusion of Supercomputing and Big & Fast Data
Super Computing
Big Data
Analytics
Modeling The World
Cray Supercomputers solving “grand challenges” in science, engineering and analytics
Compute Store Analyze
Data-Intensive
Processing
High throughput event processing &
data capture from sensors, data feeds
and instruments
Math Models
Modeling and simulation
augmented with data to provide the
highest fidelity virtual reality results
Data Models
Integration of datasets and math
models for search, analysis,
predictive modeling and knowledge
discovery
High Performance Data Analytics (HPDA)
Cray Product Range and FS Applicability
• Aries Interconnect
• Single memory
• Scalability
• Package density
• Grid compatibility
• Upgradeability
• Integrated Stack
• Best in class power and cooling
• NVIDIA GPU density
• Proven at scale
• Integrated h/w and s/w stack
• Developer productivity
CS400 Cluster Supercomputer
XC40 Supercomputer
• Risk/Pricing
• CVA
• Machine Learning
• Superfast data sharing
• Specialist within grid
• Surprisingly Low TCO
• Risk/Pricing
• Options FFT
• Algo backtesting
• Deep Learning
Cray Product Range and FS Applicability
• Lustre parallel file system
• Single POSIX namespace
• Modular scaling 7.5 GB/s to 1.7 TB/s
• Integrated and preconfigured
• Reliability and availability at scale
• Multi tier single namespace archive
• Rule based policy migration
• Flexible integration with most OEM tape and disk
• Preconfigured and integrated
Archive
Lustre Parallel File System
• High throughput for algo analytics pipeline
• Converged storage across grid, analytics, Hadoop
• Data Lake archival
• Analytical data archival
• Market data archival
• Data no longer ‘deep-sixed’
Cray Product Range and FS Applicability
• Most scalable graph processor available
• Whole graph analytics possible
• Open RDF/SPARQL
• Single memory space and extreme threaded processor
• Cloudera 5.2/YARN
• Open to non CDH apps
• Dense compute and memory
• SSD layer for HDFS
• Lustre/POSIX for scale out storage
Urika-XA Extreme Analytics Platform
Urika-GD Graph Discovery Appliance
• Surveillance
• Cybersecurity
• Ontology based transaction compliance
• Spark optimized
• R/T streaming analytics converged with regular analytics
• Machine learning
Breakthrough Analytic Technologies
Copyright 2015 Cray Inc.
● Growing CPU capability, commoditization, memory size and I/O bandwidth
have made some new software technologies explode in popularity
● Spark
● Graph
● Machine Learning
● For each:
● What are they?
● Why are they important in FS?
● What technology attributes do they need to deliver on the
promise?
● How does Cray enable?
Spark
● What is the technology
● General purpose, productive analytic technology
● Open source, target of much development work
● Memory first, shared data
● Base ecosystem for e.g. GraphX and MLlib
● FS Use Cases:
● Risk analytics
● Real-time alerting and dashboarding
● Web clickstream rapid ETL for CSRs
Spark Technology Needs
Compute nodes (each with Memory, SSD, HDD)
• Block shuffle over interconnect
• Intermediate results spill over from memory; SSD recommended for latency/size balance
• HDD for job input and output
• HDFS vs. parallel file system for high bandwidth and scaling disk separately from compute
Performance Recommendation
- Fast interconnect
- SSD per node
- Shared parallel filesystem
What is Graph
A Traditional RDBMS is GOOD at:
- Rapid update
- Simple queries about items
But BAD at:
- Relationships between data items
- Patterns of relationships
- Interactions between many data items
- E.g. suspicious pattern of actions
Graph databases:
Operate entirely in memory
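The kind of relationship pattern listed above ("suspicious pattern of actions") can be illustrated with a minimal in-memory sketch. It is not how any Cray product works; the edge list, the relationship names (TRADED, ON_LIST, CONTACTED, WORKS_ON) and the helper functions are all hypothetical. The query asks for traders who traded a restricted symbol and also contacted a legal employee working on that same symbol, which would need several self-joins in a relational schema.

```python
from collections import defaultdict

# Hypothetical edge list: (source, relationship, target). All names are illustrative.
EDGES = [
    ("trader:alice", "TRADED", "symbol:XYZ"),
    ("trader:bob", "TRADED", "symbol:ABC"),
    ("symbol:XYZ", "ON_LIST", "list:restricted"),
    ("trader:alice", "CONTACTED", "legal:carol"),
    ("legal:carol", "WORKS_ON", "symbol:XYZ"),
    ("trader:bob", "CONTACTED", "legal:carol"),
]


def build_index(edges):
    """Index outgoing edges by (node, relationship) so each hop is a dict lookup."""
    index = defaultdict(set)
    for src, rel, dst in edges:
        index[(src, rel)].add(dst)
    return index


def suspicious_trades(edges):
    """Return (trader, symbol, contact) triples where the trader traded a
    restricted symbol and contacted a legal employee working on that symbol."""
    out = build_index(edges)
    traders = {src for src, rel, _ in edges if rel == "TRADED"}
    hits = []
    for trader in sorted(traders):
        for symbol in sorted(out[(trader, "TRADED")]):
            if "list:restricted" not in out[(symbol, "ON_LIST")]:
                continue  # symbol is not on the restricted list
            for contact in sorted(out[(trader, "CONTACTED")]):
                if symbol in out[(contact, "WORKS_ON")]:
                    hits.append((trader, symbol, contact))
    return hits


if __name__ == "__main__":
    print(suspicious_trades(EDGES))
```

Each hop in the pattern is a lookup in the in-memory index rather than a table join, which is the property the slide attributes to graph approaches.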


Discovering New Risk/Compliance events
● Goal: Find detection patterns and improve
efficiency of the investigation process by
reducing false positives
● Data sets: Accounts, Customer Transactions,
3rd party data feeds, Detection and Case
Management systems
● Technical Challenges: Rigid detection system
schemas and rules; Constantly degrading
performance as new data comes in; Hard to
tune performance with new data; Long data on-
boarding timeframes; Manual disposition of
benign alerts
● Users: Investigators, Analysts
● Usage model: Tune detection system models
via data discovery; Enhance, improve and
augment the alert investigations process
● Augmenting: Existing detection systems
[Figure: example graph schema. Node and attribute labels: RestrictedTradingList, Trader, StockSymbol, LegalEmployee, DestIP, Port, Protocol, SourceIP, Type, DateTime, BadgeLogs, EntryTime, ExitTime, Location, SystemWithAdminRights, ITEmployee, PolicyViolations, Restriction, RestrictionStartDate, Department, CounterParty, Transaction Date, Transaction Type, RestrictionDate, Communication Event, RecordType, Time, Inexperienced CSR, Event Resolutions]
Discovering Customer Churn drivers
● Goal: Identify correlations between service events
(truck rolls, call escalations, customer service rep
experience level, set-top box reliability…) and
customer churn
● Data sets: Customer records, Historic billing
records, IVR, HR/training records, customer
surveys, Network Operations data, Work Orders…
● Technical Challenges: Volume, Variety and
Velocity of data; Disconnected and disparate data
sources from operational lines of business and 3rd
party contractors
● Users: Customer Operations Analysts
● Usage model: Analyze relationships between
service & related events and eventual customer
contract outcomes
● Augmenting: Existing data warehouse appliances
[Figure: example graph of churn-related data. Node labels: Customers, Call Center Events, Work Orders, Call Escalations, Truck Rolls, Set-Top Box feeds, Supervisor Intervention, 3rd Party Service Tech, AVR Failure, CSR Resolution, Cabinet Failure, Residential Accounts, Web Service, Commercial Accounts]
Mphasis NextAngles: A Disruptive Approach
Two silos: Regulations & Policies; Data & IT systems
Old World Solution (inadequate): today, sample audits connect the two silos
New World Solution (knowledge models): NextAngles bridges the two through Knowledge Models
1. Regulations are deconstructed into computer-understandable rules
2. Rules are applied to Smart Data
3. This application is through a knowledge model
NEXTANGLES
Massively scalable, “Living” model of the bank
How NextAngles Works
Convert to
“Smart Data”
Time →
Investigation
Tools
Customer’s
Systems
Dashboards
Concept
Model
Rules
Inferences
• Potential violations
• Prohibited activities
• Operational risk measures
• Data problems
T1 T2 T3 T4 T5 T6 T7
Context model
• Line of business
• Legal entities
• Geographies
• Customer segments
• Organization structure
• Processes
Reference & Transaction Data
• Parties
• Accounts / GL / positions
• Transactions & events
“Facts”
Encoded
Regulations &
Policies
Encoded
Banking
Knowledge
ENABLER #1: SMART DATA
Making the data computer intelligible
What is it?
• Data stored as computer-intelligible “graphs” (Subject, predicate, Object; Class)
How is it enabled?
• Formal standards from the W3C and other bodies
• Over 12 years Semantic Web has evolved to a full ecosystem of products and practices
Value Proposition
• Order of magnitude reduction in handling real world data complexity
Data → SMART Data
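As a rough illustration of data stored as subject, predicate, object statements, the sketch below keeps a handful of facts as Python tuples and answers questions by pattern matching. It is a minimal sketch, not the NextAngles data model: the identifiers (`acct:1001`, `party:acme`, `heldBy`, `locatedIn`) and both helper functions are hypothetical, and a real deployment would use an RDF store queried through SPARQL.

```python
# Hypothetical facts as (subject, predicate, object) triples; names are illustrative.
TRIPLES = {
    ("acct:1001", "rdf:type", "Account"),
    ("acct:1001", "heldBy", "party:acme"),
    ("party:acme", "rdf:type", "LegalEntity"),
    ("party:acme", "locatedIn", "geo:DE"),
    ("txn:42", "debits", "acct:1001"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def holder_country(triples, account):
    """Follow two links: account -> heldBy -> party -> locatedIn -> geography."""
    countries = []
    for _, _, party in match(triples, subject=account, predicate="heldBy"):
        for _, _, geo in match(triples, subject=party, predicate="locatedIn"):
            countries.append(geo)
    return countries


if __name__ == "__main__":
    print(match(TRIPLES, predicate="rdf:type"))
    print(holder_country(TRIPLES, "acct:1001"))
```

Because every fact has the same three-part shape, a new kind of relationship is just another triple and needs no schema change.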
ENABLER #2: RULES & CURATED KNOWLEDGE
Reliable, consistent and predictable application of reasoning & complex rules
• Built on the smart data model:
helps computers reach the same
conclusions as human
knowledge workers
• Knowledge expressed as rules
that are intrinsically part of the
smart data ecosystem
• Reduces the need for humans to
intervene & define “how” to solve
problems
What is it? How is it enabled? Value Proposition
Traditional
Rules
Data
Traditional Rules: need to be wired in
Rules in a Smart Data ecosystem: Fills in gaps
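One possible reading of rules that "fill in gaps" is that each rule derives new facts from existing ones, and derived facts can trigger further rules. The sketch below shows that idea with naive forward chaining over triples. The facts, predicate names and the two rules are hypothetical examples, not rules from the NextAngles product.

```python
# Hypothetical starting facts; all identifiers are illustrative.
FACTS = {
    ("txn:42", "debits", "acct:1001"),
    ("acct:1001", "heldBy", "party:acme"),
    ("party:acme", "locatedIn", "geo:XX"),
    ("geo:XX", "rdf:type", "SanctionedGeography"),
}


def rule_party_of_transaction(facts):
    """If a transaction debits an account held by a party, it involves that party."""
    return {
        (txn, "involves", party)
        for (txn, p1, acct) in facts if p1 == "debits"
        for (a2, p2, party) in facts if p2 == "heldBy" and a2 == acct
    }


def rule_potential_violation(facts):
    """If a transaction involves a party located in a sanctioned geography, flag it."""
    sanctioned = {s for (s, p, o) in facts
                  if p == "rdf:type" and o == "SanctionedGeography"}
    return {
        (txn, "rdf:type", "PotentialViolation")
        for (txn, p1, party) in facts if p1 == "involves"
        for (p_, p2, geo) in facts
        if p2 == "locatedIn" and p_ == party and geo in sanctioned
    }


def infer(facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


if __name__ == "__main__":
    derived = infer(FACTS, [rule_potential_violation, rule_party_of_transaction])
    print(sorted(derived - FACTS))
```

The violation rule only fires after the first rule has added the `involves` fact, so the order in which rules are listed does not matter; the loop keeps going until the fact set stops growing.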
ENABLER #3: WORKSPACES
• A complete rethink of user
interfaces around smart data &
knowledge models
• Semantic + knowledge base
driven “Noun-verb” paradigm
• “Workspaces” – context where
users work through an enquiry
• 6 widgets:
• Solves the “I need Excel”
problem
• Solves the swivel chair problem
• Solves the vocabulary problem
What is it? How is it enabled? Value Proposition
A rethink of enterprise applications for knowledge workers
Faceted Search View
List, Visualize, History
Forms / Wizards, Workspace
ENABLER #4: LEARNING
• Learns from user behavior to
help pre-populate workspaces
• Learns how users use tools to
perform tasks
• Tries to proactively bring up
the tools when it sees a
similar situation
• Interim work products can be
turned into future automation
• User behavior in a user interface
is tracked in detail, and encoded
into smart data
• Learning algorithms eliminate
dead ends & build an optimum
path to the answers
• Effort for manual tasks reduces
over time
• Almost like “custom screens” for
1000’s of subtle variations
• NextAngles learns from
users’ behavior
• Supervisors can short-circuit
learning engine to “pre-configure”
workspaces
What is it? How is it enabled? Value Proposition
Continuous improvement of efficiency and effectiveness through learning
Anti Money Laundering: Solutions to a Real Problem
Challenges
● Backlog of investigations due to large number of alerts
● Constantly changing AML rules and regulations
● Consolidation of data from various systems within and outside the bank
● Balancing the load with limited resources
Urika-GD: Purpose-built for data discovery
1,944 times faster!
“In the amount of time it takes to validate one hypothesis, we can now validate 1000
hypotheses – increasing our success rate significantly.” – Dr. Ilya Shmulevich
Access all data with uniform, low latency regardless of partitioning, layout or access pattern
Do not know the relationships in the data
Do not know the desired insight or the right question to ask
Do not know the paths/linkages to explore diverse data sets
Investigate multiple, changing hypotheses in parallel without prefetching/caching
Explore diverse data fused without upfront modeling and independent of linkage/traversal path
Shared Memory Model
Memory Accelerator
In Memory, Graph Analytical Database
Approach                                              # Processors   Time
Traditional approaches after months of optimization   48             10.8 hours
Cray                                                  32             30 sec
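The slide does not say how the 1,944 figure was derived. A short calculation from the table suggests one plausible route: the wall-clock ratio alone is 1,296, and normalizing for the smaller processor count gives 1,944. The function names below are illustrative, and the per-processor interpretation is an assumption.

```python
def wall_clock_speedup(baseline_seconds, new_seconds):
    """Ratio of elapsed times, ignoring how many processors each run used."""
    return baseline_seconds / new_seconds


def per_processor_speedup(baseline_seconds, baseline_procs, new_seconds, new_procs):
    """Ratio of processor-seconds consumed by each run (assumed normalization)."""
    return (baseline_seconds * baseline_procs) / (new_seconds * new_procs)


# 10.8 hours = 648 minutes = 38,880 seconds (kept in integers to avoid rounding noise).
BASELINE_SECONDS = 648 * 60
CRAY_SECONDS = 30

if __name__ == "__main__":
    print(wall_clock_speedup(BASELINE_SECONDS, CRAY_SECONDS))
    print(per_processor_speedup(BASELINE_SECONDS, 48, CRAY_SECONDS, 32))
```

With these inputs the first ratio is 1,296 and the second is 1,944, so the headline number is consistent with a per-processor comparison rather than a pure elapsed-time one.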
Machine Learning
Machine Learning is Different:
• All the data vs. a sample; messy data is OK
• Correlation vs. causation
• Algorithms fine-tune themselves
Machine Learning use cases in FS
● Anomaly detection for compliance
● Rogue traders, Fat Fingers
● E.g. normal accuracy with decision trees: 70-75%
● Deep neural nets >90%, which can halve fraud costs
● Fraud, money laundering
● Trading Strategies
● Risk and reward prediction
● Structured and unstructured data sources
● Personnel and Customer Management
● Recruiting/Turnover prevention
● CRM for trading platforms
Supervised Machine Learning
First label data: human
judgments on historic
data – e.g. fraud or not
fraud
Statistical analysis of
training data
Model finds correlations
between input data and
human applied labels
• 1000s of features: events, state, temporal, graph
• Millions of fraud patterns
• Copes with noisy data
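The label, analyze, correlate sequence above can be sketched with the simplest possible supervised model: a one-split decision stump. This is a toy stand-in for the much larger models the slide has in mind. The two features, the seven labelled rows and the function names are all hypothetical.

```python
# Hypothetical labelled history: (amount, transactions_in_last_hour) -> 1 = fraud, 0 = not fraud
TRAINING = [
    ((20.0, 1), 0),
    ((35.0, 2), 0),
    ((50.0, 1), 0),
    ((900.0, 2), 0),
    ((40.0, 9), 1),
    ((700.0, 12), 1),
    ((15.0, 8), 1),
]


def fit_stump(rows):
    """Learn the single rule 'feature > threshold means fraud' with the fewest
    errors on the labelled rows. Returns (feature_index, threshold)."""
    best = None  # (errors, feature_index, threshold)
    n_features = len(rows[0][0])
    for f in range(n_features):
        # Candidate thresholds are the values actually seen for this feature.
        for threshold in sorted({x[f] for x, _ in rows}):
            errors = sum((x[f] > threshold) != bool(label) for x, label in rows)
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    return best[1], best[2]


def predict(model, x):
    """Apply the learned rule to a new, unlabelled observation."""
    feature, threshold = model
    return int(x[feature] > threshold)


if __name__ == "__main__":
    model = fit_stump(TRAINING)
    print(model)
    print(predict(model, (60.0, 10)), predict(model, (1000.0, 1)))
```

On this toy data the stump ignores the amount and splits on transaction frequency, because that feature separates the human-applied labels without error. Nothing in the code says frequency matters; the correlation is found from the labels.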
Deep Learning as the emerging Supervised
Learning ML
● NVIDIA the thought (technology) leader in Deep
Learning
● GPU technology well-suited
● Adopters like Google, Facebook, Microsoft
Especially successful for
- Pattern recognition
- Feature extraction
in speech, pictures, time-series
Technology Needs of Machine Learning
● Highly parallel any to any
● Dense compute, large memory, fast interconnect
● Deep learning: Dense GPUs depending on toolset
Cray XC for large single image memory scaling
Cray CS-Storm for dense GPUs for Deep Learning
A greater engineering challenge than you might think
Cray makes the world’s densest most scalable and RELIABLE GPU
systems
Unsupervised Machine Learning
In Summary: New Analytics Technology Needs
Characteristic             Older Hadoop   Traditional HPC    Advanced Analytics
Interconnect               Slow           Fast/Intelligent   Fast/Intelligent
Single memory capability   No             Yes                Yes
High Bandwidth I/O         No             Yes                Yes
Node Local Storage         Yes            No                 Hybrid
Compute density            Low            High               High
GPUs                       No             Yes                Yes
Summary
● Game changing analytics technologies are arriving
● They have high ROI use cases in FS
● Their technology demands do not align with traditional
Hadoop clusters
● Their technology needs are closer to HPC
● Cray has great heritage, experience and technology
● Cray is designing new age analytic products

This assessment plan proposal is to outline a structured approach to evaluati...This assessment plan proposal is to outline a structured approach to evaluati...
This assessment plan proposal is to outline a structured approach to evaluati...
lamluanvan.net Viết thuê luận văn
 
APP I Lecture Notes to students 0f 4the year
APP I  Lecture Notes  to students 0f 4the yearAPP I  Lecture Notes  to students 0f 4the year
APP I Lecture Notes to students 0f 4the year
telilaalilemlem
 
Introduction to Indian Financial System ()
Introduction to Indian Financial System ()Introduction to Indian Financial System ()
Introduction to Indian Financial System ()
Avanish Goel
 
managementaccountingunitiv-230422140105-dd17d80b.ppt
managementaccountingunitiv-230422140105-dd17d80b.pptmanagementaccountingunitiv-230422140105-dd17d80b.ppt
managementaccountingunitiv-230422140105-dd17d80b.ppt
SuseelaPalanimuthu
 
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
Quotidiano Piemontese
 
Financial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptxFinancial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptx
Writo-Finance
 

Recently uploaded (20)

how to sell pi coins effectively (from 50 - 100k pi)
how to sell pi coins effectively (from 50 - 100k  pi)how to sell pi coins effectively (from 50 - 100k  pi)
how to sell pi coins effectively (from 50 - 100k pi)
 
GeM ppt in railway for presentation on gem
GeM ppt in railway  for presentation on gemGeM ppt in railway  for presentation on gem
GeM ppt in railway for presentation on gem
 
innovative-invoice-discounting-platforms-in-india-empowering-retail-investors...
innovative-invoice-discounting-platforms-in-india-empowering-retail-investors...innovative-invoice-discounting-platforms-in-india-empowering-retail-investors...
innovative-invoice-discounting-platforms-in-india-empowering-retail-investors...
 
234Presentation on Indian Debt Market.ppt
234Presentation on Indian Debt Market.ppt234Presentation on Indian Debt Market.ppt
234Presentation on Indian Debt Market.ppt
 
how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.how to sell pi coins at high rate quickly.
how to sell pi coins at high rate quickly.
 
how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.how to sell pi coins in South Korea profitably.
how to sell pi coins in South Korea profitably.
 
Intro_Economics_ GPresentation Week 4.pptx
Intro_Economics_ GPresentation Week 4.pptxIntro_Economics_ GPresentation Week 4.pptx
Intro_Economics_ GPresentation Week 4.pptx
 
Which Crypto to Buy Today for Short-Term in May-June 2024.pdf
Which Crypto to Buy Today for Short-Term in May-June 2024.pdfWhich Crypto to Buy Today for Short-Term in May-June 2024.pdf
Which Crypto to Buy Today for Short-Term in May-June 2024.pdf
 
how to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchangehow to sell pi coins on Bitmart crypto exchange
how to sell pi coins on Bitmart crypto exchange
 
655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf655264371-checkpoint-science-past-papers-april-2023.pdf
655264371-checkpoint-science-past-papers-april-2023.pdf
 
Exploring Abhay Bhutada’s Views After Poonawalla Fincorp’s Collaboration With...
Exploring Abhay Bhutada’s Views After Poonawalla Fincorp’s Collaboration With...Exploring Abhay Bhutada’s Views After Poonawalla Fincorp’s Collaboration With...
Exploring Abhay Bhutada’s Views After Poonawalla Fincorp’s Collaboration With...
 
how to sell pi coins in all Africa Countries.
how to sell pi coins in all Africa Countries.how to sell pi coins in all Africa Countries.
how to sell pi coins in all Africa Countries.
 
Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024Commercial Bank Economic Capsule - May 2024
Commercial Bank Economic Capsule - May 2024
 
What price will pi network be listed on exchanges
What price will pi network be listed on exchangesWhat price will pi network be listed on exchanges
What price will pi network be listed on exchanges
 
This assessment plan proposal is to outline a structured approach to evaluati...
This assessment plan proposal is to outline a structured approach to evaluati...This assessment plan proposal is to outline a structured approach to evaluati...
This assessment plan proposal is to outline a structured approach to evaluati...
 
APP I Lecture Notes to students 0f 4the year
APP I  Lecture Notes  to students 0f 4the yearAPP I  Lecture Notes  to students 0f 4the year
APP I Lecture Notes to students 0f 4the year
 
Introduction to Indian Financial System ()
Introduction to Indian Financial System ()Introduction to Indian Financial System ()
Introduction to Indian Financial System ()
 
managementaccountingunitiv-230422140105-dd17d80b.ppt
managementaccountingunitiv-230422140105-dd17d80b.pptmanagementaccountingunitiv-230422140105-dd17d80b.ppt
managementaccountingunitiv-230422140105-dd17d80b.ppt
 
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...Turin Startup Ecosystem 2024  - Ricerca sulle Startup e il Sistema dell'Innov...
Turin Startup Ecosystem 2024 - Ricerca sulle Startup e il Sistema dell'Innov...
 
Financial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptxFinancial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptx
 

Bitkom Cray presentation - on HPC affecting big data analytics in FS

  • 1. HPC Meets Big Data in Financial Services Philip Filleul – Global Lead FS
  • 2. Agenda ● Who is Cray ● Cray Vision and Products ● Breakthrough analytic technologies ● For each – Spark, Graph, Machine Learning ● What is it ● FS Use cases ● Technology needs to successfully deliver ● How Cray enables ● Key takeaways
  • 3. Cray: The Myth vs. Reality Myths • They are huge • They are proprietary • They are complex • They are expensive Vs. Reality! • They can be – but they start less than a rack • No: Intel, Linux, open standards, Hadoop • Simpler and more productive than a grid • No – cost competitive, lower TCO, higher value Three Focus Areas • Computation • Storage & Data Management • Analytics
  • 4. Continuing Financial Strength and R&D Investment. 2011: Rev. $236M, Gross R&D $77M, H/C 860. 2012: Rev. $421M, Gross R&D $86M, H/C 929. 2013: Rev. $526M, Gross R&D $92M, H/C 1,042. 2014: Rev. $562M, Gross R&D $105M, H/C 1,138. 2015: Rev. $725M, Gross R&D >$130M*, H/C 1,200*. Copyright 2016 Cray Inc
  • 5.
  • 6. Where does Cray add value in FS? BUSINESS LINES Asset Management Search for alpha Strategy confidence Wealth Management Roboadvice Scaling mass affluent Securities Massive regulation Compliance burden Stress tests Commoditization Insurance Telematics Fraud CROSS INDUSTRY Technology Commoditization and open source Cloud Big data analytics Cybersecurity
  • 7. Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data Copyright 2016 Cray Inc. Super Computing Big Data Analytics Modeling The World Cray Supercomputers solving “grand challenges” in science, engineering and analytics Compute Store Analyze Data-Intensive Processing High throughput event processing & data capture from sensors, data feeds and instruments Math Models Modeling and simulation augmented with data to provide the highest fidelity virtual reality results Data Models Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery High Performance Data Analytics (HPDA)
  • 8. Cray Product Range and FS Applicability • Aries Interconnect • Single memory • Scalability • Package density • Grid compatibility • Upgradeability • Integrated Stack • Best in class power and cooling • NVIDIA GPU density • Proven at scale • Integrated h/w and s/w stack • Developer productivity CS400 Cluster Supercomputer XC40 Supercomputer • Risk/Pricing • CVA • Machine Learning • Superfast data sharing • Specialist within grid • Surprisingly Low TCO • Risk/Pricing • Options FFT • Algo backtesting • Deep Learning
  • 9. Cray Product Range and FS Applicability • Lustre parallel file system • Single POSIX namespace • Modular scaling 7.5GB/s-1.7TB/s • Integrated and preconfigured • Reliability and availability at scale • Multi tier single namespace archive • Rule based policy migration • Flexible integration with most OEM tape and disk • Preconfigured and integrated Archive Lustre Parallel File System • High thruput for algo analytics pipeline • Converged storage across grid, analytics, Hadoop • Data Lake archival • Analytical data archival • Market data archival • Data no longer ‘deep sixed’
  • 10. Cray Product Range and FS Applicability • Most scalable graph processor available • Whole graph analytics possible • Open RDF/Sparql • Single memory space and extreme threaded processor • Cloudera 5.2/Yarn • Open to non CDH apps • Dense compute and memory • SSD layer for HDFS • Lustre/Posix for scale out storage Urika-XA Extreme Analytics Platform Urika-GD Graph Discovery Appliance • Surveillance • Cybersecurity • Ontology based transaction compliance • Spark optimized • R/T streaming analytics converged with regular analytics • Machine learning
  • 11. Breakthrough Analytic Technologies Copyright 2015 Cray Inc. ● Growing CPU capability, commoditization, memory size and IO bandwidth have made some new software technologies explode ● Spark ● Graph ● Machine Learning ● For each: ● What are they? ● Why are they important in FS? ● What technology attributes do they need to deliver on the promise? ● How does Cray enable?
  • 12. Spark Copyright 2015 Cray Inc. ● What is the technology ● General purpose, productive analytic technology ● Open source, target of much development work ● Memory first, shared data ● Base ecosystem for e.g. GraphX and MLlib ● FS Use Cases: ● Risk analytics ● Real-time alerting and dashboarding ● Web clickstream rapid ETL for CSRs
  • 13. Spark Technology Needs Copyright 2016 Cray Inc. Compute Node Compute Node Memory SSD HDD Block Shuffle over interconnect Intermediate results spill over from memory, SSD recommended for latency/size balance HDD for Job input and output HDFS vs. Parallel File System for high bandwidth and scaling disk separate to compute Performance Recommendation - Fast interconnect - SSD per node - Shared parallel filesystem
  • 14. What is Graph Copyright 2016 Cray Inc. A Traditional RDBMS is GOOD at: - Rapid update - Simple queries about items But BAD at: - Relationships between data items - Patterns of relationships - Interactions between many data items - E.g. suspicious pattern of actions Graph databases: Operate entirely in memory
  • 15. Discovering New Risk/Compliance events ● Goal: Find detection patterns and improve efficiency of the investigation process by reducing false positives ● Data sets: Accounts, Customer Transactions, 3rd party data feeds, Detection and Case Management systems ● Technical Challenges: Rigid detection system schemas and rules; Constantly degrading performance as new data comes in; Hard to tune performance with new data; Long data on-boarding timeframes; Manual disposition of benign alerts ● Users: Investigators, Analysts ● Usage model: Tune detection system models via data discovery; Enhance, improve and augment the alert investigations process ● Augmenting: Existing detection systems. Diagram labels: RestrictedTradingList Trader StockSymbol LegalEmployee DestIP Port Protocol SourceIP TypeDateTime BadgeLogs EntryTime ExitTime Location SystemWith AdminRights ITEmployee PolicyViolations Location Restriction RestrictionStartDate Department CounterParty Transaction Date Transaction Type RestrictionDate Communication Event Location Restriction RestrictionStartDate RecordType Time
  • 16. Discovering Customer Churn drivers ● Goal: Identify correlations between service events (truck rolls, call escalations, customer service rep experience level, set-top box reliability…) and customer churn ● Data sets: Customer records, Historic billing records, IVR, HR/training records, customer surveys, Network Operations data, Work Orders… ● Technical Challenges: Volume, Variety and Velocity of data; Disconnected and disparate data sources from operational lines of business and 3rd party contractors ● Users: Customer Operations Analysts ● Usage model: Analyze relationships between service & related events and eventual customer contract outcomes ● Augmenting: Existing data warehouse appliances. Diagram labels: Inexperienced CSR Event Resolutions Customers Call Center Events Work Orders Call Escalations Truck Rolls Set-Top Box feeds Supervisor Intervention 3rd Party Service Tech AVR Failure CSR Resolution Cabinet Failure Residential Accounts Web Service Commercial Accounts
  • 17. Mphasis Nextangles: A Disruptive Approach Regulations & Policies Data & IT systems Now : Sample Audits connect the two silos NextAngles: Bridges the two through Knowledge Models 1. Regulations are deconstructed to computer understandable rules 2. Rules are applied to Smart Data 3. This application is through knowledge model Old World Solution : Inadequate New World Solution : Knowledge models
  • 18. NEXTANGLES Massively scalable, “Living” model of the bank HOW NEXTANGLES WORKS Convert to “Smart Data” Time Investigation Tools Customer’s Systems Dashboards Concept Model Rules Inferences • Potential violations • Prohibited activities • Operational risk measures • Data problems T1 T2 T3 T4 T5 T6 T7 Context model • Line of business • Legal entities • Geographies • Customer segments • Organization structure • Processes Reference & Transaction Data • Parties • Accounts / GL / positions • Transactions & events “Facts” Encoded Regulations & Policies Encoded Banking Knowledge
  • 19. ENABLER #1: SMART DATA • Data stored as computer intelligible “graphs” What is it? Class predicate • Formal standards from the W3C and other bodies • Over 12 years Semantic Web has evolved to a full ecosystem of products and practices • Order of magnitude reduction in handling real world data complexity How is it enabled? Value Proposition Making the data computer intelligible Object Subject Data → SMART Data
  • 20. ENABLER #2: RULES & CURATED KNOWLEDGE Reliable, consistent and predictable application of reasoning & complex rules • Built on the smart data model: helps computers reach the same conclusions as human knowledge workers • Knowledge expressed as rules that are intrinsically part of the smart data ecosystem • Reduces the need for humans to intervene & define “how” to solve problems What is it? How is it enabled? Value Proposition Traditional Rules Data Traditional Rules: need to be wired in Rules in a Smart Data ecosystem: Fills in gaps
  • 21. ENABLER #3: WORKSPACES • A complete rethink of user interfaces around smart data & knowledge models • Semantic + knowledge base driven “Noun-verb” paradigm • “Workspaces” – context where users work through an enquiry • 6 widgets: • Solves the “I need Excel” problem • Solves the swivel chair problem • Solves the vocabulary problem What is it? How is it enabled? Value Proposition A rethink of enterprise applications for knowledge workers Faceted Search View List VisualizeHistory Forms / WizardsWorkspace
  • 22. ENABLER #4: LEARNING • Learns from user behavior to help pre-populate workspaces • Learns how users use tools to perform tasks • Tries to proactively bring up the tools when it sees a similar situation • Interim work products can be turned into future automation • User behavior in a user interface is tracked in detail, and encoded into smart data • Learning algorithms eliminate dead ends & build an optimum path to the answers • Effort for manual tasks reduces over time • Almost like “custom screens” for 1000’s of subtle variations • The Next Angles learns from users’ behavior • Supervisors can short-circuit learning engine to “pre-configure” workspaces What is it? How is it enabled? Value Proposition Continuous improvement of efficiency and effectiveness through learning
  • 23. Anti Money Laundering: Solutions to a Real Problem ● Backlog of investigations due to large number of alerts ● Constantly changing AML rules and regulations ● Consolidation of data from various systems within and outside the bank ● Balancing the load with limited resources Challenges
  • 24. Urika-GD: Purpose-built for data discovery 1,944 Times Faster ! “In the amount of time it takes to validate one hypothesis, we can now validate 1000 hypotheses – increasing our success rate significantly.” – Dr. Ilya Shmulevich Access all data with uniform, low latency regardless of partitioning, layout or access pattern Do not know the relationships in the data Do not know the desired insight or the right question to ask Do not know the paths/linkages to explore diverse data sets Investigate multiple, changing hypotheses in parallel without prefetching/caching Explore diverse data fused without upfront modeling and independent of linkage/traversal path Shared Memory Model Memory Accelerator In Memory, Graph Analytical Database # PROCESSORS TIME Traditional Approaches after months of optimization 48 10.8 Hours Cray 32 30 sec
  • 25. Machine Learning All the data vs. a sample, messy data is OK Correlation vs. causation Algorithms fine tune themselves Machine Learning is Different:
  • 26. Machine Learning use cases in FS Copyright 2016 Cray Inc. ● Anomaly detection for compliance ● Rogue traders, Fat Fingers ● E.g. normal accuracy with decision trees: 70-75% ● Deep neural nets >90%, which can halve fraud costs ● Fraud, money laundering ● Trading Strategies ● Risk and reward prediction ● Structured and unstructured data sources ● Personnel and Customer Management ● Recruiting/Turnover prevention ● CRM for trading platforms
  • 27. Supervised Machine Learning Copyright 2016 Cray Inc. First label data: human judgments on historic data – e.g. fraud or not fraud Statistical analysis of training data Model finds correlations between input data and human applied labels •1000s of features: events, state, temporal, graph •Millions of fraud patterns •Copes with noisy data
  • 28. Deep Learning as the emerging Supervised Learning ML ● NVIDIA the thought (technology) leader in Deep Learning ● GPU technology well-suited ● Adopters like Google, Facebook, Microsoft Especially successful for - Pattern recognition - Feature extraction in speech, pictures, time-series
  • 29. Technology Needs of Machine Learning Copyright 2016 Cray Inc. ● Highly parallel any to any ● Dense compute, large memory, fast interconnect ● Deep learning: Dense GPUs depending on toolset Cray XC for large single image memory scaling Cray CS-Storm for dense GPUs for Deep Learning A greater engineering challenge than you might think Cray makes the world’s densest most scalable and RELIABLE GPU systems
  • 31. In Summary: New Analytics Technology Needs Copyright 2016 Cray Inc.
      Characteristic           | Older Hadoop | Traditional HPC  | Advanced Analytics
      Interconnect             | Slow         | Fast/Intelligent | Fast/Intelligent
      Single memory capability | No           | Yes              | Yes
      High Bandwidth I/O       | No           | Yes              | Yes
      Node Local Storage       | Yes          | No               | Hybrid
      Compute density          | Low          | High             | High
      GPUs                     | No           | Yes              | Yes
  • 32. Summary Copyright 2016 Cray Inc. ● Game changing analytics technologies are arriving ● They have high ROI use cases in FS ● Their technology demands do not align with traditional Hadoop clusters ● Their technology needs are closer to HPC ● Cray has great heritage, experience and technology ● Cray is designing new age analytic products
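
The "1,944 Times Faster" headline on slide 24 can be reproduced from the figures on that slide (48 processors and 10.8 hours versus 32 processors and 30 seconds) if the speedup is normalized by processor count. That normalization is an inference from the numbers, not something the slide states, and the function name below is illustrative only. A minimal sketch of the arithmetic:

```python
def speedups(base_procs, base_seconds, new_procs, new_seconds):
    """Return (wall-clock speedup, speedup normalized per processor)."""
    wall_clock = base_seconds / new_seconds
    # Assumption: credit the new system for using fewer processors.
    per_processor = wall_clock * base_procs / new_procs
    return wall_clock, per_processor


# Figures taken from slide 24: 10.8 hours on 48 processors vs. 30 s on 32.
wall_clock, per_processor = speedups(48, 10.8 * 3600, 32, 30.0)
print(round(wall_clock), round(per_processor))
```

Under this reading, the raw elapsed-time ratio is about 1,296x, and scaling by 48/32 gives the 1,944x figure.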

Editor's Notes

  1. Title Slide intended for Financial presentations
  2. This chart shows key concerns in different parts of financial services and the many areas (in those key concerns) where Cray adds real value
  3. Cray is thought of as a classic large supercomputer maker but is less known in Financial Services. To enable world-class scaling in HPC, Cray has invested in its own highly intelligent and performant interconnect. Combined with a systems engineering approach and commonly available commodity technologies, this has enabled us to produce open-standards-compliant, highly performant, reliable and competitively priced scalable systems that are highly relevant in FS. The financial services industry is experiencing a wave of innovation enabled by big and fast data at a time of unprecedented regulatory pressure and margin compression. Firms are being forced not only to cut costs and re-platform onto more cost-effective data platforms such as those based on Hadoop, but also to embrace regulatory oversight and move to more real-time monitoring of their positions and their risks, as well as areas like employee surveillance. Let's look at risk management as an example. Historically, the nightly value at risk (VaR) workload is one of the cornerstones of banking IT. It is highly parallel and runs on tens to hundreds of thousands of cores in a large commodity grid as a linear algebra Monte Carlo math model. The pressure now is to move that closer to real time for regulatory and sound business management reasons. This is a good example of a traditional HPC application looking more like a high performance data analytics (HPDA) application. Some firms are looking at Spark to re-platform this application. The technology needs of this new style of application are very different from those of the traditional commodity cluster: they need high bandwidth, throughput, concurrency and any-to-any communications with an underlying HPC capability. All these more demanding needs play well to Cray’s strengths and to our newer system design points.
Cray is a technology leader in each of the areas of scalable compute, storage and analytics, but it is the combination of all these in this new style of application where we have unparalleled strength.
  4. Spark performance is very complex, with many factors that are highly use-case dependent, but some broad generalities can be seen. At the start of a job, data from disk needs to be brought into the node's memory. With HDFS, this data is normally stored 3x in different places on the cluster. An alternative is to store data on a parallel filesystem such as open source Lustre. This allows data and compute to be scaled separately; data access has high throughput but slightly higher latency than HDD on the node. This is very suitable for the central storage of input and output data (rather than intermediate results, where latency is more important than throughput). In the map phase of processing, data is normally self-contained on the node. Intermediate results can rest in memory, on disk, or both. Depending on the use case, when larger intermediate results are needed SSD is highly suitable, being cheaper than memory and more extensible. HDD can be used, but latencies are much higher. In the reduce phase of processing, results are aggregated across nodes. Here we see a dependency on interconnect speeds.
  5. Key points: Objective: identify financial risks posed by fraudulent / illegal activity before they incur massive punitive fines Challenge: Enormous volumes of surveillance and trading data; key indicators of collusion or wrongdoing are deeply hidden, and many patterns need to be explored to identify malfeasance with significant certainty Solution: Urika’s ability to fuse data from many sources into an enormous graph and search for hidden patterns allows 1000s of hypotheses to be explored in the time previously required to explore just one, increasing success rate. Words: The patience of the SEC and other regulators with financial services firms is at an all-time low, with hefty penalties being applied to corporate wrongdoing. Even worse is the damage done to the reputation of the firm in the media scrum if such wrongdoing comes to light. It is more crucial than ever to have controls in place that detect and stop such wrongdoing by employees before they can cause organizational harm, and the role of the corporate risk officer has grown accordingly. Risk management is complex, however. Consider insider trading within the context of an investment bank. You have to identify traders in a particular equity who interact with others having inside knowledge. Even identifying “interactions” is complex, given that people might communicate by phone, email, in-person, via an intermediary … tracking down such behaviour involves looking at data from a myriad of different sources, and correlating with complex trading patterns across time. And that’s just one form of risk! In addition to insider trading, risk and compliance officers are also tasked with detecting and implementing anti-money laundering procedures, identifying systemic risk factors through co-party risk analysis and many others. Urika addresses these challenges.
It provides a platform that allows risk investigators to fuse data from a wide variety of sources in real time and search for patterns of interaction that could indicate insider trading or other malfeasance. Urika fuses the data from multiple sources into a graph, and provides the means to pose complex, ad-hoc analytic queries across the entire dataset and obtain results in real time.
  6. Key points: Operational focus: identify customers at the highest risk of churn and design a sticky package of services to aid in retention Challenge: continuous refinement of techniques used to identify at-risk customers, using data from a variety of very large datasets Solution: Urika – fuse datasets into a big graph, analyze patterns of interactions and churn and formulate patterns which can identify at-risk customers Words: Churn is a problem that plagues every telecoms company, particularly mobile providers. When it can cost hundreds of dollars to acquire a new customer, minimizing churn is a major strategic initiative with direct impact on profitability. However, predicting churn is quite complex. One major international carrier realized that customer satisfaction could be measured using a variety of information sources: service events (outages, call escalations, customer rep experience), frequency and response to those service records, calling patterns, social media, influencers and a variety of other information. It’s not easy: the data comes in many different formats, and is very voluminous… The customer used Urika to fuse together these many disparate sources of information, and searched for historical correlations between these various indicators and an eventual churn outcome. That analysis produced a set of patterns, which could then be applied to existing customers to predict who was likely to churn – long before customer dissatisfaction reached the point where they were considering churning. Analysis of their calling patterns and influencers enabled the creation of highly targeted offers to those customers, with very high acceptance rates. This solution offered the best of all worlds: it minimized costs by targeting the most at-risk customers exclusively and created very high value, sticky offers for those customers specifically.
As a side note, it should be mentioned that the telco quickly realized that this data source could be used to discover many other things. Example: they were able to determine that there were an enormous number of unnecessary truck rolls because a common way of dealing with customer complaints was to move the customer from one line to another at the central office – which placated that customer, but resulted in dissatisfaction on the part of the customer that was displaced from the “good” line, resulting in another complaint and truck roll… Urika addresses the problem using a graph database that fuses all the data together, enabling a 360 degree view of all the relationships that any particular entity is involved in. This enables sophisticated querying and analysis, including temporal analysis. The use of whole graph analytics facilitates identifying key influencers and the construction of a sticky, micro-targeted offer.
  7. ‘Regulatory compliance & Policies’ and ‘Data processing in IT systems’ typically sit in two separate silos. With the accelerating change in regulations, it is impractical to try to bridge the gap through sample audits. NextAngles bridges the gap between the two through knowledge models.
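
The idea in the slides of storing data as subject/predicate/object "graphs" and applying encoded rules to it can be illustrated with a small sketch. This is not NextAngles' or Urika's implementation or API: plain Python sets stand in for an RDF store and rule engine, and every identifier and fact below is hypothetical, loosely borrowing the Trader, StockSymbol and RestrictedTradingList labels from slide 15.

```python
# Hypothetical facts as (subject, predicate, object) triples.
facts = {
    ("trader:alice", "traded", "stock:ACME"),
    ("trader:bob", "traded", "stock:GLOBEX"),
    ("stock:ACME", "onList", "list:restricted"),
    ("trader:alice", "memberOf", "dept:equities"),
}


def restricted_trades(triples):
    """Rule: flag every (trader, stock) pair where the stock is restricted."""
    restricted = {
        s for s, p, o in triples
        if p == "onList" and o == "list:restricted"
    }
    return sorted(
        (s, o) for s, p, o in triples
        if p == "traded" and o in restricted
    )


violations = restricted_trades(facts)
print(violations)
```

Because the rule is expressed against predicates rather than a fixed table schema, new kinds of facts (for example badge logs or communications) could be added as further triples without changing existing ones, which is the flexibility the slides attribute to the smart data approach.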