Evolving Patterns in Big Data
Neil Avery
CTO Excelian
neil.avery@excelian.com
Background
Financial Services > Investment Banking:
• Cloud
• Big Data
• Risk platforms
neil.avery@excelian.com
@avery_neil
Agenda
• Before: How did we use to do things?
• What has changed?
• Now: What are we doing now?
• Patterns: Enterprise, Lakes and Lambda
• Next: And then?
In the beginning…
• We used to:
- Solve simple problems… kind of
- J2EE, CORBA…
- SOA… ESB
- Messaging
Getting smarter
• Data was in a Relational Database
• Data Scale was the challenge
• ‘Let’s build a data caching layer!’… and IMDGs (in-memory data grids) emerged
• We needed to scale compute grids from 2k to 50k compute cores, and they need data
Making Data scale
• Key-Value store
• Map <K,V>
• Plus events, listeners, processors,
location awareness and more…
[Oracle Coherence, GemFire,
GigaSpaces, and now Hazelcast]
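The IMDG idea (a partitioned Map<K,V> plus listeners and processors that run where the data lives) can be sketched in a few lines. This is a single-process toy built around a hypothetical `PartitionedMap` class; it is not the API of Coherence, GemFire, GigaSpaces or Hazelcast, and CRC32-based partitioning is an illustrative choice.

```python
import zlib
from typing import Any, Callable, Dict, Hashable, List

# Listener signature (event_type, key, value): hypothetical, for illustration only.
Listener = Callable[[str, Hashable, Any], None]


class PartitionedMap:
    """Toy IMDG map: keys are spread over partitions, listeners see every change."""

    def __init__(self, partition_count: int = 4) -> None:
        self.partition_count = partition_count
        self.partitions: List[Dict[Hashable, Any]] = [{} for _ in range(partition_count)]
        self.listeners: List[Listener] = []

    def partition_for(self, key: Hashable) -> int:
        # Location awareness: a stable hash of the key decides which partition
        # (in a real grid, which node) owns the entry.
        return zlib.crc32(str(key).encode("utf-8")) % self.partition_count

    def add_listener(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def _fire(self, event: str, key: Hashable, value: Any) -> None:
        for listener in self.listeners:
            listener(event, key, value)

    def put(self, key: Hashable, value: Any) -> None:
        partition = self.partitions[self.partition_for(key)]
        event = "UPDATED" if key in partition else "ADDED"
        partition[key] = value
        self._fire(event, key, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        return self.partitions[self.partition_for(key)].get(key, default)

    def remove(self, key: Hashable) -> None:
        partition = self.partitions[self.partition_for(key)]
        if key in partition:
            self._fire("REMOVED", key, partition.pop(key))

    def execute_on_key(self, key: Hashable, processor: Callable[[Any], Any]) -> Any:
        # Entry processor: in a real grid the function is shipped to the owning
        # node instead of pulling the value across the network.
        new_value = processor(self.get(key))
        self.put(key, new_value)
        return new_value


# Usage: a tiny position map with an audit listener.
events: List[tuple] = []
positions = PartitionedMap(partition_count=4)
positions.add_listener(lambda event, key, value: events.append((event, key)))
positions.put("T1", 100.0)
positions.put("T1", 105.0)
positions.execute_on_key("T1", lambda value: value * 2)
positions.remove("T2")  # absent key: no event fired
```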
So what is a compute grid?
• Financial services
• Life-sciences
• Computational fluid dynamics
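In a risk setting, a compute grid is essentially scatter/gather over many independent tasks. The sketch below assumes a toy one-day P&L model (normally distributed returns) and uses a local `ThreadPoolExecutor` as a stand-in for grid nodes; the function names and parameters are hypothetical, and threads only show the shape of the work, not the speed-up a real 2k-50k core grid gives.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def simulate_batch(seed: int, paths: int, spot: float, daily_vol: float) -> List[float]:
    """One grid task: simulate one-day P&L outcomes for a single position (toy model)."""
    rng = random.Random(seed)  # a per-task seed keeps every task reproducible
    return [spot * rng.gauss(0.0, daily_vol) for _ in range(paths)]


def run_grid(tasks: int, paths_per_task: int, workers: int = 4) -> List[float]:
    # Scatter: tasks share no state, so they could run on any grid node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(simulate_batch, seed, paths_per_task, 100.0, 0.02)
            for seed in range(tasks)
        ]
        # Gather: collect results in submission order.
        return [pnl for future in futures for pnl in future.result()]


def value_at_risk(pnls: List[float], confidence: float = 0.99) -> float:
    """Quantile-based VaR: the loss at the (1 - confidence) quantile of simulated P&L."""
    ordered = sorted(pnls)
    index = int((1.0 - confidence) * len(ordered))
    return -ordered[index]


pnls = run_grid(tasks=20, paths_per_task=500)
print(f"scenarios={len(pnls)} 99% VaR={value_at_risk(pnls):.2f}")
```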
Amazon – compute in the cloud
Source: http://media.amazonwebservices.com/
“And then”: Internet scale; a different mind-set
• Google papers: MapReduce, GFS etc.
• Hadoop: MapReduce, HDFS etc.
• Amazon’s Dynamo paper: Cassandra etc.
• MongoDB
• We can store at scale, but doing anything useful is slow and painful, and the infrastructure is complex…
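The Map/Reduce model behind Hadoop comes down to three steps: map each record to key/value pairs, shuffle (group) by key, and reduce each group. Here is an in-process sketch that counts words in log lines; on Hadoop the same phases run distributed over HDFS blocks, and the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values by key (the framework does this between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine every value seen for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


logs = ["ERROR disk full", "INFO job started", "ERROR network timeout"]
counts = map_reduce(logs)
print(counts["error"])  # prints 2
```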
The lightbulb; hype cycle
• We have data – it has value
• Tier-1 banks have been storing
logs in HDFS for 10 years
Where do you fit?
Business problem | Big data type | Description
• Utilities: Predict power consumption | Machine-generated data | Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that needs to be analyzed.
• Telecommunications: Customer churn analytics | Web and social data; Transaction data | Telecommunications operators need to build detailed customer churn models that include social media and transaction data, such as CDRs, to keep up with the competition. The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers. Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
• Marketing: Sentiment analysis | Web and social data | Marketing departments use Twitter feeds to conduct sentiment analysis to determine what users are saying about the company and its products or services, especially after a new product or release is launched. Customer sentiment must be integrated with customer profile data to derive meaningful results. Customer feedback may vary according to customer demographics.
• Customer service: Call monitoring | Human-generated | IT departments are turning to big data solutions to analyze application logs to gain insight that can improve system performance. Log files from various application vendors are in different formats; they must be standardized before IT departments can use them.
• Retail: Personalized messaging based on facial recognition and social media | Web and social data; Biometrics | Retailers can use facial recognition technology in combination with a photo from social media to make personalized offers to customers based on buying behavior and location. This capability could have a tremendous impact on retailers' loyalty programs, but it has serious privacy ramifications. Retailers would need to make the appropriate privacy disclosures before implementing these applications.
• Retail and marketing: Mobile data and location-based targeting | Machine-generated data; Transaction data | Retailers can target customers with specific promotions and coupons based on location data. Solutions are typically designed to detect a user's location upon entry to a store or through GPS. Location data combined with customer preference data from social networks enables retailers to target online and in-store marketing campaigns based on buying history. Notifications are delivered through mobile applications, SMS, and email.
• FSS, Healthcare: Fraud detection | Machine-generated data; Transaction data; Human-generated | Fraud management predicts the likelihood that a given transaction or customer account is experiencing fraud. Solutions analyze transactions in real time and generate recommendations for immediate action, which is critical to stopping third-party fraud, first-party fraud, and deliberate misuse of account privileges. Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including:
  - Credit and debit payment card fraud
  - Deposit account fraud, technical fraud, bad debt, healthcare fraud, Medicaid and Medicare fraud, property and casualty insurance fraud
  - Worker compensation fraud, insurance fraud, telecommunications fraud
Source: http://www.ibm.com/developerworks/library/bd-archpatterns1/index.html
For us: mostly enterprise
• Many dimensions
to understand!
• It’s how you keep
the business happy
Data Strategy
• CDO? Business strategy? Tech consolidation? Business use cases?
• The key factors are always data (shape) and analytics (processing);
these map onto network, storage and compute: there is no escape
• So what patterns are we seeing?
Pattern 1 : Enterprise Cache
• Massive K-V Store
• Pluggable back-end (Cassandra, Mongo, Couchbase)
• Eventing with Kafka
• Near-side caching, multi-tenant, role-based access
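A minimal sketch of how the Enterprise Cache pieces could fit together, with hypothetical class names throughout: `InMemoryStore` stands in for the pluggable back end (Cassandra, Mongo or Couchbase in the pattern), `EventBus` stands in for the Kafka topic carrying invalidation events, keys are namespaced per tenant, and a simple (role, tenant) permission table provides role-based access.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol, Set, Tuple


class BackingStore(Protocol):
    def read(self, key: str) -> Optional[Any]: ...
    def write(self, key: str, value: Any) -> None: ...


class InMemoryStore:
    """Stand-in for a pluggable back end; counts reads to show near-cache hits."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.reads = 0

    def read(self, key: str) -> Optional[Any]:
        self.reads += 1
        return self.data.get(key)

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value


class EventBus:
    """Stand-in for a Kafka topic: every subscriber sees every invalidation."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, key: str) -> None:
        for handler in self.subscribers:
            handler(key)


class PermissionDenied(Exception):
    pass


class NearCacheClient:
    """Client-side (near) cache in front of the shared store, acting for one role."""

    def __init__(self, store: BackingStore, bus: EventBus,
                 acl: Dict[Tuple[str, str], Set[str]], role: str) -> None:
        self.store, self.bus, self.acl, self.role = store, bus, acl, role
        self.near: Dict[str, Any] = {}
        bus.subscribe(lambda key: self.near.pop(key, None))  # drop stale copies

    def _check(self, tenant: str, action: str) -> None:
        if action not in self.acl.get((self.role, tenant), set()):
            raise PermissionDenied(f"{self.role} may not {action} tenant {tenant}")

    def get(self, tenant: str, key: str) -> Optional[Any]:
        self._check(tenant, "read")
        namespaced = f"{tenant}:{key}"  # multi-tenancy via key namespaces
        if namespaced not in self.near:
            self.near[namespaced] = self.store.read(namespaced)
        return self.near[namespaced]

    def put(self, tenant: str, key: str, value: Any) -> None:
        self._check(tenant, "write")
        namespaced = f"{tenant}:{key}"
        self.store.write(namespaced, value)
        self.bus.publish(namespaced)  # tell every near cache to invalidate


store, bus = InMemoryStore(), EventBus()
acl = {("trader", "rates"): {"read", "write"}, ("analyst", "rates"): {"read"}}
trader = NearCacheClient(store, bus, acl, "trader")
analyst = NearCacheClient(store, bus, acl, "analyst")

trader.put("rates", "USD3M", 5.31)
first = analyst.get("rates", "USD3M")   # miss: one back-end read
second = analyst.get("rates", "USD3M")  # near-cache hit: no read
trader.put("rates", "USD3M", 5.30)      # invalidation evicts the analyst's copy
latest = analyst.get("rates", "USD3M")  # miss again: second back-end read
```

A real deployment would also have to cope with invalidation events that arrive late or are missed (for example by versioning entries); the sketch ignores that.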
Pattern 2 : Enterprise Pipe
• Enterprise wide Kafka pipe
• Pluggable back-end (RabbitMQ, Cloud-AMQ etc.)
• Uses: shipping log data to centralised storage, message passing,
a high-performance, re-playable queue etc.
• Multi-tenant
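The properties the Enterprise Pipe relies on (an append-only log, independent consumers that each track their own offset, and replay by rewinding that offset) can be modelled in memory. This is a toy, not the Kafka client API; the `EnterprisePipe` class, its method names and the `tenant.topic` naming used for multi-tenancy are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class EnterprisePipe:
    """Toy in-memory model of a Kafka-style pipe (hypothetical API)."""

    def __init__(self) -> None:
        self.logs: DefaultDict[str, List[bytes]] = defaultdict(list)
        # Committed read position per (consumer group, topic).
        self.offsets: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    @staticmethod
    def _topic(tenant: str, topic: str) -> str:
        return f"{tenant}.{topic}"  # multi-tenancy by topic-name prefix (assumption)

    def produce(self, tenant: str, topic: str, message: bytes) -> int:
        log = self.logs[self._topic(tenant, topic)]
        log.append(message)  # append-only: messages are never updated in place
        return len(log) - 1  # offset of the new message

    def poll(self, group: str, tenant: str, topic: str, max_records: int = 10) -> List[bytes]:
        name = self._topic(tenant, topic)
        start = self.offsets[(group, name)]
        batch = self.logs[name][start:start + max_records]
        self.offsets[(group, name)] = start + len(batch)  # commit after reading
        return batch

    def seek(self, group: str, tenant: str, topic: str, offset: int) -> None:
        # Replay: move this group's position without affecting other groups.
        self.offsets[(group, self._topic(tenant, topic))] = offset


pipe = EnterprisePipe()
for line in [b"login ok", b"login failed", b"logout"]:
    pipe.produce("equities", "app-logs", line)

archived = pipe.poll("log-archiver", "equities", "app-logs")           # all three
alerts = pipe.poll("alerting", "equities", "app-logs", max_records=2)  # independent group
pipe.seek("log-archiver", "equities", "app-logs", 1)
replayed = pipe.poll("log-archiver", "equities", "app-logs")           # from offset 1 again
```

Because messages are retained rather than removed when read, a consumer added later (say, a central log store) can still start from offset 0; that retention is what makes the queue re-playable.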
Pattern 3 : NoSQL as a Service
• Managed much like relational databases are
• Multi-tenant: using Cassandra keyspaces (BlackRock, ING, others)
• Rely on native platform features
• Role-based access
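Running NoSQL as a service largely means automating tenant onboarding. A minimal sketch, assuming a hypothetical `tenant_<name>` keyspace naming scheme and two per-tenant roles: it only generates standard CQL (`CREATE KEYSPACE`, `CREATE ROLE`, `GRANT`), which a platform team would then execute through a Cassandra driver session. The grants are only enforced when Cassandra's authorizer is enabled (for example `authorizer: CassandraAuthorizer` in `cassandra.yaml`).

```python
import re
from typing import Dict, List

# Conservative identifier check for generated keyspace names (an assumption,
# stricter than Cassandra itself requires).
VALID_NAME = re.compile(r"[a-z][a-z0-9_]{0,47}")


def onboard_tenant_cql(tenant: str, replication: Dict[str, int]) -> List[str]:
    """Return CQL giving one tenant its own keyspace plus read-write and
    read-only roles (to be granted to that tenant's users). Naming is hypothetical."""
    keyspace = f"tenant_{tenant}"
    if not VALID_NAME.fullmatch(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    data_centres = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(replication.items()))
    rw_role, ro_role = f"{keyspace}_rw", f"{keyspace}_ro"
    return [
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {data_centres}}};",
        f"CREATE ROLE IF NOT EXISTS {rw_role} WITH LOGIN = false;",
        f"CREATE ROLE IF NOT EXISTS {ro_role} WITH LOGIN = false;",
        # Isolation comes from scoping every grant to the tenant's own keyspace.
        f"GRANT SELECT ON KEYSPACE {keyspace} TO {rw_role};",
        f"GRANT MODIFY ON KEYSPACE {keyspace} TO {rw_role};",
        f"GRANT SELECT ON KEYSPACE {keyspace} TO {ro_role};",
    ]


for statement in onboard_tenant_cql("fx_desk", {"dc1": 3, "dc2": 3}):
    print(statement)
```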
Data Lakes
• Centralised hub-spoke
• Schema-less, raw data
• Map across sources
• Security and visibility
• Catalogue
• Data-virtualization
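Schema-less landing plus a catalogue implies schema-on-read: raw records are stored untouched, and each consumer applies its own mapping when reading. The sketch assumes JSON-lines raw data, a hypothetical `DataLake`/`CatalogueEntry` model, and a per-dataset reader list for security and visibility; the `mapping` argument shows how differently shaped sources can be mapped onto one common view.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple

# Target column -> (field name in the raw source, conversion applied on read).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


@dataclass
class CatalogueEntry:
    dataset: str
    source: str
    owner: str
    readers: Set[str] = field(default_factory=set)  # who may see this dataset


class DataLake:
    def __init__(self) -> None:
        self.raw: Dict[str, List[str]] = {}
        self.catalogue: Dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self.catalogue[entry.dataset] = entry
        self.raw.setdefault(entry.dataset, [])

    def land(self, dataset: str, record: str) -> None:
        # No schema is enforced on write: the raw record is kept as-is.
        self.raw[dataset].append(record)

    def read(self, dataset: str, user: str, mapping: Mapping) -> List[Dict[str, Any]]:
        if user not in self.catalogue[dataset].readers:
            raise PermissionError(f"{user} cannot read {dataset}")
        rows = []
        for record in self.raw[dataset]:
            doc = json.loads(record)  # the schema is applied only now, on read
            rows.append({column: convert(doc[source]) if source in doc else None
                         for column, (source, convert) in mapping.items()})
        return rows


lake = DataLake()
lake.register(CatalogueEntry("trades_ny", "NY booking system", "rates-it", {"risk"}))
lake.register(CatalogueEntry("trades_ldn", "LDN booking system", "rates-it", {"risk"}))
lake.land("trades_ny", '{"tradeId": "N1", "notional": "1000000"}')
lake.land("trades_ldn", '{"id": "L7", "amt": 250000, "desk": "swaps"}')

# Map two differently shaped sources onto one common view.
common_view = (
    lake.read("trades_ny", "risk", {"trade_id": ("tradeId", str), "notional": ("notional", float)})
    + lake.read("trades_ldn", "risk", {"trade_id": ("id", str), "notional": ("amt", float)})
)
```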
Lambda Architecture
• Near-real-time views
(Storm, Spark Streaming)
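A Lambda architecture answers queries by merging a batch view, recomputed from the complete master dataset, with a speed view covering only events the last batch run has not yet seen (the near-real-time part Storm or Spark Streaming would provide). A single-process sketch under those assumptions, using per-account running totals as a hypothetical view:

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, float]  # (account, amount): hypothetical event shape


def batch_view(master: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from scratch over the immutable master dataset."""
    view: Counter = Counter()
    for account, amount in master:
        view[account] += amount
    return view


class SpeedLayer:
    """Speed layer: incrementally track events newer than the last batch run."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        account, amount = event
        self.view[account] += amount

    def reset(self) -> None:
        # Called once a batch run has absorbed these events.
        self.view.clear()


def query(account: str, batch: Counter, speed: SpeedLayer) -> float:
    """Serving layer: merge the two views at query time."""
    return batch[account] + speed.view[account]


master: List[Event] = [("acc1", 100.0), ("acc2", 50.0)]
batch = batch_view(master)
speed = SpeedLayer()

for event in [("acc1", 25.0), ("acc3", 10.0)]:  # arrives after the batch run
    master.append(event)   # always appended to the master dataset
    speed.on_event(event)  # and reflected immediately in the speed view

before_rebuild = query("acc1", batch, speed)
batch = batch_view(master)  # the next scheduled batch run absorbs recent events
speed.reset()
after_rebuild = query("acc1", batch, speed)
```

Clearing the whole speed view is only safe here because no events arrive during the rebuild; a real system must discard exactly the events the batch run covered.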
How does this work with Cloud?
• Virtual Private Cloud
[Diagram: Cloud Region A, Cloud Region B, Cloud Exchange, On-Prem]
Source: http://docs.datastax.com/
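When one Cassandra cluster spans the regions and on-prem site in the diagram, the consistency level decides how many replica acknowledgements each request needs, and so how far over the network it must travel. Cassandra computes a quorum as floor(sum of replication factors / 2) + 1, while LOCAL_QUORUM uses only the local data centre's replication factor. The worked example below assumes a replication factor of 3 in each of three hypothetical data centres named after the diagram labels.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def acks_required(rf_by_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Replica acknowledgements needed per consistency level."""
    total = sum(rf_by_dc.values())
    return {
        "ONE": 1,
        "LOCAL_QUORUM": quorum(rf_by_dc[local_dc]),
        "QUORUM": quorum(total),
        "ALL": total,
    }


topology = {"cloud_region_a": 3, "cloud_region_b": 3, "on_prem": 3}  # hypothetical
required = acks_required(topology, local_dc="cloud_region_a")

for level, acks in required.items():
    # Acknowledgements the local data centre cannot supply must cross the network.
    remote = max(0, acks - topology["cloud_region_a"])
    print(f"{level:<13} acks={acks} remote_at_least={remote}")
```

With this layout LOCAL_QUORUM needs 2 acknowledgements and can be met inside region A, whereas QUORUM needs 5, so at least 2 must come from another site; this is why latency-sensitive traffic in multi-region deployments typically uses LOCAL_QUORUM.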
Trends
• Cassandra – massive traction
• Feels like a Database
• Maps well onto cloud
• Virtual private cloud is helping drive adoption
• Kafka – scales and works well
• Akka – actor-based (used by Spark)
• Microservices & Reactive: vertx.io etc.
• OpenHFT – high-performance Java
• Cassandra versus the network
What’s next?
• More cloud & containerisation
• Graph DB
• Spark Evolution
• Spark SQL maturity
• OLAP NoSQL support for at-scale ad-hoc analysis
• Further commoditisation and generalisation of platforms (land grab)
• Data-virtualization
• 2016 – the year of PaaS
Questions?
neil.avery@excelian.com
@avery_neil
www.excelian.com/
@Excelian
@ExcelianLTD