EVOLVING PATTERNS IN BIG DATA - NEIL AVERY

Evolving Patterns in Big Data
Neil Avery
CTO Excelian
neil.avery@excelian.com

Background….
Financial Services > Investment Banking:
• Cloud
• Big Data
• Risk platforms
@avery_neil

Agenda
• Before: How we used to do things?
• What has changed?
• Now: What are we doing now?
• Patterns: Enterprise, Lakes and Lambda
• Next: And then?

In the beginning…
• We used to:
- Solve simple problems…. kind of
- J2EE, CORBA….
- SOA… ESB
- Messaging

Getting smarter
• Data was in a Relational Database
• Data Scale was the challenge
• ‘Let’s build a data caching layer!’ … and in-memory data grids (IMDGs) emerged
• We need to scale:
compute grids at 2k -> 50k compute cores;
they need data

Making Data scale
• Key-Value store
• Map <K,V>
• Plus events, listeners, processors,
location awareness and more…
[Oracle Coherence, GemFire,
Gigaspaces – and now Hazelcast]
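
A minimal sketch of the idea on this slide: a Map<K,V> that also raises events to listeners and runs processors against entries. It is a single-process toy, not the API of Coherence, GemFire, Gigaspaces or Hazelcast; the class and method names (ObservableMap, add_listener, process) are assumptions made for illustration, and location awareness is left out.

```python
from typing import Any, Callable, Dict, List, Tuple

# Listener signature (assumed for this sketch): (event, key, old_value, new_value)
Listener = Callable[[str, Any, Any, Any], None]


class ObservableMap:
    """Toy stand-in for a data-grid map: a dict plus listeners and processors."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: Any, value: Any) -> None:
        old = self._data.get(key)
        event = "updated" if key in self._data else "added"
        self._data[key] = value
        for listener in self._listeners:
            listener(event, key, old, value)

    def get(self, key: Any, default: Any = None) -> Any:
        return self._data.get(key, default)

    def process(self, key: Any, processor: Callable[[Any], Any]) -> Any:
        """Apply a function to the stored value and write the result back.

        In a real grid the function would be shipped to the node that owns
        the key; here everything lives in one process.
        """
        new_value = processor(self._data.get(key))
        self.put(key, new_value)
        return new_value


events: List[Tuple[str, Any, Any, Any]] = []
positions = ObservableMap()
positions.add_listener(lambda event, key, old, new: events.append((event, key, old, new)))
positions.put("trade-1", 100)
positions.process("trade-1", lambda qty: qty + 50)
```

The listener sees one "added" event and one "updated" event, which is how compute nodes or downstream caches would learn about changes without polling.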

So what is a compute grid?
• Financial services
• Life-sciences
• Computational fluid dynamics

Amazon – compute in the cloud
Source: http://media.amazonwebservices.com/

“And then”: Internet scale; a different mind-set
• Google papers: MapReduce, GFS etc.
• Hadoop: MapReduce, HDFS etc.
• Amazon Dynamo: Cassandra etc.
• MongoDB
• We can store at scale, but doing anything
useful is slow and painful, and the infrastructure is
complex…

The lightbulb; hype cycle
• We have data – it has value
• Tier-1 banks have been storing
logs in HDFS for 10 years

Where do you fit?
• Utilities: Predict power consumption
- Big data type: Machine-generated data
- Description: Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that needs to be analyzed.
• Telecommunications: Customer churn analytics
- Big data type: Web and social data, Transaction data
- Description: Telecommunications operators need to build detailed customer churn models that include social media and transaction data, such as CDRs, to keep up with the competition. The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers. Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
• Marketing: Sentiment analysis
- Big data type: Web and social data
- Description: Marketing departments use Twitter feeds to conduct sentiment analysis to determine what users are saying about the company and its products or services, especially after a new product or release is launched. Customer sentiment must be integrated with customer profile data to derive meaningful results. Customer feedback may vary according to customer demographics.
• Customer service: Call monitoring
- Big data type: Human-generated
- Description: IT departments are turning to big data solutions to analyze application logs to gain insight that can improve system performance. Log files from various application vendors are in different formats; they must be standardized before IT departments can use them.
• Retail: Personalized messaging based on facial recognition and social media
- Big data type: Web and social data, Biometrics
- Description: Retailers can use facial recognition technology in combination with a photo from social media to make personalized offers to customers based on buying behavior and location. This capability could have a tremendous impact on retailers' loyalty programs, but it has serious privacy ramifications. Retailers would need to make the appropriate privacy disclosures before implementing these applications.
• Retail and marketing: Mobile data and location-based targeting
- Big data type: Machine-generated data, Transaction data
- Description: Retailers can target customers with specific promotions and coupons based on location data. Solutions are typically designed to detect a user's location upon entry to a store or through GPS. Location data combined with customer preference data from social networks enable retailers to target online and in-store marketing campaigns based on buying history. Notifications are delivered through mobile applications, SMS, and email.
• FSS, Healthcare: Fraud detection
- Big data type: Machine-generated data, Transaction data, Human-generated
- Description: Fraud management predicts the likelihood that a given transaction or customer account is experiencing fraud. Solutions analyze transactions in real time and generate recommendations for immediate action, which is critical to stopping third-party fraud, first-party fraud, and deliberate misuse of account privileges. Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including: credit and debit payment card fraud, deposit account fraud, technical fraud, bad debt, healthcare fraud, Medicaid and Medicare fraud, property and casualty insurance fraud, worker compensation fraud, insurance fraud, and telecommunications fraud.
Source: http://www.ibm.com/developerworks/library/bd-archpatterns1/index.html

For us: mostly enterprise
• Many dimensions
to understand!
• It’s how you keep
the business happy

Data Strategy
• CDO? Business strategy? Tech-consolidation, Business Use-cases
• Key factors are always data (shape) and analytics (processing),
which map onto network, storage and compute; there is no escape
• So what patterns are we seeing?

Pattern 1 : Enterprise Cache
• Massive K-V Store
• Pluggable back-end (Cassandra, Mongo, Couchbase)
• Eventing with Kafka
• Nearside caching, multi-tenant, role-based-access
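
A rough sketch of the nearside cache and pluggable back-end on this slide, assuming a read-through, write-through policy that the slide itself does not specify. Backend, DictBackend and NearCache are hypothetical names; DictBackend stands in for a real store such as Cassandra, MongoDB or Couchbase, and invalidate() is where a change event from the Kafka pipe would be applied.

```python
from typing import Dict, Optional, Protocol


class Backend(Protocol):
    """Anything that can load and store values can sit behind the cache."""

    def load(self, key: str) -> Optional[str]: ...
    def store(self, key: str, value: str) -> None: ...


class DictBackend:
    """In-memory stand-in for a real store; counts reads to show cache hits."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.reads = 0

    def load(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def store(self, key: str, value: str) -> None:
        self.rows[key] = value


class NearCache:
    """Client-side cache: reads fall through to the back-end on a miss."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._local: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self._local:
            return self._local[key]
        value = self._backend.load(key)
        if value is not None:
            self._local[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        self._backend.store(key, value)  # write-through
        self._local[key] = value

    def invalidate(self, key: str) -> None:
        """Drop the local copy, e.g. when a change event arrives."""
        self._local.pop(key, None)


backend = DictBackend()
backend.store("EURUSD", "1.09")
cache = NearCache(backend)
cache.get("EURUSD")
cache.get("EURUSD")  # served locally; backend.reads stays at 1
```

Because the cache only depends on load() and store(), swapping the back-end does not change client code. Multi-tenancy and role-based access are not shown here.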

Pattern 2 : Enterprise Pipe
• Enterprise wide Kafka pipe
• Pluggable back-end (Rabbit, Cloud-AMQ etc.)
• Sending LogData to centralised storage, message passing,
high-performance, re-playable Queue etc.
• Multi-tenant
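
What "re-playable" means in practice can be sketched with an append-only log where each consumer keeps its own offset. This is a single-process illustration of the concept, not the Kafka client API; ReplayableLog and its methods are names invented for the example.

```python
from collections import defaultdict
from typing import Dict, List


class ReplayableLog:
    """Append-only log; records are never removed when they are read."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = defaultdict(int)  # consumer -> next offset

    def append(self, record: str) -> int:
        """Add a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> List[str]:
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind (or fast-forward) a consumer so it can replay records."""
        if not 0 <= offset <= len(self._records):
            raise ValueError(f"offset {offset} out of range")
        self._offsets[consumer] = offset


log = ReplayableLog()
for line in ["login alice", "trade 42", "logout alice"]:
    log.append(line)

first_pass = log.poll("audit")
log.seek("audit", 0)          # replay from the beginning
second_pass = log.poll("audit")
```

Each consumer reads at its own pace, so a log-shipping consumer and a message-passing consumer can share the same pipe without interfering with each other.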

Pattern 3 : NoSQL as a Service
• Managed much like relational DBs are
• Multi-tenant: using Cassandra keyspaces (BlackRock, ING, others)
• Rely on native platform features
• Multi-tenant, Role-based-access
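
As an illustration only, keyspace-per-tenant isolation with role-based access might look like the toy model below. In a real deployment these checks would rely on the platform's native features, as the slide says; TenantStore, grant() and the "read"/"write" permission names are assumptions for this sketch.

```python
from typing import Any, Dict, Set, Tuple


class TenantStore:
    """One keyspace per tenant, with per-user read/write grants."""

    PERMISSIONS = {"read", "write"}

    def __init__(self) -> None:
        self._keyspaces: Dict[str, Dict[str, Any]] = {}
        self._grants: Dict[Tuple[str, str], Set[str]] = {}

    def create_keyspace(self, name: str) -> None:
        self._keyspaces.setdefault(name, {})

    def grant(self, user: str, keyspace: str, permission: str) -> None:
        if permission not in self.PERMISSIONS:
            raise ValueError(f"unknown permission: {permission}")
        self._grants.setdefault((user, keyspace), set()).add(permission)

    def _check(self, user: str, keyspace: str, permission: str) -> None:
        if permission not in self._grants.get((user, keyspace), set()):
            raise PermissionError(f"{user} may not {permission} {keyspace}")

    def write(self, user: str, keyspace: str, key: str, value: Any) -> None:
        self._check(user, keyspace, "write")
        self._keyspaces[keyspace][key] = value

    def read(self, user: str, keyspace: str, key: str) -> Any:
        self._check(user, keyspace, "read")
        return self._keyspaces[keyspace].get(key)


store = TenantStore()
store.create_keyspace("risk")
store.create_keyspace("settlements")
store.grant("alice", "risk", "read")
store.grant("alice", "risk", "write")
store.write("alice", "risk", "var_99", 1.2)
```

A user with grants on one keyspace gets a PermissionError on any other, so tenants share the cluster without seeing each other's data.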

Data Lakes
• Centralised hub-spoke
• Schema-less, raw data
• Map across sources
• Security and visibility
• Catalogue
• Data-virtualization

Lambda Architecture
• Near-real-time views
(storm, spark-streaming)
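
The slide only names the near-real-time views, so the following fills in the commonly described Lambda layout as an assumption: a batch view recomputed from the full dataset, a speed view covering events since the last batch run, and a query that merges the two. LambdaCounts is a hypothetical name, and the batch run is treated as instantaneous to keep the sketch short.

```python
from collections import Counter
from typing import List


class LambdaCounts:
    """Event counts served from a batch view plus a near-real-time view."""

    def __init__(self) -> None:
        self._master: List[str] = []   # append-only master dataset
        self._batch_view: Counter = Counter()
        self._speed_view: Counter = Counter()

    def ingest(self, key: str) -> None:
        """New events go to the master dataset and the speed layer."""
        self._master.append(key)
        self._speed_view[key] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from scratch, then reset the speed view."""
        self._batch_view = Counter(self._master)
        self._speed_view.clear()

    def query(self, key: str) -> int:
        """Serving layer: merge the batch and near-real-time views."""
        return self._batch_view[key] + self._speed_view[key]


counts = LambdaCounts()
counts.ingest("login")
counts.ingest("login")
counts.run_batch()        # batch view now holds both events
counts.ingest("login")    # only the speed view knows about this one
total = counts.query("login")
```

In a real system the speed layer would be a stream processor such as Storm or Spark Streaming, and the batch job would take long enough that events arriving during the run need careful handling.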

How does this work with Cloud?
• Virtual Private Cloud
[Diagram: Cloud Region A, Cloud Region B, Cloud Exchange, On-Prem]
Source: http://docs.datastax.com/

Trends
• Cassandra – massive traction
• Feels like a Database
• Maps well onto cloud
• Virtual private cloud is helping drive adoption
• Kafka – scales and works well
• Akka – Actor-based (Spark)
• Microservices & Reactive: vertx.io etc.
• OpenHFT – high-performance Java
• Cassandra versus the network

What’s next?
• More cloud & containerisation
• Graph DB
• Spark Evolution
• Spark SQL maturity
• OLAP NoSQL support for at-scale ad-hoc analysis
• Further commoditisation and generalisation of platforms (land grab)
• Data-virtualization
• 2016 – the year of the PaaS

Questions?
@avery_neil
www.excelian.com/
@Excelian
@ExcelianLTD
