> Small vs. Big Data <
What the heck? What does it all
mean and how does it help me?
> Wikipedia: Big data
In information technology, big data consists of datasets
that grow so large that they become awkward to work
with using on-hand database management tools.
Difficulties include capture, storage, search, sharing,
analytics, and visualizing.
This trend continues because of the benefits of working
with larger and larger datasets allowing analysts to spot
business trends, prevent diseases, combat crime.
Though a moving target, current limits are on the order
of terabytes, exabytes and zettabytes of data.
June 2012 © Datalicious Pty Ltd 2
Big data = bottlenecks
> Big data analytics bottlenecks
Fast laptops now have up to 8GB of
RAM, which means you can process
up to 6GB of raw data very quickly
in memory, bypassing the
biggest bottleneck: I/O
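To make the in-memory point concrete, here is a minimal Python sketch. It assumes the raw data fits in RAM and uses a small hypothetical CSV sample (the column names `channel`, `region` and `revenue` are invented for illustration). The data is read once, and every later aggregation runs against the in-memory rows without touching the disk again.

```python
import csv
import io
from collections import defaultdict


def load_rows(source):
    """Read every row from a CSV file object into a list held in RAM."""
    return list(csv.DictReader(source))


def total_by(rows, key, value):
    """Sum a numeric column per distinct value of another column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample data standing in for a raw data file on disk.
SAMPLE = (
    "channel,region,revenue\n"
    "email,NSW,10.0\n"
    "search,NSW,5.5\n"
    "email,VIC,4.5\n"
)

rows = load_rows(io.StringIO(SAMPLE))               # the only I/O step
by_channel = total_by(rows, "channel", "revenue")   # answered from memory
by_region = total_by(rows, "region", "revenue")     # answered from memory
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle; the rest stays the same as long as the rows fit in memory.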
> Power vs. distributed computing
Adding more supercomputers is
difficult as they are complex and
expensive, but adding machines to
a distributed computing network
is fairly cheap and ‘easy’.
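The idea of adding workers instead of buying a bigger machine can be sketched as a split, count and merge pattern. This is only an illustration under stated assumptions: local threads stand in for the machines of a distributed network, and the function names and event labels are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_events(chunk):
    """Work done by one worker: count the events in its own chunk."""
    return Counter(chunk)


def distributed_count(events, workers):
    """Split events into chunks, count each chunk on a worker, merge results."""
    size = max(1, -(-len(events) // workers))  # ceiling division
    chunks = [events[i:i + size] for i in range(0, len(events), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_events, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merging partial counts is cheap
    return total


events = ["view", "click", "view", "buy", "view"]
totals = distributed_count(events, workers=2)
```

Scaling out in this sketch means raising `workers`; the per-chunk work and the merge step do not change.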
Big data = hype?
> Importance of research experience
The consumer decision process is changing from linear to circular.
The consideration set now grows during the (online) research phase,
which increases the importance of user experience during that phase.
> The consumer data journey
From suspect to prospect to customer
From behavioural data to transactional data
From awareness messages to retention messages
(Diagram axis: time)
> Single customer view is key
Campaign response data + Customer profile data + Website behavioural data
The whole is greater than the sum of its parts
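A single customer view of this kind can be sketched as a merge of the three data sets on a shared customer key. The sketch below is an assumption-laden illustration, not the deck's implementation: the `customer_id` field, the record layouts and the function name are all hypothetical.

```python
from collections import defaultdict


def build_single_customer_view(campaign_responses, profiles, web_events):
    """Combine three record lists into one entry per customer_id."""
    view = defaultdict(
        lambda: {"profile": {}, "campaign_responses": [], "web_events": []}
    )
    for record in profiles:
        # Keep every profile attribute except the join key itself.
        view[record["customer_id"]]["profile"] = {
            k: v for k, v in record.items() if k != "customer_id"
        }
    for record in campaign_responses:
        view[record["customer_id"]]["campaign_responses"].append(record["campaign"])
    for record in web_events:
        view[record["customer_id"]]["web_events"].append(record["page"])
    return dict(view)


# Hypothetical sample records for two customers.
profiles = [{"customer_id": "c1", "segment": "prospect"}]
campaign_responses = [{"customer_id": "c1", "campaign": "winter-sale"}]
web_events = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
]

scv = build_single_customer_view(campaign_responses, profiles, web_events)
```

Customer `c2` appears in the result with web behaviour only, which shows why each extra source adds value: the combined record answers questions none of the sources can answer alone.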
> Maximise identification points
[Chart: probability of identification through cookies (dashed line, vertical axis 20% to 160%) over weeks 0 to 48 (horizontal axis). Identification points marked on the chart: campaign response, email subscription, online purchase, repeat purchase, confirmation email, email newsletter, website login, online bill payment, app download/access.]
> Traditional single customer view
[Diagram components]
Data sources: website data, call center data, customer data, vendor data feeds #1, #2 and #3
Data import (ETL) process
Transaction data warehouse
Reporting data warehouse
Outputs: reports and dashboards, targeted campaigns
Challenge #1: Rigid database schema requires extensive planning and maintenance
Challenge #2: Data feeds require constant updates and maintenance
Challenge #3: Increasing number of (unstructured) data sources
> Splunk single customer view
[Diagram components]
Splunk instance on dedicated AWS server
Data sources: website data, call center data, customer data
Data import: Splunk Forwarder for data import, SuperTag integration for real-time data
Analysis: Splunk saved searches and dashboards
Export: Splunk regex builder and data exports
Outputs: 3rd party campaign execution, 3rd party data mining and reporting
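To illustrate what regex-based field extraction from unstructured data looks like, here is a small sketch using Python's standard `re` module. It is not Splunk's regex builder or its search syntax; the log line format, the field names and the function name are hypothetical and chosen only for the example.

```python
import re

# Hypothetical log format: "<timestamp> customer=<id> action=<name>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+) customer=(?P<customer_id>\w+) action=(?P<action>\w+)"
)


def extract_fields(line):
    """Return the named fields found in a raw log line, or None if no match."""
    match = LINE_PATTERN.search(line)
    return match.groupdict() if match else None


fields = extract_fields("2012-06-01T10:15:00 customer=c42 action=login")
unmatched = extract_fields("free text with no recognisable fields")
```

The fields are pulled out at read time, so no database schema has to be designed before the data is loaded; a new field only needs a new pattern.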
> Key Splunk advantages
- Powerful data mining
  - Structured and unstructured data
- Easy sharing of insights
  - Online dashboards and reports
- Short project duration
  - Quick implementation and first insights
- Integration with other platforms
  - Regex builder and data extracts
- Low technology and resource costs
  - Implementation and maintenance
Contact us
cbartens@datalicious.com
Learn more
blog.datalicious.com
Follow us
twitter.com/datalicious
Data > Insights > Action

AIMIA Big Data Challenges Splunk
