Cone TM Digital Marketing - Principles PDF

  1. The Cone™ – Digital Marketing
  2. Digital Transformation
     “Throughout eternity, all that is of like form comes around again – everything that is the same must return again in its own everlasting cycle.” (Marcus Aurelius, Emperor of Rome)
  3. Digital Product Lifecycle Strategy
     • Everything that goes around comes around: everything has its own lifecycle, in its own time. Things are born, grow, age and ultimately die. It’s easy to spot a lifecycle in action everywhere you look. Just as a person is born, grows, ages and dies, so does a star, a tree, a bird, a bee or a civilization, and so does a company, a product, a technology or a market. Everything goes through a lifecycle of its own.
  4. Digital Product Lifecycle Strategy
     [Diagram: product lifecycle. Labels: Investment; Product Lifecycle; Product Design; Product Launch; Product Planning; Death; Plateau; Product Maturity; Decline; Aging; Early Growth; Migrate Customers to new Products; Withdraw; Innovation; Prototype / Pilot / Proof-of-concept; Cash Cow; Cease Investment]
  5. Digital Product Lifecycle Strategy
  6. The Cone™ - Lifestyle Understanding
  7. The CONE™ - Social Intelligence
     Getting to the heart of audiences, and putting audiences back at the heart of marketing.
  8. The CONE™ - Audience Measurement
     • Due to severe competition, Communications Service Providers (CSPs) such as 3 Mobile, EE, TalkTalk and Vodafone, along with Mobile Virtual Network Operators (MVNOs) such as Virgin, Tesco and giffgaff, no longer make significant profit from their core services (Mobile, Fixed-line and Broadband). This has caused the dash for “Quad-play”, where CSPs add Media and Entertainment packages to their core network services offering.
     • TV Set-top Boxes (Virgin, TalkTalk, Sky, EE) are connected to the Internet and continuously stream Audience Channel Selection data and Music Play-lists to the CSP’s Audience Insight and Analytics servers. Similarly, Smart Phone Apps (BBC iPlayer, Sky Go, Netflix, Spotify) continuously stream Audience Channel Selection data and Music Play-lists to the CSP, via Apigee to AWS Big Data.
     • In a typical household (Mother, Father, two children) there may be four Smart Phones and as many as ten other internet-connected devices (Tablets, Laptops, Internet TVs, TV Set-top Boxes and Video Games Boxes), all streaming video, audio and data. The details are captured, stored and analysed by the CSP using “Big Data” Analytics techniques. This yields valuable Audience Metrics and Analytics based on an intimate understanding of the video, audio and internet content that consumers stream. The actionable audience insights derived from this data drive Personalised Advertising across all devices (Smart Phone, Tablet, Internet TV, Games Boxes).
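The capture-and-analyse step described on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical event layout of (household, device, channel, seconds viewed); the slide does not specify the real feed format, and the function names are invented for illustration only.

```python
from collections import Counter, defaultdict


def viewing_seconds_by_household(events):
    """Sum viewing seconds per household and channel.

    Each event is a tuple (household_id, device, channel, seconds).
    This layout is a hypothetical stand-in for the real streamed data.
    """
    totals = defaultdict(Counter)
    for household_id, _device, channel, seconds in events:
        totals[household_id][channel] += seconds
    return totals


def top_channel(totals, household_id):
    """Return the channel with the most viewing seconds for one household."""
    channel, _seconds = totals[household_id].most_common(1)[0]
    return channel


# Illustrative events from several devices in two households.
events = [
    ("H1", "set-top-box", "Sky Sports", 3600),
    ("H1", "smartphone", "Netflix", 1200),
    ("H1", "tablet", "Netflix", 3000),
    ("H2", "smartphone", "Spotify", 900),
]
totals = viewing_seconds_by_household(events)
print(top_channel(totals, "H1"))
```

Aggregating across devices before ranking is the point of the sketch: a household's dominant channel may only emerge once phone, tablet and set-top box viewing are combined.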
  9. The CONE™ - Social Intelligence
     This revolutionary Digital Marketing approach is called the Cone™: a next-generation Social Intelligence solution for real-time lifestyle understanding.
     • The Cone™ solution uses Social Intelligence to get right to the heart of every audience, and puts the audience back at the heart of every media organisation.
     • The Cone™ Digital Marketing solution works through Real-time Analytics, tuning directly into the dynamic nature of people, fashion, media and culture.
     • The Cone™ solution analyses intimate audience viewing behaviour using Social Intelligence and Real-time Insight, inspiring better digital marketing campaigns, faster: ideas which connect directly with the widest possible network audience.
     • Most importantly, the Cone™ solution tracks and understands the changing behaviour of viewers, fans and audiences and their propensity to engage with different ideas, lifestyles, interests, needs, passions, aspirations and desires.
  10. 21st Century Lifestyle Understanding
      Fanatics (10%), Enthusiasts (20%), Casuals (30%), Indifferent (40%)
      Cone™ Fan Base Understanding© ©2013 Innovation Pipeline
  11. The CONE™ - a New Lens
      Today we can view audiences through a better lens than the one given by traditional segmentation. Our better lens is what we now call the Cone™. The Cone™ visualises the volume and behaviour of a user-defined audience. When an audience is viewed in this way, the behaviours and volumes are visualised across our Cone™ spectrum, which segments the audience by propensity to engage. It is this combined understanding of behaviour and volume that the Cone™ makes visible.
      [Diagram labels: Scene Setters; Restless; Contented]
  12. Cone™ Lifestyle Understanding
      What is ‘The Cone’?
      • At its simplest, The Cone™ is a visual metaphor that maps the volume of audiences across an engagement spectrum, according to how people connect with different passions and ideas.
      • At its most sophisticated, the Cone™ delivers total entertainment digital innovation.
      Why a Cone?
      • The Cone™ shape is informed by the correlation between the volume of audiences and their propensity to engage with different passions. This Cone shape proves to be universal in its application to brands, ideas and industries that have ‘fans’:
        1. The thin, pointy end of the Cone™: low audience volume but incredibly high engagement, and therefore high ‘purchase intent’.
        2. The fat, base end of the Cone™: high audience volume but low engagement, and therefore much lower ‘purchase intent’.
      • We use our proprietary IP to produce The Cone™ for industries and clients that have fans (or at least where people engage through ‘passionate interest’ rather than mere ‘consumption’). Thus The Cone™ maps people as fans and audiences with active interests, needs and desires, not just as passive consumers.
  13. Cone™ Lifestyle Understanding
      Cone™ Lifestyle Understanding©
      • Fanatics (10%) - Core fans, including cultural arbiters, trend setters, curators, editors.
      • Enthusiasts (20%) - Social amplifiers, restless for the new, who enjoy the discovery and social kudos of feeling and “being first”.
      • Casuals (30%) - The wider market, happy to be influenced by others and open to engagement through social influence.
      • Indifferent (40%) - Generally agnostic, uninterested and indifferent to the ideas in question.
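As a rough illustration of how an audience could be cut into these four bands, the Python sketch below ranks people by an engagement score and slices the ranking using the 10% / 20% / 30% / 40% shares shown on the slide's chart. The scoring input and the function name are hypothetical; the slides do not describe the proprietary method actually used.

```python
# Segment shares in percent, taken from the chart labels on this slide.
FECI_SHARES = [
    ("Fanatics", 10),
    ("Enthusiasts", 20),
    ("Casuals", 30),
    ("Indifferent", 40),
]


def segment_by_rank(scores):
    """Assign each person to a FECI segment by engagement rank.

    scores: dict mapping person id -> engagement score (higher = more engaged).
    Returns a dict mapping person id -> segment name.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    segments = {}
    cumulative = 0
    start = 0
    for name, share in FECI_SHARES:
        cumulative += share
        end = (n * cumulative) // 100  # integer arithmetic avoids float drift
        for person in ranked[start:end]:
            segments[person] = name
        start = end
    return segments


# Ten illustrative people with scores 100, 90, ..., 10.
scores = {f"p{i}": 100 - 10 * i for i in range(10)}
segments = segment_by_rank(scores)
print(segments["p0"], segments["p2"], segments["p5"], segments["p9"])
```

Fixed-share slicing always reproduces the cone shape by construction. A real model would more likely derive segment boundaries from behaviour, which is why observed shares can differ from these nominal ones.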
  14. Cone™ Lifestyle Understanding
      How does the Cone work?
      • The principle of The Cone™ Audience Metrics & Analytics Solution is first to understand people’s lives, and then to understand the role that different entertainment concepts and content play in those lives. Using this narrative of understanding, we gain unique insights into who ideas are connecting with, and why. Those insights support better, more incisive decisions and inspire creative marketing. We then apply The Cone™ creative inspiration to innovate compelling propositions and ideas that will connect with the widest possible audiences.
      • On the surface, The Cone™ profiles people’s propensity to engage with any given lens (e.g. film, reality TV, music, radio, mobile) along our FECI continuum: ranging from Fanatics through Enthusiasts to Casuals and “Indifferent”, and finally the “Unconnected”. We then use proprietary data analytics to profile and describe groups of similar people within the FECI continuum.
      • The Cone™ facilitates our understanding of how groups of like-minded individuals are connecting (or not connecting) with our brand and content. We can use these intimate personal insights to learn how to inspire the right kinds of ideas and events, to better target brand positioning and product content, and to influence more receptive audiences. This delivers new core fan connections, which drive an expanding and increasingly loyal fan base.
  15. Cone™ Lifestyle Understanding
  16. The CONE™ - BBC Radio 1
      Cone™ Innovation - BBC Radio 1, 2002-05
      • In 2002, BBC Radio 1, the UK’s no.1 youth radio brand (now globally streamed to millions), was in danger of losing its public service licence. Listener volume was in decline, with a total RAJAR audience of circa 7 million. Radio 1 had become disconnected from its core audiences.
      • We were asked to help innovate the total transformation of ideas, creativity and environment to return Radio 1 to its pre-eminent place in youth culture.
      • Central to Radio 1’s innovative revival was a new lens through which to view the Radio 1 audience. This lens helped us understand audience engagement through behaviour rather than fixed demographics.
  17. Sony Music: Audience Cone™ / Artist DNA
      Sony Music 2007-2011 - Audience Cone™ / Artist DNA
      • The key to success at Sony Music was using the Audience Cone™ and Artist DNA to help A&R Managers and Producers understand the role music plays in people’s lives, and then understand the impact of any particular genre or specific artist within that audience and cultural context.
      • We provided a unique approach to make sense of Digital Marketing and Social Intelligence as part of an artist’s musical and career development. We called it the Artist DNA: a tool which supports the insightful creative foundation for all artist releases, tours, appearances and campaigns.
      • Today the Cone™ App, our proprietary solution using the Audience Cone™ and Artist DNA approach, is used by Sony Music in 32 global territories. It places the audience back at the heart of Sony Music and puts the artists back at the heart of their audiences, attracting new fans and re-connecting with old fans to give the widest possible audience and fan-base.
  18. The Challenge – American Idol, 2014
      • Analyse the Reality TV audience spectrum so that we can better understand who American Idol fans are, and therefore gain insight into how we can halt the audience decline of 2014.
      • There is a very real and present Reality TV Cone, because there exist distinct Reality TV audience clusters: discrete groups of people who engage with Reality TV in a variety of different ways.
      • Reality TV is a well understood lens into how people live out their own lives (they might not admit this), so we can understand viewers’ lives and lifestyles and engage them through the Reality TV lens.
      • We can map this lens through our Fanatics, Enthusiasts, Casuals and Indifferent (FECI) spectrum in order to place each individual along a continuum of audience interest, affinity, loyalty and engagement.
      • We can then profile and segment these people into different groups along the FECI spectrum, and therefore identify those within these groups who have a greater propensity and appetite for American Idol:
        – Viewers with an increased or decreased awareness of the Reality TV genre
        – Viewers with a higher or lower interest in Reality TV shows / media coverage
        – Viewers with a greater or lesser knowledge of Reality TV presenters / participants
        – Viewers who invest more or less time in consuming Reality TV (live / streamed content)
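The four profiling dimensions listed on this slide (awareness, interest, knowledge and time invested) could be combined into a single engagement score before viewers are placed on the FECI spectrum. The Python sketch below is one hypothetical way to do that. The equal weights, the 0 to 1 scaling and the 20-hour cap are all assumptions made for illustration, not values from the slides.

```python
from dataclasses import dataclass

# Assumed cap used to scale weekly viewing hours into the range 0..1.
MAX_HOURS_PER_WEEK = 20.0


@dataclass
class Viewer:
    awareness: float       # 0..1, awareness of the Reality TV genre
    interest: float        # 0..1, interest in shows / media coverage
    knowledge: float       # 0..1, knowledge of presenters / participants
    hours_per_week: float  # time spent on live / streamed Reality TV


def engagement_score(viewer):
    """Equal-weight average of the four dimensions, each scaled to 0..1."""
    capped_hours = min(viewer.hours_per_week, MAX_HOURS_PER_WEEK)
    time_component = capped_hours / MAX_HOURS_PER_WEEK
    total = viewer.awareness + viewer.interest + viewer.knowledge + time_component
    return 0.25 * total


devoted = Viewer(awareness=1.0, interest=1.0, knowledge=1.0, hours_per_week=25.0)
occasional = Viewer(awareness=0.5, interest=0.5, knowledge=0.5, hours_per_week=10.0)
print(engagement_score(devoted), engagement_score(occasional))
```

A score like this gives each viewer a position on a single continuum, which is what the FECI placement needs. The choice of weights is exactly the kind of parameter a calibration step would tune.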
  19. The CONE™ - American Idol, 2014
      Cone™ Innovation – American Idol, 2014
      1. Fanatics - 10%: Know about each contestant in every show and devote time to Reality TV. Primarily live viewers.
      2. Enthusiasts - 26%: Buy very much into Reality TV. Have other passions. Love social media ‘second screening’.
      3. Casuals - 42%: A more diverse group. Reality TV is only one part of their busy lives. Will engage if it meets their needs and values. American Idol 2014 over-indexed on “Casuals” but under-indexed on Audience Total.
      4. Indifferent - 22%: “Indifferent” viewers interact with the brand when there are other brand Fans within their social network who act as “Influencers”. American Idol 2014 under-indexed on both “Indifferent” and Audience Total.
      5. Unconnected: A huge marketplace. Generally, “Unconnected” viewers only connect with the brand if there are other brand advocates within their social network who act as influencers or “Introducers” to Reality TV series.
      The Challenge – American Idol, 2014
      Analyse the Reality TV audience so that we can better understand who American Idol fans are, and therefore gain insight into how we can halt the audience decline of 2014.
      • There is a Reality TV Cone because there exist discrete groups of people who engage with Reality TV in different ways.
      • Reality TV is a well understood lens into people’s lives (they might not admit this, but we can view their lives through this Reality TV lens).
      • We can map this lens through our Fanatics, Enthusiasts, Casuals and Indifferent (FECI) continuum in order to place every individual along the spectrum of audience engagement.
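The over-indexing and under-indexing remarks can be made concrete with a small calculation. The Python sketch below assumes the comparison baseline is the generic cone of 10% / 20% / 30% / 40% from the earlier Lifestyle Understanding chart; the slides do not state which baseline was actually used. An index of 100 means the segment share matches the baseline.

```python
# Assumed baseline: the generic cone shares (percent).
BASELINE_CONE = {"Fanatics": 10, "Enthusiasts": 20, "Casuals": 30, "Indifferent": 40}

# Segment shares reported on this slide for American Idol, 2014 (percent).
AMERICAN_IDOL_2014 = {"Fanatics": 10, "Enthusiasts": 26, "Casuals": 42, "Indifferent": 22}


def segment_index(observed, baseline):
    """Index each segment's share against the baseline (100 = on par)."""
    return {
        segment: round(100 * observed[segment] / baseline[segment], 1)
        for segment in baseline
    }


index = segment_index(AMERICAN_IDOL_2014, BASELINE_CONE)
for segment, value in index.items():
    print(f"{segment}: {value}")
```

Under this assumed baseline, Casuals index at 140 and Indifferent at 55, which agrees with the slide's statement that the show over-indexed on Casuals and under-indexed on Indifferent.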
  20. Cone™ Fan Base Understanding
  21. The Cone™ Application
      • Where old-school audience analysis was retrospective and fixed, the new Cone™ data science is lean, agile, current, fluid and predictive.
      • The Cone™ App takes our proven Audience Cone™ and Artist DNA approach and puts it on-line to render a custom lens for an audience: a lens you can zoom, pan and focus to reveal more hidden detail.
      • The Cone™ App applies data science and digital analytics principles to generate innovative marketing insights, translated into a narrative of real-time audience understanding that answers six key questions:
        1. What’s happening now?
        2. Who’s making it happen?
        3. Where is it happening?
        4. Why is it happening?
        5. When is it happening?
        6. How is it happening?
  22. The Cone™ Application
      [Architecture diagram. Labels as extracted: Social Intelligence; Cloud; CRM Data; Profile Data; CRM / CEM; Big Data Analytics; Customer Management (CRM / CEM); Social Intelligence; Campaign Management; e-Business; Big Data Analytics; The Cone™ Customer Loyalty & Brand Affinity; The Cone™ Smart Apps; Audience Survey Data; Insights; Reports; TV Set-top Box]
  23. Proof-of-concept and Prototype
      The Cone™ approach is lean, agile, smart and creative:
      • We start by providing a custom Cone™ app as a proof of concept. We then work with key client stakeholders to scope a detailed brief which articulates a business problem domain that the Cone™ can help resolve.
      • Under normal circumstances we use all current and past audience research, and any other available internal data, to first establish a baseline client Cone™.
      • We then augment this by overlaying external data: Social Media Intelligence and other live streamed audience data that provide our new real-time view of the who / what / why / where / when and how of fan-base and lifestyle understanding.
      • Lastly, we apply this social intelligence as new actionable insights to inform creative marketing campaign solutions against the agreed brief.
      • Post proof-of-concept, we agree a fixed-term Cone™ app licence along with Cone™ consulting, mentoring and support, on demand, as and when required.
  24. The Cone™ – Model Design and Delivery
      Phase / Step | Description | Input | Design Process | Output | Cost (estimate) | Skill Set
      1 / 1 | Cone™ Model Data Analysis / Design | User Requirements | Data Analysis & Data Modelling | Cone™ Logical Data Model | £k | Business / Data Analyst
      1 / 2 | Cone™ Data Design – Questionnaire | User Requirements | Data Analysis & Data Modelling | Questionnaire Survey Form | £k | Business / Data Analyst
      1 / 3 | Cone™ Physical Database Design | Logical Data Model | Cone™ Database Design | Physical Cone™ Design | £k | Data Analyst / DBA
      1 / 4 | Cone™ Data Load – Questionnaire / Survey Forms | Physical Data Model, Survey Questionnaire | Cone™ Model Calibration and Tuning Runs | Initialised Cone™ Model | £k | Business / Data Analyst, DBA
      2 / 5 | Cone™ Data Load – In-house CRM and Audience Data | Physical Data Model, People CRM Data | Cone™ Model CRM Data Load | Populated Cone™ Model | £k | Business / Data Analyst, DBA
      2 / 6 | Cone™ Profiling | Cone™ Clustering Algorithms | Cone™ Model Data Profiling – Kernel k-means | Profiled Cone™ Model | £k | Data Analyst, DBA, Data Scientists
      3 / 7 | Cone™ Streaming and Segmentation | Historic Sales and CRM Data | Cone™ History Matching Runs | Cone™ Historic Trends | £k | Data Scientists
      3 / 8 | Cone™ Real-time Social Media Feeds | Global Social Intelligence | Cone™ Real-Time Analytics | Actionable Cone™ Insights | (variable with Cone™ total data volume) | Data Scientists
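Step 6 names kernel k-means as the profiling technique. The Python sketch below is a minimal, standard-library illustration of kernel k-means with a Gaussian (RBF) kernel. It is not the Cone™ implementation. The two features (for example viewing hours and social posts per week), the `gamma` value and the farthest-point seeding are all assumptions chosen to keep the example small and deterministic.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, k=2, gamma=0.5, max_iter=50):
    """Cluster points with kernel k-means; returns one label per point."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]

    # Deterministic farthest-point seeding (an illustrative choice):
    # each new seed is the point least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(K[i][s] for s in seeds)))
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster.
        within = [
            sum(K[a][b] for a in m for b in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue
                cross = sum(K[i][j] for j in m) / len(m)
                # Squared feature-space distance to the cluster mean.
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Two well-separated illustrative groups: low and high engagement.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = kernel_kmeans(points, k=2)
print(labels)
```

The kernel form never computes cluster centres explicitly. Distances are expressed through kernel values only, which is what lets the method separate groups that are not linearly separable in the original feature space.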
  25. The Cone™ – Social Intelligence
  26. The Cone™ – Digital Marketing
      Turning Social Intelligence into Actionable Marketing Insights / Sales Opportunities
      1. Education Cone™ – Training and Education Business Scenario and Use Cases
      2. Utilities Cone™ – Water, Gas and Electricity Business Scenario and Use Cases
      3. Media Cone™ – Broadband, Land-line, Mobile and Entertainment Business Scenario and Use Cases
      4. Music Cone™ – Brand / Genre / Label / Artists Business Scenario and Use Cases
      5. Political Cone™ – Party and Voter Election Business Scenario and Use Cases
      6. Fashion Cone™ – Fashion and Luxury Brands Business Scenario and Use Cases
      7. Sports Cone™ – Elite Team Sports Franchise Business Scenario and Use Cases
      8. Patient Cone™ – Digital Healthcare / Medical Business Scenario and Use Cases
  27. 27. The Cone™‫‏‬- Digital Marketing
  28. 28. Telematics The Internet of Things (IoT) – Smart Devices, Smart Apps, Wearable Technology, Vehicle Telemetry, Smart Homes and Building Automation SMACT/4D Digital Technology Stack
  29. 29. The Cone™ – Model Design and Delivery (table repeated from slide 24)
  30. 30. Social Intelligence – Brand Loyalty and Affinity CONE SEGMENTS – Brand Loyalty and Affinity Social Intelligence drives Brand Loyalty and Affinity, Lifestyle Understanding - Fan-base Profiling, Streaming and Segmentation and marketing Campaigns – expressed in the creation and maintenance of a detailed History and Balanced Scorecard for every individual in the Cone, allowing summation by Stream / Segment: - 1. Inactive – need to draw their attention towards the Brand 2. Indifferent – need to educate them about core Brand Values 3. Disconnected – need to re-engage with the Brand 4. Casuals – exhibit Brand awareness and interest 5. Followers – follow the Brand, engage with social media and consume brand communications 6. Enthusiasts – engaged with the Brand, participate in Brand / Product / Media events and merchandising 7. Supporters – show strong need, desire and propensity to support Brand / Product / Media consumption 8. Fanatics – demonstrate total Commitment / Dedication / Loyalty for all aspects of the Brand / Product / Media PROPENSITY – Balanced Scorecard • Balanced Scorecard – a summary of all the data-points for an Individual / Stream / Segment • Propensity Score – in the statistical analysis of observational data, Propensity Score Matching (PSM) is a statistical matching technique that attempts to estimate the effect of a Campaign / Offer / Promotion or other intervention by accounting for the factors that predict whether an individual receives that Campaign / Offer / Promotion. • Propensity Model – the Bayesian probability of the outcome of an event in an Individual / Stream / Segment • Predictive Analytics - an area of data mining that deals with extracting information from data and using it to predict trends and behaviour patterns. Often the unknown event of interest is in the future; however, Predictive Analytics can be applied to any type of event with an unknown outcome - in the past, present or future.
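The "summation by Stream / Segment" of individual Balanced Scorecards can be sketched as a simple roll-up. This is an illustrative sketch only: the `Scorecard` fields (`interactions`, `spend`) and the sample records are hypothetical, since the slides do not say which data points a scorecard holds.

```python
from collections import defaultdict
from dataclasses import dataclass

# Segment names in the order listed on the slide (least to most engaged).
CONE_SEGMENTS = [
    "Inactive", "Indifferent", "Disconnected", "Casuals",
    "Followers", "Enthusiasts", "Supporters", "Fanatics",
]

@dataclass
class Scorecard:
    """Hypothetical per-individual scorecard; field names are assumptions."""
    person_id: str
    segment: str
    interactions: int  # e.g. social media engagements in the period
    spend: float       # e.g. purchases in the period

def summarise_by_segment(cards):
    """Sum the data points of the individual scorecards within each segment."""
    totals = defaultdict(lambda: {"people": 0, "interactions": 0, "spend": 0.0})
    for card in cards:
        if card.segment not in CONE_SEGMENTS:
            raise ValueError(f"unknown segment: {card.segment}")
        row = totals[card.segment]
        row["people"] += 1
        row["interactions"] += card.interactions
        row["spend"] += card.spend
    return dict(totals)

cards = [
    Scorecard("p1", "Fanatics", 40, 120.0),
    Scorecard("p2", "Casuals", 3, 10.0),
    Scorecard("p3", "Fanatics", 25, 80.0),
]
summary = summarise_by_segment(cards)
```

The same roll-up could be keyed by Stream instead of Segment by swapping the grouping field.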
  31. 31. Social Intelligence – Streaming and Segmentation Hybrid Cone – 3 Dimensions: Social Interaction, Brand Affinity, Geo-demographic Profile (Experian Mosaic – 15 Groups (Streams), 66 Types (Segments)) The Cone™ – Social Interaction The Cone™ – Streaming & Segmentation
  32. 32. Social Intelligence – Social Interaction Social Interaction Cone Rules 1. Inactive – not engaged – low evidence / low affinity / low interest in Social Media 2. Lone Wolf – sparse / thin social network - may share negative information (Trolling) 3. Home Boy – Social Network clustered around Home Location Postcodes (Gang Culture) 4. Eternal Student – Social Network clustered around School / College / University Alumni 5. Workplace – Social Network clustered around Work and Colleagues (e.g. City Brokers, Traders) 6. Friends and Family – Social Network clustered around physical social contacts - Friends and Family 7. Enthusiast – Social Network clustered around shared, common interests – Sport, Music and Fashion etc. 8. Promiscuous – Open Networker – virtual Social Network across all categories – will connect with anybody Number of Segments • With anonymous data (e.g. surveys and polls), the number of initial Segments is 4 (Matt Hart). With people data (named individuals) we can discover much richer internal and external data from multiple sources (Social Media / User Content / Experian) - and therefore segment the population with greater granularity Individuals Qualifying for Multiple Segments • When individuals qualify for multiple segments, we can either add these non-standard individuals to the Segment with which they have the greatest affinity, or move them into an Outlying / Outcast / Miscellaneous Segment for further statistical processing or for processing through manual intervention
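The two options for individuals who qualify for multiple segments (assign to the segment of greatest affinity, or route to an outlying segment) can be sketched as one small function. The affinity scores and the `min_margin` parameter are hypothetical; the slides do not define how affinity is measured or when a case counts as too close to call.

```python
OUTLIER_SEGMENT = "Outlying"

def assign_segment(affinities, min_margin=0.0):
    """Choose a single segment for an individual who may qualify for several.

    affinities: dict of segment name -> affinity score (higher means closer).
    min_margin: hypothetical tuning parameter. If the best score does not beat
    the runner-up by more than this margin, the individual is sent to the
    outlying segment for further statistical or manual processing.
    """
    if not affinities:
        return OUTLIER_SEGMENT
    ranked = sorted(affinities.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] <= min_margin:
        return OUTLIER_SEGMENT
    return best_name

clear_case = assign_segment({"Workplace": 0.7, "Friends and Family": 0.4})
close_case = assign_segment({"Workplace": 0.5, "Enthusiast": 0.45}, min_margin=0.1)
```

With the default `min_margin=0.0`, only exact ties go to the outlying segment; raising it moves more borderline individuals into manual review.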
  33. 33. Social Intelligence – Actionable Insights Hybrid Cone – 3 Dimensions: Brand Affinity, Social Interaction, Geo-demographic Profile (Experian Mosaic – 15 Groups (Segments), 66 Types (Streams)) Fanatics - 10% Enthusiasts - 20% Casuals - 30% Indifferent - 40% The Cone™ – Brand Loyalty & Affinity The Cone™ – Actionable Insights
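The tier shares on this slide (Fanatics 10%, Enthusiasts 20%, Casuals 30%, Indifferent 40%) suggest a rank-and-cut assignment. The sketch below assumes each individual already has a single numeric brand-affinity score; how that score is produced is not specified in the slides, and the sample scores are invented.

```python
# Tier shares as shown on the slide, expressed as whole percentages.
TIER_SHARES = [("Fanatics", 10), ("Enthusiasts", 20), ("Casuals", 30), ("Indifferent", 40)]

def assign_tiers(scores):
    """Rank individuals by affinity score (highest first) and cut into tiers."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    result = {}
    start = 0
    cumulative_pct = 0
    for name, pct in TIER_SHARES:
        cumulative_pct += pct
        end = cumulative_pct * n // 100  # integer maths avoids float rounding
        for person in ranked[start:end]:
            result[person] = name
        start = end
    return result

# Hypothetical affinity scores for ten individuals.
scores = {f"p{i}": float(i) for i in range(10)}
tiers = assign_tiers(scores)
```

Because the final cumulative share is 100%, every individual receives a tier; with small populations the upper tiers may be empty.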
  34. 34. Social Interaction How consumers use social media (e.g., Facebook, Twitter) to address and/or engage with companies around social and environmental issues.
  35. 35. The chart above illustrates the richness and diversity of social media.....
  36. 36. The pattern of Social Relationships..... Social Media is the fastest growing category of user-provided global content and will eventually grow to 20% of all internet content. Gartner defines social media content as unstructured data created, edited and published by users on external platforms including Facebook, MySpace, LinkedIn, Twitter, Xing, YouTube and a myriad of other social networking platforms - in addition to internal Corporate Wikis, special interest group blogs, communications and collaboration platforms..... Social Mapping is the method used to describe the social linkage between individuals in order to define Social Networks and to understand the nature of intimate relationships between individuals.
  37. 37. Social Conversations SCRM in the Cloud
  38. 38. Traditional CRM was very much based around data and information that brands could collect on their customers, all of which would go into a CRM system that then allowed the company to better target various customers. Traditional CRM comprised sales, marketing and service / support-based functions whose purpose was to move the customer through a pipeline with the goal of keeping the customer coming back to buy more and more stuff...... TRADITIONAL CRM – Customer Management Pipeline
  39. 39. Evolution of CRM to SCRM - The challenge for organizations now is adapting and evolving to meet the needs and demands of these new social customers - many organizations still do not understand the CRM value of social media..... SOCIAL CRM – Social Media Conversations
  40. 40. In Social CRM - the customer is actually the focal point of how an organization operates. Instead of marketing products or pushing messages to customers, brands now talk to and collaborate with their customers to solve business problems, empower customers to shape their own Customer Experience and Journeys and develop strong customer relationships - which will, over time, turn participants into brand evangelists and positive customer advocates..... SOCIAL CRM – Social CRM Processes SOCIAL CRM – Social Media Conversations
  41. 41. SOCIAL CRM – a Business Framework and Operating Model Posted on April 20, 2010 by Laurance Buchanan - Capgemini
  42. 42. Social Graphs and Market Sentiment • Using “BIG DATA” to drive Market Sentiment • Unprompted online conversations, statements and news create an online reflection of real-life events and issues – influencing the thoughts of individual consumers – managing Reputational Risk and so shaping Market Sentiment. The Social Media data, Blogs and News feeds that form this digital mirror of the world provide a gold mine of actionable information.....
  43. 43. • Influencer Programmes have a long history in industries such as software, computers and electronics - but today they are successfully deployed across all types of industries including automotive, smart phones, fashion, health and nutrition, wine, sports, music, technology, travel, tourism and leisure – and financial services..... • In a hyper-connected world, market-makers and influencers increasingly provide the gateway to decision makers who drive consumer behaviour. • Unprompted online conversations, statements and news create an online reflection of real-life events and issues – influencing the thoughts of individual consumers and so shaping Market Sentiment. • The Social Media data and News feeds that form this digital mirror of the world provide a gold mine of information. However, unlocking the data is not straightforward, as it requires a complex and unique set of technologies, skills and methods..... INFLUENCER PROGRAMMES – Social Media Conversations
  44. 44. The Cone™ - Digital Marketing
  45. 45. SalesForce.com – a Cloud Platform Social CRM Business Solution The Cone™ - Digital Marketing The Cone™ - Lifestyle Understanding Customer Management (CRM / CEM) Social Intelligence Campaign Management e-Business Big Data Analytics The Cone™ Customer Loyalty & Brand Affinity The Cone™ Smart Apps Alarms & Alerts Reporting
  46. 46. Digital Marketing – Solution Options Vendor Social Intelligence Mobile Big Data Analytics Cloud CRM / CEM Amazon + Salesforce Anomaly 42 Apple iOS + Android AWS Elastic MapReduce (EMR) AWS S3 “R” Revolution Kernel k-means AWS EC2 SalesForce + 3rd Party Apps Store Google Google Analytics Google Nexus Google Hadoop Google Analytics Google Cloud Google Office + Apps IBM IBM InfoSphere BigInsights IBM Cloud Microsoft Nokia, Windows 8 for Mobile Microsoft SQL/Server + Hadoop Microsoft Analytics DOT.NET, C# Windows Azure HDInsight Microsoft Office 365 + Dynamics Oracle Oracle DBMS + Hadoop OBIEE Oracle Cloud Oracle CRM and EBS SAP SUP + Fiori SAP HANA + Hadoop Business Objects SAP HANA Cloud SAP CRM + Hybris
  47. 47. The Cone™ - Digital Marketing The Cone™ Lifestyle Understanding The Cone™ – Brand Loyalty and Affinity The Cloud – SalesForce.com Amazon Web Services (AWS) Social Intelligence Data Science / Big Data Analytics Customer Experience & Journey - CRM / CEM Alarms / Alerts Reporting e-Business Smart Apps
  48. 48. The Cone™ – Digital Marketing Connecting the Unconnected….. • FMCG, Media, Entertainment and other enterprises which supply products and services indirectly to consumers – via Channel Partners such as Distributors, Dealers, Wholesalers and Retailers – are not directly connected to their customer base. In order to drive brand strategy and customer loyalty / affinity – they have to reach out to, contact and connect with, on the most intimate terms - the widest possible range of end-user consumers: - – Music (e.g. BBC and Sony Music) – Broadcasting (e.g. Radio 1 / American Idol) – Digital Media Content (e.g. Sony Films / Netflix) – Sports Franchises (e.g. Manchester City / New York City) – Fast Fashion Retailers (e.g. ASOS, Next, New Look, Primark, Top Shop) – Luxury Brands / Aggregators (e.g. Armani, Burberry, Versace / LVMH, PPR, Richemont) – Multi-channel Retailers – Loyalty, Campaigns, Offers and Promotions – Financial Services Companies – Brand Protection and Reputation Management – Travel, Leisure and Entertainment Organisations - Destination Resorts and Events – MVNO / CSPs - OTT Business Partner Analytics (Sky Go, Netflix via Firebrand / Apigee) – Telco, Media and Communications - Churn Management / Conquest / Up-sell / Cross-sell Campaigns – Digital Healthcare – Private / Public Healthcare Service Provisioning: - Geo-demographic Clustering and Propensity Modelling (Patient Monitoring, Wellbeing, Clinical Trials, Morbidity and Actuarial Outcomes)
  49. 49. The Cone™ - Eight Primitives
      Primitive | Problem / Opportunity | Business Domain | System Function | Software Product
      Who ? | Who are our Customers ? | Party - People / Organisations | CRM / CEM | SalesForce.com - Customer Management
      What ? | What are they saying about us ? | Social Media / Communications | Social Intelligence | Google Analytics, Anomaly 42
      Why ? | Why - their Interest / Behaviour / Motivation / Aspirations / Desires ? | Brand Identity / Loyalty / Affinity / Offers / Promos | Marketing, Campaign Management | Predictive Analytics / Propensity Modelling
      Where ? | Where do they Live / Work / Shop / Relax ? | Places - Location | GIS / GPS | Geospatial Analytics
      When ? | When do they contact / buy products from us ? | Time / Date | Contact Event / Sales Transaction | Multi-channel Retail / Mobile Platforms
      How ? | How do they contact and connect with us – Media / Telecoms Channels ? | Communications Channel | Mobile, Internet, In-store | Multi-channel Retail / Mobile Platforms
      Which ? | Which Brands / Ranges / Categories / Products ? | Retail Merchandising | Product Catalogue | IBM Product Centre / Stibo / Kalido
      Via ? | Via Business Partners / 3rd Party Channels ? | Sales Channel | Retail Channel / Outlet | Amazon, eBay, Alibaba
  50. 50. The Cone™‫‏‬– EIGHT PRIMITIVES Event Dimension Party Dimension Geographic Dimension Motivation Dimension Time Dimension Media Dimension Cone™‫‏‬ MEDIA FACT WHO ? WHAT ? WHERE ? HOW ?WHEN ?WHY ? • Indifferent • Casuals • Enthusiasts • Fanatics • Radio Show • Television Show • Internet Advert • Campaign • Offer • Promotion • Pre-order • Purchase • Download • Playlist • Booking • Attendance • Advert / Publicity • Posting / Blog • Facebook • LinkedIn • Myspace • Twitter • YouTube • Xing • Region / Country • State / County • City / Town • Street / Building • Postcode • Person • Organisation Product Dimension WHICH ? • Category • Label / Artist • Album / Track • Tour / City / Arena • Merchandise Channel Dimension VIA ? • Channel / Partner • In-store • Internet Service • Mobile Smart App (Spotify etc.) Advert / Publicity Type Sales Channel Posting / Blog Source / Type Subject Location Media Event • Awareness • Interest • Need • DesireMotivation Customer Time / Date Version 2 – Media Co’s
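The "Media Fact" surrounded by eight dimensions reads like a star schema: one fact record keyed by WHO, WHAT, WHY, WHERE, WHEN, HOW, WHICH and VIA. The sketch below is one possible reading of the flattened diagram. The pairing of each question with a dimension, the field types and the sample values are assumptions, not taken from a documented Cone™ data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MediaFact:
    """One media event described by the eight primitives (illustrative only)."""
    who: str        # Party dimension: person or organisation
    what: str       # Event dimension: e.g. "Purchase", "Download"
    why: str        # Motivation dimension: Awareness / Interest / Need / Desire
    where: str      # Geographic dimension: e.g. city or postcode
    when: datetime  # Time dimension
    how: str        # Media dimension: e.g. "Twitter", "YouTube"
    which: str      # Product dimension: e.g. album or track
    via: str        # Channel dimension: e.g. "In-store", "Internet Service"

def count_by(facts, dimension):
    """Count facts along one dimension, as a simple star-schema style roll-up."""
    return Counter(getattr(fact, dimension) for fact in facts)

facts = [
    MediaFact("p1", "Purchase", "Desire", "London", datetime(2014, 5, 1), "Twitter", "Album A", "In-store"),
    MediaFact("p2", "Download", "Interest", "Leeds", datetime(2014, 5, 2), "YouTube", "Track B", "Internet Service"),
    MediaFact("p1", "Purchase", "Need", "London", datetime(2014, 5, 3), "Facebook", "Track B", "Internet Service"),
]
by_event = count_by(facts, "what")
by_channel = count_by(facts, "via")
```

In a relational design each field would be a foreign key into its own dimension table; plain strings are used here only to keep the sketch self-contained.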
  51. 51. Social Intelligence – Profiling and Analysis Fanatics - 10% Enthusiasts - 20% Casuals - 30% Indifferent - 40% The Cone™ Brand Loyalty & Affinity The Cone™ – Profiling & Analysis
  52. 52. The Cone™‫‏‬– Model Development Initialise Cone™‫‏‬ Model Cone™‫‏‬ Model Design Data Load Cone™‫‏‬ Model Calibration and Tuning Cone™‫‏‬ History Matching Cone™‫‏‬ Real-Time Analytics Survey Script Data Data Model Customer Data Profiling Data Historic Data Real-Time Data Cone™ Model Database Design Populated Cone™ Model Profiled Cone™ Model Historic Trends Actionable Insights Step 1 Step 3 Step 4 Step 5 Step 6Step 2
  53. 53. The Cone™‫‏‬– Model Delivery Phase / Step Description Input Design Process Output Cost (estimate) Skill Set 1 1 Cone™‫‏‬Model‫‏‬Data‫‏‬ Analysis / Design User Requirements Data Analysis & Data Modelling Cone™ Logical Data Model £k Business / Data Analyst 2 Cone™‫‏‬Data‫‏‬Design‫‏‬ – Questionnaire User Requirements Data Analysis & Data Modelling Questionnaire Survey Form £k Business / Data Analyst 3 Cone™‫‏‬Physical‫‏‬ Database Design Logical Data Model Cone™ Database Design Physical Cone™ Design £k Data Analyst / DBA 4 Cone™‫‏‬Data‫‏‬Load‫–‏‬ Questionnaire / Survey Forms Physical Data Model, Survey Questionnaire Cone™ Model Calibration and Tuning Runs Initialised Cone™ Model £k Business / Data Analyst, DBA 2 5 Cone™‫‏‬Data‫‏‬Load‫–‏‬ In-house CRM and Audience Data Physical Data Model, People CRM Data Cone™ Model CRM Data Load Populated Cone™ Model £k Business / Data Analyst, DBA 6 Cone™‫‏‬Profiling Cone™ Clustering Algorithms Cone™ Model Data Profiling – Kernel k-means Profiled Cone™ Model £k Data Analyst, DBA, Data Scientists 3 7 Cone™‫‏‬Streaming‫‏‬ and Segmentation Historic Sales and CRM Data Cone™ History Matching Runs Cone™ Historic Trends £k Data Scientists 8 Cone™‫‏‬Real-time Social Media Feeds Global Social Intelligence Cone™ Real- Time Analytics Actionable Cone™ Insights (variable with Cone™ total data volume) Data Scientists
  54. 54. The Cone™‫‏‬– Model Implementation Initialise Cone™‫‏‬ Model Cone™‫‏‬ Model Design Data Load Cone™‫‏‬ Model Calibration and Tuning Cone™‫‏‬ History Matching Cone™‫‏‬ Real-Time Analytics Data Model Database Schema Business Analyst DBA Survey Data Cone™‫‏‬Model Data Architect DBA CRM Data Populated Cone™‫‏‬Model Data Architect DBA Stream and Segment Data Profiled Cone™‫‏‬Model Data Architect DBA Historic Data Historic Trends Data Architect Data Scientists Real-Time Data Actionable Insights Data Architect Data Scientists
  55. 55. The Cone™ – Digital Marketing Data Streams into Revenue Streams….. • Digital Marketing is the communication, advertising and marketing of brands, products and services via multiple digital channels and channel partners in order to reach out to, contact and connect, on the most intimate terms, with the widest possible range of consumers. Through the exploitation of Digital Media we can initiate and maintain engaging Social Conversations. • Digital Marketing extends key Brand Messages across every digital platform, from simple internet marketing to mobile, broadcast and social media channels – yielding Social Intelligence data in order to discover actionable Marketing Insights – which in turn convert digital Data Streams into Revenue Streams • The key objective of Digital Marketing is to reach out to, contact and connect directly with carefully selected consumers – so that we create strong, lasting and durable relationships in order to promote key brand, category and product messages to targeted consumers and thus develop a tangible, valuable, very real and distinct brand / category / product interest, following, affinity and loyalty
  56. 56. The Cone™ Converting Data Streams into Revenue Streams Salesforce Anomaly 42 Cone Unica End User BIG DATA ANALYTICS SOCIAL MEDIA E-Commerce Platform FULFILMENT Sales Orders Salesforce CRM Geo-demographics • Streaming • Segmentation • Household Data SOCIAL CRM Households Insights InsightsInsights Anomaly 42 Unica Offers and Promotions People and Places Campaigns Social Intelligence • User Content and Blogs • Social Groups and NetworksSOCIAL INTELLIGENCE Actionable Marketing Insights EXPERIAN The Cone™‫‏‬ Big Wheel keeps on turning – Perfect Store
  57. 57. SalesForce.com – a Cloud Platform Social CRM Business Solution The Cone™ - Digital Marketing The Cone™ - Lifestyle Understanding Customer Management (CRM / CEM) Social Intelligence Campaign Management e-Business Big Data Analytics The Cone™ Customer Loyalty & Brand Affinity The Cone™ Smart Apps Alarms & Alerts Reporting
  58. 58. “DATA SCIENCE” – my own special area of Business expertise Targeting – Map / Reduce Consume – End-User Data Data Acquisition – High-Volume Data Flows – Mobile Enterprise Platforms (MEAP’s) Apache Hadoop Framework HDFS, MapReduce, MATLAB “R” Autonomy, Vertica Smart Devices Smart Apps Smart Grid Clinical Trial, Morbidity and Actuarial Outcomes Market Sentiment and Price Curve Forecasting Horizon Scanning, Tracking and Monitoring Weak Signal, Wild Card and Black Swan Event Forecasting – Data Delivery and Consumption News Feeds and Digital Media Global Internet Content Social Mapping Social Media Social CRM – Data Discovery and Collection – Analytics Engines - Hadoop – Data Presentation and Display Excel Web Mobile – Data Management Processes Data Audit Data Profile Data Quality Reporting Data Quality Improvement Data Extract, Transform, Load – Performance Acceleration GPU’s – massive parallelism SSD’s – in-memory processing DBMS – ultra-fast data replication – Data Management Tools DataFlux Embarcadero Informatica Talend – Info. Management Tools Business Objects Cognos Hyperion Microstrategy Biolap Jedox Sagent Polaris Teradata SAP HANA Netezza (now IBM) Greenplum (now EMC2) Extreme Data xdg Zybert Gridbox – Data Warehouse Appliances Ab Initio Ascential Genio Orchestra SOCIAL CRM – The Emerging Big Data Stack
  59. 59. The Cone™ - Brand Loyalty / Affinity 1. Brand Affinity 2. Social Interaction 3. Geo-demographic Profile – Experian Mosaic - 15 Groups (Segments), 66 Types (Streams) Hybrid Cone™ – 3 Dimensions Fanatics - 10% Enthusiasts - 20% Casuals - 30% Indifferent - 40% The Cone™ Brand Loyalty & Affinity
  60. 60. The Cone™ - CAMPAIGN
  61. 61. Salesforce Anomaly 42 Cone Unica End User BIG DATA ANALYTICS Cone™‫‏‬ Brand Affinity Campaign CRM Insights InsightsInsights SALES PEOPLE DEMOGRAPHICS Household Data SOCIAL INTELLIGENCE User Content, Social Groups and Networks Offers and Promotions People & Places PROFILING Streaming & Segmentation The‫‏‬Cone™‫‏‬– CYCLEThe Cone™‫‏‬– CONSUMER CYCLE e-Business Smart Apps Big Wheel keeps on turning – Perfect Store
  62. 62. Hadoop Clustering and Managing Data..... Managing Data Transfers in Networked Computer Clusters using Orchestra To illustrate I/O Bottlenecks, we studied Data Transfer impact in two clustered computing systems: Hadoop, using a trace from a 3000-node cluster at Facebook; and Spark, a MapReduce-like framework with iterative machine learning and graph algorithms. Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, Ion Stoica University of California, Berkeley {mosharaf, matei, jtma, jordan, istoica}@cs.berkeley.edu
  63. 63. Hadoop Framework • The workhorse relational database has been the tool of choice for businesses for well over 20 years now. Challengers have come and gone but the trusty RDBMS is the foundation of almost all enterprise systems today. This includes almost all transactional and data warehousing systems. The RDBMS has earned its place as a proven model that, despite some quirks, is fundamental to the very integrity and operational success of IT systems around the world. • The relational database is finally showing some signs of age as data volumes and network speeds grow faster than hardware improvements tracking Moore's Law can keep pace with. The Web in particular is driving innovation in new ways of processing information as the data footprints of Internet-scale applications become prohibitive using traditional SQL database engines. • When it comes to database processing today, change is being driven by (at least) four factors: – Speed. The seek times of physical storage are not keeping pace with improvements in network speeds. – Scale. The difficulty of scaling the RDBMS out efficiently (i.e. clustering beyond a handful of servers is notoriously hard). – Integration. Today's data processing tasks increasingly have to access and combine data from many different non-relational sources, often over a network. – Volume. Data volumes have grown from tens of gigabytes in the 1990s to hundreds of terabytes and often petabytes in recent years.
  64. 64. RDBMS and Hadoop: Apples and Oranges? • Below is Figure 1 - a comparison of the overall differences between RDBMS and MapReduce-based systems such as Hadoop • From this it's clear that the MapReduce model cannot replace the traditional enterprise RDBMS. However, it can be a key enabler of a number of interesting scenarios that can considerably increase flexibility, improve turn-around times, and make it possible to tackle problems that couldn't be tackled before. • With RDBMS platforms, SQL-based processing of data sets tends to fall away and not scale linearly after a specific volume ceiling, usually just a handful of nodes in a cluster. With MapReduce, you can consistently obtain performance gains by increasing the size of the cluster. In other words, double the size of a Hadoop cluster and a job will run twice as fast; quadruple it and the job will run four times faster. It's the same linear relationship, irrespective of data volume and throughput.
  65. 65. Comparing Data in DWH, Appliances, Hadoop Clusters and Analytics Engines
      Attribute | RDBMS DWH | DWH Appliance | Hadoop Cluster | Analytics Appliance
      Data size | Gigabytes | Terabytes | Petabytes | Petabytes
      Access | Interactive and batch | Interactive and batch | Batch | Interactive
      Structure | Fixed schema | Fixed schema | Flexible schema | Flexible schema
      Language | SQL | SQL | Procedural Languages (Java, C++, Ruby, “R” etc) | Procedural Languages (Java, C++, Ruby, “R” etc)
      Data Integrity | High | High | Low | Very High
      Architecture | Shared memory - SMP | Shared nothing - MPP | Hadoop DFS | In-memory Processing – GPGPUs / SSDs
      Virtualisation | Partitions / Regions | MPP / Nodal | MPP / Clustered | MPP / Clustered
      Scaling | Non-linear | Nodal / Linear | Clustered / Linear | Clustered / Linear
      Updates | Read and write | Write once, read many | Write once, read many | Write once, read many
      Selects | Row-based | Set-based | Column-based | Array-based
      Latency | Low – Real-time | Low – Near Real-time | High – Historic Reporting | Very Low – Real-time Analytics
      Figure 1: Comparing RDBMS to MapReduce
  66. 66. Hadoop Framework • These datasets would previously have been very challenging and expensive to take on with a traditional RDBMS using standard bulk load and ETL approaches. Never mind trying to efficiently combine multiple data sources simultaneously or dealing with volumes of data that simply can't reside on any single machine (or often even dozens). Hadoop deals with this by using a distributed file system (HDFS) that's designed to deal coherently with datasets that can only reside across distributed server farms. HDFS is also fault resilient and so doesn't impose the overhead of RAID drives and mirroring on individual nodes in a Hadoop compute cluster, allowing the use of truly low cost commodity hardware. • So what does this specifically mean to enterprise users that would like to improve their data processing capabilities? Well, first there are some catches to be aware of. Despite enormous strengths in distributed data processing and analysis, MapReduce is not good in some key areas that the RDBMS is extremely strong in (and vice versa). The MapReduce approach tends to have high latency (i.e. not suitable for real-time transactions) compared to relational databases and is strongest at processing large volumes of write-once data where most of the dataset needs to be processed at one time. The RDBMS excels at point queries and updates, while MapReduce is best when data is written once and read many times. • The story is the same with structured data, where the RDBMS and the rules of database normalization identified precise laws for preserving the integrity of structured data, laws which have stood the test of time. MapReduce is designed for a less structured, more federated world where schemas may be used but data formats can be much looser and freeform.
  67. 67. The Emerging “Big Data” Stack Targeting – Map / Reduce Consume – End-User Data Data Acquisition – High-Volume Data Flows – Mobile Enterprise Platforms (MEAP’s) Apache Hadoop Framework HDFS, MapReduce, MATLAB “R” Autonomy, Vertica Smart Devices Smart Apps Smart Grid Clinical Trial, Morbidity and Actuarial Outcomes Market Sentiment and Price Curve Forecasting Horizon Scanning, Tracking and Monitoring Weak Signal, Wild Card and Black Swan Event Forecasting – Data Delivery and Consumption News Feeds and Digital Media Global Internet Content Social Mapping Social Media Social CRM – Data Discovery and Collection – Analytics Engines - Hadoop – Data Presentation and Display Excel Web Mobile – Data Management Processes Data Audit Data Profile Data Quality Reporting Data Quality Improvement Data Extract, Transform, Load – Performance Acceleration GPU’s – massive parallelism SSD’s – in-memory processing DBMS – ultra-fast database replication – Data Management Tools DataFlux Embarcadero Informatica Talend – Info. Management Tools Business Objects Cognos Hyperion Microstrategy Biolap Jedox Sagent Polaris Teradata SAP HANA Netezza (now IBM) Greenplum (now EMC2) Extreme Data xdg Zybert Gridbox – Data Warehouse Appliances Ab Initio Ascential Genio Orchestra
  68. 68. Hadoop Framework • Each of these factors is presently driving interest in alternatives that are significantly better at dealing with these requirements. I'll be clear here: The relational database has proven to be incredibly versatile and is the right tool for the majority of business needs today. However, the edge cases for many large-scale business applications are moving out into areas where the RDBMS is often not the strongest option. One of the most discussed new alternatives at the moment is Hadoop, a popular open source implementation of MapReduce. MapReduce is a simple yet very powerful method for processing and analyzing extremely large data sets, even up to the multi-petabyte level. At its most basic, MapReduce is a process for combining data from multiple inputs (creating the "map"), and then reducing it using a supplied function that will distill and extract the desired results. It was originally invented by engineers at Google to deal with the building of production search indexes. The MapReduce technique has since spilled over into other disciplines that process vast quantities of information including science, industry, and systems management. For its part, Hadoop has become the leading implementation of MapReduce. • While there are many non-relational database approaches out there today (see my emerging IT and business topics post for a list), nothing currently matches Hadoop for the amount of attention it's receiving or the concrete results that are being reported in recent case studies. A quick look at the list of organizations that have applications powered by Hadoop includes Yahoo! with over 25,000 nodes (including a single, massive 4,000 node cluster), Quantcast, which says it has over 3,000 cores running Hadoop and currently processes over 1PB of data per day, and Adknowledge, which uses Hadoop to process over 500 million clickstream events daily using up to 200 nodes
  69. 69. HP HAVEn Big Data Platform
  70. 70. Informatica / Hortonworks Vibe
  71. 71. Telco 2.0 “Big Data” Analytics Architecture
  72. 72. Case Study – Huawei SmartCare CEM Customers Campaign Mart Analytics & Customer Loyalty Loyalty Mart CRM Data Customer DWH Customer Care “BIG‫‏‬DATA” Merchandising & Logistics Data Retail Data Warehouse Retail Multi-channel Sales Analysis Mobile Platforms EPOS Data Call Centre Data Internet Data e-Commerce Systems Store Systems Merchandising Warehousing & Logistics Inventory & Provisioning Hadoop Cluster SAP HANA ERP Systems Finance Managers Financial Data Warehouse Head OfficeFinancial Analysis Reports ERP Data OSS – Network Management Network Provisioning & Fault Management OperationsNetwork Data Network and Fault Reports Operations Managers Inventory, Provisioning & Replenishment BSS – Rating, Mediation and Billing Mediation Rating and Billing Systems Business Managers Supplier Data Product Data Customer Data Inventory & Provisioning Reports Planning & Forecasting Systems CDR Data Call Data Warehouse Billing Data Autonomy Vertica Operational “BIG‫‏‬DATA” Multi-channel Retail MSS – Head Office – Finance, Planning &Strategy Social Media - External Data Customer Care Systems CRM & Digital Marketing Systems Customers CEM SAP HANA Catalogue Hadoop ClusterPentaho, MetLab, “R” Cloudera Apache Hadoop Framework
  73. 73. Big Data – Products The MapReduce technique has spilled over into many other disciplines that process vast quantities of information including science, industry, and systems management. The Apache Hadoop Library has become the most popular implementation of MapReduce – with framework implementations from Cloudera, Hortonworks and MAPR
  74. 74. Split-Map-Shuffle-Reduce Process • Stages: Split, Map, Shuffle, Reduce • Diagram labels: Raw Data, Data Provisioning, Key / Value Pairs, Actionable Insights, Big Data Consumers
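The four stages named on this slide can be illustrated with a small single-process sketch. This is not Hadoop code: it is plain Python, the word-count task and every function name below are assumptions chosen for illustration, and on a real cluster the map step would run in parallel across nodes.

```python
from collections import defaultdict
from functools import reduce


def split(records, n_chunks):
    """Split: partition the raw records into roughly equal chunks."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map: emit one (key, value) pair per word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all values that share a key, across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


def word_count(records, n_chunks=2):
    chunks = split(records, n_chunks)
    mapped = [map_chunk(chunk) for chunk in chunks]  # parallel on a real cluster
    return reduce_groups(shuffle(mapped))


# Hypothetical raw data, used only to exercise the pipeline.
raw_data = ["big data big insight", "data drives insight", "big clusters"]
counts = word_count(raw_data)
```

Here `counts` maps each word to its total across all chunks. The shuffle step is what allows each reducer to work on one key without seeing the rest of the data.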
  75. 75. Apache Hadoop Component Stack
  • HDFS: Hadoop Distributed File System (HDFS)
  • MapReduce: Scalable Data Applications Framework
  • Pig: Procedural Language – abstracts low-level MapReduce operators
  • Zookeeper: High-reliability distributed cluster co-ordination
  • Hive: Structured Data Access Management
  • HBase: Hadoop Database Management System
  • Oozie: Job Management and Data Flow Co-ordination
  • Mahout: Scalable Knowledge-base Framework
  76. 76. Data Management Component Stack
  • Informatica: Informatica Big Data Edition / Vibe Data Stream
  • Drill: Data Analysis Framework
  • Millwheel: Data Analytics on-the-fly + Extract – Transform – Load Framework
  • Flume: Extract – Transform – Load
  • Sqoop: Extract – Transform – Load
  • Scribe: Extract – Transform – Load
  • Talend: Extract – Transform – Load
  • Pentaho: Extract – Transform – Load Framework + Data Reporting on-the-fly
  77. 77. Big Data Storage Platforms
  • Autonomy: HP Unstructured Data DBMS
  • Vertica: HP Columnar DBMS
  • MongoDB: High-availability DBMS
  • CouchDB: Couchbase Database Server for Big Data with NoSQL / Hadoop Integration
  • Pivotal: Pivotal Big Data Suite – GreenPlum, GemFire, SQLFire, HAWQ
  • Cassandra: Cassandra Distributed Database for Big Data with NoSQL and Hadoop Integration
  • NoSQL: NoSQL Database for Oracle, SQL/Server, Couchbase etc.
  • Riak: Basho Technologies Riak Big Data DBMS with NoSQL / Hadoop Integration
  78. 78. Big Data Analytics Engines and Appliances
  • Alpine: Alpine Data Studio - Advanced Big Data Analytics
  • Karmasphere: Karmasphere Studio and Analyst – Hadoop Customer Analytics
  • Kognitio: Kognitio In-memory Big Data Analytics MPP Platform
  • Skytree: Skytree Server Artificial Intelligence / Machine Learning Platform
  • Redis: Redis is an open source key-value database for AWS, Pivotal etc.
  • Teradata: Teradata Appliance for Hadoop
  • Neo4j, Crunchbase: Neo4j - Graph Database for Big Data
  • InfiniDB: Columnar MPP open-source DB version hosted on GitHub
  79. 79. Big Data Analytics and Visualisation Platforms Tableau Tableau - Big Data Visualisation Engine Eclipse Symentec Eclipse - Big Data Visualisation Mathematica Mathematical Expressions and Algorithms StatGraphics Statistical Expressions and Algorithms FastStats Numerical computation, visualization and programming toolset MatLab R Data Acquisition and Analysis Application Development Toolkit “R” Statistical Programming / Algorithm Language Revolution Revolution Analytics Framework and Library for “R”
  80. 80. Hadoop / Big Data Extended Infrastructure Stack
  • SSD: Solid State Drive (SSD) – configured as cached memory / fast HDD
  • CUDA: CUDA (Compute Unified Device Architecture)
  • GPGPU: GPGPU (General Purpose Graphical Processing Unit Architecture)
  • IMDG: IMDG (In-memory Data Grid – extended cached memory)
  • Vibe: High Velocity / High Volume Machine / Automatic Data Streaming
  • Splunk: High Velocity / High Volume Machine / Automatic Data Streaming
  • Ambari: High-availability distributed cluster co-ordination
  • YARN: Hadoop Resource Scheduling
  81. 81. Cloud-based Big-Data-as-a-Service and Analytics
  • AWS: Amazon Web Services (AWS) – Big Data-as-a-Service (BDaaS), Elastic Compute Cloud (EC2) and Simple Storage Service (S3)
  • 1010 Data: Big Data Discovery, Visualisation and Sharing Cloud Platform
  • SAP HANA: SAP HANA Cloud - In-memory Big Data Analytics Appliance
  • Azure: Microsoft Azure Data-as-a-Service (DaaS) and Analytics
  • Anomaly 42: Anomaly 42 Smart-Data-as-a-Service (SDaaS) and Analytics
  • Workday: Workday Big-Data-as-a-Service (BDaaS) and Analytics
  • Google Cloud: Google Cloud Platform – Cloud Storage, Compute Platform, Firebrand API Resource Framework
  • Apigee: Apigee API Resource Framework
  82. 82. Gartner Magic Quadrant for BI and Analytics Platforms
  83. 83. Hadoop Framework Distributions FEATURE Hortonworks Cloudera MAPR Pivotal Open Source Hadoop Library Yes Yes Yes Pivotal HD Support Yes Yes Yes Yes Professional Services Yes Yes Yes Yes Catalogue Extensions Yes Yes Yes Yes Management Extensions Yes Yes Yes Architecture Extensions Yes Yes Infrastructure Extensions Yes Yes Library Support Services Catalogue Job Management Library Support Services Catalogue Hortonworks Cloudera MAPR Library Support Services Catalogue Job Management Resilience High Availability Performance Pivotal Library Support Services Catalogue Job Management Resilience High Availability Performance
  84. 84. Gartner Magic Quadrant for BI
  85. 85. Data Warehouse Appliance / Real-time Analytics Engine Price Comparison
  Manufacturer | Server Configuration | Cached Memory | Server Type | Software Platform | Cost (est.)
  SAP HANA | 32-node (4 Channels x 8 CPU) | 1.3 Terabytes | SMP | Proprietary | $6,000,000
  Teradata | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Proprietary | $1,000,000
  Netezza (now IBM) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Proprietary | $180,000
  IBM ex5 (non-HANA configuration) | 32-node (4 Channels x 8 CPU) | 1.3 Terabytes | SMP | Proprietary | $120,000
  Greenplum (now Pivotal) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Open Source | $20,000
  XtremeData xdb (BO BW) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Open Source | $18,000
  Zybert Gridbox | 48-node (4 Channels x 12 CPU) | 20 Terabytes | SMP | Open Source | $60,000
  86. 86. Clustering in “Big Data” “A Cluster is a group of the same or similar data elements which are aggregated – or closely distributed – together” Clustering is a technique used to explore content and understand information in every business sector and scientific field that collects and processes very large volumes of data Clustering is an essential tool for any “Big Data” problem
  87. 87. “Big Data” • “Big Data” refers to vast aggregations (super sets) consisting of numerous individual datasets (structured and unstructured) - whose size and scope are beyond the capability of conventional transactional (OLTP) or analytics (OLAP) Database Management Systems and Enterprise Software Tools to capture, store, analyse and manage. Examples of “Big Data” include the vast and ever changing amounts of data generated in social networks where we maintain Blogs and have conversations with each other, news data streams, geo-demographic data, internet search and browser logs, as well as the ever-growing amount of machine data generated by pervasive smart devices - monitors, sensors and detectors in the environment – captured via the Smart Grid, then processed in the Cloud – and delivered to end-user Smart Phones and Tablets via Intelligent Agents and Alerts. • Data Set Mashing and “Big Data” Global Content Analysis – drives Horizon Scanning, Monitoring and Tracking processes by taking numerous, apparently unrelated RSS and other Information Streams and Data Feeds, loading them into Very Large Scale (VLS) DWH Structures and Document Management Systems for Real-time Analytics – searching for and identifying possible signs of relationships hidden in data (Facts / Events) – in order to discover and interpret previously unknown Data Relationships driven by hidden Clustering Forces – revealed via “Weak Signals” indicating emerging and developing Application Scenarios, Patterns and Trends - in turn predicting possible, probable and alternative global transformations which may unfold as future “Wild Card” or “Black Swan” events.
  88. 88. Clustering in “Big Data” • The profiling and analysis of large aggregated datasets in order to determine a ‘natural’ structure of groupings provides an important technique for many statistical and analytic applications. Cluster analysis on the basis of profile similarities or geographic distribution is a method where no prior assumptions are made concerning the number of groups or group hierarchies and internal structure. Geo-demographic techniques are frequently used in order to profile and segment populations by ‘natural’ groupings - such as common behavioural traits, Clinical Trial, Morbidity or Actuarial outcomes - along with many other shared characteristics and common factors.
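One way to group records without fixing the number of groups in advance is to link any two points that lie within a chosen distance of each other and let the clusters emerge. The sketch below uses that idea (single-linkage grouping with a distance threshold). The technique, the `threshold_clusters` name and the `max_gap` parameter are assumptions for illustration; the slides do not name a specific algorithm here.

```python
import math


def threshold_clusters(points, max_gap):
    """Group points so that any two points within max_gap share a cluster.

    The number of clusters is not supplied; it falls out of the data.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return sorted(clusters.values())


# Hypothetical 2-D profile coordinates.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
groups = threshold_clusters(profiles, max_gap=1.5)
```

With these sample points the function finds three groups, one of them a single outlier. The pairwise comparison is quadratic in the number of points, so this form only suits small samples; at "Big Data" scale a spatial index or a distributed method would be needed.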
  89. 89. Clustering in “Big Data” • “BIG DATA” ANALYTICS – PROFILING, CLUSTERING and 4D GEOSPATIAL ANALYSIS • The profiling and analysis of large aggregated datasets in order to determine a ‘natural’ structure of data relationships or groupings is an important starting point forming the basis of many mapping, statistical and analytic applications. Cluster analysis of implicit similarities - such as time-series demographic or geographic distribution - is a critical technique where no prior assumptions are made concerning the number or type of groups that may be found, or their relationships, hierarchies or internal data structures. Geospatial and demographic techniques are frequently used in order to profile and segment populations by ‘natural’ groupings. Shared characteristics or common factors such as Behaviour / Propensity or Epidemiology, Clinical, Morbidity and Actuarial outcomes – allow us to discover and explore previously unknown, concealed or unrecognised insights, patterns, trends or data relationships. • PREDICTIVE ANALYTICS and EVENT FORECASTING • Predictive Analytics and Event Forecasting uses Horizon Scanning, Tracking and Monitoring methods combined with Cycle, Pattern and Trend Analysis techniques for Event Forecasting and Propensity Models in order to anticipate a wide range of business, economic, social and political Future Events – ranging from micro-economic Market phenomena such as forecasting Market Sentiment and Price Curve movements - to large-scale macro-economic Fiscal phenomena using Weak Signal processing to predict future Wild Card and Black Swan Events - such as Monetary System shocks.
  90. 90. Multi-channel Retail - Digital Architecture • The last decade has seen an unprecedented explosion in mobile platforms as the internet and mobile worlds came of age. It is no longer acceptable to have only a bricks-and-mortar high-street presence – customer-focused companies are now expected to deliver their Customer Experience and Journey via internet websites, mobiles and more recently tablets.
  91. 91. Social Intelligence – The Emerging Big Data Stack • Targeting – Map / Reduce Consume – End-User Data Data Acquisition – High-Volume Data Flows – Mobile Enterprise Platforms (MEAP’s) Apache Hadoop Framework HDFS, MapReduce, MatLab “R” Autonomy, Vertica Smart Devices Smart Apps Smart Grid Clinical Trial, Morbidity and Actuarial Outcomes Market Sentiment and Price Curve Forecasting Horizon Scanning, Tracking and Monitoring Weak Signal, Wild Card and Black Swan Event Forecasting – Data Delivery and Consumption News Feeds and Digital Media Global Internet Content Social Mapping Social Media Social CRM – Data Discovery and Collection – Analytics Engines - Hadoop – Data Presentation and Display Excel Web Mobile – Data Management Processes Data Audit Data Profile Data Quality Reporting Data Quality Improvement – Performance Acceleration GPU’s – massive parallelism SSD’s – in-memory processing DBMS – ultra-fast data replication – Data Management Tools DataFlux Embarcadero Informatica Talend – Info. Management Tools Business Objects Cognos Hyperion Microstrategy Biolap Jedox Sagent Polaris Teradata SAP HANA Netezza (now IBM) Greenplum (now EMC2) Extreme Data xdg – Data Warehouse Appliances Ab Initio Ascential Genio Orchestra
  92. 92. GIS MAPPING and SPATIAL DATA ANALYSIS • A Geographic Information System (GIS) integrates hardware, software and digital data capture devices for acquiring, managing, analysing, distributing and displaying all forms of geographically dependent location data – including machine generated data such as Computer-aided Design (CAD) data from land and building surveys, Global Positioning System (GPS) terrestrial location data - as well as all kinds of data streams - HDCCTV, aerial and satellite image data.
  93. 93. GIS Mapping and Spatial Analysis • GIS MAPPING and SPATIAL DATA ANALYSIS • A Geographic Information System (GIS) integrates hardware, software and digital data capture devices for acquiring, managing, analysing, distributing and displaying all forms of geographically dependent location data – including machine generated data such as Computer-aided Design (CAD) data from land and building surveys, Global Positioning System (GPS) terrestrial location data - as well as all kinds of data streams - HDCCTV, aerial and satellite image data. • Spatial Data Analysis is a set of techniques for analysing 3-dimensional spatial (Geographic) data and location (Positional) object data overlays. Software that implements spatial analysis techniques requires access to both the locations of objects and their physical attributes. Spatial statistics extends traditional statistics to support the analysis of geographic data. Spatial Data Analysis provides techniques to describe the distribution of data in the geographic space (descriptive spatial statistics), analyse the spatial patterns of the data (spatial pattern or cluster analysis), identify and measure spatial relationships (spatial regression), and create a surface from sampled data (spatial interpolation, usually categorized as geo-statistics). • The results of spatial data analysis are largely dependent upon the type, quantity, distribution and data quality of the spatial objects under analysis.
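Spatial interpolation, the last technique listed above, can be illustrated with inverse distance weighting (IDW), one common way of estimating a surface value from sampled points. This is a minimal sketch under stated assumptions: IDW is chosen here for illustration (the slides do not name a method), coordinates are treated as planar, and `idw_interpolate` and its `power` parameter are hypothetical names.

```python
import math


def idw_interpolate(samples, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power, so nearby
    observations dominate the estimate.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.dist((x, y), target)
        if distance == 0.0:
            return value  # target coincides with a sampled location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sensor readings at two planar locations.
readings = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw_interpolate(readings, (1.0, 0.0))
```

Because the target is equidistant from both readings, the estimate is their simple average. As the slide notes, the quality of any such surface depends heavily on the quantity and distribution of the sampled objects.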
  94. 94. World-wide Visitor Count – GIS Mapping
  95. 95. Geo-demographic Clustering in “Big Data” • GEODEMOGRAPHIC PROFILING – CLUSTERING IN “BIG DATA” • The profiling and analysis of large aggregated datasets to discover hidden data relationships - by determining a ‘natural’ or implicit structure of data relationships or groupings, where no prior assumptions are made concerning the number or type of groups discovered or their relationships, hierarchies or internal data structures - is an important starting point forming the basis of many statistical and analytic applications. The subsequent explicit Cluster Analysis of the discovered data relationships is a critical technique which attempts to explain the nature, cause and effect of those implicit profile similarities or geographic distributions. Demographic techniques are frequently used in order to profile and segment populations using ‘natural’ groupings - such as common behavioural traits, Clinical, Morbidity or Actuarial outcomes, along with many other shared characteristics and common factors – and then attempt to understand and explain those natural group affinities and geographical distributions using methods such as Causal Layered Analysis (CLA).
  96. 96. GIS Mapping and Spatial Analysis • A Geographic Information System (GIS) integrates hardware, software and digital data capture devices for acquiring, managing, analysing, distributing and displaying all forms of geographically dependent location data – including machine generated data such as Computer-aided Design (CAD) data from land and building surveys, Global Positioning System (GPS) terrestrial location data - as well as all kinds of data streams - HDCCTV, aerial and satellite image data. • Spatial Data Analysis is a set of techniques for analysing spatial (Geographic) location data. The results of spatial analysis are dependent on the locations of the objects being analysed. Software that implements spatial analysis techniques requires access to both the locations of objects and their physical attributes. • Spatial statistics extends traditional statistics to support the analysis of geographic data. Spatial Data Analysis provides techniques to describe the distribution of data in the geographic space (descriptive spatial statistics), analyse the spatial patterns of the data (spatial pattern or cluster analysis), identify and measure spatial relationships (spatial regression), and create a surface from sampled data (spatial interpolation, usually categorized as geo-statistics).
  97. 97. BTSA Induction Cluster Map
  98. 98. Geo-Demographic Profile Clusters
  99. 99. Targeting – Map / Reduce Consume – End-User Data Data Acquisition – High-Volume – Mobile Enterprise Platforms (MEAP’s) – Data Delivery and Consumption – Data Discovery and Collection – Analytics Engines - Hadoop – Data Management Processes – Performance Acceleration Apache Hadoop Framework HDFS, MapReduce, MatLab “R” Autonomy, Vertica Smart Devices Smart Apps Smart Grid Clinical Trial, Morbidity and Actuarial Outcomes Market Sentiment and Price Curve Forecasting Horizon Scanning, Tracking and Monitoring Weak Signal, Wild Card and Black Swan Event Forecasting News Feeds and Digital Media Global Internet Content Social Mapping Social Media Social CRM Data Audit Data Profile Data Quality Reporting Data Quality Improvement Data Extract, Transform, Load GPU’s – massive parallelism SSD’s – in-memory processing DBMS – ultra-fast data replication – Data Presentation and Display – Data Management Tools – Info. Management Tools – Data Warehouse Appliances Excel Web Mobile DataFlux Embarcadero Informatica Talend Business Objects Cognos Hyperion Microstrategy Biolap Jedox Sagent Polaris Teradata SAP HANA Netezza (now IBM) Greenplum (now EMC2) Extreme Data xdg Zybert Gridbox Ab Initio Ascential Genio Orchestra
  100. 100. Clustering Phenomena in “Big Data” “A Cluster is a group of profiled data similarities aggregated closely together” • Cluster Analysis is a technique which is used to explore very large volumes of structured and unstructured data - transactional data, machine generated (automatic) data, social media and internet content, and geo-demographic information - in order to discover previously unknown, unrecognised or hidden logical data relationships.
  101. 101. Event Clusters and Connectivity A B C D E G H F The above is an illustration of Event relationships - how Events might be connected. Any detailed, intimate understanding of the connection between Events may help us to answer questions such as: - • If Event A occurs does it make Event B or H more or less likely to occur ? • If Event B occurs what effect does it have on Events C,D,E, F and G ? Answering questions such as these allows us to plan our Event Management approach and Risk mitigation strategy – and to decide how better to focus our Incident / Event resources and effort…..
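The first question on this slide (does Event A make Event B or H more or less likely?) can be explored from historical observations by comparing the conditional rate of an Event with its overall rate. The sketch below computes that ratio, usually called lift. The `lift` function, the set-based data layout and the sample observations are assumptions for illustration, and the result describes association only, not cause.

```python
def lift(observations, cause, effect):
    """Return P(effect | cause) / P(effect) estimated from observed event sets.

    A value above 1 suggests `effect` is more likely when `cause` occurs,
    a value below 1 suggests it is less likely.
    """
    total = len(observations)
    with_cause = [obs for obs in observations if cause in obs]
    with_effect = sum(1 for obs in observations if effect in obs)
    if total == 0 or not with_cause or with_effect == 0:
        raise ValueError("not enough observations to estimate lift")
    p_effect = with_effect / total
    p_effect_given_cause = sum(1 for obs in with_cause if effect in obs) / len(with_cause)
    return p_effect_given_cause / p_effect


# Hypothetical history: each set lists the Events seen in one period.
history = [
    {"A", "B"}, {"A", "B"}, {"A"}, {"B"},
    {"H"}, {"H"}, {"A", "H"}, set(),
]
lift_a_b = lift(history, "A", "B")
lift_a_h = lift(history, "A", "H")
```

In this made-up history, B turns up more often than usual in periods where A occurs, while H turns up less often, which is the kind of signal that could direct further investigation.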
  102. 102. Event Clusters and Connectivity • Aggregated Events include coincident, related, connected and interconnected Events: - • Coincident - two or more Events appear simultaneously in the same domain – but they arise from different triggers (unrelated causal events) • Related - two or more Events materialise in the same domain sharing common Event features or characteristics (they may share a possible hidden common trigger or cause – and so are candidates for further analysis and investigation) • Connected - two or more Events materialise in the same domain due to the same trigger (common cause) • Interconnected - two or more Events materialise together in an Event cluster, series or “storm” - the previous (prior) Event triggering the subsequent (next) Event in an Event Series. • A series of Aggregated Events may result in a significant cumulative impact - and is therefore frequently identified incorrectly as a Wild-card or Black Swan Event - rather than simply as an event cluster or event “storm”.
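The four Aggregated Event types can be expressed as a simple rule set over a pair of Events. The sketch below is one possible encoding, not a definitive one: the `Event` fields (`trigger`, `caused_by`, `features`) and the order in which the rules are checked are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str
    trigger: Optional[str] = None      # external trigger, if known
    caused_by: Optional[str] = None    # name of the prior Event that triggered this one
    features: frozenset = field(default_factory=frozenset)


def classify(a: Event, b: Event) -> str:
    """Classify the relationship between two Events in the same domain."""
    if a.caused_by == b.name or b.caused_by == a.name:
        return "interconnected"  # one Event triggers the other
    if a.trigger is not None and a.trigger == b.trigger:
        return "connected"       # same trigger (common cause)
    if a.features & b.features:
        return "related"         # shared features, possible hidden common cause
    return "coincident"          # different triggers, nothing shared


# Hypothetical Events mirroring the four cases.
event_a = Event("A", trigger="Trigger A")
event_b = Event("B", trigger="Trigger B")
event_c = Event("C", trigger="Trigger 1", features=frozenset({"flood"}))
event_d = Event("D", trigger="Trigger 2", features=frozenset({"flood"}))
event_e = Event("E", trigger="Trigger X")
event_f = Event("F", trigger="Trigger X")
event_g = Event("G", trigger="Trigger Y")
event_h = Event("H", caused_by="G")
```

Checking the strongest relationship first means a pair is labelled "related" only when no direct or shared trigger is known, which matches the slide's note that related Events are candidates for further investigation.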
  103. 103. Event Clusters and Connectivity 1 2 3 4 5 7 8 6 The above is an illustration of Event relationships - how Risk Events might be connected. A detailed and intimate understanding of Event clusters and the connection between Events may help us to understand: - • What is the relationship between Events 1 and 8, and what impact do they have on Events 2 - 7 ? • Events 2 - 5 and Events 6 and 7 occur in clusters – what are the factors influencing these clusters ? Answering questions such as these allows us to plan our Risk Event management approach and mitigation strategy – and to decide how to better focus our resources and effort on Risk Events and fraud management. Claimant 1 Risk Event Claimant 2 Residence Vehicle Event Cluster
  104. 104. Aggregated Event Types • Coincident Events: Trigger A → Event A; Trigger B → Event B • Related Events: Trigger 1 → Event C; Trigger 2 → Event D • Connected Events: Trigger → Event E and Event F • Inter-connected Events: Trigger → Event G → Event H
  105. 105. Event Complexity Map
  106. 106. 4D Geospatial Analytics • 4D Geospatial Analytics is the profiling and analysis of large aggregated datasets in order to determine a ‘natural’ structure of groupings; it provides an important technique for many statistical and analytic applications. • Demographic and Geospatial Cluster Analysis - on the basis of profile similarities or geographic distribution - is a statistical method whereby no prior assumptions are made concerning the number of groups, group hierarchies or internal structure. Geo-spatial and geo-demographic techniques are frequently used in order to profile and segment populations by ‘natural’ groupings - such as common behavioural traits, Clinical Trial, Morbidity or Actuarial outcomes - along with many other shared characteristics and common factors.
  107. 107. The Flow of Information through Time • String Theory predicates that Space-Time exists in discrete packages, with Time Present always in some way inextricably woven into both Time Past and Time Future. This yields the intriguing possibility of insights through the mists of time into the outcome of future events – as any item of Data or Information (Global Content) may contain faint traces which offer glimpses into the future trajectory of Clusters of linked Past, Present and Future Events. If all future timelines were linear, then every event would unfold in an unerringly predictable manner towards a known and certain conclusion. The future is, however, both unknown and unknowable (Hawking Paradox). Future outcomes are uncertain – future timelines are non-linear (branched) with a multitude of possible alternative futures. Chaos Theory suggests that even the most subliminal inputs, originating from unknown forces so minute as to be undetectable, might become amplified through numerous system cycles to grow in influence and impact over time – deviating Space-Time trajectories far away from their original predicted path – so fundamentally altering the outcome of future events. • Every item of Global Content in the Present is somehow connected with both Past and Future temporal planes. Space-Time is a Dimension Cluster consisting of the three Spatial dimensions (x, y and z axes) plus Time (the fourth dimension - t) – which together flow in a single direction – relentlessly towards the future. Space-Time does not flow uniformly – the “arrow of time” may be deflected by unknown factors. There may exist “hidden external forces” (unseen interactions) that create disturbance in the temporal plane stack which marks the passage of time - with the potential to create eddies, vortices and whirlpools along the trajectory of Time (chaos, disorder and uncertainty) – which in turn possess the capacity to generate ripples and waves (randomness and disruption) – thus changing the course of the Space-Time continuum. “Weak Signals” are “Ghosts in the Machine” – echoes of these subliminal temporal interactions – that may contain within them insights or clues about possible future “Wild Card” or “Black Swan” random events.
  108. 108. 4D Geospatial Analytics – The Temporal Wave • The Temporal Wave is a novel and innovative method for Visual Modelling and Exploration of Geospatial “Big Data” - simultaneously within a Time (history) and Space (geographic) context. The problems encountered in exploring and analysing vast volumes of spatial–temporal information in today's data-rich landscape are becoming increasingly difficult to manage effectively. Overcoming the problem of data volume and scale in a Time (history) and Space (location) context requires not only the traditional location–space and attribute–space analysis common in GIS Mapping and Spatial Analysis, but also the additional dimension of time–space analysis. The Temporal Wave supports a new method of Visual Exploration for Geospatial (location) data within a Temporal (timeline) context. • This time-visualisation approach integrates Geospatial (location) data within a Temporal (timeline) dataset - along with data visualisation techniques - thus improving accessibility, exploration and analysis of the huge amounts of geo-spatial data used to support geo-visual “Big Data” analytics. The Temporal Wave combines the strengths of both linear timeline and cyclical wave-form analysis – and is able to represent data both within a Time (history) and Space (geographic) context simultaneously – and even at different levels of granularity. Linear and cyclic trends in space-time data may be represented in combination with other graphic representations typical for location–space and attribute–space data types. The Temporal Wave can be used as a time–space data reference system, as a time–space continuum representation tool, and as a time–space interaction tool.
  109. 109. 4D Geospatial Analytics – London Timeline
  110. 110. 4D Geospatial Analytics – London Timeline • How did London evolve from its creation as a Roman city in AD 43 into the crowded, chaotic cosmopolitan megacity we see today? The London Evolution Animation takes a holistic view of what has been constructed in the capital over different historical periods – what has been lost, what saved and what protected. • Greater London covers 600 square miles. Up until the 17th century, however, the capital city was crammed largely into a single square mile, which today is marked by the skyscrapers that are a feature of the financial district of the City. • This visualisation, originally created for the Almost Lost exhibition by the Bartlett Centre for Advanced Spatial Analysis (CASA), explores the historic evolution of the city by plotting a timeline of the development of the road network - along with documented buildings and other features – through 4D geospatial analysis of a vast number of diverse geographic, archaeological and historic data sets. • Unlike other historical cities such as Athens or Rome, with an obvious patchwork of districts from different periods, London's individual structures, scheduled sites and listed buildings have in many cases been constructed gradually from parts assembled during different periods. Researchers who have tried previously to locate and document archaeological structures and research historic references will know that these features, when plotted, appear scrambled up like pieces of different jigsaw puzzles – all scattered across the contemporary London cityscape.
  111. 111. 4D Geospatial Analytics – The Temporal Wave
  112. 112. Social Intelligence – Brand Affinity CONE SEGMENTS - BRAND AFFINITY • Social Intelligence drives Brand Loyalty Understanding - Fan-base Profiling, Streaming and Segmentation – expressed in the creation and maintenance of a detailed History and Balanced Scorecard for every individual in the Cone, allowing summation by Stream / Segment: - 1. Inactive – need to draw their attention towards the Brand 2. Indifferent – need to educate them about core Brand Values 3. Disconnected – need to re-engage with the Brand 4. Casuals – exhibit Brand awareness and interest 5. Followers – follow the Brand, engage with social media and consume brand communications 6. Enthusiasts – engaged with the Brand, participate in Brand / Product / Media events and merchandising 7. Supporters – show strong need, desire and propensity to support Brand / Product / Media consumption 8. Fanatics – demonstrate total Commitment / Dedication / Loyalty for all aspects of the Brand / Product / Media PROPENSITY • Balanced Scorecard – is a summary of all the data-points for an Individual / Stream / Segment • Propensity Score – In the statistical analysis of observational data, Propensity Score Matching (PSM) is a statistical matching technique that attempts to estimate the effect of a Campaign / Offer / Promotion or other intervention by accounting for the factors that predict whether an individual receives the Campaign / Offer / Promotion. • Propensity Model – is the Bayesian probability of the outcome of an event in an Individual / Stream / Segment • Predictive Analytics - an area of data mining that deals with extracting information from data and using it to predict trends and behaviour patterns. Often the unknown event of interest is in the future; however, Predictive Analytics can be applied to any type of event with an unknown outcome - in the past, present or future.
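The idea of a Propensity Model as a Bayesian probability can be illustrated with a Beta-Binomial update, where a prior belief about a Segment's response rate is revised by its observed responses to past offers. This is a minimal sketch under stated assumptions: the Beta prior, the `posterior_propensity` function and the sample figures are hypothetical and are not taken from the Cone itself.

```python
def posterior_propensity(responses, offers, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean response probability under a Beta(prior_alpha, prior_beta) prior.

    `responses` is the number of offers taken up out of `offers` made.
    With no history the estimate falls back to the prior mean.
    """
    if offers < 0 or not 0 <= responses <= offers:
        raise ValueError("responses must lie between 0 and offers")
    return (prior_alpha + responses) / (prior_alpha + prior_beta + offers)


# Hypothetical campaign history per Cone Segment: (responses, offers).
segment_history = {
    "Fanatics": (9, 10),
    "Casuals": (2, 10),
    "Inactive": (0, 0),
}
propensity = {
    segment: posterior_propensity(responses, offers)
    for segment, (responses, offers) in segment_history.items()
}
```

With the uniform prior used here, a Segment with no history scores 0.5, and each observed offer moves the score towards the Segment's actual response rate. A stronger prior would make the score move more slowly.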
  113. 113. Social Intelligence – Fan-base Understanding Football Supporters – Map of London
  114. 114. Social Intelligence – Fan-base Understanding CONE STREAMING and SEGMENTATION • Multiple Cones can be created and cross-referenced using Social Intelligence and Brand Interaction / Fan-base Profiling and Segmentation in order to deliver actionable insights for any genre of Brand Loyalty and Fan-base Understanding – as well as for other Geo-demographic Analytics purposes – e.g. Digital Healthcare, Clinical Trials, Morbidity and Actuarial Outcomes: - – Music (e.g. BBC and Sony Music) – Broadcasting (e.g. Radio 1 / American Idol) – Digital Media Content (e.g. Sony Films / Netflix) – Sports Franchises (e.g. Manchester City / New York City) – Sport Footwear and Apparel (e.g. Nike, Puma, Adidas, Reebok) – Fast Fashion Retailers (e.g. ASOS, Next, New Look, Primark) – Luxury Brands / Aggregators (e.g. Armani, Burberry, Versace / LVMH, PPR, Richemont) – Multi-channel Retailers – Brand Affinity / Loyalty Marketing + Product Campaigns, Offers & Promotions – Financial Services Companies – Brand Protection and Reputation Management – Travel, Leisure and Entertainment Organisations - Destination Events and Resorts – MVNO / CSPs - OTT Business Partner Analytics (Sky Go, Netflix, iPlayer via Firebrand / Apigee) – Telco, Media and Communications - Churn Management / Conquest / Up-sell / Cross-sell Campaigns – Digital Healthcare – Private / Public Healthcare Service Provisioning: - Geo-demographic Clustering and Propensity Modelling (Patient Monitoring, Wellbeing, Clinical Trials, Morbidity and Actuarial Outcomes)
  115. 115. Social Intelligence – Fan-base Understanding
  116. 116. Social Intelligence – Social Interaction
Social Interaction Cone Rules
1. Inactive – not engaged – low evidence / low affinity / low interest in Social Media
2. Lone Wolf – sparse / thin social network - may share negative information (Trolling)
3. Home Boy – Social Network clustered around Home Location Postcodes (Gang Culture)
4. Eternal Student – Social Network clustered around School / College / University Alumni
5. Workplace – Social Network clustered around Work and Colleagues (e.g. City Brokers, Traders)
6. Friends and Family – Social Network clustered around physical social contacts - Friends and Family
7. Enthusiast – Social Network clustered around shared, common interests – Sport, Music and Fashion etc.
8. Promiscuous – Open Networker – virtual Social Network across all categories - will connect with anybody
Number of Segments
• With anonymous data (e.g. polls), the number of initial Segments is 4 (Matt Holland). With named individuals we can discover much richer internal and external
  117. 117. Social Interaction How consumers use social media (e.g., Facebook, Twitter) to address and/or engage with companies around social and environmental issues.
  118. 118. Clustering in “Big Data”
“A Cluster is a group of profiled data similarities aggregated closely together”
• Cluster Analysis is a technique used to explore very large volumes of transactional and machine-generated (automatic) data, social media and internet content and information, in order to discover previously unknown, unrecognised or hidden data relationships.
• Clustering is an essential tool for any “Big Data” problem. Cluster Analysis of both explicit (given) and implicit (discovered) data relationships in “Big Data” is a critical technique which attempts to explain the nature, cause and effect of the forces which drive clustering. Any observed profiled data similarities – geographic or temporal aggregations, mathematical or statistical distributions – may be explained through Causal Layer Analysis.
– The choice of clustering algorithm and parameters is both process and data dependent
– Approximate Kernel K-means provides a good trade-off between clustering accuracy and data volumes, throughput, performance and scalability
– Challenges include homogeneous and heterogeneous data (structured versus unstructured data), data quality, streaming, scalability, cluster cardinality and validity
  119. 119. Cluster Types Deep Space Galactic Clusters Hadoop Cluster – “Big Data” Servers Molecular Clusters Geo-Demographic Clusters Mineral Lode Clusters
  120. 120. Clustering in “Big Data”
• GEODEMOGRAPHIC PROFILING – CLUSTERING IN “BIG DATA” •
• The profiling and analysis of very large aggregated datasets to determine ‘natural’ or implicit data relationships and discover hidden common factors and data structures, where no prior assumptions are made concerning the number or type of groups, is driven by uncovering previously unknown data relationships and natural groupings. The discovery of such Cluster / Group relationships, hierarchies or internal data structures is an important starting point forming the basis of many statistical and analytic applications which are designed to expose hidden data relationships.
• A subsequent explicit Cluster Analysis of previously discovered data relationships is an important technique which attempts to understand the true nature, cause and impact of unknown clustering forces driving implicit profile similarities, mathematical and geographic distributions. Geo-demographic techniques are frequently used in order to profile and segment Demographic and Spatial data by ‘natural’ groupings – including common behavioural traits, Clinical Trial, Morbidity or Actuarial outcomes – along with numerous other shared characteristics and common factors. Cluster Analysis attempts to understand and explain those natural group affinities and geographical distributions using methods such as Causal Layer Analysis (CLA).
  121. 121. Cluster Types
Columns: DISCIPLINE, CLUSTER TYPE, CLUSTERS, DIMENSIONS, DATA TYPE, DATA SOURCE, CLUSTERING FACTORS / FORCES
Astrophysics 4D Distribution of Matter across the Universe through Space and Time Star Systems Stellar Clusters Galaxies Galactic Clusters Mass / Energy Space / Time Astronomy Images – Microwave, Infrared, Optical, Ultraviolet, Radio, X-ray, Gamma-ray Optical Telescope Infrared Telescope Radio Telescope X-ray Telescope Gravity Dark Matter Dark Energy Dark Flow
Climate Change Temperature Changes Precipitation Changes Ice-mass Changes Hot / Cold Dry / Wet More / Less ice Temperature Precipitation Sea / Land Ice Average Temperature Average Precipitation Greenhouse Gases % Weather Station Data Ice Core Data Tree-ring Data Solar Forcing Oceanic Forcing Atmospheric Forcing
Actuarial Science Morbidity, Clinical Trials, Epidemiology Place / Date of birth Place / Date of death Cause of Death Birth / Death Longevity Cause of Death Medical Events Geography Time Biomedical Data Demographic Data Geographic data Register of Births Register of Deaths Medical Records Health Wealth Demographics
Price Curves Economic Modelling Long-range Forecasting Economic growth Economic recession Bull markets Bear markets Monetary Value Geography Time Real (Austrian) GDP Foreign Exchange Rates Interest Rates Price movements Daily Closing Prices Government Central Banks Money Markets Stock Exchange Commodity Exchange Business Cycles Economic Trends Market Sentiment Fear and Greed Supply / Demand
Business Clusters Retail Parks Digital / Fin Tech Leisure / Tourism Creative / Academic Retail Technology Resorts Arts / Sciences Company / SIC Geography Time Entrepreneurs Start-ups Mergers Acquisitions Investors NGAs Government Academic Bodies Capital / Finance Political policy Economic policy Social policy
Elite Team Sports Performance Science Winners Losers Team / Athlete Sport / Club League Tables Medal Tables Sporting Events Team / Athlete Sport / Club Geography Time Performance Data Biomedical Data Sports Governing Bodies RSS News Feeds Social Media Hawk-Eye Pro-Zone Technique Application Form / Fitness Ability / Attitude Training / Coaching Speed / Endurance
Future Management Human Activity Natural Events Random Events Waves, Cycles, Patterns, Trends Random Events Geography Time Weak Signals Strong Signals Wild Card Events Black Swan Events Global Internet Content / Big Data Analytics - Horizon Scanning, Tracking and Monitoring Random Events Waves, Cycles, Patterns, Trends, Extrapolations
  122. 122. Clustering in “Big Data”
• “BIG DATA” ANALYTICS – PROFILING, CLUSTERING and 4D GEOSPATIAL ANALYSIS •
• The profiling and analysis of large aggregated datasets in order to determine a ‘natural’ structure of data relationships or groupings is an important starting point forming the basis of many mapping, statistical and analytic applications. Cluster analysis of implicit similarities, such as time-series demographic or geographic distribution, is a critical technique where no prior assumptions are made concerning the number or type of groups that may be found, or their relationships, hierarchies or internal data structures. Geospatial and demographic techniques are frequently used in order to profile and segment populations by ‘natural’ groupings. Shared characteristics or common factors such as Behaviour / Propensity or Epidemiology, Clinical, Morbidity and Actuarial outcomes allow us to discover and explore previously unknown, concealed or unrecognised insights, patterns, trends or data relationships.
• PREDICTIVE ANALYTICS and EVENT FORECASTING •
• Predictive Analytics and Event Forecasting use Horizon Scanning, Tracking and Monitoring methods combined with Cycle, Pattern and Trend Analysis techniques for Event Forecasting and Propensity Models in order to anticipate a wide range of business, economic, social and political Future Events – ranging from micro-economic Market phenomena, such as forecasting Market Sentiment and Price Curve movements, to large-scale macro-economic Fiscal phenomena, using Weak Signal processing to predict future Wild Card and Black Swan Events such as Monetary System shocks.
  123. 123. Cluster Analysis
• Data Representation
– Metadata - identifying common Data Objects, Types and Formats
• Data Taxonomy and Classification
– Similarity Matrix (labelled data)
– Grouping of explicit data relationships
• Data Audit - given any collection of labelled objects:
– Identifying relationships between discrete data items
– Identifying common data features - values and ranges
– Identifying unusual data features - outliers and exceptions
• Data Profiling and Clustering - given any collection of unlabelled objects:
– Pattern Matrix (unlabelled data)
– Discover implicit data relationships
– Find meaningful groupings in Data (Clusters)
– Predictive Analytics – Bayesian Event Forecasting
– Wave-form Analytics – Periodicity, Cycles and Trends
– Explore hidden relationships between discrete data features
Many big data problems feature unlabelled objects
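The Data Audit step of identifying unusual data features (outliers and exceptions) could be sketched with a simple interquartile-range rule. This is one common convention rather than the method used in the deck; the 1.5 × IQR fence, the `iqr_outliers` helper and the sample values are assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, fence=1.5):
    """Return the values lying outside [Q1 - fence*IQR, Q3 + fence*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data feature, e.g. weekly interactions per individual.
weekly_interactions = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]

outliers = iqr_outliers(weekly_interactions)
print("Outliers:", outliers)
```

Flagged values are candidates for review, not automatic exclusions: in fan-base data an extreme value may simply be a genuine Fanatic.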
  124. 124. k-means/Gaussian-Mixture Clustering of Audio Segments
  125. 125. Cluster Analysis
Clustering Algorithms
Hundreds of spatial, mathematical and statistical clustering algorithms are available – many clustering algorithms are “admissible” – but no single algorithm alone is “optimal”
• K-means
• Gaussian mixture models
• Kernel K-means
• Spectral Clustering
• Nearest neighbour
• Latent Dirichlet Allocation
Challenges in “Big Data” Clustering
• Data quality
• Volume – number of data items
• Cardinality – number of clusters
• Synergy – measures of similarity
• Values – outliers and exceptions
• Cluster accuracy - validity and verification
• Homogeneous versus heterogeneous data (structured and unstructured data)
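As a minimal sketch of the first algorithm listed, K-means can be written as Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its members. The `kmeans` function, the random initialisation and the two synthetic 2-D blobs are illustrative assumptions, not the benchmarked implementation referred to in this deck.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Synthetic unlabelled data: two well-separated 2-D blobs of 50 points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(50)]

centroids, labels = kmeans(points, k=2)
print("Centroids:", centroids)
```

Plain K-means draws linear boundaries between centroids, so it suits compact blobs like these; data such as concentric circles is the case where the kernel variants listed above are usually preferred.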
  126. 126. Distributed Clustering Model Performance
[Figure: K-means versus Kernel K-means]
Clustering 100,000 2-D points with 2 clusters on 2.3 GHz quad-core Intel Xeon processors, with 8GB memory, in the intel07 cluster
Network communication cost increases with the number of processors
  127. 127. Distributed Clustering Models
Clustering 100,000 2-D points with 2 clusters on 2.3 GHz quad-core Intel Xeon processors, with 8GB memory, in the intel07 cluster
Network communication cost increases with the number of processors
Number of processors | Speedup Factor - K-means | Speedup Factor - Kernel K-means
2 | 1.1 | 1.3
3 | 2.4 | 1.5
4 | 3.1 | 1.6
5 | 3.0 | 3.8
6 | 3.1 | 1.9
7 | 3.3 | 1.5
8 | 1.2 | 1.5
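One way to read the speedup figures above is to convert them into parallel efficiency, taken here as speedup divided by the number of processors. That definition is a standard convention assumed for this sketch, not something stated in the deck; the speedup values are copied from the benchmark as given.

```python
# Speedup factors as given in the benchmark:
# processors -> (K-means, Kernel K-means)
SPEEDUP = {
    2: (1.1, 1.3),
    3: (2.4, 1.5),
    4: (3.1, 1.6),
    5: (3.0, 3.8),
    6: (3.1, 1.9),
    7: (3.3, 1.5),
    8: (1.2, 1.5),
}


def efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / processors


# Processor counts at which each algorithm reaches its highest speedup.
best_kmeans = max(SPEEDUP, key=lambda p: SPEEDUP[p][0])
best_kernel = max(SPEEDUP, key=lambda p: SPEEDUP[p][1])

for p, (km, kkm) in SPEEDUP.items():
    print(
        f"{p} processors: K-means efficiency {efficiency(km, p):.2f}, "
        f"Kernel K-means efficiency {efficiency(kkm, p):.2f}"
    )
print("Best speedup at", best_kmeans, "(K-means) and", best_kernel, "(Kernel K-means)")
```

Efficiency well below 1.0 at higher processor counts is consistent with the note that network communication cost grows with the number of processors.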
  128. 128. Distributed Clustering Model Performance
Distributed Approximate Kernel K-means
2-D data set with 2 concentric circles, on 2.3 GHz quad-core Intel Xeon processors, with 8GB memory, in the intel07 cluster
Size of dataset (no. of Records) | Benchmark Performance (Speedup Factor)
10K | 3.8
100K | 4.8
1M | 3.8
10M | 6.4
  129. 129. HPCC Clustering Models High Performance / High Concurrence Real-time Delivery (HPCC)
  130. 130. Distributed Clustering Models
  131. 131. The Cone™ – Brand Loyalty / Affinity
