2. Digital Transformation
Throughout eternity, all that is of like form comes around again –
everything that is the same must return again in its own
everlasting cycle.....
• Marcus Aurelius – Emperor of Rome •
3. Digital Product Lifecycle Strategy
• Everything that goes around, comes around – everything has its own
lifecycle, in its own time. Things are born, grow, age, and ultimately
they die. It’s easy to spot a lifecycle in action everywhere you look. Just as
a person is born, grows, ages and dies, so does a star, a tree, a
bird, a bee or a civilization – and so does a company, a product, a
technology or a market: everything goes around in a lifecycle of its own.
4. Digital Product Lifecycle Strategy
[Figure: Digital Product Lifecycle. Stages shown: Investment, Product Planning, Product Design, Innovation Prototype / Pilot / Proof-of-concept, Product Launch, Early Growth, Product Maturity, Plateau (Cash Cow), Aging, Decline, Cease Investment, Withdraw, Migrate Customers to new Products, Death.]
7. The CONE™
The CONE™ - Social Intelligence
Getting to the heart of audiences - and
putting audiences back at the heart of marketing.
8. The CONE™ - Audience Measurement
• Due to severe competition, Communications Service Providers (CSPs) such as 3 Mobile, EE,
TalkTalk and Vodafone, along with Mobile Virtual Network Operators (MVNOs) such as Virgin,
Tesco and giffgaff, no longer make significant profit from their core services (Mobile, Fixed-line
and Broadband). This has caused the dash for “Quad-play”, where CSPs add Media and
Entertainment Packages to their core network services offering (Mobile, Fixed-line & Broadband).
• TV Set-top Boxes (Virgin, TalkTalk, Sky, EE) are connected to the Internet and continuously
stream Audience Channel Selection data and Music Play-lists to the Communications Service
Provider (CSP) Audience Insight and Analytics servers. Similarly, Smart Phone Apps (BBC
iPlayer, Sky Go, Netflix, Spotify) continuously stream Audience Channel Selection data and
Music Play-lists to the CSP – via Apigee to AWS Big Data.
• In a typical household (Mother, Father, two children) there may be four Smart Phones and as
many as ten other internet-connected devices (Tablets, Laptops, Internet TVs, TV Set-top Boxes
and Video Games Boxes), all streaming video, audio and data. The details of this traffic are
captured, stored and analysed by the CSP using “Big Data” Analytics techniques. This yields
valuable Audience Metrics and Analytics based on an intimate understanding of consumer video,
audio and internet content consumption. The actionable audience insights derived from this
streaming data drive Personalised Advertising across all devices (Smart Phone, Tablet, Internet
TV, Games Boxes).
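As a purely hypothetical sketch of the aggregation step described above, the Python below turns a stream of channel-selection events into viewing minutes per household and channel. The event fields (`household`, `device`, `minute`, `channel`) and the rule that a selection lasts until the same device's next event are assumptions for illustration, not any CSP's actual data format.

```python
from collections import defaultdict
from typing import NamedTuple


class ChannelEvent(NamedTuple):
    """Hypothetical channel-selection event streamed by a set-top box or app."""
    household: str
    device: str
    minute: int      # minutes since an arbitrary epoch
    channel: str


def viewing_minutes(events, session_end):
    """Sum viewing minutes per (household, channel).

    Assumption: each selection lasts until the same device's next event,
    or until `session_end` for that device's last event.
    """
    by_device = defaultdict(list)
    for e in events:
        by_device[(e.household, e.device)].append(e)

    totals = defaultdict(int)
    for (household, _device), device_events in by_device.items():
        device_events.sort(key=lambda e: e.minute)
        # Pair each event with the start time of the event that follows it.
        next_starts = [e.minute for e in device_events[1:]] + [session_end]
        for event, end in zip(device_events, next_starts):
            totals[(household, event.channel)] += max(0, end - event.minute)
    return dict(totals)


events = [
    ChannelEvent("hh1", "stb", 0, "BBC One"),
    ChannelEvent("hh1", "stb", 30, "Sky Sports"),
    ChannelEvent("hh1", "phone", 10, "Netflix"),
]
minutes = viewing_minutes(events, session_end=60)
```

Per-device grouping matters here because a household's phone and set-top box can be tuned to different content at the same time.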
10. The CONE™ - Social Intelligence
This revolutionary Digital Marketing approach is called the Cone™ – a next-generation
Social Intelligence solution for real-time lifestyle understanding: -
• The Cone™ solution uses Social Intelligence to get right to the heart of every
audience – and puts the audience back at the heart of every media organisation.
• The Cone™ Digital Marketing solution works through Real-time Analytics –
tuning directly into the dynamic nature of people, fashion, media and culture.
• The Cone™ solution analyses intimate audience viewing behaviour using Social
Intelligence and Real-time Insight, inspiring better digital marketing campaigns,
faster – ideas which connect directly with the widest possible network audience.
• Most importantly, the Cone™ solution tracks and understands the changing
behaviour of viewers, fans and audiences and their propensity to engage with
different ideas, lifestyles, interests, needs, passions, aspirations and desires.
13. Cone™ Lifestyle Understanding
What is ‘The Cone’?
• At its simplest, The Cone™ is a visual metaphor that maps the volume of audiences across an
engagement spectrum with regard to how people connect with different passions and ideas.
• At its most sophisticated, the Cone™ delivers total entertainment digital innovation.
Why a Cone?
• The Cone™ shape is informed by the correlation between the volume of audiences and their propensity
to engage with different passions. This Cone shape proves to be universal in its application to brands,
ideas and industries that have ‘fans’, i.e. –
1. The thin, pointy end of the Cone™ -
• Low audience volume but incredibly high engagement and therefore high ‘purchase intent’
2. The fat, base end of the Cone™ -
• High audience volume but low engagement and therefore much lower ‘purchase intent’
• We use our proprietary IP to produce The Cone™ in industries and clients that have fans (or at least
where people engage through ‘passionate interest’ vs mere ‘consumption’). Thus The Cone™ maps
people as fans and audiences with active interests, needs and desires – not just as passive consumers.
15. Cone™ Lifestyle Understanding
How does the Cone work?
• The principle of The Cone™ Audience Metrics & Analytics Solution is firstly to understand
people’s lives, and then to understand the role that different entertainment concepts and content
play in their lives. Using this narrative of understanding, we gain unique insights that help us
make better, more incisive decisions: understanding who our ideas are connecting with, and why,
inspires creative marketing. We then apply The Cone™ creative inspiration to
innovate compelling propositions and ideas that will connect with the widest possible audiences.
• On the surface, The Cone™ profiles people’s propensity to engage with any given lens, e.g. film,
reality TV, music, radio or mobile, along our FECI continuum: ranging from Fanatics through
Enthusiasts to Casuals and the “Indifferent” – and finally the “Unconnected”. We then use proprietary
data analytics to profile and describe groups of similar people within the FECI continuum.
• The Cone™ facilitates our understanding of how groups of like-minded individuals are
connecting (or not connecting…..) with our brand and content. We can then use intimate
personal insights to learn how to inspire the right kinds of ideas and events, to better target brand
positioning and product content and to influence more receptive audiences – so delivering new core
fan connections which drive an expanding and increasingly loyal fan base…..
18. Sony Music: Audience Cone™ / Artist DNA
Sony Music 2007-2011 - Audience Cone™/ Artist DNA
• The key to success at Sony Music was using the Audience Cone™ and
Artist DNA to help A&R Managers and Producers understand the
role music plays in people's lives – and then understand the impact of any
particular genre or specific artist within that audience and cultural context.
• We provided a unique approach to making sense of Digital Marketing and
Social Intelligence as part of an artist's musical and career development.
We called it the Artist DNA – a tool which provides the insightful creative
foundation for all artist releases, tours, appearances and campaigns.
• Today the Cone™ App – our proprietary solution using the Audience
Cone™ and Artist DNA approach – is used by Sony Music in 32 global
territories, placing the audience back at the heart of Sony Music and putting
the artists back at the heart of their audiences: attracting new fans and
re-connecting with old fans to give the widest possible audience and fan-base.
19. The Challenge – American Idol, 2014
• Analyse the Reality TV audience spectrum so that we can better understand who American Idol
fans are, and therefore gain insight into how we can halt the audience decline of 2014…..
• There is a very real and present Reality TV Cone, because there exist distinct Reality TV audience
clusters: discrete groups of people who engage with Reality TV in a variety of different ways…..
• Reality TV is a well-understood lens into how people live out their own lives (they might not admit this),
so we can understand viewers’ lives and lifestyles and engage them through the Reality TV lens.
• We can map this lens through our Fanatics, Enthusiasts, Casuals and Indifferent (FECI) spectrum in
order to place each individual along a continuum of audience interest, affinity, loyalty and engagement.
• We can then profile and segment these people into different groups along the FECI spectrum, and
so identify those within these groups who have a greater propensity and appetite for American Idol: -
– Viewers with an increased or decreased awareness of the Reality TV genre
– Viewers with a higher or lower interest in Reality TV shows / media coverage
– Viewers with a greater or lesser knowledge of Reality TV presenters / participants
– Viewers who invest more or less time in consuming Reality TV – live / streamed content
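A minimal sketch of how the four viewer indicators above might be combined into a single engagement score and placed on the FECI spectrum. The 0–1 indicator scale, the equal weights and the band thresholds are hypothetical assumptions for illustration; the Cone™ profiling itself is proprietary and is not described here.

```python
from dataclasses import dataclass


@dataclass
class RealityTvViewer:
    """Hypothetical 0-1 scores for the four indicators listed above."""
    awareness: float   # awareness of the Reality TV genre
    interest: float    # interest in shows / media coverage
    knowledge: float   # knowledge of presenters / participants
    time_spent: float  # time invested in live / streamed content


# Illustrative equal weights and band thresholds (assumptions, not calibrated values).
WEIGHTS = {"awareness": 0.25, "interest": 0.25, "knowledge": 0.25, "time_spent": 0.25}
FECI_BANDS = [(0.75, "Fanatics"), (0.5, "Enthusiasts"), (0.25, "Casuals"), (0.0, "Indifferent")]


def engagement_score(viewer: RealityTvViewer) -> float:
    """Weighted average of the four indicators, in the range 0-1."""
    return sum(getattr(viewer, name) * weight for name, weight in WEIGHTS.items())


def feci_band(viewer: RealityTvViewer) -> str:
    """Place a viewer on the FECI spectrum using fixed score thresholds."""
    score = engagement_score(viewer)
    for threshold, band in FECI_BANDS:
        if score >= threshold:
            return band
    return "Indifferent"  # only reached for negative (invalid) scores
```

Fixed thresholds keep each viewer's band stable as the population changes; a rank-based split is the alternative when fixed proportions per band are wanted.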
22. The Cone™ Application
• Where old-school audience analysis was retrospective and fixed, the
new Cone™ data science is lean, agile, current, fluid and predictive.
• The Cone™ App takes our proven Audience Cone™ and Artist DNA
approach and puts it online to render a custom lens for an audience; a
lens you can zoom, pan and focus to reveal more hidden detail.
• The Cone™ App applies data science and digital analytics principles to
generate innovative marketing insights, translated into a narrative of
real-time audience understanding that answers the six key questions: -
1. What’s happening now?
2. Who’s making it happen?
3. Where is it happening?
4. Why is it happening?
5. When is it happening?
6. How is it happening?
23. The Cone™ Application
[Diagram: The Cone™ (Customer Loyalty & Brand Affinity) at the centre, linked to Social Intelligence, Customer Management (CRM / CEM), Campaign Management, e-Business, Big Data Analytics and Smart Apps. Inputs shown: Cloud, CRM Data, Profile Data, Audience Survey Data and TV Set-top Box data. Outputs shown: Insights and Reports.]
24. Proof-of-concept and Prototype
The Cone™ approach is lean, agile, smart and creative: -
• We start by providing a custom Cone™ app as a proof of concept. We then work
with key client stakeholders to scope a detailed brief which articulates a business
problem domain that the Cone™ can help resolve.
• Under normal circumstances we use all current and past audience research and
any other available internal data to first establish a baseline client Cone™.
• We then augment this by overlaying external data – Social Media Intelligence and
other live streamed audience data – that will provide our new real-time view of the who /
what / why / where / when and how of fan-base and lifestyle understanding.
• Lastly, we apply this social intelligence understanding as new actionable insights
to inform creative marketing campaign solutions against the agreed brief.
• Post proof-of-concept, we then agree a fixed-term Cone™ app licence along with
Cone™ consulting, mentoring and support – on demand, as and when required.
25. The Cone™ – Model Design and Delivery
Phase | Step | Description | Input | Design Process | Output | Cost (estimate) | Skill Set
1 | 1 | Cone™ Model Data Analysis / Design | User Requirements | Data Analysis & Data Modelling | Cone™ Logical Data Model | £k | Business / Data Analyst
1 | 2 | Cone™ Data Design – Questionnaire | User Requirements | Data Analysis & Data Modelling | Questionnaire Survey Form | £k | Business / Data Analyst
1 | 3 | Cone™ Physical Database Design | Logical Data Model | Cone™ Database Design | Physical Cone™ Design | £k | Data Analyst / DBA
1 | 4 | Cone™ Data Load – Questionnaire / Survey Forms | Physical Data Model, Survey Questionnaire | Cone™ Model Calibration and Tuning Runs | Initialised Cone™ Model | £k | Business / Data Analyst, DBA
2 | 5 | Cone™ Data Load – In-house CRM and Audience Data | Physical Data Model, People CRM Data | Cone™ Model CRM Data Load | Populated Cone™ Model | £k | Business / Data Analyst, DBA
2 | 6 | Cone™ Profiling | Cone™ Clustering Algorithms | Cone™ Model Data Profiling – Kernel k-means | Profiled Cone™ Model | £k | Data Analyst, DBA, Data Scientists
3 | 7 | Cone™ Streaming and Segmentation | Historic Sales and CRM Data | Cone™ History Matching Runs | Cone™ Historic Trends | £k | Data Scientists
3 | 8 | Cone™ Real-time Social Media Feeds | Global Social Intelligence | Cone™ Real-Time Analytics | Actionable Cone™ Insights | (variable with Cone™ total data volume) | Data Scientists
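Step 6 names Kernel k-means as the profiling algorithm. The actual Cone™ implementation is not described, so the following is only a generic, dependency-free sketch of kernel k-means with a Gaussian (RBF) kernel; the `gamma` value, the deterministic farthest-first seeding and the toy two-feature profiles are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_kmeans(points, k, gamma=0.5, max_iter=100):
    """Cluster `points` into k groups with kernel k-means; returns one label per point."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]

    def feature_dist(i, j):
        # Squared distance between phi(x_i) and phi(x_j) in kernel feature space.
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Deterministic farthest-first seeding (an assumption; k-means++ is a common alternative).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(feature_dist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: feature_dist(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term: (1 / |C|^2) * sum over j, l in C of K[j][l]
        within = [
            sum(K[j][l] for j in members for l in members) / len(members) ** 2 if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, members in enumerate(clusters):
                if not members:
                    continue
                # ||phi(x_i) - mu_c||^2 = K_ii - (2 / |C|) * sum_j K_ij + within_c
                d = K[i][i] - 2.0 * sum(K[i][j] for j in members) / len(members) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


# Hypothetical audience profiles with two engagement features each.
profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kernel_kmeans(profiles, k=2)
```

The kernel matrix is n × n, so this sketch only suits small samples; the point is to show that cluster centres are never computed explicitly, only distances through kernel evaluations.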
27. The Cone™
The Cone™ – Digital Marketing
– turning Social Intelligence into Actionable Marketing Insights / Sales Opportunities…
1. Education Cone™ – Training and Education Business Scenario and Use Cases
2. Utilities Cone™ – Water, Gas and Electricity Business Scenario and Use Cases
3. Media Cone™ – Broadband, Land-line, Mobile and Entertainment Business Scenario and Use Cases
4. Music Cone™ – Brand / Genre / Label / Artists Business Scenario and Use Cases
5. Political Cone™ – Party and Voter Election Business Scenario and Use Cases
6. Fashion Cone™ – Fashion and Luxury Brands Business Scenario and Use Cases
7. Sports Cone™ – Elite Team Sports Franchise Business Scenario and Use Cases
8. Patient Cone™ – Digital Healthcare / Medical Business Scenario and Use Cases
29. Telematics
The Internet of Things (IoT) – Smart Devices, Smart Apps, Wearable
Technology, Vehicle Telemetry, Smart Homes and Building Automation
SMACT/4D Digital Technology Stack
31. Social Intelligence – Brand Loyalty and Affinity
CONE SEGMENTS – Brand Loyalty and Affinity
Social Intelligence drives Brand Loyalty and Affinity, Lifestyle Understanding - Fan-base Profiling, Streaming and
Segmentation and marketing Campaigns – expressed in the creation and maintenance of a detailed History and
Balanced Scorecard for every individual in the Cone, allowing summation by Stream / Segment: -
1. Inactive – need to draw their attention towards the Brand
2. Indifferent – need to educate them about core Brand Values
3. Disconnected – need to re-engage with the Brand
4. Casuals – exhibit Brand awareness and interest
5. Followers – follow the Brand, engage with social media and consume brand communications
6. Enthusiasts – engaged with the Brand, participate in Brand / Product / Media events and merchandising
7. Supporters – show strong need, desire and propensity to support Brand / Product / Media consumption
8. Fanatics – demonstrate total Commitment / Dedication / Loyalty for all aspects of the Brand / Product / Media
PROPENSITY – Balanced Scorecard
• Balanced Scorecard – is a summary of all the data-points for an Individual / Stream / Segment
• Propensity Score – In the statistical analysis of observational data, Propensity Score Matching (PSM) is a
statistical matching technique that attempts to estimate the effect of a Campaign / Offer / Promotion or other
intervention by accounting for the factors (covariates) that predict whether an individual receives that Campaign / Offer / Promotion.
• Propensity Model – is the Bayesian probability of the outcome of an event in an Individual / Stream / Segment
• Predictive Analytics – an area of data mining that deals with extracting information from data and using it to
predict trends and behaviour patterns. Often the unknown event of interest is in the future; however, Predictive
Analytics can be applied to any type of event with an unknown outcome – in the past, present or future.
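To make the Propensity Score and Propensity Model definitions concrete, the sketch below shows (a) the matching step of propensity score matching, pairing each individual who received a campaign with the untreated individual whose propensity score is closest and averaging the outcome differences, and (b) one simple Bayesian propensity estimate: the Beta-Binomial posterior mean response rate for a Stream / Segment. The propensity scores are assumed to have been estimated already (for example by logistic regression on the covariates); the data, the `caliper` parameter and the uniform Beta(1, 1) prior are illustrative assumptions.

```python
def att_by_nearest_neighbour(treated, controls, caliper=None):
    """Estimate the average effect of a campaign on those who received it (ATT).

    `treated` and `controls` are lists of (propensity_score, outcome) pairs.
    Each treated individual is matched, with replacement, to the control with
    the closest propensity score; pairs further apart than `caliper` are dropped.
    """
    differences = []
    for score, outcome in treated:
        match_score, match_outcome = min(controls, key=lambda c: abs(c[0] - score))
        if caliper is not None and abs(match_score - score) > caliper:
            continue
        differences.append(outcome - match_outcome)
    if not differences:
        raise ValueError("no treated individual could be matched")
    return sum(differences) / len(differences)


def bayesian_propensity(responses, contacts, prior_a=1.0, prior_b=1.0):
    """Posterior mean response probability under a Beta(prior_a, prior_b) prior."""
    return (prior_a + responses) / (prior_a + prior_b + contacts)


# Hypothetical (propensity score, spend) pairs for campaign recipients and non-recipients.
treated = [(0.80, 10.0), (0.30, 6.0)]
controls = [(0.75, 7.0), (0.35, 5.0), (0.10, 2.0)]
effect = att_by_nearest_neighbour(treated, controls)
segment_propensity = bayesian_propensity(responses=3, contacts=8)
```

Matching on the single propensity score, rather than on every covariate, is what keeps the comparison tractable when each individual carries many data-points.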
32. Social Intelligence – Streaming and Segmentation
[Diagram: The Cone™ – Streaming & Segmentation. Hybrid Cone – 3 Dimensions: Social Interaction, Brand Affinity and Geo-demographic Profile (Experian Mosaic – 15 Groups (Streams), 66 Types (Segments)).]
33. Social Intelligence – Social Interaction
Social Interaction Cone Rules
1. Inactive – not engaged – low evidence / low affinity / low interest in Social Media
2. Lone Wolf – sparse / thin social network - may share negative information (Trolling)
3. Home Boy – Social Network clustered around Home Location Postcodes (Gang Culture)
4. Eternal Student – Social Network clustered around School / College / University Alumni
5. Workplace – Social Network clustered around Work and Colleagues (e.g. City Brokers, Traders)
6. Friends and Family – Social Network clustered around physical social contacts - Friends and Family
7. Enthusiast – Social Network clustered around shared, common interests – Sport, Music and Fashion etc.
8. Promiscuous – Open Networker – virtual Social Network across all categories – will connect with anybody
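The eight Social Interaction Cone Rules read naturally as a rule-based classifier over an individual's social network. The sketch below is a hypothetical encoding: each contact is tagged with how the individual knows them (`home`, `education`, `work`, `family`, `interest`), and the thresholds (fewer than 5 contacts for a Lone Wolf, a 50% share for a dominant cluster) are illustrative assumptions rather than the Cone™ rules themselves. Trolling behaviour is not modelled.

```python
from collections import Counter

# Hypothetical mapping from a dominant contact context to a Social Interaction segment.
CONTEXT_SEGMENTS = {
    "home": "Home Boy",
    "education": "Eternal Student",
    "work": "Workplace",
    "family": "Friends and Family",
    "interest": "Enthusiast",
}


def social_interaction_segment(contact_contexts, active=True, min_contacts=5, dominance=0.5):
    """Assign one Social Interaction segment from a list of contact context tags."""
    if not active:
        return "Inactive"        # not engaged with Social Media
    if len(contact_contexts) < min_contacts:
        return "Lone Wolf"       # sparse / thin social network
    counts = Counter(contact_contexts)
    context, top = counts.most_common(1)[0]
    if top / len(contact_contexts) >= dominance and context in CONTEXT_SEGMENTS:
        return CONTEXT_SEGMENTS[context]  # network clustered around one context
    return "Promiscuous"         # no single context dominates: open networker
```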
Number of Segments
• With anonymous data (e.g. surveys and polls), the number of initial Segments is 4 (Matt Hart). With people
data (named individuals) we can discover much richer internal and external data from multiple sources (Social
Media / User Content / Experian) – and therefore segment the population with greater granularity.
Individuals Qualifying for Multiple Segments
• When individuals qualify for multiple segments, we can either add these deviant (non-standard) individuals to
the Segment that they have the greatest affinity with, or kick out any such deviants into an Outlying / Outcast /
Miscellaneous Segment for further statistical processing or for processing through manual intervention.
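The two options for individuals qualifying for multiple segments can be sketched as a small resolution policy. The affinity scores, the qualification `threshold` and the segment names used in the example are hypothetical; only the two policies (greatest affinity, or an Outlying segment) come from the text.

```python
def assign_segment(affinities, threshold=0.5, policy="greatest", outlier_segment="Outlying"):
    """Resolve an individual's segment from per-segment affinity scores.

    Segments whose affinity meets `threshold` are qualifying. A single
    qualifying segment is returned directly. With several, policy "greatest"
    picks the highest-affinity segment, while policy "outlier" routes the
    individual to `outlier_segment` for further processing.
    Returns None when no segment qualifies.
    """
    qualifying = {seg: a for seg, a in affinities.items() if a >= threshold}
    if not qualifying:
        return None
    if len(qualifying) == 1:
        return next(iter(qualifying))
    if policy == "greatest":
        return max(qualifying, key=qualifying.get)
    if policy == "outlier":
        return outlier_segment
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical affinity scores for one individual.
scores = {"Casuals": 0.55, "Followers": 0.8, "Enthusiasts": 0.3}
```

The "greatest" policy keeps every individual inside a standard segment; the "outlier" policy trades coverage for cleaner segments by setting ambiguous cases aside.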
34. Social Intelligence – Actionable Insights
[Diagram: The Cone™ – Actionable Insights. Hybrid Cone – 3 Dimensions: Brand Affinity, Social Interaction and Geo-demographic Profile (Experian Mosaic – 15 Groups (Segments), 66 Types (Streams)). The Cone™ – Brand Loyalty & Affinity: Fanatics – 10%, Enthusiasts – 20%, Casuals – 30%, Indifferent – 40%.]
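One way to reproduce the Fanatics 10% / Enthusiasts 20% / Casuals 30% / Indifferent 40% split is to rank individuals by an engagement score and cut the ranking at those cumulative proportions. This sketch assumes a single engagement score per individual is already available (hypothetical) and treats the percentages as quotas rather than observed shares; ties keep their input order.

```python
# Band proportions from the Cone diagram, as cumulative percentages.
CONE_BANDS = [(10, "Fanatics"), (30, "Enthusiasts"), (60, "Casuals"), (100, "Indifferent")]


def cone_bands(scores):
    """Map each individual's engagement score to a Cone band by rank.

    The top 10% by score become Fanatics, the next 20% Enthusiasts,
    the next 30% Casuals and the remaining 40% Indifferent.
    Returns a dict of individual -> band.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # stable for ties
    n = len(ranked)
    bands = {}
    for rank, person in enumerate(ranked):
        # Integer comparison avoids floating-point edge cases at band boundaries.
        for cumulative_pct, band in CONE_BANDS:
            if rank * 100 < cumulative_pct * n:
                bands[person] = band
                break
    return bands


# Hypothetical engagement scores for ten individuals.
scores = {f"p{i}": i for i in range(10)}
bands = cone_bands(scores)
```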
35. Social Interaction
How consumers use social media (e.g., Facebook, Twitter) to address and/or engage with companies around social and environmental issues.
37. The chart above illustrates the richness and diversity of social media.....
38. The pattern of Social Relationships.....
Social Media is the fastest growing category of user-provided global content and will eventually grow
to 20% of all internet content. Gartner defines social media content as unstructured data created,
edited and published by users on external platforms including Facebook, MySpace, LinkedIn, Twitter,
Xing, YouTube and a myriad of other social networking platforms - in addition to internal Corporate
Wikis, special interest group blogs, communications and collaboration platforms.....
Social Mapping is the method used to describe the social linkages between individuals, in order to
define Social Networks and to understand the nature of intimate relationships between individuals.
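Social Mapping can be illustrated by treating individuals as nodes and observed interactions as edges; each connected component of the resulting graph is then one distinct social network. The interaction pairs and names below are hypothetical, and real social mapping would also weight ties by strength and type.

```python
from collections import defaultdict, deque


def social_map(interactions):
    """Build an undirected social graph from (person, person) interaction pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def social_networks(graph):
    """Return the connected components (distinct social networks) via breadth-first search."""
    seen, networks = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            person = queue.popleft()
            component.add(person)
            for friend in graph[person]:
                if friend not in seen:
                    seen.add(friend)
                    queue.append(friend)
        networks.append(component)
    return networks


# Hypothetical interactions harvested from social media.
graph = social_map([("ann", "bob"), ("bob", "cat"), ("dan", "eve")])
networks = social_networks(graph)
```

The degree of each node (`len(graph[person])`) is also a simple first signal for rules such as the Lone Wolf segment.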
40. Traditional CRM was very much based around data and information that brands could collect
on their customers, all of which would go into a CRM system that then allowed the company
to better target various customers. CRM comprised sales, marketing and service /
support–based functions whose purpose was to move the customer through a pipeline with
the goal of keeping the customer coming back to buy more and more stuff......
TRADITIONAL CRM – Customer Management Pipeline
41. Evolution of CRM to SCRM - The challenge for organizations now is adapting and evolving
to meet the needs and demands of these new social customers - many organizations still
do not understand the CRM value of social media.....
SOCIAL CRM – Social Media Conversations
42. In Social CRM - the customer is actually the focal point of how an organization operates. Instead of
marketing products or pushing messages to customers, brands now talk to and collaborate with
their customers to solve business problems, empower customers to shape their own Customer
Experience and Journeys and develop strong customer relationships – which will, over time, turn
participants into brand evangelists and positive customer advocates.....
SOCIAL CRM – Social CRM Processes
43. Posted on April 20, 2010 by Laurance Buchanan - Capgemini
SOCIAL CRM – a Business Framework and Operating Model
44. Social Graphs and Market Sentiment
• Using “Big Data” to drive Market Sentiment •
Unprompted online conversations, statements and news create an online reflection of real-life events and
issues – influencing the thoughts of individual consumers and so shaping Market Sentiment, which must be
monitored to manage Reputational Risk. The Social Media data, Blogs and News feeds that form this digital
mirror of the world provide a gold mine of actionable information.....
45. • Influencer Programmes have a long history in
industries such as software, computers and
electronics – but today they are successfully
deployed across all types of industries including
automotive, smart phones, fashion, health and
nutrition, wine, sports, music, technology, travel,
tourism and leisure – and financial services.....
• In a hyper-connected world, market-makers and
influencers increasingly provide the gateway to
decision makers who drive consumer behaviour.
• Unprompted online conversations, statements
and news create an online reflection of real-life
events and issues – influencing the thoughts of
individual consumers and so shaping Market
Sentiment.
• The Social Media data and News feeds that form
this digital mirror of the world provide a gold
mine of information. However, unlocking the
data is not straightforward as it requires a
complex and unique set of technologies, skills
and methods.....
INFLUENCER PROGRAMMES – Social Media Conversations
47. SalesForce.com – a Cloud Platform Social CRM Business Solution
[Diagram: The Cone™ – Digital Marketing / The Cone™ – Lifestyle Understanding. The Cone™ (Customer Loyalty & Brand Affinity) at the centre, linked to Customer Management (CRM / CEM), Social Intelligence, Campaign Management, e-Business, Big Data Analytics, Smart Apps, Alarms & Alerts and Reporting.]
48. Digital Marketing – Solution Options
Vendor | Social Intelligence | Mobile | Big Data | Analytics | Cloud | CRM / CEM
Amazon + Salesforce | Anomaly 42 | Apple iOS + Android | AWS Elastic MapReduce (EMR), AWS S3 | “R” Revolution, Kernel k-means | AWS EC2 | SalesForce + 3rd Party Apps Store
Google | Google Analytics | Google Nexus | Google Hadoop | Google Analytics | Google Cloud | Google Office + Apps
IBM | – | – | IBM InfoSphere BigInsights | – | IBM Cloud | –
Microsoft | – | Nokia, Windows 8 for Mobile | Microsoft SQL Server + Hadoop | Microsoft Analytics, .NET, C# | Windows Azure HDInsight | Microsoft Office 365 + Dynamics
Oracle | – | – | Oracle DBMS + Hadoop | OBIEE | Oracle Cloud | Oracle CRM and EBS
SAP | – | SUP + Fiori | SAP HANA + Hadoop | Business Objects | SAP HANA Cloud | SAP CRM + Hybris
49. The Cone™ – Digital Marketing
[Diagram: The Cone™ – Lifestyle Understanding and The Cone™ – Brand Loyalty and Affinity, with The Cloud (SalesForce.com, Amazon Web Services (AWS)), Social Intelligence, Data Science / Big Data Analytics, Customer Experience & Journey – CRM / CEM, Alarms / Alerts, Reporting, e-Business and Smart Apps.]
50. The Cone™ – Digital Marketing
Connecting the Unconnected…..
• FMCG, Media, Entertainment and other enterprises which supply products and services
indirectly to consumers – via Channel Partners such as Distributors, Dealers, Wholesalers
and Retailers – are not directly connected to their customer base. In order to drive brand
strategy and customer loyalty / affinity – they have to reach out to, contact and connect
with, on the most intimate terms - the widest possible range of end-user consumers: -
– Music (e.g. BBC and Sony Music)
– Broadcasting (e.g. Radio 1 / American Idol)
– Digital Media Content (e.g. Sony Films / Netflix)
– Sports Franchises (e.g. Manchester City / New York City)
– Fast Fashion Retailers (e.g. ASOS, Next, New Look, Primark, Top Shop)
– Luxury Brands / Aggregators (e.g. Armani, Burberry, Versace / LVMH, PPR, Richemont)
– Multi-channel Retailers – Loyalty, Campaigns, Offers and Promotions
– Financial Services Companies – Brand Protection and Reputation Management
– Travel, Leisure and Entertainment Organisations - Destination Resorts and Events
– MVNO / CSPs - OTT Business Partner Analytics (Sky Go, Netflix via Firebrand / Apigee)
– Telco, Media and Communications - Churn Management / Conquest / Up-sell / Cross-sell Campaigns
– Digital Healthcare – Private / Public Healthcare Service Provisioning: - Geo-demographic Clustering and
Propensity Modelling (Patient Monitoring, Wellbeing, Clinical Trials, Morbidity and Actuarial Outcomes)
51. The Cone™ – Eight Primitives
Primitive | Problem / Opportunity | Business Domain | System Function | Software Product
Who? | Who are our Customers? | Party – People / Organisations | CRM / CEM | SalesForce.com – Customer Management
What? | What are they saying about us? | Social Media / Communications | Social Intelligence | Google Analytics, Anomaly 42
Why? | Why – their Interest / Behaviour / Motivation / Aspirations / Desires? | Brand Identity / Loyalty / Affinity / Offers / Promos | Marketing, Campaign Management | Predictive Analytics / Propensity Modelling
Where? | Where do they Live / Work / Shop / Relax? | Places – Location | GIS / GPS | Geospatial Analytics
When? | When do they contact / buy products from us? | Time / Date | Contact Event / Sales Transaction | Multi-channel Retail / Mobile Platforms
How? | How do they contact and connect with us – Media / Telecoms Channels? | Communications Channel | Mobile, Internet, In-store | Multi-channel Retail / Mobile Platforms
Which? | Which Brands / Ranges / Categories / Products? | Retail Merchandising | Product Catalogue | IBM Product Centre / Stibo / Kalido
Via? | Via Business Partners / 3rd Party Channels? | Sales Channel | Retail Channel / Outlet | Amazon, eBay, Alibaba
52. The Cone™ – Eight Primitives
[Diagram (Version 2 – Media Co’s): a central Cone™ MEDIA FACT surrounded by eight dimensions:
• Party Dimension (WHO?): Person, Organisation; Customer: Indifferent, Casuals, Enthusiasts, Fanatics
• Event Dimension (WHAT?): Radio Show, Television Show, Internet Advert, Campaign, Offer, Promotion, Pre-order, Purchase, Download, Playlist, Booking, Attendance
• Geographic Dimension (WHERE?): Region / Country, State / County, City / Town, Street / Building, Postcode
• Time Dimension (WHEN?): Time / Date
• Media Dimension (HOW?): Advert / Publicity, Posting / Blog – Facebook, LinkedIn, Myspace, Twitter, YouTube, Xing
• Motivation Dimension (WHY?): Awareness, Interest, Need, Desire
• Product Dimension (WHICH?): Category, Label / Artist, Album / Track, Tour / City / Arena, Merchandise
• Channel Dimension (VIA?): Channel / Partner, In-store, Internet Service, Mobile Smart App (Spotify etc.)
Attributes shown: Advert / Publicity Type, Sales Channel, Posting / Blog Source / Type, Subject, Location, Media, Event, Motivation, Customer, Time / Date.]
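The Eight Primitives describe a dimensional (star-schema) model: one Cone™ MEDIA FACT keyed by eight dimensions. The sketch below flattens that idea into a single hypothetical fact record with one field per primitive, plus a roll-up along any one dimension. The field names, example values and the `count_by` helper are assumptions; a production model would hold surrogate keys into separate dimension tables rather than strings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFact:
    """One row of a hypothetical Cone MEDIA FACT table, keyed by the eight primitives."""
    who: str     # Party dimension: person or organisation
    what: str    # Event dimension, e.g. "Download", "Purchase"
    where: str   # Geographic dimension, e.g. a postcode district
    when: str    # Time dimension, ISO date
    why: str     # Motivation dimension: Awareness / Interest / Need / Desire
    how: str     # Media dimension, e.g. "Twitter", "YouTube"
    which: str   # Product dimension, e.g. "Album", "Tour"
    via: str     # Channel dimension, e.g. "Spotify", "In-store"


def count_by(facts, dimension):
    """Roll up fact rows along one dimension (a simple OLAP-style slice)."""
    return Counter(getattr(fact, dimension) for fact in facts)


facts = [
    MediaFact("fan-1", "Download", "M1", "2014-03-01", "Desire", "Twitter", "Album", "Spotify"),
    MediaFact("fan-2", "Purchase", "M1", "2014-03-01", "Need", "YouTube", "Tour", "In-store"),
    MediaFact("fan-1", "Playlist", "SW1", "2014-03-02", "Interest", "Twitter", "Track", "Spotify"),
]
```

Keeping every event in one fact table with the same eight keys is what lets any question (who, what, where, when, why, how, which, via) be answered by the same roll-up.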
53. Social Intelligence – Profiling and Analysis
[Diagram: The Cone™ – Profiling & Analysis. The Cone™ – Brand Loyalty & Affinity: Fanatics – 10%, Enthusiasts – 20%, Casuals – 30%, Indifferent – 40%.]
54. The Cone™ – Model Development
[Diagram: six development steps (Steps 1–6): Initialise Cone™ Model, Cone™ Model Design, Data Load, Cone™ Model Calibration and Tuning, Cone™ History Matching and Cone™ Real-Time Analytics. Inputs shown: Survey Script Data, Data Model, Customer Data, Profiling Data, Historic Data and Real-Time Data. Outputs shown: Cone™ Model Database Design, Populated Cone™ Model, Profiled Cone™ Model, Historic Trends and Actionable Insights.]
55. The Cone™ – Model Delivery
Phase | Step | Description | Input | Design Process | Output | Cost (estimate) | Skill Set
1 | 1 | Cone™ Model Data Analysis / Design | User Requirements | Data Analysis & Data Modelling | Cone™ Logical Data Model | £k | Business / Data Analyst
1 | 2 | Cone™ Data Design – Questionnaire | User Requirements | Data Analysis & Data Modelling | Questionnaire Survey Form | £k | Business / Data Analyst
1 | 3 | Cone™ Physical Database Design | Logical Data Model | Cone™ Database Design | Physical Cone™ Design | £k | Data Analyst / DBA
1 | 4 | Cone™ Data Load – Questionnaire / Survey Forms | Physical Data Model, Survey Questionnaire | Cone™ Model Calibration and Tuning Runs | Initialised Cone™ Model | £k | Business / Data Analyst, DBA
2 | 5 | Cone™ Data Load – In-house CRM and Audience Data | Physical Data Model, People CRM Data | Cone™ Model CRM Data Load | Populated Cone™ Model | £k | Business / Data Analyst, DBA
2 | 6 | Cone™ Profiling | Cone™ Clustering Algorithms | Cone™ Model Data Profiling – Kernel k-means | Profiled Cone™ Model | £k | Data Analyst, DBA, Data Scientists
3 | 7 | Cone™ Streaming and Segmentation | Historic Sales and CRM Data | Cone™ History Matching Runs | Cone™ Historic Trends | £k | Data Scientists
3 | 8 | Cone™ Real-time Social Media Feeds | Global Social Intelligence | Cone™ Real-Time Analytics | Actionable Cone™ Insights | (variable with Cone™ total data volume) | Data Scientists
56. The Cone™ – Model Implementation
[Diagram: stages Initialise Cone™ Model, Cone™ Model Design, Data Load, Cone™ Model Calibration and Tuning, Cone™ History Matching and Cone™ Real-Time Analytics, with these inputs, outputs and roles:
• Data Model → Database Schema (Business Analyst, DBA)
• Survey Data → Cone™ Model (Data Architect, DBA)
• CRM Data → Populated Cone™ Model (Data Architect, DBA)
• Stream and Segment Data → Profiled Cone™ Model (Data Architect, DBA)
• Historic Data → Historic Trends (Data Architect, Data Scientists)
• Real-Time Data → Actionable Insights (Data Architect, Data Scientists)]
57. The Cone™ – Digital Marketing
Data Streams into Revenue Streams…..
• Digital Marketing is the communication, advertising and marketing of
brands, products and services via multiple digital channels and channel
partners in order to reach out to, contact and connect, on the most intimate
terms, with the widest possible range of consumers. Through the exploitation of
Digital Media we can initiate and maintain engaging Social Conversations.
• Digital Marketing extends key Brand Messages across every digital platform,
from simple internet marketing to mobile, broadcast and social media channels,
yielding Social Intelligence data from which we discover actionable Marketing
Insights – which in turn convert digital Data Streams into Revenue Streams.
• The key objective of Digital Marketing is to reach out to, contact and connect
directly with carefully selected consumers, so that we create strong, lasting
and durable relationships in order to promote key brand, category and product
messages to targeted consumers – and thus develop a tangible, valuable, very
real and distinct brand / category / product interest, following, affinity and loyalty.
58. The Cone™
Converting Data Streams into Revenue Streams
[Diagram: “Big Wheel keeps on turning – Perfect Store”. Components shown: End User; Social Media and Social Intelligence (User Content and Blogs, Social Groups and Networks; Anomaly 42); Big Data Analytics (The Cone™); Geo-demographics (Experian: Streaming, Segmentation, Household Data; People and Places; Households); Social CRM (Salesforce CRM); Campaigns, Offers and Promotions (Unica); E-Commerce Platform; Fulfilment (Sales Orders); with Insights and Actionable Marketing Insights flowing between them.]
60. “DATA SCIENCE” – my own special area of business expertise
[Diagram: SOCIAL CRM – The Emerging Big Data Stack. Column headings: Targeting – Map / Reduce; Consume – End-User Data; Data Acquisition – High-Volume Data Flows.
– Mobile Enterprise Platforms (MEAPs): Smart Devices, Smart Apps, Smart Grid
– Data Discovery and Collection: News Feeds and Digital Media, Global Internet Content, Social Mapping, Social Media, Social CRM
– Analytics Engines – Hadoop: Apache Hadoop Framework (HDFS, MapReduce), MATLAB, “R”, Autonomy, Vertica
– Data Delivery and Consumption: Clinical Trial, Morbidity and Actuarial Outcomes; Market Sentiment and Price Curve Forecasting; Horizon Scanning, Tracking and Monitoring; Weak Signal, Wild Card and Black Swan Event Forecasting
– Data Presentation and Display: Excel, Web, Mobile
– Data Management Processes: Data Audit, Data Profile, Data Quality Reporting, Data Quality Improvement, Data Extract, Transform, Load
– Performance Acceleration: GPUs (massive parallelism), SSDs (in-memory processing), DBMS (ultra-fast data replication)
– Data Management Tools: DataFlux, Embarcadero, Informatica, Talend
– Info. Management Tools: Business Objects, Cognos, Hyperion, Microstrategy, Biolap, Jedox, Sagent, Polaris
– Data Warehouse Appliances: Teradata, SAP HANA, Netezza (now IBM), Greenplum (now EMC2), Extreme Data xdg, Zybert Gridbox
– Also shown: Ab Initio, Ascential, Genio, Orchestra]
63. The Cone™ – Consumer Cycle
[Diagram: The Cone™ – Consumer Cycle, “Big Wheel keeps on turning – Perfect Store”. Components shown: Salesforce, Anomaly 42, Cone, Unica; End User; Big Data Analytics; Brand Affinity; Campaign; CRM; Insights; Sales; People; Demographics and Household Data; Social Intelligence (User Content, Social Groups and Networks); Offers and Promotions; People & Places; Profiling (Streaming & Segmentation); e-Business; Smart Apps.]
64. Hadoop
Clustering and Managing Data
Managing Data Transfers in Networked Computer Clusters using Orchestra
To illustrate I/O bottlenecks, we studied the impact of data transfers in two clustered computing systems:
– Hadoop, using a trace from a 3000-node cluster at Facebook
– Spark, a MapReduce-like framework with iterative machine learning and graph algorithms
Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, Ion Stoica
University of California, Berkeley
{mosharaf, matei, jtma, jordan, istoica}@cs.berkeley.edu
65. Hadoop Framework
• The workhorse relational database has been the tool of choice for businesses for well over 20 years. Challengers have come and gone, but the trusty RDBMS remains the foundation of almost all enterprise systems today, including almost all transactional and data warehousing systems. The RDBMS has earned its place as a proven model that, despite some quirks, is fundamental to the integrity and operational success of IT systems around the world.
• The relational database is finally showing some signs of age, as data volumes and network speeds grow faster than the hardware gains delivered under Moore's Law can keep pace with. The Web in particular is driving innovation in new ways of processing information, as the data footprints of Internet-scale applications become prohibitive for traditional SQL database engines.
• When it comes to database processing today, change is being driven by (at least) four factors:
– Speed. The seek times of physical storage are not keeping pace with improvements in network speeds.
– Scale. Scaling the RDBMS out efficiently is difficult (i.e. clustering beyond a handful of servers is notoriously hard).
– Integration. Today's data processing tasks increasingly have to access and combine data from many different non-relational sources, often over a network.
– Volume. Data volumes have grown from tens of gigabytes in the 1990s to hundreds of terabytes, and often petabytes, in recent years.
67. RDBMS and Hadoop: Apples and Oranges?
• Figure 1, below, compares the overall differences between relational database (RDBMS) platforms and MapReduce-based systems such as Hadoop.
• From this it is clear that the MapReduce model cannot replace the traditional enterprise RDBMS. However, it can be a key enabler of a number of interesting scenarios that can considerably increase flexibility, shorten turn-around times, and make it possible to tackle problems that were not feasible before.
• With RDBMS platforms, SQL-based processing of data sets tends to fall away and stop scaling linearly beyond a specific volume ceiling, usually just a handful of nodes in a cluster. With MapReduce, you can consistently obtain performance gains by increasing the size of the cluster. For batch jobs whose work is almost entirely parallel, the relationship is close to linear: double the size of a Hadoop cluster and a job runs roughly twice as fast, quadruple it and it runs roughly four times as fast, although fixed overheads such as job start-up and the shuffle between the map and reduce phases keep real gains somewhat below this ideal.
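The near-linear scaling described above holds only when almost all of a job's work can be spread across nodes. As a rough, hedged illustration (Amdahl's law is a standard model, not something taken from the original slides), the sketch below computes the ideal speedup S(n) = 1 / ((1 - p) + p / n) for a job whose parallel fraction is p, run on n nodes. The fractions used are illustrative assumptions, not measured Hadoop figures.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` workers when only `parallel_fraction` of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0 or nodes < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and nodes must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Illustrative parallel fractions only; real jobs must be measured.
    for p in (1.0, 0.99, 0.95):
        speedups = ", ".join(f"{n} nodes: {amdahl_speedup(p, n):.1f}x" for n in (10, 20, 40))
        print(f"parallel fraction {p:.2f} -> {speedups}")
```

With p = 1.0 the speedup tracks the node count exactly; with p = 0.95, going from 20 to 40 nodes lifts the ideal speedup only from about 10.3x to about 13.6x.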
68. Comparing Data in DWH, Appliances,
Hadoop Clusters and Analytics Engines
Feature | RDBMS DWH | DWH Appliance | Hadoop Cluster | Analytics Appliance
Data size | Gigabytes | Terabytes | Petabytes | Petabytes
Access | Interactive and batch | Interactive and batch | Batch | Interactive
Structure | Fixed schema | Fixed schema | Flexible schema | Flexible schema
Language | SQL | SQL | Programming languages (Java, C++, Ruby, “R” etc.) | Programming languages (Java, C++, Ruby, “R” etc.)
Data integrity | High | High | Low | Very high
Architecture | Shared memory – SMP | Shared nothing – MPP | Hadoop DFS | In-memory processing – GPGPUs / SSDs
Virtualisation | Partitions / Regions | MPP / Nodal | MPP / Clustered | MPP / Clustered
Scaling | Non-linear | Nodal / Linear | Clustered / Linear | Clustered / Linear
Updates | Read and write | Write once, read many | Write once, read many | Write once, read many
Selects | Row-based | Set-based | Column-based | Array-based
Latency | Low – Real-time | Low – Near real-time | High – Historic reporting | Very low – Real-time analytics
Figure 1: Comparing RDBMS to MapReduce
69. Hadoop Framework
• These datasets would previously have been very challenging and expensive to take on with a traditional RDBMS using standard bulk-load and ETL approaches, never mind trying to combine multiple data sources simultaneously or dealing with volumes of data that simply cannot reside on any single machine (or often even dozens). Hadoop deals with this by using a distributed file system (HDFS) designed to handle, coherently, datasets that can only reside across distributed server farms. HDFS is also fault resilient: it replicates each data block across several nodes (three copies by default), so it does not need the overhead of RAID drives and mirroring on individual nodes in a Hadoop compute cluster, allowing the use of truly low-cost commodity hardware.
• So what does this specifically mean to enterprise users who would like to improve their data processing capabilities? First, there are some catches to be aware of. Despite enormous strengths in distributed data processing and analysis, MapReduce is weak in some key areas where the RDBMS is extremely strong (and vice versa). The MapReduce approach tends to have high latency compared to relational databases (i.e. it is not suitable for real-time transactions) and is strongest at processing large volumes of write-once data where most of the dataset needs to be processed at one time. The RDBMS excels at point queries and updates, while MapReduce is best when data is written once and read many times.
• The story is the same with structured data, where the RDBMS and the rules of database normalization established precise laws for preserving the integrity of structured data that have stood the test of time. MapReduce is designed for a less structured, more federated world where schemas may be used but data formats can be much looser and more freeform.
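HDFS gets its fault resilience by splitting each file into large blocks (128 MB by default in Hadoop 2.x and later) and storing several replicas of every block (three by default) on different nodes. The sketch below simulates only that idea. The round-robin placement, the function names and the node names are hypothetical simplifications; real HDFS placement is rack-aware and decided by the NameNode.

```python
import math
from typing import Dict, List, Set

BLOCK_SIZE_MB = 128  # HDFS default block size (dfs.blocksize) in Hadoop 2.x and later
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def place_blocks(file_size_mb: float, nodes: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into fixed-size blocks and put each block's replicas on distinct nodes.

    Simplified round-robin placement for illustration only: real HDFS uses a
    rack-aware placement policy chosen by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement: Dict[int, List[str]] = {}
    for block in range(n_blocks):
        start = block % len(nodes)  # rotate the first replica around the cluster
        placement[block] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


def survives(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a node that has not failed."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


cluster = ["n1", "n2", "n3", "n4", "n5"]
layout = place_blocks(1000, cluster)  # a 1,000 MB file needs 8 blocks of 128 MB
print(len(layout), survives(layout, {"n2", "n4"}))
```

Because each block's three replicas sit on distinct nodes, any two node failures leave every block readable; losing the three nodes that hold all replicas of one block (n1, n2 and n3 here) does not.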
70. The Emerging “Big Data” Stack
72. Hadoop Framework
• Each of these factors is presently driving interest in alternatives that are significantly better at dealing with these requirements. I'll be clear here: the relational database has proven to be incredibly versatile and is the right tool for the majority of business needs today. However, the edge cases for many large-scale business applications are moving into areas where the RDBMS is often not the strongest option. One of the most discussed new alternatives at the moment is Hadoop, a popular open source implementation of MapReduce. MapReduce is a simple yet very powerful method for processing and analyzing extremely large data sets, even up to the multi-petabyte level. At its most basic, MapReduce applies a supplied map function to every input record to emit intermediate key/value pairs (the "map" step), groups those pairs by key, and then applies a supplied reduce function to each group to distill and extract the desired results (the "reduce" step). It was originally invented by engineers at Google to build production search indexes. The MapReduce technique has since spilled over into other disciplines that process vast quantities of information, including science, industry, and systems management. For its part, Hadoop has become the leading implementation of MapReduce.
• While there are many non-relational database approaches out there today (see my emerging IT and business topics post for a list), nothing currently matches Hadoop for the amount of attention it is receiving or the concrete results being reported in recent case studies. A quick look at the list of organizations with applications powered by Hadoop includes Yahoo! with over 25,000 nodes (including a single, massive 4,000-node cluster), Quantcast, which says it has over 3,000 cores running Hadoop and currently processes over 1PB of data per day, and Adknowledge, which uses Hadoop to process over 500 million clickstream events daily using up to 200 nodes.
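The map, shuffle and reduce steps of MapReduce can be simulated in a single process, which makes the data flow easy to follow. This is a minimal sketch: `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical names, not Hadoop's Java API, and in Hadoop the map and reduce tasks run in parallel across the cluster while the framework performs the shuffle between them.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterable[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    # Map: each input record independently emits intermediate (key, value) pairs.
    intermediate = chain.from_iterable(mapper(record) for record in records)
    # Shuffle: group every emitted value by its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: distil each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, counts: List[int]) -> int:
    return sum(counts)


docs = ["big data big clusters", "data streams into revenue streams"]
print(map_reduce(docs, word_mapper, sum_reducer))
# {'big': 2, 'data': 2, 'clusters': 1, 'streams': 2, 'into': 1, 'revenue': 1}
```

Because the mapper looks at one record at a time and the reducer at one key at a time, both phases can be split across as many nodes as there are records or keys; that independence is what lets Hadoop scale them out.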
76. Case Study – Huawei SmartCare CEM
[Diagram: Huawei SmartCare CEM architecture.
– Source systems: Customer Care Systems; CRM & Digital Marketing Systems; Multi-channel Retail (e-Commerce Systems, Store Systems, Mobile Platforms); Merchandising, Warehousing & Logistics; ERP Systems; OSS – Network Management (Network Provisioning & Fault Management); BSS – Rating, Mediation and Billing (Mediation, Rating and Billing Systems); MSS – Head Office – Finance, Planning & Strategy (Planning & Forecasting Systems); Social Media – External Data
– Data feeds: CRM, EPOS, Call Centre, Internet, Merchandising & Logistics, ERP, Network, CDR, Billing, Supplier, Product and Customer Data
– Data stores: Customer DWH, Campaign Mart, Loyalty Mart, Retail Data Warehouse, Financial Data Warehouse, Call Data Warehouse, Catalogue
– “BIG DATA” platform: Hadoop Cluster (Cloudera, Apache Hadoop Framework), SAP HANA, Autonomy Vertica, Pentaho, MATLAB, “R”
– Outputs and users: Analytics & Customer Loyalty, CEM, Multi-channel Sales Analysis, Financial Analysis Reports, Network and Fault Reports, Inventory & Provisioning Reports (Inventory, Provisioning & Replenishment); Customers, Finance Managers, Operations Managers, Business Managers]
77. Big Data – Products
The MapReduce technique has spilled over into many other disciplines that process vast quantities of information, including science, industry, and systems management. The Apache Hadoop library has become the most popular implementation of MapReduce, with framework implementations from Cloudera, Hortonworks and MapR.
79. Apache Hadoop Component Stack
HDFS – Hadoop Distributed File System (HDFS)
MapReduce – Scalable Data Applications Framework
Pig – Procedural Language that abstracts low-level MapReduce operators
ZooKeeper – High-reliability distributed cluster co-ordination
Hive – Structured Data Access Management
HBase – Hadoop Database Management System
Oozie – Job Management and Data Flow Co-ordination
Mahout – Scalable Machine-learning Library
80. Data Management Component Stack
Informatica – Informatica Big Data Edition / Vibe Data Stream
Drill – Data Analysis Framework
MillWheel – Data Analytics on-the-fly + Extract, Transform, Load Framework
Flume – Extract, Transform, Load
Sqoop – Extract, Transform, Load
Scribe – Extract, Transform, Load
Talend – Extract, Transform, Load
Pentaho – Extract, Transform, Load Framework + Data Reporting on-the-fly
81. Big Data Storage Platforms
Autonomy – HP Unstructured Data DBMS
Vertica – HP Columnar DBMS
MongoDB – High-availability DBMS
CouchDB – Couchbase Database Server for Big Data with NoSQL / Hadoop Integration
Pivotal – Pivotal Big Data Suite – Greenplum, GemFire, SQLFire, HAWQ
Cassandra – Cassandra Distributed Database for Big Data with NoSQL and Hadoop Integration
NoSQL – NoSQL Database for Oracle, SQL/Server, Couchbase etc.
Riak – Basho Technologies Riak Big Data DBMS with NoSQL / Hadoop Integration
82. Big Data Analytics Engines and Appliances
Alpine – Alpine Data Studio – Advanced Big Data Analytics
Karmasphere – Karmasphere Studio and Analyst – Hadoop Customer Analytics
Kognitio – Kognitio In-memory Big Data Analytics MPP Platform
Skytree – Skytree Server Artificial Intelligence / Machine Learning Platform
Redis – Redis is an open source key-value database for AWS, Pivotal etc.
Teradata – Teradata Appliance for Hadoop
Neo4j – Neo4j Graph Database for Big Data
InfiniDB – Columnar MPP open-source DB version hosted on GitHub
83. Big Data Analytics and Visualisation Platforms
Tableau – Tableau Big Data Visualisation Engine
Eclipse – Symentec Eclipse – Big Data Visualisation
Mathematica – Mathematical Expressions and Algorithms
StatGraphics – Statistical Expressions and Algorithms
FastStats – Numerical computation, visualization and programming toolset
MATLAB – Data Acquisition and Analysis Application Development Toolkit
R – “R” Statistical Programming / Algorithm Language
Revolution – Revolution Analytics Framework and Library for “R”
84. Hadoop / Big Data Extended Infrastructure Stack
SSD – Solid State Drive (SSD), configured as cached memory / fast HDD
CUDA – Compute Unified Device Architecture
GPGPU – General-Purpose computing on Graphics Processing Units
IMDG – In-memory Data Grid (extended cached memory)
Vibe – High Velocity / High Volume Machine / Automatic Data Streaming
Splunk – High Velocity / High Volume Machine / Automatic Data Streaming
Ambari – Hadoop cluster provisioning, management and monitoring
YARN – Hadoop Resource Scheduling
85. Cloud-based Big-Data-as-a-Service and Analytics
AWS – Amazon Web Services (AWS) Big Data-as-a-Service (BDaaS): Elastic Compute Cloud (EC2) and Simple Storage Service (S3)
1010 Data – Big Data Discovery, Visualisation and Sharing Cloud Platform
SAP HANA – SAP HANA Cloud – In-memory Big Data Analytics Appliance
Azure – Microsoft Azure Data-as-a-Service (DaaS) and Analytics
Anomaly 42 – Anomaly 42 Smart-Data-as-a-Service (SDaaS) and Analytics
Workday – Workday Big-Data-as-a-Service (BDaaS) and Analytics
Google Cloud – Google Cloud Platform: Cloud Storage, Compute Platform, Firebrand API Resource Framework
Apigee – Apigee API Resource Framework
89. Data Warehouse Appliance / Real-time
Analytics Engine Price Comparison
Manufacturer | Server configuration | Cached memory | Server type | Software platform | Cost (est.)
SAP HANA | 32-node (4 channels x 8 CPU) | 1.3 TB | SMP | Proprietary | $6,000,000
Teradata | 20-node (2 channels x 10 CPU) | 1 TB | MPP | Proprietary | $1,000,000
Netezza (now IBM) | 20-node (2 channels x 10 CPU) | 1 TB | MPP | Proprietary | $180,000
IBM ex5 (non-HANA configuration) | 32-node (4 channels x 8 CPU) | 1.3 TB | SMP | Proprietary | $120,000
Greenplum (now Pivotal) | 20-node (2 channels x 10 CPU) | 1 TB | MPP | Open Source | $20,000
XtremeData xdb (BO BW) | 20-node (2 channels x 10 CPU) | 1 TB | MPP | Open Source | $18,000
Zybert Gridbox | 48-node (4 channels x 12 CPU) | 20 TB | SMP | Open Source | $60,000
90. Clustering in “Big Data”
“A Cluster is a group of the same or similar data elements
which are aggregated – or closely distributed – together”
Clustering is a technique used to explore content and
understand information in every business sector and scientific
field that collects and processes very large volumes of data
Clustering is an essential tool for any “Big Data” problem
92. “Big Data”
• “Big Data” refers to vast aggregations (super sets) of numerous individual datasets (structured and unstructured) whose size and scope are beyond the capability of conventional transactional (OLTP) or analytics (OLAP) Database Management Systems and Enterprise Software Tools to capture, store, analyse and manage. Examples of “Big Data” include the vast and ever-changing amounts of data generated in social networks, where we maintain Blogs and have conversations with each other; news data streams; geo-demographic data; internet search and browser logs; and the ever-growing amount of machine data generated by pervasive smart devices (monitors, sensors and detectors in the environment), captured via the Smart Grid, processed in the Cloud and delivered to end-user Smart Phones and Tablets via Intelligent Agents and Alerts.
• Data Set Mashing and “Big Data” Global Content Analysis drive Horizon Scanning, Monitoring and Tracking processes. Numerous, apparently unrelated RSS and other Information Streams and Data Feeds are loaded into Very Large Scale (VLS) DWH Structures and Document Management Systems for Real-time Analytics, which search for and identify possible signs of relationships hidden in the data (Facts / Events). The aim is to discover and interpret previously unknown Data Relationships driven by hidden Clustering Forces, revealed via “Weak Signals” that indicate emerging and developing Application Scenarios, Patterns and Trends, and which in turn point to possible, probable and alternative global transformations that may unfold as future “Wild Card” or “Black Swan” events.
93. Clustering in “Big Data”
• The profiling and analysis of large aggregated datasets in order to determine a ‘natural’ structure of groupings provides an important technique for many statistical and analytic applications. Cluster analysis on the basis of profile similarities or geographic distribution is a method where no prior assumptions are made concerning the number of groups or group hierarchies and internal structure. Geo-demographic techniques are frequently used in order to profile and segment populations by ‘natural’ groupings - such as common behavioural traits, Clinical Trial, Morbidity or Actuarial outcomes - along with many other shared characteristics and common factors.
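Because the text stresses that no prior assumption is made about the number of groups, a method that discovers that number from the data fits better than one such as k-means, which must be told k in advance. The sketch below uses simple distance-threshold (single-linkage) grouping. The threshold `eps` and the sample points are hypothetical, and the all-pairs loop is O(n²), so genuinely large datasets would need spatial indexing or a distributed algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def threshold_clusters(points: Sequence[Tuple[float, float]], eps: float) -> List[List[int]]:
    """Group point indices linked by chains of pairwise distances <= eps.

    This is single-linkage clustering cut at height eps: the number of clusters
    falls out of the data instead of being fixed in advance.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two groups

    clusters: Dict[int, List[int]] = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


households = [(0, 0), (0.5, 0.2), (10, 10), (10.4, 9.8), (25, 0)]
print(threshold_clusters(households, eps=1.0))  # [[0, 1], [2, 3], [4]]
```

Changing `eps` changes how many groups emerge (a very large threshold merges everything into one group), so in practice the threshold is itself explored rather than fixed.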
94. Clustering in “Big Data”
• “BIG DATA” ANALYTICS – PROFILING, CLUSTERING and 4D GEOSPATIAL ANALYSIS •
• The profiling and analysis of large aggregated datasets in order to determine a ‘natural’
structure of data relationships or groupings, is an important starting point forming the basis of
many mapping, statistical and analytic applications. Cluster analysis of implicit similarities -
such as time-series demographic or geographic distribution - is a critical technique where no
prior assumptions are made concerning the number or type of groups that may be found, or
their relationships, hierarchies or internal data structures. Geospatial and demographic
techniques are frequently used in order to profile and segment populations by ‘natural’
groupings. Shared characteristics or common factors such as Behaviour / Propensity or
Epidemiology, Clinical, Morbidity and Actuarial outcomes – allow us to discover and explore
previously unknown, concealed or unrecognised insights, patterns, trends or data relationships.
• PREDICTIVE ANALYTICS and EVENT FORECASTING •
• Predictive Analytics and Event Forecasting combine Horizon Scanning, Tracking and Monitoring methods with Cycle, Pattern and Trend Analysis techniques and Propensity Models in order to anticipate a wide range of business, economic, social and political Future Events. These range from micro-economic Market phenomena, such as Market Sentiment and Price Curve movements, to large-scale macro-economic Fiscal phenomena, where Weak Signal processing is used to predict future Wild Card and Black Swan Events such as Monetary System shocks.
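As a much simplified illustration of Trend Analysis feeding Weak Signal detection, the sketch below fits an ordinary least-squares linear trend to a time series and flags observations whose residual exceeds k standard deviations. The threshold k = 2, the function names and the sample series are assumptions. Production forecasting would use seasonal models and robust statistics, since a large outlier also inflates the standard deviation used to detect it.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values against time index 0..n-1; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = (n - 1) / 2, mean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_deviations(values: Sequence[float], k: float = 2.0) -> List[int]:
    """Indices whose residual from the linear trend exceeds k population standard deviations."""
    slope, intercept = linear_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    sd = pstdev(residuals)
    if sd == 0:
        return []  # the series lies exactly on the trend line
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


series = [10, 11, 12, 13, 14, 30, 16, 17, 18, 19]
print(flag_deviations(series))  # [5]
```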
96. Multi-channel Retail - Digital Architecture
• The last decade has seen an unprecedented explosion in mobile platforms
as the internet and mobile worlds came of age. It is no longer acceptable to
have only a bricks-and-mortar high-street presence – customer-focused
companies are now expected to deliver their Customer Experience and
Journey via internet websites, mobiles and more recently tablets.
97. Social Intelligence – The Emerging Big Data Stack
98. GIS MAPPING and SPATIAL DATA ANALYSIS
• A Geographic Information System (GIS) integrates hardware, software and digital data capture devices for acquiring, managing, analysing, distributing and displaying all forms of geographically dependent location data, including machine-generated data such as Computer-aided Design (CAD) data from land and building surveys and Global Positioning System (GPS) terrestrial location data, as well as all kinds of data streams: HD CCTV, aerial and satellite image data.
99. GIS Mapping and Spatial Analysis
• GIS MAPPING and SPATIAL DATA ANALYSIS •
• Spatial Data Analysis is a set of techniques for analysing 3-dimensional spatial
(Geographic) data and location (Positional) object data overlays. Software that
implements spatial analysis techniques requires access to both the locations of
objects and their physical attributes. Spatial statistics extends traditional statistics
to support the analysis of geographic data. Spatial Data Analysis provides
techniques to describe the distribution of data in the geographic space (descriptive
spatial statistics), analyse the spatial patterns of the data (spatial pattern or cluster
analysis), identify and measure spatial relationships (spatial regression), and
create a surface from sampled data (spatial interpolation, usually categorized as
geo-statistics).
• The results of spatial data analysis are largely dependent upon the type,
quantity, distribution and data quality of the spatial objects under analysis.
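Spatial interpolation can be illustrated with inverse distance weighting (IDW), a simple deterministic method: the estimate at an unsampled location is a weighted average of the sample values, with weights 1/d^p that fall off with distance d. IDW is chosen here only because it is easy to follow; kriging is the geostatistical method the text alludes to. The power p = 2, the planar (projected) coordinates and the sample readings are assumptions.

```python
import math
from typing import Sequence, Tuple


def idw(samples: Sequence[Tuple[float, float, float]], x: float, y: float,
        power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = weight_total = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value  # the query point coincides with a sample
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


readings = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
print(idw(readings, 2.5, 0.0))  # nearer the first sample, so close to 10
```

A larger power makes the surface follow the nearest samples more closely; a smaller one smooths it towards the overall average.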
101. Geo-demographic Clustering in “Big Data”
• GEODEMOGRAPHIC PROFILING – CLUSTERING IN “BIG DATA” •
• Profiling and analysing large aggregated datasets to determine a ‘natural’ or implicit structure of data relationships or groupings is an important starting point for many statistical and analytic applications. No prior assumptions are made concerning the number or type of groups discovered, or their relationships, hierarchies or internal data structures; the aim is to discover hidden data relationships. The subsequent explicit Cluster Analysis of the discovered data relationships is a critical technique which attempts to explain the nature, cause and effect of those implicit profile similarities or geographic distributions. Demographic techniques are frequently used to profile and segment populations using ‘natural’ groupings, such as common behavioural traits, Clinical, Morbidity or Actuarial outcomes, along with many other shared characteristics and common factors, and then to understand and explain those natural group affinities and geographical distributions using methods such as Causal Layered Analysis (CLA).
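Once segments have been found, they are commonly explained with an index that compares a segment's average for some attribute with the population average: 100 means typical, above 100 over-represented, below 100 under-represented. The sketch below computes such an index. The field names `segment` and `spend` and the sample records are hypothetical, and the segment labels are assumed to come from an earlier clustering step.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Mapping


def segment_index(records: List[Mapping[str, Any]], segment_key: str,
                  attribute: str) -> Dict[str, float]:
    """Index each segment's mean `attribute` against the population mean (100 = average)."""
    overall = mean(float(r[attribute]) for r in records)
    if overall == 0:
        raise ValueError("population mean is zero; the index is undefined")
    by_segment: Dict[str, List[float]] = defaultdict(list)
    for r in records:
        by_segment[str(r[segment_key])].append(float(r[attribute]))
    return {seg: 100.0 * mean(vals) / overall for seg, vals in by_segment.items()}


households = [
    {"segment": "A", "spend": 30},
    {"segment": "A", "spend": 50},
    {"segment": "B", "spend": 10},
    {"segment": "B", "spend": 30},
]
print(segment_index(households, "segment", "spend"))  # A is about 133, B about 67
```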
102. GIS Mapping and Spatial Analysis
• Spatial Data Analysis is a set of techniques for analysing spatial (Geographic)
location data. The results of spatial analysis are dependent on the locations of
the objects being analysed. Software that implements spatial analysis techniques
requires access to both the locations of objects and their physical attributes.
• Spatial statistics extends traditional statistics to support the analysis of geographic
data. Spatial Data Analysis provides techniques to describe the distribution of data in
the geographic space (descriptive spatial statistics), analyse the spatial patterns of the
data (spatial pattern or cluster analysis), identify and measure spatial relationships
(spatial regression), and create a surface from sampled data (spatial interpolation,
usually categorized as geo-statistics).
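Two common descriptive spatial statistics are the mean centre of a set of points and the standard distance, the root-mean-square distance of the points from that centre (a spatial analogue of the standard deviation). The sketch assumes projected planar coordinates, for example metres on a national grid; latitude and longitude would first need projecting, or a great-circle distance instead. The sample sites are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_centre(points: Sequence[Point]) -> Point:
    """Arithmetic mean of the x and y coordinates."""
    if not points:
        raise ValueError("no points")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points: Sequence[Point]) -> float:
    """Root-mean-square distance of the points from their mean centre."""
    cx, cy = mean_centre(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mean_centre(sites))                   # (1.0, 1.0)
print(round(standard_distance(sites), 4))   # 1.4142
```

A small standard distance indicates points concentrated around the centre; comparing it across groups or time periods is a simple first check for spatial clustering.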
107. Clustering Phenomena in “Big Data”
“A Cluster is a group of profiled data similarities aggregated closely together”
• Cluster Analysis is a technique used to explore very large volumes of structured and unstructured data (transactional data, automatic machine-generated data, social media and internet content, and geo-demographic information) in order to discover previously unknown, unrecognised or hidden logical data relationships.
108. Event Clusters and Connectivity
[Diagram: Events A to H and the connections between them.]
The above is an illustration of Event relationships: how Events might be connected. A detailed understanding of the connections between Events may help us to answer questions such as:
• If Event A occurs, does it make Event B or H more or less likely to occur?
• If Event B occurs, what effect does it have on Events C, D, E, F and G?
Answering questions such as these allows us to plan our Event Management approach and Risk mitigation strategy, and to decide how best to focus our Incident / Event resources and effort.
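Questions such as whether Event A makes Event B more or less likely can be approached, as a first step, by estimating conditional probabilities from historical observation windows (for example one window per day or per incident). The sketch below computes P(B | A) and the lift P(B | A) / P(B): a lift above 1 suggests A makes B more likely, below 1 less likely. The windowing scheme and the sample history are assumptions, and co-occurrence on its own does not establish cause.

```python
from typing import List, Set


def conditional_probability(windows: List[Set[str]], given: str, target: str) -> float:
    """Estimate P(target | given) as the share of windows with `given` that also contain `target`."""
    with_given = [w for w in windows if given in w]
    if not with_given:
        raise ValueError(f"event {given!r} never observed")
    return sum(target in w for w in with_given) / len(with_given)


def lift(windows: List[Set[str]], given: str, target: str) -> float:
    """Ratio of P(target | given) to the baseline P(target)."""
    if not windows:
        raise ValueError("no observation windows")
    baseline = sum(target in w for w in windows) / len(windows)
    if baseline == 0:
        raise ValueError(f"event {target!r} never observed")
    return conditional_probability(windows, given, target) / baseline


history = [{"A", "B"}, {"A", "B", "H"}, {"A"}, {"B"}, {"H"}, {"C"}]
print(round(lift(history, "A", "B"), 2))  # 1.33: A appears to make B more likely
print(round(lift(history, "A", "H"), 2))  # 1.0: no apparent effect on H
```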
109. Event Clusters and Connectivity
• Aggregated Events include coincident, related, connected and interconnected Events:
– Coincident: two or more Events appear simultaneously in the same domain, but they arise from different triggers (unrelated causal events)
– Related: two or more Events materialise in the same domain sharing common Event features or characteristics (they may share a hidden common trigger or cause, and so are candidates for further analysis and investigation)
– Connected: two or more Events materialise in the same domain due to the same trigger (common cause)
– Interconnected: two or more Events materialise together in an Event cluster, series or “storm”, with each prior Event triggering the subsequent (next) Event in the series
• A series of Aggregated Events may have a significant cumulative impact, and such series are therefore frequently misidentified as Wild-card or Black Swan Events, rather than simply as event clusters or event “storms”.
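The four Aggregated Event types can be expressed as simple rules over event records. The sketch below is one possible encoding, assuming each event carries an identifier, a timestamp, an optional identified trigger, an optional link to the prior event that caused it, and a set of descriptive features. The `Event` fields, the rule order and the one-hour coincidence window are all hypothetical choices; in practice triggers are often unknown and have to be inferred.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    time: float                       # e.g. hours since the start of observation
    trigger: Optional[str] = None     # identified cause, if known
    caused_by: Optional[str] = None   # id of a prior Event that triggered this one
    features: FrozenSet[str] = frozenset()


def classify_pair(a: Event, b: Event, window: float = 1.0) -> str:
    """Classify two Events in the same domain using the four aggregated-event types."""
    # Interconnected: one Event directly triggered the other (an Event series or "storm").
    if a.caused_by == b.event_id or b.caused_by == a.event_id:
        return "interconnected"
    # Connected: both Events arose from the same known trigger (common cause).
    if a.trigger is not None and a.trigger == b.trigger:
        return "connected"
    # Related: shared features hint at a possible hidden common cause worth investigating.
    if a.features & b.features:
        return "related"
    # Coincident: simultaneous, but with different or unknown triggers and no shared features.
    if abs(a.time - b.time) <= window:
        return "coincident"
    return "unrelated"


storm = Event("A", 0.0, trigger="storm")
outage = Event("C", 0.2, trigger="power cut", caused_by="A")
print(classify_pair(storm, outage))  # interconnected
```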
110. Event Clusters and Connectivity
[Diagram: Risk Events 1 to 8 and the connections between them.]
The above is an illustration of Event relationships - how Risk Events might be connected. A detailed and
intimate understanding of Event clusters and the connection between Events may help us to understand: -
• What is the relationship between Events 1 and 8, and what impact do they have on Events 2 - 7 ?
• Events 2 - 5 and Events 6 and 7 occur in clusters – what are the factors influencing these clusters ?
Answering questions such as these allows us to plan our Risk Event management approach and mitigation
strategy – and to decide how to better focus our resources and effort on Risk Events and fraud management.
[Diagram: Event Cluster showing a Risk Event with linked Claimant 1, Claimant 2, Residence and Vehicle.]
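Clusters like the one pictured, where claimants, a residence and a vehicle tie several Risk Events together, are often found by treating claims and the entities they mention as a graph and extracting its connected components. The sketch below does this with a union-find structure. The claim identifiers and the `type:value` entity labels (including the address) are hypothetical, and a shared entity is only a lead for investigation, not evidence of fraud.

```python
from typing import Dict, List, Set


def claim_clusters(claims: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group claims linked directly or indirectly through shared entities (connected components)."""
    parent = {claim: claim for claim in claims}

    def find(claim: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[claim] != claim:
            parent[claim] = parent[parent[claim]]
            claim = parent[claim]
        return claim

    first_claim_for: Dict[str, str] = {}
    for claim, entities in claims.items():
        for entity in entities:
            if entity in first_claim_for:
                # Entity seen before: merge this claim's group with the earlier claim's group.
                parent[find(claim)] = find(first_claim_for[entity])
            else:
                first_claim_for[entity] = claim

    groups: Dict[str, Set[str]] = {}
    for claim in claims:
        groups.setdefault(find(claim), set()).add(claim)
    return sorted(groups.values(), key=sorted)


claims = {
    "claim1": {"claimant:1", "vehicle:AB12"},
    "claim2": {"claimant:2", "vehicle:AB12"},
    "claim3": {"claimant:2", "residence:10 High St"},
    "claim4": {"claimant:3", "vehicle:XY99"},
}
print(claim_clusters(claims))
```

Here claim1 and claim2 share a vehicle and claim2 and claim3 share a claimant, so all three fall into one cluster even though claim1 and claim3 have nothing in common directly; claim4 stands alone.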
111. Aggregated Event Types
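[Diagram: Aggregated Event Types.
– Coincident Events: Trigger A and Trigger B each produce a separate Event.
– Related Events: Trigger 1 and Trigger 2 each produce an Event; the Events share common features.
– Connected Events: a single Trigger produces two Events.
– Inter-connected Events: a Trigger produces an Event, which in turn triggers the next Event.]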
113. 4D Geospatial Analytics
• 4D Geospatial Analytics, the
profiling and analysis of large
aggregated datasets in order to
determine a ‘natural’ structure of
groupings, provides an important
technique for many statistical and
analytic applications.
• Demographic and Geospatial
Cluster Analysis, on the basis of
profile similarities or geographic
distribution, is a statistical method
in which no prior assumptions are
made concerning the number of
groups, group hierarchies or
internal structure. Geospatial and
geodemographic techniques are
frequently used to profile and
segment populations by ‘natural’
groupings, such as common
behavioural traits, Clinical Trial,
Morbidity or Actuarial outcomes, along
with many other shared characteristics
and common factors.
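As an illustration of clustering without fixing the number of groups in advance, the sketch below implements DBSCAN, a density-based algorithm that is one common choice for this (the slide does not name a specific method). The `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) values are assumptions chosen for the toy data.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: one cluster label per point, -1 meaning noise.
    The number of clusters is discovered from point density, not supplied."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neigh)
    return labels  # type: ignore[return-value]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 20)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

The two dense groups become clusters 0 and 1 and the isolated point is labelled noise, without the number of groups ever being specified.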
114. The Flow of Information through Time
• String Theory posits that Space-Time exists in discrete packages, with Time Present always
in some way inextricably woven into both Time Past and Time Future. This yields the intriguing
possibility of insights through the mists of time into the outcome of future events, as any item of
Data or Information (Global Content) may contain faint traces which offer glimpses into the future
trajectory of Clusters of linked Past, Present and Future Events. If all future timelines were linear,
then every event would unfold in an unerringly predictable manner towards a known and certain
conclusion. The future is, however, both unknown and unknowable (Hawking Paradox). Future
outcomes are uncertain: future timelines are non-linear (branched), with a multitude of possible
alternative futures. Chaos Theory suggests that even the most subliminal inputs, originating from
unknown forces so minute as to be undetectable, might become amplified through numerous
system cycles to grow in influence and impact over time, deviating Space-Time trajectories far
away from their original predicted path and so fundamentally altering the outcome of future events.
• Every item of Global Content in the Present is somehow connected with both Past and Future
temporal planes. Space-Time is a Dimension Cluster consisting of the three Spatial dimensions
(x, y and z axes) plus Time (the fourth dimension, t), which together flow in a single direction:
relentlessly towards the future. Space-Time does not flow uniformly; the “arrow of time” may
be deflected by unknown factors. There may exist “hidden external forces” (unseen interactions)
that create disturbance in the temporal plane stack which marks the passage of time, with the
potential to create eddies, vortices and whirlpools along the trajectory of Time (chaos, disorder
and uncertainty), which in turn possess the capacity to generate ripples and waves (randomness
and disruption), thus changing the course of the Space-Time continuum. “Weak Signals” are
“Ghosts in the Machine”: echoes of these subliminal temporal interactions that may contain
insights or clues about possible future “Wild Card” or “Black Swan” random events.
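The Chaos Theory point about tiny inputs being amplified can be demonstrated with the logistic map, a standard textbook chaotic system (chosen here purely for illustration; it is not a model of Space-Time). Two orbits that start 1e-10 apart stay indistinguishable for a while, then diverge completely.

```python
from typing import List, Optional

def logistic_orbit(x0: float, r: float = 4.0, steps: int = 200) -> List[float]:
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def divergence_step(a: List[float], b: List[float], threshold: float = 0.1) -> Optional[int]:
    """First iteration at which two orbits differ by more than `threshold`."""
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > threshold:
            return i
    return None

base = logistic_orbit(0.2)
nudged = logistic_orbit(0.2 + 1e-10)  # an 'undetectable' perturbation
# With r = 4 the gap can at most quadruple per step, so it cannot exceed 0.1
# before step 15; typically it does so within a few dozen steps.
print(divergence_step(base, nudged))
```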
115. 4D Geospatial Analytics – The Temporal Wave
• The Temporal Wave is a novel and innovative method for Visual Modelling and Exploration
of Geospatial “Big Data” simultaneously within a Time (history) and Space (geographic)
context. The vast volumes of spatial–temporal information in today's data-rich landscape
are becoming increasingly difficult to explore, analyse and manage effectively. Overcoming
the problem of data volume and scale in a Time (history) and Space (location) context
requires not only the traditional location–space and attribute–space analysis common in
GIS Mapping and Spatial Analysis, but also the additional dimension of time–space analysis.
The Temporal Wave supports a new method of Visual Exploration for Geospatial (location)
data within a Temporal (timeline) context.
• This time-visualisation approach integrates Geospatial (location) data within a Temporal
(timeline) dataset, along with data visualisation techniques, thus improving the accessibility,
exploration and analysis of the huge amounts of geospatial data used to support geo-visual
“Big Data” analytics. The Temporal Wave combines the strengths of both linear timeline and
cyclical wave-form analysis, and is able to represent data within a Time (history) and Space
(geographic) context simultaneously, even at different levels of granularity. Linear and
cyclic trends in space-time data may be represented in combination with other graphic
representations typical for location–space and attribute–space data types. The Temporal
Wave can be used as a time–space data reference system, as a time–space continuum
representation tool, and as a time–space interaction tool.
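A much simplified way to combine a linear timeline with a cyclical component, loosely in the spirit of the Temporal Wave, is to give every timestamp two coordinates: elapsed time along the line, and a height set by its phase within a repeating cycle. This is a 2-D sketch under assumed parameters (`period_days`, `amplitude`), not the published Temporal Wave implementation.

```python
import math
from datetime import datetime
from typing import List, Tuple

def temporal_wave_coords(times: List[datetime], period_days: float = 365.25,
                         amplitude: float = 1.0) -> List[Tuple[float, float]]:
    """Map timestamps onto a 'wave': x is linear elapsed time in days since the
    earliest timestamp; y is the phase within a repeating cycle drawn as a sine,
    so the same point in every cycle sits at the same height."""
    t0 = min(times)
    coords = []
    for t in times:
        days = (t - t0).total_seconds() / 86400.0    # linear timeline component
        phase = (days % period_days) / period_days   # cyclic component in [0, 1)
        coords.append((days, amplitude * math.sin(2 * math.pi * phase)))
    return coords

# A weekly cycle: the two Mondays share a height, mid-week sits half a cycle on.
stamps = [datetime(2024, 1, 1), datetime(2024, 1, 8), datetime(2024, 1, 4, 12)]
print(temporal_wave_coords(stamps, period_days=7))
```

Each record's geographic attributes can then be plotted against these time coordinates, so linear trends read along x and recurring patterns line up by height.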
117. 4D Geospatial Analytics – London Timeline
• How did London evolve from its creation as a Roman city in 43AD into the
crowded, chaotic cosmopolitan megacity we see today? The London Evolution
Animation takes a holistic view of what has been constructed in the capital over
different historical periods – what has been lost, what saved and what protected.
• Greater London covers 600 square miles. Up until the 17th century, however,
the capital city was crammed largely into a single square mile, which today is
marked by the skyscrapers of the City's financial district.
• This visualisation, originally created for the Almost Lost exhibition by the Bartlett
Centre for Advanced Spatial Analysis (CASA), explores the historic evolution of
the city by plotting a timeline of the development of the road network - along with
documented buildings and other features – through 4D geospatial analysis of a
vast number of diverse geographic, archaeological and historic data sets.
• Unlike other historical cities such as Athens or Rome, with an obvious patchwork
of districts from different periods, London's individual structures, scheduled sites
and listed buildings were in many cases constructed gradually, from parts assembled
during different periods. Researchers who have previously tried to locate and
document archaeological structures and research historic references will know
that these features, when plotted, appear scrambled up like pieces of different
jigsaw puzzles, all scattered across the contemporary London cityscape.
118. 4D Geospatial Analytics – The Temporal Wave
119. Social Intelligence – Brand Affinity
CONE SEGMENTS - BRAND AFFINITY
• Social Intelligence drives Brand Loyalty Understanding - Fan-base Profiling, Streaming and Segmentation –
expressed in the creation and maintenance of a detailed History and Balanced Scorecard for every individual in
the Cone, allowing summation by Stream / Segment: -
1. Inactive – need to draw their attention towards the Brand
2. Indifferent – need to educate them about core Brand Values
3. Disconnected – need to re-engage with the Brand
4. Casuals – exhibit Brand awareness and interest
5. Followers – follow the Brand, engage with social media and consume brand communications
6. Enthusiasts – engaged with the Brand, participate in Brand / Product / Media events and merchandising
7. Supporters – show strong need, desire and propensity to support Brand / Product / Media consumption
8. Fanatics – demonstrate total Commitment / Dedication / Loyalty for all aspects of the Brand / Product / Media
PROPENSITY
• Balanced Scorecard – a summary of all the data-points for an Individual / Stream / Segment
• Propensity Score – in the statistical analysis of observational data, Propensity Score Matching (PSM) is a
statistical matching technique that attempts to estimate the effect of a Campaign / Offer / Promotion or other
intervention by accounting for the factors (covariates) that predict receiving the Campaign / Offer / Promotion.
• Propensity Model – the Bayesian probability of the outcome of an event in an Individual / Stream / Segment
• Predictive Analytics – an area of data mining that deals with extracting information from data and using it to
predict trends and behaviour patterns. Often the unknown event of interest is in the future; however, Predictive
Analytics can be applied to any type of event with an unknown outcome, in the past, present or future.
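A minimal sketch of propensity score matching, assuming a single categorical covariate (for example a Cone segment) so that the propensity score can be estimated simply as the share of each segment that received the Campaign; real applications usually fit a model such as logistic regression over many covariates. Each treated individual is matched, with replacement, to the control with the nearest score, and the average outcome difference estimates the effect on the treated (ATT). The record layout is hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical record layout: (segment, received_campaign, outcome)
Record = Tuple[str, bool, float]

def propensity_scores(records: List[Record]) -> List[float]:
    """P(treated | segment), estimated as the treated share of each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [treated, total]
    for segment, treated, _ in records:
        counts[segment][0] += int(treated)
        counts[segment][1] += 1
    return [counts[seg][0] / counts[seg][1] for seg, _, _ in records]

def att_by_matching(records: List[Record]) -> float:
    """Average effect on the treated via 1-nearest-neighbour matching on the
    propensity score (with replacement; ties go to the first control listed)."""
    scores = propensity_scores(records)
    treated = [(s, y) for s, (_, t, y) in zip(scores, records) if t]
    control = [(s, y) for s, (_, t, y) in zip(scores, records) if not t]
    if not treated or not control:
        raise ValueError("need at least one treated and one control record")
    diffs = []
    for score, outcome in treated:
        _, matched_outcome = min(control, key=lambda c: abs(c[0] - score))
        diffs.append(outcome - matched_outcome)
    return sum(diffs) / len(diffs)

records = [
    ("A", True, 10.0), ("A", True, 12.0), ("A", False, 8.0),
    ("B", True, 5.0), ("B", False, 3.0), ("B", False, 4.0),
]
print(round(att_by_matching(records), 3))  # 2.667
```

Matching on a single score, rather than on every covariate, is what makes the technique practical once the number of profile attributes grows.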
121. Social Intelligence – Fan-base Understanding
CONE STREAMING and SEGMENTATION
• Multiple Cones can be created and cross-referenced using Social Intelligence and Brand
Interaction / Fan-base Profiling and Segmentation in order to deliver actionable insights for any
genre of Brand Loyalty and Fan-base Understanding – as well as for other Geo-demographic
Analytics purposes – e.g. Digital Healthcare, Clinical Trials, Morbidity and Actuarial Outcomes: -
– Music (e.g. BBC and Sony Music)
– Broadcasting (e.g. Radio 1 / American Idol)
– Digital Media Content (e.g. Sony Films / Netflix)
– Sports Franchises (e.g. Manchester City / New York City)
– Sport Footwear and Apparel (e.g. Nike, Puma, Adidas, Reebok)
– Fast Fashion Retailers (e.g. ASOS, Next, New Look, Primark)
– Luxury Brands / Aggregators (e.g. Armani, Burberry, Versace / LVMH, PPR, Richemont)
– Multi-channel Retailers – Brand Affinity / Loyalty Marketing + Product Campaigns, Offers & Promotions
– Financial Services Companies – Brand Protection and Reputation Management
– Travel, Leisure and Entertainment Organisations - Destination Events and Resorts
– MVNO / CSPs - OTT Business Partner Analytics (Sky Go, Netflix, iPlayer via Firebrand / Apigee)
– Telco, Media and Communications - Churn Management / Conquest / Up-sell / Cross-sell Campaigns
– Digital Healthcare – Private / Public Healthcare Service Provisioning: - Geo-demographic Clustering and
Propensity Modelling (Patient Monitoring, Wellbeing, Clinical Trials, Morbidity and Actuarial Outcomes)
123. Social Intelligence – Social Interaction
Social Interaction Cone Rules
1. Inactive – not engaged – low evidence / low affinity / low interest in Social Media
2. Lone Wolf – sparse / thin social network - may share negative information (Trolling)
3. Home Boy – Social Network clustered around Home Location Postcodes (Gang Culture)
4. Eternal Student – Social Network clustered around School / College / University Alumni
5. Workplace – Social Network clustered around Work and Colleagues (e.g. City Brokers, Traders)
6. Friends and Family – Social Network clustered around physical social contacts - Friends and Family
7. Enthusiast – Social Network clustered around shared, common interests – Sport, Music and Fashion, etc.
8. Promiscuous – Open Networker – virtual Social Network across all categories – will connect with anybody
Number of Segments
• With anonymous data (e.g. polls), the
number of initial Segments is 4 (Matt
Holland). With named individuals we can
discover much richer internal and external
124. Social Interaction
How consumers use social media (e.g., Facebook, Twitter) to address and/or engage with companies around social and environmental issues.
125. Clustering in “Big Data”
“A Cluster is a group of profiled data similarities aggregated closely together”
• Cluster Analysis is a technique used to explore very large volumes of transactional and
machine generated (automatic) data, social media and internet content and information -
in order to discover previously unknown, unrecognised or hidden data relationships.
• Clustering is an essential tool for any “Big Data” problem. Cluster Analysis of both
explicit (given) and implicit (discovered) data relationships in “Big Data” is a critical
technique which attempts to explain the nature, cause and effect of the forces which drive
clustering. Any observed profiled data similarities (geographic or temporal aggregations,
mathematical or statistical distributions) may be explained through Causal Layered Analysis.
– The choice of clustering algorithm and its parameters is both process- and data-dependent
– Approximate Kernel K-means provides a good trade-off between clustering accuracy on the one
hand and data volumes, throughput, performance and scalability on the other
– Challenges include homogeneous and heterogeneous data (structured versus unstructured
data), data quality, streaming, scalability, cluster cardinality and validity
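Approximate Kernel K-means, as usually described, saves memory by restricting cluster centres to combinations of a random sample of points. The sketch below shows only the exact kernel k-means step that it approximates, using an RBF kernel with an assumed `gamma` and caller-supplied initial labels; it builds the full n × n kernel matrix, which is precisely the cost the approximate variant avoids on large data.

```python
import math
from typing import List, Sequence

def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_kmeans(points: Sequence[Sequence[float]], labels: Sequence[int],
                  k: int, gamma: float = 1.0, max_iter: int = 50) -> List[int]:
    """Exact kernel k-means: reassign each point to the cluster whose implicit
    feature-space centroid is nearest, until assignments stop changing."""
    n = len(points)
    K = [[rbf(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|c|^2) * sum of kernel values within each cluster
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no centroid
                # ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) * sum_j K_ij + within_c
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

pts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (5.5, 5.0)]
print(kernel_kmeans(pts, [0, 1, 0, 1, 0, 1], k=2, gamma=1.0))
```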
126. Cluster Types
Deep Space Galactic Clusters
Hadoop Cluster – “Big Data” Servers
Molecular Clusters
Geo-Demographic Clusters
Mineral Lode Clusters
127. • GEODEMOGRAPHIC PROFILING – CLUSTERING IN “BIG DATA” •
• The profiling and analysis of very large aggregated datasets to determine ‘natural’ or
implicit data relationships and discover hidden common factors and data structures,
where no prior assumptions are made concerning the number or type of groups, is
driven by uncovering previously unknown data relationships and natural groupings.
The discovery of such Cluster / Group relationships, hierarchies or internal data
structures is an important starting point, forming the basis of many statistical and
analytic applications which are designed to expose hidden data relationships.
• A subsequent explicit Cluster Analysis of previously discovered data relationships is
an important technique which attempts to understand the true nature, cause and
impact of the unknown clustering forces driving implicit profile similarities and
mathematical and geographic distributions. Geodemographic techniques are frequently
used to profile and segment Demographic and Spatial data by ‘natural’ groupings,
including common behavioural traits, Clinical Trial, Morbidity or Actuarial outcomes,
along with numerous other shared characteristics and common factors. Cluster
Analysis attempts to understand and explain those natural group affinities and
geographical distributions using methods such as Causal Layered Analysis (CLA).
Clustering in “Big Data”
128. Cluster Types
• Astrophysics
– Cluster Type: 4D Distribution of Matter across the Universe through Space and Time
– Clusters: Star Systems, Stellar Clusters, Galaxies, Galactic Clusters
– Dimensions: Mass / Energy, Space / Time
– Data Type: Astronomy Images – Microwave, Infrared, Optical, Ultraviolet, Radio, X-ray, Gamma-ray
– Data Source: Optical Telescope, Infrared Telescope, Radio Telescope, X-ray Telescope
– Clustering Factors / Forces: Gravity, Dark Matter, Dark Energy, Dark Flow
• Climate Change
– Cluster Type: Temperature Changes, Precipitation Changes, Ice-mass Changes
– Clusters: Hot / Cold, Dry / Wet, More / Less ice
– Dimensions: Temperature, Precipitation, Sea / Land Ice
– Data Type: Average Temperature, Average Precipitation, Greenhouse Gases %
– Data Source: Weather Station Data, Ice Core Data, Tree-ring Data
– Clustering Factors / Forces: Solar Forcing, Oceanic Forcing, Atmospheric Forcing
• Actuarial Science, Morbidity, Clinical Trials, Epidemiology
– Cluster Type: Place / Date of birth, Place / Date of death, Cause of Death
– Clusters: Birth / Death, Longevity, Cause of Death
– Dimensions: Medical Events, Geography, Time
– Data Type: Biomedical Data, Demographic Data, Geographic Data
– Data Source: Register of Births, Register of Deaths, Medical Records
– Clustering Factors / Forces: Health, Wealth, Demographics
• Price Curves, Economic Modelling, Long-range Forecasting
– Cluster Type: Economic growth, Economic recession
– Clusters: Bull markets, Bear markets
– Dimensions: Monetary Value, Geography, Time
– Data Type: Real (Austrian) GDP, Foreign Exchange Rates, Interest Rates, Price movements, Daily Closing Prices
– Data Source: Government, Central Banks, Money Markets, Stock Exchange, Commodity Exchange
– Clustering Factors / Forces: Business Cycles, Economic Trends, Market Sentiment, Fear and Greed, Supply / Demand
• Business Clusters
– Cluster Type: Retail Parks, Digital / Fin Tech, Leisure / Tourism, Creative / Academic
– Clusters: Retail, Technology, Resorts, Arts / Sciences
– Dimensions: Company / SIC, Geography, Time
– Data Type: Entrepreneurs, Start-ups, Mergers, Acquisitions
– Data Source: Investors, NGAs, Government, Academic Bodies
– Clustering Factors / Forces: Capital / Finance, Political policy, Economic policy, Social policy
• Elite Team Sports, Performance Science
– Cluster Type: Winners, Losers
– Clusters: Team / Athlete, Sport / Club, League Tables, Medal Tables, Sporting Events
– Dimensions: Team / Athlete, Sport / Club, Geography, Time
– Data Type: Performance Data, Biomedical Data
– Data Source: Sports Governing Bodies, RSS News Feeds, Social Media, Hawk-Eye, Pro-Zone
– Clustering Factors / Forces: Technique, Application, Form / Fitness, Ability / Attitude, Training / Coaching, Speed / Endurance
• Future Management
– Cluster Type: Human Activity, Natural Events, Random Events
– Clusters: Waves, Cycles, Patterns, Trends
– Dimensions: Random Events, Geography, Time
– Data Type: Weak Signals, Strong Signals, Wild Card Events, Black Swan Events
– Data Source: Global Internet Content / Big Data Analytics – Horizon Scanning, Tracking and Monitoring
– Clustering Factors / Forces: Random Events, Waves, Cycles, Patterns, Trends, Extrapolations
129. Clustering in “Big Data”
• “BIG DATA” ANALYTICS – PROFILING, CLUSTERING and 4D GEOSPATIAL ANALYSIS •
• The profiling and analysis of large aggregated datasets in order to determine a ‘natural’
structure of data relationships or groupings is an important starting point, forming the basis of
many mapping, statistical and analytic applications. Cluster analysis of implicit similarities,
such as time-series demographic or geographic distribution, is a critical technique where no
prior assumptions are made concerning the number or type of groups that may be found, or
their relationships, hierarchies or internal data structures. Geospatial and demographic
techniques are frequently used to profile and segment populations by ‘natural’ groupings.
Shared characteristics or common factors, such as Behaviour / Propensity or Epidemiology,
Clinical, Morbidity and Actuarial outcomes, allow us to discover and explore previously
unknown, concealed or unrecognised insights, patterns, trends or data relationships.
• PREDICTIVE ANALYTICS and EVENT FORECASTING •
• Predictive Analytics and Event Forecasting uses Horizon Scanning, Tracking and Monitoring
methods, combined with Cycle, Pattern and Trend Analysis techniques for Event Forecasting
and Propensity Models, in order to anticipate a wide range of business, economic, social and
political Future Events. These range from micro-economic Market phenomena, such as forecasting
Market Sentiment and Price Curve movements, to large-scale macro-economic Fiscal
phenomena, using Weak Signal processing to predict future Wild Card and Black Swan Events
such as Monetary System shocks.
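Cycle analysis often starts by estimating a dominant period. A generic sketch (not the author's forecasting method) uses sample autocorrelation: the lag at which a series best matches a shifted copy of itself is a candidate cycle length. The `max_lag` bound and the synthetic monthly series with a 12-month cycle are assumptions for illustration.

```python
import math
from typing import List, Optional

def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

def dominant_period(series: List[float], max_lag: int) -> Optional[int]:
    """Lag (>= 2) with the highest autocorrelation: a crude cycle-length estimate."""
    best_lag, best_r = None, -math.inf
    for lag in range(2, max_lag + 1):
        r = autocorrelation(series, lag)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# Ten years of synthetic monthly data with an annual cycle
monthly = [math.sin(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(monthly, max_lag=30))
```

On real, noisy data the autocorrelation peak is only a starting point; trend removal and significance checks would normally come first.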
131. Cluster Analysis
• Data Representation
– Metadata - identifying common Data Objects, Types and Formats
• Data Taxonomy and Classification
– Similarity Matrix (labelled data)
– Grouping of explicit data relationships
• Data Audit – given any collection of labelled objects:
– Identifying relationships between discrete data items
– Identifying common data features – values and ranges
– Identifying unusual data features – outliers and exceptions
• Data Profiling and Clustering – given any collection of unlabelled objects:
– Pattern Matrix (unlabelled data)
– Discover implicit data relationships
– Find meaningful groupings in Data (Clusters)
– Predictive Analytics – Bayesian Event Forecasting
– Wave-form Analytics – Periodicity, Cycles and Trends
– Explore hidden relationships between discrete data features
Many big data problems feature unlabelled objects
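For the Data Audit step of identifying outliers and exceptions, one simple and widely used rule is Tukey's fences on the interquartile range. The sketch assumes a single numeric feature and the conventional multiplier k = 1.5; both are choices made here, not prescriptions from the source.

```python
import statistics
from typing import List, Tuple

def iqr_outliers(values: List[float], k: float = 1.5) -> Tuple[float, float, List[float]]:
    """Tukey's fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return low, high, [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95, 10, 12]
print(iqr_outliers(readings))
```

Because quartiles are insensitive to a few extreme values, the fences are not dragged outwards by the very outliers they are meant to catch.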
133. Cluster Analysis
Clustering Algorithms
Hundreds of spatial, mathematical and statistical clustering algorithms are available –
many clustering algorithms are “admissible” – but no single algorithm alone is “optimal”
• K-means
• Gaussian mixture models
• Kernel K-means
• Spectral Clustering
• Nearest neighbour
• Latent Dirichlet Allocation
Challenges in “Big Data” Clustering
• Data quality
• Volume – number of data items
• Cardinality – number of clusters
• Synergy – measures of similarity
• Values – outliers and exceptions
• Cluster accuracy - validity and verification
• Homogeneous versus heterogeneous data (structured and unstructured data)
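K-means, the first algorithm listed, can be sketched in a few lines as Lloyd's algorithm. This version takes explicit starting centroids so that the result is deterministic; in practice initial centroids are usually chosen randomly or with k-means++, and the cardinality challenge above (choosing the number of clusters) remains the caller's problem.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid
    update until the assignments stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(kmeans(pts, [(0, 0), (10, 10)]))
```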
134. Distributed Clustering Model Performance
Clustering 100,000 2-D points with 2 clusters on 2.3 GHz quad-core Intel Xeon processors,
with 8GB memory, in the intel07 cluster
Network communication cost increases with the number of processors
[Figure panels: K-means | Kernel K-means]
135. Distributed Clustering Models
Number of processors   Speedup Factor (K-means)   Speedup Factor (Kernel K-means)
2                      1.1                        1.3
3                      2.4                        1.5
4                      3.1                        1.6
5                      3.0                        3.8
6                      3.1                        1.9
7                      3.3                        1.5
8                      1.2                        1.5
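Speedup alone hides how well the added processors are used. Dividing by processor count gives parallel efficiency; the sketch below applies this to the speedup figures above, assuming they are speedups S(p) = T(1)/T(p) relative to a single-processor run, which the slide does not state explicitly.

```python
from typing import Dict

# Speedup factors as reported above (number of processors -> speedup)
kmeans_speedup: Dict[int, float] = {2: 1.1, 3: 2.4, 4: 3.1, 5: 3.0, 6: 3.1, 7: 3.3, 8: 1.2}
kernel_kmeans_speedup: Dict[int, float] = {2: 1.3, 3: 1.5, 4: 1.6, 5: 3.8, 6: 1.9, 7: 1.5, 8: 1.5}

def efficiency(speedups: Dict[int, float]) -> Dict[int, float]:
    """Parallel efficiency E(p) = S(p) / p; 1.0 would be perfect linear scaling."""
    return {p: s / p for p, s in speedups.items()}

for name, table in (("K-means", kmeans_speedup), ("Kernel K-means", kernel_kmeans_speedup)):
    best_p = max(table, key=table.get)  # processor count with the highest speedup
    effs = {p: round(e, 2) for p, e in efficiency(table).items()}
    print(name, "best p =", best_p, "efficiency:", effs)
```

Falling efficiency as p grows is consistent with the slide's note that network communication cost increases with the number of processors.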
136. Distributed Clustering Model Performance
Distributed Approximate Kernel K-means – 2-D data set with 2 concentric circles
2.3 GHz quad-core Intel Xeon processors, with 8GB memory, in the intel07 cluster
Run-time benchmark
Size of dataset (no. of records)   Benchmark Performance (Speedup Factor)
10K                                3.8
100K                               4.8
1M                                 3.8
10M                                6.4