The goal of this e-book is to explore and collaborate on the various details that emerge when you think of data analytics and social business. In particular, social data is Big Data. The challenges are similar but also more specific because of the nature of social identities.
The actual e-book version (an iBook for the Apple iPad) of this is available from: http://ow.ly/eC4E6
We need to expand our thinking about how we convey thought leadership knowledge. That has already begun with the rise of social media interactions and relationships, which have created small-format ways of quickly delivering nuggets of information. I am an active participant in that domain. Yet I believe we still need the long format of articles, blog posts, and even e-books. I am taking this opportunity to expand a short series into an e-book of its own, focused on Analytics and Social Business Data.
The format and, most importantly, the content of the e-book will change over time as I modify, expand, or add chapters, findings, interesting resources, and interviews.
If you have not seen my work before, perhaps a good start would be to look at my Forbes blog. As always, I am very open to discussions, conversations, commentary and criticism of my work, however it is delivered. You can reach me, regardless of whether we are connected, through Twitter (@rawn), Facebook (Rawn Shah), or Google+ (+Rawn Shah).
CHAPTER 1
What Are Businesses Looking to Gain from Big Data Analytics?
The Saïd Business School at the University of Oxford and the IBM Institute for Business Value together have uncovered the state of Big Data use in their joint study, Analytics: The real-world use of big data, released in October 2012. Let us take a look at the state of the industry from the viewpoint of businesses that are actively involved in this space.
Image: M. Schroeck, R. Shockley, J. Smart, D.R. Morales, P. Tufano, Analytics: The Real-World Use of Big Data, Saïd Business School, IBM IBV 2012
SECTION 1
Does Big Data Matter?
WHAT IS BIG DATA ANALYTICS?
1. Top 5 characteristics of Big Data:
(a) A greater scope of information
(b) New kinds of data analysis
(c) Real-time information
(d) Data influx from new technologies
(e) Non-traditional forms of media
2. 63 percent of respondents report that the use of information (including big data) and analytics is creating a competitive advantage for their organizations, a 70 percent increase in just two years.
3. The convergence of Volume, Variety, Velocity and Veracity of data
Source: the joint IBM-Oxford study, which is available, free with registration, from this site.
If there is one thing we can easily say today, it is that we have more information openly available to people than in all of history combined prior to two decades ago. Incidentally, that was just shortly after the protocols for the Web were first created, and later made famous by Mosaic, Netscape, and other popular tools of the time. It has since become the Web and the Internet as we now know it.
Yet, this is data in the public sphere. Enterprise data is an entirely different matter. We know we have lots of data across the enterprise, and the challenge of managing and integrating that data into a cohesive whole continues. The goal: to understand what we know about our business.
These two worlds are clashing as we realize the multiplied value of bringing together what we know from our inside-out view from the enterprise, our outside-in view from our customers, and the plain outside-only view of the public. What we end up with is a whole lot of information, creating the scenario that we now refer to as the Big Data world.
The challenge: how do we, in a practical manner, analyze all this information to get useful results that enhance and accelerate our business? We grasp that Big Data means not only new technology components that provide the capacity to analyze, but also human components and skills as part of that capacity.
This study from the joint efforts of IBM and the University of Oxford is quite timely in answering the up-front questions of what we should expect from Big Data analytics from the executive point of view, starting with the very obvious question of how we define Big Data itself. This is one of the first questions the study has undertaken. But first, let us look at the data behind the study itself.
According to the report, this is based on the Big Data @ Work Survey conducted by IBM in mid-2012 with 1144 professionals from 95 countries across 26 industries. Respondents, who were self-selected, represent a mix of disciplines, including both business professionals (54 percent of the total sample) and IT professionals (46 percent). Study findings are based on analysis of survey data, and discussions with University of Oxford academics, subject matter experts and business executives (notably from IBM). Of the respondents, 28% say they have initiated or deployed Big Data projects, while 47% are still at the planning stage.
As Ray Wang, Principal Analyst and CEO of Constellation Research Group, points out, there is “a morass of confused definitions” around Big Data. His key point in this blog post is that Big Data is about making decisions about the future, not just rehashing the past.
FIGURE 1 Different Views on the Meaning of Big Data
Source: M. Schroeck, R. Shockley, J. Smart, D.R. Morales, P. Tufano, Analytics: The Real-World Use of Big Data, Saïd Business School, IBM IBV 2012
Figure 1 from the study shows that respondents share some common ground on what they think Big Data is, but note that they could pick up to two descriptions. Considering that even the top item is at 18% while the lowest is at 7%, there is still some debate on the scope. The distribution of these characteristics describes priorities, although none of them are divergent or exclusive of each other. Yet, we can observe that different groups have different priorities.
Based on this, the study describes four key dimensions that both describe the complexities of working with Big Data and distinguish it from what we have been doing with structured databases for decades now. The four characteristics are described as follows (see Figure 2):
• Volume - this is obvious from the name itself. There is a much larger scale of data than what we have processed before. By itself it seems like a scaling issue alone.
• Velocity - this is data in motion, changing over time, sometimes on an hourly or daily basis, and at other times, when you have data from sensors, at a blindingly fast pace of milliseconds and microseconds. The real issue is how to process and respond to it in good time.
• Variety - Speed and volume alone still call for technological scaling, but variety is where you need intelligence. The data comes in many formats, known and unknown; it may be structured or unstructured; it may contain multiple media; and it may emerge from a variety of sources, many not even your own.
• Veracity - This is possibly the most interesting aspect of this study: the attention to detail and how much that data conforms to facts or actual ‘truth’. I will dedicate a full chapter to this topic.
FIGURE 2 Four Dimensions of Big Data
Source: M. Schroeck, R. Shockley, J. Smart, D.R. Morales, P. Tufano, Analytics: The Real-World Use of Big Data, Saïd Business School, IBM IBV 2012
Ray Wang describes two other dimensions that arise with social engagement in particular: Viscosity and Virality. The former refers to the resistance to the flow of data, such as friction due to integration flow rates from data sources. The latter describes how quickly information gets distributed across people-to-people networks. These factors essentially describe a state of inertia and its opposite, free-flow.
To me, Viscosity is an aspect of access and Velocity, not a dimension of its own. The challenge is in getting access to the data at the speed that it is traveling, and not working with out-of-date information.
Virality occurs in two forms. The first is a metadata element, ‘how fast is this information spreading?’, which covers its acceleration and the various vectors along which it travels; this is useful in determining the priority and significance of the incoming data. The second comes once you have analyzed the data: what is your ability to distribute the information effectively? Rohit Bhargava, author of Personality Not Included (McGraw-Hill, 2008) and Likenomics (Wiley, 2012), shared other factors that describe how viral your content is: Is it Unique? Is it Authentic? Is it Talkable? Both these views, the current state of data acceleration and the projected spread of the result, are studied at length by the Word of Mouth Marketing Association.
I would therefore say that Viscosity and Virality are either outcomes of the other dimensions or factors that come into play after the actual analyses, rather than core dimensions.
The four factors of Volume, Variety, Velocity and Veracity impact your ability to act on the data you have at hand and, hopefully, to make decisions about the future.
SECTION 2
What is Driving the Need?
With all the possibilities, we need to set our priorities on what to analyze and, more so, on the bigger goal of what business initiatives we are trying to drive through this analysis. According to the study (see Figure 3), most respondents are looking to expand their capabilities for Customer-centric outcomes (49%). A distant second is Operational optimization (18%), followed by risk management and new business models, and finally employee collaboration.
Figure 3. Source: M. Schroeck, R. Shockley, J. Smart, D.R. Morales, P. Tufano, Analytics: The Real-World Use of Big Data, Saïd Business School, IBM IBV 2012
The focus on customer-centric issues is reinforced by the study's references from different industries. For example, Premier Healthcare Alliance used enhanced data sharing and analytics to improve patient outcomes while reducing spending by US$2.85 billion. Santam Insurance, in South Africa, improved the customer experience by implementing predictive analytics to reduce fraud. Fraud losses accounted for 6 to 10 percent of annual premium costs for Santam customers. They gained the ability to catch fraud early by capturing data from incoming claims, assessing each claim against identified risk factors and categorizing the risks.
Vestas Wind Systems A/S, a Danish wind turbine producer, used a supercomputer to analyze a large number of location-dependent factors such as temperature, precipitation, wind velocity, humidity and atmospheric pressure. This led to increased predictability and reliability, which in the end decreases the cost to customers per kilowatt hour produced.
All these examples examine enterprise internal data, under the organization's own dominion and in known data sources and formats. As the study says, “internal data is the most mature, well-understood data available to organizations. It has been collected, integrated, structured and standardized through years of enterprise resource planning, master data management, business intelligence and other related work.”
From my view in the social world, implementing analytics on internal data is, relatively speaking, much easier to accomplish than on external data. This again comes down to the issues that arise with Veracity.
Todd Watson of IBM points out in his blog, “Most Big Data initiatives currently being deployed by organizations are aimed at improving the customer experience, yet less than half of the organizations involved in active Big Data initiatives are currently collecting and analyzing external sources of data, like social media.”
Regardless of the internal-external data focus, you can and should examine social data with a customer-centric purpose. What that suggests is that you need to understand not just how customers interact with your company directly (which creates data such as transactions, service calls, sales requests, event participation), but also how the customer is interacting in the social world around the topics they are interested in.
Are they looking for advice on the need or use of the product? Are they looking for recommendations? Are they comparison shopping? Are they influencing others about that product or service category?
The answers translate into a deeper understanding of, and better preparation to anticipate, the needs of the customer and the market.
CHAPTER 2
The Veracity of Data
There are increasing levels of uncertainty surrounding data as we move from the well-defined world of transactions to the context-free world of language and interaction. Yet, this is precisely the road to take to future opportunities for the customer-centric organization.
SECTION 1
Veracity goes beyond Uncertainty
LET’S TALK UNCERTAINTY
Uncertainty comes in many shapes and forms, and we need to understand that they can change substantially in meaning and in the amount of work-to-be-done when we look at internal versus external data:
1. Accuracy
2. Precision
3. Reliability
4. Provenance
5. Fidelity
6. Permission
The IBM-Oxford study does well by introducing Veracity, but even the term has some degree of uncertainty because of the multiple implications that fall under this umbrella term. I'd like to add some of the different ways that this comes to light and go into depth with one aspect that impacts the area of analytics in Social Business.
• Accuracy - at the heart of veracity is how accurately the data portrays the real state of information
• Precision - how much precision is actually available. You can have high precision but the data may still not be accurate to reality
• Reliability - will the data continue to be available at the same quality in the future, or does it become corrupted somehow (not including intentional changes to the data contents)
• Provenance - can you determine the path back to where the data originated, not simply where you picked it up
• Fidelity - goes with provenance: how much did the data change meaning from the original to where you got it
• Permission - do you actually have the right to use the data the way you intend to
Most of these aspects have been understood for quite some time in the years spent looking at data internally. As indicated in earlier sections, internal data is much better understood and trusted to a degree. However, once you begin to look at the external world, the rules begin to change.
INTERACTIVE 1 Veracity in relation to Analytics Capabilities (panels: Structured Queries; Natural Language Processing)
Image Source: M. Schroeck, R. Shockley, J. Smart, D.R. Morales, P. Tufano, Analytics: The Real-World Use of Big Data, Saïd Business School, IBM IBV 2012. Comments: Rawn Shah
It is this last element of Permission that is a big factor in using social data, very often because we simply presume that we have that right if we can get the data. This very much applies to external public data from databases, web sources, online applications, and so on that all contain some bit of the data you may want. Even data within the enterprise can have that limit.
Even for those who have worked out the rights to access the information, consider that this is not a fixed state, but has its own Velocity and its own path of dependencies. This is data in motion, as the study describes, and more than just a changing flow of information, it is also changing metadata. For example, the data that describes your network of relationships changes when you add a new person to your network, and this metadata then changes the information that comes through your network overall. You don't even need to add anyone new directly. Your network itself is alive and evolving as people change jobs, move organizations, add new relationships and interact socially.
The global economy complicates this further because of territorial, national and other regional rules on rights to information in the social space. Access to your social information, and not even sensitive private information, falls under different jurisdictional precepts, sometimes with their own compliance needs. This is arguably yet to be enforced thoroughly across borders, but it is a risk that still exists.
What all this represents can be loosely termed 'friction' that limits what you can learn from the data itself. Big Data has different issues in the social space and, regardless of all the market interest in social analytics, we have yet to really play in the new reality of the Social Customer.
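The distinction between Accuracy and Precision in the list earlier in this section can be made concrete with a small sketch. The Python below is illustrative only: the readings, the true value, and the helper name `summarize` are hypothetical and are not taken from the IBM-Oxford study. It treats accuracy as how close the average reading is to the true value (bias) and precision as how tightly the readings cluster (spread).

```python
from statistics import mean, pstdev

def summarize(readings, true_value):
    """Return (bias, spread) for a list of numeric readings.

    bias   -> accuracy: distance of the average reading from the true value
    spread -> precision: how tightly the readings cluster together
    """
    bias = abs(mean(readings) - true_value)
    spread = pstdev(readings)
    return bias, spread

# Hypothetical data: two sources reporting the same quantity (true value 100).
TRUE_VALUE = 100.0
precise_but_inaccurate = [110.1, 110.0, 109.9, 110.0]  # tight cluster, wrong place
accurate_but_imprecise = [92.0, 108.0, 97.0, 103.0]    # centered, widely scattered

bias_a, spread_a = summarize(precise_but_inaccurate, TRUE_VALUE)
bias_b, spread_b = summarize(accurate_but_imprecise, TRUE_VALUE)
print(f"source A: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"source B: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

In this made-up data, source A is the more precise of the two yet the less accurate, which is the situation the Precision bullet warns about.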
SECTION 2
Moving from Transaction-mindedness to Relationship-mindfulness
Transaction data is nice and easy. It can be very voluminous but the context behind that data is often well recorded with a lot of metadata about who, what, when, how, etc. A conversation by comparison is vague, multitudinal, multi-party, language dependent, and may contain references to outside information not included in the data. A relationship multiplies all the conversations you may have had with someone over multiple occurrences over time, as well as different modes of how you interact, and changing parties of who else you interacted with.
The contextual differences behind a transaction and a relationship are leagues apart, and with Social Business, what we really want to understand is the state of and the opportunities for the relationship, not just the history. In a mentoring call last week to some new consultants, I described how the difference between thinking in Transactions (let's say 'T-mind') and thinking in Relationships ('R-mind') involves data needs that differ by several orders of magnitude.
Image: Generated by LinkedIn Labs. Data: Rawn Shah
T-mind is where most business thinking has been for some decades since the mass adoption of relational databases. We all incorporate it into our operations, our marketing tactics, our customer service, our product development, our ERP, and other aspects of our businesses. In fact, at scale, we began to look at Aggregates and to think of demographics and other forms of groupings; at first, in the broad sense (e.g., male customers aged between 19 and 25), and then with a much narrower focus (e.g., 'men 19-25 in my territory who commute to work and like sports'). By aggregating volume transactions into sets, we discovered new business opportunities among smaller groups of people, and therefore arrived at 'A-mind', thinking in aggregates.
Peter Kim of digital agency R/GA, and former Chief Strategy Officer of Dachis Group, recently shared in his blog that “functional integration of ecosystems is emerging as the path towards maximizing value.” His view is that focused aggregation at the interface to the customer will be a continuing principle of reaching social masses, versus a strategy that distributes how you interact with customers across different places.
When social media arrived, we realized that you could share this with customers to help their decisions (e.g., "People who bought this, also bought ..." on Amazon). We also realized that this information is spread around the Web in data sources outside our control. Thus, social media analytics was born to get a sense of that information.
This is still not quite reaching R-mind, but for the sake of this piece, let's call it 'S-mind' to note that it includes the social view and not just aggregates of data. The A-mind to S-mind change happens when you consider not just the demographic data you defined yourself, but what you discovered from the social web. In other words, you are creating new categories of data based on the social data. For example, you are not only thinking of people who like to work on cars as a hobby in the Phoenix area, but of the actual communities they engage with and even the topics, car parts, people and other subjects they like to talk about.
R-mind is when you start to really look at the networks of individual people when making business decisions.
What happens when you get to this state is the ultimate in personalization and customer-centricity. One aspect of the transitions from T-mind to A-mind to S-mind and to R-mind is that the scope of what you are looking at is getting smaller, but a second aspect is that the data and analytics per person are also increasing in volume.
Yet the challenge remains: here, too, you run into the Big Data problem of Permission, as I explained earlier.
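The step from T-mind to A-mind described in this section, grouping individual transactions into demographic sets, can be sketched in a few lines of Python. This is a minimal illustration only: the records, the field names, and the `segment_of` rule are hypothetical and stand in for whatever a real transaction system holds.

```python
from collections import defaultdict

# Hypothetical transaction records (T-mind: one row per purchase).
transactions = [
    {"customer": "c1", "gender": "M", "age": 22, "amount": 40.0},
    {"customer": "c2", "gender": "M", "age": 24, "amount": 25.0},
    {"customer": "c3", "gender": "F", "age": 31, "amount": 60.0},
    {"customer": "c1", "gender": "M", "age": 22, "amount": 15.0},
]

def segment_of(record):
    """Assign a record to a broad demographic segment (illustrative rule)."""
    if record["gender"] == "M" and 19 <= record["age"] <= 25:
        return "men 19-25"
    return "everyone else"

def aggregate(records):
    """A-mind: roll transactions up into per-segment revenue and customer counts."""
    totals = defaultdict(float)
    customers = defaultdict(set)
    for r in records:
        seg = segment_of(r)
        totals[seg] += r["amount"]
        customers[seg].add(r["customer"])
    return {seg: {"revenue": totals[seg], "customers": len(customers[seg])}
            for seg in totals}

summary = aggregate(transactions)
for seg, stats in summary.items():
    print(seg, stats)
```

Narrowing the rule in `segment_of`, for example by adding territory or interests, corresponds to moving from the broad aggregates to the narrower ones discussed above. S-mind and R-mind would need data that this sketch does not model, such as categories discovered from the social web and the networks of individual people.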
CHAPTER 3
Aggregating Social Data
Coming Soon
It is easy to misunderstand the title as a technical issue of integrating data sources. It is something more complex. Social data is the basis of demographics and psychographics, and essential to the way we understand the customer. Let’s take a step into the A-mind.