Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and Advantage
When it comes to big data, companies big and small need to
determine not only the best fit of new technology with existing
investments but also how to incorporate proven best practices that
enable them to run better and run differently.
Executive Summary
The avalanche of data that is stressing, and often collapsing,
traditional computing systems is matched only by the staggering
number of technical and architectural choices available to those
seeking business value from this environment. Big data platforms
can therefore be a double-edged sword. While they provide
significant IT cost reductions and the power to analyze much larger
data sets than was previously possible, they can also unleash
strong disruptive forces, driven by a general lack of understanding
during planning and implementation.
Users and IT organizations alike still have a hard time
understanding what big data technology actually is and how to apply
it effectively. Many organizations are struggling to move their big
data development platforms into full-scale production and are
unable to deliver the business value promised for such platforms.
In short, big data opportunities are not without
big data challenges, the majority of which can be
grouped into four major categories:
• Immature technology landscape.
• Impact on end users due to fluctuations in the
current business model and shortcomings of
big data technology.
• Attempts to replace existing technology
components with a big data platform.
• Resource availability.
This white paper discusses the challenges of
implementing big data technology and provides
guidance on how to implement a big data initia-
tive by incorporating proven best practices.
Fragmented Perspectives
You can hardly pick up a magazine or browse a
Web site covering business or IT trends without
being bombarded by content extolling the virtues
of big data. Proponents typically address the
tremendous promise offered by big data tools
cognizant 20-20 insights | august 2013
and techniques, from gaining insight heretofore
unavailable, to significantly reducing the cost and/
or time necessary to achieve business benefits.
Also covered is the new-found organizational
ability to analyze data generated by devices and
social media, as well as other unstructured and
semi-structured data.
What stands out amid these abundant capability and benefit claims
is the lack of a universally accepted definition of big data.
According to IDC, “Big data technologies describe a new generation
of technologies and architectures, designed to economically extract
value from very large volumes of a wide variety of data, by
enabling high-velocity capture, discovery, and/or analysis.”1
In other words, this definition incorporates all data types managed
by next-generation systems that must scale to handle
ever-increasing user workloads and data volumes.
On the other hand, McKinsey & Co. defines big data as, “Datasets
whose size is beyond the ability of typical database software tools
to capture, store, manage and analyze.”2
This suggests that big data’s size is relative to the effectiveness
of the technology that handles it and that what constitutes big
data today will not likely be big data tomorrow.
All this said, there is little wonder why a wide
spectrum of big data approaches, and big data
results, exists.
Invisible Ink
Most gurus consider the Apache Software Foundation’s Hadoop
technology stack the quintessential big data platform (see
Figure 1).
This stack actually comprises a small number of
components and does not completely address
key issues pertaining to real-time analytics, data
security and operations. Customers frequently
select one of the commercially available solutions
to address these issues.
The problem is that all the leading big data
solution vendors are still scrambling to fill opera-
tional, visualization and information discovery
gaps while also planning major product changes
needed over the next six months to a year.
A hidden message is written between the lines of
this big data story. Consider this:
• The capabilities available in the multitude
of commercial solutions vary significantly
between channels and continue to diverge.
• The current Hadoop stack is aimed at batch
processing and is not tailored for real-time
processing.
• Various tool sets are evolving rapidly and dra-
matically, and this technological progression is
expected to continue.
Figure 1: Components of Apache’s Hadoop Platform
[Figure: Sqoop (relational database data collector); Flume and
Chukwa (log data collectors); Hadoop MapReduce (distributed
processing framework); HDFS (Hadoop distributed file system);
R (statistics); Mahout (machine learning); Pig (data flow);
Hive (data warehouse); Oozie (workflow); ZooKeeper (coordination);
Ambari (provisioning, managing and monitoring Hadoop clusters).]
• Support for third-party data visualization tools
is currently not inherent in the Hadoop stack.
• The database capabilities of the platform do
not currently provide high concurrency or ad
hoc query support.
• Operational aspects of the platform are lacking,
such as point-in-time recovery and data-level
security.
• New organizational skill sets are required to
design, build and support applications running
on this platform.
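The batch orientation noted in the list above can be made concrete
with a small, single-process sketch of the map, shuffle and reduce
phases. This is not Hadoop's API: the function names
(run_batch_job, log_level_mapper, count_reducer) and the sample log
lines are assumptions made for illustration only. What it shows is
that a result exists only after the entire input has been read,
which is why the model suits batch work rather than real-time
queries.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all mapped values by key. Nothing downstream can start
    until every pair has been seen, which is what makes this a batch model."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_batch_job(records, mapper, reducer):
    """Run map, shuffle and reduce in sequence over the full input."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical job: count log lines per severity level.
def log_level_mapper(line):
    parts = line.split()
    if len(parts) >= 2:  # skip malformed lines
        yield parts[1], 1


def count_reducer(key, values):
    return sum(values)


# Illustrative input, not real data.
SAMPLE_LOG = [
    "2013-08-01 ERROR disk full",
    "2013-08-01 INFO job started",
    "2013-08-02 ERROR timeout",
    "malformed",
]

result = run_batch_job(SAMPLE_LOG, log_level_mapper, count_reducer)
print(result)
```

On a real cluster the map and reduce functions would run in
parallel across many nodes, but the dependency is the same: the
reduce step waits for the complete mapped input.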
Now that everyone is talking big data, should organizations begin
implementing a big data strategy? The answer is a resounding “yes.”
The promise of big data to allow business users to analyze large
data sets in ways they cannot today, while significantly reducing
IT infrastructure costs, is real. However, it will take time to
transform both business and technology organizations into a state
that will deliver full business value. Companies should understand
the limitations of the platform and use care when determining where
the technology fits, and where it does not.
Planning the Journey
To tap big data advantages early on, companies
must ask — and hopefully answer — two funda-
mental questions that are not mutually exclusive:
1. Is the business goal to maximize the value
of data that already exists and solve current
problems with better, faster, more agile or less
expensive technology? That is, do we hope to
“run better” after big data is implemented?
2. Is the goal to tackle long-standing unsolved
problems or discover new solutions not
previously considered, using new sources and
new technologies? That is, do we hope to “run
differently” after big data is implemented?
To counteract the disruptive forces caused
by immature technology, organizations have
embarked on the journey along their own big data
super highways and are beginning to pass the fol-
lowing checkpoints:
• Awareness: Determining what big data really is
and what it means to them.
• Innovation: Understanding the capabilities and
limitations of big data technologies.
• Management: Harmonizing existing technolo-
gies with big data technologies to maximize the
life and use of existing IT investments.
• Operating: Mobilizing resources and structur-
ing a fluid environment.
• Transformation: Continually improving ana-
lytical capabilities through holistic adoption
across the enterprise.
The “unknown-uncaptured advantage” (shown in orange in Figure 2)
depicts the loss of possible business value by the organization due
to its inability to identify, and thus capture, opportunities. In
some ways, the orange area is the province of researchers and
visionaries long before value is commercially viable. Here,
companies cannot capture what they do not know exists.
Figure 2: The Big Data Journey: Lifting the Highway
[Figure: business value plotted against maturity (low to high)
along the Big Data Super Highway. Checkpoints: Awareness,
Innovation, Managing, Operating, Transformation. Markers: Reference
Architecture, Innovation Lab, Shared Services Model. Regions:
Captured Advantage, Uncaptured Known Advantage, Unknown Uncaptured
Advantage. Also labeled: Disruptive Forces, Completed Checkpoints.]
The “uncaptured-known advantage,” on the other hand, (shown in
green) represents an understanding of value that is not yet
realized. This is where a company that makes the right technical
and organizational decisions early on can gain business advantage,
and therefore competitive differentiation, vis-à-vis late adopters.
Discussed in depth in the following section, known mile markers are
enabling companies to “lift” their big data highway and capture the
value difference between the two curves. Here is where value
captured by one enterprise and not another becomes a competitive
advantage, or disadvantage. Here is where visionary companies can
run differently.
Lastly, “captured advantage” (shown in blue)
simply indicates which benefits have been
obtained within the organization. Generally, this
is an area where all companies can eventually
be expected to participate. The blue area shows
that competitive advantage has given way to
competitors that have captured operational and
process advantages and run better. In contrast,
a company that is not even executing well in the
“captured advantage” space is running a risk of
obsolescence and market failure.
Stepping Stones, Bumps in the Road and Navigating
There is much value to be earned in lifting the pace of value
discovery and creation and thus discovering value earlier in the
business cycle. And so, companies must find a way to press on,
despite the opposing disruptive forces. A series of steps that have
proved quite successful is recommended for approaching and
implementing big data initiatives. Likewise, there are pitfalls
that come with immaturity and a lack of direction.
Marker 1: Reference Architecture
No matter what data a company needs — sales,
competitive, economic, weather or demographic
data — the technical journey of big data begins
with the reference architecture. This addresses
the practical needs required to obtain maximum
reuse of existing technology investments, and
it incorporates the use of both big data and tra-
ditional technology components in one overall
architecture (see Figure 3).
Figure 3: Combining Technologies to Form Overarching Reference
Architecture
[Figure: layered reference architecture. Sources: Source 1 through
Source n, internal and external data, and unstructured data
(e-mail, social media, documents). Landing and staging, with a data
integration layer and data certification. Storage: HDFS
(Hive/HBase), NoSQL database, analytics appliance, traditional,
legacy and third-party databases, data warehouses and data marts,
archiving and sandbox. Supporting layers: metadata management;
data governance, stewardship, ILM, DLM, data standards, data
federation, lineage and insights; security (authentication and
authorization, user privileges, data privacy and security); data
views and semantic layer. Consumption: reporting (standard, OLAP,
list, ad hoc, slice/dice, drill down); analytics (business
analytics, data mining, forecasting, guided analytics, predictive
modeling, BAM); visualization and discovery; big data analytics
(brand sentiment analytics, behavioral, social media, data indexing
for search); data export of process data extracts (SFDC, other).]
This hybrid reference architecture enables com-
panies to take advantage of greater processing
speeds provided by the highly scalable Hadoop
environment and positions the organization for
long-term use of semi-structured and unstruc-
tured data. Companies can also minimize the cost
and organizational impact of this new platform by
allowing business users and current applications
to continue to apply the same technologies they
use today.
As a comprehensive solution, the reference archi-
tecture promotes the following benefits:
• Provides a clear path to technology maturity.
• Offers agility by allowing quick response to
changing business needs through the integra-
tion and reuse of current analytics platforms
and skills.
• Promotes confidence as a result of reduced
risk, higher quality data and better governance.
• Explains to corporate stakeholders the role
of existing and new technologies and the
placement of new investments.
Marker 2: Innovation Labs
Many of the self-service, reporting and analytics tools to which
the business is accustomed must be replaced or adapted to work with
Hadoop. This fact alone can create a significant learning curve.
Add to this the need to replace other tools, and the curve steepens
further.
To address this need, innovation labs are often used to help
business users understand the limitations of the Hadoop platform
and begin developing the skills necessary for supporting the
system. By using an innovation lab, business users may also begin
to gain new informational insight. For example, business users can
be introduced to new analytics tools while new analytics models are
being developed. Simultaneously, data scientists can begin
analyzing new data sources as platform usage matures within the
organization. The business’s operational units can also begin
preparing and collaborating with their suppliers on key issues,
such as monitoring, point-in-time recovery, failover, data security
and other capabilities that are not currently part of their
numerous software solutions.
Quick Take: Real-Life Reference Architectures
In recent programs that used a similar reference
architecture, the following results have been
achieved:
• A large financial company reduced its capital
expenses by eliminating the need to upgrade
existing systems. It accomplished this by
relocating transformational and data aggregation
processes to the big data platform. If it had
continued processing on the current database
system, the company estimates it would have
needed to allocate over $20 million in capital
expenditures.
• After having an integrated customer view on its
wish list for over 10 years, an insurance giant
was finally able to achieve its goal through
the use of NoSQL technology. It combined
data from nearly 100 separate administrative
and claims systems and moved from pilot to
rollout in 90 days, creating a more accurate
churn model that responded much sooner to
new signals. This 90-day turnaround was a
refreshing outcome compared with typical
insurance industry IT projects, which are
measured in quarters or years.3
A Cautionary Tale
But not all big data stories end well. A global
company attempted to move portions of its data
acquisition, aggregation and analytical processing
to Hadoop. However, the company did not
understand the platform’s inherent limitations
before attempting to port a broad range of
functionality to it. As a result, it had to revert
to its traditional platform, losing time and money
and raising concern among business teams
regarding the platform’s overall viability.
Innovation labs are critical to enabling the right combination of
toolsets, datasets, skill sets and mindsets for better, faster,
cheaper and more successful use of big data technologies. Another
crucial role of the innovation lab is to provide an environment
where new technological components can be introduced as the
technology matures, and where IT can learn more about the
limitations of those technologies.
Marker 3: Shared Services
Another critical element of success is the shared services model,
which addresses concerns about whether the innovation lab can be
properly staffed and managed. Without this support, the adoption
and success rates of big data programs are often dramatically
reduced. The shared services model provides a resource pool to
support various development and analytics needs. It also acts as a
breeding ground for technology developers to learn and grow while
developing the processes and procedures necessary for shifting the
platform into an operational state.
The main advantages of using the shared services
model include:
• Promoting the big data vision to provide the
broader context for implementation success.
People need direction, and without clear
leadership and vision, many will find reason
to resist or pull the initiative in opposing
directions.
• Highly skilled team members who can help
train and educate the analytics team, develop-
ment staff and data scientists on the proper
way to use the system.
• Mentorship as a fundamental support system.
An extension of training, mentoring enables
active participation in big data innovation
efforts and provides an environment for
leadership development.
• Use cases and usage patterns, which provide
stronger design guidance than design princi-
ples alone. This is because they put the prin-
ciples into practice and apply them to different
contexts. Doing so better illustrates how they
are to be applied and allows many design deci-
sions to be pre-determined.
• Planning for future adoption and integra-
tion of new technologies to meet tomorrow’s
business needs. Having a pre-developed plan
for big data technology progression — including
how existing technologies will fit into the
future big data framework — provides a solid
foundation for the user community.
• Specialized expertise available by project or
topic, eliminating the need to hire a full-time
resource for a one-time question.
• Bypassing of the “who pays first?” dilemma,
which occurs when multiple groups could
benefit from onboarding a new technology
but no funding model exists to allow for group
cost-sharing. In other cases, an individual team
is stuck with the entire cost burden of the
prototype. A shared services organization can
leverage a small innovation fund to seed these
innovation initiatives quickly and flexibly.
As an example of shared services model success,
a large financial institution was able to create,
process, analyze and leverage data to drive
business priorities. This capability enabled it to
enhance predictions of customer risk behaviors,
strengthen the identification of high-value pros-
pects and automate analysis of written customer
surveys, which led to decreased service improve-
ment time. Here, the shared services model acted
as a catalyst for achieving what was possible with
big data technology.
By combining people and knowledge with a uniform plan, the company
was able to bring findings to fruition. Without the shared services
model, this capability would have remained in the innovation lab’s
Petri dish, never advancing beyond a good idea.
Marker 4: Reaching Best in Class
Technology and capabilities alone will not create
a best-in-class organization or enable an orga-
nization to move from standard reporting and
traditional business intelligence analytics to the
next level of predictive analytics. As the organi-
zation matures in the use of its team, processes
and technological components, these features
become increasingly ingrained in the business.
Data scientists should form a trial-and-error
system, testing one idea after another as the data
history grows. This may at first seem counter-
intuitive, but conducting this type of analysis at
this stage of maturity typically is met with higher
success rates than alternatives. The reason:
Companies usually have the experience needed
to quickly identify opportunities, as well as the
stability to overcome challenges when testing
new ideas.
As these ideas are tested and verified, the benefits of a truly
agile analytics model begin to take shape. Also, promoting a higher
degree of collaboration helps drive additional learning throughout
the enterprise and converts into reality the goal of adopting an
enterprise-wide big data platform.
From On-Ramp to Destination
The technological — and organizational — consider-
ations that accompany the deployment and intro-
duction of big data platforms are an integral part
of implementing any big data strategy. Although
the Hadoop platform is evolving rapidly, organiza-
tions can start achieving results now by enabling
users to begin exploring the underlying technolo-
gies while addressing key present-day challenges.
Companies can maximize their investment in
current and future technology — and people — by
considering the following tips:
• Avoid trading today’s unsolved issues for
invented use cases or business problems that
new technology is assumed to solve. Instead,
take a fresh look at current business problems
that can be addressed using the existing
infrastructure.
• Develop parallel strategies that immediately
enable business users’ application of core
technological components while correcting
your organization’s operational issues.
• Refrain from replacing current technology
just because you can. Instead, focus on
extending the current reference architecture
to blend big data components with the existing
architectural stack, thereby maximizing
current investments and reducing organiza-
tional change management challenges. The
traditional big data platform represents only
one aspect of the overall technology solution,
albeit an integral part. Also, your organization’s
current technology is already aligned with
reports and analytics calculations that work
well. Big data can pre-process the volumes or
calculate new factors, leaving existing
investments to handle the presentation layer
of the calculated results.
• Remember that technology is not only
maturing; it’s also evolving. Therefore, your
organization should embrace new techno-
logical components and solutions in lieu of
attempting to invest in a single technology
stack.
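The idea in the tips above of letting big data pre-process the
volumes while existing tools keep the presentation layer can be
sketched as follows. This is a minimal illustration under stated
assumptions, not a prescribed design: the event fields (region,
month, amount), the derived factor avg_order_value and the function
name preaggregate are all hypothetical.

```python
from collections import defaultdict


def preaggregate(events):
    """Reduce raw events to one summary row per (region, month).

    The heavy scan over raw data happens here; the small result is
    what an existing reporting tool would load and present.
    """
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for event in events:
        key = (event["region"], event["month"])
        totals[key]["orders"] += 1
        totals[key]["revenue"] += event["amount"]

    summary = []
    for (region, month), t in sorted(totals.items()):
        summary.append({
            "region": region,
            "month": month,
            "orders": t["orders"],
            "revenue": round(t["revenue"], 2),
            # A "new factor" computed upstream of the reporting layer.
            "avg_order_value": round(t["revenue"] / t["orders"], 2),
        })
    return summary


# Illustrative raw events, not real data.
RAW_EVENTS = [
    {"region": "east", "month": "2013-07", "amount": 100.0},
    {"region": "east", "month": "2013-07", "amount": 50.0},
    {"region": "west", "month": "2013-07", "amount": 30.0},
]

summary = preaggregate(RAW_EVENTS)
for row in summary:
    print(row)
```

In this arrangement the existing reports and analytics calculations
stay unchanged; only the source of their input table moves to the
new platform.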
Analogies abound on how companies can
safely achieve value from big data. By follow-
ing this checklist, business and IT leaders alike
can surmount steep challenges, derive big data
value more quickly, maintain advantage over late
adopters and enable the enterprise to benefit
from big data technologies.
Figure 4: Big Data Organizational Support Model
[Figure: big data technology maturity plotted against big data
organizational maturity (low to high), with the markers Reference
Architecture, Innovation Labs (for quick wins), Shared Services
Model and Best in Class Organization, and the label “Onboard
business unit use cases.” Stages and their guiding questions:
• Awareness: What is this “stuff?”
• Proof of Concepts: What are my capabilities/limitations?
• Enable Innovation: How do I find additional informational value?
• Pilot: How do I ensure expected value proposition?
• Technology Refresh: How do I leverage big data with my existing
assets to revamp my current business model?
• Agile Analytics: How do I continually improve my analytics
capabilities?
• Enterprise Adoption: How do I mobilize all my data and
operationalize analytics at scale?]