Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and Advantage


When it comes to big data, companies need to determine best fit with existing investments and incorporate proven best practices that enable them to run better and run differently.



Executive Summary

When it comes to big data, companies big and small need to determine not only the best fit of new technology with existing investments but also how to incorporate proven best practices that enable them to run better and run differently.

The avalanche of data that is stressing — and often collapsing — traditional computing systems is matched only by the staggering number of technical and architectural choices available to those seeking business value from this environment. Big data platforms can therefore be a double-edged sword. While they provide significant IT cost reductions and the power to analyze much larger data sets than previously possible with available IT capabilities, they can also unleash strong disruptive forces, driven by a general lack of understanding during planning and implementation.

Users and IT organizations alike still have a hard time understanding what big data technology actually is and how to apply it effectively. Many organizations are struggling to transition their big data development platforms into full-scale production and are hamstrung in delivering the business value promised by such platforms. In short, big data opportunities are not without big data challenges, the majority of which can be grouped into four major categories:

• Immature technology landscape.
• Impact on end users due to fluctuations in the current business model and shortcomings of big data technology.
• Attempts to replace existing technology components with a big data platform.
• Resource availability.

This white paper discusses the challenges of implementing big data technology and provides guidance on how to implement a big data initiative by incorporating proven best practices.
Cognizant 20-20 Insights | August 2013

Fragmented Perspectives

You can hardly pick up a magazine or browse a Web site covering business or IT trends without being bombarded by content extolling the virtues of big data. Proponents typically address the tremendous promise offered by big data tools
and techniques, from gaining insight heretofore unavailable to significantly reducing the cost and/or time necessary to achieve business benefits. Also covered is the new-found organizational ability to analyze data generated by devices and social media, as well as other unstructured and semi-structured data.

What stands out amid these abundant capability and benefit claims is the lack of a universally accepted definition of big data. According to IDC, “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”1 In other words, this definition incorporates all data types managed by next-generation systems that must scale to handle ever-increasing user workloads and data volumes. McKinsey & Co., on the other hand, defines big data as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.”2 This suggests that big data’s size is relative to the effectiveness of the technology that handles it, and that what constitutes big data today will likely not be big data tomorrow. All this said, it is little wonder that a wide spectrum of big data approaches — and big data results — exists.

Invisible Ink

Most experts consider the Apache Software Foundation’s Hadoop technology stack the quintessential big data platform (see Figure 1). This stack actually comprises a small number of components and does not completely address key issues pertaining to real-time analytics, data security and operations. Customers frequently select one of the commercially available solutions to address these issues.
The problem is that all the leading big data solution vendors are still scrambling to fill operational, visualization and information discovery gaps while also planning major product changes over the next six months to a year. A hidden message is written between the lines of this big data story. Consider the following:

• The capabilities available in the multitude of commercial solutions vary significantly between channels and continue to diverge.
• The current Hadoop stack is aimed at batch processing and is not tailored for real-time processing.
• Various tool sets are evolving rapidly and dramatically, and this technological progression is expected to continue.

Figure 1: Components of Apache’s Hadoop Platform. Sqoop (relational database data collector); Flume and Chukwa (log data collectors); Hadoop MapReduce (distributed processing framework); HDFS (Hadoop distributed file system); R (statistics); Mahout (machine learning); Pig (data flow); Hive (data warehouse); Oozie (workflow); ZooKeeper (coordination); Ambari (provisioning, managing and monitoring Hadoop clusters).
• Support for third-party data visualization tools is currently not inherent in the Hadoop stack.
• The database capabilities of the platform do not currently provide high concurrency or ad hoc query support.
• Operational aspects of the platform are lacking, such as point-in-time recovery and data-level security.
• New organizational skill sets are required to design, build and support applications running on this platform.

Now that everyone is talking big data, should organizations begin implementing a big data strategy? The answer is a resounding “yes.” The promise of big data — allowing business users to analyze large data sets in ways they cannot today, while significantly reducing IT infrastructure costs — is real. However, it will take time to transform both business and technology organizations into a state that delivers full business value. Companies should understand the limitations of the platform and use care when determining where the technology fits, and where it does not.

Planning the Journey

To tap big data advantages early on, companies must ask — and hopefully answer — two fundamental questions that are not mutually exclusive:

1. Is the business goal to maximize the value of data that already exists and solve current problems with better, faster, more agile or less expensive technology? That is, do we hope to “run better” after big data is implemented?

2. Is the goal to tackle long-standing unsolved problems or discover new solutions not previously considered, using new sources and new technologies? That is, do we hope to “run differently” after big data is implemented?

To counteract the disruptive forces caused by immature technology, organizations have embarked on the journey along their own big data super highways and are beginning to pass the following checkpoints:

• Awareness: Determining what big data really is and what it means to them.
• Innovation: Understanding the capabilities and limitations of big data technologies.
• Management: Harmonizing existing technologies with big data technologies to maximize the life and use of existing IT investments.
• Operating: Mobilizing resources and structuring a fluid environment.
• Transformation: Continually improving analytical capabilities through holistic adoption across the enterprise.

Figure 2: The Big Data Journey: Lifting the Highway. The figure plots business value against maturity, dividing the space into a captured advantage region, an uncaptured known advantage region and an unknown uncaptured advantage region. The big data super highway passes the checkpoints above (awareness, innovation, managing, operating and transformation), supported by a reference architecture, an innovation lab and a shared services model, against opposing disruptive forces.

The “unknown-uncaptured advantage” (shown in orange in Figure 2) depicts the loss of possible business value by the organization due to its inability to identify, and thus capture, opportunities. In some ways, the orange area is the province
of researchers and visionaries long before value is commercially viable. Here, companies cannot capture what they do not know exists.

The “uncaptured-known advantage,” on the other hand (shown in green), represents an understanding of value that is not yet realized. This is where a company that makes the right technical and organizational decisions early on can gain business advantage, and therefore competitive differentiation, vis-à-vis late adopters. As discussed in depth in the following section, known mile markers are enabling companies to “lift” their big data highway and capture the value difference between the two curves. Here is where value captured by one enterprise and not another becomes a competitive advantage — or disadvantage. Here is where visionary companies can run differently.

Lastly, “captured advantage” (shown in blue) simply indicates which benefits have been obtained within the organization. Generally, this is an area where all companies can eventually be expected to participate. The blue area shows where competitive advantage has given way to competitors that have captured operational and process advantages and run better. In contrast, a company that is not executing well even in the “captured advantage” space runs a risk of obsolescence and market failure.

Stepping Stones, Bumps in the Road and Navigating

There is much value to be earned in lifting the pace of value discovery and creation, and thus discovering value earlier in the business cycle. And so, companies must find a way to press on, despite the opposing disruptive forces. The series of steps recommended below for approaching and implementing big data initiatives has proved quite successful. Likewise, there are pitfalls that come with immaturity and a lack of direction.
Marker 1: Reference Architecture

No matter what data a company needs — sales, competitive, economic, weather or demographic — the technical journey of big data begins with the reference architecture. This addresses the practical needs required to obtain maximum reuse of existing technology investments, and it incorporates both big data and traditional technology components in one overall architecture (see Figure 3).

Figure 3: Combining Technologies to Form an Overarching Reference Architecture. The architecture moves internal and external data (traditional, third-party and legacy databases, e-mail, social media, documents and other unstructured data) through landing and staging zones and a data integration and certification layer into data warehouses and data marts, both traditional and Hadoop-based (Hive data marts, an HBase warehouse, HDFS, NoSQL databases and an analytics appliance). An information discovery sandbox, archiving, metadata management, data governance (stewardship, ILM, DLM, data standards, federation, lineage) and data privacy and security (authentication, authorization, user privileges) span the stack, feeding reporting, visualization and analytics capabilities ranging from standard, OLAP and ad hoc reporting with slice/dice and drill-down to data mining, forecasting, predictive modeling, guided analytics and brand sentiment and social media analytics.
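To make the hybrid pattern of Figure 3 concrete, here is a minimal, hypothetical sketch (all table, file and column names are invented for illustration) in which a traditional relational source lands rows into a file-based staging zone — the role HDFS plays in the reference architecture — and an aggregate is published back for existing reporting tools to consume:

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path

# Stand-in for an existing relational source system (hypothetical schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (region TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 120.0), ("west", 75.5), ("east", 30.0)])

# Step 1: land raw rows in a file-based staging zone (HDFS stand-in).
staging = Path(tempfile.mkdtemp())
with open(staging / "sales.csv", "w", newline="") as f:
    csv.writer(f).writerows(src.execute("SELECT region, amount FROM sales"))

# Step 2: aggregate in the staging layer, off the source database,
# sparing the operational system the transformation workload.
totals = defaultdict(float)
with open(staging / "sales.csv", newline="") as f:
    for region, amount in csv.reader(f):
        totals[region] += float(amount)

# Step 3: publish the aggregate back for existing BI/reporting tools.
print(dict(totals))  # {'east': 150.0, 'west': 75.5}
```

The point of the sketch is the division of labor: the source system only exports, the staging layer absorbs the heavy transformation, and downstream tools keep consuming familiar aggregates.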
This hybrid reference architecture enables companies to take advantage of the greater processing speeds provided by the highly scalable Hadoop environment and positions the organization for long-term use of semi-structured and unstructured data. Companies can also minimize the cost and organizational impact of this new platform by allowing business users and current applications to continue to apply the same technologies they use today.

As a comprehensive solution, the reference architecture promotes the following benefits:

• Provides a clear path to technology maturity.
• Offers agility by allowing quick response to changing business needs through the integration and reuse of current analytics platforms and skills.
• Promotes confidence as a result of reduced risk, higher quality data and better governance.
• Explains to corporate stakeholders the role of existing and new technologies and the placement of new investments.

Marker 2: Innovation Labs

Many of the self-service, reporting and analytics tools to which the business is accustomed must be replaced or adapted to Hadoop. This fact alone can result in a significant learning curve. Add to this the need to replace other tools, and the curve steepens further.

To address this need, innovation labs are often used to help business users understand the limitations of the Hadoop platform and begin developing the skills necessary for supporting the system. By utilizing an innovation lab, business users may even begin to gain new informational knowledge as well. For example, business users can be introduced to new analytics tools while new analytics models are being developed. Simultaneously, data scientists can begin analyzing new data sources as platform usage matures within the organization.
The business’s operational units can also begin preparing and collaborating with their suppliers on key issues, such as monitoring, point-in-time recovery, failover, data security and other capabilities that are currently not part of their numerous software solutions.

Quick Take: Real-Life Reference Architectures

In recent programs that used a similar reference architecture, the following results were achieved:

• A large financial company reduced its capital expenses by eliminating the need to upgrade existing systems. It accomplished this by relocating transformational and data aggregation processes to the big data platform. Had it continued processing on the current database system, the company estimates it would have needed to allocate over $20 million in capital expenditures.

• After having an integrated customer view on its wish list for over 10 years, an insurance giant was finally able to achieve its goal through the use of NoSQL technology. It combined data from nearly 100 separate administrative and claims systems and moved from pilot to rollout in 90 days, creating a more accurate churn model that responded much sooner to new signals. This 90-day turnaround was a refreshing outcome compared with typical insurance industry IT projects, which are measured in quarters or years.3

A Cautionary Tale

But not all big data stories end well. A global company attempted to move portions of its data acquisition, aggregation and analytical processing to Hadoop. However, the company did not understand the platform’s inherent limitations before attempting to port a broad range of functionality to it. As a result, it had to revert to its traditional platform, losing time and money and raising concern from business teams regarding the platform’s overall viability.
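Missteps like the cautionary tale above often trace back to the platform's batch orientation noted earlier. As a toy, Hadoop-free sketch (names and inputs invented for illustration), the MapReduce pattern that the Hadoop stack distributes across a cluster can be written in a few lines of plain Python:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here, (word, 1) for every word."""
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values -- here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

records = ["big data big value", "data at rest"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts["big"], counts["data"])  # prints: 2 2
```

The sketch also makes the limitation visible: every phase runs over the full input before the next begins, which is why the stack excels at large batch workloads but is not tailored for real-time processing.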
Innovation labs are critical to enabling the right combination of toolsets, datasets, skill sets and mindsets for better, faster, cheaper and more successful use of big data technologies. Another crucial role of the innovation lab is to provide an environment where new technological components can be introduced as the technology matures, and where IT can learn more about the limitations of those technologies.

Marker 3: Shared Services

Another critical element of success is the shared services model, which addresses concerns about whether the innovation lab can be properly staffed and managed. Without this support, the adoption and success rate of big data programs is often dramatically reduced. The shared services model provides a resource pool to support various development and analytics needs. It also acts as a breeding ground where technology developers learn and grow while developing the processes and procedures necessary for shifting the platform into an operational state.

The main advantages of the shared services model include:

• Promoting the big data vision to provide the broader context for implementation success. People need direction; without clear leadership and vision, many will find reason to resist or pull the initiative in opposing directions.
• Highly skilled team members who can help train and educate the analytics team, development staff and data scientists on the proper way to use the system.
• Mentorship as a fundamental support system. An extension of training, mentoring enables active participation in big data innovation efforts and provides an environment for leadership development.
• Use cases and usage patterns, which provide stronger design guidance than design principles alone. This is because they put the principles into practice and apply them to different contexts.
Doing so better illustrates how they are to be applied and allows many design decisions to be pre-determined.
• Planning for future adoption and integration of new technologies to meet tomorrow’s business needs. Having a pre-developed plan for big data technology progression — including how existing technologies will fit into the future big data framework — provides a solid foundation for the user community.
• Specialized expertise available by project or topic, eliminating the need to hire a full-time resource for a one-time question.
• Bypassing the “who pays first?” dilemma, which occurs when multiple groups could benefit from onboarding a new technology but no funding model exists to allow for group cost-sharing; in other cases, an individual team is stuck with the entire cost burden of the prototype. A shared services organization can leverage a small innovation fund to seed these innovation initiatives quickly and flexibly.

As an example of shared services success, a large financial institution was able to create, process, analyze and leverage data to drive business priorities. This capability enabled it to enhance predictions of customer risk behaviors, strengthen the identification of high-value prospects and automate the analysis of written customer surveys, which decreased service improvement time. Here, the shared services model acted as a catalyst for achieving what was possible with big data technology. By combining people and knowledge with a uniform plan, the company was able to bring its findings to fruition. Without the shared services model, this capability would have remained in the innovation lab’s Petri dish, never advancing beyond a good idea.

Marker 4: Reaching Best in Class

Technology and capabilities alone will not create a best-in-class organization or enable an organization to move from standard reporting and traditional business intelligence analytics to the next level of predictive analytics.
As the organization matures in the use of its team, processes and technological components, these capabilities become increasingly ingrained in the business. Data scientists should form a trial-and-error system, testing one idea after another as the data history grows. This may at first seem counterintuitive, but conducting this type of analysis at this stage of maturity typically meets with higher success rates than alternatives. The reason: companies usually have the experience needed to quickly identify opportunities, as well as the stability to overcome challenges when testing new ideas.
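One hypothetical shape for such a trial-and-error system, with candidate rules and data invented purely for illustration: replay the growing history, score several simple forecasting rules against it, and keep whichever currently errs least:

```python
# Candidate forecasting rules (deliberately simple, for illustration).
candidates = {
    "last_value": lambda hist: hist[-1],
    "mean": lambda hist: sum(hist) / len(hist),
    "mean_of_3": lambda hist: sum(hist[-3:]) / len(hist[-3:]),
}

def best_candidate(history):
    """Replay the history: predict each point from the points before it,
    accumulate absolute error per candidate, return the current winner."""
    errors = {name: 0.0 for name in candidates}
    for t in range(1, len(history)):
        past, actual = history[:t], history[t]
        for name, rule in candidates.items():
            errors[name] += abs(rule(past) - actual)
    return min(errors, key=errors.get)

history = [10, 12, 11, 13, 12, 14, 13]  # e.g. a weekly demand signal
print(best_candidate(history))
```

As the history lengthens, the winner can change, which is exactly the point: ideas are retested against the growing data rather than chosen once and frozen.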
As these ideas are tested and verified, the benefits of a truly agile analytics model begin to take shape. Promoting a higher degree of collaboration also helps drive additional learning throughout the enterprise and turns the goal of adopting an enterprise-wide big data platform into reality.

From On-Ramp to Destination

The technological — and organizational — considerations that accompany the deployment and introduction of big data platforms are an integral part of implementing any big data strategy. Although the Hadoop platform is evolving rapidly, organizations can start achieving results now by enabling users to begin exploring the underlying technologies while addressing key present-day challenges. Companies can maximize their investment in current and future technology — and people — by considering the following tips:

• Avoid trading today’s unsolved issues for the invention of new use cases or business problems that new technology is assumed to solve. Instead, take a fresh look at current business problems that can be addressed using the existing infrastructure.
• Develop parallel strategies that immediately enable business users’ application of core technological components while correcting your organization’s operational issues.
• Refrain from replacing current technology just because you can. Instead, focus on extending the current reference architecture to blend big data components with the existing architectural stack, thereby maximizing current investments and reducing organizational change management challenges. The traditional big data platform represents only one aspect of the overall technology solution, albeit an integral part. Your organization’s current technology will remain fully aligned with the reports and analytics calculations that already work well.
Big data can pre-process the volumes or calculate new factors, enabling your organization to leave the presentation of the calculated results to existing investments.
• Remember that technology is not only maturing; it is also evolving. Your organization should therefore embrace new technological components and solutions rather than investing in a single technology stack.

Analogies abound on how companies can safely achieve value from big data. By following this checklist, business and IT leaders alike can surmount steep challenges, derive big data value more quickly, maintain advantage over late adopters and enable the enterprise to benefit from big data technologies.

Figure 4: Big Data Organizational Support Model. The figure plots big data technology maturity against big data organizational maturity, progressing from awareness (“What is this stuff?”) through proofs of concept (“What are my capabilities/limitations?”), pilots (“How do I ensure the expected value proposition?”), technology refresh (“How do I leverage big data with my existing assets to revamp my current business model?”), enabling innovation (“How do I find additional informational value?”) and enterprise adoption (“How do I mobilize all my data and operationalize analytics at scale?”) to agile analytics (“How do I continually improve my analytics capabilities?”). The reference architecture, innovation labs for quick wins and the shared services model carry the organization, as it onboards business unit use cases, toward a best-in-class state.
About the Author

Hal Lavender is an Associate Vice President of Enterprise Architecture within Cognizant’s Enterprise Information Management Practice, leading an enterprise architecture service line that provides architectural consulting and guidance in information management and analytics.
Hal has over 30 years of experience with information architecture and technology in the data management space. A graduate of the Florida Institute of Technology and the University of Dallas, he holds master’s degrees in business administration and computer science. Hal can be reached at

Footnotes

1 “Extracting Value from Chaos,” IDC, June 2011, extracting-value-from-chaos-ar.pdf.
2 “Big Data: The Next Frontier for Innovation, Competition and Productivity,” McKinsey Global Institute, May 2011.
3 D. Henschen, “MetLife Uses NoSQL for Customer Service Breakthrough,” InformationWeek, service/240154741.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 164,300 employees as of June 30, 2013, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at or follow us on Twitter: Cognizant.

World Headquarters: 500 Frank W. Burr Blvd., Teaneck, NJ 07666 USA. Phone: +1 201 801 0233. Fax: +1 201 801 0243. Toll Free: +1 888 937 3277. Email:
European Headquarters: 1 Kingdom Street, Paddington Central, London W2 6BD. Phone: +44 (0) 20 7297 7600. Fax: +44 (0) 20 7121 0102. Email:
India Operations Headquarters: #5/535, Old Mahabalipuram Road, Okkiyam Pettai, Thoraipakkam, Chennai, 600 096 India. Phone: +91 (0) 44 4209 6000. Fax: +91 (0) 44 4209 6060. Email:

© Copyright 2013, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.