Surviving the Petabyte Age: A Practitioner's Guide



In the age of "big data," organizations need a business information model that organizes and partitions information in new ways that are useful to how businesses operate today.

Published in: Business

Cognizant 20-20 Insights | December 2011

Executive Summary

The concept of “big data”¹ is gaining attention across industries and the globe. Among the drivers are the growth in social media (Twitter, Facebook, blogs, etc.) and the explosion of rich content from other information sources (activity logs from the Web, proximity and wireless sources, etc.). The desire to create actionable insights from ever-increasing volumes of unstructured and structured data sets is forcing enterprises to rethink their approach to big data, particularly as traditional approaches have proved difficult, if even possible, to apply to structured data sets.

One challenge that many, if not most, enterprises are attempting to address is the increasing number of data sources made available for analysis and reporting. Those who have taken an early adopter stance and integrated non-tabular information (a.k.a. unstructured data) into their pool of analysis data have exacerbated their data management problems.

A second challenge is the shrinking timeframe in which a business stays focused on a particular topic. Thanks to the highly integrated and communicative global economy, and the great strides made in expanding communications bandwidth, both good and bad news circumnavigate the globe at a much faster pace than ever before.

The amount of time it takes for news to become common knowledge has shrunk, thanks to:

• An emerging network of social media and blogs that potentially makes everyone a publisher of good and bad news.
• A rapid increase in the number of people who are untethered from traditional information receptacles and now have a highly mobile means of collecting and ingesting information.
• The meteoric rise of desktop tools housing a significant portion of information. Organizations need to understand the information and processes involved in the dispensation of desktop-managed information (mostly Microsoft Access and Excel). This information is most likely to be found in the form of:
  > Copies of operational data (including both sources and targets).
  > Copies of operational data that is enriched (including the processes and sources used for enrichment, as well as the targets that receive the enriched information).
  > Processes bypassing the systematized processes (including the bypassed processes, the sources used for these processes, the actors in these processes and the results of these processes).
This whitepaper lays out the concept of a business information model as a vehicle to organize information into separate categories, which directly influences the creation, capture or extraction of business value and elevates it to a heightened focus. We will cover four main topics:

1. Why companies dealing with big data in today’s Petabyte Age¹ need to stratify information so that trustworthy, relevant, actionable and timely data can be found at a moment’s notice.
2. A business model that can be used to stratify information.
3. A new definition of partitioning and a business process for formulating the partitions. Partitions should deal with stratifying information based on its contribution to organizational data, as well as the more traditional technical partitioning that is conducted for performance and maintenance reasons.
4. Methods of rolling out an information infrastructure aligned with this new partitioning definition. The realities of this new environment are that the maintenance of a traditional enterprise information model happens at the speed of business and is in direct opposition to maintaining the focus of information that directly contributes to enterprise value.

Three Issues to Solve

The Petabyte Age² is creating a multitude of challenges for IT organizations, as they find that their well-honed, carefully constructed information models cannot be maintained fast enough to appease their business constituents. Moreover, once constructed and populated with information, these models require new technologies to interface with the data. Adding insult to injury, all this data is largely introspective and serves merely to support the status quo. When disruptions occur, insights can only be gleaned from this data over a sufficient passage of time; in the meantime, insights are derived from what is largely called unstructured and semi-structured data, as well as data obtained from outside the organization via social media, blogs, Web sites and a host of other sources that don’t fit into the neatly organized tools devised for insight generation.

A major shift is transforming the basic tenets of data-driven insight generation. This shift requires a new way of combining and synthesizing data used for navigating the highly integrated and communicative global economy.

Overcoming this challenge requires organizations to solve three important issues (see Figure 1):

• Data depth: How to derive insight from structures that contain billions or more instances of data. These can include sessions in a Web log, entries obtained from social media, entries from RFID activities or mobile-sourced activities. One thing is sure: The sheer size of these pools of data will continue to grow, resulting in technical hurdles that challenge traditional methods for efficiently and effectively using such large pools of like data. Most solutions that deal with big data attempt to meet this challenge.
• Focus on enterprise value: How to quickly determine which data requires the most focus at any point in time. Thanks to our tightly connected global economy, news travels around the world more quickly than ever, which requires rapid rethinking of enterprise strategies and tactics. This requires the ability to quickly change which data is focused upon. Traditional information models that are constructed to synthesize business knowledge from the deluge of available data impede the nimbleness required to meet the needs of the modern-day enterprise.
• Less introspective view: How to make the whole information fabric less introspective. Using information derived from inside the organization can predict future trajectories only if the status quo is assumed. However, when there is a high degree of turbulence, knowledge obtained from internally generated information is woefully inadequate in the short term; insights are obtainable only after sufficient time has passed and several cycles have been interpreted. The resulting organizational missteps are covered regularly in the news media. What is required is an ability to wield information as an early-warning system for understanding changes in enterprise trajectories. Such data sources are external to the enterprise until enough time has passed for a history of data points to be inferred from internal data.

Figure 1: Data Challenges of the Petabyte Age

Sheer Depth of Similar Data

Specialized tools have emerged to address this issue of enormous pools of similar data. These tools originate from the realization that the time-honored structured query language tools, as well as other tools built around database technologies, are ill-equipped to efficiently deal with billions, if not trillions, of rows of data. Spawned from Google’s attempt to deal with the data accumulated from all the interactions that occur with the Google software suite, a whole new framework built around the MapReduce technology has been born, and an emerging suite of tools has begun to appear on this new stack of technologies.

There will no doubt be a refinement of the techniques that are maturing to deal with this concept of big data. The only thing we can be sure of is that the big-data business issues addressed by MapReduce and the related suite of technologies are not going away.

Just as the technologies available for launching the initial collection of Web sites were immature, so are the tools for developing solutions for big data. Much has been said about how technology has taken a major step back from what is commonly available for business intelligence and data warehousing solutions, but this is much less a statement about the problem of big data than it is about the immaturity of the technologies available for solving the big-data problem set.

Figure 2: Converting Big Data Into Value
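For readers unfamiliar with the MapReduce pattern mentioned under “Sheer Depth of Similar Data,” the following is a minimal single-machine sketch in Python. It is only an illustration of the three phases (map, shuffle, reduce); the sample log records, page names and function names are hypothetical, and a real deployment would distribute each phase across many machines using a framework built for that purpose.

```python
from collections import defaultdict

# Hypothetical Web-log records: (session_id, page) pairs.
LOG = [
    ("s1", "/home"), ("s2", "/home"), ("s1", "/pricing"),
    ("s3", "/home"), ("s2", "/checkout"), ("s1", "/checkout"),
]


def map_phase(records):
    """Map: emit a (page, 1) pair for every page view."""
    for _session, page in records:
        yield page, 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


page_views = reduce_phase(shuffle(map_phase(LOG)))
print(page_views)
```

Because the map and reduce steps only ever look at one record or one key at a time, the same logic can in principle be spread over many nodes, which is what makes the pattern attractive for billion-row pools of like data.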
Figure 3: Managing Opportunity and Risk

Interestingly, the problem of large pools of data is the primary issue, which today is addressed by introducing separate technologies to tackle each of the challenges outlined above independently. Companies that thrive in the Petabyte Age will be able to consolidate the technologies so their business constituency is faced with a single interface that addresses their full complement of informational needs.

Focus on Influencers of Enterprise Value

The intent of business intelligence is to take actionable, relevant, trustworthy and timely data; put it through a model that aligns with key business challenges (customers, geographies, channels, investors, markets, etc.) as the means to gain insight; and derive an action plan to extract, originate or capture organizational value (see Figure 2, previous page). Furthermore, captured value can be a one-time event (e.g., a temporary supply shortfall of a competitor) or a permanent value stream. While captured transactions are acceptable, captured value streams are more desirable.

Data is converted into insight by using acquired and created knowledge (obtained from both internal and external sources), learned inferences, heard inferences and innovations, some of which will serve as disruptions to others in the participating marketplaces.

It is the business model itself that must provide the focus into what is pertinent to the business at a particular point in time and that serves as the point of contention. The enterprise business models used as the basis for synthesizing information as the means of gaining insight are devised to map all data rather than “tiering” data into focus areas. Examples of focus areas include the following:

• Directly relates to creating or protecting extracted, originated or captured enterprise value.
• Does not directly contribute to value but is mandatory for business operations.
• May not be mandatory for business operations but is mandatory for regulatory purposes.
• May not be mandatory for the above categories but is mandatory for archiving.
• Was once important but is now relegated to historical trivia.

To create or protect extracted, originated or captured enterprise value, the information deemed worthy of focus must be sufficiently broad in scope so that both the opportunities and risks are exposed in all dimensions of the business model.
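As a rough illustration of “tiering” data into the focus areas listed above, the sketch below models the five areas as an ordered enumeration and assigns each information component to the first tier it qualifies for. The component names, flag names and the first-match rule are assumptions made for illustration only; the whitepaper does not prescribe a data structure or an assignment algorithm.

```python
from dataclasses import dataclass
from enum import IntEnum


class FocusTier(IntEnum):
    """The five focus areas, ordered from most to least deserving of focus."""
    ENTERPRISE_VALUE = 1  # creates or protects enterprise value
    OPERATIONS = 2        # mandatory for business operations
    REGULATORY = 3        # mandatory for regulatory purposes
    ARCHIVAL = 4          # mandatory for archiving
    HISTORICAL = 5        # once important, now historical trivia


@dataclass(frozen=True)
class InfoComponent:
    """Hypothetical description of one information component."""
    name: str
    creates_value: bool = False
    needed_for_operations: bool = False
    needed_for_regulation: bool = False
    needed_for_archive: bool = False


def assign_tier(component: InfoComponent) -> FocusTier:
    """Return the highest-priority tier the component qualifies for."""
    if component.creates_value:
        return FocusTier.ENTERPRISE_VALUE
    if component.needed_for_operations:
        return FocusTier.OPERATIONS
    if component.needed_for_regulation:
        return FocusTier.REGULATORY
    if component.needed_for_archive:
        return FocusTier.ARCHIVAL
    return FocusTier.HISTORICAL


# Illustrative inventory; all names are made up.
inventory = [
    InfoComponent("old_campaign_notes"),
    InfoComponent("compliance_filings", needed_for_regulation=True),
    InfoComponent("customer_orders", creates_value=True),
    InfoComponent("closed_contracts", needed_for_archive=True),
    InfoComponent("general_ledger", needed_for_operations=True),
]

ranked = sorted(inventory, key=assign_tier)
for component in ranked:
    print(f"{assign_tier(component).name:<17} {component.name}")
```

Ordering the tiers numerically makes it easy to sort an inventory by focus, and a component that qualifies for several areas lands in the most valuable one, which matches the “may not be mandatory for the above categories” phrasing of the list.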
In the business model illustrated in Figure 3 (see previous page), for example, operational risks, disruptive events, enterprise strategies and sustainable value sources will be managed by managing:

• People, as well as the services they provide.
• Processes and the metrics used to manage the processes.
• Innovations, specifically the products released into the marketplace.
• Capabilities aligned with technologies.

Information will be managed in this model along the following dimensions (i.e., the enablers):

• Customers, or the customers, prospects and visitors who can be tapped for enterprise value.
• Media, both traditional and emerging (social media like Facebook and Google+) that can influence enterprise value.
• Markets participated in for originating, extracting or capturing enterprise value.
• Financing, or the source of funds used for investments and cash flow used to originate, extract or capture enterprise value.
• Geographies and sovereign nations from which enterprise value will be originated, extracted or captured.
• Rivals in markets and geographies that compete for customers, market coverage and funding sources.

A Less Introspective View of Information

Only expected trends can be tracked using internal information. Disruptions will eventually appear in internal data, but their trajectory will only be evident after two or more cycles of information make their way into the internal data stream. This means:

• It will take a minimum of three days for new sales trajectories to make themselves known to a daily sales system. By that time, any progress that competitors have made in capturing value from your largest customers is removed for immediate transactions (i.e., captured transactional value) and, in many cases, is gone forever (i.e., captured value streams).
• In cases where data is reported less frequently, such as financial results, it will take weeks or months for such situations to be exposed, at which point it is much more difficult to remediate.

Disruptions make themselves known through external data much more readily than internal data. However, there are also problems with external data, including the fact that this data is much more loosely defined and that the information sources are more numerous and change more frequently in scope and content.

An example of an external data source that can be captured is Twitter. All Twitter content is capable of being captured, and a competitor’s promotion that is broadcast on Twitter can be immediately exposed. In order to listen for a Twitter message, however, a handful of literally billions of 140-character messages will be the potential source of this information. And Twitter is only one of many information sources that can expose such calls to action.

Early warning systems are not a new phenomenon. Like those deployed for catastrophic weather and natural disasters, early warning systems for businesses should be launched to warn of disruptions to the orderly management of the strategies and tactics of enterprises that ultimately extract, originate or capture value.

Integrating this information into a meaningful early warning system requires a new way of examining information. In the Petabyte Age of ubiquitous and proliferating data, the integration of information must be done immediately, or else the value of such integration is worth significantly less than when it was initially exposed.

Several years ago, computer scientists discovered that code was more nimble if it was decoupled from its underlying model, which gave rise to the SOA and REST architectures; similarly, a process can decouple the modeling of data from the ability to publish alerts, dashboards and access to consumers. This post-discovery means of utilizing data has been written about by Forrester and others and is the basis of many advanced tools in the marketplace today. The reason for such an approach is to discover anomalies prior to the normal publication cycle.

A number of technical solutions are emerging to deal with publishing data at a moment’s notice. Most of these solutions are covered under the topic of “virtualized data warehouses,” which will be covered in a separate whitepaper. Momentum for virtualized warehouse technology has picked up, as all vendors in the space have positioned themselves to offer “perfect solutions.”
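To make the early-warning idea from this section concrete, here is a minimal sketch that scans a stream of short external messages for mentions of a rival alongside promotion-related terms. Everything in it is an assumption for illustration: the rival name “RivalCo,” the watch terms, the sample messages and the simple substring matching. It does not call any real social media API, and a production listener would need far more robust text matching.

```python
from typing import Iterable, Iterator

# Hypothetical watch list: rival name -> terms that suggest a promotion.
WATCH_TERMS = {"rivalco": ("sale", "promo", "discount")}


def early_warnings(messages: Iterable[str]) -> Iterator[str]:
    """Yield each message that names a watched rival together with a promotion term."""
    for text in messages:
        lowered = text.lower()
        for rival, terms in WATCH_TERMS.items():
            if rival in lowered and any(term in lowered for term in terms):
                yield text
                break  # one alert per message is enough


# Illustrative message stream; in practice this would be an unbounded feed.
stream = [
    "Lovely weather in Teaneck today",
    "RivalCo flash SALE: 30% off all plans this weekend!",
    "Reading about RivalCo's new office",
    "Got a promo code from rivalco, switching providers",
]

alerts = list(early_warnings(stream))
for alert in alerts:
    print("ALERT:", alert)
```

The filter is written as a generator so that alerts can be raised as each message arrives rather than after a batch has been loaded, in keeping with the point that integration must happen immediately to retain its value. Plain substring matching will produce false positives (for instance, “sale” inside “wholesale”), so the matching rule here should be read as a placeholder.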
Figure 4: Stages of Information Management

The EIS/DSS Age (circa 1975-1997). Issues that were tackled:
• Elimination of paper
• Improvements in monitored data
• Information responsiveness
• Gigabytes of information
• Delivery models (PCs, Windows)

The BI/DW Age (circa 1993-2013). Issues that were tackled:
• Single version of the truth
• Terabytes of information
• Performance constraints
• Governance models
• Specialized tools
• Support costs
• Delivery models (Web, etc.)

The NextGen Age (circa 2010-?). Issues that must be tackled:
• Just-in-time information
• Always-on prioritized information
• Less introspective information
• Petabytes of information
• Source integration timing
• Governance and valuation models
• Component-based delivery models

A Framework for the Petabyte Age

Roughly every 15 to 20 years, the disciplines of delivering enterprise information for creating business-critical insight and improving the overall decision-making process undergo radical change (see Figure 4). We are in the midst of such a major shift. These cycles tend to share the following characteristics:

• They are ushered in with the availability of tools that are greatly reduced in price or are open source and displace much of the functionality of the products being replaced (e.g., in the late ’90s, products like Pilot and Comshare were displaced by market upstarts like Javelin and Excel).
• There are referenceable cases of enterprises that have successfully utilized next-generation solutions for translating raw data into insight.

Challenges that must be tackled as part of this next-generation age are:

• The ability to deliver prioritized, just-in-time information through an always-on interface (i.e., mobile).
• The ability to combine information generated inside the organization (introspective) with information made available elsewhere. It is important to note that information made available elsewhere rarely comes in neat bundles of tables that are easily integrated using readily available scripts.
• The ability to integrate new sources of information at a moment’s notice. This requirement challenges the basic tenets of the enterprise information model and ETL processes that have matured over the past 20 years.
• The ability to embrace changes (i.e., additions and deletions to the information fabric used to steer, organize and ultimately produce enterprise value by proving that the technology arm can responsively deliver trustworthy information). Disciplines such as process governance, data governance, information centers of excellence that manage a catalog of components and information lifecycle management³ are enjoying renewed popularity because they are cornerstones of this renewed responsiveness to the knowledge worker community.

What is important in the new disciplines associated with insight generation is that they are centered on focusing on information, whether or not it is traditional, internally sourced information. Many of the information sources will require techniques associated with big data (billion-plus row tables), but all of it will require assistance in focusing on the information dilemma for the foreseeable future (i.e., finding which information is critical for a specific business need is much akin to finding the proverbial needle in a haystack).

Much work has been done to create an information lifecycle for managing performance of analytical and operational systems. However, partitioning strategies have rarely been applied to partition information into the following schemes:

• Information that is directly attributable to generating or protecting revenue for an enterprise.
• Information that may not be strategically or tactically significant to generating revenue but is mandatory for business operations. Much financial data (not financing, which is often a cash position) falls into this category.
• Information that may not fall into the above two categories but is required for regulatory purposes.
• Information required for archival purposes.
• Information that may have once fallen into the above categories but has been relegated to historical trivia.

The process of partitioning information into areas deserving focus (called “focus partitioning”⁴) is completed through the following steps:

• Step 1: Take inventory of information used in the organization. Information will come from one of five categories:
  > Downloaded and enriched through processes managed entirely from desktop systems.
  > Available in official operational systems.
  > Available from unofficial operational systems (normally Microsoft Access and Excel).
  > Introspective but document-centric information (contracts, e-mail, etc.).
  > Information that is sourced outside the organization (social media, blogs, newswires, etc.).
• Step 2: Create an information component inventory, assigning each information component to a segment of the business information model and determining its priority in generating value to the organization. Also, identify information that is required but not available as part of this exercise.
• Step 3: Assign the information inventory to the partitions of the business information model (i.e., directly contributing to enterprise value, required for operations, etc.).
• Step 4: Align potential initiatives with the partitioned information inventory and determine the impact to improving enterprise value by tackling these initiatives, thereby creating a roadmap to this prioritized information fabric critical to capturing, extracting or originating enterprise value.

Figure 5: Template for Capturing, Aligning Information Components. When capturing the focused information that is used in a big data initiative, it is important to align the data back to the business information model. The template is a vehicle that can be used to capture the focused information exposed through a big data initiative and ensure alignment and proper placement in the business information model.

It is important to note that as much as we think that the business stakeholders don’t have the data they need to perform their job, in reality there is always a means to obtain and utilize information required for determining and executing on the strategic, tactical and operational needs of the enterprise. In areas where the sanctioned technical vehicles were unable to provide this information, the enterprise stewards found means to cobble together the information they required.

It is of paramount importance that the identity and use of this information be ascertained when charting a course for big data. In reality, lots of related islands of little data are often sewn together in a big data initiative. Tackling the obvious big data initiative may not deliver the value anticipated if the little islands of information are ingrained in enterprise processes.

The determination of whether tackling these islands of information is included in the enterprise strategy through an enterprise information management program, an enterprise data governance program or some other initiative is less important than engaging the owners of these islands of information.

Figure 6: Alignment of Data Inventory with Business Value. Equally important to aligning information to the business information model is the identification of how the information will result in positive incremental value to the organization. It is important to continually put the identified data to the test of whether it is actionable and, if properly used, is associated with organizational value. This template facilitates testing whether information prioritized for the big data initiative is both associated with the business information model and results in value along the dimensions of the business information model.

Footnotes

¹ Big data includes data sets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.
² Petabyte Age is a euphemism for the massive volumes of data that many organizations are dealing with that can be measured in petabytes, a unit of information equal to one quadrillion bytes.
³ Information lifecycle management is a process used to improve the usefulness of data by moving lesser used data into segments. It is most commonly concerned with moving data from always needed partitions to rarely needed partitions and, finally, into archives.
⁴ Focus partitioning is a term created by the author that describes applying generally accepted techniques to gain performance by segmenting data into partitions (vertical partitioning) to segmenting groups of data by the likelihood that it will participate in achieving organizational value.
References

Mark Albala, “Enhancing Agility: Enabling Information Intelligence for a Turbulent World,” 2010.
Mark Albala, “Post Discovery Intelligent Applications: The Next Big Thing,” 2009.
Mark Albala, “Information and Execution Agility: The New Imperative,” 2009.
Boris Evelson, “Information Post Discovery – Latest BI Trend,” blog post, Forrester Research, May 18, 2009.

About the Author

Mark Albala is Practice Director of Cognizant’s North American Enterprise Information Management Consulting and Solution Architecture Practice. This practice provides solution architecture, information governance, information strategy and program governance services to companies across industries and supports Cognizant’s business intelligence and data warehouse delivery capabilities. A graduate of Syracuse University, Mark has held senior thought leadership, advanced technical and trusted advisory roles for organizations focused on the disciplines of information management for over 20 years.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 130,000 employees as of September 30, 2011, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060

© Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.