Big Data's Impact on the Data Supply Chain


Information services companies maintain vast data supply chains (DSCs) that form the basis of their revenue streams, yet the onslaught of "big data" threatens to capsize even well-constructed systems. Optimizing and consolidating DSCs (following M&As, for example) requires a next-generation approach that integrates data sourcing and collection, data management and data delivery.




Cognizant 20-20 Insights

The proliferation of social, mobile and traditional Web computing poses numerous business opportunities and technological challenges for information services companies. By revamping their enterprise-level data supply chains and IT infrastructures, as well as embracing partnerships that deliver analytical acumen, information services companies with the right organizational mindset and service delivery model can survive, if not thrive, amid the data onslaught.

Executive Summary

The information industry is in the middle of a data revolution. An unimaginably vast amount of data is growing exponentially, providing challenges and opportunities for data-centric companies to unlock new sources of economic value, take better credit and financial risks, spot business trends sooner, provide fresh insights into industry developments and create new products. At the same time, data abundance creates daunting challenges across numerous dimensions when it comes to the ways and means of handling and analyzing data.

This white paper examines the challenges that confront the information services industry and offers guidance on ways the industry can rethink its operating models and embrace new tools and techniques to build a next-generation data supply chain (DSC). Our approach is premised on a platform for turbocharging business performance that converts massive volumes of data spawned by social, mobile and traditional Web-based computing into meaningful and engaging insights. To address the information explosion, coupled with dynamically changing data consumption patterns, this next-generation platform will also induce information providers to roll out innovative data products at a faster pace, beating the competition from both established and niche information players and meeting the demand for value-added, real-time data and quicker solutions for end consumers.

Information Services Trends

A deep dive into today's information industry reveals the following patterns.

Continued Data Deluge

Approximately five exabytes (EB)1 of data online in 2002 rose to 750 EB in 2009, and by 2021 the total is projected to cross the 35 zettabyte (ZB)2 level, as seen in Figure 1. Statistics3 also indicate that 90% of the data in the world was created in the last two years, a sum greater than the amount of data generated in the preceding 40 years. The world's leading commercial information providers deal with more than 200 million business records, refreshing them more than 1.5 million times a day to provide accurate information to a host of businesses and consumers. They source data from organizations in over 250 countries, work in 100 languages and cover around 200 currencies.

cognizant 20-20 insights | may 2012
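As a sanity check on these figures, the implied compound annual growth rates can be computed directly. This is a back-of-envelope sketch; the inputs are the estimates cited above, and it assumes 1 ZB = 1,000 EB:

```python
# Back-of-envelope check of the data-growth figures cited above.
# Inputs are the paper's cited estimates, not independent measurements.

def cagr(start, end, years):
    """Compound annual growth rate between two data volumes."""
    return (end / start) ** (1 / years) - 1

eb_2002, eb_2009 = 5, 750      # exabytes online
eb_2021 = 35 * 1000            # 35 ZB projected, expressed in EB

print(f"2002-2009: {cagr(eb_2002, eb_2009, 7):.0%} per year")
print(f"2009-2021: {cagr(eb_2009, eb_2021, 12):.0%} per year")
```

The first period implies data volume roughly doubling every year; even the projected 2009 to 2021 stretch implies well over 30% annual growth.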
Data Volumes Are Growing

Figure 1: IDC predicts that between 2009 and 2020, digital data will grow 44 times to 35 ZB, adding nearly 8 ZB of data in 2020 alone.

Their databases are updated every four to five seconds. With new data about companies created instantaneously, and the parameters of existing companies worldwide changing by the minute, the challenge will only intensify.

The world's leading providers of science and health information, for example, address the needs of over 30 million scientists, students and health and information professionals worldwide. They churn out 2,000 journals and 20,000 books and major reference works each year. Users on Twitter send more than 250 million tweets per day. Almost 50 hours of video are uploaded to YouTube per minute by hundreds of millions of users worldwide. Facebook houses more than 90 billion photos, with over 200 million photos uploaded per day.4

While knowledge is hidden in these exabytes of data, data formats and sources are proliferating. The value of data extends to analytics about the data, metadata and taxonomy constructs. With new electronic devices, technology and people churning out massive amounts of content by the fractional second, data is exploding not just in volume but also in diversity, structure and degree of authority. Figure 2 provides an indicative estimate of the data bombarding the Internet in one 60-second interval, with astounding growth forecasted.

Figure 2: The Digital Data Deluge: One Minute's Worth of Data Flow. In a single minute online: 20 new victims of identity theft, 47,000 app downloads, 61,141 hours of music, $83,000 in sales, 20 million photo views, 3,000 photo uploads, 204 million emails sent, 320+ new Twitter accounts, 100,000 new tweets, 13,000 new mobile users, 100+ new LinkedIn accounts, 135 botnet infections, 6 new Wikipedia articles published, 277,000 logins and 6 million views on Facebook, 2+ million search queries, 30 hours of video uploaded and 1.3 million video views. And future growth is staggering.

Ever-changing data consumption patterns and the associated technology landscape raise data security and privacy issues as data multiplies and is shared ever more freely. Convergence challenges related to unstructured and structured data add to the worries. Google, Facebook, LinkedIn and Twitter are eminent threats to established information players, as are nontraditional niche information providers such as Bloomberg Law, DeepDyve, OregonLaws and OpenRegs that provide precise information targeted at specific customer groups.

The big data phenomenon threatens to break the existing data supply chain (DSC) of many information providers, particularly those whose chains are neither flexible nor scalable and include too many error-prone, manual touch points. For instance, latency in processing all data updates in the existing DSC of one well-known provider currently ranges from 15 to 20 days, versus a target of 24 hours or less. That directly translates to revenue loss, customer dissatisfaction and competitive disadvantage. These risks are real: Google reported a 20% revenue loss when the time to display search results increased by as little as 500 milliseconds, and Amazon reported a 1% sales decrease for an additional delay of as little as 100 milliseconds.

"Free" Information Business Models, with Data as the New Fuel

Companies such as Duedil, Cortera and Jigsaw (recently acquired by Salesforce.com and renamed Data.com) are revolutionizing the "free" business model. The Jigsaw model, for instance, uses crowdsourcing to acquire and deliver a marketplace where users exchange business contact information worldwide. For sharing information on non-active prospects, which these prospects would gladly allow, users get new leads for free. If a user finds incorrect data, he earns points by updating the record in Jigsaw. Providing incentives for users to scrub the huge database enables Jigsaw to more easily maintain data integrity.

Essentially, users actively source and update contacts in the Jigsaw database in return for free access to the company's services. Thus, from a data quality, data entry scalability and data maintenance perspective (issues that typically plague systems such as those in the CRM space), Jigsaw is a strong tool that can be used to append incomplete records, augment leads to build highly targeted lists, plan territories and gain insights on people and companies, with the most complete B2B data in one place. Jigsaw has already amassed more than 30 million contacts and is growing. It sells this information to customers with large CRM databases, who can compare their databases to the Jigsaw database, identifying and cleaning up any redundant records. Jigsaw then makes money by offering products geared toward companies interested in increasing, updating and cleaning their contact directories. These free models intensify competition for traditional data aggregators.

In addition to Google, which operates on an ad-supported (free) model, others like WebMD (a health information provider) rely on advertising to generate a major portion of their revenue streams, enabling them to provide free services. They then make additional money from subscriptions and premium content, as well as from listings by individuals who initially come to avail themselves of free services and end up paying for a listing to heighten awareness among existing and new customers. Such models allow newer entrants to underprice the competition or to offer portions of their information portfolio for free. As such, this approach threatens traditional information providers, forcing them to step up.

How can information services companies compete and remain cost leaders? Many of their existing DSC systems provide neither enough insight nor a robust understanding of their customers, nor do they reveal how their end customers are interacting with their data. Their existing DSCs are not built for handling big data, and the corresponding big data analytics cannot be effectively applied and leveraged to shed light on what the data means or to provide a pathway to reduce IT infrastructure costs and attain greener operations. Moreover, many information players and their existing DSC systems are not really leveraging social media and its related opportunities to increase customer engagement, improve content quality and provide incremental value to the ecosystem.

We are in an era where we trade our data for free goods and services. Never before have consumers wielded this much power over marketers. All the data activity on the Internet, through any device, creates click-trails, leaves digital breadcrumbs, produces data exhaust and creates metadata. There is enough economic value in this data for an entire industry to be formed around it. We will see a huge influx of companies dealing with the various aspects of data drilling, shipping, refining, drug discovery and so on.
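The Google and Amazon latency figures cited earlier can be turned into a rough revenue estimate. The sketch below applies the cited Amazon sensitivity (1% of sales per extra 100 ms) linearly, which is an assumption; the sources report single data points, and the revenue figure used here is invented for illustration:

```python
# Illustrative only: extrapolates the cited latency sensitivity linearly,
# which is an assumption, and uses a hypothetical $1B annual revenue.

def revenue_at_risk(annual_revenue, loss_fraction, per_delay_ms, extra_delay_ms):
    """Estimated annual revenue lost if each `per_delay_ms` of added
    latency costs `loss_fraction` of revenue."""
    return annual_revenue * loss_fraction * (extra_delay_ms / per_delay_ms)

# Amazon-style sensitivity: 1% of sales per extra 100 ms, applied to
# a hypothetical provider adding 300 ms of latency.
loss = revenue_at_risk(1_000_000_000, 0.01, 100, 300)
print(f"${loss:,.0f} at risk per year")  # $30,000,000 at risk per year
```

Even under these toy assumptions, shaving latency out of the DSC is worth tens of millions of dollars a year to a billion-dollar provider.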
Hence, based on this precedent, large players like Exxon Mobil, Pfizer or Merck could create large stand-alone data-slicing organizations. This model is proving more attractive as data processing scale, distribution and brand power become ever more critical.

Mergers and Acquisitions, Industry Consolidation and Footprint Expansion

Many information providers have expanded into new markets through M&As and local partnerships. They seek to integrate all acquired companies into a single enterprise-level DSC. Having started in legal publishing, Thomson Reuters now has a footprint across various information domains, such as healthcare, tax and accounting, intellectual property, financial, media, risk and compliance and even science. A leading financial information provider recently moved its data collection and storage operations in Italy to a partner. It has also bought tools for data interoperability between enterprise-level services.

Some players are also consolidating data programs to find synergies in their business line operations. They also want newer data sources to enhance data quality and variety. Figure 3 depicts how two large players, Thomson Reuters (TR) and Information Handling Services (IHS), have grown through acquisition during the last decade.

Figure 3: Growth of Thomson Reuters and Information Handling Services Through Acquisitions, 2000 to 2009: a timeline of each company's major acquisitions and divestitures across the legal, financial, healthcare, energy, geopolitical risk, sustainability and supply chain information domains. (Source: Cognizant analysis of Thomson Reuters and IHS data published on each company's Web site.)

Acquisitions cause significant problems for companies' DSCs. There is almost certain to be data quality loss from disparate systems, plus operational inefficiencies caused by the lack of a unified view of data. DSC integration issues cause increased latency, slower time to value and customer access problems. Many existing DSCs were built as stand-alone systems with closed architectures and have undergone many customizations. This makes integration difficult, raising costs and slowing payback time. It also increases maintenance and ongoing enhancement costs. Integrating newer functionality developed using advanced integrated development environments (IDEs), debugging and automation tools makes the development lifecycle extremely complex, and transferring taxonomies becomes complicated. For these archaic systems, the lack of productivity tools and limited hardware and software options results in greater time to market to meet dynamic business requirements or regulatory compliance.

As the industry grapples with the information explosion, the question on every CIO's mind is how to handle, manage and analyze this data avalanche better.
From the aforementioned points, what clearly emerges is a definite need for information providers to reexamine their existing DSCs for potential solutions. They should leverage their strategic and technology partners' capabilities in this discovery and eventual implementation process. Starting points include:

  • Are providers ready to take advantage of the above tipping points to emerge as lean and agile players and increase shareholder value?
  • How can providers help users find relevant and compact information in a flood of big data?

To address such issues, this paper explores one key starting point, what we've termed the "next-generation data supply chain." It conceptualizes at a high level the current and emerging elements embedded in a DSC that can help enable new solutions and explore opportunities, partnerships and alliances to enhance the value chain. This paper uses "data" and "information" interchangeably, as data forms the foundation for any insightful information; increasingly, the two are becoming difficult to distinguish.

The Next-Generation Data Supply Chain

By reengineering the existing DSC, from data sourcing through data delivery, providers can transform their ability to ingest, process and distribute content under a wide variety of new business models. The key objective is to create a next-generation DSC that:

  • Optimizes operational efficiencies.
  • Reduces data latency.
  • Is flexible enough to accommodate new data sources.
  • Is scalable enough to handle future data volumes.
  • Improves data quality while dynamically meeting consumer demands.
  • Explores newer monetization models, with data as an asset.
  • Provides faster time to market and the potential for greater revenue recognition.

Figure 4 represents paradigms that could create a truly modern DSC.

Figure 4: DSC at a Glance. Common catalysts: platform and infrastructure (cloud solutions, open source), data governance and master data management, a data supply chain monitoring dashboard and a data supply chain maturity assessment. DSC optimization opportunities: data sourcing and collection (business process as a service, data collection by crowdsourcing, digitization, multi-platform integrated workflow tools); data quality and cleansing (data validation and cleansing, data quality as a service, open source data quality tools, automation); data enrichment (search solutions, data mining, extreme analytics, data-driven simulations, performance enhancement, advanced analytics); data management (on-demand cloud storage, data warehouse optimization, big data appliances and utilities, NoSQL and cloud computing, data transformation, tagging and quick reference codes); and data delivery (data as a service, multi-channel and device-friendly delivery, social media integration, data visualization and enhanced user experience).

The following subsections present salient thoughts on some of the prime components of such an upgraded DSC, which could address current and future data issues.

Data Sourcing and Collection

This category includes business process outsourcing (BPO), business process as a service (BPaaS)5 and crowdsourcing.
Typically, organizations have trodden the traditional BPO path by focusing only on daily operational tasks to deliver the outcome. But is there year-round work in data operations to justify a dedicated BPO team? Until the need for a dedicated BPO team is felt, it is tempting to use a temporary pay-per-use team, and "crowdsourcing," as mentioned earlier, can be considered a replacement for traditional outsourcing. The crowd refers to users who are volunteers with a common social, financial or even intellectual motivation to accomplish a task. They share solutions for mutual benefit with the crowdsourcer (an organization or an individual), who is the problem owner.

Such distributed problem-solving addresses scalability through worldwide access to people and data at almost no cost, and it generates innovative results by simplifying an otherwise complex task that would be too difficult to handle internally. There are numerous examples of projects using the crowd to successfully collect and analyze data, some of which are noted later in this paper. Operating cost assessments for crowdsourced data, measured in US$ per full-time equivalent (FTE) per hour, suggest savings of about 60% (in the U.S.) to around 65% to 70% (in India) over traditional outsourcing.

For example, how do you digitize 20 years of newspaper archives in less than three months? The New York Times did exactly that by using the power of collective human minds. The Times showed archived words (as scanned images) from its newspaper archives to people filling out online forms across different Web sites, asking them to spell out the words and thereby help digitize the text. The subscribing Web sites (generally unrelated to the digitization process) present these images, which even optical character recognition (OCR) procedures cannot interpret properly, for humans to decipher and transform into text as part of their normal validation procedures. Text is useful because scanned newspaper images are difficult to store on small devices, expensive to download and cannot be searched. The images also protect against suspicious interventions by automated software programs, or "bots," ensuring that only humans validate the words. The sites then return the results to a software service that captures and aggregates the digitized text. With the system reported to have displayed over 200 million words, the scalability of the crowd provides quick results (in each case the human effort is just a few seconds) and represents digitization and validation at its best.

But how dependable is crowdsourcing? The crowd may not work in all situations; intermittent work is one case where crowdsourcing might not fit. Here, BPaaS can be a more practical middle-road approach that integrates business, tools and information technology to optimize operations and drive efficiency and flexibility. The BPaaS model retains a lightweight in-house data reporting team that interacts with an outsourced team of specialists who handle data production and validation with tools residing in the cloud. BPaaS pushes standardization across many data sources and embodies a flexible pay-per-use pricing model.

The BPaaS proposition comes with infrastructure, platform and software "as a service" models without sacrificing the traditional benefits of outsourcing, such as process expertise and labor arbitrage. The cloud combination provides hyper-scalability and the ability to deliver solutions affordably. Financial benefits include reduced operating costs of up to 30%, cutting capital expenditures on up-front investments. Although BPaaS is not a panacea, the benefits of a variable pricing model, combined with technology and business process excellence that reduce or eliminate large capital demands, certainly go beyond cost savings. BPaaS needs to be embraced as a next-generation delivery model.

Data digitization entails the conversion of physical or manual records such as text, images, video and audio to digital form. To address the proliferation of nonstandard data formats from multiple data sources globally, and to preserve and archive data in an orderly manner with simple access to information and its dissemination, data must be standardized; digitization does that. With electronic devices, portables, the Internet and rapidly evolving digital applications growing in importance, it becomes imperative to adopt data digitization techniques and procedures to create a truly global marketplace. Embracing this will not only help providers contend with the convergence of structured and unstructured data, but will also enable the producer to reach the consumer directly, cutting out inefficient layers. But how you do it depends on how your DSC is organized.
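The Times scheme described above turns on one step: aggregating many independent human readings of the same scanned word and accepting the majority answer. A minimal sketch of that consensus step follows; the function, thresholds and normalization are hypothetical, not the actual system:

```python
from collections import Counter

def consensus(transcriptions, min_votes=3, min_agreement=0.75):
    """Accept a crowd-transcribed word only when enough volunteers agree.

    transcriptions: raw strings typed by different users for one scanned word.
    Returns the agreed word, or None if the crowd is too small or too split
    (in which case the image would be re-queued for more readings).
    """
    votes = Counter(t.strip().lower() for t in transcriptions)
    total = sum(votes.values())
    if total < min_votes:
        return None                      # not enough readings yet
    word, count = votes.most_common(1)[0]
    if count / total < min_agreement:
        return None                      # crowd disagrees; re-queue
    return word

print(consensus(["Tribune", "tribune", "tribune ", "tr1bune"]))  # tribune
print(consensus(["cat", "car"]))                                 # None
```

The same agreement logic is why outliers, whether typos or deliberate "bot" noise, wash out as the number of independent readings grows.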
Crowdsourced data also makes a lot of sense, particularly as business people and consumers increasingly rely on mobile devices and social media platforms to share data more freely across geographies. An excellent example is GPS navigation, whose foundation is the map database. Rather than relying solely on a map provider's database, which may not be up to date or accurate, users report map errors and new map features via crowdsourcing. Thus users benefit immensely from each other's reports at no cost.

Crowdsourcing is a brilliant way to collect massive amounts of data, as it brings down the cost of setting up a data collection unit. However, the data provided by the user community has to be made credible through verification by data quality tools. Although crowdsourcing might affect data quality, by looking at a large base of users the outliers in the data can be readily found and eliminated.

Data Quality and Cleansing

High-quality data is a prime differentiator and a valuable competitive asset that increases efficiency, enhances customer service and drives profitability. Poor data quality is estimated to cost a typical enterprise 8% to 12% of revenues. British Gas lost around £180 million when data quality problems caused its project to fail, resulting in degraded customer relationships and contract cancellations. ABN Amro was fined $80 million for not having effective data quality compliance. Severn Trent Water was fined £26 million by regulators for trying to cover up data quality issues created by its data migration project.6

Traditionally, companies have been shortsighted about data quality, lacking a full lifecycle view. They have implemented source-system quality controls that address only the point of origin, but that alone is not enough. Data quality initiatives have been one-off affairs at an IT level rather than collective efforts of both IT and the business side of the house. Failure to implement comprehensive and automated data cleansing processes that identify data quality issues on an ongoing basis results in organizations overspending on data quality and its related cleansing. These issues will only increase in number and complexity as more data sources must be integrated. A flexible data quality strategy is required to handle a broad range of generic and specific business rules and to adhere to a variety of data quality standards.

Data quality components need to be incorporated directly into the data integration architecture to address three fundamental data quality activities: profiling, cleansing and auditing. Profiling helps resolve data integrity issues and makes unaudited data or extracts acceptable as baseline datasets. Cleansing ensures that outlier and business rules are met. And auditing evaluates how well the data meets different quality standards.

To improve customer experiences, leading to higher loyalty and competitive differentiation, providers have to look beyond custom-built solutions and seek vendor assistance to maximize their results. Data quality as a service (DQaaS) should be an integral part of data quality, as it allows for a centralized approach. With a single update and entry point for all data controlled by data services, the quality of data automatically improves, as a single best version enters the DSC. Nor is DQaaS limited to the way data is delivered: the solution is also simple and cost conscious, as data access requires no knowledge of the complexities of the underlying data. Various pricing approaches make it flexible and popular to adopt, whether quantity or subscription based, "pay per API call" based or data type based. While there is a wide array of tools to choose from (AbInitio, Microsoft SSIS, IBM Ascential, Informatica, Uniserv, Oracle DW Builder, etc.), there are also open source data quality and data integration tools such as Google Refine. Given the size of Google and its open source reach, the company's products should be seriously considered.

Open source tools are powerful for working with messy datasets, including cleaning up inconsistencies and transforming data from one format into another. Hence a combination of open source and proprietary tools will deliver the benefits of both worlds.
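The profile, cleanse and audit activities described above can be sketched on a toy record set. The field names, rules and records here are invented for illustration; commercial suites such as those named above implement far richer rule engines:

```python
# Toy profile -> cleanse -> audit pass over business records.
# Field names, rules and data are illustrative, not from any named product.

records = [
    {"name": "Acme Corp ", "country": "US", "revenue": "1200"},
    {"name": "",           "country": "us", "revenue": "950"},
    {"name": "Duedil",     "country": "GB", "revenue": None},
]

def profile(rows):
    """Count missing values per field to baseline the dataset."""
    return {f: sum(1 for r in rows if not r[f]) for f in rows[0]}

def cleanse(rows):
    """Apply business rules: trim names, uppercase country codes,
    coerce revenue to an integer where present."""
    return [
        {"name": r["name"].strip(),
         "country": r["country"].upper(),
         "revenue": int(r["revenue"]) if r["revenue"] else None}
        for r in rows
    ]

def audit(rows):
    """Fraction of records passing all quality checks."""
    ok = sum(1 for r in rows
             if r["name"] and len(r["country"]) == 2 and r["revenue"] is not None)
    return ok / len(rows)

print(profile(records))                      # baseline missing-value counts
print(f"pass rate: {audit(cleanse(records)):.0%}")
```

Profiling the raw rows flags the gaps, cleansing normalizes what it can, and the audit score shows what still needs human or crowd attention before entering the DSC.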
• second, the enrichment components will be different than, say, handling 2,000 transactions per second as done by Visa.

While data mining and search performance optimization have been an integral part of data enrichment, approaches such as sentiment or predictive analytics and data-driven simulations enable more effective extraction and analysis of data, leading to better-informed decisions. That said, search is still the basis for any insightful data extraction. There is a need to evolve search by constantly integrating social data, both structured and unstructured, to produce true-to-life recommendations for smarter use. Hence, efforts should continue to fine-tune search algorithms with semantic expertise focused on users, providing relevant answers (not just results) in fractions of a second. Fine-tuning search not only increases targeted traffic and visibility but also provides a high return on investment. Search optimization efforts have shown conversion rates increasing over 30% for a leading American retailer in the first two weeks of use, and revenue increasing over $300 million for a leading e-commerce player, to cite two examples.

As a direct impact of the various information trends adding to the data deluge, the term "big data" has come into prominence. The term is often used when referring to petabytes, exabytes and yet greater quantities of data, and generally refers to the voluminous amount of structured and unstructured data that takes too much time, effort and money to handle. Since it involves extremely large data sets that will only grow in the future, big data and its analytics require a different treatment. Hence, there is also a need to augment platforms such as Hadoop⁷ to store Web-scale data and support complex Web analytics. Since Hadoop uses frameworks to distribute large data sets and processing loads amongst hundreds or even thousands of computers, it should be explored as an enterprise platform for extreme enterprise analytics, that is, extremely complex analytics on extremely large data volumes. Since one of the prime objectives of any supply chain is an excellent user experience, combining critical information and insight capabilities using advanced analytics is the way forward. Furthermore, to extract the best performance through hardware and software integration, tools such as Oracle Exalytics can be adopted to accelerate the speed at which analytics algorithms run. Such tools provide real-time visual analysis and enable new types of analytic applications. They enable quicker decision-making in the context of rapidly shifting business conditions through the introduction of interactive visualization capabilities, and applications can be scaled across the enterprise with faster, more accurate planning cycles.

With big data tools like Hadoop and extreme analytics, enrichment components can crunch data faster and deliver better results. In addition, analytics can be transformed if information services providers, with their global services partners, can build a focused team of data scientists who understand cloud computing and big data and who can analyze and visualize traffic trends. Such scientists combine the skills of technologist, statistician and narrator to extract the diamonds hidden within mountains of data. All this will enhance information services providers' ability to recommend the right data sets to the right audience in real time and hence divert more traffic, resulting in higher revenue-generating opportunities.

Data Management

Cloud computing presents a viable solution for providing the required scalability and enhancing business agility. Leading cloud storage providers such as Amazon Web Services, Google and Microsoft (with Azure) are steadily cutting prices for their cloud services. Instead of making millions of dollars in upfront infrastructure investments, providers can quickly convert Cap-Ex to Op-Ex and pay for data services as they go. Furthermore, the cost of data storage is declining significantly, and providers can tweak their data classification schema to optimize storage for even bigger savings. For example, by moving from a three-tier classification to a six-tier one,⁸ one large manufacturer cut its storage cost by almost $9 million for 180 TB of data.

Going by analysts' predictions on the impact of cloud on the storage business, there is $50 billion of enterprise storage gross margin in play with the transition to cloud providers. Data center optimization and consolidation cannot be neglected either. A Fortune 50 company optimized its 100,000-square-foot data center⁹ in Silicon Valley with the help of a solutions provider, resulting in $766,000 in annual energy savings, a $530,000 Silicon Valley power rebate and a three-month return on investment.
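The MapReduce paradigm that underpins the Hadoop platform discussed above (see footnote 7) can be illustrated at toy scale: a map step emits (key, value) pairs from each chunk of data, a shuffle groups the pairs by key, and a reduce step aggregates each group. The single-process Python sketch below is purely conceptual; Hadoop runs this same pattern in parallel across hundreds or thousands of machines, and the word-count job and all names here are illustrative only.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document chunk."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values emitted for the same key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return (key, sum(values))

# Two "chunks" standing in for data blocks spread across a cluster.
chunks = ["big data big analytics", "data supply chain data"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {'big': 2, 'data': 3, 'analytics': 1, 'supply': 1, 'chain': 1}
```

In an actual cluster the map and reduce tasks run on the nodes that hold the data blocks, which is what lets the pattern scale to Web-size data sets.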
Virtualization in all its forms, including server virtualization and storage virtualization, gets more computing power out of servers while delivering a more adaptable data center that enables big data to be crunched efficiently while requiring less electric power (and is thus greener). Providers should hence consider cloud and storage optimization as part of their DSC transformation.

Big Data Fundamentally Changes Data Management

Since information providers must inevitably face the big data revolution, they need to be prepared by building big data technology stacks. Apart from Hadoop, as discussed above, some of the necessary components in this architecture stack include Cassandra, a hybrid non-relational database that provides flexible schema, true scalability and multi-datacenter awareness, and other No-SQL databases. No-SQL databases offer a next-generation environment that is non-relational, distributed, open source and horizontally scalable. These capabilities need to be baked into the enhanced data supply value chain through a service-oriented architecture (SOA) that will enhance unstructured and structured data management and make providers future-ready by integrating widely disparate applications for a Web-based environment and using multiple implementation platforms. Data warehouse appliances and utilities will be a blessing in disguise, extending beyond the traditional data warehouse to provide robust business intelligence; they are easy, affordable, powerful and optimized for intensive analytics and performance with speedy time-to-value. Open source clustered file systems like the Hadoop Distributed File System (HDFS) and its alternatives, as well as the Google file system, also resolve some of the leading big data challenges. They perform better, provide cost-efficient scalability, and are hardware and operating system agnostic, making them flexible and easy to design and manage while providing industry-leading security in cloud deployments. LexisNexis High-Performance Computing Cluster (HPCC) and Appistry's CloudIQ Storage Hadoop editions are examples of data supply chains built on clustered file systems storage.

Automation is an integral part of this transformation, and some of the aforementioned tools use a combination of data and artificial intelligence to cut the repetitive tasks of managing various aspects of the value chain. This frees up personnel to focus on core strategic issues rather than on tasks that can be easily automated. Classifying databases for topicality is one of the ways to eliminate data redundancy and enhance search efficiencies; with just-in-time data updates in the production environment, overall latency is reduced. Quick response (QR) codes are another optimizing measure, digitally packing more data than erstwhile barcodes and enabling faster readability, especially through mobile devices. Modern search engines recognize these codes to determine the freshness of Web site content. To stay on top of search listings, providers need to switch to QR codes to increase revenue conversion rates, as they provide quick and convenient user access to their Web sites and hence capture more of the user's attention, particularly when used in advertisements. All these have opened the door for more innovative and revenue-generating improvements and savings.

Data Delivery

How data is accessed, explored and visualized has been disrupted by the continued expansion and availability of mobile devices. Any information services company that has not moved away from largely static and visually uninteresting data representations, device-incompatible applications or Web applications that don't support next-age social media platforms should do so with urgency. They don't have to start by custom building from scratch: they can adopt data visualization application suites available commercially, or turn to technology partners that can quickly help build automated graphic data representation tools and revealing infographic aggregations, or that provide their proprietary packages at a lower cost. This will not only enhance real-time data but also seamlessly integrate with other devices, tools and software.

Device applications development holds a multiplicity of business opportunities to increase audience reach and customer engagement.
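The classification-for-topicality and redundancy-elimination steps described above can be sketched in miniature. In this illustrative Python sketch, the topics, keyword lists and record shape are invented assumptions; a production DSC would use far richer taxonomies and semantic classifiers rather than bare keyword overlap. Incoming records are tagged with topics and exact duplicates are dropped before they reach the index:

```python
# Illustrative only: a naive keyword classifier that tags incoming records
# with topics and drops exact duplicates before indexing. Topic names and
# keyword sets are invented for this sketch.
TOPIC_KEYWORDS = {
    "finance": {"credit", "risk", "markets"},
    "technology": {"cloud", "hadoop", "analytics"},
}

def classify(record_text):
    """Return every topic whose keyword set overlaps the record's words."""
    words = set(record_text.lower().split())
    return sorted(t for t, kw in TOPIC_KEYWORDS.items() if words & kw)

def ingest(records):
    """Tag records by topic and skip redundant copies of the same text."""
    seen, enriched = set(), []
    for text in records:
        if text in seen:  # redundancy check: drop exact duplicates
            continue
        seen.add(text)
        enriched.append({"text": text, "topics": classify(text)})
    return enriched

feed = [
    "Cloud analytics with Hadoop",
    "Credit risk in emerging markets",
    "Cloud analytics with Hadoop",  # duplicate, dropped on ingest
]
print(ingest(feed))  # two enriched records, tagged "technology" and "finance"
```

The same ingest hook is a natural place to attach just-in-time update logic, so that only changed records flow into the production environment.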
To get there, information services companies should consider partnering with their service providers to change the way data is viewed. This should be combined with a user-friendly interface to an external rules engine and repository that business users can use to modify or validate business rules, adding value by further authenticating data before delivery. With the business world speeding toward mobility, the way data is presented and accessed must be altered to accommodate consumers on the move. Finally, there is tremendous opportunity to strengthen information providers' client relationships and grow their businesses by leveraging a strong social media-influenced DSC strategy. Providers will add credibility and promote attention among key user constituencies by enabling their data delivery through social aggregators, since this approach enhances data and collaboration, creates a buzz by distributing and publicizing new information releases, builds virtual interactive communities and enables user-generated information to enhance core research curation. New and increased revenue-generating avenues will be the direct outcome of such social media integration.

Assessment, Monitoring and Data Governance

One logical starting point is to perform an overall assessment of the existing DSC. An alternative approach would be, instead of considering the entire value chain in one stroke, to examine each step over time. This could take the form of an assessment of subcomponents of the DSC (as per specific pain points, on a client-to-client basis) through a consulting exercise. Through such assessments, the consulting team can:

• Expose the "as-is state" of the current DSC and its capabilities, including issues and potential risks.

• Arrive at a "to-be state," unlocking potential opportunities, through findings and recommendations.

• Establish a business case and ROI to quantify benefits, including the justification and resourcing needed to realize the desired value-chain transformation.

• Road-map the data value chain implementation process and present a high-level architecture in a prioritized and time-phased fashion.

Once the DSC is baselined, the next step is to instrument the supply chain and vigorously monitor performance across the following prime dimensions: data latency, data quality, scalability, flexibility and cost of operations. A single organization-wide dashboard with logging capabilities and a user-friendly interface that reports these metrics in real time will enable effective data auditing and, eventually, strong data governance.

Finally, the DSC is only as good as the data that goes into it. Hence, a systematic data governance practice with checks on data consistency, correctness and completeness will help maintain the DSC without too much effort and ensure adequate data privacy. Companies normally have data policies, but these policies are often merely used to satisfy compliance obligations. Current data governance programs have not yet attained maturity, but companies that have data governance typically show a 40% improvement in ROI¹⁰ for IT investments compared to companies that don't.

Challenges to achieving an effective next-generation DSC include:

• The very thought of an end-to-end revamp of the entire DSC is overwhelming, and the amount of time, money and effort involved is daunting to leaders.

• Without CXO-level sponsorship and motivation, there is little chance of success.

• Given the significant change management involved, and without a critical mass of catalysts from the data provider and consumer communities, frustration can ensue.

• A DSC can only be successful if the data is standardized; otherwise the organization must write custom code to standardize, clean and integrate the data.

• Cloud computing and many of the "as-a-service" models rely on the service provider's ability to avoid service downtime.

The Path Ahead

Staying ahead of the curve and emerging victorious will require information services companies across industries and domains to embrace a next-generation DSC, selecting key elements and service delivery models to unlock productivity, seize competitive advantage and optimize the business for dynamic changes in the market. As we delve deeper to understand the architectural, infrastructural, technical and even business model implications, there is further scope for innovation.
With pressure to cut costs further while modernizing at the same time, there will be an evolving need for additional models, methods, frameworks, infrastructure and techniques that allow providers to tackle a data avalanche that will only grow. Information providers should collaborate with their current or potential strategic and implementation partners to mutually explore areas in domain, products, services and technology to "wow" the end consumer.

Whether organizations expand existing architectures or start afresh by building, buying or acquiring new capabilities for a more modern and future-proof DSC, they will need to quickly make strategic decisions by carefully trading off risks and rewards that ensure coexistence with today's business constraints and tomorrow's demands for real-time, anywhere, anytime information access. Information providers that postpone the decision to face the inevitable trends in the information industry discussed in this paper will find themselves stuck with rigid legacy environments and will eventually be overtaken by forward-looking, more innovative competitors. If the vision is to be the market leader in world-class information and the consumer's preferred choice for insightful information and decision-making, it is high time for information players to act now.

Footnotes

1. 1 Exabyte (EB) = 10¹⁸ bytes, or 2⁶⁰ bytes in binary usage.
2. 1 Zettabyte (ZB) = 10²¹ bytes, or 2⁷⁰ bytes in binary usage.
3. "Where angels will tread," The Economist, http://www.economist.com/node/21537967
4. Data points obtained directly from the following company Web sites: Elsevier, Twitter, YouTube and Facebook.
5. Business process as a service (BPaaS) is an application delivered as a service that is used by service-provider teams that perform business tasks on behalf of the service recipient. BPaaS combines traditional business process outsourcing (BPO) and software as a service (SaaS) to optimize business processes and elevate outcomes.
6. British Gas, ABN Amro and Severn Trent Water examples: "Business Value for Data Quality," http://www.x88.com/whitepapers/x88_pandora_data_quality_management.pdf
7. Hadoop is a free (open source) Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is primarily conceived on a MapReduce paradigm that breaks applications into smaller chunks to be processed in a distributed fashion for rapid data processing.
8. "How to save millions through storage optimization," http://www.networkworld.com/supp/2007/ndc3/052107-storage-optimization-side.html
9. "SynapSense Bolsters Data Center Infrastructure Management Solutions with New Energy Management Tools," http://www.reuters.com/article/2011/06/29/idUS251193+29-Jun-2011+BW20110629
10. "IT Governance: How Top Performers Manage IT Decision Rights for Superior Results," Peter Weill and Jeanne Ross, Harvard Business School Press.

About the Author

Sethuraman M.S. is the Domain Lead for Information Services within Cognizant's Information, Media and Entertainment (IME) Consulting Practice. Apart from providing leadership to a team that consults to information services companies, he provides thought leadership built on his expertise in the information markets and global media. His extensive IT experience spans consulting, program management, software implementations and business development. Sethu has worked with clients worldwide and has won numerous awards and recognitions. He received a BTech degree in instrumentation and electronics from the National Institute of Technology (NIT), Punjab, India; an MS degree in software systems from Birla Institute of Technology and Science (BITS), Pilani, India; and an MBA from the Asian Institute of Management (AIM), Manila, Philippines. Sethu can be reached at Sethuraman.MS@cognizant.com.
About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 140,500 employees as of March 31, 2012, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com

© Copyright 2012, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.