Accenture Building the Foundation for Big Data report - Jan 2013


Published on

Accenture Building the Foundation
for Big Data report - Jan 2013

Published in: Business
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Accenture Building the Foundation for Big Data report - Jan 2013

  1. 1. High Performance IT InsightsBuilding the Foundationfor Big Data
  2. 2. Page 2
  3. 3. For years, companies have been provide deeper insights into customers Big Data technologies have enabledcontending with a rapidly rising tide of and operations and ultimately help companies to explore how to increasedata that needs to be captured, stored drive competitive advantage. But this the efficiency, tame the total costand used by the business. But the data cannot be easily managed with of ownership and flexibility of theirnature of that tide has been changing, traditional relational databases and underlying IT infrastructure. Theand increasingly, it includes data from business intelligence tools, which continued growth of data has forceda variety sources, such as social media, are designed for structured data. At companies to seek new ways tosensors, machines and individual the same time, the rapidly increasing acquire, organize, manage and analyzeemployees. This unstructured data now volume and velocity of structured this data.makes up a very significant portion of and unstructured data are only In reality, the advent of Big Datathe data, and companies are rapidly complicating that problem. All of this is bringing new, unprecedentedexploring technologies for analyzing has led to the development of new workloads to the data center. Handlingthis kind of data to gain competitive distributed computing paradigms those workloads will require a distinct,advantage. (See Figure 1.) known collectively as Big Data, separate infrastructure, and IT will and analytics technologies such asFor many, unstructured data need to find ways to simultaneously Hadoop, NoSQL and others that handlerepresents a powerful untapped manage both the old and the new— unstructured data in its native state.resource—one that has the potential to and ultimately bring the two together.Figure 1Q; Does your organization use or have plans to deploy the following technologies? Relational database Search index Rules engines Columnar or column-oriented database Document store Complex event processing In-memory database Data warehouse appliance that combines software and hardware into a single pre-configured bundle Key Value Stores Hadoop 0 10 20 30 40 50 60 70 80 90 100 Have already done Planning to add within 12 months No plansSource: IDC and Computerworld BI and Analytics Survey Research Group IT Survey, 2012, n = 111 Page 3
  4. 4. Understanding the Costs • The cost of risk and quality problems, because Big Data Understanding theThe implementation of Big Data implementations are not easy, Big Data Platformcapabilities can have an impact on and issues can slow progress and The benefits of being able to take advantagemany aspects of the IT infrastructure. require costly re-work. of Big Data are clear and significant. But soBefore launching a Big Data initiative, too are the challenges that Big Data bringscompanies should make sure that • IT opportunity costs, because the to the data center. IT infrastructure teamsthey have a clear idea of the total time spent on the installation and will have to deal not only with huge volumescosts involved. To do so, they need to integration of Big Data reduces of data, but also with the complexity ofconsider factors such as: the time that IT can spend on varied types of data and the increasing activities that add value to the velocity with which that data needs to move.• Hardware costs, including servers, business. In addition, not all of this data has value, and storage and networking. IT must assist data scientists to sift through • The costs of delayed business huge amounts to find the “needle in a• Software costs, including Big improvements, because the haystack” needed to create business insights. Data software (like Hadoop and expected workforce productivity All in all, Big Data will require an its ecosystem) and connectors increases and delivery of business infrastructure that can store, move and required for integration with insights cannot be achieved until combine data with greater speed and traditional databases and business implementation is complete. agility—and traditional IT infrastructures intelligence tools. are simply not engineered to meet this• Implementation costs, such as need. It is technically possible to translate unstructured data into structured form, and research, design and plan work; then use relational database management installation and configuration; systems to manage it, of course. However, integration with existing business that translation process takes a great deal of intelligence applications; and time, driving up costs and delaying time to post-installation development and results. In general, the problem is not so much testing. technological as financial; it is simply not economically feasible to use the traditional infrastructure to manage Big Data.Page 4
  5. 5. It’s clear then that Big Data will require This decentralized approach to Big Data has not a matter of “either-or,” but rather ofits own, more cost-effective approach to several advantages. For example, it provides “and.” To derive business value from the fullinfrastructure—and in many cases, that cost-effective flexibility, with the ability to spectrum of data, IT infrastructure teamsapproach will represent a shift from past scale out quickly to include thousands of will have to work with both infrastructurepractices. For several years, data centers relatively inexpensive servers, rather than models, and operate two more or lesshave been focused on virtualization and going through the expensive upgrade of distinct platforms—and then develop dataconsolidation, moving to smaller footprints enterprise servers and storage equipment. architectures that encompass both.based on relatively few large servers And in terms of performance, the shared- With that in mind, Accenture foresees aconnected to large shared storage platforms. nothing model eliminates the need to rebalancing of the database landscapeHowever, Accenture believes that Big Data funnel data through a limited number of as data architects embrace the fact thatmay often require essentially the opposite shared storage disks—thereby doing away relational databases are no longer theapproach based on a more decentralized with a huge bottleneck that could seriously only tool in the toolkit. “Hybrid solutionmodel. In many situations, the right Big Data affect performance when dealing with large architectures” will mix old and new databaseplatform will consist of clusters of numerous amounts of data. forms, and advances pioneered in the newsmaller, commodity servers, rather than None of this is to say that this Big infrastructure will be applied to invigorateenterprise-class platforms. And storage Data platform will replace the existing the older infrastructure. (See Figure 2.) Inwill be handled locally at the individual infrastructure, or eliminate the need for short, tomorrow’s conversations about dataserver level, rather than centralized and virtualization and consolidation in the architectures will center on rebalancing,shared. (Certainly, there are cases where traditional infrastructure. In looking at coexistence and cross-pollination betweenpre-built Big Data engineered systems will the Big Data and existing platforms, it is the two the most appropriate approach; these arediscussed later.)Figure 2: Converging Data Architecture MPP AgileData Sources BI Analysis Advanced Analytics Data MartsUnstructured Hive Map/Reduce RDBMS HBase Data Integration Data Ingestion Real-time HDFS Analytics Engine External Rack Rack Rack Customer-facing Apps Node Node Node Disk Disk Disk Low-latency Systems: CPU CPU CPU Cassandra/HBase ODS Compute/Storage Hadoop/HDFS Based©2011 Accenture. All rights reserved. Page 5
  6. 6. Page 6
  7. 7. Big Data and the steps to keep it protected and available. This, too, may require new approaches, becauseData Center the volumes involved may be too large to back up and restore through conventionalThe emergence of these Big Data platforms methods. Security features of Big Datawill have significant ramifications for the technologies are continuously maturingdata center, and infrastructure professionals and organizations will need to considerwill have to address a number of new implementing adequate controls to addresschallenges. For example, data centers how to prevent data compromise and theft.will need to manage large-scale Big Data IT governance will also need to be adaptedplatforms, with hundreds or thousands of to support Big Data. As a rule, companiesnew clustered servers being added to the will have to make sure that governanceinfrastructure. They will also need to manage processes are in place for everythingthe provisioning and orchestration of from performance management to serviceservices across these numerous nodes, and chargeback, incident/problem managementintegrate the Big Data management suite and service desk support for the Big Datawith the traditional management suite. platform.Big Data will also put new demands on the Finally, IT will need to determine how tonetwork infrastructure, which will need integrate the Big Data platform with theto move terabyte-sized data sets. Even rest of the IT infrastructure. Companiesthe basics of physical infrastructure will will want to draw on both structured andbe affected, as the installation of large unstructured data, so that the two togethernumbers of commodity servers will require can provide a fuller picture of the business,adjustments in power supplies and heating/ and so Big Data can be understood incooling, as well as floor space. the context of other enterprise data. ThisSimilarly, the Big Data storage infrastructure integration will allow companies to leveragewill need to have multi-petabyte capacity existing data warehouses and analyticand the ability to support, potentially, tools, and enable decision makers to makebillions of objects. And because unstructured widespread use of Big Data throughout thedata represents an increasingly valuable asset, companies will have to take Page 7
  8. 8. Planning the It is also important to recognize that there is not a single “one-size-fits all” As discussed earlier, some companies are likely to find the decentralized, commodity-Infrastructure approach to the Big Data platform. Each company’s situation will be different, hardware “shared nothing” infrastructure to be appropriate. But there are many casesIT groups will need to take a comprehensive, which makes careful upfront planning where other approaches make better sense.multidiscipline approach to create Big Data critical. Infrastructure teams need to For example, the use of the commodityplatforms. IT infrastructure teams should fully understand the impact that Big Data platform with shared storage may be rightwork with other IT professionals who can will have on the data center. That means when smaller workloads are involved, andprovide perspectives on analytics, risk and analyzing data center capacity, storage where concerns about storage bottleneckscompliance, business applications and IT and networking requirements. It means impairing performance are minimal. Thisgovernance. These varied perspectives can identifying potential data sources and might include situations where the companyhelp ensure that data center services are gauging the data set sizes that will need to is just beginning to explore Big Data, andreengineered for the volume, velocity and be managed. It means understanding the workloads are limited. (See Figure 3.)complexity of Big Data, and that there is a analytics workload in term of volume andpath to bring the Big Data and traditional velocity, as well as CPU and IO workloads.architectures together—with an ongoing And it means determining the level offocus on the economics involved. As per integration that will be needed between theAccenture’s research a “data-centric” design Big Data platform and traditional businessis more important than it has ever been. intelligence tools.Figure 3: Infrastructure Solution Patterns Solution Patterns Use Cases Details Commodity platform 1. High flexibility and large scale outs 1. Commodity physical servers local storage 2. Hadoop implementation skills easily 2. Configured in PODs comprising racks of commodity servers accessible 3. Direct attached storage ~12x3TB per node 3. Develop or have access to a reference 4. Onsite disaster recovery backup and recovery architecture for Hadoop implementation 5. Infrastructure automation and orchestration 6. Plan for data center capacity Commodity platform 1. Small—medium implementation 1. Virtual servers running on hypervisors like VMWare ESXi shared storage 2. Hadoop implementation skill easily 2. Configured in PODS comprising nESX clusters with n to 1 density accessible 3. Shared scale out NAS 3. Develop or have access to a reference 4. Shared storage can be a potential bottleneck architecture for Hadoop implementation 5. Onsite backup and recovery 6. Offsite replication for disaster recovery 7. Infrastructure automation and orchestration 8. Plan for data center capacity Big Data Appliances 1. Fast time to delivery 1. Bundled computer, storage, network and Big Data components (Teradata, DCA, 2. Tight integration with existing BI 2. Engineered for high availability and fault tolerance Oracle) Analytics platforms (Oracle, Greenplum, 3. Simple and unified management Teradata) 4. Hadoop management tools 5. System management tools 6. Single support Cloud 1. Fast time to delivery 1. Challenge to move data sets to cloud at start (and possible implementation 2. Data center capacity issue termination of service) (single tenant or 3. Want to experiment with Big Data 2. Attention to data security and privacy multi-tenancy) 4. Already using could for infrastructure 3. Data ownership in cloud©2011 Accenture. All rights reserved.Page 8
  9. 9. In other cases, packaged, engineeredsystems might be appropriate—particularly Keeping the Focus A well-planned approach to the Big Data infrastructure can deliver that type ofwhen time-to-implement is critical. Thesesolutions may involve more upfront hard on the Business infrastructure—and ultimately help both IT and the business succeed. Computingcosts than clusters of commodity servers. While Big Data and traditional paradigms may change with the advent ofBut because they bundle technology and infrastructures will differ in many ways, Big Data, but the business’ expectationssoftware, it is possible to get them in one fundamental principle holds true across for IT as an enabler of efficiency andplace much more quickly, as well as avoid both—the need to ensure that IT enables innovation will not—and that will ultimatelythe complexities (and additional costs) business results. That means companies be the measure of success for the Big Dataof implementing Hadoop and connecting need to carefully assess and monitor total infrastructure.hardware, which can be significant. cost of infrastructure ownership as theyThe engineered solution approach may move forward with Big Data platforms.also be useful in cases where they can At the same time, they need to look beyondfacilitate integration with the existing costs and target infrastructure capabilitiesinfrastructure. (See “Simplifying Big Data that support business agility and growth.Implementation.”) For example, the Oracle Accenture research shows that high-Big Data appliance can streamline such performance businesses tend to emphasize aintegration for companies that are using number of important factors, such as havingOracle databases and business intelligence adaptive, executable strategies in a changingtools to handle structured data. environment, and creating competitive advantage through continual innovation. Big data technologies allow for much more flexibility in mobilizing data more quickly to meet the demands of business. The Big Data infrastructure can be key to enabling those approaches. To that end, companies need to ensure that the infrastructure lets IT continually cost-optimize operations, scale up and down against business demand and demonstrate value back to the business. Page 9
  10. 10. Simplifying Big Data In essence, the potential value of these engineered solutions comes Implementation down to reduced set-up times and A Big Data implementation, streamlined ongoing management— including the integration of various factors that can be vitally important infrastructure components, can in some situations. For example, be a complex task that requires Oracle’s offering in this space—the specialized skills. In addition, as Oracle Big Data Appliance—includes Big Data plays an increasingly vital 648 terabytes of raw storage and role in companies, it will become 216 CPU processing cores in a more important that the related single rack, optimized for Big Data. infrastructure have the kind of The appliance includes a complete performance, security and support set of Big Data software, such as seen in other critical business Hadoop and NoSQL. (See Figure 4.) solutions. With these realities in The entire pre-configured package is mind, companies may want to designed to provide the high levels of consider packaged, engineered performance, availability and security systems that provide “ready-made” required for enterprise systems. Big Data platforms.Figure 4: Two Approaches: Build Your Own vs. Use Oracle Engineered Systems Build Your Own Model Oracle ApproachData Variety Big Data Appliance Distributed File • Cloudera CDH 3 system Cloudera Manager • Big Data Connections Map Reduce Solutions • NoSQL Database CE TransactionSchema-less (Key-value Stores) Oracle Oracle Exadata Exalytics • OTLP & DW v.s. • Speed of • Data mining and Oracle thought • Semantics analysis • Spatial DBMS Schema ETL DBMS Data (OLTP) Warehouse Advanced Analytics Acquire Organize Analyze Acquire Organize AnalyzeReprinted with permission of Oracle Corporation.Page 10
  11. 11. This approach can cut The appliance approach can also environment. Overall, this approachimplementation time significantly. facilitate integration of Big Data can enable companies to weaveFor example, in one assessment platforms with the rest of the Big Data into the overall analyticsof a large Big Data effort, Oracle infrastructure. For example, many ecosystem, helping to simplify theestimated that for a 10-rack, companies use Oracle databases, process of acquiring, organizing and144-node system, hardware and the Oracle Big Data Appliance analyzing a broad range of data.implementation alone would require offers special software “connectors” Each company needs to weigh anearly 1,800 cables and about for integrating the appliance with number of factors, including its own1,300 man-hours of work. With the Oracle Database, as well as with needs, to determine which approachthe appliance package, the same the Oracle Exadata storage solution. to Big Data is right. But as Big Dataimplementation would require just (See Figure 5.) The Oracle Big Data tools and technologies evolve, the48 cables and 38 man hours. And Appliance uses high-speed InfiniBand ready-made engineered solutionslooking at the long run, the Oracle links to connect with these other provided by vendors are likely toappliance provides a single point of Oracle systems, enabling companies provide an appealing option for for Big Data hardware and to create a very high-poweredsoftware, reducing the complexity of computing environment—again, withdealing with multiple vendors. a single vendor supporting the entireFigure 5: Oracle Analytics – Powered by Engineered Systems Data Oracle Sources Big Data Connectors Oracle Big Data Appliance Oracle Exadata Oracle Exalytics Stream Acquire Organize Analyze and VisualizeReprinted with permission of Oracle Corporation. Page 11
  12. 12. If you would like to know About Accenturemore about how Big Data Accenture is a global managementcan help your organization consulting, technology services andachieve high performance, outsourcing company, with 257,000 people serving clients in more thancontact: 120 countries. Combining unparalleledArun Sondhi experience, comprehensive across all industries and business functions,+1 262 212 9496 and extensive research on the world’s most successful companies, AccentureDr. Mark Gold collaborates with clients to help become high-performance businesses and+1 703 947 3639 governments. The company generated netPatrick Sullivan revenues of US$27.9 billion for the year ended Aug. 31, 2012. Its home page is+1 312 693 3411 U. Dell’ 719 244 6325Copyright © 2012 AccentureAll rights reserved.Accenture, its logo, andHigh Performance Deliveredare trademarks of Accenture. mc363