Unlocking value in your (big) data



The presentation is an introduction to Big Data and analytics: how to go about enabling big data and analytics in your company, what the main differences between big data analytics and traditional analytics are, and how to get started.

This material was used at the SAS Big Data Analytics event held in Helsinki on 19th of April 2011.

The slides are copyright of Accenture.

  • We’ll build on these during the presentation.
  • The bad news? It’s not going to stop. Large amounts of data bring a whole set of new challenges: how should we go about them?
  • It’s not just growing volumes of existing data, it’s also: the recognition of value in previously throw-away data; new kinds of “data exhaust” (by-product data generated as part of other processes, currently ignored or thrown away); new kinds of “intentional” data; the combination of previously separate data.
  • Big Data is not so much about the “big”, but about finding new ways to handle and analyze data that were not possible before. There are a whole lot of new technologies that can be used to deal with big data. Are you familiar with all of them? Which one is most suitable for your case?
  • Source: http://www.gartner.com/it/page.jsp?id=1763814
  • Let’s stop for a second to look at the key enabling technologies in Big Data. MapReduce: originally designed and first developed at Google as part of their efforts to index the web more efficiently; it splits input data into smaller chunks that can be processed in parallel, and scales linearly with the number of nodes. Hadoop: an open source implementation of MapReduce, based on Google’s white paper; started at Yahoo, now a top-level project in the Apache Software Foundation; runs on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage); rather straightforward to install and administer; a large ecosystem of additional open source components (Pig, Hive, Oozie, Flume) and of commercial offerings (both closed and open source).
  • Big Data Analytics. Technology: multiple tools and technologies, sometimes for the same purpose (Hadoop, NoSQL databases, in-memory analytics). Time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies; traditional data warehousing processes are too slow and limited in scalability, whereas the ability to converge data from multiple data sources, both structured and unstructured, decreases time to information. Skills: there’s only so much we can do with exploratory processes; the only ways to effectively analyze big data require mathematical and statistical concepts with which more traditional analysts are not familiar. Business analysts used to be able to manage with Excel and basic SQL knowledge; now, with data that does not follow any particular model (it’s unstructured, after all), there is a need for analysts who are comfortable with statistical and mathematical concepts and who are able to devise their own models to find patterns and insights where there apparently were none. Processes & Organization: data must be open and shared across the enterprise, supported by organizations that “own” it; data must be made available across the enterprise (i.e. we can’t find trends in data that we do not have).
  • Source: “Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO”, IDC 2012 (http://www.sas.com/resources/asset/BigDataAnalytics-FutureArchitectures-Skills-RomapsfortheCIO.pdf). The three Vs: Velocity, Volume and Variety. Everything will be analyzed, but how much do we have, how soon do we need it and how fast can we do it?
  • MapReduce and Hadoop are currently seen as a low-level paradigm on top of which higher-level tools must be built, tools that are more intuitive and easier to use for non-programmers (business analysts, data scientists). Big Data technologies have not reached maturity yet and will continue to evolve over the coming years. IT decision makers must still be realistic about the limits of what can be achieved with these technologies, sometimes waiting instead for the next generation of data technologies. There is also a lot of start-up activity (Scalar, MapR). “Traditional” large vendors do not want to be left behind either: Microsoft SQL Server 2012 will be able to read and write data from Hadoop and HDFS or run Hadoop on Microsoft’s Azure PaaS, IBM has a version of InfoSphere BigInsights ready to run on its SmartCloud solution, and Oracle has recently introduced its own appliance, combining software and hardware, with Hadoop and in-memory capabilities for handling large amounts of data.
  • Big Data Analytics is an augmentation to existing analytical infrastructure that will allow us to scale and drive insights beyond “current capabilities”. So the question becomes: how do we add these capabilities so that they interoperate with traditional tools?
  • The worlds of structured and unstructured data are rapidly converging. Architects and CIOs must find ways to manage this convergence and enable all forms of data management to coexist, sometimes using bridge technologies, such as using Hadoop to process and import data into traditional systems in ways that wouldn’t be possible with the RDBMS approach alone. “Hybrid” landscapes are just that: Hadoop is integrated with existing data warehouses, traditional relational databases and applications in a way that minimizes the impact on the enterprise. The reality is that the EDW is evolving into a virtualized cloud ecosystem in which all of these database architectures can and will coexist in a pluggable “Big Data” storage layer alongside HDFS, HBase (Hadoop’s columnar database), Cassandra (a sibling Apache project that supports peer-to-peer persistence for complex event processing and other real-time applications), graph databases and other “NoSQL” platforms, behind an abstraction layer with MapReduce as its focus. Big Data is not necessarily about its “bigness”: very few organizations are going to need the type of scale that often makes the Big Data headlines. So, far from rendering the relational database obsolete, the new advances will be incorporated into traditional databases over time, extending their performance. Adding Hadoop to the enterprise provides a cost-effective place to store vast quantities of structured data from operational systems and combine it with both internally and externally sourced unstructured or semi-structured data. Advanced MapReduce analytical methods can also be used directly against that store, or more traditional BI tools can be used to analyze the data through Hive or HBase.
  • We’ve seen the tools, but who is going to build, run and maintain all this? Technology skills: the emergence of big data is based on new technologies that require either training or sourcing additional expertise. Data science: traditional analytical models do not generally scale well to typical “big data-like” volumes; new ways of thinking are needed, ways that help find what we wanted to find as well as what we did not know we could find. Data scientists are the next generation of business analysts, with strong statistical skills and able to think “outside of the box”, looking for new analytical models.
  • Agile software development methodologies are one of the potential answers to this. A data strategy is required, but with an approach that is about modeling less and iterating more (just like agile).
  • They require new tools and technology. Big Data doesn’t always get it right, with or without analytics (wacky iTunes and Spotify recommendations, weird LinkedIn suggestions). They require new skills in your workforce. Resistance is futile: Big Data and analytics are inescapable. They create business value for the bottom line. They are the path to competitive advantage. Big Data is not only transforming IT, it is also transforming businesses and industries: retail recommendations, smart meter/grid analytics.
  • How do we get started with all this? Identify which business processes could benefit the most from improved handling and processing of large amounts of data: what are the business decisions that we make each day and that we’d like to make more efficiently and more effectively? Productize data across the company, make it a “first class citizen” and provide some kind of data service layer so that data is accessible throughout the enterprise. Identify the skill and technology gaps and decide whether to grow or acquire new talent and technology for the company (with or without the cloud). It is clear that this requires an investment; it is the path forward, but it requires that you as decision-makers make a commitment to grow big data in your company.
  • Source: http://www.accenture.com/us-en/technology/technology-labs/Pages/insight-accenture-technology-vision-2012.aspx (http://bit.ly/accenturetechvision2012 and http://bit.ly/accenturetechnologyvision2012)
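The MapReduce description in the notes above (split the input into smaller chunks, process the chunks in parallel, combine the results) can be illustrated with the classic word-count example. What follows is a minimal, single-machine Python sketch of the map, shuffle and reduce phases; it is not Hadoop’s API, the function names are hypothetical, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines: list[str], chunk_size: int) -> list[list[str]]:
    """Split the input into fixed-size chunks, one per simulated mapper."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: list[str]) -> list[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value by its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines: list[str], chunk_size: int = 2, workers: int = 4) -> dict[str, int]:
    chunks = split_into_chunks(lines, chunk_size)
    # Threads stand in for cluster nodes: each chunk is mapped independently.
    # (With CPython's GIL this shows the data flow, not a real speedup.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())


sample = ["big data is big", "data wants to be open", "open your data"]
print(word_count(sample)["data"])  # 3
```

In Hadoop itself, the framework handles the chunking (input splits), the shuffle and the distribution of work across nodes; the user mainly supplies the map and reduce functions.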

    1. Unlocking Value in (Big) Data: Oscar Renalias, Accenture, oscar.renalias@accenture.com
    2. About the presenter: Oscar Renalias. Oscar is a Technology Architect and has been working at Accenture in the Helsinki office for the last 5 years. He holds a Bachelor’s Degree in Computer Science from the Universitat Politècnica de Catalunya (UPC), in Barcelona. Oscar currently belongs to the global organization within Accenture responsible for pushing technology innovation, working with selected new and emerging technologies together with clients to generate business value. Hadoop/Big Data is one of those areas. oscar.renalias@accenture.com, +358407725915. Copyright © 2012 Accenture All rights reserved.
    3. Agenda: • Top 4 things about Big Data & Analytics • What is Big Data? • Big Data Analytics: what is it? • What does it contain? • How is it integrated? • How do we manage it? • What next?
    4. Top 4 things about Big Data Analytics: resistance is futile, you will be assimilated; competitive advantage; it’s different; data wants to be open.
    5. Data is growing: it’s growing. Quickly. And it’s everywhere. [Chart: data stored, in exabytes (10^18 bytes): 130 in 2005, 1,227 in 2010, 7,910 in 2015 (forecast).] Source: IDC’s Digital Universe Study (sponsored by EMC), June 2011.
    6. New kinds of data: structured vs. unstructured data growth. [Chart: growth of complex, unstructured data compared with relational data, with an “analysis gap” between the data and our ability to analyze it.] Source: IDC white paper sponsored by EMC, “As the Economy Contracts, the Digital Universe Expands”, May 2009.
    7. Big Data Technologies: new technologies, new approaches. Source: Wordle for Credit Suisse, “Does Size Matter Only?”, September 2011.
    8. Where do analysts see Big Data? Gartner’s Hype Cycle for Emerging Technologies 2011.
    9. MapReduce and Hadoop: MapReduce revolutionized how we handle large amounts of data; Hadoop made it simple and affordable. MapReduce: • Originally designed and first developed at Google as part of their efforts to index the web more efficiently • Splits input data into smaller chunks that can be processed in parallel • Scales linearly with the number of nodes. Hadoop: • Yahoo’s implementation of MapReduce • Open source, top-level project in the Apache Software Foundation • Designed to run on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage) • Large ecosystem of additional components (both open source and commercial)
    10. Big Data Analytics: what is it? Big Data Analytics is a shift in the mindset of how we think about analytics as an internal component of the organization. It focuses on letting data be productized in a way that drives meaningful insights rapidly, and on innovation to exploit missed opportunities in previously overlooked areas… providing a path to competitive advantage.
    11. Big Data Analytics vs. traditional analytics: where do they differ?
        • Traditional Analytics. Technology: assumes condensed, structured and feature-rich datasets that can be modeled (relational databases, data warehouses, dashboards). Skills: basic knowledge of reporting and analysis tools; few specialized resources. Processes & Organization: “siloed” data organizations; only specific “views” of data visible across the enterprise.
        • Big Data Analytics. Technology: a stack of tools that enables an organization to build a framework that allows it to extract useful features from a large dataset, to further understand how to model its data. Skills: advanced analytical, mathematical and statistical knowledge required to develop new models (the data scientist). Processes & Organization: data is productized and shared across the enterprise; dedicated data organizations with well-defined data management processes and ownership.
    12. Everything will be analyzed: the three Vs. [Chart: technologies positioned by Velocity (batch to real-time), Volume and Variety (structured to unstructured), including relational databases, ETL, the EDW, Hadoop, Hadoop + NoSQL, NoSQL, in-memory and event processing.] Source: IDC.
    13. Big Data and Analytics in the Enterprise: many technology choices in a rapidly changing environment. Which one is right for you? • Distributed Non-Relational Storage and Processing • Big Data-Enabled Intelligence and Analysis • Analytics-Focused Massively Parallel Processing (MPP) Software Platforms • Hardware-Optimized MPP Data Warehouses • Distributed In-memory • Cloud
    14. Technology: augmenting existing analytics with Big Data technologies. [Diagram: Big Data Analytics between Emerging Data Technologies and Traditional Tools.]
    15. SAS-Hadoop integration: an example of how traditional analytics tools are evolving to interoperate with Hadoop. SAS/Access Interface to Hadoop • Enables SAS users to analyze data stored in Hadoop • Allows Hadoop data processing from SAS client software such as Data Integration Studio, Enterprise Guide and Enterprise Miner • The access engine not only moves data into and out of Hadoop; it can also run data processing “pushed down” into Hadoop. SAS Data Integration Studio transformations for Hadoop • New sets of Hadoop transformations that enable DI Studio users to load and unload data from Hadoop faster than Sqoop (can connect to Oracle) • Perform “ETL-like” processing with Hive and Pig • A Hadoop-specific scoring transform that enables models developed with Enterprise Miner to be deployed to Hadoop via DI Studio
    16. The impact of Big Data Analytics on our landscapes: hybrid landscapes, where old and new converge. [Diagram components: internal apps, customer-facing apps and mobile apps; analysis tools (SAS, SPSS, R, Tableau); data services (REST, WS); relational DBs; Pig, Hive, HBase, MapReduce and HDFS; enterprise DW; ETL; real-time analytics; data sources: time series, files, social, logs, web, ERP, CRM.]
    17. Data Science and the skill gap: closing the loop, it’s not just about technology skills. Data science: “The sexy job in the next 10 years will be statisticians” (Hal Varian, Chief Economist at Google). Data scientists are the next-generation analytics professionals, responsible for turning data into insight.
    18. Big Data Analytics Management: how does the management style differ? In big data analytics, resources generally have a hybrid skill set, a cross between software engineering and advanced statistics. This mix of skill sets poses a challenge for project methodology. [Diagram: Analytics Methodologies and Software Methodologies.]
    19. Wrapping up: Big Data is challenging current patterns of thought. Data “explosion”: data everywhere (structured, unstructured, other); people’s data, geolocation data. Cost-effective computing and storage: everything can be stored; cheap large-scale computing power readily available. Big Data and Analytics: resistance is futile; they are the path to competitive advantage and create value; compared to traditional analytics, they’re different: adapt or become irrelevant; open your data.
    20. Wrapping up: how to get started • Identify business processes that you could run more effectively with the help of big data and analytics • Start with well-funded but small trials and proofs of concept, and evolve towards a solid roadmap • Open up your data: transform towards a “data as a service” architecture • Acquire or grow the needed technology and analytical skills
    21. Accenture Technology Vision: strong advice on data for 2012. http://bit.ly/accenturetechnologyvision2012
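The IDC figures on slide 5 (130, 1,227 and 7,910 exabytes for 2005, 2010 and 2015) can be turned into yearly growth rates with the compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The Python sketch below assumes smooth compounding between the plotted years; the rates are derived here from the three chart values rather than quoted by IDC, and the 2015 value was itself a forecast in the June 2011 study. The names `DATA_STORED_EB`, `cagr` and `growth_report` are illustrative only.

```python
# IDC Digital Universe figures from slide 5, in exabytes (10^18 bytes).
DATA_STORED_EB = {2005: 130, 2010: 1227, 2015: 7910}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_report(series: dict[int, float]) -> list[tuple[int, int, float]]:
    """Return (from_year, to_year, CAGR) for each consecutive pair of years."""
    years = sorted(series)
    return [
        (y0, y1, cagr(series[y0], series[y1], y1 - y0))
        for y0, y1 in zip(years, years[1:])
    ]


for y0, y1, rate in growth_report(DATA_STORED_EB):
    print(f"{y0}-{y1}: {rate:.0%} per year")
# 2005-2010: 57% per year
# 2010-2015: 45% per year
```

At roughly 45% a year, stored data grows about 1.45-fold annually, which compounds to the 6.4-fold increase from 1,227 EB in 2010 to 7,910 EB in 2015.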