Unlocking Value in (Big) Data

Oscar Renalias, Accenture
oscar.renalias@accenture.com
About the presenter
Oscar Renalias
Oscar is a Technology Architect and has been working at
Accenture in the Helsinki office for the last 5 years. He holds a
Bachelor’s Degree in Computer Science from the Universitat
Politècnica de Catalunya (UPC), in Barcelona.
Oscar currently belongs to the global organization within
Accenture responsible for pushing technology
innovation, working with selected new and emerging
technologies together with clients to generate business value.
Hadoop/Big Data is one of those areas.
Oscar.renalias@accenture.com
+358407725915



                                           Copyright © 2012 Accenture All rights reserved.
Agenda

• Top 4 things about Big Data & Analytics
• What is Big Data?
• Big Data Analytics – what is it?
• What does it contain?
• How is it integrated?
• How do we manage it?
• What next?




Top 4 things about Big Data Analytics

  Resistance is futile,
      you will be assimilated

  Competitive advantage
  It’s different

  Data wants to be open

Data is growing
It’s growing. Quickly. And it’s everywhere.

Data stored, in exabytes (10^18 bytes):
  2005: 130
  2010: 1227
  2015: 7910

                                              Source: IDC’s Digital Universe Study (sponsored by EMC), June 2011
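
To put the chart in perspective, the implied compound annual growth rate can be worked out from its three data points. The following is a minimal sketch in Python; the function name `cagr` is illustrative, the figures are the ones read from the chart, and the 2015 value was a forecast when the study was published.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Exabytes stored, as read from the IDC chart (2015 is a forecast).
stored = {2005: 130, 2010: 1227, 2015: 7910}

growth_2005_2010 = cagr(stored[2005], stored[2010], 5)
growth_2010_2015 = cagr(stored[2010], stored[2015], 5)

print(f"2005-2010: {growth_2005_2010:.0%} per year")
print(f"2010-2015: {growth_2010_2015:.0%} per year")
```

Under these figures the stored volume grows by roughly 57% per year in the first period and roughly 45% per year in the second, which is what "growing quickly" means in practice.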



New kinds of data
Structured data vs. Unstructured data growth




[Chart: growth of complex, unstructured data vs. relational data, compared with our ability to analyze it; the difference is labeled the "analysis gap"]




        Source: "As the Economy Contracts, the Digital Universe Expands", an IDC White Paper sponsored by EMC, May 2009.
Big Data Technologies
New technologies, new approaches




 Source: Wordle for Credit Suisse, Does Size Matter Only?, September 2011
Where do analysts see Big Data?
Gartner’s Hype Cycle for Emerging Technologies 2011




MapReduce and Hadoop
MapReduce revolutionized how we handle large amounts of
data, Hadoop made it simple and affordable

MapReduce
• Originally designed and first developed at Google as part of their efforts to
  index the web more efficiently
• Splits input data into smaller chunks that can be processed in parallel
• Scales linearly with the number of nodes

Hadoop
• Open source implementation of MapReduce, largely developed at Yahoo
• Top-level project in the Apache Foundation
• Designed to run on commodity software (Linux) and hardware (consumer-grade
  computers with directly attached storage)
• Large ecosystem of additional components (both open source and commercial)
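
The split, map and reduce steps described on this slide can be sketched in a few lines of Python. This is only an illustration of the programming model, not Hadoop's API: threads in a single process stand in for cluster nodes, and all function names and the sample input are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Split the input into smaller chunks that can be processed independently."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, chunk_size=2, workers=4):
    chunks = split_into_chunks(lines, chunk_size)
    # Worker threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


documents = [
    "big data is big",
    "data wants to be open",
    "open your data",
]
counts = word_count(documents)
print(counts["data"], counts["big"], counts["open"])
```

Because each chunk is mapped without reference to any other chunk, the map step can be spread over as many workers as there are chunks; that independence is what a framework such as Hadoop exploits when it distributes the work across nodes.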
Big Data Analytics
What is it?

Big Data Analytics is a shift in how we think about analytics as an
internal component of the organization


It focuses on productizing data in a way that rapidly drives meaningful
insights and innovation, exploiting missed opportunities in areas
previously overlooked…


… providing a path to competitive advantage




Big Data Analytics vs. traditional analytics
Where do they differ?

Traditional Analytics
  Technology: Assumes condensed, structured, and feature-rich datasets that
  can be modeled: relational databases, data warehouses, dashboards
  Skills: Basic knowledge of reporting and analysis tools, few specialized
  resources
  Processes & Organization: “Siloed” data organizations. Only specific “views”
  of data visible across the enterprise

Big Data Analytics
  Technology: A stack of tools that enables an organization to build a
  framework that allows them to extract useful features from a large dataset
  to further understand how to model their data
  Skills: Advanced analytical, mathematical and statistical knowledge required
  to develop new models – the data scientist
  Processes & Organization: Data is productized and shared across the
  enterprise. Dedicated data organizations with well-defined data management
  processes and ownership



Everything will be analyzed
The three Vs


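[Diagram: technologies positioned by velocity (batch to real-time) and variety (structured to unstructured), with volume as the third axis]

              Structured                                 Unstructured
  Real-time   In-memory, NoSQL, Event processing, EDW    Event processing, Hadoop + NoSQL
  Batch       Relational, ETL                            Hadoop, ETL

Source: IDC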
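
One way to read this diagram is as a lookup from workload characteristics to candidate technologies. The sketch below encodes it in Python, assuming the placement of the labels is as reconstructed from the slide (structured on the left, unstructured on the right); the names `TECHNOLOGY_MAP` and `candidate_technologies` are illustrative, and volume is deliberately left out.

```python
# Candidate technologies by (velocity, variety), as read from the IDC diagram.
# The quadrant assignment is an assumption based on the slide layout.
TECHNOLOGY_MAP = {
    ("real-time", "structured"): ["In-memory", "NoSQL", "Event processing", "EDW"],
    ("real-time", "unstructured"): ["Event processing", "Hadoop + NoSQL"],
    ("batch", "structured"): ["Relational", "ETL"],
    ("batch", "unstructured"): ["Hadoop", "ETL"],
}


def candidate_technologies(velocity, variety):
    """Return the technologies the diagram places in the given quadrant."""
    key = (velocity.lower(), variety.lower())
    if key not in TECHNOLOGY_MAP:
        raise ValueError(f"unknown combination: {velocity!r}, {variety!r}")
    return TECHNOLOGY_MAP[key]


print(candidate_technologies("Batch", "Unstructured"))
```

A lookup like this is only a starting point for a conversation; it says nothing about how much data there is, which is the third question the three Vs ask.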

Big Data and Analytics in the Enterprise
 Many technology choices in a rapidly changing environment.
 Which one is right for you?
Distributed Non-Relational Storage and Processing


Big Data-Enabled Intelligence and Analysis


Analytics-Focused Massively Parallel Processing
(MPP) Software Platforms

Hardware Optimized MPP Data Warehouses


Distributed In-memory

                                                    Cloud

Technology
Augmenting existing analytics with Big Data technologies



[Diagram: Emerging Data Technologies and Traditional Tools combining into Big Data Analytics]




SAS-Hadoop integration
An example of how traditional analytics tools are evolving to interoperate
with Hadoop
SAS/Access Interface to Hadoop
    • Enables SAS users to analyze data stored in Hadoop
    • Allows Hadoop data processing from SAS client software such as Data Integration Studio, Enterprise Guide and
       Enterprise Miner
    • The Access Engine not only moves data into and out of Hadoop; it can also run data processing that is
       “pushed down” into Hadoop
SAS Data Integration Studio Transformation for Hadoop
    • New sets of Hadoop transformations that enable DI Studio users to load and unload data from Hadoop faster than
       Sqoop (can connect to Oracle)
    • Perform “ETL-like” processing with Hive and Pig
    • Hadoop-specific scoring transformation that enables models developed with Enterprise Miner to be deployed to
       Hadoop via DI Studio




The impact of Big Data Analytics on our landscapes
Hybrid landscapes, where old and new converge


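[Diagram: hybrid landscape, from consumers at the top to data sources at the bottom]
  Consumers: Internal apps, customer-facing apps, mobile apps; Analysis tools (SAS, SPSS, R, Tableau)
  Access layer: Data Services (REST, WS)
  Hadoop stack: Pig, Hive and HBase on top of MapReduce, on top of HDFS
  Existing systems: Relational DBs, Enterprise DW, Real-time analytics
  Integration: ETL
  Data sources: Web, ERP, CRM, Time Series, Files, Social, Logs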




Data Science and the skill gap
Closing the loop – it’s not just about technology skills


Data science
“The sexy job in the next 10 years will
be statisticians”
 – Hal Varian, Chief Economist at
Google


Data scientists are the next generation of
analytics professionals, responsible for
turning data into insight



Big Data Analytics Management
How does the Big Data Analytics management style differ?
In big data analytics, resources generally have a hybrid skill set that crosses
Software Engineering and Advanced Statistics. This mix of skill sets
creates a challenge for project methodology.


[Diagram: Software Methodologies and Analytics Methodologies]




Wrapping up
Big Data is challenging current patterns of thought




Cost-effective computing and storage
• Everything can be stored
• Cheap large scale computing power readily available

Data “explosion”
• Data everywhere: structured, unstructured, other people’s data, geolocation data

Big Data and Analytics
• Resistance is futile
• Are the path to competitive advantage and create value
• Compared to traditional analytics, they’re different; adapt or become irrelevant
• Open your data


Wrapping up
How to get started
• Identify business processes that you could do
  more effectively with the help of big data and
  analytics


• Start with well-funded but small trials and proof-of-
  concepts, evolve towards a solid roadmap


• Open up your data and transform towards a “data
  as a service” architecture


• Acquire or grow the needed technology and
  analytical skills

Accenture Technology Vision
Strong advice on data for 2012




              http://bit.ly/accenturetechnologyvision2012



Editor's Notes

  • #5 We’ll build on these during the presentation
  • #6 The bad news? It’s not going to stop. Large amounts of data bring a whole set of new challenges; how should we go about them?
  • #7 It’s not just growing volumes of existing data, it’s also:
      • The recognition of value in previously throw-away data
      • New kinds of “data exhaust” – by-product data generated as part of other processes, currently ignored or thrown away
      • New kinds of “intentional” data
      • The combination of previously separate data
  • #8 Big Data is not so much about the “big”, but about finding new ways to handle and analyze data that were not possible before. There are a whole lot of new technologies that can be used to deal with big data. Are you familiar with all of them? Which one is most suitable for your case?
  • #9 Source: http://www.gartner.com/it/page.jsp?id=1763814
  • #10 Let’s stop for a second to look at the key enabler technologies in Big Data.
    MapReduce
      • Originally designed and first developed at Google as part of their efforts to index the web more efficiently
      • MapReduce splits input data into smaller chunks that can be processed in parallel
      • Scales linearly with the number of nodes
    Hadoop
      • Open source implementation of MapReduce, based on Google’s whitepaper. Largely developed at Yahoo, now a top-level project in the Apache Foundation
      • Runs on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage)
      • Rather straightforward to install and administer
      • Large ecosystem of additional open source components: Pig, Hive, Oozie, Flume
      • Large ecosystem of commercial offerings (both closed and open source)
  • #12 Big Data Analytics
    Technology
      • Multiple tools and technologies, sometimes for the same purpose (Hadoop, NoSQL databases, in-memory analytics)
      • Time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies
      • Traditional data warehousing processes are too slow and limited in scalability
      • Ability to converge data from multiple data sources, both structured and unstructured
      • Decreased time to information
    Skills
      • There’s only so much we can do with exploratory processes; the only ways to effectively analyze big data require mathematical and statistical concepts with which more traditional analysts are not familiar
      • Business analysts used to be able to manage with Excel and basic SQL knowledge; now, with data that does not follow any particular model (it’s unstructured after all), there is a need to look for analysts who are comfortable with statistical and mathematical concepts and who are able to devise their own models to find patterns and insights where there apparently were none
    Processes & Organization
      • Data must be open and shared across the enterprise, supported by organizations that “own” it
      • Data must be made available across the enterprise (i.e. we can’t find trends in data that we do not have)
  • #13 Source: “Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO”, IDC 2012 (http://www.sas.com/resources/asset/BigDataAnalytics-FutureArchitectures-Skills-RomapsfortheCIO.pdf). The three Vs: Velocity, Volume and Variety. Everything will be analyzed, but how much do we have, how soon do we need it and how fast can we do it?
  • #14 MapReduce and Hadoop are currently seen as a low-level paradigm on top of which high-level tools must be built that are more intuitive and easier to use for non-programmer types (business analysts, data scientists).
    Big Data technologies have not reached maturity yet and will continue to evolve over the coming years. IT decision makers must still be realistic about the limits of what can be achieved via these technologies, sometimes waiting instead for the next generation of data technologies.
    There is also a lot of start-up activity happening (Scalar, MapR). Also, “traditional” large vendors do not want to be left behind: Microsoft SQL Server 2012 will be able to read and write data from Hadoop and HDFS or run Hadoop on Microsoft’s Azure PaaS, IBM has a version of InfoSphere BigInsights ready to be run on their SmartCloud solution, and Oracle has recently introduced its own appliance, a combined software and hardware solution with Hadoop and in-memory capabilities for handling large amounts of data.
  • #15 Big Data Analytics is an augmentation to existing analytical infrastructure that will allow us to scale and drive insights beyond “current capabilities”. So the question becomes: how do we add these capabilities to interoperate with traditional tools?
  • #17 The worlds of structured and unstructured data are rapidly converging. Architects and CIOs must find ways to manage this convergence and enable all forms of data management to coexist, sometimes using bridge technologies, such as using Hadoop to process and import data into traditional systems in ways that wouldn’t be possible with just the RDBMS approach.
    “Hybrid” landscapes are just that, where Hadoop is integrated with existing data warehouses, traditional relational databases and applications in a way that the impact on the enterprise is minimized.
    The reality is that the EDW is evolving into a virtualized cloud ecosystem in which all of these database architectures can and will coexist in a pluggable “Big Data” storage layer alongside HDFS, HBase (Hadoop’s columnar database), Cassandra (a sibling Apache project that supports peer-to-peer persistence for complex event processing and other real-time applications), graph databases, and other “NoSQL” platforms behind an abstraction layer with MapReduce as its focus.
    Big Data is not necessarily about its “bigness.” Very few organizations are going to need the type of scale that often makes the Big Data headlines. So, far from rendering the relational database obsolete, the new advances will be incorporated over time into the traditional databases, extending their performance.
    Adding Hadoop to the enterprise provides a cost-effective place to store vast quantities of structured data from operational systems and combine it with both internal and externally sourced unstructured / semi-structured data. Also, advanced MapReduce analytical methods can be used directly against that store, or through Hive / HBase more traditional BI tools can be used to analyze the data.
  • #18 We’ve seen the tools, but who is going to build, run and maintain all this?
    Technology skills
      • The emergence of big data is based on new technologies that require either training or sourcing additional expertise
    Data science
      • Traditional analytical models do not generally scale well to the typical “big data-like” volumes; new ways of thinking are needed, ways that help find what we wanted to find as well as what we did not know we could find
      • Data scientists are the next generation of business analysts, with strong statistical skills and able to think “outside of the box”, looking for new analytical models
  • #19 Agile software development methodologies are one of the potential answers to this. A data strategy is required, but with an approach that is about modeling less and iterating more (just like agile).
  • #20 Big Data and analytics:
      • Require new tools and technology
      • Big Data doesn’t always get it right, with or without analytics (wacky iTunes and Spotify recommendations, weird LinkedIn suggestions)
      • Require new skills in your workforce
      • Resistance is futile – Big Data and analytics are inescapable
      • They create business value for the bottom line
      • It is the path to competitive advantage
      • Big Data is not only transforming IT, it is also transforming businesses and industries: retail recommendations, smart meter/grid analytics
  • #21 How do we get started with all this?
      • Identify which business processes could benefit the most from improved handling and processing of large amounts of data – what are the business decisions that we make each day and that we’d like to make more efficiently and more effectively?
      • Productize data across the company, make it a “first class citizen” and provide some kind of data service layer so that data is accessible throughout the enterprise
      • Identify the skill and technology gaps and decide whether to grow or acquire new talent and technology for the company (with or without the cloud)
      • It is clear that this requires an investment; it is the path forward, but it requires that you as decision-makers make a commitment to grow big data in your company
  • #22 Source: http://www.accenture.com/us-en/technology/technology-labs/Pages/insight-accenture-technology-vision-2012.aspx (http://bit.ly/accenturetechvision2012 and http://bit.ly/accenturetechnologyvision2012)