Information Management in the Age of Big Data

Mark Burnard
EMC Greenplum
March 2012
mark.burnard@emc.com


© Copyright 2011 EMC Corporation. All rights reserved.                          1
So what is “Big Data”?

Big Data is
  • massive new data volumes
  • and new data types
  • generated by many new devices

Volume, Variety, Velocity
Volume




Meter Data is Growing Exponentially

[Chart: reads per year by meter-reading frequency. Bar values: 12; 365 (30x); 1,460 (120x); 8,760 (700x); 35,040 (3,000x)]
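The multipliers on the chart follow from the reading interval. A minimal sketch of the arithmetic, assuming a 365-day year and assuming the five bars correspond to monthly, daily, six-hourly, hourly and 15-minute reads (the interval names are inferred from the values 12, 365, 1460, 8760 and 35040; they are not stated on the slide):

```python
# Reads per meter per year for a given reading frequency.
# The interval names are assumptions inferred from the chart values.
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

READS_PER_YEAR = {
    "monthly": 12,
    "daily": 365,
    "every 6 hours": HOURS_PER_YEAR // 6,     # 1460
    "hourly": HOURS_PER_YEAR,                 # 8760
    "every 15 minutes": HOURS_PER_YEAR * 4,   # 35040
}


def growth_vs_monthly(reads_per_year):
    """Return how many times more reads there are than on a monthly schedule."""
    return reads_per_year / READS_PER_YEAR["monthly"]


for name, reads in READS_PER_YEAR.items():
    print(f"{name:>16}: {reads:>6} reads/year, {growth_vs_monthly(reads):.0f}x monthly")
```

The exact ratios are not round numbers (for example 35040 / 12 = 2920), so the chart labels 30x, 120x, 700x and 3,000x read as rounded figures.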




Big Data use case: Smart Meter data (Consumer view)




Big Data use case: Smart Meter data (Utility view)




Variety




Velocity



Use Cases for Big Data Analytics
from Ralph Kimball

“…systems to support big data analytics have to look very different than the classic relational database systems from the 1980s and 1990s. The original RDBMSs were not built to handle any of these requirements!”
- Ralph Kimball

Source: Kimball, Ralph, “The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics”




Traditional Data Warehousing (and Business Intelligence and Business Analytics)

• it’s expensive
• enhancements and projects take too long
• it drives people to create their own “data fiefdoms”


The challenges of Data Warehousing… are now exacerbated by the era of Big Data
Big Data will revolutionise Data Warehousing and analytics.

New Realities… New Demands!
• Do it faster
  – Volume: ingest more data
  – Velocity: ingest it faster
• Manage new data types
  – Variety: manage and allow queries across structured, semi-structured and unstructured data
• Be more flexible
  – Unpredictable queries, rapidly evolving bespoke analytics
  – New tools: Hadoop, MapReduce, Hive, HBase, “R”
• Do it at a lower cost
  – And keep it unsummarised, and keep it for longer




Information Management Strategy for Big Data

• People & Skillsets
  – Current State: Assessment of current organisational structure and capabilities vs requirements of the future state
  – Target State: Required skillsets and organisational structure to support future state
  – Transition Plan: Resource gap, training plan and insource/outsource/supplement model
• Processes & Methodology
  – Current State: Review of current methodologies, processes & governance vs fit for purpose (future state)
  – Target State: Sustainable approach to information management in light of differing levels of governance needed
  – Transition Plan: Incremental approach to implement new processes, methodology and governance
• Information Architecture
  – Current State: Review of requirements and fitness for purpose; map of datamart fiefdoms
  – Target State: Demarcation of subject areas by level of rigour in data mgmt; new data models & data management frameworks
  – Transition Plan: Implementation plan for new platforms, models & frameworks, and absorption of datamart fiefdoms
• Technology Architecture
  – Current State: Review of current platforms & capabilities vs business needs
  – Target State: Required future technology platforms, ecosystem and architecture
  – Transition Plan: Roadmap for implementing target state technologies, prioritised by business benefit



Information: Old School vs New School

Old School: Data Model - centric
• Driven by the Enterprise Data Model (Corporate Information Factory)
• Huge effort and expense in transforming, cleansing and matching data (conformed dimensions etc)
• Big challenges and expenses in managing metadata, data lineage, MDM integration
• Data loads from multiple systems must be coordinated and inter-dependencies managed in the ETL scheduling tool and framework
• Structured data
• Often forced to work with subsets of data, or forced to summarise data older than 'n' days/months/years

New School: Business - centric
• Driven by business need to turn data into information, and by Business-led projects (long- and short-term)
• Little or no transformation - business logic is pushed out to the business. (eg the "Transformationless Warehouse", or "Data Vault")
• Simple data lineage, reduced need for metadata management. Master Data is just another data source.
• Different data sources can update the UAP at different intervals, from trickle-feed to hourly/nightly/weekly/monthly/ad-hoc, as long as the users know when the last refresh occurred. Some datasets are "pointers" to external data sources - no replication.
• Structured, semi-structured and unstructured data
• Platform handles analytics on full datasets, unsummarised -> much richer insights. (Wired Magazine: "The End of Science")




Technology: Old School vs New School

Old School: Constrained by Technology
• High cost of space and performance means access/use is rationalised/restricted
• Adding new data sources or developing new data marts / subject areas typically takes months
• Architecture is usually "scale up" - requires expensive offload-copy-restore when increasing capacity
• Dev, Test and DR environments require their own servers, maintenance etc
• Processing is in ETL servers, in database, and in BI application servers.
• Many orphan data marts on PCs, laptops, servers

New School: Empowered by Technology
• Low cost of space and performance means teams can cycle queries and investigations much faster -> different way of working: more cycles -> more accurate results
• Adding a new data source to the platform takes minutes, and the logic to integrate the data source is applied by the business / analyst
• Architecture is "scale out" - add capacity without down time. Possible to use "hybrid cloud" model to add capacity on demand during peak periods.
• Dev, Test and DR can be virtual machines, provisioned and scaled on demand.
• Processing is almost entirely in database. Data movement is minimised.
• Need for user-created marts is met on the Unified Analytics Platform. Safety with flexibility.




Processes: Old School vs New School

Old School: IT-centric and Control-heavy
• Safety is in IT control.
• Precision needed - must reconcile and must be exact. Gold standard applies to all data in the enterprise data model
• Enforce simplicity - hide complexity from the business (dumb it down; drag and drop from a restricted semantic layer)
• Emphasis on process - fill out the form, submit the request
• Information supports "rear view mirror" reporting on the past
• Analysts react to difficulty accessing data by creating copies of data in "off the radar" databases; logic applied is unauditable

New School: Trust and enablement
• Safety is in knowledge management, collaboration and peer review
• Approximate results may be acceptable (depending on the business use case)
• Expose complexity; trust the team. Build and iterate reports from whatever data sources you need (and are authorised to access)
• Emphasis on Self service
• Information enables forward-looking insights -> supports innovation centres and business process re-engineering or tweaking
• Analytical sandpits are supported on the UAP - logic applied can be peer reviewed in the platform




People: Old School vs New School

Old School: Information consumption (fixed reporting)
• Focus is on standard reports for directors and managers (analysts get the leftovers)
• Business doesn't trust the warehouse (logic applied in transformations is opaque)
• Single platform, single RDBMS, with many "off the radar" data marts
• LOBs working in silos
• Tightly controlled data dictionaries and metadata management to preserve the Single Source of Truth
• "Power user" floats around training and troubleshooting

New School: Information-led Innovation (flexible exploring)
• Reporting is so BAU it is not the focus; analysts empowered to get creative and add much more value.
• Business has control of the logic and transformations (if you don’t trust it… fix it yourself - you built it!)
• Multiple data types and repositories (RDBMS, Hadoop, text, logs) - must be accessible via an overlying single interface/platform (UAP)
• LOBs can collaborate using web 2.0/KM tools built into the UAP
• Wiki-style approach for a “data asset registry” allows collaborative and agile metadata management
• Data Scientist floats around educating and empowering




[Diagram: Old School vs New School. Labels: Analytic Engines; Agile Process & Tools; Analytic Productivity Platform; Technology & Information; People & Processes]




Unified Analytics Platform - Customer Example: T-Mobile

Greenplum Database + EDC Chorus
100 TB Enterprise DW
1 Petabyte Analytic DW

Customer Challenge:
     – 100TB EDW focused on operational reporting and financial consolidation
     – EDW is single source of truth, under heavy governance and control
     – Unable to support all of the critical initiatives around data surrounding the business
     – Customer loyalty and churn the #1 business initiative from the CEO on down

Greenplum Database + Chorus:
     – Extracted data from EDW and other source systems to quickly assemble new analytic mart
     – Generated a social graph from call detail records and subscriber data
     – Within 2 weeks uncovered behavior where “connected” subscribers were 7X more likely to churn than average user
     – Deployed 1PB production EDC with GP to power their analytic initiatives


T-Mobile Churn Analysis
• Extracted data from EDW and
  other source systems into new
  analytic sandbox
• Generated a social graph from call
  detail records and subscriber data
• Within 2 weeks uncovered
  behavior where “connected”
  subscribers were seven times
  more likely to churn than average
  user
• T-Mobile valued this insight at
  $70 million (for a $1 million
  investment in Greenplum).
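
The shape of this kind of analysis can be sketched in a few lines. This is an illustration only, not T-Mobile's or Greenplum's actual method: the record layout, the subscriber IDs, the toy data and the definition of "connected" (having exchanged a call with a subscriber who has already churned) are all assumptions made for the example.

```python
from collections import defaultdict


def build_call_graph(call_records):
    """Build an undirected graph: subscriber -> set of subscribers they exchanged calls with."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def churn_rates(graph, earlier_churners, later_churners):
    """Compare later churn among subscribers connected to an earlier churner vs everyone else."""
    connected, others = [], []
    for subscriber, neighbours in graph.items():
        if subscriber in earlier_churners:
            continue  # already left, so no longer at risk
        if neighbours & earlier_churners:
            connected.append(subscriber)
        else:
            others.append(subscriber)

    def rate(group):
        return sum(s in later_churners for s in group) / len(group) if group else 0.0

    return rate(connected), rate(others)


# Hypothetical call detail records: (caller, callee) pairs.
calls = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "g"), ("h", "i")]
graph = build_call_graph(calls)

# Hypothetical churn labels: "a" churned first; "b" and "e" churned later.
connected_rate, other_rate = churn_rates(graph, {"a"}, {"b", "e"})
print(f"connected: {connected_rate:.2f}, others: {other_rate:.2f}")
```

On real data the comparison would need a far larger sample and a carefully chosen observation window; the toy numbers here say nothing about the 7X figure reported on the slide.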




Information Management in the age of Big Data

• People & Skillsets: from Information consumption to Information-led Innovation
• Processes & Methodology: from IT-centric and control-heavy to Business-centric; empowerment and trust
• Information Architecture: from Data Model - centric to Business needs - centric
• Technology Architecture: from Constrained by Technology to Empowered by Technology



Traffic Network Modelling




Parallel Model Learning
• Solving tens of thousands of statistical modelling problems, one
  for each road in the city, in parallel:
     SELECT origin, dest,
            madlib.linregr(travel_time,
                           array[peak_period(entry_time), …
                                 origin_vol, dest_vol])
     FROM route_travel_info
     GROUP BY origin, dest;


• A model: t(x) = 466 + 7.72 peakPeriod(x) + 22.5 workDay(x) + 0.378 originVol(x) + 0.691 destVol(x)
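
The same "one model per group" idea can be sketched outside the database. The snippet below is a simplified stand-in, not the MADlib implementation: it fits a single-predictor least-squares line per (origin, dest) pair rather than the multi-feature regression in the query, and the sample rows are invented. The second function evaluates the example model above, assuming peakPeriod and workDay are 0/1 indicators (the slide does not say so explicitly).

```python
from collections import defaultdict


def simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_route(rows):
    """Fit travel_time ~ origin_vol separately for each (origin, dest) pair."""
    groups = defaultdict(lambda: ([], []))
    for origin, dest, origin_vol, travel_time in rows:
        xs, ys = groups[(origin, dest)]
        xs.append(origin_vol)
        ys.append(travel_time)
    return {route: simple_ols(xs, ys) for route, (xs, ys) in groups.items()}


def predict_travel_time(peak_period, work_day, origin_vol, dest_vol):
    """Evaluate the example model from the slide for one observation."""
    return (466 + 7.72 * peak_period + 22.5 * work_day
            + 0.378 * origin_vol + 0.691 * dest_vol)


# Invented sample rows: (origin, dest, origin_vol, travel_time).
rows = [
    ("A", "B", 100, 500.0), ("A", "B", 200, 540.0), ("A", "B", 300, 580.0),
    ("B", "C", 50, 300.0), ("B", "C", 150, 320.0),
]
models = fit_per_route(rows)
for route, (slope, intercept) in models.items():
    print(route, round(slope, 3), round(intercept, 1))

print(round(predict_travel_time(1, 1, 100, 100), 2))
```

In the database version each group's regression runs where the data lives, which is what allows tens of thousands of fits to proceed in parallel; the loop here is sequential and only shows the grouping logic.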



Applications for a Traffic Network Model
• Compute the shortest path between any two nodes at a
  future time point
• Identify potential bottlenecks in the traffic network through
  betweenness centrality scores
• Identify phase transition points for massive traffic congestion
  using simulation techniques
• Study the likely impact of new roads and traffic policies like
  the proposed 40 km/hr speed limit in Sydney CBD
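
As a rough illustration of the first application, the sketch below runs Dijkstra's algorithm over a small road graph whose edge weights are predicted travel times. It assumes the predictions have already been computed for the chosen future time point and stay fixed during the trip, which is a simplification; the node names and weights are invented.

```python
import heapq


def shortest_travel_time(graph, source, target):
    """Dijkstra's algorithm over predicted travel times (all weights must be non-negative)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route to this node was already found
        for neighbour, travel_time in graph.get(node, {}).items():
            new_cost = cost + travel_time
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # target is unreachable from source


# Invented directed road graph: node -> {neighbour: predicted travel time}.
predicted = {
    "A": {"B": 600.0, "C": 200.0},
    "C": {"B": 250.0},
    "B": {},
}
print(shortest_travel_time(predicted, "A", "B"))
```

A fully time-dependent version would re-evaluate each road's model at the time the vehicle is expected to enter it, rather than using one snapshot of predictions.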




The Big Data writing is on the wall…

The Data Warehousing Institute (TDWI):

• 50% of TDWI survey respondents will replace their DW platform in the next 3 years because:
  – Poor query response: 45%
  – Can’t support advanced analytics: 40%
  – Inadequate data load speed: 39%
  – Can’t scale up to large data volumes: 37%
  – Cost of scaling up is too expensive: 33%
  – Poorly suited to real-time or on-demand workloads: 29%

Chart callouts: Cannot do advanced analysis; Cannot handle big data volumes

Source: TDWI Next Gen Database Study, 2010




The Greenplum Unified Analytics Platform

• People: Data Science Team (Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User); Data Platform Admin
• Processes: Greenplum Chorus - Analytic Productivity Layer; 3rd Party/Partner Tools & Services
• Information: Data Access & Query Layer; Greenplum Database; Greenplum Hadoop
• Technology: Private/Hybrid Cloud Infrastructure or Appliance



Information Management in the Age of Big Data

Thank you




© Copyright 2011 EMC Corporation. All rights reserved.               36

More Related Content

What's hot

US Top 11 Cities for Tech Jobs_SA Tech
US Top 11 Cities for Tech Jobs_SA TechUS Top 11 Cities for Tech Jobs_SA Tech
US Top 11 Cities for Tech Jobs_SA TechPratheesh Nair
 
Top 11 cities for tech jobs 2013
Top 11 cities for tech jobs 2013Top 11 cities for tech jobs 2013
Top 11 cities for tech jobs 2013Andy Smith
 
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...TAUS - The Language Data Network
 
Track 1, Session 2, Flash by Amit Sharma
Track 1, Session 2, Flash by Amit SharmaTrack 1, Session 2, Flash by Amit Sharma
Track 1, Session 2, Flash by Amit SharmaEMC Forum India
 
H7149 Final Essays From Creative
H7149 Final Essays From CreativeH7149 Final Essays From Creative
H7149 Final Essays From Creativefionaoriordan
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEdward Curry
 
Putting IBM Watson to Work.. Saxena
Putting IBM Watson to Work.. SaxenaPutting IBM Watson to Work.. Saxena
Putting IBM Watson to Work.. SaxenaManoj Saxena
 
Din mobile framtid, john strand
Din mobile framtid, john strandDin mobile framtid, john strand
Din mobile framtid, john strandErgoGroup
 
03 2010 Online Buyer 101 Webinar
03 2010 Online Buyer 101 Webinar03 2010 Online Buyer 101 Webinar
03 2010 Online Buyer 101 WebinarBob Chaput
 
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI ConvergenceTDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI ConvergenceFitzgerald Analytics, Inc.
 
Cloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data CenterCloud Computing through FCAPS Managed Services in a Virtualized Data Center
Cloud Computing through FCAPS Managed Services in a Virtualized Data Centervsarathy
 
HP Storage Works -Clemes Esser
HP Storage Works -Clemes EsserHP Storage Works -Clemes Esser
HP Storage Works -Clemes EsserHPDutchWorld
 
March 18 _2013_fed_ramp_agency_compliance_and_implementation_workshop.final
March 18 _2013_fed_ramp_agency_compliance_and_implementation_workshop.finalMarch 18 _2013_fed_ramp_agency_compliance_and_implementation_workshop.final
March 18 _2013_fed_ramp_agency_compliance_and_implementation_workshop.finalTuan Phan
 
Rob anderson
Rob andersonRob anderson
Rob andersonEduserv
 

What's hot (17)

US Top 11 Cities for Tech Jobs_SA Tech
Information Management in the Age of Big Data

  • 1. Information Management in the Age of Big Data Mark Burnard EMC Greenplum March 2012 mark.burnard@emc.com © Copyright 2011 EMC Corporation. All rights reserved. 1
  • 2. So what is “Big Data”? Big Data is:
      • massive new data volumes
      • and new data types
      • generated by many new devices
      [Diagram labels: Volume, Variety, Velocity]
  • 3. Volume
  • 5. Meter Data is Growing Exponentially [Chart: reads per year by meter-reading frequency. Reads per year: 12; 365 (30x); 1,460 (120x); 8,760 (700x); 35,040 (3,000x)]
  • 7. Big Data use case: Smart Meter data (Consumer view)
  • 8. Big Data use case: Smart Meter data (Utility view)
  • 9. Variety
  • 11. Velocity
  • 13. Use Cases for Big Data Analytics from Ralph Kimball: “…systems to support big data analytics have to look very different than the classic relational database systems from the 1980s and 1990s. The original RDBMSs were not built to handle any of these requirements!” Source: Kimball, Ralph, “The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics”
  • 14. Traditional Data Warehousing (and Business Intelligence and Business Analytics)
      • it’s expensive
      • enhancements and projects take too long
      • it drives people to create their own “data fiefdoms”
  • 15. The challenges of Data Warehousing… are now exacerbated by the era of Big Data
  • 16. Big Data will revolutionise Data Warehousing and analytics. New Realities… New Demands!
      • Do it faster
          – Volume: ingest more data
          – Velocity: ingest it faster
      • Manage new data types
          – Variety: manage and allow queries across structured, semi-structured and unstructured data
      • Be more flexible
          – Unpredictable queries, rapidly evolving bespoke analytics
          – New tools: Hadoop, MapReduce, Hive, HBase, “R”
      • Do it at a lower cost
          – And keep it unsummarised, and keep it for longer
  • 17. Information Management Strategy for Big Data
      • People & Skillsets
          – Current State: Assessment of current organisational structure and capabilities vs requirements of the future state
          – Target State: Required skillsets and organisational structure to support future state
          – Transition Plan: Resource gap, training plan and insource/outsource/supplement model
      • Processes & Methodology
          – Current State: Review of current methodologies, processes & governance vs fit for purpose (future state)
          – Target State: Sustainable approach to information management in light of differing levels of governance needed
          – Transition Plan: Incremental approach to implement new processes, methodology and governance
      • Information Architecture
          – Current State: Review of requirements and fitness for purpose; map of datamart fiefdoms
          – Target State: Demarcation of subject areas by level of rigour in data mgmt; new data models & data management frameworks
          – Transition Plan: Implementation plan for new platforms, models & frameworks, and absorption of datamart fiefdoms
      • Technology Architecture
          – Current State: Review of current platforms & capabilities vs business needs
          – Target State: Required future technology platforms, ecosystem and architecture
          – Transition Plan: Roadmap for implementing target state technologies, prioritised by business benefit
  • 18. Information: Old School vs New School
      Old School: Data Model - centric
      • Driven by the Enterprise Data Model (Corporate Information Factory)
      • Huge effort and expense in transforming, cleansing and matching data (conformed dimensions etc)
      • Big challenges and expenses in managing metadata, data lineage, MDM integration
      • Data loads from multiple systems must be coordinated and inter-dependencies managed in the ETL scheduling tool and framework
      • Structured data
      • Often forced to work with subsets of data, or forced to summarise data older than 'n' days/months/years
      New School: Business - centric
      • Driven by business need to turn data into information, and by Business-led projects (long- and short-term)
      • Little or no transformation - business logic is pushed out to the business (eg the "Transformationless Warehouse", or "Data Vault")
      • Simple data lineage, reduced need for metadata management. Master Data is just another data source.
      • Different data sources can update the UAP at different intervals, from trickle-feed to hourly/nightly/weekly/monthly/ad-hoc, as long as the users know when the last refresh occurred. Some datasets are "pointers" to external data sources - no replication.
      • Structured, semi-structured and unstructured data
      • Platform handles analytics on full datasets, unsummarised -> much richer insights. (Wired Magazine: "The End of Science")
  • 19. Technology: Old School vs New School
      Old School: Constrained by Technology
      • High cost of space and performance means access/use is rationalised/restricted
      • Adding new data sources or developing new data marts / subject areas typically takes months
      • Architecture is usually "scale up" - requires expensive offload-copy-restore when increasing capacity
      • Dev, Test and DR environments require their own servers, maintenance etc
      • Processing is in ETL servers, in database, and in BI application servers.
      • Many orphan data marts on PCs, laptops, servers
      New School: Empowered by Technology
      • Low cost of space and performance means teams can cycle queries and investigations much faster -> different way of working: more cycles -> more accurate results
      • Adding a new data source to the platform takes minutes, and the logic to integrate the data source is applied by the business / analyst
      • Architecture is "scale out" - add capacity without down time. Possible to use "hybrid cloud" model to add capacity on demand during peak periods.
      • Dev, Test and DR can be virtual machines, provisioned and scaled on demand.
      • Processing is almost entirely in database. Data movement is minimised.
      • Need for user-created marts is met on the Unified Analytics Platform. Safety with flexibility.
  • 20. Processes: Old School vs New School
      Old School: IT-centric and Control-heavy
      • Safety is in IT control.
      • Precision needed - must reconcile and must be exact. Gold standard applies to all data in the enterprise data model
      • Enforce simplicity - hide complexity from the business (dumb it down; drag and drop from a restricted semantic layer)
      • Emphasis on process - fill out the form, submit the request
      • Information supports "rear view mirror" reporting on the past
      • Analysts react to difficulty accessing data by creating copies of data in "off the radar" databases; logic applied is unauditable
      New School: Trust and enablement
      • Safety is in knowledge management, collaboration and peer review
      • Approximate results may be acceptable (depending on the business use case)
      • Expose complexity; trust the team. Build and iterate reports from whatever data sources you need (and are authorised to access)
      • Emphasis on Self service
      • Information enables forward-looking insights -> supports innovation centres and business process re-engineering or tweaking
      • Analytical sandpits are supported on the UAP - logic applied can be peer reviewed in the platform
  • 21. People: Old School vs New School
      Old School: Information consumption (fixed reporting)
      • Focus is on standard reports for directors and managers (analysts get the leftovers)
      • Business doesn't trust the warehouse (logic applied in transformations is opaque)
      • Single platform, single RDBMS, with many "off the radar" data marts
      • LOBs working in silos
      • Tightly controlled data dictionaries and metadata management to preserve the Single Source of Truth
      • "Power user" floats around training and troubleshooting
      New School: Information-led Innovation (flexible exploring)
      • Reporting is so BAU it is not the focus; analysts empowered to get creative and add much more value.
      • Business has control of the logic and transformations (if you don’t trust it… fix it yourself - you built it!)
      • Multiple data types and repositories (RDBMS, Hadoop, text, logs) - must be accessible via an overlying single interface/platform (UAP)
      • LOBs can collaborate using web 2.0/KM tools built into the UAP
      • Wiki-style approach for a “data asset registry” allows collaborative and agile metadata management
      • Data Scientist floats around educating and empowering
  • 23. Old School vs New School [Diagram labels: Agile Process & Tools; Analytics Engines; Analytic Engines; Analytic Productivity Platform; Technology & Information; People & Processes]
  • 24. Unified Analytics Platform - Customer Example: T-Mobile [Diagram labels: 100TB Enterprise DW; 1 Petabyte Analytic DW; EDC; Greenplum Database + Chorus]
      Customer Challenge:
          – 100TB EDW focused on operational reporting and financial consolidation
          – EDW is single source of truth, under heavy governance and control
          – Unable to support all of the critical initiatives around data surrounding the business
          – Customer loyalty and churn the #1 business initiative from the CEO on down
      Greenplum Database + Chorus:
          – Extracted data from EDW and other source systems to quickly assemble new analytic mart
          – Generated a social graph from call detail records and subscriber data
          – Within 2 weeks uncovered behavior where “connected” subscribers were 7X more likely to churn than the average user
          – Deployed 1PB production EDC with GP to power their analytic initiatives
  • 25. T-Mobile Churn Analysis • Extracted data from EDW and other source systems into new analytic sandbox • Generated a social graph from call detail records and subscriber data • Within 2 weeks uncovered behavior where “connected” subscribers were seven times more likely to churn than average user • T-Mobile valued this insight at $70 million (for a $1 million investment in Greenplum).
  • 26. Information Management in the age of Big Data
      People & Skillsets: from Information consumption to Information-led Innovation
      Processes & Methodology: from IT-centric and control-heavy to Business-centric; empowerment and trust
      Information Architecture: from Data Model-centric to Business needs-centric
      Technology Architecture: from Constrained by Technology to Empowered by Technology
  • 29. Traffic Network Modelling
  • 30. Parallel Model Learning
      • Solving tens of thousands of statistical modelling problems, one for each road in the city, in parallel:
          SELECT origin, dest,
                 madlib.linregr(travel_time, array[peak_period(entry_time), … origin_vol, dest_vol])
          FROM route_travel_info
          GROUP BY origin, dest;
      • A model: t(x) = 466 + 7.72 peakPeriod(x) + 22.5 workDay(x) + 0.378 originVol(x) + 0.691 destVol(x)
  • 31. Applications for a Traffic Network Model • Compute the shortest path between any two nodes at a future time point • Identify potential bottlenecks in the traffic network through betweenness centrality scores • Identify phase transition points for massive traffic congestion using simulation techniques • Study the likely impact of new roads and traffic policies like the proposed 40 km/hr speed limit in Sydney CBD
  • 32. the Big Data writing is on the wall… The Data Warehousing Institute (TDWI):
      • 50% of TDWI survey respondents will replace their DW platform in the next 3 years because:
          Poor query response: 45%
          Can’t support advanced analytics: 40%
          Inadequate data load speed: 39%
          Can’t scale up to large data volumes: 37%
          Cost of scaling up is too expensive: 33%
          Poorly suited to real-time or on-demand workloads: 29%
      Callouts: “Cannot do advanced analysis”; “Cannot handle big data volumes”
      Source: TDWI Next Gen Database Study, 2010
  • 34. The Greenplum Unified Analytics Platform
      Side labels: People; Processes; Information; Technology
      Stack, top to bottom:
      – DATA SCIENCE TEAM: Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User, Data Platform Admin
      – Greenplum Chorus - Analytic Productivity Layer
      – 3rd Party/Partner Tools & Services
      – Data Access & Query Layer
      – Greenplum Database; Greenplum Hadoop
      – Private/Hybrid Cloud Infrastructure or Appliance
  • 36. Information Management in the Age of Big Data Thank you
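
A rough Python sketch of the idea on slide 30 (Parallel Model Learning), for readers without a Greenplum/MADlib environment. It is not MADlib and not the presenter's code: it is a plain ordinary-least-squares stand-in that fits one linear model per (origin, dest) pair, which is what the slide's GROUP BY does for each road. The record layout, the function names, and the feature order (peakPeriod, workDay, originVol, destVol, taken from the model printed on the slide) are assumptions for illustration; the features elided by the "…" in the slide's query are not reconstructed here.

```python
from collections import defaultdict


def solve_linear_system(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few rows")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(samples):
    """Least-squares fit with an intercept; samples are (features, target) pairs."""
    rows = [[1.0, *features] for features, _ in samples]
    targets = [target for _, target in samples]
    k = len(rows[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_per_route(records):
    """Fit one model per (origin, dest), mirroring GROUP BY origin, dest.

    records: iterable of (origin, dest, features, travel_time) -- hypothetical layout.
    """
    groups = defaultdict(list)
    for origin, dest, features, travel_time in records:
        groups[(origin, dest)].append((features, travel_time))
    return {route: fit_ols(samples) for route, samples in groups.items()}


def predict(coefficients, features):
    """Evaluate t(x) = c0 + c1*x1 + ... + ck*xk for one observation."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], features))


# Coefficients quoted on slide 30: intercept, peakPeriod, workDay, originVol, destVol.
SLIDE_MODEL = [466.0, 7.72, 22.5, 0.378, 0.691]
```

With the slide's coefficients, `predict(SLIDE_MODEL, [1, 1, 100, 200])` evaluates the model for a peak-period work day with origin volume 100 and destination volume 200. In the database each group's fit runs independently, which is what lets the query be solved in parallel; this sketch fits the groups one after another and only illustrates the grouping.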

Editor's Notes

  1. Option to talk about EMC’s acquisition of Greenplum here. Key points: EMC were the first to start talking about “big data”. If appropriate, tell the story of the term “Big Data”: I remember when Gartner first started publishing articles and papers on what they called “Extreme Data” – this was in the days when everything was “extreme”. There was Extreme Sports (people jumping out of airplanes on surfboards), extreme performance, everything was becoming “extreme”. In IT people were talking about “extreme programming” (which became Agile), and Gartner talked about Extreme Data. Joe Tucci of EMC had already been talking about what he called Big Data, and saw that Gartner were calling the same concept Extreme Data. Joe didn’t like the term Extreme – it didn’t accurately depict what we were talking about. So he took a gamble and stuck with the term Big Data. And eventually Gartner and the analysts stopped talking about Extreme Data and started using the EMC term, Big Data. [This story is useful as it carries the meta-message of EMC as thought leaders.] EMC were the first movers in acquiring a Big Data Analytics platform. We looked at Netezza (runs on proprietary hardware, can’t be virtualised); we looked at Teradata (too big, expensive, too established to be able to pivot their business model); we looked at Vertica (good for some use cases but not for unpredictable ad-hoc queries and deep atomic-level analysis); we looked at AsterData (only addresses part of the ‘big data’ challenge); we looked at a number of others, but there were some key reasons why Greenplum was the standout selection for a serious Big Data Analytics strategy.
  2. Simple definition of Big Data. Some analysts are talking about a 4th attribute called “complexity” – however the complexity is more around the fact that we’re running more complex queries and analytics against these data sets – not really an attribute of the data itself. We’ll be looking at some examples of what each of these three terms means.
  3. Big Data is massive new data volumes. This is a typical Australian electricity bill. How often do I get one of these? (guess) – every 3 months. (Why? Because a person has to physically walk up to the meter and read it.) – very manual. Can only be done a few times a year. Now, the electricity company has a data warehouse which captures all their billing data. They use it to analyse usage patterns across parts of the network (and not much else). Their data warehouse might be 3 terabytes. Not huge. This is a smart meter. The smart meter provides readings directly to the utility via wireless or mobile phone networks – every 5 minutes. So instead of one reading per customer every three months, we can access a record per customer every 5 minutes. So the data has just grown 3000 times. (That’s big data). So suddenly the utility, to retain the same level of customer data, needs 9 petabytes of data warehouse. Most of it becomes exhaust data – but there is tremendous value in this data if you can keep it and analyse it: network load analysis over time; better decisions on where to increase network capacity; real-time alerts when a particular power node is approaching saturation. You can also provide the information back to your customers…
  4. This is the Silver Spring web site, a screenshot taken about 2 weeks ago. Notice the promise here to the consumer – “See your energy use in real time” – you only want to promise that if your database can perform, can handle huge numbers of ad-hoc queries from consumers accessing your website 24 x 7. Silver Spring use Greenplum to capture all the readings from their smart meters into a database, and they make this data available to their customers.
  5. An example of what consumer-facing real-time electricity usage looks like.
  6. But it’s not just for the consumers – the real value for the utility is what they can do with the data themselves: predictive maintenance; usage trends over time down to the suburb and street level; geo-spatial mappings over streets looking at weather-related incidents, maintenance cost anomalies and so on. (I worked with a Utility in Sydney where, using their data warehouse, they were able to identify some motors and pumps that were starting and stopping several times an hour, and others that only cycled once or twice a day or week – so instead of blindly sending around a maintenance crew every 3 months, they could maintain some pumps every month and other pumps only twice a year. Savings of several $M.)
  7. (an example of variety) Here’s the web site of an Australian bank. Every single click on this website is captured in a Greenplum database. The marketing team can then look at how people navigate around the site, what works, what doesn’t work, which ads attract more clicks, and then they can move things around to test how customers respond. Based on how users navigate through the site, and which ads are clicked on, the marketing team manage which ads appear where, looking for the best response. They do this every 24 hours. This example was presented in 2011 to the Australian Institute of Analytics Professionals as an example of real business use for web-click analytics.
  8. (an example of velocity) Every trade on the NYSE is captured in a Greenplum database. 300,000 transactions per second. And then analysts have algorithms running on this data in real time, looking for patterns that suggest fraud, such as insider trading. This analysis requires atomic-level data (no summarisation – every trade) and many months of data to find the patterns they are looking for. The market regulator, FINRA, is also a Greenplum customer. They actually aggregate the trade data from many bourses, including Arca, NYSE Amex, NASDAQ, Euronext, and the International Securities Exchange (ISE). All these sources are aggregated, and FINRA do the analysis across all of them, looking for more sophisticated insider trading and other fraudulent activity that may be hidden across several exchanges.
  9. The purpose of this slide is to establish that it’s not (just) EMC who are saying this about data warehousing needing to change – this comes from Ralph Kimball, one of the fathers of data warehousing. And here are some examples of the kind of analytics that can be run against different types of data, and the kind of insights you can expect to gain. His point is that traditional data warehousing architectures don’t cater for these types of analytics.
  10. This is traditional data warehousing – basically hasn’t changed in 20 years. (“In the past when I was consulting and advising in Information Management Strategy, I used to use this fact to reassure the client – we are treading a well-worn path here – all the mistakes have been made, this is not new, it’s been around for 20 years. However the time has come that this approach does need to be revisited, as it’s not flexible and agile enough to meet the accelerating rates of business change, and increasing data complexity and volumes.”) On the left we have the “source systems” – usually transaction systems e.g. SAP, Oracle apps, Siebel, CRM… (in insurance you would have a policy system – or more than one – and one or more claims systems) (in a bank you would have the core banking transaction system, plus mortgage systems, plus credit card systems, plus margin lending, and so on) - to get a “single view of customer” you have to bring all this data together into one data model. Talk about Bill Inmon, “father of data warehousing” – the Corporate Information Factory – the ideal world in which all information assets across the organisation are brought together into a comprehensive Enterprise Data Model, and then any question about any aspect of the business can be answered by this magic data model. So: (click) We transform the data and conform it all into a Consolidated Data Repository or Integrated Data Model (or Enterprise Data Model). These are very complex data models. Can take years to design and build. Often a company will buy one from IBM or Teradata because it takes too long to design your own, cheaper to get one off the shelf. BUT then you have to integrate all your data sources into the data model – a lot of work. One bank in Singapore spent $10M to buy one of these data models, then had to spend another $14M integrating their data sources into it. (Big banks, insurers and other companies had good success with this approach – 20 years ago.
And the approach has remained pretty much unchanged in 20 years.) But now, the data model is so complex that you can’t have the business people working with it directly – so you create a data mart layer with simpler data models. And then you expose these data marts to the analysts through Business Intelligence tools like Cognos, Business Objects, MicroStrategy, Tableau, Qlikview and so on. But the pace of business has changed – and data warehousing has not kept up. This creates a lot of pain in the world of information management. I think I can summarise the pain of data warehousing in a few points. (Note to presenter: depending on the audience and the time available, the following content has to be filtered.) First, it’s expensive (we know that, but that fact drives certain behaviours that have a detrimental effect on the ability to get value from this asset.) A typical project in data warehousing starts from a few hundred thousand and ranges up to the millions. (We used to joke: When the business comes to us with a question, the answer to the question is always the same: 6 months and half a million dollars. That’s how long it will take to enhance the EDW to answer the question) It’s expensive just in terms of raw hardware costs and licensing – often a multi-million dollar investment to kick off, and then ongoing annual licence fees for support. It’s also expensive to “feed and water” – the resources that are needed to troubleshoot data loads, create or manage partitions and indexes, tune queries and so on. And it’s expensive to develop and enhance the data model – a typical BI project on an Enterprise Data Warehouse starts at a few hundred thousand dollars and 3+ months of design, development and testing – and in a large site can easily be in the millions or tens of millions.
(Cite an example or two – we are talking with a bank at the moment about a risk project in response to new requirements from the regulator – they are already throwing around figures as high as $80 million – and this does not create any new data, it just reports on data they already have.) Because of this cost, the usefulness of the DW is constrained. Often, data is summarised after a certain point because we can’t use too much storage - it’s too expensive (when I worked at a Telco, we would keep two months of atomic level data, and then an end of month process would summarise the third month into a much smaller data set – meaning we lost a huge amount of valuable detail in the call data records. Why did we do this? The DW wasn’t big enough to keep the data, and getting more TB was just too expensive). So we compromise on the quality and depth of the data we can keep and the analysis we can provide, because of this cost factor. Analytics projects often need temporary space on the DW to work with subsets of data – in our case there were always projects competing for space. Marketing campaigns would have to reserve a few hundred GB between specified dates, and if another project ran over its end date, there would be a fight over who gets the space – often came down to raw dollar impact – if I kill the project now to give you the space, the impact is x million dollars; but if marketing don’t run their campaign, they lose y dollars in sunk cost and z million dollars in opportunity cost… and so on. Why couldn’t we just get more space? It’s too expensive and takes too long. Finally, the cost of the DW meant that it was rigidly protected. Access to the data required certain authorisations, sometimes even just to run a query needed certain permissions – it slowed everyone down.
So if an analyst had a great idea about running some scenarios over a market segment, they had to decide whether it was worth going through the effort of getting permission to run the query (that could take half a day), then once they’ve run it they want to tweak it a little here and there, resubmit, tinker a bit, resubmit, and on and on. So the usefulness and value to the business of this great investment was constrained by the fact that it was such a huge investment. Second, enhancements and new projects take far too long. Time to value is just too great. Often by the time a project is completed, the business has moved on and the requirements have changed. One of the reasons for this is the use of rigid, sometimes purchased industry or enterprise data models. New data sources, if needed for a new reporting subject area (whether say bringing Risk data into the Enterprise Financial data model, or bringing a new customer information source into the marketing datamart) need to be mapped into the enterprise data model, then ETL jobs need to be designed, developed and tested, there needs to be impact analysis on existing reports to see if any are affected, and if so there’s another project to remediate those reports so that the business is not disrupted when the new project goes live. All this is not just expensive but takes a long time. Third, the barriers to easy access to the protected DW drive analysts and users to hive off and create their own data sets which they have unrestricted access to. When it’s hard to get access to a dataset, or hard to get some space in the EDW for a sandbox, the user community will preserve copies of the data that matter most to them, in local data stores in Excel or Access, or keep their own data marts in SQL Server or Oracle. These data sub-sets are irregularly refreshed, not subject to quality controls and are not auditable.
Yet they are often used to derive figures that go to the board or to regulatory authorities, or to manage a whole reporting area such as Risk – and the numbers from these data repositories are passed to report assemblers who put these figures into monthly board reports, annual reports or regulatory reports (for example). The whole purpose of the data warehouse was to retire these fragmented data repositories and keep the data centralised where its quality can be assured, and yet the difficulty of getting to the data in the controlled environment drives the user community to bypass those very safeguards. And… all of these problems are about to get worse! (transition to next slide)
  11. (10 minute mark) (so where we were just managing or surviving with the data we already have, we’re about to be deluged with a tsunami of Big Data.)
  12. Big Data will revolutionise DW and analytics. In summary, these are the challenges that traditional data warehousing faces from Big Data and Big Data Analytics.
  13. There’s a lot on this slide and you can’t talk about all of it. The purpose of the slide is to impress the audience that we (EMC) have done a lot of deep and comprehensive thinking on how Information Management is going to change as the result of Big Data. Big Data doesn’t need to be a threat – there is a journey from where we are to where we need to go, there are companies at various stages along this journey, and we are helping many of them. It’s a holistic journey that isn’t just about technology – it’s about all of these (4) layers. Big Data and Big Data Analytics have implications for all of these aspects of an Information Management Strategy. IM Strategy has to address each of these aspects – where we are now, where we believe we need to be, and then how we are going to get there. I want to take some time to look at each of these aspects and highlight what that journey looks like in some detail.
  14. Key points – talk down the first column then the second (not row by row). Read up on the model-less warehouse or transformationless warehouse, and on data vault. This is relatively new and there will be old school people who have their whole careers and credibility built on the old way of doing things. You can’t dismiss that – it’s a both-and, not either-or (see later slides). If it gets confrontational, offer to take it “offline” and talk in person later. The defense is that it’s not EMC saying this, Kimball is saying it, the thought leaders are saying it. Kimball has a paper on this – offer to send it to the audience. If you know the “end of science” story you can tell that as well.
  15. This slide you can talk to row by row.
  16. There are other points that could be added – the slide does not have all of them, e.g.: Tightly controlled data dictionary -> wiki as data asset inventory; Classic waterfall report development processes -> iterative prototypes with frequent review cycles in the UAP itself; Defined / pre-canned reports -> many one-off queries for data exploration.
  17. The scope of the traditional data warehouse will shrink to those subject areas that require and justify the heavy governance that has traditionally been used to manage highly important information that needs to be accurate to the unit or to the cent, e.g. reporting to the market, to the CFO, to regulators, to the board. (There will still need to be data integration, e.g. if a bank has credit card holders who are also mortgage holders, they need to report on total exposure. Or if a telco has pre-paid and post-paid customers, they need to report to regulators on the size of their customer base.) But much of what the data warehouse is used for does not require this rigour. And almost all of what Big Data is used for does not require this rigour. So we are seeing the emergence of a different kind of platform – the Analytical Platform for Big Data.
  18. (This slide teases out the points from the last one.) Don’t spend time on this slide – it’s just a transition to introduce the Unified Analytics Platform. (It’s important here not to go into detail on describing the analytical platform because there are slides following to do that – otherwise the rest of the presentation becomes redundant/repetitive.)
  19. Here’s an example of a customer who used the Analytics Platform for some analysis that couldn’t run on the EDW (as it would have interrupted important BAU reports – can’t run full table scans on the entire call record history – everything else grinds to a halt.)
  20. (Note, this slide needs to go, the story is too old – replace with your favourite big data value story)
  21. This slide is to transition into a Greenplum-specific conversation – much of this change is internal (and we can provide advice and consulting to assist) – but in the technology layer we can provide solutions.
  23. Is BCC using something like that? How are the travel time functions determined?
  24. Travel time function for road connection 10491 to area 10784
  25. Travel time function for road connection 10491 to area 10784
  26. The growth trajectory of data has already surpassed the capability of today’s databases to adequately store and efficiently process it. We are seeing a fault line developing and widening rapidly between the demand and supply for better technology solutions to handle the data explosion. Final pitch – organisations can already see the writing on the wall – this survey was taken in 2011. (TDWI = The Data Warehousing Institute)
  27. So this is what the stack looks like from a high level. There’s the Greenplum database for co-processing structured, semi-structured, and unstructured data with Greenplum Hadoop. These are overlaid with a unified data access and query layer that combines the programming languages of choice (SQL, MapReduce, etc.). Over the access layer comes our partner tool and services layer. We are not about locking customers into a single tool or stack. Instead we work with the tool vendor of your choice, be it SAS or R, MicroStrategy or Informatica. And sitting atop all of those technologies is the Chorus layer, which provides productivity tools to facilitate collaboration between the different stakeholders. What sets this diagram apart from a typical vendor example is the inclusion of people. That is not a mistake. We have introduced the Unified Analytics Platform but there is more to the story than technology and I will talk more about that in a few minutes. UAP is about enabling data analytics practitioners to access and manage datasets and projects much more easily. A typical such team can include the data platform administrator, data scientist, analysts, engineers, BI teams, and most importantly the line of business user and how they participate on this data science team. We develop, package, and support this as a unified software platform available over your favorite commodity hardware, cloud infrastructure, or from our modular Data Computing Appliance.
  28. Key to the success of the new approach to Information Management is the ability to collaborate and share knowledge within the same environment that manages the sand-pits and datasets.
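
A small worked check of the meter-data arithmetic in note 3 and on the "Meter Data is Growing Exponentially" chart, as a hedged sketch only. The function names are hypothetical, a non-leap year is assumed, and the reading of the chart (a baseline of 12 monthly reads, with 35,040 reads per year corresponding to a 15-minute interval) is an inference from the printed figures, not something the deck states.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def reads_per_year(interval_minutes):
    """Number of meter reads per year at a fixed reading interval."""
    return MINUTES_PER_YEAR // interval_minutes


def growth_factor(new_reads, baseline_reads):
    """How many times more reads per year than the baseline."""
    return new_reads / baseline_reads


def scaled_warehouse_tb(current_tb, factor):
    """Warehouse size if storage grows in proportion to read volume."""
    return current_tb * factor


# Chart figures: 8,760 reads/year is hourly, 35,040 reads/year is every 15 minutes.
hourly = reads_per_year(60)          # 8760
quarter_hourly = reads_per_year(15)  # 35040

# Against an assumed baseline of 12 monthly reads, 35,040 reads is about 3,000x.
chart_growth = growth_factor(quarter_hourly, 12)  # 2920.0

# Note 3's spoken example (5-minute reads vs one read every 3 months) is larger still.
spoken_growth = growth_factor(reads_per_year(5), 4)  # 26280.0

# Note 3's storage claim: a 3 TB warehouse scaled 3,000x is 9,000 TB, i.e. 9 PB.
warehouse_pb = scaled_warehouse_tb(3, 3000) / 1000  # 9.0
```

On these assumptions the chart's "3,000x" and the note's "9 petabytes" agree with each other, while a literal 5-minute interval against quarterly manual reads would give a factor of roughly 26,000 rather than 3,000. Which interval the presenter intended is not recoverable from the transcript.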