Get Ready for Big Data
               Wednesday November 30, 2011
                       2:40 – 4:00


Peter O'Kelly
  Principal Analyst, O'Kelly Associates
Hadley Reynolds
  Managing Director, Next Era Research
Kathleen Reidy
  Senior Analyst, 451 Research
Agenda
•   Big data in context
•   Big structured data
•   Big unstructured data
•   Big opportunities and risks
•   Q&A

Big Data in Context
• What is “big data”?
  – Unhelpfully, both “big data” and “NoSQL,” generally
    considered a key part of the big data wave, are
    defined more in terms of what they’re not than
    what they are
  – A typical big data definition (Wikipedia):
     • “[…] datasets that grow so large that they become
       awkward to work with using on-hand database
       management tools”

Big Data in Context
• With thanks to the Business SOA blog:
  – “[…] describe Big Data in the same way that
    The Hitchhiker’s Guide to the Galaxy described space:
        – ‘Space,’ it says, ‘is big. Really big. You just won't believe how
          vastly, hugely, mindbogglingly big it is. I mean, you may think it's
          a long way down the road to the chemist's, but that's just
          peanuts to space, listen...’”

Big Data in Context
• Why is big data a big deal now?
   – Commodity hardware and the Internet
       • Capability and price/performance curves that continue to defy all
         economic “laws”
       • Also facilitating compelling cloud services
   – Maturation and uptake of open source software, e.g., Hadoop
       • Powerful and often no- or low-cost
   – IT market
       • Enthusiasm for “NoSQL” systems
       • Frustration with incumbent information management vendors
   – Useful new data sources/resources, e.g., social network activity
     graphs, the “Internet of things,” sensor networks…
   – Competitive and compliance imperatives

Big Data in Context
• A big data reality check
   – “Mindbogglingly”-scale information management is not new
      • Consider, e.g., VLDB, multi-billion document repositories, and the
        World Wide Web…
   – What is new and compelling
      • The combination of market dynamics producing new capability and
        price/performance curves
      • Cloud
          – No deep capital investment required to get started
          – Cloud-based information resources
      • Some innovative marketing, suggesting
          – Self-proclaimed next-generation big data systems are magical and
            revolutionary
          – Deployed systems are obsolete and wasteful

A Big-Picture Framework
• A digital information item dichotomy
  – Resources (~unstructured information)
     • Digital artifacts optimized to convey stories
        – Organized in terms of narrative, hierarchy, and sequence
     • Examples: books, magazines, documents (e.g., PDF,
       Word), Web pages, XBRL documents, video, hypertext…
  – Relations (~structured information)
     • Application-independent descriptions of real-world
       things and relationships
     • Examples: business domain databases, e.g., customer,
       sales, HR…
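To make the dichotomy concrete, here is a minimal Python sketch in which the same hypothetical facts appear first as a resource (a short XML memo whose meaning depends on hierarchy and sequence) and then as relations (customer and order tables queried with SQL). It uses only the standard library's xml.etree.ElementTree and sqlite3; the memo text and all element, table, and column names are illustrative assumptions, not part of the framework itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A "resource": a narrative artifact organized by hierarchy and sequence.
# (Hypothetical example content.)
RESOURCE = """
<memo>
  <title>Q3 account review</title>
  <para>Acme Corp placed two orders this quarter.</para>
  <para>The larger order, for 1200 units, shipped late.</para>
</memo>
"""

def resource_paragraphs(xml_text: str) -> list[str]:
    """Return the memo's paragraphs in document order (sequence matters here)."""
    root = ET.fromstring(xml_text.strip())
    return [p.text for p in root.iter("para")]

def build_relations() -> sqlite3.Connection:
    """Describe the same real-world things as application-independent relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE sales_order (
            order_id     INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            units        INTEGER NOT NULL,
            shipped_late INTEGER NOT NULL  -- 0 = on time, 1 = late
        );
        INSERT INTO customer VALUES (1, 'Acme Corp');
        INSERT INTO sales_order VALUES (10, 1, 300, 0), (11, 1, 1200, 1);
    """)
    return conn

if __name__ == "__main__":
    print(resource_paragraphs(RESOURCE))
    conn = build_relations()
    # Relations answer ad hoc questions that the memo's narrative does not anticipate.
    row = conn.execute("""
        SELECT c.name, COUNT(*), SUM(o.units)
        FROM customer c JOIN sales_order o ON o.customer_id = c.customer_id
        GROUP BY c.name
    """).fetchone()
    print(row)  # ('Acme Corp', 2, 1500)
```

The memo is read in order, as a story; the tables carry no narrative at all, but can be recombined by any query.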

A Big-Picture Framework


[Figure: Resource vs. Relation]

A Big-Picture Framework

               Resources                       Relations

Conceptual     Resources and links             Entities, attributes,
                                               relationships, and identifiers

Logical        Model: hypertext                Model: extended relational
               Language: XQuery (ideally)      Language: SQL

Physical       Indexing (e.g., scalar data types, XML, full-text), locking and
               isolation levels, federation, replication, in-memory databases,
               columnar storage, table spaces, caching, and more

Agenda
•   Big data in context
•   Big structured data
•   Big unstructured data
•   Big opportunities and risks
•   Q&A

Big Structured Data
•   NoSQL
•   Hadoop
•   RDBMS reconsidered
•   Back to the bigger picture

NoSQL
• No clear consensus on what “NoSQL” means
  – Started with what it’s against, not what it’s about
     • And often finds a receptive audience due to frustration
       with RDBMS business-as-usual
  – The “NoSQL” meme is a moving target
     • Initially implied “Just say ‘no’ to SQL”
     • Later quietly redefined as “Not Only SQL”
     • What may be next: “New Opportunities for SQL”
        – I.e., some developers may reconsider the value of SQL and
          RDBMSs, after hitting NoSQL limitations

A NoSQL Taxonomy
• From the NoSQL Wikipedia article:

NoSQL Perspectives
• The “NoSQL” meme confusingly conflates
   – Document database requirements
      • Best served by XML DBMS (XDBMS)
   – Physical model decisions on which only DBAs and systems
     architects should focus
      • And which are more complementary than competitive with
        RDBMS/XDBMS
   – Object databases, which have floundered for decades
      • But with which some application developers are nonetheless
        enamored, for minimized “impedance mismatch,” despite
        significant information management compromises
   – Semantic models
      • Also more complementary than competitive with RDBMS/XDBMS

Hadoop
• Hadoop is often considered central to big data
   – Apache Hadoop, which originated as an open source implementation
     of Google’s MapReduce architecture, is a framework for distributed
     processing on networks of commodity hardware
• Commercial application domains include (from Wikipedia)
   –   Log and/or clickstream analysis of various kinds
   –   Marketing analytics
   –   Machine learning and/or sophisticated data mining
   –   Image processing
   –   Processing of XML messages
   –   Web crawling and/or text processing
   –   General archiving, including of relational/tabular data, e.g., for
       compliance
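The MapReduce model behind Hadoop can be illustrated without a cluster. The following is a minimal, single-process Python sketch of the map, shuffle, and reduce phases applied to clickstream-style log analysis (counting hits per URL). It is not Hadoop's API, and the log format (whitespace-separated fields with the URL in the third position) is a hypothetical assumption. In Hadoop, map and reduce tasks run in parallel across many nodes and the framework itself performs the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format: "<timestamp> <user_id> <url>"
def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (url, 1) for each well-formed log line; skip malformed lines."""
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all intermediate values for one key."""
    return key, sum(values)

def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over an iterable of log lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

if __name__ == "__main__":
    log = [
        "2011-11-30T14:40 u1 /home",
        "2011-11-30T14:41 u2 /pricing",
        "2011-11-30T14:42 u1 /pricing",
        "malformed",
    ]
    print(run_job(log))  # {'/home': 1, '/pricing': 2}
```

Because each map call sees one line and each reduce call sees one key, both phases can be partitioned across machines; that independence is what lets the model scale out on commodity hardware.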

Hadoop
• Hadoop is popular and rapidly evolving
  – Most leading information management vendors,
    including Microsoft, have embraced Hadoop
  – There is now a Hadoop ecosystem

RDBMS Reconsidered
• RDBMS incumbents appear to be under siege, with
   – IT frustration with RDBMS business-as-usual
      • Counterproductive RDBMS vendor policies and attitudes
      • DBA modus operandi often seen as excessively conservative
   – Conventional wisdom about RDBMS limitations for, e.g.,
      • “Web scale”
      • “Agility”
      • The application/database “impedance mismatch”
   – The advent of open source and/or specialized DBMSs
      • E.g., MySQL is the M in the “LAMP stack”
      • “The end of the one-size-fits-all DBMS era”

RDBMS Reconsidered
• An RDBMS reality check
  – Leading RDBMS products and open source initiatives are
    very powerful and flexible
     • And will continue to evolve, e.g., with the mainstream deployment
       of massive-memory servers and solid state disk (SSD) storage
  – And they continue to expand
     • E.g., in-database processing, with analytics engines
       running within DBMS kernels
  – But the RDBMS incumbents nonetheless face
    unprecedented challenges
     • Which sometimes resonate with frustrated architects and
       developers because of negative experiences that have more to do
       with how RDBMSs were used than with what RDBMSs can
       effectively address
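In-database processing means running analytic code where the data lives instead of extracting rows into a separate tool. As a small-scale analogy only (SQLite is an embedded engine, not a server RDBMS with an analytics kernel), Python's standard sqlite3 module lets you register a custom aggregate with Connection.create_aggregate, so the engine calls your code during query execution. The sales table, its values, and the sample_variance aggregate are hypothetical examples.

```python
import sqlite3

class SampleVariance:
    """Custom SQL aggregate: sample variance, computed incrementally (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def step(self, value) -> None:
        # Called by the engine once per row in the group; NULLs are ignored.
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Called once per group; returns NULL when the variance is undefined.
        return self.m2 / (self.n - 1) if self.n > 1 else None

def regional_variance() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("sample_variance", 1, SampleVariance)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 12.0), ("east", 14.0), ("west", 5.0), ("west", 9.0)],
    )
    # The aggregate runs inside the query, next to GROUP BY, rather than in client code.
    return conn.execute(
        "SELECT region, sample_variance(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()

if __name__ == "__main__":
    print(regional_variance())  # [('east', 4.0), ('west', 8.0)]
```

The incremental step/finalize shape matters: the engine never has to hand the full column to the caller, which is the point of pushing analytics toward the data.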

RDBMS in the Big-Picture Framework

               Resources                       Relations

Conceptual     Resources and links             Entities, attributes,
                                               relationships, and identifiers

Logical        Model: hypertext                Model: extended relational
               Language: XQuery                Language: SQL

Physical       Indexing (e.g., scalar data types, XML, full-text), locking and
               isolation levels, federation, replication, in-memory databases,
               columnar storage, table spaces, caching, and more

RDBMS Reconsidered
• A Forrester big data reality check (from “Stay
  Alert To Database Technology Innovation,”
  11/19/2010):
  – “For 90% of BI use cases, which are often less than
    50 terabytes in size, relational databases still are
    good enough” (p. 4)
  – “Traditional relational databases are still good
    enough for the majority of transactional use
    cases” (p. 5)

Back to the Bigger Picture
• Compared with traditional enterprise data
  management, big data is
  – Essentially a collection of specialized physical
    models for very large, analysis-oriented data
    management
  – Expanding to encompass resources as well as
    relations
  – More about the potential for displacing expensive
    and closed/proprietary distributed processing
    alternatives than displacing RDBMS or XDBMS

Structured Big Data: Recap
• Substantive, sustainable, and synergistic
  – RDBMS
  – XDBMS
  – Hadoop
  – The cloud as an information management
    platform
• Vaguely defined, transitory, and over-hyped
  – NoSQL

Agenda
•   Big data in context
•   Big structured data
•   Big unstructured data
•   Big opportunities and risks
•   Q&A

Big Unstructured Data
• Finding Facts about Data – IDC/EMC
• Patterns for Unstructured Big Data
• How-to issues – who will know?

http://www.emc.com/leadership/programs/digital-universe.htm

4/28/2011

Facebook:
800M users
500M visitors/day
$100B potential value @ IPO

http://inmaps.linkedinlabs.com/
Unstructured Big Data Patterns
•   Search
•   Social
•   Mobile
•   Online Activities/Digital Marketing
•   Inquiry/Detection – Connecting Dots
•   Question Answering

Mobile Adds:

Location data points
Voice searches
Siri questions
App history profile
Browse history profile
Search history profile
Past purchase profile
Camera-generated outputs/inputs
Coupon delivery & merchandising
Friends' locations
Social search
Local ad-match algo opportunities

Online Activities/Digital Marketing

• Inquiry/Detection – Connecting Dots
  – Intelligence
  – Law Enforcement
  – Fraud Detection (Government, Financial, Health, …)
  – eDiscovery
Social Media Monitoring

Question Answering

Question Answering Beyond Jeopardy

Twitter Analytics Questions
• What can we tell about a user from their tweets?
    –   from the tweets of those they follow?
    –   from the tweets of their followers?
    –   from the ratio of followers/following?
•   What graph structures lead to successful networks?
•   User reputation?
•   Sentiment analysis?
•   What features get a tweet retweeted?
    –   How deep is the retweet tree?
• Long term duplicate detection
• Machine learning
• Language detection
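Two of these questions (retweet tree depth and the followers/following ratio) reduce to simple computations once the data is in hand. This Python sketch assumes a hypothetical input in which each retweet records the ID of the tweet it retweeted (None for an original tweet). It does not use any Twitter API; actual retweet metadata may reference only the original tweet rather than the full chain, so the parent-link structure here is an assumption.

```python
from typing import Optional

def root_and_level(tweet_id: str, parent: dict[str, Optional[str]]) -> tuple[str, int]:
    """Walk parent links up to the original tweet; return (root_id, hops)."""
    node, level = tweet_id, 0
    while parent.get(node) is not None:
        node = parent[node]
        level += 1
    return node, level

def retweet_depths(parent: dict[str, Optional[str]]) -> dict[str, int]:
    """Depth of each retweet tree, keyed by the original tweet's ID.

    `parent` maps tweet_id -> ID of the tweet it retweets (None for originals).
    Assumes the links form trees (no cycles). Depth 0 means no retweets.
    """
    depths: dict[str, int] = {}
    for tweet_id in parent:
        root, level = root_and_level(tweet_id, parent)
        depths[root] = max(depths.get(root, 0), level)
    return depths

def follower_ratio(followers: int, following: int) -> float:
    """Followers/following ratio; infinite when the user follows no one."""
    return followers / following if following else float("inf")

if __name__ == "__main__":
    parent = {"t1": None, "r1": "t1", "r2": "r1", "r3": "t1", "t2": None}
    print(retweet_depths(parent))     # {'t1': 2, 't2': 0}
    print(follower_ratio(1500, 300))  # 5.0
```

Walking parent pointers is fine for a sketch; at scale the same question would typically be answered with a distributed graph traversal rather than per-tweet walks.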
http://www.mckinsey.com/en/Features/Big_Data.aspx
Agenda
•   Big data in context
•   Big structured data
•   Big unstructured data
•   Big opportunities and risks
•   Q&A

Big Data Opportunities
• Improved visibility and insights
   – Can explore previously impractical questions
• Real-time analytics
   – Less dependence on “dead data”
• Blur the boundaries between structured and
  unstructured information
   – Unified views of resources and relations
• Consolidation
   – Reduce the number of moving parts in your infrastructure
      • Along with related licensing and maintenance expenses
• Compliance – capture and maintain data & records
  previously beyond the firm's capabilities

Big Data Risks
• The potential for an ever-expanding set of information silos
   – Critical to relentlessly focus on minimized redundancy and
     optimized integration
• GIGO (garbage in, garbage out) at super-scale
   – Dramatic improvements in capabilities and price/performance
     provide new opportunities for self-inflicted damage, for
     organizations that don’t model or query effectively
• Cognitive overreach
   – The potential for information workers to create nonsensical
     queries based on poorly designed and/or misunderstood
     information models
• Skills gaps create competitive disadvantages
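A classic instance of a nonsensical query built on a misunderstood model is the "fan trap": joining two independent one-to-many relationships through a shared parent multiplies rows and silently inflates totals. The Python/sqlite3 sketch below uses hypothetical customer, orders, and payments tables; the naive join reports inflated sums, while aggregating each relationship separately before joining gives the intended answer.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Hypothetical data: one customer with 2 orders and 3 payments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders   (customer_id INTEGER, amount REAL);
        CREATE TABLE payments (customer_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'Acme');
        INSERT INTO orders   VALUES (1, 100), (1, 200);
        INSERT INTO payments VALUES (1, 50), (1, 50), (1, 50);
    """)
    return conn

# Naive: every order row pairs with every payment row (2 x 3 = 6 rows),
# so each order is counted 3 times and each payment 2 times.
NAIVE = """
    SELECT c.name, SUM(o.amount), SUM(p.amount)
    FROM customer c
    JOIN orders o   ON o.customer_id = c.id
    JOIN payments p ON p.customer_id = c.id
    GROUP BY c.name
"""

# Safer: aggregate each one-to-many relationship on its own, then join.
SAFE = """
    SELECT c.name, o.total, p.total
    FROM customer c
    JOIN (SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id) o
      ON o.customer_id = c.id
    JOIN (SELECT customer_id, SUM(amount) AS total FROM payments GROUP BY customer_id) p
      ON p.customer_id = c.id
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(NAIVE).fetchone())  # ('Acme', 900.0, 300.0)
    print(conn.execute(SAFE).fetchone())   # ('Acme', 300.0, 150.0)
```

Both queries are syntactically valid and run without error; only an understanding of the underlying information model reveals that the first answer is wrong, and at big data scale such errors are harder to spot by eye.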

Q&A


Peter O'Kelly - peter@okellyassociates.com
Kathleen Reidy - kathleen.reidy@451Research.com
Hadley Reynolds - hadley.reynolds@nexteraresearch.com

Database market landscape
[Figure: map of database products, divided into relational and non-relational, analytic and operational, with regions for NoSQL (key value, document, big tables, graph), NewSQL, -as-a-Service, and data grid/cache]
Big Data Complexity Continuum


[Figure (IDC 2005): big data applications plotted by time horizon (historic, current/monitor, future/predict) against number and complexity of technologies, ranging from speech to text, pattern detection, and data mining up to medical diagnostics, climate modeling and prediction, and government intelligence applications]
Big Data Characteristics



[Figure: big data characteristics: volume, velocity, variety/complexity, and value]

© IDC                                   12/2/2011

Gilbane Boston 2011

Revisiting Open Document Format and Office Open XML: The Quiet Revolution Con...Revisiting Open Document Format and Office Open XML: The Quiet Revolution Con...
Revisiting Open Document Format and Office Open XML: The Quiet Revolution Con...
 
MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery Enigma
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Gilbane Boston 2011 big data

  • 1. Get Ready for Big Data. Wednesday, November 30, 2011, 2:40 – 4:00. Peter O'Kelly, Principal Analyst, O'Kelly Associates; Hadley Reynolds, Managing Director, Next Era Research; Kathleen Reidy, Senior Analyst, 451 Research
  • 2. Agenda • Big data in context • Big structured data • Big unstructured data • Big opportunities and risks • Q&A
  • 3. Big Data in Context • What is “big data”? – Unhelpfully, both “big data” and “NoSQL,” generally considered a key part of the big data wave, are defined more in terms of what they’re not than what they are – A typical big data definition (Wikipedia): • “[…] datasets that grow so large that they become awkward to work with using on-hand database management tools”
  • 4. Big Data in Context • With thanks to the Business SOA blog: – “[…] describe Big Data in the same way that the Hitchhiker’s Guide to the Galaxy described space: – ‘Space,’ it says, ‘is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space, listen...’”
  • 5. Big Data in Context • Why is big data a big deal now? – Commodity hardware and the Internet • Capability and price/performance curves that continue to defy all economic “laws” • Also facilitating compelling cloud services – Maturation and uptake of open source software, e.g., Hadoop • Powerful and often no- or low-cost – IT market • Enthusiasm for “NoSQL” systems • Frustration with incumbent information management vendors – Useful new data sources/resources, e.g., social network activity graphs, the “Internet of things,” sensor networks… – Competitive and compliance imperatives
  • 6. Big Data in Context • A big data reality check – “Mindbogglingly”-scale information management is not new • Consider, e.g., VLDB, multi-billion document repositories, and the World Wide Web… – What is new and compelling • The combination of market dynamics producing new capability and price/performance curves • Cloud – No deep capital investment required to get started – Cloud-based information resources • Some innovative marketing, suggesting – Self-proclaimed next-generation big data systems are magical and revolutionary – Deployed systems are obsolete and wasteful
  • 7. A Big-Picture Framework • A digital information item dichotomy – Resources (~unstructured information) • Digital artifacts optimized to convey stories – Organized in terms of narrative, hierarchy, and sequence • Examples: books, magazines, documents (e.g., PDF, Word), Web pages, XBRL documents, video, hypertext… – Relations (~structured information) • Application-independent descriptions of real-world things and relationships • Examples: business domain databases, e.g., customer, sales, HR…
  • 8. A Big-Picture Framework (diagram contrasting Resource and Relation)
  • 9. A Big-Picture Framework
      Level      | Resources                                     | Relations
      Conceptual | Resources and links                           | Entities, attributes, relationships, and identifiers
      Logical    | Model: hypertext; Language: XQuery (ideally)  | Model: extended relational; Language: SQL
      Physical   | (spans both columns) Indexing (e.g., scalar data types, XML, full-text), locking and isolation levels, federation, replication, in-memory databases, columnar storage, table spaces, caching, and more
  • 10. Agenda • Big data in context • Big structured data • Big unstructured data • Big opportunities and risks • Q&A
  • 11. Big Structured Data • NoSQL • Hadoop • RDBMS reconsidered • Back to the bigger picture
  • 12. NoSQL • No clear consensus on what “NoSQL” means – Started with what it’s against, not what it’s about • And often finds a receptive audience due to frustration with RDBMS business-as-usual – The “NoSQL” meme is a moving target • Initially implied “Just say ‘no’ to SQL” • Later quietly redefined as “Not Only SQL” • What may be next: “New Opportunities for SQL” – I.e., some developers may reconsider the value of SQL and RDBMSs after hitting NoSQL limitations
  • 13. A NoSQL Taxonomy • From the NoSQL Wikipedia article (taxonomy image not reproduced)
  • 14. NoSQL Perspectives • The “NoSQL” meme confusingly conflates – Document database requirements • Best served by XML DBMS (XDBMS) – Physical model decisions on which only DBAs and systems architects should focus • And which are more complementary to than competitive with RDBMS/XDBMS – Object databases, which have floundered for decades • But with which some application developers are nonetheless enamored, for minimized “impedance mismatch,” despite significant information management compromises – Semantic models • Also more complementary to than competitive with RDBMS/XDBMS
  • 15. Hadoop • Hadoop is often considered central to big data – Inspired by Google’s MapReduce programming model, Apache Hadoop is an open source framework for distributed processing on networks of commodity hardware • Commercial application domains include (from Wikipedia) – Log and/or clickstream analysis of various kinds – Marketing analytics – Machine learning and/or sophisticated data mining – Image processing – Processing of XML messages – Web crawling and/or text processing – General archiving, including of relational/tabular data, e.g., for compliance
  • 16. Hadoop • Hadoop is popular and rapidly evolving – Most leading information management vendors, including Microsoft, have embraced Hadoop – There is now a Hadoop ecosystem
  • 17. RDBMS Reconsidered • RDBMS incumbents appear to be under siege, with – IT frustration with RDBMS business-as-usual • Counterproductive RDBMS vendor policies and attitudes • DBA modus operandi often seen as excessively conservative – Conventional wisdom about RDBMS limitations for, e.g., • “Web scale” • “Agility” • The application/database “impedance mismatch” – The advent of open source and/or specialized DBMSs • E.g., MySQL is the M in the “LAMP stack” • “The end of the one-size-fits-all DBMS era”
  • 18. RDBMS Reconsidered • An RDBMS reality check – Leading RDBMS products and open source initiatives are very powerful and flexible • And will continue to evolve, e.g., with the mainstream deployment of massive-memory servers and solid state disk (SSD) storage – And they continue to expand • E.g., in-database processing, such as analytics engines running within DBMS kernels – But the RDBMS incumbents nonetheless face unprecedented challenges • Which sometimes resonate with frustrated architects and developers because of negative experiences that have more to do with how RDBMSs were used than with what RDBMSs can effectively address
  • 19. RDBMS in the Big-Picture Framework
      Level      | Resources                           | Relations
      Conceptual | Resources and links                 | Entities, attributes, relationships, and identifiers
      Logical    | Model: hypertext; Language: XQuery  | Model: extended relational; Language: SQL
      Physical   | (spans both columns) Indexing (e.g., scalar data types, XML, full-text), locking and isolation levels, federation, replication, in-memory databases, columnar storage, table spaces, caching, and more
  • 20. RDBMS Reconsidered • A Forrester big data reality check (from “Stay Alert To Database Technology Innovation,” 11/19/2010): – “For 90% of BI use cases, which are often less than 50 terabytes in size, relational databases still are good enough” (p. 4) – “Traditional relational databases are still good enough for the majority of transactional use cases” (p. 5)
  • 21. Back to the Bigger Picture • Compared with traditional enterprise data management, big data is – Essentially a collection of specialized physical models for very large, analysis-oriented data management – Expanding to encompass resources as well as relations – More about the potential for displacing expensive and closed/proprietary distributed processing alternatives than displacing RDBMS or XDBMS
  • 22. Structured Big Data: Recap • Substantive, sustainable, and synergistic – RDBMS – XDBMS – Hadoop – The cloud as an information management platform • Vaguely defined, transitory, and over-hyped – NoSQL
  • 23. Agenda • Big data in context • Big structured data • Big unstructured data • Big opportunities and risks • Q&A
  • 24. Big Unstructured Data • Finding Facts about Data – IDC/EMC • Patterns for Unstructured Big Data • How-to issues – who will know?
  • 34. Facebook: 800M users; 500M visitors/day; $100B potential value @ IPO
  • 36. Unstructured Big Data Patterns • Search • Social • Mobile • Online Activities/Digital Marketing • Inquiry/Detection – Connecting Dots • Question Answering
  • 37. Mobile Adds: Location data points • Voice searches • Siri questions • App history profile • Browse history profile • Search history profile • Past purchase profile • Camera-generated outputs/inputs • Coupon delivery & merchandising • Friends' locations • Social search • Local ad-match algo opportunities
  • 40. Inquiry/Detection – Connecting Dots – Intelligence – Law Enforcement – Fraud Detection (Government, Financial, Health, …) – eDiscovery
  • 44. Twitter Analytics Questions • What can we tell about a user from their tweets? – from the tweets of those they follow? – from the tweets of their followers? – from the ratio of followers/following? • What graph structures lead to successful networks? • User reputation? • Sentiment analysis? • What features get a tweet retweeted? – How deep is the retweet tree? • Long term duplicate detection • Machine learning • Language detection
  • 47. Agenda • Big data in context • Big structured data • Big unstructured data • Big opportunities and risks • Q&A
  • 48. Big Data Opportunities • Improved visibility and insights – Can explore previously impractical questions • Real-time analytics – Less dependence on “dead data” • Blur the boundaries between structured and unstructured information – Unified views of resources and relations • Consolidation – Reduce the number of moving parts in your infrastructure • Along with related licensing and maintenance expenses • Compliance – Capture and maintain data and records previously beyond a firm’s capabilities
  • 49. Big Data Risks • The potential for an ever-expanding set of information silos – Critical to relentlessly focus on minimized redundancy and optimized integration • GIGO (garbage in, garbage out) at super-scale – Dramatic improvements in capabilities and price/performance provide new opportunities for self-inflicted damage, for organizations that don’t model or query effectively • Cognitive overreach – The potential for information workers to create nonsensical queries based on poorly designed and/or misunderstood information models • Skills gaps create competitive disadvantages
  • 50. Q&A • Peter O'Kelly: peter@okellyassociates.com • Kathleen Reidy: kathleen.reidy@451Research.com • Hadley Reynolds: hadley.reynolds@nexteraresearch.com
  • 51. Database market landscape (diagram; product placement not recoverable). Categories shown: Relational and Non-relational; Analytic; Operational; NoSQL (Key value, Document, Big tables, Graph); NewSQL; -as-a-Service; Data Grid/Cache. Products and vendors shown: Mapr, Infobright, Netezza, ParAccel, SAP Sybase IQ, Piccolo, Hadoop, Teradata, EMC, IBM InfoSphere, Dryad, Brisk, Greenplum, Hadapt, Aster Data, Calpont, VectorWise, HP Vertica, Progress, Oracle, IBM DB2, SQL Server, JustOne, InterSystems, MarkLogic, MySQL, Ingres, PostgreSQL, Objectivity, Lotus Notes, McObject, SAP Sybase ASE, EnterpriseDB, Versant, CouchDB, HandlerSocket, Akiban, MongoDB, MySQL Cluster, Amazon RDS, Couchbase, RavenDB, Cloudant, App Engine Datastore, SQL Azure, Clustrix, Riak, Database.com, Redis, Drizzle, Xeround, FathomDB, GenieDB, Membrain, SimpleDB, ScalArc, Cassandra, Voldemort, Hypertable, Schooner MySQL, CodeFutures, InfiniteGraph, Tokutek, ScaleBase, NimbusDB, BerkeleyDB, HBase, Neo4J, Continuent, GraphDB, Translattice, VoltDB. Data Grid/Cache: Terracotta, GigaSpaces, Oracle Coherence, Memcached, IBM eXtreme Scale, GridGain, ScaleOut, Vmware GemFire, InfiniSpan, CloudTran
  • 52. Big Data Complexity Continuum (IDC chart, not reproduced; vertical axis: number and complexity of technologies; horizontal axis: time horizon, from Historic through Current (Monitor) to Future (Predict)). Applications plotted include: climate modeling; government intelligence and prediction applications; predictions; trend analytics; medical diagnostics; fraud detection; influence networks; voice of customer; sentiment extraction; relationship detection; ad targeting; retargeting; reputation management; brand monitoring; intelligent machines; Web search; pattern detection; log analysis; data mining; eCommerce; speech to text. Source: IDC 2005
  • 53. Big Data Characteristics (IDC diagram): Volume, Velocity, Variety/Complexity, Value. © IDC 12/2/2011
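Slide 15 notes that Hadoop originated with Google's MapReduce model. As a rough illustration of that model, the sketch below simulates the map, shuffle, and reduce phases of the classic word-count job in a single Python process. It is not Hadoop's API (Hadoop jobs are typically written against its Java interfaces) and it omits distribution, fault tolerance, and HDFS entirely; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key; for word count, just sum them."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each input split would be mapped on a different node;
    # here the map calls simply run one after another.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "really big data"]
    print(word_count(docs))  # {'big': 3, 'data': 2, 'is': 1, 'really': 1}
```

Only `map_phase` and `reduce_phase` are specific to word count; grouping by key is the part a framework such as Hadoop supplies and parallelizes, which is one reason the model suits the log and clickstream analysis listed on that slide.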

Editor's Notes

  1. In fairness, the Wikipedia article, as of 20111117, also noted “This article appears to be in both diffused categories and their subcategories, or has an overbroad categorization, and may need cleanup.”
  2. Image source: http://www.nasa.gov/audience/forstudents/5-8/features/what-is-a-black-hole-58.html
  3. This is a high-level dichotomy, and not meant to be precise or mutually exclusive (i.e., some info items have both resource and relation attributes)
  4. This is meant to be illustrative – neither precise nor exhaustive
  5. The point of having a merged cell for the physical level: it’s all coming together, and it’s increasingly difficult to distinguish the underlying physical model services… Hypertext is not 1:1 with HTML; it’s beyond-the-basics hypertext as manifested, e.g., in Web publishing and collaboration-oriented systems/servers. XQuery is not mainstream today, but it is exceptionally powerful and was co-developed with XPath 2.0.
  6. Captured 20111117. Wikipedia also notes “This article provides insufficient context for those unfamiliar with the subject. Please help improve the article with a good introductory style. (October 2011)”
  7. NoSQL is sometimes also associated with open source DBMS, adding more confusion
  8. Image source: http://hadoop.apache.org/
  9. Image sources:
     http://hadoop.apache.org/
     http://www.slideshare.net/cloudera/tokyo-nosqlslidesonly?from=ss_embed
     Related vendor press releases:
     http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php
     http://www.emc.com/about/news/press/2011/20110509-03.htm
     http://www.vertica.com/2010/10/10/vertica-4-0-connector-for-hadoop/
     http://thinking.netezza.com/blog/hadoop-netezza-synergy-data-analytics-results-new-customer-deployment-trends-part-1
     http://developer.teradata.com/tag/hadoop
     http://www.informatica.com/news_events/press_releases/Pages/11012010_cloudera.aspx
     http://www.zdnet.com/blog/microsoft/microsoft-drops-dryad-puts-its-big-data-bets-on-hadoop/11226
  10. The bottom row is not meant to imply that RDBMSs don’t offer indexing, but rather that current leading RDBMSs don’t offer the full set of XDBMS features
  11. Source: http://www.forrester.com/rb/Research/stay_alert_to_database_technology_innovation/q/id/57947/t/2
  12. Continuing from a UC Berkeley research program on the size of the Web
  13. More than doubling every 2 years (51% CAGR). Zettabyte = 1000 exabytes; exabyte = 1000 petabytes; petabyte = 1000 terabytes; terabyte = 1000 gigabytes (so 1 zettabyte = 1 000 000 000 000 gigabytes, i.e., one trillion gigabytes)
  14. This is growth for enterprise information under management only, not consumer data, RFID, or even web log data, unless explicitly managed. Industry consensus estimate: 80% unstructured information vs. 20% structured
  15. More information is digital to begin with. More file types, e.g., video, collaborative, mobile, presence, etc. More digital activity as part of the job/task environment.
  16. In its raw state, the digital world looks about as organized as this view of a piece of the universe. The first question has to be: what's out there? So search is really the first pattern in the technologies of unstructured information.
  17. Note the more than twofold growth in the past year…
  18. This is a historical view of the largest web search indexes of their day.
  19. Social technology has brought a new pattern, the social graph, to the technologies of unstructured information. Now it's not just documents and links, but people, their relationships, and their friends' relationships. Facebook today
  20. Go to this link to see your own map of LinkedIn connections.
  21. If you were Google, would you like to own that data? If you were Facebook, would you consider marketing a smartphone product? If you were Microsoft, you might buy a mobile phone company to get into the market if you had failed to build your own ecosystem.
  22. Attensity screen shots – text analytics for web-wide monitoring of blogs, social networks, YouTube videos, traditional media, with purpose-built analytics and dashboards for timely/high availability decisions.
  23. Emphasis on guidance rather than discovery: personal assistance (e.g., Siri and her successors); healthcare; customer service; customer self-service for complex products (e.g., investments, insurance); maintenance; IT service environments; many more…
  24. …in the period between 2010 and 2015.
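The figures in note 13 can be checked with a few lines of arithmetic. Assuming decimal (SI) units, as the note's factor-of-1000 chain implies, 1 zettabyte is 1000^4 = 10^12 gigabytes. A 51% compound annual growth rate gives a two-year factor of 1.51^2 ≈ 2.28, which is consistent with "more than doubling every 2 years." The helper names below are illustrative only.

```python
import math

# Decimal (SI) storage units as listed in note 13; each step is a factor of 1000.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte"]


def gigabytes_per(unit: str) -> int:
    """Number of gigabytes in one `unit`, using powers of 1000."""
    return 1000 ** UNITS.index(unit)


def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after `years` at compound annual rate `cagr`."""
    return (1 + cagr) ** years


def years_to_double(cagr: float) -> float:
    """Years needed to double at compound annual growth rate `cagr`."""
    return math.log(2) / math.log(1 + cagr)


if __name__ == "__main__":
    print(f"1 zettabyte = {gigabytes_per('zettabyte'):,} gigabytes")  # 1,000,000,000,000
    print(f"2-year growth at 51% CAGR: {growth_factor(0.51, 2):.2f}x")  # 2.28x
    print(f"Doubling time at 51% CAGR: {years_to_double(0.51):.2f} years")  # 1.68 years
```

A doubling time of under two years is simply another way of stating that 51% annual growth more than doubles volume in each two-year period.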