Applying Semantics to Unstructured
     Data (Big and Getting Bigger)
                Wednesday, November 30, 2012
                        4:00 – 5:00
Bryan Bell
  Vice President, Enterprise Solutions, Expert
  System
Lynda Moulton
  Analyst & Consultant, LWM Technology Services
Peter O'Kelly
  Principal Analyst, O'Kelly Associates
Overall Session Agenda
• Introduction and context-setting
• "Big Data" 101 for Business
• Semantics and the Big Data Opportunity




Big Data 101 Agenda
•   Big data in context
•   Recap
•   Risks
•   Recommendations




Big Data in Context
• What is “big data”?
  – Unhelpfully, both “big data” and “NoSQL,” generally
    considered a key part of the big data wave, are defined
    more in terms of what they aren’t than what they are
  – A typical big data definition (Wikipedia):
     • “[…] data sets that grow so large that they become awkward to
       work with using on-hand database management tools”
  – Often associated with Gartner’s volume, variety (and
    complexity), and velocity model
     • Also value and veracity considerations

Big Data in Context
• Why is big data a big deal now?
  – Commoditized hardware, software, and networking
     • Capability and price/performance curves that continue to
       defy all economic “laws”
     • Cloud services with radical new capability/cost equations
  – Maturation and uptake of related open source
    software, especially Hadoop
     • Powerful and often no- or low-cost



Big Data in Context
• Why is big data a big deal now (continued)?
  – Market enthusiasm for “NoSQL” systems
  – Useful and often “open source”/public domain data
    sources and services
  – Mainstreaming of semantic tools and techniques




A Prime Minicomputer, c. 1982




Fast-Forward to 2012 (five image-only slides)

Google BigQuery




Hadoop
• Hadoop is often considered central to big data
  – Inspired by Google’s MapReduce architecture, Apache
    Hadoop is an open source framework for distributed
    processing on clusters of commodity hardware
  – From Wikipedia:
     • “‘Map’ step: The master node takes the input, divides it into
       smaller sub-problems, and distributes them to worker nodes
     • ‘Reduce’ step: The master node then collects the answers to
       all the sub-problems and combines them in some way to
       form the output – the answer to the problem it was
       originally trying to solve”
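
The two steps quoted above can be illustrated with a single-process word count. This is a minimal sketch, not Hadoop's API: `map_step`, `shuffle`, `reduce_step`, and `word_count` are hypothetical names, and a real cluster would run the map and reduce calls on separate worker nodes.

```python
from collections import defaultdict

def map_step(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework does between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_step(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for a sub-problem handed to a worker node.
    pairs = [pair for doc in documents for pair in map_step(doc)]
    return dict(reduce_step(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big deal", "data as a service"])
```

Here `counts` maps each distinct word to its total across both inputs. The grouping in `shuffle` is the part a framework such as Hadoop handles for the developer.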

Hadoop
• Commercial application domains include (from
  Wikipedia)
  –   Log and/or clickstream analysis of various kinds
  –   Marketing analytics
  –   Machine learning and/or sophisticated data mining
  –   Image processing
  –   Processing of XML messages
  –   Web crawling and/or text processing
  –   General archiving, including of relational/tabular data,
      e.g. for compliance

Hadoop
• Hadoop is popular and rapidly evolving
  – Most leading information management vendors
    have embraced Hadoop
  – There is now a Hadoop ecosystem




Meanwhile, Back in the Googleplex
• Dremel, BigQuery, Spanner, and other really
  big data projects




Meanwhile, Back in the Googleplex




Google Now




A NoSQL Taxonomy
• From the NoSQL Wikipedia article:




A View of the NoSQL Landscape




Another NoSQL Landscape View

NoSQL Perspectives
• The “NoSQL” meme confusingly conflates
   – Document database requirements
      • Best served by XML DBMS (XDBMS)
   – Physical database model decisions on which only DBAs and
     systems architects should focus
      • And which are more complementary than competitive with DBMS
   – Object databases, which have floundered for decades
      • But with which some application developers are nonetheless
        enamored, for minimized “impedance mismatch,” despite significant
        information management compromises
   – Semantic (e.g., RDF) models
      • Also more complementary than competitive with RDBMS/XDBMS
• Also consider: the “traditional” DBMS players can leverage
  the same underlying technology power curves
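
The semantic (RDF) model mentioned above stores facts as subject-predicate-object triples. The following is a minimal sketch in plain Python, with illustrative data and a hypothetical `match` helper; it does not use a real RDF library or SPARQL.

```python
# Each fact is a (subject, predicate, object) triple; the data is illustrative.
triples = {
    ("Hadoop", "is_a", "Framework"),
    ("Hadoop", "implements", "MapReduce"),
    ("BigQuery", "is_a", "Service"),
    ("BigQuery", "developed_by", "Google"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )

hadoop_facts = match(triples, subject="Hadoop")  # everything known about Hadoop
things = match(triples, predicate="is_a")        # every typed entity
```

Because every fact has the same three-part shape, new kinds of relationships can be added without changing a schema, which is one reason such models can sit alongside an RDBMS or XDBMS rather than replace it.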

Data as a Service
• The (single source of) truth is out there?...
   – High-quality data sources are being commoditized
   – Value is shifting to the ability to discern and leverage conceptual
     connections, not just to manage big databases
• Some resources and developments to explore
   –   Social networking graphs and activities
   –   Data.com (Salesforce.com)
   –   Data.gov
   –   Google Knowledge Graph
   –   Linked Data
   –   Microsoft Windows Azure Data Marketplace
   –   Wikidata.org
   –   Wolfram Alpha

Mainstreaming Semantics
• Tools and techniques applied in search of
  more meaning, e.g.,
  – Vocabulary management
  – Disambiguation and auto-categorization
  – Text mining and analysis
  – Context and relationship analysis
• It’s still ideal to help people capture and apply
  data and metadata in context
  – Semantic tools/techniques are complementary
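
Vocabulary management and auto-categorization, listed above, can be sketched as matching text against a controlled vocabulary. The vocabulary, the category names, and the `categorize` function are illustrative assumptions; production tools apply far richer linguistic analysis than exact term matching.

```python
import re

# Hypothetical controlled vocabulary: category -> terms that signal it.
VOCABULARY = {
    "storage": {"hadoop", "nosql", "database", "rdbms"},
    "semantics": {"rdf", "ontology", "taxonomy", "metadata"},
}

def categorize(text):
    """Return the categories whose terms appear in the text, with the matched terms."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = {cat: sorted(terms & words) for cat, terms in VOCABULARY.items()}
    # Keep only categories with at least one matched term.
    return {cat: found for cat, found in hits.items() if found}

result = categorize("Hadoop stores the data; RDF metadata adds meaning.")
```

Exact matching like this cannot tell apart two senses of the same word, which is the gap that disambiguation and context analysis are meant to close.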

Mainstreaming Semantics
• The Semantic Web is still more vision than reality
   – But Google, Microsoft, Yahoo, and Yandex, for
     example, are improving Web searches by capturing
     and applying more metadata and relationships via
     schema.org schemas in Web pages
   – And Google’s Knowledge Graph is about “things, not
     strings,” with, as of mid-2012, “500 million objects, as
     well as more than 3.5 billion facts about and
     relationships between these different objects”
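
The schema.org markup mentioned above describes a page's subject as a typed entity with named properties. The following is a minimal sketch that builds such a description as JSON-LD, one of the syntaxes schema.org supports. The values come from this deck's title slide, and the `person_jsonld` helper is a hypothetical name.

```python
import json

def person_jsonld(name, job_title, organization):
    """Build a schema.org Person description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@type": "Organization", "name": organization},
    }

speaker = person_jsonld("Peter O'Kelly", "Principal Analyst", "O'Kelly Associates")
# This string would be embedded in a page inside a
# <script type="application/ld+json"> element.
markup = json.dumps(speaker, indent=2)
```

Typing the entity as a `Person` who `worksFor` an `Organization` is what lets a search engine treat the name as a thing with relationships rather than as a string of characters.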



Recap
• Commoditization and cloud
  – Very significant new opportunities
• Hadoop and related frameworks
  – Complementary to RDBMS and XDBMS
• NoSQL
  – Likely headed for meme-bust…
• Data services
  – Game-changing potential
• Semantic tools and techniques
  – Rapidly gaining momentum

Risks
• The potential for an ever-expanding set of information silos
   – Focus on minimized redundancy and optimized integration
• GIGO (garbage in, garbage out) at super-scale
   – New opportunities for unprecedented self-inflicted damage, for
     organizations that don’t model or query effectively
• Cognitive overreach
   – The potential for information workers to create and act on
     nonsensical queries based on poorly designed and/or
     misunderstood information models
• Skills gaps can create competitive disadvantages
   – Modeling, query formulation, and data analysis
   – Critical thinking and information literacy



Recommendations
• Aim high: big data is in many respects just
  getting started…
   – A lot of technology recycling but also
     significant and disruptive innovation
• Work to build consensus among stakeholders
  on the opportunities and risks
• Focus on human skills – e.g., critical
  thinking and information literacy
   – For now, an instance of the most creative and
     powerful type of semantic big data processor
     we know of is between your ears


Gilbane Boston 2012 Big Data 101

  • 1. Applying Semantics to Unstructured Data (Big and Getting Bigger). Wednesday, November 30, 2012, 4:00 – 5:00. Bryan Bell, Vice President, Enterprise Solutions, Expert System; Lynda Moulton, Analyst & Consultant, LWM Technology Services; Peter O'Kelly, Principal Analyst, O'Kelly Associates
  • 2. Overall Session Agenda • Introduction and context-setting • "Big Data" 101 for Business • Semantics and the Big Data Opportunity
  • 3. Big Data 101 Agenda • Big data in context • Recap • Risks • Recommendations
  • 4. Big Data in Context • What is “big data”? – Unhelpfully, both “big data” and “NoSQL,” generally considered a key part of the big data wave, are defined more in terms of what they aren’t than what they are – A typical big data definition (Wikipedia): • “[…] data sets that grow so large that they become awkward to work with using on-hand database management tools” – Often associated with Gartner’s volume, variety (and complexity), and velocity model • Also value and veracity considerations
  • 5. Big Data in Context • Why is big data a big deal now? – Commoditized hardware, software, and networking • Capability and price/performance curves that continue to defy all economic “laws” • Cloud services with radical new capability/cost equations – Maturation and uptake of related open source software, especially Hadoop • Powerful and often no- or low-cost
  • 6. Big Data in Context • Why is big data a big deal now (continued)? – Market enthusiasm for “NoSQL” systems – Useful and often “open source”/public domain data sources and services – Mainstreaming of semantic tools and techniques
  • 14. Hadoop • Hadoop is often considered central to big data – Originating with Google’s MapReduce architecture, Apache Hadoop is an open source architecture for distributed processing on networks of commodity hardware – From Wikipedia: • “‘Map’ step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes • ‘Reduce’ step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve”
  • 15. Hadoop • Commercial application domains include (from Wikipedia) – Log and/or clickstream analysis of various kinds – Marketing analytics – Machine learning and/or sophisticated data mining – Image processing – Processing of XML messages – Web crawling and/or text processing – General archiving, including of relational/tabular data, e.g. for compliance
  • 16. Hadoop • Hadoop is popular and rapidly evolving – Most leading information management vendors have embraced Hadoop – There is now a Hadoop ecosystem
  • 17. Meanwhile, Back in the Googleplex • Dremel, BigQuery, Spanner, and other really big data projects
  • 18. Meanwhile, Back in the Googleplex
  • 20. A NoSQL Taxonomy • From the NoSQL Wikipedia article:
  • 21. A View of the NoSQL Landscape
  • 23. NoSQL Perspectives • The “NoSQL” meme confusingly conflates – Document database requirements • Best served by XML DBMS (XDBMS) – Physical database model decisions on which only DBAs and systems architects should focus • And which are more complementary than competitive with DBMS – Object databases, which have floundered for decades • But with which some application developers are nonetheless enamored, for minimized “impedance mismatch,” despite significant information management compromises – Semantic (e.g., RDF) models • Also more complementary than competitive with RDBMS/XDBMS • Also consider: the “traditional” DBMS players can leverage the same underlying technology power curves
  • 24. Data as a Service • The (single source of) truth is out there?... – High-quality data sources are being commoditized – Value is shifting to the ability to discern and leverage conceptual connections, not just to manage big databases • Some resources and developments to explore – Social networking graphs and activities – Data.com (Salesforce.com) – Data.gov – Google Knowledge Graph – Linked Data – Microsoft Windows Azure Data Marketplace – Wikidata.org – Wolfram Alpha
  • 25. Mainstreaming Semantics • Tools and techniques applied in search of more meaning, e.g., – Vocabulary management – Disambiguation and auto-categorization – Text mining and analysis – Context and relationship analysis • It’s still ideal to help people capture and apply data and metadata in context – Semantic tools/techniques are complementary
  • 26. Mainstreaming Semantics • The Semantic Web is still more vision than reality – But Google, Microsoft, Yahoo, and Yandex, for example, are improving Web searches by capturing and applying more metadata and relationships via schema.org schemas in Web pages – And Google’s Knowledge Graph is about “things, not strings,” with, as of mid-2012, “500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects”
  • 27. Recap • Commoditization and cloud – Very significant new opportunities • Hadoop and related frameworks – Complementary to RDBMS and XDBMS • NoSQL – Likely headed for meme-bust… • Data services – Game-changing potential • Semantic tools and techniques – Rapidly gaining momentum
  • 28. Risks • The potential for an ever-expanding set of information silos – Focus on minimized redundancy and optimized integration • GIGO (garbage in, garbage out) at super-scale – New opportunities for unprecedented self-inflicted damage, for organizations that don’t model or query effectively • Cognitive overreach – The potential for information workers to create and act on nonsensical queries based on poorly-designed and/or misunderstood information models • Skills gaps can create competitive disadvantages – Modeling, query formulation, and data analysis – Critical thinking and information literacy
  • 29. Recommendations • Aim high: big data is in many respects just getting started… – A lot of technology recycling but also significant and disruptive innovation • Work to build consensus among stakeholders on the opportunities and risks • Focus on human skills – e.g., critical thinking and information literacy – For now, an instance of the most creative and powerful type of semantic big data processor we know of is between your ears
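
The “Map” and “Reduce” steps quoted on slide 14 can be illustrated with a small word-count sketch. This is a single-process illustration under stated assumptions, not Hadoop's API: the function names (map_step, shuffle, reduce_step, word_count) are hypothetical, and a real cluster would distribute the chunks to worker nodes instead of looping over them locally.

```python
from collections import defaultdict


def map_step(chunk: str) -> list[tuple[str, int]]:
    """'Map': turn one sub-problem (a chunk of text) into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the intermediate pairs by key so each word is reduced once."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key: str, values: list[int]) -> tuple[str, int]:
    """'Reduce': combine all partial answers for one key into a total."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # Hypothetical driver: here the "worker nodes" are just loop iterations.
    mapped = [pair for chunk in chunks for pair in map_step(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big semantics"]))
```

The grouping step between the two phases is what lets each reduce call see every partial result for its key; in a distributed framework that grouping happens across the network.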

Editor's Notes

  1. At my employer (a facilities management company in Seattle, responsible for the claims-processing back-end for Washington State Delta Dental) in 1982: added 4 MB main memory to a Prime 750 system; changed the locks on the building and office doors, due to new security risk (mega-$ upgrade)…
  2. Source: “How to Create a Mind,” Ray Kurzweil, p. 256
  3. Source: “How to Create a Mind,” Ray Kurzweil, p. 259
  4. Source: “How to Create a Mind,” Ray Kurzweil, p. 258
  5. Source: “How to Create a Mind,” Ray Kurzweil, p. 254
  6. Clipped from Amazon sale page 20121116
  7. An example of what these power curves facilitate… Source: https://developers.google.com/bigquery/docs/pricing#table. Captured 2012118. Also consider Amazon Web Services, Salesforce.com’s database.com
  8. Image source: http://hadoop.apache.org/
  9. Image source: http://hadoop.apache.org/
  10. Image sources: http://hadoop.apache.org/ and http://www.slideshare.net/cloudera/tokyo-nosqlslidesonly?from=ss_embed
  11. Source: https://cloud.google.com/files/BigQueryTechnicalWP.pdf. Later in the same paper: “Dremel can scan 35 billion rows without an index in tens of seconds […] parallelize queries and run them on tens of thousands of servers simultaneously”
  12. Source https://cloud.google.com/files/BigQueryTechnicalWP.pdf
  13. Google Now as an example of a big data application context – a personal experience snapshot: Early morning: searched Google Maps on my iPad for the address of a nearby town high school, where I was driving my daughter that evening for an event. Later, on my Google Nexus 7 tablet, Google Now presented a “card” with directions and traffic information to the school – from my current location, which it got from GPS or Wi-Fi network triangulation. One click away from turn-by-turn navigation. Also note Google Voice Search. All at no cost to me (except for the data I gave Google in exchange for using the services…). This is a basic example – Google has much more in mind, and it’s not alone in this context – it aspires to use predictive analytics (and big data about you in the world…) to answer questions before you ask them
  14. Captured 20121105
  15. Source: http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/. My point: this is supposed to be a simplification, relative to RDBMS?...
  16. Source: http://arnon.me/2012/11/nosql-landscape-diagrams/. Another view of the NoSQL land-grab; these domains (except for “NewSQL”) all predated the “NoSQL” label
  17. NoSQL is sometimes also associated with open source DBMS, adding more confusion
  18. Snapshots: Government data: also see http://www.cityofboston.gov/open/ and other country-level services. Wolfram Alpha – captured 20121118: “Curated data: 10+ trillion pieces of data from primary sources with continuous updating”
  19. Google Knowledge Graph: http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html
  20. Reference to Kurzweil book: a timely (and optimistic) review of how we got here, and what may be next