
STI Summit 2011 - Digital Worlds


Transcript of "STI Summit 2011 - Digital Worlds"

  1. Digital Worlds (applications)
     ▪ VEC (Enterprise Scale)
       • 1,300 source databases
       • 10+ million views (via data integration)
     ▪ US Healthcare (National Scale)
       • Scale
         o Health care and social assistance offices: 784,626 incl.
           • Doctors offices: 220,131
           • Dentists: 127,057
           • Hospitals: 6,505
           • Clinics: ~5,000 ~= SME say 100 Databases
         o Patients: 100-300+ million
         o Databases: ~32 million
       • Scope
         o Comprehensive medical events, methods, analysis, …
           • E.g., Alice (62) in Emergency Room with liver failure
         o Insurance, payments, …
         o New metric: healthcare quality
       • Examples
         o SHRINE (2009): 3 hospitals; uses 2,381,883 distinct concepts (ontologies)
         o HHS CIO (Todd Park): Open Health Data Initiative
         o US (PCAST, White House) vision
  2. Observations
     ▪ Data Sources
       • Massive
         o Number
         o Heterogeneity
         o Distribution (data at source)
         o Constant change: data, model, ontology, business rules, …
       • Constrained
         o Governance: privacy, confidentiality, legal, …
         o Quality, correctness, precision, …
         o Competition
     ▪ Critical Requirement: meaningful
       • Human lives
       • Health of individuals, communities, nation
       • Economic impact: $ trillions / year
       • Political: meaningless debates
  3. Trends
     ▪ Digital Universe
     ▪ Holistic Views
       • Information Ecosystems: data
       • Ecosystems: Processes over services
     ▪ Big Data: massive
         o Number
         o Distribution
         o Heterogeneity
           • Semantics
           • Structure: relational databases, X databases, web, deep web
           • Technology: databases, data warehouses, files, …
     ▪ New Models: problem solving, data, …
       • Data-driven
       • Social computing: data as social artifacts
       • Science: Wolfram Alpha
       • Pragmatics: Driven by healthcare quality improvement
  4. Databases and AI: The Twain Just Met
     ▪ Database World
       • Engineering (RDBMSs) @ scale
       • Reasoning: Relational model (FoL)
     ▪ AI World
       • Reasoning: more powerful & expressive
       • Engineering: in the small
     ▪ Digital Universe, e.g., Web
       • Reasoning: beyond the RDM & AI?
       • Engineering: way beyond RDBMS
     ▪ Information ecosystems
       • Databases: join
       • Web: link

     Power Law of Data: The value of a data element is proportional to the number of its meaningful uses.
  5. What Underlies the Digital Universe

     Modelling          Execution
     Data Models        DBMS Engines
     Languages          Algorithms
     Semantics          Semantics
     Problem Solving    Computation
  6. What Underlies the Data Universe

     Relational Data Model    Data Independence    RDBMS
     Semantics                                     Semantics
     Problem Solving                               Computation
  7. Relational Database Improvements
     ▪ Pre-Relational
       • Hierarchical
       • Network
     ▪ Relational
       • Row store
       • OLAP / Data Warehouse
     ▪ Post-Relational
       • RDF store
       • Column store
       • Bare bones relational
       • Stream / complex event processing
     ▪ Push Down
       • Database / data warehouse appliances (20+ on the market)
       • In-database analytics, … (10+ on the market)
  8. Data Models For New Domains Must Honor Data Independence
     ▪ Array (Matrix)-store (SciDB) [Linear algebra]
     ▪ XML databases: structured content, information exchange
     ▪ Content management: e.g., Sharepoint
     ▪ Graph/network store: social networking (Facebook), link analysis
     ▪ Protein store: protein folding, drug discovery, …
     ▪ Geospatial / map store: location-based applications
     ▪ Time series: signal processing, statistical and financial analysis
     ▪ Cloud / Mesh data (NoSQL) stores: web scale applications
     ▪ and they just keep coming …
  9. Data Universe
     [Diagram labels: Data Universe; Database Universe; Relational Data Universe]
  10. Data Universe
      [Diagram labels: DBU; RDM; Graph-Network Data Model; Time Series Data Model; Scientific Data Model; Geo-Spatial Data Model; Document Data Model; Digital Media Data Model; ETC.; ETC.; ETC.]
  11. Data Universe
      [Diagram labels: DBU; RDM; Graph-Network Data Model; Time Series Data Model; Scientific Data Model; Geo-Spatial Data Model; Document Data Model; Digital Media Data Model; ETC.; ETC.; ETC.]
  12. Data Integration Solution Space: Data Independence Required

                                    Computation                              Problem Solving
      Databases
        Relational                  Optimal for homogeneous relational data  Optimal for pure relational data
        Domain-specific             Emerging                                 Emerging
      Semantic Technologies (AI)
        Knowledge Representation    Minimal                                  Powerful
        Ontologies                  Minimal                                  Powerful
        Semantic Web                Modest / emerging                        Modest / emerging
        Semantic Data Management    Emerging                                 Emerging
      Architectural
        Information-As-A-Service    Emerging                                 Emerging
        Cloud                       Emerging                                 N/A
  13. Databases vs. Semantic Web

      Databases                   Semantic Web
      Discrete Worlds             Heterogeneous Worlds
      Single Versions of Truth    Multiple Truths
      Data Models                 LOD Models?
      Mathematical Logic          What Logic?
      DI: Relational Join         DI: Evidence Gathering

      [Further diagram labels: 1,000s of databases; Probabilistic / Eventual; Common Sense Reasoning; Reasoning?]
  14. Databases vs. Web
      [Diagram labels, top to bottom along a Scale axis:]
      ▪ Web: Exploration; Multiple versions of truth
      ▪ Data Warehouses: Analysis / BI; Evidence Gathering; Semantically Heterogeneous Views
      ▪ Databases: Data Management; Single versions of truth; Semantically Homogeneous
  15. Data Integration
      ▪ Query: define the result
        • Entity
        • Computation
      ▪ Find candidate data sets: search [Hard]
      ▪ Extract, Transform, and Load (ETL): engineering
      ▪ Data Integration [Harder]
        • Entity resolution
        • Integration computation
  16. Managing Data @ Scale I
      ▪ Introduction
        • Michael L. Brodie
      ▪ Global Data Integration and Global Data Mining
        • Chris Bizer
      ▪ DB vs RDF: structure vs correlation
        • Peter Boncz
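
The data integration steps listed on slide 15 (extract, transform, and load; entity resolution; integration computation) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the speaker's method: the two sources, their field names, the sample records, and the matching rule (exact match on normalized name plus birth date) are all hypothetical. Real entity resolution across heterogeneous sources is far harder than this exact-key grouping.

```python
from collections import defaultdict

# Hypothetical records from two source databases with different schemas.
SOURCE_A = [
    {"name": "Alice Smith", "dob": "1949-03-02", "visits": 2},
    {"name": "Bob Jones", "dob": "1970-11-15", "visits": 1},
]
SOURCE_B = [
    {"patient": "SMITH, ALICE", "birth_date": "1949-03-02", "encounters": 3},
]


def normalize_name(raw):
    """Lower-case a name and rewrite 'Last, First' as 'first last'."""
    raw = raw.strip().lower()
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        return f"{first} {last}"
    return " ".join(raw.split())


def extract_transform():
    """ETL step: map both source schemas onto one unified schema."""
    unified = []
    for rec in SOURCE_A:
        unified.append({"name": normalize_name(rec["name"]),
                        "dob": rec["dob"],
                        "visits": rec["visits"]})
    for rec in SOURCE_B:
        unified.append({"name": normalize_name(rec["patient"]),
                        "dob": rec["birth_date"],
                        "visits": rec["encounters"]})
    return unified


def resolve_entities(records):
    """Entity resolution (assumed rule): same normalized name and birth date."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["dob"])].append(rec)
    return groups


def integrate(groups):
    """Integration computation: total visits per resolved entity."""
    return {key: sum(rec["visits"] for rec in recs)
            for key, recs in groups.items()}


if __name__ == "__main__":
    totals = integrate(resolve_entities(extract_transform()))
    for (name, dob), visits in sorted(totals.items()):
        print(name, dob, visits)
```

The exact-key grouping stands in for the step the slide marks as "Harder"; a realistic system would replace it with probabilistic or evidence-based matching.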