Database Systems Research 
in Dan Olteanu's Group @Oxford 
DBOnto Kick-Off, Sept 25, 2014
Outline
Factorized Databases
Probabilistic Databases
Datalog Engines
Factorized Databases by Example 
Orders
customer  day     pizza
Mario     Monday  Capricciosa
Mario     Friday  Capricciosa
Pietro    Friday  Hawaii
Lucia     Friday  Hawaii

Pizzas
pizza        item
Capricciosa  base
Capricciosa  ham
Capricciosa  mushrooms
Hawaii       base
Hawaii       ham
Hawaii       pineapple

Items
item       price
base       6
ham        1
mushrooms  1
pineapple  2

Consider the natural join of the three relations above: 
Orders ⋈ Pizzas ⋈ Items
customer  day     pizza        item       price
Mario     Monday  Capricciosa  base       6
Mario     Monday  Capricciosa  ham        1
Mario     Monday  Capricciosa  mushrooms  1
Mario     Friday  Capricciosa  base       6
Mario     Friday  Capricciosa  ham        1
Mario     Friday  Capricciosa  mushrooms  1
...       ...     ...          ...        ...
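
The join above can be reproduced in a few lines of Python. This is a minimal sketch, not how a database engine evaluates joins: the helper `natural_join` and the list-of-dicts encoding are assumptions made for illustration, and only the relation contents come from the slide.

```python
# Relations as lists of dicts; attribute names and values are taken from the slide.
orders = [
    {"customer": "Mario", "day": "Monday", "pizza": "Capricciosa"},
    {"customer": "Mario", "day": "Friday", "pizza": "Capricciosa"},
    {"customer": "Pietro", "day": "Friday", "pizza": "Hawaii"},
    {"customer": "Lucia", "day": "Friday", "pizza": "Hawaii"},
]
pizzas = [
    {"pizza": "Capricciosa", "item": "base"},
    {"pizza": "Capricciosa", "item": "ham"},
    {"pizza": "Capricciosa", "item": "mushrooms"},
    {"pizza": "Hawaii", "item": "base"},
    {"pizza": "Hawaii", "item": "ham"},
    {"pizza": "Hawaii", "item": "pineapple"},
]
items = [
    {"item": "base", "price": 6},
    {"item": "ham", "price": 1},
    {"item": "mushrooms", "price": 1},
    {"item": "pineapple", "price": 2},
]


def natural_join(left, right):
    """Hypothetical helper: nested-loop natural join on the shared attribute names."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


joined = natural_join(natural_join(orders, pizzas), items)
print(len(joined))  # 12 tuples, each with 5 attribute values
print(joined[0])
```

Every order is paired with each of the three items on its pizza, so the four orders yield twelve result tuples.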
A flat relational algebra expression encoding the above query result is:
⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪
⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪
⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪
⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪
⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪
⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪ ...
It uses relational product (×), union (∪), and singleton relations (e.g., ⟨1⟩).
The attribute names are not shown to avoid clutter. 
The previous relational expression entails lots of redundancy due to the joins: the customer, day, and pizza values are repeated once for every item on that pizza.
We can factorize the expression following the join structure, for example:
⟨Capricciosa⟩ × (⟨Monday⟩ × ⟨Mario⟩ ∪ ⟨Friday⟩ × ⟨Mario⟩)
              × (⟨base⟩ × ⟨6⟩ ∪ ⟨ham⟩ × ⟨1⟩ ∪ ⟨mushrooms⟩ × ⟨1⟩)
∪ ⟨Hawaii⟩ × ⟨Friday⟩ × (⟨Lucia⟩ ∪ ⟨Pietro⟩)
           × (⟨base⟩ × ⟨6⟩ ∪ ⟨ham⟩ × ⟨1⟩ ∪ ⟨pineapple⟩ × ⟨2⟩)
[Figure: nesting structure of the factorization over the attributes pizza, day, customer, item, price]
There are several algebraically equivalent factorized representations, defined by
distributivity of product over union and commutativity of product and union.
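
To make the saving concrete, the following Python sketch encodes the factorized expression for the pizza example as a nested structure, enumerates the tuples it represents, and counts its singletons. The encoding (`val`, `prod`, `union`) and the helpers `tuples` and `size` are hypothetical names chosen for this illustration; they are not the data structures of any particular engine.

```python
from itertools import product as cartesian


# Hypothetical encoding: a singleton, a product of factors, or a union of branches.
def val(v):
    return ("val", v)


def prod(*factors):
    return ("prod", list(factors))


def union(*branches):
    return ("union", list(branches))


def tuples(expr):
    """Enumerate the flat tuples encoded by a factorized expression."""
    kind, body = expr
    if kind == "val":
        return [(body,)]
    if kind == "union":
        return [t for branch in body for t in tuples(branch)]
    # Product: concatenate one tuple from each factor.
    return [sum(combo, ()) for combo in cartesian(*(tuples(f) for f in body))]


def size(expr):
    """Number of singletons in the expression."""
    kind, body = expr
    return 1 if kind == "val" else sum(size(child) for child in body)


capricciosa = prod(
    val("Capricciosa"),
    union(prod(val("Monday"), val("Mario")), prod(val("Friday"), val("Mario"))),
    union(prod(val("base"), val(6)), prod(val("ham"), val(1)), prod(val("mushrooms"), val(1))),
)
hawaii = prod(
    val("Hawaii"),
    val("Friday"),
    union(val("Lucia"), val("Pietro")),
    union(prod(val("base"), val(6)), prod(val("ham"), val(1)), prod(val("pineapple"), val(2))),
)
factorized = union(capricciosa, hawaii)

flat = tuples(factorized)
print(len(flat))             # 12 tuples, in attribute order pizza, day, customer, item, price
print(size(factorized))      # 21 singletons in the factorized form
print(len(flat) * 5)         # 60 singletons in the flat form
```

On this tiny example the factorized form needs 21 singletons where the flat form needs 60; the gap widens as more customers order the same pizza.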
Key Properties of Factorized Representations 
Factorized representations of results for queries with select, project, join,
aggregate, group-by, and order-by operators:
Very high compression rate
  - Can be exponentially more succinct than the relations they encode.
  - Arbitrarily better than generic compression schemes, e.g., bzip2.
  - Factorized representations whose size meets asymptotically tight bounds are
    computable directly from the input database and query.
Querying in the compressed domain
  - Factorizations are relational expressions and can be composed with queries.
  - We developed the FDB in-memory query engine for this purpose.
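
Both properties can be illustrated with a small Python sketch. It builds a product of n two-element unions, which encodes 2^n tuples with only 2n singletons, and counts the tuples directly on the factorized form without enumerating them. The tuple-based encoding and the helpers `size` and `count` are assumptions made for this illustration, not the FDB engine's API, and `count` assumes that the branches of every union encode disjoint sets of tuples.

```python
def size(expr):
    """Number of singletons in a factorized expression."""
    kind, body = expr
    return 1 if kind == "val" else sum(size(child) for child in body)


def count(expr):
    """Number of encoded tuples, computed without enumerating them.

    Assumes union branches are disjoint, as they are in this example.
    """
    kind, body = expr
    if kind == "val":
        return 1
    child_counts = [count(child) for child in body]
    if kind == "union":
        return sum(child_counts)
    result = 1
    for c in child_counts:  # product: multiply the counts of the factors
        result *= c
    return result


n = 20
# Product of n unions, each over the two singletons 0 and 1.
expr = ("prod", [("union", [("val", 0), ("val", 1)]) for _ in range(n)])

print(size(expr))            # 40 singletons in the factorized form
print(count(expr))           # 1048576 encoded tuples (2**20)
print(n * count(expr))       # 20971520 singletons in the flat form
```

The count is obtained in time linear in the size of the factorization, which is the sense in which an aggregate can be evaluated in the compressed domain.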
Current Focus 
Reduce communication cost in distributed database systems
  - Factorization of temporary query results exchanged between nodes
  - Many systems already employ limited factorizations:
    Google MegaStore and F1, FoundationDB, Microsoft Cloud SQL Server
  - Google Faculty Research Award
Reduce space requirements of large-scale feature vectors in predictive modelling
  - Feature vectors = relations with high cardinality
  - Improvements of 10-100x on LogicBlox client data
Probabilistic Databases
Πgora, ENFrame, SPROUT2
Probabilistic Data is Commonplace
Facts of life:
Real-world data is often uncertain
Current probabilistic databases are on the order of billions of records
Generated from web data by NELL, Google Squared & Knowledge Vault
Curating before processing is a time & money black hole
We would like to query uncertain data ASAP!
MayBMS/SPROUT probabilistic database system 
Open-source, built on top of PostgreSQL 
3000+ downloads (as of Dec 2013) 
The PDB most benchmarked against 
SPROUT2 = SPROUT on Google Squared 
Caught the interest of UK Defence Science and Technology Lab 
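
As a rough illustration of what querying uncertain data means, the sketch below computes the probability of a Boolean query over a tuple-independent relation by enumerating all possible worlds. Everything in it is an assumption made for illustration: the probabilities are invented, the names `uncertain_orders`, `anyone_ordered_hawaii`, and `probability` are hypothetical, and the code does not reflect the MayBMS/SPROUT interface. Brute-force enumeration is exponential in the number of tuples, which is exactly the cost that dedicated systems aim to avoid.

```python
from itertools import product

# Hypothetical tuple-independent relation: each tuple is present with the given
# probability, independently of the others. The probabilities are made up.
uncertain_orders = [
    (("Pietro", "Friday", "Hawaii"), 0.5),
    (("Lucia", "Friday", "Hawaii"), 0.8),
    (("Mario", "Friday", "Capricciosa"), 0.9),
]


def anyone_ordered_hawaii(world):
    """Boolean query: does the world contain an order for a Hawaii pizza?"""
    return any(pizza == "Hawaii" for (_customer, _day, pizza) in world)


def probability(relation, boolean_query):
    """Sum the weights of all possible worlds in which the query holds."""
    total = 0.0
    for choices in product([True, False], repeat=len(relation)):
        weight = 1.0
        world = []
        for (tup, p), present in zip(relation, choices):
            weight *= p if present else 1.0 - p
            if present:
                world.append(tup)
        if boolean_query(world):
            total += weight
    return total


p = probability(uncertain_orders, anyone_ordered_hawaii)
print(round(p, 6))  # 0.9
```

For this query the result agrees with the closed form 1 - (1 - 0.5)(1 - 0.8) = 0.9, since the query fails only when both Hawaii orders are absent.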
A Squared Query Engine for Uncertain Web Data 
Datalog Engines
A New Breed of Smart Database Systems 
Unified and declarative programming model for the enterprise tech stack
Can freely mix transactions, analytics, graph queries, mathematical 
programming and optimization, probabilistic programming 
Makes possible new classes of hybrid applications 
Typical app in retail sector: 
  - 50K Datalog++ LOC (vs. millions of C++ LOC)
  - One system (vs. tens)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Overview of Dan Olteanu's Research presentation

  • 1. Database Systems Research in Dan Olteanu's Group @Oxford. DBOnto Kick-Off, Sept 25, 2014
  • 2. Outline: Factorized Databases; Probabilistic Databases; Datalog Engines
  • 3. Factorized Databases by Example. Orders (customer, day, pizza): (Mario, Monday, Capricciosa), (Mario, Friday, Capricciosa), (Pietro, Friday, Hawaii), (Lucia, Friday, Hawaii). Pizzas (pizza, item): (Capricciosa, base), (Capricciosa, ham), (Capricciosa, mushrooms), (Hawaii, base), (Hawaii, ham), (Hawaii, pineapple). Items (item, price): (base, 6), (ham, 1), (mushrooms, 1), (pineapple, 2). Consider the natural join of the three relations above: Orders ⋈ Pizzas ⋈ Items (customer, day, pizza, item, price): (Mario, Monday, Capricciosa, base, 6), (Mario, Monday, Capricciosa, ham, 1), (Mario, Monday, Capricciosa, mushrooms, 1), (Mario, Friday, Capricciosa, base, 6), (Mario, Friday, Capricciosa, ham, 1), (Mario, Friday, Capricciosa, mushrooms, 1), …
  • 4. Factorized Databases by Example. Orders ⋈ Pizzas ⋈ Items (customer, day, pizza, item, price): (Mario, Monday, Capricciosa, base, 6), (Mario, Monday, Capricciosa, ham, 1), (Mario, Monday, Capricciosa, mushrooms, 1), (Mario, Friday, Capricciosa, base, 6), (Mario, Friday, Capricciosa, ham, 1), (Mario, Friday, Capricciosa, mushrooms, 1), … A flat relational algebra expression encoding the above query result is: ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪ ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪ … It uses relational product (×), union (∪), and singleton relations (e.g., ⟨1⟩). The attribute names are not shown to avoid clutter.
  • 5. Factorized Databases by Example. The previous relational expression entails lots of redundancy due to the joins: ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪ ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Monday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨base⟩ × ⟨6⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨ham⟩ × ⟨1⟩ ∪ ⟨Mario⟩ × ⟨Friday⟩ × ⟨Capricciosa⟩ × ⟨mushrooms⟩ × ⟨1⟩ ∪ … We can factorize the expression following the join structure, e.g.: ⟨Capricciosa⟩ × (⟨Monday⟩ × ⟨Mario⟩ ∪ ⟨Friday⟩ × ⟨Mario⟩) × (⟨base⟩ × ⟨6⟩ ∪ ⟨ham⟩ × ⟨1⟩ ∪ ⟨mushrooms⟩ × ⟨1⟩) ∪ ⟨Hawaii⟩ × ⟨Friday⟩ × (⟨Lucia⟩ ∪ ⟨Pietro⟩) × (⟨base⟩ × ⟨6⟩ ∪ ⟨ham⟩ × ⟨1⟩ ∪ ⟨pineapple⟩ × ⟨2⟩). Attributes, in order: pizza, day, customer, item, price. There are several algebraically equivalent factorized representations defined by distributivity of product over union and commutativity of product and union.
  • 6. Key Properties of Factorized Representations. Factorized representations of results for queries with select, project, join, aggregate, groupby, and orderby operators. Very high compression rate: can be exponentially more succinct than the relations they encode; arbitrarily better than generic compression schemes, e.g., bzip2; factorized representations of asymptotically tight size bounds computable directly from input database and query. Querying in the compressed domain: factorizations are relational expressions and can be composed with queries; we developed the FDB in-memory query engine for this purpose.
  • 7. Current Focus. Reduce communication cost in distributed database systems: factorization of temporary query results exchanged between nodes; many systems already employ limited factorizations (Google MegaStore and F1, FoundationDB, Microsoft Cloud SQL Server); Google Faculty Research Award. Reduce space requirements of large-scale feature vectors in predictive modelling: feature vectors = relations with high cardinality; improvements of 10-100x on LogicBlox client data.
  • 8. Outline: Factorized Databases; Probabilistic Databases (Πgora, ENFrame, SPROUT2); Datalog Engines
  • 9. Probabilistic Data is Commonplace. Facts of life: real-world data is often uncertain; current probabilistic databases are in the order of a billion records (generated from web data by NELL, Google Squared, Knowledge Vault); curating before processing is a time and money black hole. We would like to query uncertain data asap!
  • 10. Probabilistic Data is Commonplace (continued). MayBMS/SPROUT probabilistic database system: open-source, built on top of PostgreSQL; 3000+ downloads (as of Dec 2013); the PDB most benchmarked against; SPROUT2 = SPROUT on Google Squared; caught the interest of the UK Defence Science and Technology Lab.
  • 11. A Squared Query Engine for Uncertain Web Data
  • 12. A Squared Query Engine for Uncertain Web Data (continued)
  • 13. Outline: Factorized Databases; Probabilistic Databases; Datalog Engines
  • 14. A New Breed of Smart Database Systems. Unified and declarative programming model for the enterprise tech stack: can freely mix transactions, analytics, graph queries, mathematical programming and optimization, probabilistic programming; makes possible new classes of hybrid applications. Typical app in retail sector: 50K Datalog++ LOC (vs. millions of C++ LOC); one system (vs. tens).
  • 15. Live Programming in the Database. Flexible spreadsheet backed by scalable full-fledged DBMS. Users can define formulas or change schema: triggers addition/deletion of datalog code on the DB server. Figure labels: program, edbs, execution graph, revised execution graph, idbs, revised idbs; (meta-data), (actual data).
  • 16. Our Approach. Use declarative programming to improve the implementation of declarative systems! Internal library for declarative and incremental maintenance of program state, using a small datalog engine. In the LogicBlox engine since May 2014.
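
The pizza example from the Factorized Databases slides can be checked with a small Python sketch. It is only an illustration under stated assumptions, not the FDB engine: the expression encoding and the helper names (`val`, `prod`, `union`, `tuples`, `size`) are hypothetical and chosen here for readability. The sketch builds the flat and the factorized expressions over the same data, confirms that they encode the same tuples, and compares their sizes measured in singletons.

```python
from itertools import product

# Hypothetical expression encoding (not FDB's): a node is a (kind, body) pair.
def val(v):
    return ("val", v)            # singleton relation, e.g. <Mario>

def prod(*children):
    return ("prod", list(children))   # relational product

def union(*children):
    return ("union", list(children))  # union

def tuples(e):
    """Enumerate the tuples encoded by expression e."""
    kind, body = e
    if kind == "val":
        return [(body,)]
    if kind == "union":
        return [t for child in body for t in tuples(child)]
    # product: concatenate one tuple taken from each child
    return [sum(combo, ()) for combo in product(*(tuples(c) for c in body))]

def size(e):
    """Size of an expression, counted as its number of singletons."""
    kind, body = e
    return 1 if kind == "val" else sum(size(c) for c in body)

# Input relations, copied from the slides.
orders = [("Mario", "Monday", "Capricciosa"), ("Mario", "Friday", "Capricciosa"),
          ("Pietro", "Friday", "Hawaii"), ("Lucia", "Friday", "Hawaii")]
pizzas = [("Capricciosa", "base"), ("Capricciosa", "ham"), ("Capricciosa", "mushrooms"),
          ("Hawaii", "base"), ("Hawaii", "ham"), ("Hawaii", "pineapple")]
items = {"base": 6, "ham": 1, "mushrooms": 1, "pineapple": 2}

# Natural join, with attributes ordered as pizza, day, customer, item, price.
join = [(p, d, c, i, items[i])
        for (c, d, p) in orders
        for (p2, i) in pizzas if p == p2]

# Flat expression: a union of products of singletons, one product per tuple.
flat = union(*(prod(*(val(v) for v in t)) for t in join))

# Factorized expression, following the one on the slides.
factorized = union(
    prod(val("Capricciosa"),
         union(prod(val("Monday"), val("Mario")), prod(val("Friday"), val("Mario"))),
         union(prod(val("base"), val(6)), prod(val("ham"), val(1)),
               prod(val("mushrooms"), val(1)))),
    prod(val("Hawaii"), val("Friday"),
         union(val("Lucia"), val("Pietro")),
         union(prod(val("base"), val(6)), prod(val("ham"), val(1)),
               prod(val("pineapple"), val(2)))),
)

same_result = set(tuples(factorized)) == set(join)
print(same_result, size(flat), size(factorized))
```

On this toy input the flat expression uses 60 singletons (12 tuples of 5 values) and the factorized one uses 21. The gap grows with the data because each pizza's item list is stored once rather than once per order.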