"ExpoDB: An Exploratory Data Science Platform"

© 2016 Mohammad Sadoghi (Purdue University)
ExpoDB: An Exploratory Data Science Platform
(A New Frontier: From Data Processing to Knowledge Exploration)
Mohammad Sadoghi
Assistant Professor
Department of Computer Science
Purdue University
IBM Cognitive Systems Institute Speaker Series
September 29, 2016
Insight is Lost in Islands of Data
http://www.cpsresearch.eu/clinical-trials/
http://news.mit.edu/2015/mnookin-vaccination-public-health-0227
http://www.healthcarepackaging.com/trends-and-issues/clinical-trials
http://stormercellularloo.gq/evolve-ii-clinical-trial.html
https://www.geneticliteracyproject.org
Data is spread across many islands of disconnected sources
(a lack of a holistic view)
Sadly, adverse drug reactions (ADRs) are the 4th leading cause of
death in the United States, resulting in 100,000 deaths annually
Adverse drug reactions cost over $136 billion in the US annually
Real-time Fusion and Exploration of Data
Real-time Fusion and Exploration of Enriched Data
Real-time Fusion and Exploration of Enriched Data at Web Scale
Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data
[Figure: knowledge graph over drugs, genes, diseases, and chemical classes.
Drugs and chemicals: Naproxen (Aleve), Methotrexate, Warfarin, Nicotine.
Genes and enzymes: PTGS2 (Gene), TP53 (Gene), DHFR (Gene), VKORC1 (Gene), CYP2C9 (Enzyme).
Conditions: Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot).
Disease classes: Disease, Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms.
Chemical classes: Chemical, Carboxylic Acids, Heterocyclic, Aminopterin, Phenylpropionates, Approved Drugs.
Edge labels: inhibits, increased degradation, limits cell growth, tumor suppressor.]
Why capture the semantics/context?
Semantics are essential to connect the dots.
How to capture the context?
(1) Instance Layer: Capturing raw data instances, including both structured & semi-structured data
(2) Relation Layer: Capturing the interconnectedness of data instances across data sources
(3) Semantic Layer: Capturing conceptual relationships among data instances and their types
Enriched Data Model: Semantic is essential to connect the dots
[Figure: graph organized into a semantic layer, a relation layer, and an instance layer (also labeled Information, Knowledge, Data).
Drugs: Acetaminophen (Tylenol), Ibuprofen (Advil), Methotrexate, Warfarin.
Genes: PTGS2, TP53, DHFR.
Conditions: Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot), Relief Fever.
Disease classes: Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms.]

Drug Name       Drug Targets (Genes)   Symptomatic Treatment
Ibuprofen       PTGS2                  Rheumatoid Arthritis
Acetaminophen   PTGS2                  Relief Fever
Methotrexate    DHFR                   Antineoplastic Anti-metabolite
Warfarin        TP53                   Embolism (Blood Clot)

Gene    Interaction
PTGS2   TP53 (Gene)

Gene    Function
TP53    Tumor Suppressor
DHFR    Limits Cell Growth

Gene    Disease
TP53    Osteosarcoma

Data sources:
DrugBank: Bioinformatics & Cheminformatics Resource
CTD: Comparative Toxicogenomics Database
UniProt: Universal Protein Resource

Warfarin has a narrow therapeutic range (fatal outcomes)
Dosage for Asian population: 3.4 mg
Dosage for White population: 5.1 mg
Dosage for African-American population: 6.1 mg
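The way the three layers "connect the dots" can be pictured with a small sketch. This is only an illustration, not ExpoDB code: the rows are copied from the tables on this slide, while the edge labels (`targets`, `interacts_with`, `associated_with`), the `node_types` map, and the function names are hypothetical choices made for the example.

```python
from collections import defaultdict

# Instance layer: raw rows as they appear in the separate source tables.
drug_targets = [
    ("Ibuprofen", "PTGS2"),
    ("Acetaminophen", "PTGS2"),
    ("Methotrexate", "DHFR"),
    ("Warfarin", "TP53"),
]
gene_interactions = [("PTGS2", "TP53")]
gene_diseases = [("TP53", "Osteosarcoma")]

# Semantic layer (assumed encoding): the conceptual type of each instance.
node_types = {
    "Ibuprofen": "Drug", "Acetaminophen": "Drug",
    "Methotrexate": "Drug", "Warfarin": "Drug",
    "PTGS2": "Gene", "TP53": "Gene", "DHFR": "Gene",
    "Osteosarcoma": "Disease",
}


def build_edges():
    """Relation layer: link instances across sources with labeled edges."""
    edges = defaultdict(list)
    for drug, gene in drug_targets:
        edges[drug].append(("targets", gene))
    for gene_a, gene_b in gene_interactions:
        edges[gene_a].append(("interacts_with", gene_b))
    for gene, disease in gene_diseases:
        edges[gene].append(("associated_with", disease))
    return edges


def paths_to_disease(edges, start, max_hops=3):
    """Return every edge path from `start` to a Disease node within max_hops."""
    results = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if path and node_types.get(node) == "Disease":
            results.append(path)
            continue
        if len(path) >= max_hops:
            continue
        for label, nxt in edges.get(node, []):
            stack.append((nxt, path + [(node, label, nxt)]))
    return results


if __name__ == "__main__":
    for step in paths_to_disease(build_edges(), "Ibuprofen")[0]:
        print(step)
```

No single table relates Ibuprofen to Osteosarcoma; the path only appears once rows from the three sources are linked and typed, which is the point the slide makes.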
Context-aware Query Model
Ranking stages:
• Query representation ranking
• Query refinement ranking
• Data source discovery ranking
• Query composition ranking
• Query answer ranking
• Answer evidence ranking
• Answer representation ranking

Query: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?” (Yes/No)

Related queries:
• “Is Warfarin sensitive to ethnic background?”
• “Does Warfarin have a narrow therapeutic range?”
• “What are the disjoint classes of population with respect to Warfarin?”
• “What are the adverse reactions of Warfarin?”
• “What is an effective dosage of Warfarin for preventing blood clots?”
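To make the idea of ranking candidate queries concrete, here is a minimal sketch. The slides do not say how ExpoDB scores candidates, so the scoring used here (Jaccard overlap of word sets) is purely a stand-in, and the function names `tokens`, `jaccard`, and `rank` are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set; keeps digits and dots so '5.0' stays one token."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def jaccard(a, b):
    """Stand-in relevance score: shared tokens divided by all tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank(query, candidates):
    """Order candidate queries from most to least similar to the query."""
    return sorted(candidates, key=lambda c: jaccard(query, c), reverse=True)


QUERY = "Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?"
CANDIDATES = [
    "Is Warfarin sensitive to ethnic background?",
    "Does Warfarin have a narrow therapeutic range?",
    "What are the disjoint classes of population with respect to Warfarin?",
    "What are the adverse reactions of Warfarin?",
    "What is an effective dosage of Warfarin for preventing blood clots?",
]

if __name__ == "__main__":
    for candidate in rank(QUERY, CANDIDATES):
        print(f"{jaccard(QUERY, candidate):.2f}  {candidate}")
```

A real system would presumably use the enriched data model rather than surface word overlap; the sketch only shows the shape of a ranking step that could be repeated at each stage listed above.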
Context-aware Query Model
Query: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?”

Related queries:
• “What are the disjoint classes of population with respect to Warfarin?”
• “What is an effective dosage of Warfarin for preventing blood clots?”
• “Does Warfarin have a narrow therapeutic range?”

Dosage by population:
• African-American population: 6.1 mg
• White population: 5.1 mg
• Asian population: 3.4 mg

Open questions:
• Querying different sources returns 6.1 mg, 5.1 mg, & 3.4 mg, so is the data inconsistent? (revisiting the consistent answers formalism & possible world semantics)
• Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (fuzzy answers formalism in the presence of enriched data)
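Both questions can be illustrated with a toy sketch. It is not the consistent-answers formalism or possible-world semantics named on the slide, only a much simpler stand-in: values count as conflicting only when the same context reports more than one value, and closeness is a linear score. The 0.5 mg tolerance is an arbitrary assumption for the example and carries no clinical meaning; all function names are hypothetical.

```python
# Population-specific dosages in mg, as listed on the slide.
dosages_mg = {
    "African-American": 6.1,
    "White": 5.1,
    "Asian": 3.4,
}


def is_consistent(values_by_context):
    """Values conflict only if a single context reports more than one value."""
    return all(len(set(vals)) == 1 for vals in values_by_context.values())


def fuzzy_match(candidate_mg, reference_mg, tolerance_mg):
    """Score in [0, 1]: 1.0 for an exact match, falling linearly to 0.0
    once the gap reaches tolerance_mg."""
    gap = abs(candidate_mg - reference_mg)
    return max(0.0, 1.0 - gap / tolerance_mg)


def answer(candidate_mg, tolerance_mg=0.5):
    """Per-population closeness of a candidate dosage (assumed tolerance)."""
    return {
        population: round(fuzzy_match(candidate_mg, ref, tolerance_mg), 2)
        for population, ref in dosages_mg.items()
    }


# Without context, the three values look like a conflict about one drug.
flat = {"Warfarin": list(dosages_mg.values())}
# With the population captured as context, each key holds a single value.
contextual = {("Warfarin", pop): [mg] for pop, mg in dosages_mg.items()}

if __name__ == "__main__":
    print(is_consistent(flat), is_consistent(contextual))
    print(answer(5.0))
```

The same three numbers read as inconsistent in the flat view and as consistent once the population is part of the key, and the yes/no question about 5.0 mg becomes a graded answer per population.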
Spark Architecture: Knowledge Oblivious
Applications: Personalized Medicine (Drug Discovery/Safety); Computational Finance; Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming; SparkSQL; BlinkDB; GraphX; SparkR; MLlib
Processing Engine: Apache Spark (General Data Processing on Distributed Memory)
Data Model (Immutable Collection of Objects): Spark Data Model (Resilient Distributed Datasets, RDDs)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN)
ExpoDB Architecture: From Data to Knowledge
Applications: Personalized Medicine (Drug Discovery/Safety); Computational Finance; Compliance Informatics
APIs/Services (Access/Interfaces): Spark Streaming; SparkSQL; BlinkDB; GraphX; SparkR; MLlib; Reasoning; Refinement; Curation; Fusion; Discovery
Processing Engine: Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP)
Data Model (Enriching Raw Data Towards Knowledge):
  Semantic Layer: Ontology; Rules; Stochastic Models; Tensor Embedding
  Relation Layer: Intra- & Inter-domain Linkage (fine-grained & instance-level)
  Instance Layer: Relational; Graph/RDF; Dense/Sparse Matrices; JSON
  Spark Data Model (RDDs); Generic Data Model (Key-Value Store)
Storage: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct)
Resource Virtualization: Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN)
ExpoDB Architecture: Active Data Path
[Same architecture stack as the preceding slide, with one added component: Virtualized Hardware Acceleration (GPU & FPGA).]
The First Step!
[Same architecture stack, including Virtualized Hardware Acceleration (GPU & FPGA), annotated with the following systems and projects:]
• L-Store (Real-time OLTP+OLAP)
• FQP (Flexible Query Processor)
• EmbedS (Ontology)
• Phenomenological Features (Deep-Learning-as-Oracle)
• PADRES (Event Processing)
• IBM DB2 BLU (Column Store)
• SPIDER (Declarative Data Cleansing)
• Vraph (Vectorized Graph Processing)
• Tiresias (Predicting Adverse Drug Reaction)
• fpga-ToPSS (Algorithmic Trading)
Thank You
Q&A
Exploratory Systems Lab (ExpoLab)
website: https://msadoghi.github.io/
1 of 37

Recommended

Parquet Strata/Hadoop World, New York 2013 by
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
133K views27 slides
Biological Foundations for Deep Learning: Towards Decision Networks by
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networksdiannepatricia
753 views25 slides
Developing Cognitive Systems to Support Team Cognition by
Developing Cognitive Systems to Support Team CognitionDeveloping Cognitive Systems to Support Team Cognition
Developing Cognitive Systems to Support Team Cognitiondiannepatricia
828 views21 slides
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) by
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Michel Dumontier
2.1K views62 slides
Big Data, AI, and Pharma by
Big Data, AI, and PharmaBig Data, AI, and Pharma
Big Data, AI, and PharmaAmit Sheth
472 views36 slides
Predicting Drug Candidates Safety : the Role and Usage of Knowledge Bases by
Predicting Drug Candidates Safety : the Role and Usage of Knowledge BasesPredicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
Predicting Drug Candidates Safety : the Role and Usage of Knowledge BasesAureus Sciences
437 views32 slides

More Related Content

Similar to "ExpoDB: An Exploratory Data Science Platform"

Connecting antimalarial data by
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial dataChris Southan
394 views1 slide
Data Integration vs Transparency: Tackling the tension by
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionPaul Groth
664 views56 slides
Generating Biomedical Hypotheses Using Semantic Web Technologies by
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesMichel Dumontier
1.8K views46 slides
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology... by
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...Warren Kibbe
1.3K views65 slides
PBSS sf 10-28-2016 flyer by
PBSS sf 10-28-2016 flyerPBSS sf 10-28-2016 flyer
PBSS sf 10-28-2016 flyerVinita Gupta
71 views1 slide
Health 2.0 for UK SpRs by
Health 2.0 for UK SpRsHealth 2.0 for UK SpRs
Health 2.0 for UK SpRsColin Mitchell
444 views49 slides

Similar to "ExpoDB: An Exploratory Data Science Platform"(20)

Connecting antimalarial data by Chris Southan
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial data
Chris Southan394 views
Data Integration vs Transparency: Tackling the tension by Paul Groth
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tension
Paul Groth664 views
Generating Biomedical Hypotheses Using Semantic Web Technologies by Michel Dumontier
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web Technologies
Michel Dumontier1.8K views
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology... by Warren Kibbe
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...
Precision Oncology - using Genomics, Proteomics and Imaging to inform biology...
Warren Kibbe1.3K views
PBSS sf 10-28-2016 flyer by Vinita Gupta
PBSS sf 10-28-2016 flyerPBSS sf 10-28-2016 flyer
PBSS sf 10-28-2016 flyer
Vinita Gupta71 views
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in... by ExternalEvents
GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...GenomeTrakr: Whole-Genome Sequencing for Food Safety and A New Way Forward in...
"ExpoDB: An Exploratory Data Science Platform"

  • 1. © 2016 Mohammad Sadoghi (Purdue University). ExpoDB: An Exploratory Data Science Platform (A New Frontier: From Data Processing to Knowledge Exploration). Mohammad Sadoghi, Assistant Professor, Department of Computer Science, Purdue University. IBM Cognitive Systems Institute Speaker Series, September 29, 2016.
  • 2. Insight is Lost in Islands of Data. http://www.cpsresearch.eu/clinical-trials/ http://news.mit.edu/2015/mnookin-vaccination-public-health-0227 http://www.healthcarepackaging.com/trends-and-issues/clinical-trials http://stormercellularloo.gq/evolve-ii-clinical-trial.html https://www.geneticliteracyproject.org Data is spread across many islands of disconnected sources (a lack of a holistic view).
  • 3. Insight is Lost in Islands of Data. Sadly, adverse drug reactions (ADRs) are the 4th leading cause of death in the United States, resulting in 100,000 deaths annually.
  • 4. Insight is Lost in Islands of Data. Adverse drug reactions cost over $136 billion in the US annually.
  • 5. Real-time Fusion and Exploration of Data
  • 6. Real-time Fusion and Exploration of Enriched Data
  • 7. Real-time Fusion and Exploration of Enriched Data at Web Scale
  • 8-12. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. Why capture the semantics/context? Semantics are essential to connect the dots. [Figure: a knowledge graph built up over five slides. Node labels: PTGS2 (Gene), TP53 (Gene), DHFR (Gene), VKORC1 (Gene), CYP2C9 (Enzyme), Naproxen (Aleve), Methotrexate, Warfarin, Nicotine, Aminopterin, Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot), Disease, Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms, Chemical, Carboxylic Acids, Heterocyclic, Phenylpropionates, Approved Drugs. Edge labels: inhibits, increased degradation, limits cell growth, tumor suppressor. Slides 11 and 12 add "?" markers to the graph.]
  • 13-15. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. How to capture the context? (Same graph as slides 8-12, with one layer introduced per slide.)
      (1) Instance Layer: capturing raw data instances, including both structured and semi-structured data.
      (2) Relation Layer: capturing the interconnectedness of data instances across data sources.
      (3) Semantic Layer: capturing conceptual relationships among data instances and their types.
  • 16. Enriched Data Model: Semantics are essential to connect the dots.
      Layer labels: Semantic layer, Relation layer, Instance layer. Side labels: Information, Knowledge, Data.
      Sources: DrugBank (Bioinformatics & Cheminformatics Resource); CTD (Comparative Toxicogenomics Database); Uniprot (Universal Protein Resource).
      Drug table (Drug Name, Drug Targets (Genes), Symptomatic Treatment): Ibuprofen, PTGS2, Rheumatoid Arthritis; Acetaminophen, PTGS2, Relief Fever; Methotrexate, DHFR, Antineoplastic Anti-metabolite; Warfarin, TP53, Embolism (Blood Clot).
      Gene Interaction table: PTGS2, TP53 (Gene).
      Gene Function table: TP53, Tumor Suppressor; DHFR, Limits Cell Growth.
      Gene Disease table: TP53, Osteosarcoma.
      Graph node labels: PTGS2 (Gene), TP53 (Gene), DHFR (Gene), Acetaminophen (Tylenol), Ibuprofen (Advil), Methotrexate, Warfarin, Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Embolism (Blood Clot), Relief Fever, Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms.
      Warfarin has a narrow therapeutic range (fatal outcomes). Dosage for the Asian population: 3.4 mg. Dosage for the White population: 5.1 mg. Dosage for the African-American population: 6.1 mg.
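To make the three layers concrete, here is a minimal Python sketch (not ExpoDB code) that stores a subset of the rows listed on slide 16 and walks the relation layer to "connect the dots" between a drug and a disease. The predicate names (`targets`, `interacts_with`, `associated_with`), the `find_path` and `broader_concepts` helpers, and the `IS_A` hierarchy are all assumptions made for illustration; the slides do not specify how the layers are encoded.

```python
from collections import deque

# Instance layer: raw rows as they might arrive from separate sources
# (a subset of the slide's drug table; field names are hypothetical).
INSTANCES = [
    {"source": "DrugBank", "drug": "Ibuprofen", "target": "PTGS2"},
    {"source": "DrugBank", "drug": "Acetaminophen", "target": "PTGS2"},
    {"source": "DrugBank", "drug": "Methotrexate", "target": "DHFR"},
]

# Relation layer: (subject, predicate, object) links across sources.
# Predicate names are illustrative labels, not ExpoDB's vocabulary.
RELATIONS = [(row["drug"], "targets", row["target"]) for row in INSTANCES] + [
    ("PTGS2", "interacts_with", "TP53"),
    ("TP53", "associated_with", "Osteosarcoma"),
]

# Semantic layer: concept -> broader concept (an assumed hierarchy).
IS_A = {"Osteosarcoma": "Sarcoma", "Sarcoma": "Neoplasms"}


def find_path(start, goal):
    """Breadth-first search over relation-layer links; returns a node path or None."""
    graph = {}
    for subject, _predicate, obj in RELATIONS:
        graph.setdefault(subject, []).append(obj)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def broader_concepts(concept):
    """Follow semantic-layer is-a links upward from a concept."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


if __name__ == "__main__":
    print(find_path("Ibuprofen", "Osteosarcoma"))
    print(broader_concepts("Osteosarcoma"))
```

No single row links Ibuprofen to Osteosarcoma; the connection only appears once the drug-target, gene-interaction, and gene-disease rows are joined, which is the point the slide makes about fusing disconnected sources.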
  • 17-22. Context-aware Query Model. Each pipeline stage has its own ranking: Query Representation, Query Refinement, Data Source Discovery, Query Composition, Query Answers, Answer Evidence, Answer Representation. The starting question is “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?” (Yes/No). Slides 18-22 add one further question each: “Is Warfarin sensitive to ethnic background?”; “Does Warfarin have a narrow therapeutic range?”; “What are the disjoint classes of population with respect to Warfarin?”; “What are the adverse reactions of Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clot?”
  • 23-26. Context-aware Query Model. Questions: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clot?”; “What are the disjoint classes of population with respect to Warfarin?”; “What is an effective dosage of Warfarin for preventing blood clot?”; “Does Warfarin have a narrow therapeutic range?”
      Slide 24 adds the dosages: African-American population: 6.1 mg; White population: 5.1 mg; Asian population: 3.4 mg.
      Slide 25 adds: Querying different sources returns 6.1 mg, 5.1 mg, and 3.4 mg, so is the data inconsistent? (Revisiting the consistent-answers formalism and possible-world semantics.)
      Slide 26 adds: Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (Fuzzy-answers formalism in the presence of enriched data.)
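The two questions raised on slides 25 and 26 can be illustrated with a small Python sketch. It treats the three dosages as answers that each hold in a different context (population) rather than as conflicting values, and it answers the yes/no question with a tolerance check. The function name `is_effective` and the `tolerance_mg` value of 0.2 are hypothetical choices made only for this example; the slides give no numeric threshold, and this is not the formalism the talk proposes.

```python
# Population-specific Warfarin dosages as listed on the slides (mg).
DOSAGE_MG = {"African-American": 6.1, "White": 5.1, "Asian": 3.4}


def is_effective(dose_mg, population=None, tolerance_mg=0.2):
    """Fuzzy yes/no answer: is dose_mg within tolerance of the reference dosage?

    With a population given, return a single bool for that context.
    Without one, return a verdict per population, so the differing
    dosages show up as context-dependent answers instead of a conflict.
    tolerance_mg is an illustrative assumption, not a clinical value.
    """
    if population is not None:
        return abs(dose_mg - DOSAGE_MG[population]) <= tolerance_mg
    return {
        name: abs(dose_mg - reference) <= tolerance_mg
        for name, reference in DOSAGE_MG.items()
    }


if __name__ == "__main__":
    print(is_effective(5.0, "White"))
    print(is_effective(5.0))
```

Under these assumptions the answer to “Is 5.0 mg an effective dosage?” is yes for one population and no for the others, which shows why the query model needs the context before it can return a single Yes/No.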
  • 27-28. Spark Architecture: Knowledge Oblivious.
      Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics.
      APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib.
      Processing Engine: Apache Spark (General Data Processing on Distributed Memory).
      Data Model (Immutable Collection of Objects): Spark Data Model (Resilient Distributed Datasets, RDDs).
      Storage: Distributed File Systems (e.g., HDFS, S3, Ceph), Distributed Memory (Tachyon), Compression (Succinct).
      Resource Virtualization: Resource Abstractions (Apache Mesos), Resource Management (Hadoop YARN).
  • 29-33. ExpoDB Architecture: From Data to Knowledge. The stack keeps the Spark slide's Applications, APIs/Services, Storage, and Resource Virtualization entries and fills in the Data Model (Enriching Raw Data Towards Knowledge) one slide at a time:
      Slide 29, Instance Layer: Relational, Graph/RDF, Dense/Sparse Matrices, JSON.
      Slide 30, Relation Layer: Intra- & Inter-domain Linkage (fine-grained & instance-level).
      Slide 31, Semantic Layer: Ontology, Rules, Stochastic Models, Tensor Embedding.
      Slide 32 adds: Spark Data Model (RDDs), Generic Data Model (Key-Value Store).
      Slide 33 adds the labels Reasoning, Refinement, Curation, Fusion, Discovery, and lists Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP) where slides 29-32 listed Apache Spark (General Data Processing on Distributed Memory).
  • 34. ExpoDB Architecture: Active Data Path. Same stack as slide 33, with Virtualized Hardware Acceleration (GPU & FPGA) added.
  • 35. The First Step!
      Applications: Personalized Medicine (Drug Discovery/Safety); Computational Finance; Compliance Informatics
      APIs/Services (Access/Interfaces): Spark Streaming; SparkSQL; BlinkDB; GraphX; SparkR; MLlib
      Reasoning; Refinement; Curation; Fusion; Discovery
      Processing Engine: Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP)
      Data Model (Enriching Raw Data Towards Knowledge):
        Semantic Layer: Ontology; Rules; Stochastic Models; Tensor Embedding
        Relation Layer: Intra- & Inter-domain Linkage (fine-grained & instance-level)
        Instance Layer: Relational; Graph/RDF; Dense/Sparse Matrices; JSON
        Spark Data Model (RDDs); Generic Data Model (Key-Value Store)
      Storage: Distributed File Systems (e.g., HDFS, S3, Ceph); Distributed Memory (Tachyon); Compression (Succinct)
      Resource Virtualization: Resource Abstractions (Apache Mesos); Resource Management (Hadoop YARN)
      Virtualized Hardware Acceleration (GPU & FPGA)
      Systems: L-Store (Real-time OLTP+OLAP); FQP (Flexible Query Processor); EmbedS (Ontology); Phenomenological Features (Deep-Learning-as-Oracle); PADRES (Event Processing); IBM DB2 BLU (Column Store); SPIDER (Declarative Data Cleansing); Vraph (Vectorized Graph Processing); Tiresias (Predicting Adverse Drug Reaction); fpga-ToPSS (Algorithmic Trading)
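The three data-model layers named on the architecture slides (instance, relation, semantic) sitting on a generic key-value store can be illustrated with a small Python sketch. This is a toy under stated assumptions, not ExpoDB's implementation: the class `LayeredStore`, its methods, the record keys, and the `same_as` relation are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (source key, relation name, target key)


@dataclass
class LayeredStore:
    """Hypothetical toy store with one container per data-model layer."""
    # Instance layer: raw records from any source, addressed by key.
    instances: Dict[str, dict] = field(default_factory=dict)
    # Relation layer: instance-level links within or across sources.
    links: Set[Link] = field(default_factory=set)
    # Semantic layer: rules that derive new links from existing ones.
    rules: List[Callable[["LayeredStore"], Set[Link]]] = field(default_factory=list)

    def put(self, key: str, record: dict) -> None:
        self.instances[key] = record

    def link(self, a: str, relation: str, b: str) -> None:
        self.links.add((a, relation, b))

    def add_rule(self, rule: Callable[["LayeredStore"], Set[Link]]) -> None:
        self.rules.append(rule)

    def infer(self) -> Set[Link]:
        """Apply all rules repeatedly until no new link is produced."""
        changed = True
        while changed:
            changed = False
            for rule in self.rules:
                new_links = rule(self) - self.links
                if new_links:
                    self.links |= new_links
                    changed = True
        return self.links


def symmetric_same_as(store: LayeredStore) -> Set[Link]:
    """Assumed example rule: 'same_as' holds in both directions."""
    return {(b, "same_as", a) for (a, rel, b) in store.links if rel == "same_as"}


store = LayeredStore()
# Two records describing the same entity in two disconnected sources.
store.put("src_a:item/1", {"name": "Item One"})
store.put("src_b:item/9", {"label": "item one"})
# Relation layer: an inter-source, instance-level link.
store.link("src_a:item/1", "same_as", "src_b:item/9")
# Semantic layer: a rule that enriches the links.
store.add_rule(symmetric_same_as)
store.infer()
```

After `store.infer()` the relation layer also contains the reverse `same_as` link, which is the kind of enrichment from raw records towards knowledge that the Data Model row of the slides describes. A real system would replace the in-memory dictionaries with a distributed key-value store and the naive fixpoint loop with a proper reasoning engine.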
  • 36. Thank You. Q&A. Exploratory Systems Lab (ExpoLab) website: https://msadoghi.github.io/
  • 37. References:
      Data/Knowledge Exploration:
        • Mohammad Sadoghi, Kavitha Srinivas, Oktie Hassanzadeh, Yuan-Chi Chang, Mustafa Canim, Achille Fokoue, Yishai A. Feldman: Self-Curating Databases. EDBT 2016
        • Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Mohammad Sadoghi, Divesh Srivastava: Benchmarking declarative approximate selection predicates. SIGMOD Conference 2007: 353-364
        • Oktie Hassanzadeh, Mohammad Sadoghi, Renée J. Miller: Accuracy of Approximate String Joins Using Grams. QDB 2007
      Drug Safety:
        • Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang: Predicting Drug-Drug Interactions Through Large-Scale Similarity-Based Link Prediction. ESWC 2016
        • Achille Fokoue, Oktie Hassanzadeh, Mohammad Sadoghi, Ping Zhang: Predicting Drug-Drug Interactions Through Similarity-Based Link Prediction Over Web Data. WWW 2016
      OLTP & OLAP:
        • Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, Mustafa Canim: L-Store: A Real-time OLTP and OLAP System. CoRR abs/1601.04084 (2016)
        • Kaiwen Zhang, Mohammad Sadoghi, Hans-Arno Jacobsen: DL-Store: A Distributed Hybrid OLTP and OLAP Data Processing Engine. ICDCS 2016
        • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J. 25(5): 651-672 (2016)
        • Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, Kenneth A. Ross: Reducing Database Locking Contention Through Multi-version Concurrency. PVLDB 7(13): 1331-1342 (2014)
        • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: CaSSanDra: An SSD boosted key-value store. ICDE 2014: 1162-1167
        • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: Optimizing key-value stores for hybrid storage architectures. CASCON 2014: 355-358
        • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Making Updates Disk-I/O Friendly Using SSDs. PVLDB 6(11): 997-1008 (2013)
      Hardware Acceleration:
        • Rajesh R. Bordawekar, Mohammad Sadoghi: Accelerating database workloads by software-hardware-system co-design. ICDE 2016
        • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. USENIX Annual Technical Conference 2016
        • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record 44(2): 5-10 (2015)
        • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Configurable hardware-based streaming architecture using Online Programmable-Blocks. ICDE 2015
        • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Flexible Query Processor on FPGAs. PVLDB 6(12): 1310-1313 (2013)
        • Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, Hans-Arno Jacobsen: Multi-query Stream Processing on FPGAs. ICDE 2012: 1229-1232
        • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: Towards highly parallel event processing through reconfigurable hardware. DaMoN 2011: 27-32
        • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: fpga-ToPSS: line-speed event processing on fpgas. DEBS 2011: 373-374
        • Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, Harsh Singh: Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. PVLDB 3(2): 1525-1528 (2010)