© 2016 Mohammad Sadoghi (Purdue University)
ExpoDB: An Exploratory Data Science Platform
(A New Frontier: From Data Process...

"ExpoDB: An Exploratory Data Science Platform"


Dr. Mohammad Sadoghi, Professor of Computer Science at Purdue University, gave this presentation at the Cognitive Systems Institute Speaker Series on September 29, 2016.


  1. © 2016 Mohammad Sadoghi (Purdue University). ExpoDB: An Exploratory Data Science Platform (A New Frontier: From Data Processing to Knowledge Exploration). Mohammad Sadoghi, Assistant Professor, Department of Computer Science, Purdue University. IBM Cognitive Systems Institute Speaker Series, September 29, 2016.
  2. Insight is Lost in Islands of Data. http://www.cpsresearch.eu/clinical-trials/ http://news.mit.edu/2015/mnookin-vaccination-public-health-0227 http://www.healthcarepackaging.com/trends-and-issues/clinical-trials http://stormercellularloo.gq/evolve-ii-clinical-trial.html https://www.geneticliteracyproject.org Data is spread across many islands of disconnected sources (a lack of a holistic view).
  3. Insight is Lost in Islands of Data. http://www.cpsresearch.eu/clinical-trials/ http://news.mit.edu/2015/mnookin-vaccination-public-health-0227 http://www.healthcarepackaging.com/trends-and-issues/clinical-trials http://stormercellularloo.gq/evolve-ii-clinical-trial.html https://www.geneticliteracyproject.org Sadly, adverse drug reactions (ADRs) are the 4th leading cause of death in the United States, resulting in 100,000 deaths annually.
  4. Insight is Lost in Islands of Data. http://www.cpsresearch.eu/clinical-trials/ http://news.mit.edu/2015/mnookin-vaccination-public-health-0227 http://www.healthcarepackaging.com/trends-and-issues/clinical-trials http://stormercellularloo.gq/evolve-ii-clinical-trial.html https://www.geneticliteracyproject.org Adverse drug reactions cost over $136 billion in the US annually.
  5. Real-time Fusion and Exploration of Data
  6. Real-time Fusion and Exploration of Enriched Data
  7. Real-time Fusion and Exploration of Enriched Data at Web Scale
  8. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: graph over open data. Nodes: PTGS2 (Gene), TP53 (Gene), Rheumatoid Arthritis, Osteosarcoma (Bone Cancer), Naproxen (Aleve), Disease, Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms, Methotrexate, DHFR (Gene), Arthritis, Warfarin, Embolism (Blood Clot), Nicotine, VKORC1 (Gene), CYP2C9 (Enzyme), Chemical, Carboxylic Acids, Heterocyclic, Aminopterin, Phenylpropionates, Approved Drugs. Edge labels: inhibits, increased degradation, limits cell growth, tumor suppressor.] Why capture the semantics/context? Semantics is essential to connect the dots.
  9. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8.] Why capture the semantics/context? Semantics is essential to connect the dots.
  10. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8.] Why capture the semantics/context? Semantics is essential to connect the dots.
  11. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8, with one "?" marker.] Why capture the semantics/context? Semantics is essential to connect the dots.
  12. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8, with three "?" markers.] Why capture the semantics/context? Semantics is essential to connect the dots.
  13. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8.] How to capture the context? (1) Instance Layer: capturing raw data instances, including both structured and semi-structured data.
  14. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8.] How to capture the context? (2) Relation Layer: capturing the interconnectedness of data instances across data sources.
  15. Drug Safety: Challenges of Real-time Fusion & Exploration of Open Data. [Figure: same graph as slide 8.] How to capture the context? (3) Semantic Layer: capturing conceptual relationships among data instances and their types.
  16. Enriched Data Model: Semantics is essential to connect the dots
      [Figure: graph with nodes PTGS2 (Gene), TP53 (Gene), Acetaminophen (Tylenol), Ibuprofen (Advil), Methotrexate, DHFR (Gene), Warfarin, Rheumatoid Arthritis, Arthritis, Osteosarcoma (Bone Cancer), Relief Fever, Embolism (Blood Clot), Immune System, Autoimmune, Joint Diseases, Sarcoma, Neoplasms. Layer labels: Semantic layer, Relation layer, Instance layer; Knowledge, Information, Data.]
      Sources shown: DrugBank (Bioinformatics & Cheminformatics Resource), CTD (Comparative Toxicogenomics Database), UniProt (Universal Protein Resource).
      Drug Name | Drug Targets (Genes) | Symptomatic Treatment
      Ibuprofen | PTGS2 | Rheumatoid Arthritis
      Acetaminophen | PTGS2 | Relief Fever
      Methotrexate | DHFR | Antineoplastic Anti-metabolite
      Warfarin | TP53 | Embolism (Blood Clot)
      Gene | Interaction
      PTGS2 | TP53 (Gene)
      Gene | Function
      TP53 | Tumor Suppressor
      DHFR | Limits Cell Growth
      Gene | Disease
      TP53 | Osteosarcoma
      Warfarin has a narrow therapeutic range (fatal outcomes). Dosage for Asian population: 3.4 mg. Dosage for White population: 5.1 mg. Dosage for African-American population: 6.1 mg.
  17. Context-aware Query Model. Ranking stages: Rank Query Representation; Rank Query Refinement; Rank Data Sources Discovery; Rank Query Composition; Rank Query Answers; Rank Answer Evidence; Rank Answer Representation. Query: “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?” (Yes/No)
  18. Context-aware Query Model. Same ranking stages and query as slide 17. Adds: “Is Warfarin sensitive to ethnic background?”
  19. Context-aware Query Model. Same as slide 18. Adds: “Does Warfarin have a narrow therapeutic range?”
  20. Context-aware Query Model. Same as slide 19. Adds: “What are the disjoint classes of population with respect to Warfarin?”
  21. Context-aware Query Model. Same as slide 20. Adds: “What are the adverse reactions of Warfarin?”
  22. Context-aware Query Model. Same as slide 21. Adds: “What is an effective dosage of Warfarin for preventing blood clots?”
  23. Context-aware Query Model. “Is 5.0 mg an effective dosage of Warfarin for preventing blood clots?” “What are the disjoint classes of population with respect to Warfarin?” “What is an effective dosage of Warfarin for preventing blood clots?” “Does Warfarin have a narrow therapeutic range?”
  24. Context-aware Query Model. Same questions as slide 23. Adds: Dosage for African-American population: 6.1 mg. Dosage for White population: 5.1 mg. Dosage for Asian population: 3.4 mg.
  25. Context-aware Query Model. Same as slide 24. Adds: Querying different sources returns 6.1 mg, 5.1 mg, and 3.4 mg, so is the data inconsistent? (revisiting the consistent answers formalism and possible world semantics)
  26. Context-aware Query Model. Same as slide 25. Adds: Given the known narrow therapeutic range, is 5.1 mg close enough to 5.0 mg? (fuzzy answers formalism in the presence of enriched data)
  27. Spark Architecture: Knowledge Oblivious
      Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics
      APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib
      Processing Engine: Apache Spark (General Data Processing on Distributed Memory)
      Data Model (Immutable Collection of Objects): Spark Data Model (Resilient Distributed Datasets, RDDs)
      Storage: Distributed File Systems (e.g., HDFS, S3, Ceph), Distributed Memory (Tachyon), Compression (Succinct)
      Resource Virtualization: Resource Abstractions (Apache Mesos), Resource Management (Hadoop YARN)
  28. Spark Architecture: Knowledge Oblivious. Same stack as slide 27.
  29. ExpoDB Architecture: From Data to Knowledge
      Applications: Personalized Medicine (Drug Discovery/Safety), Computational Finance, Compliance Informatics
      APIs/Services (Access/Interfaces): Spark Streaming, SparkSQL, BlinkDB, GraphX, SparkR, MLlib
      Processing Engine: Apache Spark (General Data Processing on Distributed Memory)
      Data Model (Enriching Raw Data Towards Knowledge): Instance Layer (Relational, Graph/RDF, Dense/Sparse Matrices, JSON)
      Storage: Distributed File Systems (e.g., HDFS, S3, Ceph), Distributed Memory (Tachyon), Compression (Succinct)
      Resource Virtualization: Resource Abstractions (Apache Mesos), Resource Management (Hadoop YARN)
  30. ExpoDB Architecture: From Data to Knowledge. Same stack as slide 29. Adds to the Data Model: Relation Layer, Intra- & Inter-domain Linkage (fine-grained & instance-level).
  31. ExpoDB Architecture: From Data to Knowledge. Same stack as slide 30. Adds to the Data Model: Semantic Layer (Ontology, Rules, Stochastic Models, Tensor Embedding).
  32. ExpoDB Architecture: From Data to Knowledge. Same stack as slide 31. Adds: Spark Data Model (RDDs), Generic Data Model (Key-Value Store).
  33. ExpoDB Architecture: From Data to Knowledge. Same stack as slide 32, with Online Transactional Processing (OLTP) + Online Analytical Processing (OLAP) shown in place of the Apache Spark label. Adds: Reasoning, Refinement, Curation, Fusion, Discovery.
  34. ExpoDB Architecture: Active Data Path. Same stack as slide 33. Adds: Virtualized Hardware Acceleration (GPU & FPGA).
  35. The First Step! Same stack as slide 34, annotated with: L-Store (Real-time OLTP+OLAP), FQP (Flexible Query Processor), EmbedS (Ontology), Phenomenological Features (Deep-Learning-as-Oracle), PADRES (Event Processing), IBM DB2 BLU (Column Store), SPIDER (Declarative Data Cleansing), Vraph (Vectorized Graph Processing), Tiresias (Predicting Adverse Drug Reaction), fpga-ToPSS (Algorithmic Trading).
  36. Thank You. Q&A. Exploratory Systems Lab (ExpoLab), website: https://msadoghi.github.io/
  37. References:
      Data/Knowledge Exploration:
      • Mohammad Sadoghi, Kavitha Srinivas, Oktie Hassanzadeh, Yuan-Chi Chang, Mustafa Canim, Achille Fokoue, Yishai A. Feldman: Self-Curating Databases. EDBT 2016
      • Amit Chandel, Oktie Hassanzadeh, Nick Koudas, Mohammad Sadoghi, Divesh Srivastava: Benchmarking declarative approximate selection predicates. SIGMOD Conference 2007: 353-364
      • Oktie Hassanzadeh, Mohammad Sadoghi, Renée J. Miller: Accuracy of Approximate String Joins Using Grams. QDB 2007
      Drug Safety:
      • Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang: Predicting Drug-Drug Interactions Through Large-Scale Similarity-Based Link Prediction. ESWC 2016
      • Achille Fokoue, Oktie Hassanzadeh, Mohammad Sadoghi, Ping Zhang: Predicting Drug-Drug Interactions Through Similarity-Based Link Prediction Over Web Data. WWW 2016
      OLTP & OLAP:
      • Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, Mustafa Canim: L-Store: A Real-time OLTP and OLAP System. CoRR abs/1601.04084 (2016)
      • Kaiwen Zhang, Mohammad Sadoghi, Hans-Arno Jacobsen: DL-Store: A Distributed Hybrid OLTP and OLAP Data Processing Engine. ICDCS 2016
      • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J. 25(5): 651-672 (2016)
      • Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, Kenneth A. Ross: Reducing Database Locking Contention Through Multi-version Concurrency. PVLDB 7(13): 1331-1342 (2014)
      • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: CaSSanDra: An SSD boosted key-value store. ICDE 2014: 1162-1167
      • Prashanth Menon, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen: Optimizing key-value stores for hybrid storage architectures. CASCON 2014: 355-358
      • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Making Updates Disk-I/O Friendly Using SSDs. PVLDB 6(11): 997-1008 (2013)
      Hardware Acceleration:
      • Rajesh R. Bordawekar, Mohammad Sadoghi: Accelerating database workloads by software-hardware-system co-design. ICDE 2016
      • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: SplitJoin: A Scalable, Low-latency Stream Join Architecture with Adjustable Ordering Precision. USENIX Annual Technical Conference 2016
      • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record 44(2): 5-10 (2015)
      • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Configurable hardware-based streaming architecture using Online Programmable-Blocks. ICDE 2015
      • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: Flexible Query Processor on FPGAs. PVLDB 6(12): 1310-1313 (2013)
      • Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, Hans-Arno Jacobsen: Multi-query Stream Processing on FPGAs. ICDE 2012: 1229-1232
      • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: Towards highly parallel event processing through reconfigurable hardware. DaMoN 2011: 27-32
      • Mohammad Sadoghi, Harsh Singh, Hans-Arno Jacobsen: fpga-ToPSS: line-speed event processing on fpgas. DEBS 2011: 373-374
      • Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, Harsh Singh: Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading. PVLDB 3(2): 1525-1528 (2010)
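
The three-layer enriched data model on slide 16 can be illustrated with a small Python sketch. This is an illustration only, not ExpoDB code: the rows are the toy rows shown on slide 16, while the edge labels (`targets`, `interacts_with`, `associated_with`) and the function names `build_relation_layer`, `find_path`, and `annotate` are hypothetical names chosen for this example. It shows how linking instances across sources lets a query "connect the dots" from a drug to a disease.

```python
from collections import deque

# Instance layer: toy rows copied from the tables on slide 16.
DRUG_TARGETS = {
    "Ibuprofen": "PTGS2",
    "Acetaminophen": "PTGS2",
    "Methotrexate": "DHFR",
    "Warfarin": "TP53",
}
GENE_INTERACTIONS = [("PTGS2", "TP53")]
GENE_DISEASE = {"TP53": "Osteosarcoma"}

# Semantic layer (greatly simplified): conceptual facts about gene function.
GENE_FUNCTION = {"TP53": "Tumor Suppressor", "DHFR": "Limits Cell Growth"}


def build_relation_layer():
    """Relation layer: link instances across sources as labeled, directed edges.

    The edge labels are assumptions made for this sketch.
    """
    edges = {}

    def add(src, label, dst):
        edges.setdefault(src, []).append((label, dst))

    for drug, gene in DRUG_TARGETS.items():
        add(drug, "targets", gene)
    for gene_a, gene_b in GENE_INTERACTIONS:
        add(gene_a, "interacts_with", gene_b)
    for gene, disease in GENE_DISEASE.items():
        add(gene, "associated_with", disease)
    return edges


def find_path(edges, start, goal):
    """Breadth-first search for a chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _label, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of links connects the two instances


def annotate(path):
    """Attach semantic-layer facts (gene function) to each node on a path."""
    return [(node, GENE_FUNCTION.get(node)) for node in path]


if __name__ == "__main__":
    path = find_path(build_relation_layer(), "Ibuprofen", "Osteosarcoma")
    print(path)
    print(annotate(path))
```

In this toy data, Ibuprofen reaches Osteosarcoma only through the PTGS2 and TP53 links, which come from different tables; no single table contains that connection.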

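
The questions on slides 25 and 26 (are 6.1 mg, 5.1 mg, and 3.4 mg inconsistent, and is 5.1 mg close enough to 5.0 mg?) can be sketched in Python. This is a minimal illustration under stated assumptions, not the formalism from the talk: the dosage values are the ones shown on the slides, while the function names and the `tolerance_mg` value of 0.2 are hypothetical and carry no clinical meaning. The sketch treats the three values as answers conditioned on population context rather than as conflicting facts, and applies a simple tolerance test as a stand-in for a fuzzy answer.

```python
# Dosage values as shown on slides 24-26, keyed by population context.
DOSAGE_MG = {
    "African-American": 6.1,
    "White": 5.1,
    "Asian": 3.4,
}


def is_effective_dosage(dose_mg, population, tolerance_mg=0.2):
    """Fuzzy yes/no answer: is dose_mg within tolerance of the reference dose?

    tolerance_mg is a hypothetical parameter for illustration only; a narrow
    therapeutic range would argue for a small value.
    """
    reference = DOSAGE_MG[population]
    return abs(dose_mg - reference) <= tolerance_mg


def answers_by_population(dose_mg, tolerance_mg=0.2):
    """Answer the same question once per population context.

    Different answers per context are expected here and do not signal
    inconsistent data.
    """
    return {
        population: is_effective_dosage(dose_mg, population, tolerance_mg)
        for population in DOSAGE_MG
    }


if __name__ == "__main__":
    print(answers_by_population(5.0))
```

With these assumed settings, the query "Is 5.0 mg an effective dosage?" has no single yes/no answer; the answer depends on the population context supplied with the query.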