Schema-Independent Scientific
Data Cataloging Framework
Supun Nakandala, Sachith Dhanushka Withana, Dinu
Kumarasiri, Hirantha Jayawardena and H.M.N. Dilum
Bandara
(Department of Computer Science and Engineering, University of
Moratuwa, Sri Lanka)
Srinath Perera
(WSO2 Inc., Colombo, Sri Lanka)
Suresh Marru, Sudhakar Pamidighantam
Indiana University, Bloomington, USA
Scientist
Problem
Scientific Data
✧ Vast Volume
✧ Hard to …
❑ Search
❑ Reuse
❑ Share findings
GridChem Use Case
• Gaussian 09 experiments generate vast amounts
of data in two forms:
o Output file (*.out)
o Checkpoint file (*.chk)
• Goal: provide efficient searching among these data
Why do we need a new one?
Existing Solutions
● Tightly coupled
● Inflexible querying
● Static schemas
● E.g.:
o MCS
o MCAT
o MyLEAD
Our Solution
● Generalizable framework
● Flexible querying
o Wildcard queries
o Full text queries
o Substring queries
o Fielded queries
● Static schema + dynamic fields
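The query types listed above map naturally onto Solr's standard query syntax, and the "dynamic fields" idea maps onto Solr's suffix-based dynamic field patterns. The following is a minimal sketch under stated assumptions, not the framework's actual code: the helper names and the example field names (`molecule_name`, `basis_set_s`, `energy`) are hypothetical, and the type-to-suffix table assumes the `*_s`, `*_i`, `*_d` and `*_b` dynamic-field patterns found in Solr's default schema.

```python
import re

# Characters with special meaning in Solr's standard query parser (plus space).
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/ ])')


def escape(term):
    """Backslash-escape query-parser metacharacters in a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def fielded(field, value):
    """Fielded (exact term) query, e.g. basis_set_s:6\\-31G."""
    return f"{field}:{escape(value)}"


def prefix(field, value):
    """Wildcard query matching values that start with `value`."""
    return f"{field}:{escape(value)}*"


def substring(field, value):
    """Substring query: wildcard on both sides of the term."""
    return f"{field}:*{escape(value)}*"


def full_text(field, phrase):
    """Phrase query against an analyzed text field."""
    quoted = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{quoted}"'


# Assumed mapping from Python types to dynamic-field suffixes.
_SUFFIX = {bool: "_b", int: "_i", float: "_d", str: "_s"}


def dynamic_field(name, value):
    """Pick a field name whose suffix lets Solr infer the type without a schema change."""
    return name + _SUFFIX[type(value)]


if __name__ == "__main__":
    print(fielded("basis_set_s", "6-31G"))
    print(substring("molecule_name", "benz"))
    print(dynamic_field("energy", -76.4))
```

Leading-wildcard queries such as the substring form can be expensive in Solr unless the field is indexed with that access pattern in mind, so a real deployment would likely tune the field type for them.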
High-level Architecture
Folder Structure
What is new in our solution?
• Pluggable metadata extraction logic
• Extensible data product generation monitors
• Use of NoSQL database (Apache Solr)
• Ability to dynamically add metadata fields
Performance Test
• MySQL vs Solr
• Data insert performance
• Query performance
o Exact match queries
o Range queries
o Full text queries
o Prefix match queries
o Suffix match queries
o Wildcard queries
o Substring queries
Solr resolves more complex queries 91% - 99% faster than a
MySQL-based implementation.
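To put that range in perspective, the percentages can be converted into speed-up factors. This is a back-of-the-envelope sketch that assumes "N% faster" means Solr's response time is N% lower than MySQL's for the same query; the slides do not state the definition, and the function name is illustrative.

```python
def speedup_factor(percent_faster):
    """Convert 'N% less time' into 'how many times faster'."""
    if not 0 <= percent_faster < 100:
        raise ValueError("percent_faster must be in [0, 100)")
    return 100.0 / (100.0 - percent_faster)


if __name__ == "__main__":
    for pct in (91, 99):
        print(f"{pct}% less time -> about {speedup_factor(pct):.1f}x faster")
```

Under that assumption, the reported range corresponds to roughly an 11-fold to 100-fold reduction in query time.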
Summary
• What we did: a schema-independent
scientific data catalog with pluggable parser
logic and a Solr backend
• Future work: Airavata integration and
provenance-aware execution
Thank You …