1. SciQL
A Query Language for Unified Scientific Data Processing and
Management
Javad Chamanara
University of Jena, Germany
javad.chamanara@uni-jena.de
At:
CIKM 2012, Maui, HI, USA
Nov. 2, 2012
2. What is scientific data?
November 2, 2012 javad.chamanara@uni-jena.de 2
SciQL: A Query Language for Unified Scientific Data Processing and Management, 5th Ph.D. Workshop (PIKM) at CIKM 2012, Maui, HI, USA
3. What is available?
4. What is proposed here?
5. What does it provide?
6. A Sample
Define Perspective p1 As
{
Attribute Temp_Fahrenheit MapTo Function(1.8 * Temp_Celsius + 32)
Attribute SN_mg MapTo Function(SN_g * 1000)
Attribute Year MapTo Function(Year(Timestamp)) DataType=Integer
}
Connection d Adapter=Spreadsheet Source_URI="c:\data\data1.xls"
Bind Perspective=p1 Connection=d Version=Latest As pdLatest
Var pdAll = Select From pdLatest
Draw Data=pdLatest GraphType=Scatter V-Axis=SN_mg H-Axis=Temp_Fahrenheit
Var pdGrouped = Select Average(Temp_Fahrenheit) As Avg From pdLatest Group By Year
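The sample's semantics can be approximated in ordinary Python, which may help readers see what the perspective and the final grouped Select compute. This is a minimal sketch under stated assumptions, not SciQL's implementation: rows are assumed to arrive as dictionaries with the source fields Temp_Celsius, SN_g and Timestamp, and the helper names apply_perspective and average_by are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical Python rendering of Perspective p1: attribute name -> function of a source row.
# The source fields (Temp_Celsius, SN_g, Timestamp) follow the sample; the row layout is assumed.
P1 = {
    "Temp_Fahrenheit": lambda r: 1.8 * r["Temp_Celsius"] + 32,
    "SN_mg": lambda r: r["SN_g"] * 1000,
    "Year": lambda r: int(r["Timestamp"].year),  # DataType=Integer
}


def apply_perspective(perspective, rows):
    """Expose each source row only through the perspective's derived attributes."""
    return [{name: fn(row) for name, fn in perspective.items()} for row in rows]


def average_by(rows, value, key):
    """Rough equivalent of: Select Average(value) As Avg From rows Group By key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: {"Avg": mean(v)} for k, v in sorted(groups.items())}


# Illustrative in-memory rows standing in for the spreadsheet behind connection d.
source = [
    {"Temp_Celsius": 10.0, "SN_g": 0.002, "Timestamp": datetime(2001, 5, 1)},
    {"Temp_Celsius": 20.0, "SN_g": 0.004, "Timestamp": datetime(2001, 6, 1)},
    {"Temp_Celsius": 0.0, "SN_g": 0.001, "Timestamp": datetime(2002, 1, 1)},
]
pd_latest = apply_perspective(P1, source)
result = average_by(pd_latest, "Temp_Fahrenheit", "Year")
print(result)  # {2001: {'Avg': 59.0}, 2002: {'Avg': 32.0}}
```

Keeping the perspective as a table of functions means the stored data is never rewritten; derived attributes are computed on the way out.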
7. How does it work?
Var x = Select Average(Temp_Fahrenheit) As Avg From pdLatest Where Year > 2001 Group By Year
8. How does it work? (AST)
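[Figure: AST of the statement above. The root "=" has two children, VAR DEF (Var x) and Select; Select has four children: Project (Avg), Fetch (pdLatest), Filter (> with operands Year and 2001) and Aggregate (Group, on Year).]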
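The tree can also be written down as plain data. A minimal sketch, assuming one generic node type whose kind is the operator label from the figure ("=", VarDef, Select, Project, Fetch, Filter, Aggregate); the Node class and the build_example_ast and walk helpers are hypothetical, not SciQL's parser output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                                              # operator label, e.g. "Filter"
    args: List[str] = field(default_factory=list)          # leaf values, e.g. "Year", "2001"
    children: List["Node"] = field(default_factory=list)


def build_example_ast() -> Node:
    """AST for: Var x = Select Average(Temp_Fahrenheit) As Avg From pdLatest
    Where Year > 2001 Group By Year"""
    select = Node("Select", children=[
        Node("Project", ["Average(Temp_Fahrenheit) As Avg"]),
        Node("Fetch", ["pdLatest"]),
        Node("Filter", [">", "Year", "2001"]),
        Node("Aggregate", ["Group", "Year"]),
    ])
    return Node("=", children=[Node("VarDef", ["x"]), select])


def walk(node: Node, depth: int = 0) -> List[str]:
    """Pre-order listing of the tree, indented two spaces per level."""
    label = node.kind + (" " + " ".join(node.args) if node.args else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(walk(child, depth + 1))
    return lines


print("\n".join(walk(build_example_ast())))
```

A uniform node shape is convenient for the later stages, because annotating nodes with adapters only needs to look at each node's kind.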
9. How does it work? (E-AST, CSV Adapter)
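[Figure: E-AST for the CSV adapter. The same tree as the AST (= with VAR DEF and Select; Select with Project, Fetch, Filter, Aggregate), with a single node annotated CSV, apparently Fetch; the remaining nodes are presumably left to the default adapter.]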
10. How does it work? (E-AST, Excel Adapter)
[Figure: E-AST for the Excel adapter. The same tree, with adapter annotations: Select: Default; VAR DEF: Default; Project, Fetch, Filter: Excel; Aggregate: Default.]
11. How does it work? (E-AST, Database Adapter)
[Figure: E-AST for the database adapter. The same tree, with adapter annotations: Select: DB; VAR DEF: Default; Project, Fetch, Filter, Aggregate: DB.]
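The three E-AST slides differ only in which adapter each node is assigned to. A minimal sketch of that capability matching, assuming each adapter publishes the set of node kinds it can execute and that every other node falls back to the default (in-engine) adapter. The capability sets are loosely read off the CSV, Excel and database figures and are illustrative only; extend_ast and CAPABILITIES are hypothetical names.

```python
from typing import Dict, FrozenSet, List

# Node kinds of the example statement, in the order shown on the slides.
NODES: List[str] = ["VarDef", "Select", "Project", "Fetch", "Filter", "Aggregate"]

# Hypothetical capability sets, loosely read off the E-AST figures.
CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "CSV": frozenset({"Fetch"}),
    "Excel": frozenset({"Project", "Fetch", "Filter"}),
    "DB": frozenset({"Select", "Project", "Fetch", "Filter", "Aggregate"}),
}


def extend_ast(nodes: List[str], adapter: str) -> Dict[str, str]:
    """Annotate each node with the adapter that will execute it.

    Nodes the source adapter cannot handle stay with "Default",
    i.e. the query engine's own implementation.
    """
    supported = CAPABILITIES.get(adapter, frozenset())
    return {n: (adapter if n in supported else "Default") for n in nodes}


for name in CAPABILITIES:
    print(name, extend_ast(NODES, name))
```

This per-node check is deliberately simple; a real matcher would also have to ensure that a delegated node's inputs are produced by the same adapter, otherwise intermediate results must be shipped back into the engine.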
12. Design
• Grammar
• Architecture
• Execution Engine
13. SciQL Language Constructs
14. The Grammar
15. General Architecture
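[Figure: component diagram of the general architecture. Clients: Custom Application, Matlab, R Console, Declarative Console. Middle layer: SciQL. Adapters: Spreadsheet Adapter, RDBMS Adapter, Vendor Specific Adapter. Data sources: CSV, Spreadsheet, R, DBMS, Other.]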
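The layering suggests that every data source sits behind an adapter with a common interface, so the same SciQL statement can run unchanged over different sources. A minimal sketch of such an interface, and of resolving the Adapter= setting of a Connection statement to an implementation, assuming a registry keyed by adapter name. Adapter, CsvAdapter, ADAPTERS and connect are hypothetical names, and only a CSV adapter is implemented.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class Adapter(ABC):
    """Hypothetical common interface that the SciQL layer would program against."""

    def __init__(self, source_uri: str) -> None:
        self.source_uri = source_uri

    @abstractmethod
    def fetch(self) -> Iterator[Dict[str, str]]:
        """Yield the source's records as field -> value dictionaries."""


class CsvAdapter(Adapter):
    def fetch(self) -> Iterator[Dict[str, str]]:
        with open(self.source_uri, newline="") as f:
            yield from csv.DictReader(f)


# Registry keyed by the value of Adapter=... in a Connection statement.
ADAPTERS: Dict[str, Type[Adapter]] = {"CSV": CsvAdapter}


def connect(adapter: str, source_uri: str) -> Adapter:
    """Resolve a Connection's adapter name to an adapter instance."""
    if adapter not in ADAPTERS:
        raise ValueError(f"no adapter registered for {adapter!r}")
    return ADAPTERS[adapter](source_uri)


# Demo: write a small CSV file, then read it back through the adapter.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("Temp_Celsius,SN_g\n10,0.002\n20,0.004\n")
try:
    rows = list(connect("CSV", tmp.name).fetch())
finally:
    os.remove(tmp.name)
print(rows)
```

Adding a source then means registering one more Adapter subclass; statements written against the interface do not change.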
16. Query Execution Engine
[Figure: the Query Execution Engine, part of the Query Engine, passes the E-AST to the data source adapter and receives a result set.]
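The engine's duties listed in the speaker notes (AST node selection and delegation, caching, result compilation) can be sketched as a loop over the E-AST. A minimal sketch, assuming the E-AST is flattened into an ordered pipeline of (node kind, assigned adapter, argument) steps and that each adapter is a table of callables; QueryExecutionEngine and the step functions are hypothetical, and optimization and state management are omitted.

```python
from typing import Any, Callable, Dict, List, Tuple

Rows = List[Dict[str, Any]]
Step = Tuple[str, str, Any]  # (node kind, assigned adapter, argument)


class QueryExecutionEngine:
    def __init__(self, adapters: Dict[str, Dict[str, Callable[[Rows, Any], Rows]]]):
        self.adapters = adapters  # adapter name -> {node kind: implementation}
        self.cache: Dict[Tuple[Step, ...], Rows] = {}

    def run(self, pipeline: List[Step]) -> Rows:
        key = tuple(pipeline)  # step arguments must be hashable
        if key in self.cache:  # reuse the result of an identical query
            return self.cache[key]
        rows: Rows = []
        for kind, adapter, arg in pipeline:
            # Delegate each node to the implementation of its assigned adapter.
            rows = self.adapters[adapter][kind](rows, arg)
        self.cache[key] = rows
        return rows


DATA = [{"Year": 2001, "Temp_Fahrenheit": 50.0},
        {"Year": 2002, "Temp_Fahrenheit": 60.0},
        {"Year": 2002, "Temp_Fahrenheit": 70.0}]


def fetch(_rows: Rows, _name: Any) -> Rows:  # stand-in for the CSV adapter's Fetch
    return [dict(r) for r in DATA]


def filter_gt(rows: Rows, arg: Any) -> Rows:
    column, bound = arg
    return [r for r in rows if r[column] > bound]


def avg_by(rows: Rows, arg: Any) -> Rows:
    value, key = arg
    groups: Dict[Any, List[float]] = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r[value])
    return [{key: k, "Avg": sum(v) / len(v)} for k, v in groups.items()]


def project(rows: Rows, columns: Any) -> Rows:
    return [{c: r[c] for c in columns} for r in rows]


engine = QueryExecutionEngine({
    "CSV": {"Fetch": fetch},
    "Default": {"Filter": filter_gt, "Aggregate": avg_by, "Project": project},
})
pipeline: List[Step] = [
    ("Fetch", "CSV", "pdLatest"),
    ("Filter", "Default", ("Year", 2001)),
    ("Aggregate", "Default", ("Temp_Fahrenheit", "Year")),
    ("Project", "Default", ("Avg",)),
]
result = engine.run(pipeline)
print(result)  # [{'Avg': 65.0}]
```

The step assignment mirrors the CSV E-AST: only Fetch goes to the source adapter, and the engine's default implementations do the rest.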
17. Mapping
[Figure: component diagram "Perspective". Perspective 1 (Attributes 1, 2, 3) and Perspective 2 (Attributes A, B, C) are both mapped onto the same Data (Data Fields 1 to 4).]
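The mapping figure shows two perspectives whose attributes are defined over the same underlying data fields. A minimal sketch of that idea, assuming a perspective is a dictionary from attribute name to a function of the stored record, so one field can feed several attributes and one attribute can combine several fields. The field names, attribute names and the view helper are hypothetical.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Perspective = Dict[str, Callable[[Record], Any]]

# One stored record (hypothetical field names).
record: Record = {"temp_c": 25.0, "mass_g": 0.5, "volume_l": 2.0}

# Two perspectives over the same fields; neither copies nor modifies the data.
perspective_1: Perspective = {
    "Temp_Celsius": lambda r: r["temp_c"],
    "Mass_g": lambda r: r["mass_g"],
}
perspective_2: Perspective = {
    # Same field as Temp_Celsius, exposed in another unit.
    "Temp_Fahrenheit": lambda r: 1.8 * r["temp_c"] + 32,
    # One attribute combining two fields.
    "Concentration_mg_per_l": lambda r: r["mass_g"] * 1000 / r["volume_l"],
}


def view(perspective: Perspective, rec: Record) -> Record:
    """Project a stored record through a perspective."""
    return {name: fn(rec) for name, fn in perspective.items()}


print(view(perspective_1, record))
print(view(perspective_2, record))
```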
18. What would be the benefits?
• Scientists deal with just one language
• It has a data-source-independent instruction set
• It's easier to learn and share
• Integration with other tools is easy
• Reduces the need for programming expertise
19. The Evaluation Plan
• To be used in the context of BExIS
– Big and diverse user community
– Various data
• Open source and free
– Early feedback
– Contribution
20. The Work Plan
• Define the grammar of the language
– 6-9 months
• Compare with related work and revise
– 3-6 months
• Compile the formal specification of the language
– 3-6 months
• Develop the proof of concept implementation
– 9-12 months
• Evaluation
– 6 months
21. Thanks
Editor's Notes
Explain that these slides show concept maps: boxes are the concepts, and the labels on the relationships give their meanings, purposes or reasons. Arrows and plain lines mean the same thing; the difference is a tool issue. Data-driven science: here are just some attributes of scientific data.
Related work: tools are either single-focused or general-purpose for processing and visualization; versioning and provenance issues; data processing pipeline; impedance mismatch (format, data type, unit, accuracy); shaping data to work in workflows; multi-tool integration:
It is customized to work on scientific data, considers versions, and produces provenance data. Point out the differences from and similarities to the previous slide.
BKR: Again, a layout more similar to the previous one would make it easier for the listener to get the picture ;-)
Describe the sample briefly. Point to the last Select statement and say that you would like to investigate what happens to it.
User input; state information.
Input parsing; tree construction.
CSV adapter; default adapter; adapter capability matching; AST node selection based on the adapter's capabilities.
Spreadsheet Adapter
Database Adapter
The designed grammar is implemented in a language design framework such as ANTLR/Java.
BKR: I think this won't be readable.
QEE: optimization, caching, state management, AST node selection and delegation, result compilation.
Adapter: E-AST node implementation; executes the received node against the actual data source.
Data Source: data plus functionality. Data sources such as spreadsheets and DBMSs have functions that the adapter may rely on.