Euclid Data Model 101 - Episode 01: Overview

First episode of Euclid Data Model 101. High-level presentation of data modeler's job & available tools.

  1. High-level presentation of the data modeler's job & available tools
  2. INTRODUCTION: Objectives of Euclid Datamodel 101
     - Slidecast dedicated to Euclid data modelers & developers
     - Helps you understand what is expected and how to do it
     - Released as multiple episodes over time:
       - 1st episode: high-level overview of tools and process
       - 2nd episode: the TIPS example
       - Following episodes: zoom in on technical points
  3. INTRODUCTION: Objectives and contents of this presentation
     - Get an overview of the data modeling process
     - Understand the data model workflow
     - Know where to find information
     - Know what tools are available
     - No complex details or technical information here, only high-level information and pointers in the right direction.
  4. SUMMARY: The data modeling process
     - 1 - Understand the Euclid DataModel
       - Why use a Euclid DataModel?
       - Why choose XML?
       - What is XML Schema?
       - What are the Euclid-specific XML rules my schema shall comply with?
       - How is the DataModel SVN repository structured?
       - How are XML namespaces structured?
     - 2 - Create your own DataModel
       - What should my DataModel contain?
       - What software can I use to write XML?
       - How can I check if my DataModel is correct?
     - 3 - Use the DataModel in your own code
       - How can I use the DataModel in my code?
       - How do I use XML data bindings?
       - Can I get pre-configured tools all at once?
  5. Why use a Euclid DataModel?
     - The Euclid mission relies heavily on data transfer and manipulation
     - Data consistency between OUs, workflows, pipelines and storage is a key point
     - Your DataModel will be:
       - used to structure the EAS db
       - manipulated by your pipeline code
     - Your data products will be:
       - stored on EAS
       - queryable from EAS
       - transmitted to/from pipelines by IAL
     - [Diagram labels: DESIGN TIME, RUN TIME, DataModel, EAS, IAL, SDC, Pipeline code, compliant data products in/out]
  6. Why choose XML?
     - The XML language brings many benefits:
       - Easy to read and understand by humans and machines, for example: <coord> <x>12.05</x> <y>3.1</y> </coord>
       - Many tools available to create, control and check XML
       - Strong type/namespace control and definition
       - Widely used and supported across the world
       - Self-contained: expresses both data and data structure
     - XML was chosen over many other alternatives
     - Find information
       - W3Schools tutorials: http://www.w3schools.com/schema/
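To make the "readable by machines" point concrete, here is a minimal sketch that parses the `<coord>` snippet from the slide with Python's standard library. The helper name `read_coord` is an illustration only, not part of any Euclid tool.

```python
import xml.etree.ElementTree as ET

# The coordinate snippet shown on the slide, as a string.
COORD_XML = "<coord><x>12.05</x><y>3.1</y></coord>"


def read_coord(xml_text):
    """Parse a <coord> document and return its (x, y) values as floats.

    Hypothetical helper written for this example.
    """
    root = ET.fromstring(xml_text)
    return float(root.findtext("x")), float(root.findtext("y"))


x, y = read_coord(COORD_XML)
print(x, y)
```

The same text a person reads on the slide is turned into two numbers in a few lines, with no custom file format to document.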
  7. What is XML Schema?
     - Two file formats you should be familiar with:
       - XSD (XML Schema): describes the data structure
           <xs:element name="coord">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="x" type="xs:float" />
                 <xs:element name="y" type="xs:float" />
               </xs:sequence>
             </xs:complexType>
           </xs:element>
       - XML: contains the actual data and complies with the XSD
           <coord>
             <x>12.05</x>
             <y>3.1</y>
           </coord>
     - Find information
       - W3Schools tutorials: http://www.w3schools.com/schema/
       - Highlights on XML/XSD (DM Workshop): http://euclid.roe.ac.uk/attachments/download/2744/Workshop_Nov2013_XSD_XML%20-%202.02.ppt
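The "complies with" relation between the two files can be sketched as follows. This is a deliberately naive check, assuming the slide's `coord` schema wrapped in an `xs:schema` root; it only compares child names and tries to parse `xs:float` values, and it is not a real XSD validator. For actual validation, use a validating tool such as Oxygen, XMLSpy, or lxml's `XMLSchema` class. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The slide's schema fragment, wrapped in an xs:schema root (assumption).
COORD_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:float"/>
        <xs:element name="y" type="xs:float"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

COORD_XML = "<coord><x>12.05</x><y>3.1</y></coord>"


def declared_children(xsd_text, element_name):
    """Return [(name, type), ...] declared in the sequence of a top-level element."""
    schema = ET.fromstring(xsd_text)
    for elem in schema.findall(f"{{{XS}}}element"):
        if elem.get("name") == element_name:
            seq = elem.find(f"{{{XS}}}complexType/{{{XS}}}sequence")
            return [(c.get("name"), c.get("type"))
                    for c in seq.findall(f"{{{XS}}}element")]
    return None


def naive_conforms(xml_text, xsd_text):
    """Naive check: same child names in the same order, floats parse as floats."""
    root = ET.fromstring(xml_text)
    expected = declared_children(xsd_text, root.tag)
    if expected is None:
        return False
    if [child.tag for child in root] != [name for name, _ in expected]:
        return False
    for child, (_, type_name) in zip(root, expected):
        if type_name == "xs:float":
            try:
                float(child.text)
            except (TypeError, ValueError):
                return False
    return True


print(naive_conforms(COORD_XML, COORD_XSD))
```

Note that the XSD is itself an XML document, which is why the same parser can read both files.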
  8. What are the Euclid-specific XML rules my schema shall comply with?
     - Need for a fully consistent DataModel: everybody should follow the same rules
     - Among existing rules:
       - XML Schema file name
       - XML file name
       - Single root element
       - Element identifier name
       - Numeric type restriction
       - Recursive definitions
       - Target namespaces
       - Encoding
       - Unqualified namespaces
       - …
     - Rules are still in development; feedback is welcome and changes might be required
     - Find information
       - Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
       - DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
  9. How is the DataModel SVN repo structured?
     - Repository path: EC/SGS/ST/4-2-05-DM/schema
     - Classic SVN structure
       - trunk: latest stable work
       - branches: parallel development of specific features
       - tags: official releases
     - Dictionary and Interfaces for your products
       - Dictionary: definition of the complexTypes and elements of your product (entire DataModel)
       - Interfaces: definition of the data exchanged between components. Only one root element per type, which you can see as a variable used to access a product.
     - Find information
       - DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
       - Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
       - Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
  10. How are XML namespaces structured?
     - Repository path: EC/SGS/ST/4-2-05-DM/schema
     - Under Dictionary and Interfaces, 4 top-level namespaces
       - bas: common definitions shared by everyone
       - ins: instrument-specific definitions
       - pro: OU-specific definitions
       - sys: system-specific definitions (storage, processing…)
     - /pro sub-levels
       - one directory per OU
       - one responsible custodian per directory
     - Find information
       - DMWorkshop svn presentation: http://euclid.roe.ac.uk/projects/eucrma/wiki/20131411DMWSconf
       - Dictionary of types: https://apceucliddev.in2p3.fr/jenkins/job/Dictionary/ws/eXist/dictionary.html
       - Configuration management & best practices: http://euclid.roe.ac.uk/projects/eucrma/wiki#Configuration-management
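As a rough illustration of how namespace prefixes such as `bas` and `pro` behave in a document, the sketch below parses a small XML file that mixes two namespaces. The namespace URIs, the `product` element and the document layout are all hypothetical; the real Euclid URIs and element names are defined in the SVN schema directory and are not given in this presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
NS = {
    "bas": "http://example.org/euclid/bas",
    "pro": "http://example.org/euclid/pro",
}

# A made-up OU product that reuses a common definition from "bas".
DOC = """
<pro:product xmlns:pro="http://example.org/euclid/pro"
             xmlns:bas="http://example.org/euclid/bas">
  <bas:coord><x>12.05</x><y>3.1</y></bas:coord>
</pro:product>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI: the tag is "{uri}localname".
print(root.tag)

# A prefix-to-URI map lets the lookup use the short prefix form.
coord = root.find("bas:coord", NS)
print(coord.findtext("x"), coord.findtext("y"))
```

The prefix is only shorthand; what identifies an element is the namespace URI plus its local name, which is why shared definitions can live in one namespace and be reused by every OU.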
  11. What should my DataModel contain?
     - Your DataModel should contain (must have):
       - definitions of pipeline inputs
       - definitions of output products
       - definitions of intermediate elements used in your code
     - Your DataModel can use:
       - new elements you define
       - already existing elements
       - dataContainers for files with no specific definition
     - <sgs:dataContainer> fields:
       • ID
       • Filename
       • StorageNode
       • Path
     - Find information
       - Fits DataModel (see dictionary and interfaces): schema/trunk/Dictionary/pro/sim/euc-test-ousim-tips.xsd
       - DM Workshop DataContainer presentation: http://euclid.roe.ac.uk/attachments/download/2765
       - DM wiki homepage: http://euclid.roe.ac.uk/projects/eucrma/wiki
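The four dataContainer fields listed on the slide could be assembled as in the sketch below. This is an assumption-heavy illustration: the `sgs` namespace URI is invented, the fields are modeled as child elements (the slide does not say whether they are elements or attributes), and all values are made up. Refer to the DataContainer workshop presentation for the real definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one is defined in the Euclid schemas.
SGS_NS = "http://example.org/euclid/sgs"


def make_data_container(id_, filename, storage_node, path):
    """Build a dataContainer-like element with the four fields from the slide.

    Assumption: each field is a plain child element holding text.
    """
    ET.register_namespace("sgs", SGS_NS)
    container = ET.Element(f"{{{SGS_NS}}}dataContainer")
    fields = (
        ("ID", id_),
        ("Filename", filename),
        ("StorageNode", storage_node),
        ("Path", path),
    )
    for tag, value in fields:
        ET.SubElement(container, tag).text = value
    return container


# Illustrative values only.
container = make_data_container("42", "image.fits", "node-a", "/data/images")
print(ET.tostring(container, encoding="unicode"))
```

The point is the role of the container: a file with no specific definition in the DataModel is still described by a small, uniform XML record.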
  12. What software can I use to write XML?
     - Any text editor lets you read and write XML
     - One of these two powerful XML development environments is recommended:
       - Altova XMLSpy (license from 400€)
       - Oxygen XML Editor (license from 99$, 30-day free trial)
     - Both provide:
       - Project-oriented browsing that handles dependencies between files
       - Content completion for elements, attributes & values
       - XML validation and detection of errors
       - Schema modeling with graph representation
     - Find information
       - Altova XMLSpy: http://www.altova.com/xmlspy.html
       - Oxygen XML Editor: http://www.oxygenxml.com/
  13. How can I check if my DataModel is correct?
     - Use Oxygen or XMLSpy to validate your XML and XML Schema files
       - Well-formed XML: correct language syntax
       - Document validation: the XML conforms to the XML Schema definition
     - Use the Euclid Data Model Checker tools
       - Check compliance with the Euclid DataModel rules
       - Python module & scripts available in Euclid SVN (EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker)
     - Find information
       - Altova XMLSpy: http://www.altova.com/xmlspy.html
       - Oxygen XML Editor: http://www.oxygenxml.com/
       - Official Euclid XML rules: http://euclid.roe.ac.uk/dmsf/eucrma?folder_id=47
       - DM Workshop presentation: http://euclid.roe.ac.uk/attachments/download/2762/DM-Rules.pdf
       - DataModelChecker readme (SVN): EC/SGS/ST/4-2-05-DM/tools/trunk/DataModelChecker/doc
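The first of the two checks, well-formedness, can be sketched with the Python standard library alone. This is only the syntax check; it says nothing about conformance to an XML Schema or to the Euclid rules, which is what the editors and the Data Model Checker are for. The function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text is syntactically correct XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<coord><x>12.05</x><y>3.1</y></coord>"
bad = "<coord><x>12.05</y></coord>"  # <x> is closed by </y>

print(is_well_formed(good), is_well_formed(bad))
```

A document can pass this check and still be invalid: for instance `<coord><x>abc</x></coord>` is well formed but does not conform to a schema that declares `x` as `xs:float`.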
  14. How can I use the DataModel in my code?
     - In your pipeline code, you might want to:
       - Read and modify existing XML files (in)
       - Produce new XML files (out)
       - Manipulate data as specified in the DataModel (no XML file)
     - Multiple ways to do that:
       - Must be avoided: manually parse XML files, using XPath and XML libraries (Python lxml)
       - Preferred way: use bindings generation
     - Bindings link the Data Model to the pipeline code:
       - XML Schema elements become class definitions
       - An XML product becomes an object instance
     - Find information
       - XML data bindings resources: http://www.rpbourret.com/xml/XMLDataBinding.htm
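The idea that schema elements become classes and XML products become object instances can be sketched by hand for the `coord` example. The class below is written manually purely to illustrate what a binding gives you; in the Euclid workflow such classes are generated from the XSD by the binding libraries, and the names `Coord`, `from_xml` and `to_xml` are hypothetical, not the generated API.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Coord:
    """Hand-written stand-in for a class a binding generator would produce."""
    x: float
    y: float

    @classmethod
    def from_xml(cls, xml_text):
        """Turn a <coord> XML product into an object instance."""
        root = ET.fromstring(xml_text)
        if root.tag != "coord":
            raise ValueError(f"expected <coord>, got <{root.tag}>")
        return cls(x=float(root.findtext("x")), y=float(root.findtext("y")))

    def to_xml(self):
        """Serialize the object back to a <coord> XML string."""
        root = ET.Element("coord")
        ET.SubElement(root, "x").text = repr(self.x)
        ET.SubElement(root, "y").text = repr(self.y)
        return ET.tostring(root, encoding="unicode")


# Read an existing product, modify it as a normal object, write it back.
coord = Coord.from_xml("<coord><x>12.05</x><y>3.1</y></coord>")
coord.y = 4.0
print(coord.to_xml())
```

Pipeline code then works with `coord.x` and `coord.y` instead of tag names and XPath strings, which is the main reason bindings are preferred over manual parsing.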
  15. How do I use XML data bindings?
     - Two XML binding libraries are available for Euclid:
       - For Python, based on PyXB
       - For C++, based on CodeSynthesis XSD
     - First step: generate classes from the DataModel
       - Input: DataModel XML Schema (.xsd files)
       - Outputs: C++ classes (.hxx & .cxx) and Python classes (.py)
       - Generation scripts shown on the slide: generateStubs.py, generate_allbindings.sh
     - Second step: use the generated classes in your own code
       - Create and access elements as you would with usual classes/objects
     - Find information
       - Python Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/PythonBinding
       - C++ Bindings library: (SVN)/EC/SGS/ST/4-2-05-DM/tools/trunk/CppBinding
       - DMWorkshop Python bindings presentation: http://euclid.roe.ac.uk/attachments/download/2734
       - DMWorkshop C++ bindings presentation: http://euclid.roe.ac.uk/attachments/download/2745 & http://euclid.roe.ac.uk/attachments/download/2773
  16. Can I get pre-configured tools all at once?
     - We are building a virtual machine you can use on your own computer
       - Based on Scientific Linux 6 (OS supported for Euclid)
       - Linked to the Euclid CODEEN yum repository for package installation
       - Linked to Euclid SVN for source code checkin/checkout
     - Containing:
       - Required software libraries
       - Pre-configured development environment
       - C++ & Python bindings generation libraries
       - Data Model Checker tools
       - … and more
     - Still in development, hopefully available soon
     - Find information
       - CODEEN yum packages list: https://apceuclidrepo.in2p3.fr/nexus/content/groups/el6.euclid/
       - Virtualbox virtualization tool: https://www.virtualbox.org/
       - VMWare virtualization tool: http://www.vmware.com/fr/products/player/
  17. In the next episode…
     - The TIPS DataModel, from its creation to the pipeline code
     - Stay tuned!