
A CMD Core Model for CLARIN Web Services


Presentation at the Metadata2012 workshop at LREC12 in Istanbul, Turkey, May 22, 2012



  1. A CMD Core Model for CLARIN Web Services
       Menzo Windhouwer, Daan Broeder, Dieter van Uytvanck
       The Language Archive – Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  2. CLARIN vision
       Our vision is that the resources for processing language, the data to be processed as well as appropriate guidance, advice and training be made available and can be accessed over a distributed network from the user's desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres, and via a single sign-on the user will have access to repositories of data with standardized descriptions, processing tools ready to operate on standardized data, and all of this will be available on the internet using a service oriented architecture based on secure grid technologies.
       May 2012, LREC Metadata 2012 workshop
  3. Outline
       • Web Service architectures
       • Component Metadata Infrastructure (CMDI)
       • National CLARIN initiatives
       • CMD core model for Web Services
       • Usage of the core model
       • Future work and conclusions
  4. Service
       input: text=‘Welcome to Istanbul’, separator=whitespace
       service: tokenize
       output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
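The tokenize service on this slide can be pictured as an ordinary function. The following is a minimal sketch only; the function name and the convention that `separator="whitespace"` means "split on any run of whitespace" are assumptions made for illustration, not part of the presentation.

```python
def tokenize(text: str, separator: str = "whitespace") -> list[str]:
    """Split text into tokens.

    Assumption: the literal value "whitespace" means splitting on any run
    of whitespace; any other value is used as a literal separator string.
    """
    if separator == "whitespace":
        return text.split()
    return text.split(separator)


tokens = tokenize("Welcome to Istanbul", separator="whitespace")
print(tokens)  # ['Welcome', 'to', 'Istanbul']
```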
  5. Web Service
       web server
         input: text=‘Welcome to Istanbul’, separator=whitespace
         service: tokenize
         output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
       How to invoke a service and pass the parameters? Different Service Oriented Architectures.
  6. SOA: RESTful Resource Orientation
       web server
         input: text=‘Welcome to Istanbul’, separator=whitespace
         service: tokenize
         output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
       • Resource oriented instead of service oriented
           – A URL tells what resource to operate on
           – Uses HTTP verbs (PUT, GET, POST, DELETE) to tell what to do
       1. create a resource: PUT, Content-type: text/plain, body ‘Welcome to Istanbul’; response: 201, Location:
       2. request tokens resource: GET
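The two REST steps (create a text resource, then request its tokens) might look roughly like this from a client's point of view. This is a sketch under assumptions: the URLs are hypothetical (the slide's own URLs are not preserved in this transcript), the `/tokens` suffix is an invented layout, and the requests are only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical resource URL; not taken from the presentation.
TEXT_URL = "http://example.org/texts/welcome"

# Step 1: create the text resource with PUT and a text/plain body.
# A server following the slide would answer 201 with a Location header.
create = Request(
    TEXT_URL,
    data="Welcome to Istanbul".encode("utf-8"),
    headers={"Content-Type": "text/plain"},
    method="PUT",
)

# Step 2: request the derived tokens resource with GET.
# The "/tokens" suffix is an assumed URL layout.
fetch = Request(TEXT_URL + "/tokens", method="GET")

for request in (create, fetch):
    print(request.get_method(), request.full_url)
```

The point of the sketch is that the HTTP verb carries the "what to do" and the URL carries the "on what"; no operation name appears anywhere in the request.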
  7. SOA: Remote Procedure Call
       web server
         input: text=‘Welcome to Istanbul’, separator=whitespace
         service: tokenize
         output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
       • XML-RPC and SOAP are HTTP-oriented RPC standards
           – A URL may function as an endpoint for several operations
           – Uses a standard envelope format to tell what to do
       POST, text/xml:
         <methodCall>
           <methodName>tokenize</methodName>
           <params>
             <param><value><string>Welcome to Istanbul</string></value></param>
             <param><value><string>whitespace</string></value></param>
           </params>
         </methodCall>
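An XML-RPC envelope like the one on this slide does not have to be written by hand. As an illustration (not something the presentation itself shows), Python's standard `xmlrpc.client` module can serialize and parse such a call; nothing is sent over the network here.

```python
import xmlrpc.client

# Serialize a call to the "tokenize" operation with two string parameters.
payload = xmlrpc.client.dumps(
    ("Welcome to Istanbul", "whitespace"),
    methodname="tokenize",
)
print(payload)

# Parsing the envelope recovers the operation name and the parameters,
# which is what the endpoint does before dispatching the call.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

Because the operation name travels inside the envelope, one endpoint URL can serve several operations.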
  8. SOA: REST-RPC hybrid
       web server
         input: text=‘Welcome to Istanbul’, separator=whitespace
         service: tokenize
         output: tokens=[‘Welcome’, ‘to’, ‘Istanbul’]
       • Mixes REST and RPC
           – Can be more service than resource oriented
           – URL indicates what operation to perform on which data
       GET
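In a REST-RPC hybrid the operation and its data typically both end up in the URL. The sketch below builds such a GET URL; the endpoint and the query parameter names are hypothetical, since the slide's original URL is not preserved in this transcript.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: the path names the operation.
ENDPOINT = "http://example.org/tokenize"

# The data to operate on is passed as query parameters.
query = urlencode({"text": "Welcome to Istanbul", "separator": "whitespace"})
url = f"{ENDPOINT}?{query}"
print(url)

# The server side can decode the same parameters again.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```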
  9. Interface Description Language
       • RPC architectures tend to have an IDL that describes which operations are available at the endpoint
           – SOAP: Web Services Description Language
               • WSDL (2)
       • For REST and REST-RPC hybrids an IDL is controversial
           – Once you have the resource URL you can ‘just’ follow the links, e.g., like a web crawler does with HTML pages
           – However, REST(-RPC) allows too much freedom for a machine to infer how to retrieve a resource or invoke a service
               • RFC 6570: URI Template
               • WADL (old W3C submission by Sun)
               • WSDL 2
               • ...
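To give an impression of how a URI Template tells a machine how to construct a resource URL, here is a deliberately minimal sketch. It covers only the simplest form of RFC 6570 (Level 1 `{var}` expansion); operators such as `{?x}` or `{/x}` are not handled, and the template URL is hypothetical.

```python
import re
from urllib.parse import quote


def expand(template: str, variables: dict) -> str:
    """Expand simple {var} expressions (RFC 6570 Level 1 only).

    Values are percent-encoded except for unreserved characters;
    undefined variables expand to the empty string.
    """
    def substitute(match: re.Match) -> str:
        value = variables.get(match.group(1))
        return "" if value is None else quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical template for the tokens resource of a text.
TEMPLATE = "http://example.org/texts/{id}/tokens"
print(expand(TEMPLATE, {"id": "welcome 1"}))
```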
  10. Profile matching
       • To place Web Services in a chain or a workflow, a user can be supported by profile matching
           – which service can operate on the input the user currently has available
       • The IDL describes the technical needs to invoke a service, but profile matching needs more semantic information
           – it’s not useful to invoke the tokenizer on a project name, although it is a string of characters
           – next to a technical description, a semantic description is also needed
  11. National CLARIN initiatives - Spain
       • IULA at UPF (continuation in PANACEA)
       • architecture: RPC (SOAP)
       • IDL: WSDL
       • semantic description:
           – SoapLab 2 and myGrid inspired
           – a CMD profile has been created
  12. National CLARIN initiatives - Germany
       • WebLicht (D-SPIN continuation in CLARIN-D)
       • architecture: REST-RPC
       • IDL: none as there is a known pattern, i.e., POST TCF documents
       • semantic description:
           – WebLicht used a proprietary service description
           – WebLicht 2.0 uses a core model compliant CMD profile
  13. National CLARIN initiatives – Netherlands and Flanders
       • TTNWW project
       • architecture: any
       • IDL: when available and supported by the framework (Taverna)
       • semantic description:
           – a CMD profile has been created
           – a core model compliant CMD profile has been created
  14. A CMD core model for CLARIN Web Services
       • Several CMD profiles have been created by the national initiatives
           – large overlap due to common area:
               • service
               • input and output specifications
           – differences due to design choices:
               • multiple operations per description
               • separate technical description (IDL) or none at all
               • handling of embedded parameters
               • ...
       • The CMD core model aims to align these profiles, but also allows extensions to accommodate differences in design choices
  15. Additional aims for the core model
       • The core model should preferably provide enough information to
           – do (basic) profile matching
           – invoke a service
       • This should allow (national) CLARIN web service chaining and workflow engines to potentially use all CLARIN web services
  16. UML model for Web Services
  17. Salient points
       • A (technical) service description is mandatory, but the model doesn’t prescribe an IDL
       • The location/endpoint of the service is part of the technical description, i.e., only the PID/URL of the service description is part of the semantic description
       • A description can cover multiple operations
       • Parameters might be (deeply) embedded in a technical input document, e.g., the token or lemma layer inside a TCF document; this is covered by parameter groups
       • Names of operations, parameters and/or groups in the semantic description should be resolvable in the technical description, so after profile matching it is known how parameters should technically be passed on during invocation
       • Supports parameter (profile) matching on various semantic levels: MIME type, data type, data category, semantic type
  18. Parameter matching
       • MIME type reveals the media type: text/plain
       • Data type is generally an XML Schema data type: ID
       • Data category is generally an ISOcat data category PID: /project id/ (DC-2535)
       • Semantic type is generally a service specific type: clam.project.adelheid
       • The tokenize service could specify text/plain as its input MIME type, but an Adelheid project name (the output of the Adelheid create project service) would still not be proper input
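The four matching levels can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and function names, the rule that a level left unspecified on either side is treated as "no constraint", and the tokenizer's data category value, which is a made-up placeholder rather than a real ISOcat PID.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("mime_type", "data_type", "data_category", "semantic_type")


@dataclass(frozen=True)
class ParameterProfile:
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    data_category: Optional[str] = None
    semantic_type: Optional[str] = None


def matches(produced: ParameterProfile, expected: ParameterProfile,
            levels=LEVELS) -> bool:
    """True unless some compared level is set on both sides and differs."""
    for level in levels:
        have, want = getattr(produced, level), getattr(expected, level)
        if have is not None and want is not None and have != want:
            return False
    return True


# Output of the Adelheid create project service, using the slide's values.
adelheid_project = ParameterProfile(
    "text/plain", "ID", "DC-2535", "clam.project.adelheid")

# Tokenizer input; the data category is a hypothetical placeholder.
tokenizer_input = ParameterProfile(
    mime_type="text/plain", data_category="hypothetical:running-text")

print(matches(adelheid_project, tokenizer_input, levels=("mime_type",)))
print(matches(adelheid_project, tokenizer_input))
```

Matching on MIME type alone accepts the project name as tokenizer input; adding the data category level rejects it, which is the situation the last bullet describes.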
  19. From UML to CMD
       • Transformation to deal with inheritance:
           – each non-abstract class becomes a component
           – each atomic attribute becomes an element, but
           – each referential attribute becomes a component with the referred class as a child component, except
           – when this class is abstract, all non-abstract subclasses become child components
           – copy cardinality constraints where possible
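The transformation rules above can be sketched as a small recursive function. This is an illustrative reading, not the authors' implementation: the dictionary output format, the toy model (including the abstract class `AbstractParameter`), the set of atomic types, and the choice to copy inherited attributes into each concrete class are all assumptions. The sketch also does not guard against cyclic references between classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ATOMIC_TYPES = {"string", "boolean", "anyURI", "resource"}


@dataclass
class UmlClass:
    name: str
    abstract: bool = False
    parent: Optional[str] = None
    # Each attribute is (name, type, cardinality), e.g. ("Name", "string", "1").
    attributes: List[Tuple[str, str, str]] = field(default_factory=list)


def all_attributes(model, name):
    """Own attributes preceded by those inherited from superclasses."""
    cls = model[name]
    inherited = all_attributes(model, cls.parent) if cls.parent else []
    return inherited + cls.attributes


def concrete_subclasses(model, name):
    """Names of all non-abstract direct or indirect subclasses."""
    found = []
    for cls in model.values():
        if cls.parent == name:
            if not cls.abstract:
                found.append(cls.name)
            found.extend(concrete_subclasses(model, cls.name))
    return found


def to_component(model, name):
    if model[name].abstract:
        raise ValueError(f"{name} is abstract and gets no component of its own")
    component = {"component": name, "elements": [], "components": []}
    for attr_name, attr_type, cardinality in all_attributes(model, name):
        if attr_type in ATOMIC_TYPES:
            # Atomic attribute: becomes an element, cardinality copied.
            component["elements"].append(
                {"element": attr_name, "type": attr_type, "cardinality": cardinality})
        else:
            # Referential attribute: becomes a component whose children are the
            # referred class, or its non-abstract subclasses if it is abstract.
            target = model[attr_type]
            children = (concrete_subclasses(model, attr_type)
                        if target.abstract else [attr_type])
            component["components"].append({
                "component": attr_name,
                "cardinality": cardinality,
                "components": [to_component(model, child) for child in children],
            })
    return component


# Hypothetical toy model, loosely inspired by the core model.
model = {
    "AbstractParameter": UmlClass("AbstractParameter", abstract=True,
                                  attributes=[("Name", "string", "1")]),
    "Parameter": UmlClass("Parameter", parent="AbstractParameter",
                          attributes=[("MIMEType", "string", "0..1")]),
    "ParameterGroup": UmlClass("ParameterGroup", parent="AbstractParameter"),
    "Operation": UmlClass("Operation", attributes=[
        ("Name", "string", "1"),
        ("Input", "AbstractParameter", "0..1"),
    ]),
}
operation = to_component(model, "Operation")
print(operation)
```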
  20. Service
        • Name: string
        • Description: string?
        • ServiceDescriptionLocation: resource
        Operations
          Operation (+)
            • Name: string
            • Description: string?
            Input, Output (?)
              ParameterGroup (*)
                • Name: string
                • Description: string?
                • MIMEType: string?
                • DataType: string?
                • DataCategory: anyURI?
                • SemanticType: string?
                Parameters (+)
              Parameter (*)
                • Name: string
                • Description: string?
                • MIMEType: string?
                • DataType: string?
                • DataCategory: anyURI?
                • SemanticType: string?
                • isConfigurationParameter: boolean?
                Values (?)
                  ParameterValue (+)
                    • Value: string
                    • Description: string?
                    • DataCategory: anyURI?
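One way to get a feel for this component structure is to mirror it in plain data classes. The sketch below is an approximation under assumptions: field names are Python-style renderings of the element names, optional elements become `Optional` fields, repeatable components become lists without enforcing minimum cardinalities, and the service description URL in the example is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ParameterValue:
    value: str
    description: Optional[str] = None
    data_category: Optional[str] = None  # anyURI in the model


@dataclass
class Parameter:
    name: str
    description: Optional[str] = None
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    data_category: Optional[str] = None
    semantic_type: Optional[str] = None
    is_configuration_parameter: Optional[bool] = None
    values: List[ParameterValue] = field(default_factory=list)


@dataclass
class ParameterGroup:
    name: str
    description: Optional[str] = None
    mime_type: Optional[str] = None
    data_type: Optional[str] = None
    data_category: Optional[str] = None
    semantic_type: Optional[str] = None
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Operation:
    name: str
    description: Optional[str] = None
    inputs: List[Union[Parameter, ParameterGroup]] = field(default_factory=list)
    outputs: List[Union[Parameter, ParameterGroup]] = field(default_factory=list)


@dataclass
class Service:
    name: str
    service_description_location: str  # reference to the technical description
    description: Optional[str] = None
    operations: List[Operation] = field(default_factory=list)


# The tokenize example from the earlier slides, described with these classes.
tokenizer = Service(
    name="tokenizer",
    service_description_location="http://example.org/tokenizer.wadl",  # hypothetical
    operations=[Operation(
        name="tokenize",
        inputs=[
            Parameter(name="text", mime_type="text/plain"),
            Parameter(name="separator", is_configuration_parameter=True,
                      values=[ParameterValue("whitespace")]),
        ],
        outputs=[Parameter(name="tokens")],
    )],
)
print(tokenizer.operations[0].name)
```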
  21. Usage of the CMD core model
       • The core model is only a starting point, i.e., provides enough information for basic profile matching and technical invocation
       • It is a template to form a basis for the CMD profile of specific web service registries, i.e., the model can be extended
       • However, instantiations should also maintain compliance to the core model
           – cardinalities should be within the boundaries of the core model, e.g., mandatory elements cannot become optional
           – closed value domains cannot be extended, but open value domains can be turned into closed ones
           – data category references should not be changed as this would imply different semantics
       • CLARIN-NL ToolService profile is a compliant extension:
       • Validate compliance of instances to the core model:
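Two of the compliance rules above lend themselves to a simple mechanical check. The sketch below is one possible reading, not the validation the presentation refers to: cardinalities are modelled as `(min, max)` pairs with `None` for unbounded, an open value domain is modelled as `None`, and treating a restriction of a closed domain to a subset as compliant is an assumption.

```python
from typing import Iterable, Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (min, max); max None means unbounded


def cardinality_within(core: Cardinality, profile: Cardinality) -> bool:
    """True if the profile's cardinality range lies inside the core's range."""
    core_min, core_max = core
    profile_min, profile_max = profile
    if profile_min < core_min:
        return False  # e.g. a mandatory element became optional
    if core_max is None:
        return True
    return profile_max is not None and profile_max <= core_max


def value_domain_ok(core_values: Optional[Iterable[str]],
                    profile_values: Optional[Iterable[str]]) -> bool:
    """Open core domains may be closed; closed core domains may not grow."""
    if core_values is None:
        return True
    if profile_values is None:
        return False  # a closed domain cannot be reopened
    return set(profile_values) <= set(core_values)


print(cardinality_within((1, 1), (0, 1)))      # mandatory made optional
print(cardinality_within((0, 1), (1, 1)))      # optional made mandatory
print(value_domain_ok(["a", "b"], ["a", "b", "c"]))
```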
  22. Current status and future work
       • Current state:
           – There are some compliant profiles:
               • CLARIN-NL ToolService profile: not in use by TTNWW yet
               • WebLicht 2.0 profile: in use, but still missing a technical description (WADL)
           – The core model is successful if there is a workflow/chaining engine invoking web services which were originally targeted at another engine or none at all
       • Future work:
           – Complex chains of web services are captured in (mini) workflows
               • can this core model also describe the mini workflows?
           – Alignment with or reuse by other initiatives
               • e.g., META-SHARE meta model is also based on components and ISOcat, and contains a section on Tools and Services
           – Identify common extensions and incorporate them into the core
               • e.g., default values, cardinalities, asynchronicity
  23. Thanks for your attention! Please visit: