Intro
This memo describes the steps to configure and run language resource processing. It
is intended for internal use only.
Architecture overview
Main components
There are three main components involved in language resource processing:
● The Resource Server (hereafter RS) manages information about resources,
their status, and their associated files.
● The Workflow Server (hereafter WS) is responsible for processing resource
input files into output files that are loaded into the Virtuoso server. The WS is
implemented using Oozie and Hadoop.
● The processing components provided by DERI and the other participants.
Data and Processing Flow
The following diagram shows the communication between the WS and the RS during
the processing of a resource:
The flow:
1. The flow is started by the administrator with an HTTP call to the RS REST API.
The call URL contains the resource ID as a parameter. Example: POST
/resources/48957c5d-456c-4d7a-abc9-3062c91dafdd/processed
2. The first step in the processing is done by the RS. It downloads the resource input
file and uploads it to the SCP server under the name ${resource_id}.ext.
3. The resource server then selects the flow by resource type, sets the flow
properties, and starts the flow using the WS API of Oozie.
4. Oozie executes the flow, which contains data-moving steps and the execution of
the resource processing components. The penultimate step in the flow is the
loading of the data into the Virtuoso server, which is done by the miniLoader
Java action.
5. The last step in the Oozie flow is the notification of the resource server about the
Virtuoso load status. The resource server then notifies the LRPMA about the
processing status.
Processing set up overview
The whole processing is configured by the following steps:
1. definition of the resource type
2. registration of the resource
3. definition of the workflow
Processing set up
Definition of the resource type
First it is necessary to create a resource type using the resource server. Creating a
resource type is an HTTP POST request, so it is possible to do it either with a
command-line HTTP tool like curl or with a REST client. The following text includes
screenshots from the Postman REST client for illustration. The request parameters
are also given in tables, because they are easier to read (and copy & paste).
The HTTP header Content-Type should be set to “application/json”.
The resource server address is http://54.201.101.125:9999. Suppose that it is
necessary to process resources provided by Paradigma ltd. that contain a lexicon,
so the result of the processing will be one graph.
Request
POST http://54.201.101.125:9999/resourcestypes

Example body
{
    "id": "paradigma",
    "description": "type intended for processing of resources provided by Paradigma",
    "graphsSuffixes": ["lexicon"]
}

Example response
{
    "id": "paradigma"
}
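The same request can be issued from the command line; a possible curl invocation for the request above (the server address is the one given in the text, the JSON body is the example body):

```shell
# create the "paradigma" resource type on the resource server
curl -X POST http://54.201.101.125:9999/resourcestypes \
  -H "Content-Type: application/json" \
  -d '{
        "id": "paradigma",
        "description": "type intended for processing of resources provided by Paradigma",
        "graphsSuffixes": ["lexicon"]
      }'
```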
The resource type defines which workflow is used for processing the resource, and
the resource type ID is used as the name of a subfolder on HDFS for the Oozie workflow.
Registration of the resource
The language resource should be registered in the resource server. Normally this is
done via the LRPMA, but it is possible to do it manually for test purposes using the
resource server REST API.
Request
POST http://54.201.101.125:9999/resources

Example body
{
    "id": "48957c5d-456c-4d7a-abc9-3062c91dafE0",
    "resourceType": "paradigma",
    "downloadUri": "scp://ubuntu@54.201.101.125/home/ubuntu/ParadigmaData/hotel_ca_tricks.csv",
    "credentials": "-----BEGIN RSA PRIVATE KEY----- …...",
    "language": "ca",
    "domain": "hotel",
    "provider": "Paradigma ltd",
    "licence": "LRGPL",
    "graphNamesPrefix": "http://www.eurosentiment.com/hotel/ca/lexicon/paradigma/"
}

Example response
{
    "id": "48957c5d-456c-4d7a-abc9-3062c91dafE0"
}
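From the command line, one possible way to send this request is to store the example body above in a local file (the filename resource.json is an arbitrary choice for illustration):

```shell
# register the resource; resource.json holds the example body shown above,
# with the full private key in the "credentials" field
curl -X POST http://54.201.101.125:9999/resources \
  -H "Content-Type: application/json" \
  -d @resource.json
```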
Definition of Workflow
Processing steps are defined by an XML workflow file that should be copied to the
Hadoop Distributed File System, to the location configured in the resource server
configuration. The flow contains actions; every action defines the next action in case
of its success.
Properties populated by the resource server are used in the workflow definition
XML files.
Properties of flows populated by the Resource Server:

Properties calculated or retrieved from the resource properties:

Property                     Description
rsresourceid                 id of the resource
rsgraphprefix                prefix for graphs; please see the miniLoader java action description below
rsgraphsufix0, [rsgraphsufix1] ...   graph suffixes, one for each file produced by the flow
rsdomain                     domain of the processed resource
rslanguage                   language of the processed resource
rsprovider                   provider
rslicense                    license
oozie.wf.application.path    ${hdfs-folder-uri}/${resourceTypeId};
                             hdfs-folder-uri is specified in conf.properties of the RS,
                             resourceTypeId is a property of the resource on the RS
The resource server also copies properties from the resource server configuration
file conf/job.properties to the flow properties. This can be used for properties
common to all flows, such as:

Property                     Description
nameNode                     HDFS name node address
jobTracker                   MapReduce job tracker address
queueName                    MapReduce jobs queue name
user.name                    user used to run the Oozie flow
inputfolder                  folder where downloaded resource files are stored
rspfilesdir                  folder for processed files
rsvirtuosoloadfolder         absolute path to the folder where files for loading are stored
rsvirtuosohost               hostname or address of the Virtuoso server
rsvirtuosojdbcport           JDBC port
rsvirtuosojdbcuser           JDBC user
rsvirtuosojdbcpasswd         JDBC password
rsprocessedurl               URL to send the result of the Virtuoso load to
Example:
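The original example is not reproduced here; the following is a hedged sketch of what a conf/job.properties file might contain, using the property names from the tables above. All values (hostnames, ports, paths, credentials) are illustrative assumptions, not the actual deployment configuration:

```properties
# illustrative values only; adapt to the actual cluster and Virtuoso instance
nameNode=hdfs://namenode:8020
jobTracker=jobtracker:8021
queueName=default
user.name=ubuntu
inputfolder=/home/ubuntu/inputs/
rspfilesdir=/home/ubuntu/processed
rsvirtuosoloadfolder=/home/ubuntu/virtuoso-load/
rsvirtuosohost=54.201.101.125
rsvirtuosojdbcport=1111
rsvirtuosojdbcuser=dba
rsvirtuosojdbcpasswd=secret
rsprocessedurl=http://54.201.101.125:9999/resources/
```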
Configuring Actions
Workflows usually contain the following sequence:
◦ moving the data to a place where it can be reached by the first processing
component
◦ processing by the first component
◦ moving the data to a place where it can be reached by the second processing
component
◦ processing by the second component
◦ ….
◦ loading into the Virtuoso triple store
Moving the resource file to the processing components
The following snippet shows an example configuration of the first step in the flow,
which moves the resource files to a folder where they can be picked up by a
processing component.
<workflow-app xmlns="uri:oozie:workflow:0.3" name="deri-workflow">
    <start to="move-resource-file"/>
    <action name="move-resource-file" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptwf</host>
            <command>${moveScriptPath} -onlyCopy ${inputfolder}${rsresourceid}* ubuntu@ptnuig:/home/ubuntu/data/${rsresourceid}.csv</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="lemon-marl-generator"/>
        <error to="fail"/>
    </action>
Configuring processing
The following XML snippet shows an example of processing by the Lemon Marl
generator.
    <action name="lemon-marl-generator" retry-max="3" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>~/bin/runLemonMarlGeneratorParadigma.sh /home/ubuntu/data/${rsresourceid}.csv /home/ubuntu/data/outputs/${rsresourceid}.ttl ${rsdomain} ${rslanguage} ${rsgraphprefix}${rsgraphsufix0}</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="move-file2virtuoso"/>
        <error to="fail"/>
    </action>
Moving data to Virtuoso Server
The following XML snippet shows an action that moves the output of the previous
step to the Virtuoso server.
    <action name="move-file2virtuoso" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>${moveScriptPath} /home/ubuntu/data/outputs/${rsresourceid}.ttl ${virtuosoUser}@${rsvirtuosohost}:${rsvirtuosoloadfolder}${rsresourceid}.ttl</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="load2virtuoso"/>
        <error to="fail"/>
    </action>
Load data to the Virtuoso Server
The following XML snippet shows an example configuration of the miniLoader
component, which is used to load the processed resource files into the Virtuoso
server.
<action name="load2virtuoso" retry-max="2" retry-interval="10">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>com.sindice.miniloader.Miniloader</main-class>
<arg>${rsvirtuosohost}</arg>
<arg>${rsvirtuosojdbcport}</arg>
<arg>${rsvirtuosojdbcuser}</arg>
<arg>${rsvirtuosojdbcpasswd}</arg>
<arg>${rsvirtuosoloadfolder}${rsresourceid}.ttl</arg>
<arg>${rsgraphprefix}${rsgraphsufix0}</arg>
<capture-output/>
</java>
<ok to="notify_rs" />
<error to="fail" />
</action>
Notifying the resource server
The last step notifies the RS that the data was loaded into the Virtuoso server.
<action name="notify_rs">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>curl</exec>
<argument>-H</argument>
<argument>Content-Type:application/json</argument>
<argument>-X</argument>
<argument>POST</argument>
<argument>-d</argument>
<argument>${wf:actionData('load2virtuoso')['miniloader_json4rs']}</argument>
<argument>${rsprocessedurl}${rsresourceid}/processed</argument>
</shell>
<ok to="end" />
<error to="fail" />
</action>
Copy the configuration to the HDFS
The property “hdfs-folder-uri” in the conf.properties RS configuration file defines the
path where the configuration should be stored.
The resource type ID (paradigma) is part of the HDFS path, so it is first necessary to
check whether it exists.
If the folder for the given resource type does not exist yet, it is necessary to create it.
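The check and, if needed, the creation can be done with the Hadoop filesystem shell. A possible form, assuming the hdfs-folder-uri points at /user/ubuntu/nuig-flows as in the copy commands in this section:

```shell
# check whether the resource-type folder already exists on HDFS
hadoop fs -ls /user/ubuntu/nuig-flows/paradigma

# create the folder (with its lib subfolder) if it is missing
hadoop fs -mkdir -p /user/ubuntu/nuig-flows/paradigma/lib
```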
Now it is necessary to copy the workflow and the required jars. In this case only the
miniloader jar is required, and it should be copied to the lib subfolder.
hadoop fs -put workflow.xml /user/ubuntu/nuig-flows/paradigma/
hadoop fs -put ~/virtuoso-miniloader-0.0.1-SNAPSHOT.jar /user/ubuntu/nuig-flows/paradigma/lib
Processing Resources
Processing is started by an HTTP POST request to the RS with an empty body.
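For example, with curl, using the endpoint from the flow description and the resource ID registered earlier:

```shell
# trigger processing of the registered resource (empty request body)
curl -X POST \
  http://54.201.101.125:9999/resources/48957c5d-456c-4d7a-abc9-3062c91dafE0/processed
```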
It is possible to check the status of the processing using the Oozie web console:
clicking on the line of the running job opens a detail window.
When the processing has finished, all steps should have the status OK.
When the resource is processed successfully, it is possible to make a SPARQL
request to verify the content.
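A hedged example of such a check, assuming Virtuoso's default SPARQL endpoint (/sparql on port 8890) is enabled on the server; the graph name is the registered graphNamesPrefix with the "lexicon" suffix appended:

```shell
# query a few triples from the loaded graph to verify the load
curl -G http://54.201.101.125:8890/sparql \
  --data-urlencode 'query=SELECT * FROM <http://www.eurosentiment.com/hotel/ca/lexicon/paradigma/lexicon> WHERE { ?s ?p ?o } LIMIT 10'
```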
Appendix A: example of whole flow definition
<workflow-app xmlns="uri:oozie:workflow:0.3" name="deri-workflow">
<start to="move-resource-file"/>
<action name="move-resource-file" retry-max="2" retry-interval="1">
<sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
<host>ubuntu@ptwf</host>
<command>${moveScriptPath} -onlyCopy ${inputfolder}${rsresourceid}* ubuntu@ptnuig:/home/ubuntu/data/${rsresourceid}.csv</command>
<capture-output/>
</sshWithRetry>
<ok to="lemon-marl-generator"/>
<error to="fail"/>
</action>
<action name="lemon-marl-generator" retry-max="3" retry-interval="1">
<sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
<host>ubuntu@ptnuig</host>
<command>~/bin/runLemonMarlGeneratorParadigma.sh /home/ubuntu/data/${rsresourceid}.csv /home/ubuntu/data/outputs/${rsresourceid}.ttl ${rsdomain} ${rslanguage} ${rsgraphprefix}${rsgraphsufix0}</command>
<capture-output/>
</sshWithRetry>
<ok to="move-file2virtuoso"/>
<error to="fail"/>
</action>
<action name="move-file2virtuoso" retry-max="2" retry-interval="1">
<sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
<host>ubuntu@ptnuig</host>
<command>${moveScriptPath} /home/ubuntu/data/outputs/${rsresourceid}.ttl ${virtuosoUser}@${rsvirtuosohost}:${rsvirtuosoloadfolder}${rsresourceid}.ttl</command>
<capture-output/>
</sshWithRetry>
<ok to="load2virtuoso"/>
<error to="fail"/>
</action>
<action name="load2virtuoso" retry-max="2" retry-interval="10">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>com.sindice.miniloader.Miniloader</main-class>
<arg>${rsvirtuosohost}</arg>
<arg>${rsvirtuosojdbcport}</arg>
<arg>${rsvirtuosojdbcuser}</arg>
<arg>${rsvirtuosojdbcpasswd}</arg>
<arg>${rsvirtuosoloadfolder}${rsresourceid}.ttl</arg>
<arg>${rsgraphprefix}${rsgraphsufix0}</arg>
<capture-output/>
</java>
<ok to="notify_rs" />
<error to="fail" />
</action>
<action name="notify_rs">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>curl</exec>
<argument>-H</argument>
<argument>Content-Type:application/json</argument>
<argument>-X</argument>
<argument>POST</argument>
<argument>-d</argument>
<argument>${wf:actionData('load2virtuoso')['miniloader_json4rs']}</argument>
<argument>${rsprocessedurl}${rsresourceid}/processed</argument>
</shell>
<ok to="dir4processed_file" />
<error to="fail" />
</action>
<action name="dir4processed_file">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>mkdir</exec>
<argument>${rspfilesdir}/${rsresourceid}</argument>
</shell>
<ok to="move_processed_file" />
<error to="fail" />
</action>
<action name="move_processed_file">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>mv</exec>
<argument>${rsvirtuosoloadfolder}${rsresourceid}.ttl</argument>
<argument>${rspfilesdir}/${rsresourceid}</argument>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>SSH action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

 

Language Resource Processing Configuration and Run

Intro

This memo describes the steps to configure and run language resource processing. It is intended for internal use only.

Architecture overview

Main components

There are three main components involved in language resource processing:

● The Resource Server (hereafter RS) manages information about resources, their status and associated files.
● The Workflow Server (hereafter WS) is responsible for processing resource input files into output files that are loaded into the Virtuoso server. The WS is implemented using Oozie and Hadoop.
● The processing components of DERI and the other participants.

Data and Processing Flow

The following diagram shows the communication between the WS and the RS during the processing of a resource:

The flow:

1. The flow is started by the administrator with an HTTP call to the RS REST API. The call URL contains the resource ID as a parameter. Example: POST /resources/48957c5d-456c-4d7a-abc9-3062c91dafdd/processed
2. The first step in the processing is done by the RS. It downloads the resource input file and uploads it to the SCP server under the name ${resource_id}.ext.
3. The resource server then selects the flow by resource type, sets the flow properties and starts the flow using the WS API of Oozie.
4. Oozie executes the flow, which contains data-moving steps and the execution of the resource processing components. The penultimate step in the flow is the loading of the data into the Virtuoso server, done by the miniLoader Java action.
5. The last step in the Oozie flow is the notification of the resource server about the Virtuoso load status. The resource server then notifies the LRPMA about the processing status.

Processing set up overview

The whole processing is configured by the following steps:

1. resource type definition
2. registration of the resource
3. definition of the workflow

Processing set up

Definition of the resource type

First it is necessary to create a resource type using the resource server. Creating a resource type is an HTTP POST request, so it is possible to do it either with a command line HTTP tool like curl or with a REST client. The following text contains screenshots from the Postman REST client for illustration. Besides the screenshots, the request parameters are also listed in tables because they are easier to read (and copy & paste). The HTTP header Content-Type should be set to "application/json". The resource server address is http://54.201.101.125:9999.

Suppose that it is necessary to process resources provided by Paradigma ltd. that contain a lexicon, so the result of the processing will be one graph.
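For illustration, the resource type creation request described in the following section can also be issued from the command line. The sketch below uses the example values from this memo (the server address and the "paradigma" type); it only assembles and prints the JSON body, and the actual curl call is shown as a comment because it needs a reachable RS:

```shell
# Build the resource type creation request body (example values from this memo).
RS="http://54.201.101.125:9999"
BODY='{"id":"paradigma","description":"type intended for processing of resources provided by Paradigma","graphsSuffixes":["lexicon"]}'

# Actual call (requires the RS to be reachable):
#   curl -X POST -H "Content-Type: application/json" -d "$BODY" "$RS/resourcestypes"

echo "$BODY"
```

Registration of a resource (described further below) follows the same pattern, with a POST of the resource JSON to $RS/resources.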
Request:

POST http://54.201.101.125:9999/resourcestypes

Example body:

{
  "id": "paradigma",
  "description": "type intended for processing of resources provided by Paradigma",
  "graphsSuffixes": ["lexicon"]
}

Example response:

{
  "id": "paradigma"
}

The resource type defines which workflow is used for the processing of the resource, and the resource type ID is used as the name of the subfolder on HDFS for the Oozie workflow.

Registration of the resource

The language resource should be registered in the resource server. Normally this is done via the LRPMA, but it is possible to do it manually for test purposes using the resource server REST API.

Request:

POST http://54.201.101.125:9999/resources

Example body:

{
  "id": "48957c5d-456c-4d7a-abc9-3062c91dafE0",
  "resourceType": "paradigma",
  "downloadUri": "scp://ubuntu@54.201.101.125/home/ubuntu/ParadigmaData/hotel_ca_tricks.csv",
  "credentials": "-----BEGIN RSA PRIVATE KEY----- …...",
  "language": "ca",
  "domain": "hotel",
  "provider": "Paradigma ltd",
  "licence": "LRGPL",
  "graphNamesPrefix": "http://www.eurosentiment.com/hotel/ca/lexicon/paradigma/"
}

Example response:

{
  "id": "48957c5d-456c-4d7a-abc9-3062c91dafE0"
}

Definition of Workflow

The processing steps are defined by an XML workflow file that should be copied to the Hadoop Distributed File System, to the location configured in the Resource Server configuration file. The flow contains actions; every action defines the next action in case of its success. Properties populated by the resource server are used in the workflow definition XML files.

Properties of flows populated by the Resource Server, calculated or retrieved from the resource properties:

rsresourceid: id of the resource
rsgraphprefix: prefix for graphs, please see the miniLoader Java action description below
rsgraphsufix0, [rsgraphsufix1]...: graph suffixes, one for each file produced by the flow
rsdomain: domain of the processed resource
rslanguage: language of the processed resource
rsprovider: provider
rslicense: license
oozie.wf.application.path: ${hdfs-folder-uri}/${resourceTypeId} (hdfs-folder-uri is specified in conf.properties of the RS; resourceTypeId is a property of the resource on the RS)

The resource server also copies properties from the resource server configuration file conf/job.properties to the flow properties. This can be used for properties common to all flows, such as:

nameNode: HDFS name node address
jobTracker: MapReduce job tracker address
queueName: MapReduce jobs queue name
user.name: user used to run the Oozie flow
inputfolder: folder where downloaded resource files are stored
rspfilesdir: folder for processed files
rsvirtuosoloadfolder: absolute path to the folder where files for loading are stored
rsvirtuosohost: hostname or address of the Virtuoso server
rsvirtuosojdbcport: JDBC port
rsvirtuosojdbcuser: JDBC user
rsvirtuosojdbcpasswd: JDBC password
rsprocessedurl: URL to send the result of the Virtuoso load to

Example:
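The "Example:" above referred to a screenshot of such a configuration. As a hedged sketch, a conf/job.properties file might look like the fragment below; every hostname, path and password here is an illustrative placeholder, not a value from a real deployment (only port 1111 and user dba are Virtuoso's well-known JDBC defaults):

```properties
# Illustrative job.properties sketch -- all values are placeholders
nameNode=hdfs://namenode.example.org:8020
jobTracker=jobtracker.example.org:8021
queueName=default
user.name=ubuntu
inputfolder=/home/ubuntu/inputs/
rspfilesdir=/home/ubuntu/processed
rsvirtuosoloadfolder=/home/ubuntu/virtuoso-load/
rsvirtuosohost=virtuoso.example.org
rsvirtuosojdbcport=1111
rsvirtuosojdbcuser=dba
rsvirtuosojdbcpasswd=secret
rsprocessedurl=http://54.201.101.125:9999/resources/
```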
Configuring Actions

Workflows usually contain the following sequence:

◦ moving the data to a place where it can be reached by the first processing component
◦ processing by the first component
◦ moving the data to a place where it can be reached by the second processing component
◦ processing by the second component
◦ ….
◦ loading into the Virtuoso triple store

Moving the resource file to the processing components

The following snippet shows an example configuration of the first step in the flow, which moves the resource files to the folder where they can be picked up by a processing component.

<workflow-app xmlns="uri:oozie:workflow:0.3" name="deri-workflow">
    <start to="move-resource-file"/>
    <action name="move-resource-file" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptwf</host>
            <command>${moveScriptPath} -onlyCopy ${inputfolder}${rsresourceid}* ubuntu@ptnuig:/home/ubuntu/data/${rsresourceid}.csv</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="lemon-marl-generator"/>
        <error to="fail"/>
    </action>

Configuring processing

The following XML snippet shows an example of processing by the Lemon Marl generator.

    <action name="lemon-marl-generator" retry-max="3" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>~/bin/runLemonMarlGeneratorParadigma.sh /home/ubuntu/data/${rsresourceid}.csv /home/ubuntu/data/outputs/${rsresourceid}.ttl ${rsdomain} ${rslanguage} ${rsgraphprefix}${rsgraphsufix0}</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="move-file2virtuoso"/>
        <error to="fail"/>
    </action>

Moving data to the Virtuoso Server

The following XML snippet shows an action which moves the output of the previous step to the Virtuoso server.

    <action name="move-file2virtuoso" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>${moveScriptPath} /home/ubuntu/data/outputs/${rsresourceid}.ttl ${virtuosoUser}@${rsvirtuosohost}:${rsvirtuosoloadfolder}${rsresourceid}.ttl</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="load2virtuoso"/>
        <error to="fail"/>
    </action>

Load data to the Virtuoso Server

The following XML snippet shows an example configuration of the miniLoader component that is used to load the processed resource files into the Virtuoso server.
    <action name="load2virtuoso" retry-max="2" retry-interval="10">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>com.sindice.miniloader.Miniloader</main-class>
            <arg>${rsvirtuosohost}</arg>
            <arg>${rsvirtuosojdbcport}</arg>
            <arg>${rsvirtuosojdbcuser}</arg>
            <arg>${rsvirtuosojdbcpasswd}</arg>
            <arg>${rsvirtuosoloadfolder}${rsresourceid}.ttl</arg>
            <arg>${rsgraphprefix}${rsgraphsufix0}</arg>
            <capture-output/>
        </java>
        <ok to="notify_rs" />
        <error to="fail" />
    </action>

Notifying the resource server

The last step notifies the RS that the data was loaded into the Virtuoso server.

    <action name="notify_rs">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>curl</exec>
            <argument>-H</argument>
            <argument>Content-Type:application/json</argument>
            <argument>-X</argument>
            <argument>POST</argument>
            <argument>-d</argument>
            <argument>${wf:actionData('load2virtuoso')['miniloader_json4rs']}</argument>
            <argument>${rsprocessedurl}${rsresourceid}/processed</argument>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>

Copy the configuration to the HDFS

The property "hdfs-folder-uri" in the conf.properties RS configuration file defines the path where the configuration should be stored. The resource type ID (paradigma) is part of the HDFS path, so it is first necessary to check whether it exists. If the folder for the given resource type does not exist yet, it is necessary to create it. Then the workflow and the required jars have to be copied; in this case only the miniloader jar is required, and it should be copied to the lib subfolder.

hadoop fs -put workflow.xml /user/ubuntu/nuig-flows/paradigma/
hadoop fs -put ~/virtuoso-miniloader-0.0.1-SNAPSHOT.jar /user/ubuntu/nuig-flows/paradigma/lib

Processing Resources

Processing is started by an HTTP POST request to the RS server with an empty body.
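The start request just described can be issued with curl. The sketch below only assembles and prints the URL, using the resource ID from the earlier registration example; the POST itself is shown as a comment because it needs a running RS:

```shell
# Sketch: start processing of a registered resource (empty request body).
RS="http://54.201.101.125:9999"
RESOURCE_ID="48957c5d-456c-4d7a-abc9-3062c91dafdd"
URL="$RS/resources/$RESOURCE_ID/processed"

# Actual call (requires a reachable RS):
#   curl -X POST "$URL"

echo "$URL"
```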
It is possible to monitor the status of the processing using the Oozie web console; clicking the running line opens a detail window.
When the processing has finished, all steps should have the status OK.
When the resource has been processed successfully, it is possible to make a SPARQL request to verify the content.

Appendix A: example of a whole flow definition

<workflow-app xmlns="uri:oozie:workflow:0.3" name="deri-workflow">
    <start to="move-resource-file"/>

    <action name="move-resource-file" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptwf</host>
            <command>${moveScriptPath} -onlyCopy ${inputfolder}${rsresourceid}* ubuntu@ptnuig:/home/ubuntu/data/${rsresourceid}.csv</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="lemon-marl-generator"/>
        <error to="fail"/>
    </action>

    <action name="lemon-marl-generator" retry-max="3" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>~/bin/runLemonMarlGeneratorParadigma.sh /home/ubuntu/data/${rsresourceid}.csv /home/ubuntu/data/outputs/${rsresourceid}.ttl ${rsdomain} ${rslanguage} ${rsgraphprefix}${rsgraphsufix0}</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="move-file2virtuoso"/>
        <error to="fail"/>
    </action>

    <action name="move-file2virtuoso" retry-max="2" retry-interval="1">
        <sshWithRetry xmlns="uri:oozie:sshWithRetry-action:0.1">
            <host>ubuntu@ptnuig</host>
            <command>${moveScriptPath} /home/ubuntu/data/outputs/${rsresourceid}.ttl ${virtuosoUser}@${rsvirtuosohost}:${rsvirtuosoloadfolder}${rsresourceid}.ttl</command>
            <capture-output/>
        </sshWithRetry>
        <ok to="load2virtuoso"/>
        <error to="fail"/>
    </action>

    <action name="load2virtuoso" retry-max="2" retry-interval="10">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>com.sindice.miniloader.Miniloader</main-class>
            <arg>${rsvirtuosohost}</arg>
            <arg>${rsvirtuosojdbcport}</arg>
            <arg>${rsvirtuosojdbcuser}</arg>
            <arg>${rsvirtuosojdbcpasswd}</arg>
            <arg>${rsvirtuosoloadfolder}${rsresourceid}.ttl</arg>
            <arg>${rsgraphprefix}${rsgraphsufix0}</arg>
            <capture-output/>
        </java>
        <ok to="notify_rs" />
        <error to="fail" />
    </action>

    <action name="notify_rs">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>curl</exec>
            <argument>-H</argument>
            <argument>Content-Type:application/json</argument>
            <argument>-X</argument>
            <argument>POST</argument>
            <argument>-d</argument>
            <argument>${wf:actionData('load2virtuoso')['miniloader_json4rs']}</argument>
            <argument>${rsprocessedurl}${rsresourceid}/processed</argument>
        </shell>
        <ok to="dir4processed_file" />
        <error to="fail" />
    </action>

    <action name="dir4processed_file">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>mkdir</exec>
            <argument>${rspfilesdir}/${rsresourceid}</argument>
        </shell>
        <ok to="move_processed_file" />
        <error to="fail" />
    </action>

    <action name="move_processed_file">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>mv</exec>
            <argument>${rsvirtuosoloadfolder}${rsresourceid}.ttl</argument>
            <argument>${rspfilesdir}/${rsresourceid}</argument>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>SSH action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
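To make the SPARQL verification mentioned above concrete: a quick content check can be run against the Virtuoso server. The sketch below only assembles and prints the query; the curl call is shown as a comment. It assumes Virtuoso's default HTTP SPARQL endpoint (/sparql on port 8890) and a graph name built from the example graphNamesPrefix plus the "lexicon" suffix, both of which must be adjusted to the actual deployment:

```shell
# Sketch: count triples in the loaded graph via a Virtuoso SPARQL endpoint.
# The graph name combines the example prefix and suffix from this memo.
GRAPH="http://www.eurosentiment.com/hotel/ca/lexicon/paradigma/lexicon"
QUERY="SELECT (COUNT(*) AS ?n) FROM <$GRAPH> WHERE { ?s ?p ?o }"

# Actual call (requires a reachable Virtuoso server; endpoint is an assumption):
#   curl --data-urlencode "query=$QUERY" http://54.201.101.125:8890/sparql

echo "$QUERY"
```

A non-zero count indicates that the load step produced data in the expected graph.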