PhD defense, Department of Information Technology, Uppsala University, Sweden


Querying Data Providing Web Services

Manivasakan Sabesan
Department of Information Technology
Uppsala University
Sweden.

Abstract
Web services are often used for search computing, where data is retrieved from servers providing information of different kinds. Such data providing web services return a set of objects for a given set of parameters without any side effects. There is a need to enable general and scalable search capabilities over data from data providing web services, which is the topic of this thesis.
The Web Service MEDiator (WSMED) system automatically provides relational views of any data providing web service operations by reading the WSDL documents describing them. These views can be queried with SQL. Without any knowledge of the costs of executing specific web service operations the WSMED query processor automatically and adaptively finds an optimized parallel execution plan calling queried data providing web services.
For scalable execution of queries to data providing web services, an algebra operator PAP adaptively parallelizes calls in execution plans to web service operations until no significant performance improvement is measured, based on monitoring the flow from web service operations without any cost knowledge or extensive memory usage.
To comply with the Everything as a Service (XaaS) paradigm WSMED itself is implemented as a web service that provides web service operations to query and combine data from data providing web services. A web based demonstration of the WSMED web service provides general SQL queries to any data providing web service operations from a browser.
WSMED assumes that all queried data sources are available as web services. To make any data providing system into a data providing web service WSMED includes a subsystem, the web service generator, which generates and deploys the web service operations to access a data source. The WSMED web service itself is generated by the web service generator.

  • Searching data through web services is an important class of applications. What is a data providing operation? Web services provide no query or view facilities. SQL is a popular language for searching data.
  • Web services provide a framework for applications by defining sets of operations that can be invoked over the web. The operations are described by WSDL documents, which are publicly available. The mediator does not store any data itself; it imports metadata from WSDL. Users can create views and semantic enrichments, and they issue SQL queries against these views. Workflow: the user gives the WSDL URL, the metadata is imported, and the user defines views. When a user query is issued, the mediator sends sub-queries and the wrapper invokes the web service calls.
  • A publicly available web service for accessing and searching the National Nutrient Database of the US Department of Agriculture, which contains information about the nutrient content of over 6000 food items. The web service contains five different operations. The view is based on a clumsy, six-page WSDL document. Explain the attributes and the key.
  • The WSDL importer populates the web service descriptions: given the URL of a WSDL document, it reads the document using the Java toolkits WSDL4J and Castor, parses it, converts it to the format used by the web service schema, and stores the extracted metadata in the web service meta-database. In addition to the web service descriptions, WSMED keeps user-added WSMED enrichments in its local store. The query processor exploits the web service descriptions and the WSMED enrichments to process queries. It utilizes an existing mediator engine, Amos II. The query processor calls the web service manager component, which is implemented using the SAAJ API. The web service manager is responsible for invoking web service calls using SOAP to retrieve the result of the user query. Inside the query processor, the calculus generator produces a domain calculus expression from an SQL query. The expression is passed to the query rewriter, which produces an equivalent but simpler domain calculus expression. The query rewriter calls the view processor to translate SQL query fragments over the WSMED view into web service operations. An important task for the query rewriter is to identify overlaps between different sub-queries and views calling the same web service operation, which requires knowledge about the key constraints. The rewritten query is translated into an algebra expression by a cost-based optimizer that uses a generic web service cost model as default. The algebra has operators to invoke web services and to apply external functions implemented in WSDL. The algebra expression is finally interpreted by the execution engine, which uses the web service meta-database to convert between the WSMED data representation and a SOAP message when a web service operation is called.
  • First approach for query optimization
  • To illustrate the impact of key constraints we define two other views in terms of the WSMED view food. The view foodclasses is used to classify food items, while fooddescriptions describes each food item. It is very natural to have such multi-level views. A multi-level view is a view defined in terms of the main view. The wrapper function is manually designed.
  • Explain the x- and y-axis values. Default is 1.5 times faster than naive. Hash join is 300 times faster than default and 500 times faster than naive.
  • Full semantic enrichment is 5 times faster. No explicit cost model is needed. Only the key enrichment is important.
  • There is a common need to search information through data providing web services, which return a set of objects for a given set of parameters without any side effects.
  • The views can be queried with SQL. GetAllStates and GetPlacesWithin come from the GeoPlaces web service; GetPlaceList comes from the TerraService web service. Our queries concern data from data providing web services. SQL is a natural way to express such queries and is still popular. Go to the demo: import TerraService and execute the query.
  • Central plan: a heuristic cost model based on the web service signatures, assuming that a web service call is expensive. Sequential execution is slow. The plan splitter identifies the parallelizable web service calls. The parallel pipeliner generates the parallel operators.
  • Multi-level execution plans are generated with several layers of parallelism, forming a process tree with a fanout per level. The central query plan is transformed into a parallel query plan. The coordinator initiates communication between child processes and ships plan functions. Then it streams different parameter tuples to them. The algebra operators are drawn upside down. Results are delivered as streams from the child processes.
  • FF_APPLYP: First Finished Apply in Parallel
  • Manual investigation. Based on the results: adaptive parallelization.
  • End of call message

Transcript

  • 1. Querying Data Providing Web Services Manivasakan Sabesan Uppsala DataBase Laboratory Dept. of Information Technology Uppsala University Sweden
  • 2. Outline
    • WSMED Architecture
    • Semantic Enrichments
    • Adaptive Parallelization
    • Web Service Query Service
    • Related Work & Future Directions
  • 3. Our problem area
    • It is difficult to retrieve data provided by web services:
        • Web service applications must be developed using a regular programming language such as Java or C#
    • WSMED:
      • Simplifies searching web service data by using database queries
      • Automatically generates collections of parallel programs to do the search
      • Automatically optimizes the generated programs
  • 4. Search information through web services
    [Figure: WSMED automatically generates parallel programs that call web service operations providing US states information, place details, and weather forecast information]
  • 5. Our approach: WSMED, a web service based mediator prototype
    [Figure: an SQL query is sent to the WSMED mediator, which returns the result; wrappers call the Data Providing Web Service Operations DPWSO 1, DPWSO 2, ..., DPWSO n over SOAP, based on their WSDL descriptions]
  • 6. Relational WSMED view
    • View food is based on the web service operation SearchFoodByDescription (WSDL document ≡ view food)
    • SQL query:
      select descry from food where gpcode = '1900' and keyword = 'Sweet';
    • View food:
      ndb     keyword   descry                      gpcode
      19080   Sweet     Candies, Sweet chocolate    1900
      ...     ...       ...                         ...
  • 7. Research questions
    • How can standards, such as WSDL and SOAP, be automatically utilized by a mediator?
    • How can database views of web service operations be automatically generated?
    • How can modern query optimization be used to provide efficient and scalable search from different web services?
    • How can the query optimizer speed up queries calling web service operations without any cost estimate?
    • How can data sources that are not accessible via web services be simply transformed into data providing web service operations?
    • How can Everything as a Service paradigm be used for querying web services?
  • 8. Web Service MEDiator (WSMED) system architecture
    • WSDL importer: extracts metadata from a WSDL document using the web service schema and stores it in the web service meta-database
    • Web service manager: invokes the web service operation to retrieve the data
    • WSMED enrichments: contains the semantic enrichments
    [Figure: architecture with the components WSDL importer, web service manager, query processor, WSMED enrichments, web service schema, and web service meta-database; inputs are the SQL query and the WSDL document of the web service, output is the results]
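The WSDL importer step above can be sketched in a few lines. This is only an illustration, not the WSMED implementation (which, according to the speaker notes, uses the Java toolkits WSDL4J and Castor): the WSDL fragment, the function name import_operations, and the dictionary used as meta-database are all assumptions made for the example. Only the WSDL 1.1 namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard WSDL 1.1 namespace.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL 1.1 fragment used only for illustration.
SAMPLE_WSDL = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="FoodService">
  <message name="SearchFoodByDescriptionIn"/>
  <message name="SearchFoodByDescriptionOut"/>
  <portType name="FoodPort">
    <operation name="SearchFoodByDescription">
      <input message="SearchFoodByDescriptionIn"/>
      <output message="SearchFoodByDescriptionOut"/>
    </operation>
  </portType>
</definitions>
"""


def import_operations(wsdl_text):
    """Return {operation name: (input message, output message)}."""
    root = ET.fromstring(wsdl_text)
    meta = {}
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        inp = op.find("wsdl:input", WSDL_NS)
        out = op.find("wsdl:output", WSDL_NS)
        meta[op.get("name")] = (inp.get("message"), out.get("message"))
    return meta


# A plain dict stands in for the web service meta-database.
meta_database = import_operations(SAMPLE_WSDL)
print(meta_database)
```

A real importer would also follow the message and type definitions to derive the column names and types of the generated view; that part is omitted here.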
  • 9. Outline
    • WSMED Architecture
    • Semantic Enrichments
    • Adaptive Parallelization
    • Web Service Query Service
    • Related Work & Future Directions
  • 10. Semantic enrichments
    • Manually define SQL views over web service operations defined by imported WSDL
    • Manually add semantic enrichments to help WSMED improve the query performance
  • 11. Multi-level views
    create view food(ndb, keyword, descry, gpcode) as <wrapper definition>;
    create view foodclasses(ndb, keyword, gpcode) as select ndb, keyword, gpcode from food;
    create view fooddescriptions(ndb, descry) as select ndb, descry from food;
    • SQL query accessing the above views:
    select fd.descry from foodclasses fc, fooddescriptions fd where fc.ndb=fd.ndb and fc.gpcode='1900';
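The multi-level views on this slide can be tried out locally. The sketch below uses Python's sqlite3 module and replaces the wrapper definition with an ordinary table named food_ws, which is an assumption of the example: in WSMED the view food is backed by a web service call, not by stored rows. The first row is taken from the view shown on slide 6; the second is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Stand-in for the wrapper definition: a table replaces the web service call.
con.execute("create table food_ws(ndb text, keyword text, descry text, gpcode text)")
con.executemany(
    "insert into food_ws values (?, ?, ?, ?)",
    [
        ("19080", "Sweet", "Candies, Sweet chocolate", "1900"),  # row from slide 6
        ("00001", "Sweet", "Mock item", "0100"),                  # made-up row
    ],
)

# The three views from the slide, with the wrapper replaced by food_ws.
con.executescript("""
create view food(ndb, keyword, descry, gpcode) as
    select ndb, keyword, descry, gpcode from food_ws;
create view foodclasses(ndb, keyword, gpcode) as
    select ndb, keyword, gpcode from food;
create view fooddescriptions(ndb, descry) as
    select ndb, descry from food;
""")

rows = con.execute("""
select fd.descry
from foodclasses fc, fooddescriptions fd
where fc.ndb = fd.ndb and fc.gpcode = '1900'
""").fetchall()
print(rows)
```

The query joins two views that are both defined over food, which is why knowing that ndb is a key matters: without it, the join looks like two independent accesses to the same web service operation.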
  • 12. Query execution strategies
    • No query optimization
    • Heuristic cost model : very simple manual heuristic cost model of web service operation cost and naïve join strategy
    • Hash join strategy: heuristic cost model + hash join
    • Semantic enrichment : key of the view is also specified
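To make the difference between the strategies above concrete, the following sketch counts calls to a mock web service operation for the join of foodclasses and fooddescriptions. It is an illustration under simplifying assumptions, not the WSMED optimizer: the operation search_food takes no parameters, the rows are invented, and the three functions are hand-written plans rather than plans produced by a query processor.

```python
calls = 0

# Mock data standing in for a remote operation; the rows are made up.
MOCK_ROWS = [
    (f"ndb{i}", f"description {i}", "1900" if i % 2 == 0 else "0100")
    for i in range(6)
]


def search_food():
    """Mock web service operation returning (ndb, descry, gpcode) rows."""
    global calls
    calls += 1
    return list(MOCK_ROWS)


def naive_join(gpcode):
    # One call for the outer view, then one more call per qualifying outer row.
    result = []
    for ndb, _, code in search_food():
        if code == gpcode:
            result += [d for n, d, _ in search_food() if n == ndb]
    return result


def hash_join(gpcode):
    # One call per view; the inner side is materialized in memory.
    table = {n: d for n, d, _ in search_food()}
    return [table[ndb] for ndb, _, code in search_food() if code == gpcode]


def key_rewrite(gpcode):
    # With ndb declared as key, both views refer to the same row,
    # so the self-join collapses into a single call.
    return [d for _, d, code in search_food() if code == gpcode]


def run(strategy):
    """Reset the call counter, run one strategy, return (answer, calls)."""
    global calls
    calls = 0
    return strategy("1900"), calls


for strategy in (naive_join, hash_join, key_rewrite):
    answer, n_calls = run(strategy)
    print(strategy.__name__, n_calls, answer)
```

All three plans return the same answer; they differ in the number of calls and, for the hash join, in the memory needed to hold the materialized side.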
  • 13. Comparison of query execution strategies
  • 14. Full semantic enrichment vs. hash join
    • Hash join requires memory to materialize results of the web service calls
    [Figure: response time (sec) versus number of food items for hash join and semantic enrichment]
  • 15. Outline
    • WSMED Architecture
    • Semantic Enrichments
    • Adaptive Parallelization
    • Web Service Query Service
    • Related Work & Future Directions
  • 16. Adaptive parallelization
    • SQL Views are fully automatically generated
    • No semantic enrichments
    • Costs of web service operations are not known:
      • => Need for adaptive query processing which changes the query plans while running the query
  • 17. Parallelization of queries calling dependent web service operations
    • Queries calling data providing web services often have dependent calls
    • Web service calls incur high latency and high message setup cost
    • A naïve implementation of an application making these calls sequentially is time consuming
    • WSMED:
      • automatically generates parallel plans
      • experimented with three operators for adaptive parallelization
    [Figure: dependent web service calls WS 1, WS 2, WS 3, ..., WS n]
  • 18. Example query
    select gl.City, gl.TypeId
    from GetAllStates gs, GetPlacesWithin gp, GetPlaceList gl
    where gs.state=gp.state and gp.distance=15.0
      and gp.placeTypeToFind='City' and gp.place='Atlanta'
      and gl.placeName=gp.ToPlace + ', ' + gp.ToState
      and gl.MaxItems=100 and gl.imagePresence='true'
    • Finds information about places located within 15 km from each city named 'Atlanta' in all US states
    • Invokes 300 web service calls and returns a stream of 360 tuples <City, TypeId>
    [Figure: GetAllStates passes <state> to GetPlacesWithin, called with <15, 'City', 'Atlanta'>, which passes <ToPlace, ToState> to GetPlaceList, called with <100, 'true'>]
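The dependency chain in this query can be written out as a sequential program, which is roughly what a non-parallel plan does. The sketch below is only an illustration: the three functions are mocks that borrow the operation names from the query, and their data (states, places, and the TypeId value) is invented, so the call count differs from the 300 calls reported on the slide.

```python
# Mock operations; names follow the query, data and behaviour are invented.
def get_all_states():
    return ["GA", "TX", "NY"]


def get_places_within(place, state, distance, place_type):
    # Pretend only two of the states have places near a city named `place`.
    nearby = {
        "GA": [("Decatur", "GA"), ("Marietta", "GA")],
        "TX": [("Queen City", "TX")],
    }
    return nearby.get(state, [])


def get_place_list(place_name, max_items, image_presence):
    city = place_name.split(",")[0]
    return [(city, "CityTown")][:max_items]


def central_plan():
    """Sequential plan: each call depends on the output of the previous level."""
    calls = 0
    result = []
    states = get_all_states()
    calls += 1
    for state in states:
        places = get_places_within("Atlanta", state, 15.0, "City")
        calls += 1
        for to_place, to_state in places:
            rows = get_place_list(to_place + ", " + to_state, 100, "true")
            calls += 1
            result.extend(rows)
    return result, calls


tuples, n_calls = central_plan()
print(n_calls, tuples)
```

Every call waits for the previous one to finish, so the total time is the sum of all call latencies. The calls inside each loop are independent of one another, which is what the parallel plans exploit.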
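  • 19. Parallel plans in WSMED
    [Figure: an SQL query is transformed into a parallel query plan in two phases (Phase 1 and Phase 2), with a central plan as intermediate result; components shown are the calculus generator, central plan creator, plan splitter, plan function generator, and parallel pipeliner]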
  • 20. Manually parallelized execution plan (FF_APPLYP)
    • Parallel pipeline of calls to plan functions PF1 and PF2
    • Manually specified fanout :
      • fixed number of children in a level (e.g. fanout of level 2 is 3 )
    • Query processes q_i: processes executing plan functions
    [Figure: parallel plan and process tree. Plan: γ GetAllStates() produces <State>; FF_APPLYP(PF1, 2, State) produces <ToPlace, ToState>; FF_APPLYP(PF2, 3, ToPlace, ToState) produces <City, TypeID>. Process tree: coordinator q0 runs the query, level 1 has q1 and q2, level 2 has q3 to q8; the operations involved are GetAllStates, GetPlacesWithin, and GetPlaceList]
  • 21. Define process tree by manually specifying fanouts per level: FF_APPLYP(Function PF, Integer fo, Stream pstream) -> Stream result
    • PF – plan function
    • fo – fanout, values are manually set
    • pstream – stream of argument tuples a_i for PF
    • result – stream of results r_i from PF
    • Asynchronous operator
    [Figure: FF_APPLYP sends parameter tuples p1, p2, p3 to child processes q3, q4, q5, each running PF; results arrive in the order r1, r3, r2, and each child that delivers a result receives the next tuple (p4, p5, p6)]
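The behaviour of FF_APPLYP described above (a fixed fanout, asynchronous delivery, and the first finished child getting the next argument tuple) can be sketched with a thread pool. This is an approximation under stated assumptions, not the WSMED operator: it uses threads in one process instead of a tree of query processes, it reads the whole pstream before starting, and slow_lookup is a made-up plan function that sleeps to imitate web service latency.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def ff_applyp(pf, fanout, pstream):
    """Apply pf to every argument tuple using `fanout` workers and yield
    result tuples in the order the calls finish."""
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # An idle worker picks up the next pending argument tuple.
        futures = [pool.submit(pf, *args) for args in pstream]
        for future in as_completed(futures):
            yield from future.result()


def slow_lookup(state, delay):
    # Mock plan function: sleeps to imitate web service latency.
    time.sleep(delay)
    return [(state, delay)]


params = [("GA", 0.3), ("TX", 0.1), ("NY", 0.2)]
results = list(ff_applyp(slow_lookup, 3, params))
print(results)
```

Because results are emitted as calls complete, the output order generally differs from the order of params. Nesting one ff_applyp call inside the plan function of another gives the multi-level process tree of the previous slide.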
  • 22. Observations
    • The fastest execution time, 56.4 sec, outperformed the central plan (244.8 sec) with a speed-up of 4.3
    • Limitation: manual specification of fanout
    [Figure: execution times, marking the non-parallel plan and the best execution time]
  • 23. AFF_APPLYP
    • Automatically adapts process tree at run time: AFF_APPLYP(Function PF, Stream pstream) -> Stream result
    • 1. AFF_APPLYP initially forms a binary process tree by always setting fanout to 2 (init stage)
    [Figure: binary process tree with coordinator q0, level 1 nodes q1 and q2, and level 2 nodes q3 to q6]
  • 24. AFF_APPLYP (cont.)
    • 2. Executes a monitoring cycle for each invocation of PF for argument tuple a_i in a non-leaf node
      • 2.1 After the first monitoring cycle AFF_APPLYP adds p new child processes (add stage) to compare the performance change
    • 3. When an added node has several levels of children, recursive init stages of AFF_APPLYP will produce a binary sub-tree
    [Figure: process tree after an add stage, with coordinator q0 and nodes q1 to q11 on levels 1 and 2]
  • 25. AFF_APPLYP (cont.)
    • 4. AFF_APPLYP records per monitoring cycle i the average time t_i to produce an incoming tuple from the children
      • 4.1 If t_i decreases more than a threshold, the add stage is rerun
      • 4.2 If t_i increases, we either add no more children or run a drop stage that drops one child and its children
    [Figure: process tree after a drop stage, with coordinator q0 and the remaining nodes on levels 1 and 2]
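The init, add, and drop stages of AFF_APPLYP can be summarized as a small control loop. The sketch below simulates one level of the process tree only and makes several assumptions: the function name adapt_fanout, the relative threshold of 0.25, and the callback avg_time_per_tuple (which stands in for a monitoring cycle) are hypothetical, and the example cost curve is invented rather than measured.

```python
def adapt_fanout(avg_time_per_tuple, p=2, threshold=0.25, max_cycles=20):
    """Simulate the init/add/drop stages for one tree level.

    avg_time_per_tuple(n) stands in for a monitoring cycle: it returns the
    average time per incoming tuple when the node has n children.
    """
    fanout = 2                          # init stage: binary tree
    t_prev = avg_time_per_tuple(fanout)
    history = [(fanout, t_prev)]
    for _ in range(max_cycles):
        fanout += p                     # add stage: p new children
        t = avg_time_per_tuple(fanout)
        history.append((fanout, t))
        if t > t_prev:                  # slower than before: drop one child, stop
            fanout -= 1
            break
        if (t_prev - t) / t_prev <= threshold:
            break                       # no significant improvement: stop adding
        t_prev = t                      # significant improvement: rerun add stage
    return fanout, history


def mock_cost(n):
    # Invented cost curve: the service saturates at 6 parallel calls and
    # every extra child adds a little coordination overhead.
    return 1.0 / min(n, 6) + 0.01 * n


final_fanout, history = adapt_fanout(mock_cost)
print(final_fanout, history)
```

The loop needs no cost model of the web service: it only compares successive measurements, which matches the idea of adding children until no significant improvement is observed.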
  • 26. Adaptive results
    • p: number of children added after each monitoring cycle
    [Figure: execution times of methods with different p values, compared with the non-parallel plan, the best FF_APPLYP, and the best AFF_APPLYP]
  • 27. Parameterized adaptive parallelization
    • Queries calling data providing web services often have both dependent and independent calls
    • AFF_APPLYP can also handle independent calls, but will treat them as a sequence (suboptimal)
    • The PAP operator adaptively parallelizes independent and dependent calls
    [Figure: web service calls WS 1 to WS 5 arranged as a sequence and as a combination of dependent and independent calls]
  • 28. PAP operator (Parameterized Adaptive Parallelization): PAP(Vector of Function VPF, Stream pstream, Vector argorder, Vector resorder) -> Stream result
    • VPF – vector of plan functions
    • pstream – stream of argument values p_i
    • argorder – argument order
    • resorder – result order
    • result – stream of results r_j
    • Different plan functions use different argument values from an argument tuple in pstream
      • argorder specifies for each plan function how to form its arguments
    • Similarly, resorder specifies how the result of PAP is constructed from the results of its children
    • Asynchronous operator
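The roles of argorder and resorder can be illustrated with a small sketch. The encoding used here is an assumption, since the slide does not define it: argorder[k] lists the positions of the argument tuple passed to plan function k, and resorder is a list of (plan function index, result position) pairs that builds each output tuple. The sketch also simplifies in two ways: each plan function returns one tuple instead of a stream, and the adaptive part of PAP is left out. The plan functions weather and zip_code are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def pap(vpf, pstream, argorder, resorder):
    """For each argument tuple, call all plan functions in parallel and
    combine their results into one output tuple."""
    with ThreadPoolExecutor(max_workers=len(vpf)) as pool:
        for p in pstream:
            # argorder picks, per plan function, its arguments from p.
            futures = [
                pool.submit(pf, *(p[i] for i in positions))
                for pf, positions in zip(vpf, argorder)
            ]
            partial = [f.result() for f in futures]
            # resorder picks, per output column, a value from one result.
            yield tuple(partial[fn][pos] for fn, pos in resorder)


# Mock plan functions, each returning a single result tuple.
def weather(city, state):
    return (f"forecast for {city}, {state}",)


def zip_code(city):
    return (f"zip of {city}", len(city))


out = list(pap(
    vpf=[weather, zip_code],
    pstream=[("Atlanta", "GA"), ("Austin", "TX")],
    argorder=[[0, 1], [0]],
    resorder=[(1, 0), (0, 0)],
))
print(out)
```

Here weather and zip_code are independent of each other, so they run in parallel for every argument tuple; a chain of dependent calls would instead be expressed by nesting operators, as with FF_APPLYP.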
  • 29. Experimental study
    • Cached dependent (CD): Modifies D by caching the results of web service operation calls using AFF_APPLYP
    • Dependent (D): All web service operations are using AFF_APPLYP
    • Independent (I): Parallel independent calls using PAP
    [Figure: call graphs over WS 1 to WS 5 for the three strategies, with a cache in the cached dependent case]
  • 30. Experimental results
    [Figures: experiments with adaptive strategies; relative scalability]
  • 31. Outline
    • WSMED Architecture
    • Semantic Enrichments
    • Adaptive Parallelization
    • Web Service Query Service
    • Related Work & Future Directions
  • 32. Web service query service
    • WSMED assumes data sources are web service operations
      • How to handle a data providing system that is not available as a web service?
    • The conventional way:
      • Develop software, define WSDL, deploy the interface code
    • Our approach: WSMED Web Service Generator
      • Once the data source is defined in the Amos II mediator system, it:
        • Automatically generates web service interfaces, generates WSDL, dynamically deploys the web service
    • The WSMED query service is automatically generated by the WSMED Web Service Generator
      • Everything as a Service paradigm (XaaS)
      • URL to use WSMED web service: http://udbl2.it.uu.se/WSMED/wsmed.html
  • 33. Outline
    • WSMED Architecture
    • Semantic Enrichments
    • Adaptive Parallelization
    • Web Service Query Service
    • Related Work & Future Directions
  • 34. Contributions of papers
    • Legend: A = answered, PA = partially answered
    • Paper I: Semantic enrichments
    • Paper II: Adaptive parallelization with dependent calls: AFF_APPLYP
    • Paper III: Adaptive parallelization with dependent & independent calls: PAP
    • Paper IV: Web service query service
    • Research questions (table columns: Paper I, Paper II, Paper III, Paper IV):
      • 1. How can web service standards be automatically utilized? A A A
      • 2. How can views of web service operations be automatically generated? PA A A
      • 3. How can query optimization be used to provide efficient and scalable search from web services? PA PA A
      • 4. How can the query optimizer speed up queries without any cost estimate? PA PA A
      • 5. How can data sources that are not accessible via web services be transformed into web services? A
      • 6. How can Everything as a Service paradigm be used for querying web services? A
  • 35. Related work
    • WSMS (U. Srivastava, J. Widom, K. Munagala, and R. Motwani, Query Optimization over Web Services, VLDB 2006)
      • WSMED also invokes parallel web service calls.
      • WSMS has a static cost model.
      • WSMED supports adaptive parallelization without any static cost model.
    • Eddies (R. Avnur, et al., Eddies: Continuously adaptive query processing, SIGMOD, 2000)
      • Adaptive operator
      • Eddies dynamically adapt the algebra expression.
      • PAP speeds up the calls to individual plan functions for a given algebra expression.
    • Two-phase query optimization strategies in distributed databases (W. Hasan, Optimization of SQL queries for Parallel Machines, 1997)
      • Two-phase optimization
      • Two-phase query optimization uses a static cost model to statically distribute execution plans.
      • WSMED supports adaptive parallelization without any static cost model.
  • 36. Future directions
    • WSMED approach relies on calling side effect free data providing web service operations
      • The WSDL language does not provide metadata describing side effects
      • When such a standard is available WSMED can utilize it to guarantee query correctness by managing the updatable views.
    • All performance measurements were made with publicly available web service operations
      • Development of a benchmark to simulate the parallel web service calls for controlled experiments.
  • 37. Thank you for your attention
    • ?
    “The un-queried life is not worth living”