CHAPTER 1
                                   INTRODUCTION

1.1    Statement Of The Problem
        Users of web services must first find relevant services among a collection of
services dispersed across the web. The current mechanism for this is UDDI, a distributed
registry system for Web services. However, UDDI supports only exact keyword matching and
category-based queries against the UDDI data entries that represent each Web service. It
therefore cannot return a ranked query result, nor can it suggest alternate services, that
is, other suitable services to fall back on when the currently chosen service is no longer
useful or has become unreachable.


1.2    Objective Of The Project
       We propose a framework for Web services retrieval that provides ranked lists of
Web services. To do this, we apply existing information retrieval approaches to compare
Web services. The system sits on top of a document database that stores the UDDI entries
of web services, and it considers not only UDDI data entries but also the WSDL definitions
of services. Our framework provides two query types: keyword-based and template-based
queries.

1.3    Current Scope
       The current Web service discovery solution is UDDI, a registry-based discovery
mechanism. UDDI supports only keyword search: it returns a list of services whose entries
contain words that exactly match the user query. This means that UDDI provides neither
ranked query results nor alternate services. We use both UDDI and WSDL definitions, and
apply an information retrieval ranking strategy to rank web services according to their
relevance and to find alternate web services.


1.4    Future Scope
       This implementation will improve the efficiency of web service retrieval, because
users can express a query as their own WSDL file, that is, as a template-based query.
Instead of plain keyword-match results, users get a ranked list of web services and can
also obtain alternate web services.



CHAPTER 2

LITERATURE SURVEY

2.1 History of Web Services
       Web services are applications that can communicate with other applications over
a network by using a set of standardized protocols. The technology originated from the
efforts of many companies that share an interest in building electronic marketplaces.

       EDI was the first attempt to create a standard way for businesses to communicate
over a network. In the 25 years since EDI came on the scene, there have been numerous
attempts at a universal conduit for connecting business logic over a network: Common
Object Request Broker Architecture, Distributed Component Object Model, Unix Remote
Procedure Call, and Java Remote Method Invocation. Each of those technologies failed to
gain significant market share or enough momentum to succeed. All of them exist today--
each still has its uses--but each failed to gain a broad reach.

       Before the Web, getting all the major software vendors to agree on a transport
protocol for cross-network application services might have been impossible. But the Web
rendered that decision academic, by specifying lower level transports for standardized
communication. The Web uses HTTP running on TCP/IP. TCP/IP was already a mature
standard by the time the Web went mainstream in 1994, and, by 1997, HTTP had become
a universal business standard. With HTTP and TCP/IP in place, all that was needed was
some kind of messaging and data encapsulation standard--and a lot of vendor
cooperation.

       It was XML's invention that really paved the way for Web services. As a widely
heralded, platform-independent standard for data description that could also be used to
describe message-passing protocols, XML was a logical choice for the job of
standardized application-to-application communication. XML officially became a
standard in February 1998, when the World Wide Web Consortium announced that XML
1.0 had reached Recommendation status: suitable for deployment in applications. By
early 1998, several attempts at an XML protocol for interprocess communication were
made. Allaire Corp.'s Web Distributed Data Exchange (WDDX) was one independent
attempt of note, but it was SOAP, developed by Dave Winer, CEO of Userland Software,
Microsoft engineers Bob Atkinson and Mohsen Al-Ghosein, and Don Box, co-founder of
DevelopMentor, that was to become the basis for Web services.

       Electronic marketplaces were a hot concept in December 1999, when Microsoft
held a private meeting with IBM and other interested companies to show off SOAP 1.0,
its specification for a standardized message-passing protocol based on XML. By the
summer of 2000, SOAP was gaining wider acceptance. IBM and Microsoft were also
each working on a way to programmatically describe how to connect up to a Web service.
After some discussion, protocol proposals from Microsoft and IBM merged. IBM
contributed Network Accessible Service Specification Language and Microsoft offered
both Service Description Language and SOAP Contract Language. In the fall of 2000, the
merged specification, Web Services Description Language, was announced.

       With SOAP and WSDL, companies could create and describe their Web services.
But someone still needed to provide a way to advertise and locate Web services. In March
2000, IBM, Microsoft, and Ariba started working on the solution: Universal Description,
Discovery, and Integration. With SOAP, WSDL, and UDDI in place, the de-facto
standards to create Web services had arrived, but it wasn't until the end of 2000 that five
major IT software infrastructure vendors announced their commitment to Web services.
Oracle, HP, Sun, IBM, and Microsoft, an unlikely--and thus impressive--alliance, stated
their intention to support and deploy the Web services standards in their products.

       It took unprecedented vendor cooperation and commitment on the design of the
core standards (SOAP, WSDL, and UDDI) to make Web services happen--but there's still
more to be done. Additional Web services standards must be defined and adopted in order
to build and integrate things like authentication and connection management.




Figure 2.1 Web services architecture.


2.2 What Is WSDL?


As communications protocols and message formats are standardized in the web
community, it becomes increasingly possible and important to be able to describe the
communications in some structured way. WSDL addresses this need by defining an XML
grammar for describing network services as collections of communication endpoints
capable of exchanging messages. WSDL service definitions provide documentation for
distributed systems and serve as a recipe for automating the details involved in
applications communication.

       A WSDL document defines services as collections of network endpoints, or ports.
In WSDL, the abstract definition of endpoints and messages is separated from their
concrete network deployment or data format bindings. This allows the reuse of abstract
definitions: messages, which are abstract descriptions of the data being exchanged, and
port types which are abstract collections of operations. The concrete protocol and data
format specification for a particular port type constitutes a reusable binding. A port is
defined by associating a network address with a reusable binding, and a collection of
ports define a service. Hence, a WSDL document uses the following elements in the
definition of network services:

       Types – a container for data type definitions using some type system (such as
       XSD).
       Message – an abstract, typed definition of the data being communicated.
       Operation – an abstract description of an action supported by the service.
       Port Type – an abstract set of operations supported by one or more endpoints.
       Binding – a concrete protocol and data format specification for a particular port
       type.
       Port – a single endpoint defined as a combination of a binding and a network
       address.
       Service – a collection of related endpoints.

       These elements are described in detail in Section 2 of the WSDL 1.1 specification.
It is important to observe that WSDL does not introduce a new type definition language.
WSDL recognizes the need for rich type systems for describing message formats, and
supports the XML Schema specification (XSD) as its canonical type system. However, since
it is unreasonable to expect a single type system grammar to describe all message formats,
present and future, WSDL allows other type definition languages to be used via
extensibility.
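
       As a minimal sketch of how the elements listed above fit together, the following
Python fragment parses a small WSDL 1.1 document with the standard library and collects
its messages, port types, bindings, and services. The StockQuote names and the
example.com addresses are hypothetical and serve only as an illustration; they are not
part of this project.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_BINDING_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Hypothetical WSDL 1.1 document, used only for illustration.
SAMPLE_WSDL = """<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns:tns="http://example.com/stockquote.wsdl"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="GetPriceRequest">
    <part name="symbol" type="xsd:string"/>
  </message>
  <message name="GetPriceResponse">
    <part name="price" type="xsd:float"/>
  </message>
  <portType name="StockQuotePortType">
    <operation name="GetPrice">
      <input message="tns:GetPriceRequest"/>
      <output message="tns:GetPriceResponse"/>
    </operation>
  </portType>
  <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="StockQuoteService">
    <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>
"""


def wsdl_tag(local_name):
    """Return the namespace-qualified tag name for a WSDL element."""
    return "{%s}%s" % (WSDL_NS, local_name)


def summarize_wsdl(wsdl_text):
    """Collect the names of the main WSDL elements into a dictionary."""
    root = ET.fromstring(wsdl_text)
    summary = {
        "messages": [m.get("name") for m in root.findall(wsdl_tag("message"))],
        "port_types": {},   # port type name -> operation names
        "bindings": {},     # binding name -> bound port type
        "services": {},     # service name -> {port name: network address}
    }
    for port_type in root.findall(wsdl_tag("portType")):
        operations = port_type.findall(wsdl_tag("operation"))
        summary["port_types"][port_type.get("name")] = [op.get("name") for op in operations]
    for binding in root.findall(wsdl_tag("binding")):
        summary["bindings"][binding.get("name")] = binding.get("type")
    for service in root.findall(wsdl_tag("service")):
        ports = {}
        for port in service.findall(wsdl_tag("port")):
            address = port.find("{%s}address" % SOAP_BINDING_NS)
            ports[port.get("name")] = address.get("location") if address is not None else None
        summary["services"][service.get("name")] = ports
    return summary


print(summarize_wsdl(SAMPLE_WSDL))
```

The abstract parts (messages and port types) and the concrete parts (bindings, ports,
and the service) end up in separate entries of the summary, mirroring the separation
that WSDL itself makes.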


In addition, WSDL defines a common binding mechanism. This is used to attach
a specific protocol or data format or structure to an abstract message, operation, or
endpoint. It allows the reuse of abstract definitions.

       In addition to the core service definition framework, the WSDL 1.1 specification
introduces specific binding extensions for the following protocols and message formats:

       SOAP 1.1
       HTTP GET / POST
       MIME

       Although defined within the WSDL specification, the above language extensions are
layered on top of the core service definition framework. Nothing precludes the use of
other binding extensions with WSDL.

       In WSDL the term binding refers to the process of associating protocol or data
format information with an abstract entity like a message, operation, or portType. WSDL
allows elements representing a specific technology (referred to here as extensibility
elements) under various elements defined by WSDL. These points of extensibility are
typically used to specify binding information for a particular protocol or message format,
but are not limited to such use. Extensibility elements MUST use an XML namespace
different from that of WSDL.

       Extensibility elements are commonly used to specify some technology-specific
binding. To distinguish whether the semantics of the technology-specific binding are
required for communication or optional, extensibility elements may place a WSDL
required attribute of type Boolean on the element. The default value for required is false.
The required attribute is defined in the namespace "http://schemas.xmlsoap.org/wsdl/".

       Extensibility elements allow innovation in the area of network and message
protocols without having to revise the base WSDL specification. WSDL recommends that
specifications defining such protocols also define any necessary WSDL extensions used
to describe those protocols or formats.


2.3 What is SOAP?
       SOAP, originally defined as Simple Object Access Protocol, is a protocol
specification for exchanging structured information in the implementation of Web
Services in computer networks. It relies on Extensible Markup Language (XML) for its
message format, and usually relies on other Application Layer protocols, most notably
Remote Procedure Call (RPC) and Hypertext Transfer Protocol (HTTP), for message
negotiation and transmission. SOAP can form the foundation layer of a web services
protocol stack, providing a basic messaging framework upon which web services can be
built. This XML based protocol consists of three parts: an envelope, which defines what
is in the message and how to process it, a set of encoding rules for expressing instances of
application-defined data types, and a convention for representing procedure calls and
responses.

       As a layman's example of how SOAP procedures can be used, a SOAP message
could be sent to a web-service-enabled web site, for example, a real-estate price database,
with the parameters needed for a search. The site would then return an XML-formatted
document with the resulting data, e.g., prices, location, features. Because the data is
returned in a standardized machine-parseable format, it could then be integrated directly
into a third-party web site or application.
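
       The real-estate example above can be sketched as follows. This is a minimal
illustration that builds a SOAP 1.1 envelope with the Python standard library; the
application namespace http://example.org/realestate and the element names GetListings,
City, and MaxPrice are hypothetical and do not belong to any real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/realestate"                   # hypothetical application namespace


def build_listing_query(city, max_price):
    """Build a SOAP request that carries the search parameters in its body."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV_NS)
    # The header is optional in SOAP; it is left empty here.
    ET.SubElement(envelope, "{%s}Header" % SOAP_ENV_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV_NS)
    # The body holds the procedure call: one element per call, one child per parameter.
    request = ET.SubElement(body, "{%s}GetListings" % APP_NS)
    ET.SubElement(request, "{%s}City" % APP_NS).text = city
    ET.SubElement(request, "{%s}MaxPrice" % APP_NS).text = str(max_price)
    return ET.tostring(envelope, encoding="unicode")


print(build_listing_query("Pune", 250000))
```

A client would normally send this text in the body of an HTTP POST request; the
transport step is omitted here to keep the sketch self-contained.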

       The SOAP architecture consists of several layers of specifications: for message
format, Message Exchange Patterns (MEP), underlying transport protocol bindings,
message processing models, and protocol extensibility. SOAP is the successor of
XML-RPC, though it borrows its transport and interaction neutrality and the
envelope/header/body from elsewhere (probably from WDDX).

The SOAP specification defines the messaging framework which consists of:

       The SOAP processing model defining the rules for processing a SOAP message
       The SOAP extensibility model defining the concepts of SOAP features and SOAP
       modules
       The SOAP underlying protocol binding framework describing the rules for
       defining a binding to an underlying protocol that can be used for exchanging
       SOAP messages between SOAP nodes
       The SOAP message construct defining the structure of a SOAP message

       The SOAP processing model describes a distributed processing model, its
participants, the SOAP nodes and how a SOAP receiver processes a SOAP message. The
following SOAP nodes are defined:

       SOAP sender: A SOAP node that transmits a SOAP message.
       SOAP receiver: A SOAP node that accepts a SOAP message.
       SOAP message path: The set of SOAP nodes through which a single SOAP
       message passes.
       Initial SOAP sender (Originator): The SOAP sender that originates a SOAP
       message at the starting point of a SOAP message path.
       SOAP intermediary: A SOAP intermediary is both a SOAP receiver and a SOAP
       sender and is targetable from within a SOAP message. It processes the SOAP
       header blocks targeted at it and acts to forward a SOAP message towards an
       ultimate SOAP receiver.
       Ultimate SOAP receiver: The SOAP receiver that is a final destination of a
       SOAP message. It is responsible for processing the contents of the SOAP body
       and any SOAP header blocks targeted at it. In some circumstances, a SOAP
       message might not reach an ultimate SOAP receiver, for example because of a
       problem at a SOAP intermediary. An ultimate SOAP receiver cannot also be a
       SOAP intermediary for the same SOAP message.
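
       The roles defined above can be illustrated with a deliberately simplified
simulation of a SOAP message path. The classes and the node names (client, gateway,
service) are hypothetical, and header targeting is reduced to a plain node name, whereas
real SOAP processing identifies targets through role (actor) URIs.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderBlock:
    name: str
    target: str  # name of the SOAP node expected to process this block (simplified)


@dataclass
class SoapMessage:
    headers: list
    body: str
    path: list = field(default_factory=list)  # SOAP nodes visited so far


def intermediary(node_name, message):
    """Process the header blocks targeted at this node and forward the rest."""
    remaining = [h for h in message.headers if h.target != node_name]
    return SoapMessage(remaining, message.body, message.path + [node_name])


def ultimate_receiver(node_name, message):
    """Process the header blocks targeted at this node together with the body."""
    processed = [h.name for h in message.headers if h.target == node_name]
    return {"path": message.path + [node_name], "headers": processed, "body": message.body}


# The initial SOAP sender originates the message at the start of the path.
message = SoapMessage(
    headers=[HeaderBlock("audit", "gateway"), HeaderBlock("auth", "service")],
    body="<GetListings/>",
    path=["client"],
)
forwarded = intermediary("gateway", message)
print(ultimate_receiver("service", forwarded))
```

The intermediary acts as both receiver and sender: it consumes the header block
addressed to it and passes the message on, while only the ultimate receiver touches the
body.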


2.4 What is UDDI?
       UDDI is a public registry designed to house information about businesses and
their services in a structured way. Through UDDI, one can publish and discover
information about a business and its Web Services. This data can be classified using
standard taxonomies so that information can be found based on categorization. Most
importantly, UDDI contains information about the technical interfaces of a business's
services. Through a set of SOAP-based XML API calls, one can interact with UDDI at
both design time and run time to discover technical data, such that those services can be
invoked and used. In this way, UDDI serves as infrastructure for a software landscape
based on Web Services.

       Why UDDI? What is the need for such a registry? As we look toward a software
landscape of thousands—perhaps millions—of Web Services, some tough challenges
emerge:

       How are Web Services discovered?
       How is this information categorized in a meaningful way?
       What implications are there for localization?
       What implications are there around proprietary technologies? How can I guarantee
       interoperability in the discovery mechanism?
       How can I interact with such a discovery mechanism at run time once my
       application is dependent upon a Web Service?

       In response to these challenges, the UDDI initiative emerged. A number of
companies, including Microsoft, IBM, Sun, Oracle, Compaq, Hewlett Packard, Intel,
SAP, and over three hundred other companies (see UDDI: Community for a complete
list), came together to develop a specification based on open standards and
non-proprietary technologies to solve these challenges. The result, initially launched in
beta in December 2000 and in production by May 2001, was a global business registry hosted
by multiple operator nodes that users could both search and publish to at no cost.

       With such an infrastructure for Web Services in place, data about Web Services
can now be found consistently and reliably in a universal, completely vendor-neutral
capacity. Precise categorical searches can be performed using extensible taxonomy
systems and identification. Run-time UDDI integration can be incorporated into
applications. As a result, a Web Services software environment can flourish.

       The UDDI data is hosted by operator nodes, companies that have committed to
running a public node that conforms to the specification governed by the UDDI.org
consortium. Today, two public nodes exist that conform to the Version 1 UDDI
specification: One is hosted by Microsoft and one by IBM. Hewlett Packard has
committed to hosting a node under the Version 2 specification as well. Host operators are
required to replicate data between one another across a secure channel, providing a
redundancy to the entire UDDI cloud. Data can be published to one node and, after
replication, can be discovered on another node. Today, replication occurs at 24-hour
intervals; in the future, as more applications come to depend on the UDDI data, the
replication intervals will become shorter.

       It is worth noting that there are no proprietary requirements as far as how a host
operator implements its node. The node simply must conform to the UDDI specification.
The Microsoft node (http://uddi.microsoft.com/default.aspx), for example, has been
written entirely in C# and runs in production on the .NET Beta 2 Common Language
Runtime. The code base takes significant advantage of the native SOAP support and
serialization offered by .NET system classes. On the backend, the Microsoft operator
node uses Microsoft® SQL Server 2000 as its data store. Suffice it to say that the IBM
node runs on different technologies. However, the two nodes behave
identically, because they conform to the same set of SOAP-based XML API calls. Client
tools can interoperate with the nodes seamlessly. As such, the UDDI public cloud serves
as a prime example of how the XML Web Services model works across heterogeneous
environments.

       Taking a look at what data is stored in UDDI and how it is structured is the next
step in understanding the UDDI initiative. UDDI is relatively lightweight; it is designed
as a registry, not a repository. The distinction is subtle, but crucial. A registry redirects a
user to resources, whereas a repository is an actual information store. Consider the
Microsoft® Windows® registry as an example: It contains basic settings and parameters,
but ultimately leads an application to a resource or binary. Looking up a COM component
based upon its ProgID leads to a Class ID, which leads to the location of the binary itself.

       UDDI behaves similarly: like the Windows registry, it relies on globally unique
identifiers (GUIDs) to guarantee the ability to perform look-ups and determine the
location of resources. UDDI queries ultimately lead to an interface—a .WSDL file, .XSD
file, .DTD file, and so on—or an implementation (such as an .ASMX or .ASP file)
located on another server. UDDI can thus answer the following kinds of questions:

       "What Web Service interfaces have been published that are based on WSDL and
       established for a given industry?"
       "What companies have written an implementation around one of these
       interfaces?"
       "What Web Services, categorized in a certain way, are offered today?"
       "What Web Services does a given company offer?"
       "Who do I need to contact about using a company's Web Service?"
       "What are the implementation details of a particular Web Service?"
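
       The registry idea described above, in which a look-up by key leads to the
location of a resource rather than to the resource itself, can be sketched with a toy
in-memory registry. The Registry class, its field names, and the example.com URLs are
hypothetical simplifications and do not reproduce the actual UDDI data structures or
API.

```python
import uuid


class Registry:
    """Toy registry: stores pointers to service descriptions, not the descriptions."""

    def __init__(self):
        self._entries = {}  # key (UUID string) -> entry dictionary

    def publish(self, business, service, category, wsdl_url):
        """Register a service and return the generated unique key."""
        key = str(uuid.uuid4())
        self._entries[key] = {
            "business": business,
            "service": service,
            "category": category,
            "wsdl_url": wsdl_url,
        }
        return key

    def find(self, **criteria):
        """Return the keys of entries whose fields exactly equal all criteria."""
        return [
            key
            for key, entry in self._entries.items()
            if all(entry.get(name) == value for name, value in criteria.items())
        ]

    def resolve(self, key):
        """Follow a key to the location of the interface description."""
        return self._entries[key]["wsdl_url"]


registry = Registry()
flights = registry.publish("Acme Travel", "FlightBooking", "travel",
                           "http://example.com/flights?wsdl")
forecast = registry.publish("Globex", "Forecast", "weather",
                            "http://example.com/forecast?wsdl")

# "What Web Services, categorized in a certain way, are offered today?"
print([registry.resolve(key) for key in registry.find(category="weather")])
```

Note that find performs exact matching only, so a query either hits an entry or
misses it; there is no notion of one entry matching better than another.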




2.5 Information retrieval (IR)

Information retrieval is the science of searching for documents, for information
within documents, and for metadata about documents, as well as that of searching
relational databases and the World Wide Web. There is overlap in the usage of the terms
data retrieval, document retrieval, information retrieval, and text retrieval, but each also
has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary,
based on computer science, mathematics, library science, information science,
information architecture, cognitive psychology, linguistics, and statistics.

       The idea of using computers to search for relevant pieces of information was
popularized in the article As We May Think by Vannevar Bush in 1945. The first
automated information retrieval systems were introduced in the 1950s and 1960s. By
1970 several different techniques had been shown to perform well on small text corpora
such as the Cranfield collection (several thousand documents). Large-scale retrieval
systems, such as the Lockheed Dialog system, came into use early in the 1970s.

       In 1992, the US Department of Defense along with the National Institute of
Standards and Technology (NIST) cosponsored the Text Retrieval Conference (TREC) as
part of the TIPSTER text program. The aim was to support the information retrieval
community by supplying the infrastructure needed to evaluate text retrieval
methodologies on a very large text collection. This catalyzed research on
methods that scale to huge corpora. The introduction of web search engines has boosted
the need for very large scale retrieval systems even further.

       Nowadays IR is extensively deployed by search engines and other web components
for keyword-based and template-based ranking and retrieval of web documents and web
services. Most IR systems compute a numeric score for how well each object in the
database matches the query, and rank the objects according to this value. The top ranking
objects are then shown to the user. The process may then be iterated if the user wishes to
refine the query.
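
       The score-then-rank loop described above can be sketched in a few lines of
Python. The scoring function used here (the number of occurrences of the query terms in
each document) is a deliberately crude assumption chosen for brevity, and the three
sample documents are invented for illustration.

```python
from collections import Counter


def score(query, document):
    """Count how many times the query terms occur in the document."""
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank(query, documents, top_k=3):
    """Score every document, drop non-matching ones, and return the top_k best."""
    scored = [(score(query, text), name) for name, text in documents.items()]
    matching = [(s, name) for s, name in scored if s > 0]
    # Highest score first; ties are broken alphabetically by name.
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, s) for s, name in matching[:top_k]]


documents = {
    "weather": "weather forecast service returns weather by city",
    "stock": "stock quote service",
    "currency": "currency conversion rates",
}
print(rank("weather service", documents))
```

Unlike an exact-match look-up, every matching object receives a position in the result
list, so the user sees the best candidates first and can refine the query if needed.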




2.6 Why use Web Services?

To support greater business efficiency and agility, information systems and their
operations have become increasingly decentralized and, for a variety of historical,
technical and business reasons, increasingly heterogeneous. Business processes are
distributed among far-flung business divisions, suppliers, partners, and customers, with
each participant having their own special needs for technology and automation. As a
consequence, the demand for a high degree of interoperability among disparate
information systems has never been greater. Moreover, it's critical for this high degree of
interoperability to be sustained in the face of rapid evolution of the cooperating systems,
as participants continually modify their systems in response to new or changing business
requirements.

       Traditional assembly and integration methods (and the resulting integration
software market stimulated by these methods) are not particularly well suited to this new
business environment. These methods rely on a tight coupling between cooperating
systems, which requires either the universal deployment of homogeneous systems
(unlikely, considering the diversity and broad scale of modern business services) or
extraordinarily close coordination among participating development organizations during
initial development and sustainment (for example, to ensure that any changes to APIs or
protocols are simultaneously reflected in all of the deployed systems). In this business
environment, such tight coordination is often impractical (e.g., prohibitively expensive),
and rapid evolution in response to a new business opportunity is typically out of the
question.

       In contrast to traditional assembly and integration methods, web services
technology provides a paradigm that uses messages (in the form of XML documents)
passed among diverse, loosely coupled systems as the focal point for integration. These
systems are no longer viewed solely as components in a larger system of systems but also
as providers of services that are applied to the messages. Web services are a special case
of the more general notion of Service-Oriented Architectures (SOA). Service-Oriented
Architectures represent (i.e., model) interconnected systems or components as collections
of cooperating services. The goal of web services technology is to dramatically reduce the
interoperability issues that would otherwise arise when integrating disparate systems
using traditional means.

       The Web services model is an architectural and programming model that achieves
interoperability and reusability in the following ways:

        Loosely coupled systems: Service requesters and service providers agree upon an
        interface and abstract away the implementation details. The integration point is
        defined by the interface contract, which isolates the participants from the effects
        of change over time.
        Virtualization services and open standards: Allowing applications to run in a
        virtual environment based on open standards, independent of the specific details
        of the underlying operating system or hardware platform, allows for Internet-level
        scalability and distribution similar to HTTP-based web applications.
        Document-orientation: Interoperability across diverse technology is achieved at
        runtime by leveraging XML documents as a common way to express and
        exchange data.




2.7 Presently Available Technologies and Challenges

Current Web service discovery mechanisms can be classified into three categories:


1. Registry-based discovery like UDDI.
2. Semantic annotation and discovery.
3. Similarity-based search.


2.7.1 Registry-based Discovery
        UDDI is a technical specification for a Web service registry, standardized by
OASIS, and it is the typical registry-based solution. It gives users a uniform way to
publish and discover Web services via normative registries. However, UDDI supports only
exact keyword matching and category-based queries against the UDDI data entries that
represent each Web service. It is therefore hard to obtain a ranked query result or to
find alternate services, that is, other suitable services to use when the currently
chosen service is no longer useful or has become unreachable. Moreover, UDDI queries do
not target the WSDL definitions that actually describe the service interfaces and message
formats of Web services.


2.7.2 Semantic Annotation and Discovery


       The second effort is semantic annotation and discovery. In this approach,
semantic descriptions are added as annotations to Web service descriptions, and services
are found by inference over these semantics. Examples of such efforts are OWL-S and
WSDL-S. However, studies of this kind face some serious open questions: how can the
semantic annotations be produced automatically, and how fast can inferences be made over
them? The complexity of these questions and the difficulty of answering them have made
this method less popular.


2.7.3 Similarity-based Search
       The last approach is to adopt current information retrieval techniques to retrieve
Web services. In this approach, Web service similarity is measured as a probabilistic
value, and relevant services are found by comparing these values. However, current IR
techniques focus on the retrieval of plain text, so they cannot be applied directly to
service retrieval: a service interface has a logical structure of operations and of the
messages consumed by each operation. For this reason, our work and related work focus on
how to adapt IR techniques to the Web services environment. Recently, there have been
several efforts to use IR techniques for service retrieval. For similarity search over
service operations, operations can be clustered based on parameter names and operation
names. The vector space model (VSM) can also be useful in service retrieval: using
TF-IDF weighted similarity measures to rank query results with VSM is reported to be more
effective than other IR techniques. A newer solution, concept lattices, can be used for
finding alternative services.
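
       To make the vector space model concrete, the sketch below ranks a few service
descriptions against a keyword query by the cosine similarity of TF-IDF weighted term
vectors. The smoothed IDF formula, the whitespace tokenizer, and the three sample
services are assumptions made for illustration; the source does not prescribe a
particular weighting formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def build_idf(corpus_tokens):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter(term for tokens in corpus_tokens for term in set(tokens))
    # Smoothing keeps the weight positive even for terms found in every document.
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Map each known term to term frequency times inverse document frequency."""
    term_freq = Counter(tokens)
    return {term: term_freq[term] * idf[term] for term in term_freq if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_services(query, services):
    """Return (service name, similarity) pairs, most similar first."""
    names = list(services)
    docs = [tokenize(services[name]) for name in names]
    idf = build_idf(docs)
    query_vector = tfidf_vector(tokenize(query), idf)
    scored = [(name, cosine(query_vector, tfidf_vector(doc, idf)))
              for name, doc in zip(names, docs)]
    return sorted(scored, key=lambda pair: -pair[1])


services = {
    "WeatherService": "get weather forecast city",
    "StockService": "get stock quote symbol",
    "CurrencyService": "convert currency rate",
}
for name, similarity in rank_services("weather forecast", services):
    print(name, round(similarity, 3))
```

In a service retrieval setting the "documents" would be built from the text of UDDI
entries and WSDL definitions instead of the short strings used here.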


2.8 Our approach
       Users of web services must first find relevant services among a collection of
services dispersed across the web. The current mechanism for this is UDDI, a distributed
registry system for Web services. However, UDDI supports only exact keyword matching and
category-based queries against the UDDI data entries that represent each Web service, so
it is hard to obtain a ranked query result or to find alternate services to use when the
currently chosen service is no longer useful or has become unreachable. Moreover, UDDI
queries do not target the WSDL definitions that actually describe the service interfaces
and message formats of Web services.



       In this project, we introduce a framework for XML Web services retrieval that
addresses these limitations of UDDI. Our system sits on top of a document database that
consists of UDDI data entries and WSDL files. It returns ranked query results and finds
alternate Web services by using a similarity measure. In addition, we discuss related
work and the further features needed to improve the performance of this Web service
retrieval framework.


       We propose a framework for Web services retrieval that provides ranked lists of
Web services. To do this, we apply existing information retrieval approaches to compare
Web services. The system sits on top of UDDI and considers not only UDDI data entries but
also the WSDL definitions of services. Our framework provides two query types:
keyword-based and template-based queries. A template-based query is the user's own WSDL
definition, submitted as the query in order to retrieve services whose interfaces closely
match the interfaces of the user's own services.
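
       One possible reading of a template-based query is sketched below: the names of
the operations and messages in the user's WSDL are split into words, and candidate
services are ranked by how many of those words their own interfaces share. The Jaccard
measure is a simple stand-in chosen for illustration, not the similarity measure of the
framework, and the tiny_wsdl helper and the candidate services are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def interface_terms(wsdl_text):
    """Collect lower-cased word tokens from operation and message names."""
    root = ET.fromstring(wsdl_text)
    names = [element.get("name", "")
             for tag in ("message", "operation")
             for element in root.iter("{%s}%s" % (WSDL_NS, tag))]
    terms = set()
    for name in names:
        # Split identifiers such as "GetWeatherRequest" into get / weather / request.
        words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
        terms.update(word.lower() for word in words)
    return terms


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def template_query(template_wsdl, candidates):
    """Rank candidate WSDL documents by interface similarity to the template."""
    template_terms = interface_terms(template_wsdl)
    scored = [(name, jaccard(template_terms, interface_terms(text)))
              for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: -pair[1])


def tiny_wsdl(operation):
    """Hypothetical helper: a minimal WSDL document with a single operation."""
    return ('<definitions xmlns="%s"><portType name="P">'
            '<operation name="%s"/></portType></definitions>' % (WSDL_NS, operation))


candidates = {
    "CityWeather": tiny_wsdl("GetCityWeather"),
    "Currency": tiny_wsdl("ConvertCurrency"),
}
print(template_query(tiny_wsdl("GetWeatherForecast"), candidates))
```

Services near the top of the result are candidates for alternate services, because
their interfaces resemble the interface the user already depends on.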


Our framework has the following characteristics:
        It provides a ranked list of services, so the user gets the services most relevant
        to the query first.
        Users can choose the query granularity, that is, the level of the hierarchy at
        which the items they need are located. A user can search for a particular business,
        service or operation instead of searching whole web services. This distinguishes
        our keyword-based search from ordinary keyword search and lets users fine-tune
        their searches.
        It supports the discovery of alternate web services whose interfaces are similar to
        the interface definition of the user's own services. We call this a template-based
        query.
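
       Query granularity can be illustrated with a small sketch in which the same
keyword is matched at the business, service, or operation level of a hierarchy. The
catalog contents are invented, and plain substring matching is used instead of ranking
in order to keep the example short.

```python
# Hypothetical catalog: business -> service -> list of operation names.
CATALOG = {
    "Acme Travel": {
        "FlightBooking": ["searchFlights", "bookFlight"],
        "HotelBooking": ["searchHotels"],
    },
    "Globex Weather": {
        "Forecast": ["getForecast", "searchStations"],
    },
}

GRANULARITIES = ("business", "service", "operation")


def search(catalog, keyword, granularity):
    """Return the hierarchy paths whose name at the chosen level contains the keyword."""
    if granularity not in GRANULARITIES:
        raise ValueError("granularity must be one of %s" % (GRANULARITIES,))
    keyword = keyword.lower()
    hits = []
    for business, services in catalog.items():
        if granularity == "business":
            if keyword in business.lower():
                hits.append((business,))
            continue
        for service, operations in services.items():
            if granularity == "service":
                if keyword in service.lower():
                    hits.append((business, service))
                continue
            for operation in operations:
                if keyword in operation.lower():
                    hits.append((business, service, operation))
    return hits


print(search(CATALOG, "booking", "service"))
print(search(CATALOG, "search", "operation"))
```

Each hit is returned as a path through the hierarchy, so a match at the operation
level still tells the user which service and business it belongs to.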




CHAPTER 3


     SOFTWARE REQUIREMENTS SPECIFICATION

3.1 Introduction
       This Software Requirements Specification provides a complete description of the
design and implementation of a framework for XML web services retrieval with ranking.
The expected users of this framework are all those who occasionally or frequently search
for relevant web services or alternate web services for their applications. It will also
serve as a reference for other web services retrieval methods.


3.1.1 Glossary


Term                                          Definition

XML                                           Extensible Markup Language

WSDL                                          Web Services Description Language

SOAP                                          Simple Object Access Protocol

UDDI                                          Universal Description, Discovery and
                                              Integration
IR                                            Information Retrieval

VSM                                           Vector Space Model

TF-IDF                                        Term Frequency – Inverse Document
                                              Frequency
Table 3.1 Term and definition



3.2 General Description

       The current Web service discovery solution is UDDI, a registry-based discovery
mechanism. UDDI supports only keyword search: it returns a list of services whose
entries contain words that exactly match the user query. This means that UDDI does not
provide ranked query results or alternate services. We use both UDDI entries and WSDL
definitions, and apply an information retrieval ranking strategy to rank web services
according to their relevance and to find alternate web services. Template-based queries
improve the precision and efficiency of service retrieval by returning relevant results.


3.2.1 Product Description

       We implement a search engine for web services. The user can enter either a
keyword-based query or a template-based query to get a ranked list of web services, and
can retrieve alternate web services by specifying a WSDL file in a template-based query.
Users can set the query granularity by selecting a particular business, service or
operation in the search engine.



3.2.2 End User Expectation
         The user should be able to specify a WSDL file as a template-based query for
the search engine. The user is also expected to know the basic elements of a WSDL file
in order to find the alternate web services required for his purpose.


3.2.3 General Constraints
               We have to create our own document database consisting of UDDI data
               entries.
               The user should be able to specify a WSDL file as a template-based
               query for the search engine.
               The system should provide a user-friendly interface for keyword queries.




3.3 Specific Requirements

3.3.1 Functional requirements



The main functional requirements for this solution are as follows:


            User input: The user specifies a query either as keywords (for
            keyword-based search) or as a WSDL file (for template-based search).
            Parsing: For template-based queries, we parse the input WSDL file
            using an XML parser. The extracted text is tokenized and stemmed.
            Similarity search and ranking: We calculate the similarity between the
            query and the service descriptions, expressed as a percentage for each
            web service. The comparison is based on the vector space model and a
            TF-IDF calculation. However, we do not use the original TF-IDF weight
            in our framework. We give a weight to each tokenized word according to
            the position in which the word occurs.
            Output: A ranked list of the top 3 web services (WSDL file URLs), along
            with their percentage of similarity, is displayed by the search engine.


3.3.2 Non-functional Requirements


The main non-functional requirements for this solution are as follows:


              Performance: The system must provide a good ranking mechanism with
              high precision. The search engine's response time should be short.
              Scalability: The system has to be scalable to a potentially unlimited
              number of parties.
              User-friendly operation interface: The system must be easy to operate so
              that non-specialist users can use it without extra help.




3.3.3 Software Requirements

               Eclipse integrated development environment.
               Tomcat 5.5.7 application server.
               MS Access 2007 as the back-end database.
               Java 1.6.0.

3.3.4 Hardware Requirements

              Operating system: Microsoft XP Professional.
              Processor: Pentium IV, 2.7 GHz, 1 GB RAM.



CHAPTER 4
                              SYSTEM DESIGN

4.1 Introduction and Design Overview
The system design has three fundamental sections:
        Document database which is required for storing UDDI data entries and WSDL
        entries.
        Web services retrieval framework for parsing, tokenizing, indexing service
        descriptors and calculating similarity percentage.
        Query interface which provides a search engine for entering queries and
        displaying results.


4.2 System Architectural Design




Figure 4.2 Overview of the framework




4.3 Detailed Description of Components

4.3.1 Parsing
           The first step of the retrieval process for a template-based query is to parse
the service descriptions, each of which is expressed as an XML structure. In this step,
service descriptions are parsed and organized as a DOM tree. To extract service
descriptions from WSDL files, we use the Java APIs for XML parsing.


4.3.2 Tokenizing
           The next step is tokenizing the WSDL elements (for template-based search) or
the words in the user query (for keyword-based search). During this step a
concatenated word such as 'SearchByAuthor' is split into the strings 'search', 'by' and
'author'.


4.3.3 Document database
       We implement UDDI by creating a database. We represent each web service by
four documents: a document for the business, a document for the service, a document
for the operations and a document providing the description of the web service.
Hence we maintain four tables (business, service, operations and description), which
contain the index terms or keywords of the respective documents. In addition to
these four tables, we have a separate table called web services, which holds the list of
sample web services and the URLs of their WSDL files.


4.3.4 Ranking
           The final step is to calculate the similarity between the query and the service
descriptions. We use this similarity to rank the services in the result list. The
similarity comparison is based on the vector space model and a TF-IDF calculation.
However, we do not use the original TF-IDF weight in our framework. We give a
weight to each tokenized word according to the position in which it occurs. For
example, suppose a word can appear in either a name element or a description element.
If the word appears in a name field, it carries more meaning than the same word in a
description field. So we calculate term frequencies according to position, as shown below.




Here, Wqj is the weight of a word in the query vector, and Wid is the weight of a
word in a service description.
       Document weight is calculated using:
               Wid = ∑ tf * idf


       Query weight is calculated using:
               Wqj = ∑ (0.5 + (0.5 * tf)) * idf
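
The position-dependent weighting described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the project's actual code: the class name PositionWeightedTfIdf, the position factors 2.0 (name field) and 1.0 (description field), and the idf definition ln(N / df) are all hypothetical, since the report does not state the values it uses. The sketch uses Java 8 streams rather than the Java 1.6 APIs named in the requirements.

```java
import java.util.List;

/** Sketch of position-weighted TF-IDF; the position factors are assumed values. */
final class PositionWeightedTfIdf {
    static final double NAME_FACTOR = 2.0;        // assumption: a name occurrence counts double
    static final double DESCRIPTION_FACTOR = 1.0; // assumption: a description occurrence counts once

    /** Term frequency of a token, counting name occurrences more heavily than description ones. */
    static double weightedTf(String token, List<String> nameTokens, List<String> descriptionTokens) {
        long inName = nameTokens.stream().filter(token::equals).count();
        long inDescription = descriptionTokens.stream().filter(token::equals).count();
        return NAME_FACTOR * inName + DESCRIPTION_FACTOR * inDescription;
    }

    /** Assumed idf = ln(N / df); returns 0 when the term occurs in no document. */
    static double idf(int totalDocuments, int documentsWithTerm) {
        if (documentsWithTerm == 0) {
            return 0.0;
        }
        return Math.log((double) totalDocuments / documentsWithTerm);
    }

    /** Per-term document weight, tf * idf, as in the Wid formula. */
    static double documentWeight(double tf, double idf) {
        return tf * idf;
    }

    /** Per-term query weight, (0.5 + 0.5 * tf) * idf, as in the Wqj formula. */
    static double queryWeight(double tf, double idf) {
        return (0.5 + 0.5 * tf) * idf;
    }
}
```

With these assumed factors, a token that occurs once in a name and twice in a description gets a weighted term frequency of 4, so a single name match outweighs a single description match.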




4.4 User Interface Design
               4.4.1 Description of the User Interface
                       A form for the user to enter keyword based query.
                       Radio buttons for selecting business, service or operations,
                       enabling granular search.
                       A form for the user to enter template based query by specifying
                       WSDL.
                       List of operations in the specified WSDL file.
                       Name, URL and similarity percentage of ranked query results.




4.4.2 Use Case Diagram


                      Keyword-based Search




The use case diagram above depicts the working of the system when the user enters a
keyword-based query. The interactions between the actors are also depicted.


Actors
   1. Client
   2. Server


Description of Use Cases


1. Keyword-based Query
       Here, the user enters the query in natural language. It produces only exact
matches for the query entered by the user.
2. Databases
       Web service descriptors and WSDL entries of web services are stored in a
    document database. These entries are used when calculating similarity with
    user queries.


3. Retrieved URLs
       When a user enters a keyword query, the URLs of the WSDL files of all relevant
    web services are displayed. The user can then select which URL to use.


4. Search
      The relevant web services for the user's keyword-based query are searched.
Similarity is calculated and the results are displayed in descending order of similarity
percentage.




Template based Search




The use case diagrams above depict the basic working of the system when the user enters
a template-based query. The interactions between the actors are also depicted.
Actors
   1. Client
   2. Server


Description of Use Cases


1. Template
       The user enters a WSDL file as the template-based query in the form provided by
    the search engine. The file is parsed and then checked against the indexes stored
    in the database.
2. Parser
     The specified WSDL file is parsed with an XML parser through the Java APIs.
      The parser creates a DOM tree from the given WSDL file, which is used to
   extract the operation names from the WSDL file.


3. Similarity Measures
      Whenever a match is found, we use the vector space model to calculate the
   similarity. We represent all the files and the query as vectors and perform the
   TF-IDF calculations to obtain the similarity.



4. VSM
      In a vector space, documents and queries are represented as vectors:
                      dj = (w1,j, w2,j, ..., wt,j)
                      q = (w1,q, w2,q, ..., wt,q)
   Each dimension corresponds to a separate term. If a term occurs in the document, its
   value in the vector is non-zero. Several different ways of computing these values,
   also known as (term) weights, have been developed. One of the best known schemes
   is TF-IDF weighting, which we use when computing the similarity measure.


5. Ranked List
      After all similarities have been calculated, the relevant web services are
   arranged in descending order of similarity percentage. The user can then select
   which URL to use.


6. Retrieved URLs
     When a user submits a template-based query, the URLs of the WSDL files of all
relevant web services are displayed. The user can then select which URL to use.
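
The vector comparison in the VSM use case can be illustrated with a short Java sketch. The report does not show which similarity formula it uses, so cosine similarity is an assumption here, chosen because it is the usual measure in the vector space model. The class name CosineSimilarity and the method percent are hypothetical.

```java
/** Sketch of a VSM similarity measure; cosine similarity is an assumed choice. */
final class CosineSimilarity {
    /** Cosine of the angle between two equal-length weight vectors, scaled to 0..100. */
    static double percent(double[] query, double[] document) {
        if (query.length != document.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double queryNorm = 0.0;
        double documentNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * document[i];
            queryNorm += query[i] * query[i];
            documentNorm += document[i] * document[i];
        }
        if (queryNorm == 0.0 || documentNorm == 0.0) {
            return 0.0; // an all-zero vector shares no terms with anything
        }
        double cosine = dot / (Math.sqrt(queryNorm) * Math.sqrt(documentNorm));
        return Math.min(100.0, 100.0 * cosine); // cap guards against rounding above 100
    }
}
```

For non-negative weights the result lies between 0 (no shared terms) and 100 (vectors pointing in the same direction), which matches the similarity percentages shown in the result list.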




4.5 Test Plan


       4.5.1 Features to Be Tested


                   The parsing module has to be tested to check that the WSDL
                   file is parsed correctly and the appropriate operation
                   names are extracted.


                   The tokenizing module has to be tested to make sure that
                   the queries are tokenized correctly into separate
                   tokens.


                   The document database connection module has to be tested
                   to check that the connection is established correctly and
                   that the required data entries are extracted.


                   The TF-IDF similarity module has to be tested to make sure
                   that the similarity between the user query and the web
                   services is calculated accurately in different cases.




CHAPTER 5
                        IMPLEMENTATION


5.1 Class Description
     5.1.1 MainClass:
                 Extends: JFrame
                 Implements: ActionListener and TreeSelectionListener
          Attributes:
                        bg : ButtonGroup
                        bname : String
                        browse : JButton
                        key_search : JButton
                        key_text : JTextField
                        n : int
                        op_text : JTextArea
                        rb1 : JRadioButton
                        rb2 : JRadioButton
                        rb3 : JRadioButton
                        result : String[][]
                        scrollText : JScrollPane
                        selected : String[]
                        table : JTable
                        temp_parse : JButton
                        temp_text : JTextField
                        top : DefaultMutableTreeNode
                        tree : JTree
                        tree_search : JButton
                        type : String




Operations:
                   MainClass(String): Constructor for the class, which passes
                   the name of the frame to the superclass JFrame
                   constructor.


                   actionPerformed(ActionEvent): Abstract method of the
                   ActionListener interface, which must be implemented. It sets
                   the variable type to the component selected on the frame.


                   addComponentsToPane(Container): Adds the components to the
                   frame using a grid layout. It divides the pane into 4
                   regions and adds the components related to keyword search
                   in region 1, the components related to parsing in region 2,
                   the components related to template search in region 3 and
                   the components related to the result in region 4.


                   valueChanged(TreeSelectionEvent): Abstract method of the
                   TreeSelectionListener interface, which must be implemented.
                   It copies the selected nodes of the operations tree into the
                   string array selected.



5.1.2 BusinessSearch:
           Package: search
           Operation:
                          calculateWeight(String[]): Takes the tokenized
                          query words as input. Connects to the document
                          database and, for each WSDL file entry, calculates
                          the weight of each token with respect to the
                          document by searching for the token in the business
                          names and description entries. It loops over all
                          documents in the database and returns a float
                          array of weights.




5.1.3 OperationSearch:
                 Package: search
                 Operation:
                               calculateWeight(String[]): Takes the tokenized
                               query words as input. Connects to the document
                               database and, for each WSDL file entry, calculates
                               the weight of each token with respect to the
                               document by searching for the token in the
                               operation names and description entries. It loops
                               over all documents in the database and returns a
                               float array of weights.



     5.1.4 ServiceSearch:
                 Package: search
                 Operation:
                               calculateWeight(String[]): Takes the tokenized
                               query words as input. Connects to the document
                               database and, for each WSDL file entry, calculates
                               the weight of each token with respect to the
                               document by searching for the token in the service
                               names and description entries. It loops over all
                               documents in the database and returns a float
                               array of weights.




     5.1.5 Tokenization:
                 Package: search
                 Operation:

                               tokenize(String): Takes a string as input. The
                               string is broken into tokens wherever a capital
                               letter is found or one of the special characters is
                               encountered. If a collected token is one of the
                               words "by", "to", "and", etc., it is discarded.
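
The tokenization rule described above can be sketched in Java as follows. This is a minimal illustration, not the project's Tokenization class: the name TokenizerSketch is hypothetical, the stop-word list contains only the three words the report names, and treating every non-alphanumeric character as a "special character" is an assumption. It uses Set.of, which requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch of camel-case tokenization with stop-word removal. */
final class TokenizerSketch {
    // Assumed stop-word list; the report names only "by", "to" and "and".
    private static final Set<String> STOP_WORDS = Set.of("by", "to", "and");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                flush(current, tokens);     // a special character ends the current token
            } else {
                if (Character.isUpperCase(c)) {
                    flush(current, tokens); // a capital letter starts a new token
                }
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Lower-cases the pending token and keeps it unless it is empty or a stop word. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() == 0) {
            return;
        }
        String token = current.toString().toLowerCase(Locale.ROOT);
        current.setLength(0);
        if (!STOP_WORDS.contains(token)) {
            tokens.add(token);
        }
    }
}
```

Under these assumptions, tokenize("SearchByAuthor") yields the tokens search and author, with by discarded as a stop word. One consequence of splitting at every capital letter is that an all-capital word such as WSDL is broken into single letters; a production tokenizer would probably need a special case for that.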

5.1.6 SimilarityCalculator:
            Package: search
            Attribute:
                          similarity: float[][]
            Operation:
                          calculateSimilarity(float[][], float[], int): Takes
                          the weighted array of the documents, the weighted
                          array of the query and the number of tokens as
                          input. It calculates the similarity of each document
                          to the query over the tokens, taking care that the
                          similarity never exceeds 100, and stores the
                          similarities in the float array similarity.


                          finalResult(): Searches for the top 3 documents with
                          non-zero similarity, connects to the document
                          database, retrieves the WSDL URL and name of each
                          service, stores them in a string variable and
                          returns it.
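
The selection step of finalResult() can be sketched as below. This is an assumed illustration only: it works on a plain array of similarity percentages and returns document indices, leaving out the database lookup of names and URLs. The class name TopResults and the method topNonZero are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of picking the best-ranked documents with non-zero similarity. */
final class TopResults {
    /** Returns the indices of up to `limit` documents with similarity above zero, best first. */
    static List<Integer> topNonZero(double[] similarity, int limit) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < similarity.length; i++) {
            if (similarity[i] > 0) {
                indices.add(i); // documents with zero similarity are never shown
            }
        }
        // Sort by similarity, highest first; ties keep their original order.
        indices.sort(Comparator.comparingDouble((Integer i) -> similarity[i]).reversed());
        return new ArrayList<>(indices.subList(0, Math.min(limit, indices.size())));
    }
}
```

Calling topNonZero(similarity, 3) corresponds to the "top 3" rule in the report; fewer than three indices are returned when fewer documents match.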



5.1.7 KeyWordSearch:
            Package: search
            Implements: ActionListener


            Operation:
                          actionPerformed(ActionEvent): Abstract method of
                          the ActionListener interface, which must be
                          implemented. This method is invoked when the user
                          enters the query and presses the keyword search
                          button. It takes the query string, creates a
                          Tokenization object and tokenizes the string. It
                          then calculates the TF-IDF of the query using a
                          QueryTFIDF object. Depending on the radio button
                          selected (business, operation or service), it
                          creates the corresponding object and calculates the
                          weights for the documents. Using the query weight
                          and document weights, it calculates the similarity
                          with a SimilarityCalculator object, computes the
                          final result and updates the result table.


5.1.8 TemplateSearch:
           Package: search
           Implements: ActionListener


           Operation:
                         actionPerformed(ActionEvent): Abstract method of
                         the ActionListener interface, which must be
                         implemented. This method is invoked when the user
                         selects nodes of the operation tree and presses the
                         template search button. It takes the string selected,
                         which contains the selected tree nodes, creates a
                         Tokenization object and tokenizes the string. It
                         then calculates the TF-IDF of the query using a
                         QueryTFIDF object, creates an OperationSearch
                         object and calculates the weights for the documents.
                         Using the query weight and document weights, it
                         calculates the similarity with a
                         SimilarityCalculator object, computes the final
                         result and updates the result table.


5.1.9 OpenFile:
           Package: search
           Implements: ActionListener


           Operation:
                         actionPerformed(ActionEvent): Abstract method of
                         the ActionListener interface, which must be
                         implemented. This method is invoked when the user
                         presses the browse button in the template search to
                         select the WSDL file. It creates a JFileChooser
                         object and opens the chooser window. After the user
                         selects a file, its path is extracted and set in the
                         text field.



5.1.10 ParseWSDL:
          Package: search
          Implements: ActionListener


          Operation:

                           actionPerformed(ActionEvent): Abstract method of
                           the ActionListener interface, which must be
                           implemented. This method is invoked when the user
                           selects the WSDL file and presses the parse button
                           in the template-based search. It takes the path of
                           the WSDL file, creates a file reader object and
                           reads the contents into a single string. That string
                           is parsed into a DOM document through a
                           DocumentBuilderFactory object. An XPath expression
                           for the operations field of the WSDL file is then
                           evaluated on the document, and the resulting list
                           of operations is stored in a NodeList. Using this
                           list, it creates the JTree and adds the nodes to it.
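
The DOM and XPath steps described for ParseWSDL can be sketched with the standard JAXP APIs. This is an illustration under assumptions, not the project's code: the class name WsdlOperations is hypothetical, the WSDL is passed in as a string, and the XPath expression assumes WSDL 1.1, where operations are declared under portType (restricting to portType avoids counting the same operation again under binding).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of extracting operation names from a WSDL 1.1 document. */
final class WsdlOperations {
    static List<String> operationNames(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wsdlXml)));

            // local-name() matches the elements regardless of the namespace prefix used.
            String expression = "//*[local-name()='portType']/*[local-name()='operation']";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("could not parse WSDL", e);
        }
    }
}
```

The returned names could then be added as nodes of a JTree, as the class description states, and passed to the tokenizer once the user has selected them.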



5.1.11 QueryTFIDF:
          Package: search
          Operation:
                           calculateWeight(String[]): Takes the query tokens
                           as input, calculates the weight of each token in
                           the query and returns the weights.




5.1.12 MyTableModel:
          Package: table
          Extends: AbstractTableModel



          Attributes:
                           columnNames : String[]
                           data : Object[][]


          Operations:
                           getColumnCount(): Returns the number of columns in
                           the variable columnNames.


                           getColumnName(int): Returns the string value in the
                           columnNames array at the given index.


                           getRowCount(): Returns the number of rows in the
                           2-D array data.


                           getValueAt(int, int): Returns the value in the table
                           at the given indices.


                           setValueAt(Object, int, int): Sets the given value in
                           the table at the given indices.
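
A table model with these five operations can be sketched on top of Swing's AbstractTableModel as follows. The class name ResultTableModel, the constructor taking a row count, and the three column titles are assumptions made for this example; the titles are taken from the user interface description (name, URL and similarity percentage).

```java
import javax.swing.table.AbstractTableModel;

/** Sketch of a result table model; column titles and the constructor are assumed. */
final class ResultTableModel extends AbstractTableModel {
    private final String[] columnNames = {"Name", "URL", "Similarity (%)"};
    private final Object[][] data;

    ResultTableModel(int rows) {
        data = new Object[rows][columnNames.length];
    }

    @Override
    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public String getColumnName(int column) {
        return columnNames[column];
    }

    @Override
    public int getRowCount() {
        return data.length;
    }

    @Override
    public Object getValueAt(int row, int column) {
        return data[row][column];
    }

    @Override
    public void setValueAt(Object value, int row, int column) {
        data[row][column] = value;
        fireTableCellUpdated(row, column); // notifies an attached JTable so it repaints the cell
    }
}
```

Calling fireTableCellUpdated after each change is what lets the result table refresh when a search writes new rows into the model.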



5.1.13 BrowserLaunch:
          Package: table
          Attributes:
                           browsers : String[]
                           errMsg : String


          Operations:
                           openURL(String): Executed when the user selects
                           any row in the result table. The URL in the row is
                           extracted and given as input. It determines the
                           operating system of the running machine from the
                           system properties, selects a browser from the
                           browsers string array and opens it with the given
                           URL.
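
The operating-system check in openURL can be sketched as below. This is an assumed variant, not the project's implementation: instead of trying each entry of a browsers array, it picks one well-known launcher command per platform (rundll32 on Windows, open on macOS, xdg-open elsewhere). The class name BrowserCommand and the method commandFor are hypothetical.

```java
import java.io.IOException;
import java.util.Locale;

/** Sketch of choosing a platform launcher command from the os.name system property. */
final class BrowserCommand {
    /** Returns the command line that opens the URL in the default browser. */
    static String[] commandFor(String osName, String url) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return new String[] {"rundll32", "url.dll,FileProtocolHandler", url};
        }
        if (os.startsWith("mac")) {
            return new String[] {"open", url};
        }
        // Assumption: other systems are Unix-like desktops with xdg-open installed.
        return new String[] {"xdg-open", url};
    }

    /** Launches the browser for the current operating system. */
    static void openURL(String url) throws IOException {
        String osName = System.getProperty("os.name");
        new ProcessBuilder(commandFor(osName, url)).start();
    }
}
```

Keeping the command selection in a separate method that takes the OS name as a parameter makes it possible to check the choice for each platform without actually starting a browser.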


5.2 Algorithms Used


    5.2.1 Keyword Based Search Algorithm
          //input: contents of the text field holding the user query.
          //output: displays a table with 3 rows containing the WSDL file URLs of
                    the top 3 services with non-zero similarity.
          //description: tokenizes the given query and finds the weight of the
                         query, searches the document database to find the weight
                         of each document for the tokens, and computes the
                         similarity of each document to the query.
          Step1: string input = getText(text_field)
          Step2: string[] tokens = Tokenization.tokenize(input)
          Step3: float[] queryweight = QueryTFIDF.calculateWeight(tokens)
          Step4: if (buttonselected == Business) then
                   float[][] documentweight = BusinessSearch.calculateWeight(tokens)
                 else if (buttonselected == Service) then
                   float[][] documentweight = ServiceSearch.calculateWeight(tokens)
                 else if (buttonselected == Operation) then
                   float[][] documentweight = OperationSearch.calculateWeight(tokens)
                 {end of if else if}
          Step5: SimilarityCalculator.calculateSimilarity(documentweight,
                 queryweight, tokens.length)
          Step6: string[][] result = SimilarityCalculator.finalResult()
          Step7: for i = 0 to (number of rows in result) - 1
                   for j = 0 to (number of columns in result) - 1
                     resultTable.setValueAt(result[i][j], i, j)




5.2.2 Template Based Search Algorithm
     //input: WSDL file as the user query.
     //output: displays a table with 3 rows containing the WSDL file URLs of
               the top 3 services with non-zero similarity.
     //description: parses the given WSDL file, extracts the operation names,
                    displays them in tree form and lets the user select nodes;
                    tokenizes the selected nodes and finds the weight of the
                    query, searches the document database to find the weight
                    of each document for the tokens, and computes the
                    similarity of each document to the query.



     Step1: string input = getText(text_field)
     Step2: string[] operations = parse((WSDL) input)
     Step3: tree.nodes = operations
     Step4: string[] selectednodes = tree.getselectednodes()
     Step5: string[] tokens = empty
            for each node in selectednodes
              tokens = tokens + Tokenization.tokenize(node)
     Step6: float[] queryweight = QueryTFIDF.calculateWeight(tokens)
     Step7: float[][] documentweight = OperationSearch.calculateWeight(tokens)
     Step8: SimilarityCalculator.calculateSimilarity(documentweight,
            queryweight, tokens.length)
     Step9: string[][] result = SimilarityCalculator.finalResult()
     Step10: for i = 0 to (number of rows in result) - 1
               for j = 0 to (number of columns in result) - 1
                 resultTable.setValueAt(result[i][j], i, j)




CHAPTER 6
                                  TESTING

    6.1 Introduction

          6.1.1 System Overview
                   Our system is a search engine that takes either keyword-based
   or template-based queries from the user and produces a ranked list of relevant web
   services arranged in decreasing order of similarity percentage.


          6.1.2 Test Approach
                 The user can provide a keyword-based query by
                   Business
                   Service
                   Operation
                  The user can also provide a template-based query by specifying a
   WSDL file. The search engine is expected to produce a ranked list of web services
   with their similarity percentages and the URLs of the corresponding WSDL files.


   6.2 Test Cases
          6.2.1 Business Search
                  6.2.1.1 Purpose
                         To verify whether the retrieval of web services as per
          business is accurate
                  6.2.1.2 Inputs
                         A keyword based query on businesses or companies which
          provide web services.
                  6.2.1.3 Expected Outputs
                         Output consists of a set of web services which are ranked
          and listed in the decreasing order of their similarity with the respective
          URLs of WSDL files.




6.2.1.4 Test Procedure
               The user-specified query is tokenized, and TF-IDF values for each
term are calculated for each web service by referring to the business table in the
document database. The description table entries are also considered for TF-IDF,
but are given less weight than the business table entries. Finally, we calculate the
similarity percentage of each web service with respect to the user query and rank
the services.
       6.2.1.5 Test Results
               Result consists of a ranked list of web services according to
the business query specified by user with correctly calculated similarity
percentages.




6.2.2 Service Search
       6.2.2.1 Purpose
               To verify whether the retrieval of web services as per
service description is accurate.
       6.2.2.2 Inputs
               A keyword based query on service expected by the user.
       6.2.2.3 Expected Outputs
               Output consists of a set of web services which are ranked
and listed in the decreasing order of their similarity with the respective
URLs of WSDL files.
       6.2.2.4 Test Procedure
               The user-specified query is tokenized, and TF-IDF values for each
term are calculated for each web service by referring to the service table in the
document database. The description table entries are also considered for TF-IDF,
but are given less weight than the service table entries. Finally, we calculate the
similarity percentage of each web service with respect to the user query and rank
the services.
       6.2.2.5 Test Results
               Result consists of a ranked list of web services according to
the service description specified by user with correctly calculated
similarity percentages.
6.2.3 Operation Search
       6.2.3.1 Purpose
               To verify whether the retrieval of web services as per
operation description is accurate.
       6.2.3.2 Inputs
               A keyword based query on operations expected by the user.
       6.2.3.3 Expected Outputs
               Output consists of a set of web services which are ranked
and listed in the decreasing order of their similarity with the respective
URLs of WSDL files.
       6.2.3.4 Test Procedure
               The user-specified query is tokenized, and TF-IDF values for each
term are calculated for each web service by referring to the operations table in the
document database. The description table entries are also considered for TF-IDF,
but are given less weight than the operations table entries. Finally, we calculate the
similarity percentage of each web service with respect to the user query and rank
the services.
       6.2.3.5 Test Results
               The result is a ranked list of web services matching the
operation description specified by the user, with correctly calculated
similarity percentages.




6.2.4 Template based Search
       6.2.4.1 Purpose
               To verify that web services are retrieved accurately for
the WSDL file specified by the user.
       6.2.4.2 Inputs
               A template based query, given by providing a WSDL file.




       6.2.4.3 Expected Outputs
               The output is a set of web services relevant to the query,
arranged in decreasing order of similarity, each listed with the URL of
its WSDL file.
       6.2.4.4 Test Procedure
               The user specified WSDL file is parsed and the operation
names are extracted from it. The user can select the desired operations
from the list of operations available in the WSDL file. The selected
operation names are tokenized, and TF-IDF values are calculated for each
term and each web service by referring to the operations table in the
document database. The description table entries are also considered for
TF-IDF, but they are given less weight than the business table entries.
Finally, a similarity percentage with respect to the user query is
calculated for each web service, and the services are ranked by it.
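
The parsing and extraction step can be sketched with the standard Java DOM parser. This is an illustrative sketch, not the project's parser: the class name WsdlOperations is hypothetical, a WSDL 1.1 document (namespace http://schemas.xmlsoap.org/wsdl/) is assumed, and the camel-case splitting of operation names is an assumed tokenization rule that the report does not describe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WsdlOperations {

    // Namespace of WSDL 1.1 elements (assumed WSDL version).
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Returns the operation names declared under <portType>, in document order.
    static List<String> operationNames(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wsdlXml)));
            NodeList ops = doc.getElementsByTagNameNS(WSDL_NS, "operation");
            Set<String> names = new LinkedHashSet<String>();
            for (int i = 0; i < ops.getLength(); i++) {
                Element op = (Element) ops.item(i);
                // <binding> repeats each operation; keep only the abstract
                // declarations that sit directly under <portType>.
                if ("portType".equals(op.getParentNode().getLocalName())) {
                    names.add(op.getAttribute("name"));
                }
            }
            return new ArrayList<String>(names);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }
    }

    // Splits a name such as "GetBookByAuthor" into [get, book, by, author].
    static List<String> tokenize(String operationName) {
        List<String> tokens = new ArrayList<String>();
        String spaced = operationName.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        for (String t : spaced.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

The tokens of the operations selected by the user would then be used as the query terms for the TF-IDF calculation against the operations table.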
       6.2.4.5 Test Results
               The result consists of a ranked list of web services whose
WSDL files have operations similar to those in the user specified WSDL
file, with correctly calculated similarity percentages.


6.3 Experimental Study
           The experiments were performed on a Windows machine with a Pentium IV
2.7 GHz processor and 1 GB RAM, and the code was implemented in Java 1.6.0.
Number of web services: 20
Number of operations: 42
Document database size: 1.34 MB


The experiments were performed on the following 5 queries:
1. Area of square
2. Search book by author name
3. Air routes between two cities
4. Car price by name
5. Country by its capital city




[Graph 1: y-axis 0 to 0.8; x-axis 1 to 5 (the five queries); series: Keyword Query, Template Query]

            Graph 1. Execution time for keyword and template searches

Template based queries take more time to execute than keyword based queries, as
more time is required for parsing the WSDL file and extracting useful data from
it. This is shown in Graph 1.




[Graph 2: y-axis 0 to 100 (precision percentage); x-axis 1 to 5 (the five queries); series: Keyword Queries]

         Graph 2. Precision percentages of keyword based query searches




[Graph 3: y-axis 0 to 100 (precision percentage); x-axis 1 to 5 (the five queries); series: Template Queries]

               Graph 3. Precision percentages of template based query searches


Template based queries produce more accurate results than keyword based queries,
because the WSDL files of web services are matched directly. This is reflected in
Graphs 2 and 3, which show the precision percentages for keyword based and template
based queries respectively.
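
For reference, precision in information retrieval is the share of the retrieved items that are relevant. The sketch below shows that standard calculation as a percentage. It is an assumption that the report uses this definition, and the report does not say how relevance was judged for each query; the class name Precision and the service names in the usage note are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class Precision {

    // Percentage of the retrieved services that are in the relevant set.
    static double precisionPercent(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0; // nothing retrieved: treated as zero precision by convention here
        }
        int hits = 0;
        for (String service : retrieved) {
            if (relevant.contains(service)) {
                hits++;
            }
        }
        return 100.0 * hits / retrieved.size();
    }
}
```

For example, if a query returns four services and two of them are judged relevant, the precision is 50 percent.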




CHAPTER 7


         CONCLUSION AND FUTURE ENHANCEMENT

           In this project we have proposed a new form of query, the template based
query, which refines searching and produces fine-tuned results. We have also improved
the conventional keyword based search by providing granular search, where the user can
search for a particular business, operation or service instead of searching vaguely
over the entire web service.
         Template based queries take more time to execute than keyword based queries, as
more time is required for parsing the WSDL file and extracting useful data from it.
However, template based queries produce more accurate results than keyword based
queries, as the WSDL files of web services are matched directly.


          As a future enhancement, service composition can be supported, wherein the
user combines one or more web services to create a new web service as per his
requirement. One way to do this is to select all those services whose number and types
of parameters match those of the existing web service, so that the services can be
combined into a new web service.
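
One possible reading of this selection step is sketched below. It is purely illustrative, since composition is future work and the report gives no design for it: the class name CompositionCandidates, the representation of a service as a list of parameter type names, and the requirement that the types match in the same order are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CompositionCandidates {

    // Returns the names of the services whose parameter list has the same
    // length and the same types, in the same order, as the target's.
    static List<String> matching(List<String> targetTypes,
                                 Map<String, List<String>> services) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, List<String>> e : services.entrySet()) {
            if (e.getValue().equals(targetTypes)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A practical matcher would probably also have to compare return types and allow compatible rather than identical types; those choices are left open here.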




CHAPTER 8


            BIBLIOGRAPHY AND REFERENCES

[1] Kyong-Ha Lee, Mi-young Lee, Yun-Young Hwang and Kyu-Chul Lee, "A Framework for
XML Web Services Retrieval with Ranking", Department of Computer Engineering, Chungnam
National University, Daejeon, 305-764, Korea, IEEE, 2007


[2] Christian Platzer, Schahram Dustdar, "A Vector space search engine for Web
services", In Proceedings of 3rd European Conference on Web Services, 2005


[3] Natallia Kokash, "A Comparison of web service interface similarity measures",
Technical Report DIT-06-025, University of Trento, 2006


[4] Dunglu Peng, et al., "Concept-based retrieval of alternate Web services", In
Proceedings of DASFAA 2005, LNCS V.3453, pp. 359-371, 2005


[5] Jong P. Yoon, et al., "BitCube: A Three-dimensional Bitmap Indexing for XML
Documents", Journal of Intelligent Information Systems, Vol. 17, pp. 241-252, 2001


[6] W3 schools-www.w3c.org


[7] Wikipedia-web services, information retrieval.




SCREEN SNAPSHOTS




1. Keyword Search




2. Template search




Web services

  • 1. CHAPTER 1 INTRODUCTION 1.1 Statement Of The Problem To use web services, it is required for users to find relevant services from a collection of services dispersed on the web. We currently use UDDI, a distributed registry system for Web services, to find services. However, UDDI only supports exact keyword match and category based query towards UDDI data entries representing each Web service, so it is hard to get a ranked query result and alternate services which are other possible services when a currently chosen service is no more useful or unreachable. 1.2 Objective Of The Project We propose a framework for Web services retrieval that provides ranked lists of Web services. To do this, we apply existing information retrieval approaches to compare Web services. This system is located on top of a document database, which consists of UDDI entries of web services and considers not only UDDI data entries but also WSDL definitions of services. Our framework provides two query types: keyword-based and template-based query. 1.3 Current Scope Current Web service discovery solution is UDDI, a registry-based discovery mechanism. This UDDI only supports keyword search. It provides a service list that consist the words that exactly match with user query. This means that UDDI does not provide ranked query results and alternate services. We use both UDDI and WSDL definitions and use information retrieval ranking strategy to rank web services according to their relevance and to find alternate web services. 1.4 Future Scope This implementation will improve the efficiency of web service retrieval as user can specify his own query in terms of his own WSDL file as a template based query. Instead of getting keyword based results, user can get a list of web services which are ranked and also he can get alternate web services. CHAPTER 2 1
  • 2. LITERATURE SURVEY 2.1 History of Web Services Web services are applications that can communicate with other applications over a network by using a set of standardized protocols. The technology originated from the efforts of many companies that share an interest in building electronic marketplaces. EDI was the first attempt to create a standard way for businesses to communicate over a network. In the 25 years since EDI came on the scene, there have been numerous attempts at a universal conduit for connecting business logic over a network: Common Object Request Broker Architecture, Distributed Component Object Model, Unix Remote Procedure Call, and Java Remote Method Invocation. Each of those technologies failed to gain significant market share or enough momentum to succeed. All of them exist today-- each still has its uses--but each failed to gain a broad reach. Before the Web, getting all the major software vendors to agree on a transport protocol for cross-network application services might have been impossible. But the Web rendered that decision academic, by specifying lower level transports for standardized communication. The Web uses HTTP running on TCP/IP. TCP/IP was already a mature standard by the time the Web went mainstream in 1994, and, by 1997, HTTP had become a universal business standard. With HTTP and TCP/IP in place, all that was needed was some kind of messaging and data encapsulation standard--and a lot of vendor cooperation. It was XML's invention that really paved the way for Web services. As a widely heralded, platform-independent standard for data description that could also be used to describe message-passing protocols, XML was a logical choice for the job of standardized application-to-application communication. XML officially became a standard in February 1998, when the World Wide Web Consortium announced that XML 1.0 had reached draft recommendation status: suitable for deployment in applications. 
By early 1998, several attempts at an XML protocol for interprocess communication were made. Allaire Corp.'s Web Distributed Data Exchange (WDDX) was one independent attempt of note, but it was SOAP, developed by Dave Winer, CEO of Userland Software, Microsoft engineers Bob Atkinson and Mohsen Al-Ghosein, and Don Box, co-founder of Develop Mentor Incorporated, that was to become the basis for Web services. 2
  • 3. Electronic marketplaces were a hot concept in December 1999, when Microsoft held a private meeting with IBM and other interested companies to show off SOAP 1.0, its specification for a standardized message-passing protocol based on XML. By the summer of 2000, SOAP was gaining wider acceptance. IBM and Microsoft were also each working on a way to programmatically describe how to connect up to a Web service. After some discussion, protocol proposals from Microsoft and IBM merged. IBM contributed Network Accessible Service Specification Language and Microsoft offered both Service Description Language and SOAP Contract Language. In the fall of 2000, the merged specification, Web Services Description Language, was announced. With SOAP and WSDL, companies could create and describe their Web services. But someone still needed to provide a way to advertise and locate Web services. In March 2000, IBM, Microsoft, and Ariba started working on the solution: Universal Description, Discovery, and Integration. With SOAP, WSDL, and UDDI in place, the de-facto standards to create Web services had arrived, but it wasn't until the end of 2000 that five major IT software infrastructure vendors announced their commitment to Web services. Oracle, HP, Sun, IBM, and Microsoft, an unlikely--and thus impressive--alliance, stated their intention to support and deploy the Web services standards in their products. It took unprecedented vendor cooperation and commitment on the design of the core standards (SOAP, WSDL, and UDDI) to make Web services happen--but there's still more to be done. Additional Web services standards must be defined and adopted in order to build and integrate things like authentication and connection management. Figure 2.1 Web services architecture. 2.2 What Is WSDL? 3
  • 4. As communications protocols and message formats are standardized in the web community, it becomes increasingly possible and important to be able to describe the communications in some structured way. WSDL addresses this need by defining an XML grammar for describing network services as collections of communication endpoints capable of exchanging messages. WSDL service definitions provide documentation for distributed systems and serve as a recipe for automating the details involved in applications communication. A WSDL document defines services as collections of network endpoints, or ports. In WSDL, the abstract definition of endpoints and messages is separated from their concrete network deployment or data format bindings. This allows the reuse of abstract definitions: messages, which are abstract descriptions of the data being exchanged, and port types which are abstract collections of operations. The concrete protocol and data format specification for a particular port type constitutes a reusable binding. A port is defined by associating a network address with a reusable binding, and a collection of ports define a service. Hence, a WSDL document uses the following elements in the definition of network services: Types – a container for data type definitions using some type system (such as XSD). Message – an abstract, typed definition of the data being communicated. Operation – an abstract description of an action supported by the service. Port Type – an abstract set of operations supported by one or more endpoints. Binding – a concrete protocol and data format specification for a particular port type. Port – a single endpoint defined as a combination of a binding and a network address. Service – a collection of related endpoints. These elements are described in detail in Section 2. It is important to observe that WSDL does not introduce a new type definition language. 
WSDL recognizes the need for rich type systems for describing message formats, and supports the XML Schemas specification (XSD) as its canonical type system. However, since it is unreasonable to expect a single type system grammar to be used to describe all message formats present and future, WSDL allows using other type definition languages via extensibility. 4
  • 5. In addition, WSDL defines a common binding mechanism. This is used to attach a specific protocol or data format or structure to an abstract message, operation, or endpoint. It allows the reuse of abstract definitions. In addition to the core service definition framework, this specification introduces specific binding extensions for the following protocols and message formats: SOAP 1.1 HTTP GET / POST MIME Although defined within this document, the above language extensions are layered on top of the core service definition framework. Nothing precludes the use of other binding extensions with WSDL. In WSDL the term binding refers to the process associating protocol or data format information with an abstract entity like a message, operation, or portType. WSDL allows elements representing a specific technology (referred to here as extensibility elements) under various elements defined by WSDL. These points of extensibility are typically used to specify binding information for a particular protocol or message format, but are not limited to such use. Extensibility elements MUST use an XML namespace different from that of WSDL. Extensibility elements are commonly used to specify some technology specific binding. To distinguish whether the semantic of the technology specific binding is required for communication or optional, extensibility elements may place a WSDL required attribute of type Boolean on the element. The default value for required is false. The required attribute is defined in the namespace "http://schemas.xmlsoap.org/wsdl/". Extensibility elements allow innovation in the area of network and message protocols without having to revise the base WSDL specification. WSDL recommends that specifications defining such protocols also define any necessary WSDL extensions used to describe those protocols or formats 2.3 What is SOAP? 
SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. It relies on Extensible Markup Language (XML) for its message format, and usually relies on other Application Layer protocols, most notably Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), for message negotiation and transmission. SOAP can form the foundation layer of a web services protocol stack, providing a basic messaging framework upon which web services can be built. This XML-based protocol consists of three parts: an envelope, which defines what is in the message and how to process it; a set of encoding rules for expressing instances of application-defined data types; and a convention for representing procedure calls and responses.

As a layman's example of how SOAP procedures can be used, a SOAP message could be sent to a web-service-enabled web site, for example, a real-estate price database, with the parameters needed for a search. The site would then return an XML-formatted document with the resulting data, e.g., prices, location, features. Because the data is returned in a standardized machine-parseable format, it could then be integrated directly into a third-party web site or application.

The SOAP architecture consists of several layers of specifications: for message format, Message Exchange Patterns (MEP), underlying transport protocol bindings, message processing models, and protocol extensibility. SOAP is the successor of XML-RPC, though it borrows its transport and interaction neutrality and the envelope/header/body structure from elsewhere (probably from WDDX).
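The envelope structure described above can be illustrated with a small sketch. It builds a SOAP 1.1 request for the real-estate example using only the JDK's DOM API. The application namespace `http://example.org/realestate`, the `FindListings` operation and its `City` and `MaxPrice` parameters are hypothetical names chosen for illustration; they do not come from any real service.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SoapEnvelopeSketch {
    // SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical application namespace, used only for this illustration.
    static final String APP_NS = "http://example.org/realestate";

    /** Builds Envelope -> (Header, Body -> FindListings(City, MaxPrice)). */
    static Document buildRequest(String city, int maxPrice) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
            doc.appendChild(envelope);
            // The header is optional; it carries blocks aimed at intermediaries.
            envelope.appendChild(doc.createElementNS(SOAP_NS, "soap:Header"));
            Element body = doc.createElementNS(SOAP_NS, "soap:Body");
            envelope.appendChild(body);

            // The procedure call and its parameters live inside the body.
            Element call = doc.createElementNS(APP_NS, "re:FindListings");
            body.appendChild(call);
            Element cityEl = doc.createElementNS(APP_NS, "re:City");
            cityEl.setTextContent(city);
            call.appendChild(cityEl);
            Element priceEl = doc.createElementNS(APP_NS, "re:MaxPrice");
            priceEl.setTextContent(Integer.toString(maxPrice));
            call.appendChild(priceEl);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build SOAP request", e);
        }
    }

    /** Serializes the DOM tree to a string for sending or logging. */
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize SOAP request", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(buildRequest("Pune", 250000)));
    }
}
```

In practice the serialized envelope would typically be sent as the body of an HTTP POST to the service endpoint; that transport step is left out here.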
The SOAP specification defines the messaging framework, which consists of:

- The SOAP processing model, defining the rules for processing a SOAP message
- The SOAP extensibility model, defining the concepts of SOAP features and SOAP modules
- The SOAP underlying protocol binding framework, describing the rules for defining a binding to an underlying protocol that can be used for exchanging SOAP messages between SOAP nodes
- The SOAP message construct, defining the structure of a SOAP message

The SOAP processing model describes a distributed processing model, its participants, the SOAP nodes and how a SOAP receiver processes a SOAP message. The following SOAP nodes are defined:

- SOAP sender: A SOAP node that transmits a SOAP message.
- SOAP receiver: A SOAP node that accepts a SOAP message.
- SOAP message path: The set of SOAP nodes through which a single SOAP message passes.
- Initial SOAP sender (Originator): The SOAP sender that originates a SOAP message at the starting point of a SOAP message path.
- SOAP intermediary: A SOAP intermediary is both a SOAP receiver and a SOAP sender and is targetable from within a SOAP message. It processes the SOAP header blocks targeted at it and acts to forward a SOAP message towards an ultimate SOAP receiver.
- Ultimate SOAP receiver: The SOAP receiver that is a final destination of a SOAP message. It is responsible for processing the contents of the SOAP body and any SOAP header blocks targeted at it. In some circumstances, a SOAP message might not reach an ultimate SOAP receiver, for example because of a problem at a SOAP intermediary. An ultimate SOAP receiver cannot also be a SOAP intermediary for the same SOAP message.

2.4    What is UDDI?

UDDI is a public registry designed to house information about businesses and their services in a structured way. Through UDDI, one can publish and discover information about a business and its Web Services. This data can be classified using standard taxonomies so that information can be found based on categorization. Most importantly, UDDI contains information about the technical interfaces of a business's services. Through a set of SOAP-based XML API calls, one can interact with UDDI at both design time and run time to discover technical data, such that those services can be invoked and used. In this way, UDDI serves as infrastructure for a software landscape based on Web Services.

Why UDDI? What is the need for such a registry? As we look toward a software landscape of thousands, perhaps millions, of Web Services, some tough challenges emerge:

- How are Web Services discovered?
- How is this information categorized in a meaningful way?
- What implications are there for localization?
- What implications are there around proprietary technologies?
- How can I guarantee interoperability in the discovery mechanism?
- How can I interact with such a discovery mechanism at run time once my application is dependent upon a Web Service?

In response to these challenges, the UDDI initiative emerged. A number of companies, including Microsoft, IBM, Sun, Oracle, Compaq, Hewlett Packard, Intel, SAP, and over three hundred other companies (see UDDI: Community for a complete list), came together to develop a specification based on open standards and non-proprietary technologies to solve these challenges. The result, initially launched in beta in December 2000 and in production by May 2001, was a global business registry hosted by multiple operator nodes that users could, at no cost, both search and publish to.

With such an infrastructure for Web Services in place, data about Web Services can now be found consistently and reliably in a universal, completely vendor-neutral capacity. Precise categorical searches can be performed using extensible taxonomy systems and identification. Run-time UDDI integration can be incorporated into applications. As a result, a Web Services software environment can flourish.

The UDDI data is hosted by operator nodes, companies that have committed to running a public node that conforms to the specification governed by the UDDI.org consortium. Today, two public nodes exist that conform to the Version 1 UDDI specification: one is hosted by Microsoft and one by IBM. Hewlett Packard has committed to hosting a node under the Version 2 specification as well. Host operators are required to replicate data between one another across a secure channel, providing redundancy to the entire UDDI cloud. Data can be published to one node and, after replication, can be discovered on another node.
Today, replication occurs at 24-hour intervals; in the future, as more applications are dependent on the UDDI data, the intervals between replications will become shorter.

It is worth noting that there are no proprietary requirements as far as how a host operator implements its node. The node simply must conform to the UDDI specification. The Microsoft node (http://uddi.microsoft.com/default.aspx), for example, has been written entirely in C# and runs in production on the .NET Beta 2 Common Language Runtime. The code base takes significant advantage of the native SOAP support and serialization offered by .NET system classes. On the backend, the Microsoft operator node utilizes Microsoft® SQL Server 2000 as its data store. Suffice to say that the IBM node is using different technologies to run its node! However, the two nodes behave identically, because they conform to the same set of SOAP-based XML API calls. Client tools can interoperate with the nodes seamlessly. As such, the UDDI public cloud serves as a prime example of how the XML Web Services model works across heterogeneous environments.

Taking a look at what data is stored in UDDI and how it is structured is the next step in understanding the UDDI initiative.

UDDI is relatively lightweight; it is designed as a registry, not a repository. The distinction is subtle, but crucial. A registry redirects a user to resources, whereas a repository is an actual information store. Consider the Microsoft® Windows® registry as an example: it contains basic settings and parameters, but ultimately leads an application to a resource or binary. Looking up a COM component based upon its Prog ID leads to a Class ID, which leads to the location of the binary itself.

UDDI behaves similarly: like the Windows registry, it relies on Globally Unique IDentifiers (GUID) to guarantee the ability to perform look-ups and determine the location of resources. UDDI queries ultimately lead to an interface (a .WSDL file, .XSD file, .DTD file, and so on) or an implementation (such as an .ASMX or .ASP file) located on another server. UDDI can thus answer the following kinds of questions:

- "What Web Service interfaces have been published that are based on WSDL and established for a given industry?"
- "What companies have written an implementation around one of these interfaces?"
- "What Web Services, categorized in a certain way, are offered today?"
- "What Web Services does a given company offer?"
- "Who do I need to contact about using a company's Web Service?"
- "What are the implementation details of a particular Web Service?"

2.5    Information retrieval (IR)
Information retrieval is the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web. There is overlap in the usage of the terms data retrieval, document retrieval, information retrieval, and text retrieval, but each also has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary, based on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, and statistics.

The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945. The first automated information retrieval systems were introduced in the 1950s and 1960s. By 1970 several different techniques had been shown to perform well on small text corpora such as the Cranfield collection (several thousand documents). Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s.

In 1992, the US Department of Defense along with the National Institute of Standards and Technology (NIST) cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. Its aim was to support the information retrieval community by supplying the infrastructure needed for evaluation of text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has boosted the need for very large scale retrieval systems even further.

Nowadays IR is extensively deployed by search engines and other web components for keyword-based and template-based ranking and retrieval of web documents and web services. Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.

2.6    Why use Web Services?
To support greater business efficiency and agility, information systems and their operations have become increasingly decentralized and, for a variety of historical, technical and business reasons, increasingly heterogeneous. Business processes are distributed among far-flung business divisions, suppliers, partners, and customers, with each participant having their own special needs for technology and automation. As a consequence, the demand for a high degree of interoperability among disparate information systems has never been greater. Moreover, it's critical for this high degree of interoperability to be sustained in the face of rapid evolution of the cooperating systems, as participants continually modify their systems in response to new or changing business requirements.

Traditional assembly and integration methods (and the resulting integration software market stimulated by these methods) are not particularly well suited to this new business environment. These methods rely on a tight coupling between cooperating systems, which requires either the universal deployment of homogeneous systems (unlikely, considering the diversity and broad scale of modern business services) or extraordinarily close coordination among participating development organizations during initial development and sustainment (for example, to ensure that any changes to APIs or protocols are simultaneously reflected in all of the deployed systems). In this business environment, such tight coordination is often impractical (e.g., prohibitively expensive), and rapid evolution in response to a new business opportunity is typically out of the question.

In contrast to traditional assembly and integration methods, web services technology provides a paradigm that uses messages (in the form of XML documents) passed among diverse, loosely coupled systems as the focal point for integration.
These systems are no longer viewed solely as components in a larger system of systems but also as providers of services that are applied to the messages. Web services are a special case of the more general notion of Service-Oriented Architectures (SOA). Service-Oriented Architectures represent (i.e., model) interconnected systems or components as collections of cooperating services. The goal of web services technology is to dramatically reduce the interoperability issues that would otherwise arise when integrating disparate systems using traditional means.

Web services are an architectural and programming model that achieves interoperability and reusability in the following ways:

- Loosely coupled systems: Service requesters and service providers agree upon an interface and abstract away the implementation details. The integration point is defined by the interface contract, which isolates the participants from the effects of change over time.
- Virtualization services and open standards: Allowing applications to run in a virtual environment based on open standards, independent of the specific details of the underlying operating system or hardware platform, allows for Internet-level scalability and distribution similar to HTTP-based web applications.
- Document-orientation: Interoperability across diverse technology is achieved at runtime by leveraging XML documents as a common way to express and exchange data.

2.7    Presently Available Technologies and Challenges

Current Web service discovery mechanisms can be classified into three categories:

1. Registry-based discovery like UDDI.
2. Semantic annotation and discovery.
3. Similarity-based search.

2.7.1  Registry-based Discovery

UDDI itself is a technical specification for a Web service registry, and it was standardized by OASIS. UDDI is a typical model of a registry-based solution. It provides users with a uniform way to publish and to discover Web services via normative registries. However, UDDI only supports exact keyword match and category-based query towards UDDI data entries representing each Web service, so it is hard to get a ranked query result or alternate services, that is, other possible services to use when a currently chosen service is no longer useful or is unreachable. Moreover, UDDI does not use WSDL definitions, which actually describe the service interfaces and message formats of Web services, as a target of queries.

2.7.2  Semantic Annotation and Discovery
The second effort is semantic annotation and discovery. In this work, semantic descriptions are annotated into Web service descriptions, and services can be found with these semantic descriptions by inference over the semantics. Examples of such efforts are OWL-S and WSDL-S. However, studies of this kind now face some severe questions: How can these semantics be produced automatically? How fast can inferences be made with them? The complexity and difficulty of answering these questions have made this method less popular.

2.7.3  Similarity-based Search

The last approach is to adopt current information retrieval techniques to retrieve Web services. In this approach, Web service similarity can be measured as probabilistic values, and relevant services are found by comparing these values. However, current IR techniques have focused on the retrieval of plain text, so they cannot be directly applied to service retrieval. The reason is that a service interface has a logical structure of operations and of the messages consumed by each operation. For this reason, our work and related works focus on how IR techniques can be adapted to the Web services environment.

Recently, there have been some efforts to use IR techniques for service retrieval. For similarity search of service operations, operations can be clustered based on parameter name and operation name. The vector space model (VSM) has also been shown to be useful in service retrieval. In service retrieval, VSM combined with TF-IDF weighted similarity measures for ranking query results has been found to perform better than other IR techniques. A newer solution, named concept lattices, can be used for finding alternative services.

2.8    Our approach

To use web services, users must find relevant services from a collection of services dispersed on the web. We currently use UDDI, a distributed registry system for Web services, to find services. However, UDDI only supports exact keyword match and category-based query towards UDDI data entries representing each Web service, so it is hard to get a ranked query result or alternate services, that is, other possible services to use when a currently chosen service is no longer useful or is unreachable. Moreover, UDDI does not use WSDL definitions, which actually describe the service interfaces and message formats of Web services, as a target of queries.
In this project, we introduce a framework for XML Web services retrieval, which can solve the current problems of UDDI. Our system is located on top of a document database, which consists of UDDI data entries and WSDL files. It provides ranked query results for Web services and finds alternate Web services by using a similarity measure. In addition, we discuss related works and further features needed to improve the performance of this Web service retrieval framework.

We propose a framework for Web services retrieval that provides ranked lists of Web services. To do this, we apply existing information retrieval approaches to compare Web services. This system is located on top of UDDI and considers not only UDDI data entries but also WSDL definitions of services. Our framework provides two query types: keyword-based and template-based query. A template-based query is a query that is itself the user's own WSDL definition, used to retrieve services whose interfaces closely match the interfaces of the user's own services.

Our framework has the following characteristics:

- It provides a ranked list of services, so the user gets the services most relevant to the query.
- Users can determine query granularity, that is, the region of the hierarchy in which the item they need is located. A user can search for a particular business, service or operation instead of searching the entire web service. This distinguishes our keyword-based search from a normal keyword search and fine-tunes the search.
- It supports discovery of alternate web services which are similar to the interface definition of the user's own services. We call this template-based query.
CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION

3.1    Introduction

This Software Requirements Specification provides a complete description of the design and implementation of a framework for XML web services retrieval with ranking. The expected users of this framework are all those who occasionally or frequently search for relevant web services or alternate web services for their application. It will also serve as a reference for other web services retrieval methods.

3.1.1  Glossary

Term      Definition
XML       Extensible Markup Language
WSDL      Web Services Description Language
SOAP      Simple Object Access Protocol
UDDI      Universal Description, Discovery and Integration
IR        Information Retrieval
VSM       Vector Space Model
TF-IDF    Term Frequency - Inverse Document Frequency

Table 3.1 Term and definition

3.2    General Description

The current Web service discovery solution is UDDI, a registry-based discovery mechanism. UDDI only supports keyword search. It provides a list of services that contain the words exactly matching the user query. This means that UDDI does not provide ranked query results or alternate services. We use both UDDI and WSDL definitions and use an information retrieval ranking strategy to rank web services according to their relevance and to find alternate web services. Template-based queries will improve the precision and efficiency of service retrieval, providing relevant query results.

3.2.1  Product Description

Basically, we implement a search engine for web services. The user can input either a keyword-based query or a template-based query to get a ranked list of web services. The user can also retrieve alternate web services by specifying a WSDL file in a template-based query. Users can control query granularity by specifying a particular business, service or operation in the search engine.

3.2.2  End User Expectation

The user should be able to specify a WSDL file as a template-based query for the search engine. The user is also expected to know the basic elements of a WSDL file in order to find the alternate web services required for their purpose.

3.2.3  General Constraints

- We have to create our own document database consisting of UDDI data entries.
- The user should be able to specify a WSDL file as a template-based query for the search engine.
- A user-friendly interface is needed to take keyword queries.

3.3    Specific Requirements

3.3.1  Functional requirements

The main functional requirements for this solution are as follows:

- User input: The user has to specify the query to the search engine either as keywords (for keyword-based search) or as a WSDL file (for template-based search).
- Parsing: In the case of template-based queries, we parse the input WSDL file using an XML parser. It is then tokenized and stemmed.
- Similarity Search and Ranking: Calculate the similarity between the query and the service descriptions. Percentages of similarity are calculated for web services. This similarity comparison is accomplished by the vector space model and a TF-IDF calculation. However, we don't use the original TF-IDF weight in our framework. We give a weight to each tokenized word according to the position in which the word occurs.
- Output: A ranked list of the top 3 web services (WSDL file URLs), along with their percentage of similarity, is displayed by the search engine.

3.3.2  Non-functional Requirements

The main non-functional requirements for this solution are as follows:

- Performance: The system must provide a good ranking mechanism with better precision. The search engine's response time should be low.
- Scalability: The system has to be scalable to a potentially unlimited number of parties.
- User-friendly operation interface: The system must be easy to operate so that non-specialized users can access it without extra help.

3.3.3  Software Requirements

- Eclipse integrated development environment.
- Tomcat 5.5.7 application server.
- MS Access 2007 as the back-end database server.
- Java 1.6.0.

3.3.4  Hardware Requirements

- Operating system: Microsoft XP Professional.
- Processor: Pentium IV, 2.7 GHz, 1 GB RAM.
CHAPTER 4
SYSTEM DESIGN

4.1    Introduction and Design Overview

The system design has three fundamental sections:

- Document database, which is required for storing UDDI data entries and WSDL entries.
- Web services retrieval framework, for parsing, tokenizing, indexing service descriptors and calculating similarity percentage.
- Query interface, which provides a search engine for entering queries and displaying results.

4.2    System Architectural Design

Figure 4.2 Overview of the framework
4.3    Detailed Description of Components

4.3.1  Parsing

The first step of the retrieval process for a template-based query is to parse service descriptions, where each service is described in XML structures. In this step, service descriptions are parsed and organized as a DOM tree. To extract service descriptions from WSDL files, we use the Java APIs for XML parsing.

4.3.2  Tokenizing

The next step is tokenizing the WSDL elements (in the case of template-based search) or the words in the user query (in the case of keyword-based search). During this step a concatenated word such as 'SearchByAuthor' is split into the strings 'search', 'by' and 'author'.

4.3.3  Document database

We implement UDDI by creating a database. We represent each web service by four documents: a document regarding the business, a document regarding the service, a document regarding the operations, and a document providing the description of the web service. Hence we maintain four tables (business, service, operations, description) which contain the index terms or keywords of the respective documents. In addition to these four tables we have a separate table called web services, which holds a list of sample web services and the URLs of their respective WSDL files.

4.3.4  Ranking

The final step is to calculate the similarity between the query and the service descriptions. With this similarity, we determine the rank of services in a result list. This similarity comparison is accomplished by the vector space model and a TF-IDF calculation. However, we don't use the original TF-IDF weight in our framework. We give a weight to each tokenized word according to the position in which the word occurs. For example, suppose a word can appear in either a name element or a description element. If the word occurs in a name field, it carries more meaning than the same word in a description field. So we calculate term frequencies according to their position.

Here, $W_{qj}$ is the weight of a word in the query vector, and $W_{id}$ is the weight of a word in a service description.

Document weight is calculated using:

    $W_{id} = \sum tf \cdot idf$

Query weight is calculated using:

    $W_{qj} = \sum (0.5 + 0.5 \cdot tf) \cdot idf$

4.4    User Interface Design

4.4.1  Description of the User Interface

- A form for the user to enter a keyword-based query.
- Radio buttons for selecting business, service or operations, enabling granular search.
- A form for the user to enter a template-based query by specifying a WSDL file.
- A list of operations in the specified WSDL file.
- Name, URL and similarity percentage of the ranked query results.
4.4.2  Use Case Diagram

Keyword-based Search

The use case diagram depicts the working of the system when the user enters a keyword-based query. The interactions happening between the actors are also depicted accordingly.

Actors

1. Client
2. Server

Description of Use Cases

1. Keyword-based Query
   Here, the user enters the query in natural language. It produces only the exact match for the query that has been entered by the user.

2. Databases
   Web service descriptors and WSDL entries of web services are stored in a document database. These entries are considered during similarity calculation with user queries.

3. Retrieved URLs
   When a user enters a query in the form of keywords, the URLs of the WSDL files of all relevant web services are displayed. The user is given the option to select which URL to use.

4. Search
   Web services relevant to the user's keyword-based query are searched. Similarity calculation is done and results are displayed in descending order of similarity percentage.
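The final ordering step described in the Search use case can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Hit` class and the `topHits` method are hypothetical names, and the example URLs are placeholders. It keeps only non-zero similarities, sorts them in descending order, and returns at most the requested number of results (the requirements call for the top 3).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class RankingSketch {
    /** One scored candidate: service name, WSDL URL, similarity percentage. */
    static final class Hit {
        final String name;
        final String wsdlUrl;
        final double similarity;

        Hit(String name, String wsdlUrl, double similarity) {
            this.name = name;
            this.wsdlUrl = wsdlUrl;
            this.similarity = similarity;
        }
    }

    /** Returns up to 'limit' hits with non-zero similarity, best first. */
    static List<Hit> topHits(List<Hit> scored, int limit) {
        List<Hit> ranked = new ArrayList<Hit>();
        for (Hit h : scored) {
            if (h.similarity > 0.0) {
                ranked.add(h); // services with zero similarity are not shown
            }
        }
        Collections.sort(ranked, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Double.compare(b.similarity, a.similarity); // descending
            }
        });
        return new ArrayList<Hit>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        List<Hit> scored = new ArrayList<Hit>();
        scored.add(new Hit("BookSearch", "http://example.org/book.wsdl", 72.5));
        scored.add(new Hit("Weather", "http://example.org/weather.wsdl", 0.0));
        scored.add(new Hit("Library", "http://example.org/library.wsdl", 88.0));
        for (Hit h : topHits(scored, 3)) {
            System.out.println(h.name + " " + h.wsdlUrl + " " + h.similarity + "%");
        }
    }
}
```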
Template-based Search

The use case diagram depicts the basic working of the system when the user enters a template-based query. The interactions happening between the actors are also depicted accordingly.

Actors

1. Client
2. Server

Description of Use Cases

1. Template
   The user has to enter the WSDL file as a template-based query in the form provided in the search engine. The file is parsed here and then checked against the indexes stored in the databases.

2. Parser
   The specified WSDL file is parsed with an XML parser through the Java APIs. The parser creates a DOM tree from the given WSDL file, which is used to extract operation names from the WSDL file.

3. Similarity Measures
   Whenever a match is found, we use the vector space model to calculate the similarity. We represent all the files and the query in the form of vectors and do the TF-IDF calculations to get the similarity.

4. VSM
   In a vector space, documents and queries are represented as vectors:

       $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$
       $q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$

   Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is TF-IDF weighting, which we use when computing the similarity measure.

5. Ranked List
   After all the similarities have been calculated, the relevant web services are arranged in descending order of similarity percentage. The user is given the option to select which URL to use.

6. Retrieved URLs
   When a user enters a query, the URLs of the WSDL files of all relevant web services are displayed. The user is given the option to select which URL to use.
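The vector comparison in the VSM use case can be illustrated with a short sketch. The report does not spell out the similarity formula at this point, so the cosine measure below is an assumption (it is the measure most commonly paired with the vector space model). The class and method names are hypothetical. The result is expressed as a percentage and capped at 100, in line with the requirement that similarity never exceeds 100.

```java
final class CosineSketch {
    /**
     * Cosine similarity between a query vector and a document vector,
     * scaled to a percentage. Both arrays hold one weight per term,
     * in the same term order.
     */
    static double similarityPercent(double[] query, double[] doc) {
        if (query.length != doc.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double queryNorm = 0.0;
        double docNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            queryNorm += query[i] * query[i];
            docNorm += doc[i] * doc[i];
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // no shared information: treat as no similarity
        }
        double cosine = dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
        return Math.min(100.0, 100.0 * cosine); // guard against rounding above 100
    }

    public static void main(String[] args) {
        double[] query = {1.0, 2.0, 0.0};
        double[] sameDirection = {2.0, 4.0, 0.0};
        double[] unrelated = {0.0, 0.0, 3.0};
        System.out.println(similarityPercent(query, sameDirection));
        System.out.println(similarityPercent(query, unrelated));
    }
}
```

Because the cosine depends only on the angle between the vectors, a long description is not favoured over a short one merely for containing more words.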
4.5    Test Plan

4.5.1  Features to Be Tested

- The parsing module has to be tested to check that the WSDL file is parsed correctly and the appropriate operation names are extracted.
- The tokenizing module has to be tested to make sure that queries are correctly split into separate tokens.
- The document database connection module has to be tested to check that the connection is established correctly and that the required data entries are extracted.
- The TF-IDF similarity module has to be tested to make sure that the similarity between the user query and the web services is calculated accurately in different cases.
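As an illustration of the first check in the list above, the following sketch parses a small inline WSDL document with the JDK's DOM parser and extracts its operation names. It is a sketch under assumptions: the class name, the `SAMPLE_WSDL` document and the decision to read operations only from `portType` elements (so that the same operation repeated under a `binding` is not counted twice) are illustrative choices, not taken from the project's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class WsdlOperationsSketch {
    // WSDL 1.1 namespace.
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Hypothetical, heavily trimmed WSDL used only for this illustration.
    static final String SAMPLE_WSDL =
        "<definitions xmlns=\"http://schemas.xmlsoap.org/wsdl/\" name=\"BookService\">"
        + "<portType name=\"BookPort\">"
        + "<operation name=\"SearchByAuthor\"/>"
        + "<operation name=\"SearchByTitle\"/>"
        + "</portType>"
        + "<binding name=\"BookBinding\" type=\"BookPort\">"
        + "<operation name=\"SearchByAuthor\"/>"
        + "</binding>"
        + "</definitions>";

    /** Returns the operation names declared under portType elements, in document order. */
    static List<String> operationNames(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));

            List<String> names = new ArrayList<String>();
            NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
            for (int i = 0; i < portTypes.getLength(); i++) {
                Element portType = (Element) portTypes.item(i);
                NodeList operations = portType.getElementsByTagNameNS(WSDL_NS, "operation");
                for (int j = 0; j < operations.getLength(); j++) {
                    names.add(((Element) operations.item(j)).getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable WSDL document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(operationNames(SAMPLE_WSDL));
    }
}
```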
CHAPTER 5
IMPLEMENTATION

5.1    Class Description

5.1.1  MainClass:

Extends: JFrame
Implements: ActionListener and TreeSelectionListener

Attributes:
    bg : ButtonGroup
    bname : String
    browse : JButton
    key_search : JButton
    key_text : JTextField
    n : int
    op_text : JTextArea
    rb1 : JRadioButton
    rb2 : JRadioButton
    rb3 : JRadioButton
    result : String[][]
    scrollText : JScrollPane
    selected : String[]
    table : JTable
    temp_parse : JButton
    temp_text : JTextField
    top : DefaultMutableTreeNode
    tree : JTree
    tree_search : JButton
    type : String
Operations:
    MainClass(String): Constructor for the class, which passes the name of the frame to the constructor of the super class JFrame.
    actionPerformed(ActionEvent): Method of the ActionListener interface, which must be implemented. It sets the variable type to the component selected on the frame.
    addComponentsToPane(Container): This method adds the components to the frame, using a grid layout. It divides the pane into 4 regions and adds the components related to keyword search in region 1, the components related to parsing in region 2, the components related to template search in region 3 and the components related to the result in region 4.
    valueChanged(TreeSelectionEvent): Method of the TreeSelectionListener interface, which must be implemented. It copies the selected nodes of the operations tree into the string array selected.

5.1.2  BusinessSearch:

Package: search

Operation:
    calculateWeight(String[]): Takes the tokenized query words as input. It connects to the document database and, for each WSDL file entry, calculates the weight of each token with respect to the document by searching for the token in the business names and description entries. It loops over the total number of documents in the database and returns a float array of weights.
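The weight calculation performed by calculateWeight can be sketched as below, using in-memory documents in place of the database. This is an illustration under stated assumptions, not the project's implementation: the `Doc` class, the boost values `NAME_BOOST` and `DESCRIPTION_BOOST`, and the choice of idf = ln(N / df) are all hypothetical. The two formulas follow Section 4.3.4: the document weight sums tf * idf over the query tokens, and the query weight sums (0.5 + 0.5 * tf) * idf, where tf is taken here as the raw count of the token in the query.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class WeightSketch {
    // Assumed position boosts: a hit in a name counts more than one in a description.
    static final double NAME_BOOST = 2.0;
    static final double DESCRIPTION_BOOST = 1.0;

    /** In-memory stand-in for one row of the document database. */
    static final class Doc {
        final List<String> nameTokens;
        final List<String> descriptionTokens;

        Doc(List<String> nameTokens, List<String> descriptionTokens) {
            this.nameTokens = nameTokens;
            this.descriptionTokens = descriptionTokens;
        }
    }

    static int count(String token, List<String> tokens) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(token)) {
                n++;
            }
        }
        return n;
    }

    /** Position-weighted term frequency of a token in one document. */
    static double tf(String token, Doc doc) {
        return NAME_BOOST * count(token, doc.nameTokens)
            + DESCRIPTION_BOOST * count(token, doc.descriptionTokens);
    }

    /** idf = ln(N / df); 0 when no document contains the token (assumed variant). */
    static double idf(String token, List<Doc> docs) {
        int df = 0;
        for (Doc d : docs) {
            if (tf(token, d) > 0) {
                df++;
            }
        }
        return df == 0 ? 0.0 : Math.log((double) docs.size() / df);
    }

    /** One weight per document: sum over distinct query tokens of tf * idf. */
    static double[] documentWeights(String[] queryTokens, List<Doc> docs) {
        double[] weights = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            for (String t : new LinkedHashSet<String>(Arrays.asList(queryTokens))) {
                weights[i] += tf(t, docs.get(i)) * idf(t, docs);
            }
        }
        return weights;
    }

    /** Query weight: sum over distinct query tokens of (0.5 + 0.5 * tf) * idf. */
    static double queryWeight(String[] queryTokens, List<Doc> docs) {
        List<String> query = Arrays.asList(queryTokens);
        double weight = 0.0;
        for (String t : new LinkedHashSet<String>(query)) {
            weight += (0.5 + 0.5 * count(t, query)) * idf(t, docs);
        }
        return weight;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(Arrays.asList("book", "search"), Arrays.asList("search", "books", "author")),
            new Doc(Arrays.asList("weather"), Arrays.asList("weather", "forecast")));
        System.out.println(Arrays.toString(documentWeights(new String[] {"book"}, docs)));
        System.out.println(queryWeight(new String[] {"book"}, docs));
    }
}
```

A token that appears in every document gets idf = 0 under this variant and so contributes nothing to either weight, which is the usual intent of the idf factor.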
5.1.3  OperationSearch:

Package: search

Operation:
    calculateWeight(String[]): Takes the tokenized query words as input. It connects to the document database and, for each WSDL file entry, calculates the weight of each token with respect to the document by searching for the token in the operation names and description entries. It loops over the total number of documents in the database and returns a float array of weights.

5.1.4  ServiceSearch:

Package: search

Operation:
    calculateWeight(String[]): Takes the tokenized query words as input. It connects to the document database and, for each WSDL file entry, calculates the weight of each token with respect to the document by searching for the token in the service names and description entries. It loops over the total number of documents in the database and returns a float array of weights.

5.1.5  Tokenization:

Package: search

Operation:
    tokenize(String): Takes a string as input. The string is broken into tokens whenever a capital letter is found or one of the special characters is encountered. If the collected token is one of the words "by", "to", "and", etc., it is discarded.
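The tokenize operation described above can be sketched as follows. The class name and the exact stop-word set are assumptions made for illustration; the report only names "by", "to" and "and" and leaves the rest of the list open. Any character that is not a letter or digit is treated as a special character, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TokenizerSketch {
    // Assumed stop-word list; the report names only these three explicitly.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("by", "to", "and"));

    /** Splits on capital letters and special characters, lower-cases, drops stop words. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(current, tokens);       // special character ends the token
            } else {
                if (Character.isUpperCase(c)) {
                    flush(current, tokens);   // capital letter starts a new token
                }
                current.append(Character.toLowerCase(c));
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Emits the collected token unless it is empty or a stop word. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("SearchByAuthor"));
        System.out.println(tokenize("get_weather-Forecast"));
    }
}
```

One limitation of splitting on every capital letter is that acronyms such as "WSDL" are broken into single-letter tokens; a production tokenizer would likely need a rule for runs of capitals.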
  • 29. 5.1.6    SimilarityCalculator
Package: search
Attribute:
similarity : float[][]
Operations:
calculateSimilarity(float[][], float[], int): Takes the weighted array of the documents, the weighted array of the query and the number of tokens as input. It calculates the similarity of each document to the query over the tokens. Care is taken that the similarity never exceeds 100. The similarities of the documents are stored in the float array similarity.
finalResult(): Searches for the top 3 documents with non-zero similarity, connects to the document database, retrieves the WSDL URL and the name of each service, stores them in a string variable and returns it.

5.1.7    KeyWordSearch
Package: search
Implements: ActionListener
Operation:
actionPerformed(ActionEvent): Method of the ActionListener interface, which must be implemented. This method is invoked when the user enters the query and presses the keyword search button. It takes the query string, creates a Tokenization object and tokenizes the string. It then calculates the TF/IDF weights of the query using a QueryTFIDF object. Depending on the radio button selected, business, operation or
  • 30. service, it creates the corresponding object and calculates the weights for the documents. Using the query weights and the document weights, it calculates the similarity with a SimilarityCalculator object, obtains the final result and updates the result table.

5.1.8    TemplateSearch
Package: search
Implements: ActionListener
Operation:
actionPerformed(ActionEvent): Method of the ActionListener interface, which must be implemented. This method is invoked when the user selects nodes of the operations tree and presses the template search button. It takes the string selected, which contains the selected tree nodes, creates a Tokenization object and tokenizes the string. It then calculates the TF/IDF weights of the query using a QueryTFIDF object, creates an OperationSearch object and calculates the weights for the documents. Using the query weights and the document weights, it calculates the similarity with a SimilarityCalculator object, obtains the final result and updates the result table.

5.1.9    OpenFile
Package: search
Implements: ActionListener
Operation:
actionPerformed(ActionEvent): Method of the ActionListener interface, which must be implemented. This method is invoked when the user presses the browse button in the template search to select the WSDL
  • 31. file. It creates a JFileChooser object and opens the chooser window; after the user selects a file, its path is extracted and set in the text field.

5.1.10    ParseWSDL
Package: search
Implements: ActionListener
Operation:
actionPerformed(ActionEvent): Method of the ActionListener interface, which must be implemented. This method is invoked when the user has selected the WSDL file and presses the parse button in the template based search. It takes the path of the WSDL file, creates a file reader object and reads the contents into a single string. That string is parsed into a document using DocumentBuilderFactory, and an XPath expression defined for the operations of the WSDL file is evaluated on the document. The resulting list of operations is stored in a NodeList, from which the JTree is created and its nodes are added.

5.1.11    QueryTFIDF
Package: search
Operation:
calculateWeight(String[]): Takes the query tokens as input, calculates the weight of each token in the query and returns the weights.

5.1.12    MyTableModel
Package: table
  • 32. Extends: AbstractTableModel
Attributes:
columnNames : String[]
data : Object[][]
Operations:
getColumnCount(): Returns the number of columns, i.e. the length of columnNames.
getColumnName(int): Returns the string in the columnNames array at the given index.
getRowCount(): Returns the number of rows in the 2-D array data.
getValueAt(int, int): Returns the value in the table at the given indices.
setValueAt(Object, int, int): Sets the given value in the table at the given indices.

5.1.13    BrowserLaunch
Package: table
Attributes:
browsers : String[]
errMsg : String
Operations:
openURL(String): Executed when the user selects any row in the result table. The URL in the row is extracted and given as input. It determines the operating system of the running machine from the system properties and opens the browser, selecting
  • 33. it from the browsers array, and opens it with the given URL.

5.2    Algorithms Used

5.2.1    Keyword Based Search Algorithm
//input: contents of the text field, which holds the user query.
//output: displays a table with 3 rows containing the WSDL file URLs of the top 3 services with non-zero similarity.
//description: tokenizes the given query and finds the weight of the query; it also searches the document database, finds the weight of each document for the tokens and finds the similarity of each document to the query.
Step1: string input = getText(text_field)
Step2: string[] tokens = Tokenization.tokenize(input)
Step3: float[] queryweight = QueryTFIDF.calculateWeight(tokens)
Step4: if (buttonselected == Business) then
           float[][] documentweight = BusinessSearch.calculateWeight(tokens)
       else if (buttonselected == Service) then
           float[][] documentweight = ServiceSearch.calculateWeight(tokens)
       else if (buttonselected == Operation) then
           float[][] documentweight = OperationSearch.calculateWeight(tokens)
       {end of if else if}
Step5: SimilarityCalculator.calculateSimilarity(documentweight, queryweight, tokens.length)
Step6: string[][] result = SimilarityCalculator.finalResult()
Step7: for i = 0 to (number of rows in result) - 1
           for j = 0 to (number of columns in result) - 1
               resultTable.setValueAt(result[i][j], i, j)
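Steps 3 to 5 rely on TF/IDF weights and a similarity percentage whose formulas the report does not spell out. The sketch below assumes the textbook choices, a weight of tf * ln(N / df) and cosine similarity scaled to a percentage, so it illustrates the general technique rather than the project's exact computation. The class and method names are hypothetical, and double is used instead of float for simplicity.

```java
import java.util.List;

/** Hypothetical sketch of TF-IDF weighting and a percentage similarity. */
final class SimilaritySketch {
    /**
     * Assumed weight: term frequency in the document times ln(N / df),
     * where N is the corpus size and df the number of documents containing the term.
     */
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        long tf = doc.stream().filter(term::equals).count();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : tf * Math.log((double) corpus.size() / df);
    }

    /**
     * Assumed measure: cosine similarity of two equal-length weight vectors,
     * scaled to a percentage and capped so that it never exceeds 100.
     */
    static double similarityPercent(double[] query, double[] doc) {
        double dot = 0.0, queryNorm = 0.0, docNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            queryNorm += query[i] * query[i];
            docNorm += doc[i] * doc[i];
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // a zero vector has no direction, so report no similarity
        }
        return Math.min(100.0, 100.0 * dot / Math.sqrt(queryNorm * docNorm));
    }
}
```

Under these assumptions, a document whose weight vector points in the same direction as the query scores 100, and a document that shares no token with the query scores 0 and would be skipped by a "top 3 non-zero" selection such as the one finalResult() performs.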
  • 34. 5.2.2    Template Based Search Algorithm
//input: WSDL file as the user query.
//output: displays a table with 3 rows containing the WSDL file URLs of the top 3 services with non-zero similarity.
//description: parses the given WSDL file, extracts the operation names, displays them in tree form and lets the user select nodes; it tokenizes the selected nodes and finds the weight of the query; it also searches the document database, finds the weight of each document for the tokens and finds the similarity of each document to the query.
Step1: string input = getText(text_field)
Step2: string[] operations = parse((WSDL)input)
Step3: tree.nodes = operations
Step4: string[] selectednodes = tree.getselectednodes()
Step5: for each node in selectednodes
           append Tokenization.tokenize(node) to string[] tokens
Step6: float[] queryweight = QueryTFIDF.calculateWeight(tokens)
Step7: float[][] documentweight = OperationSearch.calculateWeight(tokens)
Step8: SimilarityCalculator.calculateSimilarity(documentweight, queryweight, tokens.length)
Step9: string[][] result = SimilarityCalculator.finalResult()
Step10: for i = 0 to (number of rows in result) - 1
            for j = 0 to (number of columns in result) - 1
                resultTable.setValueAt(result[i][j], i, j)
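The parse step (Step2) can be sketched with the standard Java XML APIs that the ParseWSDL description mentions. The report does not give the XPath expression it uses, so the expression below, which selects the name attribute of each operation under a portType, is an assumption, as are the class name WsdlOperationsSketch and the choice to take the WSDL content as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of extracting operation names from a WSDL document. */
final class WsdlOperationsSketch {
    // Assumed expression: match on local names so that any namespace prefix works.
    private static final String OPERATIONS_XPATH =
            "//*[local-name()='portType']/*[local-name()='operation']/@name";

    static List<String> operationNames(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for reliable local-name() matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wsdlXml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(OPERATIONS_XPATH, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }
    }
}
```

Restricting the match to portType children avoids listing each operation a second time from the binding section, where WSDL 1.1 repeats the operation names. The returned names could then populate the tree from which the user selects the operations to search for.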
  • 35. CHAPTER 6
TESTING

6.1    Introduction

6.1.1    System Overview
Our system is a search engine which takes either keyword-based or template-based queries from the user and produces a ranked list of relevant web services, arranged in decreasing order of similarity percentage.

6.1.2    Test Approach
The user can provide a keyword-based query on one of the following:
Business
Service
Operation
The user can also provide a template-based query by specifying a WSDL file. The search engine is expected to produce a ranked list of web services with their similarity percentages and the URLs of the corresponding WSDL files.

6.2    Test Cases

6.2.1    Business Search

6.2.1.1    Purpose
To verify whether the retrieval of web services by business is accurate.

6.2.1.2    Inputs
A keyword-based query on businesses or companies which provide web services.

6.2.1.3    Expected Outputs
The output consists of a set of web services, ranked and listed in decreasing order of their similarity, with the URLs of their WSDL files.
  • 36. 6.2.1.4    Test Procedure
The user-specified query is tokenized, and the TF-IDF weight of each term is calculated for each web service by referring to the business table in the document database. The description table entries are also considered for TF-IDF, but are given less weight than the business table entries. Finally, we calculate the similarity percentage of each web service with respect to the user query and rank the services.

6.2.1.5    Test Results
The result consists of a ranked list of web services for the business query specified by the user, with correctly calculated similarity percentages.

6.2.2    Service Search

6.2.2.1    Purpose
To verify whether the retrieval of web services by service description is accurate.

6.2.2.2    Inputs
A keyword-based query on the service expected by the user.

6.2.2.3    Expected Outputs
The output consists of a set of web services, ranked and listed in decreasing order of their similarity, with the URLs of their WSDL files.

6.2.2.4    Test Procedure
The user-specified query is tokenized, and the TF-IDF weight of each term is calculated for each web service by referring to the service table in the document database. The description table entries are also considered for TF-IDF, but are given less weight than the service table entries. Finally, we calculate the similarity percentage of each web service with respect to the user query and rank the services.

6.2.2.5    Test Results
The result consists of a ranked list of web services for the service description specified by the user, with correctly calculated similarity percentages.
  • 37. 6.2.3    Operation Search

6.2.3.1    Purpose
To verify whether the retrieval of web services by operation description is accurate.

6.2.3.2    Inputs
A keyword-based query on the operations expected by the user.

6.2.3.3    Expected Outputs
The output consists of a set of web services, ranked and listed in decreasing order of their similarity, with the URLs of their WSDL files.

6.2.3.4    Test Procedure
The user-specified query is tokenized, and the TF-IDF weight of each term is calculated for each web service by referring to the operations table in the document database. The description table entries are also considered for TF-IDF, but are given less weight than the operations table entries. Finally, we calculate the similarity percentage of each web service with respect to the user query and rank the services.

6.2.3.5    Test Results
The result consists of a ranked list of web services for the operation description specified by the user, with correctly calculated similarity percentages.

6.2.4    Template Based Search

6.2.4.1    Purpose
To verify whether the retrieval of web services based on the WSDL file specified by the user is accurate.

6.2.4.2    Inputs
A template-based query, given by providing a WSDL file.
  • 38. 6.2.4.3    Expected Outputs
The output consists of a set of web services relevant to the query, arranged in decreasing order of their similarity, with the URLs of their WSDL files.

6.2.4.4    Test Procedure
The user-specified WSDL file is parsed and the operation names are extracted from it. The user can select the desired operations from the list of operations available in the WSDL file. The selected operation names are tokenized, and the TF-IDF weight of each term is calculated for each web service by referring to the operations table in the document database. The description table entries are also considered for TF-IDF, but are given less weight than the operations table entries. Finally, we calculate the similarity percentage of each web service with respect to the user query and rank the services.

6.2.4.5    Test Results
The result consists of a ranked list of web services whose WSDL files have operations similar to those of the user-specified WSDL file, with correctly calculated similarity percentages.

6.3    Experimental Study
The experiments were performed on a Pentium IV 2.7 GHz Windows machine with 1 GB RAM, and the code was implemented in Java 1.6.0.
Number of web services: 20
Number of operations: 42
Document database size: 1.34 MB
The experiments were performed on the following 5 queries:
1. Area of square
2. Search book by author name
3. Air routes between two cities
4. Car price by name
5. Country by its capital city
  • 39. [Graph 1: series Keyword Query and Template Query; horizontal axis: queries 1 to 5; vertical axis: 0 to 0.8]
Graph 1. Execution time for keyword and template searches

Template based queries take more time to execute than keyword based queries, as more time is required for parsing the WSDL file and extracting useful data from it. This is shown in Graph 1.

[Graph 2: series Keyword Queries; horizontal axis: queries 1 to 5; vertical axis: 0 to 100]
Graph 2. Precision percentages of keyword based query searches
  • 40. [Graph 3: series Template Queries; horizontal axis: queries 1 to 5; vertical axis: 0 to 100]
Graph 3. Precision percentages of template based query searches

Template based queries produce more precise results than keyword based queries, as we directly match the WSDL files of web services, which gives more accurate results. This is reflected in Graphs 2 and 3, which show the precision percentages for keyword based and template based queries respectively.
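The report does not state how the precision percentages in Graphs 2 and 3 were computed. Assuming the standard information retrieval definition, precision is the share of retrieved services that are relevant to the query. The sketch below illustrates that definition only; the class name and the use of service names as identifiers are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the standard precision measure, as a percentage. */
final class PrecisionSketch {
    /**
     * Precision = (retrieved services that are relevant) / (retrieved services) * 100.
     * Returns 0 when nothing was retrieved, to avoid dividing by zero.
     */
    static double precisionPercent(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        long hits = retrieved.stream().filter(relevant::contains).count();
        return 100.0 * hits / retrieved.size();
    }
}
```

For a result table of 3 services of which 2 are judged relevant, this definition gives a precision of about 66.7 percent.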
  • 41. CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENT

In this project we have proposed a new form of query, called the template based query, which refines searching and produces fine-tuned results. We have also improved the conventional keyword based search by providing granular search, where the user can search for a particular business, operation or service instead of searching vaguely over the entire web service.

Template based queries take more time to execute than keyword based queries, as more time is required for parsing the WSDL file and extracting useful data from it. However, template based queries produce more precise results than keyword based queries, as we directly match the WSDL files of web services, which gives more accurate results.

As a future enhancement we can perform service composition, wherein the user combines two or more web services to create a new web service as per his requirement. One way to do this is to select all those services whose number and type of parameters match those of the existing web service, so that the services can be combined into new web services.
  • 42. CHAPTER 8
BIBLIOGRAPHY AND REFERENCES

[1] Kyong-Ha Lee, Mi-young Lee, Yun-Young Hwang and Kyu-Chul Lee, "A Framework for XML Web Services Retrieval with Ranking", IEEE, 2007. Department of Computer Engineering, Chungnam National University, Daejeon, 305-764, Korea.
[2] Christian Platzer, Schahram Dustdar, "A Vector space search engine for Web services", In Proceedings of 3rd European Conference on Web Services, 2005.
[3] Natallia Kokash, "A Comparison of web service interface similarity measures", Technical Report DIT-06-025, University of Trento, 2006.
[4] Dunglu Peng, et al., "Concept-based retrieval of alternate Web services", In Proceedings of DASFAA 2005, LNCS V.3453, pp. 359-371, 2005.
[5] Jong P. Yoon, et al., "BitCube: A Three-dimensional Bitmap Indexing for XML Documents", Journal of Intelligent Information Systems, Vol. 17, pp. 241-252, 2001.
[6] W3 schools-www.w3c.org
[7] Wikipedia: web services, information retrieval.