Web services

CHAPTER 1
INTRODUCTION

1.1 Statement Of The Problem
To use Web services, users must first find relevant services among the many
services dispersed across the web. The current mechanism for this is UDDI, a distributed
registry system for Web services. However, UDDI supports only exact keyword matching
and category-based queries against the UDDI data entries that represent each Web
service. It is therefore hard to obtain a ranked query result, or to find alternate services,
that is, other candidate services to fall back on when the currently chosen service is no
longer useful or has become unreachable.

1.2 Objective Of The Project
We propose a framework for Web services retrieval that provides ranked lists of Web
services. To do this, we apply existing information retrieval approaches to compare Web
services. The system sits on top of a document database that holds the UDDI entries of
Web services, and it considers not only UDDI data entries but also the WSDL
definitions of services. Our framework provides two query types: keyword-based and
template-based queries.

1.3 Current Scope
The current Web service discovery solution is UDDI, a registry-based discovery
mechanism. UDDI supports only keyword search: it returns a list of services whose
entries contain words that exactly match the user's query. This means that UDDI
provides neither ranked query results nor alternate services. We use both UDDI entries
and WSDL definitions, and apply an information retrieval ranking strategy to rank Web
services by relevance and to find alternate Web services.

1.4 Future Scope
This implementation will improve the efficiency of Web service retrieval, because
users can express a query as their own WSDL file, that is, as a template-based query.
Instead of plain keyword-based results, users get a ranked list of Web services and can
also obtain alternate Web services.

CHAPTER 2

LITERATURE SURVEY

2.1 History of Web Services
Web services are applications that can communicate with other applications over
a network by using a set of standardized protocols. The technology originated from the
efforts of many companies that share an interest in building electronic marketplaces.

EDI was the first attempt to create a standard way for businesses to communicate
over a network. In the 25 years since EDI came on the scene, there have been numerous
attempts at a universal conduit for connecting business logic over a network: Common
Object Request Broker Architecture, Distributed Component Object Model, Unix Remote
Procedure Call, and Java Remote Method Invocation. Each of those technologies failed to
gain significant market share or enough momentum to succeed. All of them exist today--
each still has its uses--but each failed to gain a broad reach.

Before the Web, getting all the major software vendors to agree on a transport
protocol for cross-network application services might have been impossible. But the Web
rendered that decision academic, by specifying lower level transports for standardized
communication. The Web uses HTTP running on TCP/IP. TCP/IP was already a mature
standard by the time the Web went mainstream in 1994, and, by 1997, HTTP had become
a universal business standard. With HTTP and TCP/IP in place, all that was needed was
some kind of messaging and data encapsulation standard--and a lot of vendor
cooperation.

It was XML's invention that really paved the way for Web services. As a widely
heralded, platform-independent standard for data description that could also be used to
describe message-passing protocols, XML was a logical choice for the job of
standardized application-to-application communication. XML officially became a
standard in February 1998, when the World Wide Web Consortium announced that XML
1.0 had reached Recommendation status: suitable for deployment in applications. By
early 1998, several attempts at an XML protocol for interprocess communication had been
made. Allaire Corp.'s Web Distributed Data Exchange (WDDX) was one independent
attempt of note, but it was SOAP, developed by Dave Winer, CEO of UserLand Software,
Microsoft engineers Bob Atkinson and Mohsen Al-Ghosein, and Don Box, co-founder of
DevelopMentor, that was to become the basis for Web services.

Electronic marketplaces were a hot concept in December 1999, when Microsoft
held a private meeting with IBM and other interested companies to show off SOAP 1.0,
its specification for a standardized message-passing protocol based on XML. By the
summer of 2000, SOAP was gaining wider acceptance. IBM and Microsoft were also
each working on a way to programmatically describe how to connect up to a Web service.
After some discussion, protocol proposals from Microsoft and IBM merged. IBM
contributed Network Accessible Service Specification Language and Microsoft offered
both Service Description Language and SOAP Contract Language. In the fall of 2000, the
merged specification, Web Services Description Language, was announced.

With SOAP and WSDL, companies could create and describe their Web services.
But someone still needed to provide a way to advertise and locate Web services. In March
2000, IBM, Microsoft, and Ariba started working on the solution: Universal Description,
Discovery, and Integration. With SOAP, WSDL, and UDDI in place, the de-facto
standards to create Web services had arrived, but it wasn't until the end of 2000 that five
major IT software infrastructure vendors announced their commitment to Web services.
Oracle, HP, Sun, IBM, and Microsoft, an unlikely--and thus impressive--alliance, stated
their intention to support and deploy the Web services standards in their products.

It took unprecedented vendor cooperation and commitment on the design of the
core standards (SOAP, WSDL, and UDDI) to make Web services happen--but there's still
more to be done. Additional Web services standards must be defined and adopted in order
to build and integrate things like authentication and connection management.

Figure 2.1 Web services architecture.

2.2 What Is WSDL?

As communications protocols and message formats are standardized in the web
community, it becomes increasingly possible and important to be able to describe the
communications in some structured way. WSDL addresses this need by defining an XML
grammar for describing network services as collections of communication endpoints
capable of exchanging messages. WSDL service definitions provide documentation for
distributed systems and serve as a recipe for automating the details involved in
applications communication.

A WSDL document defines services as collections of network endpoints, or ports.
In WSDL, the abstract definition of endpoints and messages is separated from their
concrete network deployment or data format bindings. This allows the reuse of abstract
definitions: messages, which are abstract descriptions of the data being exchanged, and
port types which are abstract collections of operations. The concrete protocol and data
format specification for a particular port type constitutes a reusable binding. A port is
defined by associating a network address with a reusable binding, and a collection of
ports define a service. Hence, a WSDL document uses the following elements in the
definition of network services:

Types – a container for data type definitions using some type system (such as
XSD).
Message – an abstract, typed definition of the data being communicated.
Operation – an abstract description of an action supported by the service.
Port Type – an abstract set of operations supported by one or more endpoints.
Binding – a concrete protocol and data format specification for a particular port
type.
Port – a single endpoint defined as a combination of a binding and a network
address.
Service – a collection of related endpoints.
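The relationship between these elements can be illustrated with a small sketch. The WSDL text below is a hypothetical stock quote service invented for illustration (its names and URLs are assumptions, not taken from any real service), and the helper `summarize_wsdl` is likewise illustrative. It uses only Python's standard `xml.etree.ElementTree` module to list the messages, port types, bindings and service ports of a WSDL 1.1 document.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_BINDING_NS = "http://schemas.xmlsoap.org/wsdl/soap/"
NS = {"wsdl": WSDL_NS, "soap": SOAP_BINDING_NS}

# Hypothetical WSDL 1.1 document; every name and URL here is made up.
SAMPLE_WSDL = """<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns:tns="http://example.com/stockquote.wsdl"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="GetPriceRequest">
    <part name="symbol" type="xsd:string"/>
  </message>
  <message name="GetPriceResponse">
    <part name="price" type="xsd:float"/>
  </message>
  <portType name="StockQuotePortType">
    <operation name="GetPrice">
      <input message="tns:GetPriceRequest"/>
      <output message="tns:GetPriceResponse"/>
    </operation>
  </portType>
  <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="GetPrice">
      <soap:operation soapAction="http://example.com/GetPrice"/>
    </operation>
  </binding>
  <service name="StockQuoteService">
    <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>
"""


def summarize_wsdl(wsdl_text):
    """Collect the abstract and concrete parts of a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    summary = {
        # Abstract part: messages and the operations grouped into port types.
        "messages": [m.get("name") for m in root.findall("wsdl:message", NS)],
        "port_types": {},
        # Concrete part: bindings (port type + protocol) and ports (binding + address).
        "bindings": {},
        "services": {},
    }
    for port_type in root.findall("wsdl:portType", NS):
        operations = port_type.findall("wsdl:operation", NS)
        summary["port_types"][port_type.get("name")] = [op.get("name") for op in operations]
    for binding in root.findall("wsdl:binding", NS):
        summary["bindings"][binding.get("name")] = binding.get("type")
    for service in root.findall("wsdl:service", NS):
        ports = {}
        for port in service.findall("wsdl:port", NS):
            address = port.find("soap:address", NS)
            ports[port.get("name")] = address.get("location") if address is not None else None
        summary["services"][service.get("name")] = ports
    return summary
```

Calling `summarize_wsdl(SAMPLE_WSDL)` returns a dictionary in which the port type lists its operations, the binding points back to the port type, and the service maps each port to a network address, mirroring the abstract-to-concrete layering described above.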

These elements are described in detail in Section 2 of the WSDL 1.1 specification. It is important to observe that
WSDL does not introduce a new type definition language. WSDL recognizes the need for
rich type systems for describing message formats, and supports the XML Schemas
specification (XSD) as its canonical type system. However, since it is unreasonable to
expect a single type system grammar to be used to describe all message formats present
and future, WSDL allows using other type definition languages via extensibility.

In addition, WSDL defines a common binding mechanism. This is used to attach
a specific protocol or data format or structure to an abstract message, operation, or
endpoint. It allows the reuse of abstract definitions.

In addition to the core service definition framework, the WSDL 1.1 specification
introduces specific binding extensions for the following protocols and message formats:

SOAP 1.1
HTTP GET / POST
MIME

Although defined within the WSDL specification, the above language extensions are layered
on top of the core service definition framework. Nothing precludes the use of other
binding extensions with WSDL.

In WSDL the term binding refers to the process of associating protocol or data
format information with an abstract entity like a message, operation, or portType. WSDL
allows elements representing a specific technology (referred to here as extensibility
elements) under various elements defined by WSDL. These points of extensibility are
typically used to specify binding information for a particular protocol or message format,
but are not limited to such use. Extensibility elements MUST use an XML namespace
different from that of WSDL.

Extensibility elements are commonly used to specify some technology specific
binding. To distinguish whether the semantic of the technology specific binding is
required for communication or optional, extensibility elements may place a WSDL
required attribute of type Boolean on the element. The default value for required is false.
The required attribute is defined in the namespace "http://schemas.xmlsoap.org/wsdl/".

Extensibility elements allow innovation in the area of network and message
protocols without having to revise the base WSDL specification. WSDL recommends that
specifications defining such protocols also define any necessary WSDL extensions used
to describe those protocols or formats.

2.3 What is SOAP?
SOAP, originally defined as Simple Object Access Protocol, is a protocol
specification for exchanging structured information in the implementation of Web
Services in computer networks. It relies on Extensible Markup Language (XML) for its
message format, and usually relies on other Application Layer protocols, most notably
Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), for
message negotiation and transmission. SOAP can form the foundation layer of a web
services protocol stack, providing a basic messaging framework upon which web services
can be built. This XML-based protocol consists of three parts: an envelope, which defines
what is in the message and how to process it; a set of encoding rules for expressing
instances of application-defined data types; and a convention for representing procedure
calls and responses.

As a layman's example of how SOAP procedures can be used, a SOAP message
could be sent to a web-service-enabled web site, for example, a real-estate price database,
with the parameters needed for a search. The site would then return an XML-formatted
document with the resulting data, e.g., prices, location, features. Because the data is
returned in a standardized machine-parseable format, it could then be integrated directly
into a third-party web site or application.
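As a rough sketch of this example, the code below builds and then reads back a SOAP 1.1 request for a real-estate search. The envelope namespace is the standard SOAP 1.1 one; the application namespace, the `FindListings` operation and its `city` and `maxPrice` parameters are hypothetical names chosen only for illustration. No network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/realestate"  # hypothetical application namespace
NS = {"soap": SOAP_ENV_NS, "app": APP_NS}

# Use readable prefixes when the message is serialized.
ET.register_namespace("soap", SOAP_ENV_NS)
ET.register_namespace("app", APP_NS)


def build_price_query(city, max_price):
    """Return a SOAP envelope (as text) carrying the search parameters in its body."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}FindListings")
    ET.SubElement(call, f"{{{APP_NS}}}city").text = city
    ET.SubElement(call, f"{{{APP_NS}}}maxPrice").text = str(max_price)
    return ET.tostring(envelope, encoding="unicode")


def read_price_query(xml_text):
    """Extract the search parameters, as a receiving service might do."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find("soap:Body/app:FindListings", NS)
    return {
        "city": call.findtext("app:city", namespaces=NS),
        "maxPrice": int(call.findtext("app:maxPrice", namespaces=NS)),
    }
```

In a real deployment the text returned by `build_price_query` would be sent to the service, typically as the body of an HTTP POST, and the response would be a second envelope parsed in the same way.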

The SOAP architecture consists of several layers of specifications: for message
format, Message Exchange Patterns (MEP), underlying transport protocol bindings,
message processing models, and protocol extensibility. SOAP is the successor of XML-
RPC, though it borrows its transport and interaction neutrality and the
envelope/header/body from elsewhere (probably from WDDX).

The SOAP specification defines the messaging framework which consists of:

The SOAP processing model defining the rules for processing a SOAP message
The SOAP extensibility model defining the concepts of SOAP features and SOAP
modules
The SOAP underlying protocol binding framework describing the rules for
defining a binding to an underlying protocol that can be used for exchanging
SOAP messages between SOAP nodes
The SOAP message construct defining the structure of a SOAP message

The SOAP processing model describes a distributed processing model, its
participants, the SOAP nodes and how a SOAP receiver processes a SOAP message. The
following SOAP nodes are defined:

SOAP sender: A SOAP node that transmits a SOAP message.
SOAP receiver: A SOAP node that accepts a SOAP message.
SOAP message path: The set of SOAP nodes through which a single SOAP
message passes.
Initial SOAP sender (Originator): The SOAP sender that originates a SOAP
message at the starting point of a SOAP message path.
SOAP intermediary: A SOAP intermediary is both a SOAP receiver and a SOAP
sender and is targetable from within a SOAP message. It processes the SOAP
header blocks targeted at it and acts to forward a SOAP message towards an
ultimate SOAP receiver.
Ultimate SOAP receiver: The SOAP receiver that is a final destination of a
SOAP message. It is responsible for processing the contents of the SOAP body
and any SOAP header blocks targeted at it. In some circumstances, a SOAP
message might not reach an ultimate SOAP receiver, for example because of a
problem at a SOAP intermediary. An ultimate SOAP receiver cannot also be a
SOAP intermediary for the same SOAP message.
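The roles above can be sketched as a toy message path. This is a simplified model under stated assumptions: header blocks are targeted at plain node names (real SOAP identifies targets by role URIs), and the classes `HeaderBlock` and `SoapMessage` and both functions are hypothetical helpers, not part of any SOAP library.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderBlock:
    name: str
    target: str  # name of the node this block is targeted at (simplification)


@dataclass
class SoapMessage:
    headers: list
    body: str
    path: list = field(default_factory=list)  # nodes the message has passed through


def intermediary(node_name, message):
    """Act as receiver and sender: consume own header blocks, forward the rest."""
    remaining = [h for h in message.headers if h.target != node_name]
    return SoapMessage(remaining, message.body, message.path + [node_name])


def ultimate_receiver(node_name, message):
    """Process the body and any header blocks targeted at this node."""
    processed = [h.name for h in message.headers if h.target == node_name]
    return {
        "path": message.path + [node_name],
        "headers": processed,
        "body": message.body,
    }
```

For example, a message created by an initial sender `client` with one header block for `gateway` and one for `service` passes through `intermediary("gateway", ...)`, which removes the first block, and ends at `ultimate_receiver("service", ...)`, which sees only the second block and the body.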

2.4 What is UDDI?
UDDI is a public registry designed to house information about businesses and
their services in a structured way. Through UDDI, one can publish and discover
information about a business and its Web Services. This data can be classified using
standard taxonomies so that information can be found based on categorization. Most
importantly, UDDI contains information about the technical interfaces of a business's
services. Through a set of SOAP-based XML API calls, one can interact with UDDI at
both design time and run time to discover technical data, such that those services can be
invoked and used. In this way, UDDI serves as infrastructure for a software landscape
based on Web Services.

Why UDDI? What is the need for such a registry? As we look toward a software
landscape of thousands—perhaps millions—of Web Services, some tough challenges
emerge:

How are Web Services discovered?
How is this information categorized in a meaningful way?
What implications are there for localization?
What implications are there around proprietary technologies? How can I guarantee
interoperability in the discovery mechanism?
How can I interact with such a discovery mechanism at run time once my
application is dependent upon a Web Service?

In response to these challenges, the UDDI initiative emerged. A number of
companies, including Microsoft, IBM, Sun, Oracle, Compaq, Hewlett Packard, Intel,
SAP, and over three hundred other companies (see UDDI: Community for a complete
list), came together to develop a specification based on open standards and non-
proprietary technologies to solve these challenges. The result, initially launched in beta
December 2000 and in production by May 2001, was a global business registry hosted by
multiple operator nodes that users could—at no cost—both search and publish to.

With such an infrastructure for Web Services in place, data about Web Services
can now be found consistently and reliably in a universal, completely vendor-neutral
capacity. Precise categorical searches can be performed using extensible taxonomy
systems and identification. Run-time UDDI integration can be incorporated into
applications. As a result, a Web Services software environment can flourish.

The UDDI data is hosted by operator nodes, companies that have committed to
running a public node that conforms to the specification governed by the UDDI.org
consortium. Today, two public nodes exist that conform to the Version 1 UDDI
specification: One is hosted by Microsoft and one by IBM. Hewlett Packard has
committed to hosting a node under the Version 2 specification as well. Host operators are
required to replicate data between one another across a secure channel, providing a
redundancy to the entire UDDI cloud. Data can be published to one node and, after
replication, can be discovered on another node. Today, replication occurs at 24-hour
intervals; in the future, as more applications are dependent on the UDDI data, the
intervals will become shorter between replication.

It is worth noting that there are no proprietary requirements as far as how a host
operator implements its node. The node simply must conform to the UDDI specification.
The Microsoft node (http://uddi.microsoft.com/default.aspx), for example, has been
written entirely in C# and runs in production on the .NET Beta 2 Common Language
Runtime. The code base takes significant advantage of the native SOAP support and
serialization offered by .NET system classes. On the backend, the Microsoft operator
node utilizes Microsoft® SQL Server 2000 as its data store. Suffice to say that the IBM
node is using different technologies to run its node! However, the two nodes behave
identically, because they conform to the same set of SOAP-based XML API calls. Client
tools can interoperate with the nodes seamlessly. As such, the UDDI public cloud serves
as a prime example of how the XML Web Services model works across heterogeneous
environments.

Taking a look at what data is stored in UDDI and how it is structured is the next
step in understanding the UDDI initiative. UDDI is relatively lightweight; it is designed
as a registry, not a repository. The distinction is subtle, but crucial. A registry redirects a
user to resources, whereas a repository is an actual information store. Consider the
Microsoft® Windows® registry as an example: It contains basic settings and parameters,
but ultimately leads an application to a resource or binary. Looking up a COM component
based upon its Prog ID leads to a Class ID, which leads to the location of the binary itself.
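The registry-versus-repository distinction can be sketched with a toy in-memory registry. This is not the UDDI API or its data model: the `RegistryEntry` fields, the sample businesses and the URLs are all hypothetical. The point is only that each entry stores a pointer to the interface description rather than the description itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    business: str
    service: str
    category: str
    interface_url: str  # pointer to the WSDL; the registry does not store the file


# Hypothetical entries for illustration only.
REGISTRY = [
    RegistryEntry("Acme Travel", "FlightSearch", "travel", "http://example.com/acme/flights.wsdl"),
    RegistryEntry("Acme Travel", "HotelSearch", "travel", "http://example.com/acme/hotels.wsdl"),
    RegistryEntry("Globex Finance", "StockQuote", "finance", "http://example.com/globex/quote.wsdl"),
]


def services_of(business):
    """Which services does a given business offer?"""
    return [e.service for e in REGISTRY if e.business == business]


def interfaces_in(category):
    """Where are the interface descriptions for a given category?"""
    return [e.interface_url for e in REGISTRY if e.category == category]
```

A client would call `interfaces_in("finance")`, receive a URL, and only then fetch the WSDL document from the server that hosts it.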

UDDI behaves similarly: Like the Windows registry, it relies on Globally Unique
IDentifiers (GUID) to guarantee the ability to perform look-ups and determine the
location of resources. UDDI queries ultimately lead to an interface—a .WSDL file, .XSD
file, .DTD file, and so on—or an implementation (such as an .ASMX or .ASP file)
located on another server. UDDI can thus answer the following kinds of questions:

"What Web Service interfaces have been published that are based on WSDL and
established for a given industry?"
"What companies have written an implementation around one of these
interfaces?"
"What Web Services, categorized in a certain way, are offered today?"
"What Web Services does a given company offer?"
"Who do I need to contact about using a company's Web Service?"
"What are the implementation details of a particular Web Service?"

2.5 Information retrieval (IR)

Information retrieval is the science of searching for documents, for information
within documents, and for metadata about documents, as well as that of searching
relational databases and the World Wide Web. There is overlap in the usage of the terms
data retrieval, document retrieval, information retrieval, and text retrieval, but each also
has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary,
based on computer science, mathematics, library science, information science,
information architecture, cognitive psychology, linguistics, and statistics.

The idea of using computers to search for relevant pieces of information was
popularized in the article As We May Think by Vannevar Bush in 1945. The first
automated information retrieval systems were introduced in the 1950s and 1960s. By
1970 several different techniques had been shown to perform well on small text corpora
such as the Cranfield collection (several thousand documents). Large-scale retrieval
systems, such as the Lockheed Dialog system, came into use early in the 1970s.

In 1992, the US Department of Defense along with the National Institute of
Standards and Technology (NIST) cosponsored the Text Retrieval Conference (TREC) as
part of the TIPSTER text program. The aim of this was to look into the information
retrieval community by supplying the infrastructure that was needed for evaluation of text
retrieval methodologies on a very large text collection. This catalyzed research on
methods that scale to huge corpora. The introduction of web search engines has boosted
the need for very large scale retrieval systems even further.

Today IR is widely deployed by search engines and other web components for
keyword-based and template-based ranking and retrieval of web documents and Web
services. Most IR systems compute a numeric score for how well each object in the
database matches the query, and rank the objects by this value. The top-ranking
objects are then shown to the user. The process may be iterated if the user wishes to
refine the query.

2.6 Why use Web Services?

To support greater business efficiency and agility, information systems and their
operations have become increasingly decentralized and, for a variety of historical,
technical and business reasons, increasingly heterogeneous. Business processes are
distributed among far-flung business divisions, suppliers, partners, and customers, with
each participant having their own special needs for technology and automation. As a
consequence, the demand for a high degree of interoperability among disparate
information systems has never been greater. Moreover, it's critical for this high degree of
interoperability to be sustained in the face of rapid evolution of the cooperating systems,
as participants continually modify their systems in response to new or changing business
requirements.

Traditional assembly and integration methods (and the resulting integration
software market stimulated by these methods) are not particularly well suited to this new
business environment. These methods rely on a tight coupling between cooperating
systems, which requires either the universal deployment of homogeneous systems
(unlikely, considering the diversity and broad scale of modern business services) or
extraordinarily close coordination among participating development organizations during
initial development and sustainment (for example, to ensure that any changes to APIs or
protocols are simultaneously reflected in all of the deployed systems). In this business
environment, such tight coordination is often impractical (e.g., prohibitively expensive),
and rapid evolution in response to a new business opportunity is typically out of the
question.

In contrast to traditional assembly and integration methods, web services
technology provides a paradigm that uses messages (in the form of XML documents)
passed among diverse, loosely coupled systems as the focal point for integration. These
systems are no longer viewed solely as components in a larger system of systems but also
as providers of services that are applied to the messages. Web services are a special case
of the more general notion of Service-Oriented Architectures (SOA). Service-Oriented
Architectures represent (i.e., model) interconnected systems or components as collections
of cooperating services. The goal of web services technology is to dramatically reduce the
interoperability issues that would otherwise arise when integrating disparate systems
using traditional means.

The Web services model is an architectural and programming model that achieves
interoperability and reusability in the following ways:

Loosely coupled systems—Service requesters and service providers agree upon an
interface and abstract away the implementation details. The integration point is
defined by the interface contract, which isolates the participants from the effects
of change over time.
Virtualization services and open standards—Allowing applications to run in a
virtual environment based on open standards, independent of the specific details
of the underlying operating system or hardware platform, allows for Internet-level
scalability and distribution similar to HTTP-based web applications.
Document-orientation—Interoperability across diverse technology is achieved at
runtime by leveraging XML documents as a common way to express and
exchange data.

2.7 Presently Available Technologies and Challenges

Current Web service discovery mechanisms can be classified into three categories:

1. Registry-based discovery like UDDI.
2. Semantic annotation and discovery.
3. Similarity-based search.

2.7.1 Registry-based Discovery
UDDI itself is a technical specification for a Web service registry, standardized by
OASIS, and it is the typical example of a registry-based solution. It gives users a
uniform way to publish and discover Web services via normative registries. However,
UDDI supports only exact keyword matching and category-based queries against the
UDDI data entries that represent each Web service, so it is hard to obtain a ranked
query result or to find alternate services, that is, other candidate services to use when
the currently chosen service is no longer useful or has become unreachable. Moreover,
UDDI does not treat WSDL definitions, which actually describe the service interfaces
and message formats of Web services, as a target of queries.

2.7.2 Semantic Annotation and Discovery

The second approach is semantic annotation and discovery. Here, semantic
descriptions are added to Web service descriptions, and services are found by
inference over these semantics. OWL-S and WSDL-S are examples of such efforts.
However, this line of work faces some hard questions: how can the semantic
annotations be produced automatically, and how fast can inferences be made over
them? The complexity of answering these questions has made this method less popular.

2.7.3 Similarity-based Search
The last approach adapts existing information retrieval techniques to retrieve
Web services. In this approach, the similarity between Web services is measured as a
numeric score, and relevant services are found by comparing these scores. However,
current IR techniques focus on the retrieval of plain text, so they cannot be applied
directly to service retrieval: a service interface has a logical structure of operations
and of the messages consumed by each operation. For this reason, our work and related
works focus on how IR techniques can be adapted to the Web services environment.
Recently, there have been several efforts to use IR techniques for service retrieval. For
similarity search over service operations, operations can be clustered by parameter
name and operation name. The vector space model (VSM) has also been shown to be
useful in service retrieval, and related work reports that VSM with TF-IDF weighted
similarity measures ranks query results better than other IR techniques. A further
technique, concept lattices, can be used for finding alternative services.
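To make the vector space model concrete, the following sketch ranks three toy service descriptions against a keyword query using TF-IDF weights and cosine similarity. The service names and token lists are invented for illustration, and the smoothed formula idf = ln(N/df) + 1 is one common variant chosen here as an assumption; it is not claimed to be the weighting used by any of the works mentioned above.

```python
import math
from collections import Counter

# Hypothetical service descriptions, already tokenized.
SERVICES = {
    "StockQuote": ["get", "stock", "price", "symbol"],
    "CurrencyConverter": ["convert", "currency", "rate", "price"],
    "WeatherForecast": ["get", "weather", "forecast", "city"],
}


def tf_idf_vectors(docs):
    """Return (vectors, idf) where each vector maps term -> tf * idf."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per document
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for name, tokens in docs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_tokens, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0 and cannot match anything.
    query = {term: tf * idf.get(term, 0.0) for term, tf in Counter(query_tokens).items()}
    scored = [(name, cosine(query, vector)) for name, vector in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the query `["stock", "price"]`, `rank` places `StockQuote` first, `CurrencyConverter` second (it shares only the term `price`) and `WeatherForecast` last with a score of zero. An exact keyword match would instead return an unordered set of hits, which is the limitation of registry search that ranking addresses.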

2.8 Our approach
To use Web services, users must first find relevant services among the many
services dispersed across the web. The current mechanism for this is UDDI, a distributed
registry system for Web services. However, as discussed in Section 2.7.1, UDDI supports
only exact keyword matching and category-based queries against UDDI data entries, so
it is hard to obtain a ranked query result or to find alternate services when the currently
chosen service is no longer useful or has become unreachable. Moreover, UDDI does not
treat WSDL definitions, which actually describe the service interfaces and message
formats of Web services, as a target of queries.

In this project, we introduce a framework for XML Web services retrieval that
addresses these limitations of UDDI. Our system sits on top of a document database
consisting of UDDI data entries and WSDL files. It returns ranked query results for Web
services and finds alternate Web services by using a similarity measure. In addition, we
discuss related work and the further features needed to improve the performance of this
Web service retrieval framework.

The framework provides ranked lists of Web services by applying existing
information retrieval approaches to compare Web services. It considers not only UDDI
data entries but also the WSDL definitions of services, both of which are held in the
document database. It supports two query types: keyword-based and template-based
queries. A template-based query is a query that is itself a WSDL definition supplied by
the user; it retrieves services whose interfaces closely match the interfaces of the user's
own service.

Our framework has the following characteristics:
It provides a ranked list of services, so users get the services most relevant to
their query.
Users can choose the query granularity, that is, the level of the hierarchy at which
the items they need are located. A user can search for a particular business, service
or operation instead of searching whole Web service entries. This distinguishes
our keyword-based search from ordinary keyword search and makes it more
precise.
It supports discovery of alternate Web services whose interfaces are similar to the
interface definition of the user's own service. We call this template-based query.
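The idea of query granularity can be sketched as a search restricted to one level of a business, service, operation hierarchy. The catalog contents and the `search` function are hypothetical, and plain substring matching stands in for the ranked similarity search so that the example stays short.

```python
# Hypothetical three-level catalog: business -> service -> operations.
CATALOG = {
    "Acme Travel": {
        "FlightSearch": ["findFlights", "bookFlight"],
        "HotelSearch": ["findHotels"],
    },
    "Globex Finance": {
        "StockQuote": ["getPrice", "getHistory"],
    },
}

LEVELS = ("business", "service", "operation")


def search(keyword, granularity):
    """Return hierarchy paths whose name at the chosen level contains the keyword."""
    if granularity not in LEVELS:
        raise ValueError(f"granularity must be one of {LEVELS}")
    keyword = keyword.lower()
    hits = []
    for business, services in CATALOG.items():
        if granularity == "business":
            if keyword in business.lower():
                hits.append(business)
            continue
        for service, operations in services.items():
            if granularity == "service":
                if keyword in service.lower():
                    hits.append(f"{business}/{service}")
            else:  # operation level
                hits.extend(
                    f"{business}/{service}/{op}"
                    for op in operations
                    if keyword in op.lower()
                )
    return hits
```

Searching for `flight` at service level returns the single service `Acme Travel/FlightSearch`, while the same keyword at operation level returns the two matching operations inside it.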

CHAPTER 3

SOFTWARE REQUIREMENTS SPECIFICATION

3.1 Introduction
This Software Requirements Specification provides a complete description of the
design and implementation of a framework for XML Web services retrieval with ranking.
The expected users of this framework are those who occasionally or frequently
search for relevant or alternate Web services for their applications. It will
also serve as a reference for other Web services retrieval methods.

3.1.1 Glossary

Term      Definition
XML       Extensible Markup Language
WSDL      Web Services Description Language
SOAP      Simple Object Access Protocol
UDDI      Universal Description, Discovery and Integration
IR        Information Retrieval
VSM       Vector Space Model
TF-IDF    Term Frequency – Inverse Document Frequency

Table 3.1 Term and definition

3.2 General Description

The current Web service discovery solution is UDDI, a registry-based discovery
mechanism. UDDI supports only keyword search: it returns a list of services whose
entries contain words that exactly match the user's query. This means that UDDI
provides neither ranked query results nor alternate services. We use both UDDI entries
and WSDL definitions, and apply an information retrieval ranking strategy to rank Web
services by relevance and to find alternate Web services. Template-based queries
improve the precision and efficiency of service retrieval by returning more relevant
query results.

3.2.1 Product Description

We implement a search engine for Web services. The user can enter either a
keyword-based query or a template-based query to get a ranked list of Web services, and
can retrieve alternate Web services by supplying a WSDL file as a template-based query.
Users can control query granularity by restricting the search to a particular business,
service or operation.

3.2.2 End User Expectation
The user should be able to specify a WSDL file as a template-based query for the
search engine. The user is also expected to know the basic elements of a WSDL file in
order to find the alternate Web services required for a given purpose.

3.2.3 General Constraints
We have to create our own document database consisting of UDDI data
entries.
The user should be able to specify a WSDL file as a template-based query for the
search engine.
The search engine must offer a user-friendly interface for entering keyword queries.

3.3 Specific Requirements

3.3.1 Functional requirements

The main functional requirements for this solution are as follows:

User input: The user specifies the query to the search engine either as keywords
(for keyword based search) or as a WSDL file (for template based search).

Parsing: In the case of template based queries, we parse the input WSDL file
using an XML parser. The extracted text is then tokenized and stemmed.
Similarity Search and Ranking: Calculate the similarity between the query and
the service descriptions. A percentage of similarity is calculated for each web service.
This similarity comparison is done using the vector space model and a TF-IDF
calculation. However, we do not use the original TF-IDF weight in our
framework. We give a weight to each tokenized word according to the
position in which the word occurs.
Output: A ranked list of the top 3 web services (WSDL file URLs), along with their
percentage of similarity, is displayed by the search engine.

3.3.2 Non-functional Requirements

The main non-functional requirements for this solution are as follows:

Performance: The system must provide a good ranking mechanism with
high precision. The search engine's response time should be low.
Scalability: The system has to be scalable to a potentially unlimited
number of parties.
User-friendly operation interface: The system must be easy to operate so
that non-specialized users can access it without extra help.

3.3.3 Software Requirements

Eclipse integrated development environment.
Tomcat 5.5.7 application server.
MS Access 2007 as the back-end database.
Java 1.6.0.

3.3.4 Hardware Requirements

Operating system: Microsoft Windows XP Professional.
Processor: Pentium IV, 2.7 GHz, 1 GB RAM.


CHAPTER 4
SYSTEM DESIGN

4.1 Introduction and Design Overview
The system design has three fundamental sections:
A document database, which is required for storing UDDI data entries and WSDL
entries.
A web services retrieval framework for parsing, tokenizing and indexing service
descriptors and for calculating the similarity percentage.
A query interface, which provides a search engine for entering queries and
displaying results.

4.2 System Architectural Design

Figure 4.2 Overview of the framework


4.3 Detailed Description of Components

4.3.1 Parsing
The first step of the retrieval process for a template based query is to parse the service
descriptions, where each service is described in XML structures. In this step, the service
descriptions are parsed and organized as a DOM tree. To extract service
descriptions from WSDL files, we use the Java APIs for XML parsing.
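
As an illustration of this step, the following is a minimal sketch (not the project's actual code) of reading operation names from a WSDL document through a DOM tree with the standard JAXP API. The class name WsdlOperationExtractor is hypothetical, and the choice to read only the operations declared under WSDL 1.1 portType elements is an assumption made to avoid counting the same operation again in the binding section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts operation names from a WSDL 1.1 document.
final class WsdlOperationExtractor {
    // Namespace URI of WSDL 1.1 elements.
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    static List<String> operationNames(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));

            List<String> names = new ArrayList<>();
            NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
            for (int i = 0; i < portTypes.getLength(); i++) {
                Element portType = (Element) portTypes.item(i);
                NodeList ops = portType.getElementsByTagNameNS(WSDL_NS, "operation");
                for (int j = 0; j < ops.getLength(); j++) {
                    String name = ((Element) ops.item(j)).getAttribute("name");
                    // Keep each operation name once, in document order.
                    if (!name.isEmpty() && !names.contains(name)) {
                        names.add(name);
                    }
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }
    }
}
```

Passing the text of a WSDL file to operationNames returns the operation names in document order; these are the strings that the tokenizing step then splits into words.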

4.3.2 Tokenizing
The next step is the tokenizing of WSDL elements (in the case of template based search) or
of the words in the user query (in the case of keyword based search). During this step a
concatenated word such as 'SearchByAuthor' is split into the strings 'search', 'by' and
'author'.
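
The splitting described above can be sketched as follows. This is an illustrative version only: the class name CamelCaseTokenizer and the exact splitting rules (break on any non-alphanumeric character and at each lower-case to upper-case boundary) are assumptions, not the project's own Tokenization class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical tokenizer: splits identifiers and free text into lower-case words.
final class CamelCaseTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // First split on runs of characters that are not letters or digits
        // (spaces, underscores, punctuation).
        for (String chunk : text.split("[^A-Za-z0-9]+")) {
            // Then split where a lower-case letter or digit is followed by an
            // upper-case letter, e.g. "SearchByAuthor" -> Search | By | Author.
            for (String word : chunk.split("(?<=[a-z0-9])(?=[A-Z])")) {
                if (!word.isEmpty()) {
                    tokens.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }
}
```

With these rules, tokenize("SearchByAuthor") yields the three words search, by and author. This sketch does not remove common words such as 'by'; that filtering would be a separate step.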

4.3.3 Document database
We implement UDDI by creating a database. We represent each web service by
four documents: a document regarding the business, a document regarding the service, a
document regarding the operations and a document providing the description of the web
service. Hence we maintain four tables (business, service, operations and description),
which contain the index terms or keywords of the above mentioned documents respectively.
In addition to these four tables we have a separate table called web services, which
consists of a list of sample web services and the URLs of their respective WSDL files.

4.3.4 Ranking
The final step is to calculate the similarity between the query and the service
descriptions. With this similarity, we determine the rank of each service in the result
list. The similarity comparison is done using the vector space model and a TF-IDF
calculation. However, we do not use the original TF-IDF weight in our framework. We give a
weight to each tokenized word according to the position in which the word occurs. For
example, suppose a word can appear in either a name element or a description element. If
the word appears in a name field, it is considered more important than the same word in a
description field. So we calculate term frequencies according to their position as shown
below.

Here, Wqj is the weight of a word in the query vector, and Wid is the
weight of a word in a service description.
Document weight is calculated using:
Wid = ∑ tf * idf

Query weight is calculated using:
Wqj = ∑ (0.5 + (0.5 * tf)) * idf
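
The two weight formulas above can be sketched in Java as follows. Several details are assumptions made for illustration, because the text does not fix them: the field weights (1.0 for a name field, 0.5 for a description field), the use of idf = ln(N / df), and the normalization of the query term frequency by the most frequent query term. The class name PositionWeightedTfIdf is hypothetical.

```java
// Hypothetical sketch of position-weighted TF-IDF weights for one term.
final class PositionWeightedTfIdf {
    // Assumed field weights: a hit in a name field counts more than a hit
    // in a description field. The actual values are not given in the text.
    static final double NAME_WEIGHT = 1.0;
    static final double DESCRIPTION_WEIGHT = 0.5;

    // Assumed idf definition: natural log of (total documents / documents with the term).
    static double idf(int totalDocs, int docsWithTerm) {
        if (docsWithTerm <= 0) {
            return 0.0; // term never occurs, so it contributes nothing
        }
        return Math.log((double) totalDocs / docsWithTerm);
    }

    // Document weight: position-weighted term frequency times idf.
    static double documentWeight(int countInName, int countInDescription, double idf) {
        double tf = NAME_WEIGHT * countInName + DESCRIPTION_WEIGHT * countInDescription;
        return tf * idf;
    }

    // Query weight: (0.5 + 0.5 * tf) * idf, with tf normalized by the
    // count of the most frequent term in the query (an assumption).
    static double queryWeight(int countInQuery, int maxCountInQuery, double idf) {
        if (countInQuery <= 0 || maxCountInQuery <= 0) {
            return 0.0;
        }
        double tf = (double) countInQuery / maxCountInQuery;
        return (0.5 + 0.5 * tf) * idf;
    }
}
```

Under these assumptions, a term that occurs once in a name field and twice in a description field has a position-weighted frequency of 1.0 + 2 * 0.5 = 2.0, so its document weight is twice its idf.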

4.4 User Interface Design
4.4.1 Description of the User Interface
A form for the user to enter a keyword based query.
Radio buttons for selecting business, service or operations, enabling
granular search.
A form for the user to enter a template based query by specifying
a WSDL file.
A list of the operations in the specified WSDL file.
The name, URL and similarity percentage of the ranked query results.


4.4.2 Use Case Diagram

Keyword-based Search

The use case diagram above depicts the working of the system when the user enters a keyword
based query. The interactions happening between the actors are also depicted accordingly.

Actors
1. Client
2. Server

Description of Use Cases

1. Keyword-based Query
Here, the user enters the query in natural language. It produces only exact
matches for the query entered by the user.

2. Databases
Web service descriptors and WSDL entries of web services are stored in a
document database. These entries are considered during similarity calculation with
user queries.

3. Retrieved URLs
When a user enters a query in the form of keywords, the URLs of the WSDL files of
all relevant web services are displayed. The user is provided with an option
to select which URL he wants to use.

4. Search
Relevant web services for the user given keyword based query are searched.
Similarity calculation is done and results are displayed in descending order of similarity
percentage.


Template based Search

The use case diagrams above depict the basic working of the system when user enters
template based query. The interactions happening between the actors are also depicted
accordingly.
Actors
1. Client
2. Server

Description of Use Cases

1. Template
The user has to enter the WSDL file as a template based query in the form provided in
the search engine. The file is parsed here, and its contents are checked against the
indexes stored in the databases.

2. Parser
The specified WSDL file is parsed with an XML parser through the Java APIs.
The parser creates a DOM tree from the given WSDL file, which is used to
extract the operation names from the WSDL file.

3. Similarity Measures
Whenever a match is found, we use the vector space model to calculate the
similarity. We represent all the files and the query in the form of vectors and do the
TF-IDF calculations to get the similarity.

4. VSM
In a vector space, documents and queries are represented as vectors:
d_j = (w_{1,j}, w_{2,j}, ..., w_{t,j})
q = (w_{1,q}, w_{2,q}, ..., w_{t,q})
Each dimension corresponds to a separate term. If a term occurs in the document, its
value in the vector is non-zero. Several different ways of computing these values,
also known as (term) weights, have been developed. One of the best known schemes
is TF-IDF weighting, which we use to weight terms before computing the similarity.
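
To make the vector comparison concrete, the sketch below computes a similarity percentage between a query vector and a document vector. The text does not state which comparison is applied to the two vectors; cosine similarity is the usual choice for the vector space model and is assumed here. The class name CosineSimilarity is hypothetical.

```java
// Hypothetical sketch: cosine similarity between two term-weight vectors,
// expressed as a percentage between 0 and 100.
final class CosineSimilarity {
    static double percent(double[] query, double[] document) {
        if (query.length != document.length) {
            throw new IllegalArgumentException("Vectors must have the same number of terms");
        }
        double dot = 0.0;        // sum of products of matching term weights
        double queryNorm = 0.0;  // squared length of the query vector
        double docNorm = 0.0;    // squared length of the document vector
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * document[i];
            queryNorm += query[i] * query[i];
            docNorm += document[i] * document[i];
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // an all-zero vector shares no terms with anything
        }
        double cosine = dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
        // Clamp so that rounding error never pushes the result above 100.
        return Math.min(100.0, 100.0 * cosine);
    }
}
```

Two vectors that share no terms score 0, and two vectors pointing in the same direction score 100 regardless of their lengths, which is why a short query can still match a long service description fully.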

5. Ranked List
After the calculation of all the similarities, all the relevant web services are
arranged in the descending order of the similarity percentage. User is provided with
an option to select which URL he wants to use.

6. Retrieved URLs
When a user enters a query in the form of a WSDL template, the URLs of the WSDL
files of all relevant web services are displayed. The user is provided with an option to
select which URL he wants to use.


4.5 Test Plan

4.5.1 Features to Be Tested

Parsing module has to be tested whether the WSDL file
is being parsed correctly and the appropriate operation
names are extracted.

Tokenizing module has to be tested to make sure that
the queries are being tokenized correctly into separate
tokens.

Document database connection module has to be tested
whether the connection is established correctly and we
need to ensure that required data entries are extracted.

TF-IDF similarity module has to be tested to make sure
that the similarity between the user query and each web service is
calculated accurately in different cases.


CHAPTER 5
IMPLEMENTATION

5.1 Class Description
5.1.1 MainClass:
Extends: JFrame
Implements: ActionListener and TreeSelectionListener
Attributes:
bg : ButtonGroup
bname : String
browse : JButton
key_search : JButton
key_text : JTextField
n : int
op_text : JTextArea
rb1 : JRadioButton
rb2 : JRadioButton
rb3 : JRadioButton
result : String[][]
scrollText : JScrollPane
selected : String[]
table : JTable
temp_parse : JButton
temp_text : JTextField
top : DefaultMutableTreeNode
tree : JTree
tree_search : JButton
type : String


Operations:
MainClass(String): Constructor for the class, which passes
the name of the frame to the superclass JFrame
constructor.

actionPerformed(ActionEvent): Method of the ActionListener
interface, which must be implemented. It sets the
variable type to the component selected on the frame.

addComponentsToPane(Container): This method is used to
add the components to the frame, using a grid layout. It divides
the pane into 4 regions and adds the components related to
keyword search in region 1, the components related to parsing
in region 2, the components related to template search in
region 3 and the components related to the result in region 4.

valueChanged(TreeSelectionEvent): Method of the
TreeSelectionListener interface, which must be implemented. It
copies the selected nodes of the operations tree into the
string array selected.

5.1.2 BusinessSearch:
Package: search
Operation:
calculateWeight(String[]): Takes the tokenized
query words as input. Connects to the document
database and, for each entry, calculates the
weight of each token with respect to the
document by searching for the token in the business
names and description entries. It loops over the total
number of documents in the database and returns a float
array with the weights.


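5.1.3 OperationSearch:
Package: search
Operation:
calculateWeight(String[]): Takes the tokenized
query words as input. Connects to the document
database and, for each entry, calculates the
weight of each token with respect to the
document by searching for the token in the
operation names and description entries. It loops over the
total number of documents in the database and returns a
float array with the weights.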

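5.1.4 ServiceSearch:
Package: search
Operation:
calculateWeight(String[]): Takes the tokenized
query words as input. Connects to the document
database and, for each entry, calculates the
weight of each token with respect to the
document by searching for the token in the service
names and description entries. It loops over the total
number of documents in the database and returns a float
array with the weights.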

5.1.5 Tokenization:
Package: search
Operation:

tokenize(String): Takes a string as input. The
string is broken into tokens wherever a capital letter
is found or one of the special characters is
encountered. If a collected token is one of the
words "by", "to", "and", etc., then it is discarded.


5.1.6 SimilarityCalculator:
Package: search
Attribute:
similarity: float[][]
Operation:
calculateSimilarity(float[][], float[], int): Takes the
weighted array of the documents, the weighted array of
the query and the number of tokens as input. It
calculates the similarity of each document to the
query for the tokens. Care is taken that the similarity
never exceeds 100. It stores the similarities of the
documents in the float array similarity.

finalResult(): Searches for the top 3 documents with
non-zero similarity, connects to the document
database, retrieves the WSDL URL and the name of
each service, stores them in a string variable and
returns it.
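
The selection step of finalResult() can be sketched as below. This is only an illustration of picking the highest non-zero similarities. The class name TopRanked, the one-dimensional similarity array and the returned list of document indices are assumptions, and the database lookup of names and URLs is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: indices of the best-scoring documents, highest first.
final class TopRanked {
    static List<Integer> topNonZero(float[] similarity, int limit) {
        List<Integer> indices = new ArrayList<>();
        // Keep only documents whose similarity is above zero.
        for (int i = 0; i < similarity.length; i++) {
            if (similarity[i] > 0f) {
                indices.add(i);
            }
        }
        // Sort by descending similarity; the sort is stable, so documents
        // with equal similarity keep their original order.
        indices.sort((a, b) -> Float.compare(similarity[b], similarity[a]));
        // Return at most 'limit' entries (3 in this project).
        return new ArrayList<>(indices.subList(0, Math.min(limit, indices.size())));
    }
}
```

If fewer than three documents have a non-zero similarity, the returned list is simply shorter, so the result table may show fewer than three rows.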

5.1.7 KeyWordSearch:
Package: search
Implements: ActionListener

Operation:
actionPerformed(ActionEvent): Method of the
ActionListener interface, which must be implemented.
This method is invoked when the user enters the query
and presses the keyword search button. It takes the
query string, creates a Tokenization object and
tokenizes the string. It then calculates the TF-IDF of
the query using a QueryTFIDF object. Depending
on the radio button selected (business, operation or
service), it creates the corresponding object and
calculates the weights for the documents. Using the
query weight and the document weights, it calculates the
similarity using a SimilarityCalculator object,
computes the final result and updates the result
table.

5.1.8 TemplateSearch:
Package: search

Operation:
This method is invoked when the user selects nodes
of the operation tree and presses the template search
button. It takes the string selected, which contains
the selected tree nodes, creates a Tokenization
object and tokenizes the string. It then calculates the
TF-IDF of the query using a QueryTFIDF object,
creates an OperationSearch object and calculates the
weights for the documents. Using the query weight
and the document weights, it calculates the similarity using
a SimilarityCalculator object, computes the final
result and updates the result table.

5.1.9 OpenFile:
Package: search

Operation:
This method is invoked when the user presses the browse
button in the template search to select the WSDL
file. It creates a JFileChooser object and opens the
file selection window. After the user selects the file, its path is
extracted and set in the text field.

5.1.10 ParseWSDL:
Package: search

Operation:

This method is invoked when the user selects the
WSDL file and presses the parse button in the
template based search. It takes the string which is
the path of the WSDL file, creates a file reader
object and reads the contents into a single string.
That string is then parsed into a DOM document using
DocumentBuilderFactory. An XPath expression is
defined for the operations field of the WSDL file and
evaluated on the document, and the resulting list of
operations is stored in a NodeList. Using this list,
it creates the JTree and adds the nodes to it.

5.1.11 QueryTFIDF:
Package: search
Operation:
calculateWeight(String[]): Takes the query tokens
string as input and calculates the weight of the token
in query and returns it.

5.1.12 MyTableModel:
Package: table
Extends: AbstractTableModel

Attributes:
columnNames : String[]
data : Object[][]

Operations:
getColumnCount(): returns the number of columns in the
variable columnNames.

getColumnName(int): returns the string value in
columnNames array in the given index.

getRowCount(): returns number of rows in 2-D
array data.

getValueAt(int, int): returns the value in the table at
the given indices.

setValueAt(Object, int, int) sets the given value in
the table at the given indices.

5.1.13 BrowserLaunch:
Package: table
Attributes:
browsers : String[]
errMsg : String

Operations:
openURL(String): It is executed when the user selects
any row in the result table. The URL in the row is
extracted and given as input. It determines the
operating system of the running machine using
system properties, selects a browser from the
browsers string array and opens it with the given
URL.

5.2 Algorithms Used

5.2.1 Keyword Based Search Algorithm
//input: contents of the text field which holds the user query.
//output: displays a table with 3 rows containing the WSDL file URLs of the
//        top 3 services with non-zero similarity.
//description: it tokenizes the given query and finds the weight of the query,
//             searches the document database and finds the weight
//             of each document for the tokens, and finds the similarity of
//             each document to the query.
Step1: string input=getText(text_field)
Step2: string[] tokens=Tokenization.tokenize(input)
Step3: float[] queryweight=QueryTFIDF.calculateWeight(tokens)
Step4: if(buttonselected==Business)then
           float[][] documentweight=BusinessSearch.calculateWeight(tokens)
       else if(buttonselected==Service)then
           float[][] documentweight=ServiceSearch.calculateWeight(tokens)
       else if(buttonselected==Operation)then
           float[][] documentweight=OperationSearch.calculateWeight(tokens)
       {end of if else if}
Step5: SimilarityCalculator.calculateSimilarity(documentweight,queryweight,tokens.length)
Step6: string[][] result=SimilarityCalculator.finalResult()
Step7: for i=0 to number of rows in result
           for j=0 to number of columns in result
               resultTable.setValueAt(result[i][j],i,j)


5.2.2 Template Based Search Algorithm
//input: WSDL file as the user query.
//output: displays a table with 3 rows containing the WSDL file URLs of the
//        top 3 services with non-zero similarity.
//description: it parses the given WSDL file, extracts the operation
//             names, displays them in tree form and lets the user select nodes.
//             It tokenizes the selected nodes and finds the weight of the query,
//             searches the document database and finds the weight
//             of each document for the tokens, and finds the similarity of
//             each document to the query.

Step1: string input=getText(text_field)
Step2: string[] operations=parse((WSDL)input)
Step3: tree.nodes=operations
Step4: string[] selectednodes=tree.getselectednodes()
Step5: for each selectednodes[i]
           append Tokenization.tokenize(selectednodes[i]) to string[] tokens
Step6: float[] queryweight=QueryTFIDF.calculateWeight(tokens)
Step7: float[][] documentweight=OperationSearch.calculateWeight(tokens)
Step8: SimilarityCalculator.calculateSimilarity(documentweight,queryweight,tokens.length)
Step9: string[][] result=SimilarityCalculator.finalResult()
Step10: for i=0 to number of rows in result
            for j=0 to number of columns in result
                resultTable.setValueAt(result[i][j],i,j)


CHAPTER 6
TESTING

6.1 Introduction

6.1.1 System Overview
Our system is a search engine which takes either keyword based
or template based queries from the user and produces a ranked list of relevant web
services arranged in decreasing order of similarity percentage.

6.1.2 Test Approach
User can provide keyword based query as per
Business
Service
Operation
User can also provide template based query by specifying a
WSDL file. The search engine is expected to produce a ranked list of web services
with their similarity percentage and URLs of corresponding WSDL files.

6.2 Test Cases
6.2.1 Business Search
6.2.1.1 Purpose
To verify whether the retrieval of web services as per
business is accurate
6.2.1.2 Inputs
A keyword based query on businesses or companies which
provide web services.
6.2.1.3 Expected Outputs
Output consists of a set of web services which are ranked
and listed in the decreasing order of their similarity with the respective
URLs of WSDL files.


6.2.1.4 Test Procedure
User specified query will be tokenized and TF-IDF
calculations for each term are calculated for each web service by referring
business table in the document database. The description table entries are
also considered for TF-IDF but less weightage is given compared to
business table entries. Finally we calculate similarity percentage for each
web service with respect to the user query and rank them.
6.2.1.5 Test Results
Result consists of a ranked list of web services according to
the business query specified by user with correctly calculated similarity
percentages.

6.2.2 Service Search
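6.2.2.1 Purpose
To verify whether the retrieval of web services as per
service description is accurate.
6.2.2.2 Inputs
A keyword based query on service expected by the user.
6.2.2.3 Expected Outputs
Output consists of a set of web services which are ranked
and listed in the decreasing order of their similarity with the respective
URLs of WSDL files.
6.2.2.4 Test Procedure
User specified query will be tokenized and TF-IDF
calculations for each term are calculated for each web service by referring
service table in the document database. The description table entries are
also considered for TF-IDF but less weightage is given compared to
service table entries. Finally we calculate similarity percentage for each
web service with respect to the user query and rank them.
6.2.2.5 Test Results
Result consists of a ranked list of web services according to
the service description specified by user with correctly calculated
similarity percentages.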

6.2.3 Operation Search
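6.2.3.1 Purpose
To verify whether the retrieval of web services as per
operation description is accurate.
6.2.3.2 Inputs
A keyword based query on operations expected by the user.
6.2.3.3 Expected Outputs
Output consists of a set of web services which are ranked
and listed in the decreasing order of their similarity with the respective
URLs of WSDL files.
6.2.3.4 Test Procedure
User specified query will be tokenized and TF-IDF
calculations for each term are calculated for each web service by referring
operations table in the document database. The description table entries are
also considered for TF-IDF but less weightage is given compared to
operations table entries. Finally we calculate similarity percentage for each
web service with respect to the user query and rank them.
6.2.3.5 Test Results
Result consists of a ranked list of web services according to
the operation description specified by user with correctly calculated
similarity percentages.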

6.2.4 Template based Search
6.2.4.1 Purpose
To verify whether the retrieval of web services as per the
WSDL file specified by the user is accurate.
6.2.4.2 Inputs
A template based query, given by providing a WSDL file.
6.2.4.3 Expected Outputs
Output consists of a set of query relevant web services
arranged in the decreasing order of their similarity with the respective
URLs of WSDL files.
6.2.4.4 Test Procedure
User specified WSDL file will be parsed and operation
names will be extracted from the WSDL file. User can select desired
operations from the list of available operations in WSDL file. The selected
operation names will be tokenized and TF-IDF calculations for each term
are calculated for each web service by referring operations table in the
document database. The description table entries are also considered for
TF-IDF but less weightage is given compared to operations table entries.
Finally we calculate similarity percentage for each web service with
respect to the user query and rank them.
6.2.4.5 Test Results
Result consists of a ranked list of web services whose
WSDL files have similar operations compared to the user specified WSDL
file, with correctly calculated similarity percentages.

6.3 Experimental Study
The experiments were performed on a Pentium IV 2.7 GHz Windows machine with
1 GB RAM, and the code was implemented in Java 1.6.0.
Number of web services: 20
Number of operations: 42
Document database size: 1.34MB

The experiments were performed on the following 5 queries:
1. Area of square
2. Search book by author name
3. Air routes between two cities
4. Car price by name
5. Country by its capital city

[Graph 1: chart of execution time (y-axis, 0 to 0.8) for queries 1 to 5 (x-axis);
series: Keyword Query, Template Query]

Graph 1. Execution time for keyword and template searches

Template based queries take more time to execute than keyword based queries, as
more time is required for parsing and extracting useful data from the WSDL file.
This is shown in Graph 1.

[Graph 2: chart of precision percentage (y-axis, 0 to 100) for queries 1 to 5 (x-axis);
series: Keyword Queries]

Graph 2. Precision percentages of keyword based query searches

[Graph 3: chart of precision percentage (y-axis, 0 to 100) for queries 1 to 5 (x-axis);
series: Template Queries]

Graph 3. Precision percentages of template based query searches

Template based queries produce more precise results than keyword based
queries, as we directly match the WSDL files of web services. This is reflected in
Graphs 2 and 3, which show the precision percentages for keyword based and
template based queries respectively.


CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENT

In this project we have proposed a new form of query called the template based
query, which narrows the search and produces fine tuned results. We also improve the
conventional keyword based search by providing granular search, where the user can search
for a particular business, operation or service instead of searching vaguely over the
entire web service.
Template based queries take more time to execute than keyword based queries, as
more time is required for parsing and extracting useful data from the WSDL file. However,
template based queries produce more precise results than keyword based queries,
as we directly match the WSDL files of web services.

As a future enhancement we can perform service composition, wherein the user can
combine two or more web services to create a new web service as per his requirement.
One way to do this is to select all those services whose number and type of parameters
match those of an existing web service, so that we can combine the services to form new
web services.


CHAPTER 8

BIBLIOGRAPHY AND REFERENCES

[1] Kyong-Ha Lee, Mi-young Lee, Yun-Young Hwang and Kyu-Chul Lee, "A Framework for
XML Web Services Retrieval with Ranking", IEEE, 2007. Department of Computer
Engineering, Chungnam National University, Daejeon, 305-764, Korea.

[2] Christian Platzer, Schahram Dustdar, "A Vector space search engine for Web
services", In Proceedings of 3rd European Conference on Web Services, 2005

[3] Natallia Kokash, "A Comparison of web service interface similarity measures",
Technical Report DIT-06-025, University of Trento, 2006

[4] Dunglu Peng, et al., "Concept-based retrieval of alternate Web services", In
Proceedings of DASFAA 2005, LNCS V.3453, pp. 359-371, 2005

[5] Jong P. Yoon, et al., "BitCube: A Three-dimensional Bitmap Indexing for XML
Documents", Journal of Intelligent Information Systems Vol. 17 pp. 241-252, 2001

[6] W3 schools-www.w3c.org

[7] Wikipedia-web services, information retrieval.


SCREEN SNAPSHOTS

1. Keyword Search


2. Template search
