Software Requirements Specification
for
Record Matching over Query Results from Multiple Web Databases

Prepared by
Frederick H. Lochovsky
Pelican Infotech

Submitted in partial fulfillment of the requirements of
Mining sequential patterns matching over high utility data sets


Table of Contents
Introduction
    Purpose
    Document Conventions
    Intended Audience and Reading Suggestions
    Product Scope
    References
Overall Description
    Product Perspective
    Product Functions
    User Classes and Characteristics
    Operating Environment
    Design and Implementation Constraints
    User Documentation
    Assumptions and Dependencies
External Interface Requirements
    User Interfaces
    Hardware Interfaces
    Software Interfaces
    Communications Interfaces
System Features
Other Nonfunctional Requirements
    Performance Requirements
    Safety Requirements
    Security Requirements
    Software Quality Attributes
    Business Rules
Other Requirements



Revision History
Name                                Date                 Reason For Changes                                                                     Version
Introduction
Purpose

   This Software Requirements Specification provides a complete description of all the
functions and specifications of the Frederick H. Lochovsky project on mining sequential
patterns matching over high utility data sets.



Document Conventions

Though this document is intended as a set of requirements, and not a design document,
technical information has been included wherever it was deemed appropriate.

Priority for all functionality is assumed to be equal except where noted.



Intended Audience and Reading Suggestions
The primary audience for this document is the development team. The secondary audience is the
Pelican Infotech project management team.


Product Scope
In the Web database scenario, the records to match are query-dependent, and a pre-learned
method using training examples from previous query results may fail on the results of a new
query. To address the problem of record matching in the Web database scenario, we present an
unsupervised, online record matching method,



                            Ambit lick Solutions
              Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com
UDD, which, for a given query, can effectively identify duplicates from the query result

records of multiple Web databases.
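
To make the idea of finding duplicates across the result lists of two Web databases concrete, here is a minimal Java sketch. It is not the UDD method: it substitutes a plain token-based Jaccard similarity with a fixed threshold, and the class name DuplicateSketch, its method names, and the threshold parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: pairs up likely duplicates across two query result lists. */
final class DuplicateSketch {

    /** Splits a record into a set of lower-case alphanumeric tokens. */
    static Set<String> tokens(String record) {
        Set<String> out = new HashSet<>();
        for (String t : record.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty records are treated as identical
        }
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) inter.size() / union.size();
    }

    /** Returns index pairs {i, j} whose similarity reaches the (assumed) threshold. */
    static List<int[]> findDuplicates(List<String> fromDbA, List<String> fromDbB, double threshold) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < fromDbA.size(); i++) {
            for (int j = 0; j < fromDbB.size(); j++) {
                if (jaccard(fromDbA.get(i), fromDbB.get(j)) >= threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

Calling DuplicateSketch.findDuplicates(resultsA, resultsB, 0.8) compares every record of one list with every record of the other, which is quadratic in the list sizes; that is acceptable for the small result pages of a single query but would need blocking for larger inputs.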




References
The following references are relevant to the project and can be consulted for a more
detailed view of the technologies and standards being used in this project:



       1. Eliminating Fuzzy Duplicates in Data Warehouses

              R. Ananthakrishna, S. Chaudhuri, and V. Ganti

       2. A Comparison of Fast Blocking Methods for Record Linkage

              R. Baxter, P. Christen, and T. Churches

       3. Robust Identification of Fuzzy Duplicates

              S. Chaudhuri, V. Ganti, and R. Motwani




Overall Description
Product Perspective




   • False data can reveal the actions taken when unauthorized users attempted to access
       computer systems or authorized users attempted to misuse their privileges.

   • Association rule mining

   • An algorithm based on sequential pattern mining using the same data collected by the
       databases.


Product Functions

The product shall allow users to:

   •   Install and set up an issue tracking database
   •   Define the formats of acceptable issues
   •   File preformatted reports in a database
   •   Submit issues to a database
   •   Query the database in a number of ways
   •   Edit issues in the database and resubmit them
   •   Merge multiple issues into a single issue
   •   Relate issues to each other in a hierarchical form
   •   Assemble groups of related issues into a document




User Classes and Characteristics

Individual Local Developers. Individual developers should be able to submit issues, edit
issues, and perform queries on the database to discover what issues are relevant to them,
which issues are open (in the case of issues to which that is relevant, such as defect reports or
unsatisfied requirements), etc. These individual developers are assumed to have some
knowledge of the development environment and are familiar and comfortable with basic
software tools such as text editors etc. As a result, the individual developer tools will be the
most "primitive" but also the most efficient for use, probably implemented as text-based
command line tools. Since Network simulation is primarily intended as an easy-to-use, free
tool for individual developers and small teams, this is the most critical user class to satisfy.
The tools must be relatively easy to use, and extremely easy to set up.

Local Issue Managers. Issue managers -- those responsible for keeping track of open issues,
etc. -- must have tools capable of querying the database and relating issues to developers. The
tools used for issue managers and individual developers will be very similar, as they will be
doing similar tasks -- querying the database for open issues, assigning people to issues as
appropriate, recategorizing issues or merging/splitting them, etc. However, issue managers
may not be as comfortable with "primitive" tools as individual developers, so some thought
will be given to more "scripted" or directive tools, possibly involving simple GUI elements.
However, the bulk of user-interface issues will be placed on the next user class, remote users.

Remote Users. If Network simulation is used as a defect management system, then remote
users (users of software packages submitting reports to a Network simulation center) will
constitute the bulk of submissions. If Network simulation is to be used in this way, it must
cater to the needs of these users, who will have much lower skills and will require very
simple, easy-to-use interfaces. Primarily these interfaces will focus on problem submission,
but they will also allow some ability to query the database, etc.




Operating Environment

In a computer, the operating environment includes physical factors such as temperature that
affect circuitry; but in particular the term is often used to describe the non-physical
environment in which software runs. This may apply to application software with which users
interact, comprising the "look and feel" of the system, its appearance and the things that
have to be done to achieve desired results. The term may also apply to system software; e.g.,
software designed for a Unix environment will do things differently than in a Microsoft
Windows environment. Some operating environments for programming purposes are referred to as
programming environments; e.g., the "UNIX programming environment" for a Unix shell with its
look and feel and functionality.

"Operating environment" is not the totality of the functionality and appearance of an
operating system.




Design and Implementation Constraints




1 Architecture

[Figure: architecture flow diagrams. First diagram: Applying the Mining Tool; Using Algorithms;
Mining the Data; Check the customer using RFC model; Analyze the Business. Second diagram:
Customer DB; Cluster formation; Check the user (max: High profit, gold customer; min: Low
profit); Start the mining; Store & manage; Analyze.]

User Documentation
none




Assumptions and Dependencies
Databases are defined in ARFF format, as in the following example:


@relation 'cpu'
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
@attribute class real
@data


125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
29,8000,32000,32,8,32,253
29,8000,32000,32,8,32,253
29,8000,16000,32,8,16,132
26,8000,32000,64,8,32,290
23,16000,32000,64,16,32,381
23,16000,32000,64,16,32,381
23,16000,64000,64,16,32,749
23,32000,64000,128,32,64,1238
400,1000,3000,0,1,2,23
400,512,3500,4,1,6,24
60,2000,8000,65,1,8,70
50,4000,16000,65,1,8,117
350,64,64,0,1,4,15
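
The layout of the listing above (header lines starting with @, then comma-separated numeric rows after @data) can be read with a few lines of plain Java. The following is only a sketch under stated assumptions: every attribute is numeric, there are no quoted or missing values, and the class name ArffSketch with its fields is hypothetical. It is not Weka's own loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: reads attribute names and numeric rows from a simple ARFF text. */
final class ArffSketch {
    final List<String> attributes = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comment lines
            }
            if (!inData) {
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("@attribute")) {
                    // "@attribute NAME TYPE": keep only the name
                    result.attributes.add(line.split("\\s+")[1]);
                } else if (lower.startsWith("@data")) {
                    inData = true;
                }
                continue; // "@relation" and any other header line is ignored here
            }
            // After @data, each line is one instance with comma-separated values.
            String[] parts = line.split(",");
            double[] values = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                values[i] = Double.parseDouble(parts[i].trim());
            }
            result.rows.add(values);
        }
        return result;
    }
}
```

With the 'cpu' relation above, ArffSketch.parse(text) would yield seven attribute names and one double[] of length seven per data line.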



External Interface Requirements
User Interfaces

      • Artificial neural networks: Non-linear predictive models that learn through
        training and resemble biological neural networks in structure.
      • Decision trees: Tree-shaped structures that represent sets of decisions. These
        decisions generate rules for the classification of a dataset. Specific decision tree
        methods include Classification and Regression Trees (CART) and Chi Square
        Automatic Interaction Detection (CHAID).
      • Genetic algorithms: Optimization techniques that use processes such as genetic
        combination, mutation, and natural selection in a design based on the concepts of
        evolution.
      • Nearest neighbor method: A technique that classifies each record in a dataset
        based on a combination of the classes of the k record(s) most similar to it in a
        historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
      • Rule induction: The extraction of useful if-then rules from data based on
        statistical significance.
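
Of the techniques listed above, the nearest neighbor method is the simplest to show in code. The sketch below is a minimal illustration, assuming numeric attributes, Euclidean distance, and a majority vote among the k closest historical records; the class name KnnSketch and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: k-nearest neighbor classification by majority vote. */
final class KnnSketch {

    /** Euclidean distance between two records of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the most common label among the k historical records closest to the query. */
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort record indices by distance to the query, nearest first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(points[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) { // on a tie, the label that reached the count first wins
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }
}
```

Because the whole historical dataset is scanned and sorted for each query, this form suits small datasets only; the attributes should also be on comparable scales for the distance to be meaningful.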




Hardware Interfaces

     Hardware Specification


                     Processor Type          : Pentium III
                     Speed                   : 1.6 GHz
                     RAM                     : 128 MB
                     Hard disk               : 8 GB

Software Interfaces

       Java began as a client-side, platform-independent programming language that enabled
stand-alone Java applications and applets. The numerous benefits of Java resulted in an
explosion in the usage of Java in back-end, server-side enterprise systems. The Java
Development Kit (JDK), which was the original standard platform defined by Sun, was soon
supplemented by a collection of enterprise APIs. The proliferation of enterprise APIs, often
developed by several different groups, resulted in divergence of APIs and caused concern
among the Java developer community.
       Java byte code can execute on the server instead of or in addition to the client,
enabling you to build traditional client/server applications and modern thin-client Web
applications. Two key server-side Java technologies are servlets and JavaServer Pages.
Servlets are protocol- and platform-independent server-side components which extend the
functionality of a Web server. JavaServer Pages (JSPs) extend the functionality of servlets by
allowing Java servlet code to be embedded in an HTML file.

Features of Java

       •   Platform Independence
           •    The Write-Once-Run-Anywhere ideal has not been fully achieved (tuning for
                different platforms is usually required), but Java comes closer than other
                languages.
       •   Object Oriented
           •    Object oriented throughout - no coding outside of class definitions, including
                main().
           •    An extensive class library is available in the core language packages.
       •   Compiler/Interpreter Combo
           •    Code is compiled to byte codes that are interpreted by a Java virtual machine
                (JVM).
           •    This provides portability to any machine for which a virtual machine has been
                written.
           •    The two steps of compilation and interpretation allow for extensive code
                checking and improved security.
       •   Robust
           •    Exception handling is built in, type checking is strong (that is, all data must
                be declared with an explicit type), and local variables must be initialized.
       •   Several dangerous features of C & C++ eliminated:
           •    No memory pointers
           •    No preprocessor
           •    Array index limit checking
       •   Automatic Memory Management
           •    Automatic garbage collection - memory management is handled by the JVM.
       •   Security
           •    No memory pointers
           •    Programs run inside the virtual machine sandbox.
           •    Array index limit checking
           •    Code pathologies reduced by:
                •    Byte code verifier - checks classes after loading.
                •    Class loader - confines objects to unique namespaces. Prevents loading a
                     hacked "java.lang.SecurityManager" class, for example.
                •    Security manager - determines what resources a class can access, such as
                     reading and writing to the local disk.
       •   Dynamic Binding
           •    The linking of data and methods to where they are located is done at run-time.
           •    New classes can be loaded while a program is running. Linking is done on the
                fly.
           •    Even if libraries are recompiled, there is no need to recompile code that uses
                classes in those libraries. This differs from C++, which uses static binding.
                Static binding can result in fragile classes for cases where linked code is
                changed and memory pointers then point to the wrong addresses.
       •   Good Performance
           •    Interpretation of byte codes slowed performance in early versions, but
                advanced virtual machines with adaptive and just-in-time compilation and
                other techniques now typically provide 50% to 100% of the speed of C++
                programs.
       •   Threading
           •    Lightweight processes, called threads, can easily be spun off to run tasks
                concurrently.
           •    Can take advantage of multiprocessors where available.
           •    Great for multimedia displays.
       •   Built-in Networking
           •    Java was designed with networking in mind and comes with many classes to
                develop sophisticated Internet communications.
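
Three of the features listed above (array index checking with exception handling, threading, and run-time class loading) can be seen together in a short, self-contained sketch. The class name JavaFeaturesSketch and its method names are hypothetical and chosen only for illustration; the library calls used (Thread, AtomicLong, Class.forName) are standard Java.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: small demonstrations of three language features. */
final class JavaFeaturesSketch {

    /** Robustness: an out-of-range index raises an exception instead of reading stray memory. */
    static int getOrDefault(int[] values, int index, int fallback) {
        try {
            return values[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;
        }
    }

    /** Threading: sums the two halves of an array on separate threads. */
    static long parallelSum(int[] values) {
        AtomicLong total = new AtomicLong();
        int mid = values.length / 2;
        Thread left = new Thread(() -> total.addAndGet(sum(values, 0, mid)));
        Thread right = new Thread(() -> total.addAndGet(sum(values, mid, values.length)));
        left.start();
        right.start();
        try {
            left.join();  // wait for both workers before reading the total
            right.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total.get();
    }

    private static long sum(int[] values, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) {
            s += values[i];
        }
        return s;
    }

    /** Dynamic binding: a class is located and loaded by name at run time. */
    static String simpleNameOf(String className) {
        try {
            return Class.forName(className).getSimpleName();
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown class: " + className, e);
        }
    }
}
```

For an array this small, two threads are slower than a plain loop; the point is only to show how threads are started and joined.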




Communications Interfaces

ECLIPSE
                 Eclipse is an open-source software framework written primarily in Java. The
initial codebase originated from VisualAge. In its default form it is an Integrated
Development Environment (IDE) for Java developers, consisting of the Java Development
Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse
software framework, such as development toolkits for other programming languages, and can
write and contribute their own plug-in modules. Language packs provide translations into over
a dozen natural languages.


4.1.1 ARCHITECTURE:

                The basis for Eclipse is the Rich Client Platform (RCP). The following
components constitute the rich client platform:
           •    OSGi - a standard bundling framework
           •    Core platform - boot Eclipse, run plug-ins
           •    The Standard Widget Toolkit (SWT) - a portable widget toolkit
           •    JFace - viewer classes to bring model view controller programming to SWT,
                file buffers, text handling, and text editors
           •    The Eclipse Workbench - views, editors, perspectives, wizards

                Eclipse's widgets are implemented by a widget toolkit for Java called SWT,
unlike most Java applications, which use the Java standard Abstract Window Toolkit (AWT)
or Swing. Eclipse's user interface also leverages an intermediate GUI layer called JFace,
which simplifies the construction of applications based on SWT.

                Eclipse employs plug-ins in order to provide all of its functionality on top
of (and including) the rich client platform, in contrast to some other applications where
functionality is typically hard coded. This plug-in mechanism is a lightweight software
componentry framework. In addition to allowing Eclipse to be extended using other programming
languages such as C and Python, the plug-in framework allows Eclipse to work with typesetting
languages like LaTeX, networking applications such as telnet, and database management
systems. The plug-in architecture supports writing any desired extension to the environment,
such as for configuration management. Java and CVS support is provided in the Eclipse SDK.

                The key to the seamless integration of tools with Eclipse is the plug-in.
With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means
that a plug-in you develop integrates with Eclipse in exactly the same way as other plug-ins;
in this respect, all features are created equal. Eclipse provides plug-ins for a wide variety
of features, some of which come from third parties using both free and commercial models.
Examples include a UML plug-in for sequence and other UML diagrams and a plug-in for a
database explorer.

                The Eclipse SDK includes the Eclipse Java Development Tools, offering an IDE
with a built-in incremental Java compiler and a full model of the Java source files. This
allows for advanced refactoring techniques and code analysis. The IDE also makes use of a
workspace, in this case a set of metadata over a flat filespace, allowing external file
modifications as long as the corresponding workspace "resource" is refreshed afterwards. The
Visual Editor project allows interfaces to be created interactively, hence allowing Eclipse
to be used as a RAD tool.


4.1.2 HISTORY

Eclipse began as an IBM Canada project. It was developed by OTI (Object Technology
International) as a replacement for VisualAge, which itself had been developed by OTI. In
November 2001, a consortium was formed to further the development of Eclipse as open
source. In 2003, the Eclipse Foundation was created.
Eclipse 3.0 (released on June 21, 2004) selected the OSGi Service Platform specifications as
the runtime architecture.
Eclipse was originally released under the Common Public License, but was later re-licensed
under the Eclipse Public License. The Free Software Foundation has said that both licenses
are free software licenses, but are incompatible with the GNU General Public License (GPL).
Mike Milinkovich of the Eclipse Foundation has commented that moving to the GPL will be
considered when version 3 of the GPL is released.




4.1.3 MYECLIPSE:

       MyEclipse is a commercially available Enterprise Java and AJAX IDE created and
maintained by the company Genuitec, a founding member of the Eclipse Foundation.
MyEclipse is built upon the Eclipse platform, and integrates both proprietary and open source
solutions into the development environment.
        MyEclipse has two primary versions: a professional and a standard edition. The
standard edition adds database tools, a visual web designer, persistence tools, Spring tools,
Struts and JSF tooling, and a number of other features to the basic Eclipse Java Developer
profile. It competes with the Web Tools Project, which is a part of Eclipse itself, but
MyEclipse is a separate project entirely and offers a different feature set. Most recently,
MyEclipse has been made available via Pulse, a provisioning tool that maintains Eclipse
software profiles, including those that use MyEclipse.


System Features
Embedding Data into Weka Data mining tool


       Weka (Waikato Environment for Knowledge Analysis) is a Java-based data mining
tool developed at the University of Waikato. After the dataset is loaded, the preprocess
function of Weka allows the user to filter out undesired attributes to prevent them from
affecting the quality of the extracted knowledge. Next, the user can mine the data with one
of three kinds of algorithm: classification, clustering, and association rules.
       Data mining plays a key role in most enterprises, which have to analyse great
amounts of data in order to achieve higher profits. Nevertheless, due to the large datasets
involved in this process, the data mining field must face some technological challenges. Grid
computing takes advantage of the low-load periods of all the computers connected to a
network, making resource and data sharing possible. Providing Grid services constitutes a
flexible manner of tackling the data mining needs. This paper shows the adaptation of Weka, a
widely used data mining tool, to a grid infrastructure.


         Classifiers in WEKA are models for predicting nominal or numeric quantities.
Implemented learning schemes include decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons, logistic regression, and Bayes’ nets.
“Meta”-classifiers include bagging, boosting, stacking, error-correcting output codes, and
locally weighted learning.


         WEKA contains “clusterers” for finding groups of similar instances in a dataset.
Implemented schemes are k-Means, EM, Cobweb, and FarthestFirst. Clusters can be visualized
and compared to “true” clusters (if given). Evaluation is based on log-likelihood if the
clustering scheme produces a probability distribution.


       Suppose you have some data and you want to build a decision tree from it. A common
situation is for the data to be stored in a spreadsheet or database. However, Weka expects it to
be in ARFF format, introduced in Section 2.4, because it is necessary to have type information
about each attribute which cannot be automatically deduced from the attribute values. Before
you can apply any algorithm to your data, it must be converted to ARFF form. This can be
done very easily. Recall that the bulk of an ARFF file consists of a list of all the instances,
with the attribute values for each instance being separated by commas (Figure 2.2). Most
spreadsheet and database programs allow you to export your data into a file in comma-separated
format, that is, as a list of records where the items are separated by commas.


       Once this has been done, you need only load the file into a text editor or a word
processor; add the dataset’s name using the @relation tag, the attribute information using
@attribute, and a @data line; save the file as raw text—and you’re done! In the following
example we assume that your data is stored in a Microsoft Excel spreadsheet, and you’re
using Microsoft Word for text processing. Of course, the process of converting data into
ARFF format is very similar for other software packages. Figure 8.1a shows an Excel
spreadsheet containing the weather data. It is easy to save this data in comma-separated
format. First, select the Save As… item from the File pull-down menu. Then, in the ensuing
dialog box, select CSV. Now load this file into Microsoft Word.


       The rows of the original spreadsheet have been converted into lines of text, and the
elements are separated from each other by commas. All you have to do is convert the first
line, which holds the attribute names, into the header structure that makes up the beginning of
an ARFF file. In the result, the dataset’s name is introduced by a @relation tag, and the
names, types, and values of each attribute are defined by @attribute tags. The data section of
the ARFF file begins with a @data tag. Once the structure of your dataset matches, you
should save it as a text file.
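
The manual conversion described above (turn the first CSV line into @relation and @attribute header lines, add @data, keep the rows) can also be sketched in plain Java. This is an illustration under assumptions, not a Weka utility: the class name CsvToArffSketch is hypothetical, fields must not contain quoted commas or missing values, a column is declared numeric only if every value parses as a number, and any other column becomes a nominal attribute listing its distinct values in order of first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: converts simple comma-separated text into ARFF text. */
final class CsvToArffSketch {

    static String convert(String relation, String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] names = lines[0].trim().split(",");

        // For each column: the distinct values seen, and whether all of them are numbers.
        List<Set<String>> seen = new ArrayList<>();
        boolean[] numeric = new boolean[names.length];
        for (int c = 0; c < names.length; c++) {
            seen.add(new LinkedHashSet<>());
            numeric[c] = true;
        }
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].trim().split(",");
            for (int c = 0; c < names.length; c++) {
                String cell = cells[c].trim();
                seen.get(c).add(cell);
                if (!isNumber(cell)) {
                    numeric[c] = false;
                }
            }
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int c = 0; c < names.length; c++) {
            out.append("@attribute ").append(names[c].trim()).append(' ');
            out.append(numeric[c] ? "numeric" : "{" + String.join(",", seen.get(c)) + "}");
            out.append('\n');
        }
        out.append("\n@data\n");
        for (int r = 1; r < lines.length; r++) {
            out.append(lines[r].trim()).append('\n'); // data rows are copied unchanged
        }
        return out.toString();
    }

    /** Simplification: anything Double.parseDouble accepts counts as a number. */
    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

The returned string would then be written to a file such as weather.arff. Nominal values that contain spaces or commas would need quoting, which this sketch does not attempt.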


        Choose Save as… from the File menu, and specify Text Only with Line Breaks as the
file type by using the corresponding popup menu. Enter a file name, and press the Save
button. We suggest that you rename the file to weather.arff to indicate that it is in ARFF
format. Note that the classification schemes in Weka assume by default that the class is the
last attribute in the ARFF file, which fortunately it is in this case. (We explain in Section 8.3
below how to override this default.) Now you can start analyzing this data using the
algorithms provided. In the following we assume that you have downloaded Weka to your
system, and that your Java environment knows where to find the library. (More information
on how to do this can be found at the Weka Web site.) To see what the C4.5 decision tree
learner described in Section 6.1 does with this dataset, we use the J4.8 algorithm, which is
Weka’s implementation of this decision tree learner. (J4.8 actually implements a later and
slightly improved version called C4.5 Revision 8, which was the last public version of this
family of algorithms before C5.0, a commercial implementation, was released.) Type java
weka.classifiers.j48.J48 -t weather.arff at the command line.


        This incantation calls the Java virtual machine and instructs it to execute the J48
algorithm from the j48 package, a subpackage of classifiers, which is part of the overall
weka package. Weka is organized in “packages” that correspond to a directory hierarchy.
We’ll give more details of the package structure in the next section: in this case, the
subpackage name is j48 and the program to be executed from it is called J48. The -t option
informs the algorithm that the next argument is the name of the training file. After pressing
Return, you’ll see the output.


5.2.2.1 The weka.core package
       The core package is central to the Weka system. It contains classes that are accessed
from almost every other class. You can find out what they are by clicking on the hyperlink
underlying weka.core, which brings up the package's documentation page. The Web page is
divided into two parts: the Interface Index and the Class Index. The latter is a list of all
classes contained within the package, while the former lists all the interfaces it provides.
An interface is very similar to a class, the only difference being that it doesn’t actually do
anything by itself: it is merely a list of methods without actual implementations. Other
classes can declare that they “implement” a particular interface, and then provide code for
its methods. For example, the OptionHandler interface defines those methods that are
implemented by all classes that can process command-line options, including all classifiers.
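
The relationship between an interface and a class that implements it can be shown in a few lines. The interface below is a hypothetical, simplified stand-in written for this illustration; it is not the actual weka.core.OptionHandler declaration, and the names SimpleOptionHandler and TrainingFileOption are assumptions.

```java
/** Hypothetical, simplified stand-in for an option-handling interface: methods only, no code. */
interface SimpleOptionHandler {
    void setOptions(String[] options);

    String[] getOptions();
}

/** A class that "implements" the interface by supplying code for its methods. */
final class TrainingFileOption implements SimpleOptionHandler {
    private String trainingFile = "";

    @Override
    public void setOptions(String[] options) {
        // Look for "-t <file>", mirroring the -t option used on the command line.
        for (int i = 0; i + 1 < options.length; i++) {
            if (options[i].equals("-t")) {
                trainingFile = options[i + 1];
            }
        }
    }

    @Override
    public String[] getOptions() {
        return trainingFile.isEmpty() ? new String[0] : new String[] {"-t", trainingFile};
    }
}
```

Code that holds a SimpleOptionHandler reference can call setOptions() without knowing which concrete class is behind it, which is what lets one command-line front end configure many different schemes.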


       The key classes in the core package are called Attribute, Instance, and Instances. An
object of class Attribute represents an attribute. It contains the attribute’s name, its type and,
in the case of a nominal attribute, its possible values. An object of class Instance contains the
attribute values of a particular instance; and an object of class Instances holds an ordered set
of instances, in other words, a dataset. By clicking on the hyperlinks underlying the classes,
you can find out more about them. However, you need not know the details just to use Weka
from the command line. We will return to these classes in Section 8.4 when we discuss how to
access the machine learning routines from other Java code. Clicking on the All Packages
hyperlink in the upper left corner of any documentation page brings you back to the listing of
all the packages in Weka.
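To make the roles of the three key classes concrete, here is a small self-contained sketch. It does not use the Weka library; AttributeSketch, InstanceSketch, and InstancesSketch are hypothetical stand-ins, and storing a nominal value as an index into the attribute's value list is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an attribute: a name plus, for a nominal
// attribute, its possible values (an empty list means numeric).
class AttributeSketch {
    final String name;
    final List<String> nominalValues;

    AttributeSketch(String name, String... nominalValues) {
        this.name = name;
        this.nominalValues = Arrays.asList(nominalValues);
    }

    boolean isNominal() {
        return !nominalValues.isEmpty();
    }
}

// One instance: the attribute values of a single example.
// Nominal values are stored as indices into the attribute's value list.
class InstanceSketch {
    final double[] values;

    InstanceSketch(double... values) {
        this.values = values;
    }
}

// A dataset: the attribute declarations plus an ordered list of instances.
class InstancesSketch {
    final List<AttributeSketch> attributes;
    final List<InstanceSketch> rows = new ArrayList<>();

    InstancesSketch(List<AttributeSketch> attributes) {
        this.attributes = attributes;
    }

    void add(InstanceSketch instance) {
        if (instance.values.length != attributes.size()) {
            throw new IllegalArgumentException("wrong number of attribute values");
        }
        rows.add(instance);
    }

    int numInstances() {
        return rows.size();
    }
}

class DatasetDemo {
    public static void main(String[] args) {
        InstancesSketch data = new InstancesSketch(Arrays.asList(
            new AttributeSketch("outlook", "sunny", "overcast", "rainy"),
            new AttributeSketch("temperature")));
        data.add(new InstanceSketch(0.0, 85.0));  // outlook=sunny, temperature=85
        data.add(new InstanceSketch(2.0, 70.0));  // outlook=rainy, temperature=70
        System.out.println(data.numInstances() + " instances, "
            + data.attributes.size() + " attributes");
    }
}
```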


5.2.2.2 The weka.classifiers package




        The classifiers package contains implementations of most of the algorithms for
classification and numeric prediction that have been discussed in this book. (Numeric
prediction is included in classifiers: it is interpreted as prediction of a continuous class.) The
most important class in this package is Classifier, which defines the general structure of any
scheme for classification or numeric prediction. It contains two methods, buildClassifier() and
classifyInstance(), which all of these learning algorithms have to implement. In the jargon of
object-oriented programming, the learning algorithms are represented by subclasses of
Classifier, and therefore automatically inherit these two methods. Every scheme redefines
them according to how it builds a classifier and how it classifies instances. This gives a
uniform interface for building and using classifiers from other Java code.
        Hence, for example, the same evaluation module can be used to evaluate the
performance of any classifier in Weka. Another important class is DistributionClassifier. This
subclass of Classifier defines the method distributionForInstance(), which returns a
probability distribution for a given instance. Any classifier that can calculate class
probabilities is a subclass of DistributionClassifier and implements this method.
        To see an example, click on DecisionStump, which is a class for building a simple
one-level binary decision tree (with an extra branch for missing values). Its documentation
page begins with the fully qualified name of the class, weka.classifiers.DecisionStump. You
have to use this rather lengthy expression if you want to build a decision stump from the
command line. The page then displays a tree structure showing the relevant part of the class
hierarchy. As you can see, DecisionStump is a subclass of DistributionClassifier, and therefore
produces class probabilities. DistributionClassifier, in turn, is a subclass of Classifier, which is
itself a subclass of Object. The Object class is the most general one in Java: all classes are
automatically subclasses of it. After some generic information about the class, its author, and
its version, the page gives an index of the constructors and methods of this class.
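The class hierarchy described above, and the way one evaluation routine can serve every classifier, can be sketched as follows. This is a simplified illustration rather than Weka's source: all class names are hypothetical, instances are reduced to plain double arrays, and having the distribution-based subclass supply classifyInstance() by picking the most probable class is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class: every learning scheme must supply both methods.
abstract class ClassifierSketch {
    // labels[i] is the class index of training example i.
    abstract void buildClassifier(List<double[]> instances, int[] labels, int numClasses);

    abstract int classifyInstance(double[] instance);
}

// Schemes that can estimate class probabilities extend this subclass.
abstract class DistributionClassifierSketch extends ClassifierSketch {
    abstract double[] distributionForInstance(double[] instance);

    // Assumed default: classify by picking the most probable class.
    @Override
    int classifyInstance(double[] instance) {
        double[] dist = distributionForInstance(instance);
        int best = 0;
        for (int c = 1; c < dist.length; c++) {
            if (dist[c] > dist[best]) best = c;
        }
        return best;
    }
}

// A deliberately trivial learner: it returns the training class
// frequencies and ignores the attribute values entirely.
class PriorClassifier extends DistributionClassifierSketch {
    private double[] priors = new double[0];

    @Override
    void buildClassifier(List<double[]> instances, int[] labels, int numClasses) {
        priors = new double[numClasses];
        for (int label : labels) priors[label] += 1.0 / labels.length;
    }

    @Override
    double[] distributionForInstance(double[] instance) {
        return priors.clone();
    }
}

class EvaluationDemo {
    // Accepts any ClassifierSketch, which is the point of the uniform interface.
    static double accuracy(ClassifierSketch c, List<double[]> instances, int[] labels) {
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            if (c.classifyInstance(instances.get(i)) == labels[i]) correct++;
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[] {1}, new double[] {2}, new double[] {3});
        int[] labels = {1, 1, 0};
        ClassifierSketch c = new PriorClassifier();
        c.buildClassifier(data, labels, 2);
        System.out.println("training accuracy: " + accuracy(c, data, labels));
    }
}
```

The accuracy() method never mentions PriorClassifier, so the same routine would evaluate any other subclass unchanged.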
        A constructor is a special kind of method that is called whenever an object of that
class is created, usually initializing the variables that collectively define its state. The index of
methods lists the name of each one, the type of parameters it takes, and a short description of
its functionality. Beneath those indexes, the Web page gives more details about the
constructors and methods. We return to those details later. As you can see, DecisionStump
implements all methods required by both a Classifier and a DistributionClassifier. In addition,
it contains toString() and main() methods. The former returns a textual description of the
classifier, used whenever it is printed on the screen. The latter is called every time you ask for
a decision stump from the command line, in other words, every time you enter a command
beginning with java weka.classifiers.DecisionStump.
The presence of a main() method in a class indicates that it can be run from the command line,
and all learning methods and filter algorithms implement it.
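The three kinds of members mentioned above (a constructor, toString(), and main()) can be seen together in a short sketch. StumpSketch is a hypothetical class, not Weka's DecisionStump; the fixed threshold and the idea of reading it from the first command-line argument are assumptions made to keep the example self-contained.

```java
// Hypothetical one-level "stump" over a single numeric value, shown only to
// illustrate a constructor, toString(), and main().
class StumpSketch {
    private final double threshold;  // state initialized by the constructor

    // Constructor: called whenever a StumpSketch object is created.
    StumpSketch(double threshold) {
        this.threshold = threshold;
    }

    // Values at or below the threshold go to class 0, the rest to class 1.
    int classify(double value) {
        return value <= threshold ? 0 : 1;
    }

    // Textual description, used whenever the object is printed.
    @Override
    public String toString() {
        return "StumpSketch: value <= " + threshold + " -> class 0, otherwise class 1";
    }

    // Entry point: its presence lets the class be run from the command line.
    public static void main(String[] args) {
        // Assumption of this sketch: an optional first argument sets the threshold.
        double threshold = args.length > 0 ? Double.parseDouble(args[0]) : 0.5;
        StumpSketch stump = new StumpSketch(threshold);
        System.out.println(stump);  // println calls toString() implicitly
    }
}
```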
        • Waikato Environment for Knowledge Analysis
        • Collection of state-of-the-art machine learning algorithms and data processing
          tools implemented in Java
               o Released under the GPL
        • Support for the whole process of experimental data mining
               o Preparation of input data
               o Statistical evaluation of learning schemes
               o Visualization of input data and the result of learning
        • Used for education, research and applications
        • Complements “Data Mining” by Witten & Frank
5.2.2.3 Features

    • 49 data preprocessing tools
    • 76 classification/regression algorithms
    • 8 clustering algorithms
    • 15 attribute/subset evaluators + 10 search algorithms for feature selection
    • 3 algorithms for finding association rules
    • 3 graphical user interfaces
           o “The Explorer” (exploratory data analysis)
           o “The Experimenter” (experimental environment)
           o “The Knowledge Flow” (new process model inspired interface)
    • Continue to develop and support WEKA
    • MOA (Massive Online Analysis)
           o Framework that supports learning from data streams
                  - Facilities for data generation, experimental analysis, learning
                    algorithms, etc.
           o The Moa (another native NZ bird) is not only flightless, like the Weka, but also
             extinct
           o First public release, probably this Christmas, or perhaps Thanksgiving (as it’s
             just another turkey)
    • MILK
           o Multi-Instance Learning Kit
    • Proper
           o Propositionalization toolbox for WEKA




Other Nonfunctional Requirements
Performance Requirements
The system has no specific performance requirements at this time.

Safety Requirements
The system has no specific safety requirements at this time, except to the extent that it is
designed to run without root access.

Security Requirements
The system has no specific security requirements at this time.

Software Quality Attributes
No additional software quality attributes are addressed in the requirements at this time.

Business Rules
There are no explicit business rules for operation of Network simulation at this time. All users
with access to the command line tools and a copy of the repository will be allowed to perform
all actions. Additional security measures and procedures may be added at a future date.


Other Requirements
There are no additional requirements for the product at this time.

Appendix A: Glossary





More Related Content

Viewers also liked

Drupal
DrupalDrupal
Drupal
ambitlick
 
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc NetworksTowards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
ambitlick
 
TCP Fairness for Uplink and Downlink Flows in WLANs
TCP Fairness for Uplink and Downlink Flows in WLANsTCP Fairness for Uplink and Downlink Flows in WLANs
TCP Fairness for Uplink and Downlink Flows in WLANs
ambitlick
 
A clustering protocol using multiple chain
A clustering protocol using multiple chainA clustering protocol using multiple chain
A clustering protocol using multiple chainambitlick
 
Ambitlick ns2 2013
Ambitlick ns2 2013Ambitlick ns2 2013
Ambitlick ns2 2013ambitlick
 
Energy efficient protocol for deterministic
Energy efficient protocol for deterministicEnergy efficient protocol for deterministic
Energy efficient protocol for deterministic
ambitlick
 
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processes
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processesDynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processes
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processesambitlick
 
Backbone nodes based stable routing for mobile ad hoc networks
Backbone nodes based stable routing for mobile ad hoc networksBackbone nodes based stable routing for mobile ad hoc networks
Backbone nodes based stable routing for mobile ad hoc networksambitlick
 

Viewers also liked (8)

Drupal
DrupalDrupal
Drupal
 
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc NetworksTowards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
Towards the Performance Analysis of IEEE 802.11 in Multi-hop Ad-Hoc Networks
 
TCP Fairness for Uplink and Downlink Flows in WLANs
TCP Fairness for Uplink and Downlink Flows in WLANsTCP Fairness for Uplink and Downlink Flows in WLANs
TCP Fairness for Uplink and Downlink Flows in WLANs
 
A clustering protocol using multiple chain
A clustering protocol using multiple chainA clustering protocol using multiple chain
A clustering protocol using multiple chain
 
Ambitlick ns2 2013
Ambitlick ns2 2013Ambitlick ns2 2013
Ambitlick ns2 2013
 
Energy efficient protocol for deterministic
Energy efficient protocol for deterministicEnergy efficient protocol for deterministic
Energy efficient protocol for deterministic
 
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processes
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processesDynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processes
Dynamic%20 authentication%20for%20cross realm%20soa-based%20business%20processes
 
Backbone nodes based stable routing for mobile ad hoc networks
Backbone nodes based stable routing for mobile ad hoc networksBackbone nodes based stable routing for mobile ad hoc networks
Backbone nodes based stable routing for mobile ad hoc networks
 

Similar to Record matching over query results

Srs document for identity based secure distributed data storage schemes
Srs document for identity based secure distributed data storage schemesSrs document for identity based secure distributed data storage schemes
Srs document for identity based secure distributed data storage schemesSahithi Naraparaju
 
17337071 srs-library-management-system
17337071 srs-library-management-system17337071 srs-library-management-system
17337071 srs-library-management-systemANAS NAIN
 
Proposal with sdlc
Proposal with sdlcProposal with sdlc
Proposal with sdlc
Kamau Francis
 
Sunserver Open Solaris
Sunserver Open SolarisSunserver Open Solaris
Sunserver Open Solaris
pankaj009
 
Github-Source code management system SRS
Github-Source code management system SRSGithub-Source code management system SRS
Github-Source code management system SRS
Aditya Narayan Swami
 
report_vendor_connect
report_vendor_connectreport_vendor_connect
report_vendor_connectYash Mittal
 
Info sphere overview
Info sphere overviewInfo sphere overview
Info sphere overview
Bhawani N Prasad
 
BrownResearch_CV
BrownResearch_CVBrownResearch_CV
BrownResearch_CVAbby Brown
 
Oracle9i application server oracle forms services
Oracle9i application server   oracle forms servicesOracle9i application server   oracle forms services
Oracle9i application server oracle forms servicesFITSFSd
 
Digital Content Retrieval Final Report
Digital Content Retrieval Final ReportDigital Content Retrieval Final Report
Digital Content Retrieval Final Report
Kourosh Sajjadi
 
Case Study for Ego-centric Citation Network
Case Study for Ego-centric Citation NetworkCase Study for Ego-centric Citation Network
Case Study for Ego-centric Citation Network
Mike Taylor
 
SafePeak whitepaper for Cloud Apps
SafePeak whitepaper for Cloud AppsSafePeak whitepaper for Cloud Apps
SafePeak whitepaper for Cloud Apps
Vladi Vexler
 
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6Michael Parish
 
Tideway Software Identification
Tideway   Software IdentificationTideway   Software Identification
Tideway Software IdentificationPeter Grant
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
Vasu S
 
Phase 1 Documentation (Added System Req)
Phase 1 Documentation (Added System Req)Phase 1 Documentation (Added System Req)
Phase 1 Documentation (Added System Req)Reinier Eiman
 
Devops interview questions 1 www.bigclasses.com
Devops interview questions  1  www.bigclasses.comDevops interview questions  1  www.bigclasses.com
Devops interview questions 1 www.bigclasses.com
bigclasses.com
 

Similar to Record matching over query results (20)

Srs document for identity based secure distributed data storage schemes
Srs document for identity based secure distributed data storage schemesSrs document for identity based secure distributed data storage schemes
Srs document for identity based secure distributed data storage schemes
 
17337071 srs-library-management-system
17337071 srs-library-management-system17337071 srs-library-management-system
17337071 srs-library-management-system
 
Proposal with sdlc
Proposal with sdlcProposal with sdlc
Proposal with sdlc
 
Sunserver Open Solaris
Sunserver Open SolarisSunserver Open Solaris
Sunserver Open Solaris
 
Github-Source code management system SRS
Github-Source code management system SRSGithub-Source code management system SRS
Github-Source code management system SRS
 
report_vendor_connect
report_vendor_connectreport_vendor_connect
report_vendor_connect
 
Info sphere overview
Info sphere overviewInfo sphere overview
Info sphere overview
 
BrownResearch_CV
BrownResearch_CVBrownResearch_CV
BrownResearch_CV
 
Oracle9i application server oracle forms services
Oracle9i application server   oracle forms servicesOracle9i application server   oracle forms services
Oracle9i application server oracle forms services
 
Digital Content Retrieval Final Report
Digital Content Retrieval Final ReportDigital Content Retrieval Final Report
Digital Content Retrieval Final Report
 
RavenDB overview
RavenDB overviewRavenDB overview
RavenDB overview
 
Case Study for Ego-centric Citation Network
Case Study for Ego-centric Citation NetworkCase Study for Ego-centric Citation Network
Case Study for Ego-centric Citation Network
 
SAP BODS 4.2
SAP BODS 4.2 SAP BODS 4.2
SAP BODS 4.2
 
Database project
Database projectDatabase project
Database project
 
SafePeak whitepaper for Cloud Apps
SafePeak whitepaper for Cloud AppsSafePeak whitepaper for Cloud Apps
SafePeak whitepaper for Cloud Apps
 
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6
USERV Auto Insurance Corticon Rule Model 2015 (Simplified) V6
 
Tideway Software Identification
Tideway   Software IdentificationTideway   Software Identification
Tideway Software Identification
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
 
Phase 1 Documentation (Added System Req)
Phase 1 Documentation (Added System Req)Phase 1 Documentation (Added System Req)
Phase 1 Documentation (Added System Req)
 
Devops interview questions 1 www.bigclasses.com
Devops interview questions  1  www.bigclasses.comDevops interview questions  1  www.bigclasses.com
Devops interview questions 1 www.bigclasses.com
 

More from ambitlick

DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
ambitlick
 
Low cost Java 2013 IEEE projects
Low cost Java 2013 IEEE projectsLow cost Java 2013 IEEE projects
Low cost Java 2013 IEEE projects
ambitlick
 
Low cost Java IEEE Projects 2013
Low cost Java IEEE Projects 2013Low cost Java IEEE Projects 2013
Low cost Java IEEE Projects 2013ambitlick
 
Handling selfishness in replica allocation
Handling selfishness in replica allocationHandling selfishness in replica allocation
Handling selfishness in replica allocationambitlick
 
Mutual distance bounding protocols
Mutual distance bounding protocolsMutual distance bounding protocols
Mutual distance bounding protocolsambitlick
 
Moderated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsModerated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsambitlick
 
Efficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secretsEfficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secretsambitlick
 
IEEE -2012-13 Projects IN NS2
IEEE -2012-13 Projects IN NS2  IEEE -2012-13 Projects IN NS2
IEEE -2012-13 Projects IN NS2
ambitlick
 
Adaptive weight factor estimation from user review 1
Adaptive weight factor estimation from user   review 1Adaptive weight factor estimation from user   review 1
Adaptive weight factor estimation from user review 1
ambitlick
 
Integrated institutional portal
Integrated institutional portalIntegrated institutional portal
Integrated institutional portalambitlick
 
Mutual distance bounding protocols
Mutual distance bounding protocolsMutual distance bounding protocols
Mutual distance bounding protocols
ambitlick
 
Moderated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsModerated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsambitlick
 
Efficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secretsEfficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secrets
ambitlick
 
Comments on “mabs multicast authentication based on batch signature”
Comments on “mabs multicast authentication based on batch signature”Comments on “mabs multicast authentication based on batch signature”
Comments on “mabs multicast authentication based on batch signature”
ambitlick
 
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
ambitlick
 
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
ambitlick
 
A Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
A Privacy-Preserving Location Monitoring System for Wireless Sensor NetworksA Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
A Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
ambitlick
 
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
ambitlick
 

More from ambitlick (20)

DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency...
 
Low cost Java 2013 IEEE projects
Low cost Java 2013 IEEE projectsLow cost Java 2013 IEEE projects
Low cost Java 2013 IEEE projects
 
Low cost Java IEEE Projects 2013
Low cost Java IEEE Projects 2013Low cost Java IEEE Projects 2013
Low cost Java IEEE Projects 2013
 
Handling selfishness in replica allocation
Handling selfishness in replica allocationHandling selfishness in replica allocation
Handling selfishness in replica allocation
 
Mutual distance bounding protocols
Mutual distance bounding protocolsMutual distance bounding protocols
Mutual distance bounding protocols
 
Moderated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsModerated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroups
 
Efficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secretsEfficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secrets
 
IEEE -2012-13 Projects IN NS2
IEEE -2012-13 Projects IN NS2  IEEE -2012-13 Projects IN NS2
IEEE -2012-13 Projects IN NS2
 
Adaptive weight factor estimation from user review 1
Adaptive weight factor estimation from user   review 1Adaptive weight factor estimation from user   review 1
Adaptive weight factor estimation from user review 1
 
Integrated institutional portal
Integrated institutional portalIntegrated institutional portal
Integrated institutional portal
 
Embassy
EmbassyEmbassy
Embassy
 
Crm
Crm Crm
Crm
 
Mutual distance bounding protocols
Mutual distance bounding protocolsMutual distance bounding protocols
Mutual distance bounding protocols
 
Moderated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroupsModerated group authoring system for campus wide workgroups
Moderated group authoring system for campus wide workgroups
 
Efficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secretsEfficient spread spectrum communication without pre shared secrets
Efficient spread spectrum communication without pre shared secrets
 
Comments on “mabs multicast authentication based on batch signature”
Comments on “mabs multicast authentication based on batch signature”Comments on “mabs multicast authentication based on batch signature”
Comments on “mabs multicast authentication based on batch signature”
 
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
 
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
Estimating Parameters of Multiple Heterogeneous Target Objects Using Composit...
 
A Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
A Privacy-Preserving Location Monitoring System for Wireless Sensor NetworksA Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
A Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
 
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
Energy-Efficient Protocol for Deterministic and Probabilistic Coverage In Sen...
 

Recently uploaded

Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 

Recently uploaded (20)

Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf

Record matching over query results

  • 1. Software Requirements Specification For Record Matching over Query Results from Multiple Web Databases Prepared by Frederick H. Lochovsky Pelican Infotech Submitted in partial fulfillment Of the requirements of Mining sequential patterns matching over high utility data sets
  • 2. Mining sequential patterns matching over high utility data sets Page ii Table of Contents: Introduction 3 (Purpose 3; Document Conventions 3; Intended Audience and Reading Suggestions 3; Product Scope 3; References 4); Overall Description 4 (Product Perspective 4; Product Functions 5; User Classes and Characteristics 5; Operating Environment 6; Design and Implementation Constraints 7; User Documentation 10; Assumptions and Dependencies 10); External Interface Requirements 11 (User Interfaces 11; Hardware Interfaces 11; Software Interfaces 12; Communications Interfaces 15); System Features 18; Other Nonfunctional Requirements 25 (Performance Requirements 25; Safety Requirements 25; Security Requirements 25; Software Quality Attributes 25; Business Rules 25); Other Requirements 25. Revision History: Name, Date, Reason For Changes, Version
  • 3. Introduction Purpose This Software Requirements Specification provides a complete description of all the functions and specifications of the Frederick H. Lochovsky on Mining sequential patterns matching over high utility data sets. Document Conventions Though this document is intended as a set of Requirements, and not a design document, technical information has been included wherever it was deemed appropriate. Priority for all functionality is assumed to be equal except where noted. Intended Audience and Reading Suggestions The primary audience for this document is the development team. The secondary audience is the Pelican InfoTech project management team. Product Scope In the Web database scenario, the records to match are query-dependent, and a pre-learned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, Ambit lick Solutions Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com
  • 4. UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. References The following references are relevant to the project and can be consulted for a more detailed view of the technologies and standards being used in this project: 1. R. Ananthakrishna, S. Chaudhuri, and V. Ganti, "Eliminating Fuzzy Duplicates in Data Warehouses" 2. R. Baxter, P. Christen, and T. Churches, "A Comparison of Fast Blocking Methods for Record Linkage" 3. S. Chaudhuri, V. Ganti, and R. Motwani, "Robust Identification of Fuzzy Duplicates" Overall Description Product Perspective
  • 5. • False data can discover the actions when unauthorized users attempted to access computer systems or authorized users attempted to misuse their privileges. • Association rule mining • An algorithm based on sequential pattern mining using the same data collected by the Databases. Product Functions The product shall allow users to: • Install and set up an issue tracking database • Define the formats of acceptable issues • File preformatted reports in a database • Submit issues to a database • Query the database in a number of ways • Edit issues in the database and resubmit them • Merge multiple issues into a single issue • Relate issues to each other in a hierarchical form • Assemble groups of related issues into a document User Classes and Characteristics Individual Local Developers. Individual developers should be able to submit issues, edit issues, and perform queries on the database to discover what issues are relevant to them, which issues are open (in the case of issues to which that is relevant, such as defect reports or unsatisfied requirements), etc. These individual developers are assumed to have some knowledge of the development environment and are familiar and comfortable with basic
  • 6. software tools such as text editors etc. As a result, the individual developer tools will be the most "primitive" but also the most efficient for use, probably implemented as text-based command line tools. Since Network simulation is primarily intended as an easy-to-use, free tool for individual developers and small teams, this is the most critical user class to satisfy. The tools must be relatively easy to use, and extremely easy to set up. Local Issue Managers. Issue managers -- those responsible for keeping track of open issues, etc. -- must have tools capable of querying the database and relating issues to developers. The tools used for issue managers and individual developers will be very similar, as they will be doing similar tasks -- querying the database for open issues, assigning people to issues as appropriate, recategorizing issues or merging/splitting them, etc. However, issue managers may not be as comfortable with "primitive" tools as individual developers, so some thought will be given to more "scripted" or directive tools, possibly involving simple GUI elements. However, the bulk of user-interface issues will be placed on the next user class, remote users. Remote Users. If Network simulation is used as a defect management system, then remote users (users of software packages submitting reports to a Network simulation center) will constitute the bulk of submissions. If Network simulation is to be used in this way, it must cater to the needs of these users, who will have much lower skills and will require very simple, easy-to-use interfaces. Primarily these interfaces will focus on problem submission, but they will also allow some ability to query the database, etc. 
Operating Environment In a computer the operating environment includes temperature and so on affecting circuitry; but in particular the term is often used to describe the non-physical environment in which
  • 7. software runs. This may apply to application software with which users interact, comprising the "look and feel" of the system, its appearance and the things that have to be done to achieve desired results. The term may also apply to system software; e.g., software designed for a Unix environment will do things differently than in a Microsoft Windows environment. Some operating environments for programming purposes are referred to as programming environments; e.g., the "UNIX programming environment" for a Unix shell with its look and feel and functionality. "Operating environment" is not the totality of the functionality and appearance of an operating system. Design and Implementation Constraints 1 Architecture
  • 8. [Architecture diagram: applying the mining tool; mining the data using algorithms; checking the customer using the RFC model; analyzing the business customer]
  • 9. [Architecture diagram, continued: start the mining; check the user; cluster formation (max: high profit, gold customer; min: low profit); DB: store & manage; analyze]
  • 10. User Documentation none Assumptions and Dependencies Data bases are defined as
    @relation 'cpu'
    @attribute MYCT real
    @attribute MMIN real
    @attribute MMAX real
    @attribute CACH real
    @attribute CHMIN real
    @attribute CHMAX real
    @attribute class real
    @data
    125,256,6000,256,16,128,199
    29,8000,32000,32,8,32,253
    29,8000,32000,32,8,32,253
    29,8000,32000,32,8,32,253
    29,8000,16000,32,8,16,132
    26,8000,32000,64,8,32,290
    23,16000,32000,64,16,32,381
    23,16000,32000,64,16,32,381
    23,16000,64000,64,16,32,749
    23,32000,64000,128,32,64,1238
    400,1000,3000,0,1,2,23
    400,512,3500,4,1,6,24
    60,2000,8000,65,1,8,70
    50,4000,16000,65,1,8,117
    350,64,64,0,1,4,15
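To make the layout of the dataset definition above concrete, the following is a minimal sketch of how such a file could be read using only the Java standard library. It is not Weka's own loader: the class name ArffSketch and its fields are hypothetical, and it assumes every attribute is of type real, as in the 'cpu' relation shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical reader for the numeric-only ARFF subset used above (not Weka's loader). */
final class ArffSketch {
    public String relation = "";
    public final List<String> attributes = new ArrayList<>();
    public final List<double[]> rows = new ArrayList<>();

    /** Parses @relation, @attribute and @data sections; assumes all values are numeric. */
    public static ArffSketch parse(String text) {
        ArffSketch data = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // After @data, each line is one comma-separated instance.
                String[] cells = line.split(",");
                double[] row = new double[cells.length];
                for (int i = 0; i < cells.length; i++) {
                    row[i] = Double.parseDouble(cells[i].trim());
                }
                data.rows.add(row);
            } else if (lower.startsWith("@relation")) {
                data.relation = line.substring("@relation".length()).trim().replace("'", "");
            } else if (lower.startsWith("@attribute")) {
                data.attributes.add(line.split("\\s+")[1]); // keep the attribute name only
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return data;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "@relation 'cpu'",
                "@attribute MYCT real",
                "@attribute class real",
                "@data",
                "125,199",
                "29,253");
        ArffSketch d = parse(sample);
        System.out.println(d.relation + ": " + d.attributes.size()
                + " attributes, " + d.rows.size() + " instances");
    }
}
```

The sketch keeps attribute names and rows in plain lists so that the last column can be treated as the class value, matching the convention the 'cpu' relation follows.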
  • 11. External Interface Requirements User Interfaces • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure. • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution. • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique. • Rule induction: The extraction of useful if-then rules from data based on statistical significance. Hardware Interfaces Hardware Specification Processor Type : Pentium -III Speed : 1.6 GHZ Ram : 128 MB RAM Hard disk : 8 GB HD
  • 12. Software Interfaces Java began as a client side platform independent programming language that enabled stand-alone Java applications and applets. The numerous benefits of Java resulted in an explosion in the usage of Java in the back end server side enterprise systems. The Java Development Kit (JDK), which was the original standard platform defined by Sun, was soon supplemented by a collection of enterprise APIs. The proliferation of enterprise APIs, often developed by several different groups, resulted in divergence of APIs and caused concern among the Java developer community. Java byte code can execute on the server instead of or in addition to the client, enabling you to build traditional client/server applications and modern thin client Web applications. Two key server side Java technologies are servlets and JavaServer Pages. Servlets are protocol and platform independent server side components which extend the functionality of a Web server. JavaServer Pages (JSPs) extend the functionality of servlets by allowing Java servlet code to be embedded in an HTML file. Features of Java • Platform Independence o The Write-Once-Run-Anywhere ideal has not been achieved (tuning for different platforms usually required), but closer than with other languages. • Object Oriented • Object oriented throughout - no coding outside of class definitions, including main(). • An extensive class library available in the core language packages. • Compiler/Interpreter Combo
  • 13. Code is compiled to byte codes that are interpreted by a Java virtual machine (JVM). • This provides portability to any machine for which a virtual machine has been written. • The two steps of compilation and interpretation allow for extensive code checking and improved security. • Robust • Exception handling built-in, strong type checking (that is, all data must be declared an explicit type), local variables must be initialized. • Several dangerous features of C & C++ eliminated: • No memory pointers • No preprocessor • Array index limit checking • Automatic Memory Management • Automatic garbage collection - memory management handled by JVM. • Security • No memory pointers • Programs run inside the virtual machine sandbox. • Array index limit checking • Code pathologies reduced by • byte code verifier - checks classes after loading • Class loader - confines objects to unique namespaces. Prevents loading a hacked "java.lang.SecurityManager" class, for example.
  • 14. Security manager - determines what resources a class can access such as reading and writing to the local disk. • Dynamic Binding • The linking of data and methods to where they are located is done at run-time. • New classes can be loaded while a program is running. Linking is done on the fly. • Even if libraries are recompiled, there is no need to recompile code that uses classes in those libraries. This differs from C++, which uses static binding. This can result in fragile classes for cases where linked code is changed and memory pointers then point to the wrong addresses. • Good Performance • Interpretation of byte codes slowed performance in early versions, but advanced virtual machines with adaptive and just-in-time compilation and other techniques now typically provide performance of roughly 50% to 100% of the speed of C++ programs. • Threading • Lightweight processes, called threads, can easily be spun off to perform multiprocessing. • Can take advantage of multiprocessors where available • Great for multimedia displays.
  • 15. Built-in Networking • Java was designed with networking in mind and comes with many classes to develop sophisticated Internet communications. Communications Interfaces ECLIPSE Eclipse is an open-source software framework written primarily in Java. The initial codebase originated from VisualAge. In its default form it is an Integrated Development Environment (IDE) for Java developers, consisting of the Java Development Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs provide translations into over a dozen natural languages. 4.1.1 ARCHITECTURE: The basis for Eclipse is the Rich Client Platform (RCP). The following components constitute the rich client platform: • OSGi - a standard bundling framework • Core platform - boot Eclipse, run plug-ins • The Standard Widget Toolkit (SWT) - a portable widget toolkit • JFace - viewer classes to bring model view controller programming to SWT, file buffers, text handling, and text editors • The Eclipse Workbench - views, editors, perspectives, wizards
  • 16. Eclipse's widgets are implemented by a widget toolkit for Java called SWT, unlike most Java applications, which use the Java standard Abstract Window Toolkit (AWT) or Swing. Eclipse's user interface also leverages an intermediate GUI layer called JFace, which simplifies the construction of applications based on SWT. Eclipse employs plug-ins in order to provide all of its functionality on top of (and including) the rich client platform, in contrast to some other applications where functionality is typically hard coded. This plug-in mechanism is a lightweight software componentry framework. In addition to allowing Eclipse to be extended using other programming languages such as C and Python, the plug-in framework allows Eclipse to work with typesetting languages like LaTeX, networking applications such as telnet, and database management systems. The plug-in architecture supports writing any desired extension to the environment, such as for configuration management. Java and CVS support is provided in the Eclipse SDK. The key to the seamless integration of tools with Eclipse is the plug-in. With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means that a plug-in you develop integrates with Eclipse in exactly the same way as other plug-ins; in this respect, all features are created equal. Eclipse provides plug-ins for a wide variety of features, some of which are through third parties using both free and commercial models. Examples of plug-ins include a UML plug-in for Sequence and other UML diagrams, a plug-in for Database explorer, etc. The Eclipse SDK includes the Eclipse Java Development Tools, offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. 
The IDE also makes use of a workspace, in this case a set of metadata over a flat filespace allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards. The Visual Editor project allows interfaces to be created interactively, hence allowing Eclipse to be used as a RAD tool. 4.1.2 HISTORY
  • 17. Eclipse began as an IBM Canada project. It was developed by OTI (Object Technology International) as a replacement for VisualAge, which itself had been developed by OTI. In November 2001, a consortium was formed to further the development of Eclipse as open source. In 2003, the Eclipse Foundation was created. Eclipse 3.0 (released on June 21, 2004) selected the OSGi Service Platform specifications as the runtime architecture. Eclipse was originally released under the Common Public License, but was later re-licensed under the Eclipse Public License. The Free Software Foundation has said that both licenses are free software licenses, but are incompatible with the GNU General Public License (GPL). Mike Milinkovich, of the Eclipse Foundation, has commented that moving to the GPL will be considered when version 3 of the GPL is released. 4.1.3 MYECLIPSE: MyEclipse is a commercially available Enterprise Java and AJAX IDE created and maintained by the company Genuitec, a founding member of the Eclipse Foundation. MyEclipse is built upon the Eclipse platform, and integrates both proprietary and open source solutions into the development environment. MyEclipse has two primary versions: a professional and a standard edition. The standard edition adds database tools, a visual web designer, persistence tools, Spring tools, Struts and JSF tooling, and a number of other features to the basic Eclipse Java Developer profile. It competes with the Web Tools Project, which is a part of Eclipse itself, but MyEclipse is a separate project entirely and offers a different feature set. Most recently, MyEclipse has been made available via Pulse, a provisioning tool that maintains Eclipse software profiles, including those that use MyEclipse.
  • 18. System Features Embedding Data into Weka Data mining tool Weka (Waikato Environment for Knowledge Analysis) is a Java-based data mining tool developed by Waikato University. After loading the dataset into it, the preprocess function of Weka allows the user to remove undesired attributes to prevent them from affecting the quality of extracted knowledge. Next, the user can use one of three kinds of algorithm to mine the data: Classification, Clustering, and Association Rule. Data Mining is playing a key role in most enterprises, which have to analyse great amounts of data in order to achieve higher profits. Nevertheless, due to the large datasets involved in this process, the data mining field must face some technological challenges. Grid Computing takes advantage of the low-load periods of all the computers connected to a network, making possible resource and data sharing. Providing Grid services constitutes a flexible manner of tackling the data mining needs. This paper shows the adaptation of Weka, a widely used Data Mining tool, to a grid infrastructure. Classifiers in WEKA are models for predicting nominal or numeric quantities. Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, and Bayes’ nets. “Meta”-classifiers include bagging, boosting, stacking, error-correcting output codes, and locally weighted learning. WEKA contains “clusterers” for finding groups of similar instances in a dataset. Implemented schemes are k-Means, EM, Cobweb, and Farthest First. Clusters can be visualized
  • 19. and compared to “true” clusters (if given). Evaluation is based on log likelihood if the clustering scheme produces a probability distribution. Suppose you have some data and you want to build a decision tree from it. A common situation is for the data to be stored in a spreadsheet or database. However, Weka expects it to be in ARFF format, introduced in Section 2.4, because it is necessary to have type information about each attribute which cannot be automatically deduced from the attribute values. Before you can apply any algorithm to your data, it must be converted to ARFF form. This can be done very easily. Recall that the bulk of an ARFF file consists of a list of all the instances, with the attribute values for each instance being separated by commas (Figure 2.2). Most spreadsheet and database programs allow you to export your data into a file in comma-separated format, as a list of records where the items are separated by commas. Once this has been done, you need only load the file into a text editor or a word processor; add the dataset’s name using the @relation tag, the attribute information using @attribute, and a @data line; save the file as raw text, and you’re done! In the following example we assume that your data is stored in a Microsoft Excel spreadsheet, and you’re using Microsoft Word for text processing. Of course, the process of converting data into ARFF format is very similar for other software packages. Figure 8.1a shows an Excel spreadsheet containing the weather data. It is easy to save this data in comma-separated format. First, select the Save As… item from the File pull-down menu. Then, in the ensuing dialog box, select CSV. Now load this file into Microsoft Word. Your screen will look like. The rows of the original spreadsheet have been converted into lines of text, and the elements are separated from each other by commas. 
All you have to do is convert the first line, which holds the attribute names, into the header structure that makes up the beginning of an ARFF file. Shows the result. The dataset’s name is introduced by a @relation tag, and the
  • 20. names, types, and values of each attribute are defined by @attribute tags. The data section of the ARFF file begins with a @data tag. Once the structure of your dataset matches, you should save it as a text file. Choose Save as… from the File menu, and specify Text Only with Line Breaks as the file type by using the corresponding popup menu. Enter a file name, and press the Save button. We suggest that you rename the file to weather.arff to indicate that it is in ARFF format. Note that the classification schemes in Weka assume by default that the class is the last attribute in the ARFF file, which fortunately it is in this case. (We explain in Section 8.3 below how to override this default.) Now you can start analyzing this data using the algorithms provided. In the following we assume that you have downloaded Weka to your system, and that your Java environment knows where to find the library. (More information on how to do this can be found at the Weka Web site.) To see what the C4.5 decision tree learner described in Section 6.1 does with this dataset, we use the J4.8 algorithm, which is Weka’s implementation of this decision tree learner. (J4.8 actually implements a later and slightly improved version called C4.5 Revision 8, which was the last public version of this family of algorithms before C5.0, a commercial implementation, was released.) Type java weka.classifiers.j48.J48 -t weather.arff at the command line. This incantation calls the Java virtual machine and instructs it to execute the J48 algorithm from the j48 package—a sub package of classifiers, which is part of the overall weka package. Weka is organized in “packages” that correspond to a directory hierarchy. We’ll give more details of the package structure in the next section: in this case, the sub package name is j48 and the program to be executed from it is called J48. The –t option informs the algorithm that the next argument is the name of the training file. 
After pressing Return, you’ll see the output shown.
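The spreadsheet-to-ARFF steps described above (export to CSV, turn the first line into @relation and @attribute declarations, add a @data line) can also be scripted. The following is a rough sketch under stated assumptions rather than a Weka utility: the class CsvToArffSketch and its convert method are hypothetical names, the CSV is assumed to contain no quoted or embedded commas, a column is declared real only if every value parses as a number, and any other column becomes a nominal attribute listing its distinct values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that turns a simple CSV export into ARFF text. */
final class CsvToArffSketch {

    /** First CSV line holds the attribute names; remaining lines are instances. */
    public static String convert(String relation, String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] names = lines[0].split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            rows.add(lines[i].split(","));
        }

        StringBuilder out = new StringBuilder("@relation " + relation + "\n\n");
        for (int c = 0; c < names.length; c++) {
            boolean numeric = true;
            Set<String> values = new LinkedHashSet<>(); // distinct values, in first-seen order
            for (String[] row : rows) {
                String v = row[c].trim();
                values.add(v);
                try {
                    Double.parseDouble(v);
                } catch (NumberFormatException e) {
                    numeric = false; // at least one non-number: treat the column as nominal
                }
            }
            String type = numeric ? "real" : "{" + String.join(",", values) + "}";
            out.append("@attribute ").append(names[c].trim()).append(' ').append(type).append('\n');
        }

        out.append("\n@data\n");
        for (String[] row : rows) {
            out.append(String.join(",", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative rows only; not the actual weather dataset from the text.
        String csv = "outlook,temperature,play\nsunny,85,no\nrainy,70,yes\n";
        System.out.print(convert("weather", csv));
    }
}
```

Saving the returned text as weather.arff would give a file of the shape the text describes, with the class as the last attribute.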
  • 21. 5.2.2.1 The weka.core package The core package is central to the Weka system. It contains classes that are accessed from almost every other class. You can find out what they are by clicking on the hyperlink underlying weka.core, which brings up. The Web page is divided into two parts: the Interface Index and the Class Index. The latter is a list of all classes contained within the package, while the former lists all the interfaces it provides. An interface is very similar to a class, the only difference being that it doesn’t actually do anything by itself; it is merely a list of methods without actual implementations. Other classes can declare that they “implement” a particular interface, and then provide code for its methods. For example, the OptionHandler interface defines those methods that are implemented by all classes that can process command-line options, including all classifiers. The key classes in the core package are called Attribute, Instance, and Instances. An object of class Attribute represents an attribute. It contains the attribute’s name, its type and, in the case of a nominal attribute, its possible values. An object of class Instance contains the attribute values of a particular instance; and an object of class Instances holds an ordered set of instances, in other words, a dataset. By clicking on the hyperlinks underlying the classes, you can find out more about them. However, you need not know the details just to use Weka from the command line. We will return to these classes in Section 8.4 when we discuss how to access the machine learning routines from other Java code. Clicking on the All Packages hyperlink in the upper left corner of any documentation page brings you back to the listing of all the packages in Weka. 5.2.2.2 The weka.classifiers package
  • 22. The classifiers package contains implementations of most of the algorithms for classification and numeric prediction that have been discussed in this book. (Numeric prediction is included in classifiers: it is interpreted as prediction of a continuous class.) The most important class in this package is Classifier, which defines the general structure of any scheme for classification or numeric prediction. It contains two methods, buildClassifier() and classifyInstance(), which all of these learning algorithms have to implement. In the jargon of object-oriented programming, the learning algorithms are represented by subclasses of Classifier, and therefore automatically inherit these two methods. Every scheme redefines them according to how it builds a classifier and how it classifies instances. This gives a uniform interface for building and using classifiers from other Java code. Hence, for example, the same evaluation module can be used to evaluate the performance of any classifier in Weka. Another important class is DistributionClassifier. This subclass of Classifier defines the method distributionForInstance(), which returns a probability distribution for a given instance. Any classifier that can calculate class probabilities is a subclass of DistributionClassifier and implements this method. To see an example, click on DecisionStump, which is a class for building a simple one-level binary decision tree (with an extra branch for missing values). You have to use this rather lengthy expression if you want to build a decision stump from the command line. The page then displays a tree structure showing the relevant part of the class hierarchy. As you can see, DecisionStump is a subclass of DistributionClassifier, and therefore produces class probabilities. DistributionClassifier, in turn, is a subclass of Classifier, which is itself a subclass of Object. The Object class is the most general one in Java: all classes are automatically subclasses of it. 
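The class hierarchy just described can be mirrored in a small stand-alone sketch. The classes below are hypothetical stand-ins written for illustration and are not Weka's classes: they use plain arrays instead of Instance and Instances, assume a two-class problem, split only on the first feature at its mean, and ignore the extra branch for missing values. Deriving classifyInstance() from distributionForInstance() is a design choice of this sketch.

```java
/** Stand-in for the general classifier contract: build a model, then classify. */
abstract class SketchClassifier {
    public abstract void buildClassifier(double[][] features, int[] labels);
    public abstract int classifyInstance(double[] instance);
}

/** Stand-in for classifiers that can also report class probabilities. */
abstract class SketchDistributionClassifier extends SketchClassifier {
    public abstract double[] distributionForInstance(double[] instance);

    /** Predicts the class with the highest probability. */
    @Override
    public int classifyInstance(double[] instance) {
        double[] dist = distributionForInstance(instance);
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }
}

/** One-level binary tree on feature 0, split at the training mean (illustrative only). */
final class SketchStump extends SketchDistributionClassifier {
    private double threshold;
    private double[][] counts = new double[2][2]; // [branch][class]

    @Override
    public void buildClassifier(double[][] features, int[] labels) {
        counts = new double[2][2]; // reset so the stump can be rebuilt
        double sum = 0;
        for (double[] f : features) {
            sum += f[0];
        }
        threshold = sum / features.length;
        for (int i = 0; i < features.length; i++) {
            counts[branch(features[i])][labels[i]]++;
        }
    }

    @Override
    public double[] distributionForInstance(double[] instance) {
        double[] c = counts[branch(instance)];
        double total = c[0] + c[1];
        if (total == 0) {
            return new double[] {0.5, 0.5}; // empty branch: no evidence either way
        }
        return new double[] {c[0] / total, c[1] / total};
    }

    private int branch(double[] instance) {
        return instance[0] <= threshold ? 0 : 1;
    }

    @Override
    public String toString() {
        return "SketchStump: feature0 <= " + threshold;
    }

    public static void main(String[] args) {
        SketchStump stump = new SketchStump();
        stump.buildClassifier(new double[][] {{1}, {2}, {8}, {9}}, new int[] {0, 0, 1, 1});
        System.out.println(stump + " -> class " + stump.classifyInstance(new double[] {7}));
    }
}
```

Because every learner in the sketch shares the two SketchClassifier methods, the same evaluation code could be applied to any of them, which is the point the text makes about the uniform interface.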
    After some generic information about the class, its author, and its version, the page gives an index of the constructors and methods of this class. A constructor is a special kind of method that is called whenever an object of that class is created, usually initializing the variables that collectively define its state. The index of
  • 23. methods lists the name of each one, the type of parameters it takes, and a short description of its functionality. Beneath those indexes, the Web page gives more details about the constructors and methods. We return to those details later.
    As you can see, DecisionStump implements all methods required by both a Classifier and a DistributionClassifier. In addition, it contains toString() and main() methods. The former returns a textual description of the classifier, used whenever it is printed on the screen. The latter is called every time you ask for a decision stump from the command line, in other words, every time you enter a command beginning with java weka.classifiers.DecisionStump. The presence of a main() method in a class indicates that it can be run from the command line, and all learning methods and filter algorithms implement it.
      • Waikato Environment for Knowledge Analysis
      • Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
          o Released under the GPL
      • Support for the whole process of experimental data mining
          o Preparation of input data
          o Statistical evaluation of learning schemes
          o Visualization of input data and the result of learning
      • Used for education, research and applications
      • Complements “Data Mining” by Witten & Frank
    5.2.2.3 Features
      • 49 data preprocessing tools
      • 76 classification/regression algorithms
      • 8 clustering algorithms
      • 15 attribute/subset evaluators + 10 search algorithms for feature selection
  • 24.
      • 3 algorithms for finding association rules
      • 3 graphical user interfaces
          o “The Explorer” (exploratory data analysis)
          o “The Experimenter” (experimental environment)
          o “The Knowledge Flow” (new process model inspired interface)
      • Continue to develop and support WEKA
      • MOA (Massive Online Analysis)
          o Framework that supports learning from data streams
              • Facilities for data generation, experimental analysis, learning algorithms, etc.
          o The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct
          o First public release, probably this Christmas, or perhaps Thanksgiving (as it’s just another turkey)
      • MILK
          o Multi-Instance Learning Kit
      • Proper
          o Propositionalization toolbox for WEKA
  • 25. Other Nonfunctional Requirements
    Performance Requirements
    The system has no specific performance requirements at this time.
    Safety Requirements
    The system has no specific safety requirements at this time, except to the extent that it is designed to run without root access.
    Security Requirements
    The system has no specific security requirements at this time.
    Software Quality Attributes
    No additional software quality attributes are addressed in the requirements at this time.
    Business Rules
    There are no explicit business rules for operation of Network simulation at this time. All users with access to the command line tools and a copy of the repository will be allowed to perform all actions. Additional security measures and procedures may be added at a future date.
    Other Requirements
    There are no additional requirements for the product at this time.
    Appendix A: Glossary