Record matching over query results

Published in: Education, Technology, Business

Transcript

  • 1. Software Requirements Specification
    for
    Record Matching over Query Results from Multiple Web Databases
    Prepared by Frederick H. Lochovsky, Pelican Infotech
    Submitted in partial fulfillment of the requirements of
    Mining sequential patterns matching over high utility data sets
  • 2. Mining sequential patterns matching over high utility data sets
    Table of Contents
    Introduction
      Purpose
      Document Conventions
      Intended Audience and Reading Suggestions
      Product Scope
      References
    Overall Description
      Product Perspective
      Product Functions
      User Classes and Characteristics
      Operating Environment
      Design and Implementation Constraints
      User Documentation
      Assumptions and Dependencies
    External Interface Requirements
      User Interfaces
      Hardware Interfaces
      Software Interfaces
      Communications Interfaces
    System Features
    Other Nonfunctional Requirements
      Performance Requirements
      Safety Requirements
      Security Requirements
      Software Quality Attributes
      Business Rules
    Other Requirements
    Revision History (columns: Name, Date, Reason For Changes, Version)
  • 3. Introduction
    Purpose
    This Software Requirements Specification provides a complete description of all the functions and specifications of the Frederick H. Lochovsky project on mining sequential patterns matching over high utility data sets.
    Document Conventions
    Though this document is intended as a set of requirements, and not a design document, technical information has been included wherever it was deemed appropriate. Priority for all functionality is assumed to be equal except where noted.
    Intended Audience and Reading Suggestions
    The primary audience for this document is the development team. The secondary audience is the Pelican InfoTech project management team.
    Product Scope
    Because the records to match are query-dependent, a pre-learned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases.
    Ambit lick Solutions Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com
  • 4. References
    The following references are relevant to the project and can be consulted for a more detailed view of the technologies and standards used in this project:
    1. Eliminating Fuzzy Duplicates in Data Warehouses. R. Ananthakrishna, S. Chaudhuri, and V. Ganti.
    2. A Comparison of Fast Blocking Methods for Record Linkage. R. Baxter, P. Christen, and T. Churches.
    3. Robust Identification of Fuzzy Duplicates. S. Chaudhuri, V. Ganti, and R. Motwani.
    Overall Description
    Product Perspective
  • 5.
    • False data can expose actions in which unauthorized users attempted to access computer systems or authorized users attempted to misuse their privileges.
    • Association rule mining.
    • An algorithm based on sequential pattern mining using the same data collected by the databases.
    Product Functions
    The product shall allow users to:
    • Install and set up an issue tracking database
    • Define the formats of acceptable issues
    • File preformatted reports in a database
    • Submit issues to a database
    • Query the database in a number of ways
    • Edit issues in the database and resubmit them
    • Merge multiple issues into a single issue
    • Relate issues to each other in a hierarchical form
    • Assemble groups of related issues into a document
    User Classes and Characteristics
    Individual Local Developers. Individual developers should be able to submit issues, edit issues, and perform queries on the database to discover which issues are relevant to them, which issues are open (in the case of issues to which that is relevant, such as defect reports or unsatisfied requirements), etc. These individual developers are assumed to have some knowledge of the development environment and to be familiar and comfortable with basic software tools such as text editors.
  • 6. As a result, the individual developer tools will be the most "primitive" but also the most efficient to use, probably implemented as text-based command line tools. Since Network simulation is primarily intended as an easy-to-use, free tool for individual developers and small teams, this is the most critical user class to satisfy. The tools must be relatively easy to use, and extremely easy to set up.
    Local Issue Managers. Issue managers (those responsible for keeping track of open issues, etc.) must have tools capable of querying the database and relating issues to developers. The tools used by issue managers and individual developers will be very similar, as they will be doing similar tasks: querying the database for open issues, assigning people to issues as appropriate, recategorizing issues or merging/splitting them, etc. However, issue managers may not be as comfortable with "primitive" tools as individual developers, so some thought will be given to more "scripted" or directive tools, possibly involving simple GUI elements. The bulk of the user-interface work, however, will be focused on the next user class, remote users.
    Remote Users. If Network simulation is used as a defect management system, then remote users (users of software packages submitting reports to a Network simulation center) will constitute the bulk of submissions. If Network simulation is to be used in this way, it must cater to the needs of these users, who will have much lower skills and will require very simple, easy-to-use interfaces. Primarily these interfaces will focus on problem submission, but they will also allow some ability to query the database, etc.
    Operating Environment
    In a computer, the operating environment includes temperature and other physical conditions affecting circuitry; but in particular the term is often used to describe the non-physical environment in which software runs.
  • 7. This may apply to application software with which users interact, comprising the "look and feel" of the system, its appearance, and the things that have to be done to achieve desired results. The term may also apply to system software; e.g., software designed for a Unix environment will do things differently than software designed for a Microsoft Windows environment. Some operating environments for programming purposes are referred to as programming environments; e.g., the "UNIX programming environment" for a Unix shell with its look and feel and functionality.
    "Operating environment" is not the totality of the functionality and appearance of an operating system.
    Design and Implementation Constraints
    1 Architecture
  • 8. [Architecture diagram. Recoverable labels: Applying the Mining Tool; Using Mining Algorithms; Mining the Data; Check the customer using RFC model; Analyze the Business Customer]
  • 9. [Flow diagram. Recoverable labels: Start the mining; Check the user; Cluster formation; DB; max: High profit, gold customer; min: Low profit; Store & manage; Analyze]
  • 10. User Documentation
    None.
    Assumptions and Dependencies
    Databases are defined as:
      @relation cpu
      @attribute MYCT real
      @attribute MMIN real
      @attribute MMAX real
      @attribute CACH real
      @attribute CHMIN real
      @attribute CHMAX real
      @attribute class real
      @data
      125,256,6000,256,16,128,199
      29,8000,32000,32,8,32,253
      29,8000,32000,32,8,32,253
      29,8000,32000,32,8,32,253
      29,8000,16000,32,8,16,132
      26,8000,32000,64,8,32,290
      23,16000,32000,64,16,32,381
      23,16000,32000,64,16,32,381
      23,16000,64000,64,16,32,749
      23,32000,64000,128,32,64,1238
      400,1000,3000,0,1,2,23
      400,512,3500,4,1,6,24
      60,2000,8000,65,1,8,70
      50,4000,16000,65,1,8,117
      350,64,64,0,1,4,15
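    The ARFF layout above is simple enough to read with a few lines of code. The following is a minimal Python sketch, not Weka's own loader. It assumes every attribute is numeric (real), as in the cpu relation, and it ignores quoting, nominal types, and sparse data. The function name parse_arff is hypothetical.

```python
def parse_arff(text):
    """Parse a simple, all-numeric ARFF document.

    Returns (relation, attributes, rows), where attributes is a list of
    (name, type) pairs and rows is a list of lists of floats.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            # After @data, every line is one comma-separated instance.
            rows.append([float(value) for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


sample = """@relation cpu
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
@attribute class real
@data
125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
"""

relation, attributes, rows = parse_arff(sample)
print(relation, [name for name, _ in attributes], rows[0])
```

    Keeping the header (relation and attributes) separate from the data rows mirrors the two parts of the file: the type information comes first, then the instances.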
  • 11. External Interface Requirements
    User Interfaces
    • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
    • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
    • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
    • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
    • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
    Hardware Interfaces
    Hardware Specification
      Processor type : Pentium III
      Speed          : 1.6 GHz
      RAM            : 128 MB
      Hard disk      : 8 GB
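    The nearest neighbor method listed under User Interfaces can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: records are numeric tuples, similarity is Euclidean distance, and the "combination of the classes" is a simple majority vote. The function name knn_classify and the "low"/"high" labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled records.

    `history` is a list of (features, label) pairs; `features` and `query`
    are numeric tuples of the same length.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Order the historical records by Euclidean distance to the query.
    by_distance = sorted(history, key=lambda record: math.dist(record[0], query))
    # Count the class labels of the k closest records.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.9), "low"),
    ((5.0, 5.1), "high"),
    ((4.8, 5.3), "high"),
    ((5.2, 4.9), "high"),
]
print(knn_classify(history, (1.1, 1.0), k=3))
```

    With k = 1 the method reduces to copying the class of the single most similar historical record; larger k smooths out noisy records at the cost of blurring class boundaries.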
  • 12. Software Interfaces
    Java began as a client-side, platform-independent programming language that enabled stand-alone Java applications and applets. The numerous benefits of Java resulted in an explosion in its use in back-end, server-side enterprise systems. The Java Development Kit (JDK), which was the original standard platform defined by Sun, was soon supplemented by a collection of enterprise APIs. The proliferation of enterprise APIs, often developed by several different groups, resulted in divergence of APIs and caused concern among the Java developer community.
    Java byte code can execute on the server instead of or in addition to the client, enabling you to build traditional client/server applications and modern thin-client Web applications. Two key server-side Java technologies are servlets and JavaServer Pages. Servlets are protocol- and platform-independent server-side components that extend the functionality of a Web server. JavaServer Pages (JSPs) extend the functionality of servlets by allowing Java servlet code to be embedded in an HTML file.
    Features of Java
    • Platform Independence
      o The Write-Once-Run-Anywhere ideal has not been achieved (tuning for different platforms is usually required), but Java comes closer than other languages.
    • Object Oriented
      o Object oriented throughout: no coding outside of class definitions, including main().
      o An extensive class library is available in the core language packages.
    • Compiler/Interpreter Combo
  • 13.
      o Code is compiled to byte codes that are interpreted by a Java virtual machine (JVM).
      o This provides portability to any machine for which a virtual machine has been written.
      o The two steps of compilation and interpretation allow for extensive code checking and improved security.
    • Robust
      o Exception handling is built in, type checking is strong (that is, all data must be declared with an explicit type), and local variables must be initialized.
    • Several dangerous features of C & C++ eliminated:
      o No memory pointers
      o No preprocessor
      o Array index limit checking
    • Automatic Memory Management
      o Automatic garbage collection: memory management is handled by the JVM.
    • Security
      o No memory pointers
      o Programs run inside the virtual machine sandbox.
      o Array index limit checking
      o Code pathologies reduced by:
        o Byte code verifier: checks classes after loading.
        o Class loader: confines objects to unique namespaces. Prevents loading a hacked "java.lang.SecurityManager" class, for example.
  • 14.
        o Security manager: determines what resources a class can access, such as reading and writing to the local disk.
    • Dynamic Binding
      o The linking of data and methods to where they are located is done at run time.
      o New classes can be loaded while a program is running. Linking is done on the fly.
      o Even if libraries are recompiled, there is no need to recompile code that uses classes in those libraries. This differs from C++, which uses static binding. Static binding can result in fragile classes in cases where linked code is changed and memory pointers then point to the wrong addresses.
    • Good Performance
      o Interpretation of byte codes slowed performance in early versions, but advanced virtual machines with adaptive and just-in-time compilation and other techniques now typically provide 50% to 100% of the speed of C++ programs.
    • Threading
      o Lightweight processes, called threads, can easily be spun off to perform multiprocessing.
      o Can take advantage of multiprocessors where available.
      o Great for multimedia displays.
  • 15.
    • Built-in Networking
      o Java was designed with networking in mind and comes with many classes for developing sophisticated Internet communications.
    Communications Interfaces
    ECLIPSE
    Eclipse is an open-source software framework written primarily in Java. The initial codebase originated from VisualAge. In its default form it is an Integrated Development Environment (IDE) for Java developers, consisting of the Java Development Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs provide translations into over a dozen natural languages.
    4.1.1 ARCHITECTURE:
    The basis for Eclipse is the Rich Client Platform (RCP). The following components constitute the rich client platform:
    • OSGi: a standard bundling framework
    • Core platform: boot Eclipse, run plug-ins
    • The Standard Widget Toolkit (SWT): a portable widget toolkit
    • JFace: viewer classes to bring model view controller programming to SWT, file buffers, text handling, and text editors
    • The Eclipse Workbench: views, editors, perspectives, wizards
  • 16. Eclipse's widgets are implemented by a widget toolkit for Java called SWT, unlike most Java applications, which use the Java standard Abstract Window Toolkit (AWT) or Swing. Eclipse's user interface also leverages an intermediate GUI layer called JFace, which simplifies the construction of applications based on SWT.
    Eclipse employs plug-ins in order to provide all of its functionality on top of (and including) the rich client platform, in contrast to some other applications where functionality is typically hard coded. This plug-in mechanism is a lightweight software componentry framework. In addition to allowing Eclipse to be extended using other programming languages such as C and Python, the plug-in framework allows Eclipse to work with typesetting languages like LaTeX, networking applications such as telnet, and database management systems. The plug-in architecture supports writing any desired extension to the environment, such as for configuration management. Java and CVS support is provided in the Eclipse SDK.
    The key to the seamless integration of tools with Eclipse is the plug-in. With the exception of a small run-time kernel, everything in Eclipse is a plug-in. This means that a plug-in you develop integrates with Eclipse in exactly the same way as other plug-ins; in this respect, all features are created equal. Eclipse provides plug-ins for a wide variety of features, some of which come from third parties using both free and commercial models. Examples include a UML plug-in for sequence and other UML diagrams, a plug-in for a database explorer, etc.
    The Eclipse SDK includes the Eclipse Java Development Tools, offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis. The IDE also makes use of a workspace, in this case a set of metadata over a flat filespace, allowing external file modifications as long as the corresponding workspace "resource" is refreshed afterwards. The Visual Editor project allows interfaces to be created interactively, hence allowing Eclipse to be used as a RAD tool.
    4.1.2 HISTORY
  • 17. Eclipse began as an IBM Canada project. It was developed by OTI (Object Technology International) as a replacement for VisualAge, which itself had been developed by OTI. In November 2001, a consortium was formed to further the development of Eclipse as open source. In 2003, the Eclipse Foundation was created.
    Eclipse 3.0 (released on June 21, 2004) selected the OSGi Service Platform specifications as the runtime architecture.
    Eclipse was originally released under the Common Public License, but was later re-licensed under the Eclipse Public License. The Free Software Foundation has said that both licenses are free software licenses, but are incompatible with the GNU General Public License (GPL). Mike Milinkovich, of the Eclipse Foundation, has commented that moving to the GPL will be considered when version 3 of the GPL is released.
    4.1.3 MYECLIPSE:
    MyEclipse is a commercially available Enterprise Java and AJAX IDE created and maintained by the company Genuitec, a founding member of the Eclipse Foundation. MyEclipse is built upon the Eclipse platform, and integrates both proprietary and open source solutions into the development environment.
    MyEclipse has two primary versions: a professional and a standard edition. The standard edition adds database tools, a visual web designer, persistence tools, Spring tools, Struts and JSF tooling, and a number of other features to the basic Eclipse Java Developer profile. It competes with the Web Tools Project, which is a part of Eclipse itself, but MyEclipse is a separate project entirely and offers a different feature set. Most recently, MyEclipse has been made available via Pulse, a provisioning tool that maintains Eclipse software profiles, including those that use MyEclipse.
  • 18. System Features
    Embedding Data into Weka Data mining tool
    Weka (Waikato Environment for Knowledge Analysis) is a Java-based data mining tool developed by Waikato University. After the dataset is loaded, the preprocess function of Weka lets the user mark undesired attributes for exclusion so that they do not affect the quality of the extracted knowledge. Next, the user can apply one of three families of algorithms to mine the data: classification, clustering, and association rule mining.
    Data mining plays a key role in most enterprises, which have to analyse great amounts of data in order to achieve higher profits. Nevertheless, due to the large datasets involved in this process, the data mining field must face some technological challenges. Grid computing takes advantage of the low-load periods of all the computers connected to a network, making resource and data sharing possible. Providing Grid services is a flexible way of tackling data mining needs. This paper shows the adaptation of Weka, a widely used data mining tool, to a grid infrastructure.
    Classifiers in WEKA are models for predicting nominal or numeric quantities. Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, and Bayes' nets. "Meta"-classifiers include bagging, boosting, stacking, error-correcting output codes, and locally weighted learning.
    WEKA contains "clusterers" for finding groups of similar instances in a dataset. Implemented schemes are k-Means, EM, Cobweb, and FarthestFirst. Clusters can be visualized and compared to "true" clusters (if given). Evaluation is based on log likelihood if the clustering scheme produces a probability distribution.
  • 19. Suppose you have some data and you want to build a decision tree from it. A common situation is for the data to be stored in a spreadsheet or database. However, Weka expects it to be in ARFF format, introduced in Section 2.4, because it is necessary to have type information about each attribute, which cannot be automatically deduced from the attribute values. Before you can apply any algorithm to your data, it must be converted to ARFF form. This can be done very easily. Recall that the bulk of an ARFF file consists of a list of all the instances, with the attribute values for each instance being separated by commas (Figure 2.2). Most spreadsheet and database programs allow you to export your data into a file in comma-separated format, as a list of records where the items are separated by commas.
    Once this has been done, you need only load the file into a text editor or a word processor; add the dataset's name using the @relation tag, the attribute information using @attribute, and a @data line; and save the file as raw text. In the following example we assume that your data is stored in a Microsoft Excel spreadsheet, and you're using Microsoft Word for text processing. Of course, the process of converting data into ARFF format is very similar for other software packages. Figure 8.1a shows an Excel spreadsheet containing the weather data. It is easy to save this data in comma-separated format. First, select the Save As… item from the File pull-down menu. Then, in the ensuing dialog box, select CSV. Now load this file into Microsoft Word. The rows of the original spreadsheet have been converted into lines of text, and the elements are separated from each other by commas. All you have to do is convert the first line, which holds the attribute names, into the header structure that makes up the beginning of an ARFF file. In the result, the dataset's name is introduced by a @relation tag, and the names, types, and values of each attribute are defined by @attribute tags.
  • 20. The data section of the ARFF file begins with a @data tag. Once the structure of your dataset matches, you should save it as a text file.
    Choose Save As… from the File menu, and specify Text Only with Line Breaks as the file type by using the corresponding popup menu. Enter a file name, and press the Save button. We suggest that you rename the file to weather.arff to indicate that it is in ARFF format. Note that the classification schemes in Weka assume by default that the class is the last attribute in the ARFF file, which fortunately it is in this case. (We explain in Section 8.3 below how to override this default.) Now you can start analyzing this data using the algorithms provided. In the following we assume that you have downloaded Weka to your system, and that your Java environment knows where to find the library. (More information on how to do this can be found at the Weka Web site.) To see what the C4.5 decision tree learner described in Section 6.1 does with this dataset, we use the J4.8 algorithm, which is Weka's implementation of this decision tree learner. (J4.8 actually implements a later and slightly improved version called C4.5 Revision 8, which was the last public version of this family of algorithms before C5.0, a commercial implementation, was released.) Type
      java weka.classifiers.j48.J48 -t weather.arff
    at the command line.
    This incantation calls the Java virtual machine and instructs it to execute the J48 algorithm from the j48 package, a subpackage of classifiers, which is part of the overall weka package. Weka is organized in "packages" that correspond to a directory hierarchy. We'll give more details of the package structure in the next section: in this case, the subpackage name is j48 and the program to be executed from it is called J48. The -t option informs the algorithm that the next argument is the name of the training file. After you press Return, the learner's output is displayed.
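    The manual CSV-to-ARFF conversion described in slides 19 and 20 can also be scripted. The following is a minimal Python sketch under stated assumptions, not a Weka utility: the first CSV row holds the attribute names, a column is declared real when every value parses as a number, and any other column becomes a nominal attribute whose values are inferred from the data. The function names csv_to_arff and _is_number are hypothetical, and the weather rows are illustrative.

```python
import csv
import io


def _is_number(value):
    """Return True if `value` can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert comma-separated text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(_is_number(v) for v in values):
            kind = "real"
        else:
            # Nominal attribute: distinct values in order of first appearance.
            kind = "{" + ", ".join(dict.fromkeys(values)) + "}"
        lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


weather_csv = """outlook,temperature,humidity,windy,play
sunny,85,85,false,no
overcast,83,86,false,yes
rainy,65,70,true,no
"""
print(csv_to_arff(weather_csv, "weather"))
```

    Because nominal value sets are inferred only from the rows present, a value that never occurs in the exported data will be missing from the @attribute declaration and may need to be added by hand. The class attribute stays last as long as it is the last CSV column, which matches Weka's default assumption noted above.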
  • 21. 5.2.2.1 The weka.core package
    The core package is central to the Weka system. It contains classes that are accessed from almost every other class. You can find out what they are by clicking on the hyperlink underlying weka.core, which brings up the package's documentation page. The Web page is divided into two parts: the Interface Index and the Class Index. The latter is a list of all classes contained within the package, while the former lists all the interfaces it provides. An interface is very similar to a class, the only difference being that it doesn't actually do anything by itself: it is merely a list of methods without actual implementations. Other classes can declare that they "implement" a particular interface, and then provide code for its methods. For example, the OptionHandler interface defines those methods that are implemented by all classes that can process command-line options, including all classifiers.
    The key classes in the core package are called Attribute, Instance, and Instances. An object of class Attribute represents an attribute. It contains the attribute's name, its type and, in the case of a nominal attribute, its possible values. An object of class Instance contains the attribute values of a particular instance; and an object of class Instances holds an ordered set of instances, in other words, a dataset. By clicking on the hyperlinks underlying the classes, you can find out more about them. However, you need not know the details just to use Weka from the command line. We will return to these classes in Section 8.4 when we discuss how to access the machine learning routines from other Java code. Clicking on the All Packages hyperlink in the upper left corner of any documentation page brings you back to the listing of all the packages in Weka.
    5.2.2.2 The weka.classifiers package
  • 22. The classifiers package contains implementations of most of the algorithms forclassification and numeric prediction that have been discussed in this book. (Numericprediction is included in classifiers: it is interpreted as prediction of a continuous class.) Themost important class in this package is Classifier, which defines the general structure of anyscheme for classification or numeric prediction. It contains two methods, buildClassifier() andclassifyInstance(), which all of these learning algorithms have to implement. In the jargon ofobject-oriented programming, the learning algorithms are represented by subclasses ofClassifier, and therefore automatically inherit these two methods. Every scheme redefinesthem according to how it builds a classifier and how itclassifies instances. This gives a uniform interface for building and using classifiers fromother Java code. Hence, for example, the same evaluation module can be used to evaluate theperformance of any classifier in Weka. Another important class is Distribution Classifier. Thissubclass of Classifier defines the method distributionForInstance(), which returns aprobability distribution for a given instance. Any classifier that can calculate classprobabilities is a subclass of Distribution Classifier and implements this method.To see an example, click on DecisionStump, which is a class for building a simple one-levelbinary decision tree (with an extra branch for missing values). You have to use this ratherlengthy expression if you want to build a decision stump from the command line. The pagethen displays a tree structure showing the relevant part of the class hierarchy. As you can see,Decision Stump is a subclass of Distribution Classifier, and therefore produces classprobabilities. Distribution Classifier, in turn, is a subclass of Classifier, which is itself asubclass of Object. The Object class is the most general one in Java: all classes areautomatically subclasses of it. 
After some generic information about the class, its author, and its version, the page gives an index of the constructors and methods of this class. A constructor is a special kind of method that is called whenever an object of that class is created, usually initializing the variables that collectively define its state.
  • 23. The index of methods lists the name of each one, the type of parameters it takes, and a short description of its functionality. Beneath those indexes, the Web page gives more details about the constructors and methods. We return to those details later. As you can see, DecisionStump implements all methods required by both a Classifier and a DistributionClassifier. In addition, it contains toString() and main() methods. The former returns a textual description of the classifier, used whenever it is printed on the screen. The latter is called every time you ask for a decision stump from the command line, in other words, every time you enter a command beginning with
java weka.classifiers.DecisionStump
The presence of a main() method in a class indicates that it can be run from the command line, and all learning methods and filter algorithms implement it.
- Waikato Environment for Knowledge Analysis
- Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
  o Released under the GPL
- Support for the whole process of experimental data mining
  o Preparation of input data
  o Statistical evaluation of learning schemes
  o Visualization of input data and the result of learning
- Used for education, research and applications
- Complements “Data Mining” by Witten & Frank
5.2.2.3 Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
  • 24.
- 3 algorithms for finding association rules
- 3 graphical user interfaces
  o “The Explorer” (exploratory data analysis)
  o “The Experimenter” (experimental environment)
  o “The Knowledge Flow” (new process model inspired interface)
- Continue to develop and support WEKA
- MOA (Massive Online Analysis)
  o Framework that supports learning from data streams
    - Facilities for data generation, experimental analysis, learning algorithms, etc.
  o The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct
  o First public release, probably this Christmas, or perhaps Thanksgiving (as it’s just another turkey)
- MILK
  o Multi-Instance Learning Kit
- Proper
  o Propositionalization toolbox for WEKA
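As a rough illustration of two points made in 5.2.2.1 and 5.2.2.2, namely that an interface is only a list of methods which a class then implements, and that toString() and main() make a class printable and runnable from the command line, consider the following plain-Java sketch. It does not use Weka. SketchOptionHandler and CommandLineStumpSketch are hypothetical names, the interface is a deliberately reduced stand-in rather than Weka's actual OptionHandler, and the two positional arguments are an assumption made for the example.

```java
// Hypothetical, simplified stand-in for an option-handling interface.
// It is NOT Weka's OptionHandler; it only shows "a list of methods
// without actual implementations".
interface SketchOptionHandler {
    void setOptions(String[] options); // declared here, implemented by classes
}

final class CommandLineStumpSketch implements SketchOptionHandler {
    private String attributeName = "attr0"; // illustrative default
    private double threshold = 0.5;         // illustrative default

    /** Assumed positional arguments: [attributeName] [threshold]. */
    @Override
    public void setOptions(String[] options) {
        if (options.length > 0) {
            attributeName = options[0];
        }
        if (options.length > 1) {
            threshold = Double.parseDouble(options[1]);
        }
    }

    /** Textual description, used whenever the object is printed. */
    @Override
    public String toString() {
        return "Stump sketch: split on " + attributeName + " <= " + threshold;
    }

    /** Entry point used when the class is started from the command line. */
    public static void main(String[] args) {
        CommandLineStumpSketch stump = new CommandLineStumpSketch();
        stump.setOptions(args);
        System.out.println(stump); // println() calls toString() implicitly
    }
}
```

Once compiled, the sketch can be started in the same way as the command quoted in 5.2.2.2, for example with java CommandLineStumpSketch age 30, which prints the description returned by toString().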
  • 25. Other Nonfunctional Requirements
Performance Requirements
The system has no specific performance requirements at this time.
Safety Requirements
The system has no specific safety requirements at this time, except to the extent that it is designed to run without root access.
Security Requirements
The system has no specific security requirements at this time.
Software Quality Attributes
No additional software quality attributes are addressed in the requirements at this time.
Business Rules
There are no explicit business rules for operation of Network simulation at this time. All users with access to the command line tools and a copy of the repository will be allowed to perform all actions. Additional security measures and procedures may be added at a future date.
Other Requirements
There are no additional requirements for the product at this time.
Appendix A: Glossary
