RapidMiner: Word Vector Tool And Rapid Miner

  • RapidMiner 5
    2.9 - Word vector tool and RapidMiner
  • Word Vector tool
    The Word & Web Vector Tool is a flexible Java library for statistical language modeling and for integrating Web and Web-service-based data sources.
    It supports the creation of word vector representations of text documents in the vector space model, which is the starting point for many text-processing applications.
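The vector space model can be illustrated with a short plain-Java sketch that maps one document onto a term-frequency vector over a fixed vocabulary. It does not use WVTool's API. The class name `TermFrequencyVector`, the example vocabulary, and the tokenization rule (lowercase, then split on non-letter characters) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the vector space model; not part of WVTool.
final class TermFrequencyVector {

    // Counts how often each vocabulary term occurs in the text.
    // Assumption: lowercasing and splitting on non-letters is adequate tokenization.
    static int[] vectorize(String text, List<String> vocabulary) {
        int[] vector = new int[vocabulary.size()]; // one dimension per vocabulary term
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                vector[index]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("data", "mining", "text");
        int[] v = vectorize("Text mining is data mining on text.", vocabulary);
        System.out.println(Arrays.toString(v)); // [1, 2, 2]
    }
}
```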
  • Installation
    1. Download the archive from the WVTool project website on SourceForge.
  • Installation
    2. Put it into the lib/plugins directory of your
    RapidMiner installation, for example:
    D:\Program Files\Rapid-I\RapidMiner5\lib\plugins
  • Word Vector tool
    The aim of WVTool is to provide a simple-to-use, easy-to-extend, pure Java library for text and web mining.
    It can easily be invoked from any Java application.
  • Word Vector tool
    WVTool bridges the gap between highly sophisticated linguistic packages, such as the GATE system, on the one side and the many partial solutions built into diverse text and information retrieval applications on the other.
  • Functions
  • Word List
    A word list contains all terms used for vectorization together with some statistics (e.g. in how many documents a term appears). The word list is needed for vectorization to define which terms are considered as dimensions of the vector space and for weighting purposes.
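A word list can be pictured as a map from each term to its statistics. The sketch below is a hypothetical plain-Java illustration and is not WVTool's word list class. The names `WordListSketch` and `TermStats` are invented, and the sketch records only two statistics per term: document frequency and total frequency. It uses the same simple tokenization assumption as above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a word list; WVTool's real word list class differs.
final class WordListSketch {

    // Statistics kept per term.
    static final class TermStats {
        int documentFrequency; // number of documents containing the term
        int totalFrequency;    // number of occurrences across all documents
    }

    static Map<String, TermStats> build(List<String> documents) {
        Map<String, TermStats> wordList = new TreeMap<>(); // sorted by term
        for (String doc : documents) {
            Set<String> seenInThisDoc = new HashSet<>();
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (token.isEmpty()) continue;
                TermStats stats = wordList.computeIfAbsent(token, t -> new TermStats());
                stats.totalFrequency++;
                if (seenInThisDoc.add(token)) {
                    stats.documentFrequency++; // count each document at most once
                }
            }
        }
        return wordList;
    }

    public static void main(String[] args) {
        Map<String, TermStats> wl = build(List.of("good movie", "bad movie, bad plot"));
        wl.forEach((term, s) -> System.out.println(
                term + " df=" + s.documentFrequency + " tf=" + s.totalFrequency));
    }
}
```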
  • WVTool functions
    A WVTool function takes two inputs:
    An input list, which tells the system which text documents to process.
    A configuration object, which tells the system which methods to use in the individual steps.
  • Defining the input
    The input list tells the WVTool which texts should be processed. Every item in the list contains the following information:
    A URI
    The language the document is written in (optional)
    The type of the document (optional)
    The character encoding of the document, e.g. UTF-8 (optional)
    A class label
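The fields of an input-list entry fit naturally into a small Java record. The names `InputItem` and its fields are hypothetical choices for illustration, and so is the UTF-8 fallback in `encodingOrDefault`. WVTool defines its own input classes, which are not reproduced here. The file URIs are placeholders.

```java
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical model of one entry in a WVTool-style input list.
// Names are illustrative, not WVTool's actual classes or methods.
record InputItem(URI uri, Optional<String> language, Optional<String> type,
                 Optional<Charset> encoding, String classLabel) {

    // Common case: only the URI and the class label are given.
    static InputItem of(String uri, String classLabel) {
        return new InputItem(URI.create(uri), Optional.empty(), Optional.empty(),
                Optional.empty(), classLabel);
    }

    // Assumption: fall back to UTF-8 when no encoding is specified.
    Charset encodingOrDefault() {
        return encoding.orElse(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<InputItem> inputList = List.of(
                InputItem.of("file:///corpus/pos/review1.txt", "positive"),
                new InputItem(URI.create("file:///corpus/neg/review2.txt"),
                        Optional.of("english"), Optional.of("txt"),
                        Optional.of(StandardCharsets.ISO_8859_1), "negative"));
        for (InputItem item : inputList) {
            System.out.println(item.uri() + " -> " + item.classLabel()
                    + " (" + item.encodingOrDefault() + ")");
        }
    }
}
```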
  • Using Predefined Word Lists
    In some cases it is necessary to define the dimensions of the vector space exactly, while still leaving the counting of terms and documents to WVTool. This can be achieved by calling the word list creation function with a list of String values.
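With a predefined word list, the dimensions are fixed in advance, but the statistics still come from the corpus. The hypothetical `PredefinedWordList` sketch below illustrates this idea and is not the WVTool function itself. It computes document frequencies only for a given list of lowercase terms and ignores every other word in the corpus.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: dimensions are fixed up front, counts come from the corpus.
final class PredefinedWordList {

    // Document frequency for each predefined (lowercase) term; other words are ignored.
    static Map<String, Integer> documentFrequencies(List<String> predefinedTerms,
                                                    List<String> documents) {
        Map<String, Integer> df = new LinkedHashMap<>();
        for (String term : predefinedTerms) {
            df.put(term, 0); // every predefined term is a dimension, even if unseen
        }
        for (String doc : documents) {
            Set<String> tokens = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")));
            for (String term : predefinedTerms) {
                if (tokens.contains(term)) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = documentFrequencies(
                List.of("price", "quality", "refund"),
                List.of("Great quality, fair price", "Poor quality"));
        System.out.println(df); // {price=1, quality=2, refund=0}
    }
}
```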
  • Text Input
    The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column for each term.
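The shape of the ExampleSet that TextInput produces can be sketched as a document-term table. The hypothetical `DocumentTermTable` below is plain Java, not the RapidMiner ExampleSet API. It fills each cell with a raw term count, whereas the real operator computes its own weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch of a document-term table: one row per document,
// one column per term. Not the RapidMiner ExampleSet API.
final class DocumentTermTable {
    final List<String> terms = new ArrayList<>(); // column names
    final List<int[]> rows = new ArrayList<>();   // one row per document

    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
    }

    static DocumentTermTable build(List<String> documents) {
        DocumentTermTable table = new DocumentTermTable();
        // First pass: every distinct term becomes a column (sorted for stable output).
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String doc : documents) {
            for (String token : tokenize(doc)) {
                if (!token.isEmpty()) vocabulary.add(token);
            }
        }
        table.terms.addAll(vocabulary);
        // Second pass: fill each document's row with raw term counts.
        for (String doc : documents) {
            int[] row = new int[table.terms.size()];
            for (String token : tokenize(doc)) {
                int col = table.terms.indexOf(token);
                if (col >= 0) row[col]++;
            }
            table.rows.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        DocumentTermTable table = build(List.of("spam spam eggs", "eggs ham"));
        System.out.println(table.terms); // [eggs, ham, spam]
        for (int[] row : table.rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```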
  • Text Classification, Clustering and Visualization
    For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, text documents can also be visualized directly in the RapidMiner Visualization panel.
  • Creating and Maintaining Word Lists
    Creating an Initial Word List: An initial word list can be created by using the following chain of operators:
  • Creating and Maintaining Word Lists
    Applying a Word List: You can apply a word list in two ways:
    1. To use the actual weights, first create word vectors using the TextInput operator and then apply the AttributeWeightsLoader and AttributeWeightsApplier operators to the resulting ExampleSet.
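The first option amounts to rescaling each term column by its weight. The hypothetical `ApplyWeights` sketch below makes two assumptions about that step. First, each column is multiplied by its weight. Second, columns with a weight of zero, or with no weight at all, are dropped. For the operator's exact behaviour, consult the RapidMiner operator documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of applying term weights to word vectors.
final class ApplyWeights {

    // Surviving column names plus the reweighted rows.
    record Weighted(List<String> terms, List<double[]> rows) {}

    static Weighted apply(List<String> terms, List<double[]> rows, Map<String, Double> weights) {
        List<Integer> keep = new ArrayList<>();
        List<String> keptTerms = new ArrayList<>();
        for (int c = 0; c < terms.size(); c++) {
            // Assumption: a term missing from the weight map counts as weight 0.
            double w = weights.getOrDefault(terms.get(c), 0.0);
            if (w != 0.0) {
                keep.add(c);
                keptTerms.add(terms.get(c));
            }
        }
        List<double[]> out = new ArrayList<>();
        for (double[] row : rows) {
            double[] newRow = new double[keep.size()];
            for (int i = 0; i < keep.size(); i++) {
                int c = keep.get(i);
                newRow[i] = row[c] * weights.get(terms.get(c)); // scale by term weight
            }
            out.add(newRow);
        }
        return new Weighted(keptTerms, out);
    }

    public static void main(String[] args) {
        Weighted result = apply(List.of("bad", "good", "the"),
                List.of(new double[] {2, 0, 3}, new double[] {0, 1, 1}),
                Map.of("bad", 1.5, "good", 2.0, "the", 0.0));
        System.out.println(result.terms()); // [bad, good]
        result.rows().forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```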
  • Creating and Maintaining Word Lists
    2. To use the word list only as a selection of relevant terms and let TextInput compute the actual weights, place the AttributeWeightsLoader before the TextInput operator. TextInput will then create vectors whose dimensions are only those word-list terms that have a weight greater than zero.
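In the second option, the word list acts purely as a filter. The hypothetical `WordListFilter` sketch below shows only that selection step. Terms with a weight strictly greater than zero are kept in word-list order, and these would become the only dimensions that TextInput then weights.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: a word list used only as a term filter.
final class WordListFilter {

    // Terms whose weight is strictly greater than zero, in word-list order.
    static List<String> selectedTerms(Map<String, Double> wordListWeights) {
        return wordListWeights.entrySet().stream()
                .filter(e -> e.getValue() > 0.0)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>(); // preserves word-list order
        weights.put("excellent", 0.8);
        weights.put("the", 0.0);
        weights.put("awful", 1.2);
        weights.put("and", -0.1);
        System.out.println(selectedTerms(weights)); // [excellent, awful]
    }
}
```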
  • Creating and Maintaining Word Lists
    Updating a Word List: If you add new documents to your corpus, additional terms will usually become relevant and should be added to the word list. When the InteractiveAttributeWeighting operator pops up, use its load function to load your original word list.
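In RapidMiner the update is done interactively, but the underlying merge can be sketched in a few lines. In this hypothetical `WordListUpdate` example, original weights stay unchanged and newly seen terms are appended with a default weight of 1.0. That default is an assumption, not a documented RapidMiner setting.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging new corpus terms into an existing word list.
final class WordListUpdate {

    // Keeps every original weight and appends unseen terms with the (assumed) default weight.
    static Map<String, Double> update(Map<String, Double> original, List<String> newTerms,
                                      double defaultWeight) {
        Map<String, Double> updated = new LinkedHashMap<>(original); // original map is not modified
        for (String term : newTerms) {
            updated.putIfAbsent(term, defaultWeight); // existing terms keep their weight
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Double> original = new LinkedHashMap<>();
        original.put("refund", 1.0);
        original.put("the", 0.0);
        System.out.println(update(original, List.of("refund", "chargeback"), 1.0));
        // {refund=1.0, the=0.0, chargeback=1.0}
    }
}
```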
  • More Questions?
    Reach us at support@dataminingtools.net
    Visit: www.dataminingtools.net