SlideShare a Scribd company logo
RapidMiner5 2.9 - Word vector tool and RapidMiner
Word Vector tool The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources.  It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications .
Installation 1.	Download the archive form wvtoolsourceforge website.
Installation 2. Putting it into lib/plugins directory of your RapidMiner installation, example: D:rogram Filesapid-IapidMiner5iblugins
Word Vector tool The aim of the WVTool is to provide a simple to use, simple to extend pure Java library for text and webmining. It can easily be invoked from any Java application.
Word Vector tool WVTool bridges a gap between highly sophisticated linguistic packages as the GATE system on the one side and many partial solutions that are part of diverse text and information retrieval applications on the other side.
Functions
Word List A word list contains all terms used for vectorization together with some statistics  (e.g. in how many documents a term appears). The word list is needed for vectorization to define which terms are considered as dimensions of the vector space and for weighting purposes.
WVtool functions Input list that tells the system which text documents to process WVTool Function  Inputs A configuration object, that tells the system which methods to use in the individual steps.
Defining the input The input list tells the WVTool which texts should be processed. Every item in the list contains the following information: A URI  The language the document is written in (optional) ˆ The type of the document (optional) ˆ The character encoding of the document, e.g. UTF-8 (optional) ˆ A class label
Using Predefined Word Lists In some cases it is necessary to exactly define the dimensions of the vector space, yet leaving the counting of terms and documents to the WVTool . This can be achieved by calling the word list creation function with a list of String values.
Text Input The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column of each term.
Text Classification, Clustering and Visualization For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, there is a possibility to directly visualize text documents from the RapidMiner Visualization panel.
Creating and Maintaining Word Lists Creating an Initial Word List: An initial word list can be created by using the following chain of operators:
Creating and Maintaining Word Lists Applying a Word List:  You can apply a word list in two ways:  To use the actual weights, first create word vectors using the TextInput Operator and then use the AttributeWeightsLoader and AttributesWeightsApplier on the resulting ExampleSet.
Creating and Maintaining Word Lists Applying a Word List:  You can apply a word list in two ways:  2.  To use the word list only as a selection of relevant terms and leave it to the TextInput to actually weight them, use the AttributeWeightsLoader before. The TextInput will create vectors that contain as dimensions only terms in the word list, that have a weight larger than zero.
Creating and Maintaining Word Lists Updating a Word List :  If you add new documents to your corpus, usually additional terms will be relevant and should be added to the word list. After the InteractiveAttributeWeighting operator pops up, use the load function to load your original word list.
More Questions? Reach us at support@dataminingtools.net Visit: www.dataminingtools.net

More Related Content

Viewers also liked

Clustering
ClusteringClustering
Clustering
Meme Hei
 
Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7
Big Data Engineering, Faculty of Engineering, Dhurakij Pundit University
 
RapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid MinerRapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid Miner
DataminingTools Inc
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid Miner
Rapidmining Content
 
Data Applied:Forecast
Data Applied:ForecastData Applied:Forecast
Data Applied:Forecast
DataminingTools Inc
 
Welcome
WelcomeWelcome
2008 IEDM presentation
2008 IEDM presentation2008 IEDM presentation
2008 IEDM presentation
slrommel
 
LISP:Predicates in lisp
LISP:Predicates in lispLISP:Predicates in lisp
LISP:Predicates in lisp
DataminingTools Inc
 
Quick Look At Classification
Quick Look At ClassificationQuick Look At Classification
Quick Look At Classification
DataminingTools Inc
 
Data Applied: Similarity
Data Applied: SimilarityData Applied: Similarity
Data Applied: Similarity
DataminingTools Inc
 
SPSS: Data Editor
SPSS: Data EditorSPSS: Data Editor
SPSS: Data Editor
DataminingTools Inc
 
Control Statements in Matlab
Control Statements in  MatlabControl Statements in  Matlab
Control Statements in Matlab
DataminingTools Inc
 
Ireland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs SlideshareIreland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs SlideshareTibbs Pereira
 
Data-Applied: Technology Insights
Data-Applied: Technology InsightsData-Applied: Technology Insights
Data-Applied: Technology Insights
DataminingTools Inc
 
Traffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids IntroTraffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids Intro
Eugene SRTS
 
XL-Miner: Timeseries
XL-Miner: TimeseriesXL-Miner: Timeseries
XL-Miner: Timeseries
DataminingTools Inc
 
Apresentação Red Advisers
Apresentação Red AdvisersApresentação Red Advisers
Apresentação Red Advisers
mezkita
 

Viewers also liked (20)

Clustering
ClusteringClustering
Clustering
 
Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7
 
RapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid MinerRapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid Miner
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid Miner
 
Data Applied:Forecast
Data Applied:ForecastData Applied:Forecast
Data Applied:Forecast
 
Welcome
WelcomeWelcome
Welcome
 
2008 IEDM presentation
2008 IEDM presentation2008 IEDM presentation
2008 IEDM presentation
 
LISP:Predicates in lisp
LISP:Predicates in lispLISP:Predicates in lisp
LISP:Predicates in lisp
 
Test
TestTest
Test
 
Quick Look At Classification
Quick Look At ClassificationQuick Look At Classification
Quick Look At Classification
 
Data Applied: Similarity
Data Applied: SimilarityData Applied: Similarity
Data Applied: Similarity
 
Anime
AnimeAnime
Anime
 
SPSS: Data Editor
SPSS: Data EditorSPSS: Data Editor
SPSS: Data Editor
 
Control Statements in Matlab
Control Statements in  MatlabControl Statements in  Matlab
Control Statements in Matlab
 
Ireland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs SlideshareIreland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs Slideshare
 
Data-Applied: Technology Insights
Data-Applied: Technology InsightsData-Applied: Technology Insights
Data-Applied: Technology Insights
 
Traffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids IntroTraffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids Intro
 
XL-Miner: Timeseries
XL-Miner: TimeseriesXL-Miner: Timeseries
XL-Miner: Timeseries
 
Apresentação Red Advisers
Apresentação Red AdvisersApresentação Red Advisers
Apresentação Red Advisers
 
Txomin Hartz Txikia
Txomin Hartz TxikiaTxomin Hartz Txikia
Txomin Hartz Txikia
 

Similar to RapidMiner: Word Vector Tool And Rapid Miner

List of values Best Practices
List of values Best PracticesList of values Best Practices
List of values Best Practices
Ad Ghauri
 
Team G
Team GTeam G
Team Gbutest
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
sqlserver content
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
DataminingTools Inc
 
I x scripting
I x scriptingI x scripting
I x scripting
Alex do Amaral Dias
 
Ant conc ~design & development of a freeware
Ant conc ~design & development of a freewareAnt conc ~design & development of a freeware
Ant conc ~design & development of a freewaresarahannelazarus
 
20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx
NTUSubjectRooms
 
Olap
OlapOlap
Olap
preksha33
 
Programming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) EnvironmentProgramming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) Environment
Mahmoud Samir Fayed
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community way
Alexandro Colorado
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2
GDSCUniversitasMatan
 
Intro To Flex Typography 360|Flex
Intro To Flex Typography 360|FlexIntro To Flex Typography 360|Flex
Intro To Flex Typography 360|Flex
Matt Guest
 
Presentation kaushal
Presentation kaushalPresentation kaushal
Presentation kaushalAjay Yadav
 
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdfInstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
arsmobiles
 
What is html xml and xhtml
What is html xml and xhtmlWhat is html xml and xhtml
What is html xml and xhtml
FkdiMl
 
Robot framework
Robot frameworkRobot framework

Similar to RapidMiner: Word Vector Tool And Rapid Miner (20)

List of values Best Practices
List of values Best PracticesList of values Best Practices
List of values Best Practices
 
Team G
Team GTeam G
Team G
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
 
Ant conc notes
Ant conc notesAnt conc notes
Ant conc notes
 
I x scripting
I x scriptingI x scripting
I x scripting
 
 
Project seminar
Project seminarProject seminar
Project seminar
 
Ant conc ~design & development of a freeware
Ant conc ~design & development of a freewareAnt conc ~design & development of a freeware
Ant conc ~design & development of a freeware
 
20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx
 
Olap
OlapOlap
Olap
 
Programming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) EnvironmentProgramming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) Environment
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community way
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2
 
Intro To Flex Typography 360|Flex
Intro To Flex Typography 360|FlexIntro To Flex Typography 360|Flex
Intro To Flex Typography 360|Flex
 
Presentation kaushal
Presentation kaushalPresentation kaushal
Presentation kaushal
 
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdfInstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
 
What is html xml and xhtml
What is html xml and xhtmlWhat is html xml and xhtml
What is html xml and xhtml
 
PDFArticle
PDFArticlePDFArticle
PDFArticle
 
Robot framework
Robot frameworkRobot framework
Robot framework
 

More from DataminingTools Inc

Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
DataminingTools Inc
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
DataminingTools Inc
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
DataminingTools Inc
 
Areas of machine leanring
Areas of machine leanringAreas of machine leanring
Areas of machine leanring
DataminingTools Inc
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
DataminingTools Inc
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
DataminingTools Inc
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
DataminingTools Inc
 
AI: Learning in AI 2
AI: Learning in AI 2AI: Learning in AI 2
AI: Learning in AI 2
DataminingTools Inc
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
DataminingTools Inc
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
DataminingTools Inc
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
DataminingTools Inc
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
DataminingTools Inc
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
DataminingTools Inc
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
DataminingTools Inc
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
DataminingTools Inc
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
DataminingTools Inc
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
DataminingTools Inc
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
DataminingTools Inc
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
DataminingTools Inc
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
DataminingTools Inc
 

More from DataminingTools Inc (20)

Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Areas of machine leanring
Areas of machine leanringAreas of machine leanring
Areas of machine leanring
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI 2AI: Learning in AI 2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 

Recently uploaded

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

RapidMiner: Word Vector Tool And Rapid Miner

  • 1. RapidMiner5 2.9 - Word vector tool and RapidMiner
  • 2. Word Vector tool The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources. It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications .
  • 3. Installation 1. Download the archive form wvtoolsourceforge website.
  • 4. Installation 2. Putting it into lib/plugins directory of your RapidMiner installation, example: D:rogram Filesapid-IapidMiner5iblugins
  • 5. Word Vector tool The aim of the WVTool is to provide a simple to use, simple to extend pure Java library for text and webmining. It can easily be invoked from any Java application.
  • 6. Word Vector tool WVTool bridges a gap between highly sophisticated linguistic packages as the GATE system on the one side and many partial solutions that are part of diverse text and information retrieval applications on the other side.
  • 8. Word List A word list contains all terms used for vectorization together with some statistics (e.g. in how many documents a term appears). The word list is needed for vectorization to define which terms are considered as dimensions of the vector space and for weighting purposes.
  • 9. WVtool functions Input list that tells the system which text documents to process WVTool Function Inputs A configuration object, that tells the system which methods to use in the individual steps.
  • 10. Defining the input The input list tells the WVTool which texts should be processed. Every item in the list contains the following information: A URI The language the document is written in (optional) ˆ The type of the document (optional) ˆ The character encoding of the document, e.g. UTF-8 (optional) ˆ A class label
  • 11. Using Predefined Word Lists In some cases it is necessary to exactly define the dimensions of the vector space, yet leaving the counting of terms and documents to the WVTool . This can be achieved by calling the word list creation function with a list of String values.
  • 12. Text Input The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column of each term.
  • 13. Text Classification, Clustering and Visualization For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, there is a possibility to directly visualize text documents from the RapidMiner Visualization panel.
  • 14. Creating and Maintaining Word Lists Creating an Initial Word List: An initial word list can be created by using the following chain of operators:
  • 15. Creating and Maintaining Word Lists Applying a Word List: You can apply a word list in two ways: To use the actual weights, first create word vectors using the TextInput Operator and then use the AttributeWeightsLoader and AttributesWeightsApplier on the resulting ExampleSet.
  • 16. Creating and Maintaining Word Lists Applying a Word List: You can apply a word list in two ways: 2. To use the word list only as a selection of relevant terms and leave it to the TextInput to actually weight them, use the AttributeWeightsLoader before. The TextInput will create vectors that contain as dimensions only terms in the word list, that have a weight larger than zero.
  • 17. Creating and Maintaining Word Lists Updating a Word List : If you add new documents to your corpus, usually additional terms will be relevant and should be added to the word list. After the InteractiveAttributeWeighting operator pops up, use the load function to load your original word list.
  • 18. More Questions? Reach us at support@dataminingtools.net Visit: www.dataminingtools.net