SlideShare a Scribd company logo
Linguistic Component: Sentiment
Analyzer for the Russian language
Technical description
SemanticAnalyzer Group, 2013-08-30
www.semanticanalyzer.info
This document describes technical details of sentiment analyzer for the Russian language. The
component has several modes of operation:
 Processing of generic texts: news, technical articles etc
 Processing of Twitter messages
 Processing of above two types of texts for generic background sentiment
 Processing of above two types of texts for a set of multi-word synonyms representing a target
object
The sentiment analyzer is based on two other linguistic components: tokenizer and lemmatizer (see
their respective Technical descriptions). Beside attributing to one of three classes {NEGATIVE, NEUTRAL,
POSITIVE} the analyzer is capable of analyzing objectivity / subjectivity of an input message.
Demo package sent upon request contains the following:
 Java library of sentiment analyzer in a form of a binary
 Polarity dictionaries
 run_sentiment_engine.sh script for swift checking the functionality of the module
 messages_to_detect_sentiment.txt file containing examples of generic text and tweets for
sentiment attribution using the run_sentiment_engine.sh script
The algorithm is based on a set of rules, that compactly model flow of sentiment within an input
message. The synonym matching can be strong and fuzzy (accomodating misspellings of an object name
in a text).
Speed of processing
Server: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz
Operating system: ubuntu 10.04, Java 1.7.0_21 64 bit server
480 characters/ms
70 tokens/ms
Tests were conducted in a single thread on 63 511 tweet messages with 2 527 227 words and 17 350 258
characters. Total time of execution: 36170 ms.
Format of the messages_to_detect_sentiment.txt file
This file describes input data for the sentiment analyzer for demo purposes.
Format:
Text
OR
TexttKeyword comma separated list
Text contains textual data in Russian for detecting sentiment
t – tab symbol
Keyword comma separated list is a list of object synonyms to detect sentiment against.
Examples of detecting sentiment
The run_sentiment_engine.sh script will generate the following file: messages_to_detect_sentiment.out.
For the following input file messages_to_detect_sentiment.txt:
Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone
(sentence: ”I liked new iPhone, but GalaxyS is unhandy” with the object described with the keyword
”iPhone”)
This output gets generated:
Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone [iphone] POSITIVE
For the following input file messages_to_detect_sentiment.txt:
Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS
(same sentence, but with the object described with the keyword ”GalaxyS”)
This output gets generated:
Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS [galaxys]
NEGATIVE
Examples of using the library from the Java code
public void testdetectPolarityOfText() throws Exception {
SentimentEngine sentimentEngine = new SentimentEngine(new
File("conf/sentiment-module.properties"));
sentimentEngine.setVerbose(true);
// variants of the same brand McCafe in Russian tweets
String synonyms[] = {"МсCafe", "maccafe", "маккафе", ""мак кафе"",
"маккафэ", ""мак кафэ""};
List<List<String>> synonymsList = new ArrayList<List<String>>();
for(String synonym: synonyms) {
List<String> curSynonym = new ArrayList<String>();
curSynonym.add(synonym);
synonymsList.add(curSynonym);
}
// tweet message: ”We were in McCafe today! Unbelievable tasty cakes,
but damn, they are so big!!”
SynonymSentiment synonymSentiment =
sentimentEngine.detectPolarityOfTextForSynonyms("ох сегодня
были в МакКафе! безумно вкусные пирожные, но блии н они ж гиганские!!",
synonymsList);
assertEquals(true, synonymSentiment.isSynonymFound());
assertEquals(Enumerations.Sentiment.POSITIVE,
synonymSentiment.getSentimentTag());
}
This test case should pass, i.e. the detected sentiment for a set of object synonyms is going to be
POSITIVE.

More Related Content

Viewers also liked

Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeup
Dmitry Kan
 
Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsDmitry Kan
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian language
Dmitry Kan
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine Translation
Dmitry Kan
 
Introduction To Machine Translation
Introduction To Machine TranslationIntroduction To Machine Translation
Introduction To Machine TranslationDmitry Kan
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
Dmitry Kan
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Dmitry Kan
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Dmitry Kan
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slides
Dmitry Kan
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian language
Dmitry Kan
 
Semantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use casesSemantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use cases
Dmitry Kan
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source state
Dmitry Kan
 

Viewers also liked (12)

Lucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeupLucene revolution eu 2013 dublin writeup
Lucene revolution eu 2013 dublin writeup
 
Solr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwordsSolr onfitnesse learningfromberlinbuzzwords
Solr onfitnesse learningfromberlinbuzzwords
 
Linguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian languageLinguistic component Lemmatizer for the Russian language
Linguistic component Lemmatizer for the Russian language
 
MTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine TranslationMTEngine: Semantic-level Crowdsourced Machine Translation
MTEngine: Semantic-level Crowdsourced Machine Translation
 
Introduction To Machine Translation
Introduction To Machine TranslationIntroduction To Machine Translation
Introduction To Machine Translation
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 
Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011Rule based approach to sentiment analysis at ROMIP 2011
Rule based approach to sentiment analysis at ROMIP 2011
 
Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...Poster: Method for an automatic generation of a semantic-level contextual tra...
Poster: Method for an automatic generation of a semantic-level contextual tra...
 
Rule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slidesRule based approach to sentiment analysis at romip’11 slides
Rule based approach to sentiment analysis at romip’11 slides
 
Linguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian languageLinguistic component Tokenizer for the Russian language
Linguistic component Tokenizer for the Russian language
 
Semantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use casesSemantic Analysis: theory, applications and use cases
Semantic Analysis: theory, applications and use cases
 
IR: Open source state
IR: Open source stateIR: Open source state
IR: Open source state
 

Similar to Linguistic component Sentiment Analyzer for the Russian language

Using Static Analysis in Program Development
Using Static Analysis in Program DevelopmentUsing Static Analysis in Program Development
Using Static Analysis in Program Development
PVS-Studio
 
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Data Science Society
 
Elasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdfElasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdf
Inexture Solutions
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingNagarjun Kotyada
 
Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...
cscpconf
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA Introduction
Tommaso Teofili
 
Working with Dictionaries and ListsSets Modules you can use.pdf
Working with Dictionaries and ListsSets Modules you can use.pdfWorking with Dictionaries and ListsSets Modules you can use.pdf
Working with Dictionaries and ListsSets Modules you can use.pdf
advancesystem
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
ijnlc
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
kevig
 
Oles Petriv “Creating one concept embedding space for persons, brands and new...
Oles Petriv “Creating one concept embedding space for persons, brands and new...Oles Petriv “Creating one concept embedding space for persons, brands and new...
Oles Petriv “Creating one concept embedding space for persons, brands and new...
Lviv Startup Club
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docx
herthaweston
 
Language Search
Language SearchLanguage Search
Language Search
Bryan Warner
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaSrikanth Vanama
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
Prof Ansari
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
Dinesh V
 
Userguide xmllistboxlite
Userguide xmllistboxliteUserguide xmllistboxlite
Userguide xmllistboxliteSamir Dash
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
PVS-Studio
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
Andrey Karpov
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
PVS-Studio
 

Similar to Linguistic component Sentiment Analyzer for the Russian language (20)

Using Static Analysis in Program Development
Using Static Analysis in Program DevelopmentUsing Static Analysis in Program Development
Using Static Analysis in Program Development
 
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
Coreference Extraction from Identric’s Documents - Solution of Datathon 2018
 
Elasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdfElasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdf
 
Poster (2)
Poster (2)Poster (2)
Poster (2)
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R Programming
 
Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...Creation of a Test Bed Environment for Core Java Applications using White Box...
Creation of a Test Bed Environment for Core Java Applications using White Box...
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA Introduction
 
Working with Dictionaries and ListsSets Modules you can use.pdf
Working with Dictionaries and ListsSets Modules you can use.pdfWorking with Dictionaries and ListsSets Modules you can use.pdf
Working with Dictionaries and ListsSets Modules you can use.pdf
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
Oles Petriv “Creating one concept embedding space for persons, brands and new...
Oles Petriv “Creating one concept embedding space for persons, brands and new...Oles Petriv “Creating one concept embedding space for persons, brands and new...
Oles Petriv “Creating one concept embedding space for persons, brands and new...
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docx
 
Language Search
Language SearchLanguage Search
Language Search
 
Compiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_VanamaCompiler_Project_Srikanth_Vanama
Compiler_Project_Srikanth_Vanama
 
Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Userguide xmllistboxlite
Userguide xmllistboxliteUserguide xmllistboxlite
Userguide xmllistboxlite
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
 
Difficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usabilityDifficulties of comparing code analyzers, or don't forget about usability
Difficulties of comparing code analyzers, or don't forget about usability
 

More from Dmitry Kan

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
Dmitry Kan
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
Dmitry Kan
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Dmitry Kan
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social media
Dmitry Kan
 
Icsoft 2011 51_cr
Icsoft 2011 51_crIcsoft 2011 51_cr
Icsoft 2011 51_cr
Dmitry Kan
 
Computer Semantics And Machine Translation
Computer Semantics And Machine TranslationComputer Semantics And Machine Translation
Computer Semantics And Machine Translation
Dmitry Kan
 

More from Dmitry Kan (6)

London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
Haystack LIVE! - 5 ways to increase result diversity at web-scale - Dmitry Ka...
 
SentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social mediaSentiScan: система автоматической разметки тональности в social media
SentiScan: система автоматической разметки тональности в social media
 
Icsoft 2011 51_cr
Icsoft 2011 51_crIcsoft 2011 51_cr
Icsoft 2011 51_cr
 
Computer Semantics And Machine Translation
Computer Semantics And Machine TranslationComputer Semantics And Machine Translation
Computer Semantics And Machine Translation
 

Recently uploaded

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 

Recently uploaded (20)

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 

Linguistic component Sentiment Analyzer for the Russian language

  • 1. Linguistic Component: Sentiment Analyzer for the Russian language Technical description SemanticAnalyzer Group, 2013-08-30 www.semanticanalyzer.info This document describes technical details of sentiment analyzer for the Russian language. The component has several modes of operation:  Processing of generic texts: news, technical articles etc  Processing of Twitter messages  Processing of above two types of texts for generic background sentiment  Processing of above two types of texts for a set of multi-word synonyms representing a target object The sentiment analyzer is based on two other linguistic components: tokenizer and lemmatizer (see their respective Technical descriptions). Beside attributing to one of three classes {NEGATIVE, NEUTRAL, POSITIVE} the analyzer is capable of analyzing objectivity / subjectivity of an input message. Demo package sent upon request contains the following:  Java library of sentiment analyzer in a form of a binary  Polarity dictionaries  run_sentiment_engine.sh script for swift checking the functionality of the module  messages_to_detect_sentiment.txt file containing examples of generic text and tweets for sentiment attribution using the run_sentiment_engine.sh script The algorithm is based on a set of rules, that compactly model flow of sentiment within an input message. The synonym matching can be strong and fuzzy (accomodating misspellings of an object name in a text). Speed of processing Server: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz Operating system: ubuntu 10.04, Java 1.7.0_21 64 bit server 480 characters/ms 70 tokens/ms Tests were conducted in a single thread on 63 511 tweet messages with 2 527 227 words and 17 350 258 characters. Total time of execution: 36170 ms. Format of the messages_to_detect_sentiment.txt file This file describes input data for the sentiment analyzer for demo purposes. Format: Text OR
  • 2. TexttKeyword comma separated list Text contains textual data in Russian for detecting sentiment t – tab symbol Keyword comma separated list is a list of object synonyms to detect sentiment against. Examples of detecting sentiment The run_sentiment_engine.sh script will generate the following file: messages_to_detect_sentiment.out. For the following input file messages_to_detect_sentiment.txt: Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone (sentence: ”I liked new iPhone, but GalaxyS is unhandy” with the object described with the keyword ”iPhone”) This output gets generated: Мне понравился новый iPhone, но вот GalaxyS неудобный. iPhone [iphone] POSITIVE For the following input file messages_to_detect_sentiment.txt: Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS (same sentence, but with the object described with the keyword ”GalaxyS”) This output gets generated: Мне понравился новый iPhone, но вот GalaxyS неудобный. GalaxyS [galaxys] NEGATIVE Examples of using the library from the Java code public void testdetectPolarityOfText() throws Exception { SentimentEngine sentimentEngine = new SentimentEngine(new File("conf/sentiment-module.properties")); sentimentEngine.setVerbose(true); // variants of the same brand McCafe in Russian tweets String synonyms[] = {"МсCafe", "maccafe", "маккафе", ""мак кафе"", "маккафэ", ""мак кафэ""}; List<List<String>> synonymsList = new ArrayList<List<String>>(); for(String synonym: synonyms) { List<String> curSynonym = new ArrayList<String>(); curSynonym.add(synonym); synonymsList.add(curSynonym); } // tweet message: ”We were in McCafe today! Unbelievable tasty cakes, but damn, they are so big!!” SynonymSentiment synonymSentiment =
  • 3. sentimentEngine.detectPolarityOfTextForSynonyms("ох сегодня были в МакКафе! безумно вкусные пирожные, но блии н они ж гиганские!!", synonymsList); assertEquals(true, synonymSentiment.isSynonymFound()); assertEquals(Enumerations.Sentiment.POSITIVE, synonymSentiment.getSentimentTag()); } This test case should pass, i.e. the detected sentiment for a set of object synonyms is going to be POSITIVE.