AINL 2016: Kozerenko

•

0 likes•399 views

Lidia Pivovarova

The Pullenti NER Engine in the Tasks of Semantic Similarity Establishment

Science

Semantics-Oriented Linguistic
Processor (SOLP)
• Establishing text segments containing
the similar semantic units for the tasks
of analytical text processing within the
semantic technology platform.
• The methods and instruments
presented provide the discovery of
relevant content based on users'
focused interests within a certain
domain.
2
суббота, 12 ноября 16 г.

Basic Ideas
• The hybrid approach comprising
linguistic rules and example-based
learning techniques is employed.
• The Pullenti-based engine is speciﬁed,
the two-step Semantic Expansion
Algorithm is presented.
• The legal and mass media texts are
considered. 3
суббота, 12 ноября 16 г.

Pullenti Engine Features
• Pullenti is used as the engine for
selecting entities and structuring
information.
• The feature set comprises common
morphological algorithms and parsing.
Modules are focused on the
identiﬁcation of speciﬁc entities.
суббота, 12 ноября 16 г.

Linguistic Processing
• The system integrates the seed feature-
value uniﬁcation grammar which is
extensible by creating new rule-based
modules and incremental machine-
learning components extracting phrase
structures from texts under study.
суббота, 12 ноября 16 г.

Pullenti Implementation
• The system is implemented as a
software development kit (SDK) for
information systems development
dealing with unstructured data, i.e.
texts in natural languages, for use in
the systems developed in .NET
environment. For Mac and Linux
systems it is running on Mono
platform.
суббота, 12 ноября 16 г.

Software Development Kit
• The SDK is completely written in C #
and .NET, it is a set of assemblies
of .NET Framework (2.0 and above).
Third-party extensions are not used.
• If necessary, one can take the form of a
standard service for unstructured
information processing UIMA
(Unstructured Information
Management Architecture).
суббота, 12 ноября 16 г.

The Pullenti-based technology
• The Pullenti-based technology is a
novel development designed from
scratch four years ago:
• http://semantick.ru/
• http://pullenti.ru/DemoPage.aspx
• http://semantick.ru/Ner.aspx
суббота, 12 ноября 16 г.

Performance
• The work speed can be very
approximately estimated as
60.000-80.000 characters per second
on a serial computer model. The
maximum amount of text on a 32-bit
computer that can be processed at one
time is no more than 20 MB. On a 64-
bit processor the result depends on the
size of the operational memory.
суббота, 12 ноября 16 г.

Evaluation
• The Precision (P), Recall (R) and F-
measure demonstrated by Pullenti for
the NER tasks are as follows:
• P = 0.7957-0.9663;
• R = 0.7091-0.8675;
• F-measure = 0.8083-0.8570.
•
суббота, 12 ноября 16 г.

Competition Results
• The Pullenti-based engine participated
in the NER competition organized for
the Dialog 2016 Conference
(nickname Pink):
• http://www.dialog-21.ru/media/3430/
starostinaetal.pdf
суббота, 12 ноября 16 г.

Terms of Use
• The system is free for non-commercial
use and it can be downloaded. For
commercial use the SDK is shipped
without restrictions on the number of
end users and installations:
http://semantick.ru/Technology.aspx
суббота, 12 ноября 16 г.

AINL Prototyping Environment
•www.ipiranlogos.com
суббота, 12 ноября 16 г.

What's hot

"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...GeeksLab Odessa

Data wrangling week 6Ferdin Joe John Joseph PhD

mchristy-DH2014-emop-bookhistory-toolsMatt Christy

Combining Textual and Graph-based Features for Entity Disambiguationshakimov

Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfHarsh Thakkar

OpenRefine - Data Science Training for Librarianstfmorris

Scanned texts as corpora - a case study jsbien

MultiSegEfsun Kayi

Are Linked Datasets fit for Open-domain Question Answering? A Quality AssessmentHarsh Thakkar

Semanticnews 230913-finalDavid Newman

What's hot (10)

"Data mining и информационный поиск проблемы, алгоритмы, решения"_Краковецкий...

Data wrangling week 6

mchristy-DH2014-emop-bookhistory-tools

Combining Textual and Graph-based Features for Entity Disambiguation

Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf

OpenRefine - Data Science Training for Librarians

Scanned texts as corpora - a case study

MultiSeg

Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment

Semanticnews 230913-final

Viewers also liked

AINL 2016: Bodrunova, Blekanov, MaksimovLidia Pivovarova

AINL 2016: GoncharovLidia Pivovarova

AINL 2016: Romanova, NefedovLidia Pivovarova

AINL 2016: MuravyovLidia Pivovarova

AINL 2016: Alekseev, NikolenkoLidia Pivovarova

AINL 2016: Bastrakova, Ledesma, Millan, ZighedLidia Pivovarova

AINL 2016: Castro, Lopez, Cavalcante, CoutoLidia Pivovarova

AINL 2016: SkornyakovLidia Pivovarova

AINL 2016: BugaychenkoLidia Pivovarova

AINL 2016: Ustalov Lidia Pivovarova

AINL 2016: YagunovaLidia Pivovarova

AINL 2016: NikolenkoLidia Pivovarova

AINL 2016: Panicheva, LedovayaLidia Pivovarova

AINL 2016: Rykov, Nagornyy, Koltsova, Natta, Kremenets, Manovich, Cerrone, Cr...Lidia Pivovarova

AINL 2016: KravchenkoLidia Pivovarova

AINL 2016: BoldyrevaLidia Pivovarova

AINL 2016: Fenogenova, Karpov, KazorinLidia Pivovarova

AINL 2016: Galinsky, Alekseev, NikolenkoLidia Pivovarova

AINL 2016: PronchevaLidia Pivovarova

AINL 2016: EyeciogluLidia Pivovarova

Viewers also liked (20)

AINL 2016: Bodrunova, Blekanov, Maksimov

AINL 2016: Goncharov

AINL 2016: Romanova, Nefedov

AINL 2016: Muravyov

AINL 2016: Alekseev, Nikolenko

AINL 2016: Bastrakova, Ledesma, Millan, Zighed

AINL 2016: Castro, Lopez, Cavalcante, Couto

AINL 2016: Skornyakov

AINL 2016: Bugaychenko

AINL 2016: Ustalov

AINL 2016: Yagunova

AINL 2016: Nikolenko

AINL 2016: Panicheva, Ledovaya

AINL 2016: Rykov, Nagornyy, Koltsova, Natta, Kremenets, Manovich, Cerrone, Cr...

AINL 2016: Kravchenko

AINL 2016: Boldyreva

AINL 2016: Fenogenova, Karpov, Kazorin

AINL 2016: Galinsky, Alekseev, Nikolenko

AINL 2016: Proncheva

AINL 2016: Eyecioglu

Similar to AINL 2016: Kozerenko

Ontology mapping for the semantic webWorawith Sangkatip

Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta

Paper id 25201463IJRAT

Development of an intelligent information resource model based on modern na...IJECEIAES

ITAC 2016 Where Open Source Meets Audit AnalyticsAndrew Clark

SoftNews-lowresMarco Masetti

How to put an annotation in htmlSTIinnsbruck

Web Annotations – A Game Changer for Language Technology?Georg Rehm

Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki

Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos

Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems

Building Blocks for IoTBob Marcus

TanushreeHaldarTanushree Haldar

Semantic technologies for the Internet of Things PayamBarnaghi

Ontologies and Software Technologies - the MOST projectComarch

Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)Semantic Web Company

[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek PohekarIJET - International Journal of Engineering and Techniques

Applied Artificial Intelligence Unit 5 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra

A knowledge-based solution for automatic mapping in component based automat...FAST-Lab. Factory Automation Systems and Technologies Laboratory, Tampere University of Technology

A Schema-Based Approach To Modeling And Querying WWW DataLisa Garcia

Similar to AINL 2016: Kozerenko (20)

Ontology mapping for the semantic web

Research in Intelligent Systems and Data Science at the Knowledge Media Insti...

Paper id 25201463

Development of an intelligent information resource model based on modern na...

ITAC 2016 Where Open Source Meets Audit Analytics

SoftNews-lowres

How to put an annotation in html

Web Annotations – A Game Changer for Language Technology?

Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data

Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data

Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre

Building Blocks for IoT

TanushreeHaldar

Semantic technologies for the Internet of Things

Ontologies and Software Technologies - the MOST project

Ilian Uzunov (Georgi Georgiev): Ilian Uzunov (Georgi Georgiev)

[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar

Applied Artificial Intelligence Unit 5 Semester 3 MSc IT Part 2 Mumbai Univer...

A knowledge-based solution for automatic mapping in component based automat...

A Schema-Based Approach To Modeling And Querying WWW Data

Recently uploaded

(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54

Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9

Speech, hearing, noise, intelligibility.pptxpriyankatabhane

Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad

Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl

preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003

FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV

Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR

Transposable elements in prokaryotes.pptArshadWarsi13

Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003

Topic 9- General Principles of International Law.pptxJorenAcuavera1

BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh

Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde

Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju

Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY

Harmful and Useful Microorganisms Presentationtahreemzahra82

Recently uploaded (20)

(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)

Microphone- characteristics,carbon microphone, dynamic microphone.pptx

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...

Speech, hearing, noise, intelligibility.pptx

Environmental Biotechnology Topic:- Microbial Biosensor

Pests of soyabean_Binomics_IdentificationDr.UPR.pdf

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.

preservation, maintanence and improvement of industrial organism.pptx

FREE NURSING BUNDLE FOR NURSES.PDF by na

Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service

Transposable elements in prokaryotes.ppt

Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx

Topic 9- General Principles of International Law.pptx

BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx

Microteaching on terms used in filtration .Pharmaceutical Engineering

Pests of jatropha_Bionomics_identification_Dr.UPR.pdf

Behavioral Disorder: Schizophrenia & it's Case Study.pdf

Harmful and Useful Microorganisms Presentation

AINL 2016: Kozerenko

1. The Pullenti NER Engine in the Tasks of Semantic Similarity Establishment Kuznetsov K.I. (k.smith@mail.ru), Kozerenko E.B. (kozerenko@mail.ru), Romanov D.A. DRomanov@it.ru FRC “Computer Science and Control” NRU HSE AINL FRUCT 2016 суббота, 12 ноября 16 г.

2. Semantics-Oriented Linguistic Processor (SOLP) • Establishing text segments containing the similar semantic units for the tasks of analytical text processing within the semantic technology platform. • The methods and instruments presented provide the discovery of relevant content based on users' focused interests within a certain domain. 2 суббота, 12 ноября 16 г.

3. Basic Ideas • The hybrid approach comprising linguistic rules and example-based learning techniques is employed. • The Pullenti-based engine is speciﬁed, the two-step Semantic Expansion Algorithm is presented. • The legal and mass media texts are considered. 3 суббота, 12 ноября 16 г.

4. Pullenti Engine Features • Pullenti is used as the engine for selecting entities and structuring information. • The feature set comprises common morphological algorithms and parsing. Modules are focused on the identiﬁcation of speciﬁc entities. суббота, 12 ноября 16 г.

5. Linguistic Processing • The system integrates the seed feature- value uniﬁcation grammar which is extensible by creating new rule-based modules and incremental machine- learning components extracting phrase structures from texts under study. суббота, 12 ноября 16 г.

6. Pullenti Implementation • The system is implemented as a software development kit (SDK) for information systems development dealing with unstructured data, i.e. texts in natural languages, for use in the systems developed in .NET environment. For Mac and Linux systems it is running on Mono platform. суббота, 12 ноября 16 г.

7. Software Development Kit • The SDK is completely written in C # and .NET, it is a set of assemblies of .NET Framework (2.0 and above). Third-party extensions are not used. • If necessary, one can take the form of a standard service for unstructured information processing UIMA (Unstructured Information Management Architecture). суббота, 12 ноября 16 г.

8. Semantics Expansion Algorithm • The two steps of the Semantics Expansion Algorithm (SEA): • a text of a newly received application to the Constitutional Court (CC) is matched to a list of links to the profile of the applications traced by links to Normative Legal Acts (NLA) and legal norms (LN) (meaningful terms and bigrams). суббота, 12 ноября 16 г.

9. The Pullenti-based technology • The Pullenti-based technology is a novel development designed from scratch four years ago: • http://semantick.ru/ • http://pullenti.ru/DemoPage.aspx • http://semantick.ru/Ner.aspx суббота, 12 ноября 16 г.

10. Performance • The work speed can be very approximately estimated as 60.000-80.000 characters per second on a serial computer model. The maximum amount of text on a 32-bit computer that can be processed at one time is no more than 20 MB. On a 64- bit processor the result depends on the size of the operational memory. суббота, 12 ноября 16 г.

11. Evaluation • The Precision (P), Recall (R) and F- measure demonstrated by Pullenti for the NER tasks are as follows: • P = 0.7957-0.9663; • R = 0.7091-0.8675; • F-measure = 0.8083-0.8570. • суббота, 12 ноября 16 г.

12. Competition Results • The Pullenti-based engine participated in the NER competition organized for the Dialog 2016 Conference (nickname Pink): • http://www.dialog-21.ru/media/3430/ starostinaetal.pdf суббота, 12 ноября 16 г.

13. Terms of Use • The system is free for non-commercial use and it can be downloaded. For commercial use the SDK is shipped without restrictions on the number of end users and installations: http://semantick.ru/Technology.aspx суббота, 12 ноября 16 г.

14. AINL Prototyping Environment •www.ipiranlogos.com суббота, 12 ноября 16 г.

15. • Thank you! суббота, 12 ноября 16 г.

AINL 2016: Kozerenko

Recommended

Recommended

More Related Content

What's hot

What's hot (10)

Viewers also liked

Viewers also liked (20)

Similar to AINL 2016: Kozerenko

Similar to AINL 2016: Kozerenko (20)

More from Lidia Pivovarova

More from Lidia Pivovarova (13)

Recently uploaded

Recently uploaded (20)

AINL 2016: Kozerenko