The document describes an information extraction system that analyzes news in Russian and extracts structured information. It discusses expanding an existing information extraction system developed for English texts to also process Russian language texts. The system uses existing natural language processing tools adapted for Russian, including a morphological analyzer and syntactic parser, along with domain ontologies and extraction patterns to analyze documents and extract relevant information into database records.
This document contains a list of website URLs and attribution credits. It acknowledges stock image sources like sxc.hu and pixelio.de, as well as individual photographers like Peter Rheinäcker and Jeff Watkins. Classroom websites and tools bitstrips.com and classroomrevolution.com are also referenced.
This document is an invitation to journey inward toward spiritual wholeness and outward to serve others. It discusses the Christian tradition of pilgrimage as a journey of faith. At the center of God's vision is the restoration of all creation to harmony and relationship. As disciples of Christ, we are called to lay down self-centered lives and journey toward God's presence through spiritual transformation and acts of service.
The document summarizes food safety issues in the United States. It states that there are 75 million cases of foodborne illness annually, resulting in 325,000 hospitalizations and 5,000 deaths. Major outbreaks linked to produce have occurred in the last 15 months, costing hundreds of millions. Most Americans believe the food supply is less safe than in the past and are not confident in the government's ability to respond to attacks on the food system. Consumers universally want assurances of food safety.
This document discusses using the VLOOKUP function in Excel to look up a student's grade based on their percentage score. It provides a table that lists percentage ranges and the corresponding letter grades. It explains that a VLOOKUP formula can be used to easily look up and return the correct grade by matching the student's percentage to the table, rather than using nested IF functions. This allows the grades to be more easily edited or modified in the future by changing the source table.
Saint Valentine's Day is celebrated annually on February 14th and commemorates love and romance. The origins of the holiday can be traced back to ancient Roman festivals. In the Middle Ages, Valentine's Day traditions developed in England, such as exchanging love notes and putting women's names in a jar to be drawn by bachelors. Today, people celebrate by sending cards, gifts, and flowers to loved ones as well as participating in other Valentine's Day customs.
This document contains a list of website URLs and attribution credits. It acknowledges stock image sources like sxc.hu and pixelio.de, as well as individual photographers like Peter Rheinäcker and Jeff Watkins. Classroom websites and tools bitstrips.com and classroomrevolution.com are also referenced.
This document is an invitation to journey inward toward spiritual wholeness and outward to serve others. It discusses the Christian tradition of pilgrimage as a journey of faith. At the center of God's vision is the restoration of all creation to harmony and relationship. As disciples of Christ, we are called to lay down self-centered lives and journey toward God's presence through spiritual transformation and acts of service.
The document summarizes food safety issues in the United States. It states that there are 75 million cases of foodborne illness annually, resulting in 325,000 hospitalizations and 5,000 deaths. Major outbreaks linked to produce have occurred in the last 15 months, costing hundreds of millions. Most Americans believe the food supply is less safe than in the past and are not confident in the government's ability to respond to attacks on the food system. Consumers universally want assurances of food safety.
This document discusses using the VLOOKUP function in Excel to look up a student's grade based on their percentage score. It provides a table that lists percentage ranges and the corresponding letter grades. It explains that a VLOOKUP formula can be used to easily look up and return the correct grade by matching the student's percentage to the table, rather than using nested IF functions. This allows the grades to be more easily edited or modified in the future by changing the source table.
Saint Valentine's Day is celebrated annually on February 14th and commemorates love and romance. The origins of the holiday can be traced back to ancient Roman festivals. In the Middle Ages, Valentine's Day traditions developed in England, such as exchanging love notes and putting women's names in a jar to be drawn by bachelors. Today, people celebrate by sending cards, gifts, and flowers to loved ones as well as participating in other Valentine's Day customs.
Freecards, also known as YOCards, are picture postcards distributed through public racks that feature commercial, ideological, or cultural messages. They are taken voluntarily by consumers and kept more often than traditional advertisements. YOCards have proven effective at raising brand awareness and engagement, with campaigns receiving thousands of website visits and high pass-along rates to friends. Common types of postcards that draw people include fashion, events, and movies.
A highlight of 10 2.0 sites for those involved in knowledge management to visit and reflect on. Keeping it to 10 was very challenging, but so is the pressure of time....
The document provides a high-level overview of how data moves through a computer system from input to output. It describes the input, storage, processing, and output stages. Specifically, it mentions that data enters through devices like keyboards and mice, is stored temporarily in memory, is processed by fetching, decoding and executing instructions in the microprocessor, and is finally displayed or printed at the output stage. It also provides some computer facts from 1988-1989, such as the introduction of liquid crystal displays, copyright issues between Apple and Microsoft, and the release of early processors.
This document discusses Pay Per Click (PPC) Judo, which aims to lower cost per conversion while maintaining or increasing total conversions. It outlines essential PPC metrics like impressions, clicks, and conversions. The main PPC Judo moves are improving clickthrough rate through keyword and ad copy management and improving conversion rate through ad copy and landing page management. The key is to test changes using A/B testing and measure metrics in a continuous cycle of management.
This document discusses interactive installation art and the use of software engineering in such artworks. It provides examples of interactive art installations that utilize digital technology and software to control sensors, incorporate environmental parameters into music/light, maintain archives, and present the artwork online. Developing the software for these complex interactive artworks requires involving programmers and addressing software engineering issues like requirements, architecture, validation, and tools. The document also gives simple code examples in Processing and discusses how software engineering can be useful in education, fostering innovation, and tailoring existing SE theories and methods to the needs of the art domain.
Oolinnguaq and Knud Peter are from Illorsuit, Greenland. They attend school in Uummannaq and enjoy listening to music by Enok Poulsen. Their favorite film is about Greenlandic Inuit wrestling.
eVize 2007 - Atestace informačních systémů veřejné správyEquica
Prezentace společnosti Equica (www.equica.cz) věnující se atestacím informačních systémů veřejné správy (ISVS) a atestačním postupům přednesená v rámci konference eVize konané v rámci veletrhu Invex 2007.
My name is Larsine and I was born in Ikerasak, where I currently live, though I go to school in Uummannaq. I enjoy watching the TV show Friends and listening to music by Celine Dion and Shania Twain, and I have friends.
Aviaq was born in Uummannaq, Greenland and still lives there today. She is in 10th grade and enjoys football, ice cream, dogs, and comedy and drama movies and music. Her favorite singer is Rihanna. Historically, her people were called Inuit but Europeans called them Eskimos, meaning "people who eat raw meat." Now they call themselves Kalaallit and no longer eat raw meat, though skin clothing remains part of their traditional attire worn for celebrations.
The document discusses failures that can occur in projects and provides suggestions to prevent and recover from them. It suggests planning for failures by documenting work, testing implementations, verifying assumptions, considering different perspectives, and conducting post-mortems. Tools like wikis, test frameworks, staging environments, and collaboration with others can help address failures in documentation, testing, verification, imagination, and implementation. Conducting reflections and learning from mistakes is important for improving future work.
The document summarizes the backend systems and processes that power the new EBI search engine EB-eye. It describes the large amounts and various formats of data being indexed, the parsing and indexing of different data formats using various tools, and the distributed indexing approach across multiple servers that allows indexing to be completed in under 18 hours. It also provides an overview of the web frontend and load balancing, as well as future plans for automatic updates and verifications.
Examining gene expression and methylation with next gen sequencingStephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
Zotero Presentation for UNO History Dept.karindalziel
This document introduces Zotero, an open-source citation management software. It can be used to organize research, cite sources, and share citations in papers. Zotero stores citation items online or on the user's computer. It supports many reference styles and can export citations in several formats including APA, MLA, and Chicago styles. Zotero integrates with word processors to allow citations to be automatically inserted into papers.
CONFidence 2014: Davi Ottenheimer Protecting big data at scalePROIDEA
We are meant to measure and manage data with more precision than ever before using Big Data. But companies are getting Hadoopy often with little or no consideration of security. Are we taking on too much risk too fast? This session explains how best to handle the looming Big Data risk in any environment. Better predictions and more intelligent decisions are expected from our biggest data sets, yet do we really trust systems we secure the least? And do we really know why "learning" machines continue to make amusing and sometimes tragic mistakes? Infosec is in this game but with Big Data we appear to be waiting on the sidelines. What have we done about emerging vulnerabilities and threats to Hadoop as it leaves many of our traditional data paradigms behind? This presentation, based on the new book "Realities of Big Data Security" takes the audience through an overview of the hardest big data protection problem areas ahead and into our best solutions for the elephantine challenges here today.
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
This presentation demonstrates the functionality provided by the Logical Model Designer (LMD) and Snow Owl tools, which enables terminology to be bound to the Singapore Logical Information Model.
Abstract:
A critical enabler in the journey towards semantic interoperability in Singapore is the Singapore "˜Logical Information Model' (LIM). The LIM is a model of the healthcare information shared within Singapore, and is defined as a set of reusable "˜archetypes' for each clinical concept (e.g. Problem/Diagnosis, Pharmacy Order). These archetypes are then constrained and composed into "˜templates' to support specific use cases.
The Singapore LIM harmonises the semantics of the information structures with the terminology, using multiple types of terminology bindings, including semantic, value domain and constraint bindings. Value domain bindings are defined to both national "˜reference terminology' (used for querying nationally-collated data), as well as to a variety of "˜interface terminologies' used within local clinical systems (required to enforce conformance-compliance rules over message specifications generated from the LIM). To support the diversity of pre-coordination captured in local interface terms, "˜design patterns' are included in the LIM, based on the SNOMED CT concept model. These design patterns represent a logical model of meaning for a specific concept, and allow more than one split between the information model and the terminology model to be represented in a semantically-consistent manner.
This presentation will demonstrate the "˜Logical Model Designer' (LMD) - an Eclipse-based tool that is being used to maintain Singapore's Logical Information Model. A number of features of the LMD tooling will be demonstrated, with a specific focus on how the information structure is bound to the terminology via an interface to the Snow Owl platform. Value Domains are defined as reference sets within Snow Owl and then linked to the information structures defined in the LMD.
Please see our website http://b2i.sg for further information.
SHELDON is the first true hybridization of NLP machine reading and Semantic Web. It is a framework that builds upon a ma- chine reader for extracting RDF graphs from text so that the output is compliant to Semantic Web and Linked Data patterns. It extends the current human-readable web by using Semantic Web practices and technologies in a machine-processable form. Given a sentence in any language, it provides different semantic functionalities (frame detection, topic extraction, named entity recognition, resolution and coreference, terminology extraction, sense tagging and disambiguation, taxonomy induction, semantic role labeling, type induction, sentiment analysis, citation inference, relation and event extraction) as well as nice visualization tools which make use of the JavaScript infoVis Toolkit and RelFinder, as well as a knowledge enrichment component that extends machine reading to Semantic Web data. The system can be freely used at http://wit.istc.cnr.it/stlab-tools/sheldon.
Classification and clustering in media monitoring: from knowledge engineering...Lidia Pivovarova
This PhD thesis examines classification and clustering techniques for media monitoring, including news grouping, multi-label text classification, and business polarity detection. It focuses on applying these methods to the PULS media monitoring system, which collects over 10,000 news articles daily. The thesis contributes novel algorithms and datasets for grouping news into stories based on named entity salience, large-scale multi-label text classification balancing training sets, and the first dataset and methods for entity-level business polarity detection.
Freecards, also known as YOCards, are picture postcards distributed through public racks that feature commercial, ideological, or cultural messages. They are taken voluntarily by consumers and kept more often than traditional advertisements. YOCards have proven effective at raising brand awareness and engagement, with campaigns receiving thousands of website visits and high pass-along rates to friends. Common types of postcards that draw people include fashion, events, and movies.
A highlight of 10 2.0 sites for those involved in knowledge management to visit and reflect on. Keeping it to 10 was very challenging, but so is the pressure of time....
The document provides a high-level overview of how data moves through a computer system from input to output. It describes the input, storage, processing, and output stages. Specifically, it mentions that data enters through devices like keyboards and mice, is stored temporarily in memory, is processed by fetching, decoding and executing instructions in the microprocessor, and is finally displayed or printed at the output stage. It also provides some computer facts from 1988-1989, such as the introduction of liquid crystal displays, copyright issues between Apple and Microsoft, and the release of early processors.
This document discusses Pay Per Click (PPC) Judo, which aims to lower cost per conversion while maintaining or increasing total conversions. It outlines essential PPC metrics like impressions, clicks, and conversions. The main PPC Judo moves are improving clickthrough rate through keyword and ad copy management and improving conversion rate through ad copy and landing page management. The key is to test changes using A/B testing and measure metrics in a continuous cycle of management.
This document discusses interactive installation art and the use of software engineering in such artworks. It provides examples of interactive art installations that utilize digital technology and software to control sensors, incorporate environmental parameters into music/light, maintain archives, and present the artwork online. Developing the software for these complex interactive artworks requires involving programmers and addressing software engineering issues like requirements, architecture, validation, and tools. The document also gives simple code examples in Processing and discusses how software engineering can be useful in education, fostering innovation, and tailoring existing SE theories and methods to the needs of the art domain.
Oolinnguaq and Knud Peter are from Illorsuit, Greenland. They attend school in Uummannaq and enjoy listening to music by Enok Poulsen. Their favorite film is about Greenlandic Inuit wrestling.
eVize 2007 - Atestace informačních systémů veřejné správyEquica
Prezentace společnosti Equica (www.equica.cz) věnující se atestacím informačních systémů veřejné správy (ISVS) a atestačním postupům přednesená v rámci konference eVize konané v rámci veletrhu Invex 2007.
My name is Larsine and I was born in Ikerasak, where I currently live, though I go to school in Uummannaq. I enjoy watching the TV show Friends and listening to music by Celine Dion and Shania Twain, and I have friends.
Aviaq was born in Uummannaq, Greenland and still lives there today. She is in 10th grade and enjoys football, ice cream, dogs, and comedy and drama movies and music. Her favorite singer is Rihanna. Historically, her people were called Inuit but Europeans called them Eskimos, meaning "people who eat raw meat." Now they call themselves Kalaallit and no longer eat raw meat, though skin clothing remains part of their traditional attire worn for celebrations.
The document discusses failures that can occur in projects and provides suggestions to prevent and recover from them. It suggests planning for failures by documenting work, testing implementations, verifying assumptions, considering different perspectives, and conducting post-mortems. Tools like wikis, test frameworks, staging environments, and collaboration with others can help address failures in documentation, testing, verification, imagination, and implementation. Conducting reflections and learning from mistakes is important for improving future work.
The document summarizes the backend systems and processes that power the new EBI search engine EB-eye. It describes the large amounts and various formats of data being indexed, the parsing and indexing of different data formats using various tools, and the distributed indexing approach across multiple servers that allows indexing to be completed in under 18 hours. It also provides an overview of the web frontend and load balancing, as well as future plans for automatic updates and verifications.
Examining gene expression and methylation with next gen sequencingStephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
Zotero Presentation for UNO History Dept.karindalziel
This document introduces Zotero, an open-source citation management software. It can be used to organize research, cite sources, and share citations in papers. Zotero stores citation items online or on the user's computer. It supports many reference styles and can export citations in several formats including APA, MLA, and Chicago styles. Zotero integrates with word processors to allow citations to be automatically inserted into papers.
CONFidence 2014: Davi Ottenheimer Protecting big data at scalePROIDEA
We are meant to measure and manage data with more precision than ever before using Big Data. But companies are getting Hadoopy often with little or no consideration of security. Are we taking on too much risk too fast? This session explains how best to handle the looming Big Data risk in any environment. Better predictions and more intelligent decisions are expected from our biggest data sets, yet do we really trust systems we secure the least? And do we really know why "learning" machines continue to make amusing and sometimes tragic mistakes? Infosec is in this game but with Big Data we appear to be waiting on the sidelines. What have we done about emerging vulnerabilities and threats to Hadoop as it leaves many of our traditional data paradigms behind? This presentation, based on the new book "Realities of Big Data Security" takes the audience through an overview of the hardest big data protection problem areas ahead and into our best solutions for the elephantine challenges here today.
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
This presentation demonstrates the functionality provided by the Logical Model Designer (LMD) and Snow Owl tools, which enables terminology to be bound to the Singapore Logical Information Model.
Abstract:
A critical enabler in the journey towards semantic interoperability in Singapore is the Singapore "˜Logical Information Model' (LIM). The LIM is a model of the healthcare information shared within Singapore, and is defined as a set of reusable "˜archetypes' for each clinical concept (e.g. Problem/Diagnosis, Pharmacy Order). These archetypes are then constrained and composed into "˜templates' to support specific use cases.
The Singapore LIM harmonises the semantics of the information structures with the terminology, using multiple types of terminology bindings, including semantic, value domain and constraint bindings. Value domain bindings are defined to both national "˜reference terminology' (used for querying nationally-collated data), as well as to a variety of "˜interface terminologies' used within local clinical systems (required to enforce conformance-compliance rules over message specifications generated from the LIM). To support the diversity of pre-coordination captured in local interface terms, "˜design patterns' are included in the LIM, based on the SNOMED CT concept model. These design patterns represent a logical model of meaning for a specific concept, and allow more than one split between the information model and the terminology model to be represented in a semantically-consistent manner.
This presentation will demonstrate the "˜Logical Model Designer' (LMD) - an Eclipse-based tool that is being used to maintain Singapore's Logical Information Model. A number of features of the LMD tooling will be demonstrated, with a specific focus on how the information structure is bound to the terminology via an interface to the Snow Owl platform. Value Domains are defined as reference sets within Snow Owl and then linked to the information structures defined in the LMD.
Please see our website http://b2i.sg for further information.
SHELDON is the first true hybridization of NLP machine reading and Semantic Web. It is a framework that builds upon a ma- chine reader for extracting RDF graphs from text so that the output is compliant to Semantic Web and Linked Data patterns. It extends the current human-readable web by using Semantic Web practices and technologies in a machine-processable form. Given a sentence in any language, it provides different semantic functionalities (frame detection, topic extraction, named entity recognition, resolution and coreference, terminology extraction, sense tagging and disambiguation, taxonomy induction, semantic role labeling, type induction, sentiment analysis, citation inference, relation and event extraction) as well as nice visualization tools which make use of the JavaScript infoVis Toolkit and RelFinder, as well as a knowledge enrichment component that extends machine reading to Semantic Web data. The system can be freely used at http://wit.istc.cnr.it/stlab-tools/sheldon.
Similar to Expansion of Information Extraction System to the Russian language (6)
Classification and clustering in media monitoring: from knowledge engineering...Lidia Pivovarova
This PhD thesis examines classification and clustering techniques for media monitoring, including news grouping, multi-label text classification, and business polarity detection. It focuses on applying these methods to the PULS media monitoring system, which collects over 10,000 news articles daily. The thesis contributes novel algorithms and datasets for grouping news into stories based on named entity salience, large-scale multi-label text classification balancing training sets, and the first dataset and methods for entity-level business polarity detection.
The document describes a Russian paraphrase corpus created by the authors. It contains over 8000 sentence pairs annotated as precise, loose, or non-paraphrases using crowdsourcing. The corpus was collected from news headlines and aims to capture the most important events. The authors evaluate different models for classifying sentence pairs and find that combining linguistic features improves performance over individual feature types. Graphs built from the corpus can reveal connected events more completely than human annotations alone.
This document discusses the work of Antiplagiat Research, which tackles challenging natural language processing and plagiarism detection problems. It outlines their focus on cross-language plagiarism detection, machine-generated text detection, and intrinsic plagiarism detection. It also describes Antiplagiat Research's collaboration opportunities and their participation in evaluating plagiarism detection algorithms through workshops like Dialogue Evaluation.
This document summarizes a study that analyzed 47,410 Instagram images from Saint Petersburg over one year to understand human experience in different urban areas. The images were clustered using Google tags and user hashtags into topics like portraits, cars, flowers. The clusters were mapped geographically to see their spatial distribution. Clusters like hairstyle and animals were evenly distributed, while clothing, fitness and architecture were more detached, indicating urban segregation. The combination of semantic and geospatial analysis of social media images provided new insights into urban life not previously available from traditional data sources.
The document discusses the Pullenti NER Engine and its use in semantic similarity tasks. It presents the Semantics-Oriented Linguistic Processor (SOLP) which establishes text segments containing similar semantic units. It then describes the hybrid linguistic and machine learning approach used by the Pullenti-based engine, including the two-step Semantic Expansion Algorithm. Performance figures and evaluation metrics for Pullenti's named entity recognition are also provided.
The document discusses the reliability of results from corpus research and introduces a solution called GICR that provides automatic result analysis. GICR allows users to see statistics on search areas to check for bias or lack of homogeneity compared to the entire corpus by displaying metadata attributes like URLs, document IDs, author information, region, gender, and genre. It aims to address the problem that simply getting IPM and KWIC search results does not indicate if the results are biased by providing analysis directly in the interface.
This document discusses methods for estimating a user's actual age and gender when those values are not directly provided. It outlines using social graph analysis, natural language processing, analyzing user interests, and statistical methods. For social graph analysis, it examines using connections like classmates to infer age and analyzing local graph properties. NLP looks at gender-specific language in user profiles while interest analysis matches users to gender-biased communities. Statistics applies overall patterns in the data to make estimations.
This document presents mathematical models of information dissemination and warfare. It discusses:
1) Models of information spreading through both vertical (centralized) and horizontal (interpersonal) flows, and how the combination of these determines information dynamics in society.
2) Models of information adoption and forgetting over time, and the effects of incomplete media coverage and two-step perception.
3) Models of information warfare between two information sources, examining the necessary conditions for one to win over the other.
4) Extensions of these models including periodic destabilization, additional factors like forgetting, and a model of individual choice-making during information warfare.
This document discusses the analysis and modeling of complex systems. It describes analyzing the problem, modeling the system, and determining both quantitative and qualitative parameters. An example is given of assigning weights to different quantitative parameters. The document recommends creating a coordinate system and basis to define qualitative parameters. It formulates the final task as creating a concept for a basis of a quality parameter system. It seeks colleagues to partner with on further developing these analysis methods.
This document discusses trend detection at OK. It describes the multi-step process used: text extraction from logs, language detection, tokenization, dictionary extraction, vectorization, deduplication, statistics calculation, trend identification, clustering of trending terms, extraction of relevant documents, and visualization of trends. Both batch and streaming approaches are discussed to address the need for timely trend detection. Technologies used include Apache Kafka, YARN, Spark, Samza, Lucene and ELKI.
1. The researcher analyzed quantitative characteristics such as entropy, readability, lexical diversity, frequencies of words, and parts of speech for different text genres including scientific texts, news articles, and student writings.
2. The analysis found that student writings had higher entropy and readability than news articles or scientific texts. News articles had higher lexical diversity and frequencies of common words.
3. To evaluate the accuracy of a developed Old Irish lemmatizer, the researcher applied it to a test corpus of 840 tokens, of which 186 were unknown words. The lemmatizer correctly predicted lemmas for 84 of the unknown words, achieving an accuracy of around 60% for unknown words.
This document discusses methods for evaluating clustering validity indices (CVIs) that measure the quality of clustering results. It proposes using human assessments of clustered data as ground truth to evaluate how well different CVIs match human judgments. An experimental evaluation of 19 CVIs on 41 datasets clustered using 6 algorithms showed that none of the CVIs perfectly matched human assessments. The document concludes that while no universal CVI exists, meta-learning from past human assessments could help select the most appropriate CVI for a new clustering problem.
The document provides information on various artificial intelligence and voice assistant technologies including:
1) JUST AI and Eugene Goostman chatbot, a winner of the 2014 Turing 100 Chatbots competition.
2) Everyday Assistant, a voice assistant available on mobile devices.
3) Dusi Voice Assistant with over 1 million downloads on Google Play.
4) Era of messengers for chatting with personal assistants without voice.
5) ElSmart, the first Android phone for blind users.
6) Zenbot, an open source framework for developing voice assistants across platforms.
This document proposes a data augmentation method for image sentiment analysis using hashtags. It involves collecting a small set of manually labeled images and their hashtags, learning to predict sentiment labels from the hashtags using machine learning, and using this model to automatically label more images. Preliminary results show the hashtag-predicted labels match human labels with 83-95% accuracy. However, more testing is needed on a general set of images to fully evaluate the method's effectiveness.
This document proposes a method for continuous time series alignment in human action recognition. It defines continuous versions of time series, warping paths, and the dynamic time warping (DTW) distance. The method finds the optimal continuous warping path by approximating solutions to a cost minimization problem. An experiment applies the continuous DTW to classify human activities from accelerometer data, achieving classification accuracy close to the discrete DTW method. The continuous approach solves issues with resampling data and has potential for improved approximations and optimization methods.
➒➌➍➑➊➑➏➍➋➒ Satta Matka Satta result marka result Satta Matka Satta result marka result Dp Boss sattamatka341 satta143 Satta Matka Sattamatka New Mumbai Ratan Satta Matka Fast Matka Milan Market Kalyan Matka Results Satta Game Matka Game Satta Matka Kalyan Satta Matka Mumbai Main Online Matka Results Satta Matka Tips Milan Chart Satta Matka Boss New Star Day Satta King Live Satta Matka Results Satta Matka Company Indian Matka Satta Matka Kalyan Night Matka
Kalyan Result Final ank Satta 143 Kalyan final Kalyan panel chart Kalyan Result guessing Time bazar Kalyan guessing Kalyan satta sattamatka
Satta Matka Sattamatka Satta matka Satta result Matka result Satta result matkaresult Satta matka result Matka 420 Matka420 matka guessing matka guessing satta matta matka satta matta matka Kalyan chart Kalyan chart Satta matta Matka 143 SattamattaMatka143 Satta live Satta live Kalyan open Kalyan open Kalyan final Kalyan final Kalyan chart Kalyan chart Kalyan Panel Chart Kalyan Panel Chart Dp Boss india matka india matka
➒➌➎➏➑➐➋➑➐➐ Satta Matka Dpboss Matka Guessing Indian Matka KALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
➒➌➎➏➑➐➋➑➐➐ Satta Matka Dpboss Matka Guessing Indian Matka KALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
➒➌➎➏➑➐➋➑➐➐ Satta Matka Dpboss Matka GuessingKALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
➒➌➎➏➑➐➋➑➐➐ Matka Guessing Satta Matka Kalyan panel Chart Indian Matka Satta Matta Matka Dpboss KALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
➒➌➎➏➑➐➋➑➐➐ Satta Matka Dpboss Matka Guessing Indian Matka
KALYAN MATKA | MATKA RESULT | KALYAN MATKA TIPS | SATTA MATKA | MATKA.COM | MATKA PANA JODI TODAY | BATTA SATKA | MATKA PATTI JODI NUMBER | MATKA RESULTS | MATKA CHART | MATKA JODI | SATTA COM | FULL RATE GAME | MATKA GAME | MATKA WAPKA | ALL MATKA RESULT LIVE ONLINE | MATKA RESULT | KALYAN MATKA RESULT | DPBOSS MATKA 143 | MAIN MATKA
➒➌➍➑➊➑➏➍➋➒ Satta Matka Satta result marka result
Satta Matka Satta result marka result Dpboss sattamatka341 satta143 Satta Matka Sattamatka New Mumbai Ratan Satta Matka Fast Matka Milan Market Kalyan Matka Results Satta Game Matka Game Satta Matka Kalyan Satta Matka Mumbai Main Online Matka Results Satta Matka Tips Milan Chart Satta Matka Boss
New Star Day Satta King Live Satta Matka Results Satta Matka Company Indian Matka Satta Matka Kalyan Night Matka
➒➌➍➑➊➑➏➍➋➒ Satta Matka Satta result marka result
Kalyan chart DP boss guessing matka number➑➌➋➑➒➎➑➑➊➍
Satta Matka Kalyan Main Mumbai Fastest Results
Satta Matka ❋ Sattamatka ❋ New Mumbai Ratan Satta Matka ❋ Fast Matka ❋ Milan Market ❋ Kalyan Matka Results ❋ Satta Game ❋ Matka Game ❋ Satta Matka ❋ Kalyan Satta Matka ❋ Mumbai Main ❋ Online Matka Results ❋ Satta Matka Tips ❋ Milan Chart ❋ Satta Matka Boss❋ New Star Day ❋ Satta King ❋ Live Satta Matka Results ❋ Satta Matka Company ❋ Indian Matka ❋ Satta Matka 143❋ Kalyan Night Matka..
2. PULS Project
● http://puls.cs.helsinki.fi/puls
● University of Helsinki, Department of Computer Science
● Team:
Project lead: Roman Yangarber
Mian Du
Peter von Etter
Silja Huttunen
Lidia Pivovarova, St. Petersburg State University, Russia
Visitors and Past Members:
Mikhail Novikov (2011)
Esben Alfort, Copenhagen Business School, Denmark (Summer 2011)
Lauri Jokipii (2006)
Gaël Lejeune, University of Caen, France (2009)
Heikki Manninen (2009)
Natalia Tarbeeva, State University of Perm', Russia (2011)
Arto Vihavainen (2010-2011)
3.
4.
5. Papers
● Ralph Grishman, Silja Huttunen, Roman Yangarber. Real-Time Event Extraction for
Infectious Disease Outbreaks In Proceedings of the 3rd Annual Human Language
Technology Conference HLT-2002 (2002) San Diego, CA
● M Atkinson, J Piskorski, H Tanev, E van der Goot, R Yangarber, V Zavarella.
Automated event extraction in the domain of Border Security In Proceedings of
MINUCS-2009: Workshop on Mining User-Generated Content for Security, at the
UCMedia-2009: ICST Conference on User-Centric Media (2009) Venice, Italy
● Silja Huttunen, Arto Vihavainen, Peter von Etter, Roman Yangarber. Relevance
prediction in information extraction using discourse and lexical features Nodalida-
2011: Nordic Conference on Computational Linguistics (2011) Riga, Latvia
● Mian Du, Peter von Etter, Mikhail Kopotev, Mikhail Novikov, Natalia Tarbeeva,
Roman Yangarber. Building support tools for Russian-language information
extraction BSNLP-2011: Balto-Slavonic Natural Language Processing (2011) Plzeň,
Czech Republic
6. Russian: task definition
● Russian news analysis for Border Security and
Medical scenario
● Results representation in a unified form
(common for Russian and English texts)
● Usage of existing (made for English) tools – as
much as it possible
11. Scenario: Border Security
● Monitoring of:
– Illegal migration
– Criminal activity related to border crossing (e.g.
smuggling)
– Criminal activity in general
● Motivation
– News may be the only information source for an
event
– Or the fastest source
– Or provide with an alternative point of view /
extra details
16. General Scheme
AOT Ontology
Document
(plain text) Wrapper Dictionaries
Linguistic analysis Semantic markup
Sentences, syntax
groups, entities
Information Extraction
Inference Database
Patterns record
rules
17. General Scheme
AOT Ontology
Document
(plain text) Wrapper Dictionaries
Linguistic analysis Semantic markup
Sentences, syntax
groups, entities
Information Extraction
Inference Database
Patterns record
rules
18. AOT
● http://seman.sourceforge.net/
● open source toolkit for Russian linguistic
analysis
● libraries for morphological, syntactic, and
semantic analysis, language generation, tools
for working with dictionaries, and GUIs for
visualizing the analysis
● we use only the morphological and syntactic
analyzers, called Lemm and Synan
19. AOT: LEMM
На берегу пограничной реки задержаны двадцать семь нелегальных мигрантов.
0 0 BEG DOC
На 0 2 RLE Aa NAM? EXPR1 EXPR2 EXPR_NO277 +?? НА яв 44907 0
берегу 3 6 RLE aa +Ун БЕРЕЧЬ кб 133825 0
берегу 3 6 RLE aa +Фа БЕРЕГ авЭх 153063 0
пограничной 10 11 RLE aa +?? ПОГРАНИЧНЫЙ йзйийкйл 167378 0
реки 22 4 RLE aa +Фа РЕКА гбгжгй 150782 0
задержаны 27 9 RLE aa +Ул ЗАДЕРЖАТЬ сэ 144652 0
двадцать 37 8 RLE aa +?? ДВАДЦАТЬ эаэг 145038 0
семь 46 4 RLE aa +?? СЕМЬ эаэг 145046 0
нелегальных 51 11 RLE aa +Уе НЕЛЕГАЛЬНЫЙ йуйхйч 170468 0
мигрантов 63 9 RLE aa CS? SENT_END +Фб МИГРАНТ азай 87080 0
20. AOT: LEMM
На берегу пограничной реки задержаны двадцать семь нелегальных мигрантов.
Twenty seven illegal migrants have been detained on the bank of the borderline river
22. AOT: SYNAN
На берегу пограничной реки задержаны двадцать семь нелегальных мигрантов.
Twenty seven illegal migrants have been detained on the bank of the borderline river
23. WRAPPER
● Lemm: does not disambiguate
● Synan: does not contain all the words, only
words which are used in grammar relations
● Wrapper: combines results of Lemm and Synan
+ elements of semantic analysis (e.g. proper
names)
24. WRAPPER
● Grammar tags are mapped into common English tags
● For every binary relation in Synan output, we take the
corresponding parent and child analyses from Lemm
– all other analyses are removed
– If the lemma for parent or child was null (e.g. a group)
we infer information from Lemm
● If AOT produces two parents for a node (e.g. conjunctions) the
wrapper adjusts the links so that they form a proper tree
structure
● Some groups are converted into relations
● If a word does not participate in any relation, its analysis is
taken entirely from Lemm output, passing along any unresolved
ambiguity
25. WRAPPER
0 0 На НА NIL:>NIL PREP NIL
1 3 берегу БЕРЕГ PREPGR:>0 NOUN (2GENL INAN MASC LOC SG)
2 10 пограничной ПОГРАНИЧНЫЙ ADJ_NOUN:>3 ADJ (INAN ANIM FEM GEN SG)
3 22 реки РЕКА GEN_NOMGR:>1 NOUN (INAN FEM GEN SG)
4 27 задержаны ЗАДЕРЖАТЬ NIL:>NIL SPARTICIP (PASS TRV PERF INAN ANIM PAST PL)
5 37 двадцать ДВАДЦАТЬ CARD_ORD_GR:>6 CARD (NOM)
6 46 семь СЕМЬ NUM_NOUN:>8 CARD (NOM)
7 51 нелегальных НЕЛЕГАЛЬНЫЙ ADJ_NOUN:>8 ADJ (QADJ INAN ANIM GEN PL)
8 63 мигрантов МИГРАНТ SUBJ:>4 NOUN (ANIM MASC GEN PL)
26. General Scheme
AOT Ontology
Document
(plain text) Wrapper Dictionaries
Linguistic analysis Semantic markup
Sentences, syntax
groups, entities
Information Extraction
Inference Database
Patterns record
rules
28. Ontology structure
ENGLISH LEXICON
CONCEPT
1. Implicit:
HIERACHY
- If a concept name is the only
word it is concidered to be a word
- IS-A relation
which can be found in a text
- it as also possible to add
- multiple inheritance
single word synonyms (aliases)
2. Explicit
- Multiword English lexicon
30. Ontology structure
ENGLISH LEXICON
CONCEPT
1. Implicit:
HIERACHY
- If a concept name is the only
word it is concidered to be a word
- IS-A relation
which can be found in a text
- it as also possible to add
- multiple inheritance
single word synonyms (aliases)
2. Explicit
- Multiword English lexicon
31. Ontology structure
ENGLISH LEXICON
CONCEPT
1. Implicit:
HIERACHY
- If a concept name is the only
word it is concidered to be a word
- IS-A relation
which can be found in a text
- it as also possible to add
- multiple inheritance
single word synonyms (aliases)
2. Explicit
- Multiword English lexicon
Russian lexicon
- Words
- Multiword expressions (in a form of
low-level patterns)
32. Ontology structure
ENGLISH LEXICON DICTIONARIES
- INSTANCE-OF
CONCEPT
1. Implicit: relation
HIERACHY
- If a concept name is the only - locations
word it is concidered to be a word - diseases
- IS-A relation
which can be found in a text - companies
- it as also possible to add - persons
- multiple inheritance
single word synonyms (aliases) - etc...
2. Explicit
- Multiword English lexicon
Russian lexicon
- Words
- Multiword expressions (in a form of
low-level patterns)
33. Ontology structure
ENGLISH LEXICON DICTIONARIES
- INSTANCE-OF
CONCEPT
1. Implicit: relation
HIERACHY
- If a concept name is the only - locations
word it is concidered to be a word - diseases
- IS-A relation
which can be found in a text - companies
- it as also possible to add - persons
- multiple inheritance
single word synonyms (aliases) - etc...
2. Explicit
- Multiword English lexicon Dictionaries mapping to
Russian
Russian lexicon
- Words
- Multiword expressions (in a form of
low-level patterns)
34. General Scheme
AOT Ontology
Document
(plain text) Wrapper Dictionaries
Linguistic analysis Semantic markup
Sentences, syntax
groups, entities
Information Extraction
Inference Database
Patterns record
rules
35. Patterns
;; example
;; Житель Дагестана пересекал белорусско-украинскую границу с пистолетом на ремне.
;; A citizen of Dagestan crossed Russian-Belorussian border with a gun attached to his belt
;; Definition
(defpattern rus-cross-border-act
" mix* noun-group(c-person) mix* finv-group(cross-border) mix* noun-group(border) mix*
item-with? sa* location-pp-loc?
:
suspect=2.attributes, anchor=4.attributes, border=6.attributes"
)
;; Constraints: here we check syntax properties but it also possible to check any other
(lexical, semantic) features as well as their combination
(defun constraint-rus-cross-border-act ()
(with-bindings (suspect anchor border)
(active-clause-check-p suspect anchor border)))
;; Action: event is created; it is also possible to create entities (for low-level patterns)
(defun whenpattern-rus-cross-border ()
(with-bindings (suspect anchor item-with location)
(event (assert-event :predicate 'SECURITY_EVENT
:type 'ILLEGAL-MIGRATION
:subtype 'ILLEGAL-ENTRY
:suspect suspect
:anchor verb-head
:location location
:item item-with-entity))))
36. Patterns
● Fixed word order
● Semantic classes verification
● Verification of grammatical features (it may could
be any features, the most common are POS)
● Some elements may be optional (?) or multiple
(*)
● It is possible to use sub-patterns
This pattern language is already developed for
English.
37. Inference rules
;; Area (here: text, plus-minus one sentence)
(def-kb-infrule IR-CRISIS->TYPE-ON-SUSPECT (:pool :discourse :distance 1)
;; Event, found by patterns
(?crisis-event (event :predicate SECURITY_EVENT
:TYPE CRISIS
:suspect ?suspect
:SUBTYPE ?subtype))
;; Some other property form text
(?perpetrator-entity (entity :CLASS (isa C-PERPETRATOR)))
-->
;; Action (here: event type is changed)
(new-event-type (case-isa perpetrator-class
(C-HUMAN-TRAFFICKER 'HUMAN-TRAFFICKING)
(C-SMUGGLER 'SMUGGLE)
(C-MIGRANT 'ILLEGAL-MIGRATION)
(C-TERRORIST 'CRISIS)
(C-KIDNAPPING 'CRISIS)
(TRANSPLANTOLOGIST 'HUMAN-TRAFFICKING)))
38. Inference rules
● Work on semantic level
● Do not check any “physical” features, except for
distance
● As a consequence cover much more language
phenomena than patterns (including stylistic
variations)
● Do not depend on language (sic!)
● Cannot be used without patterns (do not
precise enough)
39. Patterns
System
expansion to = Patterns + Lexicon
another language
● All other parts (at least theoretically) can be
borrowed from the existing system
● Key question: what is the optimal from for
Russian patterns?
40. First idea: just copy English patterns
● Subject – Verb – Object
● Check all the grammar agreement (to
distinguish subject and object)
● Word order alterations need additional patterns
- though they may share the same constraints
or actions
41. Reasons
● Russian has a flexible word order
● However, some word orders are more
preferable
● News use a standard language: many clichés,
common constructions, bureaucratic
collocations etc.
● That is why we can apply Information Extraction
42. Reasons
● Russian has a flexible word order
● However, some word orders are more
preferable
● News use a standard language: many clichés,
common constructions, bureaucratic
collocations etc.
● That is why we can apply Information Extraction
Well, it was too optimistic
43. All variants are possible
Полиция арестовала преступника
Полиция преступника арестовала
Арестовала преступника полиция
Преступника полиция арестовала
Преступника арестовала полиция
Арестовала полиция преступника
A police arrested a perpetrator
44. All variant are possible
Полиция арестовала преступника
Полиция преступника арестовала, а не оштрафовала
Арестовала преступника полиция, а не таможня
Преступника полиция арестовала в тот момент, когда он
пытался пересечь границу
Преступника, который пять лет скрывался от закона, в конце
концов арестовала полиция
Арестовала наша доблестная полиция преступника только после
того, как поступил звонок “сверху”
45. All variant are possible
Even in news word order depends on:
- information structure (topic – focus)
- relative clauses added to subject (object) may change word order
- stylistic features, e.g. irony
Patterns have to catch other types of clauses:
- passive
- relative clauses (...perpetrator, who was arrested by police...)
- participle clause (...perpetrator arrested by police...)
For English we have a paraphrase module, which produces all these forms from
an active clause. It would be quite useful to make such a system for Russian.
However, it is clear that fixed word order in patterns leads to unnecessary
growth of the pattern base.
46. Preliminary results
● Word order is non-informative
● However, the existing pattern search algorithm
based on a fixed order and it would be too
painful to change it
● Another solution: patterns as triggers that
create events
● Inference rules responsible for specification and
extra slots filling
● And no need to change logic – only knowledge
bases!
47. Preliminary results
● Word order is non-informative
Similar to:
● However, the existing pattern search Machine
Tanev, H., Zavarella, V., etc.: Exploiting algorithm
Learning Techniques to Build an Event Extraction
based on a fixed order andandwould LINGUAMÁTICA
System for Portuguese it Spanish. be too
painful to change2, 55–66 (2009)
Journal it
and other papers by the authors
● Another solution: patterns as triggers that
create events
● Inference rules responsible for specification and
extra slots filling
48. Experiment
● Single-word (collocation) pattern
● No grammar is checked
● Pattern checks only semantic class:
class-or(c-illegal-activity, c-authority-activity)
49. Single-word pattern
class-or(c-illegal-activity, c-authority-activity)
● Semantic:
– c-authority-activity includes c-report (announce)
→ too general
– the majority of texts devoted to illegal activity
are reviews or new about some general
events (conferences, government programs
etc.)
– we need events related to arrest, or sentence,
or deportation – it was not clear a priori,
before the experiment
50. Single-word pattern
class-or(c-illegal-activity, c-authority-activity)
● Ambiguity:
– same verbs are used in legal and common context
– to accuse smb. of telling lies - to accuse smb. as a thief
● In some cases syntax determines event type:
– A policeman caught a perpetrator → ARREST
– A policeman was caught by a perpetrator → KIDNAPING
– In Russian only cases are different
Полицейский поймал преступника
Полицейского поймал преступник
Syntax is unavoidable.
51. Final pattern form
● Trigger (verb, or participle, or nominalisation) +
object
● Two words is much better than three – less
variants, less permutations
● For now we analyze the following constructions:
VERB + NOUN
NOUN + VERB (<policeman> arrested migrant)
PARTICIPLE + NOUN
NOUN + PARTICIPLE (migrant is arrested)
NOUN + NOUN (arrest of a migrant)
54. Inference rules
● In most of the cases pattern creates CRISIS event
● Inference rules specify event type, fill additional slots (at least
locations)
● Working on the Russian system we succeed to use without
major differences inference rules developed previously for the
English system
● Furthermore, we added several rules, which now work for both
Russian and English:
– transplantation → HUMAN-TRAFFICKING-ORGANS;
– border guards → MIGRATION-ILLEGAL-ENTRY;
– customs officers → SMUGGLING.
55. Additional patterns
● I'd like to put all the Information Extraction into
one “clever” pattern
● For different reasons it is not always possible:
– Different semantic of object
● DETERMINE-CONTRABAND (police arrested a
contraband)
– Different semantic of verb
● CROSS-BORDER (a perpetrator crossed a border)
– Stylistic features
● RUS-PROSECUTE (Russian collocation for “prosecute”
used in legal context only and may be used alone)
56. Stylistic features
arrest
ARREST
hold, break, stop...
ARREST
Russian common
SENTENCE words
CHARGE
Russian
bureaucratic/legal
ACCUSE
terms
PROSECUTE
57. Stylistic features
arrest
ARREST
hold, break, stop...
ARREST
Russian common
SENTENCE words
CHARGE
Russian
bureaucratic/legal
ACCUSE
terms
PROSECUTE
58. Is it possible to use stylistic
features in more clever way?
● All the Russian words are dictionary units
● Any grouping is made with ontology units
only
● Attempts to group words lead to unmotivated
ontology increase with artificial concepts
● Working with Russian system we cleaned up
the ontology – all English related concepts
were moved to the lexicon
● For now patterns do not distinguish common
and legal words
59. Patterns vs. inference rules
● Patterns: need an exact ontology
– A person arrested on a border → ILLEGAL-ENTRY
– Goods arrested on a border → SMUGGLING
● Rules: need a thesaurus
– Border, border-guard, illegal entry → ILLEGAL-ENTRY
– Customs, customs-officer, contraband → SMUGGLING
● Concept base:
– balance between exactness and completeness;
– contradictions between patterns and rules and also
between languages
60. Evaluation
● 64 documents
● Part of them marked up in advanced before
system development started
● Another part based on early prototype results
(made by St.Petersburg State University
students)
● 65 event
● One third of documents contains events
61. Evaluation
● 64 documents
● Part of them marked up in advanced before
system development started
● Another part based on early prototype results
(made by St.Petersburg State University
students)
● 65 event
● One third of documents contains events
Recall Precision F-measure
Russian system 47 34 39.1
English system 48 45 46.15
62. Evaluation
Russian system English system
Recall Precision Recall Precision
TYPE 63 51 64 63
SUBTYPE 39 30 28 29
COUNTRIES 47 41 54 57
LOCATION 35 25 27 14
SUSPECT 57 48 44 46
TOTAL 42 35 32 90
OPERATIONAL
ACTIVITY 82 17 58 62
ACTING
AUTHORITY 0 0 25 25
TIME 0 0 47 45
ALL SLOTS 46 33 48 45
F-MEASURE 39.01 46.15
63. Evaluation
Russian system English system
Recall Precision Recall Precision
TYPE 63 51 64 63
SUBTYPE 39 30 28 29
47 41 54
COUNTRIESThe numbers reflect not the system performance only 57
but also peculiarity of the keys themselves.
LOCATION 35 25 27
A correct and well-balanced test suite development is 14
SUSPECT a challenging task itself. 48
57 44 46
TOTAL 42 35 32
The test suite is regularly specified and amplified. 90
OPERATIONAL
ACTIVITY 82 17 58 62
ACTING
AUTHORITY 0 0 25 25
TIME 0 0 47 45
ALL SLOTS 46 33 48 45
F-MEASURE 39.01 46.15
64. Inference rules contribution
RUSSIAN SYSTEM
PATTERNS + RULES PATTERNS ONLY
Recall Precision Recall Precision
TYPE 63 51 40 33
SUBTYPE 39 30 10 8
ALL SLOTS 46 33 28 34
F-MEASURE 38.62 30.87
ENGLISH SYSTEM
PATTERNS + RULES PATTERNS ONLY
Recall Precision Recall Precision
TYPE 64 63 31 32
SUBTYPE 28 29 16 17
ALL SLOTS 48 45 34 43
F-MEASURE 46.15 37.85
65. Further work
● Patterns:
– Patterns specification, in particular filling of missing slots
– Expansion to different syntax structures (related
clauses, participle clauses etc.)
– Implementation of date and time pattern set for Russian
– Implementation of Named Entity Recognition pattern set
(in addition to what is done by AOT)
● Ontology:
– Other relation types
– Inference rules improvement using new relations
– Futher lexical sources development and normalization