Computational lexical semantics: semantic similarity measures and their app...Alexander Panchenko
Computational lexical semantics: semantic similarity measures and their applications
A lecture series at the National Research University Higher School of Economics (НИУ ВШЭ), Faculty of Business Informatics and Applied Mathematics (Nizhny Novgorod)
Jeffrey A. Bourque has over 20 years of experience in marketing communications, advertising, and sales. He has a proven track record of developing and implementing successful marketing strategies and targeting new accounts. Bourque is knowledgeable in analytics, presentations, software, and various marketing channels including print, online, social media, and trade shows. He is currently seeking a new position in marketing communications.
The document discusses several aspects of Chinese culture, including:
1) China has a unique one-child policy implemented by the government to control population growth, which places pressure on families to have a son.
2) In China, smiling can have different meanings than in America - it may indicate embarrassment, helpfulness, curiosity, or that one does not want an argument to become personal.
3) In Chinese negotiations, do not lose your temper or use pressure tactics, as this will damage relationships. Negotiations are process-oriented and leave room for flexibility.
Teams are often spread across different locations within an enterprise. When you work with agile methods such as Scrum or Kanban, this is difficult to handle. This short presentation shows some ideas and pitfalls on the topic.
This document provides personas and descriptions of different types of product managers. It describes Martha, a seasoned product manager who manages an established product in an automated way focusing on performance and process. It describes Barney, an advocate focused on managing stakeholder expectations and ensuring happy customers. It describes Sabrina, a strategic marketer who focuses on developing comprehensive solutions and telling the full story to prospects. It describes Jess, a director of product delivery focused on new features and technologies.
Glen Gatin presented his dissertation research which developed the grounded theory of "Keeping Your Distance" (KYD). KYD explains how people use physical, emotional, and psychological distancing strategies to regulate interactions and conserve personal energy. Some key findings include:
- People use different levels of distance as a way to feel safe, autonomous, and to manage their energy levels.
- Distancing strategies include increasing physical space, symbolic/internal distancing, and compartmentalizing.
- An "algorithm of engagement" determines how much distance one places between themselves and others in different situations.
- Technology can impact notions of distance, requiring new applications of the KYD theory.
The document discusses Adobe Flex 4.6 and its capabilities for mobile application development. It outlines how Flex allows developing once for multiple mobile platforms like Android, iOS, and others. It highlights features like automatic scaling of user interface elements for different device densities. The document also discusses Adobe's continued support for Flex through contributions to the Apache Flex project.
The document discusses a new approach to enterprise innovation proposed in the BIVEE project. It involved 10 partners over 3 years to develop an enterprise innovation platform. The platform spans theoretical, methodological, and technological aspects. It was tested in robotics/automation and wood/furniture industries. The book outlines competencies, guidelines, and instructions for implementing this new approach to enterprise innovation focusing on SME networks. Traditional knowledge management is not suited for innovation purposes, but ontologies, semantic wikis, social media, and cooperation tools show promise when combined with human creativity and intelligence.
The document discusses challenging employees in the workplace. It defines a challenging employee as someone who disrupts or hinders productivity, whether consciously or subconsciously. It notes that employee dissatisfaction can cause issues like loss of clients, decreased productivity, and increased absenteeism. The document provides tips for dealing with difficult situations and personalities in an objective manner to avoid legal problems or favoritism. It emphasizes giving constructive feedback to strengthen trust and morale while recruiting employee participation in goal setting.
The document presents a list of images of outer space captured by the Hubble telescope, including galaxies, nebulae, planets, and other celestial objects such as black holes and stars. The author sends a greeting at the end.
Groups of 4 students with different colored dots will brainstorm how authoring environments can be used across communication modes. They will then choose one idea to develop into an activity that connects across modes of communication, situates the activity in a lesson, includes an example, and provides an assessment tool.
This document discusses home remodeling and restoration services including restoring stairs, customizing bathrooms, and renovating office spaces. A variety of services are offered such as choosing custom colors and restoring hardwood floors. The goal is to help make clients' dreams for their home renovations come true.
A METHOD FOR NAVIGATING DOCUMENT TEXT USING AUTOMATIC PROCESSING OF ITS...ITMO University
The paper describes an approach that can serve as an alternative to automatic text summarization. The approach builds representations of the source text and lets the reader move through its content using them, from a general representation to a more specific one and back. The representations are built with automatic text processing methods: statistical methods and shallow linguistic analysis. The paper gives a formalized description of the approach and considers an implementation based on a relational database.
Classification and clustering in media monitoring: from knowledge engineering...Lidia Pivovarova
This PhD thesis examines classification and clustering techniques for media monitoring, including news grouping, multi-label text classification, and business polarity detection. It focuses on applying these methods to the PULS media monitoring system, which collects over 10,000 news articles daily. The thesis contributes novel algorithms and datasets for grouping news into stories based on named entity salience, large-scale multi-label text classification balancing training sets, and the first dataset and methods for entity-level business polarity detection.
The document describes a Russian paraphrase corpus created by the authors. It contains over 8000 sentence pairs annotated as precise, loose, or non-paraphrases using crowdsourcing. The corpus was collected from news headlines and aims to capture the most important events. The authors evaluate different models for classifying sentence pairs and find that combining linguistic features improves performance over individual feature types. Graphs built from the corpus can reveal connected events more completely than human annotations alone.
This document discusses the work of Antiplagiat Research, which tackles challenging natural language processing and plagiarism detection problems. It outlines their focus on cross-language plagiarism detection, machine-generated text detection, and intrinsic plagiarism detection. It also describes Antiplagiat Research's collaboration opportunities and their participation in evaluating plagiarism detection algorithms through workshops like Dialogue Evaluation.
This document summarizes a study that analyzed 47,410 Instagram images from Saint Petersburg over one year to understand human experience in different urban areas. The images were clustered using Google tags and user hashtags into topics like portraits, cars, flowers. The clusters were mapped geographically to see their spatial distribution. Clusters like hairstyle and animals were evenly distributed, while clothing, fitness and architecture were more detached, indicating urban segregation. The combination of semantic and geospatial analysis of social media images provided new insights into urban life not previously available from traditional data sources.
The document discusses the Pullenti NER Engine and its use in semantic similarity tasks. It presents the Semantics-Oriented Linguistic Processor (SOLP) which establishes text segments containing similar semantic units. It then describes the hybrid linguistic and machine learning approach used by the Pullenti-based engine, including the two-step Semantic Expansion Algorithm. Performance figures and evaluation metrics for Pullenti's named entity recognition are also provided.
The document discusses the reliability of results from corpus research and introduces a solution called GICR that provides automatic result analysis. GICR allows users to see statistics on search areas to check for bias or lack of homogeneity compared to the entire corpus by displaying metadata attributes like URLs, document IDs, author information, region, gender, and genre. It aims to address the problem that simply getting IPM and KWIC search results does not indicate if the results are biased by providing analysis directly in the interface.
This document discusses methods for estimating a user's actual age and gender when those values are not directly provided. It outlines using social graph analysis, natural language processing, analyzing user interests, and statistical methods. For social graph analysis, it examines using connections like classmates to infer age and analyzing local graph properties. NLP looks at gender-specific language in user profiles while interest analysis matches users to gender-biased communities. Statistics applies overall patterns in the data to make estimations.
This document presents mathematical models of information dissemination and warfare. It discusses:
1) Models of information spreading through both vertical (centralized) and horizontal (interpersonal) flows, and how the combination of these determines information dynamics in society.
2) Models of information adoption and forgetting over time, and the effects of incomplete media coverage and two-step perception.
3) Models of information warfare between two information sources, examining the necessary conditions for one to win over the other.
4) Extensions of these models including periodic destabilization, additional factors like forgetting, and a model of individual choice-making during information warfare.
This document discusses the analysis and modeling of complex systems. It describes analyzing the problem, modeling the system, and determining both quantitative and qualitative parameters. An example is given of assigning weights to different quantitative parameters. The document recommends creating a coordinate system and basis to define qualitative parameters. It formulates the final task as creating a concept for a basis of a quality parameter system. It seeks colleagues to partner with on further developing these analysis methods.
This document discusses trend detection at OK. It describes the multi-step process used: text extraction from logs, language detection, tokenization, dictionary extraction, vectorization, deduplication, statistics calculation, trend identification, clustering of trending terms, extraction of relevant documents, and visualization of trends. Both batch and streaming approaches are discussed to address the need for timely trend detection. Technologies used include Apache Kafka, YARN, Spark, Samza, Lucene and ELKI.
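The statistics and trend-identification steps of such a pipeline can be illustrated with a minimal sketch. It is not the system described in the summary: the function name `trending_terms`, the whitespace tokenizer, the add-one smoothing, and the `min_count` and `ratio` thresholds are all assumptions chosen for illustration. The idea is to compare how often a term occurs in the current window against a baseline window and flag terms whose relative frequency has grown.

```python
from collections import Counter


def trending_terms(baseline_docs, current_docs, min_count=2, ratio=1.5):
    """Return (term, growth) pairs for terms that grew relative to a baseline.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    and the thresholds are illustrative defaults, not tuned values.
    """
    base = Counter(t for doc in baseline_docs for t in doc.lower().split())
    cur = Counter(t for doc in current_docs for t in doc.lower().split())
    base_total = sum(base.values())
    cur_total = sum(cur.values()) or 1

    trends = []
    for term, count in cur.items():
        if count < min_count:
            continue  # skip terms that are too rare to judge
        cur_rate = count / cur_total
        # Add-one smoothing so unseen baseline terms do not divide by zero.
        base_rate = (base[term] + 1) / (base_total + 1)
        growth = cur_rate / base_rate
        if growth >= ratio:
            trends.append((term, growth))
    # Strongest growth first.
    return sorted(trends, key=lambda pair: -pair[1])


baseline = ["weather is nice", "weather is cold"]
current = ["eclipse today", "eclipse photos", "weather is nice"]
print([term for term, _ in trending_terms(baseline, current)])
```

In a streaming setting the two counters would be maintained incrementally over sliding windows rather than rebuilt from lists of documents.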
1. The researcher analyzed quantitative characteristics such as entropy, readability, lexical diversity, frequencies of words, and parts of speech for different text genres including scientific texts, news articles, and student writings.
2. The analysis found that student writings had higher entropy and readability than news articles or scientific texts. News articles had higher lexical diversity and frequencies of common words.
3. To evaluate the accuracy of a developed Old Irish lemmatizer, the researcher applied it to a test corpus of 840 tokens, of which 186 were unknown words. The lemmatizer correctly predicted lemmas for 84 of the unknown words, achieving an accuracy of around 60% for unknown words.
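Two of the quantitative characteristics named in item 1, entropy and lexical diversity, are straightforward to compute. The sketch below is one common way to define them and may differ from the researcher's exact formulas: it assumes Shannon entropy over the token frequency distribution and the type-token ratio as the diversity measure. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(tokens):
    """Lexical diversity as the number of distinct tokens over all tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


tokens = "a b a b".split()
print(shannon_entropy(tokens))   # two equally frequent tokens: 1 bit
print(type_token_ratio(tokens))  # 2 distinct tokens out of 4
```

The type-token ratio depends on text length, so comparisons across genres are usually made on samples of equal size.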
This document discusses methods for evaluating clustering validity indices (CVIs) that measure the quality of clustering results. It proposes using human assessments of clustered data as ground truth to evaluate how well different CVIs match human judgments. An experimental evaluation of 19 CVIs on 41 datasets clustered using 6 algorithms showed that none of the CVIs perfectly matched human assessments. The document concludes that while no universal CVI exists, meta-learning from past human assessments could help select the most appropriate CVI for a new clustering problem.
The document provides information on various artificial intelligence and voice assistant technologies including:
1) JUST AI and Eugene Goostman chatbot, a winner of the 2014 Turing 100 Chatbots competition.
2) Everyday Assistant, a voice assistant available on mobile devices.
3) Dusi Voice Assistant with over 1 million downloads on Google Play.
4) Era of messengers for chatting with personal assistants without voice.
5) ElSmart, the first Android phone for blind users.
6) Zenbot, an open source framework for developing voice assistants across platforms.
This document proposes a data augmentation method for image sentiment analysis using hashtags. It involves collecting a small set of manually labeled images and their hashtags, learning to predict sentiment labels from the hashtags using machine learning, and using this model to automatically label more images. Preliminary results show the hashtag-predicted labels match human labels with 83-95% accuracy. However, more testing is needed on a general set of images to fully evaluate the method's effectiveness.
This document proposes a method for continuous time series alignment in human action recognition. It defines continuous versions of time series, warping paths, and the dynamic time warping (DTW) distance. The method finds the optimal continuous warping path by approximating solutions to a cost minimization problem. An experiment applies the continuous DTW to classify human activities from accelerometer data, achieving classification accuracy close to the discrete DTW method. The continuous approach solves issues with resampling data and has potential for improved approximations and optimization methods.
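For reference, the discrete DTW distance that the continuous method is compared against can be sketched with the standard dynamic-programming recurrence. This is the textbook discrete algorithm, not the continuous variant proposed in the document, and it assumes one-dimensional series with absolute difference as the local cost.

```python
import math


def dtw_distance(a, b):
    """Classic discrete DTW distance between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# The second series repeats one sample; warping absorbs the repetition.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat samples, the two series do not need to have the same length, which is why resampling matters less for DTW than for pointwise distances.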
4. The Modules of the Summarization Machine
[Diagram: modules EXTRACTION, INTERPRETATION, GENERATION, FILTERING; inputs and outputs: DOC EXTRACTS, MULTIDOC EXTRACTS, EXTRACTS, ABSTRACTS; intermediate representations: CASE FRAMES, TEMPLATES, CORE CONCEPTS, CORE EVENTS, RELATIONSHIPS, CLAUSE FRAGMENTS, INDEX TERMS]