This presentation proposes a new semantic relatedness measure based on representing words as co-occurrence networks instead of vectors. It addresses two key issues: 1) defining network operations to represent phrases and 2) measuring similarity between networks using a graph kernel. The approach is evaluated on tasks such as synonym finding, word sense disambiguation, and translation disambiguation, showing improved performance over vector-based baselines.
1. A Semantic Relatedness Measure Based on Co-occurrence Network and Graph Kernel
Tae-Gil Noh (노태길), Kyungpook National University, tailblues@me.com
January 20, 2011, Workshop Commemorating the ACL Hosting Bid
2. Overview
A new semantic relatedness measure:
- Built from co-occurrence observations on a raw corpus
- Works for both words and phrases
- Improves on the vector space model by using network representations
- Similarity is measured in a kernel space: co-occurrence observations are compared with a kernel (R-convolution kernel and graph kernel)
3. Introduction: Semantic Relatedness Measures
Measuring the semantic distance between two terms or phrases; also known as semantic similarity or semantic distance. A tool that can be used in various NLP situations.
Examples:
- Which is semantically closer to "orange juice": 음료수 (drinks) or 향신료 (spice)?
- Which sense describes the term better in context? 1) "Apple launched a new device ..." 2) "Apple is my favorite fruit, second only to ..."
  A) A company famous for its iPhone and iPod.
  B) A fruit with lots of Vitamin C, shiny red or green skin, ...
4. Semantic Relatedness with Lexical Resources
Based on lexical resources: WordNet, thesauri, ontologies, ...
Pros:
- Reliable data generated by lexicographers
- Detailed relationships between lexical entries
Cons:
- Generated by humans, so high cost
- Not readily available for minor languages
- There are always new or unlisted entries
5. Semantic Relatedness with Corpora
Corpus-based semantic relatedness:
- Based on observations on an unlabeled corpus
- Relatedness is measured as a numerical value
Various methods:
- Co-occurrence vectors
- Pointwise mutual information (PMI)
- Rank reduction (LSA, random projection, ESA)
- Topic models (PLSA, LDA, CTM)
6. Semantic Relatedness with Corpora
Corpus-based methods generally assume that two terms are "semantically close" if they:
- "occur in similar documents": share similar distributions among documents
- "co-occur with similar terms": share common co-occurring terms
Occurrences and co-occurrences are generally expressed as vectors. The vectors themselves are used as representations, or are refined by mathematical and statistical methods: weighting schemes, rank reduction, higher-order vectors, random projection, topic estimation, etc.
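As a rough sketch of the PMI weighting mentioned above (the words and counts below are invented for illustration, not taken from the talk):

```python
import math
from collections import Counter

# Toy co-occurrence counts: target word -> counts of its context words.
counts = {
    "disc": Counter({"orchestra": 2, "system": 2, "data": 1}),
    "cd":   Counter({"orchestra": 1, "music": 3}),
}

def pmi(word, context, counts):
    """Pointwise mutual information, PMI(w, c) = log P(w, c) / (P(w) P(c)),
    estimated directly from the raw co-occurrence counts."""
    total = sum(sum(c.values()) for c in counts.values())
    p_wc = counts[word][context] / total   # joint probability
    p_w = sum(counts[word].values()) / total
    p_c = sum(c[context] for c in counts.values()) / total
    return math.log(p_wc / (p_w * p_c))
```

A positive PMI means the pair co-occurs more often than independence would predict; in practice the counts would come from a large raw corpus rather than a toy table.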
8. Motivation A network has “more structure”: a network of terms can be seen as a relaxation of the “bag-of-words” (independence) assumption. Previous work showed that the structure of a co-occurrence network can be used to induce senses [Veronis 2005]. However, the network itself has never been used as a representation before. What if co-occurrence vectors are replaced by co-occurrence networks? What tools are needed? Can we gain some performance improvement?
9. An example of capturing co-occurrences: as a vector, or as a network. Five contexts of “disc”: “A data disc can contain anything; system files, ...” / “Eject the system disc by pressing ...” / “This is their best concert on disc.” / “On the double disc soundtrack, the orchestra have ...” / “Disc of the year & best orchestra winner is announced by ...” Co-occurrence sets: disc-{data, system, files}, disc-{system}, disc-{concert}, disc-{soundtrack, orchestra}, disc-{year, orchestra, winner}. Vector representation over (concert, data, files, orchestra, soundtrack, system, winner, year): (1, 1, 1, 2, 1, 2, 1, 1). [Figure: the same counts drawn as a co-occurrence network, with “disc” linked to each context term and edges between terms that co-occur in the same context.]
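The difference between the two representations on this slide can be sketched in a few lines. This is my own illustration, not the author's code: the vector only counts how often each term appears near “disc”, while the network additionally records which context terms appeared together.

```python
from collections import Counter
from itertools import combinations

# The five context sets of "disc" from the slide.
contexts = [
    {"data", "system", "files"},
    {"system"},
    {"concert"},
    {"soundtrack", "orchestra"},
    {"year", "orchestra", "winner"},
]

# Vector representation: count co-occurrences with the target term.
vector = Counter()
for ctx in contexts:
    vector.update(ctx)
# vector matches the slide: orchestra=2, system=2, all others 1.

# Network representation: also link terms that co-occur with EACH OTHER
# inside the same context, keeping structure the vector throws away.
edges = Counter()
for ctx in contexts:
    for u, v in combinations(sorted(ctx), 2):
        edges[(u, v)] += 1
# e.g. ("orchestra", "soundtrack") and ("orchestra", "year") are distinct
# edges in the network, but indistinguishable in the vector.
```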
10. Replacing co-occurrence vectors with co-occurrence networks Vector representation: the co-occurrence patterns of A and B become vectors, compared with similarity/distance functions of the vector domain: CosSim(A,B), L1dist(A,B), EucDist(A,B). Network representation: the same co-occurrence patterns become networks, compared with a network similarity measure: NetworkSim(A,B).
11. Using the co-occurrence network as a direct representation Evaluating gains on some NLP tasks Comparing performance with vector-based baselines and unsupervised state-of-the-art methods Tasks: Synonym finding (TOEFL synonym test set) Word sense disambiguation (general domain & biomedical domain) Annotation translation (automatic translation of Flickr tags)
12. Two basic issues of using network representations Expressing phrases How can an expression for a phrase be composed? In vector semantic spaces, vector summation/multiplication is used to represent phrases. Equivalent network operations must be defined. Comparing two networks Given two network representations, how can their similarity be calculated? A network similarity measure equivalent to cosine similarity is needed.
13. An example WSD setup A WSD setup from [Wilks, 1990] & [Schütze, 1998], sometimes called a modified Lesk algorithm. WordNet senses (synsets) of “disc”: sense-1 {disk, diskette, magnetic disc}, sense-2 {phonograph, record, sound recording}. Sense vectors: Vsense1 = vdisk + vdiskette + vmagnetic disc; Vsense2 = vphonograph + vrecord + vsound recording. Context vectors: Vcontext1 from “Microsoft will replace your disc, if it’s within ...” → {Microsoft, disc}; Vcontext2 from “Previn and the LSO on the front of any disc were ...” → {LSO, disc}. Each context vector is compared to each sense vector by the angle θ between them.
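The vector-space baseline described on this slide can be sketched as follows. All counts here are made up for illustration; a real run would use co-occurrence vectors estimated from a corpus.

```python
import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical co-occurrence vectors (toy counts, not from a real corpus).
word_vec = {
    "diskette":   Counter({"computer": 3, "drive": 2}),
    "phonograph": Counter({"music": 3, "record": 2}),
    "Microsoft":  Counter({"computer": 4, "software": 2}),
    "LSO":        Counter({"music": 5, "concert": 3}),
}

def sense_vector(members):
    # Sense vector = sum of the synset members' co-occurrence vectors.
    total = Counter()
    for m in members:
        total += word_vec[m]
    return total

senses = {"storage": ["diskette"], "recording": ["phonograph"]}

# Context {Microsoft, disc} -> storage sense; {LSO, disc} -> recording sense.
best1 = max(senses, key=lambda s: cosine(word_vec["Microsoft"], sense_vector(senses[s])))
best2 = max(senses, key=lambda s: cosine(word_vec["LSO"], sense_vector(senses[s])))
```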
14. The two issues in the WSD setup (network case) The network of {Microsoft, disc} is built from the single-term networks of “Microsoft” and “disc” by (1) a network operation (+); likewise the network of {LSO, disc} from “LSO” and “disc”. Each context network is then compared by (2) a network similarity against the networks of the sense candidates: the network of {disk, diskette, magnetic disc} (disc sense-1) and the network of {phonograph, record, sound recording} (disc sense-2).
15. Issue #1: Network operators Generating context (multi-term) networks from single-term networks. Two network operations: Network union (equivalent to vector summation) Network intersection (similar in effect to vector multiplication) Defined as matrix operations, since networks are represented as adjacency matrices.
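The slide says the operators are defined as matrix operations but does not give their exact form. A plausible sketch, assuming the adjacency matrices are aligned over a shared vocabulary ordering, union is entrywise addition (mirroring vector summation), and intersection is the entrywise minimum (keeping only edges present in both networks, mirroring the effect of vector multiplication):

```python
import numpy as np

# Adjacency matrices over an assumed shared vocabulary ordering.
vocab = ["computer", "drive", "music", "system"]
A = np.array([[0, 1, 0, 2],
              [1, 0, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 0, 0]])
B = np.array([[0, 2, 0, 1],
              [2, 0, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])

def net_union(a, b):
    # Guess at the slide's union: edge weights add, like vector summation.
    return a + b

def net_intersection(a, b):
    # Guess at the slide's intersection: an edge survives only if it is
    # present in both networks, with the smaller of the two weights.
    return np.minimum(a, b)

U = net_union(A, B)
I = net_intersection(A, B)
# U[0, 1] == 3: the computer-drive edge weights 1 and 2 add up.
# I[2, 3] == 0: the music-system edge exists only in B, so it is dropped.
```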
17. [Figures: the network of “disc”; the network of “disc & LSO”; the network of “disc & Microsoft”.]
18. Issue #2: Similarity measure for networks Cosine similarity is a normalized dot product. Graph kernel: a dot product of two graph structures. Graph kernels have been used in biomedical domains to compare proteins and genes. An R-convolution kernel is a way to systematically define kernels for structures. In language processing, tree kernels are the most widely used case of R-convolution kernels.
19. Random walk graph kernel The most widely used graph kernel. It compares two graphs by counting their common random walks. The result is a dot product value in an infinitely high-dimensional space, where each dimension is one possible random walk. It has a “tottering” issue, a well-known problem: kernel effectiveness is severely limited by counting cycles again and again. I have proposed an efficient acyclic version that can be used if all node labels are unique, as in co-occurrence networks.
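The walk-counting idea can be sketched with a truncated geometric series over a direct product graph. This is my own toy version for intuition, not the author's acyclic variant; it still suffers from the tottering described above. Since node labels are unique within each graph, the product graph simply has one node per label shared by both graphs.

```python
import numpy as np

def random_walk_kernel(g1, g2, lam=0.1, max_len=10):
    """Sum over common label-matching walks of length 1..max_len,
    each walk weighted by lam ** length.

    g1, g2: dicts mapping a node label to the set of its neighbour labels.
    """
    shared = sorted(set(g1) & set(g2))
    n = len(shared)
    if n == 0:
        return 0.0
    idx = {lab: i for i, lab in enumerate(shared)}
    # Product-graph adjacency: an edge exists iff it exists in BOTH graphs.
    W = np.zeros((n, n))
    for u in shared:
        for v in g1[u] & g2[u]:
            if v in idx:
                W[idx[u], idx[v]] = 1.0
    # Truncated geometric series over walk lengths (instead of a matrix inverse).
    k, P = 0.0, np.eye(n)
    for step in range(1, max_len + 1):
        P = P @ W                     # P[i, j] = number of common walks i -> j
        k += (lam ** step) * P.sum()  # of the current length
    return float(k)

g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}   # path a-b-c
g2 = {"a": {"c"}, "c": {"a"}}                    # edge a-c only
k_self = random_walk_kernel(g1, g1)   # positive: the graph shares walks with itself
k_cross = random_walk_kernel(g1, g2)  # 0.0: no edge exists in both graphs
```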
22. Simplest possible sub-kernels Node kernel: delta kernel (exact match) Edge kernel: Brownian bridge kernel
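Assuming the node kernel compares node labels and the Brownian bridge kernel compares scalar edge weights with a hypothetical cutoff parameter c, the two sub-kernels can be written as:

```python
def node_kernel(l1, l2):
    # Delta kernel: 1 on an exact label match, 0 otherwise.
    return 1.0 if l1 == l2 else 0.0

def edge_kernel(w1, w2, c=2.0):
    # Brownian bridge kernel: largest when the edge weights agree,
    # decaying linearly to zero once they differ by c or more.
    # The cutoff c is a hypothetical choice; the slide gives no value.
    return max(0.0, c - abs(w1 - w2))
```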
23. The two issues solved: the previous WSD setup The terms t1, t2, ..., tn observed in the context of the target term are combined by (1) network operations (union) into the network of the context. This context network is then compared by (2) the network kernel (similarity function) against the networks of the candidate senses (built with union & intersection).
24. Synonym Test Finding the synonym among given candidates, e.g. grin: {exercise, rest, joke, smile}. Selecting the most similar candidate in terms of the normalized dot product. Test set: TOEFL synonym test set (Landauer 1997). Training corpus: British National Corpus (BNC-XML).
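Kernel values can be normalized the same way cosine similarity normalizes a dot product: k(a,b) / sqrt(k(a,a) * k(b,b)). A sketch with a plain dot product standing in for the graph kernel, and made-up co-occurrence counts for the candidates:

```python
import math

def normalized(kernel, a, b):
    # Kernel analogue of cosine similarity.
    denom = math.sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0

def dot(u, v):
    # Stand-in base kernel; a real run would use the graph kernel instead.
    return sum(u[t] * v.get(t, 0) for t in u)

# Hypothetical co-occurrence profiles for "grin" and its candidates.
target = {"face": 2, "happy": 3}
candidates = {
    "exercise": {"gym": 4},
    "smile": {"face": 1, "happy": 2},
    "joke": {"laugh": 2, "happy": 1},
}

# Pick the candidate with the highest normalized kernel value.
best = max(candidates, key=lambda w: normalized(dot, target, candidates[w]))
```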
26. Synonym Test Summary Under the same conditions and various parameters (same corpus, same sampling method), the network performs about 3 points better on average, but the difference is statistically insignificant: there are only 80 items in this test set. Network similarity is less sensitive to the context window size.
27. Word Sense Disambiguation WSD is sense disambiguation; example for the term “disc”: Phrase 1) “Previn and the LSO on the front of any disc was ...” Phrase 2) “Microsoft will replace your disc, if it’s within ...” Sense candidates: Sense 1) disc as “phonograph, record, recording” Sense 2) disc as “magnetic disc” The task is to assign a sense from the candidates: again, selecting the most similar sense candidate in terms of the kernel similarity value.
28. Word Sense Disambiguation: General Domain Test set: SensEval-3 lexical sample data Sense candidates: WordNet senses Corpus: BNC-XML Sense expressions: synset union/intersection Context expressions: union of phrase terms Result: a 4+ point performance gain, statistically significant. The network version is comparable to state-of-the-art unsupervised WSD. [Figure: accuracy compared against supervised and unsupervised systems.]
29. Word Sense Disambiguation: Biomedical Domain WSD accuracy on a biomedical WSD test set Test set: extended NLM dataset Corpus: PubMed open subset Same representation for senses and context Sense candidates from the UMLS Metathesaurus; the average number of senses was 2.4 Outperformed the baseline vector method by nearly 10 points.
30. Flickr tag translation Tag translation is translation disambiguation: finding the proper translation for a given term. Example: spring, field, flowers. {spring the season = (봄, Frühjahr), spring as a mechanical device = (스프링, Sprungfeder), hot/water springs = (샘, Brunnen), …} Experiments on the MIRFLICKR-25000 image collection: translating the English tags of images number 1 to 1000 from English to German. Baseline method (state-of-the-art): a coherence (mutual information) based method, which selects the translation candidates that co-occur most in the target-language corpus. {spring, field, flowers}
31. Tag translation wood: Holz (wood as a material), Wald (forest) desk: Schalter (a counter), Schreibtisch (a desk for reading/writing), Tisch (a table) (1) {wood, desk} expands to the candidate sets {Holz, Schalter}, {Holz, Schreibtisch}, {Holz, Tisch}, {Wald, Schalter}, {Wald, Schreibtisch}, {Wald, Tisch}. (2) Each candidate set is compared against {wood, desk}; e.g. {Holz, Schreibtisch} should score higher than {Wald, Schreibtisch} or {Holz, Tisch}.
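Step (1) above, enumerating every combination of translation candidates, can be sketched directly; step (2) would then score each candidate set with the graph kernel against its target-language network (not shown here).

```python
from itertools import product

# Dictionary translations for each English tag, as on the slide.
translations = {
    "wood": ["Holz", "Wald"],
    "desk": ["Schalter", "Schreibtisch", "Tisch"],
}

# Step (1): enumerate every combination of translation candidates.
tags = sorted(translations)
combos = [dict(zip(tags, choice))
          for choice in product(*(translations[t] for t in tags))]
# 2 translations for "wood" x 3 for "desk" = 6 candidate sets.
# Step (2), not shown: build the target-language co-occurrence network of
# each candidate set and rank the sets by graph-kernel similarity.
```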
32. Tag translation Candidates are not senses but target-language networks, which creates incompatible node labels: the target network nodes carry German labels. Solution: adopting a node kernel that consults a machine-readable dictionary.
33. Tag translation result Targets: 3696 tags listed in the dictionary, among 5899 unique tags; 965 of the 3696 had only a single translation. Outperformed the coherence-based translation by nearly 5%.
34. Summary Network as a semantic representation: a co-occurrence network to replace co-occurrence vectors. Performance gains: in several NLP tasks that need a semantic relatedness measure, the network-based representations consistently outperformed equivalent vector representations. The co-occurrence network and the associated kernel can be used in applications that use co-occurrence vectors and cosine similarity. Language resources can be adopted into the kernel with minimal impact, by modifying the sub-kernels. One notable shortcoming is that the kernel computation is much slower than the cosine similarity calculation.
35. Please remember this, even if you forget everything else! (Not true) “Data mined from a corpus should be represented as vectors.” There are well-established mathematical methods to compare data captured in the form of structures: the R-convolution kernel. (Not true) “Kernels are only for kernel machines.” A kernel is just a dot product in a higher-dimensional space, computed without explicitly generating that high dimension (the kernel trick). Kernels are essential in kernel machines (e.g. SVMs), but a kernel can be useful simply as a dot product itself.