Sentence level sentiment polarity calculation for customer reviews by conside... - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Intent Classifier with Facebook fastText
Facebook Developer Circle, Malang
22 February 2017
These are the slides for a Facebook Developer Circle meetup. They are aimed at beginners.
Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending the well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy.
NLP techniques are used for spell checking: they find errors in written words and suggest relevant replacement words.
Algorithms: Jaccard coefficient, Levenshtein distance
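As a rough illustration of how these two measures can be combined for spelling suggestions (a minimal sketch, not taken from the slides; the vocabulary and ranking rule are assumptions):

```python
# Minimal sketch: rank spelling suggestions by Levenshtein distance,
# breaking ties with a character-bigram Jaccard coefficient.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def jaccard_bigrams(a: str, b: str) -> float:
    """Jaccard coefficient over character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

def suggest(word: str, vocabulary: list[str], k: int = 3) -> list[str]:
    # Low edit distance first, then high bigram overlap as a tie-breaker.
    return sorted(vocabulary,
                  key=lambda w: (levenshtein(word, w), -jaccard_bigrams(word, w)))[:k]

print(suggest("langage", ["language", "garage", "luggage", "lineage"]))  # e.g. ['language', ...]
```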
Rabin-Karp is one of the algorithms used to detect the similarity level of two strings. In this case, a string can be either a short sentence or a document containing complex words. In this algorithm, the plagiarism level is determined from matching hash values in the two documents being examined. Each word forms k-grams of a certain length, and each k-gram is then converted into a hash value. Each hash value in the source document is compared to the hash values in the target document, and the number of matching hashes gives the plagiarism level. The length of the k-gram determines the granularity of the result: choosing a proper k-gram length produces an accurate result, and the results will vary for each value of k.
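A minimal sketch of the k-gram fingerprinting idea described above, using a Rabin-Karp style rolling hash (illustrative only; the normalization, the value of k, and the hash parameters are assumptions, not from the abstract):

```python
# Rabin-Karp style k-gram fingerprinting for estimating document similarity.
BASE, MOD = 256, 1_000_000_007

def kgram_hashes(text: str, k: int) -> set[int]:
    """Rolling hashes of all k-character substrings."""
    text = "".join(text.lower().split())          # crude normalization
    if len(text) < k:
        return set()
    h, power = 0, pow(BASE, k - 1, MOD)
    hashes = set()
    for i, ch in enumerate(text):
        h = (h * BASE + ord(ch)) % MOD
        if i >= k - 1:
            hashes.add(h)                          # hash of window ending at i
            h = (h - ord(text[i - k + 1]) * power) % MOD   # drop the leading char
    return hashes

def similarity(source: str, target: str, k: int = 5) -> float:
    """Fraction of the source's k-gram hashes that also appear in the target."""
    s, t = kgram_hashes(source, k), kgram_hashes(target, k)
    return len(s & t) / len(s) if s else 0.0

print(similarity("the quick brown fox", "a quick brown dog", k=5))
```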
Tomoyuki Kajiwara, Kazuhide Yamamoto.
Noun Paraphrasing Based on a Variety of Contexts.
In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation (PACLIC 28), pp.644-649. Phuket, Thailand, December 2014.
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText - rudolf eremyan
This presentation compares different word embedding models and libraries, such as word2vec and fastText, describing their differences and showing pros and cons.
The following are the questions I tried to answer in this presentation:
What is text summarization?
What is automatic text summarization?
How has it evolved over time?
What are the different methods?
How is deep learning used for text summarization?
What are its business applications?
The first few slides explain extractive summarization with its pros and cons; the next section explains abstractive summarization.
The last section highlights the business applications of each.
Exploring Session Context using Distributed Representations of Queries and Re... - Bhaskar Mitra
Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" →"detroit lions". Likewise, "london"→"things to do in london" and "new york"→"new york tourist attractions" can also be considered similar transitions in intent. The reformulation "movies" → "new movies" and "york" → "new york", however, are clearly different despite the lexical similarities in the two reformulations. In this paper, we study the distributed representation of queries learnt by deep neural network models, such as the Convolutional Latent Semantic Model, and show that they can be used to represent query reformulations as vectors. These reformulation vectors exhibit favourable properties such as mapping semantically and syntactically similar query changes closer in the embedding space. Our work is motivated by the success of continuous space language models in capturing relationships between words and their meanings using offset vectors. We demonstrate a way to extend the same intuition to represent query reformulations.
Furthermore, we show that the distributed representations of queries and reformulations are both useful for modelling session context for query prediction tasks, such as query auto-completion (QAC) ranking. Our empirical study demonstrates that short-term (session) history context features based on these two representations improve the mean reciprocal rank (MRR) for the QAC ranking task by more than 10% over a supervised ranker baseline. Our results also show that using features based on both these representations together achieves better performance than either of them individually.
Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728
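A minimal sketch of the core idea, using made-up toy query vectors rather than the CLSM embeddings from the paper: a reformulation q1 -> q2 is represented as the offset vec(q2) - vec(q1), and two reformulations are compared by cosine similarity.

```python
# Toy illustration: reformulation vectors as embedding offsets.
import numpy as np

# Hypothetical 4-d query embeddings, for illustration only.
embed = {
    "san francisco":       np.array([0.9, 0.1, 0.2, 0.0]),
    "san francisco 49ers": np.array([0.9, 0.1, 0.8, 0.1]),
    "detroit":             np.array([0.2, 0.8, 0.1, 0.0]),
    "detroit lions":       np.array([0.2, 0.8, 0.7, 0.1]),
}

def reformulation_vector(q1: str, q2: str) -> np.ndarray:
    return embed[q2] - embed[q1]

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

r1 = reformulation_vector("san francisco", "san francisco 49ers")
r2 = reformulation_vector("detroit", "detroit lions")
print(cosine(r1, r2))   # similar "add the local team" intent -> high similarity
```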
Automatic text summarization is the process of reducing the text content and retaining the important points of the document. Generally, there are two approaches for automatic text summarization: extractive and abstractive. The process of extractive text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive text summarization approaches used by researchers. We also provide the features for the extractive text summarization process, and present the available linguistic preprocessing tools, with their features, that are used for automatic text summarization. The tools and parameters useful for evaluating the generated summary are also discussed in this paper. Moreover, we explain our proposed lexical chain analysis approach, with sample generated lexical chains, for extractive automatic text summarization, and provide the evaluation results of our system-generated summary. The proposed lexical chain analysis approach can be used to solve different text mining problems like topic classification, sentiment analysis, and summarization.
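As a rough illustration of the extractive pipeline (pre-processing, then sentence scoring and selection), here is a minimal frequency-based sketch; it is not the lexical-chain method proposed in the paper, and the stopword list and scoring rule are assumptions:

```python
# Minimal extractive summarizer: score sentences by average word frequency.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}

def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sent: str) -> float:
        toks = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    # Keep the top-scoring sentences, in their original order.
    top = sorted(sorted(sentences, key=score, reverse=True)[:n_sentences],
                 key=sentences.index)
    return " ".join(top)

doc = ("Automatic text summarization reduces a document to its important points. "
       "Extractive methods select existing sentences. "
       "Abstractive methods generate new sentences. "
       "Extractive methods first pre-process the document and then score sentences.")
print(summarize(doc))
```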
The project was developed as part of the IRE coursework at IIIT-Hyderabad.
Team members:
Aishwary Gupta (201302216)
B Prabhakar (201505618)
Sahil Swami (201302071)
Links:
https://github.com/prabhakar9885/Text-Summarization
http://prabhakar9885.github.io/Text-Summarization/
https://www.youtube.com/playlist?list=PLtBx4kn8YjxJUGsszlev52fC1Jn07HkUw
http://www.slideshare.net/prabhakar9885/text-summarization-60954970
https://www.dropbox.com/sh/uaxc2cpyy3pi97z/AADkuZ_24OHVi3PJmEAziLxha?dl=0
Clustering Algorithm for Gujarati Language - ijsrd.com
The natural language processing area is still under research, but it is now a platform for worldwide researchers. Natural language processing includes analyzing a language based on its structure and then tagging each word appropriately with its grammatical base. Here we have a set of 50,000 tagged words, and we try to cluster those Gujarati words using the proposed algorithm; we have defined our own algorithm for the processing. Many clustering techniques are available, e.g. single linkage, complete linkage, and average linkage. Here the number of clusters to be formed is not known in advance, so it all depends on the type of data set provided. Clustering is a pre-process for stemming. Stemming is the process of extracting the root from a word, e.g. cats = cat + s, where "cat" is a noun and "s" marks the plural form.
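A toy sketch of the kind of similarity-based word grouping described above, using English stand-ins instead of the Gujarati tagged-word set (the similarity measure and threshold are assumptions, not the authors' algorithm):

```python
# Greedy single-linkage grouping of words by character similarity,
# as a crude pre-process for stemming ("cats" = "cat" + "s").
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold

def cluster(words: list[str]) -> list[list[str]]:
    clusters: list[list[str]] = []
    for w in words:
        # Single linkage: join the first cluster containing any similar word.
        for c in clusters:
            if any(similar(w, member) for member in c):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters

print(cluster(["cat", "cats", "dog", "dogs", "run", "running"]))
```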
Vectorland: Brief Notes from Using Text Embeddings for Search - Bhaskar Mitra
(Invited talk at Search Solutions 2015)
A lot of recent work in neural models and “Deep Learning” is focused on learning vector representations for text, image, speech, entities, and other nuggets of information. From word analogies to automatically generating human level descriptions of images, the use of text embeddings has become a key ingredient in many natural language processing (NLP) and information retrieval (IR) tasks.
In this talk, I will present some personal learnings from working on (neural and non-neural) text embeddings for IR, as well as highlight a few key recent insights from the broader academic community. I will talk about the affinity of certain embeddings for certain kinds of tasks, and how the notion of relatedness in an embedding space depends on how the vector representations are trained. The goal of this talk is to encourage everyone to start thinking about text embeddings beyond just as an output of a “black box” machine learning model, and to highlight that the relationships between different embedding spaces are about as interesting as the relationships between items within an embedding space.
Detailed documented with the definition of text mining along with challenges, implementing modeling techniques, word cloud and much more.
Thanks for your time. If you enjoyed this short video, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Word embedding, vector space model, language modelling, neural language model, Word2Vec, GloVe, fastText, ELMo, BERT, DistilBERT, RoBERTa, SBERT, Transformer, Attention
Explore detailed topic modeling via LDA (Latent Dirichlet Allocation) and its steps.
Thanks for your time. If you enjoyed this short video, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Lise Getoor, Professor, Computer Science, UC Santa Cruz, at MLconf SF - MLconf
Abstract:
One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, noisy interlinked data. We need data science techniques which can represent and reason effectively with this form of rich and multi-relational graph data. In this presentation, I will describe some common collective inference patterns needed for graph data, including collective classification (predicting missing labels for nodes in a network), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe three key capabilities required: relational feature construction, collective inference, and scaling. Finally, I will briefly describe some of the cutting-edge analytic tools being developed within the machine learning, AI, and database communities to address these challenges.
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr... - Andre Freitas
Tasks such as question answering and semantic search are dependent on the ability of querying & reasoning over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as the increase in schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying & reasoning over heterogeneous knowledge bases (KBs). The approach can be used for approximative reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario for the approach. The approach focuses on addressing the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning & querying context and (ii) allowing coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
A toy model of human cognition: Utilizing fluctuation in uncertain and non-s... - Tatsuji Takahashi
http://www.yukawa.kyoto-u.ac.jp/contents/seminar/detail.php?SNUM=51633
Tatsuji Takahashi1, Yu Kohno1,2
Seminar on science of complex systems
(organized by Yukio-Pegio Gunji),
Yukawa Institute for Theoretical Physics,
Kyoto University,
Jan. 20, 2014
1 Tokyo Denki University,
2 JSPS (from Apr., 2014)
Introduction to Natural Language Processing - Pranav Gupta
The presentation gives a gist of the major tasks and challenges involved in natural language processing. In the second part, it presents one technique each for part-of-speech tagging and automatic text summarization.
Entity Resolution is the task of disambiguating manifestations of real world entities through linking and grouping and is often an essential part of the data wrangling process. There are three primary tasks involved in entity resolution: deduplication, record linkage, and canonicalization; each of which serve to improve data quality by reducing irrelevant or repeated data, joining information from disparate records, and providing a single source of information to perform analytics upon. However, due to data quality issues (misspellings or incorrect data), schema variations in different sources, or simply different representations, entity resolution is not a straightforward process and most ER techniques utilize machine learning and other stochastic approaches.
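A minimal deduplication sketch of the kind described above (illustrative only; the name-similarity measure and threshold are assumptions): candidate pairs are matched on name similarity and merged with union-find, so that matching is transitive.

```python
# Tiny deduplication pass: pair records with similar names, then merge
# matched pairs with union-find so that matching is transitive.
from collections import defaultdict
from difflib import SequenceMatcher

records = {"A": "John Smith", "B": "J. Smith", "C": "Jon Smith", "D": "Mary Jones"}

parent = {r: r for r in records}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

def name_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

ids = list(records)
for i, x in enumerate(ids):
    for y in ids[i + 1:]:
        if name_sim(records[x], records[y]) > 0.7:
            union(x, y)

clusters = defaultdict(list)
for r in records:
    clusters[find(r)].append(r)
print(list(clusters.values()))   # e.g. [['A', 'B', 'C'], ['D']]
```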
1. Probabilistic Soft Logic:
An Overview
Lise Getoor, Stephen Bach, Bert Huang
University of Maryland, College Park
http://psl.umiacs.umd.edu
Additional Contributors: Matthias Broecheler, Shobeir Fakhraei, Angelika Kimmig,
Ben London, Alex Memory, Hui Miao, Lily Mihalkova, Eric Norris, Jay Pujara, Theo Rekatsinas, Arti Ramesh
2. Probabilistic Soft Logic (PSL)
Declarative language based on logics to express
collective probabilistic inference problems
Predicate = relationship or property
Atom = (continuous) random variable
Rule = capture dependency or constraint
Set = define aggregates
PSL Program = Rules + Input DB
3. Probabilistic Soft Logic (PSL)
(Build slide; repeats the content of slide 2.)
4. Probabilistic Soft Logic
Entity Resolution
Entities - people references
Attributes - name
Relationships - friendship
Goal: identify references that denote the same person
[Figure: a graph of references A-H; A is named "John Smith" and B is named "J. Smith", and they share friends; "=" edges mark references to be merged]
5. Probabilistic Soft Logic
Entity Resolution
References, names, friendships
Use rules to express evidence:
- "If two people have similar names, they are probably the same"
- "If two people have similar friends, they are probably the same"
- "If A=B and B=C, then A and C must also denote the same person"
(Same figure as slide 4.)
6. Probabilistic Soft Logic
Entity Resolution
(Same evidence rules and figure as slide 5; the first rule is now written in PSL syntax.)
A.name ≈{str_sim} B.name => A≈B : 0.8
7. Probabilistic Soft Logic
Entity Resolution
(The second rule in PSL syntax.)
{A.friends} ≈{} {B.friends} => A≈B : 0.6
8. Probabilistic Soft Logic
Entity Resolution
(The third rule in PSL syntax.)
A≈B ^ B≈C => A≈C : ∞
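The ≈{str_sim} predicate above relies on some string-similarity function that returns a soft truth value in [0,1]; the slides do not specify which one. A minimal stand-in using Python's difflib:

```python
# Illustrative stand-in for a string-similarity predicate such as str_sim.
from difflib import SequenceMatcher

def str_sim(a: str, b: str) -> float:
    """Soft truth value in [0,1] for 'a and b are similar names'."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Grounding the first rule for A = "John Smith", B = "J. Smith":
body = str_sim("John Smith", "J. Smith")
print(body)   # a continuous value (roughly 0.78 here) fed into the soft rule
```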
9. Probabilistic Soft Logic
Link Prediction
Entities - people, emails
Attributes - words in emails
Relationships - communication, work relationship
Goal: identify work relationships (supervisor, subordinate, colleague)
10. Probabilistic Soft Logic
Link Prediction
People, emails, words, communication, relations
Use rules to express evidence:
- "If email content suggests type X, it is of type X"
- "If A sends deadline emails to B, then A is the supervisor of B"
- "If A is the supervisor of B, and A is the supervisor of C, then B and C are colleagues"
11. Probabilistic Soft Logic
Link Prediction
(Same rules as slide 10; an example email highlights deadline phrases such as "complete by" and "due".)
12. Probabilistic Soft Logic
Link Prediction
(Same rules as slide 10.)
13. Probabilistic Soft Logic
Link Prediction
(Same rules as slide 10.)
23. Probabilistic Soft Logic
Rules
Atoms are real valued
- Interpretation I, atom A: I(A) ∈ [0,1]
- We will omit the interpretation and write A ∈ [0,1]
∨, ∧ are combination functions
- T-norms: [0,1]^n → [0,1]
Rule (over ground atoms): H1 ∨ ... ∨ Hm ← B1 ∧ B2 ∧ ... ∧ Bn
[Broecheler, et al., UAI '10]
24. Probabilistic Soft Logic
Rules
Combination functions (Lukasiewicz t-norm):
A ∨ B = min(1, A + B)
A ∧ B = max(0, A + B - 1)
Rule form: H1 ∨ ... ∨ Hm ← B1 ∧ B2 ∧ ... ∧ Bn
[Broecheler, et al., UAI '10]
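A small sketch of these Lukasiewicz operators and of a ground rule's distance to satisfaction, which for a rule body => head is max(0, I(body) - I(head)) under the Lukasiewicz relaxation of implication (the numeric atom values below are made up):

```python
# Lukasiewicz operators and the distance to satisfaction of a ground rule.

def l_and(*vals: float) -> float:            # A AND B = max(0, A + B - 1)
    out = vals[0]
    for v in vals[1:]:
        out = max(0.0, out + v - 1.0)
    return out

def l_or(*vals: float) -> float:             # A OR B = min(1, A + B)
    out = vals[0]
    for v in vals[1:]:
        out = min(1.0, out + v)
    return out

def distance_to_satisfaction(body: float, head: float) -> float:
    # 1 - truth of (body => head) under Lukasiewicz logic.
    return max(0.0, body - head)

# Ground rule: similar_name(A,B) AND similar_friends(A,B) => same(A,B)
body = l_and(0.9, 0.7)                                 # = 0.6
print(distance_to_satisfaction(body, head=0.3))        # = 0.3: the rule is partly violated
```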
28. Probabilistic Soft Logic
Let's Review
Given a data set and a PSL program, we can construct a set of ground rules.
Some of the atoms have fixed truth values and some have unknown truth values.
For every assignment of truth values to the unknown atoms, we get a set of weighted distances from satisfaction.
How to decide which is best?
30. Probabilistic Soft Logic
Probabilistic Model
Probability density over interpretation I:
    f(I) = (1/Z) exp( - Σ_{r ∈ R} w_r · d_r(I)^p )
where Z is the normalization constant, R is the set of ground rules, w_r is the rule's weight, d_r(I) is the rule's distance to satisfaction, and the distance exponent p is in {1, 2}.
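A toy numeric illustration of this density (hypothetical weights and distances): the unnormalized probability of an interpretation falls exponentially with the weighted, exponentiated distances to satisfaction of the ground rules.

```python
# Toy evaluation of the unnormalized HL-MRF density.
import math

# (weight w_r, distance to satisfaction d_r) for three hypothetical ground rules
ground_rules = [(0.8, 0.3), (0.6, 0.0), (10.0, 0.1)]
p = 2   # distance exponent in {1, 2}

energy = sum(w * d ** p for w, d in ground_rules)
unnormalized_density = math.exp(-energy)
print(energy, unnormalized_density)
```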
31. Probabilistic Soft Logic
Hinge-loss Markov Random Fields
Subject to arbitrary linear constraints
PSL models ground out to HL-MRFs
Log-concave!
33. Probabilistic Soft Logic
Inferring Most Probable Explanations
Objective: convex optimization
Decomposition-based inference algorithm using the ADMM framework
34. Probabilistic Soft Logic
Alternating Direction Method of Multipliers
- We perform inference using the alternating direction method of multipliers (ADMM)
- Inference with ADMM is fast, scalable, and straightforward
- Optimize subproblems (ground rules) independently, in parallel
- Auxiliary variables enforce consensus across subproblems
[Bach et al., NIPS '12]
35. Probabilistic Soft Logic
Inference Algorithm
- Initialize local copies of variables and Lagrange multipliers
- Begin inference iterations
- Simple updates solve subproblems for each potential... and each constraint
- Average to update global variables and clip to [0,1]
[Bach, et al., Under review]
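A compact consensus-ADMM sketch following the steps on this slide: local copies solve their subproblems independently, the global variable is updated by averaging and clipping to [0,1], and the multipliers enforce consensus. Simple quadratic terms stand in for PSL's hinge-loss potentials, so this illustrates the update pattern rather than the PSL implementation.

```python
# Consensus ADMM over a single shared variable z in [0, 1].

def consensus_admm(terms, rho=1.0, iters=50):
    z = 0.5                                   # global variable
    x = [z] * len(terms)                      # local copies
    u = [0.0] * len(terms)                    # scaled Lagrange multipliers
    for _ in range(iters):
        # Solve each subproblem independently (closed form for quadratics):
        #   argmin_x  w*(x - t)^2 + (rho/2)*(x - z + u)^2
        x = [(2 * w * t + rho * (z - ui)) / (2 * w + rho)
             for (w, t), ui in zip(terms, u)]
        # Average local copies (+ multipliers) and clip to [0, 1].
        z = min(1.0, max(0.0, sum(xi + ui for xi, ui in zip(x, u)) / len(x)))
        # Multiplier update pushes local copies toward the consensus value.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z

# Two "ground rules" pulling toward different truth values, one weighted higher.
print(consensus_admm([(2.0, 0.9), (1.0, 0.2)]))   # converges between 0.2 and 0.9
```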
36. Probabilistic Soft Logic
ADMM Scalability
[Figure: running time in seconds (0-600) vs. number of potential functions and constraints (125,000-325,000), comparing ADMM with an interior-point method]
[Bach, et al., NIPS '12]
38. Probabilistic Soft Logic
Distributed MPE Inference: MapReduce Implementation
[Diagram: mappers solve subproblems on local variable copies; reducers update the global variable components; the global variable X is loaded as side data at job bootstrap, and subproblems and the new global variable are read/written via HDFS or HBase]
Hui Miao, Xiangyang Liu
39. Probabilistic Soft Logic
Distributed MPE Inference: GraphLab Implementation
[Diagram: subproblem nodes and global-variable-component nodes alternate gather (get z / get local z, y), apply (update x, y / update z), and scatter (notify z / notify X) steps, iterating until convergence]
Hui Miao, Xiangyang Liu
41. Probabilistic Soft Logic
Weight Learning
Learn from training data
No need to hand-code rule weights
Various methods:
- approximate maximum likelihood [Broecheler et al., UAI '10]
- maximum pseudolikelihood
- large-margin estimation [Bach, et al., UAI '13]
44. Probabilistic Soft Logic
Foundations Summary
Design probabilistic models using a declarative language
- Syntax based on first-order logic
Inference of the most probable explanation is fast convex optimization (ADMM)
Learning algorithms for training rule weights from labeled data
46. Probabilistic Soft Logic
Document Classification
Given a networked collection of documents
Observe some labels
Predict remaining labels using link direction and inferred class labels
[Figure: citation network in which some documents are labeled A or B and the labels of the remaining documents ("A or B?") must be inferred]
[Bach, et al., UAI '13]
49. Probabilistic Soft Logic
Activity Recognition in Videos
Activity classes: crossing, waiting, queueing, walking, talking, dancing, jogging
[London, et al., Under review]
50. Probabilistic Soft Logic
Results on Activity Recognition
Recall matrix between different activity types
Accuracy metrics compared against baseline features
[London, et al., Under review]
51. Probabilistic Soft Logic
Social Trust Prediction
Competing models from the social psychology of strong ties:
- Structural balance [Granovetter '73]
- Social status [Cosmides et al., '92]
Leskovec, Huttenlocher, & Kleinberg [2010]: effects of both models present in online social networks
52. Probabilistic Soft Logic
Structural Balance vs. Social Status
Structural balance: strong ties are governed by a tendency toward balanced triads
- e.g., the enemy of my enemy...
Social status: strong ties indicate unidirectional respect, "looking up to", expertise status
- e.g., patient-nurse-doctor, advisor-advisee
57. Probabilistic Soft Logic
Experiments
User-user trust ratings from two different online social networks
Observe some ratings, predict held-out ratings
Eight-fold cross-validation on two data sets:
- FilmTrust - movie review network, trust ratings from 1-10
- Epinions - product review network, trust / distrust ratings {-1, 1}
[Huang, et al., SBP '13]
58. Probabilistic Soft Logic
Compared Methods
TidalTrust: graph-based propagation of trust
- Predict trust via breadth-first search to combine closest known relationships
EigenTrust: spectral method for trust
- Predict trustworthiness of nodes based on eigenvalue centrality of the weighted trust network
Average baseline: predict the average trust score for all relationships
[Huang, et al., SBP '13]
59. Probabilistic Soft Logic
FilmTrust Experiment
Normalize [1,10] ratings to [0,1]
Prune the network to its largest connected component: 1,754 users, 2,055 relationships
Compare mean absolute error, Spearman's rank coefficient, and Kendall-tau distance
* measured on only non-default predictions
- PSL-Status and PSL-Balance disagree on 514 relationships
[Huang, et al., SBP '13]
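A small sketch computing the evaluation metrics named on this slide with SciPy, on hypothetical predicted vs. observed trust scores (it computes the Kendall tau coefficient, which is closely related to the Kendall-tau distance):

```python
# Evaluation metrics for predicted vs. observed trust scores (toy numbers).
import numpy as np
from scipy.stats import spearmanr, kendalltau

observed  = np.array([0.9, 0.4, 0.7, 0.2, 0.8])
predicted = np.array([0.8, 0.5, 0.6, 0.3, 0.9])

mae = np.mean(np.abs(observed - predicted))
rho, _ = spearmanr(observed, predicted)
tau, _ = kendalltau(observed, predicted)
print(f"MAE={mae:.3f}  Spearman={rho:.3f}  Kendall tau={tau:.3f}")
```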
60. Probabilistic Soft Logic
Epinions Experiment
Snowball sample of 2,000 users from the Epinions data set
8,675 trust scores normalized to {0,1}
Measure area under the precision-recall curve for distrust edges (the rarer class)
[Huang, et al., SBP '13]
61. Probabilistic Soft Logic
Knowledge Graph Identification
Problem: collectively reason about noisy, inter-related fact extractions
Task: NELL fact promotion (web-scale IE)
- Millions of extractions, with entity ambiguity and confidence scores
- Rich ontology: Domain, Range, Inverse, Mutex, Subsumption
Goal: determine which facts to include in NELL's knowledge base
Jay Pujara, Hui Miao, William Cohen
64. Probabilistic Soft Logic
Schema Matching
Correspondences between source and target schemas
Matching rules:
- "If two concepts are the same, they should have similar subconcepts"
- "If the domains of two attributes are similar, they may be the same"
[Diagram: source schema - Organization provides Service & Products, Customers buy; target schema - Company develops Portfolios which include Products & Services, Customer buys]
develop(A, B) <= provides(A, B)
Company(A) <= Organization(A)
Products&Services(B) <= Service&Products(B)
65. Probabilistic Soft Logic
Schema Mapping
Input: schema matches
Output: source-target query pairs (TGDs) for exchange or mediation
Mapping rules:
- "Every matched attribute should participate in some TGD."
- "The solutions to the queries in TGDs should be similar."
[Same source/target schema diagram as slide 64]
∃Portfolio P, develop(A, P) ∧ includes(P, B) <= provides(A, B) . . .
Alex Memory, Angelika Kimmig
66. Probabilistic Soft Logic
Learning Latent Groups
Can we better understand political discourse in social media by learning groups of similar people?
Case study: 2012 Venezuelan Presidential Election
Incumbent: Hugo Chávez
Challenger: Henrique Capriles
[Photos of the candidates: left by Agência Brasil (CC BY 3.0 Brazil), right by Wilfredor (CC BY-SA 3.0 Unported)]
[Bach, et al., Under review]
67. Probabilistic Soft Logic
Learning Latent Groups
South American tweets collected from a 48-hour window around the election
Selected 20 top users: candidates, campaigns, media, and most retweeted
1,678 regular users interacted with at least one top user and used at least one hashtag in another tweet
Those regular users had 8,784 interactions with non-top users
[Bach, et al., Under review]
74. Probabilistic Soft Logic
Conclusion
PSL:
- expressive framework for structured problems
- scalable
Much ongoing work, including incorporating hidden variables, structure learning, and distributing inference
Very interested in applying it to a variety of domains, especially in computational social science
We encourage you to try it!
http://psl.umiacs.umd.edu