The following presentation is on my Master's graduate thesis work, "Mining Interesting Trivia for Entities from Wikipedia". It covers the complete work reported in our accepted IJCAI paper.
This presentation is Part I and covers roughly 80% of the content I presented at my mid-term evaluation. A companion presentation with the same title, suffixed 'PART-II', continues from where this one ends.
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...Abhay Prakash
This document describes a method for automatically mining interesting trivia about entities from Wikipedia. It presents the Wikipedia Trivia Miner (WTM) system, which selects candidate sentences from Wikipedia pages and ranks them based on an interestingness model trained on human ratings. WTM uses linguistic and entity-based features to determine interestingness. Evaluation shows WTM outperforms baselines in precision and recall for retrieving interesting trivia about movie entities. The authors contribute a novel approach for mining interesting facts from text and make their data and code publicly available.
Mining Interesting Trivia for Entities from Wikipedia PART-IIAbhay Prakash
The following presentation is on my Master's graduate thesis work, "Mining Interesting Trivia for Entities from Wikipedia".
This presentation is the second part and continues my other presentation, which has the same title suffixed with 'PART-I'.
Enterprise Intelligence: Putting the Pieces Together
http://enterpriserelevance.com/kdd2016/keynote.html
These slides are for a keynote presentation delivered at the Workshop on Enterprise Intelligence, held in conjunction with the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016).
About the author:
Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He led a local search team at Google. He was a director of data science and engineering at LinkedIn, and he established their query understanding team. Daniel is a widely recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He has written the definitive textbook on faceted search (now a standard for ecommerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents. His social media posts have attracted over a million page views. Daniel advises and consults for companies that can benefit strategically from his expertise. His clients range from early-stage startups to "unicorn" technology companies like Etsy and Pinterest. He helps companies make decisions around algorithms, technology, product strategy, hiring, and organizational structure.
This document discusses query understanding in search engines. It describes how query understanding involves identifying entities and tags in queries, predicting the user's intent or topic area, expanding queries using related terms, and incorporating spelling corrections. The key aspects of query understanding covered are tagging queries for entities like names, titles, companies; predicting the user's vertical intent like jobs, people or companies; and expanding queries using name synonyms, job title synonyms or signals from past user queries and clicks. The document also suggests giving users more transparency, guidance and control over the search process.
Query understanding is about focusing less on the results and more on the query. It’s about figuring out what the searcher wants, rather than scoring and ranking results. Once you’ve established this mindset, your approach to search changes: you focus on query performance rather than ranking.
Presented at QConSF 2016: https://qconsf.com/sf2016/presentation/query-understanding-manifesto
Techniques For Deep Query UnderstandingAbhay Prakash
The document summarizes techniques for deep query understanding in search systems. It discusses query understanding, which involves understanding a user's information need from their query. This allows for query correction, suggestion, expansion, classification and semantic tagging. Query correction reformulates ill-formed queries. Query suggestion provides similar queries. Query expansion adds synonyms to broaden results. Query classification determines the intent or topic of the query. Semantic tagging identifies entities in the query. The document outlines various models for these techniques, including using contextual information and graph representations of search logs.
A Model Of Opinion Mining For Classifying MoviesAndrew Molina
This document summarizes a research paper that proposes a model for classifying movies based on user opinions mined from online reviews. The model is capable of suggesting words a reviewer may use based on the title of their review. It can also intelligently predict the popularity of a movie on a scale of "super-flop" to "super-hit" by analyzing sentiments in reviews. The model was tested on over 1000 movie reviews and showed better performance at classifying less popular movies compared to popular review websites. The researchers believe this model could simplify the reviewing process by making it quicker and more effective.
Getting Things To Rank: Improve Search Visibility Using EntitiesJustin Briggs
Presentation for SMX London Session: What is Hummingbird & The Entity Search Revolution.
Covers:
Implicit vs. explicit entity search queries
Tokenization
Parts of speech tagging
Lemmatization
Knowledge graph optimization
MQL
Schema.org
Targeting entities
GraphQL is a query language open-sourced by Facebook in 2015 that offers an alternative to REST. After a brief recap of the paradigm GraphQL proposes for exposing data, we will see how to implement a GraphQL server in Scala using the Sangria library.
KiwiPyCon 2014 talk - Understanding human language with PythonAlyona Medelyan
Introduction into Natural Language Processing:
- Fiction vs Reality
- Complexities of NLP
- NLP with Python: NLTK, Gensim, TextBlob
(stopwords removal, part of speech tagging, tf-idf, text categorization, sentiment analysis)
- What's next
The document discusses the paradigm shift in search engines from document-based to consensus-based search. It notes the increase in subjective queries as more personal opinions and views are shared online. Current search engines are limited in addressing subjective queries as the answer requires analyzing the consensus view from many opinions rather than just top documents. The document introduces the concept of a consensus search engine that can process and analyze the large number of implicit votes from personal opinions online to provide answers based on aggregated public sentiment.
Presentation of the paper titled "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases" at the ISWC 2020 - Research Track.
@inproceedings{mihindu-sling-2020,
title = "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases",
author = "Mihindukulasooriya, Nandana and Rossiello, Gaetano and Kapanipathi, Pavan and Abdelaziz, Ibrahim and Ravishankar, Srinivas and Yu, Mo and Gliozzo, Alfio and Roukos, Salim and Gray, Alexander",
booktitle="The Semantic Web -- ISWC 2020",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="402--419",
url = "https://link.springer.com/chapter/10.1007/978-3-030-62419-4_23",
doi = "10.1007/978-3-030-62419-4_23"
}
The document describes research on implicit entity linking in tweets. It discusses how tweets often contain implicit mentions of entities without explicitly naming them, and presents a knowledge-driven approach to identify these implicit entities. The approach builds entity models using factual knowledge from Wikipedia and contextual knowledge from tweets, and then applies a two-step process of candidate selection and disambiguation to link implicit entity mentions to candidate entities. Evaluation on movie and book tweets shows the approach achieves over 60% accuracy in linking implicit entities.
Natural language text can have explicit and implicit constructs. In this presentation we discuss how to link the entities mentioned in an implicit manner in tweets.
Introduction to NLP with some practical exercises (tokenization, keyword extraction, topic modelling) using Python libraries like NLTK, Gensim and TextBlob, plus a general overview of the field.
Iccv2009 recognition and learning object categories p3 c00 - summary and da...zukun
This document discusses various computer vision datasets. It provides information on dataset sizes ranging from 10^0 to greater than 10^11 images. Specific datasets mentioned include Caltech 101/256, PASCAL, Lotus Hill, LabelMe, and ImageNet, among others. It also discusses challenges with crowd-sourcing large datasets and obtaining consistent annotations.
WWW2013: Web Usage Mining with Semantic AnalysisLaura Hollink
Laura Hollink, Peter Mika and Roi Blanco. Web Usage Mining with Semantic Analysis. In proceedings of the International World Wide Web Conference, Rio de Janeiro, Brazil, May 2013.
This document provides an overview of natural language processing (NLP) tools and resources that can be used to build a machine learning classifier to identify the fame of people mentioned in news articles. It describes NLP tasks like tokenization, part-of-speech tagging, chunking, named entity recognition, parsing, and coreference resolution. It also introduces libraries like the Curator for accessing NLP tools, Edison for feature extraction, and Learning Based Java for building the classifier. Finally, it demonstrates connecting all the pieces to construct a system that can label famous people as politicians, athletes, or corporate moguls.
#### Talk 2: What we learned from our study group on ML
Speakers: your dear study group hosts Dave Snowdon (software engineer at G-Research, ex-VMware) and Jeremie Charlet (CTO at Trackener)
While working for several months on the Kaggle project on sentiment analysis of IMDB reviews, we tried multiple vectorization techniques (bag of words, tf-idf), ran multiple models (from Random Forest to CNNs and RNNs), read many articles and research papers, compared the results and, above all, learned a great amount. We will showcase our work and share our learnings.
PredictionIO is an open source machine learning server that allows software developers to build predictive features into their web and mobile apps. It provides a horizontally scalable architecture based on Spark and uses algorithms like content-based recommendation, trend detection, and sentiment analysis. YELPIO-NAVI is a restaurant recommendation app for Japan that was built using PredictionIO and data from Yelp. It demonstrates neighborhood-based and collaborative filtering recommendations. MovieLens is a content-based movie recommendation engine that uses the MovieLens dataset and feature-based algorithms to provide movie recommendations based on a user's preferences for genres and attributes like director, country, and actors.
This document provides an introduction and overview of Apache UIMA (Unstructured Information Management Architecture).
Apache UIMA is an open source framework for analyzing unstructured information like text, audio, and video. It allows defining type systems and building analysis pipelines using components called annotators that can extract metadata from unstructured data.
The document outlines some key aspects of Apache UIMA including its goals of supporting a community around analyzing unstructured content, how it can bridge different domains, and provides an example scenario of using it to extract metadata from articles about movies.
Humantics | Optimizing Your Content Strategy in an Entity-Driven WorldGrant Simmons
The document discusses strategies for creating content that both satisfies human readers and is well understood by search engines, referred to as "Humantics". It outlines the CARL framework for context, aim, relationships, and links and provides examples of tools that can be used to analyze content, find entities, connections, and questions to ensure the content will be salient for search engines. The goal is to build content that search engines can fully understand in order to rank well without relying solely on keywords.
This document discusses how search engines are leveraging artificial intelligence and machine learning to better understand user queries. It introduces concepts like Humantics, which refers to the intersection of human actions and search engine understanding. Various AI systems used by search engines are discussed, including BERT, RankBrain, and MUM. The document also covers how entities, relationships, and context are important for search engines to understand content. Various tools for identifying entities in text are also mentioned.
This document provides an overview of deep learning techniques for recommendations. It begins with an introduction to recommender systems and how they are used widely in online services for tasks like product, news, and friend recommendations. It then discusses fundamentals of deep recommender systems including collaborative filtering using matrix factorization to learn latent representations of users and items from user-item interaction data. Later sections cover advances in deep recommender systems including using reinforcement learning, graph neural networks, automated machine learning, and defending against adversarial attacks. The tutorial agenda outlines these topics to be covered.
This document presents a method for interpreting and answering entity-seeking telegraphic queries using both a knowledge graph and annotated text corpus. It segments queries into entity, relation, and type partitions and generates interpretations. It retrieves relevant snippets from the corpus and candidate answers from the knowledge graph. A collective inference model combines corpus and graph evidence to infer the answer. Experiments show the joint model outperforms using either source alone and existing semantic parsers on benchmark query sets.
This paper describes a system for representing knowledge from Bahasa Indonesia text using an ontology written in Web Ontology Language Description Logic (OWL DL). The system takes natural language text as input, analyzes it semantically, generates ontology instances and properties, and can answer queries by reasoning over the ontology. It combines prior work on Indonesian natural language processing and using description logics for knowledge representation. An evaluation demonstrates the system representing and reasoning over sample texts about economic activities.
Assessment and Planning in Educational technology.pptxKavitha Krishnan
In an education system, assessment is often thought of as being only for students; however, the assessment of teachers is also an important aspect of the education system, ensuring that teachers provide high-quality instruction. The assessment process can be used to provide feedback and support for professional development, to inform decisions about teacher retention or promotion, or to evaluate teacher effectiveness for accountability purposes.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM.
Bangladesh Economic Review 2024 [Bangladesh Economic Review 2024 Bangla.pdf]: the complete Bangla e-book/PDF, with versions for computer, tablet and smartphone, including a table of contents, a bookmark menu and a hyperlink menu.
A very important book for all of us: it is a key topic for BCS, bank, university admission and other competitive exams, and it also contains all of the latest data and figures about Bangladesh.
So, as a citizen, you should know this information.
Useful for the BCS and bank written exams; it will also be of much use to secondary and higher-secondary students.
How to Add Chatter in the odoo 17 ERP ModuleCeline George
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Main Java[All of the Base Concepts}.docxadhitya5119
This is part 1 of my Java learning journey. It contains custom methods, classes, constructors, packages, multithreading, try-catch blocks, finally blocks and more.
This presentation covers the basics of PCOS, its pathology and treatment, as well as the Ayurvedic correlation of PCOS and the Ayurvedic line of treatment described in the classics.
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Azure Interview Questions and Answers PDF By ScholarHat
Mining Interesting Trivia for Entities from Wikipedia PART-I
1. Mining Interesting Trivia for Entities
from Wikipedia
Supervised By: Dr. Dhaval Patel, Assistant Professor, IIT Roorkee; Dr. Manoj Chinnakotla, Applied Researcher, Microsoft India
Presented By: Abhay Prakash, En. No. 10211002, IIT Roorkee
2. Motivation
Actual consumption by Bing during CWC'15
User engagement (rich experience)
Facts for quiz games (shows like KBC)
Manual curation? A professional curator: in 1 day, 50 trivia (spanning 10 entities)
3. Introduction: Problem Statement
Definition: Trivia is any fact about an entity which is interesting due to any of
the following characteristics - unusualness, uniqueness, unexpectedness or
weirdness.
E.g. “Aamir Khan did not blink his eyes even once in the complete movie” [Movie: PK (2014)]
It is unusual for a human not to blink his eyes
Problem Statement: For a given entity, mine the top-k interesting trivia from its Wikipedia page, where a trivia item is considered interesting if, when shown to 𝑁 persons, more than 𝑁/2 of them find it interesting.
For evaluation on the unseen set, we chose 𝑁 = 5 (statistical significance discussed ahead)
4. Position w.r.t Related Works
Automatic generation of trivia questions (2002) [1]
Their Work: Trivia questions from a structured database.
Difference: WTM retrieves trivia (facts) from unstructured text.
Predicting Interesting Things in Text (2014) [2]
Their Work: Click prediction on anchors (links) within a Wikipedia page.
Difference: WTM is not limited to links and does not (and cannot) use any click-through data.
Automatic Prediction of Text Aesthetics and Interestingness (2014) [3]
Their Work: One-class algorithm for identifying poetically beautiful sentences.
Difference: Similar in nature, but the domain differs, so the engineered features differ considerably.
Man bites dog: looking for interesting inconsistencies in structured news reports (2004) [4]
Their Work: Found unexpected news articles; dependent on 'structured' news reports.
Difference: WTM is not limited to structured data.
5. Wikipedia Trivia Miner
Mines Trivia for a Target Entity (Expt: Movie)
Trains a ranker using trivia of target domain
Uses Wikipedia as source of Trivia
Retrieves Top-k interesting trivia from entity’s page
Why Wikipedia?
Reliable for factual correctness
Ample # of interesting trivia (56/100 in expt.)
Two Phases
Model Building (Train Phase)
Retrieval (Test Phase)
[Architecture diagram: Wikipedia Trivia Miner (WTM). Train Phase: human-voted trivia source → Filtering & Grading → Feature Extraction → SVMrank (Interestingness Ranker). Retrieval Phase: candidates' source (Wikipedia / Knowledge Base) → Candidate Selection → Feature Extraction → ranking → Top-K interesting trivia from candidates.]
6. System Architecture
Filtering & Grading
Filters out less reliable samples
Give a grade to each sample, as reqd. by ranker
Interestingness Ranker
Extracts features from the samples/candidates
Trains ranker(SVMrank)/Ranks candidates
Candidate Selection
Identifies candidates from Wikipedia
[Architecture diagram repeated: WTM train phase (human-voted trivia source → Filtering & Grading → Feature Extraction → SVMrank) and retrieval phase (candidates' source → Candidate Selection → Feature Extraction → ranking → Top-K interesting trivia).]
7. Filtering & Grading
Crawled Trivia from IMDB
Top 5K movies, 99K trivia in total
Filtered on # of votes ≥ 5
Likeness Ratio (L.R.) = (# of Interesting Votes) / (# of Total Votes)
Normal Dist. required on grade
Sample Trivia for movie 'Batman Begins‘ [screenshot taken from IMDB]
[Histogram of %age coverage across Likeness Ratio bins: 39.56, 30.33, 17.08, 4.88, 3.57, 1.74, 1.06, 0.65, 0.6, 0.33, 0.21]
TRAIN PHASE
8. Filtering & Grading (Contd..)
High Support for High LR
For L.R. > 0.6, # of votes >= 100
Graded by Percentile-Cutoff to get
5 grades
[90,100], [75-90), [25-75), [10-25), [0-10)
6163 samples from 846 movies
[Bar chart: grade distribution of the 6163 training samples. Grade 4 (Very Interesting): 706; Grade 3 (Interesting): 1091; Grade 2 (Ambiguous): 2880; Grade 1 (Boring): 945; Grade 0 (Very Boring): 541.]
TRAIN PHASE
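To make the filtering and grading step concrete, here is a minimal Python sketch (not the authors' code; the field names are hypothetical) that computes the Likeness Ratio, drops trivia with fewer than 5 votes, and assigns the five grades by percentile cutoff:

```python
import numpy as np

def likeness_ratio(interesting_votes, total_votes):
    # L.R. = (# of interesting votes) / (# of total votes)
    return interesting_votes / total_votes

def filter_and_grade(trivia, min_votes=5):
    """Keep trivia with >= min_votes votes, then grade them 0..4 using
    percentile cutoffs on L.R.: [0,10), [10,25), [25,75), [75,90), [90,100]."""
    kept = [t for t in trivia if t["total_votes"] >= min_votes]
    ratios = np.array([likeness_ratio(t["interesting_votes"], t["total_votes"]) for t in kept])
    cutoffs = np.percentile(ratios, [10, 25, 75, 90])
    for t, r in zip(kept, ratios):
        # 0 = Very Boring ... 4 = Very Interesting
        t["grade"] = int(np.searchsorted(cutoffs, r, side="right"))
    return kept
```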
9. Feature Engineering
Unigrams (U): Basic Technique in Text Mining
Linguistic (L): Language Analysis Features
Superlative Words
Contradictory Words
Root Word (Verb)
Subject Word (First noun)
Readability
Entity (E): Understanding/Generalizing the entities present
Present Entities
Linking Entities for Linguistic Features
Focus Entities of sentence
TRAIN PHASE
10. Feature: Unigram Features
Basic Technique in Text Mining
Each word (unigram) is a feature column, with its TF-IDF as the feature value
Pre-processing
Stop word removal, Case conversion, Stemming and Punctuation removal
Why this Feature?
Tries to identify important words which make the trivia interesting
Prominent words that emerged: “stunt”, “award”, “improvise”
e.g. “Tom Cruise did all of his own stunt driving.” [Movie: Jack Reacher (2012)]
TRAIN PHASE
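A minimal sketch of how such unigram TF-IDF features could be extracted with scikit-learn and NLTK (the exact preprocessing pipeline used by WTM may differ):

```python
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def preprocess(sentence):
    # Case conversion, punctuation removal and stemming; the vectorizer
    # below handles stop-word removal and TF-IDF weighting.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return " ".join(stemmer.stem(tok) for tok in tokens)

sentences = [
    "Tom Cruise did all of his own stunt driving.",
    "The longest animated Disney film since Fantasia (1940).",
]
vectorizer = TfidfVectorizer(preprocessor=preprocess, stop_words="english")
X_unigram = vectorizer.fit_transform(sentences)  # one TF-IDF column per unigram
```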
11. Feature: Linguistic Features
Presence of Superlative Words
Words like “best”, “longest”, “first” etc.
Shows the extremeness (uniqueness)
Identified by Part of Speech (POS) tags: superlative adjectives (JJS) and superlative adverbs (RBS)
E.g. “The longest animated Disney film since Fantasia (1940).” [Movie: Tangled (2010)]
Presence of Contradictory Words
Words like “but”, “although”, “unlike” etc.
Opposing ideas could spark intrigue and interest
E.g. “The studios wanted Matthew McConaughey for the lead role, but James Cameron insisted on Leonardo DiCaprio.” [Movie: Titanic (1997)]
TRAIN PHASE
12. Root Word of Sentence
Captures core activity being discussed in the sentence
E.g. “Gravity grossed $274 Mn in North America” talks about revenue-related matters
Feature column of root_gross
Subject of Sentence (first noun before root verb)
Captures core thing being discussed in the sentence
E.g “The actors snorted crushed B vitamins for scenes involving cocaine.”
Feature column of subj_actor
Readability Score
Complex and lengthy trivia are rarely interesting
FOG index calculated and binned into three bins
Feature: Linguistic Features
TRAIN PHASE
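Below is a rough sketch of how these linguistic features could be computed with spaCy (the original system may have used a different parser; the contradictory-word list, the subject approximation via the nsubj dependency, and the FOG bin cutoffs are all illustrative assumptions):

```python
import spacy

# requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
CONTRADICTORY = {"but", "although", "unlike", "however", "yet"}  # illustrative list

def linguistic_features(sentence):
    doc = nlp(sentence)
    feats = {}
    # Superlative words via POS tags JJS (superlative adjective) / RBS (superlative adverb)
    feats["superPOS"] = int(any(tok.tag_ in {"JJS", "RBS"} for tok in doc))
    # Contradictory words
    feats["contradictory"] = int(any(tok.lower_ in CONTRADICTORY for tok in doc))
    # Root word of the sentence and its subject
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    feats["root_" + root.lemma_] = 1
    subj = next((tok for tok in doc if tok.dep_ in ("nsubj", "nsubjpass")), None)
    if subj is not None:
        feats["subj_" + subj.lemma_] = 1
    # Readability: Gunning FOG index, binned into three bins
    words = [tok for tok in doc if tok.is_alpha]
    complex_words = [w for w in words if len(w.text) >= 8]  # crude stand-in for 3+ syllables
    fog = 0.4 * (len(words) / max(len(list(doc.sents)), 1)
                 + 100.0 * len(complex_words) / max(len(words), 1))
    feats["fog_bin"] = 0 if fog < 8 else (1 if fog < 14 else 2)
    return feats
```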
13. Presence of Generic NEs
Presence of NEs: MONEY, ORGANIZATION, PERSON, DATE, TIME and LOCATION
Feature column for each of the six NEs
E.g. “The guns in the film were supplied by Aldo Uberti Inc., a company in Italy.”
ORGANIZATION and LOCATION
Feature: Entity Features
TRAIN PHASE
14. Feature: Entity Features
Present Entities
Presence of related entities (Resolved using DBPedia)
E.g. entity_producer and entity_character in above sample
Entities linked before extracting linguistic features
“According to entity_producer, …”
Linguistic feature Subject Word: subj_Victoria becomes subj_entity_producer
Focus Named Entities of the Sentence
Presence of any NE directly under the root
For the above example: feature columns underroot_entity_producer and underroot_entity_character
“According to Victoria Alonso, Rocket Raccoon and Groot were created through a mix of motion-capture and rotomation VFX.”
TRAIN PHASE
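A companion sketch for the entity features using spaCy's NER (the DBpedia-based linking of related entities such as entity_producer is omitted here; spaCy's ORG and GPE labels stand in for ORGANIZATION and LOCATION):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
# spaCy labels standing in for the six generic NE types
GENERIC_NES = {"MONEY", "ORG", "PERSON", "DATE", "TIME", "GPE"}

def entity_features(sentence):
    doc = nlp(sentence)
    feats = {}
    # One feature column per generic NE type present in the sentence
    for ent in doc.ents:
        if ent.label_ in GENERIC_NES:
            feats["NE_" + ent.label_] = 1
    # Focus entities: named entities whose head token hangs directly under the root
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    for ent in doc.ents:
        if ent.root.head == root:
            feats["underroot_" + ent.label_] = 1
    return feats

print(entity_features("The guns in the film were supplied by Aldo Uberti Inc., a company in Italy."))
```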
15. Model Building: Ranker
Used Rank-SVM
Finds a plane such that the projection of each sample onto it follows the given grade order
The ordering is over samples within a movie
INPUT FOR TRAINING (MOVIE_ID | FEATURES | GRADE):
1 | 1:1 5:2 … | 4
1 | … | 2
1 | … | 1
2 | … | 4
2 | … | 3
2 | … | 1
2 | … | 1
MODEL BUILT (Hyperplane) [image taken and modified from Wikipedia]
INPUT FOR RANKING (MOVIE_ID | FEATURES) and OUTPUT OF RANKING (SCORE):
1 | 1:1 5:2 … | 1.7
1 | … | 2.4
2 | … | 1.2
2 | … | 2.7
2 | … | 0.13
3 | … | 3.1
3 | … | 1.3
TRAIN PHASE
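The ranker is SVMrank, which reads training data in the SVMlight format with one query (here, one movie) per qid. A minimal sketch of writing that file (the feature ids and values below are made up for illustration):

```python
def write_svmrank_file(samples, path):
    # SVMrank input: "<grade> qid:<movie_id> <feat_id>:<value> ..." per line,
    # with samples grouped by query (movie) and feature ids in increasing order.
    with open(path, "w") as f:
        for s in sorted(samples, key=lambda x: x["movie_id"]):
            feats = " ".join(f"{fid}:{val}" for fid, val in sorted(s["features"].items()))
            f.write(f'{s["grade"]} qid:{s["movie_id"]} {feats}\n')

# Illustrative samples; feature ids would come from the unigram/linguistic/entity extractors
samples = [
    {"movie_id": 1, "grade": 4, "features": {1: 1.0, 5: 2.0}},
    {"movie_id": 1, "grade": 2, "features": {3: 0.7}},
    {"movie_id": 2, "grade": 4, "features": {2: 1.3, 8: 1.0}},
]
write_svmrank_file(samples, "train.dat")
# Training then uses Joachims' svm_rank_learn, e.g.: svm_rank_learn -c 3 train.dat model.dat
```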
16. Model Building: Cross Validate Results
Feature increment and model building
NDCG@10 by feature group: Unigram (U) 0.934; Linguistic (L) 0.919; Entity (E) 0.929; U + L 0.9419; U + E 0.944; WTM (U + L + E) 0.951
TRAIN PHASE
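For reference, NDCG@10 as typically computed for graded relevance; the exact gain and discount variant used in the evaluation is not stated on the slide, so this formulation is an assumption:

```python
import numpy as np

def dcg_at_k(grades, k):
    grades = np.asarray(grades, dtype=float)[:k]
    discounts = np.log2(np.arange(2, grades.size + 2))
    return float(np.sum((2 ** grades - 1) / discounts))

def ndcg_at_k(grades_in_ranked_order, k=10):
    idcg = dcg_at_k(sorted(grades_in_ranked_order, reverse=True), k)
    return dcg_at_k(grades_in_ranked_order, k) / idcg if idcg > 0 else 0.0

# grades (0-4) of one movie's candidates, in the order the ranker returned them
print(ndcg_at_k([4, 2, 3, 0, 1, 2, 4, 1, 0, 2], k=10))
```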
17. Model Building: Feature Weights
Sneak peek inside the model: what is the model learning?
Top features: our advanced features are useful and intuitive for humans too
Rank | Feature | Group
1 | subj_scene | Linguistic
2 | subj_entity_cast | Linguistic + Entity
3 | entity_produced_by | Entity
4 | underroot_unlinked_organization | Linguistic + Entity
6 | root_improvise | Linguistic
7 | entity_character | Entity
8 | MONEY | Entity (NER)
14 | stunt | Unigram
16 | superPOS | Linguistic
17 | subj_actor | Linguistic
Note: entity linking led to better generalization; otherwise these features would have been subj_wolverine etc.
TRAIN PHASE
18. Retrieval Phase
- Get Trivia from Wikipedia Page
[Architecture diagram repeated, with the Retrieval Phase highlighted: candidates' source (Wikipedia) → Candidate Selection → Feature Extraction → trained ranker (SVMrank) → Top-K interesting trivia from candidates.]
19. Candidate Selection
Sentence Extraction
Crawled only the text in paragraph tags <p>…</p>
Sentence detection to obtain each sentence for further processing
Removed sentences with missing context
E.g. “It really reminds me of my childhood.”
Co-reference resolution to find out-links to a different sentence
Remove the sentence if its out-link is not to the target entity
e.g. “Hanks revealed that he signed onto the film after an hour and a half of reading the script. He initially ...”
The first ‘he’ is not an out-link, and ‘the film’ points to the target entity; the second ‘He’ is an out-link
The first sentence is kept, the second removed
RETRIEVAL PHASE
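A rough sketch of the sentence-extraction part of candidate selection (the coreference-based context filter is replaced here by a crude pronoun heuristic, since it needs a full coref system):

```python
import requests
import nltk
from bs4 import BeautifulSoup

# requires: nltk.download("punkt")

def wikipedia_candidates(url):
    """Extract candidate sentences from the <p> paragraphs of a Wikipedia page."""
    html = requests.get(url, headers={"User-Agent": "wtm-sketch"}).text
    soup = BeautifulSoup(html, "html.parser")
    text = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    sentences = nltk.sent_tokenize(text)
    # The slides drop sentences with missing context using coreference resolution;
    # a crude heuristic stands in here: drop sentences that start with a bare pronoun.
    pronouns = {"it", "he", "she", "they", "this", "these"}
    return [s for s in sentences
            if s.split() and s.split()[0].lower().strip(",.") not in pronouns]

candidates = wikipedia_candidates("https://en.wikipedia.org/wiki/Gravity_(2013_film)")
```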
20. Test Set for Model Evaluation
Generated trivia for 20 Movies from Wikipedia
Judged (crowd-sourced) by 5 judges
Two scale voting – Boring / Interesting
Majority voting for Class Labeling
Statistically significant?
Took 100 trivia from IMDB and had them judged by only 5 judges as well
Mechanism I: majority voting of the IMDB crowd vs. Mechanism II: crowd-sourcing by 5 judges
Agreement between two mechanisms = Substantial (Kappa Value = 0.618)
Kappa | Agreement
< 0 | Less than chance agreement
0.01-0.20 | Slight agreement
0.21-0.40 | Fair agreement
0.41-0.60 | Moderate agreement
0.61-0.80 | Substantial agreement
0.81-0.99 | Almost perfect agreement
RETRIEVAL PHASE
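Agreement between the two labeling mechanisms can be checked with Cohen's kappa; a minimal sketch with illustrative labels (1 = interesting, 0 = boring), not the actual judgments:

```python
from sklearn.metrics import cohen_kappa_score

labels_imdb_crowd = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # Mechanism I: majority of IMDB voters
labels_five_judges = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # Mechanism II: majority of 5 judges

kappa = cohen_kappa_score(labels_imdb_crowd, labels_five_judges)
print(f"kappa = {kappa:.3f}")  # 0.61-0.80 counts as substantial agreement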
21. Results: Metrics on Unseen: P@10
Comparative Approaches & Baselines
Random:
- 10 sentences picked randomly from Wikipedia
[Bar chart: P@10 for each comparative model; the bars range from 0.25 to 0.45. This slide highlights Random (Baseline-I) at P@10 = 0.25.]
RETRIEVAL PHASE
22. Results: Metrics on Unseen: P@10
Comparative Approaches & Baselines
CS + Random:
- Missing context sentences removed by CS
- 10 sent. picked randomly
[Bar chart repeated, highlighting CS then Random: a 19.61% improvement over B-I.]
RETRIEVAL PHASE
23. Results: Metrics on Unseen: P@10
Comparative Approaches & Baselines
CS + supPOS(Worst):
- Ranked by # of superlative words
- Ties (same # of superlatives) broken by deliberately taking boring sentences
CS + supPOS(Rand):
- Ranked by # of superlative words
- Ties broken by shuffling
CS + supPOS(Best):
- Ranked by # of superlative words
- Ties broken by deliberately taking interesting sentences
[Bar chart repeated, highlighting the supPOS variants: supPOS_W, supPOS_R (a 29.41% improvement over B-I) and supPOS_B (Baseline-II).]
supPOS Trivia: Marlon Brando did not memorize most of his lines and read from cue cards during most of the film.
RETRIEVAL PHASE
24. Results: Metrics on Unseen: P@10
Comparative Approaches & Baselines
CS + WTM(U):
- ML Ranking
- With only basic Unigram(U) features
[Bar chart repeated, highlighting WTM (U) relative to B-I and B-II.]
RETRIEVAL PHASE
25. Results: Metrics on Unseen: P@10
Comparative Approaches & Baselines
CS + WTM(U): ML Ranking with only (U) features
CS + WTM(U+L+E):
- ML Ranking
- With advanced (U+L+E) features
[Bar chart repeated, highlighting WTM (U+L+E) at P@10 = 0.45: a 78.43% improvement over B-I and a 33.82% improvement over B-II.]
RETRIEVAL PHASE
26. Results: Metrics on Unseen: Recall@K
supPOS limited to one kind of trivia
WTM captures varied types
62% recall till rank 25
Performance Comparison
supPOS better till rank 3
Soon after rank 3, WTM beats superPOS
[Line chart: %Recall vs. rank (0 to 25) for SuperPOS (Best Case), WTM and Random.]
RETRIEVAL PHASE
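For reference, P@K and Recall@K as they are usually defined for this setup (the labels below are illustrative, not the judged data):

```python
def precision_at_k(ranked_labels, k=10):
    # Fraction of the top-k retrieved sentences judged interesting (label 1)
    top = ranked_labels[:k]
    return sum(top) / len(top)

def recall_at_k(ranked_labels, k, total_interesting):
    # Fraction of all interesting sentences for the entity that appear in the top k
    return sum(ranked_labels[:k]) / total_interesting

# illustrative judged ranking for one movie (1 = interesting, 0 = boring)
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(precision_at_k(ranked, k=10))
print(recall_at_k(ranked, k=10, total_interesting=sum(ranked)))
```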
27. Results: Qualitative Discussion
Result | Movie | Trivia | Description
WTM Wins (Sup. POS Misses) | Interstellar (2014) | Paramount is providing a virtual reality walkthrough of the Endurance spacecraft using Oculus Rift technology. | Due to Organization being the subject, and (U) features (technology, reality, virtual)
WTM Wins (Sup. POS Misses) | Gravity (2013) | When the script was finalized, Cuarón assumed it would take about a year to complete the film, but it took four and a half years. | Due to Entity.Director, Subject (the script), Root word (assume) and (U) features (film, years)
WTM's Bad | Elf (2003) | Stop motion animation was also used. | Candidate Selection failed
WTM's Bad | Rio 2 (2014) | Rio 2 received mixed reviews from critics. | Root verb "receive" has high weightage in the model
RETRIEVAL PHASE
28. Results: Qualitative Discussion (Contd…)
Result | Movie | Trivia | Description
Sup. POS Wins (WTM misses) | The Incredibles (2004) | Humans are widely considered to be the most difficult thing to execute in animation. | Presence of 'most', absence of any Entity, vague Root word (consider)
Sup. POS's Bad | Lone Survivor (2013) | Most critics praised Berg's direction, as well as the acting, story, visuals and battle sequences. | Here 'most' does not show degree but genericity
RETRIEVAL PHASE
29. Dissertation Contribution
Identified, defined and posed a novel research problem
not just provided a solution to an existing problem
Proposed a system, “Wikipedia Trivia Miner (WTM)”
To mine the top-k interesting trivia for any given entity based on their interestingness
Engineered features that capture the ‘about-ness’ of a sentence
Generalizes which ones are interesting
Showed how publicly available IMDB data can be leveraged for model learning
Cost effective, as it eliminates the need for crowd annotation
Proposed a mechanism to prepare ground truth for the test set
Cost-effective yet statistically significant
30. Publication Submitted
[1] Abhay Prakash, Manoj Chinnakotla, Dhaval Patel, Puneet Garg (2015): “Did you know?: Mining Interesting Trivia for Entities from Wikipedia”. Submitted to the International Joint Conference on Artificial Intelligence (IJCAI).
31. Further Work
Replicate the work on the Celebrities domain
Verify that the WTM approach is actually domain-independent
Feature engineering to capture deviation from expectation
Expectation based on the topics in that domain; compare against the topic of the candidate
Fact popularity
Lesser-known trivia could be more interesting to the majority of people
32. Key References
[1] Matthew Merzbacher, "Automatic generation of trivia questions," Foundations of Intelligent Systems, Lecture Notes in Computer Science, vol. 2366, pp. 123-130, 2002.
[2] Michael Gamon, Arjun Mukherjee, and Patrick Pantel, "Predicting interesting things in text," in COLING, 2014.
[3] Debasis Ganguly, Johannes Leveling, and Gareth Jones, "Automatic prediction of text aesthetics and interestingness," in COLING, 2014.
[4] Emma Byrne and Anthony Hunter, "Man bites dog: looking for interesting inconsistencies in structured news reports," Data and Knowledge Engineering, vol. 48, no. 3, pp. 265-295, 2004.