This document describes a method for automatically mining interesting trivia about entities from Wikipedia. It presents the Wikipedia Trivia Miner (WTM) system, which selects candidate sentences from Wikipedia pages and ranks them based on an interestingness model trained on human ratings. WTM uses linguistic and entity-based features to determine interestingness. Evaluation shows WTM outperforms baselines in precision and recall for retrieving interesting trivia about movie entities. The authors contribute a novel approach for mining interesting facts from text and make their data and code publicly available.
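The select-then-rank flow summarized above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names, the superlative word list, and the hand-set weights are hypothetical and are not taken from WTM, which learns its interestingness model from human ratings.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-set weights. WTM trains its model on human ratings;
# these numbers only stand in for a learned model.
WEIGHTS: Dict[str, float] = {
    "has_superlative": 1.5,
    "mentions_entity": 1.0,
    "token_count": -0.02,
}

# Hypothetical word list used as a crude linguistic feature.
SUPERLATIVES = {"first", "only", "most", "best", "longest"}


def extract_features(sentence: str, entity: str) -> Dict[str, float]:
    """Compute a few toy linguistic and entity-based features."""
    tokens = sentence.lower().split()
    return {
        "has_superlative": float(any(t.strip(".,") in SUPERLATIVES for t in tokens)),
        "mentions_entity": float(entity.lower() in sentence.lower()),
        "token_count": float(len(tokens)),
    }


def score(sentence: str, entity: str) -> float:
    """Linear interestingness score: weighted sum of the features."""
    features = extract_features(sentence, entity)
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_candidates(sentences: List[str], entity: str, top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every candidate sentence and return the top_k, best first."""
    scored = [(score(s, entity), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    candidates = [
        "The film was released in the spring.",
        "Example Movie was the first film shot entirely on location.",
    ]
    for value, sentence in rank_candidates(candidates, "Example Movie"):
        print(f"{value:.2f}  {sentence}")
```

A linear scorer is used only because it is easy to read; any model that maps features to a score would fit the same ranking loop.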
Mining Interesting Trivia for Entities from Wikipedia PART-I, by Abhay Prakash
The following presentation is on my master's thesis work, "Mining Interesting Trivia for Entities from Wikipedia". It covers the same work as our paper accepted at IJCAI.
This presentation is the first part, covering around 80% of the content that I presented in my mid-term. A second presentation with the same title, ending in 'PART-II', continues this one.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here are the AI lessons your business should keep in mind for 2017.
Mining Interesting Trivia for Entities from Wikipedia PART-II, by Abhay Prakash
The following presentation is on my master's thesis work, "Mining Interesting Trivia for Entities from Wikipedia".
This presentation is the second part, continuing another presentation of mine with the same title but ending in 'PART-I'.
This document provides an overview of a machine learning course. It outlines the course structure, including topics covered, assignments, and grading. The course covers fundamental machine learning algorithms for classification, regression, clustering, and dimensionality reduction. It also discusses applications of machine learning like spam filtering, recommender systems, and chess playing computers.
This document provides an overview of a machine learning course. It outlines the course structure, including topics covered, assignments, and grading. The course covers fundamental machine learning algorithms for classification, regression, clustering, and dimensionality reduction. It also discusses applications of machine learning like spam filtering, recommender systems, and chess playing programs.
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3, by Dr. Aparna Varde
This is the 3rd part of the tutorial on commonsense knowledge (CSK) at ACM WSDM 2021 by Simon Razniewski, Niket Tandon and Aparna Varde. It focuses on intrinsic and extrinsic evaluation of the acquired knowledge, and also covers highlights, an outlook with a brief perspective on COVID, and open issues for further research.
Abstract: Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt’s birth date, or Angelina Jolie’s movie awards, have received much attention, commonsense knowledge on general concepts (politicians, bicycles, printers) and activities (eating pizza, fixing printers) has only been tackled recently. In this tutorial we present state-of-the-art methodologies towards the compilation and consolidation of such commonsense knowledge (CSK). We cover text-extraction-based, multi-modal and Transformer-based techniques, with special focus on the issues of web search and ranking, as relevant to the WSDM community.
This document presents a method for interpreting and answering entity-seeking telegraphic queries using both a knowledge graph and an annotated text corpus. It segments queries into entity, relation, and type partitions and generates interpretations. It retrieves relevant snippets from the corpus and candidate answers from the knowledge graph. A collective inference model combines corpus and graph evidence to infer the answer. Experiments show the joint model outperforms both single-source variants and existing semantic parsers on benchmark query sets.
Presentation of the paper titled "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases" at the ISWC 2020 - Research Track.
@inproceedings{mihindu-sling-2020,
title = "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases",
author = "Mihindukulasooriya, Nandana and Rossiello, Gaetano and Kapanipathi, Pavan and Abdelaziz, Ibrahim and Ravishankar, Srinivas and Yu, Mo and Gliozzo, Alfio and Roukos, Salim and Gray, Alexander",
booktitle="The Semantic Web -- ISWC 2020",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="402--419",
url = "https://link.springer.com/chapter/10.1007/978-3-030-62419-4_23",
doi = "10.1007/978-3-030-62419-4_23"
}
The document discusses how the entertainment industry is evolving with declining movie attendance and rising streaming, and how Sony Pictures is applying data-driven "Moneyball" approaches like using predictive analytics and machine learning for greenlight decisions and applications like movie recommendation engines and title clustering. It also covers new opportunities in augmented reality, virtual reality and cinematic VR.
This document summarizes the key expectations and challenges when visualizing data or building visual analytics tools. There are several main points:
1. Expect potential mismatches between what clients think they need versus what the data and visualization actually require, requiring clear communication and compromise.
2. Different projects will have different goals that require flexibility in the types of visualizations created, whether for presentation, exploration, or both.
3. A significant amount of time, often 70-80%, will be spent cleaning and preparing data prior to visualization due to issues like missing values, formatting inconsistencies, and data quality problems.
4. Iteration is essential to work out bugs and refine visualizations to best meet requirements and dead
Microsoft wants to enter the movie industry but lacks knowledge. The author analyzed recent box office data and found:
1) Animated musicals budgeted $75-200M and released in June/November or live-action superhero films budgeted $200-400M released in April/May tend to succeed.
2) Hiring composers for animated musicals and directors who worked on top-grossing superhero films is recommended.
3) This analysis of factors like genre, budget, and release time can help Microsoft maximize revenue from its initial movie productions.
Extracting, Aligning, and Linking Data to Build Knowledge Graphs, by Craig Knoblock
This document discusses building knowledge graphs by extracting, aligning, and linking data from various sources. It describes crawling websites to acquire raw data, using both structured and unstructured extraction to extract features from the data, aligning the extracted features to a common schema, and resolving entities in the data to merge records referring to the same real-world entity. It also discusses techniques for collectively resolving entities in large datasets, summarizing graphs by grouping similar nodes into super-nodes, and using the summarized graph to predict links in the original graph. The overall goal is to clean, organize, and link disconnected data into a knowledge graph that is easier to query, analyze, and visualize.
Curated Proof Markets & Token-Curated Identities in Ocean Protocol, by Trent McConaghy
This talk describes Ocean Protocol’s token mechanics via step-by-step examples of how users earn tokens by curating data and making it available.
Blog post: https://medium.com/@trentmc0/curated-proofs-markets-a-walk-through-of-oceans-core-token-mechanics-3d50851a8005
Presented at 9984 Blockchain Meetup, Berlin, Mar 28, 2018
Modern Oracle DBAs have spent years acquiring extremely valuable skills, even while facing increased responsibility for growing numbers of diverse multi-version databases, demands to transition to public cloud computing infrastructure, and a never-ending drumbeat for upskilling and relevance in our industry. It’s the perfect time to consider a transition in your career by leveraging your expertise with the Oracle database in a new role as a Data Engineer (DE).
This document provides an overview of deep learning. It discusses the motivation and history of machine learning, including pattern recognition, machine learning algorithms based on linear models, and neural networks. It then introduces deep learning, noting that deep neural networks combined with GPUs and large datasets have led to significant performance gains compared to other machine learning techniques.
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics, by Jason Anderson
Meetup Link: https://www.meetup.com/Cognitive-Computing-Enthusiasts/events/250444108/
Recording Link: https://www.youtube.com/watch?v=4uXg1KTXdQc
When developing a machine learning system, the possibilities are limitless. However, with the recent explosion of Big Data and AI, there are more options than ever to filter through: which technologies to select, which model topologies to build, and which infrastructure to use for deployment, just to name a few. We have explored these options, along with their many roadblocks, for our faceted refinement system for video content (consisting of 100K+ videos). The three primary areas of focus are natural language processing, video frame sampling, and infrastructure deployment.
ML Infra @ Spotify: Lessons Learned - Romain Yon - NYC ML Meetup, by Romain Yon
Original event: https://www.meetup.com/NYC-Machine-Learning/events/256605862/
--
"Doing large scale ML in production is hard" – Everyone who's tried
This talk focuses on ML systems, especially the less obvious pitfalls that have caused us trouble at Spotify.
This talk assumes a certain level of familiarity with ML: you'll get the most out of it if you have some experience with applied ML, ideally on production systems.
Romain Yon is a Staff ML Engineer at Spotify. Over the years, Romain has worked on many of the core ML systems that power Spotify today (Music Recommendation, Catalog Quality, Search Ranking, Ads, ..).
During the past year, Romain has been mostly focusing on designing reusable ML Infrastructure that can be leveraged throughout Spotify.
Prior to Spotify, Romain co-founded the startup https://linkurio.us while getting his MSc in ML from Georgia Tech.
Chatbots and Natural Language Generation - A Bird Eyes View, by Mark Cieliebak
Chatbots, conversational user interfaces, dialogue systems, question-answering - the names differ, but the fundamental idea is the same: smart computer systems which can "talk" to humans in a natural way. Chatbots and their derivatives are designed to understand human language, interpret its content, and reply accordingly. This long-standing vision from artificial intelligence has gained enormous momentum since 2015.
But what is possible, and where are the boundaries? Do chatbots really "understand" the meaning of text? And how can they be employed beneficially in real-world applications?
In this talk, we will give an overview of state-of-the-art technologies and applications for dialogue systems in research and industry.
Future of AI-powered automation in business, by Louis Dorard
Starting from examples of current use cases of AI in business and in everyday life, we'll see what the future holds and we'll mention questions to address when giving autonomy to intelligent machines. We'll also aim at demystifying how AI works, in particular how machines can use data to automatically learn business rules and actions to perform in different contexts.
This document describes a movie recommendation system project that will use collaborative filtering techniques to predict movie ratings and recommend movies to users. The project will use the MovieLens dataset to identify user demographics and movie genres and classify them using different algorithms. Conditional inference trees and random forests will be implemented and evaluated on the MovieLens data, with the highest accuracy achieved using age, gender, occupation, and genre features. Exploratory data analysis of the MovieLens data found that most users are students aged 20-30 and most movies are from the 1990s across many genres.
Introduction to Unsupervised Learning - Code Heroku, by codeheroku
These slides are part of the Introduction to Machine Learning course by Code Heroku.
Here is the recorded version of our Introduction to Unsupervised Learning tutorial: https://www.youtube.com/watch?v=gnxCdjaBkXY
Here is the link to Introduction to Machine Learning Course: http://www.codeheroku.com/course?course_id=1
You can watch all our upcoming and past workshops here: http://www.codeheroku.com
Subscribe to our YouTube channel: https://www.youtube.com/channel/UCL-_0RrZ3084Ea8Yavtcd9g
Follow our publication on Medium: https://medium.com/code-heroku
Visit our Facebook page: https://www.facebook.com/codeheroku
Talk at KTH on 14 May 2014 about matrix factorization, different latent and neighborhood models, graphs and energy diffusion for recommender systems, as well as what makes good/bad recommendations.
Big, Open, Data and Semantics for Real-World Application Near You, by Biplav Srivastava
(This is material presented as a keynote at AMECSE 2014 on 21 Oct 2014 in Cairo, Egypt.)
State-of-the-art Artificial Intelligence (AI) and data management techniques have been demonstrated to process large volumes of noisy data to extract meaningful patterns and drive decisions in diverse applications ranging from space exploration (NASA's Curiosity) and game shows (IBM's Watson in Jeopardy™) to consumer products (Apple's SIRI™ voice recognition). However, what stops them from helping us in more mundane things like fighting diseases, eliminating hunger, improving commuting to work, or reducing financial fraud and corruption? Consumable data!
In this talk, Biplav will demonstrate and discuss how large volumes of data (Big), made available publicly (Open), can be productively used with semantic web and analytical techniques to drive day-to-day applications. One important source of this type of data is government open data, which is published by governments and free to be reused. Big Open Data is leading to early examples of "open innovations": a confluence of open data (e.g., Data.gov, data.gov.in), accessible via API techniques (e.g., Open 311), annotated with semantic information (e.g., W3C ontologies, Schema.org) and processed with analytical techniques (e.g., R, Weka) to drive actionable insights. The talk will illustrate how this can help bring increased benefits to citizens and discuss research issues that can accelerate its pace. It is increasingly being adopted by progressive businesses and governments to drive innovation that matters.
[AWS LA Media & Entertainment Event 2015]: Cloud Analytics for Audience Engag..., by Amazon Web Services
This session explores the use of advanced tools and strategies for data analytics to increase audience engagement and customer retention. The presentation examines the use of machine learning to build predictive analytics models to identify customers likely to churn as well as for sentiment analysis for customer interest. This also includes a case study on predictive analytics for OTT (over the top) content delivery.
Discover the Future of Entertainment: Dive into the world of movie recommendation systems in our engaging presentation. Join us as we explore the power of cutting-edge technology and data analytics to enhance user experiences in the entertainment industry. Our journey begins with data collection and cleaning, followed by a fascinating peek into the importance of movie recommendation systems.
Uncover the Problem: Have you ever felt overwhelmed by the sheer number of movie choices on streaming platforms like Netflix and Amazon Prime? Our project addresses this very challenge by simplifying your movie selection process.
A Glimpse into the Timeline: Journey with us through the phases of data collection, preprocessing, and basic exploratory data analysis. Witness the transformation of raw data into actionable insights.
Cosine Similarity Revealed: Delve into the heart of our recommendation system as we explain the concept of Cosine Similarity, the mathematical foundation behind our recommendations.
Pros and Cons Explored: Explore the pros and cons of movie recommendation systems, from personalized user experiences and increased engagement to challenges like the 'Cold Start Problem' and privacy concerns.
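As a rough illustration of the Cosine Similarity idea mentioned in the blurb above, here is a minimal sketch. The presentation does not say how movies are represented, so the genre-indicator vectors and movie names below are assumptions made purely for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros, since the angle is
    undefined in that case.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical genre indicators: [action, comedy, romance, sci-fi]
    movie_a = [1.0, 0.0, 0.0, 1.0]
    movie_b = [1.0, 0.0, 0.0, 0.0]
    movie_c = [0.0, 1.0, 1.0, 0.0]
    print(cosine_similarity(movie_a, movie_b))  # shares one genre with movie_a
    print(cosine_similarity(movie_a, movie_c))  # shares no genre with movie_a
```

A recommender built on this measure would typically suggest the movies whose vectors score highest against a movie the user already liked.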
Understanding Inductive Bias in Machine Learning, by SUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Similar to IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entities from Wikipedia
Similar to IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entities from Wikipedia (20)
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entities from Wikipedia
1. Did you know? Mining Interesting Trivia for Entities from Wikipedia
Abhay Prakash (1), Manoj K. Chinnakotla (2), Dhaval Patel (1), Puneet Garg (2)
(1) Indian Institute of Technology Roorkee, India; (2) Microsoft, India
2. Did you know?
Dark Knight (2008): To prepare for Joker’s role, Heath Ledger lived alone in a hotel
room for a month, formulating the character’s posture, voice, and personality.
IJCAI-15: IJCAI-15 is the first IJCAI edition in South America, and the southernmost edition ever.
Argentina: In 2001, Argentina had 5 Presidents in 10 days!
Tom Hanks: Tom Hanks has an asteroid named after him: “12818 tomhanks”
3. What is a Trivia?
Definition: Trivia is any fact about an entity that is interesting due to any of the following characteristics:
Unusualness
Uniqueness
Unexpectedness
Weirdness
But isn't interestingness subjective?
Yes!
For the current work, we take a majoritarian view of interestingness
5. Wikipedia Trivia Miner (WTM)
Automatically mine trivia for entities from unstructured text of Wikipedia
Why Wikipedia?
Reliable for factual correctness
Ample number of interesting trivia (56 out of 100 in an experiment)
Learn a model of interestingness for target domain
Use the interestingness model to rank sentences from Wikipedia
7. System Architecture
[Figure: system architecture diagram. Train Phase labels: Human Voted Trivia Source, Filtering & Grading, Train Dataset, Feature Extraction, SVMrank, Model. Retrieval Phase labels: Candidates' Source, Candidate Selection, Feature Extraction, SVMrank, Knowledge Base, Top-K Interesting Trivia from Candidates.]
8. Training Phase: Learn Interestingness Model
[Figure: Wikipedia Trivia Miner (WTM) architecture diagram, shown again for the Train Phase. Additional label: Interestingness Ranker.]
9. Filtering & Grading
Crawled Trivia from IMDB
Top 5K movies, 99K trivia in total
Filter out less reliable facts
Those with number of votes < 5
$$\text{Likeness Ratio (L.R.)} = \frac{\text{\# of Interesting Votes}}{\text{\# of Total Votes}}$$
Convert this skewed distribution into grades
Sample trivia for the movie 'Batman Begins' [screenshot taken from IMDB]
[Figure: histogram of % coverage (y-axis) against Likeness Ratio (x-axis). Bar values, in plotted order: 39.56, 30.33, 17.08, 4.88, 3.57, 1.74, 1.06, 0.65, 0.6, 0.33, 0.21]
10. Filtering & Grading (Contd..)
High Support for High LR
For L.R. > 0.6, # of votes >= 100
Graded by Percentile-Cutoff to get 5 grades
[90, 100], [75, 90), [25, 75), [10, 25), [0, 10)
6163 samples from 846 movies
[Figure: frequency (y-axis) of each trivia grade (x-axis)]
Grade 4 (Very Interesting): 706
Grade 3 (Interesting): 1091
Grade 2 (Ambiguous): 2880
Grade 1 (Boring): 945
Grade 0 (Very Boring): 541
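The filtering and grading steps on the last two slides can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the record layout, and the definition of percentile rank (share of kept samples with a strictly lower L.R.) are assumptions; only the vote threshold, the L.R. formula and the percentile cutoffs come from the slides.

```python
from bisect import bisect_left

MIN_VOTES = 5  # from the slide: trivia with fewer than 5 votes are filtered out


def likeness_ratio(interesting_votes, total_votes):
    """L.R. = number of interesting votes / number of total votes."""
    return interesting_votes / total_votes


def grade_from_percentile(pct):
    """Map a percentile rank to one of the 5 grades using the slide's cutoffs."""
    if pct >= 90:
        return 4  # Very Interesting
    if pct >= 75:
        return 3  # Interesting
    if pct >= 25:
        return 2  # Ambiguous
    if pct >= 10:
        return 1  # Boring
    return 0      # Very Boring


def grade_trivia(records):
    """records: list of (text, interesting_votes, total_votes) tuples (hypothetical layout)."""
    kept = [(text, likeness_ratio(i, n)) for text, i, n in records if n >= MIN_VOTES]
    ratios = sorted(lr for _, lr in kept)
    graded = []
    for text, lr in kept:
        # Assumed percentile rank: share of kept samples with a strictly lower L.R.
        pct = 100.0 * bisect_left(ratios, lr) / len(ratios)
        graded.append((text, lr, grade_from_percentile(pct)))
    return graded
```

Ties in L.R. all receive the same percentile under this definition, which is one plausible reason the grade sizes on the slide are not exact 10/15/50/15/10 splits.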
11. Feature Engineering
Columns: Bucket, Feature, Significance, Sample features, Example Trivia

Bucket: Unigram (U) Features
Feature: Each word's TF-IDF
Significance: Identify important words which make the trivia interesting
Sample features: “stunt”, “award”, “improvise”
Example Trivia: “Tom Cruise did all of his own stunt driving.”

Bucket: Linguistic (L) Features
Feature: Superlative Words
Significance: Shows the extremeness (uniqueness)
Sample features: “best”, “longest”, “first”
Example Trivia: “The longest animated Disney film since Fantasia (1940).”

Feature: Contradictory Words
Significance: Opposing ideas could spark intrigue and interest
Sample features: “but”, “although”, “unlike”
Example Trivia: “The studios wanted Matthew McConaughey for lead role, but James Cameron insisted on Leonardo DiCaprio.”

Feature: Root Word (Main Verb)
Significance: Captures core activity being discussed in the sentence
Sample features: root_gross
Example Trivia: “Gravity grossed $274 Mn in North America”

Feature: Subject Word (First Noun)
Significance: Captures core thing being discussed in the sentence
Sample features: subj_actor
Example Trivia: “The actors snorted crushed B vitamins for scenes involving cocaine”

Feature: Readability
Significance: Complex and lengthy trivia are hardly interesting
Sample features: FOG Index binned in 3 bins
Example Trivia: ---
12. Feature Engineering (Contd…)
Columns: Bucket, Feature, Significance, Sample features, Example Trivia

Bucket: Entity (E) Features
Feature: Generic NEs
Significance: Captures general about-ness
Sample features: MONEY, ORGANIZATION, PERSON, DATE, TIME and LOCATION
Example Trivia: “The guns in the film were supplied by Aldo Uberti Inc., a company in Italy.”
• ORGANIZATION and LOCATION

Feature: Related Entities
Significance: Captures specific about-ness (entities resolved using DBpedia)
Sample features: entity_producer, entity_director
Example Trivia: “According to Victoria Alonso, Rocket Raccoon and Groot were created through a mix of motion-capture and rotomation VFX.”
• entity_producer, entity_character

Feature: Entity Linking before (L) Parsing
Significance: Captures generalized story of sentence
Sample features: subj_entity_producer
Example Trivia: [The same trivia above]
• “According to entity_producer, …”
• subj_Victoria becomes subj_entity_producer

Feature: Focus Entities
Significance: Captures core entities being talked about
Sample features: underroot_entity_producer
Example Trivia: [The same trivia above]
• underroot_entity_producer, underroot_entity_character
13. Domain Independence of Features
All the features are automatically generated and domain-independent
Entity Features are automatically generated using attribute:value pairs in DBpedia
For a match of ‘value’ in a sentence, the match is replaced by entity_‘attribute’
Unigram (U) and Linguistic (L) features are clearly domain independent
[Figures: DBpedia (attribute: value) pairs for Batman Begins; sample trivia for Batman Begins]
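The value-to-attribute replacement described on this slide can be sketched in a few lines. This is a rough illustration under stated assumptions: `link_entities` and the dictionary layout are hypothetical, matching is plain case-sensitive string matching, and the attribute:value pairs below are taken from the example trivia on the previous slide rather than fetched from DBpedia.

```python
import re


def link_entities(sentence, attribute_values):
    """Replace every occurrence of a known value with entity_<attribute>.

    attribute_values: dict mapping an attribute name to a list of values
    (hypothetical layout; the slides only say attribute:value pairs come from DBpedia).
    """
    pairs = [(attr, val) for attr, vals in attribute_values.items() for val in vals]
    # Replace longer values first so a short value cannot clobber part of a longer one.
    pairs.sort(key=lambda p: len(p[1]), reverse=True)
    for attr, val in pairs:
        # Lookarounds keep the match from starting or ending inside a longer word.
        pattern = r"(?<!\w)" + re.escape(val) + r"(?!\w)"
        sentence = re.sub(pattern, "entity_" + attr, sentence)
    return sentence


# Illustrative pairs only, chosen to mirror the slide's example.
pairs_for_movie = {
    "producer": ["Victoria Alonso"],
    "character": ["Rocket Raccoon", "Groot"],
}
linked = link_entities(
    "According to Victoria Alonso, Rocket Raccoon and Groot were created "
    "through a mix of motion-capture and rotomation VFX.",
    pairs_for_movie,
)
```

Running the replacement before parsing is what lets a subject feature come out as subj_entity_producer rather than a feature tied to one person's name.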
14. Interestingness Ranking Model
Given facts (sentences) along with their interestingness grade, learn a model of interestingness which will rank sentences based on their interestingness
Use Rank SVM model
[Figure: Rank SVM pipeline with four panels: input for training, model built (hyperplane), input for ranking, output of ranking. Image taken and modified from Wikipedia]

Input for training:
MOVIE_ID  FEATURES    GRADE
1         1:1 5:2 …   4
1         …           2
1         …           1
2         …           4
2         …           3
2         …           1
2         …           1

Input for ranking, with the output of ranking:
MOVIE_ID  FEATURES    SCORE
1         1:1 5:2 …   1.7
1         …           2.4
2         …           1.2
2         …           2.7
2         …           0.13
3         …           3.1
3         …           1.3
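The training table above maps naturally onto the text input format used by SVMrank, where each line is `<target> qid:<id> <feature>:<value> ...` with feature indices in increasing order. The helper below is a small sketch of that serialization; the function name is hypothetical, and using the movie ID as the `qid` (so that trivia are only compared within the same movie) is an assumption based on the MOVIE_ID column.

```python
def to_svmrank_line(grade, movie_id, features):
    """Serialize one graded trivia as an SVMrank-style training line.

    features: dict mapping feature index (int) to value; indices are
    written in increasing order, as the format requires.
    """
    feats = " ".join(f"{idx}:{val}" for idx, val in sorted(features.items()))
    return f"{grade} qid:{movie_id} {feats}"


# First row of the slide's training table: movie 1, features 1:1 5:2, grade 4.
example_line = to_svmrank_line(4, 1, {5: 2, 1: 1})
```

For the ranking input, the same line layout can be used with a placeholder target, since the model only needs the features and the grouping.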
15. Interestingness Model: Cross Validation Results
[Figure: NDCG@10 (y-axis) by feature group (x-axis)]
Unigram (U): 0.934
Linguistic (L): 0.919
Entity Features (E): 0.929
U + L: 0.9419
U + E: 0.944
WTM (U + L + E): 0.951
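For readers unfamiliar with the metric, NDCG@10 can be computed as below. The slides do not say which gain and discount variant was used, so the common choice of gain 2^grade - 1 with a log2 discount is an assumption here, and the function names are illustrative.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k grades (assumed gain: 2^grade - 1)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades_in_ranked_order, k=10):
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # no graded-interesting item at all
    return dcg_at_k(grades_in_ranked_order, k) / ideal
```

The input is the list of true grades (0 to 4) for one movie's trivia, ordered by the model's score; a per-movie value would then typically be averaged across movies.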
16. Interestingness Model: Feature Weights
Rank Feature Group
1 subj_scene Linguistic
2 subj_entity_cast Linguistic + Entity
3 entity_produced_by Entity
4 underroot_unlinked_organization Linguistic + Entity
6 root_improvise Linguistic
7 entity_character Entity
8 MONEY Entity (NER)
14 stunt Unigram
16 superPOS Linguistic
17 subj_actor Linguistic
Note: Entity Linking leads to better generalization; otherwise these features would have been subj_wolverine etc.
17. Retrieval Phase: Get Trivia from Wikipedia Page
[Figure: Wikipedia Trivia Miner (WTM) architecture diagram, shown again for the Retrieval Phase]
18. Candidate Selection
Sentence Extraction
Crawled only the text inside paragraph tags <p>…</p>
Sentence detection split the text into sentences for further processing
Removed sentences with missing context
E.g. “It really reminds me of my childhood.”
Co-reference resolution finds links from a sentence out to a different sentence
Remove the sentence if an out-link does not point to the target entity
“Hanks revealed that he signed onto the film after an hour and a half of reading the script. He initially ...”
The first ‘he’ is not an out-link and ‘the film’ points to the target entity. The second ‘He’ is an out-link. The first sentence is kept and the second is removed
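The first two steps of sentence extraction can be sketched with the Python standard library alone. This is a simplified illustration: the class and function names are hypothetical, the regex-based sentence splitter is a naive stand-in for a proper sentence detector, and the co-reference filtering step is not covered because it needs an external co-reference resolver.

```python
import re
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p>...</p> elements, one string per paragraph."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text from inline tags nested in a paragraph (links, bold) is kept as well.
        if self.in_paragraph:
            self.paragraphs[-1] += data


def candidate_sentences(html):
    """Return naive sentence candidates from the paragraph text of a page."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    sentences = []
    for paragraph in parser.paragraphs:
        # Naive split after ., ! or ? followed by whitespace.
        for piece in re.split(r"(?<=[.!?])\s+", paragraph):
            if piece.strip():
                sentences.append(piece.strip())
    return sentences
```

Text in tables, infoboxes and navigation elements is skipped simply because it does not sit inside a paragraph tag.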
19. Evaluation Dataset
20 New Movie Pages from Wikipedia
No. of Sentences: 2928
No. of Positive Sentences: 791
Judged (crowd-sourced) by 5 judges
Two-point voting scale
Boring / Interesting
Majority voting for class rating
Statistically significant?
100 trivia from IMDB were also judged by 5 judges
Mechanism I: majority voting of the IMDB crowd vs. Mechanism II: crowd-sourced by 5 judges
Agreement between the two mechanisms = Substantial (Kappa value = 0.618)
Kappa Agreement
< 0 Less than chance agreement
0.01-0.20 Slight agreement
0.21-0.40 Fair agreement
0.41-0.60 Moderate agreement
0.61-0.80 Substantial agreement
0.81-0.99 Almost perfect agreement
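As a reference for how such an agreement figure is obtained, the sketch below computes Cohen's kappa for two label sequences over the same items. The slides only say "Kappa", so treating it as Cohen's kappa between the two mechanisms is an assumption, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both mechanisms pick the same class independently.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1:
        return 1.0  # both sequences use a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Here `labels_a` would hold the class from IMDB majority voting and `labels_b` the majority class from the 5 judges, for the same 100 trivia.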
20. Comparative Baselines
I. Random [Baseline I]:
- 10 sentences picked randomly from Wikipedia
II. CS + Random
- Candidates Selected
- Remove sentences like “it really reminds me of my childhood”
III. CS + supPOS(Best) [Baseline II]:
- Candidates Selected
- Ranked by No. of Superlative Words
supPOS (Best Case) example ranking:
Rank  # of sup. words  Class
1     2                Interesting
2     2                Boring
3     1                Interesting
4     1                Interesting
5     1                Interesting
6     1                Boring
7     1                Boring
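The example ranking suggests that "Best Case" means ties in the superlative count are broken in favor of interesting sentences; that reading is an assumption, as is the tuple layout used below. Under it, the baseline ordering is a single sort:

```python
def suppos_best_case(candidates):
    """Order candidates by superlative count, breaking ties in favor of interesting ones.

    candidates: list of (num_superlative_words, is_interesting) tuples (hypothetical layout).
    """
    # False sorts before True, so `not is_interesting` puts interesting sentences first.
    return sorted(candidates, key=lambda c: (-c[0], not c[1]))


# The seven rows of the slide's table, given in shuffled order.
shuffled = [(1, False), (2, False), (1, True), (2, True), (1, True), (1, False), (1, True)]
ranked = suppos_best_case(shuffled)
```

Counting the superlative words themselves would need a part-of-speech tagger, which is left out of this sketch.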
23. Results: Precision@10
CS+Random > Random: shows the significance of Candidate Selection
WTM (U+L+E) >> WTM (U): shows the significance of the engineered Linguistic (L) and Entity (E) features
[Figure: P@10 (y-axis) by approach (x-axis)]
Random: 0.25
CS+Random: 0.3
supPOS (Best Case): 0.34
WTM (U): 0.34
WTM (U+L+E): 0.45
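Precision@10 here is the fraction of the top 10 returned sentences that the judges rated interesting. A minimal sketch follows, assuming one boolean label per ranked sentence; how the per-movie values were aggregated is not stated on the slide.

```python
def precision_at_k(ranked_labels, k=10):
    """Fraction of the top-k ranked sentences that are labeled interesting.

    ranked_labels: booleans in ranked order, True meaning interesting.
    """
    return sum(ranked_labels[:k]) / k
```

For example, a ranking whose top 10 contains 3 interesting sentences scores 0.3.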
24. Results: Recall@K
supPOS is limited to one kind of trivia
WTM captures varied types
62% recall till rank 25
Performance Comparison
supPOS is better till rank 3
Soon after rank 3, WTM beats supPOS
[Figure: % Recall (y-axis) against Rank (x-axis, 0 to 25) for supPOS (Best Case), WTM and Random]
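A recall curve like the one in this figure can be produced by evaluating recall at each rank cutoff. The sketch below assumes boolean labels in ranked order and a known count of interesting sentences for the page; both function names are illustrative, and the slides do not state how recall was averaged over the 20 movies.

```python
def recall_at_k(ranked_labels, total_interesting, k):
    """Share of all interesting sentences that appear in the top k of the ranking."""
    if total_interesting == 0:
        return 0.0
    return sum(ranked_labels[:k]) / total_interesting


def recall_curve(ranked_labels, total_interesting, max_rank=25):
    """Recall at every cutoff from 1 to max_rank, e.g. for plotting against rank."""
    return [recall_at_k(ranked_labels, total_interesting, k) for k in range(1, max_rank + 1)]
```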
25. Qualitative Analysis
Columns: Result, Movie, Trivia, Description

Result: WTM Wins (Sup. POS Misses)
Movie: Interstellar (2014)
Trivia: Paramount is providing a virtual reality walkthrough of the Endurance spacecraft using Oculus Rift technology.
Description: Due to Organization being subject, and (U) features (technology, reality, virtual)

Movie: Gravity (2013)
Trivia: When the script was finalized, Cuarón assumed it would take about a year to complete the film, but it took four and a half years.
Description: Due to Entity.Director, Subject (the script), Root word (assume) and (U) features (film, years)

Result: WTM’s Bad
Movie: Elf (2003)
Trivia: Stop motion animation was also used.
Description: Candidate Selection failed

Movie: Rio 2 (2014)
Trivia: Rio 2 received mixed reviews from critics.
Description: Root verb "receive" has high weightage in model
26. Qualitative Analysis (Contd…)
Columns: Result, Movie, Trivia, Description

Result: Sup. POS Wins (WTM misses)
Movie: The Incredibles (2004)
Trivia: Humans are widely considered to be the most difficult thing to execute in animation.
Description: Presence of ‘most’, absence of any Entity, vague Root word (consider)

Result: Sup. POS's Bad
Movie: Lone Survivor (2013)
Trivia: Most critics praised Berg's direction, as well as the acting, story, visuals and battle sequences.
Description: Here 'most' is not to show degree but instead to show generality.
27. Our Contributions
Introduced a novel research problem
Mining Interesting Facts for Entities from Unstructured Text
Proposed a novel approach “Wikipedia Trivia Miner (WTM)”
For mining top-k interesting trivia for movie entities based on their
interestingness
For movie entities, we leverage already available user-generated trivia data from
IMDB for learning interestingness
All the Data and Code used in this paper have been made publicly available for research purposes at
https://github.com/abhayprakash/WikipediaTriviaMiner_SharedResources/
28. Acknowledgements
The first author's travel was supported by travel grants from Xerox Research Centre India, IIT Roorkee, IJCAI and Microsoft Research India.