Joint work with Amadeus on a recommender system for your next destination that uses knowledge graphs and a deep learning network, presented at the LocWeb 2019 Workshop colocated with TheWebConf 2019 (San Francisco, USA)
Supervised Sentiment Classification using DTDP algorithm
IJSRD
Sentiment analysis is widely used across many fields and relies on statistical machine learning for text modeling. The most common approach is Bag-of-Words (BOW). However, this technique has limitations in handling the polarity shift problem. We therefore propose a new method called Dual Sentiment Analysis (DSA), which addresses the polarity shift problem. The proposed method involves two steps, dual training and dual prediction (DTDP). First, we propose a data expansion technique that creates a reversed review for each training review. Second, a dual training and dual prediction algorithm is developed for analyzing sentiment data. The dual training algorithm is used to learn a sentiment classifier, and the dual prediction algorithm classifies a review by considering both sides of that review.
A Novel Approach for Travel Package Recommendation Using Probabilistic Matrix...
IJSRD
Recent years have witnessed increased interest in recommendation systems. Classification techniques are supervised methods that assign data items to predefined classes. An existing system unsupervised constraints are automatically derived from two hidden Tourist area season topic (TAST) for tourist in travel group. It used to an alternating TRAST model are unique characteristic for the travel data and cocktail.
A Big Data Telco Solution by Dr. Laura Wynter
wkwsci-research
Presented during the WKWSCI Symposium 2014
21 March 2014
Marina Bay Sands Expo and Convention Centre
Organized by the Wee Kim Wee School of Communication and Information at Nanyang Technological University
Beyond Collaborative Filtering: Learning to Rank Research Articles
Maya Hristakeva
At Elsevier we work on recommender systems to help researchers connect to their research and to collaborators (e.g. Mendeley Suggest, Science Direct, Funding Opportunities and Evise Reviewer recommenders). This talk focused on the recent improvements the team has made to the Science Direct research articles recommender by deploying ranking models in production.
I gave this presentation at the 7th RecSys London Meetup - https://www.meetup.com/RecSys-London/events/255362180/
Connecting Scenario Approaches with Scenario Tools
RPO America
During the 2017 National Regional Transportation Conference, Ian Varley provided case study examples from City Explained, LLC's work using scenario analysis.
Studying information behavior: The Many Faces of Digital Visitors and Residents
Lynn Connaway
Connaway, L. S. (2018). Studying information behavior: The Many Faces of Digital Visitors and Residents. Presented at Bar-Ilan University, March 11, 2018, Ramat Gan, Israel.
Studying information behavior: The Many Faces of Digital Visitors and Residents
OCLC
Connaway, L. S. (2018). Studying information behavior: The Many Faces of Digital Visitors and Residents. Presented at Bar-Ilan University, March 11, 2018, Ramat Gan, Israel.
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is getting more and more widely used for customer support and employee support use-cases. In this session, I'm going to talk about how it can be extended for data analysis and data science use-cases ... i.e., how users can interact with a bot to ask analytical questions on data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like "How many cases of Covid were there in the last 2 months by state and gender" or "Why did the number of deaths from Covid increase in May 2022", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we can bring together various features including natural-language understanding, NL-to-SQL translation, dialog management, data story-telling, semantic modeling of data and augmented analytics to facilitate collaborative exploration of data using conversational AI.
Diversifying Contextual Suggestions from Location-based Social Networks
M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis
A talk at the IIiX 2014 conference in Regensburg
RDF and graph databases are steadily increasing their adoption and are no longer choices of niche-only communities. For almost 20 years, a constraint language for RDF was a big missing piece in the technology stack and a prohibiting factor for further adoption.
Even though most RDF-based systems were performing data validation and quality assessment, there was no standardized way to define constraints. People were using ad-hoc solutions or schemas and languages that were not meant for validation.
Thankfully, since 2017 there are 2 additions to the RDF technology stack: SHACL & ShEx. Both provide a high level RDF constraint language that people can use to define data constraints (a.k.a. Shapes), each with different strengths.
This talk provides an outline of different types of RDF data quality issues and existing approaches to quality assessment. The goal is to give an overview of the existing RDF validation landscape and hopefully, inspire people on how to improve their RDF publishing workflows.
Smart Cities that don't go "bump" in the night: delivering interoperable smar...
Rick Robinson
I gave this presentation at the launch of the British Standards Institute's development of standards for interoperability between Smart Cities systems. It draws on my experience delivering large-scale, standards-based technology architectures. Whilst Open Standards will be absolutely crucial to the delivery and operation of interoperable, open Smart Cities systems, they are not a panacea, and it's vital that we're aware of their limitations as well as their value.
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
Phuc Nguyen
Semantic labeling for numerical values is a task of assigning semantic labels to unknown numerical attributes. The semantic labels could be numerical properties in ontologies, instances in knowledge bases, or labeled data that are manually annotated by domain experts. In this paper, we refer to semantic labeling as a retrieval setting where the label of an unknown attribute is assigned by the label of the most relevant attribute in labeled data. One of the greatest challenges is that an unknown attribute rarely has the same set of values with the similar one in the labeled data. To overcome the issue, statistical interpretation of value distribution is taken into account. However, the existing studies assume a specific form of distribution. It is not appropriate in particular to apply open data where there is no knowledge of data in advance. To address these problems, we propose a neural numerical embedding model (EmbNum) to learn useful representation vectors for numerical attributes without prior assumptions on the distribution of data. Then, the "semantic similarities" between the attributes are measured on these representation vectors by the Euclidean distance. Our empirical experiments on City Data and Open Data show that EmbNum significantly outperforms state-of-the-art methods for the task of numerical attribute semantic labeling regarding effectiveness and efficiency.
NERD: an open source platform for extracting and disambiguating named entitie...
Raphael Troncy
"NERD: an open source platform for extracting and disambiguating named entities in very diverse documents" - Keynote Talk given at the NLP&DBpedia International Workshop (NLP&DBpedia), 22 October 2013
Deep-linking into Media Assets at the Fragment Level SMAM 2013
Raphael Troncy
"Deep-linking into Media Assets at the Fragment Level: Specification, Model and Applications" - Keynote Talk given at the International Workshop on Semantic Music and Media (SMAM), 21 October 2013
Semantics at the multimedia fragment level SSSW 2013
Raphael Troncy
"Semantics at the multimedia fragment level or how enabling the remixing of online media" - Invited Talk given at the Semantic Web Summer School (SSSW), 12 July 2013
MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd
Raphael Troncy
"MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd", talk given at the 2nd Real Time Analysis and Mining of Social Streams Workshop (RAMSS) colocated with WWW 2013, Rio de Janeiro, Brazil
EventMedia Live: Exploring Events Connections in Real-Time to Enhance Content
Raphael Troncy
"EventMedia Live: Exploring Events Connections in Real-Time to Enhance Content" presented at the Semantic Web Challenge, Open Track, of the 11th International Semantic Web Conference, Boston, USA, November 2012
ShareIt: Mining SocialMedia Activities for Detecting Events
Raphael Troncy
ShareIt: Mining #SocialMedia Activities for Detecting #Events, Talk given at the 2nd Summer School on Social Media Retrieval (S3MR), June 2011, Antalya, Turkey
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology can be managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I have already got working for real.
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Location Embeddings for Next Trip Recommendation
1. Location Embeddings for Next Trip Recommendation
Amine Dadoun, Raphael Troncy, Riccardo Petitti, Olivier Ratier
LocWeb19, 13 May 2019
2. LocWeb 2019 … Why?
Location, User
Web, Social Media Recommendation, Travel
3. Travel … A great source of inspiration
John Doe
“I do not know where to go”
“Try this”
4. Use Case Description
Given a traveler, his demographics, his historical bookings and the
contextual data related to these bookings, we recommend him a
ranked list of destinations he would like to go to.
Traveler's Demographic Data
43 years old, Malaysian, Male, Nature, Museums
Time
Contextual Data
14/09/2016, Wednesday, 2 Days, Alone, etc.
21/12/2016, Friday, 14 Days, 4 persons in party, etc.
07/06/2017, Saturday, 10 Days, 2 persons in party, etc.
15/01/2017, Sunday, 5 Days, Alone, etc.
09/09/2018, Sunday, 4 Days, Alone, etc.
?
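As a rough illustration of the inputs described in this use case, the sketch below turns one traveler profile and one booking context into a sparse feature dictionary that a recommender could consume. The class names, field names, the `destination` field, the example destination code and the age scaling are all assumptions made for illustration; the slides do not specify how the data is encoded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Traveler:
    # Demographic data as listed on the slide.
    age: int
    nationality: str
    gender: str
    interests: List[str]


@dataclass
class Booking:
    # Contextual data attached to one historical booking.
    date: str            # e.g. "14/09/2016"
    weekday: str
    stay_days: int
    party_size: int      # 1 means travelling alone
    destination: str     # hypothetical field: destinations are not shown on the slide


def encode(traveler: Traveler, booking: Booking) -> Dict[str, float]:
    """Build a sparse feature dictionary: one-hot keys for categorical
    values, plain numbers for numeric values (assumed encoding)."""
    features = {
        f"nationality={traveler.nationality}": 1.0,
        f"gender={traveler.gender}": 1.0,
        "age": traveler.age / 100.0,  # crude scaling, an assumption
        f"weekday={booking.weekday}": 1.0,
        "stay_days": float(booking.stay_days),
        "party_size": float(booking.party_size),
        f"destination={booking.destination}": 1.0,
    }
    for interest in traveler.interests:
        features[f"interest={interest}"] = 1.0
    return features


traveler = Traveler(43, "Malaysian", "Male", ["Nature", "Museums"])
booking = Booking("14/09/2016", "Wednesday", 2, 1, "XXX")  # "XXX" is a placeholder destination
features = encode(traveler, booking)
```

Each historical booking yields one such dictionary; the unknown next trip is the destination the model has to rank.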
5. Scientific Problems
Given historical purchases made by a user (or user-item past interactions), plus the
context where the interaction was made, how can we accurately predict what will
be the next item the user is going to interact with?
Research Questions
1. What item to recommend to the user?
2. Can we integrate external data to improve the accuracy of a predictive model?
3. How can we evaluate the recommendation made to this user?
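For the third question, one common way to evaluate next-item recommendation is to hold out each user's last interaction and check where it appears in the ranked list. The sketch below computes hit rate at k and mean reciprocal rank under that assumption. The slides do not state which metrics the authors used, and the function names and destination codes are illustrative only.

```python
from typing import Sequence


def hit_rate_at_k(ranked: Sequence[Sequence[str]], actual: Sequence[str], k: int) -> float:
    """Fraction of users whose held-out item appears in their top-k list."""
    hits = sum(1 for recs, target in zip(ranked, actual) if target in recs[:k])
    return hits / len(actual)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], actual: Sequence[str]) -> float:
    """Average of 1/rank of the held-out item (0 when it is not recommended)."""
    total = 0.0
    for recs, target in zip(ranked, actual):
        if target in recs:
            total += 1.0 / (list(recs).index(target) + 1)
    return total / len(actual)


# Two users: ranked recommendations and the destination each one actually booked.
ranked_lists = [["NCE", "PAR", "ROM"], ["LON", "BCN", "NYC"]]
next_trips = ["PAR", "SFO"]

hr1 = hit_rate_at_k(ranked_lists, next_trips, k=1)
hr2 = hit_rate_at_k(ranked_lists, next_trips, k=2)
mrr = mean_reciprocal_rank(ranked_lists, next_trips)
```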
6. DKFM (our approach):
It combines Factorization
Machines in order to
represent contextual
information and the WDL
Recommender System in
order to have the user-
item interactions and the
content information. The
combination of these two
models are represented in
a DNN
6
State of the Art: Recommender Systems
• Collaborative Filtering [1, 2, 3]: Implicit MF, Bayesian Personalized MF, Neural Collaborative Filtering
• Content-based Filtering [4]: Item KNN
• Hybrid Method [5]: Wide & Deep Learning
• Context-aware Recommender System [6, 7]: Factorization Machines, Neural Factorization Machines
• Knowledge-aware Recommender System [8]: Deep Knowledge Factorization Machines
Collaborative Filtering:
These are Matrix Factorization methods based only on the user-item interaction. They vary either in the loss used in training or in the interaction function that computes the recommendation probability.
Content-based Filtering:
Item KNN is a neighborhood-based collaborative filtering method; it computes the k nearest neighbors for each item.
Hybrid Method:
WDL is a DNN model that computes the probability of a user-item pair based on both the user-item interaction and the content of the item.
Context-aware Recommender System:
These two methods are based on the factorization machines algorithm, which takes into account the context of the recommendation in addition to the user-item interaction.
Legend: Our Model; SOTA & baselines
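To make the factorization machines idea above concrete, here is a minimal sketch of a second-order Factorization Machine score as defined by Rendle (2010): a bias, a linear term, and pairwise feature interactions weighted by dot products of latent vectors. It is not the DKFM implementation; the function name `fm_predict` and the toy weights are assumptions chosen for illustration.

```python
def fm_predict(x, w0, w, v):
    """Second-order Factorization Machine score for one feature vector.

    x  : list of n feature values (user, item and context features)
    w0 : global bias
    w  : list of n linear weights
    v  : list of n latent vectors, each of length k
    """
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term computed with the O(k * n) identity:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Toy example (hypothetical values): 4 features, latent dimension 2.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
score = fm_predict(x, w0, w, v)
```

In a context-aware setting, `x` would typically concatenate encoded user, item and context features, so that every pair of features gets an interaction weight without storing a full pairwise matrix.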
7. Data integration to enrich the representation of destination
• User-Item Interactions: user u1 and items i1, i2, i3, ...
• User’s Demographics: Age, Nationality, Gender, etc.
• Interaction Information: Date, Session behavior, etc.
• Content Information: item description (Text, Knowledge Graph, etc.)
8. Contribution: Deep Knowledge Factorization Machines (DKFM)
Deep Neural Network:
• Collaborative information
• Content information
• Contextual information
Inputs:
• User-Item Interactions: user u1 and items i1, i2, i3, ...
• Content Information: item description (Text, Knowledge Graph, etc.)
• User’s Demographics: Age, Nationality, Gender, etc.
• Interaction Information: Date, Session Behavior, etc.
9. Back to our problem … Next Trip Destination
Traveler's Demographic Data
43 years old, Malaysian, Male, Nature, Museums
Historical Bookings with contextual information
14/09/2016, Wednesday, 2 Days, Alone
21/12/2016, Friday, 14 Days, 4 persons in party
07/06/2017, Saturday, 10 Days, 2 persons in party
09/09/2018, Sunday, 4 Days, Alone
Next Trip Recommendation
?
10. Traveller's Profiles Data
• Real Traveler’s Data
• Number of Profiles: ~20M
• Number of Trips: ~15M
• Trip Type: One-way, Round-Trip, Multiple Journeys Trip
• Time range: February 2013 - October 2019
• Number of Destinations: 1146
Trip
• Booking Creation Date
• Stay Duration
• Origin Airport
• Origin City
• Origin Country
• Origin Region
• Destination Airport
• Destination City
• Destination Country
• Destination Region
• Departure Date
• Departure Day of the Week
• Arrival Date
• Advanced Purchase
• Advanced Check-in
• Trip Number in Party
Customer (Traveller)
• Age
• Customer Value
• Days to Next Bday
• Days to Next Flight
• Nationality
• Gender
• Last Booking Date
• Last Flown Date
Services
• Type of Services
• Service Code
11. Data Pre-processing Pipeline
Input:
• Trips
• Traveler demographics
Steps:
• Remove Travelers with fewer than 5 Trips
• Remove Travelers with fewer than 5 different Trips
• Remove Destinations visited fewer than 20 times
• Travelers Segmentation: Business / Leisure
Trips remaining after the successive steps: only 32%, then only 4%, then only 2% of the trips left
Resulting dataset:
• Number of Travelers: 26K/20M (0.13%)
• Number of Trips: 300K/15M (2.1%)
• Number of Destinations: 119/1146 (10%)
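The two filtering rules of this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_trips`, the `(traveler, destination)` input format, the order in which the two filters are applied, and the single (non-iterated) pass are all hypothetical choices.

```python
from collections import Counter


def filter_trips(trips, min_destinations=5, min_visits=20):
    """Filter a list of (traveler_id, destination) pairs.

    Step 1: keep travelers with at least `min_destinations` distinct
            destinations.
    Step 2: among the remaining trips, keep destinations visited at
            least `min_visits` times.
    Assumption: one pass only; step 2 may leave some travelers with
    fewer destinations than the threshold.
    """
    destinations_per_traveler = {}
    for traveler, destination in trips:
        destinations_per_traveler.setdefault(traveler, set()).add(destination)
    kept_travelers = {
        traveler
        for traveler, dests in destinations_per_traveler.items()
        if len(dests) >= min_destinations
    }
    remaining = [(t, d) for t, d in trips if t in kept_travelers]
    visits = Counter(d for _, d in remaining)
    return [(t, d) for t, d in remaining if visits[d] >= min_visits]


# Toy data with small thresholds so the effect is visible.
toy_trips = [
    ("a", "X"), ("a", "Y"), ("a", "Z"),
    ("b", "X"), ("b", "X"),
    ("c", "X"), ("c", "Y"), ("c", "W"),
]
toy_filtered = filter_trips(toy_trips, min_destinations=3, min_visits=2)
```

With the slide's thresholds (5 distinct destinations, 20 visits) the defaults apply; the toy call lowers them only to keep the example small.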
12. Data Pre-processing: Data Filtering for Recommendation
• Remove Travelers with fewer than 5 Trips (Different Destinations)
• Remove Destinations that are visited fewer than 20 Times
• Number of Trips: ~4.8 M bookings
• Number of Travelers: 814 919
• Number of Destinations: 763
• Interaction Matrix: $R \in \mathbb{N}^{\#\text{Travelers} \times \#\text{Destinations}}$, with $r_{ui} = \#\text{Travels to Destination } i \text{ from Traveler } u$
R =
              Kuala Lumpur   Sydney   London   New York   Paris
Traveler 1    8              2        1        0          0
Traveler 2    4              0        1        0          1
Traveler 3    2              2        2        1          0
Traveler 4    4              0        0        0          2
Traveler 5    1              0        2        0          3
• Sparsity is defined as follows: $\rho(R) = 1 - \dfrac{\#\text{Interactions}}{\#\text{Users} \times \#\text{Items}}$
#Feedbacks   #Interactions   #Cities   #Travelers   Sparsity
610 515      361 412         135       31 205       92%
• $\rho(\text{Leisure\_Trips})$ = 99.8%: Too sparse to build a Recommender System
• More than 65% of travelers have traveled only 2 times
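The sparsity definition on this slide can be checked on the five-traveler example matrix. The sketch below assumes that an "interaction" is a non-zero entry of R (a traveler-destination pair with at least one trip); the function names are hypothetical.

```python
def sparsity(matrix):
    """Sparsity of an interaction matrix: 1 - #Interactions / (#Users * #Items).

    An interaction is counted for every non-zero entry (assumption).
    """
    n_users = len(matrix)
    n_items = len(matrix[0])
    interactions = sum(1 for row in matrix for r in row if r > 0)
    return 1.0 - interactions / (n_users * n_items)


def sparsity_from_counts(n_interactions, n_users, n_items):
    """Same formula when only the aggregate counts are known."""
    return 1.0 - n_interactions / (n_users * n_items)


# Example matrix from the slide (rows: Traveler 1..5; columns:
# Kuala Lumpur, Sydney, London, New York, Paris).
R = [
    [8, 2, 1, 0, 0],
    [4, 0, 1, 0, 1],
    [2, 2, 2, 1, 0],
    [4, 0, 0, 0, 2],
    [1, 0, 2, 0, 3],
]
example_sparsity = sparsity(R)  # 15 non-zero entries out of 25

# Counts reported in the slide's table.
table_sparsity = sparsity_from_counts(361412, 31205, 135)
```

On the reported counts the formula gives a value of about 0.91, close to the 92% shown in the table.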
13. Data Pre-processing: Customer Segmentation
Pipeline: Historical Trips already labeled B/L (Business/Leisure) → Training → B/L Classifier → Prediction on CEM Trips (Business or Leisure)
Trips Data: 122 242 trips
Features used:
• Number of Passengers, Stay Duration, Saturday Stay, Purchase Anticipation, Age, Gender
Time Range:
• Feb 2014 - Feb 2017
Distribution:
• 40-60 % B/L
Training:
• Random Forest Classifier
• Grid Search on Training Data
• 5-Fold Cross Validation for evaluation, with a 75-25% Training & Test Set split
• Accuracy = 0.87, Precision = 0.87, Recall = 0.91
Features Importance (figure)
#Feedbacks   #Interactions   #Cities   #Travelers   Sparsity
304 019      152 547         119       26 019       95%
14. Data Enrichment using Word Embeddings
Cities: Phuket, Adelaide, London, etc.
Inputs: Wikipedia Cities Content; Pre-trained Word Vectors [8]
1. Compute the TF-IDF of each word (the, a, etc.)
2. London Textual Embedding: weighted sum of word vectors, where the weight of each word vector corresponds to the term frequency-inverse document frequency (TF-IDF) of the word
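The textual embedding described on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the TF-IDF variant (relative term frequency times unsmoothed log inverse document frequency), the function names, the toy documents and the 2-dimensional toy word vectors are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: dict mapping a city name to its list of tokens.

    Returns a dict mapping each city to {word: tf-idf weight}.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        weights[name] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
    return weights


def document_embedding(word_weights, word_vectors, dim):
    """Weighted sum of word vectors; out-of-vocabulary words are skipped."""
    embedding = [0.0] * dim
    for word, weight in word_weights.items():
        vector = word_vectors.get(word)
        if vector is None:
            continue
        for j in range(dim):
            embedding[j] += weight * vector[j]
    return embedding


# Toy corpus and toy 2-d "pre-trained" vectors (hypothetical values).
docs = {
    "London": ["the", "thames", "the", "museum"],
    "Phuket": ["the", "beach", "island"],
}
word_vectors = {
    "the": [5.0, 5.0],
    "thames": [1.0, 0.0],
    "museum": [0.0, 1.0],
    "beach": [1.0, 1.0],
    "island": [0.0, 2.0],
}
weights = tfidf_weights(docs)
london_embedding = document_embedding(weights["London"], word_vectors, dim=2)
```

Because "the" occurs in every document, its TF-IDF weight is zero and it does not contribute to the city embedding, which is the purpose of weighting by TF-IDF rather than averaging all word vectors.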
15. Data Enrichment using Knowledge Graph Embeddings
Knowledge Graph Embeddings (KGE)
Cities: Phuket, Adelaide, London, etc.
KGE Cities: KGE_Phuket, KGE_Adelaide, KGE_London, etc. (e.g. the Knowledge Graph Embedding of Phuket)
TransE Model [9]:
Given a triple (h, r, t) in the graph, the idea is to minimize the distance between h + r and t, so that the head embedding translated by the relation embedding lands close to the tail embedding
Semantic Trails Knowledge Graph:
The knowledge graph represents the user-venue interaction through the property ’visiting’, as well as the relations between the venue and the other entities, namely: category, schema and city
https://arxiv.org/abs/1812.04367
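As a minimal sketch of the TransE idea cited on this slide: for a triple (h, r, t), TransE scores the triple by the distance between h + r and t, and is usually trained with a margin-based loss that pushes true triples below corrupted ones. The vectors, entity names and relation name below are hypothetical toy values, not embeddings from the Semantic Trails graph.

```python
import math


def transe_distance(h, r, t):
    """L2 distance ||h + r - t|| for one triple of embedding vectors."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positive, negative, margin=1.0):
    """Hinge loss max(0, margin + d(positive) - d(negative)).

    positive / negative: (h, r, t) tuples of vectors, where the negative
    triple is a corrupted version of the positive one.
    """
    return max(
        0.0,
        margin + transe_distance(*positive) - transe_distance(*negative),
    )


# Toy 2-d embeddings (hypothetical).
venue = [0.0, 0.0]
located_in_city = [1.0, 0.0]   # relation vector
london = [1.0, 0.0]            # true tail: venue + relation == london
phuket = [4.0, 4.0]            # corrupted tail

true_triple = (venue, located_in_city, london)
corrupted_triple = (venue, located_in_city, phuket)

loss_well_separated = margin_loss(true_triple, corrupted_triple)
loss_if_swapped = margin_loss(corrupted_triple, true_triple)
```

After training, the learned entity vectors (here the city vectors) are what would serve as KGE inputs to a downstream model.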
16. Deep Knowledge Factorization Machines
Deep Neural Network:
• Collaborative information
• Content information
• Contextual information
Inputs: Travelers' Profiles & Trips; External Data
Semantic Trails Knowledge Graph
• What characterizes a city the most?
• An embedding of each city is constructed based on the TransE model
• TransE Model: given a triple (h, r, t) in the graph, the idea is to minimize the distance between h + r and t
Wikipedia
• Representation of cities based on their textual description in Wikipedia
• Each Wikipedia document is encoded as a weighted sum of word vectors
• We used pre-trained word vectors from fastText (n-gram model)
• The n-gram model is similar to the Skip-gram model, but instead of learning a single vector representation for each word, it learns a representation for each character n-gram; a word vector is the sum of its n-gram vectors
• Weights of the word vectors are their TF-IDF scores
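The character n-gram model behind the fastText vectors mentioned on this slide can be illustrated with a short sketch. It only shows how a word is decomposed into character n-grams and how a word vector is obtained by summing n-gram vectors; the function names are hypothetical, the n-gram range 3 to 6 follows fastText's documented defaults, and hashing of n-grams into buckets is omitted.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word wrapped in boundary markers '<' and '>'."""
    token = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def word_vector(word, ngram_vectors, dim, min_n=3, max_n=6):
    """Sum of the vectors of the word's known n-grams.

    ngram_vectors: dict mapping an n-gram string to a list of floats.
    Unknown n-grams are ignored in this sketch.
    """
    vector = [0.0] * dim
    for gram in char_ngrams(word, min_n, max_n):
        gram_vector = ngram_vectors.get(gram)
        if gram_vector is None:
            continue
        for j in range(dim):
            vector[j] += gram_vector[j]
    return vector


trigrams = char_ngrams("where", min_n=3, max_n=3)
# Toy n-gram table (hypothetical values).
toy_table = {"<wh": [1.0, 0.0], "ere": [0.0, 2.0]}
toy_vector = word_vector("where", toy_table, dim=2, min_n=3, max_n=3)
```

One practical consequence of this design is that rare or unseen words (for instance uncommon place names in a Wikipedia article) still get a vector, as long as some of their n-grams were seen during training.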
17. Training Procedure and Evaluation Protocol
Leave-one-out protocol: for each user, we remove the last destination the user went to and use it as the test set
• Training Data and Test Data are split along Time
• The Recommender System is trained on the Training Data
• Prediction: for a non-existing Traveler-Destination pair, the trained Recommender System outputs a ranked list of Destinations (e.g. Adelaide, Osaka, Phuket, Brunei, …)
• Metrics: Hitrate@K [3], MRR@K [7]
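The leave-one-out protocol and the two ranking metrics named on this slide can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data; it assumes each traveler's history is already in chronological order and that the recommender returns one ranked list of destinations per traveler.

```python
def leave_one_out(histories):
    """Split each chronological history into training trips and one test trip.

    histories: dict mapping a traveler to an ordered list of destinations.
    Travelers with fewer than 2 trips are skipped.
    """
    train, test = {}, {}
    for traveler, destinations in histories.items():
        if len(destinations) < 2:
            continue
        train[traveler] = destinations[:-1]
        test[traveler] = destinations[-1]
    return train, test


def hit_rate_at_k(ranked, test, k):
    """Share of travelers whose held-out destination is in the top k."""
    hits = sum(1 for t, target in test.items() if target in ranked[t][:k])
    return hits / len(test)


def mrr_at_k(ranked, test, k):
    """Mean reciprocal rank of the held-out destination, 0 if outside top k."""
    total = 0.0
    for traveler, target in test.items():
        top_k = ranked[traveler][:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(test)


# Toy histories and toy ranked lists (hypothetical).
histories = {
    "u1": ["Osaka", "Phuket", "Adelaide"],
    "u2": ["Brunei", "Osaka"],
}
train, test = leave_one_out(histories)
ranked = {
    "u1": ["Adelaide", "Osaka", "Phuket", "Brunei"],
    "u2": ["Phuket", "Adelaide", "Osaka", "Brunei"],
}
hr_at_2 = hit_rate_at_k(ranked, test, k=2)
mrr_at_3 = mrr_at_k(ranked, test, k=3)
```

Hit rate only asks whether the true next destination appears in the top K, while MRR also rewards placing it near the top of the list.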
19. DKFM: what is the contribution of each input data?
Deep Neural Network + Data Enrichment => Best results
Input Contribution (inputs considered: Demographics Data, Textual Embedding, Knowledge Graph Embedding; results get better from top to bottom)
HR@10   MRR@10
0.72    0.34
0.79    0.37
0.80    0.38
0.82    0.38
0.84    0.41
0.85    0.42
0.88    0.44
20. Conclusion and Future Work
Future Work
• Enrich cities’ characteristics using visual embeddings
• Explore other loss functions such as pairwise loss
• Explore the use of similarity measure inside the DNN such as cosine similarity
Conclusions
• Combining different types of input remarkably improves recommendation results
• DKFM model outperforms state-of-the-art collaborative filtering methods
Open Science
• DKFM implementation available at
https://gitlab.eurecom.fr/amadeus/DKFM-recommendation
21. References
[1] Badrul Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms.
[2] Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets.
[3] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback.
[4] Steffen Rendle. 2010. Factorization Machines.
[5] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems.
[6] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering.
[7] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction.
[8] Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in Pre-Training Distributed Word Representations.
[9] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data.