This document summarizes applications of machine learning techniques to location-based social networks. It discusses two applications:
1) GeoSRS, a hybrid social recommender system that provides personalized venue recommendations to users. It extracts data from Foursquare using an API, performs text modeling on tip content, and generates recommendations using both collaborative and content-based approaches.
2) Tweet-SCAN, an event discovery technique that identifies dense groups of geolocated tweets close in space, time, and topic to discover real-world events. It extends the DBSCAN clustering algorithm and represents tweet topics using probabilistic models. The technique is evaluated on tweets from Barcelona events.
Contextual Recommendation of Social Updates, a tag-based framework
Adrien Joly
How to cope with information overload?
In this presentation (and the corresponding paper), we propose a framework to improve the relevance of awareness information about people and subjects, by adapting recommendation techniques to real-time web data, in order to reduce information overload. The novelty of our approach relies on the use of contextual information about people's current activities to rank social updates which they are following on Social Networking Services and other collaborative software. The two hypotheses that we support in this paper are: (i) a social update shared by person X is relevant to another person Y if the current context of Y is similar to X's context at time of sharing; and (ii) in a web-browsing session, a reliable current context of a user can be processed using metadata of web documents accessed by the user. We discuss the validity of these hypotheses by analyzing their results on experimental data.
Presented by Adrien Joly, on the 28/08/2010, at the Active Media Technology (AMT) conference, Toronto, Ontario, Canada.
WSDM 2018 Tutorial on Influence Maximization in Online Social Networks
Cigdem Aslay
In this tutorial, we extensively survey the research on social influence propagation and maximization, with a focus on the recent algorithmic and theoretical advances. To this end, we provide detailed reviews of the latest research effort devoted to (i) improving the efficiency and scalability of the influence maximization algorithms; (ii) context-aware modeling of the influence maximization problem to better capture real-world marketing scenarios; (iii) modeling and learning of real-world social influence; (iv) bridging the gap between social advertising and viral marketing.
Emotional Social Signals for Search Ranking
Ismail BADACHE
A large amount of social feedback, expressed by social signals (e.g. like, +1, rating), is assigned to web resources. These signals are often exploited as additional sources of evidence in search engines. Our objective in this paper is to study the impact on retrieval of the new social signals called Facebook reactions (love, haha, angry, wow, sad). These reactions allow users to express more nuanced emotions compared to classic signals (e.g. like, share). First, we analyze these reactions and show how users use these signals to interact with posts. Second, we evaluate the impact of each such reaction on retrieval, by comparing them to both the textual model without social features and the first classical signal (like-based model). These social features are modeled as document priors and are integrated into a language model. We conducted a series of experiments on an IMDb dataset. Our findings reveal that incorporating social features is a promising approach for improving retrieval ranking performance.
Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 24th
Abstract. In digital marketing, memes have become an attractive tool for engaging an online audience. Memes have an impact on buyers’ and sellers’ online behavior and information spreading processes. Thus, the technology of generating memes is a significant tool for social media engagement. In this study, we collected a new memes dataset of ∼650K meme instances, applied a state-of-the-art deep learning technique, the GPT-2 model [1], to meme generation, and compared machine-generated memes with human-created ones. We justified the use of MTurk workers for approximately estimating users’ behavior in a social network, more precisely for measuring engagement. Generated memes cause the same engagement as human memes that historically did not collect engagement in the social network. Still, generated memes are less engaging than random memes created by humans.
Learning to Classify Users in Online Interaction Networks
Symeon Papadopoulos
Presentation given at ICCSS 2015, Helsinki, Finland. It illustrates an approach for classifying users of OSNs solely based on their interactions with other users.
[ECWEB2012] Differential Context Relaxation for Context-Aware Travel Recommend...
YONG ZHENG
Context-aware recommendation (CARS) has been shown to be an effective approach to recommendation in a number of domains. However, the problem of identifying appropriate contextual variables remains: using too many contextual variables risks a drastic increase in dimensionality and a loss of accuracy in recommendation. In this paper, we propose a novel treatment of context – identifying influential contexts for different algorithm components instead of for the whole algorithm. Based on this idea, we take traditional user-based collaborative filtering (CF) as an example, decompose it into three context-sensitive components, and propose a hybrid contextual approach. We then identify appropriate relaxations of contextual constraints for each algorithm component. The effectiveness of context relaxation is demonstrated by comparison of three algorithms using a travel data set: a context-ignorant approach, contextual pre-filtering, and our hybrid contextual algorithm. The experiments show that choosing an appropriate relaxation of the contextual constraints for each component of an algorithm outperforms strict application of the context.
The Internet of Things (IoT) comes with great possibilities as well as major security and privacy issues. Although digital forensics has long been studied in both academia and industry, mobility forensics is relatively new and unexplored. Mobility forensics deals with tools and techniques that work towards forensically sound recovery of data and evidence from mobile devices [1]. In this paper, we explore mobility forensics in the context of IoT. This paper discusses the data collection and classification process for IoT smart home devices in detail. It also contains an attack-scenario-based analysis of the collected data and a proposed mobility forensics model that fits into such scenarios.
Cite: K. M. S. Rahman, M. Bishop, and A. Holt, “Internet of Things Mobility Forensics,” INSuRE Conference, 2016.
Airline passenger profiling based on fuzzy deep machine learning
Ayman Qaddumi
Passenger profiling plays a vital part in commercial aviation security. Classical passenger profiling methods are inefficient in handling the rapidly increasing amounts of electronic records. Emerging deep learning models combined with highly parallel computing have exhibited promising performance for feature extraction and abstraction, but their applications in aviation security management have rarely been reported.
Online Machine Learning: introduction and examples
Felipe
In this talk I introduce the topic of Online Machine Learning, which deals with techniques for doing machine learning in an online setting, i.e. where you train your model a few examples at a time, rather than using the full dataset (off-line learning).
BSidesLV 2013 - Using Machine Learning to Support Information Security
Alex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk would like to address some insights into applying Machine Learning techniques to data any of us have easy access to, and try to bring home the point that if all of this technology can be used to show us “better” ads in social media and track our behavior online (and a bit more than that) it can also be used to defend our networks as well.
How multiple experts can be leveraged in a machine learning application without knowing a priori who are "good" experts and who are "bad" experts. See how we can quantify the bounds on the overall results.
* GOTO Berlin Conference 2013
Toru Shimogaki / NTT DATA CORPORATION
"The realtime processing for web services"
In Recruit Technologies, we are now concentrating on using streaming data processing and machine learning to analyze online user behavior and improve our services. We have a packaged solution named "Genn.ai" to make these technologies widely available in Recruit group. It will be opensourced. Using it, you can extract the power of Storm with simple scripts! In addition, we are making an effort to use online machine learning middleware "Jubatus" in production with NTT DATA.
http://gotocon.com/berlin-2013/presentation/The%20realtime%20processing%20for%20web%20services
https://imatge.upc.edu/web/publications/semantic-and-diverse-summarization-egocentric-photo-events
This project generates visual summaries of events depicted in egocentric photos taken with a wearable camera. These summaries are addressed to mild-dementia patients in order to exercise their memory on a daily basis. The main contribution is an iterative approach that guarantees the semantic diversity of the summary and a novel soft metric to assess subjective results. Medical experts validated the proposed solution with a Mean Opinion Score of 4.6 out of 5.0. The flexibility and quality of the solution were also tested in the 2015 Retrieving Diverse Social Images Task from the scientific international benchmark, MediaEval.
Wimmics Research Team 2015 Activity Report
Fabien Gandon
Extract of the activity report of the Wimmics joint research team between Inria Sophia Antipolis - Méditerranée and I3S (CNRS and Université Nice Sophia Antipolis). Wimmics stands for web-instrumented man-machine interactions, communities and semantics. The team focuses on bridging social semantics and formal semantics on the web.
With the explosive growth of online information, recommender systems have become an effective tool to overcome information overload and promote sales. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention. Meanwhile, recent studies also demonstrate its efficacy in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to their state-of-the-art performance. In this talk, I will present recent developments in deep learning based recommender models and highlight some future challenges and open issues in this research field.
IOTA 2016 Social Recomender System Presentation.
ASHISH JAGTAP
In today’s age of ever increasing use of internet, there are around 74% active internet users out of which 60% users contribute to social networking and most of them are students from the age group 16-30. If this young generation is targeted specifically towards educational activities keeping the same social networking environment in the background would create interest in students for educational activities and also yield productive results. This can be implemented by creating a social-cum-educational portal with recommender systems. Specific information to specific student can be provided. Use of such technology can reduce the gap between students and the information which can lead to their inherent development and success! However, most of the existing Social Recommender systems do not scale well and are unable to process huge volumes of data. To address this problem, we can design a social recommender system based on Hadoop and its parallel computing platform.
Slides from my talk on Personalised Access to Linked Data. Presented at the EKAW 2014 conference. The poster to this paper won the best poster award at the conference!
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
The affect of service quality and online reviews on customer loyalty in the E...
Applications of Machine Learning to Location-based Social Networks
1. Applications of Machine Learning
to Location-based Social Networks
Joan Capdevila Pujol
e-mail: jc@ac.upc.edu
website: http://people.ac.upc.edu/jc
Advisors: Jordi Torres Viñals, Jesús Cerquides Bueno
2. Table of contents
Motivation
Location-based Social Networks (LBSNs)
App 1: GeoSRS: A social recommender system
App 2: Tweet-SCAN: An event discovery technique
Conclusions and future trends
7. An ML geek might have thought:
“With all this tagged data, I am going to build a classifier to decide whether the person in the pic is hot or not.”
8. Mark Zuckerberg probably thought:
“I’d rather keep playing to scale up the network and then…”
11. User engagement through several social networking services:
Linking to friends, colleagues, etc.
Setting school/college
Tagging friends to pictures
Liking publications
Geolocating content
Reviewing business
Expressing how one feels
…
12. User engagement through several social networking services:
Linking to friends, colleagues, etc. → Social graphs
Setting school/college → User profiles
Tagging friends to pictures → Tagged images
Liking publications → Rating information
Geolocating content → Geolocated content
Reviewing business → Textual comments
Expressing how one feels → People's feelings
…
16. Location-based Social Networks (LBSNs)
VIRTUAL WORLD
Mobile communication + Positioning technologies
PHYSICAL WORLD
2010 - …
17. Locations
Location-acquisition technologies
– Outdoor: GPS, GSM, etc.
– Indoor: Wi-Fi, RFID, etc.
Representation of locations
– Absolute (e.g. latitude-longitude coordinates)
– Symbolic (e.g. at Pl. Catalunya, at Aeroport Girona-Costa Brava)
Forms of locations
– Point locations (e.g. Foursquare venues)
– Regions (e.g. Twitter places)
– Trajectories (e.g. Strava)
18. Research lines
Understanding users
– User similarity/link prediction
– Experts/influencers detection
– Community discovery
Understanding locations
– Generic recommendation
• Most interesting locations and travel routes
• Itinerary planning
• Location-activity recommenders
– Personalized recommendation: GeoSRS [Capdevila et al. 2015]
Understanding events
– Anomaly detection: Tweet-SCAN [Capdevila et al. 2015]
– Crowd behavioral patterns
Zheng, Y. 2011
21. GEOSRS: A SOCIAL RECOMMENDER SYSTEM
Joan Capdevila, Marta Arias, and Argimiro Arratia. "GeoSRS: A hybrid social recommender system for geolocated data." Information Systems (2015).
32. Foursquare API
HTTP METHODS
– GET, POST, PUT, DELETE
RESOURCES
– Venue, tip, user
e.g. GET https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3
ASPECTS
– Tips of a venue, friends of a user
e.g. GET https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3/tips
ACTIONS
– Approve a friendship, like a venue
e.g. POST https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3/like
33. Foursquare API
App registration: https://foursquare.com/developers/apps
Obtain the Foursquare API credentials (Client ID and Client Secret)
Access token
Allows apps to make requests to Foursquare on behalf of a user
Userless request
Specify the consumer key (Client ID and Client Secret)
https://api.foursquare.com/v2/venues/search?ll=40.7,-74&client_id=XX&client_secret=ZZ&v=20151125
Authenticated request
Specify the access token
https://api.foursquare.com/v2/users/self/checkins?oauth_token=AA
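The two request styles above can be sketched as URL builders. This is a minimal illustration that only assembles the query strings shown on the slide; the function names are hypothetical, no request is sent, and whether these v2 endpoints are still served is not verified here.

```python
from urllib.parse import urlencode

# Root of the v2 API as it appears in the slide URLs.
API_ROOT = "https://api.foursquare.com/v2"


def userless_search_url(lat, lng, client_id, client_secret, version="20151125"):
    """Build a userless venues/search URL; credentials travel in the query string."""
    query = urlencode(
        {
            "ll": f"{lat},{lng}",
            "client_id": client_id,
            "client_secret": client_secret,
            "v": version,
        },
        safe=",",  # keep the comma in "lat,lng" unescaped, as on the slide
    )
    return f"{API_ROOT}/venues/search?{query}"


def authenticated_checkins_url(oauth_token):
    """Build an authenticated users/self/checkins URL from an access token."""
    return f"{API_ROOT}/users/self/checkins?{urlencode({'oauth_token': oauth_token})}"


if __name__ == "__main__":
    print(userless_search_url(40.7, -74, "XX", "ZZ"))
    print(authenticated_checkins_url("AA"))
```

The placeholder credentials XX, ZZ and AA are the ones used on the slide; real values come from the app registration page.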
34. Foursquare API
Technical limitations
Userless requests to venues/ resources = 5,000 requests/hour
Userless requests to other resources = 500 requests/hour
Authenticated requests = 500 requests/hour per token
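To make the hourly quotas concrete, the arithmetic below converts a quota into a minimum spacing between requests and into the number of hourly windows a crawl would need. It is a rough sketch: the helper names are hypothetical, and it assumes requests are spread evenly over each hour.

```python
def min_seconds_between_requests(requests_per_hour):
    """Smallest even spacing (in seconds) that stays within an hourly quota."""
    return 3600.0 / requests_per_hour


def hours_needed(total_requests, requests_per_hour):
    """Number of hourly quota windows needed for a crawl (ceiling division)."""
    return -(-total_requests // requests_per_hour)


if __name__ == "__main__":
    # Quotas taken from the slide: 5,000/hour for venues, 500/hour otherwise.
    print(min_seconds_between_requests(5000))
    print(min_seconds_between_requests(500))
    print(hours_needed(12000, 5000))
```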
36. Data Extraction
Goal: extract all tips from venues in Manhattan (New York)
Medium:
– aspect: venues/VENUE_ID/tips
– resource: venues/search(sw, ne)
Limitations:
– 5,000 requests/hour
– at most 50 venues per request
[Figure: search bounding box defined by its SW and NE corners]
40. Quadtree algorithm
In each quadcell at the tree leaves, there are at most 50 venues.
Through venues/VENUE_ID/tips, we then retrieve the tips for each venue.
Each tip is linked to a VENUE_ID and a USER_ID.
We now have a database of triplets (USER, TIPS, VENUE) on which to perform recommendation.
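The quadtree subdivision can be sketched as a recursive split of the (SW, NE) bounding box until every leaf cell holds at most 50 venues. This is an illustrative sketch, not the authors' implementation: `count_venues` is a hypothetical stand-in for a venues/search call and is assumed to return the true number of venues in a box. Against an API that caps each response at 50 venues, a full response would have to be treated as a signal to split instead.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]   # (lat, lng)
Box = Tuple[Point, Point]     # (south-west corner, north-east corner)


def quadtree_cells(box: Box, count_venues: Callable[[Box], int],
                   limit: int = 50, max_depth: int = 20) -> List[Box]:
    """Split `box` into leaf cells that each hold at most `limit` venues."""
    # Stop when the cell is small enough in venue count, or at the depth guard.
    if max_depth == 0 or count_venues(box) <= limit:
        return [box]
    (south, west), (north, east) = box
    mid_lat, mid_lng = (south + north) / 2, (west + east) / 2
    quadrants = [
        ((south, west), (mid_lat, mid_lng)),     # south-west quadrant
        ((south, mid_lng), (mid_lat, east)),     # south-east quadrant
        ((mid_lat, west), (north, mid_lng)),     # north-west quadrant
        ((mid_lat, mid_lng), (north, east)),     # north-east quadrant
    ]
    leaves: List[Box] = []
    for quadrant in quadrants:
        leaves.extend(quadtree_cells(quadrant, count_venues, limit, max_depth - 1))
    return leaves
```

Each returned leaf can then be queried once, and the tips of every venue in it fetched through venues/VENUE_ID/tips. The `max_depth` guard is an added safety measure for many venues sharing the same coordinates.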
50. TWEET-SCAN: AN EVENT DISCOVERY TECHNIQUE
Joan Capdevila, Jesús Cerquides, Jordi Nin, and Jordi Torres. “Tweet-SCAN: An event discovery technique for geo-located tweets.” Proceedings of the 18th International Conference of the Catalan Association for Artificial Intelligence, 2015.
53. Examination of data
We looked at several tweet dimensions separately, from a dataset of tweets collected during “la Mercè” 2014.
[Figure panels: Spatial, Temporal, Textual]
58. Tweet-SCAN
Tweet-SCAN is a technique to discover events from geolocated tweets.
It discovers dense groups of tweets that are close in space, time and textual meaning.
These dense groups of tweets are linked to physical-world events.
Textual meaning is represented through probabilistic topic models.
Tweet-SCAN can be seen as an extension of the popular DBSCAN algorithm or a particular case of GDBSCAN.
59. Probabilistic topic modeling
Latent Dirichlet Allocation (LDA): Blei et al. 2003
Hierarchical Dirichlet Process (HDP): Teh et al. 2006
– Non-parametric version of LDA
Fig.: Xuriguera et al. 2013
60. 60
Probabilistic topic modeling
VAN VAN MARKET - 🚐🚎🍤🍱🍜 Mercat gastronòmada #lepetitbangkok @ Parc de la Ciutadella
http://t.co/5CvnUFoIDa
Topic Proportions: [(1, 0.30002802458675537), (11,0.58330530874655417)]
62. 62
DBSCAN
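Density-Based Spatial Clustering of Applications with Noise (Ester et al. 1996)
Core points: points with at least MinPts points (themselves included) within distance ε
params: MinPts = 4; ε is the neighborhood radius drawn in the figure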
64. 64
DBSCAN
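Density-Based Spatial Clustering of Applications with Noise (Ester et al. 1996)
Border points: non-core points that lie within distance ε of a core point
params: MinPts = 4; ε is the neighborhood radius drawn in the figure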
66. 66
DBSCAN
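Density-Based Spatial Clustering of Applications with Noise (Ester et al. 1996)
Noise points: points that are neither core points nor within distance ε of a core point
Figure legend: noise point, border point, core point
params: MinPts = 4; ε is the neighborhood radius drawn in the figure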
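The core, border and noise roles shown on the DBSCAN slides can be illustrated with a short pure-Python sketch. It follows the standard formulation of the algorithm with a brute-force neighbor search, so it is meant for small examples only; the function names and the sample points are assumptions made for this illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one cluster id per point, or NOISE (-1) for noise points."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return [label if label is not None else NOISE for label in labels]


sample = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),  # dense group
    (2.2, 1.0),                                                  # border point
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),      # second group
    (50.0, 50.0),                                                # isolated point
]
labels = dbscan(sample, eps=1.5, min_pts=4)
```

With MinPts = 4 as on the slides and an assumed ε of 1.5, the point at (2.2, 1.0) has too few neighbors to be a core point but joins the first cluster through its core neighbor (1.0, 1.0), while the isolated point is labeled as noise.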
67. 67
Tweet-SCAN
Neighborhood identification
– ε1: spatial (m) – Euclidean distance
– ε2: time (sec) – Euclidean distance
– ε3: text – Jensen-Shannon distance (a proper metric for probability distributions)
Cardinality of the neighborhood
– MinPts – minimum number of neighbors (Tweets)
– µ – minimum percentage of unique users in the neighborhood.
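The neighborhood test listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the `Tweet` fields are hypothetical, positions are assumed to be already projected to meters, topic proportions are assumed to be dense vectors over the same topics, and µ is interpreted here as the fraction of distinct users among the tweets in the neighborhood.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tweet:
    x: float                 # easting in meters (assumed projected coordinates)
    y: float                 # northing in meters
    t: float                 # timestamp in seconds
    user: str
    topics: Sequence[float]  # dense topic proportions, summing to 1


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def neighborhood(tweets: List[Tweet], i: int,
                 eps1: float, eps2: float, eps3: float) -> List[int]:
    """Indices of tweets within eps1 (space), eps2 (time) and eps3 (text) of tweet i."""
    c = tweets[i]
    return [
        j for j, o in enumerate(tweets)
        if math.hypot(c.x - o.x, c.y - o.y) <= eps1
        and abs(c.t - o.t) <= eps2
        and js_distance(c.topics, o.topics) <= eps3
    ]


def is_core(tweets: List[Tweet], i: int, eps1: float, eps2: float,
            eps3: float, min_pts: int, mu: float) -> bool:
    """Core test: enough neighbors, written by a diverse enough set of users."""
    neigh = neighborhood(tweets, i, eps1, eps2, eps3)
    users = {tweets[j].user for j in neigh}
    return len(neigh) >= min_pts and len(users) / len(neigh) >= mu
```

Plugging `neighborhood` and `is_core` into a DBSCAN-style expansion loop would give the overall clustering; the user-diversity condition keeps a single prolific account from forming an "event" on its own.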
68. 68
Experimentation
We ran Tweet-SCAN, without supervision, on 45,623 tweets to discover event-related tweets.
We seek to understand the role of each parameter by comparing the resulting clusters against the 1,163 tweets tagged as event-related.
69. 69
Evaluation
Extrinsic clustering metrics (Amigó et al. 2008)
$$\mathrm{Purity} = \sum_i \frac{|C_i|}{n} \max_j \mathrm{Precision}(C_i, L_j), \qquad \mathrm{Precision}(C_i, L_j) = \frac{|C_i \cap L_j|}{|C_i|}$$
Example with three clusters C1, C2, C3 and three label sets L1, L2, L3 (n = 28):
Precision(Ci, Lj)   L1     L2     L3
C1                  9/10   1/10   0/10
C2                  0/9    8/9    1/9
C3                  0/9    0/9    9/9
70. 70
Evaluation
Extrinsic clustering metrics (Amigó et al. 2008)
Taking the best-matching label set for each cluster in the example (precisions 9/10, 8/9 and 9/9):
Purity = 0.92
72. 72
Evaluation
Extrinsic clustering metrics (Amigó et al. 2008)
$$\mathrm{InvPurity} = \sum_i \frac{|L_i|}{n} \max_j \mathrm{Recall}(C_j, L_i), \qquad \mathrm{Recall}(C_j, L_i) = \frac{|L_i \cap C_j|}{|L_i|}$$
Example with three clusters C1, C2, C3 and three label sets L1, L2, L3 (n = 28):
Recall(Cj, Li)   C1     C2     C3
L1               9/9    0/9    0/9
L2               1/9    8/9    0/9
L3               0/10   1/10   9/10
73. 73
Evaluation
Extrinsic clustering metrics (Amigó et al. 2008)
Taking the best-matching cluster for each label set in the example (recalls 9/9, 8/9 and 9/10):
InvPurity = 0.92
76. 76
Evaluation
Extrinsic clustering metrics (Amigó et al. 2008)
F(L_i, C_j) is the harmonic mean of Precision(C_j, L_i) and Recall(C_j, L_i):
$$F = \sum_i \frac{|L_i|}{n} \max_j F(L_i, C_j), \qquad F(L_i, C_j) = \frac{2 \times \mathrm{Recall}(C_j, L_i) \times \mathrm{Precision}(C_j, L_i)}{\mathrm{Recall}(C_j, L_i) + \mathrm{Precision}(C_j, L_i)}$$
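The three extrinsic metrics can be checked on the toy example from the evaluation slides with a few lines of Python. The item ids below are invented so that the overlaps match the precision and recall values on the slides (clusters of sizes 10, 9 and 9, label sets of sizes 9, 9 and 10); the function names are assumptions made for this sketch.

```python
from typing import List, Set


def precision(c: Set[int], l: Set[int]) -> float:
    return len(c & l) / len(c)


def recall(c: Set[int], l: Set[int]) -> float:
    return len(c & l) / len(l)


def f_score(c: Set[int], l: Set[int]) -> float:
    p, r = precision(c, l), recall(c, l)
    return 0.0 if p + r == 0 else 2 * r * p / (r + p)


def purity(clusters: List[Set[int]], labels: List[Set[int]], n: int) -> float:
    # Each cluster is weighted by its size and matched to its best label set.
    return sum(len(c) / n * max(precision(c, l) for l in labels) for c in clusters)


def inverse_purity(clusters: List[Set[int]], labels: List[Set[int]], n: int) -> float:
    # Each label set is weighted by its size and matched to its best cluster.
    return sum(len(l) / n * max(recall(c, l) for c in clusters) for l in labels)


def f_measure(clusters: List[Set[int]], labels: List[Set[int]], n: int) -> float:
    return sum(len(l) / n * max(f_score(c, l) for c in clusters) for l in labels)


# Label sets L1, L2, L3 with 9, 9 and 10 items.
labels = [set(range(0, 9)), set(range(9, 18)), set(range(18, 28))]
# Clusters C1, C2, C3: C1 absorbs one item of L2, C2 absorbs one item of L3.
clusters = [set(range(0, 10)), set(range(10, 19)), set(range(19, 28))]
n = 28

example_purity = purity(clusters, labels, n)
example_inverse_purity = inverse_purity(clusters, labels, n)
example_f = f_measure(clusters, labels, n)
```

On this example all three metrics evaluate to 26/28 (about 0.929), in line with the 0.92 shown on the slides for Purity and InvPurity.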
84. Conclusions
The birth of social networks is one of the major causes of the current volume of digitized personal data.
Social networks have kept their doors open to the developer community in order to stimulate the creation of apps.
This “openness” has materialized as RESTful APIs, which enable communication between third-party apps and social networks.
Through these APIs we are able to access vast amounts of data and to develop and validate machine learning tools.
However, technical and legal limitations have to be taken into account to build functional applications.
85. 85
Conclusions
Location-based social networks (LBSNs) bridge the virtual and physical worlds.
Classical applications such as recommender systems have to be reconsidered to take this new dimension into account.
Recommendation from textual reviews is feasible, and hybridization improves performance.
Data from social networks (SN) can be strongly biased by the networks' own services (e.g. by their own recommender systems).
Other novel applications, such as event discovery, gain meaning with LBSNs.
Event discovery has to consider the textual dimension to uncover meaningful events.
91. 91
References
Zheng, Yu. "Location-based social networks: Users." Computing with Spatial Trajectories. Springer New York, 2011. 243-276.
Capdevila, Joan, Marta Arias, and Argimiro Arratia. "GeoSRS: A hybrid social recommender system for geolocated data." Information Systems (2015).
Capdevila, Joan, Jesús Cerquides, Jordi Nin, and Jordi Torres. "Tweet-SCAN: An event discovery technique for geo-located tweets." Proceedings of the 18th International Conference of the Catalan Association for Artificial Intelligence, 2015.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.
92. 92
References
Teh, Yee Whye, et al. "Hierarchical Dirichlet processes." Journal of the American Statistical Association 101.476 (2006).
Ester, Martin, et al. "A density-based algorithm for discovering clusters in large spatial databases with noise." KDD. Vol. 96. No. 34. 1996.
Amigó, Enrique, et al. "A comparison of extrinsic clustering evaluation metrics based on formal constraints." Information Retrieval 12.4 (2009): 461-486.
Swan, Melanie. "Quantified Self Ideology. Personal data becomes Big Data." Université Paris Descartes, February 2014.
Akyildiz, Ian F., and Josep Miquel Jornet. "The internet of nano-things." IEEE Wireless Communications 17.6 (2010): 58-63.
Atzori, Luigi. "The Social Internet of Things." Presentation, University of Cagliari, Italy, 2012.