Ranieri Baraglia, Carlos Castillo, Debora Donato, Franco Maria Nardini, Raffaele Perego and Fabrizio Silvestri: "The Effects of Time on Query Flow Graph-based Models for Query Suggestion". In proceedings of RIAO. Paris, France, 2010.
Slides prepared by Franco Maria Nardini
The Effects of Time on Query Flow Graph-based Models for Query Suggestion
1. The Effects of Time on Query Flow Graph-based Models for Query Suggestion
Carlos Castillo, Debora Donato (Yahoo! Research Barcelona)
Ranieri Baraglia, Franco Maria Nardini, Raffaele Perego, Fabrizio Silvestri (HPC Lab, ISTI-CNR, Pisa)
Tuesday, 4 May 2010
3. Outline
• Introduction
• Aims of this Work
• The Query-Flow Graph
• Evaluating the Aging Effect
• Combating the Aging Effect
• Distributed QFG Building
• Conclusions & Future Work
7. Introduction
• Web search engines use query recommender systems to improve users’ search experience;
• Query recommender systems give users hints on possible “interesting queries”:
  • relative to their information needs;
• Query recommender systems exploit the knowledge of past web search engine users:
  • recorded in query logs.
10. Aims of this Work
• to show that time has negative effects on a query recommender model:
  • the model becomes unable to generate good suggestions as time passes;
  • bursty queries;
• to extend a state-of-the-art recommender system by providing a methodology for dealing efficiently with evolving data;
• to define a “good” strategy to update the model;
• to define a distributed/parallel algorithm to update the model;
13. The Query-Flow Graph
• QFG [Boldi et al., CIKM’08] is a compact and powerful representation of Web search engine users’ behavior;
• QFG is a graph composed of:
  1. a set of nodes, V = Q ∪ {s, t};
  2. a set of directed edges, E ⊆ V × V:
     • (q, q’) are connected if they are consecutive at least once in at least one session;
  3. a weighting function w: E → (0, 1]:
     • assigning a weight w(q, q’) to each edge;
[Figure: example query-flow graph for queries around “barcelona” (nodes include “barcelona fc”, “barcelona fc website”, “barcelona fc fixtures”, “real madrid”, “barcelona hotels”, “cheap barcelona hotels”, “luxury barcelona hotels”, “barcelona weather”, “barcelona weather online” and the terminal node <T>), with edge weights ranging from 0.011 to 0.523.]
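The graph definition above can be illustrated with a minimal Python sketch. It assumes that a query log has already been split into sessions (lists of consecutive queries); the function name `build_qfg_counts`, the node labels `<S>` and `<T>` for the special nodes s and t, and the toy sessions are hypothetical and not taken from the slides.

```python
from collections import Counter

# Hypothetical labels for the special start and terminal nodes s and t.
START, END = "<S>", "<T>"


def build_qfg_counts(sessions):
    """Count how often q is immediately followed by q' within a session."""
    counts = Counter()
    for session in sessions:
        # Assumption: every session starts at s and ends at t.
        path = [START, *session, END]
        for q, q_next in zip(path, path[1:]):
            counts[(q, q_next)] += 1
    return counts


# Toy sessions, invented for illustration only.
sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona", "barcelona hotels"],
]
edge_counts = build_qfg_counts(sessions)
nodes = {q for edge in edge_counts for q in edge}
```

An edge (q, q’) exists as soon as its count is at least one; the counts are the raw material that a weighting function w would later map into (0, 1].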
16. The Query-Flow Graph
• two weighting schemes:
  • relative frequencies: counting query occurrences;
  • chaining probabilities: (q,q’) in the same chain
    • classification on a set of features (text, n-grams, session) over all sessions where (q,q’) are consecutive;
• noisy edges: edges with low probability are removed;
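As a rough illustration of the relative-frequency scheme and of noisy-edge removal, the sketch below normalizes each edge count by the total count leaving its source query and then drops edges under a threshold. The normalization choice, the threshold value, and all names and counts are assumptions made for this example; the slides do not specify them.

```python
from collections import defaultdict


def relative_frequency_weights(edge_counts):
    """Normalize each edge count by the total count leaving its source node."""
    out_totals = defaultdict(int)
    for (q, _), count in edge_counts.items():
        out_totals[q] += count
    return {(q, q2): count / out_totals[q] for (q, q2), count in edge_counts.items()}


def prune_noisy_edges(weights, threshold):
    """Keep only edges whose weight is at least the threshold."""
    return {edge: w for edge, w in weights.items() if w >= threshold}


# Toy counts, invented for illustration only.
edge_counts = {
    ("barcelona", "barcelona hotels"): 6,
    ("barcelona", "barcelona weather"): 3,
    ("barcelona", "real madrid"): 1,
}
weights = relative_frequency_weights(edge_counts)
kept = prune_noisy_edges(weights, threshold=0.2)
```

With chaining probabilities, the weight would instead come from a trained classifier; the pruning step would stay the same.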
19. The Query-Flow Graph
• Query recommendation:
  • random walk with restart on the graph;
  • considering history of the users (on the preference vector);
• A score is associated with each suggestion;
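A random walk with restart can be sketched as a simple power iteration. This is a generic textbook formulation, not necessarily the exact variant used by the authors; the damping value `alpha`, the iteration count, the handling of nodes without outgoing edges, and the toy weights are all assumptions. The preference vector is where user history would enter: it can spread probability mass over several recent queries instead of a single one.

```python
def random_walk_with_restart(weights, preference, alpha=0.85, iterations=50):
    """Follow an edge with probability alpha, otherwise restart from the preference vector."""
    nodes = {n for edge in weights for n in edge} | set(preference)
    out_total = {n: 0.0 for n in nodes}
    for (q, _), w in weights.items():
        out_total[q] += w
    scores = {n: preference.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * preference.get(n, 0.0) for n in nodes}
        for (q, q2), w in weights.items():
            nxt[q2] += alpha * scores[q] * w / out_total[q]
        # Mass on nodes without outgoing edges restarts from the preference vector.
        dangling = sum(scores[n] for n in nodes if out_total[n] == 0.0)
        for n, p in preference.items():
            nxt[n] += alpha * dangling * p
        scores = nxt
    return scores


# Toy weights, invented for illustration only.
weights = {
    ("barcelona", "barcelona hotels"): 0.6,
    ("barcelona", "barcelona weather"): 0.3,
    ("barcelona hotels", "cheap barcelona hotels"): 1.0,
}
preference = {"barcelona": 1.0}  # the user's current query
scores = random_walk_with_restart(weights, preference)
suggestions = sorted(
    (q for q in scores if q not in preference), key=scores.get, reverse=True
)
```

The resulting scores rank candidate suggestions; queries already in the preference vector are excluded from the list.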
29. Experimental Assumptions
• M1, M2 are used for training;
• two different QFGs;
• Queries in the third month are used for testing;
• We evaluate the aging effect by measuring the quality of suggestions produced by models on M1 and M2;
• If the model ages, M2 outperforms M1 in terms of quality of suggestions;
Paper excerpt shown on the slide:
... Boldi et al. in [4]. This method uses chaining probabilities measured by means of a machine learning method. The initial step was thus to extract those features from each training log and to store them in a compressed graph representation. In particular, we extracted 25 different features (time-related, session and textual features) for each pair of queries (q, q′) that are consecutive in at least one session of the query log.
Table 1 shows the number of nodes and edges of the different graphs corresponding to each query log segment used for training.
time window   id   nodes       edges
March 06      M1   3,814,748   6,129,629
April 06      M2   3,832,973   6,266,648
Table 1: Number of nodes and edges for the graphs corresponding to the two different training segments.
It is important to remark that we have not re-trained the classification model for the assignment of weights associated with QFG edges. We reuse the one that has already been used for segmenting user sessions into query chains. This is another point in favor of QFG-based models. Once you train the classifier to assign weights to QFG edges, you can reuse it on different data sets without losing effectiveness.
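To make the notion of "pairs of queries that are consecutive in at least one session" concrete, here is a minimal sketch of how the nodes and edges counted in Table 1 could be derived from sessions. The function names and the toy sessions are hypothetical, and the sketch only counts pairs; it does not compute the 25 features or the classifier-based chaining probabilities used in the paper.

```python
from collections import Counter

def consecutive_pairs(sessions):
    """Count each ordered pair (q, q') that is consecutive in some session."""
    pair_counts = Counter()
    for session in sessions:
        # zip pairs every query with the one that immediately follows it.
        pair_counts.update(zip(session, session[1:]))
    return pair_counts

def graph_size(sessions):
    """Return (number of nodes, number of edges) of the resulting graph."""
    nodes = {q for session in sessions for q in session}
    edges = consecutive_pairs(sessions)
    return len(nodes), len(edges)

# Toy sessions (hypothetical data, not taken from the AOL query log).
sessions = [
    ["shakira", "shakira video", "shakira wallpaper"],
    ["da vinci code", "da vinci code movie"],
    ["shakira", "shakira video"],
]
print(graph_size(sessions))  # (5, 3)
```

An edge is counted once no matter how many sessions contain it, which matches the "at least one session" condition; the per-pair counts remain available in the Counter if a frequency-based weight is wanted.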
33. Evaluating the Aging Effect
• Two classes of test queries:
  • F1: 30 queries highly frequent in M1 having a large drop in the test month (e.g. shakira);
  • F3: 30 queries highly frequent in the test month having a large drop in M1 (e.g. da vinci code, mothers day gift);
• F1, F3 contain very diverse queries;
[Figure: log-log plot with two series, "Top 1000 queries in month 1 on month 1" and "Top 1000 queries in month 3 on month 1"; x axis from 1 to 1000, y axis from 1 to 1e+06.]
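The selection of F1 and F3 can be sketched as follows. This is only an illustration under assumptions: the slide does not say how "highly frequent" or "large drop" were quantified, so the function name, the `top_n` cut-off, the ratio-based drop measure and all frequencies below are hypothetical.

```python
from collections import Counter

def frequent_with_drop(freq_a, freq_b, top_n, size):
    """Among the top_n most frequent queries in freq_a, return the `size`
    queries whose frequency drops the most (as a ratio) in freq_b."""
    top = [q for q, _ in freq_a.most_common(top_n)]
    # Ratio of the frequency in b to the frequency in a:
    # the smaller the ratio, the larger the drop.
    ranked = sorted(top, key=lambda q: freq_b[q] / freq_a[q])
    return ranked[:size]

# Hypothetical query frequencies for the first and third month.
month1 = Counter({"shakira": 900, "weather": 800, "maps": 700,
                  "da vinci code": 20})
month3 = Counter({"shakira": 90, "weather": 780, "maps": 720,
                  "da vinci code": 950, "mothers day gift": 600})

f1 = frequent_with_drop(month1, month3, top_n=3, size=1)
f3 = frequent_with_drop(month3, month1, top_n=4, size=2)
print(f1)  # ['shakira']
print(f3)  # ['mothers day gift', 'da vinci code']
```

In the talk each class holds 30 queries; the toy sizes here are kept small so the result can be checked by hand.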
38. Evaluating the Aging Effect (II)
• When k suggestions share the same score, those are useless;
• Same suggestion score:
  • same probability on the graph;
  • the model is not able to give a priority to recommendations;
• Confirmed by a user study on F1 and F3;
[Cropped table fragment visible on the slide: 3742 2652 / 2162 2615 / 2001 2341 / 1913 2341 / 1913 2341. The accompanying chart is not recoverable from the extraction.]
41. Evaluating the Aging Effect (III)
• Working hypothesis:
  • useful recommendations do not share the same recommendation score;
• Automatic evaluation:
  • 400 highly frequent queries in the test month;
  • evaluating the number of useful recommendations;
  • k = 3;
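The working hypothesis can be turned into a small counting routine. The sketch below is one possible reading, not the authors' evaluation code: it assumes a suggestion is "useless" when its score is shared by at least k suggestions for the same query, and the function name and the example scores are hypothetical.

```python
from collections import Counter

def count_useful(scores, k=3):
    """Count suggestions whose score is shared by fewer than k suggestions."""
    multiplicity = Counter(scores)
    return sum(1 for s in scores if multiplicity[s] < k)

# Hypothetical scores of the suggestions returned for one query.
scores = [0.9, 0.5, 0.2, 0.2, 0.2]
print(count_useful(scores, k=3))  # 2
```

Averaging this count over the 400 test queries would give a figure comparable in spirit to the "average number of useful suggestions" reported in the talk, under the stated assumption.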
44. Evaluating the Aging Effect (IV)
• Results:
filtering threshold   average number of useful suggestions on M1   average number of useful suggestions on M2
0                     2.84                                         2.91
0.5                   5.85                                         6.23
0.65                  5.85                                         6.23
0.75                  5.85                                         6.18
Table 4: Recommendation statistics obtained by using the automatic evaluation method on a set of 400 queries drawn from the most frequent in the third month.
• Average number of useful suggestions is greater in M2 than in M1;
• Filtering process helps a lot;
Paper excerpt shown on the slide:
... reduces the “noise” on the data and generates more precise knowledge on which recommendations are computed. Furthermore, the increase is quite independent from the threshold level, i.e. by increasing the threshold from 0.5 to 0.75 the overall quality is, roughly, constant.
We further break down the overall results shown in Table 4 to show the number of queries on which the QFG-based ...
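As a rough illustration of what a filtering threshold does, the sketch below drops low-weight edges before ranking suggestions. It assumes that filtering means discarding edges whose weight is below the threshold; the slides do not spell this out, and the function names, edge weights and queries are hypothetical.

```python
def filter_edges(edges, threshold):
    """Keep only the edges whose weight is at least `threshold`."""
    return {pair: w for pair, w in edges.items() if w >= threshold}

def suggestions(edges, query, threshold):
    """Rank the surviving out-neighbours of `query` by decreasing weight."""
    kept = filter_edges(edges, threshold)
    out = [(w, q2) for (q1, q2), w in kept.items() if q1 == query]
    return [q2 for w, q2 in sorted(out, reverse=True)]

# Hypothetical edge weights, not taken from the paper.
edges = {
    ("shakira", "shakira video"): 0.80,
    ("shakira", "shakira wallpaper"): 0.55,
    ("shakira", "free video downloads"): 0.10,
}
print(suggestions(edges, "shakira", 0.5))
# ['shakira video', 'shakira wallpaper']
```

With a threshold of 0 every edge survives, which corresponds to the unfiltered first row of Table 4.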
47. Evaluating the Aging Effect (V)
• On a histogram (cumulative distribution):
[Figure: cumulative histogram for M1 and M2; x axis: number of useful recommendations (0 to 18); y axis: number of queries (0 to 400).]
• Results on M2 are always better than those on M1:
  • fewer queries without suggestions;
50. Combating the Aging Effect
• QFG recommender models age:
  • Average recommendation quality degrades;
  • Recommendations should not be influenced by time;
• Update of the model vs. rebuilding it “from scratch”;
53. Combating the Aging Effect (II)
• Solution: incremental update of M1 by means of “fresh data” in M2;
• Graph algebra [Bordino et al., 2008];
• Some measures on the two different approaches:
Dataset                 “From scratch” strategy [min.]   “Incremental” strategy [min.]
M1 (March 2006)         21                               14
M2 (April 2006)         22                               15
M12 (March and April)   44                               32
Table 5: Time needed to build a Query Flow Graph from scratch and using our “incremental” approach (by merging two QFGs, each representing half of the data).
• Incremental updates: 2/3 of the build time w.r.t. “from scratch” strategy;
• Evaluation on the same set of 400 queries;
Paper excerpt shown on the slide:
... QFGs. Suppose the model used to generate recommendations consists of a portion of data representing one month (for M1 and M2) or two months (for M12) of the query log. The model is being updated every 15 days (for M1 and M2) or every 30 days (for M12). By using the first approach, we pay 22 (44) minutes every 15 (30) days to rebuild the new model from scratch on a new set of data obtained from the last two months of the query log. Instead, by using the second approach, we need to pay only 15 (32) minutes for updating the one-month (two-months) QFG.
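The "2/3 of the build time" figure can be checked directly against Table 5. The short computation below uses only the timings reported in the table; the variable and function names are illustrative.

```python
# Build times in minutes from Table 5: (from scratch, incremental).
build_minutes = {
    "M1 (March 2006)": (21, 14),
    "M2 (April 2006)": (22, 15),
    "M12 (March and April)": (44, 32),
}

def incremental_ratio(from_scratch, incremental):
    """Fraction of the from-scratch build time needed by the incremental update."""
    return incremental / from_scratch

ratios = {name: round(incremental_ratio(*times), 2)
          for name, times in build_minutes.items()}
print(ratios)
# {'M1 (March 2006)': 0.67, 'M2 (April 2006)': 0.68, 'M12 (March and April)': 0.73}
```

The one-month graphs sit almost exactly at 2/3, while the two-month graph M12 is slightly above it at about 0.73.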
56. Combating the Aging Effect (III)
• Results:
filtering threshold   average number of useful suggestions on M2   average number of useful suggestions on M12
0                     2.91                                         3.64
0.5                   6.23                                         7.95
0.65                  6.23                                         7.94
0.75                  6.18                                         7.9
Table 8: Recommendation statistics obtained by using the automatic evaluation method on a relatively large set of 400 queries drawn from the most frequent in the third month.
• Average number of useful suggestions is greater in M12 than in M2 or in M1;
Fragment of Table 7 visible on the slide (query: shakira):
3698 shakira video
3135 shakira nude
3099 shakira wallpaper
3020 shakira biography
3018 shakira aol music
2015 free video downloads
Table 7: Some examples of recommendations generated on different QFG models. Queries used to generate recommendations are taken from different query sets.
59. Combating the Aging Effect (IV)
• On a histogram (cumulative distribution):
[Figure 4: Histogram showing the number of queries (on the y axis) having a certain number of useful recommendations (on the x axis, 0 to 18), for M1, M2 and M12. Results are evaluated automatically.]
[Figure 5: Histogram showing the total number of queries (on the y axis, 0 to 400) having at least a certain number of useful recommendations (on the x axis, 0 to 18), for M1, M2 and M12. For instance the third bucket shows how many queries ...]
• Results on M12 are always better than M1 and M2;
  • large improvement of queries with at least four good suggestions;
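The cumulative histogram of Figure 5 counts, for each bucket n, the queries that have at least n useful recommendations. A minimal sketch of that computation follows; the function name and the per-query counts are hypothetical.

```python
def cumulative_histogram(useful_counts, max_bucket):
    """hist[n] = number of queries with at least n useful recommendations."""
    return [sum(1 for c in useful_counts if c >= n)
            for n in range(max_bucket + 1)]

# Hypothetical number of useful recommendations for five queries.
useful_counts = [0, 2, 4, 4, 7]
print(cumulative_histogram(useful_counts, max_bucket=7))
# [5, 4, 4, 3, 3, 1, 1, 1]
```

Bucket 0 always equals the total number of queries, and the gap between bucket 0 and bucket 1 is the number of queries without any useful suggestion.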
61. Distributed QFG Building
• a parallel way to update QFGs: Divide-and-Conquer approach;
  • the query log is split in m parts;
  • parallel extraction of the features;
  • compressing step;
  • merging graphs;
  • final operations (normalization, pagerank, etc.);
Paper excerpt shown on the slide:
4. using the graph algebra described in [8], each partial graph is iteratively merged. Each iteration is done in parallel on the different available nodes of the cloud.
5. the final resulting data-graph is now processed with other steps [4] (normalization, chain extraction, random walk) to obtain the complete and usable QFG.
[Figure 6: Example of the building of a two mo...]
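The divide-and-conquer steps above can be sketched in a few lines. This is a simplified illustration under assumptions, not the authors' implementation: edge weights are plain transition counts instead of classifier-based chaining probabilities, merging is a sum of counts rather than the graph algebra of Bordino et al., the merge rounds run sequentially, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def build_partial(sessions):
    """Build a partial graph: counts of consecutive query pairs."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return counts

def merge_all(graphs):
    """Iteratively merge graphs two by two until a single one remains."""
    graphs = list(graphs)
    while len(graphs) > 1:
        pairs = [graphs[i:i + 2] for i in range(0, len(graphs), 2)]
        # Counter addition sums the counts of edges present in both graphs.
        graphs = [p[0] + p[1] if len(p) == 2 else p[0] for p in pairs]
    return graphs[0]

def normalize(graph):
    """Turn counts into weights that sum to 1 over each node's out-edges."""
    totals = Counter()
    for (q, _), c in graph.items():
        totals[q] += c
    return {(q, q2): c / totals[q] for (q, q2), c in graph.items()}

def build_qfg(sessions, m):
    """Split the log in m parts, build partial graphs in parallel, merge."""
    parts = [sessions[i::m] for i in range(m)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(build_partial, parts))
    return normalize(merge_all(partials))

# Toy sessions (hypothetical data).
sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
print(build_qfg(sessions, m=2))
```

Because each session lands in exactly one part and merging only sums counts, the result does not depend on m; the same merge step could also fold a graph built from fresh data into an existing one.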
64. Conclusions
• We study the effects of time on QFG-based query recommender systems;
• We built different QFGs from the AOL query log;
• we analyze the quality of recommendation;
• we show that recommendation models age;
• we introduce an “incremental” algorithm for updating the model;
• we propose a parallel/distributed way of building QFGs;
68. Future Work
• to define a strategy for merging graphs assigning different weights to each subgraph;
  • more importance to “fresh” data;
• to compare the robustness of QFG recommender systems with other query recommenders with respect to aging;
• to design a MapReduce algorithm to efficiently build and update QFG recommender systems;
69. Questions?
Thank you for your attention!
70. References
• [Boldi et al., CIKM’08]: The Query Flow Graph: model and applications. Boldi, Bonchi, Castillo, Donato, Gionis, Vigna. CIKM’08.
• [Boldi et al., WSCD’09]: Query Suggestions using Query-Flow Graphs. Boldi, Bonchi, Castillo, Donato, Vigna. WSCD’09.
• [Bordino et al., 2008]: Algebra for the joint mining of query log graphs, 2008.