The document discusses Elizabeth Murnane's research focusing on understanding the nuanced dimensions of digital footprints, including semantics, personality, emotion, and behavior. She aims to design intelligent systems that can better interpret these dimensions from user-generated data on search queries, social media, mobile sensing, and other online activities. Some of her projects analyze semantic, psychological, and behavioral dimensions to improve information retrieval, knowledge sharing, and personal informatics applications. Her RESLVE project specifically seeks to develop computational methods to more reliably understand the intended meanings of ambiguous entities in social web data.
Social Search: A Little Help From My Friends | Brynn Evans
These are the slides from the SxSW'10 panel on social search with Max Ventilla (@ventilla), Ash Rust (@ashrust), Scott Prindle (@prindlescott), Marc Vermut (@mvermut), and me!
The Wall has Come Down: Integrating our Online and Offline Worlds (IoT / Wear... | Noz Urbina
Thesis: There are no longer discrete online and offline worlds. Holding onto this idea is hurting our communications.
In this session we take a look at the medium and long-term implications of wearable devices and the internet of things. Walking through a journey of realisation across 2 years, we'll go through what it means to content when you take omnichannel, wearable devices and the internet of things and put them together in one integrated ecosystem.
Screens are shrinking and working in tandem; connectivity is marching on towards ubiquity. Eventually there comes a point where the online world or ‘digital space’ and our real-life day-to-day will integrate so seamlessly that differentiating them will seem antiquated.
What does that mean to communication and content? What happens when Moore’s Law applies to our lives? What is the impact on information, technology and eventually culture? How should communications change to cope with a life of constantly accelerating change?
We will address these questions and more.
The Internet is Everywhere – So What's Changed? [Noz Urbina, DITA EU 2013] | Noz Urbina
The word “internet” is 30 years old, the actual networks even older. Email is nearly 40 years old. We now live in a world where professional-and-parenting-age adults have never known a World Without Web. But what has the impact been? This generation, and the internet user population as a whole, is consuming content in wildly different ways. Each new experience immediately sets new expectations for the future, creating a snowball effect. This session will look at that snowball, try to demonstrate just how enormous it truly is, and discuss how DITA content helps us address a new crop of user expectations. We will look at the true scale of the changes in culture and expectations that impact communication, real-world scenarios where users and products will operate differently, and why DITA is ideal for addressing the new challenges.
The wall falls down: Integrating our online and offline worlds [Confab 2015] | Noz Urbina
[Confab version of my keynote talk]
There are no longer discrete online and offline worlds. Holding onto this idea is hurting our communications.
In this session, we will take a look at communications that seamlessly blend physical and digital experiences. When you take omnichannel, wearable devices, and the internet of things—and put them together in one integrated ecosystem for users—the dividing line disappears.
What do we gain when we fully integrate online and offline? How should communications change to cope with a life of constantly accelerating change? We’ll look at examples and techniques that can help prepare for this new paradigm.
[soap Keynote] The Freedom to Grow: how standards facilitate the techcomm ind... | Noz Urbina
Standards, either in the XML sense or simply communication best practices, help grow, accelerate and “professionalise” an industry. Where would construction be without material standards for width and strength, or certification for specific skills? How could we have transportation without standards for traffic and processes? Standards are what help ad-hoc processes become enterprise-class, and scale beyond our expectations.
Technical communication is in an era of rapid, disruptive and revolutionary change. The true nature of the challenge is understood by a few, and pros and cons of potential solutions by even fewer. The future therefore will require that we work together to exchange knowledge as best we can to help each other hit the many moving targets. We must do this because our old techniques and processes just can’t keep up, and no one organisation has the time or funds to re-invent every solution on their own. Standards help an organisation with little funds tackle larger challenges, and larger organisations implement profound change with reduced risk. The alternative is potentially getting left behind as the industry and community rush forward.
YouTube: http://bit.ly/20110630-change-youtube
Share how three changes have impacted pastoral care:
The Gospel remains unchanged, the way we communicate changes;
The time constraint remains unchanged, the way we spend time changes;
The desire to share remains unchanged, the way we share changes.
分享三個轉變給牧養工作帶來的挑戰:
福音不變,但溝通模式不再一樣;
時間限制不變,但時間運用不再一樣;
分享的冀望不變,但分享方式不再一樣。
Social Communications: Getting Prepared and Making it Happen | Morris County NJ
With the proliferation of social channels, where should public libraries be and why should they be there? What should they post, and how can they build public interest in their social networking? This presentation includes suggestions for effective use of social communications.
Netnography: Overview and How to (Schulich School of Business, MBA class, Soc... | elpinchito
This is the slide deck used for the 'Netnography: Overview & How-to' presentation on Feb. 15, 2012. The presentation (watch the YouTube video below) was part of the class assignments for the "Social Media Marketing" class taught by Robert Kozinets at Schulich School of Business, York University. The presentation explores, with specific examples, why netnography is useful for marketing research and what researchers have to keep in mind.
The video on the first slide is a teaser for this presentation.
The link to the recorded presentation: https://www.youtube.com/watch?v=UWApBu2ERTU&context=C31c1b83ADOEgsToPDskJO-DQt8ZUtzIA-tdvMiOHd
Designing an Online Civic Participation Platform: Socio-Computational Support... | Elizabeth Murnane
Attracting new members to online communities and encouraging substantive participation is an open research problem at the intersection of behavioral and social sciences, computer science, and human computer interaction. eGovernment communities, which are intended to promote civic engagement and deliberative discussion, particularly struggle with challenges of motivation and under-contribution since barriers to entry can be significant and the personal benefits and impacts of participation may be non-obvious when compared with the costs in terms of time and effort. This presentation describes projects undertaken as part of the Cornell eRulemaking Initiative (CeRI) to address such issues. In particular, it details development of our civic engagement platform and the socio-computational supports we have designed for intelligently routing work, recommending content during deliberative writing, providing personalized moderation, and facilitating consensus-building and collaboration.
Unraveling Abstinence and Relapse: Smoking Cessation Reflected in Social Media | Elizabeth Murnane
We study the behavior-change process of smoking cessation by leveraging longitudinal data from Twitter. Finding behavioral, social, and emotional differences between abstinence and relapse, we provide implications for designing technology-based interventions.
This presentation is designed as an introduction to information visualization and aims to provide details about:
- Key ideas and techniques related to the creation and critique of visualizations
- What levers visualizations help us pull as designers
- Why visualizations are useful and how they relate to user goals
- Various motivations, trade-offs, and responsibilities surrounding visualizations
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text | Elizabeth Murnane
These are the presentation slides for the paper "RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text", which was named the Best Paper at the Web of Linked Entities (WoLE'13) workshop at the 22nd International World Wide Web Conference (WWW'13). The paper's abstract is below, along with a link to the full paper.
Abstract:
We address the Named Entity Disambiguation (NED) problem for short, user-generated texts on the social Web. In such settings, the lack of linguistic features and sparse lexical context result in a high degree of ambiguity and sharp performance drops of nearly 50% in the accuracy of conventional NED systems. We handle these challenges by developing a model of user-interest with respect to a personal knowledge context; and Wikipedia, a particularly well-established and reliable knowledge base, is used to instantiate the procedure. We conduct systematic evaluations using individuals' posts from Twitter, YouTube, and Flickr and demonstrate that our novel technique is able to achieve substantial performance gains beyond state-of-the-art NED methods.
Full Paper: http://arxiv.org/abs/1304.2401
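As a rough illustration of the idea in the abstract, the sketch below picks the candidate meaning of an ambiguous mention that best matches a user's interests. It is not the paper's actual model: the candidate labels, the topic sets, the interest profile, and the use of Jaccard similarity as the score are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(candidates, user_interests):
    """Return the candidate label whose topics best overlap the user's interests.

    candidates: dict mapping a candidate meaning to a set of topic strings.
    user_interests: set of topic strings describing the user (hypothetical).
    """
    return max(candidates, key=lambda label: jaccard(candidates[label], user_interests))


# Hypothetical candidate meanings for the ambiguous mention "office".
candidates = {
    "Office (workplace)": {"work", "business", "building"},
    "The Office (TV show)": {"television", "comedy", "sitcom"},
}
# Hypothetical interest profile, e.g. derived from a user's past activity.
user_interests = {"television", "comedy", "film"}

best = disambiguate(candidates, user_interests)
print(best)
```

In a real system the topic sets would come from a knowledge base such as Wikipedia, as the abstract describes, rather than from hand-written sets.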
Similar to Noticing the Nuance: Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions of our digital footprints
Open-source intelligence (OSINT) is intelligence collected from publicly available sources. In the intelligence community (IC), the term "open" refers to overt, publicly available sources (as opposed to covert or clandestine sources); it is not related to open-source software or public intelligence.
Presentation of the project "Mapping historical networks. Building the Biographical / prosopographical Information System (APIS)" at the congress Europa baut auf Biographien in Wien / Vienna.
How can we mine, analyse and visualise the Social Web?
In this lecture, you will learn about mining social web data for analysis, including data preparation and gathering basic statistics on your data.
Talk at a Data Journalism BootCamp organised by ICFJ, World Bank Group and African Media Initiative in New Delhi to a group of 60 journalists, coders and social sector folks. Other amazing sessions included those from Govind Ethiraj of IndiaSpend, Andrew from BBC, Parul from Google, Nasr from HacksHacker, Thej from DataMeet and David from Code for Africa. http://delhi.dbootcamp.org/
Webinar presented to Cognizant Analytics Audiences on Context, Narratives & Big Data Analytics. This presentation draws from my previous presentation made at a conference on Narratives & Big Data.
This is the group presentation (MIC - Made in China) for the client Headway UK, which is a national and local charity looking after people with head injuries.
Does your performing arts organization need help getting found online? Are you maximizing the digital opportunities for connecting with your potential audience and ticket purchasers? Get an overview of the ways digital marketing can help with this goal, with a deeper dive into search and recommendation engines and the opportunity presented by Wikidata.
This presentation was developed and delivered as part of the Linked Digital Future Initiative. For more information, visit: https://linkeddigitalfuture.ca/resources/workshops/
Privacy, Ethics, and Future Uses of the Social Web | Matthew Russell
A presentation to the Owen Graduate School of Management (Vanderbilt University) about social media and some of the technology behind the future uses of social media that are likely to shape the future of the Web as we know it.
The goal of this presentation is to help researchers understand the possibilities of social media as a research area in fields related to NLP/IR/DM.
Technology Trends Social Media, a copy of the presentation delivered by Alistair Leathwood, Managing Director, Freshminds Research from the CIM East of England Summer Marketing Conference held on 9 June 2011 at ARU, Chelmsford
LinkedIn is the premier professional social network with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is their unique data. While most organizations consider data as a service function, LinkedIn considers data a cornerstone of their product portfolio.
To rapidly develop these products LinkedIn leverages a number of technologies including open source, 3rd party solutions, and some we've had to invent along the way.
This LinkedIn talk at the NYC Hadoop Meetup held 3/18 at ContextWeb focused on best practices for quickly uncovering patterns, visualizing trends, and generating actionable insights from large datasets.
Cortana intelligence suite for projects & hacks | Lee Stott
Microsoft Cortana Intelligence Suite: the perfect selection of APIs and tools for student UG and Master's projects, and the best toolkits for hackathons.
Noticing the Nuance: Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions of our digital footprints
1. NOTICING THE NUANCE: Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions of our digital footprints
Elizabeth L. Murnane
elm236@cornell.edu
www.cs.cornell.edu/~elm236/
2. ABOUT ELIZABETH
Currently
• 3rd year PhD at Cornell Information Science
• Committee: Profs. Dan Cosley (chair), Claire Cardie, Geri Gay
Research
• Personalization; IR/NLP; Personal Informatics; Affective-, Semantic-, Social-Computing
• 2011 NSF Graduate Research Fellow
Background
• 2007 MIT S.B. in Mathematics with Computer Science
• Co-founded MIT CSAIL startup
3. USER-CENTRIC DATA
• Explicit & Implicit
• User-generated content
• Sensor data
• Big Data & Big Personal Data (“Little Data”)
7. DIGITAL FOOTPRINTS
• Search Queries
• Social web, microblogs, media sharing
• Mobile sensing, personal informatics, life-logging, check-ins
8. DIGITAL FOOTPRINTS
• Search Queries
• Social web, microblogs, media sharing
• Mobile sensing, personal informatics, life-logging, check-ins
• Social networking
9. NUANCED DIMENSIONS OF DATA
• Semantics
  • Helping machines extract intended meaning from an individual’s content
• Personality & Emotion
  • Helping machines interpret psychological, affective, and subjective characteristics of users and their data
• Behavior
  • Helping machines understand the dynamics of both private and interpersonal activities
13. THE RESLVE PROJECT
• Gain better understanding of challenges machines face in understanding semantic meaning of social Web data
• Use those insights to develop more advanced computational methods that can more reliably make sense of this data
Information Retrieval | Knowledge Sharing | Personal Informatics
19-20. TASK DEFINITION
Named Entity Recognition (NER)
• Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)
Named Entity Disambiguation (NED)
• Resolving the intended meaning of ambiguous entities from multiple candidate meanings
21-24. AMBIGUOUS ENTITIES
• aaahh one more day until finn!!! #cantwait
• office holiday party
• Beetle
25-30. Footage: office holiday party
Footage:
• Workplace?
• TV Show?
  • US Version?
  • UK Version?
Episode 4
office, december 3
31. ANALYSIS
Data Sample
• Twitter: tweets
• YouTube: video titles, descriptions
• Flickr: photo tags, titles, descriptions
32. TEXT LENGTH
• Longest utterances still shorter than even shortest texts from NER task corpora like Reuters-21578, Brown Corpus
[Figure: text length distributions for Twitter, YouTube, Flickr, Reuters, and Brown]
34. HIGH AMBIGUITY
• NER services have low confidence
• Many potential candidates (2 to 163, avg. 5-6, median 4)
[Figure: confidence scores on a 0 to 1 scale for Wikipedia Miner and DBPedia Spotlight]
35. HIGH AMBIGUITY
• 91% of utterances contain at least 1 ambiguous entity
• 2/3 of entities detected are ambiguous
• Almost no entities without at least 2 senses to disambiguate
36-37. CHALLENGES & FOCUS
• Short Length
• Sparse Lexical Context
• Noisy
• Highly personal in nature
38-42. LIMITATIONS OF EXTANT RESEARCH
Tweets severely degrade traditional techniques
• Stanford NER: F1 drops 90% → 46%
• DBPedia Spotlight & Wikipedia Miner: P@1 < 40%
Recent strategies
• Crowd-sourcing
  • Limitation: Dependent on reliable human workers
• Automated attempts
  • Limitation: Focus on NER not NED
  • Limitation: Generalizability beyond Twitter?
43-44. HYPOTHESES
• User has core interests
  • User more likely to mention an entity about a topic relevant to personal interests than mention a topic of non-interest
• User expresses these interests consistently in content she posts online in multiple communities
• Can use a semantic knowledge base to formally represent these topics of interest
  • Wikipedia
    • Articles, categories effectively represent topic
    • Compatible with NER toolkits (DBPedia Spotlight, Wikipedia Miner)
Article editing behavior ≈ interests
45-47. QUALITATIVE ANALYSIS: STABLE INTERESTS
A user’s topics of contribution similar across Web:
• Same categories: On average, 52.4% of entities a user mentions in social Web (e.g., “Java”) have at least 1 candidate sense in same parent category of Wikipedia article same user edited (e.g., “Programming language”). If extend to just 4 parents up category hierarchy, get all 100%.
• Same topic:
  • Ambiguous YouTube post: office, december 3
  • Same user’s recent Wikipedia edit: <item userid="xxxx" user="xxxx" pageid="31841130" title="The Office (U.S. season 8)"/>
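The category-overlap check behind the 52.4% and 100% figures could be sketched roughly as follows. This is only an illustration under stated assumptions: `PARENTS` is a hypothetical toy category graph (not real Wikipedia data), and `ancestors_within` and `shares_category` are names invented here, not part of RESLVE.

```python
from collections import deque

# Hypothetical toy category graph: node -> list of parent categories.
PARENTS = {
    "Java (programming language)": ["Programming languages"],
    "Java (island)": ["Islands of Indonesia"],
    "Python (programming language)": ["Programming languages"],
    "Programming languages": ["Computing"],
    "Islands of Indonesia": ["Geography of Indonesia"],
}

def ancestors_within(node, max_hops):
    """Return all categories reachable from `node` in at most `max_hops` parent steps."""
    seen = set()
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not climb past the hop limit
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return seen

def shares_category(candidate_sense, edited_articles, max_hops=1):
    """True if the candidate sense and any edited article share a category within `max_hops`."""
    candidate_categories = ancestors_within(candidate_sense, max_hops)
    return any(
        candidate_categories & ancestors_within(article, max_hops)
        for article in edited_articles
    )
```

Raising `max_hops` corresponds to walking further up the category hierarchy, as in the "4 parents up" observation on the slide.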
48. STRATEGY
➢ Bridge user identity between social Web and knowledge base, K
➢ Model interests using K’s organizational scheme
➢ Rank entity senses according to relevance to interests
49-53. EXPLORING A PERSONALIZED SOLUTION
Individual-centric approach to NED
Incorporates external, user-specific semantic data
Model personal interests with respect to this information
Determine user’s likely intended meaning of ambiguous entity based on similarity between potential meanings and interests
RESLVE: Resolving Entity Sense by LeVeraging Edits
Personal Context
54-58. IMPLEMENTATION: THE RESLVE SYSTEM
RESLVE (Resolving Entity Sense by LeVeraging Edits) addresses NED by:
I. Connecting social Web + Wikipedia editor identity
II. Modeling topics of interests using article edits
III. Ranking entity candidates by personal relevance
[Figure: RESLVE pipeline. User utterances (unstructured short texts) pass through a pre-processor, Wikipedia Miner, and DBPedia Spotlight to give detected entities and candidate meanings ("m"). The username links to user-contributed structured documents, which feed the user interest model. Phases: I. Bridging user identity; II. Modeling user interest; III. Ranking candidates by personal relevance. Output: top-ranked personally relevant candidates.]
59-60. PHASE 1: BRIDGING WEB IDENTITIES
• Connect identity of social media user with Wikipedia editor
• Simple string matching (Iofciu, 2011; Perito, 2011)
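A minimal sketch of what "simple string matching" between accounts might look like. The slides do not say how usernames are normalized, so the case-folding and whitespace stripping here are assumptions, and `normalize` and `bridge_identity` are hypothetical helper names.

```python
def normalize(username):
    """Strip surrounding whitespace and fold case (an assumed normalization)."""
    return username.strip().casefold()

def bridge_identity(social_username, wikipedia_editors):
    """Return the Wikipedia editor name equal to the social username after normalization, or None."""
    target = normalize(social_username)
    for editor in wikipedia_editors:
        if normalize(editor) == target:
            return editor
    return None
```

Exact matching keeps false links rare at the cost of missing users who vary their handle across sites.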
61-62. PHASE 2: REPRESENTING USERS AND ENTITIES
Models user’s topics of interest using bridged Wiki account’s editing-history
Compares similarity of those topics to topic associated with candidate sense
Content-based & knowledge-graph based similarity
63. MODELING A KNOWLEDGE CONTEXT
Knowledge base, K
K=(N,E)
2 node types: Categories, Topics
[Figure: example graph with nodes c1, c2, c3, c4, t1, t2, t3, d1, d2, d3]
64. USER INTEREST MODEL
• Editing a description signals interest in associated topic
• Topic nodes: all topics user edited description of
• Category nodes: categories reachable in knowledge graph from those topics
• Edge weight = inverse of shortest path length
[Table: weights for topics t1, t2, t3 (rows) against categories c1, c2, c3, c4 (columns); only some 0 and 1 entries are legible in the transcript]
• Same representation for candidates
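The weighting rule above (edge weight = inverse of shortest path length) could be computed with a breadth-first search, sketched below. The graph in `EDGES` is a hypothetical toy example and does not reproduce the slide's table; treating unreachable categories as weight 0 is also an assumption, as are the function names.

```python
from collections import deque

# Hypothetical toy knowledge graph: each node maps to the categories directly above it.
EDGES = {
    "t1": ["c2"],
    "t2": ["c2", "c4"],
    "c2": ["c1"],
    "c4": ["c3"],
}

def path_lengths(start):
    """Breadth-first search: fewest hops from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def interest_matrix(edited_topics, categories):
    """Map each edited topic to {category: 1 / shortest path length}, with 0.0 if unreachable."""
    matrix = {}
    for topic in edited_topics:
        dist = path_lengths(topic)
        matrix[topic] = {
            c: (1.0 / dist[c] if c in dist and dist[c] > 0 else 0.0)
            for c in categories
        }
    return matrix
```

The same function can be applied to a candidate sense's topic, which matches the "same representation for candidates" bullet.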
65. PHASE 2: REPRESENTING USERS AND ENTITIES
Weighted vectors used to represent user and candidate sense
66. PHASE 3: RANKING BY PERSONAL RELEVANCE
Output highest scoring candidate as intended meaning by measuring:
$\mathrm{sim}(u,m) = \alpha \cdot \mathrm{sim}_{\mathrm{content}}(u,m) + (1-\alpha) \cdot \mathrm{sim}_{\mathrm{category}}(u,m)$
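The scoring formula on this slide can be illustrated with a short sketch. The slides do not name the underlying similarity function, so cosine similarity over sparse weighted vectors is an assumption here, as are the dictionary layout, the default `alpha=0.5`, and the function names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def score(user, candidate, alpha=0.5):
    """sim(u, m) = alpha * sim_content(u, m) + (1 - alpha) * sim_category(u, m)."""
    return (alpha * cosine(user["content"], candidate["content"])
            + (1 - alpha) * cosine(user["category"], candidate["category"]))

def rank_candidates(user, candidates, alpha=0.5):
    """Return candidate names sorted from most to least personally relevant."""
    return sorted(candidates, key=lambda name: score(user, candidates[name], alpha), reverse=True)
```

The first element of the ranked list plays the role of the "highest scoring candidate" that is output as the intended meaning.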
67-68. PRE-PROCESSING & PREPARATION MODULES
69. EXPERIMENT
Labeling correct entity meaning
• 1545 valid ambiguous entities
• Mechanical Turk Categorization Masters
• Averaged observed agreement across all coders and items = 0.866
• Average Fleiss Kappa = 0.803
• 918 unanimously labeled ambiguous entities
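For readers unfamiliar with the agreement statistic reported above, here is a minimal sketch of the standard Fleiss' kappa computation. The input layout (a count matrix of items by categories) and the function name are choices made for this illustration; the slides do not describe how the reported average was aggregated.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of coders who assigned item i to category j.

    Every item must be rated by the same number of coders (at least 2),
    and the ratings must not all fall into a single category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Observed agreement for each item, then averaged over items.
    per_item = [
        (sum(count * count for count in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_item) / n_items
    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(share * share for share in shares)
    return (p_observed - p_expected) / (1 - p_expected)
```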
70-71. PERFORMANCE
Metric
• Precision at rank 1 (P@1)
Methods of comparison
• Human annotated gold standard
• RC: Randomly sorted candidates
• PF: Prior frequency
• RU: RESLVE given a random Wikipedia user's interest model
• DS: DBPedia Spotlight
• WM: Wikipedia Miner
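Precision at rank 1 can be computed as sketched below. The data layout and the choice to count an entity with no ranked candidates as a miss are assumptions for this illustration, not details given in the slides.

```python
def precision_at_1(rankings, gold):
    """Fraction of gold-labeled entities whose top-ranked candidate is the correct sense.

    rankings: entity id -> list of candidate senses, best first
    gold:     entity id -> correct sense
    Entities with no ranking count as misses (an assumption of this sketch).
    """
    if not gold:
        return 0.0
    hits = 0
    for entity, correct_sense in gold.items():
        ranked = rankings.get(entity, [])
        if ranked and ranked[0] == correct_sense:
            hits += 1
    return hits / len(gold)
```

Each comparison method (RC, PF, RU, DS, WM, and RESLVE itself) would supply its own `rankings` against the same gold labels.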
73-75. RESULTS
• Best performance on YouTube texts (longest) due to content-based sim
• Outperforms on more personal text (e.g., tweets)
• Less effective on impersonal text (e.g., photo geo-tags)
  • High prior frequency so standard methods suffice
  • Personally-unfamiliar topics so not likely to make Wiki edits about them
  • Stable interests assumption breaks down here
78. SENTIMENT BASED SEARCH
• Zip codes of 10 most populated cities, 10 least populated cities, 10 random cities across the country
• 54,015 places across 1500 US cities
• Movie theaters, hotels, spas, stores, restaurants, etc.
83. CHALLENGES FOR USERS
• Interpreting mixed reviews
• Confidence in reviewer’s subjective opinions
• Reading multiple reviews
84. RESEARCH QUESTIONS
• Language and rating
  • How does a place’s rating relate to the language used in its reviews?
• Personality and rating
  • Do people with similar personalities tend to like or dislike the same places?
• Search interfaces
  • How can we rank search results in order to recommend places according to how appealing their atmosphere is likely to be to a user based on her personality and mood?
85. STRATEGY
• Extract features from reviews using Linguistic Inquiry and Word Count (LIWC) and MRC Psycholinguistic Database
• Support vector models trained by Mairesse algorithm to derive Big Five personality types of reviewers
• Average personality score of reviewers of a place who rated the place 5 or higher/lower as proxy for people who like/dislike a location’s essence
92. CERI: CORNELL E-RULEMAKING INITIATIVE
• Law School
• Legal Information Institute (LII)
• Information Science
• Computer Science
93-94. BACKGROUND
• Rulemaking: process federal agencies use to create regulations (called “rules”)
• e-Rulemaking: the use of digital technologies during this process
• Regulations.gov, RegulationRoom.org: online communities that allow people to learn about, discuss, and react to proposed rules during e-Rulemaking process
95. PARTICIPATION PATTERNS
• Regulations.gov
  • 14,000 rules
  • 2 million comments
• Regulation Room
  • 5 live rules
  • 1,318 comments
• Common problem: under-contribution
97. PARTICIPATION PATTERNS
[Figures: Frequency of comments per rule; Comments per rule across agencies]
98. CHALLENGE
• A major goal of eRulemaking is to increase public participation across a broad audience and make the process more representative
• A major challenge is sustained participation by multiple actors across rules
99. SOLUTION
A solution is to bring new users to an e-rule
• Twitter is a popular medium where people express views and ideas
• Identify and target Twitter users who may be interested in contributing feedback on a rule
100. EXPERIMENT
• How useful is Twitter content for drawing inferences about people’s interests and knowledgeability about a topic?
• Are users who create content about topics relevant to an e-rule more likely to engage in related e-Rulemaking processes if targeted with requests for participation?
101. Step 1: Identify Subjects
User: D1 = bio, D2 = tweets, D3 = bio+tweets (“combo”)
Rule: Query q = words in rule + query expansion*
Document term matrix
• Similarity between query and each document
• Highest score used to assign user to condition (Bio, Tweet, Combo, Control)
* via Google Keyword Tool, which provides less technical words used by public to discuss same topics
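The matching step above could look roughly like the following sketch. The slides specify a document term matrix and a query-to-document similarity but not the weighting or the similarity measure, so raw term counts and cosine similarity are assumptions; how the Control condition is assigned is not described and is not modeled. All function names are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts (one row of a document term matrix)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def assign_condition(query, bio, tweets):
    """Score D1 (bio), D2 (tweets), D3 (combo) against the query; return the best and all scores."""
    q = term_vector(query)
    joined_tweets = " ".join(tweets)
    documents = {
        "bio": bio,
        "tweet": joined_tweets,
        "combo": bio + " " + joined_tweets,
    }
    scores = {name: cosine(q, term_vector(text)) for name, text in documents.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch `query` would hold the rule's words plus the expansion terms described on the slide.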
102. Step 2: Send Tweets
• Highest ranked users in each group sent an outreach tweet
103. Step 3: Measure Response
• Engagement (retweets, replies, and follows)
• Click Through Rate
• Contributed to the rule
104-105. PSYCHOLOGICAL TRAITS OF EFFECTIVE CONTRIBUTORS
• Connecting psychological traits, language use, and contribution capability
• Classification, Outreach, and Task Routing
• Inventories
  • Self-efficacy & self-esteem
  • Big 5 personality
  • Self-regulation & self-monitoring
  • Trendsetting & Opinion Leadership
  • Pro-social & altruistic value orientations
106-107. COMPUTATIONAL SUPPORTS FOR KNOWLEDGE SHARING
• Meaningful games to teach community norms
• Personalized rule recommendation
• Providing assistance, prompts, and examples to improve the quality of contributions
108-109. RESEARCH PROJECTS
Computational Problem: Information Retrieval
• Dimensions Mined: Semantic, Psychological
• Projects: RESLVE, Sentiment-based search
Computational Problem: Knowledge Sharing
• Dimensions Mined: Psychological, Behavioral
• Projects: CeRI (Outreach, Task routing, Commenting interface)
Computational Problem: Personal Informatics
• Dimensions Mined: Semantic, Psychological
• Projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
110-111. REMINISCENCE
• Current tools are too technically focused
  • Emphasize data capture and logging (photos, videos, scanned documents)
  • Treat memories as information to be later manipulated
• But the activity of reminiscence is actually...
  • Imprecise
  • Social
  • Nuanced
112. SMART PENSIEVE: WHAT MAKES A MEMORY MEANINGFUL?
• Content type
  • Photos, wall posts, status updates, event information
• Social dynamics
  • Tie strength, kind of relationship, amount of interaction
• Temporal features
  • Recent, distant past
121. SMOKING CESSATION
• Leading cause of preventable death & leading form of chemical dependence in U.S.
• 44 million smokers in the U.S. alone (1/5 of population)
• 68.8% report they want to quit and over 50% have tried for at least 1 day in the past year
• Relapse common & a minority permanently abstain
122-123. INTERVENTION
• Requires tailoring to individual conditions
• Lack of long term patient assessment & follow-up
• Access and affordability are obstacles
• Technology based interventions have major shortcomings
  • Low adherence to established guidelines
  • Not personalized
  • Unable to handle user struggles and setbacks
124. FACTORS INFLUENCING OUTCOME
• Personal, psychological, emotional traits
• Behaviors & activities
• Environment and social interactions
• Cessation motivations and process
125. LEVERAGING DIGITAL FOOTPRINTS
• Naturally expressed language
• Content is posted spontaneously and regularly
• Social setting
• Low-cost, large-scale, longitudinal data access
126-129. MAKE A PREDICTION
• General illness + coughing + wheezing = Today I quit smoking.
• Just saw a cigarette commercial with people with holes in their throat. It's official. No more cigarettes.
• Today, I quit smoking. My son came home with an ashtray he made in arts and crafts class. FML
• i’m cool, day 4 no cigs but my mom smokes, i stay with her, does not respect me trying to quit :
• I quit smoking on Sunday evening. Day 3 today. I feel exhausted, annoyed, bored. But the fight must go on. Keep fighting :)
• somebody is getting punched in the f***ing mouth today. #coldturkey
130. METHODOLOGY & DATA COLLECTION
• Identify smokers
  • Query Twitter firehose for cessation event tweets
  • Sample 2000 users
  • 3 Mechanical Turkers per tweet for verification
• 2 years worth of tweets per verified smoker (1 year before cessation event, 1 year after)
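Splitting each verified smoker's timeline into the year before and the year after the cessation event might be done as in this sketch. The tuple layout, the 365-day window, and the decision to place the cessation tweet itself in the "after" set are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def split_by_cessation(tweets, cessation_time, window_days=365):
    """Split (timestamp, text) pairs into those before and after the cessation event.

    Only tweets within `window_days` of the event are kept; the event tweet
    itself is counted as "after" (an assumption of this sketch).
    """
    window = timedelta(days=window_days)
    before = [t for t in tweets if cessation_time - window <= t[0] < cessation_time]
    after = [t for t in tweets if cessation_time <= t[0] <= cessation_time + window]
    return before, after
```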
131-135. MEASURES
Activity variables
• Tweet volume, burstiness, frequency
Social variables
• Friends, followers, tweets with @mentions, unique mentions
Personal & Emotional variables
• Location, sentiment intensity
Behavior Change Process variables
• Cessation date, motive to quit, treatment, stages of behavior change
136. RESPONSE VARIABLES
Outcome: Survival / Relapse
Survivors
Congratulations to me, still smoke free :)
@username nope i don’t smoke anymore first few weeks were hard but I haven’t craved a cig in months
Relapsers
Day 26: Broke down and bought a pack of smokes last weekend. Smoked the last one today.
Well, tried to quit smokin tobacco but..had a fucked up day
So day 3 of not smoking is about to get cut short..i can’t do it lol
137. ALIGNMENT WITH CDC REPORTS
Location, Gender, Abstinence Rates
Gender:
         Men  Women
CDC      54%  46%
Twitter  59%  41%
141. RESULTS
• Survivors (S) and Relapsers (R)
• Before (B) and After (A) the cessation point
142. SIGNIFICANT DIFFERENCES: ACTIVITY
         Tweets before  Tweets after  Burst before  Burst after  Freq before  Freq after
FAIL     1243           3551          10.119        10.943       3.56         2.704
SUCCEED  412            771           4.459         4.278        9.906        11.254
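To make the before/after contrast in the tweet-volume columns easier to read, the relative change for each group can be computed directly from the figures on the slide. This is a small arithmetic sketch: the helper name is hypothetical, and the slide does not say whether the figures are means, medians, or totals.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# (tweets before, tweets after) as reported on the slide.
tweet_volume = {"FAIL": (1243, 3551), "SUCCEED": (412, 771)}

changes = {
    group: percent_change(before, after)
    for group, (before, after) in tweet_volume.items()
}

for group, change in changes.items():
    print(f"{group}: {change:+.1f}% tweet volume after the cessation point")
```

On these figures the FAIL group's volume rises by roughly 186% and the SUCCEED group's by roughly 87%.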
143. TIME OF DAY
“im really considering smoking tonight bcause im so stressed”
144. TIME OF DAY
“outside the club and guy beside me smoking makes me wanna”
“im really considering smoking tonight bcause im so stressed”
145. SIGNIFICANT DIFFERENCES: SOCIAL
         Friends before  Friends after  Followers before  Followers after
FAIL     .093            .073           .074              .064
SUCCEED  .187            .207           .114              .125
“Starting the patch today. Everyone please support me on the road to quitting smoking”
“Ok I started a really big challenge yesterday... I quit smoking! I may need some help from you guys in the upcoming days/weeks”.
146. SIGNIFICANT DIFFERENCES: SOCIAL
         Friends before  Friends after  Followers before  Followers after
FAIL     .093            .073           .074              .064
SUCCEED  .187            .207           .114              .125
Day 2 of not smoking #bittersweet
I quit smoking yesterday and everyone is pissing me off!
Day 3 without a cig. Ooo I'm about to shoot someone
151. SUMMARY & CONCLUSION
• Advance our understanding of what our digital footprints reveal about us as humans
• Develop new computational techniques that can make sense of and utilize this data’s nuanced semantic, psychological, and behavioral dimensions
• Apply the resulting intelligent systems across multiple domains in order to help people use digital information and have meaningful experiences with technology
152. THANK YOU!
• Questions, comments, and guidance welcome!
Elizabeth L. Murnane
elm236@cornell.edu
www.cs.cornell.edu/~elm236/