Profiling a Person With Search Log Data. Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University [email_address] Interested in how much descriptive information we can generate about a person by leveraging search log data.
What Did We Find Out? We can tell quite a lot!
The State of Web Search
The Power of Search and the Web. Search is the top online activity. Search drives over 7 billion monthly queries in the U.S. Online activity has a huge impact on people's daily lives: 70 minutes less with family, 30 minutes less TV, 8.5 minutes less sleep. Sources: comScore, U.S., Feb. '06; Stanford Institute for the Quantitative Study of Society, Nov. '05.
Analysis of Search Marketplace  Holding  fairly stable  over the last year or so, albeit with some  Bing flux
Search Logs. A search log contains the trace data recorded when a person visits the search engine, submits a query, views results, etc. On one hand, logs have been criticized for not being rich enough (i.e., they record behaviors but not the 'why' factors). On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person). How much can we learn about a person from the data stored in search logs? Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and how well can we predict what they are going to do next?
An illustrative example
How much can we tell from a single query? ASIS&T is an acronym for the American Society for Information Science and Technology. There is a good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines. Leveraging demographic information: 57 percent probability female / 43 percent probability male; 66.2 percent chance the user works in the information science field; 55.6 percent probability this user has a master's degree.
How much can we tell from a single query? Leveraging demographic information (cont'd): 32.3 percent probability this user has a doctorate; 53 percent likelihood the user works in academia. Using IP, we can locate the geographical area. Based on time, we could infer that this person is searching for the conference's schedule for travel (if the query is submitted prior to the meeting) or looking for presentations or papers from the meeting (if the query is submitted after the conference). Theoretically, we can tell a lot! However, with billions of queries per month, we can't do the analysis by hand like this example. To develop user profiles, we need automated methods. Research Question: How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]
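The time-based inference on this slide can be sketched as a simple rule. This is a minimal illustration, not the method used in the study: the function name, the "during the meeting" case, and the dates in the usage note are all hypothetical.

```python
from datetime import date


def infer_conference_intent(query_date: date, start: date, end: date) -> str:
    """Guess why someone searched for a conference, from timing alone.

    start/end are the conference dates; they must come from an outside
    source, since a search log does not contain them.
    """
    if query_date < start:
        # Before the meeting: likely planning (schedule, travel).
        return "planning: schedule or travel"
    if query_date > end:
        # After the meeting: likely looking for presentations or papers.
        return "follow-up: presentations or papers"
    # During the meeting: this case is an assumption, not on the slide.
    return "attending: on-site information"
```

For example, with made-up conference dates of 6 to 11 November, a query logged on 1 November would be labeled as planning, and one logged on 20 November as follow-up.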
Specific aspects with automated methods (aspect: what it tells us; method)
Location: where the user is at; IP look-up script
Geographical interest: where the user is going; query term usage
Topical interest: what the user is interested in; tools like Open Calais
Topical complexity: how motivated the user is; n-gram pattern analysis
Content desires: informational, navigational, or transactional; binary tree, k-means clustering
Commercial intent: eCommerce related; tools like MSN adLabs
Purchase intent: getting ready to buy; session analysis
Potential to click on a link: will the user click on a link; time series analysis
Gender: demographic targeting/personalization; tools like MSN adLabs
User identification: specific user targeting; (need a whole lot of data)
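Two of the methods named above, session analysis and n-gram pattern analysis, can be sketched in a few lines. This is a rough sketch under stated assumptions: the 30-minute inactivity cutoff, the `(timestamp, query)` record layout, and all function names are hypothetical choices for illustration, not details from the talk.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed cutoff: a gap longer than this starts a new session.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(records):
    """Group one user's (timestamp, query) records into sessions.

    records must already be sorted by timestamp.
    """
    sessions = []
    for ts, query in records:
        # Compare against the timestamp of the last record in the last session.
        if sessions and ts - sessions[-1][-1][0] <= SESSION_GAP:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions


def term_ngrams(query, n=2):
    """Return the word n-grams of a query as tuples of lowercased terms."""
    terms = query.lower().split()
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def ngram_counts(queries, n=2):
    """Count word n-grams across a collection of queries."""
    counts = Counter()
    for q in queries:
        counts.update(term_ngrams(q, n))
    return counts
```

Session boundaries give the unit for purchase-intent style analysis, while frequent n-grams hint at recurring query patterns; a real pipeline would likely need a tuned cutoff and better tokenization.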
A comment about user identification. Given a group of folks who use a search engine, we can tell a lot about a person within that group from search logs (i.e., behaviors). Identifying a particular individual is much more difficult with just search logs (it probably takes ~12 to 18 months of data).
User Profiling Framework. Classify user aspects into two levels: internal and external. Internal aspects refer to attributes of the users themselves. External aspects relate to the behavior or interest of the users. Interaction between internal and external aspects: we can infer external aspects from internal aspects, and external aspects reflect internal aspects.
Thank you! (open for questions and further discussion) Jim Jansen College of Information Sciences and Technology  The Pennsylvania State University  [email_address]
Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
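As an illustration of enriching a log, the sketch below takes a record with a few common fields and derives additional ones. The field names (`user_id`, `results_page`, `extra`) and the derived fields are hypothetical; actual search logs differ by engine.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LogRecord:
    user_id: str          # hypothetical: e.g. an IP address or cookie ID
    timestamp: datetime   # when the query was submitted
    query: str            # the query text as typed
    results_page: int = 1  # which results page was viewed
    extra: dict = field(default_factory=dict)  # derived (enriched) fields


def enrich(record: LogRecord) -> LogRecord:
    """Add simple derived fields computed from the common fields."""
    record.extra["query_length"] = len(record.query.split())  # terms in query
    record.extra["hour_of_day"] = record.timestamp.hour
    record.extra["is_weekend"] = record.timestamp.weekday() >= 5  # Sat or Sun
    return record
```

Derived fields like these cost little to compute and feed directly into the session and time-based analyses; richer enrichment (for example, geolocation from IP) would need external data.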
Back
Back

More Related Content

What's hot

50320140501002
5032014050100250320140501002
50320140501002
IAEME Publication
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
Institute of Contemporary Sciences
 
Www04 -rose
Www04 -roseWww04 -rose
209
209209
Data mining on Social Media
Data mining on Social MediaData mining on Social Media
Data mining on Social Media
home
 
Data mining for social media
Data mining for social mediaData mining for social media
Data mining for social media
rangesharp
 
Ref22: Searchers Academy 2.0 Redux
Ref22: Searchers Academy 2.0 ReduxRef22: Searchers Academy 2.0 Redux
Ref22: Searchers Academy 2.0 Redux
Ahniwa Ferrari
 
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLPA NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
ijnlc
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
Todd Rutherford
 
Influence of Timeline and Named-entity Components on User Engagement
Influence of Timeline and Named-entity Components on User Engagement Influence of Timeline and Named-entity Components on User Engagement
Influence of Timeline and Named-entity Components on User Engagement
Roi Blanco
 
Information retrieval system!
Information retrieval system!Information retrieval system!
Information retrieval system!
Jane Garay
 
Data Analytics Capstone
Data Analytics CapstoneData Analytics Capstone
Data Analytics Capstone
Macemann
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorial
Thengo Kim
 
Neigh october2012
Neigh october2012Neigh october2012
Neigh october2012
Melanie Parlette-Stewart
 
Secondary source qual
Secondary source qualSecondary source qual
Secondary source qual
Manikandan844955
 
Data mining in social network
Data mining in social networkData mining in social network
Data mining in social network
akash_mishra
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
RESHAN FARAZ
 
How to be successful with search in your organisation
How to be successful with search in your organisationHow to be successful with search in your organisation
How to be successful with search in your organisation
voginip
 
CRJS250 Carsuso Criminology Research Paper Guide
CRJS250 Carsuso Criminology Research Paper GuideCRJS250 Carsuso Criminology Research Paper Guide
CRJS250 Carsuso Criminology Research Paper Guide
HVCClibrary
 

What's hot (19)

50320140501002
5032014050100250320140501002
50320140501002
 
Our digital traces and how they can be missuseed
Our digital traces and how they can be missuseedOur digital traces and how they can be missuseed
Our digital traces and how they can be missuseed
 
Www04 -rose
Www04 -roseWww04 -rose
Www04 -rose
 
209
209209
209
 
Data mining on Social Media
Data mining on Social MediaData mining on Social Media
Data mining on Social Media
 
Data mining for social media
Data mining for social mediaData mining for social media
Data mining for social media
 
Ref22: Searchers Academy 2.0 Redux
Ref22: Searchers Academy 2.0 ReduxRef22: Searchers Academy 2.0 Redux
Ref22: Searchers Academy 2.0 Redux
 
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLPA NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
 
Influence of Timeline and Named-entity Components on User Engagement
Influence of Timeline and Named-entity Components on User Engagement Influence of Timeline and Named-entity Components on User Engagement
Influence of Timeline and Named-entity Components on User Engagement
 
Information retrieval system!
Information retrieval system!Information retrieval system!
Information retrieval system!
 
Data Analytics Capstone
Data Analytics CapstoneData Analytics Capstone
Data Analytics Capstone
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorial
 
Neigh october2012
Neigh october2012Neigh october2012
Neigh october2012
 
Secondary source qual
Secondary source qualSecondary source qual
Secondary source qual
 
Data mining in social network
Data mining in social networkData mining in social network
Data mining in social network
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
 
How to be successful with search in your organisation
How to be successful with search in your organisationHow to be successful with search in your organisation
How to be successful with search in your organisation
 
CRJS250 Carsuso Criminology Research Paper Guide
CRJS250 Carsuso Criminology Research Paper GuideCRJS250 Carsuso Criminology Research Paper Guide
CRJS250 Carsuso Criminology Research Paper Guide
 

Viewers also liked

Stormwater Utilities: A regional and national perspective on planning and imp...
Stormwater Utilities: A regional and national perspective on planning and imp...Stormwater Utilities: A regional and national perspective on planning and imp...
Stormwater Utilities: A regional and national perspective on planning and imp...
OHM Advisors
 
Performance is the new normal 20120426-preso
Performance is the new normal 20120426-presoPerformance is the new normal 20120426-preso
Performance is the new normal 20120426-preso
PERFORMENSATION
 
Map of WWII Europe theatre
Map of WWII  Europe theatreMap of WWII  Europe theatre
Map of WWII Europe theatre
Patricia Guzman
 
Bni 2013 presentation
Bni 2013 presentationBni 2013 presentation
Bni 2013 presentation
Darren Dowdell
 
Sim House Example Dogwood
Sim House Example   DogwoodSim House Example   Dogwood
Sim House Example Dogwood
yog_live
 
Cv L.S.Bhandary Eng
Cv L.S.Bhandary EngCv L.S.Bhandary Eng
Cv L.S.Bhandary Eng
lbhandary
 
Sunday Streets Bpag Presentation 1
Sunday Streets   Bpag Presentation 1Sunday Streets   Bpag Presentation 1
Sunday Streets Bpag Presentation 1
gcantori
 
Green Stormwater: LID with GIS
Green Stormwater: LID with GISGreen Stormwater: LID with GIS
Green Stormwater: LID with GIS
OHM Advisors
 
Adventures in freemium
Adventures in freemiumAdventures in freemium
Adventures in freemium
Navin Ganeshan
 
Cartoons Innovation Dynamics writeshop
Cartoons Innovation Dynamics writeshopCartoons Innovation Dynamics writeshop
Cartoons Innovation Dynamics writeshop
hobrie
 
Indy 2009
Indy 2009Indy 2009
Indy 2009
rlantz
 
Linha 0i - Comparativo e opções
Linha 0i - Comparativo e opçõesLinha 0i - Comparativo e opções
Linha 0i - Comparativo e opções
Prestus®
 
Anais
AnaisAnais
Anais
unama
 
Linha Vivo - Comparativo e opções
Linha Vivo - Comparativo e opçõesLinha Vivo - Comparativo e opções
Linha Vivo - Comparativo e opções
Prestus®
 
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesseslesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
Jim Jansen
 
Rosa Et Al. 2010
Rosa Et Al. 2010Rosa Et Al. 2010
Rosa Et Al. 2010
sabrinarosa
 
I luv hongkong行程终极篇
I luv hongkong行程终极篇I luv hongkong行程终极篇
I luv hongkong行程终极篇
CHIN HUILING
 
Jjansen networked consumer_2011
Jjansen networked consumer_2011Jjansen networked consumer_2011
Jjansen networked consumer_2011
Jim Jansen
 
Impressionism
ImpressionismImpressionism
Impressionism
Patricia Guzman
 
Cold war (1)
Cold war (1)Cold war (1)
Cold war (1)
Patricia Guzman
 

Viewers also liked (20)

Stormwater Utilities: A regional and national perspective on planning and imp...
Stormwater Utilities: A regional and national perspective on planning and imp...Stormwater Utilities: A regional and national perspective on planning and imp...
Stormwater Utilities: A regional and national perspective on planning and imp...
 
Performance is the new normal 20120426-preso
Performance is the new normal 20120426-presoPerformance is the new normal 20120426-preso
Performance is the new normal 20120426-preso
 
Map of WWII Europe theatre
Map of WWII  Europe theatreMap of WWII  Europe theatre
Map of WWII Europe theatre
 
Bni 2013 presentation
Bni 2013 presentationBni 2013 presentation
Bni 2013 presentation
 
Sim House Example Dogwood
Sim House Example   DogwoodSim House Example   Dogwood
Sim House Example Dogwood
 
Cv L.S.Bhandary Eng
Cv L.S.Bhandary EngCv L.S.Bhandary Eng
Cv L.S.Bhandary Eng
 
Sunday Streets Bpag Presentation 1
Sunday Streets   Bpag Presentation 1Sunday Streets   Bpag Presentation 1
Sunday Streets Bpag Presentation 1
 
Green Stormwater: LID with GIS
Green Stormwater: LID with GISGreen Stormwater: LID with GIS
Green Stormwater: LID with GIS
 
Adventures in freemium
Adventures in freemiumAdventures in freemium
Adventures in freemium
 
Cartoons Innovation Dynamics writeshop
Cartoons Innovation Dynamics writeshopCartoons Innovation Dynamics writeshop
Cartoons Innovation Dynamics writeshop
 
Indy 2009
Indy 2009Indy 2009
Indy 2009
 
Linha 0i - Comparativo e opções
Linha 0i - Comparativo e opçõesLinha 0i - Comparativo e opções
Linha 0i - Comparativo e opções
 
Anais
AnaisAnais
Anais
 
Linha Vivo - Comparativo e opções
Linha Vivo - Comparativo e opçõesLinha Vivo - Comparativo e opções
Linha Vivo - Comparativo e opções
 
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesseslesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
lesson_03 Setting up Adwords Accounts, Adwords, and Selecting Businesses
 
Rosa Et Al. 2010
Rosa Et Al. 2010Rosa Et Al. 2010
Rosa Et Al. 2010
 
I luv hongkong行程终极篇
I luv hongkong行程终极篇I luv hongkong行程终极篇
I luv hongkong行程终极篇
 
Jjansen networked consumer_2011
Jjansen networked consumer_2011Jjansen networked consumer_2011
Jjansen networked consumer_2011
 
Impressionism
ImpressionismImpressionism
Impressionism
 
Cold war (1)
Cold war (1)Cold war (1)
Cold war (1)
 

Similar to Profiling a Person With Search Log Data

Search Analytics: Diagnosing what ails your site
Search Analytics:  Diagnosing what ails your siteSearch Analytics:  Diagnosing what ails your site
Search Analytics: Diagnosing what ails your site
Louis Rosenfeld
 
Search Analytics for Fun and Profit
Search Analytics for Fun and ProfitSearch Analytics for Fun and Profit
Search Analytics for Fun and Profit
Louis Rosenfeld
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
A survey on various architectures, models and methodologies for information r...
A survey on various architectures, models and methodologies for information r...A survey on various architectures, models and methodologies for information r...
A survey on various architectures, models and methodologies for information r...
IAEME Publication
 
Search Analytics for Content Strategists
Search Analytics for Content StrategistsSearch Analytics for Content Strategists
Search Analytics for Content Strategists
Louis Rosenfeld
 
CS8080 IRT UNIT I NOTES.pdf
CS8080 IRT UNIT I  NOTES.pdfCS8080 IRT UNIT I  NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdfCS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information ArchitectureUsing Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
Louis Rosenfeld
 
Summary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by BroderSummary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by Broder
Bhavesh Singh
 
Tallink
TallinkTallink
Search Analytics: Diagnosing what ails your site
Search Analytics:  Diagnosing what ails your siteSearch Analytics:  Diagnosing what ails your site
Search Analytics: Diagnosing what ails your site
Louis Rosenfeld
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customers
richwig
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinar
Jim Jansen
 
G017415465
G017415465G017415465
G017415465
IOSR Journals
 
Web analytics presentation
Web analytics presentationWeb analytics presentation
Web analytics presentation
Jim Jansen
 
Search Analytics: Powerful diagnostics for your site
Search Analytics:  Powerful diagnostics for your siteSearch Analytics:  Powerful diagnostics for your site
Search Analytics: Powerful diagnostics for your site
Louis Rosenfeld
 
Ac02411221125
Ac02411221125Ac02411221125
Ac02411221125
ijceronline
 
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Cataldo Musto
 
Information Search
Information SearchInformation Search
Information Search
allerhed
 

Similar to Profiling a Person With Search Log Data (20)

Search Analytics: Diagnosing what ails your site
Search Analytics:  Diagnosing what ails your siteSearch Analytics:  Diagnosing what ails your site
Search Analytics: Diagnosing what ails your site
 
Search Analytics for Fun and Profit
Search Analytics for Fun and ProfitSearch Analytics for Fun and Profit
Search Analytics for Fun and Profit
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
A survey on various architectures, models and methodologies for information r...
A survey on various architectures, models and methodologies for information r...A survey on various architectures, models and methodologies for information r...
A survey on various architectures, models and methodologies for information r...
 
Search Analytics for Content Strategists
Search Analytics for Content StrategistsSearch Analytics for Content Strategists
Search Analytics for Content Strategists
 
CS8080 IRT UNIT I NOTES.pdf
CS8080 IRT UNIT I  NOTES.pdfCS8080 IRT UNIT I  NOTES.pdf
CS8080 IRT UNIT I NOTES.pdf
 
CS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdfCS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdf
 
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information ArchitectureUsing Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
 
Summary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by BroderSummary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by Broder
 
Tallink
TallinkTallink
Tallink
 
Search Analytics: Diagnosing what ails your site
Search Analytics:  Diagnosing what ails your siteSearch Analytics:  Diagnosing what ails your site
Search Analytics: Diagnosing what ails your site
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customers
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinar
 
G017415465
G017415465G017415465
G017415465
 
Web analytics presentation
Web analytics presentationWeb analytics presentation
Web analytics presentation
 
Search Analytics: Powerful diagnostics for your site
Search Analytics:  Powerful diagnostics for your siteSearch Analytics:  Powerful diagnostics for your site
Search Analytics: Powerful diagnostics for your site
 
Ac02411221125
Ac02411221125Ac02411221125
Ac02411221125
 
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
Semantics-aware Techniques for Social Media Analysis, User Modeling and Recom...
 
Information Search
Information SearchInformation Search
Information Search
 

More from Jim Jansen

Networked Consumers: How networked and how important?
Networked Consumers:  How networked and how important?Networked Consumers:  How networked and how important?
Networked Consumers: How networked and how important?
Jim Jansen
 
Twitter and EWOM Branding
Twitter and EWOM BrandingTwitter and EWOM Branding
Twitter and EWOM Branding
Jim Jansen
 
Lesson_04_ist402_google_adwords_02
Lesson_04_ist402_google_adwords_02Lesson_04_ist402_google_adwords_02
Lesson_04_ist402_google_adwords_02
Jim Jansen
 
Lesson 15 When Where To Show Your Ads
Lesson 15 When Where To Show Your AdsLesson 15 When Where To Show Your Ads
Lesson 15 When Where To Show Your Ads
Jim Jansen
 
Lesson 13 Writing Good Ads 02
Lesson 13 Writing Good Ads 02Lesson 13 Writing Good Ads 02
Lesson 13 Writing Good Ads 02
Jim Jansen
 
Lesson 11 Writing Good Ads
Lesson 11 Writing Good AdsLesson 11 Writing Good Ads
Lesson 11 Writing Good Ads
Jim Jansen
 
Lesson 07 Ist402 Keywords Take 02
Lesson 07 Ist402 Keywords Take 02Lesson 07 Ist402 Keywords Take 02
Lesson 07 Ist402 Keywords Take 02
Jim Jansen
 
Lesson 06 Ist402 Keywords 02
Lesson 06 Ist402 Keywords 02Lesson 06 Ist402 Keywords 02
Lesson 06 Ist402 Keywords 02
Jim Jansen
 
Lesson 05 Three Course Requirements
Lesson 05 Three Course RequirementsLesson 05 Three Course Requirements
Lesson 05 Three Course Requirements
Jim Jansen
 
Ist402 Google Marketing Challenge V02
Ist402 Google Marketing Challenge V02Ist402 Google Marketing Challenge V02
Ist402 Google Marketing Challenge V02
Jim Jansen
 
What Is Log Analyis
What Is Log AnalyisWhat Is Log Analyis
What Is Log Analyis
Jim Jansen
 

More from Jim Jansen (11)

Networked Consumers: How networked and how important?
Networked Consumers:  How networked and how important?Networked Consumers:  How networked and how important?
Networked Consumers: How networked and how important?
 
Twitter and EWOM Branding
Twitter and EWOM BrandingTwitter and EWOM Branding
Twitter and EWOM Branding
 
Lesson_04_ist402_google_adwords_02
Lesson_04_ist402_google_adwords_02Lesson_04_ist402_google_adwords_02
Lesson_04_ist402_google_adwords_02
 
Lesson 15 When Where To Show Your Ads
Lesson 15 When Where To Show Your AdsLesson 15 When Where To Show Your Ads
Lesson 15 When Where To Show Your Ads
 
Lesson 13 Writing Good Ads 02
Lesson 13 Writing Good Ads 02Lesson 13 Writing Good Ads 02
Lesson 13 Writing Good Ads 02
 
Lesson 11 Writing Good Ads
Lesson 11 Writing Good AdsLesson 11 Writing Good Ads
Lesson 11 Writing Good Ads
 
Lesson 07 Ist402 Keywords Take 02
Lesson 07 Ist402 Keywords Take 02Lesson 07 Ist402 Keywords Take 02
Lesson 07 Ist402 Keywords Take 02
 
Lesson 06 Ist402 Keywords 02
Lesson 06 Ist402 Keywords 02Lesson 06 Ist402 Keywords 02
Lesson 06 Ist402 Keywords 02
 
Lesson 05 Three Course Requirements
Lesson 05 Three Course RequirementsLesson 05 Three Course Requirements
Lesson 05 Three Course Requirements
 
Ist402 Google Marketing Challenge V02
Ist402 Google Marketing Challenge V02Ist402 Google Marketing Challenge V02
Ist402 Google Marketing Challenge V02
 
What Is Log Analyis
What Is Log AnalyisWhat Is Log Analyis
What Is Log Analyis
 

Recently uploaded

Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
SAI KAILASH R
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Zilliz
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz
 
Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecaseIntegrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55
 
Figma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdfFigma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdf
Management Institute of Skills Development
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
Steven Carlson
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 

Profiling a Person With Search Log Data

  • 1. Profiling a Person With Log Data. Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University, [email_address]. Interested in how much descriptive information we can generate about a person by leveraging search log data.
  • 2. What Did We Find Out? We can tell quite a lot!
  • 3. The State of Web Search
  • 4. The Power of Search and the Web: Search is the top online activity. Search drives over 7 billion monthly queries in the U.S. Online activity has a huge impact on people’s daily lives: 70 minutes less with family, 30 minutes less TV, 8.5 minutes less sleep. Sources: comScore, U.S., Feb. ’06; Stanford Institute for the Quantitative Study of Society, Nov. ’05.
  • 5. Analysis of Search Marketplace: Holding fairly stable over the last year or so, albeit with some Bing flux.
  • 6. Search Logs: Search logs contain the trace data recorded when a person visits the search engine, submits a query, views results, etc. On one hand, logs have been criticized for not being rich enough (i.e., they record behaviors but not the ‘why’ factors). On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person). How much can we learn about a person from the data stored in search logs? Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and how well can we predict what they are going to do next?
  • 8. How much can we tell from a single query? ASIS&T is an acronym for the American Society for Information Science and Technology. There is a good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines. Leveraging demographic information: 57 percent probability the user is female / 43 percent male; 66.2 percent chance the user works in the information science field; 55.6 percent probability this user has a master’s degree.
  • 9. How much can we tell from a single query? Leveraging demographic information (cont’d): 32.3 percent probability this user has a doctorate; 53 percent likelihood the user works in academia. Using the IP address, we can locate the geographical area. Based on time, we could infer that this person is searching for the conference’s schedule for travel (if the query is submitted prior to the meeting) or looking for presentations or papers from the meeting (if the query is submitted after the conference). Theoretically, we can tell a lot! However, with billions of queries per month, we can’t do the analysis by hand as in this example. To develop user profiles, we need automated methods. Research Question: How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]
  • 10. Specific aspects with automated methods (aspect: what it tells us; method)
      - Location: where the user is; IP look-up script
      - Geographical interest: where the user is going; query term usage
      - Topical interest: what the user is interested in; tools like Open Calais
      - Topical complexity: how motivated the user is; n-gram pattern analysis
      - Content desires: informational, navigational, or transactional; binary tree, k-means clustering
      - Commercial intent: eCommerce related; tools like MSN adLabs
      - Purchase intent: getting ready to buy; session analysis
      - Potential to click on a link: whether the user will click on a link; time series analysis
      - Gender: demographic targeting/personalization; tools like MSN adLabs
      - User identification: specific user targeting; (need a whole lot of data)
  • 11. A comment about user identification: Given a group of folks who use a search engine, we can tell a lot about a person within that group from search logs (i.e., behaviors). Identifying a particular individual is much more difficult with just search logs (it probably takes ~12 to 18 months of data).
  • 12. User Profiling Framework: Classify user aspects into two levels, internal and external. Internal aspects refer to attributes of the users themselves. External aspects relate to the behavior or interests of the users. The two levels interact: we can infer external aspects from internal aspects, and external aspects reflect internal aspects.
  • 13. Thank you! (open for questions and further discussion) Jim Jansen College of Information Sciences and Technology The Pennsylvania State University [email_address]
  • 14. Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
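
The idea of enriching a log with additional fields can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the slides name only time, queries, and results as common fields, so `user_id`, `results_clicked`, and the derived fields in `extra` are hypothetical, as is the sample record.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LogRecord:
    # Field names are hypothetical; the slides only mention time, queries, results.
    user_id: str               # assumed anonymized identifier (e.g. cookie or IP hash)
    time: datetime             # when the query was submitted
    query: str                 # raw query string
    results_clicked: int = 0   # assumed count of results the user clicked
    # Derived fields added by enrich(); not part of the raw log.
    extra: dict = field(default_factory=dict)


def enrich(record: LogRecord) -> LogRecord:
    """Add simple derived fields computed from the record itself."""
    terms = record.query.lower().split()
    record.extra["terms"] = terms
    record.extra["query_length"] = len(terms)
    record.extra["hour_of_day"] = record.time.hour
    return record


# Made-up sample record, for illustration only.
rec = enrich(LogRecord("u1", datetime(2010, 1, 1, 14, 30), "ASIS&T annual meeting"))
print(rec.extra)
```

Derived fields such as term lists and hour of day are the kind of inputs the query-term and time-based inferences from the earlier slides would need, though the slides do not say which fields were actually added.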
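
Slide 10 lists session analysis as the method for purchase intent but does not say how sessions are built. A common first step is to split each user's queries into sessions wherever there is a long pause. The sketch below assumes a 30-minute inactivity cutoff and a simple `(user_id, time, query)` row layout; both are illustrative choices, not taken from the slides, and the sample rows are made up.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, not from the slides


def sessionize(rows):
    """Group (user_id, time, query) rows into per-user sessions.

    A new session starts whenever the gap between two consecutive
    queries from the same user exceeds SESSION_GAP.
    """
    sessions = []
    # Sort by user, then by time, so groupby sees each user's rows together.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, user_rows in groupby(ordered, key=lambda r: r[0]):
        current = []
        last_time = None
        for _, time, query in user_rows:
            if last_time is not None and time - last_time > SESSION_GAP:
                sessions.append((user, current))
                current = []
            current.append(query)
            last_time = time
        sessions.append((user, current))
    return sessions


# Made-up sample rows, for illustration only.
rows = [
    ("u1", datetime(2010, 1, 1, 9, 0), "conference schedule"),
    ("u1", datetime(2010, 1, 1, 9, 5), "conference hotel"),
    ("u1", datetime(2010, 1, 1, 13, 0), "buy laptop bag"),
    ("u2", datetime(2010, 1, 1, 9, 1), "cheap flights"),
]
sessions = sessionize(rows)
for user, queries in sessions:
    print(user, queries)
```

Once queries are grouped this way, per-session features (number of queries, reformulations, shifts toward transactional terms) can be computed; which features indicate purchase intent is left open here, since the slides do not specify them.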