Expert Finding in Social Media
Team members:
Rui Cai
Zheng Gao
Kaili Li
Interpretation of Project
• Our goal is to build an "expert finding" search engine. Search
engines such as Google and Bing give people access to numerous web
documents, but on many occasions finding the right people matters
more than finding the right documents. Our search engine is built
on Twitter data: we collect the tweets of experts in a particular
area and store them in a MySQL database. When a user types a query
into the input box, the search engine returns information about
the related experts.
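The storage step described above can be sketched as follows. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for MySQL so the example runs on its own, the table and column names (tweets, tweet_id, user_id, text) are hypothetical, the sample rows are made up, and treating all tweets of one user as a single searchable document is an assumption rather than something the slides state.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "tweet_id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL, "
    "text TEXT NOT NULL)"
)

# Made-up sample rows: (tweet_id, user_id, text).
sample_rows = [
    (1, 1, "new paper on retrieval models"),
    (2, 1, "slides about search evaluation"),
    (3, 2, "baking bread this weekend"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", sample_rows)
conn.commit()


def load_user_documents(connection):
    """Concatenate each user's tweets into one text document per user."""
    grouped = {}
    query = "SELECT user_id, text FROM tweets ORDER BY user_id, tweet_id"
    for user_id, text in connection.execute(query):
        grouped.setdefault(user_id, []).append(text)
    return {user_id: " ".join(texts) for user_id, texts in grouped.items()}


user_docs = load_user_documents(conn)
```

Each value in user_docs can then be handed to a ranking function as the text that represents one candidate expert.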
The Architecture of the Information Retrieval System
Goals
• The users will be experts or students who are interested in the
information retrieval (IR) area and want to find information
about IR experts.
• Data Input & Output
Data Collected
3 million tweets / 3,000 users / 1,000 tweets per person
Approach Steps
• Traditional Vector Space Model:
• In the traditional method, we calculate the tf and idf for the query and the
document, and from them the term weights of the query and of the document. We
then multiply the query weight by the document weight for each term and sum the
products. In the worked example, the final score of the query against the
document is 0 + 0 + 0.82 + 2.46 = 3.28. The query is scored this way against
every document in the collection, and finally all documents are ranked by
score.
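The vector space scoring described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes whitespace tokenization, raw term frequency, and idf = log10(N / df), and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(tokens, idf):
    """Weight each term by its raw term frequency times its idf."""
    counts = Counter(tokens)
    return {term: tf * idf.get(term, 0.0) for term, tf in counts.items()}


def rank(query, docs):
    """Score every document against the query, highest score first."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}
    q_weights = tf_idf_weights(query.lower().split(), idf)
    scores = {}
    for doc_id, tokens in tokenized.items():
        d_weights = tf_idf_weights(tokens, idf)
        # Sum over query terms of (query weight * document weight).
        scores[doc_id] = sum(
            weight * d_weights.get(term, 0.0) for term, weight in q_weights.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample documents keyed by a hypothetical ID.
docs = {
    1: "information retrieval search engine ranking",
    2: "cooking pasta recipes",
    3: "retrieval models and retrieval evaluation for search",
}
ranking = rank("retrieval search", docs)
```

The per-term products summed inside rank play the same role as the 0 + 0 + 0.82 + 2.46 sum in the worked example: terms missing from either side contribute 0.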
Approach Steps (cont.)
Language Model with Dirichlet smoothing:
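The slide's formula did not survive extraction, so the sketch below uses the standard query-likelihood form of Dirichlet smoothing, p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu), and scores a document by the sum of log p(w|d) over the query terms. Whether the project used exactly this form is an assumption, as are the function name, the whitespace tokenization, the value of mu, and the sample documents.

```python
import math
from collections import Counter


def dirichlet_rank(query, docs, mu=2000.0):
    """Rank documents by query log-likelihood with Dirichlet smoothing."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Collection statistics for the background model p(w|C).
    collection = Counter()
    for tokens in tokenized.values():
        collection.update(tokens)
    collection_len = sum(collection.values())

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        doc_len = len(tokens)
        score = 0.0
        for term in query.lower().split():
            p_collection = collection[term] / collection_len
            if p_collection == 0.0:
                continue  # term never seen in the collection: skip it
            p_term = (counts[term] + mu * p_collection) / (doc_len + mu)
            score += math.log(p_term)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample documents keyed by a hypothetical ID.
docs = {
    1: "information retrieval search engine ranking",
    2: "cooking pasta recipes",
    3: "retrieval models and retrieval evaluation for search",
}
# A small mu suits these tiny documents; real collections use larger values.
lm_ranking = dirichlet_rank("retrieval search", docs, mu=10.0)
```

Unlike the tf-idf product, a document that lacks a query term still receives a non-zero probability from the collection model, so no document's score collapses to zero.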
Sample retrieval scenarios:
• Enter the topic search term
• Results:
Evaluation of the retrieval results:
                                   Vector Space Model   Language Model
Result return time (nanoseconds)   67629667000          136555119000
Precision More Precision
Return Value:
Vector Space Model return:
882=236.07245706131792
3821=209.29423849205102
877=146.25158485743512
3811=133.04793510520034
3833=127.16441840114109
3386=112.38369647213892
3719=106.16029341775558
3216=91.17045922756672
3797=90.50222759464644
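To make the reported numbers easier to read, the short sketch below converts the two return times from nanoseconds to seconds and parses the returned "id=score" lines into pairs. The variable and function names are hypothetical, and what the IDs refer to is not stated on the slide, so the code treats them as opaque integers.

```python
# Return times in nanoseconds, copied from the evaluation slide.
vsm_ns = 67_629_667_000
lm_ns = 136_555_119_000

vsm_seconds = vsm_ns / 1e9
lm_seconds = lm_ns / 1e9
slowdown = lm_ns / vsm_ns  # how many times longer the language model took


def parse_results(lines):
    """Turn 'id=score' lines into (id, score) pairs, keeping their order."""
    pairs = []
    for line in lines:
        item_id, score = line.split("=")
        pairs.append((int(item_id), float(score)))
    return pairs


# The first three returned lines, copied from the slide.
returned = [
    "882=236.07245706131792",
    "3821=209.29423849205102",
    "877=146.25158485743512",
]
results = parse_results(returned)
is_descending = all(a[1] >= b[1] for a, b in zip(results, results[1:]))
```

In seconds, the vector space model took roughly 67.6 s and the language model roughly 136.6 s, so the language model run took about twice as long.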
Thank you!!
