Learning Scientific Scholar Representations Using a
Combination of Collaboration and Text Data
Ankush Khandelwal
Raksha Jalan
Bhavitha K
( IRE GROUP : 40 )
Information Retrieval and Extraction
IIIT-H Spring 2016
Problem :
❏ The aim of the project is to learn vector representations for authors
who publish scientific research papers.
❏ Authors who work in the same domain (i.e. the same research area)
should lie close together in the vector space.
❏ These representations help to categorize or cluster authors and,
further, to predict future collaborations from past data.
Introduction :
❏ Representation learning (feature learning), the transformation of raw
input data into a representation, is used to learn good vector
representations for authors.
❏ Such techniques have had great success in applications such as image
processing, speech recognition and natural language processing (NLP).
❏ The advantage is that once the vector representations are formed,
difficult network mining tasks can be solved with standard machine
learning techniques.
Dataset :
❏ The DBLP computer science bibliography contains metadata for
publications written by many authors across thousands of journals
and conference proceedings series.
❏ We used a subset of the dataset containing metadata for around
275,000 papers.
Text-processing :
Parse the dataset file to get a list of unique authors and assign each
author an id.
[Figure: snapshot of the authid file]
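The id assignment step can be sketched as below. This is a minimal illustration, not the project's actual parser: it assumes the author lists have already been extracted from the dataset file, and the names `papers` and `assign_author_ids` are hypothetical. Ids are handed out from 1 in order of first appearance, which is also an assumption.

```python
def assign_author_ids(papers):
    """Map each unique author name to an integer id, starting at 1."""
    author_ids = {}
    for authors in papers:
        for name in authors:
            if name not in author_ids:
                # Next free id; ids follow order of first appearance.
                author_ids[name] = len(author_ids) + 1
    return author_ids


# Hypothetical sample; real input would come from the parsed DBLP subset.
papers = [
    ["A. Author"],
    ["B. Author", "C. Author"],
    ["A. Author", "C. Author"],
]
author_ids = assign_author_ids(papers)
for name, idx in author_ids.items():
    print(f"{idx}\t{name}")
```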
Co-authorship Information:
Each line in the snapshot of the co-authorship file corresponds to a
paper.
The first line signifies that the author with id 1 worked on the first
paper.
The second line implies that the authors with ids 2 and 3 collaborated
on the second paper, and so on.
The mapping from author name to id is taken from the authid file
described in the previous slide.
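A co-authorship file of this shape could be produced as in the following sketch. The snapshot itself is not available here, so the space-separated layout and the function name `coauthorship_lines` are assumptions made for illustration.

```python
def coauthorship_lines(papers, author_ids):
    """Return one line per paper, listing the ids of its authors."""
    return [
        " ".join(str(author_ids[name]) for name in authors)
        for authors in papers
    ]


# Hypothetical name-to-id mapping and paper list.
author_ids = {"A. Author": 1, "B. Author": 2, "C. Author": 3}
papers = [["A. Author"], ["B. Author", "C. Author"]]

lines = coauthorship_lines(papers, author_ids)
print("\n".join(lines))
```

With this sample, the first line holds id 1 alone and the second holds ids 2 and 3, mirroring the description above.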
Author Label generation:
The tags of all the papers a particular author has worked on are collected.
The most frequently occurring tag is assigned to the author as the label.
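The labelling rule can be sketched with a frequency count per author. The data layout (`paper_authors`, `paper_tags`) and the tag names are hypothetical, and the slides do not say how ties are broken; here the tag seen first wins, which is simply how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def author_labels(paper_authors, paper_tags):
    """Assign each author the most frequent tag among their papers."""
    tag_counts = {}
    for authors, tags in zip(paper_authors, paper_tags):
        for author in authors:
            tag_counts.setdefault(author, Counter()).update(tags)
    # most_common(1) returns [(tag, count)] for the top tag.
    return {a: counts.most_common(1)[0][0] for a, counts in tag_counts.items()}


# Hypothetical sample: author ids per paper and the tags of each paper.
paper_authors = [[1], [2, 3], [1, 3], [1]]
paper_tags = [["db"], ["ml"], ["ml"], ["db"]]

labels = author_labels(paper_authors, paper_tags)
print(labels)
```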
Training Neural Network :
Input file : The input to the neural network is the refined co-authorship file, which lists,
for every author, the authors in positive and negative context with respect to that author.
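One plausible way to build such contexts is sketched below. The slides do not define positive and negative context precisely, so this sketch assumes that an author's positive context is the set of their co-authors and that negatives are sampled uniformly from authors they never collaborated with. The function name `build_contexts` and the parameter `num_negatives` are hypothetical.

```python
import random


def build_contexts(papers, num_negatives=2, seed=0):
    """Return {author: (positive_ids, negative_ids)} from per-paper id lists."""
    rng = random.Random(seed)
    positives = {}
    for authors in papers:
        if len(authors) < 2:
            continue  # a single-author paper gives no collaboration signal
        for a in authors:
            positives.setdefault(a, set()).update(x for x in authors if x != a)

    all_authors = sorted(positives)
    contexts = {}
    for a in all_authors:
        # Assumed rule: negatives are authors who never co-authored with a.
        candidates = [x for x in all_authors if x != a and x not in positives[a]]
        k = min(num_negatives, len(candidates))
        contexts[a] = (sorted(positives[a]), sorted(rng.sample(candidates, k)))
    return contexts


# Hypothetical co-authorship data: one list of author ids per paper.
papers = [[1], [2, 3], [3, 4], [5, 6]]
contexts = build_contexts(papers)
for author, (pos, neg) in contexts.items():
    print(author, "positive:", pos, "negative:", neg)
```

Author 1 appears only on a single-author paper, so it gets no entry at all in this sketch.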
Neural Network continued ..
➢ We used Torch to train the neural network.
➢ The neural network is fed the positive and negative samples and trained for 10 epochs
over the authors in the dataset, learning a vector representation for each author.
➢ The vector representations are learned so that authors in positive context end up closer
in the vector space.
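The idea of pulling positive pairs together and pushing negative pairs apart can be illustrated with the sketch below. It is not the Torch model used in the project, whose architecture and loss are not given in the slides. It assumes a single embedding table, a logistic loss on the dot product of two author vectors, and plain SGD. The learning rate, the initialisation range and the 0-based author indices are likewise assumptions; only the 10 epochs and the embedding size of 30 come from the slides.

```python
import math
import random


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_embeddings(pairs, num_authors, dim=30, epochs=10, lr=0.1, seed=0):
    """pairs holds (author, other, label); label 1 = positive, 0 = negative."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_authors)]
    order = list(pairs)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for a, b, label in order:
            va, vb = emb[a], emb[b]
            # Gradient of the logistic loss with respect to the score va . vb
            grad = sigmoid(dot(va, vb)) - label
            for i in range(dim):
                ai, bi = va[i], vb[i]
                va[i] = ai - lr * grad * bi
                vb[i] = bi - lr * grad * ai
    return emb


# Hypothetical toy data: authors 0-1 and 2-3 collaborate; all other pairs are negatives.
pairs = [(0, 1, 1), (2, 3, 1), (0, 2, 0), (0, 3, 0), (1, 2, 0), (1, 3, 0)]
emb = train_embeddings(pairs, num_authors=4)

pos_score = (dot(emb[0], emb[1]) + dot(emb[2], emb[3])) / 2
neg_score = sum(dot(emb[a], emb[b]) for a, b, y in pairs if y == 0) / 4
print(f"mean positive score {pos_score:.3f}, mean negative score {neg_score:.3f}")
```

Here the dot product serves as the similarity measure, so "closer" means a higher score; a distance-based objective would be an equally reasonable choice.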
Vector representation sample (word-embedding size=30)
1:0.12774897468519,-1.2134315799647,0.28491147244956,0.8021796034968,0.24783552528964,0.064771391008334,-0.62943657350973,
-1.5811627032589,0.50791467408229,-0.016128751957846,-0.95420926437372,0.3088518152673,
-0.18527131689276,0.95070454842939,0.60509919040003,1.3706830088368,0.59082443074081,-2.3339685239631,
-2.5307487148746,0.2078369289687,0.32913756016955,1.6364679430803,0.65293421732019,-0.66457122621034,0.28869327954787,0.64982010840204,1.8983918247831,
-0.52790655050569,0.12223315845681,0.63230901357502
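A line in this format could be read back as in the following sketch. The `id:comma-separated values` layout is inferred from the sample above rather than documented, and the function name `parse_embedding_line` is hypothetical. The example line is truncated to the first three values of the sample for brevity.

```python
def parse_embedding_line(line):
    """Split 'id:v1,v2,...' into an integer author id and a list of floats."""
    author_id, values = line.strip().split(":", 1)
    # Skip empty fields, e.g. those left by a trailing comma.
    vector = [float(v) for v in values.split(",") if v.strip()]
    return int(author_id), vector


author_id, vector = parse_embedding_line(
    "1:0.12774897468519,-1.2134315799647,0.28491147244956"
)
print(author_id, len(vector), vector)
```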
Classifying the vector Representations
The following classification techniques are used to classify the vector
representations of the authors:
A. Stochastic Gradient Descent.
B. Support Vector Machines : an RBF kernel is used and grid search is
performed.
C. Random Forest Classification.
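The three classifiers could be compared as in the sketch below. The slides do not name the library used, so scikit-learn is an assumption here, as are the parameter grid and the train/test split. The data is synthetic stand-in data (random 30-dimensional vectors with an artificial class signal), not the learned author embeddings, so the printed accuracies say nothing about the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for author embeddings: 200 "authors", 30 dims, 4 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 4, size=200)
X[np.arange(200), y] += 2.0  # shift one coordinate per class so labels are learnable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "Stochastic Gradient Descent": SGDClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Grid search over an assumed grid of RBF-kernel hyperparameters.
    "SVM (RBF kernel)": GridSearchCV(
        SVC(kernel="rbf"),
        {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
        cv=3,
    ),
}

accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracy[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {accuracy[name]:.2f}")
```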
RESULTS
Classifier                     Accuracy (%)
Stochastic Gradient Descent    10
Random Forest                  28
SVM (RBF kernel)               30
CONCLUSION
Random forest gave a mean accuracy of 28 percent, compared to 30 percent
for the SVM.
The full text of the papers could be used to learn author representations
in which authors in positive context are closer based on the semantic
content of the papers they worked on.
For negative-context author selection, considering authors at a distance
of one degree or more might also improve the accuracy.
CHALLENGES :
Authors being sparsely distributed:
Many papers had a single author; authors who did not collaborate with
any other author were ignored when preparing the input to the neural
network.
Thank You!
