GENDER DETECTION IN BLOGS
Presented By (Team No. 32)
Nitish Jain (201301227)
Ganesh Borle (201505587)
Vamshikrishna Reddy (201202177)
Mentored By
Lokesh Walase
IRE [CSE474]
The Big Picture
ABSTRACT
● Through the sands of time, textual content has remained a prominent feature of internet media, especially blogs.
● A lot of content brings a lot of responsibility.
● But the internet cannot take responsibility for all of that content; the responsibility should lie with the author.
● Thus, author profiling and attribution become an important task, and we try to capture one aspect of it: gender.
The Question
Given the text of a blog, can we identify whether the writer is male or female?
WHO IS THE AUTHOR?
OUR APPROACH
THE APPROACH
● We take advantage of the linguistic features of the blog and create a feature file.
● Various classifiers are then trained on this feature file, producing one model per classifier.
● An ensemble is applied to these models, and the input document is classified as written by a male or a female author.
WORKFLOW
The Dataset
● Koppel's blog dataset
● Contains about 19 thousand documents
● Each document contains the text of about 35 blog posts in XML format.
[Dataset Link: http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm]
PARSING
● Language used: Python
● Each blog entry is stored in XML format:
<Blog>
<date>....... </date>
<post>
….
</post>
...
</Blog>
● Each blog filename contains the name and gender of the author.
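A minimal sketch of the parsing step, assuming well-formed XML and a dot-separated filename whose second field is the gender. The filename layout, the sample text, and the helper names `extract_posts` and `gender_from_filename` are assumptions for illustration; the slides only state that the filename carries the author's name and gender.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the <Blog>/<date>/<post> layout shown above.
SAMPLE = """<Blog>
<date>01,January,2004</date>
<post>First post text.</post>
<date>02,January,2004</date>
<post>Second post text.</post>
</Blog>"""


def extract_posts(xml_text):
    """Return the stripped text of every <post> element in one blog file."""
    root = ET.fromstring(xml_text)
    return [(post.text or "").strip() for post in root.iter("post")]


def gender_from_filename(filename):
    """Read the label from the filename.

    Assumed layout: "<id>.<gender>.<...>.xml" (not confirmed by the slides).
    """
    parts = filename.split(".")
    return parts[1].lower()


posts = extract_posts(SAMPLE)
label = gender_from_filename("1234.female.25.xml")
```

Real corpus files may contain characters that a strict XML parser rejects, so a production parser would likely need some cleaning before `ET.fromstring` is called.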
The Feature Extraction
FEATURES
For our task of gender identification, we use the following linguistic features:
● Character-Based Features
● Word-Based Features
● Syntactic Features
● Structural Features
● Function Words
● POS Start Probability
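The feature families above can be sketched as a function that maps raw text to a dictionary of numbers. The slides do not list the individual features, so every feature name below and the short `FUNCTION_WORDS` list are illustrative assumptions. Syntactic and POS-based features are left out because they need a part-of-speech tagger.

```python
import re

# Illustrative subset only; a real function-word list is much longer.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "i", "you")


def extract_features(text):
    """Compute a few character, word, structural and function-word features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = len(text)
    n_words = len(words)

    features = {
        # Character-based
        "char_count": n_chars,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        # Word-based
        "word_count": n_words,
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "vocab_richness": len(set(words)) / n_words if n_words else 0.0,
        # Structural
        "sentence_count": len(sentences),
        "avg_sentence_len": n_words / len(sentences) if sentences else 0.0,
    }
    # Function words: relative frequency of each listed word
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = words.count(fw) / n_words if n_words else 0.0
    return features


row = extract_features("The cat sat. The dog ran!")
```

One such dictionary per document, written out row by row, would give the kind of feature file the approach describes.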
The Classification
THE CLASSIFICATION TASK
For the classification task, we tried several classification algorithms and arrived at a model that uses an ensemble of the following:
● Random Forest Classifier
● Neural Networks Classifier
● AdaBoost Tree Classifier
● Gradient Boosting Classifier
● Bagging Classifier
THE CLASSIFICATION TASK
For each classifier:
● We fed it partial feature sets to see how accuracy varies with the features.
● We applied 10-fold cross-validation to measure accuracy.
To measure the accuracy of the ensemble, we took the majority class among the individual classifiers' predictions.
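The 10-fold procedure can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's code: the helper names are hypothetical, the folds are contiguous (a real run would normally shuffle first), the data must contain at least k samples, and `fit_majority` is only a stand-in baseline learner.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, X, y, k=10):
    """Mean accuracy over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Stand-in learner: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label
```

Each sample is used for testing exactly once, so the averaged score does not depend on a single lucky or unlucky split.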
RANDOM FOREST CLASSIFIER
● A meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset.
● The Random Forest Classifier achieved an accuracy of 69.79%.
NEURAL NETWORKS CLASSIFIER
● Consists of multiple layers of nodes, with each layer fully connected to the next; each node is a neuron with a non-linear activation function.
● Uses a supervised learning technique called backpropagation to train the network.
● The Neural Networks Classifier achieved an accuracy of 69.51%.
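As a rough illustration of "fully connected layers with a non-linear activation", here is a forward pass through a tiny network. The 2-3-1 layout, the weights, and the choice of a sigmoid activation are all assumptions made for the example; the slides do not describe the network's architecture, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    """Non-linear activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Propagate input vector x through fully connected layers.

    layers: list of (weights, biases); weights[j] is the row of weights
    feeding output node j of that layer.
    """
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


# Hypothetical 2-3-1 network with made-up weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
score = forward([1.0, 0.5], layers)[0]  # a value strictly between 0 and 1
```

For a two-class problem the single output can be thresholded at 0.5 to pick a class.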
ADABOOST TREE CLASSIFIER
● A meta estimator that begins by fitting a classifier on the original dataset and then fits additional classifiers on the same dataset, with the weights of misclassified instances increased so that later rounds focus on the harder cases.
● The AdaBoost Tree Classifier achieved an accuracy of 69.57%.
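The reweighting between rounds can be sketched with the textbook discrete AdaBoost update for labels in {-1, +1}. This is the standard formulation, shown here as an assumption about what happens inside the classifier; the function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One reweighting round of discrete AdaBoost.

    Returns (alpha, new_weights): alpha is the vote weight of this round's
    classifier, new_weights are the normalised sample weights for the next round.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # Correct samples (t * p == +1) shrink, misclassified ones (t * p == -1) grow.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new)
    return alpha, [w / norm for w in new]


alpha, new_weights = adaboost_round(
    [0.25, 0.25, 0.25, 0.25],
    [1, 1, -1, -1],
    [1, 1, -1, 1],  # the last sample is misclassified
)
```

After the update the misclassified samples together carry half of the total weight, which is what forces the next classifier to attend to them.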
GRADIENT BOOSTING CLASSIFIER
● Builds the model in a forward stage-wise fashion.
● At each subsequent stage, a new weak classifier is introduced to compensate for the shortcomings of the existing weak learners; these shortcomings are identified by the gradients.
● The Gradient Boosting Classifier achieved an accuracy of 70.81%.
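The stage-wise idea can be illustrated with a small sketch that fits each new weak learner to the residuals of the current model. To keep it short, it assumes squared loss (whose negative gradient is the residual), one-dimensional inputs, and single-threshold stumps as weak learners; a classifier would normally use a log-loss instead. All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump on 1-D inputs (least squares)."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all inputs identical: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Add one stump per stage, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_stages):
        # Negative gradient of squared loss = what the model still gets wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


model = gradient_boost([1, 2, 3, 4], [0, 0, 1, 1])
```

With a learning rate of 0.5 each stage corrects half of the remaining error on this toy data, so the predictions approach the targets geometrically.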
BAGGING CLASSIFIER
● A meta estimator that fits base classifiers, each on a random subset of the dataset, and then aggregates their individual predictions.
● The Bagging Classifier achieved an accuracy of 70.03%.
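A minimal sketch of bagging, assuming bootstrap samples (drawn with replacement) as the random subsets and a 1-nearest-neighbour rule on one-dimensional inputs as the base classifier. Both choices, and the helper names, are assumptions for illustration only.

```python
import random
from collections import Counter


def fit_1nn(X_train, y_train):
    """Stand-in base classifier: label of the nearest training point."""
    pairs = list(zip(X_train, y_train))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bagging_fit(fit_base, X, y, n_estimators=10, seed=0):
    """Train n_estimators base models on bootstrap samples and vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


X = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
y = ["female", "female", "female", "male", "male", "male"]
bagged = bagging_fit(fit_1nn, X, y, n_estimators=25, seed=0)
```

Because every base model sees a slightly different sample, their individual mistakes tend to differ, and the vote smooths them out.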
THE ENSEMBLE
● An ensemble takes the outputs of the other classifiers and applies majority voting to them to determine the final output.
● Using the ensemble model on the classifiers discussed above, we achieved an accuracy of 71.10%.
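Majority voting over the five classifiers can be sketched as below. The prediction lists are made-up toy data, not the project's results, and the helper names are hypothetical. With an odd number of voters and two classes there can be no ties.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per classifier, all of equal length.

    Returns the most common label for each document.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["male", "female", "male", "female"]
# Toy outputs of five classifiers; each one makes exactly one mistake.
per_classifier = [
    ["female", "female", "male", "female"],
    ["male", "male", "male", "female"],
    ["male", "female", "female", "female"],
    ["male", "female", "male", "male"],
    ["female", "female", "male", "female"],
]
ensemble = majority_vote(per_classifier)
```

In this toy case each classifier scores 0.75 while the vote recovers every label, because the mistakes fall on different documents. The gain shrinks when classifiers make the same errors.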
FINAL RESULTS
THE FINAL RESULTS
● By using the ensemble, we were able to increase our accuracy by nearly 1% in each case, irrespective of the performance of the individual classifiers.
● The maximum accuracy observed during the experiments was 73.19%, achieved by the Ensemble model.
Maximum accuracy achieved: 73.188406%
USEFUL LINKS
● GitHub - https://github.com/nitishjain2007/Gender_Identification
● YouTube -
● SlideShare -
● Website - http://nitishjain2007.github.io/Gender_Identification/
● Dropbox -
Thanks!
Any questions?
