4. 2
Problem statement
Vitaly Khudobakhshov, 2016
• Suppose some users don’t set their birth date or gender (the default value problem)
• or set wrong values (e.g. by mistake)
6. 4
Social Graph Analysis
Social Graph
• Is represented as an adjacency list
• user -> [(user0, label0), (user1, label1),…]
• Social graph is an undirected graph with labeled edges
• An edge may have multiple labels (classmates, parents, etc.)
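A minimal sketch of the adjacency-list representation above, assuming edges arrive as (user, user, label) triples. The function name, user names, and the "colleague" label are illustrative, not taken from the slides.

```python
from collections import defaultdict

def build_social_graph(edges):
    """Build user -> [(neighbor, label), ...] from (u, v, label) triples."""
    graph = defaultdict(list)
    for u, v, label in edges:
        # The graph is undirected, so store each labeled edge in both directions.
        graph[u].append((v, label))
        graph[v].append((u, label))
    return dict(graph)

# Hypothetical toy data: one pair of users is connected under two labels.
edges = [
    ("john", "aaron", "classmate"),
    ("john", "aaron", "colleague"),
    ("john", "sara", "relationship"),
]
graph = build_social_graph(edges)
```

An edge with several labels simply appears several times in the list, once per label.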
7. 5
User’s Graph
What is a User’s Graph?
• User’s graph is the graph induced by a star-shaped tree centered on the user
• user -> [(user0, label0), (user1, label1),…]
[Figure: John’s graph. Nodes: John, John’s Mother, John’s Father, John’s Girlfriend, Aaron, David, Sara]
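One way to read "induced by a star-shaped tree" is: take the user and all direct neighbors, then keep every edge of the social graph between those nodes. A sketch under that assumption (the function name and toy data are hypothetical):

```python
def user_graph(graph, user):
    """Return the subgraph induced by `user` and the user's direct neighbors."""
    nodes = {user} | {neighbor for neighbor, _ in graph.get(user, [])}
    return {
        node: [(nbr, label) for nbr, label in graph.get(node, []) if nbr in nodes]
        for node in nodes
    }

# Hypothetical toy social graph in the adjacency-list form used above.
graph = {
    "john": [("aaron", "classmate"), ("david", "classmate"), ("mother", "parent")],
    "aaron": [("john", "classmate"), ("david", "classmate"), ("zoe", "colleague")],
    "david": [("john", "classmate"), ("aaron", "classmate")],
    "mother": [("john", "parent")],
    "zoe": [("aaron", "colleague")],
}
johns_graph = user_graph(graph, "john")
```

Here "zoe" is not John's neighbor, so she and her edge to Aaron are left out, while the Aaron-David edge is kept.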
8. 6
Social Graph Analysis
Local Properties of User’s Graph
• Number of friends
• Connected components
• Number of triangles
• and so on
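The properties listed above could be computed roughly as follows. This sketch assumes that components are counted among the friends with the user removed (otherwise the user connects everything), and that "triangles" means triangles passing through the user; both are assumptions, as are all names.

```python
def local_properties(graph, user):
    """Compute friend count, friend components, and triangles through `user`."""
    friends = {nbr for nbr, _ in graph.get(user, [])}
    # Adjacency among friends only; the user is removed.
    adj = {
        f: {n for n, _ in graph.get(f, []) if n in friends and n != f}
        for f in friends
    }
    # Each friend-friend edge closes one triangle through the user.
    # Dividing by 2 assumes the adjacency list is symmetric (undirected).
    triangles = sum(len(neighbors) for neighbors in adj.values()) // 2

    # Connected components among friends via iterative depth-first search.
    seen, components = set(), []
    for start in friends:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adj[node] - seen)
        components.append(component)

    return {"friends": len(friends), "components": components, "triangles": triangles}

# Hypothetical toy social graph.
graph = {
    "john": [("aaron", "classmate"), ("david", "classmate"), ("mother", "parent")],
    "aaron": [("john", "classmate"), ("david", "classmate")],
    "david": [("john", "classmate"), ("aaron", "classmate")],
    "mother": [("john", "parent")],
}
props = local_properties(graph, "john")
```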
9. 7
Age Estimation by Local Properties
Motivation
[Figure: John’s graph annotated with birth years (1995, 1970, 1992, ?, 1992, 1968) and edge labels (Classmates, Parents, Relationship)]
10. 8
Age Estimation by Local Properties
Data Sources
• The classmate label should be a strong feature (school, college).
• The colleague label is definitely a weaker feature.
• How about a group of friends who are the same age?
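To illustrate why the classmate label is useful: a user's birth year can be guessed from the known birth years of neighbors connected by that label. The slides do not state the estimator; the median used here is an assumption, and the names and data are made up.

```python
from statistics import median

def estimate_birth_year(graph, birth_years, user, label):
    """Median birth year of `user`'s neighbors connected by `label`, or None."""
    years = [
        birth_years[nbr]
        for nbr, lbl in graph.get(user, [])
        if lbl == label and nbr in birth_years
    ]
    return median(years) if years else None

# Hypothetical toy data: "u0" has no birth date set.
graph = {
    "u0": [("u1", "classmate"), ("u2", "classmate"), ("u3", "classmate"),
           ("u4", "parent"), ("u5", "parent")],
}
birth_years = {"u1": 1992, "u2": 1992, "u3": 1995, "u4": 1970, "u5": 1968}

by_classmates = estimate_birth_year(graph, birth_years, "u0", "classmate")
by_colleagues = estimate_birth_year(graph, birth_years, "u0", "colleague")
```

With no colleagues in the data, the second call returns None rather than guessing.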
11. 9
Some obstacles
Quality of the Model
• No ground truth.
• How to check?
Quality of the Data
• Labeling is incomplete.
14. 12
Age Estimation: Step 2
1 – classmates (school)
2 – classmates (college)
3 – max component
Not so good
15. 13
Confidence
Common sense formula
Here is an easy way to solve the problem:
C_school = 1 - 1 / #friends + 0.002
C_college = 1 - 1 / #friends + 0.001
C_max = 1 - 1 / #friends
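The formulas above can be sketched in code. This assumes that #friends is the size of the group supporting each estimate and that the estimate with the highest confidence wins, so the small constants only break ties in favor of school, then college. Returning 0.0 for an empty group is also an assumption.

```python
# Tie-breaking constants from the formulas above.
BONUS = {"school": 0.002, "college": 0.001, "max": 0.0}

def confidence(n_friends, bonus=0.0):
    """C = 1 - 1 / #friends + bonus; 0.0 for an empty group (assumption)."""
    if n_friends <= 0:
        return 0.0
    return 1 - 1 / n_friends + bonus

def pick_estimate(candidates):
    """candidates maps source -> (estimated_year, n_friends)."""
    scored = {src: confidence(n, BONUS[src]) for src, (_, n) in candidates.items()}
    best = max(scored, key=scored.get)
    return best, candidates[best][0], scored[best]

# Hypothetical candidates with equal group sizes: the bonus decides.
tie = pick_estimate({"school": (1992, 10), "college": (1993, 10), "max": (1992, 10)})
# Hypothetical candidates where a much larger group outweighs the bonus.
large = pick_estimate({"school": (1992, 2), "max": (1990, 20)})
```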
16. 14
So you want to write a fugue?
Model quality
• No ground truth.
• There are special cases (e.g. E_school = E_college = E_max).
• We can try to maximize accuracy with respect to model parameters.
17. 15
NLP and Gender Estimation
Advantages
• Simple models are easy to understand: I/YOU + a gender-marked ADJ/VERB
Disadvantages
• Very difficult in case of a multilingual environment
• Coverage is not very good
• Privacy concerns
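A toy illustration of the I + gender-marked VERB idea. The slides do not name a language; this sketch assumes Russian, where past-tense verbs agree with the speaker's gender ("я была" vs "я был"). The suffix lists are a deliberately crude heuristic and all names are hypothetical.

```python
import re
from collections import Counter

# Crude past-tense suffixes (assumption): feminine "-ла/-лась", masculine "-л/-лся".
FEMININE = ("ла", "лась")
MASCULINE = ("л", "лся")
# The pronoun "я" (I) followed by the next word.
PATTERN = re.compile(r"\bя\s+(\w+)")

def guess_gender(text):
    """Vote over words that follow "я"; return "female", "male", or None."""
    votes = Counter()
    for word in PATTERN.findall(text.lower()):
        if word.endswith(FEMININE):
            votes["female"] += 1
        elif word.endswith(MASCULINE):
            votes["male"] += 1
    if votes["female"] == votes["male"]:
        return None  # no evidence, or a tie
    return "female" if votes["female"] > votes["male"] else "male"
```

Present-tense sentences such as "я люблю кино" carry no gender marker and return None, which is one source of the low coverage noted above.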
18. 15
Communities and Interests
How it works
• Male users tend to prefer cars and extreme sports.
• Female users tend to prefer something else.
Conclusion
• There are gender-specific communities and gender-neutral communities.
• Divide and rule
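A sketch of how the split into gender-specific and gender-neutral communities might be used: classify each community by the share of male members among those with a known gender, then let the gender-specific communities vote. The 0.8 threshold, the majority vote, and all names and data are assumptions.

```python
def male_share(members, known_gender):
    """Share of males among members with a known gender, or None if none are known."""
    labeled = [known_gender[m] for m in members if m in known_gender]
    if not labeled:
        return None
    return sum(g == "male" for g in labeled) / len(labeled)

def classify_community(share, threshold=0.8):
    """Label a community "male", "female", "neutral", or "unknown"."""
    if share is None:
        return "unknown"
    if share >= threshold:
        return "male"
    if share <= 1 - threshold:
        return "female"
    return "neutral"

def estimate_gender(user_communities, community_kind):
    """Majority vote over the gender-specific communities a user belongs to."""
    votes = [community_kind.get(c) for c in user_communities]
    male, female = votes.count("male"), votes.count("female")
    if male == female:
        return None
    return "male" if male > female else "female"

# Hypothetical toy data.
known_gender = {"u1": "male", "u2": "male", "u3": "male", "u4": "male",
                "u6": "female", "u7": "female", "u8": "female"}
communities = {
    "cars": ["u1", "u2", "u3", "u4", "u5"],
    "community_b": ["u6", "u7", "u8"],
    "news": ["u1", "u6"],
}
kinds = {name: classify_community(male_share(members, known_gender))
         for name, members in communities.items()}
guess = estimate_gender(["cars", "news"], kinds)
```

Neutral communities contribute no votes, so a user who only follows "news" gets no estimate.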
23. 18
Conclusion
• Models are complementary to each other.
• Simple methods may produce very good results because of the sheer volume of data.
• We can gain better results without privacy violation.