NamSor at RapidMiner Wisdom 2015 (Ljubljana, Slovenia)

The NamSor Applied Onomastics extension for RapidMiner includes the following operators:
Extract Gender
Extract Origin
Parse Name
 
The Extract Gender operator infers gender (male/female) from international names by calling the NamSor GendRE API. Register for an API key for faster processing and higher throughput.
The Extract Origin operator guesses the likely country of origin of a personal name, based on the sociolinguistics of the name (language, culture).
The Parse Name operator guesses the likely structure of a personal name (firstName-lastName or lastName-firstName order), based on language and culture.

Published in: Technology

  1. ONOMASTICS EXTENSION FOR RAPIDMINER > GENDER GAP / GENDER PAY GAP > DIASPORAS AND OTHER DIVERSITY ANALYTICS. WHAT USE IN SOCIAL NET, SENTIMENT ANALYSIS? Elian CARSENAT, NamSor Applied Onomastics, 2015-09
  2. (no text extracted from this slide)
  3. Coming soon: Gender Gap in CORPORATE GOVERNANCE
  4. Mining 3M Twitter names to map Diasporas. Who are they, where are they and what are they doing? Source: Twitter
  5. Mapping Talents in Cancer Research (in collaboration with French INSERM). Data: Thomson Reuters WebOfScience (6 countries, 250k scientists, 50k papers). “Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.” http://ireg-observatory.org/en/index.php/261-applying-onomastics-to-scientometrics
  6. Cancer Research in Poland and Slovenia: examining the ‘brain drain’. In the Polish corpus, we look at co-authors with Polish names who are affiliated abroad. Top countries: 1. US, 2. Great Britain, 3. Germany. In the Slovenian corpus, we look at co-authors with Slovenian names who are affiliated abroad. Top countries: 1. Great Britain, 2. US, 3. Germany.
  7. USE CASE: BOSTON CITY
  8. NamSor Gender API
       • Extracts the likely gender of personal names (e.g. Andrea Rossini: Male; Andrea Parker: Female)
       • The gender gap in City of Boston employees
       • Original file: Employee_Earnings_Report_2012.xlsx
       • Genderized: output_employees_genderized.xlsx
  9. 3 simple steps to view the gender gap. Read examples of industry-wide studies (airline pilots, Hollywood, start-ups, ...): http://gendergapgrader.com/ (diagram labels as extracted: Employees List City)
  10. Gender Gap By Department. [Chart: Boston City gender gap by department, for departments having 50+ employees; bars show %M, %F and %U on a 0% to 100% axis, annotated "More Male" and "More Female".] Departments, in the order extracted: Transportation-Parking Clerk; Law Department; ASD Human Resources; Boston Public Schools; Boston Public Library; State Boston Retirement Syst; Dept of Voter Mobilization; Neighborhood Development; Boston City Council; Elderly Commission; Assessing Department; Boston Cntr - Youth & Families; Transportation Department; Dpt of Innovation & Technology; Inspectional Services Dept; Boston Police Department; Property Management; Parks Department; Public Works Department; Boston Fire Department. Source: output_employees_genderized.xlsx
  11. Gender Gap By Earnings Range. [Chart: Boston City gender gap, count of employees with total earnings > $22k, $59k, $162k; bins 22026+, 59874+, 162754+; series Female and Male on a 0% to 100% axis; data labels as extracted: 2344, 5488, 303, 3390, 4863, 15.] Source: output_employees_genderized.xlsx
  12. Is it accurate? Testing with Boston Voters List. Rows: declared gender (A), recorded alongside the NamSor API input. Columns: NamSor API output (B). How often A=B?

      Declared (A)   Female    Male      Unknown   Grand Total   Precision   Recall
      F              194442    9625      1884      205951        95.3%       99.1%
      M              6249      165069    1779      173097        96.4%       99.0%
      (blank)        3339      3688      355       7382
      Grand Total    204030    178382    4018      386430        95.8%       99.0%

      Precision > 95%
  13. The gender gap is measured accurately. Using declared gender, Voters List Gender Gap (actual): M 45%, F 53%, U 2%. Estimating gender using names, Voters List Gender Gap (inferred): Male 47%, Female 53%, Unknown 0%. Error rate in range 0% to 2%, usually <1% (NB: this can vary with demographics, e.g. gender cannot be inferred for Chinese or Korean names).
  14. NamSor Origin API
       • Extracts the likely country/culture of origin of personal names
       • Diversity of origin in City of Boston employees
       • Original file: Employee_Earnings_Report_2012.xlsx
       • Origins: output_employees_origined.xlsx
       • Diversity of origin in City of Boston voters
       • Original file: Voters List.txt
       • Origins: Voters_Origined.xlsx
  15. US Census vs NamSor geo-demographics
       • In July 2015, the US Government announced new rules that will require all cities and towns receiving federal housing funds to assess patterns of segregation.
       • The NY Times has published interactive maps of Boston geo-demographics, which we can compare with the information inferred by NamSor.
  16. US Census Race Map of Boston: http://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html
  17. Using Voters List
       • US Census: 1 pixel = 40 inhabitants
       • Voters List: 1 pixel = 1 voter
  18. Voters List: zooming further into 051200
       • US Census
       • Voters List + NamSor
  19. Breaking down ‘White’ and ‘Asian’ into Portuguese, Spanish, Italian, Indian, Pakistani, Chinese, ...
  20. “Incredible India”: 1.2 BN People. Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC
  21. Social Network, Sentiment Analysis?
       • What Women vs. Men Think
       • Twitter Sentiment Analysis (topic, brand, place ...)
       • Traditional Media: who’s talking?
       • How do language, culture or origin affect interest and sentiment?
  22. How to get the extension
  23. How to get the API Key: https://api.namsor.com/ It’s FREE to TRY
  24. Thank you! Your contact: Elian CARSENAT, NamSor Founder, French national, computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as a consultant and managed business & IT projects in London, Paris, Moscow & Shanghai. elian.carsenat@namsor.com Phone: +33 6 52 77 99 07. (Photo caption: July 2013, Embassy of Lithuania in Paris)
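
The Precision and Recall columns of the Boston Voters List table (slide 12) can be reproduced from its counts. The sketch below is an illustration, not NamSor's own code. It assumes definitions inferred from the published figures rather than stated in the slides: "precision" is the share of classified names (Female or Male output) whose output matches the declared gender, and "recall" is the share of names that received a Female or Male output at all. Rows with a blank declared gender are left out, and all function and variable names are hypothetical.

```python
# Counts transcribed from the slide-12 table.
# Outer key: declared gender (A); inner key: NamSor API output (B).
counts = {
    "F": {"Female": 194442, "Male": 9625, "Unknown": 1884},
    "M": {"Female": 6249, "Male": 165069, "Unknown": 1779},
}

# Which API output counts as a correct answer for each declared gender.
MATCH = {"F": "Female", "M": "Male"}


def precision_recall(rows):
    """Return (precision, recall) over the given declared-gender rows.

    Assumed definitions (inferred from the slide's figures):
      precision = correct outputs / names given a Female or Male output
      recall    = names given a Female or Male output / all names
    """
    correct = sum(counts[r][MATCH[r]] for r in rows)
    classified = sum(counts[r]["Female"] + counts[r]["Male"] for r in rows)
    total = sum(sum(counts[r].values()) for r in rows)
    return correct / classified, classified / total


for label, rows in [("F", ["F"]), ("M", ["M"]), ("All", ["F", "M"])]:
    p, r = precision_recall(rows)
    print(f"{label}: precision={p:.1%} recall={r:.1%}")
```

Under these assumed definitions the script prints 95.3% / 99.1% for F, 96.4% / 99.0% for M and 95.8% / 99.0% overall, matching the table. Note that this "recall" is closer to what is often called coverage, so the figures are not directly comparable with per-class precision and recall as usually defined for classifiers.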
