Using Sociolinguistics to Enhance Customer Segmentation, Geomarketing & Diversity Analytics

Presentation at the RapidMiner Wisdom Conference, New York, 2016.

  1. Elian CARSENAT, NamSor, 2016-01-28: “Using Sociolinguistics to Enhance Customer Segmentation, Geomarketing & Diversity Analytics”
  2. Founder Bio: Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as a consultant and managed business and IT projects in London, Paris, Moscow and Shanghai. In 2012, Elian created NamSor, sociolinguistics software that mines 'Big Data' to better understand international flows of money, ideas and people. NamSor helps answer the perennial question all countries ask about their diasporas: who are they, where are they and what are they doing? NamSor has been used to attract foreign direct investment (FDI), to build up international collaboration within scientific communities, and to attract and facilitate diaspora investment in start-ups, among other use cases. http://fr.linkedin.com/in/eliancarsenat/en
  3. NamSor sorts names. Names are meaningful: we use sociolinguistics to extract their semantics and deliver actionable intelligence. Names reflect cultural identity. NamSor data mining software recognizes the linguistic or cultural origin of names in any alphabet or language, with fine granularity and high accuracy.
  4. Gender Gap in Financing
  5. Gender Gap in Science
  6. Diasporas in Science (in collaboration with the French INSERM). Thomson Reuters Web of Science (6 countries, 250k scientists, 50k papers). “Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.” Source: WoS; Data Mining: INSERM with NamSor
  7. Scholar names in some Canadian universities: Chinese, Indian, Iranian, Moroccan and Italian names. Canadian Science Policy Conference (CSPC2015)
  8. USE CASE – BOSTON CITY GEODEMOGRAPHICS
  9. US Census vs NamSor geo-demographics. In July 2015, the US government announced new rules requiring all cities and towns that receive federal housing funds to assess patterns of segregation. The New York Times has published interactive maps of Boston geo-demographics, which we can compare with the information inferred by NamSor.
  10. US Census Race Map of Boston (http://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html)
  11. Using the Voters List. US Census: 1 pixel = 40 inhabitants; Voters List: 1 pixel = 1 voter. Source: Boston Voters List; Visualization: ESRI; Data Mining: NamSor + RapidMiner
  12. Breaking down ‘White’ and ‘Asian’ into Portuguese, Spanish, Italian, Indian, Pakistani, Chinese, ... Source: Boston Voters List; Visualization: ESRI; Data Mining: NamSor + RapidMiner
  13. Who LIVES in New York?
  14. Who OWNS in Brooklyn, NY? Inferring origin in NYC ACRIS (real estate open data): Brooklyn ZIP codes > NamSor origins
  15. Who OWNS in Brooklyn, NY? Inferring origin in NYC ACRIS (real estate open data). Interesting ‘Little’ spots:
      ZIP 11209: Irish
      ZIP 11219: Jewish
      ZIP 11233: African American
      ZIP 11228: Italian
      ZIP 11208: Hispanic
      ZIP 11214: Chinese
      ZIP 11235: Ukrainian/Russian
      ZIP 11416: Indian
      ZIP 11222: Polish
  16. USE CASE – ELECTIONS
  17. A Decision Tree from the FLORIDA Voters List (open data)
  18. Segmenting ‘Asian’ voters would improve the model, using NamSor Origin to infer Indian, Vietnamese, Korean, Chinese, ... Tree leaves (class counts by party):
      ethno  class    DEM    REP  IDP  INT  LPF  GRE  CPF  REF  AIP  PSL
      Chin   DEM     3311   2636   48  199    9    5    2    2    0    0
      Indi   DEM    12509   4565   95  432   32   10    0    1    3    1
      Indo   DEM      984    718    9   43    4    1    1    0    0    0
      Japa   DEM      488    403    9   34    2    1    1    0    0    0
      Kore   REP     1148   1174   11   75    3    0    0    0    0    0
      Mong   DEM       24     22    0    0    0    1    0    0    0    0
      Paki   DEM     4411    843   25  110    9    6    0    0    0    0
      Viet   REP     3798   5780   65  272   10    5    3    3    2    0
      Pakistani and Vietnamese voters did not vote the same way.
  19. USE CASE – TRAVEL INTELLIGENCE
  20. “Incredible India”: 1.2 billion people. Indian onomastics by State/Union Territory. Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU and ARABIC scripts.
  21. ASSAM: Karbi Anglong. Inter-caste marriages within the district?
      Input columns: clusterId, clusterParentId, Firstname, LastName, parent is (husband or father), FirstParent, LastParent.
      Pivot table (parent is = husband; count of serial). Column labels: L47490:1593, L116370:3612, L54332:2031, L184096:2297, L35593:510, L168871:1819, L135664:4438, L51271:837. Row labels and counts:
      L23643:669: 6931 84 5099 15 2069 28 791 1924
      L151415:3559: 18 212 11 6446 19 1217 55 6
      L28582:1209: 5132 68 3565 10 1494 17 592 1323
      L116370:3612: 66 10283 38 72 40 321 137 29
      L9839:442: 2491 60 1851 9 774 11 321 660
      L168871:1819: 7 263 6 361 8 2730 24 4
      L23642:141: 1198 8 822 2 375 4 156 332
      L25354:253: 1181 12 932 375 7 100 323
      L135664:4438: 20 154 5 22 19 44 2212 3
      L87032:1210: 11 315 13 51 14 141 37 9
      L90333:3644: 3 204 2 31 190 5
      L184096:2297: 13 1735 3 84 11 1
      L87031:697: 4 136 4 12 3 137 4 5
      L14495:131: 614 10 432 167 4 68 163
      L63724:1422: 17 83 10 34 34 28 96 6
      L98994:891: 31 161 46 21 19 59 21 5
      Chart: ASSAM: Karbi Anglong district names clustered (L116370:3612, L23643:669, L151415:3559, L47490:1593, L28582:1209, L54332:2031, L184096:2297, L168871:1819, L9839:442, L135664:4438, L87032:1210, L90333:3644, L35593:510, L51271:837, L63724:1422, L154797:1168, L64959:1796, L23642:141, L87031:697, L6536:295, L98994:891, L25354:253, L64958:2797, L30570:2614, L90334:1189, L95839:287, L100510:366, L121390:783, Other).
      Source: Voters List; Data Mining: NamSor
  22. Applications to an airline’s customer intelligence. A global airline: ‘For 93% of our customers, when NamSor recognizes an Indian name, the client has travelled to India in the past.’ Finer-grained segmentation using names brings insights into diaspora travel patterns (visiting family and friends in their home country) as well as their specific needs.
  23. Using the NamSor API: (1) get an API key; (2) get the NamSor RapidMiner Extension.
  24. Thank you! Elian CARSENAT, elian.carsenat@namsor.com, Phone: +33 6 52 77 99 07, http://www.namsor.com/ (July 2013, Embassy of Lithuania in Paris)
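
Slide 3 says NamSor recognizes the linguistic or cultural origin of a name, but the slides do not describe NamSor's own model. As a rough illustration of the general idea only, here is a textbook swapped-in technique: a character-bigram naive Bayes classifier with add-one smoothing. The class name `NameOriginNB` and the tiny training lists are hypothetical; a real classifier needs far larger labelled name lists.

```python
import math
from collections import Counter, defaultdict


def bigrams(name: str) -> list:
    """Character bigrams, with ^ and $ marking the start and end of the name."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class NameOriginNB:
    """Hypothetical multinomial naive Bayes over character bigrams (add-one smoothing)."""

    def __init__(self) -> None:
        self.class_counts = Counter()             # number of training names per origin
        self.ngram_counts = defaultdict(Counter)  # bigram counts per origin
        self.vocab = set()                        # every bigram seen in training

    def fit(self, labelled_names):
        for name, origin in labelled_names:
            self.class_counts[origin] += 1
            for gram in bigrams(name):
                self.ngram_counts[origin][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, name: str) -> str:
        n_names = sum(self.class_counts.values())
        best_origin, best_score = None, -math.inf
        for origin, n in self.class_counts.items():
            counts = self.ngram_counts[origin]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods of each bigram
            score = math.log(n / n_names)
            for gram in bigrams(name):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_origin, best_score = origin, score
        return best_origin


# Hypothetical toy training data, for illustration only.
TRAINING = (
    [(n, "Italian") for n in ["Rossi", "Russo", "Ferrari", "Esposito", "Bianchi", "Romano"]]
    + [(n, "Japanese") for n in ["Sato", "Suzuki", "Takahashi", "Tanaka", "Watanabe", "Ito"]]
)

model = NameOriginNB().fit(TRAINING)
print(model.predict("Suzuki"), model.predict("Bianchi"))  # Japanese Italian
```

Working in log space avoids numeric underflow when many bigram probabilities are multiplied together.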
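
Slides 11 and 12 contrast census cells (1 pixel = 40 inhabitants) with the voters list (1 pixel = 1 voter), where each voter's inferred origin can break ‘White’ and ‘Asian’ down further. The sketch below shows that aggregation step under a stated assumption: the `COARSE` mapping from a fine-grained origin to a census-style category is hypothetical, because the slides do not say how NamSor origins were mapped onto census races.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from an inferred name origin to a coarse census-style
# category; the slides do not specify this mapping.
COARSE = {
    "Italian": "White",
    "Portuguese": "White",
    "Chinese": "Asian",
    "Indian": "Asian",
    "Pakistani": "Asian",
}


def breakdown(voters):
    """voters: iterable of (area, inferred_origin) pairs, one per voter.

    Returns {area: {coarse_category: Counter(fine_origin -> voter count)}}.
    Origins missing from COARSE are grouped under "Other".
    """
    result = defaultdict(lambda: defaultdict(Counter))
    for area, origin in voters:
        coarse = COARSE.get(origin, "Other")
        result[area][coarse][origin] += 1
    return result


# Hypothetical toy records (not the Boston Voters List).
voters = [
    ("Area 1", "Italian"), ("Area 1", "Italian"), ("Area 1", "Portuguese"),
    ("Area 1", "Chinese"), ("Area 2", "Indian"), ("Area 2", "Pakistani"),
]
table = breakdown(voters)
print(dict(table["Area 1"]["White"]))  # {'Italian': 2, 'Portuguese': 1}
```

Keeping the fine-grained counts nested under the coarse category lets the same table be compared with census totals and then drilled down.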
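
Slide 15 lists ZIP codes that stand out as ‘Little’ spots for one origin, without saying how they were selected. One common way to find such spots, swapped in here as an assumption rather than the method used on the slide, is the location quotient: an origin's share within a ZIP code divided by its share across the whole area. The owner records below are hypothetical toy data, not ACRIS.

```python
from collections import Counter, defaultdict


def location_quotients(records):
    """records: iterable of (zip_code, origin) pairs, e.g. one per property owner.

    LQ = (share of the origin within the ZIP) / (share of the origin overall).
    LQ > 1 means the origin is over-represented in that ZIP code.
    """
    by_zip = defaultdict(Counter)
    overall = Counter()
    for zip_code, origin in records:
        by_zip[zip_code][origin] += 1
        overall[origin] += 1
    n_total = sum(overall.values())
    quotients = {}
    for zip_code, counts in by_zip.items():
        n_zip = sum(counts.values())
        for origin, count in counts.items():
            quotients[(zip_code, origin)] = (count / n_zip) / (overall[origin] / n_total)
    return quotients


def most_overrepresented(quotients):
    """Return {zip_code: origin with the highest location quotient}."""
    best = {}
    for (zip_code, origin), lq in quotients.items():
        if zip_code not in best or lq > quotients[(zip_code, best[zip_code])]:
            best[zip_code] = origin
    return best


# Hypothetical toy owner records.
owners = [("ZIP-A", "Irish")] * 3 + [("ZIP-A", "Polish")] + \
         [("ZIP-B", "Polish")] * 3 + [("ZIP-B", "Irish")]
lq = location_quotients(owners)
print(lq[("ZIP-A", "Irish")])    # 1.5
print(most_overrepresented(lq))  # {'ZIP-A': 'Irish', 'ZIP-B': 'Polish'}
```

Unlike ranking by raw counts, the quotient highlights small groups that are concentrated in a few ZIP codes.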
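
Slide 18 claims that segmenting ‘Asian’ voters would improve the decision tree. Using only the leaf counts transcribed from that slide, the sketch below checks the claim on the training data. A tree leaf predicts its majority class, so it compares the correct predictions of one lumped ‘Asian’ leaf against one leaf per subgroup. This is resubstitution accuracy on the same records, not a held-out evaluation.

```python
from collections import Counter

# Leaf class distributions transcribed from slide 18 (Florida voters list).
LEAVES = {
    "Chin": {"DEM": 3311, "REP": 2636, "IDP": 48, "INT": 199, "LPF": 9, "GRE": 5, "CPF": 2, "REF": 2, "AIP": 0, "PSL": 0},
    "Indi": {"DEM": 12509, "REP": 4565, "IDP": 95, "INT": 432, "LPF": 32, "GRE": 10, "CPF": 0, "REF": 1, "AIP": 3, "PSL": 1},
    "Indo": {"DEM": 984, "REP": 718, "IDP": 9, "INT": 43, "LPF": 4, "GRE": 1, "CPF": 1, "REF": 0, "AIP": 0, "PSL": 0},
    "Japa": {"DEM": 488, "REP": 403, "IDP": 9, "INT": 34, "LPF": 2, "GRE": 1, "CPF": 1, "REF": 0, "AIP": 0, "PSL": 0},
    "Kore": {"DEM": 1148, "REP": 1174, "IDP": 11, "INT": 75, "LPF": 3, "GRE": 0, "CPF": 0, "REF": 0, "AIP": 0, "PSL": 0},
    "Mong": {"DEM": 24, "REP": 22, "IDP": 0, "INT": 0, "LPF": 0, "GRE": 1, "CPF": 0, "REF": 0, "AIP": 0, "PSL": 0},
    "Paki": {"DEM": 4411, "REP": 843, "IDP": 25, "INT": 110, "LPF": 9, "GRE": 6, "CPF": 0, "REF": 0, "AIP": 0, "PSL": 0},
    "Viet": {"DEM": 3798, "REP": 5780, "IDP": 65, "INT": 272, "LPF": 10, "GRE": 5, "CPF": 3, "REF": 3, "AIP": 2, "PSL": 0},
}


def majority(dist):
    """The label a decision-tree leaf predicts: its most frequent class."""
    return max(dist, key=dist.get)


def correct_lumped(leaves):
    """Correct predictions if all subgroups share a single 'Asian' leaf."""
    merged = Counter()
    for dist in leaves.values():
        merged.update(dist)
    return merged[majority(merged)]


def correct_split(leaves):
    """Correct predictions if each subgroup gets its own leaf."""
    return sum(dist[majority(dist)] for dist in leaves.values())


total = sum(sum(dist.values()) for dist in LEAVES.values())
print(total, correct_lumped(LEAVES), correct_split(LEAVES))  # 44358 26673 28681
```

The lumped leaf predicts DEM for everyone (about 60.1% correct), while separate Korean and Vietnamese leaves flip to REP and raise the share to about 64.7%.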
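
Slide 20 lists Indian names written in eleven scripts. A plausible first preprocessing step, assumed here rather than taken from the slides, is to detect which script a name uses before applying script-specific rules. This sketch uses the first word of each character's Unicode name from Python's standard `unicodedata` module (for example "TAMIL LETTER TA") and takes a majority vote. The function name `detect_script` is hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Scripts named on slide 20, spelled as they appear in Unicode character names.
SCRIPTS = {
    "LATIN", "BENGALI", "DEVANAGARI", "GUJARATI", "GURMUKHI", "KANNADA",
    "MALAYALAM", "ORIYA", "TAMIL", "TELUGU", "ARABIC",
}


def detect_script(name: str) -> Optional[str]:
    """Return the majority script among letters and marks in name, or None."""
    counts = Counter()
    for ch in name:
        # Letters (L*) and combining marks such as vowel signs (M*) only.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        script = unicodedata.name(ch, "").split(" ")[0]
        if script in SCRIPTS:
            counts[script] += 1
    return counts.most_common(1)[0][0] if counts else None


print(detect_script("Smith"), detect_script("राम"), detect_script("தமிழ்"))
```

Assamese text is written in the script that Unicode names BENGALI, so names from Assam would be reported under that label.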
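
Slide 21 asks whether marriages cross name clusters and cross-tabulates clusters for rows where the parent is a husband. One standard way to summarize such a table is Newman's assortativity coefficient for a categorical attribute. It compares the observed share of same-cluster couples (the diagonal) with the share expected if partners were matched at random. This is a swapped-in technique, not the slide's method. It assumes a square matrix with the same clusters in the same order on both axes, which the slide's pivot table is not, and the wife/husband axis assignment and toy matrix are hypothetical.

```python
def assortativity(matrix):
    """Newman's assortativity coefficient for a categorical attribute.

    matrix[i][j] counts couples whose wife is in name cluster i and whose
    husband is in name cluster j; both axes list the same clusters in the same
    order. Returns (observed, expected, r). r is undefined (division by zero)
    if every couple falls in a single cluster.
    """
    total = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_share = [sum(matrix[i]) / total for i in range(k)]
    col_share = [sum(matrix[i][j] for i in range(k)) / total for j in range(k)]
    expected = sum(a * b for a, b in zip(row_share, col_share))
    return observed, expected, (observed - expected) / (1 - expected)


# Hypothetical 2-cluster toy table.
observed, expected, r = assortativity([[40, 10], [10, 40]])
print(round(observed, 3), round(expected, 3), round(r, 3))  # 0.8 0.5 0.6
```

An r close to 1 means couples almost always share a cluster. An r near 0 means the mixing looks random.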
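
The airline figure on slide 22 (93% of customers with an Indian name have travelled to India) is a conditional share. The sketch below shows how such a number could be computed from customer records. The record layout (`origin` inferred from the name, `destinations` as a set of countries) and the function name are hypothetical, and the toy customers are invented for illustration.

```python
def share_visiting_origin_country(customers, origin, country):
    """Share of customers with a given inferred name origin who flew to country.

    customers: iterable of dicts with hypothetical keys 'origin' (inferred from
    the customer's name) and 'destinations' (set of countries travelled to).
    Returns None when no customer has that origin.
    """
    matched = [c for c in customers if c["origin"] == origin]
    if not matched:
        return None
    hits = sum(1 for c in matched if country in c["destinations"])
    return hits / len(matched)


# Hypothetical toy records.
customers = [
    {"origin": "Indian", "destinations": {"India", "UAE"}},
    {"origin": "Indian", "destinations": {"India"}},
    {"origin": "Indian", "destinations": {"UK"}},
    {"origin": "Italian", "destinations": {"India"}},
]
print(share_visiting_origin_country(customers, "Indian", "India"))
```

Here two of the three Indian-name customers have travelled to India. The Italian-name customer's trip does not count, because the share is conditioned on the inferred origin.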
