Mining names in the big data to map diasporas - NamSor


Names reflect cultural identity.
NamSor data mining software recognizes the linguistic or cultural origin of names in any alphabet or language, at fine granularity and with high accuracy.
Personal names are meaningful: we use sociolinguistics to extract their semantics and deliver actionable intelligence.
The technology is used to identify and map diasporas.

Published in: Technology


  1. Mining personal names in the ‘Big Data’ to map Diasporas: who are they, where are they and what are they doing? Connecting, Communicating and Networking with Diasporas, 4-6 May 2016, Dublin Castle, Ireland. Elian CARSENAT, NamSor. Funded by the European Union.
  2. #RMM4Dublin
  3. NamSor sorts Names
       • Personal names are meaningful: we use sociolinguistics to extract their semantics and deliver actionable intelligence.
       • Names reflect cultural identity.
       • NamSor data mining software recognizes the linguistic or cultural origin of names in any alphabet or language, at fine granularity and with high accuracy.
  4. Mining 3M Twitter names to map Diasporas: who are they, where are they and what are they doing? Source: Twitter; Visualization: CartoDB; Data Mining: NamSor
  5. Flow view – who travels where?

       Source          Target  Type      Id   Onoma          Weight
       United Kingdom  France  Directed  16   Great Britain  37
       Spain           France  Directed  55   Spain          14
       United States   France  Directed  75   Great Britain  12
       Turkey          France  Directed  79   Turkey         11
       Brazil          France  Directed  87   Portugal       10
       United Kingdom  France  Directed  112  Ireland        9
       Italy           France  Directed  152  Italy          7
       Switzerland     France  Directed  226  France         5
       Belgium         France  Directed  247  France         5
       United Kingdom  France  Directed  258  France         5
       Mexico          France  Directed  287  Spain          4
       Ireland         France  Directed  317  Great Britain  4
       United Kingdom  France  Directed  333  Italy          4
       United States   France  Directed  375  France         4

     Source: Twitter; Visualization: Gephi; Data Mining: NamSor
  6. Flow view – who travels where? Source: Twitter; Visualization: Gephi; Data Mining: NamSor
  7. “Incredible India” – 1.2 Billion People. Indian onomastics by State/Union Territory. Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC
  8. Applications to a global Airline’s customer intelligence. Example: Indian Diaspora / Non Resident Indians (NRI) based in the United States. ‘It applies indeed to 93% of our customers: when NamSor recognizes an Indian name, the client has travelled to India in the past.’ At state level: ~50%. Finer-grained segmentation by name brings insights into how diaspora members travel to visit family and friends in their home country, and into their specific needs.
  9. Mapping Talents in Cancer Research (in collaboration with French INSERM). Thomson Reuters WebOfScience (6 countries, 250k scientists, 50k papers). “Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.” Source: WoS; Data Mining: INSERM with NamSor
  10. Mapping Talents in Cancer Research (in collaboration with French INSERM). Source: WoS; Data Mining: INSERM with NamSor
  11. Cancer Research in Poland and Slovenia: examining the ‘brain drain’. In the Polish corpus, we look at co-authors with Polish names who are affiliated abroad. Top countries: 1. USA 2. Great Britain 3. Germany. In the Slovenian corpus, we look at co-authors with Slovenian names who are affiliated abroad. Top countries: 1. Great Britain 2. USA 3. Germany. Source: WoS; Data Mining: INSERM with NamSor
  12. Tunisia
  13. Moroccans Residing Abroad (Marocains Résidant à l'Étranger, MRE): distribution among the main universities in Canada. Canadian Science Policy Conference - CSPC2015
  14. Boston geo-demographics 1/2
  15. Boston geo-demographics 2/2
  16. Analysing patent data
  17. Founder Bio. Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as a consultant and managed business and IT projects in London, Paris, Moscow and Shanghai. In 2012, Elian created NamSor, a piece of sociolinguistics software to mine the 'Big Data' and better understand international flows of money, ideas and people. NamSor helps answer the perennial question all countries ask about their diasporas: who are they, where are they and what are they doing? NamSor has been used to attract Foreign Direct Investment (FDI), to build up international collaboration within scientific communities, and to attract and facilitate Diaspora investment in start-ups, among other use cases. http://fr.linkedin.com/in/eliancarsenat/en
  18. Thank you! Elian CARSENAT, elian.carsenat@namsor.com, Phone: +33 6 52 77 99 07, www.namsor.com. 18 July 2013, Embassy of Lithuania in Paris
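
The flow table on slide 5 is a directed edge list of the kind Gephi imports. As a rough illustration only, the sketch below totals the Weight column per group using the rows from that slide. The column meanings are an assumption, since the deck does not define them: Source is read as the traveller's home country, Target as the destination, and Onoma as the name origin inferred by NamSor. The helper `total_weight_by` is hypothetical and not part of NamSor or Gephi.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the slide 5 flow table (comma-separated for parsing).
EDGES_CSV = """Source,Target,Type,Id,Onoma,Weight
United Kingdom,France,Directed,16,Great Britain,37
Spain,France,Directed,55,Spain,14
United States,France,Directed,75,Great Britain,12
Turkey,France,Directed,79,Turkey,11
Brazil,France,Directed,87,Portugal,10
United Kingdom,France,Directed,112,Ireland,9
Italy,France,Directed,152,Italy,7
Switzerland,France,Directed,226,France,5
Belgium,France,Directed,247,France,5
United Kingdom,France,Directed,258,France,5
Mexico,France,Directed,287,Spain,4
Ireland,France,Directed,317,Great Britain,4
United Kingdom,France,Directed,333,Italy,4
United States,France,Directed,375,France,4
"""


def total_weight_by(column, csv_text=EDGES_CSV):
    """Sum edge weights grouped by one column (hypothetical helper)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[column]] += int(row["Weight"])
    # Largest totals first; ties broken alphabetically for stable output.
    return dict(sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])))


# Assumed reading: where the accounts come from vs. which name origin they carry.
by_source = total_weight_by("Source")
by_onoma = total_weight_by("Onoma")

# United Kingdom appears in four rows: 37 + 9 + 5 + 4 = 55.
for country, weight in by_source.items():
    print(f"{country:15s} {weight}")
```

Grouping by `"Onoma"` instead of `"Source"` separates, for example, accounts located in the United Kingdom from accounts whose names are classified as Great Britain, which is the distinction the flow view appears to draw.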
