Multilingual Text Mining:
Lost in (Machine) Translation,
Found in Native Language Mining

Rohini K. Srihari
The State University of New York at Buffalo
Janya Inc.

Multilingual 2.0, Tucson, AZ
April 1[?], 2012


Language Usage on the Internet

  Volume on Chinese microblogging site Weibo has surpassed
  Twitter on multiple occasions
Source: http://www.internetworldstats.com/stats7.htm
Trending Phrases/Meme Detection
 Monitored Urdu news throughout 2011

 Compared news from around the days of
 Osama Bin Laden killing (May 1-[?]) with the
 articles year to date. Based on this, extract
 significant new phrases.

 Illustrates other news besides Bin Laden
 killing, e.g. killing of Farooq Baig, leader of
 Muttahida Quami Movement.


Note: currently using only content analysis
to detect trending phrases. True meme
detection requires social network features as
well.
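The comparison described above (a short window of news against the year-to-date articles) can be sketched as a smoothed frequency-ratio test. This is only an illustration under stated assumptions, not the system used in the talk: the function names, the whitespace tokenizer, the choice of bigrams, and the add-one smoothing are all hypothetical, and the sample headlines are invented English stand-ins for Urdu news text.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-token phrases as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(docs, n=2):
    """Count n-gram phrases over a list of documents (naive whitespace tokens)."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts


def trending_phrases(window_docs, baseline_docs, n=2, min_count=2, top_k=5):
    """Rank phrases that are much more frequent in the window than in the baseline."""
    window = phrase_counts(window_docs, n)
    baseline = phrase_counts(baseline_docs, n)
    w_total = sum(window.values()) or 1
    b_total = sum(baseline.values()) or 1
    vocab = len(set(window) | set(baseline)) or 1
    scored = []
    for phrase, count in window.items():
        if count < min_count:
            continue  # ignore one-off phrases
        # Add-one smoothing so phrases unseen in the baseline get a finite score.
        p_window = (count + 1) / (w_total + vocab)
        p_base = (baseline[phrase] + 1) / (b_total + vocab)
        scored.append((math.log(p_window / p_base), phrase))
    # Highest ratio first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top_k]]


# Invented sample data, for illustration only.
window_docs = [
    "bin laden killed in abbottabad",
    "bin laden killed by us forces",
]
baseline_docs = [
    "cricket match in lahore",
    "floods in sindh province",
    "cricket team wins match",
]
print(trending_phrases(window_docs, baseline_docs))
```

As the note above says, a score like this uses content only; it says nothing about how a phrase spreads between authors, which is what social network features would add.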



Factual/Topical Analysis of News
Analyzing Potential Bias in Media

    Non-Topical text analysis: characterizations are sought of
    the opinions, feelings, and attitudes expressed in a text,
    rather than just of the topics the text is about

    Facilitates analysis of how media reports events:

    •  Summarize how the press has shifted its attitude
       towards India over the past year
    •  Show how different regions (Sindh, Peshawar, Frontier
       territory) differ in their perception of the current
       administration
    •  What are the main issues being reported?
Non-Topical Analysis

       Agent: Opinion holder
       Target: Target of Opinion being expressed (a topic, a person,
       organization etc.)
       Attitude: includes Expressive Element




                 [anSary nE kha myry ray^E myN eamr shyl ayk bd dmaG awr Zdy XKS hyN ]

[Ansari said, “according to me Aamir Sohail is one crazy and stubborn man”]

AGENT Attributes
   ID:a1
   Nested-source: “w”
   TargetID:t1

ATTITUDES Attributes
   ID:a1[?]1
   AttitudeType: negative
   Intensity: high
   Negative-toward: t1
   Positive-toward: null

TARGET Attributes
   ID:t1

EXPRESSIVE ELEMENT Attributes
   ID:ex1, TargetID:t1
   Emotion:anger
   Intensity:high
   Nested-Source: “w”, a1
   Polarity:negative
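The agent, target, and expressive-element attributes above map naturally onto a small record type. The sketch below is one possible representation, assuming Python dataclasses; the class and field names are hypothetical and are not taken from the annotation tool behind the slide. Only the attribute values (a1, t1, ex1, anger, high, negative, the nested sources) come from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    id: str
    text: str
    nested_source: List[str]  # chain of sources, "w" being the writer
    target_id: str


@dataclass
class Target:
    id: str
    text: str


@dataclass
class ExpressiveElement:
    id: str
    target_id: str
    emotion: str
    intensity: str
    polarity: str
    nested_source: List[str]


@dataclass
class OpinionFrame:
    agent: Agent
    target: Target
    expressive: ExpressiveElement

    def is_consistent(self) -> bool:
        """Check that agent and expressive element point at the same target."""
        return (self.agent.target_id == self.target.id
                and self.expressive.target_id == self.target.id)

    def summary(self) -> str:
        e = self.expressive
        return (f"{self.agent.text} -> {self.target.text}: "
                f"{e.polarity} ({e.emotion}, intensity {e.intensity})")


# The Ansari / Aamir Sohail example, using the attribute values shown above.
frame = OpinionFrame(
    agent=Agent("a1", "Ansari", ["w"], "t1"),
    target=Target("t1", "Aamir Sohail"),
    expressive=ExpressiveElement("ex1", "t1", "anger", "high", "negative", ["w", "a1"]),
)
print(frame.summary())
```

Keeping the IDs explicit rather than nesting objects mirrors the cross-references in the annotation (TargetID:t1 appears on both the agent and the expressive element), so a consistency check stays cheap.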
FACETED SEARCH: DRILL DOWN TO RELEVANT CONTENT/DATA

People are filled with anger and sorrow because of the policies made by Musharraf.

OPINION HOLDER – Writer, People
TARGET – Musharraf’s policies (Musharraf is an implied target)

www.janyainc.com
Human Behavior Analysis
 •  Process social media content, provide tools for analysts to:
        •  Identify social networks: groups, members
        •  Identify topics of discussion and sentiment
            •  E.g. angry at govt., wanting retaliation, peacemakers
            •  Thought influencers
        •  Identify social goals through analysis of verbal
           communication
            •  Manipulation: persuasion, threats, coercion
            •  Religious supremacy: religious analogues
            •  Recruitment

 Diagram labels: Social Media Content, Predictive Modeling, Link Diagrams
Challenges
Can't we just translate everything to English?

Google Translation

However, Education Minister Mohammad
Hanif half-dead recently in Afghanistan, [?]
million children of school textbooks have
been distributed to some hope.

Context Aware Translation
Name translation output: Google translation augmented by Semantex™ translation of names.

  “However, Education Minister Mohammad Haneef Atmar recently in Afghanistan, [?]
  million children of school textbooks have been distributed to some hope.”

  Native language processing required for fine-grained analysis!
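One common way to get the effect shown above is to mask known names before machine translation and restore their English forms afterwards, so the MT system never tries to translate a name as an ordinary word. The sketch below illustrates that general idea only; it is not a description of how Semantex works. The function `translate_with_names`, the placeholder format, the toy word-for-word translator, and the romanized tokens are all invented for the example.

```python
from typing import Callable, Dict


def translate_with_names(source: str,
                         name_lexicon: Dict[str, str],
                         translate: Callable[[str], str]) -> str:
    """Mask known names, translate the rest, then restore English name forms."""
    placeholders = {}
    masked = source
    # Longest names first so a full name wins over one of its parts.
    ordered = sorted(name_lexicon.items(), key=lambda kv: -len(kv[0]))
    for i, (src_name, eng_name) in enumerate(ordered):
        if src_name in masked:
            token = f"__NAME{i}__"
            masked = masked.replace(src_name, token)
            placeholders[token] = eng_name
    translated = translate(masked)
    for token, eng_name in placeholders.items():
        translated = translated.replace(token, eng_name)
    return translated


# Toy stand-in for an MT system: word-for-word lookup over invented tokens.
# It mimics the failure above by "translating" part of a name as a common word.
TOY_DICTIONARY = {"wzyr": "Minister", "hnyf": "Hanif", "atmr": "half-dead"}


def toy_mt(text: str) -> str:
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


source_text = "wzyr hnyf atmr"
names = {"hnyf atmr": "Haneef Atmar"}  # hypothetical name lexicon entry

print(toy_mt(source_text))
print(translate_with_names(source_text, names, toy_mt))
```

A real pipeline would need name detection in the source language rather than a fixed lexicon, which is where native language processing comes in.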
Tools made for MSA (Modern Standard Arabic) fail on Arabic dialects

Human translation for all Arabic variants below is the same:
“There is no electricity, what happened?”

Arabic Variant    Arabic Source Text    Google Translate
Egyptian          [not recoverable]     Atqtat electrical wires, Why are Posted?
Levantine         [not recoverable]     Cklo Mafeesh [untranslated Arabic word], Lech heck?
Iraqi             [not recoverable]     Xu MACN electricity, good?
MSA               [not recoverable]     Does not have electricity, what happened?

Arabic dialects are not handled well in current machine translation systems.
CLABA enables MSA tools to interpret dialects correctly.
Web as Corpus: Mining Wikipedia for Language Resources

•  Translation lexicons automatically extracted from Chinese Wikipedia; cross-language
   links are used to add English translations
•  Easy to regenerate with new versions of Wikipedia
•  Chinese Wikipedia is constantly growing
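The lexicon extraction described in the bullets above reduces to pairing each article title with the title its cross-language link points to. A minimal sketch follows, assuming the page records have already been pulled from a Wikipedia dump or API into plain dictionaries; the record shape (`title`, `langlinks`) and the function name `build_lexicon` are assumptions made for this example, and the sample entries are invented.

```python
from typing import Dict, Iterable


def build_lexicon(pages: Iterable[dict], target_lang: str = "en") -> Dict[str, str]:
    """Map each source title to its linked title in target_lang, if one exists."""
    lexicon = {}
    for page in pages:
        translation = page.get("langlinks", {}).get(target_lang)
        if translation:
            lexicon[page["title"]] = translation
    return lexicon


# Invented sample records in the assumed shape.
sample_pages = [
    {"title": "维基百科", "langlinks": {"en": "Wikipedia"}},
    {"title": "北京市", "langlinks": {"en": "Beijing", "fr": "Pékin"}},
    {"title": "某条目", "langlinks": {}},  # no cross-language link, so skipped
]

print(build_lexicon(sample_pages))
```

Because the function is a pure pass over the page records, regenerating the lexicon for a newer Wikipedia version means rerunning it on the newer records.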

Code Mixing, Switching
•  Use of Latin script: lack of transliteration
   standards makes it difficult to process
•  Urdish, Spanglish, Hinglish etc.

Afsoos key baat hai . kal tak jo batain Non Muslim bhi kartay
hoay dartay thay abhi this man has brought it out in the open.
[It is sad to see that those words that even a non-Muslim would
fear to utter until yesterday, this man has brought it out in the
open]

Solutions:
•  Apply “romanized” POS tagger, English tagger in tandem: use machine
   learning to combine evidence and generate final tag, language ID
•  For longer English spans, use English NLP system
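The two solutions above can be sketched as running both taggers on every token, keeping the more confident answer, and then collecting long English runs for a full English pipeline. In this sketch a simple confidence comparison stands in for the learned combiner the slide calls for, and the lexicon-based taggers, their tags, and all function names are toy assumptions rather than the actual system.

```python
from typing import Callable, Dict, List, Tuple

Tagger = Callable[[str], Tuple[str, float]]  # token -> (tag, confidence)


def lexicon_tagger(lexicon: Dict[str, str], confidence: float = 0.9) -> Tagger:
    """Build a toy tagger: known words get a confident tag, others get UNK."""
    def tag(token: str) -> Tuple[str, float]:
        key = token.lower()
        if key in lexicon:
            return lexicon[key], confidence
        return "UNK", 0.1
    return tag


def tag_code_mixed(tokens: List[str], romanized_tagger: Tagger,
                   english_tagger: Tagger) -> List[Tuple[str, str, str]]:
    """Return (token, tag, language ID), choosing the more confident tagger."""
    tagged = []
    for token in tokens:
        r_tag, r_conf = romanized_tagger(token)
        e_tag, e_conf = english_tagger(token)
        if e_conf > r_conf:
            tagged.append((token, e_tag, "en"))
        else:
            # Ties (for example, both unknown) default to the romanized tagger.
            tagged.append((token, r_tag, "ur"))
    return tagged


def english_spans(tagged, min_len: int = 3) -> List[List[str]]:
    """Collect runs of at least min_len consecutive English tokens."""
    spans, current = [], []
    for token, _tag, lang in tagged:
        if lang == "en":
            current.append(token)
            continue
        if len(current) >= min_len:
            spans.append(current)
        current = []
    if len(current) >= min_len:
        spans.append(current)
    return spans


# Toy lexicons with illustrative tags; tokens come from the example sentence above.
urdu_tagger = lexicon_tagger({"afsoos": "NN", "hai": "VB", "abhi": "RB"})
english_tagger = lexicon_tagger({"this": "DT", "man": "NN", "has": "VBZ", "brought": "VBN"})

tagged = tag_code_mixed("hai abhi this man has brought".split(), urdu_tagger, english_tagger)
print(tagged)
print(english_spans(tagged))
```

A trained combiner would replace the confidence comparison with a classifier over both taggers' outputs and the surrounding context, which matters for tokens that are valid words in both languages.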
Language Resource Acquisition

        Less Commonly Taught Languages (LCTL)
        •  Yoruba, Russian, Swahili
        •  Dialects

        Very few linguistic resources
        available
        •  electronic lexicons
        •  translation lexicons
        •  part-of-speech taggers, chunkers

        •  Typically, very expensive to produce
        these resources by hand

        •  The web provides a new opportunity to
        automatically acquire these resources:
        “web as corpus”
The Road Ahead?

 Context, Context, Context!    blog.gts-translation.com

