  @PENGUINANA_ (genta kaneyama)
  http://pcod.no-ip.org/
  

 
     visualization	
  

                            http://b.hatena.ne.jp/pcod/nlp/	
  



  twitter          (yats)        	
  

  2009            Web            	
  
    	

 
                                   	
              1   	

                   	

                                       	
                             	
                        	

 
    	

 
                                        	
              1   	

                   	

                                            	
                             	
                        	

 
  900 tweet/   	
  
  60                  	
  
       3                     	
  AND	
  >400        	
  


                                                           …   	
  
                                     	
  
  Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
  



  Wikipedia,	
  MEDLINE LDA
            	
  

  Wikipedia          "universal corpus"
  

                                                                             	
  
…	


http://www.baidu.jp/unlp/#omake
API                                       	
 http://pcod.no-ip.org/yats/genre
http://pcod.no-ip.org/yats/
  
           	
                                        	
  

                  	
                                        	
  
                                                                    	
  
    	
  
                                               	
  

                            API 	
  

                                  	
  
                           
Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
  Unlike normal documents, these text & Web segments are usually noisier, less topic-focused, and much shorter, that is, they consist of from a dozen words to a few sentences. Because of the short length, they do not provide enough word co-occurrence or shared context for a good similarity measure.
  

 
                                    	
  
  (Wikipedia    )                              	
  

  LDA                	
  

  (Wikipedia    )                       	
  

  ME (maximum entropy)
  

         	
  
       LDA                                            	
  
                            	
  
LDA(model) -> MaxEnt(classifier)
  LDA sparse text
  
      D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993–1022, 2003.
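
One way to read the LDA(model) -> MaxEnt(classifier) pipeline is that topics inferred for a short text are appended to its words as extra features before the classifier is trained. The following is a minimal sketch under that assumption. The function name, the `cutoff` and `step` parameters, and the `topic:<id>` feature naming are illustrative and do not come from the slides.

```python
def augment_with_topics(tokens, topic_probs, cutoff=0.05, step=0.1):
    """Return the tokens plus pseudo-words for topics above the cutoff.

    topic_probs maps a topic id to its probability for this text
    (assumed to come from an already trained topic model).
    A selected topic is repeated roughly once per `step` of probability,
    so more probable topics weigh more in a bag-of-words classifier.
    """
    features = list(tokens)
    for topic_id, prob in sorted(topic_probs.items()):
        if prob < cutoff:
            continue  # ignore topics that are barely present
        repeats = max(1, int(round(prob / step)))
        features.extend(["topic:%d" % topic_id] * repeats)
    return features
```

The augmented token list can then be fed to any bag-of-words classifier, a maximum entropy model being one choice.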
  
  SVM               ME                          	
  
      SVM            	
  
      SVM                                   (                )	
  
  Wikipedia                             	
  

                                                 	
  
                         	
  
       Hidden topic
  
                                          	
  


 
Universal corpus
  Final data: 240MB;
   |docs| = 71,986;
   |paragraphs| = 882,376;
   |vocabulary| = 60,649;
   |total words| = 30,492,305;
  

  Wikipedia                                                        HTML                	
  

  Stop	
  word                                                       30
                               	
  

  Universal corpus
  
 
         (MEDLINE)
  Topic
    200   stable
 
                       	
  
                              	
  
         	
  
API
                                 	
  
       	
       MySQL + Python + TokyoCabinet
  	
  

                                        	
  
       MeCab(+                 ) + Python + MPICH2
  

  API                   	
  
       Python + Django (or tornado) + apache (or nginx) + redis
  
                                	
  
                              ,             ,URL            	
  
       if ('   ' in n.feature and not '          , ' in n.feature and not '          ' in n.feature)
  

                         (Python+TokyoCabinet)	
  
       cat dump.txt | python loadtoTC.py
  
       >df.addint(key,1)	
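
The `df.addint(key,1)` call above increments a counter per key. A minimal sketch of the same counting step, assuming `df` stands for document frequency and that each input line is one whitespace-tokenized document. A `collections.Counter` stands in for the TokyoCabinet database here, and the function name is hypothetical.

```python
from collections import Counter


def count_document_frequency(lines):
    """Count, for each term, how many lines (documents) contain it."""
    df = Counter()
    for line in lines:
        # set() makes each term count once per document.
        for term in set(line.split()):
            df[term] += 1  # plays the role of df.addint(key, 1)
    return df
```

Passing an open file or `sys.stdin` as `lines` mirrors the `cat dump.txt | python loadtoTC.py` usage.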
  

                                 (                )	
  
                  100                                               	
  
                                                                           	
  
                                                            	
  
                                                plda                                            …	
  

  http://code.google.com/p/plda/	
  

                         	
  
            N=50,alpha=0.5,beta=0.1	
  
            alpha: *empirically* MIN(1, 50/num_topics)
            beta: *empirically* 0.1
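
The empirical rule above can be written as a small helper. This is only a sketch of the stated heuristic; the function and constant names are made up for illustration.

```python
def default_alpha(num_topics):
    """Empirical Dirichlet prior for document-topic mixtures: MIN(1, 50/num_topics)."""
    return min(1.0, 50.0 / num_topics)


# Empirical prior for topic-word distributions, as quoted on the slide.
DEFAULT_BETA = 0.1
```

With this rule, alpha stays at 1 up to 50 topics and shrinks beyond that, for example to 0.25 for 200 topics.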
  



Master> mpd --daemon --listenport=55555

Slave> mpd --daemon -h master -p 55555
  

                                       mpdtrace; mpdringtest
  

mpiexec -n 8 ./mpi_lda --num_topics 50 --total_iterations 150 --alpha 1 --beta 0.1 --training_data_file ~/201005.txt --model_file /tmp/lda_model201005.txt
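
To inspect the trained model, the top words of each topic can be listed from the model file. The sketch below assumes each line holds a word followed by its per-topic counts separated by whitespace. That is an assumption about the file layout and should be checked against the actual output. The function name is hypothetical.

```python
import heapq


def top_words_per_topic(model_lines, top_n=10):
    """Return {topic_index: [(count, word), ...]} in descending count order."""
    per_topic = {}
    for line in model_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word = fields[0]
        counts = [float(x) for x in fields[1:]]
        for topic, count in enumerate(counts):
            per_topic.setdefault(topic, []).append((count, word))
    return {t: heapq.nlargest(top_n, pairs) for t, pairs in per_topic.items()}
```

Calling it with an open file handle for the model file gives, per topic, a ranked list of counts and words that can be labeled by hand.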
  
…
[Topic word list: values from 9249.0 down to 821.0; the words themselves were lost in extraction]
  
label,4,
[Topic word list: values from 20262.0 down to 637.0; the words themselves were lost in extraction]
  
label,1,
[Topic word list: values from 56227.0 down to 608.0; the words themselves were lost in extraction]
  
label,2,???	
  
[Topic word list: values from 8448.0 down to 806.0; the words themselves were lost in extraction]
  
label,6,                   	
  
[Topic word list: values from 31845.0 down to 681.0; the words themselves were lost in extraction]
  
label,9,
                            	
  
                                                 	
  
                                   	
  
                     	
  
                                          	
  

 
               	
  
  Latent topic
  

                                	
  
       Latent topic
  
                         	
  

                                	
  
                                                      	
  
                                       	
  
                                               	
  
           
API                          	
     Get                              	




            1.                 

2.                                         (redis)

                 3.     	
  




     [Topic1:prob,Topic2:prob,]	
  
               JSON	
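
A minimal sketch of how such an endpoint might produce its JSON response, assuming topic inference is done by a separate function and results are cached by input text. A plain dict stands in for redis, and `infer`, `cache`, and `topics_as_json` are hypothetical names not taken from the slides.

```python
import json


def topics_as_json(text, infer, cache, top_n=3):
    """Return a JSON array of [topic, probability] pairs, caching by text."""
    if text in cache:
        return cache[text]  # cache hit: skip inference
    probs = infer(text)  # hypothetical: returns {topic_name: probability}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    body = json.dumps([[topic, round(p, 4)] for topic, p in ranked])
    cache[text] = body
    return body
```

In a web framework, the returned string would be sent as the response body with a JSON content type.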
  
    PLDA                                                               	
  
     Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior. Wen-Yen Chen et al., WWW 2009.
     http://www.cs.ucsb.edu/~wychen/publications/fp365-chen.pdf
  	
  

                                                …	
  
     The role of semantic history on online generative topic modeling. L AlSumait, D Barbará, C Domeniconi - ise.gmu.edu
     http://www.ise.gmu.edu/~carlotta/publications/Siam_SemOLDA.pdf
  

    LDA                    …	
  
     R LDA                                  author facebook/data	
                          LDA                                     	
  
              Not-So-Latent Dirichlet Allocation: Collapsed Gibbs Sampling Using Human Judgments
              ePluribus: Ethnicity on Social Networks.
  	
  
     http://www.facebook.com/data#!/data?v=app_4949752878	
  
http://www.baidu.jp/unlp/#omake	
  



Q&A

More Related Content

Viewers also liked

Mordern Modes Of Aviation
Mordern Modes Of AviationMordern Modes Of Aviation
Mordern Modes Of Aviationpskapadiaa050
 
ニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようgenta kaneyama
 
Solrを使ったレシピ検索のプロトタイピング
Solrを使ったレシピ検索のプロトタイピングSolrを使ったレシピ検索のプロトタイピング
Solrを使ったレシピ検索のプロトタイピングgenta kaneyama
 
Moneymachine-Ads4bucks Incentive Scheme-BM
Moneymachine-Ads4bucks Incentive Scheme-BMMoneymachine-Ads4bucks Incentive Scheme-BM
Moneymachine-Ads4bucks Incentive Scheme-BMmoneymachinekl
 
Ads4bucks Incentive Scheme
Ads4bucks Incentive SchemeAds4bucks Incentive Scheme
Ads4bucks Incentive Schememoneymachinekl
 
Ads Hq Enhancement Latest Lokpi
Ads Hq Enhancement Latest LokpiAds Hq Enhancement Latest Lokpi
Ads Hq Enhancement Latest Lokpimoneymachinekl
 

Viewers also liked (9)

Solr at cookpad
Solr at cookpadSolr at cookpad
Solr at cookpad
 
Mordern Modes Of Aviation
Mordern Modes Of AviationMordern Modes Of Aviation
Mordern Modes Of Aviation
 
ニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみようニコニコ動画を検索可能にしてみよう
ニコニコ動画を検索可能にしてみよう
 
Solrを使ったレシピ検索のプロトタイピング
Solrを使ったレシピ検索のプロトタイピングSolrを使ったレシピ検索のプロトタイピング
Solrを使ったレシピ検索のプロトタイピング
 
Moneymachine-Ads4bucks Incentive Scheme-BM
Moneymachine-Ads4bucks Incentive Scheme-BMMoneymachine-Ads4bucks Incentive Scheme-BM
Moneymachine-Ads4bucks Incentive Scheme-BM
 
Ads4bucks Incentive Scheme
Ads4bucks Incentive SchemeAds4bucks Incentive Scheme
Ads4bucks Incentive Scheme
 
Ads Hq Enhancement Latest Lokpi
Ads Hq Enhancement Latest LokpiAds Hq Enhancement Latest Lokpi
Ads Hq Enhancement Latest Lokpi
 
aaaa
aaaaaaaa
aaaa
 
Solr@twitter検索
Solr@twitter検索Solr@twitter検索
Solr@twitter検索
 

Similar to Tokyotextmining#1 kaneyama genta

Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Gilbert Paquette
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004xlight
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDCAstroAtom
 
Python in the Atmospheric sciences
Python in the Atmospheric sciencesPython in the Atmospheric sciences
Python in the Atmospheric sciencesScott Collis
 
Status Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataStatus Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataDaniel Vila Suero
 
Ray and Its Growing Ecosystem
Ray and Its Growing EcosystemRay and Its Growing Ecosystem
Ray and Its Growing EcosystemDatabricks
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Collaborative Media Annotation with YUMA
Collaborative Media Annotation with YUMACollaborative Media Annotation with YUMA
Collaborative Media Annotation with YUMAaboutgeo
 
Dataiku pig - hive - cascading
Dataiku   pig - hive - cascadingDataiku   pig - hive - cascading
Dataiku pig - hive - cascadingDataiku
 
Accessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBCAccessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBCKingsley Uyi Idehen
 
Semantic Search for Enterprise 2.0
Semantic Search for Enterprise 2.0Semantic Search for Enterprise 2.0
Semantic Search for Enterprise 2.0Alexandre Passant
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsTakuya UESHIN
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Anne Nicolas
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013olberger
 
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)Emil Eifrem
 

Similar to Tokyotextmining#1 kaneyama genta (20)

Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...Semantically-aware Networks and Services for Training and Knowledge Managemen...
Semantically-aware Networks and Services for Training and Knowledge Managemen...
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDC
 
Python in the Atmospheric sciences
Python in the Atmospheric sciencesPython in the Atmospheric sciences
Python in the Atmospheric sciences
 
Status Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataStatus Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked Data
 
Ray and Its Growing Ecosystem
Ray and Its Growing EcosystemRay and Its Growing Ecosystem
Ray and Its Growing Ecosystem
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Collaborative Media Annotation with YUMA
Collaborative Media Annotation with YUMACollaborative Media Annotation with YUMA
Collaborative Media Annotation with YUMA
 
ITWS Capstone Lecture (Spring 2013)
ITWS Capstone Lecture (Spring 2013)ITWS Capstone Lecture (Spring 2013)
ITWS Capstone Lecture (Spring 2013)
 
Dataiku pig - hive - cascading
Dataiku   pig - hive - cascadingDataiku   pig - hive - cascading
Dataiku pig - hive - cascading
 
Accessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBCAccessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBC
 
The Semantic Web: RPI ITWS Capstone (Fall 2012)
The Semantic Web: RPI ITWS Capstone (Fall 2012)The Semantic Web: RPI ITWS Capstone (Fall 2012)
The Semantic Web: RPI ITWS Capstone (Fall 2012)
 
Semantic Search for Enterprise 2.0
Semantic Search for Enterprise 2.0Semantic Search for Enterprise 2.0
Semantic Search for Enterprise 2.0
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013
 
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
NOSQL Overview Lightning Talk (Scalability Geekcruise 2009)
 
TOSCA in Practice with ARIA
TOSCA in Practice with ARIATOSCA in Practice with ARIA
TOSCA in Practice with ARIA
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Why Teams call analytics are critical to your entire business
Tokyotextmining#1 kaneyama genta

  • 2. @PENGUINANA_ (genta kaneyama) http://pcod.no-ip.org/ visualization http://b.hatena.ne.jp/pcod/nlp/ twitter (yats) 2009 Web
  • 6. 900 tweet/ 60 3 AND >400 …
  • 7. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
    Wikipedia, MEDLINE LDA
    Wikipedia "universal corpus"
  • 9. API http://pcod.no-ip.org/yats/genre
  • 11. API
  • 12. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
  • 13. "Unlike normal documents, these text & Web segments are usually noisier, less topic-focused, and much shorter, that is, they consist of from a dozen words to a few sentences. Because of the short length, they do not provide enough word co-occurrence or shared context for a good similarity measure."
  • 15. (Wikipedia) LDA (Wikipedia) ME (maximum entropy) LDA
  • 16. LDA(model) -> MaxEnt(classifier)
    LDA sparse text
    D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993–1022, 2003.
    SVM ME
    SVM
    SVM ( )
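The LDA(model) -> MaxEnt(classifier) pipeline on slide 16 feeds topic information from LDA into the classifier alongside the words of the short text. A minimal sketch of that feature-combination step follows. The function name, the `w:` / `topic:` feature prefixes, and the `cutoff` value are all hypothetical and not taken from the slides, and the topic distribution is assumed to have been inferred elsewhere.

```python
def combine_features(words, topic_probs, cutoff=0.05):
    """Merge bag-of-words features with hidden-topic features.

    words       -- tokens of one short text
    topic_probs -- topic distribution for that text (assumed to come from LDA inference)
    cutoff      -- hypothetical threshold; topics below it are ignored
    """
    features = {}
    # Word features: simple term counts.
    for w in words:
        key = "w:" + w
        features[key] = features.get(key, 0) + 1
    # Topic features: keep only topics with enough probability mass.
    for topic_id, p in enumerate(topic_probs):
        if p >= cutoff:
            features["topic:%d" % topic_id] = p
    return features
```

The resulting dictionary could then be handed to any classifier that accepts sparse features, such as a maximum entropy model.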
  • 17. Wikipedia Hidden topic
  • 18. Universal corpus
    Final data: 240MB; |docs| = 71,986; |paragraphs| = 882,376; |vocabulary| = 60,649; |total words| = 30,492,305
    Wikipedia HTML
    Stop word 30
  • 19. Universal corpus
  • 21. (MEDLINE)
  • 23. 200 stable
  • 25. API
  • 26. MySQL + Python + TokyoCabinet
    MeCab(+ ) + Python + MPICH2
    API
    Python + Django (or tornado) + apache (or nginx) + redis
  • 27. , , URL
    if (' ' in n.feature and not ' , ' in n.feature and not ' ' in n.feature)
    (Python+TokyoCabinet)
    cat dump.txt | python loadtoTC.py
    >df.addint(key,1)
    100
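Slide 27 increments a per-key counter with `df.addint(key,1)`, which looks like a document-frequency count stored in TokyoCabinet. The sketch below shows that counting step only, using `collections.Counter` as an in-memory stand-in for the TokyoCabinet database. The function name and the input format (one token list per document) are assumptions, and treating `df` as document frequency is a reading of the variable name, not something the slide states.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it.

    docs -- iterable of token lists, one list per document (assumed format)
    """
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # mirroring one addint(key, 1) call per distinct key.
        for term in set(tokens):
            df[term] += 1
    return df
```

An on-disk store matters once the counts no longer fit in memory, which is presumably why the original uses TokyoCabinet here.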
  • 28. plda … http://code.google.com/p/plda/
    N=50, alpha=0.5, beta=0.1
    alpha: *empirically* MIN(1, 50/num_topics)
    beta: *empirically* 0.1
    Master> mpd --daemon --listenport=55555
    Slave> mpd --daemon -h master -p 55555
    mpdtrace; mpdringtest
    mpiexec -n 8 ./mpi_lda --num_topics 50 --total_iterations 150 --alpha 1 --beta 0.1 --training_data_file ~/201005.txt --model_file /tmp/lda_model201005.txt
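The empirical hyperparameter rule on slide 28 (alpha = MIN(1, 50/num_topics), beta = 0.1) can be written as a small helper. The sketch below also assembles the `mpiexec` argument list using only the flags that appear on the slide. Both function names are hypothetical, and the command is only built here, not executed.

```python
def lda_hyperparams(num_topics):
    """Return (alpha, beta) following the empirical rule on the slide."""
    alpha = min(1.0, 50.0 / num_topics)
    beta = 0.1
    return alpha, beta


def build_mpi_lda_command(num_topics, iterations, training_file, model_file,
                          processes=8):
    """Assemble the mpi_lda invocation as an argument list (not executed)."""
    alpha, beta = lda_hyperparams(num_topics)
    return [
        "mpiexec", "-n", str(processes), "./mpi_lda",
        "--num_topics", str(num_topics),
        "--total_iterations", str(iterations),
        "--alpha", str(alpha),
        "--beta", str(beta),
        "--training_data_file", training_file,
        "--model_file", model_file,
    ]
```

With 50 topics the rule gives alpha = 1, which matches the `--alpha 1` value in the command on the slide.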
  • 30. Topic word list, label 4: word weights from 9249.0 down to 821.0 (words lost in extraction)
  • 31. Topic word list, label 1: word weights from 20262.0 down to 637.0 (words lost in extraction)
  • 32. Topic word list, label 2, ???: word weights from 56227.0 down to 608.0 (words lost in extraction)
  • 33. Topic word list, label 2, ???: same weights as slide 32
  • 34. Topic word list, label 6: word weights from 8448.0 down to 806.0 (words lost in extraction)
  • 35. Topic word list, label 9: word weights from 31845.0 down to 681.0 (words lost in extraction)
  • 37. Latent topic
    Latent topic
  • 38. API GET
    1.
    2. (redis)
    3. [Topic1:prob,Topic2:prob,] JSON
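The API flow on slide 38 mentions a GET request, redis, and a JSON list of topic probabilities. One plausible reading is a cache lookup before topic inference. The sketch below follows that reading with a plain dict standing in for redis and an `infer` callable standing in for the LDA inference step. The function name, the cache-by-text keying, and the exact JSON shape are assumptions, not details from the slide.

```python
import json


def topics_response(text, cache, infer):
    """Return a JSON string of topic probabilities for `text`.

    cache -- dict-like store (a stand-in for redis in this sketch)
    infer -- callable returning a list of topic probabilities (hypothetical)
    """
    cached = cache.get(text)
    if cached is not None:
        # Cache hit: skip inference entirely.
        return cached
    probs = infer(text)
    body = json.dumps(
        [{"topic": "Topic%d" % (i + 1), "prob": p} for i, p in enumerate(probs)]
    )
    cache[text] = body
    return body
```

Caching the serialized response rather than the raw probabilities keeps the hit path to a single lookup.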
  • 39. PLDA
    Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior. Wen-Yen Chen et al., WWW 2009. http://www.cs.ucsb.edu/~wychen/publications/fp365-chen.pdf
    …
    The role of semantic history on online generative topic modeling. L. AlSumait, D. Barbará, C. Domeniconi. http://www.ise.gmu.edu/~carlotta/publications/Siam_SemOLDA.pdf
    LDA …
    R LDA author facebook/data
    LDA
    Not-So-Latent Dirichlet Allocation: Collapsed Gibbs Sampling Using Human Judgments
    ePluribus: Ethnicity on Social Networks.
    http://www.facebook.com/data#!/data?v=app_4949752878