ETRAN 2008

    ETRAN 2008 - Presentation Transcript

    1. ETRAN 2008: A modified kNN method, SVM, and N-grams in automatic recommendation systems. Zoran Popović, Center for Multidisciplinary Studies, University of Belgrade, shoom013@gmail.com
    2. Automatic recommendation systems (ACRS and Information Retrieval)
       • automatic content recommendation systems (searching large numbers of documents, managing such data and metadata, searches and queries)
       • CBR/CBF (Content Based Retrieval, independent of the user), CF (Collaborative Filtering, dependent on the user)
       • user information need
    3. Nearest neighbor method (kNN)
       • The category is estimated from the given instances (the training set) and from weights determined by a given metric and/or other requirements (an extension toward CF characteristics)
       • A-O-V with numeric values; alternatively, IBA fuzzy attribute values
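The weighted vote described on the kNN slide can be sketched as follows. This is a minimal illustration, not the modified method from the talk: it assumes a Euclidean metric and inverse-distance weights, and the names `WeightedKnnSketch`, `Instance`, and `classify` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WeightedKnnSketch {

    /** One training instance: a numeric attribute vector plus its category. */
    static final class Instance {
        final double[] features;
        final String label;

        Instance(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance (an assumption; the slides only say "a given metric"). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the label with the largest inverse-distance-weighted vote among the k nearest instances. */
    static String classify(List<Instance> training, double[] query, int k) {
        List<Instance> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Instance t) -> distance(t.features, query)));

        Map<String, Double> votes = new HashMap<>();
        for (Instance t : sorted.subList(0, Math.min(k, sorted.size()))) {
            // Assumed weighting: closer neighbors count more; the epsilon avoids division by zero.
            double weight = 1.0 / (distance(t.features, query) + 1e-9);
            votes.merge(t.label, weight, Double::sum);
        }

        String best = null;
        double bestVote = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : votes.entrySet()) {
            if (e.getValue() > bestVote) {
                best = e.getKey();
                bestVote = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Instance> training = Arrays.asList(
                new Instance(new double[] {0.0, 0.0}, "a"),
                new Instance(new double[] {3.0, 0.0}, "b"),
                new Instance(new double[] {3.1, 0.0}, "b"));
        System.out.println(classify(training, new double[] {0.1, 0.0}, 3));
    }
}
```

With k = 3 an unweighted majority vote over this toy data would favor the two distant "b" instances; the weights let the single very close "a" instance dominate, which is the point of weighting the vote by the metric.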
    4. N-grams
       • an N-gram is a substring of a given string over a given alphabet of tokens
       • N-gram profiles
       • frequencies and inverse frequencies
       • NLP and N-grams, multigrams
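To make the notions on the N-gram slide concrete, here is a small sketch of an N-gram profile with relative frequencies and an inverse frequency. It assumes one token is one `char` and reads "inverse frequency" as the usual inverse document frequency, log(N / number of documents containing the N-gram); neither choice is confirmed by the slides, and the class and method names are hypothetical rather than taken from ngram.jar.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NGramProfileSketch {

    /** Counts every contiguous substring of length n (assumption: one token = one char). */
    static Map<String, Integer> counts(String text, int n) {
        Map<String, Integer> result = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            result.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return result;
    }

    /** Turns raw counts into relative frequencies that sum to 1. */
    static Map<String, Double> frequencies(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }

    /** Assumed definition: log(number of profiles / number of profiles containing the gram). */
    static double inverseDocumentFrequency(String gram, List<Map<String, Integer>> profiles) {
        int containing = 0;
        for (Map<String, Integer> profile : profiles) {
            if (profile.containsKey(gram)) {
                containing++;
            }
        }
        return containing == 0 ? 0.0 : Math.log(profiles.size() / (double) containing);
    }

    public static void main(String[] args) {
        Map<String, Integer> first = counts("abab", 2);
        Map<String, Integer> second = counts("cdcd", 2);
        System.out.println(frequencies(first));
        System.out.println(inverseDocumentFrequency("ab", Arrays.asList(first, second)));
    }
}
```

Multiplying a frequency by the inverse frequency gives a tf-idf style weight, which down-weights N-grams that occur in almost every document.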
    5. SVM, MIL, SMO
       • classification methods based on margin maximization
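For reference, margin maximization is usually written as the textbook hard-margin SVM problem below. The slide gives no formula, so the symbols are assumptions: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, and bias $b$.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1,\dots,m
```

The separating margin has width $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes it. SMO (sequential minimal optimization) is a method for solving the dual of this quadratic program by optimizing two multipliers at a time.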
    6. ngram.jar
       • java -Xms1500M -Xmx1500M -cp .\ngram.jar ngram.generator.Arff %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14
         arff.cmd . .\out.arff -l 1 -m 500 -N 4 -i 0.5 -D 1048576
         (subdirectories containing files are treated as classes)
       • http://users.hemo.net/shoom/n-gram.zip
         http://users.hemo.net/shoom/samples.zip
       • http://users.hemo.net/shoom/mustAgent.zip
    7. Results
       • classes 1-8 from 210 files (about 5 MB total): 22 exe/com, 23 text, 56 html, 17 pdf, 33 gif/jpg, 19 jar, 30 Word, 13 mail

       N   i-threshold   Lmax   % correct / not   N-grams   seconds
       4   0.34           500   98.48 / 1.52      2094646   6.27
       4   0.5            500   94.83 / 5.16      2094646   6.16
       4   0.5            500   94.83 / 5.16      1048576   6.28
       3   0.34           500   97.17 / 2.83      1807820   6.8
       4   0.34           800   98.10 / 1.90      2094646   8.14
       5   0.34           800   97.12 / 2.88      2247852   8.38
       4   0.34          1000   94.76 / 5.24      2094646   8.28
       2   0.34           800   92.16 / 7.84        65536   7.25
    8. Weka, Eclipse
    9. Example ARFF file
       @relation rel
       @attribute bag_id {bag0,bag1,bag2,bag3,bag4,bag5,...2}
       @attribute bag relational
         @attribute a1 numeric
         @attribute a2 numeric
         ....
       @end bag
       @attribute class {1,2,3,4,5,6,7,8}
       @data
       bag0,"{41 0.2148861237401014, 42 0.13430382733756338, 47 0.1074430618700507, ..., 495 0.05372153093502535}",1
       bag1,"{....}",8
       ....
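The bag layout in the ARFF example can be produced with a few lines of plain Java. The sketch below only mirrors the text layout shown on the slide; it is not the code from ngram.jar, and `BagArffSketch`, `header`, and `dataRow` are hypothetical names. It assumes each bag holds one sparse instance given as a map from attribute index to weight.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class BagArffSketch {

    /** Builds the header: bag ids, a relational "bag" attribute with numeric a1..aN, and classes 1..K. */
    static String header(List<String> bagIds, int numAttributes, int numClasses) {
        StringBuilder sb = new StringBuilder("@relation rel\n");
        sb.append("@attribute bag_id {").append(String.join(",", bagIds)).append("}\n");
        sb.append("@attribute bag relational\n");
        for (int i = 1; i <= numAttributes; i++) {
            sb.append("@attribute a").append(i).append(" numeric\n");
        }
        sb.append("@end bag\n");
        StringJoiner classes = new StringJoiner(",", "{", "}");
        for (int c = 1; c <= numClasses; c++) {
            classes.add(Integer.toString(c));
        }
        sb.append("@attribute class ").append(classes).append("\n@data\n");
        return sb.toString();
    }

    /** Formats one @data row: bag id, quoted sparse instance "{index weight, ...}", class label. */
    static String dataRow(String bagId, Map<Integer, Double> sparseWeights, int classLabel) {
        StringJoiner cells = new StringJoiner(", ", "{", "}");
        // TreeMap keeps the attribute indices in ascending order, as in the slide's example row.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(sparseWeights).entrySet()) {
            cells.add(e.getKey() + " " + e.getValue());
        }
        return bagId + ",\"" + cells + "\"," + classLabel;
    }

    public static void main(String[] args) {
        Map<Integer, Double> weights = new HashMap<>();
        weights.put(42, 0.5);
        weights.put(41, 0.25);
        System.out.print(header(Arrays.asList("bag0", "bag1"), 2, 8));
        System.out.println(dataRow("bag0", weights, 1));
    }
}
```

A file written this way should be checked by loading it in Weka before use, since the exact index base and quoting expected for sparse relational values are not stated on the slide.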

    shoom013, 5 months ago

    A Java-based n-gram generator in ARFF format for We...

    © All Rights Reserved
