Mining Wikipedia For Awesome Data

Published in: Technology

  1. Mining Wikipedia for Awesome Data Neil Crosby
  2. What’s this about then? • There’s loads of groovy content on Wikipedia[citation needed]. • You are lazy. • You want groovy content on your site.
  3. Wikipedia has an API • Who knew? • http://en.wikipedia.org/w/api.php
  4. API has lots of options
       Param      Values             What does it do?
       format     php, json, TODO    Output format.
       redirects  0, 1               Redirect to good pages.
       rvsection  0, 1, 2, 3, etc    Page section to get data for.
       action     query, parse       API method.
  5. Getting WikiText? Easy • http://en.wikipedia.org/w/api.php?format=php&action=query&titles=one+flew+over+the+cuckoo's+nest&rvprop=content&prop=revisions&redirects=1
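The query string on this slide can be assembled rather than typed by hand. A minimal sketch, assuming a helper called wikiTextUrl() (a made-up name); the parameters are the ones shown on the slide, and http_build_query() takes care of the URL encoding:

```php
<?php
// Build a MediaWiki API URL asking for the latest revision's WikiText.
// wikiTextUrl() is a hypothetical helper name; the params come from the slide.
function wikiTextUrl(string $title): string
{
    $params = [
        'format'    => 'php',       // serialised PHP output, as on the slide
        'action'    => 'query',
        'titles'    => $title,
        'rvprop'    => 'content',
        'prop'      => 'revisions',
        'redirects' => 1,
    ];
    // http_build_query() URL-encodes every value (spaces become "+").
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query($params, '', '&');
}

echo wikiTextUrl("one flew over the cuckoo's nest"), "\n";
```

Encoding the title this way also copes with apostrophes and brackets in article names, which are easy to get wrong when pasting URLs together.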
  6. Searching? Harder • Wikipedia doesn’t have a good search engine.
  7. Use Yahoo! BOSS • http://boss.yahooapis.com/ysearch/web/v1/site:en.wikipedia.org+one+flew+over+the+cuckoo's+nest?appid=yourBOSSiD • First result: http://en.wikipedia.org/wiki/One_Flew_Over_the_Cuckoo's_Nest_(film)
  8. Then get WikiText • http://en.wikipedia.org/w/api.php?format=php&action=query&titles=One_Flew_Over_the_Cuckoo's_Nest_(film)&rvprop=content&prop=revisions&redirects=1
  9. The WikiText '''''One Flew Over the Cuckoo's Nest''''' is a [[1975 in film|1975]] [[comedy-drama]] film [[film director|directed]] by [[Miloš Forman]]. The film is an adaptation of the 1962 novel ''[[One Flew Over the Cuckoo's Nest (novel)|One Flew Over the Cuckoo's Nest]]'' by [[Ken Kesey]]. The movie was the first to [[List of Big Five Academy Award winners and nominees|win all five]]...
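Getting from the API response to the WikiText string takes a little digging. A minimal sketch, assuming the unserialised response is shaped query > pages > (page id) > revisions > 0 > '*'; that layout and the name extractWikiText() are assumptions, so check them against a real response before relying on this:

```php
<?php
// Pull the WikiText out of an unserialised format=php response.
// ASSUMPTION: the array is shaped query > pages > <page id> > revisions > 0 > '*'.
// extractWikiText() is a hypothetical helper name.
function extractWikiText(array $response): ?string
{
    $pages = $response['query']['pages'] ?? [];
    foreach ($pages as $page) {
        if (isset($page['revisions'][0]['*'])) {
            return $page['revisions'][0]['*'];
        }
    }
    return null; // missing page, or a layout this sketch does not expect
}

// Hand-made sample standing in for unserialize($body); not real API output.
$sample = ['query' => ['pages' => [
    '12345' => ['title' => 'Example', 'revisions' => [['*' => "'''Example''' text"]]],
]]];
echo extractWikiText($sample), "\n";
```

Calling unserialize() on data fetched over the network is risky in general; asking for format=json and using json_decode($body, true) should yield the same kind of array with less exposure.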
  10. But I wanted HTML! • WikiText is no good for dumping into a website.
  11. Another API call • http://en.wikipedia.org/w/api.php?action=parse&format=php&text=returned+wiki+text • Text will be big - do as a POST.
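Sending the text as a POST mostly means moving the query string into the request body. A minimal sketch, assuming a helper called parseRequest() (a made-up name) that only describes the request; the cURL calls that would send it are left as comments so nothing touches the network here:

```php
<?php
// Describe the action=parse call as a POST: endpoint plus url-encoded body.
// parseRequest() is a hypothetical helper; it sends nothing by itself.
function parseRequest(string $wikiText): array
{
    return [
        'url'  => 'http://en.wikipedia.org/w/api.php',
        'body' => http_build_query([
            'action' => 'parse',
            'format' => 'php',
            'text'   => $wikiText,   // can be large, hence POST rather than GET
        ], '', '&'),
    ];
}

$request = parseRequest("'''Bold''' and [[link]]");
echo $request['body'], "\n";

// To send it with cURL (sketch only):
//   $ch = curl_init($request['url']);
//   curl_setopt($ch, CURLOPT_POST, true);
//   curl_setopt($ch, CURLOPT_POSTFIELDS, $request['body']);
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   $response = curl_exec($ch);
```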
  12. Wiki HTML! <p><i><b>One Flew Over the Cuckoo's Nest</b></i> is a <a href="/wiki/1975_in_film" title="1975 in film">1975</a> <a href="/wiki/Comedy-drama" title="Comedy-drama">comedy-drama</a> film <a href="/wiki/Film_director" title="Film director">directed</a> by <a href="/wiki/Milo%C5%A1_Forman" title="Miloš Forman">Miloš Forman</a>. The film is an...
  13. Reducing the HTML • DOMDocument->loadHTML() • DOMXPath->query() to get wanted nodes. • DOMDocument->saveHTML() • str_replace() away HTML boilerplate.
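The load, query, save steps above can be sketched roughly as follows. This is not WikiSlurp's own code: reduceHtml() is a made-up name, and instead of saving the whole document and using str_replace() to strip boilerplate, this variant serialises each matched node on its own with saveHTML($node), which sidesteps the boilerplate altogether:

```php
<?php
// Keep only the nodes matched by an XPath query and return them as HTML.
// reduceHtml() is a hypothetical helper, not part of WikiSlurp.
function reduceHtml(string $html, string $query): string
{
    $previous = libxml_use_internal_errors(true); // keep sloppy markup quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);                        // wraps fragments in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false) {
        return '';                                // the XPath expression was invalid
    }

    $out = '';
    foreach ($nodes as $node) {
        $out .= $doc->saveHTML($node);            // serialise just this node
    }
    return $out;
}

echo reduceHtml('<p>One</p><p>Two</p><div>Three</div>', '/html/body/p[position()<=1]'), "\n";
```

Because loadHTML() wraps a fragment in html and body elements, queries starting with /html/body work even on partial markup. Non-ASCII text such as "Miloš" may need an explicit encoding hint before loading, which this sketch leaves out.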
  14. The Cuckoo Problem • “One Flew Over the Cuckoo’s Nest” • A book? • A film? • Depends on context.
  15. The Cuckoo Solution • Give context: • “one flew over the cuckoo’s nest book” • “one flew over the cuckoo’s nest movie” • Yahoo! BOSS gives relevant result. Yay.
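Adding context amounts to appending a word to the search terms before they go into the BOSS URL. A minimal sketch, assuming the URL pattern shown earlier in the talk; bossSearchUrl() and the placeholder app id are illustrative only:

```php
<?php
// Compose a BOSS search URL restricted to en.wikipedia.org, with optional context.
// bossSearchUrl() is a hypothetical helper; 'yourBOSSiD' is a placeholder app id.
function bossSearchUrl(string $query, string $context = '', string $appId = 'yourBOSSiD'): string
{
    $terms = trim($query . ' ' . $context);   // e.g. "wii fit" or "wii fit electronics"
    return 'http://boss.yahooapis.com/ysearch/web/v1/'
        . 'site:en.wikipedia.org+' . urlencode($terms)
        . '?appid=' . urlencode($appId);
}

echo bossSearchUrl("one flew over the cuckoo's nest", 'book'), "\n";
echo bossSearchUrl("one flew over the cuckoo's nest", 'movie'), "\n";
```

Keeping the context as a separate argument makes it easy to retry without it when the extra word pulls the search off course.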
  16. There’s still a problem... • Sometimes you can give too much context. • “wii fit” gets expected result. • “wii fit electronics” returns “WiiMote”. • Oh dear.
  17. When is too much? • Who knows? • Just because an article exists for the basic term doesn’t mean it’s the right article. • I’ve not solved this yet.
  18. It’s all too complicated • So don’t do it all. • Use a library. • http://thecodetrain.co.uk/code/wikislurp
  19. Runs as a web service • http://yoursite.com/wikislurp/?params=blah
  20. What are the params?
        Param    Meaning
        secret   Your self-chosen appid.
        query    What you’d like wiki info about.
        context  A little bit of context.
        section  Article section to retrieve. Zero indexed.
        xpath    Specify the elements to return.
        output   Serialised php or json.
  21. What does it return? • An array. • Keys for “url”, “title” and “article”.
  22. Why a webservice? • You can’t abandon a function call in PHP. • You can abandon a CURL call. • If wikislurp takes too long, move on.
  23. Kitten Break There’s some code coming up, soz. http://www.flickr.com/photos/gsx-r750/1475603952/
  24. How to call WikiSlurp • http://yoursite.com/wikislurp/?secret=YOUR+SECRET&query=one+flew+over+the+cuckoo's+nest&context=book&xpath=/html/body/p[position()<=3]&section=0&output=json
  25. And from PHP?
        $s = curl_init();
        curl_setopt($s, CURLOPT_URL, $url);
        curl_setopt($s, CURLOPT_HEADER, false);
        curl_setopt($s, CURLOPT_RETURNTRANSFER, 1);
        // wait 1 second, then abort
        curl_setopt($s, CURLOPT_TIMEOUT, 1);
        $result = curl_exec($s);
        curl_close($s);
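Two pieces sit either side of that cURL call: building $url and deciding what to do with $result. A minimal sketch, assuming output=json and the "url", "title" and "article" keys mentioned earlier; wikiSlurpUrl() and decodeSlurp() are made-up helper names:

```php
<?php
// Build the WikiSlurp URL from the documented params (hypothetical helper).
function wikiSlurpUrl(string $base, array $params): string
{
    return $base . '?' . http_build_query($params, '', '&');
}

// Turn the raw cURL result into an array, or null when the call was
// abandoned (curl_exec() returned false) or the payload is not usable.
function decodeSlurp($raw): ?array
{
    if (!is_string($raw)) {
        return null;                      // timed out or failed: move on
    }
    $data = json_decode($raw, true);
    if (!is_array($data)) {
        return null;                      // not JSON
    }
    foreach (['url', 'title', 'article'] as $key) {
        if (!array_key_exists($key, $data)) {
            return null;                  // ASSUMPTION: all three keys are always present
        }
    }
    return $data;
}

$url = wikiSlurpUrl('http://yoursite.com/wikislurp/', [
    'secret'  => 'YOUR SECRET',
    'query'   => "one flew over the cuckoo's nest",
    'context' => 'book',
    'xpath'   => '/html/body/p[position()<=3]',
    'section' => 0,
    'output'  => 'json',
]);
echo $url, "\n";
```

With this in place the calling page can do $info = decodeSlurp($result); and simply skip the Wikipedia box when $info is null.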
  26. XPath?
        Query                          Gives You
        //p                            All <p>
        /html/body/p                   All <p> directly under <body>
        /html/body/p[2]                2nd <p> directly...
        /html/body/p[position()<=3]    First three <p> directly...
  27. Oh noes, more XPath
        Query                                                      Gives You
        /html/body/p[@class='fish']                                All <p> with single class “fish”
        /html/body/p[contains(concat(" ",@class," ")," fish ")]    All <p> with any class including “fish”
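The difference between the two class queries is easiest to see by running them against the same markup. A minimal sketch using PHP's DOM extension; countMatches() and the sample paragraphs are made up for illustration:

```php
<?php
// Count how many nodes an XPath query matches in an HTML fragment.
// countMatches() is a hypothetical helper used only for this comparison.
function countMatches(string $html, string $query): int
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($query);
    return $nodes === false ? 0 : $nodes->length;
}

$html = '<p class="fish">a</p><p class="big fish">b</p><p class="fishy">c</p>';

// Exact match: only class="fish" qualifies.
$exact = countMatches($html, "/html/body/p[@class='fish']");

// Token match: padding @class with spaces lets " fish " match "fish" and
// "big fish" while still rejecting "fishy".
$token = countMatches($html, "/html/body/p[contains(concat(' ',@class,' '),' fish ')]");

echo "exact: $exact, token: $token\n";
```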
  28. Phew. Have another kitten. http://www.flickr.com/photos/evapro/305689596/
  29. Future Features • Do something intelligent with context. • Convert to HTML without an extra API call. • Return proper error codes if things go wrong.
  30. Where is this used? • TheTenWordReview.com • IsNeilAnnoyedBy.com
  31. Questions? • I will blog about this talk at The Code Train. • No, really - I will. • Download the slurpy source code from http://thecodetrain.co.uk/code/wikislurp • Slides? http://icanhaz.com/wikislurpslides • I was and am http://NeilCrosby.com/vcard
