Making sense out of things on the web
Published in: Technology, Design
  • Welcome to Openhackindia
  • I am Pradeep BV. I work as an engineer in YDN. My talk is about making sense out of things on the web.
  • A lot of it since time immemorial
  • The world's earliest dated printed book, AD 868 - The intricate frontispiece of the Diamond Sutra from Tang Dynasty China, (British Museum) - http://en.wikipedia.org/wiki/Woodblock_printing
  • Printing press: enabled mass production. This woodcut from 1568 shows the left printer removing a page from the press while the one at right inks the text-blocks. Such a duo could reach 14,000 hand movements per working day, printing around 3,600 pages in the process. Woodcut print dated 1423 of St. Christopher from Buxheim on the Upper Rhine - http://en.wikipedia.org/wiki/Printing_press
  • We then started printing musical notes: early signs of converting everything into data? The Harmonice Musices Odhecaton (also known simply as the Odhecaton) was an anthology of secular songs published by Ottaviano Petrucci in 1501 in Venice. It was the first book of music ever to be printed using movable type, and was hugely influential both in publishing in general, and in dissemination of the Franco-Flemish musical style. - http://en.wikipedia.org/wiki/Odhecaton
  • And then we started transmitting data. On 24 May 1844, after the line was completed, Morse made the first public demonstration of his telegraph by sending a message from the Supreme Court Chamber in the U.S. Capitol in Washington, D.C. to the B&O Railroad "outer depot" (now the B&O Railroad Museum) in Baltimore.
  • Across continents… the first internet?
  • And then we graduated into communicating…. 1876
  • And then came broadcasting. Tesla demonstrating wireless transmissions during his high frequency and potential lecture of 1891. After continued research, Tesla presented the fundamentals of radio in 1893. http://en.wikipedia.org/wiki/Radio#History
  • And then the TV
  • The first group of networked computers communicated with each other in 1969, and ARPANET, or the Advanced Research Projects Agency Network, became the start of the internet. Four U.S. universities were connected and became a research system by which computer scientists began solving problems and building the potential for worldwide, online connectivity. ARPANET had its first public demonstration in 1972 - http://www.elon.edu/e-web/predictions/150/1960.xhtml and http://personalpages.manchester.ac.uk/staff/m.dodge/cybergeography/atlas/historical.html
  • And the internet happened http://www.ddsmedia.net/blog/2009/10/40-anos-del-primer-mensaje-en-una-red-arpanet/
  • Closely followed by email… http://www.multicians.org/thvv/mail-history.html
  • In 1991, the World Wide Web was developed by Tim Berners-Lee (pictured at left) as a way for people to share information. The hyper-text format available through his Web made the internet much easier to use because all documents could be seen easily on-screen without downloading.
  • Mosaic was developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign beginning in late 1992. NCSA released the browser in 1993, and officially discontinued development and support on January 7, 1997. However, it can still be downloaded from NCSA - http://en.wikipedia.org/wiki/Mosaic_(web_browser)
  • It started gaining traction
  • And the venture capitalists happened.
  • So the internet was no longer limited to just data
  • And we became one tiny speck in the large information universe. http://electrokami.com/wp-content/uploads/2010/09/the-internet-in-real-life.jpg
  • http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
  • The problem is only getting worse. Kilo, mega, giga, tera, peta, exa, zetta, yotta - http://en.wikipedia.org/wiki/Orders_of_magnitude_(data) http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
  • Not being able to find stuff can lead to tough situations…. http://www.bcbusinessonline.ca/bcb/bc-blogs/conference/2010/09/24/11-dumbest-quotes-powerful-people
  • http://www.boston.com/bostonglobe/ideas/articles/2010/11/28/information_overload_the_early_years/# http://blogs.reuters.com/chrystia-freeland/2011/09/23/yuri-milner-on-the-future-of-the-internet/
  • This is an information overload. http://blogs.tusc.k12.al.us/bhslibrary/files/2012/01/Information_overload.jpg http://visual.ly/failed-tech-predictions-1 http://www.boston.com/bostonglobe/ideas/articles/2010/11/28/information_overload_the_early_years/?page=full
  • Sifting and sorting can become tough once you have tonnes of stuff to look at. http://www.teachersdiary.com/.a/6a0115703931fc970c0128765537ba970c-800wi
  • Did I say needle in a haystack? - http://www.flickr.com/photos/special/1597251/
  • Talk about the secret sauce
  • http://developer.yahoo.com/search/boss/#pricing
  • Let's see what's in it to begin with
  • You can use YQL tables to get started.
  • A.K.A. a big waste of time
  • Add more
  • Why can’t it be as easy as this? Do you know the person? Do you value his/her opinion? How do you analyze the sentiment behind the message?
  • Fix these links
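The note about starting with YQL tables refers to the `boss.search` table used in the slide transcript, which is filtered with the keys `q`, `service`, `sites`, `ck` and `secret`. Below is a minimal sketch of assembling those statements in Python. The helper names (`yql_quote`, `yql_select`, `boss_search`) are hypothetical, the quote-escaping rule is an assumption rather than documented YQL behavior, and the `"..."` credentials are the same placeholders the slides use. The sketch only builds the statement text; it does not send anything.

```python
from typing import Optional, Sequence


def yql_quote(value: str) -> str:
    """Wrap a value in double quotes.

    Assumption: backslash-escaping embedded quotes is acceptable to YQL;
    the slides never show a value that contains a quote.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def yql_select(table: str, **where: str) -> str:
    """Build `select * from <table> where k="v" and ...` in keyword order."""
    conditions = " and ".join(f"{key}={yql_quote(val)}" for key, val in where.items())
    return f"select * from {table} where {conditions}"


def boss_search(
    query: str,
    ck: str,
    secret: str,
    service: Optional[str] = None,
    sites: Optional[Sequence[str]] = None,
) -> str:
    """Build a boss.search statement using the keys shown on the slides."""
    where = {"q": query}
    if service:
        where["service"] = service  # e.g. "images", "news", "spelling"
    if sites:
        where["sites"] = ",".join(sites)  # comma-separated, as on the slides
    where["ck"] = ck
    where["secret"] = secret
    return yql_select("boss.search", **where)


if __name__ == "__main__":
    # "..." stands in for real credentials, exactly as on the slides.
    print(boss_search("The Dark Knight Rises", "...", "...", service="images"))
    print(yql_select("contentanalysis.analyze", url="http://www.cnn.com/"))
```

Because `yql_select` takes the table name as a parameter, the same helper can produce the `contentanalysis.analyze` statements from the end of the deck.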

    1. MAKING SENSE OUT OF THINGS ON THE WEB @pradeepbv
    2. We have been accumulating a lot of information
    3. http://en.wikipedia.org/wiki/File:Jingangjing.jpg
    4. http://en.wikipedia.org/wiki/File:Printer_in_1568-ce.png http://en.wikipedia.org/wiki/File:BuxheimStChristopher.jpg
    5. http://en.wikipedia.org/wiki/Odhecaton
    6. What hath God wrought http://upload.wikimedia.org/wikipedia/commons/f/f1/The_First_Telegraph.jpg
    7. 1891 Telegraph Lines http://en.wikipedia.org/wiki/File:1891_Telegraph_Lines.jpg
    8. Mr Watson, come here, I want to see you http://www.boerner.net/jboerner/?p=9396
    9. Radio
    10. http://www.elon.edu/e-web/predictions/150/1930.xhtml
    11. (no text on slide)
    12. (no text on slide)
    13. (no text on slide)
    14. www
    15. http://en.wikipedia.org/wiki/File:NCSA_Mosaic.PNG
    16. The Internet had an estimated 16 million users by 1995
    17. http://en.wikipedia.org/wiki/Venture_capital
    18. People from all over the world started sharing their interests, hopes and dreams online
    19. (no text on slide)
    20. http://electrokami.com/wp-content/uploads/2010/09/the-internet-in-real-life.jpg
    21. The number of devices connected to IP networks will be nearly three times as high as the global population in 2016
    22. kilo, mega, giga, tera, peta, exa, zetta, yotta. The Zettabyte Era: 9,444,732,965,739,290,427,392 bits (1024 exbibytes). http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
    23. “Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.” Donald Rumsfeld, US Defense Secretary, at a press conference at NATO Headquarters, Brussels, Belgium, June 6, 2002. Image: planetization.org
    24. Nicholas Carr worries that the flood of digital information is changing not only our habits, but even our mental capacities: forced to scan and skim to keep up, we are losing our abilities to pay sustained attention, reflect deeply, or remember what we’ve learned.
    25. Information overload? http://blogs.tusc.k12.al.us/bhslibrary/files/2012/01/Information_overload.jpg
    26. DO YOU KNOW WHAT YOU ARE LOOKING FOR? http://www.teachersdiary.com/.a/6a0115703931fc970c0128765537ba970c-800wi
    27. DO YOU KNOW WHERE TO FIND WHAT YOU WANT? http://www.flickr.com/photos/special/1597251/
    28. REGULAR SEARCH #FAIL? http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
    29. IS THERE A SUPERHERO WHO CAN HELP? http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
    30. BUILD YOUR OWN SEARCH SERVICE. Yes, you are the superhero
    31. BOSS IS BUILD YOUR OWN SEARCH SERVICE http://developer.yahoo.com/search/boss/
    32. BOSS PROVIDES APIS TO OUR SEARCH DATA STORES
    33. TO BUILD YOUR OWN POWERFUL SEARCH APPLICATIONS
    34. BOSS allows you to search over Web, images, news & blogs
    35. You can even monetize your applications using Search Ads from BOSS and get support.
    36. What can be done on top of BOSS?
        • Blend and re-rank search results
        • Your own look and feel
        • Mix it with other APIs
    37. BOSS Pricing
    38. Free for building your hacks!!
    39. Where do I start?
    40. What’s in it? RESTful XML and JSON API
        • Web
        • Image
        • Spelling
        • News
        • Search Ads
        http://www.flickr.com/photos/joeshlabotnik/419914250/sizes/o/in/photostream/.jpg
    41. OAuth-based authentication http://www.flickr.com/photos/friarsbalsam/5736126308/sizes/o/in/photostream/.jpg
    42. What else do I get?
        • Web and Limited Web results
        • Image attributes like height, width, etc.
        • Time span filtering for News Search
        • Document type filtering
        • Extended abstracts
        http://www.flickr.com/photos/acidpix/6021203584/sizes/o/in/photostream/.jpg
    43. BOSS + YQL
        • Table name: boss.search
        • Parameters (name, key, example):
            Consumer Key, ck, -
            Consumer Secret, secret, -
            Query Term, q, 'iitd'
        • e.g. select * from boss.search where ck=… and secret=… and q='openhackindia'
    44. Searching “The Dark Knight”
    45. Finding images of “The Dark Knight Rises”
        select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."
    46. Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com
        select * from boss.search where q="The Dark Knight Rises" and sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."
    47. Spell Check and Correction
        select * from boss.search where q="The Dark Knight Rises" and service="spelling" and ck="..." and secret="..."
    48. Finding news on “The Dark Knight Rises”
        select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."
    49. And through the BOSS API
        • Getting multiple data sets
            /ysearch/web,images,news?q=anna
            /ysearch/web,images,news?web.q=anna&images.q=anna&news.q=lokpal
        • Searching through sites: a simple movie search
            /ysearch/web?q="Dark Knight"&sites=movies.yahoo.com,netflix.com,imdb.com
        • AND/OR operators
            /ysearch/web?q="steve jobs"AND((ipad)OR(iphone))&sites=bestbuy.com,newegg.com
        • Important: use parentheses or quotes
    50. Unary Operators
        • Search for Batman but not “Dark Knight”
            q=(batman -"Dark Knight")
        • Find pages with “Heath Ledger” but not “Dark Knight”
            q=+"heath ledger"-"Dark Knight"&sites=movies.yahoo.com
        • Force auto-spelling off
            q=+"drk knight"
    51. Searching in body and in title
        • Searching for Dark Knight in the title on Yahoo movies
            q=reviews intitle:"dark knight"&sites=movies.yahoo.com
        • Searching for Dark Knight in the title on Yahoo movies, on pages containing Christian Bale
            q=reviews intitle:"dark knight" inbody:"christian bale"&sites=movies.yahoo.com
    52. Market and document specific filters
        • Search for “Dark Knight” in India-specific sites
            q="Dark Knight"&market=en-in
        • Search for PDFs containing “Dark Knight”
            q="Dark Knight"&type=pdf
        • Search for MS Office documents (except PPTs) containing “Dark Knight”
            q="Dark Knight"&type=msoffice,-ppt
    53. Output
    54. Image search parameters
        • Search for images that are not offensive
            /ysearch/images?q="san francisco"&filter=yes
        • Search for images that are wallpaper size
            /ysearch/images?q="san francisco"&dimensions=wallpaper
        • Search for an image at a certain referer URL
            /ysearch/images?q=yahoo&refererurl=http://www.flickr.com
        • Interesting output fields: format, file size, height, width, title, total result count
    55. News search parameters
        • Search news that is less than 7 days old
            /ysearch/news?q=lokpal&age=7d
        • Search news that is between 20 hrs and 2 days old
            /ysearch/news?q=lokpal&age=20h2d
        • Re-rank news results by date
            /ysearch/news?q=lokpal&ranking=true
        • Interesting output fields: Source, Date, Source URL
    56. EXAMPLE HACKS
    57. Duckduckgo.com
    58. Interceder
    59. Ask-boss (v1)
        Hack: http://ask-boss.appspot.com
        Code: https://github.com/saurabhsahni/Hacks/tree/master/askBOSS
    60. webmeme.in
    61. http://hackyourworld.org/~iitb_pacman/search/
    62. I did BOSS and got data; now how do I extract information out of it?
    63. Make sense out of it?
    64. Content Analysis
        select * from contentanalysis.analyze where text="Yahoo! kicks off hackday"
    65. Content Analysis from a URL
        select * from contentanalysis.analyze where url="http://www.cnn.com/"
    66. Term Extraction
        select * from search.termextract where context in (select description from rss where url='')
    67. More resources
        • Yahoo! BOSS: http://developer.yahoo.com/boss
        • BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/
        • YQL: http://developer.yahoo.com/yql
        • Amazon Web Services: http://aws.amazon.com
        • OAuth: http://oauth.net/
        • Open Data: http://theinfo.org
        • Alt Search Engines: http://www.altsearchengines.com/
    68. Happy hacking!
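The `/ysearch/...` request paths on slides 49 to 55 all follow one pattern: a comma-separated list of services, then query parameters such as `q`, `sites`, `age` or `web.q`. Here is a minimal sketch of composing such paths in Python. The helper names (`boss_path`, `intitle`, `inbody`) are hypothetical. Percent-encoding the query values is an assumption, since the slides show raw quotes and spaces, presumably for readability. The host name and the OAuth signing step are left out because the slides do not spell them out.

```python
from typing import Mapping, Sequence
from urllib.parse import quote, urlencode


def intitle(phrase: str) -> str:
    """Restrict a phrase to the page title, as on slide 51."""
    return f'intitle:"{phrase}"'


def inbody(phrase: str) -> str:
    """Restrict a phrase to the page body, as on slide 51."""
    return f'inbody:"{phrase}"'


def boss_path(services: Sequence[str], params: Mapping[str, str]) -> str:
    """Build a /ysearch/<services>?<query> path.

    Values are percent-encoded (assumption); commas are kept literal so
    lists such as sites=a.com,b.com read the same way as on the slides.
    """
    query = urlencode(params, safe=",", quote_via=quote)
    return f"/ysearch/{','.join(services)}?{query}"


if __name__ == "__main__":
    # One query across several services (slide 49).
    print(boss_path(["web", "images", "news"], {"q": "anna"}))
    # A different query per service (slide 49).
    print(boss_path(
        ["web", "images", "news"],
        {"web.q": "anna", "images.q": "anna", "news.q": "lokpal"},
    ))
    # Title and body operators restricted to one site (slide 51).
    q = " ".join(["reviews", intitle("dark knight"), inbody("christian bale")])
    print(boss_path(["web"], {"q": q, "sites": "movies.yahoo.com"}))
    # News from the last 7 days (slide 55).
    print(boss_path(["news"], {"q": "lokpal", "age": "7d"}))
```

Taking the parameters as a mapping rather than keyword arguments allows dotted keys such as `web.q`, which are not valid Python identifiers.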