Making sense out of things on the web

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment
  • Welcome to Openhackindia
  • I am Pradeep BV. I work as an engineer in YDN. My talk is about making sense out of things on the web.
  • A lot of it since time immemorial
  • The world's earliest dated printed book (AD 868): the intricate frontispiece of the Diamond Sutra from Tang Dynasty China (British Museum) - http://en.wikipedia.org/wiki/Woodblock_printing
  • Printing press: enabled mass production. This woodcut from 1568 shows the printer at left removing a page from the press while the one at right inks the text-blocks. Such a duo could reach 14,000 hand movements per working day, printing around 3,600 pages in the process. Also shown: a woodcut print of St. Christopher, dated 1423, from Buxheim on the Upper Rhine - http://en.wikipedia.org/wiki/Printing_press
  • We then started printing musical notes: early signs of converting everything into data? The Harmonice Musices Odhecaton (also known simply as the Odhecaton) was an anthology of secular songs published by Ottaviano Petrucci in 1501 in Venice. It was the first book of music ever to be printed using movable type, and was hugely influential both in publishing in general and in the dissemination of the Franco-Flemish musical style. - http://en.wikipedia.org/wiki/Odhecaton
  • And then we started transmitting data. On 24 May 1844, after the line was completed, Morse made the first public demonstration of his telegraph by sending a message from the Supreme Court Chamber in the U.S. Capitol in Washington, D.C. to the B&O Railroad "outer depot" (now the B&O Railroad Museum) in Baltimore.
  • Across continents… the first internet?
  • And then we graduated to communicating... (1876)
  • And then came broadcasting. Tesla demonstrating wireless transmissions during his high frequency and potential lecture of 1891. After continued research, Tesla presented the fundamentals of radio in 1893. http://en.wikipedia.org/wiki/Radio#History
  • And then the TV
  • The first group of networked computers communicated with each other in 1969, and ARPANET, the Advanced Research Projects Agency Network, became the start of the internet. Four U.S. universities were connected and became a research system through which computer scientists began solving problems and building the potential for worldwide, online connectivity. ARPANET had its first public demonstration in 1972 - http://www.elon.edu/e-web/predictions/150/1960.xhtml and http://personalpages.manchester.ac.uk/staff/m.dodge/cybergeography/atlas/historical.html
  • And the internet happened http://www.ddsmedia.net/blog/2009/10/40-anos-del-primer-mensaje-en-una-red-arpanet/
  • Closely followed by email… http://www.multicians.org/thvv/mail-history.html
  • In 1991, the World Wide Web was developed by Tim Berners-Lee (pictured at left) as a way for people to share information. The hyper-text format available through his Web made the internet much easier to use because all documents could be seen easily on-screen without downloading.
  • Mosaic was developed at the National Center for Supercomputing Applications (NCSA)[4] at the University of Illinois Urbana-Champaign beginning in late 1992. NCSA released the browser in 1993,[6] and officially discontinued development and support on January 7, 1997.[7] However, it can still be downloaded from NCSA - http://en.wikipedia.org/wiki/Mosaic_(web_browser)
  • It started gaining traction
  • And then the venture capitalists arrived.
  • So internet was no longer limited to just data
  • And we became one tiny speck in the large information universe. http://electrokami.com/wp-content/uploads/2010/09/the-internet-in-real-life.jpg
  • http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
  • The problem is only getting worse. Kilo, mega, giga, tera, peta, exa, zetta, yotta - http://en.wikipedia.org/wiki/Orders_of_magnitude_(data) http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
  • Not being able to find stuff can lead to tough situations…. http://www.bcbusinessonline.ca/bcb/bc-blogs/conference/2010/09/24/11-dumbest-quotes-powerful-people
  • http://www.boston.com/bostonglobe/ideas/articles/2010/11/28/information_overload_the_early_years/ http://blogs.reuters.com/chrystia-freeland/2011/09/23/yuri-milner-on-the-future-of-the-internet/
  • This is an information overload. http://blogs.tusc.k12.al.us/bhslibrary/files/2012/01/Information_overload.jpg http://visual.ly/failed-tech-predictions-1 http://www.boston.com/bostonglobe/ideas/articles/2010/11/28/information_overload_the_early_years/?page=full
  • Sifting and sorting can become tough once you have tonnes of stuff to look at. http://www.teachersdiary.com/.a/6a0115703931fc970c0128765537ba970c-800wi
  • Did I say needle in a haystack? - http://www.flickr.com/photos/special/1597251/
  • Talk about the secret sauce
  • http://developer.yahoo.com/search/boss/#pricing
  • Let's see what's in it to begin with.
  • You can use YQL tables to get started.
  • A.K.A. a big waste of time
  • Add more
  • Why can’t it be as easy as this? Do you know the person? Do you value his/her opinion? How do you analyze the sentiment behind the message?
  • Fix these links

Making sense out of things on the web: Presentation Transcript

  • We have been accumulating a lot of information
  • http://en.wikipedia.org/wiki/File:Jingangjing.jpg
  • http://en.wikipedia.org/wiki/File:Printer_in_1568-ce.png http://en.wikipedia.org/wiki/File:BuxheimStChristopher.jpg
  • http://en.wikipedia.org/wiki/Odhecaton
  • What hath God wrought? http://upload.wikimedia.org/wikipedia/commons/f/f1/The_First_Telegraph.jpg
  • 1891 Telegraph Lines http://en.wikipedia.org/wiki/File:1891_Telegraph_Lines.jpg
  • Mr. Watson, come here, I want to see you. http://www.boerner.net/jboerner/?p=9396
  • Radio
  • http://www.elon.edu/e-web/predictions/150/1930.xhtml
  • www
  • http://en.wikipedia.org/wiki/File:NCSA_Mosaic.PNG
  • The Internet had an estimated 16 million users by 1995
  • http://en.wikipedia.org/wiki/Venture_capital
  • People from all over the world started sharing their interests, hopes and dreams online
  • http://electrokami.com/wp-content/uploads/2010/09/the-internet-in-real-life.jpg
  • The number of devices connected to IP networks will be nearly three times as high as the global population in 2016
  • The Zettabyte Era: kilo, mega, giga, tera, peta, exa, zetta, yotta. One zebibyte (1024 exbibytes) is 9,444,732,965,739,290,427,392 bits. http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
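The prefix ladder on this slide is easy to check with a little arithmetic. A minimal Python sketch, assuming the usual conventions: SI prefixes (kilo, mega, ...) step by 1000, while the binary IEC prefixes (kibi, mebi, ..., zebi) step by 1024, which is where the 1024-exbibyte figure comes from.

```python
# Decimal (SI) prefixes grow by factors of 1000; binary (IEC) prefixes by 1024.
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
IEC_PREFIXES = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]


def si_bytes(prefix: str) -> int:
    """Number of bytes in one <prefix>byte, e.g. si_bytes("zetta") == 10**21."""
    return 1000 ** (SI_PREFIXES.index(prefix) + 1)


def iec_bytes(prefix: str) -> int:
    """Number of bytes in one <prefix>byte, e.g. iec_bytes("zebi") == 2**70."""
    return 1024 ** (IEC_PREFIXES.index(prefix) + 1)


print(f"1 zebibyte = {iec_bytes('zebi') * 8:,} bits")
# 1 zebibyte = 9,444,732,965,739,290,427,392 bits
print(f"1 zebibyte = {iec_bytes('zebi') // iec_bytes('exbi')} exbibytes")
# 1 zebibyte = 1024 exbibytes
print(f"1 zettabyte = {si_bytes('zetta'):,} bytes")
# 1 zettabyte = 1,000,000,000,000,000,000,000 bytes
```

The gap between the two conventions grows with each step: a zebibyte is about 18% larger than a zettabyte.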
  • “Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.” Donald Rumsfeld, US Defense Secretary, at a press conference at NATO Headquarters, Brussels, Belgium, June 6, 2002. Image: planetization.org
  • Nicholas Carr worries that the flood of digital information is changing not only our habits, but even our mental capacities: forced to scan and skim to keep up, we are losing our abilities to pay sustained attention, reflect deeply, or remember what we’ve learned.
  • Information overload? http://blogs.tusc.k12.al.us/bhslibrary/files/2012/01/Information_overload.jpg
  • DO YOU KNOW WHAT YOU ARE LOOKING FOR? http://www.teachersdiary.com/.a/6a0115703931fc970c0128765537ba970c-800wi
  • DO YOU KNOW WHERE TO FIND WHAT YOU WANT? http://www.flickr.com/photos/special/1597251/
  • REGULAR SEARCH #FAIL? http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
  • IS THERE A SUPERHERO WHO CAN HELP? http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
  • BUILD YOUR OWN SEARCH SERVICE. Yes, you are the superhero!
  • BOSS IS BUILD YOUR OWN SEARCH SERVICE http://developer.yahoo.com/search/boss/
  • BOSS allows you to search over Web, images, news & Blogs
  • You can even monetize your applications using Search Ads from BOSS and get support.
  • What can be done on top of BOSS?
    • Blend and re-rank search results
    • Your own look and feel
    • Mix it with other APIs
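BOSS hands back ranked lists, so blending and re-ranking is up to your application. One common way to merge several ranked lists, shown here as an illustration rather than anything BOSS itself does, is reciprocal rank fusion: each URL scores 1/(k + rank) in every list it appears in, and the summed scores decide the final order. The constant k = 60 is a conventional default, not a value from this talk.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Blend several ranked lists of URLs into one list.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1); higher scores come first, ties keep first-seen order.
    """
    scores = defaultdict(float)
    first_seen = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
            first_seen.setdefault(url, len(first_seen))
    return sorted(scores, key=lambda url: (-scores[url], first_seen[url]))


web = ["a.com", "b.com", "c.com"]
news = ["b.com", "d.com"]
print(reciprocal_rank_fusion([web, news]))  # ['b.com', 'a.com', 'd.com', 'c.com']
```

A URL that shows up in several verticals (web and news here) rises to the top, which is usually what a blended result page wants.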
  • BOSS Pricing
  • Free for building your hacks!!
  • Where do I start?
  • What’s in it? RESTful XML and JSON API: Web, Image, Spelling, News, Search Ads http://www.flickr.com/photos/joeshlabotnik/419914250/sizes/o/in/photostream/.jpg
  • OAuth-based authentication http://www.flickr.com/photos/friarsbalsam/5736126308/sizes/o/in/photostream/.jpg
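Signing requests is the fiddly part of OAuth. A minimal sketch of the OAuth 1.0 HMAC-SHA1 signature described in RFC 5849, assuming BOSS requests are signed this way with your consumer key and secret (two-legged, so the token secret stays empty). It builds the signature base string and signing key and returns the base64 digest; it does not generate nonces or timestamps, does not normalize the URL's scheme or host, and supports only one value per parameter name.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def oauth_encode(value: str) -> str:
    """RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ stay literal."""
    return quote(value, safe="~")


def oauth1_signature(method, base_url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature for an OAuth 1.0 request.

    params: all oauth_* and query parameters as str -> str, one value per name
    (a simplification). base_url: scheme, host and path only, no query string.
    """
    # 1. Normalize parameters: encode names and values, sort, join as name=value&...
    pairs = sorted((oauth_encode(k), oauth_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join(
        [method.upper(), oauth_encode(base_url), oauth_encode(normalized)]
    )
    # 3. Signing key: encoded consumer secret & encoded token secret (may be empty).
    key = f"{oauth_encode(consumer_secret)}&{oauth_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The result goes into the request as the oauth_signature parameter; in practice a maintained OAuth library is a safer choice than hand-rolled signing.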
  • What else do I get?
    • Web and Limited Web results
    • Image attributes like height, width, etc.
    • Time span filtering for News Search
    • Document type filtering
    • Extended abstracts
    http://www.flickr.com/photos/acidpix/6021203584/sizes/o/in/photostream/.jpg
  • BOSS + YQL
    • Table name: boss.search
    • Example parameters: consumer key (ck), consumer secret (secret), query term (q, e.g. 'iitd')
    • e.g. select * from boss.search where ck=… and secret=… and q="openhackindia"
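Hand-assembling these statements gets error-prone once values contain spaces or quotes. A small helper, hypothetical and not part of YQL, that assembles a boss.search statement in the same shape as the examples on these slides; it refuses values containing a double quote rather than guessing at YQL's escaping rules.

```python
def yql_literal(value) -> str:
    """Wrap a value in double quotes as a YQL string literal."""
    text = str(value)
    if '"' in text:
        # Refuse rather than guess at an escaping rule.
        raise ValueError(f"double quotes are not supported in values: {text!r}")
    return f'"{text}"'


def boss_yql(q, ck, secret, **filters):
    """Build 'select * from boss.search where ...' (hypothetical helper).

    filters: extra boss.search keys such as service="images" or sites="imdb.com".
    """
    conditions = {"q": q, **filters, "ck": ck, "secret": secret}
    where = " and ".join(f"{key}={yql_literal(val)}" for key, val in conditions.items())
    return f"select * from boss.search where {where}"


print(boss_yql("The Dark Knight Rises", ck="...", secret="...", service="images"))
# select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."
```

Keeping the consumer secret out of client-side code is worth planning for, since anything in a YQL statement built in the browser is visible to users.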
  • Searching “The Dark Knight”
  • Finding images of “The Dark Knight Rises”: select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."
  • Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com: select * from boss.search where q="The Dark Knight Rises" and sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."
  • Spell Check and Correction: select * from boss.search where q="The Dark Knight Rises" and service="spelling" and ck="..." and secret="..."
  • Finding news on “The Dark Knight Rises”: select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."
  • And through the BOSS API
    • Getting multiple data sets: /ysearch/web,images,news?q=anna and /ysearch/web,images,news?web.q=anna&images.q=anna&news.q=lokpal
    • Searching through sites (a simple movie search): /ysearch/web?q="Dark Knight"&sites=movies.yahoo.com,netflix.com,imdb.com
    • AND/OR operators: /ysearch/web?q="steve jobs"AND((ipad)OR(iphone))&sites=bestbuy.com,newegg.com
    • Important: use parentheses or quotes
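The slide shows these paths with raw spaces and quotes; a real HTTP client has to percent-encode them. A minimal sketch using Python's standard urllib.parse, where boss_path is a hypothetical helper; it builds only the relative /ysearch/... path, because the host and the OAuth parameters are not shown here. Commas are left unescaped so that service lists and sites= lists read like the examples.

```python
from urllib.parse import quote, urlencode


def boss_path(services, params):
    """Build a relative BOSS path like /ysearch/web,images,news?q=anna (hypothetical helper).

    Values are percent-encoded with %20 for spaces; commas stay literal.
    """
    query = urlencode(params, quote_via=quote, safe=",")
    return f"/ysearch/{','.join(services)}?{query}"


print(boss_path(["web", "images", "news"], {"web.q": "anna", "images.q": "anna", "news.q": "lokpal"}))
# /ysearch/web,images,news?web.q=anna&images.q=anna&news.q=lokpal
print(boss_path(["web"], {"q": '"Dark Knight"', "sites": "movies.yahoo.com,netflix.com,imdb.com"}))
# /ysearch/web?q=%22Dark%20Knight%22&sites=movies.yahoo.com,netflix.com,imdb.com
```

Encoding also takes care of the parentheses in the AND/OR example, which become %28 and %29 on the wire.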
  • Unary operators
    • Search for Batman but not “Dark Knight”: q=(batman -"Dark Knight")
    • Find pages with “Heath Ledger” but not “Dark Knight”: q=+"heath ledger"-"Dark Knight"&sites=movies.yahoo.com
    • Force auto-spelling off: q=+"drk knight"
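These operators compose mechanically: quote multi-word phrases, prefix required terms with + and excluded terms with -. A hypothetical helper that assembles the q value this way; separating the pieces with spaces is an assumption (the slide's Heath Ledger example has none).

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are matched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(terms=(), require=(), exclude=()):
    """Compose a q value: plain terms, then +required, then -excluded (hypothetical helper)."""
    parts = [phrase(t) for t in terms]
    parts += ["+" + phrase(t) for t in require]
    parts += ["-" + phrase(t) for t in exclude]
    return " ".join(parts)


print(build_query(["batman"], exclude=["Dark Knight"]))  # batman -"Dark Knight"
print(build_query(require=["heath ledger"], exclude=["Dark Knight"]))  # +"heath ledger" -"Dark Knight"
print(build_query(require=["drk knight"]))  # +"drk knight"
```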
  • Searching in body and in title
    • Searching for Dark Knight in the title on Yahoo! Movies: q=reviews intitle:"dark knight"&sites=movies.yahoo.com
    • Searching for Dark Knight in the title on Yahoo! Movies, with Christian Bale in the body: q=reviews intitle:"dark knight" inbody:"christian bale"&sites=movies.yahoo.com
  • Market and document-specific filters
    • Search for “Dark Knight” in India-specific sites: q="Dark Knight"&market=en-in
    • Search for PDFs containing “Dark Knight”: q="Dark Knight"&type=pdf
    • Search for MS Office types (except PPTs) containing “Dark Knight”: q="Dark Knight"&type=msoffice,-ppt
  • Output
  • Image search parameters
    • Search for images that are not offensive: /ysearch/images?q="san francisco"&filter=yes
    • Search for images that are wallpaper size: /ysearch/images?q="san francisco"&dimensions=wallpaper
    • Search for an image at a certain referrer URL: /ysearch/images?q=yahoo&refererurl=http://www.flickr.com
    • Interesting output fields: format, file size, height, width, title, total result count
  • News search parameters
    • Search news that is less than 7 days old: /ysearch/news?q=lokpal&age=7d
    • Search news that is between 20 hours and 2 days old: /ysearch/news?q=lokpal&age=20h2d
    • Re-rank news results by date: /ysearch/news?q=lokpal&ranking=true
    • Interesting output fields: source, date, source URL
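Once results have been fetched, the same age window and date ordering can be reapplied locally, for example after merging news from several queries. A minimal sketch, assuming each news item has already been parsed into a dict with a timezone-aware "date" datetime (a hypothetical field name chosen to mirror the Date output field); the mapping to the age and ranking parameters is only approximate.

```python
from datetime import datetime, timedelta, timezone


def filter_and_sort_news(items, now, min_age=timedelta(0), max_age=timedelta(days=7)):
    """Keep items with min_age <= age <= max_age, newest first.

    items: dicts with a timezone-aware "date" datetime (hypothetical field name).
    """
    kept = [item for item in items if min_age <= now - item["date"] <= max_age]
    return sorted(kept, key=lambda item: item["date"], reverse=True)


now = datetime(2012, 7, 20, 12, 0, tzinfo=timezone.utc)
items = [
    {"source": "A", "date": now - timedelta(hours=30)},
    {"source": "B", "date": now - timedelta(hours=10)},
    {"source": "C", "date": now - timedelta(days=3)},
]
# Roughly age=20h2d: between 20 hours and 2 days old.
print([i["source"] for i in filter_and_sort_news(items, now, timedelta(hours=20), timedelta(days=2))])  # ['A']
# Roughly age=7d with results ranked by date: newest first.
print([i["source"] for i in filter_and_sort_news(items, now)])  # ['B', 'A', 'C']
```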
  • Duckduckgo.com
  • Interceder
  • Ask-boss (v1)Hack: http://ask-boss.appspot.comCode: https://github.com/saurabhsahni/Hacks/tree/master/askBOSS
  • webmeme.in
  • http://hackyourworld.org/~iitb_pacman/search/
  • I did BOSS and got data; now how do I extract information out of it?
  • make sense out of it?
  • Content Analysis: select * from contentanalysis.analyze where text="Yahoo! kicks off hackday"
  • Content Analysis from a URL: select * from contentanalysis.analyze where url="http://www.cnn.com/"
  • Term Extraction: select * from search.termextract where context in (select description from rss where url='')
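To get a feel for what term extraction does, here is a deliberately naive stand-in: lowercase the text, drop stopwords and very short words, and rank the rest by frequency. This is only an illustration with a tiny hypothetical stopword list; it is not how Yahoo!'s contentanalysis or termextract tables work.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "and", "off", "for", "with", "from", "this", "that", "are", "was"}


def extract_terms(text, top_n=5):
    """Return the top_n most frequent non-stopword words (ties keep first-seen order)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


print(extract_terms("Yahoo! kicks off hackday. Hackday hackers love Yahoo! APIs.", top_n=2))
# ['yahoo', 'hackday']
```

Frequency counting alone misses multi-word phrases and entities, which is exactly the gap that a dedicated content-analysis service is meant to fill.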
  • More resources
    • Yahoo! BOSS: http://developer.yahoo.com/boss
    • BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/
    • YQL: http://developer.yahoo.com/yql
    • Amazon Web Services: http://aws.amazon.com
    • OAuth: http://oauth.net/
    • Open Data: http://theinfo.org
    • Alt Search Engines: http://www.altsearchengines.com/
  • Happy hacking!