Archiving the Mobile Web
Presented at WADL 2013 in Indianapolis, Indiana.

  • iPhone introduced in the United States on June 29, 2007
  • Transcript

    • 1. Archiving the Mobile Web Frank McCown, Monica Yarbrough, & Keith Enlow Computer Science Dept Harding University WADL 2013 Indianapolis, IN July 25, 2013
    • 2. Mobile vs. Stationary Web
    • 3. Mobile Web-Related Markup Languages http://en.wikipedia.org/wiki/File:Mobile_Web_Standards_Evolution_Vector.svg Smartphone era
    • 4. Two Types of Mobile Web Feature Phone Web Smartphone Web cHTML (iMode), WML, WAP, etc. XHTML, HTML5, etc.
    • 5. Serving Up Mobile Sites
        1. Responsive web design
            • Same HTML content to desktop and mobile
            • CSS media queries alter appearance

            <!-- CSS media query on a link element -->
            <link rel="stylesheet" media="(max-width: 800px)" href="example.css" />

            <!-- CSS media query within a style sheet -->
            <style>
              @media (max-width: 600px) {
                .sidebar { display: none; }
              }
            </style>
    • 6. Example of Responsive Web Design
    • 7. Serving Up Mobile Sites 1. Responsive web design • Same HTML content to desktop and mobile • CSS media queries alter appearance 2. Redirect mobile user agent to mobile site • Client-side redirection • Server-side redirection
    • 8. Client-Side Redirection
        • JavaScript detects mobile user agent

            // From www.harding.edu
            // (queryString is defined elsewhere on the page; not shown on the slide)
            var ua = navigator.userAgent.toLowerCase();
            if (queryString.match('version=mobile') ||
                ua.match(/IEMobile|Windows CE|NetFront|PlayStation|like Mac OS X|MIDP|UP.Browser|Symbian|Nintendo|BlackBerry|mobile/i)) {
              // iPads keep the desktop site
              if (!ua.match('ipad')) {
                if (window.location.pathname.match('.html'))
                  // e.g. /about.html becomes /about.m.html
                  window.location = window.location.pathname.replace('.html', '.m.html');
                else
                  // directory paths get the mobile index page
                  window.location = window.location.pathname + 'index.m.html';
              }
            }
    • 9. Client-Side Redirection
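To make the client-side redirect rule easier to reason about from a crawler's point of view, here is a minimal Python sketch of the same decision. It is only an approximation of the harding.edu script: the user-agent pattern is abridged, the `version=mobile` query-string check is left out, and the function name `mobile_redirect_target` is hypothetical.

```python
import re
from typing import Optional

# Abridged subset of the user-agent pattern shown on the slide (assumption:
# the omitted alternatives do not matter for this illustration).
MOBILE_UA = re.compile(
    r"iemobile|windows ce|netfront|playstation|midp|symbian|nintendo|blackberry|mobile",
    re.IGNORECASE,
)


def mobile_redirect_target(user_agent: str, path: str) -> Optional[str]:
    """Return the path a mobile browser would be sent to, or None if no redirect."""
    ua = user_agent.lower()
    if not MOBILE_UA.search(ua):
        return None          # desktop browsers stay on the desktop page
    if "ipad" in ua:
        return None          # the slide's script exempts iPads
    if ".html" in path:
        # Like JavaScript's string replace, only the first occurrence is changed.
        return path.replace(".html", ".m.html", 1)
    return path + "index.m.html"
```

A crawler that cannot execute JavaScript never evaluates this rule, which is why it keeps receiving the desktop page.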
    • 10. Server-Side Redirection
        • Server routes mobile user agent to different page
        Apache example:

            RewriteEngine On
            RewriteBase /
            RewriteCond %{HTTP_USER_AGENT} (android|bb\d+|meego).+mobile|avantgo|bada\/|blackberry|blazer|etc…|zte-) [NC]
            RewriteRule ^$ http://detectmobilebrowser.com/mobile [R,L]

        https://developers.google.com/webmasters/smartphone-sites/details
    • 11. Server-Side Redirection
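The effect of the rewrite rule can be sketched in Python as a pure function that returns the status and `Location` a server would send. This is an illustration only: the pattern is a short excerpt of the one on the slide, and `route` and the `m.example.com` target are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Short excerpt of the user-agent pattern from the slide (assumption: the
# elided alternatives are not needed to show the idea).
MOBILE_UA = re.compile(
    r"(android|bb\d+|meego).+mobile|avantgo|blackberry|blazer",
    re.IGNORECASE,
)


def route(path: str, user_agent: str,
          mobile_url: str = "http://m.example.com/") -> Tuple[int, Optional[str]]:
    """Return (status, Location header) for a request to `path`."""
    # The slide's RewriteRule pattern ^$ only matches the site root.
    if path == "/" and MOBILE_UA.search(user_agent):
        return 302, mobile_url   # a bare [R] flag issues a 302 by default
    return 200, None
```

Because the decision is made on the server, a crawler sees the redirect only if it sends a user agent that matches the pattern.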
    • 12. Serving Up Mobile Sites 1. Responsive web design • Same HTML content to desktop and mobile • CSS media queries alter appearance 2. Redirect mobile user agent to mobile site • Client-side redirection • Server-side redirection 3. User-agent content negotiation • Dynamically serving different HTML for the same URL
    • 13. User-Agent Content Negotiation • Server serves up different content for same URL • Use Vary: User-Agent header in response • Best method for serving content quickly
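A crawler can use the `Vary` header as a hint that a URL may return different HTML to different user agents. The helper below is a minimal sketch; the function name `varies_on_user_agent` is hypothetical, and it expects the response headers as a plain dictionary.

```python
def varies_on_user_agent(headers: dict) -> bool:
    """Return True if the response headers include `Vary: User-Agent`."""
    for name, value in headers.items():
        if name.lower() != "vary":
            continue  # header names are case-insensitive
        # Vary holds a comma-separated list of request header names.
        tokens = [token.strip().lower() for token in value.split(",")]
        if "user-agent" in tokens:
            return True
    return False
```

As the survey results later in the deck show, many sites omit this header, so its absence does not prove that a page is the same for every user agent.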
    • 14. Archiving Mobile Sites 1. Responsive web design • Easy: Crawl like normal • Use client tools to view page formatted for mobile 2. Redirect mobile user agent to mobile site • Need to crawl with mobile user agent • Need JavaScript-enabled crawler to handle client-side redirection 3. User-agent content negotiation • Need to crawl with mobile user agent • Need to distinguish mobile vs. desktop for same URL
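Crawling "with a mobile user agent" comes down to sending a different `User-Agent` request header for the same URL. The sketch below only builds the two requests with Python's standard library and does not send them; `DESKTOP_UA` is a made-up crawler string, and `MOBILE_UA` reuses the iPhone string quoted later in the deck.

```python
from urllib.request import Request

# Hypothetical desktop crawler identity (assumption, not from the slides).
DESKTOP_UA = "Mozilla/5.0 (compatible; example-crawler/0.1)"
# iPhone user agent string quoted on the Web Service slide.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) "
    "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 "
    "Safari/6531.22.7"
)


def build_requests(url: str) -> dict:
    """Build one desktop and one mobile request for the same URL."""
    return {
        "desktop": Request(url, headers={"User-Agent": DESKTOP_UA}),
        "mobile": Request(url, headers={"User-Agent": MOBILE_UA}),
    }
```

Passing either request to `urllib.request.urlopen` would fetch the page; an archive then has to store both responses and record which user agent produced each one.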
    • 15. How are we doing archiving mobile sites so far?
    • 16. Earliest archived page
    • 17. Earliest 2007 archived page: WML
    • 18. Finally some news!
    • 19. Really???
    • 20. Great…
    • 21. Only desktop version is archived!
    • 22. Mobile Finder By Monica Yarbrough
    • 23. Google’s Suggestions for SEO
        • Vary HTTP header
        • Annotations within the HTML:
            • On desktop page: <link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page-1" >
            • On mobile page: <link rel="canonical" href="http://www.example.com/page-1" >
        • Media queries
        https://developers.google.com/webmasters/smartphone-sites/
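The two annotations can be read out of a page with Python's built-in HTML parser. This is a minimal sketch, not the Mobile Finder implementation; the class and function names are hypothetical, and only the first matching `<link>` of each kind is kept.

```python
from html.parser import HTMLParser


class LinkAnnotationParser(HTMLParser):
    """Collect rel="alternate" (with a media attribute) and rel="canonical" links."""

    def __init__(self):
        super().__init__()
        self.alternate = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel and attributes.get("media") and self.alternate is None:
            self.alternate = attributes.get("href")   # desktop page -> mobile URL
        elif "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")   # mobile page -> desktop URL


def find_annotations(html: str) -> dict:
    """Return the alternate and canonical hrefs found in `html` (None if absent)."""
    parser = LinkAnnotationParser()
    parser.feed(html)
    return {"alternate": parser.alternate, "canonical": parser.canonical}
```

Requiring a `media` attribute on the alternate link avoids mistaking feed links such as `rel="alternate" type="application/rss+xml"` for a mobile annotation.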
    • 24. How Mobile Finder Works • Use both desktop and mobile useragents • Look for: • Redirect • Different content • Different stylesheets • Media queries
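The checks listed on this slide can be pictured as a small decision procedure over the two fetches. The sketch below is an assumption-laden illustration rather than the Mobile Finder code: the function name, the order of the checks, the 0.9 similarity threshold, and the use of `difflib` to measure "different content" are all made up for the example, and the stylesheet comparison is omitted.

```python
from difflib import SequenceMatcher


def classify(desktop_final_url: str, mobile_final_url: str,
             desktop_html: str, mobile_html: str,
             threshold: float = 0.9) -> str:
    """Guess why (or whether) a URL has a mobile version."""
    # 1. The mobile user agent ended up at a different URL.
    if desktop_final_url != mobile_final_url:
        return "redirect"
    # 2. Same URL, but the HTML served to the two user agents differs a lot.
    similarity = SequenceMatcher(None, desktop_html, mobile_html).ratio()
    if similarity < threshold:
        return "differing content"
    # 3. Same HTML, but it adapts on the client through CSS media queries.
    if "@media" in mobile_html:
        return "media queries"
    return "no mobile version found"
```

The inputs are the final URLs and bodies obtained after fetching the same address once with a desktop user agent and once with a mobile one.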
    • 25. How Mobile Finder Works • Change the url to fit common mobile url patterns ex: www.t-mobile.com m.t-mobile.com
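Generating candidate mobile URLs by trial and error can be sketched as follows. The `m.` prefix comes from the example on this slide and `mobile.` appears among the crawled sites later in the deck; the function name and the choice to try only these two prefixes are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_mobile_urls(url: str, prefixes=("m", "mobile")) -> list:
    """Return guesses for the mobile counterpart of an absolute `url`."""
    parts = urlsplit(url)
    host = parts.netloc
    # Drop a leading "www." so www.t-mobile.com becomes m.t-mobile.com.
    bare_host = host[4:] if host.startswith("www.") else host
    return [
        urlunsplit((parts.scheme, f"{prefix}.{bare_host}",
                    parts.path, parts.query, parts.fragment))
        for prefix in prefixes
    ]
```

Each candidate still has to be fetched to confirm that it exists, which is why the deck calls this approach trial and error.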
    • 26. PhantomJS • Headless WebKit (browser) • Well-known and widely used • Used to get the content of a page • Takes snapshots of the sites it visits • Scriptable with CoffeeScript or JavaScript
    • 27. Web Service • Query string with 2 parameters • url (required) • useragent (optional) • http://cs.harding.edu/mobilefinder/service.php?url=URL&useragent=USER_AGENT • Default useragent = Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; mediaqueries/1.0; +http://cs.harding.edu)
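Both parameters contain characters that must be percent-encoded before they go into the query string. A minimal sketch of building the request URL with Python's standard library is shown below; the function name `service_url` is hypothetical, and the sketch does not assume the service is still online.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint as given on the slide.
SERVICE = "http://cs.harding.edu/mobilefinder/service.php"


def service_url(url: str, useragent: Optional[str] = None) -> str:
    """Build a Mobile Finder query URL; `useragent` is omitted when not given."""
    params = {"url": url}
    if useragent is not None:
        params["useragent"] = useragent
    # urlencode percent-encodes ":" and "/" and turns spaces into "+".
    return SERVICE + "?" + urlencode(params)
```

Leaving `useragent` out lets the service fall back to its default iPhone string.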
    • 28. Results

            <MobileFinder>
              <url>http://www.cnn.com/</url>
              <mobileUrl>http://www.cnn.com/</mobileUrl>
              <reason>
                <code>400</code>
                <message>differing content</message>
              </reason>
              <useragent>Mozilla/5.0 (Android; Linux armv7l; rv:9.0) Gecko/20111216 Firefox/9.0 Fennec/9.0</useragent>
              <timeAccessed>2013-07-20 15:23:42</timeAccessed>
              <error/>
            </MobileFinder>
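A client could read such a response with Python's standard XML parser. The sketch below assumes the response is well-formed XML whose root element is closed with `</MobileFinder>`; the function name `parse_result` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample response modeled on the slide (assumed well-formed).
SAMPLE = """<MobileFinder>
  <url>http://www.cnn.com/</url>
  <mobileUrl>http://www.cnn.com/</mobileUrl>
  <reason>
    <code>400</code>
    <message>differing content</message>
  </reason>
  <useragent>Mozilla/5.0 (Android; Linux armv7l; rv:9.0) Gecko/20111216 Firefox/9.0 Fennec/9.0</useragent>
  <timeAccessed>2013-07-20 15:23:42</timeAccessed>
  <error/>
</MobileFinder>"""


def parse_result(xml_text: str) -> dict:
    """Extract the main fields of a Mobile Finder response."""
    root = ET.fromstring(xml_text)
    return {
        "url": root.findtext("url"),
        "mobile_url": root.findtext("mobileUrl"),
        "code": int(root.findtext("reason/code")),
        "message": root.findtext("reason/message"),
        "useragent": (root.findtext("useragent") or "").strip(),
    }
```

In this sample the desktop and mobile URLs are identical and the reason is "differing content", which corresponds to user-agent content negotiation on a single URL.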
    • 29. Limitations • Crashing • Inconsistent results • Problems executing JavaScript redirection • Reports failure even when it actually retrieved the content • Fails to get the URL of the page accessed • Slow
    • 30. Limitations • Client-side Redirects www.golferen.no/wip4/ (right) www.ng.kz/ (below)
    • 31. Analysis Results • Accuracy (of 100 random hand-checked results) • 96% accurate overall • 1% incorrectly reported no mobile version when one in fact exists • 3% incorrectly reported a mobile version when none exists
    • 32. Nytimes desktop vs mobile
    • 33. Rakuten.co.jp desktop vs mobile
    • 34. Are Google’s Suggestions Used? • 28 % found a mobile version following Google’s suggestions • 85 % found as having some sort of mobile version
    • 35. Are Google’s Suggestions Used? • 28% found following Google’s suggestions • Of the 82% that were found as not following the rules: • 93% missing Vary HTTP header • 89% missing alternate and canonical links
    • 36. Are Google’s Suggestions Used? • 28 % found following Google’s suggestions • 85 % found as having some sort of mobile version • Redirect: 35% • “Significantly” different content: 28% • Stylesheets alone: 9% • Stylesheets and media queries: 11% • Media queries alone: 6% • Differing urls (trial and error): 11%
    • 37. End Result • As a whole, mobile web pages do not adhere to Google’s standards • There are no truly consistent ways for finding a mobile version of a site
    • 38. Keith Enlow Heritrix Mobile
    • 39. Introduction • Heritrix 3.1 • Mobile Finder Web Service • 2 Options • Crawl desktop web pages (default) • Crawl mobile web pages using Mobile Finder and exclude mobile web pages that use media queries
    • 40. Experiment • Decision Making Heritrix • Web Service (Mobile Finder) Heritrix • Modified Heritrix 3.1 to include two options for crawling • Option 0: Crawl with desktop user agent • Option 1: Crawl with mobile user agent using Mobile Finder • Added built-in mobile user agent adapted from Googlebot • Crawled a small set of URLs • Used Mobile Finder to find whether the given URL has a mobile version • Wrote a small script to discover differences between the mobile and desktop versions
    • 41. Heritrix configuration

            <property name="userAgentTemplate"
                      value="Mozilla/5.0 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"/>
            <property name="userAgentTemplateMobile"
                      value="Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; heritrix/@VERSION@ +@OPERATOR_CONTACT_URL@)"/>
            <!-- Option # = Description
                 0 [Default] Crawl using desktop user agent
                 1 Crawl using mobile user agent + Mobile Finder Web Service -->
            <property name="CrawlOption" value="0" />
    • 42. URLs Crawled (Desktop URL → Mobile URL)
        • www.huffingtonpost.com → m.huffpost.com
        • www.foxnews.com → foxnews.mobi
        • www.nbcnews.com → www.nbcnews.com
        • www.whitehouse.gov → m.whitehouse.gov
        • www.nasa.gov → mobile.nasa.gov
        • www.ssa.gov → www.ssa.gov/mobile
        • www.cornell.edu → m.cornell.edu/#home
        • www.stanford.edu → m.stanford.edu
        • www.mit.edu → m.mit.edu / mobile.mit.edu
    • 43. Redirection/Delivery • 200 Response (server side redirect) • 302 “Temporary” relocation • 301 “Permanent” relocation • JavaScript Redirection (client side redirect) • Media Queries • Style Sheets
    • 44. Tiny Limits • No JavaScript Engine • Heritrix is unable to perform and execute JavaScript code • Unable to catch client side redirection and will instead continue to crawl the desktop version of the web page. Note: The Mobile Finder Web Service will find the mobile page and therefore Heritrix will continue the crawl. • www.nasa.gov • www.ssa.gov • www.cornell.edu
    • 45. Total Link Count Huffington Fox News NBC News NASA SSA White House Stanford Cornell MIT 56774 12703 8894 4960 2380 8121 2351 2901 120 2134 110 3545 63 53 570 116 94 124
    • 46. HTML Distribution Huffington Fox News NBC News NASA SSA White House Stanford Cornell MIT 11550 2681 2302 851 20 3251 385 596 12 493 35 488 18 0 76 16 31 26
    • 47. JavaScript Distribution Huffington Fox News NBC News NASA SSA White House Stanford Cornell MIT 245 107 46 589 12 83 104 525 2 33 4 14 8 0 13 4 8 0
    • 48. CSS Distribution Huffington Fox News NBC News NASA SSA White House Stanford Cornell MIT 587 301 72 304 1 154 214 86 3 36 3 17 1 0 19 8 4 3
    • 49. Image Distribution Huffington Fox News NBC NASA SSA White House Stanford Cornell MIT 38671 8893 5852 2908 17 4187 1460 1484 87 1227 59 2769 28 0 436 74 4 89
    • 50. Acknowledgements • Internet Archive aided in Mobile Finder work • Funded by NSF grant 1008492
