
[1C5] Lessons from developing a web browser for Raspberry Pi


  1. Lessons from developing a web browser on Raspberry Pi
      DEVIEW 2014
      ChangSeok Oh
      changseok@gnome.org
  2. About me
      ChangSeok Oh
      • IRC nickname: changseok
      • Open source hacker: WebKit committer, GNOME Foundation member
      • Affiliation: Collabora Ltd. (formerly SAMSUNG Electronics)
      • Experience: SAMSUNG SmartTV, TIZEN, SmartTV Alliance SDK, WebKit-clutter, Raspberry Pi, etc.
  3. Image: http://www.commitstrip.com/en/2014/05/07/the-truth-behind-open-source-apps/
  4. Optimization is EVERYWHERE.
  5. Do developers ever have enough memory and performance? We are always hungry.
      Image: http://images.google.com
  6. Optimization
      • Dictionary definition
        ‣ Make the best or most effective use of a situation or resource
        ‣ In short: improve performance and use resources efficiently
      • Usually difficult and tedious work
      • Depends on the developer’s experience and know-how
  7. Possible approaches
      1. Use better hardware, including a faster CPU/GPU and more memory
      2. Parallel programming to take advantage of multi-core CPUs
      3. Use the GPU through OpenGL ES to improve rendering performance
      4. Just turn off the screen and go outside to play…?
  8. But what if you can’t do any of them?
  9. Raspberry Pi is a good example of such a poor environment
      • Old single-core CPU
        ‣ ARMv6, 700MHz
      • Very limited system memory
        ‣ 512MB shared with the GPU
      • Storage is not abundant
      • Poor OpenGL ES integration with the windowing system
  10. All problems come from here
      Image: http://en.wikipedia.org/wiki/Raspberry_Pi#mediaviewer/File:Raspberrypi_pcb_overview_v04.svg
  11. FWIW, Raspberry Pi needs a fast, modern browser.
  12. Raspberry Pi already supports many browsers, though… (Lynx, …)
  13. Requirements
      A modern and fast HTML5 browser
      • Multi-tab browsing
      • HTML5 and CSS3
      • HTML5 video/audio support (YouTube should run well)
      • Responsive user interface
      • Low memory footprint
  14. (image-only slide)
  15. Achievements
      We’ve improved WebKit1 + Epiphany:
      • Progressive tiled rendering for smoother scrolling
      • Avoid useless image format conversions
      • Disk image cache
      • Fewer memory copies when playing videos
      • Memory pressure handler support using cgroups
      • Better YouTube support, including on-demand loading of embedded YouTube videos for much faster page loads
  16. Achievements
      We’ve improved WebKit1 + Epiphany:
      • Faster fullscreen playback using dispmanx directly
      • Hardware decoding of images and video through OMX
      • Hardware scaling of video through gst-omx
      • More responsive UI and scrolling even under heavy load
      • Memory- and CPU-friendly tab management
      • Startup is 3x faster
      • JavaScript JIT fixes for ARMv6
  17. Technologies
  18. Progressive tiled rendering for smoother scrolling
      • Scrolling doesn’t block even if the content is not available yet; instead, we fill the area with a checkered pattern.
      See: http://ariya.ofilabs.com/2011/06/progressive-rendering-via-tiled-backing-store.html
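Below is a minimal sketch of the bookkeeping behind a progressive tiled backing store. Its assumptions are labelled: the 256-pixel tile size and the nearest-to-center rendering order are hypothetical and are not taken from WebKit. Tiles that are already rendered get painted. Missing tiles are painted as the checkered pattern and queued for rendering later, so scrolling itself never waits on painting.

```javascript
// Hypothetical tile size; the slides do not give WebKit's tile dimensions.
const TILE_SIZE = 256;

// Return the tiles covering the viewport. Rendered tiles are "ready";
// missing ones are shown as a checkerboard so scrolling never blocks.
function visibleTiles(viewport, renderedTiles, tileSize = TILE_SIZE) {
  const firstCol = Math.floor(viewport.x / tileSize);
  const lastCol = Math.floor((viewport.x + viewport.width - 1) / tileSize);
  const firstRow = Math.floor(viewport.y / tileSize);
  const lastRow = Math.floor((viewport.y + viewport.height - 1) / tileSize);
  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      const key = `${col},${row}`;
      tiles.push({ key, state: renderedTiles.has(key) ? "ready" : "checkerboard" });
    }
  }
  return tiles;
}

// Queue the missing tiles for later rendering, nearest to the viewport
// center first (a hypothetical ordering policy).
function renderQueue(tiles, viewport, tileSize = TILE_SIZE) {
  const cx = viewport.x + viewport.width / 2;
  const cy = viewport.y + viewport.height / 2;
  return tiles
    .filter((t) => t.state === "checkerboard")
    .map((t) => {
      const [col, row] = t.key.split(",").map(Number);
      const dx = (col + 0.5) * tileSize - cx;
      const dy = (row + 0.5) * tileSize - cy;
      return { key: t.key, dist: dx * dx + dy * dy };
    })
    .sort((a, b) => a.dist - b.dist)
    .map((t) => t.key);
}
```

Example: take a 512×256 viewport at the origin where only tile 0,0 is rendered. `visibleTiles` reports tile 1,0 as a checkerboard, and `renderQueue` returns `["1,0"]`.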
  19. Avoid useless image format conversions
      • Try to use internal buffers of the same depth, 16 or 32 bits, to prevent format conversions
        ‣ Raspberry Pi uses a 16-bit (RGB16_565) buffer by default.
        ‣ Normally, images (JPEG, PNG, GIF) and video were decoded into 32-bit (ARGB32) buffers.
        ‣ By using the same depth, we could use a cairo image surface, which can be painted quickly onto the target.
      [Figure: buffer depths of JPEG, PNG (16 bit), GIF (32 bit), videos (32 bit), TBS (16 bit) and GtkWidget (16 bit)]
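Matching depths matters because a 32-bit decoder feeding a 16-bit target forces a per-pixel ARGB32 to RGB565 packing step. The sketch below shows that step. It illustrates the standard 5/6/5 bit truncation and is not the actual pixman or cairo code path.

```javascript
// Pack a 32-bit ARGB pixel (0xAARRGGBB) into 16-bit RGB565 by keeping the
// top 5 bits of red, 6 bits of green and 5 bits of blue. Alpha is dropped.
function argb32ToRgb565(pixel) {
  const r = (pixel >>> 16) & 0xff;
  const g = (pixel >>> 8) & 0xff;
  const b = pixel & 0xff;
  return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3);
}

// Convert a whole ARGB32 buffer. This extra pass over every pixel is the
// cost that decoding straight into a 16-bit buffer avoids.
function convertBuffer(argb) {
  const out = new Uint16Array(argb.length);
  for (let i = 0; i < argb.length; i++) {
    out[i] = argb32ToRgb565(argb[i]);
  }
  return out;
}
```

For example, `argb32ToRgb565(0xFFFF0000)` (opaque red) returns `0xF800`.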
  20. Disk image cache
      • We enhanced WebKit’s disk image cache module for POSIX systems.
      • Decoded images are kept in memory-mapped files as caches.
      • Saves CPU by avoiding repeated decoding
      • Saves memory by using local disk space
      • Not a magic wand: big images (over 20KB), animated GIFs
      [Figure: decoded image, local disk space, physical memory]
  21. Reducing the number of memory copies when playing video
      • The video needs to be blitted to the screen, and that involved memory copies for no reason.
      • If the backing store’s cairo surface is in system memory, cairo creates an additional surface wrapping a SHM pixmap and copies into this pixmap before copying into the final drawable.
        ‣ cairo_surface_create_similar
      • When the GdkWindow already has a cairo surface wrapping an X drawable, it is friendly to cairo image surfaces.
        ‣ We ensured that by calling gdk_cairo_create
        ‣ cairo_surface_create_similar_image
      • Used correctly, this prevents cairo from calling XShmCreatePixmap every time the backing store is copied to the window.
      • Available from GTK+ 3.10
  22. [Figure: video copy paths through the video gst buffer, cairo (image) surface for video, cairo (image) surfaces for TBS, SHM pixmap and GtkWidget]
  23. Memory pressure handler support through cgroups
      • Control groups (cgroups) is a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups.
        ‣ Merged into kernel version 2.6.24
        ‣ Resource limiting: groups can be set not to exceed a configured memory limit
        ‣ Prioritization: some groups may get a larger share of CPU or disk I/O throughput
        ‣ Accounting: measures how many resources certain systems use
        ‣ Control: freezing groups, or checkpointing and restarting
      • We implemented a memory pressure handler for POSIX systems in WebKit using cgroups.
      • When the RPi comes under memory pressure, we free all unnecessary caches and memory and also run the garbage collector to avoid OOM, depending on the pressure level.
      • Not a magic wand: what if the OOM is caused by other applications, not the browser?
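With cgroups v1, an application subscribes to memory pressure notifications in three steps. It creates an eventfd, opens memory.pressure_level, and writes "<event_fd> <fd of memory.pressure_level> <level>" to cgroup.event_control. The kernel then signals the eventfd at the "low", "medium" or "critical" level. The sketch below covers only the browser-side reaction to a delivered level. The action names and the level-to-action mapping are hypothetical; they illustrate freeing more aggressively as pressure rises.

```javascript
// Levels delivered by the cgroup v1 memory.pressure_level notification.
// The action names are hypothetical placeholders, not WebKit functions.
const PRESSURE_ACTIONS = {
  low: ["trimDecodedImageCache"],
  medium: ["trimDecodedImageCache", "clearPageCache"],
  critical: ["trimDecodedImageCache", "clearPageCache", "releaseFontCache", "runGarbageCollector"],
};

// Return the actions for a level; higher levels free more memory.
function actionsForPressure(level) {
  if (!Object.prototype.hasOwnProperty.call(PRESSURE_ACTIONS, level)) {
    throw new Error(`unknown pressure level: ${level}`);
  }
  return PRESSURE_ACTIONS[level];
}

// Run the actions through caller-supplied handlers, skipping any that are
// not provided, and report which ones actually ran.
function handlePressure(level, handlers) {
  const ran = [];
  for (const action of actionsForPressure(level)) {
    if (typeof handlers[action] === "function") {
      handlers[action]();
      ran.push(action);
    }
  }
  return ran;
}
```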
  24. Better YouTube support
      • HTML5 video is required.
      • YouTube has its own heavy UI.
      • We inject some simple JavaScript code which gets the URL of the video stream and creates a <video> for it.
      • Thumbnails come from the YouTube Data API, and the video is fetched in a way similar to youtube-dl.
      • This allows us to block some extra JS on YouTube that was using a lot of CPU.
      • We block the comment section on YouTube, since it took 30 seconds to load fully.
      • Embedded YouTube videos took too much time to load as well.
      • We just load a fake placeholder showing the thumbnail and a fake play button.
      • When the user clicks on it, the real video is actually loaded. This made loading pages with a lot of videos much, much faster.
  25. (image-only slide)
  26. (image-only slide)
  27. Mouse events are swallowed by this element because of the stacking context!
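  28. Stacking context
      <body>
        <div id="div1" style="position:relative; z-index:5"></div>
        <div id="div2" style="position:relative; z-index:2"></div>
        <div id="div3" style="position:relative; z-index:4">
          <div id="div4" style="position:relative; z-index:6"></div>
          <div id="div5" style="position:relative; z-index:1"></div>
          <div id="div6" style="position:relative; z-index:3"></div>
        </div>
      </body>
      Because div3 forms its own stacking context, div4 (z-index:6) is still painted below div1 (z-index:5).
      See: https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Understanding_z_index/The_stacking_context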
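The paint order for this example can be checked with a simplified model of stacking contexts. The model makes three assumptions: every element is treated as positioned, a numeric z-index creates a stacking context, and z-index:auto does not. This sketches the CSS rule only. It ignores non-positioned content, opacity and the other properties that also create stacking contexts.

```javascript
// Collect the members of one stacking context: descendants reached without
// entering a nested context. "auto" (or missing) z-index joins the parent's
// context at level 0, and so do its positioned descendants.
function collect(element, members) {
  for (const child of element.children || []) {
    const z = child.z ?? "auto";
    const createsContext = z !== "auto";
    members.push({ el: child, z: createsContext ? z : 0, createsContext });
    if (!createsContext) collect(child, members);
  }
}

// Paint a context: members in z-index order (stable, so ties keep document
// order); a member that forms its own context paints its subtree right away.
function paintContext(contextRoot, order) {
  const members = [];
  collect(contextRoot, members);
  members.sort((a, b) => a.z - b.z);
  for (const m of members) {
    order.push(m.el.id);
    if (m.createsContext) paintContext(m.el, order);
  }
}

// Return element ids from bottom-most to top-most.
function paintOrder(root) {
  const order = [];
  paintContext(root, order);
  return order;
}
```

For the markup above, the result from bottom to top is div2, div3, div5, div6, div4, div1. div1 ends up above div4 even though 6 > 5, because div4 can only be ordered inside div3’s context. This is the same kind of effect the slides blame for the swallowed mouse events.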
  29. YouTube’s player markup:
      <div class="html5-video-container" style="z-index:900">
      <video style="z-index:auto">
      <div class="html5-video-controls" style="z-index:940">
      <div class="html5-video-player">
      <div class="html5-video-info-panel" style="z-index:960">
      [Figure annotations: ShadowRoot (container node), MediaControls (HTMLDivElement)]
  30. <video width="xxx" height="yyy" src="(a video URL extracted from YouTube)" controls></video>
  31. var posterData = download_webpage(
          'http://gdata.youtube.com/feeds/api/videos/' + this.videoId + '?v=2&alt=json');
      var url = 'http://www.youtube.com/watch?v=' + this.videoId +
          '&gl=US&hl=en&has_verified=1';
      var videoWebpage = download_webpage(url);
      1. Show a thumbnail and a fake play button.
      2. On click, inject the video wrapper,
      3. and then the actual video is loaded.
      Pretty useful for heavy pages embedding many YouTube videos.
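On-demand loading of embedded videos can be sketched as a placeholder object. It records only the video id and the watch-page URL built as on the slide, and does no network work until it is activated. Some details here are assumptions for illustration: the embed URL shape (/embed/<id>) and the helper names `embedVideoId`, `placeholderFor` and `activate`. In a real page the placeholder would be a DOM element showing the thumbnail and a fake play button.

```javascript
// Extract the video id from an embedded-player URL such as
// https://www.youtube.com/embed/<id>?rel=0 (URL shape assumed).
function embedVideoId(embedSrc) {
  const url = new URL(embedSrc);
  const match = url.pathname.match(/^\/embed\/([^/]+)$/);
  return match ? match[1] : null;
}

// Describe the lightweight placeholder used instead of the real player.
// Nothing heavy is fetched while loaded is false.
function placeholderFor(embedSrc) {
  const videoId = embedVideoId(embedSrc);
  if (videoId === null) return null;
  return {
    videoId,
    watchUrl: "http://www.youtube.com/watch?v=" + videoId + "&gl=US&hl=en&has_verified=1",
    loaded: false,
  };
}

// On click: mark the placeholder as loaded. A real implementation would now
// fetch watchUrl, extract the stream URL and inject a <video> element.
function activate(placeholder) {
  return { ...placeholder, loaded: true };
}
```

A page with many embeds therefore only pays for parsing a URL per video until the user actually clicks one.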
  32. Faster fullscreen playback using dispmanx directly
      • Fullscreen mode is a very independent feature.
        ‣ It just shows the video and controls.
        ‣ Nothing needs to be done except copying decoded video frames and drawing controls when necessary.
        ‣ The backing store doesn’t need to be updated at all in fullscreen mode.
      • Dispmanx
        ‣ A subset of the VideoCore library
        ‣ A windowing system in the process of being deprecated in favor of OpenWF
        ‣ Provides useful APIs, e.g. creating layers composited by the GPU, scaling/moving the layers, etc.
      • We wrote raw video data directly into a dispmanx plane and scaled it on the GPU to fit the screen.
      • Not updating the backing store and scaling video on the GPU saved a lot of CPU.
      • A fake cursor was required because of the poor integration of GPU planes into the windowing system.
  33. (image-only slide)
  34. Dispmanx planes in fullscreen mode:
      • Dispmanx plane 4: cursor, a fake cursor image
      • Dispmanx planes 2, 3: controls, filled with control images
      • Dispmanx plane 1: video, filled with raw video data; scaling is performed by the GPU
      • TBS: cairo surface in GtkWidget, completely hidden by the video plane, so it doesn’t need to be updated at all
  35. Hardware decoding of images and video through OpenMAX
      • Raspberry Pi supports OpenMAX (abbreviated “OMX”).
      • OpenMAX
        ‣ A set of C-language programming interfaces that provides abstractions for routines especially useful for audio, video, and still image processing.
        ‣ Provides 3 layers of interfaces: AL (application layer), IL (integration layer) and DL (development layer).
      • OpenMAX DL is especially useful for decoding images and video.
        ‣ AC: Audio Codecs (MP3 decoder and AAC decoder components), not usable because of licensing issues!
        ‣ IC: Image Codecs (JPEG components)
        ‣ IP: Image Processing (generic image processing functions)
        ‣ SP: Signal Processing (generic audio processing functions)
        ‣ VC: Video Codecs (H.264 and MP4 components)
      • JPEG is decoded with OMX in WebKit.
      • gst-omx is used to decode video with OMX in GStreamer.
        ‣ http://cgit.freedesktop.org/gstreamer/gst-omx
  36. Hardware scaling of video through gst-omx
      • Video on the web is often not displayed at its natural size, so it needs to be scaled.
      • We enhanced gst-omx to scale the video through OMX as well.
      <video width="760" height="340" controls>
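Whichever scaler does the work, on the GPU or elsewhere, it first needs a target rectangle. The sketch below computes one, assuming the usual behaviour for video: the aspect ratio is preserved and the picture is centered in the element box (letterbox or pillarbox). The function name and the rounding choices are illustrative only.

```javascript
// Compute where a video of natural size videoWidth x videoHeight lands
// inside a boxWidth x boxHeight element, keeping its aspect ratio and
// centering it.
function fitVideo(videoWidth, videoHeight, boxWidth, boxHeight) {
  const scale = Math.min(boxWidth / videoWidth, boxHeight / videoHeight);
  const width = Math.round(videoWidth * scale);
  const height = Math.round(videoHeight * scale);
  return {
    x: Math.floor((boxWidth - width) / 2),
    y: Math.floor((boxHeight - height) / 2),
    width,
    height,
    scale,
  };
}
```

For the slide's 760×340 element and a 1280×720 video, the output is 604×340, placed at x = 78, y = 0.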
  37. More responsive UI and scrolling even under heavy load
      • Progressive tiled backing store
        ‣ Progressive tile-based rendering on scroll, as mobile browsers do
        ‣ TBS reduces the absolute amount of drawing, so UI events get more chances to be handled.
      • Suspend JavaScript and animations while scrolling
        ‣ WebKit1 runs JS and rendering on a single thread in a single process, so we could not receive scroll events while JS was running.
        ‣ This is not perfect yet, since we could not stop JavaScript functions that were already running.
      • Tune priorities among events
        ‣ Make sure UI events are handled with higher priority than anything else.
        ‣ Event priorities should be tweaked very carefully; the effect depends heavily on the situation.
        ‣ e.g. wiggling the mouse may starve drawing events.
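Event priorities, and the starvation risk the slide warns about, can be sketched as a dispatcher. It always runs the highest-priority pending event, but an event that has been passed over too many times runs next regardless. The priority table and the MAX_WAIT limit are hypothetical tuning values, not the values used in Epiphany or WebKit.

```javascript
// Hypothetical priorities: a lower number runs first, so UI input wins.
const PRIORITY = { input: 0, paint: 1, timer: 2 };
// Hypothetical starvation guard: an event passed over this many times runs next.
const MAX_WAIT = 3;

// Dispatch a batch of queued events ({ type, name }) and return the run order.
function dispatchAll(events) {
  const queue = events.map((e, seq) => ({ ...e, seq, waited: 0 }));
  const ran = [];
  while (queue.length > 0) {
    // A starving event goes first (the queue stays in arrival order).
    let pick = queue.findIndex((e) => e.waited >= MAX_WAIT);
    if (pick === -1) {
      // Otherwise pick the highest priority, breaking ties by arrival order.
      pick = 0;
      for (let i = 1; i < queue.length; i++) {
        const a = queue[i];
        const b = queue[pick];
        if (PRIORITY[a.type] < PRIORITY[b.type] ||
            (PRIORITY[a.type] === PRIORITY[b.type] && a.seq < b.seq)) {
          pick = i;
        }
      }
    }
    const [next] = queue.splice(pick, 1);
    for (const e of queue) e.waited++;
    ran.push(next.name);
  }
  return ran;
}
```

Consider a paint event queued behind five input events. The guard lets it run after the third input instead of after all five, so the order is i1, i2, i3, p1, i4, i5.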
  38. Memory- and CPU-friendly tab management
      • Unload tabs if too many (more than 3) are in use.
      • Slow down JavaScript in background tabs.
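Below is a sketch of both policies. The first keeps the active tab plus the most recently used tabs and unloads the rest once more than three are loaded. The second clamps JavaScript timers in background tabs. The tab fields (`id`, `loaded`, `lastUsed`) and the 1000 ms clamp are assumptions; the slides only state the three-tab threshold.

```javascript
// From the slides: unload tabs when more than 3 are in use.
const MAX_LOADED_TABS = 3;
// Hypothetical minimum timer interval for background tabs.
const BACKGROUND_TIMER_MIN_MS = 1000;

// Return ids of tabs to unload: keep the active tab and the most recently
// used others, up to MAX_LOADED_TABS loaded tabs in total.
function tabsToUnload(tabs, activeId) {
  const loaded = tabs.filter((t) => t.loaded);
  if (loaded.length <= MAX_LOADED_TABS) return [];
  const keep = new Set([activeId]);
  const byRecency = loaded
    .filter((t) => t.id !== activeId)
    .sort((a, b) => b.lastUsed - a.lastUsed);
  for (const t of byRecency) {
    if (keep.size >= MAX_LOADED_TABS) break;
    keep.add(t.id);
  }
  return loaded.filter((t) => !keep.has(t.id)).map((t) => t.id);
}

// Slow down JavaScript timers in background tabs by clamping their delay.
function effectiveTimerDelay(requestedMs, isBackground) {
  return isBackground ? Math.max(requestedMs, BACKGROUND_TIMER_MIN_MS) : requestedMs;
}
```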
  39. Startup is 3x faster
      • Optimized Adblock
        ‣ Adblock is built into Epiphany and is loaded automatically at startup.
        ‣ Use regular expressions only when needed.
        ‣ Reuse parsed regular expressions instead of recreating the same one every time.
        ‣ Load Adblock filters asynchronously.
        ‣ Avoid running the tool that converts Epiphany config files from one version to another when it isn’t needed.
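The two regular-expression optimizations can be sketched as a URL matcher. It skips regular expressions entirely for plain filters and compiles each wildcard filter once, caching the RegExp. The filter syntax is a hypothetical simplification with a single "*" wildcard; real Adblock filter syntax is richer.

```javascript
// Cache of filter text -> compiled RegExp, built once and reused.
const compiledFilters = new Map();
let compileCount = 0; // how many RegExps were actually built

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matchesFilter(url, filter) {
  // Plain filters need no regular expression at all.
  if (!filter.includes("*")) return url.includes(filter);
  let re = compiledFilters.get(filter);
  if (re === undefined) {
    // "*" becomes ".*"; everything else is matched literally.
    re = new RegExp(filter.split("*").map(escapeRegExp).join(".*"));
    compiledFilters.set(filter, re);
    compileCount++;
  }
  return re.test(url);
}

function isBlocked(url, filters) {
  return filters.some((f) => matchesFilter(url, f));
}
```

Checking many URLs against the same wildcard filter compiles it only once; later checks reuse the cached RegExp.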
  40. JavaScript JIT fixes for ARMv6
      • Backported the latest JIT-related changes into our working copy of WebKit.
      • Bug fixes for ARMv6
  41. Lessons
  42. Lesson 1. Profiling, Profiling & Profiling
      • Measuring CPU, memory and time will show you the way to go.
      • Profiling depends a lot on the developer’s experience.
      • Don’t hesitate to share your know-how with your colleagues.
      • Don’t be afraid of learning new tools.
      • e.g. the perf tool is very useful on Linux.
        ‣ Install the relevant debug packages
        ‣ sudo apt-get install linux-tools
        ‣ sudo perf record -a -g -o perf.data
        ‣ sudo perf report -g -i perf.data
  43. Debug package missing
  44. Lesson 2. Keep watching upstream
      ARMv6 is not a popular AP (application processor) nowadays. Nobody cares. BUT…
      • You’re not the only one concerned about the problem!
      • JIT compiler enabled on ARMv6
      • Optimized pixman and libav for ARMv6
  45. Lesson 3. Suspect useless, stupid and repeated things
      • Direct painting instead of a timer-based drawing mechanism
      • Disk image cache
      • Fewer memory copies when playing video
      • Fullscreen mode as a unique feature
      • Avoid useless image format conversions
  46. Lesson 4. Just In Time
      • Progressive tiled backing store
      • Suspend JavaScript and animations when necessary
      • Optimized Adblock
  47. Lesson 5. Hackish but feasible? Then O.K.
      • Used mobile versions of pages for some sites
      • Better YouTube support by injecting a custom video tag wrapper
      • Faster fullscreen video
  48. Lesson 6. Utilize all available resources in the platform
      • Disk image cache
        ‣ Trade-off between memory and local disk space
      • OMX (OpenMAX) for decoding video and images
        ‣ Decode video on the GPU, not the CPU
      • OMX for scaling video and images
        ‣ Scale videos on the GPU, not the CPU
  49. Lesson 7. Careful resource reallocation
      • Throttle video to at most 30 fps
      • Tune priorities among events
      • Memory pressure handler using cgroups
      • Unload tabs if too many are in use
      • Slow down JavaScript in background tabs
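Capping video at 30 fps means dropping any frame that arrives too soon after the last presented one. Below is a sketch under these assumptions: frame timestamps are in milliseconds, and a hypothetical 2 ms tolerance absorbs timestamp jitter. The slides do not say how the throttling was implemented.

```javascript
// Keep only frames at least (1000 / maxFps - toleranceMs) ms after the last
// presented frame; the tolerance absorbs jitter in rounded timestamps.
function throttleFrames(timestampsMs, maxFps = 30, toleranceMs = 2) {
  const minGap = 1000 / maxFps - toleranceMs;
  const presented = [];
  let last = -Infinity;
  for (const t of timestampsMs) {
    if (t - last >= minGap) {
      presented.push(t);
      last = t;
    }
  }
  return presented;
}
```

For example, a 60 fps stream with timestamps 0, 17, 33, 50, 67, 83, 100 is reduced to 0, 33, 67, 100. A stream that is already 30 fps passes through unchanged.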
  50. Conclusion
      • Optimization is literally finding the best solution to fit your purpose or platform.
      • It depends on your situation, so there can be many different approaches.
      • SW engineers should not expect better hardware to do the work for them.
      • No magic, no universal solution for optimization.
      • Imagine your own way, and don’t be afraid of trying your ideas.
  51. Thank you
