Scale

Every day:
∗ 400,000 net new users
∗ 350 million photos
∗ 5 billion shares
∗ 10 billion messages

In total:
∗ 1.2 billion users
∗ 150 billion friendships
∗ 250 billion photos
∗ 1 trillion likes
∗ 16% of all time spent on the Internet
∗ 1 million users per engineer
Scope
∗ Machine learning
∗ Big Data
∗ Search and information retrieval
∗ Performance
∗ Hardware
∗ Network
∗ Human-computer interaction
∗ Web UI
∗ Mobile UI
∗ Static analysis
∗ Compilers
∗ Virtual machines
∗ Image processing
∗ Video processing
∗ Datacentre design
Fixing a bug

tl;dr: We've found the source of the file descriptor leaks and we have potential fixes for the problem.

Taking a deeper look at the problem, in order to verify what the real fixes could be, we need to determine where the open file descriptors actually point. In adb shell you can do so by cd-ing into /proc/<pid>/fd and running ls -l to show each descriptor's destination. A quick glance at the results reveals that pipes and /dev/ashmem occupy the majority of the open fds — almost 50% between them — so these two are the ideal candidates for tracking down the fd leak.

In short, ashmem stands for Android Shared Memory, and the Android system uses it to facilitate memory sharing across processes. Each ashmem region registers a shrinker, and the shrinker reclaims the memory when the device is in a low-memory state — much like the JVM does, but in native space. A pipe, in the Unix world, is an interprocess channel that comes with two file descriptors: one for reading and one for writing.

My first task was to see why the number of pipe fds keeps building up while scrolling News Feed. When FB4A first starts we have around xx open pipes, and scrolling through a couple of pages grows that number. To isolate the problem from FB4A, I built the fbsimple app, which contains only the News Feed module, and I observed the same behavior. To isolate it further, I turned off image fetching/prefetching to see whether the problem correlated with the image fetching pipeline. Surprisingly, I could still see the problem without image fetching, which convinced me that it affects more than just the image pipeline. My next experiment disabled News Feed database caching, and the same problem persisted, which rules out db access as the main cause. The only thing left was to play around with the network executor. By default, FB4A uses Apache HttpClient to execute all network requests.
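The /proc inspection described above can also be done programmatically. This is a minimal sketch that tallies a process's open descriptors by destination, the same data `ls -l /proc/<pid>/fd` shows in adb shell; it inspects its own process (`/proc/self/fd`) and is Linux-only:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class FdTally {
    public static void main(String[] args) throws IOException {
        // Each entry in /proc/self/fd is a symlink from an fd number to its
        // destination (a file path, "pipe:[inode]", "socket:[inode]", etc.).
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds =
                 Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                String dest;
                try {
                    dest = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    dest = "?"; // fd may have closed between listing and readlink
                }
                // Collapse e.g. "pipe:[12345]" into "pipe" so all pipes group together.
                String kind = dest.split("[:\\[]")[0];
                counts.merge(kind, 1, Integer::sum);
            }
        }
        counts.forEach((kind, n) -> System.out.println(n + "\t" + kind));
    }
}
```

On a leaking process, running a tally like this before and after scrolling makes the growth of the `pipe` and `/dev/ashmem` buckets immediately visible.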
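To make the "two file descriptors per pipe" point concrete, here is a small demo using the JDK's own `java.nio.channels.Pipe` (not FB4A code): each pipe costs two fds, and both ends must be closed or the fds stay leaked — which is exactly the failure mode attributed to the HTTP stack above. Counting via `/proc/self/fd` is Linux-only:

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PipeLeakDemo {
    // Linux-only: each entry in /proc/self/fd is one open descriptor.
    static long openFds() throws IOException {
        try (var entries = Files.list(Paths.get("/proc/self/fd"))) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        long before = openFds();
        Pipe p = Pipe.open();   // allocates two fds: a read end and a write end
        long during = openFds();
        p.source().close();     // both ends must be closed,
        p.sink().close();       // or the two fds remain leaked
        long after = openFds();
        System.out.println((during - before) + " fds per pipe, "
                + (after - before) + " leaked after closing both ends");
    }
}
```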
Earlier last month, we introduced the SPDY library okHttp as an experiment to replace Apache HttpClient. A quick test reveals that Apache HttpClient is indeed the culprit for the leaking pipes: with the same configuration, okHttp keeps the number of open pipe fds around 20, versus 90 with Apache HttpClient. Not only is okHttp better at reusing network connections, it also has better fd management. A sanity check with okHttp enabled in FB4A reveals the same result.

Ashmem debugging is rather straightforward: ashmem is allocated when image fetching is enabled. A deep dive reveals that the fd is only allocated after bitmap decoding has been called, and I suspected ashmem has to do with purgeability. To verify, I disabled the image cache and, instead of relying on a disk file to decode the image, I passed the HTTP content InputStream directly to BitmapFactory.decodeStream to decode the image. With this I confirmed that we are no longer allocating ashmem for decoding, because the images are no longer purgeable and live in the Java heap. However, we ran into the same memory problems with this byte-decoding experiment, and big images would render black or partially decoded on FB4A. So instead of decoding every image with the stream-based approach, I made a quick prototype that renders big images (images from single-photo stories and multi-photo collages) with our existing solution and small images, like profile pictures, via the stream path, and the result looks promising: scrolling through the list of 1,000 people in the flyout no longer grows the number of open fds. I think this hybrid approach will work. With the combined approach described above, FB4A now stays at around half the open fds.
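The hybrid decision above can be sketched as a simple routing function. This is a minimal illustration only: the `decodeViaStream` name and the 200px cutoff are assumptions for the example, not the actual FB4A heuristic, which routes by story type (single-photo stories and collages vs. profile pictures):

```java
public class DecodePathChooser {
    // Hypothetical size threshold: "small" images (e.g. profile pictures)
    // decode via InputStream into the Java heap; big images keep the
    // existing ashmem-backed purgeable decode, which tolerates large bitmaps.
    static final int SMALL_EDGE_PX = 200; // assumed cutoff, for illustration

    static boolean decodeViaStream(int widthPx, int heightPx) {
        return Math.max(widthPx, heightPx) <= SMALL_EDGE_PX;
    }

    public static void main(String[] args) {
        System.out.println(decodeViaStream(64, 64));    // profile picture: true
        System.out.println(decodeViaStream(1080, 720)); // photo story: false
    }
}
```

The design point is that each path keeps its weakness off the hot case: small images are numerous (the 1,000-person flyout) but cheap to hold on the Java heap, while big images stay on the purgeable path that decodes them correctly.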