 Multithreading and Parallelism on iOS [MobOS 2013]
 

     Presentation Transcript

    • Multithreading and Parallelism on iOS • Kuba Brecka, @kubabrecka • Mobile Operating Systems Conference MobOS 2013
    • Agenda • Part I: Parallelism and multithreading overview • Part II: Thread-safety, GCD, operation queues • Part III: Synchronization, locking, memory model • Part IV: Performance tuning, ILP • Part V: (at the party) Whatever you’d like to discuss
    • Multithreading and Parallelism on iOS Part I: Parallelism and multithreading overview
    • Quiz 1: int a; - (void)method { a = 0; dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_async(queue, ^{ a = 1; }); dispatch_async(queue, ^{ a = 2; }); NSLog(@"%d", a); }
    • Quiz 2: int a; - (void)method { a = 0; dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_async(queue, ^{ a = 1; }); while (a == 0) { /* wait */ } NSLog(@"%d", a); }
    • Parallelism is a huge topic
    • Terminology • Parallel • Multi-threaded • Concurrent • Simultaneous • Asynchronous
    • Why parallelize? • Responsiveness • “when I scroll, it’s smooth” • Performance • “it works fast” • Energy saving • “it doesn’t drain my battery” • Convenience • some things are parallel by nature, e.g. running two completely separate apps
    • How? • Multiple processes • XPC, fork • Multiple threads • POSIX Threads, NSThread • High-level thread abstraction • Operation queues, dispatch queues • GPGPU • Instruction-level parallelism • superscalar CPUs, pipelining, vector instructions • Multiple PCs • servers, clouds
    • Threads • What is a thread? • It’s an abstraction made by the OS • The CPU has no such concept • Represents a line of calculation • Has an ID, a stack, thread-local storage, priority, CPU registers • Shares memory and resources within a process • The OS scheduler runs/pauses threads • context switching
    • Issues with threading • Race conditions • the result depends on the timing of the scheduler • the behavior is non-deterministic • can result in almost anything • crash, wrong result, corrupted data • So, you have to use locks/mutexes/… • More issues: deadlocks, livelocks, starvation • Even the best guys have trouble with these • Security consequences, vulnerabilities
    • Know your enemy • The compiler • The CPU • The memory • Time • Your brain
    • The iPhone has matured • iPhone 4: 512 MB RAM, A4 SoC (1 core), 800 MHz • iPhone 4S: 512 MB RAM, A5 SoC (2 cores), 800 MHz • iPhone 5: 1 GB RAM, A6 SoC (2 cores), 1300 MHz • iPhone 5S: 1 GB RAM, A7 SoC (2 cores), 1300 MHz
    • ARM has matured • Apple A5 (2011) • ARM Cortex-A9 MPCore • 2 cores • out-of-order execution • speculative execution • superscalar, pipelining (8 stages) • NEON 128-bit SIMD • Apple A7 (2013) • ARMv8-A “Cyclone” • 64-bit, 32 registers, per-core L1 cache
    • iOS has matured • The kernel knows a lot more about the system than the developer • GCD • Operation Queues • LLVM, compiler optimizations • GPU computations • Accelerate.framework
    • iOS threading technologies • Multiple processes – forking disabled, no XPC • Low-level threads • POSIX Threads (pthread) • NSThread • -[NSObject performSelectorInBackground:withObject:] • Higher-level abstractions • NSOperationQueue, NSOperation • GCD
    • Is multithreading hard? • Yes, if you don’t know what you’re doing. • But that’s true for anything. • Paul E. McKenney: Is Parallel Programming Hard, And, If So, What Can You Do About It? (2013) • https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
    • You need to know how it works • The abstractions you use (threads, dispatch queues) are leaky • You still must know how it works below: • CPU • OS • compiler (LLVM) • libraries and 3rd party code you are using • language specification • language implementation • + the abstraction you are using (GCD)
    • You need to know even more • Often you parallelize to get better performance • For this you need to know • CPU architecture details • CPU instruction latencies • memory hierarchy and latencies
    • Parallelizing tasks vs. algorithms • Task = a standalone unit of work • has some inputs, gives some outputs • “add a blur effect to these 1000 photos” • 1 photo = 1 task (independent) • “add a blur effect to this one 5000x5000px photo” • 1 task = ? • Some algorithms simply cannot be parallelized (you will not get any significant speedup)
    • Multithreading and Parallelism on iOS Part II: Thread-safety, GCD, operation queues
    • What’s thread safety? • “Thread-safe object” • you can safely use the object from multiple threads at the same time • the internal state of the object will not get corrupted and it will behave correctly • When you don’t know if an object is thread-safe, you have to assume it isn’t • How do you make your object thread-safe? • immutability, locks, atomic reads/writes
    • Shared mutable state • Exclusive immutable object = no problem • Shared immutable object = no problem • Exclusive mutable object = no problem • Shared mutable object • root of all evil • you always want to minimize this
    • Global variables • “Global variables are bad” • Multi-threading is another very good reason not to use global variables / global state • Global variables are always shared • Watch out for “hidden” global state: • working directory, chdir() • environment variables, putenv()
    • Thread-safety vs. iOS • Terrible lack of proper documentation • Most of the low-level Obj-C runtime is thread-safe • memory management, ARC, weak references, … • Immutable objects (NSString, NSArray, …) are thread-safe • A few other classes are thread-safe • Usually it’s thread-safe to call class methods • google for “iOS thread safety” • https://developer.apple.com/library/ios/DOCUMENTATION/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
    • POSIX threads • “plain threads” • C API • if you want to pass an object to the new thread, you will have issues with memory management • Synchronization • mutexes, conditions, R/W locks, barriers
    • POSIX thread API • pthread_create • pthread_join • mutex • pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock • conditions • pthread_cond_init, pthread_cond_signal, pthread_cond_wait
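The create/join pair listed on this slide can be sketched in plain C. This is a minimal illustration, not code from the talk: `sum_task`, `sum_worker` and `parallel_sum` are hypothetical names, and each thread is given its own half of the input so that no locking is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item: sum values[begin..end) into result. */
typedef struct {
    const int *values;
    size_t begin;
    size_t end;
    long result;
} sum_task;

/* Thread entry point; pthread_create hands the task over as void *. */
static void *sum_worker(void *arg)
{
    sum_task *task = arg;
    task->result = 0;
    for (size_t i = task->begin; i < task->end; i++) {
        task->result += task->values[i];
    }
    return NULL;
}

/* Sums the array with two threads, each owning one half of the input. */
long parallel_sum(const int *values, size_t count)
{
    sum_task tasks[2] = {
        { values, 0, count / 2, 0 },
        { values, count / 2, count, 0 },
    };
    pthread_t threads[2];

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, sum_worker, &tasks[i]);
        assert(rc == 0);
        (void)rc;
    }
    /* pthread_join waits for each thread and makes its writes visible. */
    for (int i = 0; i < 2; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return tasks[0].result + tasks[1].result;
}
```

Because the two tasks share only immutable input and write to separate `result` fields, the only synchronization required is the join.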
    • NSThread • “plain threads” as well • Obj-C API • mostly just a wrapper around POSIX threads • memory management just works • Synchronize with NSLock, NSCondition, …
    • NSThread API • -[NSThread initWithTarget:selector:object:] • -[NSThread start] • +[NSThread detachNewThreadSelector:toTarget:withObject:] • subclassing NSThread • -[NSObject performSelectorInBackground:withObject:]
    • Thread-specific properties • Thread-local storage • Thread priorities • Autorelease pools • Detached vs. joinable
    • Grand Central Dispatch • Let’s not think about threads • Instead, let’s think about tasks • New concepts: • Tasks • Queues • Queue-specific data • Dispatch groups • Dispatch sources • Synchronization • Semaphores, barriers • C API (!) but has ARC and works with blocks
    • GCD queues • Main queue • there is just one, executed on the main thread • Concurrent queue • tasks run concurrently • 4 pre-made concurrent queues with different priorities • DISPATCH_QUEUE_PRIORITY_DEFAULT, _HIGH, _LOW, _BACKGROUND • you can make your own • Serial queue • only one task at a time, in order • you can make your own
    • GCD task API • Get/create a queue: • dispatch_get_global_queue • dispatch_get_main_queue • dispatch_queue_create • Submit task: • dispatch_sync • dispatch_async • dispatch_apply
    • GCD convenience API • dispatch_once • guarantees the code runs only once • use it to implement a proper and fast singleton • dispatch_after • execute the task at a specific time
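The run-once guarantee can be illustrated without GCD by using `pthread_once`, the POSIX counterpart of `dispatch_once`. This is a sketch under that substitution, not the GCD API itself; `config`, `shared_config` and `shared_config_init_calls` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical singleton payload. */
typedef struct {
    int verbose;
} config;

static pthread_once_t config_once = PTHREAD_ONCE_INIT;
static config *shared = NULL;
static int init_calls = 0;

/* Runs exactly once, no matter how many threads ask for the singleton. */
static void config_init(void)
{
    shared = calloc(1, sizeof *shared);
    assert(shared != NULL);
    init_calls++;
}

/* Returns the process-wide instance, creating it on first use. */
config *shared_config(void)
{
    pthread_once(&config_once, config_init);
    return shared;
}

/* Exposed only so the run-once behavior can be observed. */
int shared_config_init_calls(void)
{
    return init_calls;
}
```

Every caller goes through `shared_config`, so the initializer cannot race with a reader of `shared`.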
    • It’s not threads • GCD uses threads, but the threads are completely managed by GCD • You can’t assume your code will run on any specific thread • even two tasks from the same serial queue can run on different threads • Don’t use thread-local storage • Don’t use thread priorities
    • Operation queues • A similar abstraction to GCD, this time you have: • NSOperation • either a block, a method call or custom subclass • concurrent or non-concurrent • dependencies on other NSOperations • support for cancellation • NSOperationQueue • executes the operations, or you can execute an operation directly
    • Operation queues API • -[NSOperationQueue addOperation:] • -[NSOperationQueue addOperationWithBlock:] • -[NSOperation addDependency:] • +[NSBlockOperation blockOperationWithBlock:] • -[NSInvocationOperation initWithTarget:selector:object:]
    • Comparison • POSIX threads, NSThread • thread-based • you have control over the lifetime of threads • overhead when creating • memory-management issues • GCD, operation queues • task-based • nice API with objects/blocks • operation queues • dependencies
    • Run loops and messaging • Avoid shared mutable state • For POSIX threads and NSThreads: • put your thread into an event loop, where it just waits until an event occurs • the main thread has this by default • hidden inside UIApplicationMain • then you can communicate with the thread through: • -[NSObject performSelector:onThread:withObject:waitUntilDone:]
    • Run loop API • +[NSRunLoop currentRunLoop] • -[NSRunLoop run] • you have to add at least one input source or it will return immediately • but you can add an empty port • [NSMachPort port] • -[NSRunLoop addPort:forMode:]
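The idea behind a run loop (a thread that sleeps until an event arrives and is reached only through messages) can be sketched with a POSIX mutex and condition variable. This is not NSRunLoop; it is a simplified stand-in, and `event_loop`, `event_loop_post`, `event_loop_stop` and `run_event_loop_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 16

/* Hypothetical mailbox owned by one worker thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wakeup;          /* signalled on post and on stop */
    int messages[QUEUE_CAPACITY];   /* ring buffer of pending messages */
    int head;                       /* index of the oldest message */
    int count;                      /* number of pending messages */
    bool stop;                      /* set when the loop should exit */
    long handled_sum;               /* stand-in for "handling" a message */
} event_loop;

void event_loop_init(event_loop *loop)
{
    pthread_mutex_init(&loop->lock, NULL);
    pthread_cond_init(&loop->wakeup, NULL);
    loop->head = 0;
    loop->count = 0;
    loop->stop = false;
    loop->handled_sum = 0;
}

/* Enqueues a message; returns false when the mailbox is full. */
bool event_loop_post(event_loop *loop, int message)
{
    bool accepted = false;
    pthread_mutex_lock(&loop->lock);
    if (loop->count < QUEUE_CAPACITY) {
        loop->messages[(loop->head + loop->count) % QUEUE_CAPACITY] = message;
        loop->count++;
        accepted = true;
        pthread_cond_signal(&loop->wakeup);
    }
    pthread_mutex_unlock(&loop->lock);
    return accepted;
}

/* Asks the loop to exit once the pending messages are handled. */
void event_loop_stop(event_loop *loop)
{
    pthread_mutex_lock(&loop->lock);
    loop->stop = true;
    pthread_cond_signal(&loop->wakeup);
    pthread_mutex_unlock(&loop->lock);
}

/* Thread body: sleep until there is a message, handle it, repeat. */
static void *event_loop_run(void *arg)
{
    event_loop *loop = arg;
    pthread_mutex_lock(&loop->lock);
    for (;;) {
        while (loop->count == 0 && !loop->stop) {
            pthread_cond_wait(&loop->wakeup, &loop->lock);
        }
        if (loop->count == 0) {
            break;  /* stop requested and nothing left to handle */
        }
        int message = loop->messages[loop->head];
        loop->head = (loop->head + 1) % QUEUE_CAPACITY;
        loop->count--;
        loop->handled_sum += message;  /* handled while holding the lock */
    }
    pthread_mutex_unlock(&loop->lock);
    return NULL;
}

/* Starts the loop, posts 1, 2 and 3, stops it, returns the handled total. */
long run_event_loop_demo(void)
{
    event_loop loop;
    pthread_t thread;
    event_loop_init(&loop);
    int rc = pthread_create(&thread, NULL, event_loop_run, &loop);
    assert(rc == 0);
    (void)rc;
    for (int message = 1; message <= 3; message++) {
        event_loop_post(&loop, message);
    }
    event_loop_stop(&loop);
    pthread_join(thread, NULL);
    pthread_cond_destroy(&loop.wakeup);
    pthread_mutex_destroy(&loop.lock);
    return loop.handled_sum;
}
```

A real handler would copy the message out and release the lock before doing any lengthy work; handling under the lock keeps the sketch short.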
    • Main thread • first thread = main thread = UI thread • all rendering • all layout • scrolling, panning, zooming • user input (touches, on-screen keyboard, external keyboard) • system events • Yes, that’s a lot of work. • 60 FPS = 16 ms per frame • Yes, that’s very little time.
    • Offload the main thread • Goal: Keep the UI thread responsive • Rule: • Do as much work as possible on other threads • Well, but… • Do as little work as possible in the background, that is just enough to keep the main thread responsive • Measure, measure, measure
    • Rendering and animations • Your app doesn’t have access to the GPU/display • Background process called “backboardd” • IPC – rendering commands • Shared memory – backing stores • CAAnimations are transferred to backboardd and performed without any communication with your app
    • Demo 1 https://github.com/kubabrecka/mobos-ios
    • Multithreading and Parallelism on iOS Part III: Synchronization, locking, memory model
    • Demo 2 https://github.com/kubabrecka/mobos-ios
    • Only trust what’s guaranteed • The order of things isn’t guaranteed unless someone tells you: int a, b; /* global variables */ • thread 1: b = 20; a = 10; • thread 2: wait for a to be 10, then NSLog(@"%d", b); /* ? */
    • Solutions • Avoid shared mutable state • communicate by message passing • design your objects as immutable • avoid multithreading • Synchronization • You must always have “a plan” • if you can’t tell which code is supposed to run in which thread, then nobody can help you • if you can’t tell which data can be accessed from which thread, then nobody can help you
    • So what is guaranteed? • Semantics for one thread • “the (single-threaded) code you wrote will have the correct result” • For multi-threaded code, you have to obtain guarantees by using: • Atomic data types, volatile keyword • Locks, semaphores, memory barriers • For 3rd party code, generally you can’t assume anything
    • Atomic types • Which data types are atomic? • Depends on the architecture! • Pointers and “native” integers are usually atomic • What does an atomic data type guarantee? • Also depends on the architecture! • A single read or a single write is usually atomic • Definitely not “i++” • OSAtomicIncrement, …
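The point that "i++" is not atomic can be made concrete with a shared counter. The sketch below uses the standard C11 `<stdatomic.h>` operations rather than the OSAtomic functions named on the slide; `increment_worker` and `count_with_two_threads` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS_PER_THREAD 100000

static atomic_int counter;

static void *increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        /* One indivisible read-modify-write. A plain counter++ on an
           ordinary int is a separate load, add and store, so two threads
           can overwrite each other's updates. */
        atomic_fetch_add(&counter, 1);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
int count_with_two_threads(void)
{
    pthread_t threads[2];
    atomic_store(&counter, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, increment_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&counter);
}
```

With the atomic increment, no update can be lost, so the total is always twice `INCREMENTS_PER_THREAD`.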
    • Objective-C atomic properties • @property (atomic) int a; • Only affects auto-generated getters and setters • Again, a single read is atomic, a single write is atomic • Again, “obj.a++” is not atomic • It has no effect on direct member access, obj->a • “atomic” is default
    • Objective-C messaging • Is the order of Obj-C method calls guaranteed? • It seems so, the current compilers don’t optimize through the dynamic dispatch (objc_msgSend) • But it’s still not guaranteed • This might (and probably will) change in the future
    • Volatile keyword • don’t confuse with Java volatile • prevents some compiler optimizations • the variable can change on its own • doesn’t give you atomicity • doesn’t give you ordering • there are better means of synchronization
    • Locks • Mutexes, critical sections • allow only a single thread to be in this part of the code at any one time • -[NSLock lock] • -[NSLock unlock] • @synchronized (obj) { … } • uses an implicit lock, which exists on each object • handles exceptions • Recursive locks, R/W locks, conditions
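A critical section is needed when several fields must change together. The sketch below uses a POSIX mutex instead of NSLock, and the `stats` type and its functions are hypothetical; the lock keeps `sum` and `count` consistent with each other, which a pair of separate atomic variables would not.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state: both fields must be updated as one unit. */
typedef struct {
    pthread_mutex_t lock;
    long sum;
    long count;
} stats;

void stats_init(stats *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->sum = 0;
    s->count = 0;
}

/* Critical section: only one thread at a time runs the two updates. */
void stats_add(stats *s, long value)
{
    pthread_mutex_lock(&s->lock);
    s->sum += value;
    s->count++;
    pthread_mutex_unlock(&s->lock);
}

/* Each worker records the values 1..1000. */
static void *stats_worker(void *arg)
{
    stats *s = arg;
    for (long value = 1; value <= 1000; value++) {
        stats_add(s, value);
    }
    return NULL;
}

/* Runs two workers against the same stats object and waits for both. */
void stats_fill_from_two_threads(stats *s)
{
    pthread_t threads[2];
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, stats_worker, s);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
}
```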
    • Lock-free algorithms and data structures • Some concurrent structures (hash tables, queues) can be written without using explicit locks • Currently a major topic in CS • databases • The name is confusing though, there is still a lot of locking happening • cache coherency • memory bus locking for complex atomic operations
    • Memory barriers • Locks can be expensive • A memory barrier ensures ordering without locking • Memory reads and writes cannot be moved to the other side of the barrier • But the guarantee is only at the point of the barrier! • OSMemoryBarrier
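The earlier two-variable example (write the data, then set a flag) can be made well-defined with barriers. The sketch below uses portable C11 fences instead of OSMemoryBarrier; `payload`, `ready`, `producer`, `consume` and `run_handoff` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static int payload;        /* ordinary data, written before the flag */
static atomic_int ready;   /* flag that publishes the data */

static void *producer(void *arg)
{
    (void)arg;
    payload = 20;
    /* Release fence: the payload write may not move below this point. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Waits for the flag, then reads the data it guards. */
int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        sched_yield();  /* let the producer run while we wait */
    }
    /* Acquire fence: the payload read may not move above this point. */
    atomic_thread_fence(memory_order_acquire);
    return payload;
}

/* Starts the producer and returns the value the consumer observed. */
int run_handoff(void)
{
    pthread_t thread;
    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&thread, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    int seen = consume();
    pthread_join(thread, NULL);
    return seen;
}
```

Both fences are required: the release fence orders the writer's stores, and the acquire fence orders the reader's loads. Dropping either one brings back the "?" from the earlier slide.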
    • Is the trouble worth it? • Measure! • OK, so you need more than a single thread • use task-level parallelization (GCD) with clear input and output, use immutable data and message passing • Measure again! • OK, so you need more than that • find the bottleneck, don’t assume • is it really the CPU? Isn’t the bottleneck in the memory/network/disk?
    • Demo 3 https://github.com/kubabrecka/mobos-ios
    • Multithreading and Parallelism on iOS Part IV: Performance tuning, ILP
    • Multithreading isn’t everything • There are plenty of ways to make your code run faster • avoiding unnecessary work • choosing better algorithms • calculations on the GPU • using vector instructions (AVX, SSE, NEON) • hand-optimizing your assembly • tweaking the compiler optimizations
    • The bottleneck • It’s easy to make wrong assumptions • Your bottleneck can be • CPU • Memory • I/O (disk, network) • GPU • There is no “usually”
    • Some common UI issues • Creating UIViews is slow • reuse views, dequeue cells in tables • Loading images is slow • cache images • Rendering is slow • avoid drawRect, consider rasterization of flattened views • Scrolling is slow • don’t do heavy work in scrollViewDidScroll • Rendering shadows is slow • use shadowPath • Rendering layer masks is slow • pre-render
    • Choose your data structures • -[NSArray containsObject:] • O(n) • -[NSSet containsObject:] • O(1)
    • Always profile first • Don’t guess, measure! • Amdahl’s law • Hardware is cheap, programmers are expensive
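Amdahl's law bounds the speedup of a program in which a fraction p of the running time can be parallelized across n cores: speedup = 1 / ((1 - p) + p / n). The helper below is a hypothetical illustration of that formula, not code from the talk.

```c
#include <assert.h>

/* Amdahl's law: parallel_fraction is the share of the running time that
   can be parallelized (0.0 to 1.0), cores is the number of cores used. */
double amdahl_speedup(double parallel_fraction, int cores)
{
    assert(cores > 0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

On a two-core device, parallelizing half of the work gives at most 1 / (0.5 + 0.25), about 1.33 times the original speed, which is why measuring the parallel fraction comes before writing threaded code.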
    • Profiling with Instruments • What can you measure with Instruments? • CPU • utilization • all performance counters (interrupts, syscalls, user/kernel time, …) • Memory • free memory • allocations, leaks, “zombies” • many more performance counters (page faults, cache hits/misses, …) • Network • Battery usage • Display FPS • Single process / multiple processes • …
    • Measure carefully • Instruments isn’t perfect • Sampling is only a statistical method • Real devices behave very differently from simulators • Hardware is different • Compiled code is different (both yours and libraries) • Verify your assumptions • In many cases, wrapping your code with two calls to [NSDate date] and subtracting is the best approach
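The "two timestamps and a subtraction" approach looks like this in plain C. The sketch uses C11 `timespec_get`, which reads the wall clock much as [NSDate date] does; `elapsed_seconds`, `time_call` and `sample_work` are hypothetical names, and the workload is only a placeholder.

```c
#include <assert.h>
#include <time.h>

/* Difference between two timestamps, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Takes a timestamp before and after work() and returns the difference. */
double time_call(void (*work)(void))
{
    struct timespec start;
    struct timespec end;
    assert(work != NULL);
    timespec_get(&start, TIME_UTC);
    work();
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end);
}

/* Placeholder workload; volatile keeps the loop from being optimized away. */
void sample_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) {
        sink += i;
    }
}
```

A single run is noisy, so it is usually worth repeating the measurement several times on a real device and comparing the results.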
    • Optimize memory/cache accesses • Cache lines (64 B) • Try to linearize memory accesses • Choose correct data structures • array of structs vs. struct of arrays • Aligned memory accesses
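The "array of structs vs. struct of arrays" choice can be sketched as follows. The `particle` types and functions are hypothetical; the two loops compute the same result but touch memory differently.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structs: the four fields of one particle sit next to each other. */
typedef struct {
    float x, y, z, mass;
} particle;

/* Struct of arrays: each field is stored in its own contiguous array. */
typedef struct {
    const float *x;
    const float *y;
    const float *z;
    const float *mass;
} particle_arrays;

/* Reads 4 bytes out of every 16: three quarters of each cache line
   fetched holds x, y, z values that this loop never uses. */
float total_mass_aos(const particle *particles, size_t count)
{
    assert(particles != NULL || count == 0);
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        total += particles[i].mass;
    }
    return total;
}

/* Reads one contiguous array: every byte of each cache line is used. */
float total_mass_soa(const particle_arrays *particles, size_t count)
{
    assert(particles != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < count; i++) {
        total += particles->mass[i];
    }
    return total;
}
```

Which layout wins depends on the access pattern: code that always uses all four fields of a particle together may be better served by the array of structs.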
    • Instruction-level parallelism • The compiler tries to maximize ILP with scheduling • The main obstacle is data dependency • a series of arithmetic operations which depend on each other simply cannot be parallelized • independent operations are easily parallelized • CPU is superscalar and has deep pipelines • the problem is that often the compiler can’t be sure about the dependency • memory accesses, aliasing • it has to assume the dependency is there
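The data-dependency obstacle can be illustrated with two ways of summing an array. This is a sketch with hypothetical function names; a compiler may already perform this transformation for integers, so any gain has to be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition depends on the result of the previous
   one, so the additions form a single serial chain. */
uint64_t sum_single_chain(const uint32_t *values, size_t count)
{
    assert(values != NULL || count == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Four accumulators: the four additions in one iteration do not depend
   on each other, so a superscalar CPU can overlap them. */
uint64_t sum_four_chains(const uint32_t *values, size_t count)
{
    assert(values != NULL || count == 0);
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        s0 += values[i];
        s1 += values[i + 1];
        s2 += values[i + 2];
        s3 += values[i + 3];
    }
    for (; i < count; i++) {  /* leftover elements */
        s0 += values[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Integers are used so that both versions return exactly the same value; with floating-point data the regrouped additions can round differently.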
    • Help the compiler • The compiler is smart: • GCC: dead code elimination, common subexpression elimination, forward propagation, loop unrolling, tail call elimination, loop invariant motion, lower complex arithmetic, vectorization, modulo scheduling, … • Sometimes, it would like to be smart, but it can’t: • the C “restrict” keyword (C99): void * memcpy(void * restrict s1, const void * restrict s2, size_t n);
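A short sketch of what `restrict` promises. `scale_into` is a hypothetical function: the qualifiers tell the compiler that `dst` and `src` never overlap, so it may keep values in registers and vectorize the loop without first checking for aliasing.

```c
#include <assert.h>
#include <stddef.h>

/* Writes src[i] * factor into dst[i]. The caller promises that the two
   buffers do not overlap; passing overlapping buffers is undefined
   behavior. */
void scale_into(float *restrict dst, const float *restrict src,
                size_t count, float factor)
{
    assert((dst != NULL && src != NULL) || count == 0);
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i] * factor;
    }
}
```

Without `restrict`, the compiler has to assume that a store to `dst[i]` might change a later `src[j]`, which is the dependency problem described on the previous slide.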
    • Vector instructions • SIMD = Single Instruction Multiple Data • ARM NEON • 128-bit instructions (e.g. 4x 32-bit or 16x 8-bit at once) • LLVM auto-vectorizer • Often you have to change your data structure • alignment • interleaved values
    • Accelerate.framework • Heavily optimized built-in framework for: • image processing • image format conversion and encoding/decoding • DSP, FFT • various general math on “large” data • #include <Accelerate/Accelerate.h> vFloat vx = { 1.f, 2.f, 3.f, 4.f }; vFloat vy; ... vy = vsinf(vx);
    • Away from the CPU • GPGPU • Only through OpenGL ES shaders • Perfect for image processing (Core Image, GPUImage) • M7 motion coprocessor (iPhone 5S)
    • Thank you for your attention.