Multithreading and Parallelism on iOS

Kuba Brecka
@kubabrecka

Mobile Operating Systems Conference MobOS 2013
Agenda

• Part I: Parallelism and multithreading overview
• Part II: Thread-safety, GCD, operation queues
• Part III: Synchronization, locking, memory model
• Part IV: Performance tuning, ILP
• Part V: (at the party) Whatever you’d like to discuss
Multithreading and Parallelism on iOS
Part I: Parallelism and multithreading overview
Quiz 1
int a;

- (void)method
{
    a = 0;
    dispatch_queue_t queue = dispatch_get_global_queue(
        DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(queue, ^{
        a = 1;
    });
    dispatch_async(queue, ^{
        a = 2;
    });
    NSLog(@"%d", a);
}
Quiz 2
int a;

- (void)method
{
    a = 0;
    dispatch_queue_t queue = dispatch_get_global_queue(
        DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(queue, ^{
        a = 1;
    });
    while (a == 0) {
        // wait
    }
    NSLog(@"%d", a);
}
Parallelism is a huge topic
Terminology

• Parallel
• Multi-threaded
• Concurrent
• Simultaneous
• Asynchronous
Why parallelize?

• Responsiveness
• “when I scroll, it’s smooth”

• Performance

• “it works fast”

• Energy saving
• “it doesn’t drain my battery”

• Convenience

• some things are parallel by nature, e.g. running two
completely separate apps
How?
• Multiple processes
• XPC, fork

• Multiple threads
• POSIX Threads, NSThread

• High-level thread abstraction
• Operation queues, dispatch queues

• GPGPU
• Instruction-level parallelism
• superscalar CPUs, pipelining, vector instructions

• Multiple PCs
• servers, clouds
Threads

• What is a thread?
• It’s an abstraction made by the OS
• The CPU has no such concept

• Represents a line of calculation
• Has an ID, a stack, thread-local storage, priority, CPU
registers

• Shares memory and resources within a process

• The OS scheduler runs/pauses threads
• context switching
Issues with threading

• Race conditions
• the result depends on the timing of the scheduler
• the behavior is non-deterministic
• can result in almost anything
• crash, wrong result, corrupted data

• So, you have to use locks/mutexes/…
• More issues: deadlocks, livelocks, starvation

• Even the best guys have trouble with these
• Security consequences, vulnerabilities
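
A minimal sketch of the locking remedy, written in plain C with POSIX threads rather than taken from the talk. Several threads increment one shared counter; the increment is a read-modify-write, so without the mutex the final value would depend on scheduler timing. The names `counter_t` and `run_locked_counter` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Shared mutable state plus the mutex that guards it. */
typedef struct {
    long value;
    int iterations;
    pthread_mutex_t lock;
} counter_t;

static void *increment_locked(void *arg)
{
    counter_t *c = arg;
    for (int i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;   /* read-modify-write; exclusive while the lock is held */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Starts nthreads threads (1..MAX_THREADS), each incrementing the counter
   `iterations` times. Returns the final value, or -1 on error. */
long run_locked_counter(int nthreads, int iterations)
{
    pthread_t threads[MAX_THREADS];
    counter_t c = { 0, iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    while (started < nthreads &&
           pthread_create(&threads[started], NULL, increment_locked, &c) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&c.lock);

    return started == nthreads ? c.value : -1;
}
```

Calling `run_locked_counter(4, 100000)` should return 400000 every time. Removing the lock/unlock pair turns this into the race described above, where the result may come out lower and differ from run to run.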
Know your enemy

• The compiler
• The CPU
• The memory
• Time
• Your brain
The iPhone has matured

Model       RAM      SoC            Clock
iPhone 4    512 MB   A4 (1 core)    800 MHz
iPhone 4S   512 MB   A5 (2 cores)   800 MHz
iPhone 5    1 GB     A6 (2 cores)   1300 MHz
iPhone 5S   1 GB     A7 (2 cores)   1300 MHz
ARM has matured

• Apple A5 (2011)
• ARM Cortex-A9 MPCore
• 2 cores
• out-of-order execution
• speculative execution
• superscalar, pipelining (8 stages)
• NEON 128-bit SIMD

• Apple A7 (2013)

• ARMv8-A “Cyclone”
• 64-bit, 32 registers, per-core L1 cache
iOS has matured

• The kernel knows a lot more about the system than
the developer

• GCD
• Operation Queues
• LLVM, compiler optimizations
• GPU computations
• Accelerate.framework
iOS threading technologies

• Multiple processes – forking disabled, no XPC
• Low-level threads
• POSIX Threads (pthread)
• NSThread
• -[NSObject performSelectorInBackground:withObject:]

• Higher-level abstractions

• NSOperationQueue, NSOperation
• GCD
Is multithreading hard?

• Yes, if you don’t know what you’re doing.
• But that’s true for anything.

• Paul E. McKenney: Is Parallel Programming Hard,
And, If So, What Can You Do About It? (2013)

• https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
You need to know how it works

• The abstractions you use (threads, dispatch queues)
are leaky

• You still must know how it works below:
• CPU
• OS
• compiler (LLVM)
• libraries and 3rd party code you are using
• language specification
• language implementation
• + the abstraction you are using (GCD)
You need to know even more

• Often you parallelize to get better performance
• For this you need to know
• CPU architecture details
• CPU instruction latencies
• memory hierarchy and latencies
Parallelizing tasks vs. algorithms

• Task = a standalone unit of work
• has some inputs, gives some outputs

• “add a blur effect to these 1000 photos”
• 1 photo = 1 task (independent)

• “add a blur effect to this one 5000x5000px photo”
• 1 task = ?

• Some algorithms simply cannot be parallelized (you
will not get any significant speedup)
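
One way to turn a single large job into independent tasks is to split the input into chunks that share nothing but read-only data. The sketch below uses summing an array as a stand-in workload (it is not the blur from the slide) and POSIX threads instead of GCD; `chunk_t` and `parallel_sum` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CHUNKS 8

/* One task: a read-only input range and a private output slot. */
typedef struct {
    const int *data;
    size_t begin, end;   /* half-open range [begin, end) */
    long long partial;
} chunk_t;

static void *sum_chunk(void *arg)
{
    chunk_t *c = arg;
    long long s = 0;
    for (size_t i = c->begin; i < c->end; i++)
        s += c->data[i];
    c->partial = s;      /* each task writes only its own slot */
    return NULL;
}

/* Splits data[0..n) into nchunks ranges, sums each on its own thread,
   then combines the partial results on the calling thread. */
long long parallel_sum(const int *data, size_t n, int nchunks)
{
    pthread_t threads[MAX_CHUNKS];
    chunk_t chunks[MAX_CHUNKS];
    int threaded[MAX_CHUNKS];
    long long total = 0;

    if (nchunks < 1) nchunks = 1;
    if (nchunks > MAX_CHUNKS) nchunks = MAX_CHUNKS;

    for (int i = 0; i < nchunks; i++) {
        chunks[i].data = data;
        chunks[i].begin = n * (size_t)i / (size_t)nchunks;
        chunks[i].end = n * (size_t)(i + 1) / (size_t)nchunks;
        chunks[i].partial = 0;
        threaded[i] =
            pthread_create(&threads[i], NULL, sum_chunk, &chunks[i]) == 0;
        if (!threaded[i])
            sum_chunk(&chunks[i]);   /* fall back to the calling thread */
    }
    for (int i = 0; i < nchunks; i++) {
        if (threaded[i])
            pthread_join(threads[i], NULL);
        total += chunks[i].partial;
    }
    return total;
}
```

The design choice is that tasks never write to shared locations, so no locking is needed; only the final combine step is sequential. A blur is harder to split this way because neighbouring chunks need each other's border pixels.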
Multithreading and Parallelism on iOS
Part II: Thread-safety, GCD, operation queues
What’s thread safety?

• “Thread-safe object”
• you can safely use the object from multiple threads at
the same time

• the internal state of the object will not get corrupted and
it will behave correctly

• When you don’t know if an object is thread-safe,
you have to assume it isn’t

• How do you make your object thread-safe?
• immutability, locks, atomic reads/writes
Shared mutable state

• Exclusive immutable object = no problem
• Shared immutable object = no problem
• Exclusive mutable object = no problem
• Shared mutable object
• root of all evil
• you always want to minimize this
Global variables

• “Global variables are bad”
• Multi-threading is another very good reason not to
use global variables / global state

• Global variables are always shared
• Watch out for “hidden” global state:
• working directory, chdir()
• environment variables, putenv()
Thread-safety vs. iOS

• Terrible lack of proper documentation
• Most of the low-level Obj-C runtime is thread-safe
• memory management, ARC, weak references, …

• Immutable objects (NSString, NSArray, …) are thread-safe

• A few other classes are thread-safe
• Usually it’s thread-safe to call class methods
• google for “iOS thread safety”
• https://developer.apple.com/library/ios/DOCUMENTATION/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
POSIX threads

• “plain threads”
• C API
• if you want to pass an object to the new thread, you will
have issues with memory management

• Synchronization
• mutexes, conditions, R/W locks, barriers
POSIX thread API

• pthread_create
• pthread_join
• mutex
• pthread_mutex_init, pthread_mutex_lock,
pthread_mutex_unlock

• conditions
• pthread_cond_init, pthread_cond_signal,
pthread_cond_wait
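
The calls listed above fit together roughly as follows. This is a sketch, not code from the talk: a worker thread publishes one value under a mutex and signals a condition variable, and the creating thread waits for it. `mailbox_t` and `receive_from_worker` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A one-slot mailbox: the flag and the value are guarded by `lock`. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready_cond;
    bool ready;
    int value;
} mailbox_t;

static void *producer(void *arg)
{
    mailbox_t *m = arg;
    pthread_mutex_lock(&m->lock);
    m->value = 42;
    m->ready = true;
    pthread_cond_signal(&m->ready_cond);   /* wake the waiting thread */
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Creates a worker, waits until it has published a value, and returns
   that value (or -1 if the thread could not be created). */
int receive_from_worker(void)
{
    mailbox_t m;
    pthread_t worker;
    int result = -1;

    m.ready = false;
    m.value = 0;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.ready_cond, NULL);

    if (pthread_create(&worker, NULL, producer, &m) == 0) {
        pthread_mutex_lock(&m.lock);
        while (!m.ready)                    /* loop guards against spurious wakeups */
            pthread_cond_wait(&m.ready_cond, &m.lock);
        result = m.value;
        pthread_mutex_unlock(&m.lock);
        pthread_join(worker, NULL);
    }

    pthread_cond_destroy(&m.ready_cond);
    pthread_mutex_destroy(&m.lock);
    return result;
}
```

Note that `pthread_cond_wait` is always called in a loop that re-checks the predicate, and that the predicate is only read or written while the mutex is held.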
NSThread

• “plain threads” as well
• Obj-C API
• mostly just a wrapper around POSIX threads
• memory management just works

• Synchronize with NSLock, NSCondition, …
NSThread API

• -[NSThread initWithTarget:selector:object:]
• -[NSThread start]
• +[NSThread detachNewThreadSelector:toTarget:withObject:]

• subclassing NSThread
• -[NSObject performSelectorInBackground:withObject:]
Thread-specific properties

• Thread-local storage
• Thread priorities
• Autorelease pools
• Detached vs. joinable
Grand Central Dispatch
• Let’s not think about threads
• Instead, let’s think about tasks
• New concepts:
• Tasks
• Queues
• Queue-specific data
• Dispatch groups
• Dispatch sources

• Synchronization
• Semaphores, barriers

• C API (!) but has ARC and works with blocks
GCD queues

• Main queue
• there is just one, executed on the main thread

• Concurrent queue
• tasks run concurrently
• 4 pre-made concurrent queues with different priorities
• DISPATCH_QUEUE_PRIORITY_DEFAULT, _HIGH, _LOW,
_BACKGROUND

• you can make your own

• Serial queue
• only one task at a time, in order
• you can make your own
GCD task API

• Get/create a queue:
• dispatch_get_global_queue
• dispatch_get_main_queue
• dispatch_queue_create
• Submit task:
• dispatch_sync
• dispatch_async
• dispatch_apply
GCD convenience API

• dispatch_once
• guarantees the code runs only once
• use to implement a proper and fast singleton

• dispatch_after

• execute the task at a specific time
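
To illustrate the run-once guarantee in plain C, the sketch below uses `pthread_once`, the POSIX counterpart, rather than `dispatch_once` itself (GCD is not assumed to be available here). The variable and function names are hypothetical.

```c
#include <pthread.h>

static pthread_once_t config_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;      /* counts how often the initializer ran */
static int shared_config = 0;   /* the lazily created "singleton" value */

static void init_config(void)
{
    init_calls++;
    shared_config = 1234;
}

/* Safe to call from any thread: the first caller runs init_config,
   every other caller waits until it has finished and then skips it. */
int get_shared_config(void)
{
    pthread_once(&config_once, init_config);
    return shared_config;
}

int get_init_calls(void)
{
    return init_calls;
}
```

However many times and from however many threads `get_shared_config()` is called, `get_init_calls()` should report 1. The same shape, with a `dispatch_once_t` token and a block, is the usual Objective-C singleton pattern.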
It’s not threads

• GCD uses threads, but the threads are completely
managed by GCD

• You can’t assume your code will run on any specific
thread

• even two tasks from the same serial queue can run on
different threads

• Don’t use thread-local storage
• Don’t use thread priorities
Operation queues

• A similar abstraction to GCD, this time you have:
• NSOperation
• either a block, a method call or custom subclass
• concurrent or non-concurrent
• dependencies on other NSOperations
• support for cancellation

• NSOperationQueue
• executes the operations, or you can execute an operation
directly
Operation queues API

• -[NSOperationQueue addOperation:]
• -[NSOperationQueue addOperationWithBlock:]
• -[NSOperation addDependency:]
• +[NSBlockOperation blockOperationWithBlock:]
• -[NSInvocationOperation initWithTarget:selector:object:]
Comparison

• POSIX threads, NSThread
• thread-based
• you have control over the lifetime of threads
• overhead when creating
• memory-management issues

• GCD, operation queues

• task-based
• nice API with objects/blocks

• operation queues
• dependencies
Run loops and messaging

• Avoid shared mutable state
• For POSIX threads and NSThreads:
• put your thread into an event loop, where it just waits
until an event occurs

• the main thread has this by default
• hidden inside UIApplicationMain

• then you can communicate with the thread through:
• -[NSObject performSelector:onThread:withObject:waitUntilDone:]
Run loop API

• +[NSRunLoop currentRunLoop]
• -[NSRunLoop run]
• you have to add at least one input source or it will return
immediately

• but you can add an empty port
• [NSMachPort port]

• -[NSRunLoop addPort:forMode:]
Main thread

• first thread = main thread = UI thread
• all rendering
• all layout
• scrolling, panning, zooming
• user input (touches, on-screen keyboard, external keyboard)
• system events

• Yes, that’s a lot of work.
• 60 FPS ≈ 16.7 ms per frame
• Yes, that’s very little time.
Offload the main thread

• Goal: Keep the UI thread responsive
• Rule:
• Do as much work as possible on other threads

• Well, but…
• Do as little work as possible in the background,
that is just enough to keep the main thread
responsive

• Measure, measure, measure
Rendering and animations

• Your app doesn’t have access to the GPU/display
• Background process called “backboardd”
• IPC – rendering commands
• Shared memory – backing stores

• CAAnimations are transferred to backboardd and
performed without any communication with your
app
Demo 1
https://github.com/kubabrecka/mobos-ios
Multithreading and Parallelism on iOS
Part III: Synchronization, locking, memory model
Demo 2
https://github.com/kubabrecka/mobos-ios
Only trust what’s guaranteed

• The order of things isn’t guaranteed unless
someone tells you:

int a, b; // global variables

// thread 1
b = 20;
a = 10;

// thread 2
while (a != 10) { } // wait for a to be 10
NSLog(@"%d", b); // ?
Solutions

• Avoid shared mutable state
• communicate by message passing
• design your objects as immutable
• avoid multithreading

• Synchronization
• You must always have “a plan”

• if you can’t tell which code is supposed to run in which
thread, then nobody can help you

• if you can’t tell which data can be accessed from which
thread, then nobody can help you
So what is guaranteed?

• Semantics for one thread
• “the (single-threaded) code you wrote will have the
correct result”

• For multi-threaded code, you have to obtain
guarantees by using:

• Atomic data types, volatile keyword
• Locks, semaphores, memory barriers

• For 3rd party code, generally you can’t assume
anything
Atomic types

• Which data types are atomic?
• Depends on the architecture!
• Pointers and “native” integers are usually atomic
• What does an atomic data type guarantee?
• Also depends on the architecture!
• A single read or a single write is usually atomic
• Definitely not “i++”
• OSAtomicIncrement, …
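
A sketch of an atomic increment, using C11 `<stdatomic.h>` as a portable stand-in for the `OSAtomicIncrement` family named above (an assumption for illustration; the talk itself shows no code here). `count_hits` is a hypothetical name.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_HIT_THREADS 16

static atomic_long hits;   /* shared counter, only touched atomically */

static void *hit_many(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        atomic_fetch_add(&hits, 1);   /* one indivisible read-modify-write */
    return NULL;
}

/* Runs nthreads threads that each add 1 to the counter per_thread times.
   Returns the final count, or -1 on error. */
long count_hits(int nthreads, int per_thread)
{
    pthread_t threads[MAX_HIT_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_HIT_THREADS)
        return -1;
    atomic_store(&hits, 0);

    while (started < nthreads &&
           pthread_create(&threads[started], NULL, hit_many, &per_thread) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == nthreads ? atomic_load(&hits) : -1;
}
```

With a plain `long` and `hits++` in place of `atomic_fetch_add`, increments from different threads could overwrite each other, which is why "i++" is singled out above as not atomic.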
Objective-C atomic properties

• @property (atomic) int a;
• Only affects auto-generated getters and setters
• Again, a single read is atomic, a single write is
atomic

• Again, “obj.a++” is not atomic
• It has no effect on direct member access, obj->a
• “atomic” is the default
Objective-C messaging

• Is the order of Obj-C method calls guaranteed?
• It seems so, the current compilers don’t optimize
through the dynamic dispatch (objc_msgSend)

• But it’s still not guaranteed
• This might (and probably will) change in the future
Volatile keyword

• don’t confuse with Java volatile
• prevents some compiler optimizations
• the variable can change on its own

• doesn’t give you atomicity
• doesn’t give you ordering
• there are better means of synchronization
Locks

• Mutexes, critical sections
• allow only a single thread to be in this part of code at the
same time

• -[NSLock lock]
• -[NSLock unlock]
• @synchronized (obj) { … }
• uses an implicit lock, which exists on each object
• handles exceptions

• Recursive locks, R/W locks, conditions
Lock-free algorithms and data structures

• Some concurrent structures (hash tables, queues)
can be written without using explicit locks

• Currently a major topic in CS
• databases

• The name is confusing though, there is still a lot of
locking happening

• cache coherency
• memory bus locking for complex atomic operations
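
The basic building block of most lock-free code is a compare-and-swap retry loop. The sketch below, in C11 atomics, raises a shared maximum without a mutex; it is an illustration under that assumption, not a structure from the talk, and `atomic_store_max` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically raises *target to `candidate` if candidate is larger.
   Returns true if this call performed the update. */
bool atomic_store_max(atomic_int *target, int candidate)
{
    int seen = atomic_load(target);
    while (seen < candidate) {
        /* Succeeds only if *target still equals `seen`; otherwise `seen`
           is refreshed with the current value and the loop re-checks. */
        if (atomic_compare_exchange_weak(target, &seen, candidate))
            return true;
    }
    return false;
}
```

No thread ever blocks here: a failed exchange means some other thread made progress. The hardware still serializes access to the cache line, which is the hidden "locking" mentioned above.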
Memory barriers

• Locks can be expensive
• A memory barrier ensures ordering without locking
• Memory reads and writes are not reordered across
the barrier

• But the guarantee is only at the point of the barrier!

• OSMemoryBarrier
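
A sketch of how barriers repair the earlier two-thread example with `a` and `b`. It uses C11 fences from `<stdatomic.h>` instead of `OSMemoryBarrier` (an assumption made so the code is portable); `payload` plays the role of `b`, `flag` the role of `a`, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* plain data, like `b` */
static atomic_int flag;   /* the signal, like `a` */

static void *writer(void *arg)
{
    (void)arg;
    payload = 20;
    /* Release fence: the payload write cannot move below the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 10, memory_order_relaxed);
    return NULL;
}

/* Starts the writer, spins until the flag is set, then reads the payload.
   Returns the payload value seen, or -1 if the thread could not start. */
int read_payload_after_flag(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;

    while (atomic_load_explicit(&flag, memory_order_relaxed) != 10) {
        /* spin: wait for the flag */
    }
    /* Acquire fence: the payload read cannot move above the flag load. */
    atomic_thread_fence(memory_order_acquire);
    seen = payload;

    pthread_join(t, NULL);
    return seen;
}
```

Both fences are needed: one on the writing side and one on the reading side. With only one of them, or with none, reading a stale `payload` would be permitted.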
Is the trouble worth it?

• Measure!
• OK, so you need more than a single thread
• use task-level parallelization (GCD) with clear input and
output, use immutable data and message passing

• Measure again!
• OK, so you need more than that
• find the bottleneck, don’t assume
• is it really the CPU? Isn’t the bottleneck in the memory/
network/disk?
Demo 3
https://github.com/kubabrecka/mobos-ios
Multithreading and Parallelism on iOS
Part IV: Performance tuning, ILP
Multithreading isn’t everything

• There are plenty of ways to make your code run
faster

• avoiding unnecessary work
• choosing better algorithms
• calculations on the GPU
• using vector instructions (AVX, SSE, NEON)
• hand-optimizing your assembly
• tweaking the compiler optimizations
The bottleneck

• It’s easy to make wrong assumptions
• Your bottleneck can be
• CPU
• Memory
• I/O (disk, network)
• GPU

• There is no “usually”
Some common UI issues
• Creating UIViews is slow
• reuse views, dequeue cells in tables

• Loading images is slow
• cache images

• Rendering is slow
• avoid drawRect, consider rasterization of flattened views

• Scrolling is slow
• don’t do heavy work in scrollViewDidScroll

• Rendering shadows is slow
• use shadowPath

• Rendering layer masks is slow
• pre-render
Choose your data structures

• -[NSArray containsObject:]
• O(n)

• -[NSSet containsObject:]
• O(1) on average
Always profile first

• Don’t guess, measure!
• Amdahl’s law
• Hardware is cheap, programmers are expensive
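
Amdahl's law puts a number on the expected gain: if a fraction p of the run time can be parallelized over n cores, the speedup is 1 / ((1 - p) + p / n). A small sketch, with a hypothetical function name:

```c
/* Amdahl's law: upper bound on speedup when a fraction p (0..1) of the
   run time is parallelized across n cores (n >= 1). */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, on a 2-core device with half of the work parallelizable, `amdahl_speedup(0.5, 2)` is about 1.33, well short of 2x. This is why profiling first matters: the serial part sets the ceiling.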
Profiling with Instruments
• What can you measure with Instruments?
• CPU
• utilization
• all performance counters (interrupts, syscalls, user/kernel time, …)

• Memory
• free memory
• allocations, leaks, “zombies”
• many more performance counters (page faults, cache hits/misses, …)

• Network
• Battery usage
• Display FPS
• Single process / multiple processes
•…
Measure carefully

• Instruments isn’t perfect
• Sampling is only a statistical method

• Real devices behave very differently from simulators
• Hardware is different
• Compiled code is different (both yours and libraries)

• Verify your assumptions
• In many cases, wrapping your code with two calls to
[NSDate date] and subtracting is the best approach
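
The same two-timestamps approach in plain C, as a sketch. It swaps `[NSDate date]` for `clock_gettime(CLOCK_MONOTONIC, …)`, which is an assumption made for portability; `elapsed_seconds` and `sample_work` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime under strict -std=c11 */
#include <time.h>

/* A stand-in workload; `volatile` keeps the loop from being optimized away. */
void sample_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Runs `work` once and returns the wall-clock time it took, in seconds. */
double elapsed_seconds(void (*work)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A monotonic clock is used so that the measurement is not disturbed if the system time is adjusted mid-run. A single measurement is noisy, so it is worth repeating the call and comparing several results.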
Optimize memory/cache accesses

• Cache lines (64 B)
• Try to linearize memory accesses
• Choose correct data structures
• array of structs vs. struct of arrays

• Aligned memory accesses
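
A sketch of the array-of-structs versus struct-of-arrays choice, using a hypothetical particle record. Both functions compute the same total, but they walk memory very differently.

```c
#include <stddef.h>

/* Array of structs: the fields of one particle sit next to each other. */
typedef struct {
    float x, y, z, mass;
} particle_t;

/* Struct of arrays: each field is stored contiguously for all particles. */
typedef struct {
    const float *x, *y, *z, *mass;
} particles_soa_t;

/* Reads one float out of every 16 bytes, so most of each cache line
   that gets loaded is never used. */
float total_mass_aos(const particle_t *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Reads a dense run of floats, so every byte of each loaded cache line
   is used and the access pattern is linear. */
float total_mass_soa(const particles_soa_t *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->mass[i];
    return total;
}
```

Which layout is better depends on the access pattern: code that always touches all fields of one element tends to favour the array of structs, while code that sweeps over one field of many elements tends to favour the struct of arrays. Measuring on the device settles it.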
Instruction-level parallelism

• The compiler tries to maximize ILP with scheduling
• The main obstacle is data dependency
• a series of arithmetic operations which depend on each
other simply cannot be parallelized

• independent operations are easily parallelized
• CPU is superscalar and has deep pipelines

• the problem is that often the compiler can’t be sure
about the dependency

• memory accesses, aliasing
• it has to assume the dependency is there
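
A sketch of how a data dependency limits ILP and how it can be broken by hand. The first function forms one long chain of additions; the second keeps four independent partial sums that a superscalar CPU can advance in parallel. Function names are hypothetical.

```c
#include <stddef.h>

/* One dependency chain: every addition needs the previous result. */
float sum_single_chain(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent chains: the additions within one iteration do not
   depend on each other, so they can overlap in the pipeline. */
float sum_four_chains(const float *v, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

The compiler usually will not make this transformation for floating-point code on its own, because changing the order of additions can change the rounding. For the same reason the two functions may differ in the last bits for general inputs.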
Help the compiler

• The compiler is smart:
• GCC: dead code elimination, common subexpression
elimination, forward propagation, loop unrolling, tail call
elimination, loop invariant motion, lower complex
arithmetic, vectorization, modulo scheduling, …

• Sometimes, it would like to be smart, but it can’t:
• the C “restrict” keyword (C99):
void * memcpy(void * restrict s1, const void * restrict s2, size_t n);
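
A sketch of `restrict` on a user-written function (`scale_into` is a hypothetical name). The qualifier is a promise from the caller that `dst` and `src` do not overlap, which removes the aliasing doubt and lets the compiler reorder or vectorize the loads and stores.

```c
#include <stddef.h>

/* Writes src[i] * k into dst[i] for i in [0, n).
   The caller must guarantee that dst and src do not overlap. */
void scale_into(float *restrict dst, const float *restrict src,
                float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

The promise is not checked: calling this with overlapping buffers is undefined behavior, which is exactly why `memcpy` has the qualifier and `memmove` does not.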
Vector instructions

• SIMD = Single Instruction Multiple Data
• ARM NEON
• 128-bit instructions (e.g. 4x 32-bit or 16x 8-bit at once)

• LLVM auto-vectorizer
• Often you have to change your data structure
• alignment
• interleaved values
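
A sketch of 4-wide SIMD arithmetic using the GCC/Clang `vector_size` extension. This is a compiler extension chosen for portability, not NEON intrinsics and not code from the talk; on ARM the compiler is expected to map it to NEON registers, but that mapping is an assumption. `float4` and `madd4` are hypothetical names.

```c
/* Four 32-bit floats packed into one 128-bit value (GCC/Clang extension). */
typedef float float4 __attribute__((vector_size(16)));

/* Computes a * b + c on all four lanes at once. */
float4 madd4(float4 a, float4 b, float4 c)
{
    return a * b + c;
}
```

Each operator acts lane by lane, so one call does the work of four scalar multiply-adds. Data that is already laid out as consecutive floats can be fed in directly, which is the reason the data structure often has to change before vectorization pays off.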
Accelerate.framework

• Heavily optimized built-in framework for:
• image processing
• image format conversion and encoding/decoding
• DSP, FFT
• various general math on “large” data
#include <Accelerate/Accelerate.h>

vFloat vx = { 1.f, 2.f, 3.f, 4.f };
vFloat vy;
...
vy = vsinf(vx);
Away from the CPU

• GPGPU
• Only through OpenGL ES shaders
• Perfect for image processing (Core Image, GPUImage)

• M7 motion coprocessor (iPhone 5S)
Thank you for your attention.
More Related Content

What's hot

Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in PracticeAlina Dolgikh
 
Java New Evolution
Java New EvolutionJava New Evolution
Java New EvolutionAllan Huang
 
Effective java - concurrency
Effective java - concurrencyEffective java - concurrency
Effective java - concurrencyfeng lee
 
Java Concurrency, Memory Model, and Trends
Java Concurrency, Memory Model, and TrendsJava Concurrency, Memory Model, and Trends
Java Concurrency, Memory Model, and TrendsCarol McDonald
 
Java Concurrency Gotchas
Java Concurrency GotchasJava Concurrency Gotchas
Java Concurrency GotchasAlex Miller
 
MFF UK - Advanced iOS Topics
MFF UK - Advanced iOS TopicsMFF UK - Advanced iOS Topics
MFF UK - Advanced iOS TopicsPetr Dvorak
 
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)Unity Technologies Japan K.K.
 
Introduction to TPL
Introduction to TPLIntroduction to TPL
Introduction to TPLGyuwon Yi
 
Concurrency and Thread-Safe Data Processing in Background Tasks
Concurrency and Thread-Safe Data Processing in Background TasksConcurrency and Thread-Safe Data Processing in Background Tasks
Concurrency and Thread-Safe Data Processing in Background TasksWO Community
 
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろうUnity Technologies Japan K.K.
 
Why GC is eating all my CPU?
Why GC is eating all my CPU?Why GC is eating all my CPU?
Why GC is eating all my CPU?Roman Elizarov
 
Getting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverGetting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverDataStax Academy
 
Tornado - different Web programming
Tornado - different Web programmingTornado - different Web programming
Tornado - different Web programmingDima Malenko
 
Threads v3
Threads v3Threads v3
Threads v3Sunil OS
 
Java Serialization Facts and Fallacies
Java Serialization Facts and FallaciesJava Serialization Facts and Fallacies
Java Serialization Facts and FallaciesRoman Elizarov
 
C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴명신 김
 
Comet with node.js and V8
Comet with node.js and V8Comet with node.js and V8
Comet with node.js and V8amix3k
 
User defined-functions-cassandra-summit-eu-2014
User defined-functions-cassandra-summit-eu-2014User defined-functions-cassandra-summit-eu-2014
User defined-functions-cassandra-summit-eu-2014Robert Stupp
 

What's hot (20)

Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in Practice
 
Java New Evolution
Java New EvolutionJava New Evolution
Java New Evolution
 
Effective java - concurrency
Effective java - concurrencyEffective java - concurrency
Effective java - concurrency
 
Java Concurrency, Memory Model, and Trends
Java Concurrency, Memory Model, and TrendsJava Concurrency, Memory Model, and Trends
Java Concurrency, Memory Model, and Trends
 
Java Concurrency Gotchas
Java Concurrency GotchasJava Concurrency Gotchas
Java Concurrency Gotchas
 
MFF UK - Advanced iOS Topics
MFF UK - Advanced iOS TopicsMFF UK - Advanced iOS Topics
MFF UK - Advanced iOS Topics
 
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)
【Unite 2017 Tokyo】パフォーマンス向上のためのスクリプトのベストプラクティス(note付き)
 
Introduction to TPL
Introduction to TPLIntroduction to TPL
Introduction to TPL
 
Concurrency and Thread-Safe Data Processing in Background Tasks
Concurrency and Thread-Safe Data Processing in Background TasksConcurrency and Thread-Safe Data Processing in Background Tasks
Concurrency and Thread-Safe Data Processing in Background Tasks
 
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう
【Unite 2017 Tokyo】ScriptableObjectを使ってプログラマーもアーティストも幸せになろう
 
Why GC is eating all my CPU?
Why GC is eating all my CPU?Why GC is eating all my CPU?
Why GC is eating all my CPU?
 
Getting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net DriverGetting Started with Datatsax .Net Driver
Getting Started with Datatsax .Net Driver
 
Tornado - different Web programming
Tornado - different Web programmingTornado - different Web programming
Tornado - different Web programming
 
Threads v3
Threads v3Threads v3
Threads v3
 
Objective C Memory Management
Objective C Memory ManagementObjective C Memory Management
Objective C Memory Management
 
Java Serialization Facts and Fallacies
Java Serialization Facts and FallaciesJava Serialization Facts and Fallacies
Java Serialization Facts and Fallacies
 
C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴
 
Comet with node.js and V8
Comet with node.js and V8Comet with node.js and V8
Comet with node.js and V8
 
Java Concurrency
Java ConcurrencyJava Concurrency
Java Concurrency
 
User defined-functions-cassandra-summit-eu-2014
User defined-functions-cassandra-summit-eu-2014User defined-functions-cassandra-summit-eu-2014
User defined-functions-cassandra-summit-eu-2014
 

Viewers also liked

Memory management in iOS.
Memory management in iOS.Memory management in iOS.
Memory management in iOS.HSIEH CHING-FAN
 
File system in iOS
File system in iOSFile system in iOS
File system in iOSPurvik Rana
 
Sample pitch deck widescreen (1)
Sample pitch deck widescreen (1)Sample pitch deck widescreen (1)
Sample pitch deck widescreen (1)James Wallace
 
MULTITHREADING CONCEPT
MULTITHREADING CONCEPTMULTITHREADING CONCEPT
MULTITHREADING CONCEPTRAVI MAURYA
 
Multithreading
MultithreadingMultithreading
Multithreadingsagsharma
 
Multithreading
Multithreading Multithreading
Multithreading WafaQKhan
 
iOS file structure and organization
iOS file structure and organizationiOS file structure and organization
iOS file structure and organizationJenny Chang
 
Life Cycle of an iPhone App
Life Cycle of an iPhone AppLife Cycle of an iPhone App
Life Cycle of an iPhone AppJohn McKerrell
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors pptSiddhartha Anand
 
Objective C Tricks
Objective C TricksObjective C Tricks
Objective C TricksInova LLC
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUBNusrat Mary
 
Multicore Processsors
Multicore ProcesssorsMulticore Processsors
Multicore ProcesssorsAveen Meena
 
Memory Management on iOS
Memory Management on iOSMemory Management on iOS
Memory Management on iOSMake School
 
iOS Application Lifecycle
iOS Application LifecycleiOS Application Lifecycle
iOS Application LifecycleSiva Prasad K V
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecturePiyush Mittal
 

Viewers also liked (20)

12 deadlock concept
12 deadlock concept12 deadlock concept
12 deadlock concept
 
Memory management in iOS.
Memory management in iOS.Memory management in iOS.
Memory management in iOS.
 
File system in iOS
File system in iOSFile system in iOS
File system in iOS
 
Sample pitch deck widescreen (1)
Sample pitch deck widescreen (1)Sample pitch deck widescreen (1)
Sample pitch deck widescreen (1)
 
MULTITHREADING CONCEPT
MULTITHREADING CONCEPTMULTITHREADING CONCEPT
MULTITHREADING CONCEPT
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Multithreading
Multithreading Multithreading
Multithreading
 
iOS file structure and organization
iOS file structure and organizationiOS file structure and organization
iOS file structure and organization
 
Life Cycle of an iPhone App
Life Cycle of an iPhone AppLife Cycle of an iPhone App
Life Cycle of an iPhone App
 
Superscalar processors
Superscalar processorsSuperscalar processors
Superscalar processors
 
Multi core processor
Multi core processorMulti core processor
Multi core processor
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
iOS Ecosystem
iOS EcosystemiOS Ecosystem
iOS Ecosystem
 
Multicore Processor Technology
Multicore Processor TechnologyMulticore Processor Technology
Multicore Processor Technology
 
Objective C Tricks
Objective C TricksObjective C Tricks
Objective C Tricks
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUB
 
Multicore Processsors
Multicore ProcesssorsMulticore Processsors
Multicore Processsors
 
Memory Management on iOS
Memory Management on iOSMemory Management on iOS
Memory Management on iOS
 
iOS Application Lifecycle
iOS Application LifecycleiOS Application Lifecycle
iOS Application Lifecycle
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecture
 

Similar to Multithreading and Parallelism on iOS [MobOS 2013]

Grand Central Dispatch and multi-threading [iCONdev 2014]
Grand Central Dispatch and multi-threading [iCONdev 2014]Grand Central Dispatch and multi-threading [iCONdev 2014]
Grand Central Dispatch and multi-threading [iCONdev 2014]Kuba Břečka
 
You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel" You didnt see it’s coming? "Dawn of hardened Windows Kernel"
You didnt see it’s coming? "Dawn of hardened Windows Kernel" Peter Hlavaty
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.jsorkaplan
 
Killing Shark-Riding Dinosaurs with ORM
Killing Shark-Riding Dinosaurs with ORMKilling Shark-Riding Dinosaurs with ORM
Killing Shark-Riding Dinosaurs with ORMOrtus Solutions, Corp
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon chinaPeter Hlavaty
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
ITB2017 - Slaying the ORM dragons with cborm
ITB2017 - Slaying the ORM dragons with cbormITB2017 - Slaying the ORM dragons with cborm
ITB2017 - Slaying the ORM dragons with cbormOrtus Solutions, Corp
 
Basic Understanding and Implement of Node.js
Basic Understanding and Implement of Node.jsBasic Understanding and Implement of Node.js
Basic Understanding and Implement of Node.jsGary Yeh
 
Process Doppelgänging
Process Doppelgänging Process Doppelgänging
Process Doppelgänging KarlFrank99
 
Us 16-subverting apple-graphics_practical_approaches_to_remotely_gaining_root...
Us 16-subverting apple-graphics_practical_approaches_to_remotely_gaining_root...Us 16-subverting apple-graphics_practical_approaches_to_remotely_gaining_root...
Us 16-subverting apple-graphics_practical_approaches_to_remotely_gaining_root...Liang Chen
 
T4T Training day - NodeJS
T4T Training day - NodeJST4T Training day - NodeJS
T4T Training day - NodeJSTim Sommer
 
Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Charles Nutter
 
Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...Nexus FrontierTech
 
Java in High Frequency Trading
Java in High Frequency TradingJava in High Frequency Trading
Java in High Frequency TradingViktor Sovietov
 

Similar to Multithreading and Parallelism on iOS [MobOS 2013] (20)

Grand Central Dispatch and multi-threading [iCONdev 2014]
Grand Central Dispatch and multi-threading [iCONdev 2014]Grand Central Dispatch and multi-threading [iCONdev 2014]
Grand Central Dispatch and multi-threading [iCONdev 2014]
 



Multithreading and Parallelism on iOS [MobOS 2013]

  • 1. Multithreading and Parallelism on iOS Kuba Brecka @kubabrecka Mobile Operating Systems Conference MobOS 2013
  • 2. Agenda • Part I: Parallelism and multithreading overview • Part II: Thread-safety, GCD, operation queues • Part III: Synchronization, locking, memory model • Part IV: Performance tuning, ILP • Part V: (at the party) Whatever you’d like to discuss
  • 3. Multithreading and Parallelism on iOS Part I: Parallelism and multithreading overview
  • 4. Quiz 1 int a; - (void)method { a = 0; dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_async(queue, ^{ a = 1; }); dispatch_async(queue, ^{ a = 2; }); NSLog(@"%d", a); }
  • 5. Quiz 2 int a; - (void)method { a = 0; dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_async(queue, ^{ a = 1; }); while (a == 0) { // wait } NSLog(@"%d", a); }
  • 6. Parallelism is a huge topic
  • 7. Terminology • Parallel • Multi-threaded • Concurrent • Simultaneous • Asynchronous
  • 8. Why parallelize? • Responsiveness • “when I scroll, it’s smooth” • Performance • “it works fast” • Energy saving • “it doesn’t drain my battery” • Convenience • some things are parallel by nature, e.g. running two completely separate apps
  • 9. How? • Multiple processes • XPC, fork • Multiple threads • POSIX Threads, NSThread • High-level thread abstraction • Operation queues, dispatch queues • GPGPU • Instruction-level parallelism • superscalar CPUs, pipelining, vector instructions • Multiple PCs • servers, clouds
  • 10. Threads • What is a thread? • It’s an abstraction made by the OS • The CPU has no such concept • Represents a line of calculation • Has an ID, a stack, thread-local storage, priority, CPU registers • Shares memory and resources within a process • The OS scheduler runs/pauses threads • context switching
  • 11. Issues with threading • Race conditions • the result depends on the timing of the scheduler • the behavior is non-deterministic • can result in almost anything • crash, wrong result, corrupted data • So, you have to use locks/mutexes/… • More issues: deadlocks, livelocks, starvation • Even the best guys have trouble with these • Security consequences, vulnerabilities
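The "use locks/mutexes" advice on this slide can be sketched in plain C with POSIX threads. This is a minimal illustration, not code from the talk: the `Accounts` type, `transfer_worker` and `run_transfers` are hypothetical names. Each transfer is a debit plus a credit, and the mutex makes the pair look like a single step to every other thread. Without the lock, updates could be lost and the final balances would depend on scheduler timing.

```c
#include <pthread.h>
#include <stddef.h>

/* Two balances shared by all worker threads, guarded by one mutex. */
typedef struct {
    pthread_mutex_t lock;
    long from_balance;
    long to_balance;
} Accounts;

typedef struct {
    Accounts *accounts;
    int transfers;
} TransferJob;

/* Moves one unit per iteration. The debit and the credit happen
   while holding the lock, so no thread sees a half-done transfer. */
static void *transfer_worker(void *arg) {
    TransferJob *job = arg;
    for (int i = 0; i < job->transfers; i++) {
        pthread_mutex_lock(&job->accounts->lock);
        job->accounts->from_balance -= 1;
        job->accounts->to_balance += 1;
        pthread_mutex_unlock(&job->accounts->lock);
    }
    return NULL;
}

/* Runs nthreads workers (1..16). Returns the final to_balance,
   or -1 if a thread could not be started or money went missing. */
long run_transfers(int nthreads, int transfers_per_thread) {
    enum { MAX_THREADS = 16 };
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;

    Accounts accounts = {
        .from_balance = (long)nthreads * transfers_per_thread,
        .to_balance = 0
    };
    pthread_mutex_init(&accounts.lock, NULL);

    TransferJob job = { &accounts, transfers_per_thread };
    pthread_t threads[MAX_THREADS];
    int started = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, transfer_worker, &job) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&accounts.lock);

    if (started != nthreads || accounts.from_balance != 0) return -1;
    return accounts.to_balance;
}
```

Calling `run_transfers(4, 10000)` should move all 40000 units. Removing the lock/unlock pair turns the same code into a race condition of the kind described on the slide.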
  • 12. Know your enemy • The compiler • The CPU • The memory • Time • Your brain
  • 13. The iPhone has matured • iPhone 4: 512 MB RAM, A4 SoC (1 core), 800 MHz • iPhone 4S: 512 MB RAM, A5 SoC (2 cores), 800 MHz • iPhone 5: 1 GB RAM, A6 SoC (2 cores), 1300 MHz • iPhone 5S: 1 GB RAM, A7 SoC (2 cores), 1300 MHz
  • 14. ARM has matured • Apple A5 (2011) • ARM Cortex-A9 MPCore • 2 cores • out-of-order execution • speculative execution • superscalar, pipelining (8 stages) • NEON 128-bit SIMD • Apple A7 (2013) • ARMv8-A “Cyclone” • 64-bit, 32 registers, per-core L1 cache
  • 15. iOS has matured • The kernel knows a lot more about the system than the developer • GCD • Operation Queues • LLVM, compiler optimizations • GPU computations • Accelerate.framework
  • 16. iOS threading technologies • Multiple processes – forking disabled, no XPC • Low-level threads • POSIX Threads (pthread) • NSThread • -[NSObject performSelectorInBackground:withObject:] • Higher-level abstractions • NSOperationQueue, NSOperation • GCD
  • 17. Is multithreading hard? • Yes, if you don’t know what you’re doing. • But that’s true for anything. • Paul E. McKenney: Is Parallel Programming Hard, And, If So, What Can You Do About It? (2013) • https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
  • 18. You need to know how it works • The abstractions you use (threads, dispatch queues) are leaky • You still must know how it works below: • CPU • OS • compiler (LLVM) • libraries and 3rd party code you are using • language specification • language implementation • + the abstraction you are using (GCD)
  • 19. You need to know even more • Often you parallelize to get better performance • For this you need to know • CPU architecture details • CPU instruction latencies • memory hierarchy and latencies
  • 20. Parallelizing tasks vs. algorithms • Task = a standalone unit of work • has some inputs, gives some outputs • “add a blur effect to these 1000 photos” • 1 photo = 1 task (independent) • “add a blur effect to this one 5000x5000px photo” • 1 task = ? • Some algorithms simply cannot be parallelized (you will not get any significant speedup)
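One way to split a single large job into independent tasks is to cut the input into contiguous ranges and give each range to its own thread. The sketch below does this for an array sum, as a stand-in for cutting one large photo into bands. It assumes POSIX threads, and `SumTask` and `parallel_sum` are hypothetical names. Each task writes only its own `partial` field, so there is no shared mutable state until the results are combined after the joins.

```c
#include <pthread.h>
#include <stddef.h>

/* One independent unit of work: sum data[begin, end). */
typedef struct {
    const int *data;
    size_t begin;
    size_t end;
    long partial;   /* output, written only by the owning thread */
} SumTask;

static void *sum_range(void *arg) {
    SumTask *task = arg;
    long sum = 0;
    for (size_t i = task->begin; i < task->end; i++) {
        sum += task->data[i];
    }
    task->partial = sum;
    return NULL;
}

/* Splits the array into up to 8 contiguous chunks and sums them on
   separate threads. If a thread cannot be created, that chunk is
   summed inline on the calling thread instead. */
long parallel_sum(const int *data, size_t n, int nthreads) {
    enum { MAX_THREADS = 8 };
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    pthread_t threads[MAX_THREADS];
    SumTask tasks[MAX_THREADS];
    int started[MAX_THREADS];
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int i = 0; i < nthreads; i++) {
        size_t begin = (size_t)i * chunk;
        if (begin > n) begin = n;
        size_t end = begin + chunk;
        if (end > n) end = n;
        tasks[i] = (SumTask){ data, begin, end, 0 };
        started[i] =
            pthread_create(&threads[i], NULL, sum_range, &tasks[i]) == 0;
        if (!started[i]) sum_range(&tasks[i]);
    }

    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        if (started[i]) pthread_join(threads[i], NULL);
        total += tasks[i].partial;
    }
    return total;
}
```

A sum splits cleanly because the partial results combine with a single addition. A blur is harder: pixels near a band edge need neighbours from the adjacent band, which is part of why "1 task = ?" has no universal answer.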
  • 21. Multithreading and Parallelism on iOS Part II: Thread-safety, GCD, operation queues
  • 22. What’s thread safety? • “Thread-safe object” • you can safely use the object from multiple threads at the same time • the internal state of the object will not get corrupted and it will behave correctly • When you don’t know if an object is thread-safe, you have to assume it isn’t • How do you make your object thread-safe? • immutability, locks, atomic reads/writes
  • 23. Shared mutable state • Exclusive immutable object = no problem • Shared immutable object = no problem • Exclusive mutable object = no problem • Shared mutable object • root of all evil • you always want to minimize this
  • 24. Global variables • “Global variables are bad” • Multi-threading is another very good reason not to use global variables / global state • Global variables are always shared • Watch out for “hidden” global state: • working directory, chdir() • environment variables, putenv()
  • 25. Thread-safety vs. iOS • Terrible lack of proper documentation • Most of the low-level Obj-C runtime is thread-safe • memory management, ARC, weak references, … • Immutable objects (NSString, NSArray, …) are thread-safe • A few other classes are thread-safe • Usually it’s thread-safe to call class methods • google for “iOS thread safety” • https://developer.apple.com/library/ios/DOCUMENTATION/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
  • 26. POSIX threads • “plain threads” • C API • if you want to pass an object to the new thread, you will have issues with memory management • Synchronization • mutexes, conditions, R/W locks, barriers
  • 27. POSIX thread API • pthread_create • pthread_join • mutex • pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock • conditions • pthread_cond_init, pthread_cond_signal, pthread_cond_wait
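The calls listed on this slide fit together roughly as follows. This is a small sketch under assumed names (`Mailbox`, `producer`, `receive_one_value`), not code from the talk: one thread produces a value and signals a condition, and the other waits for it. The `while` loop around `pthread_cond_wait` matters, because a wait may return spuriously and the flag has to be rechecked while holding the mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A one-slot mailbox: the mutex protects `ready` and `value`,
   the condition lets the receiver sleep until `ready` is set. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready_cond;
    bool ready;
    int value;
} Mailbox;

static void *producer(void *arg) {
    Mailbox *box = arg;
    pthread_mutex_lock(&box->lock);
    box->value = 42;
    box->ready = true;
    pthread_cond_signal(&box->ready_cond);
    pthread_mutex_unlock(&box->lock);
    return NULL;
}

/* Starts the producer, waits for its value and returns it,
   or -1 if the thread could not be created. */
int receive_one_value(void) {
    Mailbox box;
    pthread_mutex_init(&box.lock, NULL);
    pthread_cond_init(&box.ready_cond, NULL);
    box.ready = false;
    box.value = 0;

    int received = -1;
    pthread_t thread;
    if (pthread_create(&thread, NULL, producer, &box) == 0) {
        pthread_mutex_lock(&box.lock);
        while (!box.ready) {
            /* Atomically releases the lock while sleeping and
               reacquires it before returning. */
            pthread_cond_wait(&box.ready_cond, &box.lock);
        }
        received = box.value;
        pthread_mutex_unlock(&box.lock);
        pthread_join(thread, NULL);
    }

    pthread_cond_destroy(&box.ready_cond);
    pthread_mutex_destroy(&box.lock);
    return received;
}
```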
  • 28. NSThread • “plain threads” as well • Obj-C API • mostly just a wrapper around POSIX threads • memory management just works • Synchronize with NSLock, NSCondition, …
  • 29. NSThread API • -[NSThread initWithTarget:selector:object:] • -[NSThread start] • +[NSThread detachNewThreadSelector:toTarget:withObject:] • subclassing NSThread • -[NSObject performSelectorInBackground:withObject:]
  • 30. Thread-specific properties • Thread-local storage • Thread priorities • Autorelease pools • Detached vs. joinable
  • 31. Grand Central Dispatch • Let’s not think about threads • Instead, let’s think about tasks • New concepts: • Tasks • Queues • Queue-specific data • Dispatch groups • Dispatch sources • Synchronization • Semaphores, barriers • C API (!) but has ARC and works with blocks
  • 32. GCD queues • Main queue • there is just one, executed on the main thread • Concurrent queue • tasks run concurrently • 4 pre-made concurrent queues with different priorities • DISPATCH_QUEUE_PRIORITY_DEFAULT, _HIGH, _LOW, _BACKGROUND • you can make your own • Serial queue • only one task at a time, in order • you can make your own
  • 33. GCD task API • Get/create a queue: • dispatch_get_global_queue • dispatch_get_main_queue • dispatch_queue_create • Submit task: • dispatch_sync • dispatch_async • dispatch_apply
  • 34. GCD convenience API • dispatch_once • guarantees the code runs only once • use to implement a proper and fast singleton • dispatch_after • execute the task at a specific time
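The run-once guarantee can be shown without libdispatch by using POSIX `pthread_once`, which plays the same role for plain C code. The sketch below swaps it in for `dispatch_once`, so it shows the pattern and not the GCD API itself. `SharedConfig`, `init_shared_config` and `get_shared_config` are hypothetical names, and `max_connections` is an invented setting.

```c
#include <pthread.h>

/* Hypothetical shared object that must be set up exactly once. */
typedef struct {
    int init_calls;        /* counts how often the initializer ran */
    int max_connections;   /* invented example setting */
} SharedConfig;

static SharedConfig shared_config;
static pthread_once_t shared_config_once = PTHREAD_ONCE_INIT;

static void init_shared_config(void) {
    shared_config.init_calls += 1;
    shared_config.max_connections = 4;
}

/* Safe to call from any thread: the first caller runs the
   initializer, every other caller waits until it has finished. */
SharedConfig *get_shared_config(void) {
    pthread_once(&shared_config_once, init_shared_config);
    return &shared_config;
}
```

Every call to `get_shared_config()` returns the same pointer, and `init_calls` stays at 1 no matter how many threads call it.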
  • 35. It’s not threads • GCD uses threads, but the threads are completely managed by GCD • You can’t assume your code will run on any specific thread • even two tasks from the same serial queue can run on different threads • Don’t use thread-local storage • Don’t use thread priorities
  • 36. Operation queues • A similar abstraction to GCD, this time you have: • NSOperation • either a block, a method call or custom subclass • concurrent or non-concurrent • dependencies on other NSOperations • support for cancellation • NSOperationQueue • executes the operations, or you can execute an operation directly
  • 37. Operation queues API • -[NSOperationQueue addOperation:] • -[NSOperationQueue addOperationWithBlock:] • -[NSOperation addDependency:] • +[NSBlockOperation blockOperationWithBlock:] • -[NSInvocationOperation initWithTarget:selector:object:]
  • 38. Comparison • POSIX threads, NSThread • thread-based • you have control over the lifetime of threads • overhead when creating • memory-management issues • GCD, operation queues • task-based • nice API with objects/blocks • operation queues • dependencies
  • 39. Run loops and messaging • Avoid shared mutable state • For POSIX threads and NSThreads: • put your thread into an event loop, where it just waits until an event occurs • the main thread has this by default • hidden inside UIApplicationMain • then you can communicate with the thread through: • -[NSObject performSelector:onThread:withObject:waitUntilDone:]
  • 40. Run loop API • +[NSRunLoop currentRunLoop] • -[NSRunLoop run] • you have to add at least one input source or it will return immediately • but you can add an empty port • [NSMachPort port] • -[NSRunLoop addPort:forMode:]
  • 41. Main thread • first thread = main thread = UI thread • all rendering • all layout • scrolling, panning, zooming • user input (touches, on-screen keyboard, external keyboard) • system events • Yes, that’s a lot of work. • 60 FPS = 16 ms per frame • Yes, that’s very little time.
  • 42. Offload the main thread • Goal: Keep the UI thread responsive • Rule: • Do as much work as possible on other threads • Well, but… • Do as little work as possible in the background, that is just enough to keep the main thread responsive • Measure, measure, measure
  • 43. Rendering and animations • Your app doesn’t have access to the GPU/display • Background process called “backboardd” • IPC – rendering commands • Shared memory – backing stores • CAAnimations are transferred to backboardd and performed without any communication with your app
  • 45. Multithreading and Parallelism on iOS Part III: Synchronization, locking, memory model
  • 47. Only trust what’s guaranteed • The order of things isn’t guaranteed unless someone tells you: int a, b; // global variables // thread 1 b = 20; a = 10; // thread 2 wait for a to be 10 NSLog(@"%d", b); // ?
  • 48. Solutions • Avoid shared mutable state • communicate by message passing • design your objects as immutable • avoid multithreading • Synchronization • You must always have “a plan” • if you can’t tell which code is supposed to run in which thread, then nobody can help you • if you can’t tell which data can be accessed from which thread, then nobody can help you
  • 49. So what is guaranteed? • Semantics for one thread • “the (single-threaded) code you wrote will have the correct result” • For multi-threaded code, you have to obtain guarantees by using: • Atomic data types, volatile keyword • Locks, semaphores, memory barriers • For 3rd party code, generally you can’t assume anything
  • 50. Atomic types • Which data types are atomic? • Depends on the architecture! • Pointers and “native” integers are usually atomic • What does an atomic data type guarantee? • Also depends on the architecture! • A single read or a single write is usually atomic • Definitely not “i++” • OSAtomicIncrement, …
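To make the "i++ is not atomic" point concrete, here is a sketch that uses C11 `<stdatomic.h>` as a portable stand-in for the `OSAtomicIncrement` family named on the slide. `BumpJob`, `bump` and `count_hits` are hypothetical names. `atomic_fetch_add` performs the read, the add and the write as one indivisible operation, so no increment is lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_long *counter;
    int iterations;
} BumpJob;

static void *bump(void *arg) {
    const BumpJob *job = arg;
    for (int i = 0; i < job->iterations; i++) {
        /* One indivisible read-modify-write. A plain `counter++`
           on a non-atomic long could lose updates here. */
        atomic_fetch_add(job->counter, 1);
    }
    return NULL;
}

/* Increments a shared counter from nthreads threads (1..16).
   Returns the final count, or -1 if a thread failed to start. */
long count_hits(int nthreads, int iterations) {
    enum { MAX_THREADS = 16 };
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;

    atomic_long counter;
    atomic_init(&counter, 0);
    BumpJob job = { &counter, iterations };

    pthread_t threads[MAX_THREADS];
    int started = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, bump, &job) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return (started == nthreads) ? atomic_load(&counter) : -1;
}
```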
  • 51. Objective-C atomic properties • @property (atomic) int a; • Only affects auto-generated getters and setters • Again, a single read is atomic, a single write is atomic • Again, “obj.a++” is not atomic • It has no effect on direct member access, obj->a • “atomic” is the default
  • 52. Objective-C messaging • Is the order of Obj-C method calls guaranteed? • It seems so, the current compilers don’t optimize through the dynamic dispatch (objc_msgSend) • But it’s still not guaranteed • This might (and probably will) change in the future
  • 53. Volatile keyword • don’t confuse with Java volatile • prevents some compiler optimizations • the variable can change on its own • doesn’t give you atomicity • doesn’t give you ordering • there are better means of synchronization
  • 54. Locks • Mutexes, critical sections • allow only a single thread to be in this part of code at the same time • -[NSLock lock] • -[NSLock unlock] • @synchronized(obj) { … } • uses an implicit lock, which exists on each object • handles exceptions • Recursive locks, R/W locks, conditions
  • 55. Lock-free algorithms and data structures • Some concurrent structures (hash tables, queues) can be written without using explicit locks • Currently a major topic in CS • databases • The name is confusing though, there is still a lot of locking happening • cache coherency • memory bus locking for complex atomic operations
  • 56. Memory barriers • Locks can be expensive • A memory barrier ensures ordering without locking • Memory reads and writes issued before the barrier complete before those issued after it • But the guarantee is only at the point of the barrier! • OSMemoryBarrier
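The two-thread example from the "Only trust what's guaranteed" slide becomes well-defined with a pair of barriers. The sketch below uses C11 fences as a stand-in for `OSMemoryBarrier`, and the names `writer` and `read_after_flag` are hypothetical. The release fence keeps the write to `b` ahead of the flag store, and the acquire fence keeps the read of `b` behind the flag load. The busy-wait loop is only for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int b;          /* plain data, published through the flag */
static atomic_int a;   /* flag the reader waits on */

static void *writer(void *arg) {
    (void)arg;
    b = 20;
    /* Everything written above becomes visible before the flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&a, 10, memory_order_relaxed);
    return NULL;
}

/* Starts the writer, waits until a == 10, then reads b.
   Returns the observed value of b, or -1 on thread creation failure. */
int read_after_flag(void) {
    b = 0;
    atomic_store(&a, 0);

    pthread_t thread;
    if (pthread_create(&thread, NULL, writer, NULL) != 0) return -1;

    while (atomic_load_explicit(&a, memory_order_relaxed) != 10) {
        /* spin until the flag is set */
    }
    /* Reads below cannot be moved ahead of the flag load. */
    atomic_thread_fence(memory_order_acquire);
    int seen = b;

    pthread_join(thread, NULL);
    return seen;
}
```

With both fences in place the reader is guaranteed to see 20. With either one removed, the program has a data race on `b` and nothing is guaranteed.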
  • 57. Is the trouble worth it? • Measure! • OK, so you need more than a single thread • use task-level parallelization (GCD) with clear input and output, use immutable data and message passing • Measure again! • OK, so you need more than that • find the bottleneck, don’t assume • is it really the CPU? Isn’t the bottleneck in the memory/ network/disk?
  • 59. Multithreading and Parallelism on iOS Part IV: Performance tuning, ILP
  • 60. Multithreading isn’t everything • There are plenty of ways to make your code run faster • avoiding unnecessary work • choosing better algorithms • calculations on the GPU • using vector instructions (AVX, SSE, NEON) • hand-optimizing your assembly • tweaking the compiler optimizations
  • 61. The bottleneck • It’s easy to make wrong assumptions • Your bottleneck can be • CPU • Memory • I/O (disk, network) • GPU • There is no “usually”
  • 62. Some common UI issues • Creating UIViews is slow • reuse views, dequeue cells in tables • Loading images is slow • cache images • Rendering is slow • avoid drawRect, consider rasterization of flattened views • Scrolling is slow • don’t do heavy work in scrollViewDidScroll • Rendering shadows is slow • use shadowPath • Rendering layer masks is slow • pre-render
  • 63. Choose your data structures • -[NSArray containsObject:] • O(n) • -[NSSet containsObject:] • O(1)
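The gap between the two `containsObject:` calls comes from the underlying data structure, which a small C sketch can show. Everything here is illustrative and not how Foundation implements either class: `array_contains` scans every element, while the hypothetical fixed-size `IntSet` hashes the key to a slot and probes only a few neighbours, giving constant time on average.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SET_CAPACITY = 1024 };   /* hypothetical size, power of two */

typedef struct {
    uint32_t keys[SET_CAPACITY];
    bool used[SET_CAPACITY];
    size_t count;
} IntSet;

/* Array membership: up to n comparisons, O(n). */
bool array_contains(const uint32_t *items, size_t n, uint32_t key) {
    for (size_t i = 0; i < n; i++) {
        if (items[i] == key) return true;
    }
    return false;
}

/* Multiplicative hash mapped onto the table size. */
static size_t first_slot(uint32_t key) {
    return (size_t)(key * 2654435761u) & (SET_CAPACITY - 1);
}

/* Set membership: probe from the hashed slot until an empty
   slot is reached. O(1) on average. */
bool set_contains(const IntSet *set, uint32_t key) {
    size_t i = first_slot(key);
    while (set->used[i]) {
        if (set->keys[i] == key) return true;
        i = (i + 1) & (SET_CAPACITY - 1);
    }
    return false;
}

/* Inserts key. Returns false only when the table is half full,
   which keeps probe sequences short and guarantees an empty slot. */
bool set_insert(IntSet *set, uint32_t key) {
    size_t i = first_slot(key);
    while (set->used[i]) {
        if (set->keys[i] == key) return true;   /* already present */
        i = (i + 1) & (SET_CAPACITY - 1);
    }
    if (set->count >= SET_CAPACITY / 2) return false;
    set->used[i] = true;
    set->keys[i] = key;
    set->count++;
    return true;
}
```

An `IntSet` must start out zeroed, for example as a `static` variable or with `= {0}`.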
  • 64. Always profile first • Don’t guess, measure! • Amdahl’s law • Hardware is cheap, programmers are expensive
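Amdahl's law puts a number on the limit: if a fraction p of the running time can be parallelized across n cores, the best possible speedup is 1 / ((1 - p) + p / n). The helper below, a hypothetical `amdahl_speedup`, just evaluates that formula.

```c
/* Upper bound on speedup when `parallel_fraction` (0..1) of the
   work is spread over `cores` cores and the rest stays serial. */
double amdahl_speedup(double parallel_fraction, int cores) {
    if (cores < 1) cores = 1;
    if (parallel_fraction < 0.0) parallel_fraction = 0.0;
    if (parallel_fraction > 1.0) parallel_fraction = 1.0;
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

On a two-core device, code that is 50% parallelizable can gain at most about 1.33x, and even 90% parallelizable code tops out near 1.82x. That makes it worth profiling before adding threads.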
  • 65. Profiling with Instruments • What can you measure with Instruments? • CPU • utilization • all performance counters (interrupts, syscalls, user/kernel time, …) • Memory • free memory • allocations, leaks, “zombies” • many more performance counters (page faults, cache hits/misses, …) • Network • Battery usage • Display FPS • Single process / multiple processes •…
  • 66. Measure carefully • Instruments isn’t perfect • Sampling is only a statistical method • Real devices behave very differently from simulators • Hardware is different • Compiled code is different (both yours and libraries) • Verify your assumptions • In many cases, wrapping your code with two calls to [NSDate date] and subtracting is the best approach
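The "two timestamps and a subtraction" approach looks like this in plain C. The sketch uses C11 `timespec_get`, which reads the wall clock much like `[NSDate date]`. The names `seconds_to_run`, `SumJob` and `sum_to_n` are hypothetical, and the workload is only a placeholder for the code being measured.

```c
#include <time.h>

/* Runs work(context) once and returns the elapsed wall-clock
   time in seconds. */
double seconds_to_run(void (*work)(void *), void *context) {
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    work(context);
    timespec_get(&end, TIME_UTC);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Placeholder workload: sums 1..n into job->result. */
typedef struct {
    long n;
    long result;
} SumJob;

void sum_to_n(void *context) {
    SumJob *job = context;
    long total = 0;
    for (long i = 1; i <= job->n; i++) {
        total += i;
    }
    job->result = total;
}
```

A wall clock can jump if the system time is adjusted during the measurement, so repeating a measurement several times and comparing the results is a sensible precaution.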
  • 67. Optimize memory/cache accesses • Cache lines (64 B) • Try to linearize memory accesses • Choose correct data structures • array of structs vs. struct of arrays • Aligned memory accesses
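The "array of structs vs. struct of arrays" choice can be sketched with a hypothetical particle type. Both functions below compute the same total, but the struct-of-arrays version reads one contiguous run of floats, while the array-of-structs version pulls the unused `x`, `y` and `z` fields through the cache along with every `mass`.

```c
#include <stddef.h>

enum { PARTICLE_COUNT = 1024 };   /* hypothetical fixed capacity */

/* Array of structs: the fields of one particle sit together. */
typedef struct {
    float x, y, z, mass;
} Particle;

/* Struct of arrays: each field is stored contiguously. */
typedef struct {
    float x[PARTICLE_COUNT];
    float y[PARTICLE_COUNT];
    float z[PARTICLE_COUNT];
    float mass[PARTICLE_COUNT];
} ParticleArrays;

/* Touches 16 bytes per particle to use 4 of them. */
float total_mass_aos(const Particle *particles, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += particles[i].mass;
    }
    return total;
}

/* Linear walk over one array: every byte fetched is used.
   n must not exceed PARTICLE_COUNT. */
float total_mass_soa(const ParticleArrays *particles, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += particles->mass[i];
    }
    return total;
}
```

Which layout wins depends on the access pattern: code that always uses all four fields of a particle together may be better served by the array of structs.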
  • 68. Instruction-level parallelism • The compiler tries to maximize ILP with scheduling • The main obstacle is data dependency • a series of arithmetic operations which depend on each other simply cannot be parallelized • independent operations are easily parallelized • CPU is superscalar and has deep pipelines • the problem is that often the compiler can’t be sure about the dependency • memory accesses, aliasing • it has to assume the dependency is there
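The data-dependency point can be illustrated with two versions of the same sum. In the first, every addition depends on the previous one, forming a single chain. In the second, four independent accumulators give a superscalar CPU four chains to overlap. The function names are hypothetical, and whether the second version is actually faster depends on the compiler and the CPU, so treat it as something to measure.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add needs the previous result. */
uint64_t sum_single_chain(const uint32_t *values, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}

/* Four independent chains that can execute in parallel. */
uint64_t sum_four_chains(const uint32_t *values, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += values[i];
        s1 += values[i + 1];
        s2 += values[i + 2];
        s3 += values[i + 3];
    }
    for (; i < n; i++) {   /* leftover elements */
        s0 += values[i];
    }
    return s0 + s1 + s2 + s3;
}
```

For integers an optimizing compiler may perform this transformation on its own. For floating-point sums it usually may not, because reordering the additions can change the rounded result.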
  • 69. Help the compiler • The compiler is smart: • GCC: dead code elimination, common subexpression elimination, forward propagation, loop unrolling, tail call elimination, loop invariant motion, lower complex arithmetic, vectorization, modulo scheduling, … • Sometimes, it would like to be smart, but it can’t: • the C “restrict” keyword (C99): void * memcpy(void * restrict s1, const void * restrict s2, size_t n);
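A short sketch of what `restrict` promises, using a hypothetical `add_arrays` function. The qualifier tells the compiler that `out`, `a` and `b` never overlap, so a store to `out[i]` cannot change a later `a[j]` or `b[j]`. That removes the aliasing assumption and leaves the loop free to be reordered or vectorized.

```c
#include <stddef.h>

/* out[i] = a[i] + b[i]. The caller guarantees the three arrays
   do not overlap; passing overlapping arrays is undefined behavior. */
void add_arrays(float *restrict out,
                const float *restrict a,
                const float *restrict b,
                size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The promise is the caller's responsibility, and the compiler does not check it. This is also why `memcpy` requires non-overlapping buffers while `memmove` does not.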
  • 70. Vector instructions • SIMD = Single Instruction Multiple Data • ARM NEON • 128-bit instructions (e.g. 4x 32-bit or 16x 8-bit at once) • LLVM auto-vectorizer • Often you have to change your data structure • alignment • interleaved values
  • 71. Accelerate.framework • Heavily optimized built-in framework for: • image processing • image format conversion and encoding/decoding • DSP, FFT • various general math on “large” data #include <Accelerate/Accelerate.h> vFloat vx = { 1.f, 2.f, 3.f, 4.f }; vFloat vy; ... vy = vsinf(vx);
  • 72. Away from the CPU • GPGPU • Only through OpenGL ES shaders • Perfect for image processing (Core Image, GPUImage) • M7 motion coprocessor (iPhone 5S)
  • 73. Thank you for your attention.
  • 74. Multithreading and Parallelism on iOS Kuba Brecka @kubabrecka Mobile Operating Systems Conference MobOS 2013