Dark Silicon, Mobile
Devices, and Possible Open
Source Solutions
Koan-Sin Tan
freedom@computer.org
COSCUP 2013, Aug. 3rd, TICC, Taipei
Friday, August 23, 13
• Software engineer, veteran open-source user
• Learned something about lightweight
processes (LWP) on SunOS 4.x in the early 1990s
• Wrote a user-level thread library on 386BSD
with a classmate in 1992
• Was recently involved in big.LITTLE scheduling
work
Samsung “optimization” for benchmarks
http://www.anandtech.com/show/7187/looking-at-cpugpu-benchmark-optimizations-galaxy-s-4
Silicon
• “Dark Silicon refers to the exponentially
increasing number of a chip's transistors
that must remain passive, or "dark", in
order to stay within a chip's power budget”
Figure from the textbook. We know we are in the CMP (chip multiprocessor) era.
“Since 2003, the limits of power and available instruction-
level parallelism have slowed uniprocessor performance.”
Dennard scaling hits the wall
• Dennard Scaling
• When voltages are scaled along with all dimensions, a device’s electric
fields remain constant, and most device characteristics are preserved
• scaling maintains constant power density
• logic area and power are both scaled down by alpha^2
• energy per transition is scaled down by alpha^3, but frequency is
scaled up by 1/alpha, resulting in an alpha^2 decrease in power per
gate
• ........
• Search for “Dennard scaling” to find more information, e.g.,
http://www1.cs.columbia.edu/~cs4824/lectures/csee4824_f12_lec22.pdf
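The scaling bullets above can be written out as a short derivation. This is only a sketch, assuming alpha < 1 is the linear scale factor and that gate capacitance shrinks with the dimensions; the symbols L, V, C, A, E, f and P (length, voltage, capacitance, area, energy per transition, frequency, power per gate) are introduced here for illustration and are not on the slide.

```latex
\begin{align*}
L' &= \alpha L, \qquad V' = \alpha V, \qquad C' = \alpha C \\
A' &= \alpha^{2} A \\
E' &= C' V'^{2} = \alpha^{3}\, C V^{2} = \alpha^{3} E \\
f' &= f / \alpha \\
P' &= E' f' = \alpha^{2}\, E f = \alpha^{2} P \\
\frac{P'}{A'} &= \frac{\alpha^{2} P}{\alpha^{2} A} = \frac{P}{A}
\end{align*}
```

Under the same assumptions, if the voltage can no longer be scaled (V' = V), then E' = alpha E and P' = P, while the area still shrinks by alpha^2, so power density grows by 1/alpha^2. That is the wall.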
Mobile Devices
• Both power and thermal constraints are
more severe than on desktop devices
• Battery technology progresses relatively slowly
• You don’t want to put a fan in your
smartphone
• conduction, convection, radiation
Yes, modern high-end mobile processors have serious
thermal problems. Tegra 4 game console figure from
iFixit
Nexus 10 Thermal
Throttling
• Antutu 3.0.2
• Unit for X axis is 200 ms
• It reaches 80 ˚C in 20
seconds
• Throttling starts at 80 ˚C;
stops at 78 ˚C
• Throttling works by decrementing
the maximum frequency value of
cpufreq
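The start/stop thresholds above form a hysteresis loop. Below is a minimal sketch of that idea in C. The 80 ˚C and 78 ˚C thresholds come from the slide; the frequency table, the names `throttle_state` and `throttle_step`, and the policy of restoring one step per sample after throttling stops are all assumptions made for illustration, not the Nexus 10 kernel's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frequency table in kHz, highest first (not Nexus 10 values). */
static const unsigned int freq_table_khz[] = {
    1700000, 1600000, 1400000, 1200000, 1000000, 800000
};
#define NUM_FREQS (sizeof(freq_table_khz) / sizeof(freq_table_khz[0]))

#define THROTTLE_START_C 80 /* from the slide */
#define THROTTLE_STOP_C  78 /* from the slide */

struct throttle_state {
    bool throttling; /* true between crossing 80 and falling back to 78 */
    size_t max_idx;  /* index of the current maximum-frequency cap */
};

/* Call once per temperature sample; returns the new frequency cap in kHz. */
unsigned int throttle_step(struct throttle_state *s, int temp_c)
{
    if (temp_c >= THROTTLE_START_C)
        s->throttling = true;
    else if (temp_c <= THROTTLE_STOP_C)
        s->throttling = false;
    /* Between the two thresholds the previous decision is kept. */

    if (s->throttling) {
        if (s->max_idx + 1 < NUM_FREQS)
            s->max_idx++; /* decrement the maximum frequency by one step */
    } else if (s->max_idx > 0) {
        s->max_idx--;     /* assumption: restore one step per sample */
    }
    return freq_table_khz[s->max_idx];
}
```

The gap between the two thresholds keeps the cap from flapping when the temperature hovers around 80 ˚C.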
[Figure: Running Antutu on Octa. Plotted series: freq 0 to freq 3 and temp 0 to temp 3.]
Antutu 3.0.2 on S4 Octa
[Figure: Running Antutu on New One. Plotted series: tz0 to tz11.]
Antutu 3.0.2 on new One
Introducing big.LITTLE
Figure 28-3 Processor DVFS curves
In a big.LITTLE system these operating points are applied both to the Cortex-A15 and
Cortex-A7 processors. When the Cortex-A7 processor is executing the OS can tune the
operating points as it would for an existing platform with a single applications processor. When
the Cortex-A7 processor is at its highest operating point (Figure 28-3), if more performance is
required a switch is invoked that transfers the OS and applications to the Cortex-A15 processor.
Further DVFS tuning takes place on the Cortex-A15 processor if required, as the operating load
increases.
Migration requires rapid context switching capability. Coherency is clearly a critical enabler in
achieving a fast task migration time as it allows the state that has been saved on the outbound
(migrated from) processor to be snooped and restored on the inbound (migrated to) processor
rather than going via main memory. Additionally, for Cluster migration, (or for CPU migration
when all processors have been switched) because the L2 cache of the outbound processor is
coherent it can remain powered up after a task migration to improve the cache warming time of
ARM big.LITTLE
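The switching policy in the quoted ARM text can be sketched as a tiny state machine: tune operating points on the Cortex-A7, and switch to the Cortex-A15 only when the A7 is already at its highest point. The code below is a rough illustration, not ARM's or Linaro's implementation. The operating-point counts, the names `cpu_state`, `step_up` and `step_down`, landing on the lowest A15 point after a switch, and the whole downward path are assumptions.

```c
#include <stddef.h>

enum cluster { CLUSTER_A7, CLUSTER_A15 };

/* Hypothetical operating-point counts; real tables are SoC specific. */
#define A7_NUM_OPPS  4
#define A15_NUM_OPPS 4

struct cpu_state {
    enum cluster cluster;
    size_t opp; /* operating-point index within the cluster, 0 = lowest */
};

/* More performance requested: climb the current cluster's operating points,
 * and switch from the A7 to the A15 once the A7 is at its highest point. */
void step_up(struct cpu_state *s)
{
    if (s->cluster == CLUSTER_A7) {
        if (s->opp + 1 < A7_NUM_OPPS) {
            s->opp++;
        } else {
            s->cluster = CLUSTER_A15;
            s->opp = 0; /* assumption: land on the lowest A15 point */
        }
    } else if (s->opp + 1 < A15_NUM_OPPS) {
        s->opp++;
    }
}

/* Less performance needed: the mirror image. This direction is an
 * assumption; the quoted text only describes the upward switch. */
void step_down(struct cpu_state *s)
{
    if (s->opp > 0) {
        s->opp--;
    } else if (s->cluster == CLUSTER_A15) {
        s->cluster = CLUSTER_A7;
        s->opp = A7_NUM_OPPS - 1;
    }
}
```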
Thread-Level Parallelism
• Thread-level parallelism (TLP) is
an index; you can treat it as the
number of threads running
concurrently
• a table from an ISCA ’10 paper
named “Evolution of thread-level
parallelism in desktop
applications”
• 2000, 2010
• mobile devices are worse
• http://dl.acm.org/citation.cfm?id=1816000
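A sketch of how such an index can be computed, assuming the commonly used definition TLP = (sum over i >= 1 of c_i * i) / (1 - c_0), where c_i is the fraction of time exactly i threads run concurrently and c_0 is the idle fraction. Check the cited paper for the exact form. The function name `tlp` and the sample numbers are made up for illustration.

```c
#include <stddef.h>

/* c[i] is the fraction of time exactly i threads run concurrently,
 * for i = 0..n, so c has n + 1 entries that sum to 1. */
double tlp(const double *c, size_t n)
{
    double busy = 1.0 - c[0];  /* fraction of time at least one thread runs */
    if (busy <= 0.0)
        return 0.0;            /* always idle: define TLP as 0 */

    double weighted = 0.0;
    for (size_t i = 1; i <= n; i++)
        weighted += c[i] * (double)i;
    return weighted / busy;    /* average concurrency over non-idle time */
}
```

For example, with c = {0.5, 0.3, 0.2} the result is (0.3 * 1 + 0.2 * 2) / 0.5 = 1.4: idle time is excluded, and while anything runs there are on average 1.4 threads running.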
Parallel Programming
Could Help a Bit
• Parallel computing/programming has been around for a long time
• You know pthread and OpenMP are available, and C++11 came with concurrency
support
• Java uses threads and its synchronization model
• “Why Threads Are A Bad Idea”, by John Ousterhout, http://www.cc.gatech.edu/classes/AY2009/cs4210_fall/papers/ousterhout-threads.pdf
• Threads are “easy: to describe; to use; to get wrong”, to quote Andrew Birrell,
http://www.cs.princeton.edu/courses/archive/spr07/cos598A/lectures/Birrell.pdf
• For a more theoretical explanation, see “The Problems with Threads” by Edward
Lee, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
• Besides the shared-memory model, there is the message-passing
model, and more, e.g., actors, data-flow, systolic arrays, etc.
Threads are Bad Ideas?
• “Why Threads Are A Bad Idea”, John
Ousterhout, 1995,
http://www.cc.gatech.edu/classes/AY2009/cs4210_fall/papers/ousterhout-threads.pdf
• Yes, it’s a bit dated. Some of those
points are no longer valid; many of
them stand the test of time
• Threads:
• Too hard for most
programmers to use
• Even for experts, development
is painful
Some of Ousterhout’s
arguments remain valid
• Synchronization
• mutexes/locks must be placed manually
• deadlock: yes deadlock
• hard to debug
• threads break modularity
• callbacks don’t work with locks
Threads are easy to get
wrong
• Manual selection of mutual exclusion:
• Default is too little (and hence races)
• Easy fix is too much (deadlocks or
blank stares)
• Projects don’t create hierarchical
abstractions
• Can’t decide and/or maintain acyclic
locking order
• “Composition” requires entire new
abstractions
• “Clever” optimizations aren’t maintainable
• .....
User-level libraries,
frameworks
• Android AsyncTask
• a class to help perform background operations and publish results on the UI
thread without having to manipulate threads and/or handlers
• http://developer.android.com/reference/android/os/AsyncTask.html
• Intel Threading Building Blocks (TBB)
• http://threadingbuildingblocks.org/, http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks
• works on Android x86 and ARM
• Apple Grand Central Dispatch (GCD)
• http://developer.apple.com/library/ios/#documentation/Performance/Reference/GCD_libdispatch_Ref/
• Software Transactional Memory
• http://gcc.gnu.org/wiki/TransactionalMemory
Language extension
• Intel Cilk Plus
• http://cilkplus.org/, http://en.wikipedia.org/wiki/Intel_Cilk_Plus
• open sourced, trying to get into gcc and llvm
• Apple blocks
• http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/Blocks/
OpenCL Related
• OpenCL
• pocl, http://pocl.sourceforge.net/
• OpenCL and Java
• Aparapi, https://code.google.com/p/aparapi/
• Sumatra, http://openjdk.java.net/projects/sumatra/
• RenderScript
• in AOSP
• ThorScript
• will be open-sourced
Cilk Plus: simple language extensions
originating from Charles Leiserson’s Cilk work
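Simple Cilk Plus Example
/* Serial version */
int fib(int n) {
    if (n < 2) return n;
    int x = fib(n-1);
    int y = fib(n-2);
    return x + y;
}
/* Cilk Plus version; needs #include <cilk/cilk.h> */
int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);  /* may run in parallel with the call below */
    int y = fib(n-2);
    cilk_sync;                    /* wait for the spawned call before using x */
    return x + y;
}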
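simple GCD+blocks
// fib, n, result and a_queue are declared elsewhere (not shown on the slide).
// a_queue must be a concurrent queue: on a serial queue the nested
// submissions below would deadlock.
fib = ^() {
    if (n < 2) {
        result = n;
        return;
    }
    __block int x, y;
    int m = n;
    // One group per invocation: waiting on a shared group from inside a
    // block that itself belongs to that group would never return.
    dispatch_group_t group = dispatch_group_create();
    n = m - 1;
    dispatch_group_async(group, a_queue, ^{fib(); x = result;});
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    n = m - 2;
    dispatch_sync(a_queue, ^{fib(); y = result;});
    n = m;
    result = x + y;
    return;
};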
data parallel fib() looks
more reasonable
int fib(int n) {
if (n < 2) return n;
int p = 0, q = 1, result =0;
cilk_for (int i=2; i <= n; i++) {
result = p + q;
p = q; q = result;
}
return result;
}
n.b.: in case you didn’t
notice, this may produce
wrong results because of
the loop-carried dependency
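To make the caveat concrete: each iteration reads the p and q written by the previous iteration, so the iterations cannot safely run concurrently or out of order. A minimal serial sketch in plain C keeps the same loop body but runs the iterations in order, which is what the recurrence requires. The name `fib_serial` is made up for illustration.

```c
/* Iterative Fibonacci. Iteration i reads the p and q written by iteration
 * i - 1 (a loop-carried dependency), so the loop has to run in order. */
int fib_serial(int n)
{
    if (n < 2)
        return n;

    int p = 0, q = 1, result = 0;
    for (int i = 2; i <= n; i++) {
        result = p + q; /* depends on the previous iteration's p and q */
        p = q;
        q = result;
    }
    return result;
}
```

A loop is a good fit for a data-parallel construct only when its iterations are independent, or when the dependency can be rewritten as a reduction.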
parallel fib() with GCD
and blocks
int(^fib)(int);
fib = ^(int n){
if (n < 2) return n;
__block int p = 0, q = 1, result = 0;
dispatch_apply(n-1, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
result = p + q;
p = q; q = result;
});
return result;
};
GCD can be used with
OpenCL
• That’s what is available on Mac OS X and
iOS
• Nope, iOS hasn’t opened up OpenCL yet. But
you can easily find out how to use OpenCL for
ARM on iOS
What is available
• Task-parallel and data-parallel constructs,
libraries or languages
• Lambda, closure, continuation, etc.
• Queue, queue management: load balancing,
work stealing, etc.
• Data structures, e.g., TBB
• Lock-less synchronization
Lock-free synchronization
• In case you didn’t know, NO, it’s not new
at all
• Linux has used RCU (Read-Copy-
Update) for several years
• In fact, the idea has been around since the 1970s; see Kung’s
1980 paper, which proposed an RCU-like mechanism.
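For a feel of what lock-free code looks like, here is a minimal sketch using C11 atomics. It shows a compare-and-swap retry loop, which is a different and much simpler technique than RCU. The function name `lockfree_add` is made up for illustration.

```c
#include <stdatomic.h>

/* Lock-free add: retry a compare-and-swap until no other thread has
 * changed the counter between our read and our write. */
int lockfree_add(atomic_int *counter, int delta)
{
    int old = atomic_load(counter);
    /* On failure, atomic_compare_exchange_weak stores the current value
     * into `old`, so the loop simply retries with fresh data. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta))
        ;
    return old + delta;
}
```

No thread ever blocks while holding a lock here; a thread that loses the race just retries. For a plain counter, `atomic_fetch_add` does the same job in one call. The explicit loop is shown because the same pattern extends to updates that are not simple additions.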
Kernel
• big.LITTLE
• IKS: in-kernel-switcher
• related code is being upstreamed after 3.10
• Global Task Scheduling (GTS), Heterogeneous Multi-Processing (HMP)
• The current CFS maintainer, Ingo Molnar, didn’t like GTS’s power-saving part
• Power Management
• So many mechanisms: cpufreq, cpuidle, runtime PM, CCF, etc.
• Linaro has a wiki page on how to/what to enable/implement for a new SoC
• Thermal Management
• Throttling, e.g., ask related components to slow down so that less heat will
be generated
Linaro In-kernel Switcher
Global Task-Scheduling (GTS)
Much remains to be done
• No widely used open-source power or thermal
management framework available?
• Some problems are fundamentally hard to
parallelize, e.g.,
• parsing in browsers: nowadays, WebKit and
Firefox use LALR(1) or similar parsing algorithms
• No full-featured open-source OpenCL
implementation for GPGPU
Wrap-up
• “dark silicon” is a reality on mobile devices
• power wall and thermal wall
• parallel/concurrent code isn’t popular on
mobile devices (yet)
• discussed some possible free and open
source solutions
• much remains to be done
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
 
Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3Communications Mining Series - Zero to Hero - Session 3
Communications Mining Series - Zero to Hero - Session 3
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 
Gen AI: Privacy Risks of Large Language Models (LLMs)
Gen AI: Privacy Risks of Large Language Models (LLMs)Gen AI: Privacy Risks of Large Language Models (LLMs)
Gen AI: Privacy Risks of Large Language Models (LLMs)
 
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
 
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Russian Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision MakingConnector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
 
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and CitiesThe Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
 

Dark Silicon, Mobile Devices, and Possible Open-Source Solutions

  • 1. Dark Silicon, Mobile Devices, and Possible Open Source Solutions Koan-Sin Tan freedom@computer.org COSCUP 2013, Aug. 3rd, TICC, Taipei Friday, August 23, 13
  • 2. • Software engineer, veteran open-source user • Learned about lightweight processes (LWP) on SunOS 4.x in the early 1990s • Wrote a user-level thread library on 386BSD with a classmate in 1992 • Recently involved in big.LITTLE scheduling work
  • 3. Samsung “optimization” for benchmarks http://www.anandtech.com/show/7187/looking-at-cpugpu-benchmark-optimizations-galaxy-s-4
  • 6. • “Dark Silicon refers to the exponentially increasing number of a chip's transistors that must remain passive, or "dark", in order to stay within a chip's power budget”
  • 7. Figure from the textbook. We know we are in the CMP (chip multiprocessor) era. “Since 2003, the limits of power and available instruction-level parallelism have slowed uniprocessor performance.”
  • 8. Dennard scaling hits the wall • Dennard scaling • When voltages are scaled along with all dimensions, a device’s electric fields remain constant, and most device characteristics are preserved • Scaling maintains constant power density • Logic area and power are scaled down by alpha^2 • Energy per transition is scaled down by alpha^3, but frequency is scaled up by 1/alpha, resulting in an alpha^2 decrease in power per gate • ........ • Search for “Dennard scaling” to find more information, e.g., http://www1.cs.columbia.edu/~cs4824/lectures/csee4824_f12_lec22.pdf
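The scaling arithmetic on this slide can be written out as a short derivation. This is a sketch under the usual textbook assumptions, which the slide does not spell out: alpha < 1 is the linear scaling factor, capacitance scales with dimensions, and power per gate is energy per transition times frequency. The last line is an added illustration of what happens if voltage stops scaling, which is one common way to explain why the scaling "hits the wall".

```latex
% Assumption: \alpha < 1 is the per-generation linear scaling factor.
\begin{align*}
L' &= \alpha L, \qquad V' = \alpha V, \qquad C' = \alpha C, \qquad f' = f/\alpha \\
E' &= C' (V')^2 = \alpha^3\, C V^2 = \alpha^3 E
      && \text{(energy per transition)} \\
P' &= E' f' = \alpha^3 E \cdot \frac{f}{\alpha} = \alpha^2 P
      && \text{(power per gate)} \\
A' &= \alpha^2 A \;\Rightarrow\; \frac{P'}{A'} = \frac{P}{A}
      && \text{(constant power density)} \\
\intertext{If instead the voltage can no longer be scaled ($V' = V$):}
P' &= \alpha C V^2 \cdot \frac{f}{\alpha} = P
      \;\Rightarrow\; \frac{P'}{A'} = \frac{1}{\alpha^2}\,\frac{P}{A}
      && \text{(power density grows)}
\end{align*}
```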
  • 9. Mobile Devices • Both power and thermal constraints are more severe than on desktop devices • Battery technology progresses relatively slowly • You don’t want to put a fan in your smartphone • Conduction, convection, radiation
  • 10. Yes, modern high-end mobile processors have serious thermal problems. Tegra 4 game console figure from iFixit
  • 11. Nexus 10 Thermal Throttling • Antutu 3.0.2 • Unit for X axis is 200 ms • It reaches 80 ˚C in 20 seconds • Throttling starts at 80 ˚C; stops at 78 ˚C • Throttling decrements the maximum frequency value of cpufreq
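The start/stop thresholds on this slide describe a hysteresis loop. A minimal sketch of that idea follows, assuming a hypothetical frequency table and a hypothetical recovery policy (the cap is raised one step per sample once throttling stops). `Throttler`, `kFreqTableKHz`, and `update` are illustrative names, not the Nexus 10 kernel code or the cpufreq API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frequency table in kHz, lowest to highest (illustrative values only).
const std::vector<int> kFreqTableKHz = {200000, 500000, 800000,
                                        1000000, 1400000, 1700000};

struct Throttler {
    double start_c = 80.0;  // from the slide: throttling starts at 80 °C
    double stop_c = 78.0;   // from the slide: throttling stops at 78 °C
    bool throttling = false;
    std::size_t max_index = kFreqTableKHz.size() - 1;  // current cap

    // Called once per temperature sample; returns the frequency cap in kHz.
    int update(double temp_c) {
        if (temp_c >= start_c) {
            throttling = true;
        } else if (temp_c <= stop_c) {
            throttling = false;
        }  // between stop_c and start_c the previous state is kept (hysteresis)

        if (throttling) {
            if (max_index > 0) --max_index;  // decrement the maximum frequency
        } else if (max_index + 1 < kFreqTableKHz.size()) {
            ++max_index;  // assumption: recover one step at a time
        }
        return kFreqTableKHz[max_index];
    }
};
```

Feeding it one sample per 200 ms tick would mirror the X-axis unit mentioned on the slide; a sample of 79 ˚C keeps whichever state was already in effect.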
  • 14. Introducing big.LITTLE Figure 28-3 Processor DVFS curves In a big.LITTLE system these operating points are applied both to the Cortex-A15 and Cortex-A7 processors. When the Cortex-A7 processor is executing the OS can tune the operating points as it would for an existing platform with a single applications processor. When the Cortex-A7 processor is at its highest operating point (Figure 28-3), if more performance is required a switch is invoked that transfers the OS and applications to the Cortex-A15 processor. Further DVFS tuning takes place on the Cortex-A15 processor if required, as the operating load increases. Migration requires rapid context switching capability. Coherency is clearly a critical enabler in achieving a fast task migration time as it allows the state that has been saved on the outbound (migrated from) processor to be snooped and restored on the inbound (migrated to) processor rather than going via main memory. Additionally, for Cluster migration, (or for CPU migration when all processors have been switched) because the L2 cache of the outbound processor is coherent it can remain powered up after a task migration to improve the cache warming time of ARM big.LITTLE
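The switching rule quoted above (tune DVFS on the Cortex-A7, migrate to the Cortex-A15 only when the A7 is already at its highest operating point) can be sketched as a tiny state transition. The table sizes, the choice to enter the A15 at its lowest level, and the names `OperatingPoint` and `raise` are assumptions for illustration; only the upward direction described in the quoted text is modelled.

```cpp
enum class Core { CortexA7, CortexA15 };

struct OperatingPoint {
    Core core;
    int level;  // index into that core's DVFS table, 0 = lowest
};

// Hypothetical DVFS table sizes; real platforms define their own.
constexpr int kA7Levels = 4;
constexpr int kA15Levels = 4;

// Returns the next operating point when more performance is requested.
OperatingPoint raise(OperatingPoint p) {
    if (p.core == Core::CortexA7) {
        if (p.level + 1 < kA7Levels) {
            return {Core::CortexA7, p.level + 1};  // ordinary DVFS step on the A7
        }
        // A7 is at its highest point: migrate to the A15.
        // Assumption: the A15 is entered at its lowest level.
        return {Core::CortexA15, 0};
    }
    if (p.level + 1 < kA15Levels) {
        return {Core::CortexA15, p.level + 1};  // further DVFS tuning on the A15
    }
    return p;  // already at the top of the A15 table
}
```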
  • 15. Thread-Level Parallelism • Thread-level parallelism (TLP) is a metric; you can treat it as the number of threads running concurrently • A table from an ISCA ’10 paper, “Evolution of thread-level parallelism in desktop applications” • 2000, 2010 • Mobile devices are worse • http://dl.acm.org/citation.cfm?id=1816000
  • 16. Parallel Programming Could Help a Bit • Parallel computing/programming has been around for a long time • You know pthreads and OpenMP are available, and C++11 came with concurrency support • Java uses threads and its own synchronization model • “Why Threads Are A Bad Idea”, by John Ousterhout, http://www.cc.gatech.edu/classes/AY2009/cs4210_fall/papers/ousterhout-threads.pdf • Threads are “easy: to describe; to use; to get wrong”, to quote Andrew Birrell, http://www.cs.princeton.edu/courses/archive/spr07/cos598A/lectures/Birrell.pdf • For a more theoretical explanation, see “The Problems with Threads” by Edward Lee, http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf • Besides the shared-memory model, there is the message-passing model, and more, e.g., actors, data-flow, systolic arrays, etc.
  • 17. Threads are Bad Ideas? • “Why Threads Are A Bad Idea”, John Ousterhout, 1995, http://www.cc.gatech.edu/classes/AY2009/cs4210_fall/papers/ousterhout-threads.pdf • Yes, it’s a bit dated. Some of its points are no longer valid; many of them stand the test of time • Threads: • Too hard for most programmers to use • Even for experts, development is painful
  • 18. Some of Ousterhout’s arguments remain valid • Synchronization • Manual placement of mutexes/locks • Deadlock: yes, deadlock • Hard to debug • Threads break modularity • Callbacks don’t work with locks
  • 19. Threads are easy to get wrong • Manual selection of mutual exclusion: • Default is too little (and hence races) • Easy fix is too much (deadlocks or blank stares) • Projects don’t create hierarchical abstractions • Can’t decide and/or maintain acyclic locking order • “Composition” requires entire new abstractions • “Clever” optimizations aren’t maintainable • .....
  • 20. User-level libraries, frameworks • Android AsyncTask • a class to help perform background operations and publish results on the UI thread without having to manipulate threads and/or handlers • http://developer.android.com/reference/android/os/AsyncTask.html • Intel Threading Building Blocks (TBB) • http://threadingbuildingblocks.org/, http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks • works on Android x86 and ARM • Apple Grand Central Dispatch (GCD) • http://developer.apple.com/library/ios/#documentation/Performance/Reference/GCD_libdispatch_Ref/ • Software Transactional Memory • http://gcc.gnu.org/wiki/TransactionalMemory
  • 21. Language extension • Intel Cilk Plus • http://cilkplus.org/, http://en.wikipedia.org/wiki/Intel_Cilk_Plus • open sourced, trying to get into gcc and llvm • Apple blocks • http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/Blocks/
  • 22. OpenCL Related • OpenCL • pocl, http://pocl.sourceforge.net/ • OpenCL and Java • Aparapi, https://code.google.com/p/aparapi/ • Sumatra, http://openjdk.java.net/projects/sumatra/ • RenderScript • in AOSP • ThorScript • will be open-sourced
  • 23. Cilk Plus: simple language extensions, originating from Charles Leiserson’s work
  • 24. Simple Cilk Plus Example
        /* Serial version */
        int fib(int n) {
            if (n < 2) return n;
            int x = fib(n-1);
            int y = fib(n-2);
            return x + y;
        }

        /* Cilk Plus version */
        #include <cilk/cilk.h>
        int fib(int n) {
            if (n < 2) return n;
            int x = cilk_spawn fib(n-1);  /* may run in parallel with the caller */
            int y = fib(n-2);
            cilk_sync;                    /* wait for the spawned call */
            return x + y;
        }
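For readers without a Cilk Plus compiler, the same fork-join shape can be approximated with the C++11 concurrency support mentioned earlier in the deck. This is only a rough analogue, not how Cilk Plus works internally: `std::async` typically starts a new thread per call rather than using a work-stealing scheduler, so the sketch adds a serial cut-off. `kCutoff`, `fib_serial`, and `fib_parallel` are hypothetical names.

```cpp
#include <future>

// Hypothetical serial cut-off: below this, spawning a task costs more than it saves.
constexpr int kCutoff = 20;

long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

long fib_parallel(int n) {
    if (n < kCutoff) return fib_serial(n);
    // Roughly the role of cilk_spawn: run fib(n-1) asynchronously.
    std::future<long> x = std::async(std::launch::async, fib_parallel, n - 1);
    long y = fib_parallel(n - 2);  // computed by the current thread meanwhile
    return x.get() + y;            // roughly the role of cilk_sync
}
```

Calling `fib_parallel(25)` should give the same value as `fib_serial(25)`; on some toolchains the program has to be linked with thread support (for example `-pthread`).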
  • 25. simple GCD+blocks
        // n, result, fib and a_queue are assumed to be declared outside this excerpt
        dispatch_group_t group = dispatch_group_create();
        fib = ^() {
            if (n < 2) { result = n; return; }
            __block int x, y;
            int m = n;
            n = m - 1;
            dispatch_group_async(group, a_queue, ^{ fib(); x = result; });
            dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
            n = m - 2;
            dispatch_sync(a_queue, ^{ fib(); y = result; });
            n = m;
            result = x + y;
            return;
        };
  • 26. data parallel fib() looks more reasonable
        int fib(int n) {
            if (n < 2) return n;
            int p = 0, q = 1, result = 0;
            cilk_for (int i = 2; i <= n; i++) {
                result = p + q;
                p = q;
                q = result;
            }
            return result;
        }
      n.b.: in case you didn’t notice, this may produce wrong results because of the loop-carried dependency
  • 27. parallel fib() with GCD and blocks
        int (^fib)(int);
        fib = ^(int n) {
            if (n < 2) return n;
            __block int p = 0, q = 1, result = 0;
            dispatch_apply(n - 1, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
                result = p + q;
                p = q;
                q = result;
            });
            return result;
        };
  • 28. OpenCL and GCD • GCD can be used with OpenCL • That’s what is available on Mac OS X and iOS • Nope, iOS hasn’t opened up OpenCL yet. But you can easily find out how to use OpenCL for ARM on iOS
  • 29. What is available • Task-parallel and data-parallel constructs, libraries or languages • Lambda, closure, continuation, etc. • Queue, queue management: load balancing, work stealing, etc. • Data structures, e.g., TBB • Lock-less synchronization
  • 30. Lock-free synchronization • In case you didn’t know, NO, it’s not new at all • Linux has used RCU (Read-Copy-Update) for several years • In fact, the idea has been around since the 1970s; see Kung’s 1980 paper, which proposed an RCU-like mechanism.
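As a small illustration of lock-free synchronization in general, here is a compare-and-swap retry loop using C++11 `std::atomic`. It is not RCU and not the Linux kernel API; it only shows the basic pattern of updating shared state without taking a mutex. `atomic_store_max` is a hypothetical helper name.

```cpp
#include <atomic>

// Raises `target` to `value` if `value` is larger, without using a lock.
inline void atomic_store_max(std::atomic<int>& target, int value) {
    int seen = target.load(std::memory_order_relaxed);
    // compare_exchange_weak writes `value` only if `target` still equals `seen`.
    // On failure it reloads `seen` with the current value, and the loop retries
    // as long as the new value is still larger.
    while (seen < value &&
           !target.compare_exchange_weak(seen, value,
                                         std::memory_order_relaxed)) {
    }
}
```

Several threads can call `atomic_store_max` on the same variable concurrently; a thread that loses the race simply retries instead of blocking.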
  • 31. Kernel • big.LITTLE • IKS: in-kernel switcher • related code is being upstreamed after 3.10 • Global Task Scheduling (GTS), Heterogeneous Multi-Processing (HMP) • Current CFS maintainer Ingo didn’t like GTS’s power-saving part • Power Management • So many mechanisms: cpufreq, cpuidle, runtime PM, CCF, etc. • Linaro has a wiki page on how to/what to enable/implement for a new SoC • Thermal Management • Throttling, e.g., asking related components to slow down so that less heat is generated
  • 34. Much remains to be done • No widely used open-source power or thermal management framework available? • Some problems are fundamentally hard to parallelize, e.g., • parsing in browsers: nowadays, WebKit and Firefox use LALR(1) or similar parsing algorithms • No full-featured open-source OpenCL implementation for GPGPU
  • 35. Wrap-up • “Dark silicon” is a reality on mobile devices • power wall and thermal wall • Parallel/concurrent code isn’t popular on mobile devices (yet) • Discussed some possible free and open-source solutions • Much remains to be done