Multi-core architecture is the present and future way in which the market is addressing the limits of Moore’s law. Multi-core workstations, high-performance computers, GPUs, and the focus on hybrid/public cloud technologies for offloading and scaling applications are the direction in which development is heading. Leveraging multiple cores to increase application performance and responsiveness is expected, especially for classic high-throughput executions such as rendering, simulations, and heavy calculations. Choosing the correct multi-core strategy for your software requirements is essential; making the wrong decision can have serious implications for software performance, scalability, memory usage, and other factors. In this overview, we will inspect various considerations for choosing the correct multi-core strategy for your application’s requirements and investigate the pros and cons of multi-threaded development vs. multi-process development. For example, Boost’s GIL (Generic Image Library) lets you code image-processing algorithms efficiently. However, deciding whether those algorithms should execute as multi-threaded or multi-process has a high impact on your design, coding, future maintenance, scalability, performance, and other factors.
A partial list of considerations to take into account before taking this architectural decision includes:
- How big are the images I need to process?
- What risks can I have in terms of race conditions, timing issues, and sharing violations – do they justify multi-threaded programming?
- Do I have any special communication and synchronization requirements?
- How much time would it take my customers to execute a large scenario?
- Would I like to scale processing performance by using the cloud or cluster?
We will then examine these issues in real-world environments. In order to learn how this issue is being addressed in a real-world scenario, we will examine common development and testing environments we are using in our daily work and compare the multi-core strategies they have implemented in order to promote higher development productivity.
2. In this session we will discuss…
Multi-threaded vs. Multi-Process
Choosing between multi-core and multi-threaded development
Considerations in making your choices
Scalability
Summary and Q&A
3. Moore’s Law Going Multicore
In the past, increases in clock speed allowed software to automatically run faster. That no longer holds: instead of doing everything in a linear fashion, programmers need to ensure applications are designed for parallel tasking.
4. Why is it important to choose well?
Time to market
Future product directions
Product’s maintainability
Product’s complexity
5. Multi-threaded Development: Pros
Easy and fast to communicate and share data between threads (especially read-only data).
Easy to communicate with parent process.
Beneficial for large datasets that can’t be divided into subsets.
Supported by many libraries.
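The first pro above – cheap sharing of read-only data – can be sketched in a few lines. This is a minimal Python illustration (names like `partial_sum` and `DATASET` are made up for the example, not from the talk): four threads sum slices of one shared in-memory list without copying it.

```python
import threading

DATASET = list(range(1_000_000))  # shared, read-only data: no replication needed

results = {}
lock = threading.Lock()           # only the *write* to results needs a lock

def partial_sum(name, lo, hi):
    s = sum(DATASET[lo:hi])       # every thread reads DATASET directly
    with lock:
        results[name] = s

threads = [
    threading.Thread(target=partial_sum, args=(i, i * 250_000, (i + 1) * 250_000))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results.values())     # equals sum(DATASET)
```

Note the asymmetry: the shared reads need no coordination at all; only the single shared write does.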
6. Multi-threaded Development: Cons
If one thread crashes, the entire process crashes
Multi-threaded apps are hard to debug
Writing to shared data requires a set of (difficult-to-maintain) locks
Too many threads means too much time spent in context-switching.
Not scalable to other machines / public cloud
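The "locks on shared data" con above can be made concrete with a hypothetical sketch (Python, illustrative names): four threads increment one shared counter, and the lock is exactly the kind of bookkeeping a multi-threaded design forces you to maintain – drop it and increments can be silently lost.

```python
import threading

counter = 0                # shared mutable state
lock = threading.Lock()    # one of the "difficult to maintain" locks

def add(n):
    global counter
    for _ in range(n):
        with lock:         # without the lock, read-modify-write races lose updates
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 400_000 only because every increment was serialized
```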
7. Multi-process Development: Pros
Includes multi-threading by definition (a process can itself be multi-threaded).
A process crash won’t harm other processes.
Much easier to debug
Encapsulated – easier to manage for large teams
Scalable – distributed computing / public cloud
More memory can be consumed overall (each process has its own address space)
8. Multi-process Development: Cons
Custom development required in order to communicate between processes
Requires a special mechanism to share in-memory data between processes
Limited number of “process-safe” libraries
9. Multi-threaded vs. Multi-core: List of Considerations
Do the parallel parts need to sync data?
Dataset sizes
Ease of development
Scaling – compute time and throughput
10. Synchronization/State Maintenance
Easier in multi-threaded
Easier to debug in multi-threaded
In multi-process, will require custom development
Many mechanisms exist for both multi-process and multi-threaded
11. Ease of Development – Multi-Threaded vs. Multi-Process
Pre-made libraries exist for MT (multi-threaded)
MP – easier debugging/maintenance/code understanding; fewer bugs introduced with new commits
MP – can be developed and maintained by less experienced developers
MP – requires the developer to maintain encapsulation
Bugs are less complicated to solve when MT is not involved
12. Throughput & Scalability – 1/2
How many tasks do I have to execute in parallel – dozens? Millions?
How many tasks will my product need to support in the future?
How much time will each task take? Milliseconds or minutes?
How much memory will each task require?
How large is the data that each task requires?
13. Throughput & Scalability – 2/2
How complex is each task’s code?
Is my system real-time?
Would the product benefit from scalability?
Business-wise – scalability as a service?
Offloading
14. Other Scalability Questions
Do my tasks require special hardware?
Will my tasks perform better with specific hardware?
Connection to a database
15. Common scenarios for using multi-threaded
• Simple scenarios
• Real-time performance
• Long initialization time with short
computation time
• Communication, syncs and states
• Only a few highly intensive tasks
16. Common scenarios for using multi-process
• Multiple, encapsulated modules
• Tasks may require a lot of memory
• Limited synchronization required
• Large application / large team
• Scaling / high throughput
• Tasks are complex and prone to errors
18. Let’s have a look at a real-life scenario
Maven vs Make build tools
• Popular tools
• Many tasks
• Tasks are atomic
• Minimal communication
• Small dataset
• Scalable
19. Scenarios – Build system approach: Maven
• Multi-threaded code was only added
recently:
20. Scenarios – build system approach: Maven
• Not scalable: users are asking for it, but multi-process invocation is still missing.
• Projects build many 3rd-party dependencies.
29. Common high-throughput scenarios and
decision
Multi-threaded:
Large in-memory read/write meshes:
• CAD
• AI (connections between elements)
• Weather forecasting
• Genetic algorithms
Needs all the data to be in-memory and accessible to all tasks.
30. Common high-throughput scenarios and
decision
Multi-threaded:
Real time performance needs:
• Financial transactions
• Defense systems
• Gaming
Latency and initialization time are very
important.
31. Common high-throughput scenarios and
decision
Multi-Threaded:
Applications in which most of the calculation (business logic) is database-related (stored procedures, triggers, OLAP)
CRM, ERP, SAP
32. Common high-throughput scenarios and
decision
Multi-threaded:
Fast sequential processing:
• CnC
• Manufacturing line
• Dependent calculations
Next calculation depends on the previous one.
34. Common high-throughput scenarios and
decision
Multi-Processing:
Large independent dataset: when we don’t need all
data in-memory at the same time.
Low dependencies between data entities.
• Simulations (transient analysis) – Ansys, PSCad
• Financial derivatives – Morgan Stanley
• Rendering
35. Common high-throughput scenarios and
decision
Multi-Processing:
When we have many calculations with small business units.
Solution: batch the tasks
• Compilations
• Testing (unit tests, API tests, stress tests)
• Code analysis
• Asset builds
• Math calculations (banking, insurance, the diamond industry)
36. Common high-throughput scenarios and
decision
Multi-Processing:
GPU – applications that scale to use many GPUs
• CUDA, OpenCL
• Rendering
• Password cracking
37. Common high-throughput scenarios and
decision
Multi-Processing:
Output and computation streaming:
NVIDIA Shield, Onlive Service, Netflix, Waze
40. Summary - Writing Scalable Apps:
Architectural considerations:
• Can my execution be broken to many independent tasks?
• Can the work be divided over more than a single host?
• Do I have large datasets I need to work with?
• How many relations are in my dataset?
• How much time would it take my customers to execute a
large scenario – do I have large scenarios?
41. Summary - Writing Scalable app:
Technical considerations:
• Do I have any special communication and
synchronization requirements?
• Dev Complexity - What risks can we have in terms of
race-conditions, timing issues, sharing violations –
does it justify multi-threading programming?
• Would I like to scale processing performance by
using the cloud or cluster?
42. Distributed Computing
Some of the known infrastructures for multi-core/multi-machine solutions:
• MapReduce (Google – dedicated grid)
• Hadoop (open source – dedicated grid)
• Univa Grid Engine (proprietary – dedicated grid)
• IncrediBuild (proprietary – ad-hoc process virtualization grid)
44. Dori Exterman, IncrediBuild CTO
dori@incredibuild.com
IncrediBuild now free in Visual Studio 2017
www.incredibuild.com
Editor's Notes
Good morning everyone, my name is Dori Exterman. I’m the CTO at IncrediBuild
In this session we will discuss the advantages and disadvantages of multi-threaded programming vs multi-process programming.We will discuss major differences, common scenarios and considerations of choosing one over another.Finally, We will discuss architectural and design considerations on writing scalable applications.
Let’s talk a bit about Moore’s law: the main guideline of this law declares that processor power will double every 2 years. This leads software developers to believe that if some tasks are slow today, processor clock speeds will soon become faster and the problem will solve itself.
For the past few years processor clock speeds have remained more or less the same, and it’s obvious today that Moore’s law is not going to hold for long and that the entire industry is moving to multiple processing units.
However, unlike in the past, in order to benefit from multi-core infrastructure your executions should be designed in a manner that allows them to utilize multiple cores. Utilizing multiple cores means
Parallelizing execution tasks, which translates into
Reducing dependencies
Scalable design
Let’s talk about multi-threaded code.
Multi-threading has its merits. First, consider a single-core computer: a modern OS utilizes the single core in a way that simulates multi-threaded apps, so the user is led to believe that several apps run at once. For example, when one thread is waiting for I/O to complete, another thread can run. I should note that this is not actual multi-processing, since the threads are not all active at the same time. Yes, this does allow a simulation of multi-core – a poor man’s simulation, since no apps are actually running in parallel. Even with a standard 4-core desktop machine there are still too many threads to run them all in parallel. While running a multi-threaded application on one machine, all data can be shared by all threads by means such as variables and classes; there is no need to replicate data. Another advantage of multi-threaded apps is library support: many libraries available today support multi-threaded apps by providing a ‘thread-safe’ interface. Building blocks with supporting libraries enable us to easily construct our own multi-threaded code. Finally, multi-threaded code has great benefits when using large datasets.
There are also disadvantages to multi-threaded code. As mentioned earlier, multi-threaded code is not real multi-core – it is not multi-core in the sense of parallel computing unless appropriate solutions are used. Another problem arises when too many threads are all used at once: the processor has to spend too much time context-switching at the expense of actual processing.
Yes, multi-threaded code does allow us to easily share data, but this data is maintained with a set of locks. I should mention that locks are only required when writing is involved; in the read-only case, no locks are required.
It is also difficult to debug multi-threaded apps – not only difficult, but almost impossible. For example, when a breakpoint set in one of the threads is hit, it stops not only the thread we wanted but all the others as well.
Why go multi-core? Multi-core does not necessarily mean not multi-threaded – our code can be both multi-core and multi-threaded.
You also get real multi-core: you can actually write applications that run in parallel. This is no simulation – this is the real thing. A process is independent of other processes in the system, so it is easier to implement stand-alone or independent components in the system using processes. You also have fewer lock issues. Yes, if the application is implemented as if it were combined as two threads, you will suffer the same complexities; but if the data is replicated (and then merged back, when and if needed), the lock issues no longer exist. Another important advantage concerns crashing: when a thread crashes, it crashes the whole process, but a crashing process hardly harms other processes – unless, of course, it is a kernel-space process we are talking about… Finally, if the right infrastructure is used and the code is scalable, you can just add another machine and your performance easily increases. No need to throw away the old machines!
There are also disadvantages to multi-process code: it requires custom development in order to communicate between processes, to share data, and to maintain locks and synchronization (where needed). Also, the number of supporting libraries, compared to thread-safe libraries, is relatively small.
Now for one of the more important parts of the talk: can we divide our code and still conquer? Does most of our code need to be in sync with other parts, or can we cut it into hundreds of small pieces which run in parallel and then put it back together?
Is it important for us to maintain our code easily? Debug it easily? Is it important for us to scale it? Another question that arises: should compute-intensive tasks be treated differently?
If your parallel elements (either threads or processes) require heavy synchronization and are not encapsulated elements, MT (multi-threading) can be a benefit in terms of ease of development, because you can easily share information between threads. Sharing information and maintaining state between multiple processes can be more of a challenge, as the developer will need to apply his own logical mechanisms. In order to achieve the same in a multi-process architecture, the developer should implement an element through which the processes will be able to communicate/sync – such as a server application, shared memory, direct (peer-to-peer) TCP communication, a database, etc.
Achieving this in multi-threading is easier both in development and in debugging
Developing multi-threaded code is easier since many libraries already contain thread-safe methods. On the other hand, it is easier to debug a single process in a multi-process environment.
Developing for multi-process is easier in other respects. It is easier to maintain encapsulation; less experienced developers can maintain the code with less difficulty; it is easier to debug, easier to maintain, easier to understand, and fewer bugs are likely to happen. When bugs do happen, they are much easier to solve.
How many tasks do I have to run in parallel? What are the present and future requirements? What is the duration of the tasks? What are the dependencies between the tasks? What are the resource requirements for each task, in terms of memory, CPU and dataset?
Scalability offers not only more CPU but also more GPU, memory, cumulative I/O power and more.
Simple tasks are easier to scale; conversely, we might not want to scale real-time tasks, due to the higher latency of inter-process communication.
Multi-process code provides an easier path to scalability. When working with multi-process code, you can more easily use cluster/grid solutions or the public cloud to get more compute resources on demand.
When using multi-process code, you can transform your code from working on a single machine to multiple machines. There are even some solutions that allow you to scale and use idle machines and cores across your network or public cloud without needing to write any code to achieve this scalability. If we have time later on, I’ll elaborate a bit on that.
In order to verify whether this scenario is relevant for your product, you’ll need to consider the points above. It is highly important to consider future requirements as well, because once the product is mature, it will be much more difficult (although doable) to change its architecture. If you have millions of very short tasks (5 milliseconds each, for example), can you batch them together?
For real-time performance you might want to consider multi-threaded especially in scenarios in which the tasks require data to be loaded to work with.
If scalability can be beneficial to your customers, you can consider selling it separately, as a service, much as cloud services do today. There is also the possibility of complete offloading: all heavy tasks are executed remotely, leaving the machine ready for work while also enabling faster task completion. One example is graphics rendering, where offloading heavy render jobs from the local desktop can be important.
Note that a multi-threaded application must be off-loaded as a whole, while a multi-process application can also benefit from the local machine or off-load only some of its tasks.
What about tasks which require special hardware?
If my tasks have a mandatory need for special hardware, it might be impossible to scale them. If the hardware is only used to accelerate some of the tasks, scaling is easier, although its impact is not as efficient. What if the process requires a connection to a database? Too many DB connections per process may suffocate your DB. Perhaps it is possible to have a single connection per machine?
A multi-process application can be idle when it has to wait for another process to complete – for example, when some of the processes are using special hardware.
Other scenarios include real-time applications, which might involve a long initialization time compared to a short computation time, plus inter-process communication overhead. In this kind of scenario, multi-process code would impact performance severely.
Let’s take a look at a real-life scenario: automatic build scripts – in particular, make and maven. Both of them are used to build a big project out of small files. Small files mean many tasks, most of them independent. Once a decision to run a task is made, the task runs independently of other tasks in the same build; sometimes there is minimal communication while the tasks are running. We also have a small dataset, which is mostly read-only – most of the data is composed of include files and source code. Hence the whole process is scalable, at least on the local machine.
One of the approaches to system builds is maven. A maven build configuration is a structured XML with known phases, each phase with a specific (recommended) role. Multi-threaded support was only added about 18 months ago. What about multi-process?
Unfortunately, Maven does not support multi-processing. In this slide you can see users asking for multi-process support to increase build speed; unfortunately for them, the platform lacks the support.
This raises the question: do users really need multi-core? The answer is yes, for several reasons.
First, programmers building with open source often need to compile open-source code as part of their project. Hence they have larger and larger source trees, which a single machine, however powerful, is slow to compile – and that impacts their productivity and time-to-market a lot, especially when moving to Agile continuous integration. Another example is DevOps, which requires a build to include more tasks such as automated testing, staging, etc.
Finally, they require multi-core in order for 3rd parties to develop solutions for task distribution.
This is a slide from the IncrediBuild GUI. In this slide we can see a representation of a Visual Studio build building QT with a single core. On the left we can see the machine IP and core ID; on the right, in colors, we see the build’s tasks.
This slide visualizes a Visual Studio build building QT with 8 cores. We can see that there is some inefficient utilization of the multiple cores.
Each solution has its own benefits, but essentially these solutions could be, and were, implemented because make was developed in a multi-process manner. Process distribution can only be applied to multi-process solutions, not multi-threaded ones.
We should also note that most modern build systems are multi-process and not multi-threaded. Modern popular multi-process build tools include Visual Studio, SCons, Ninja, JAM, MSBuild, JOM, Gradle, WAF, Ruby and many more.
IncrediBuild’s generic process distribution technology, for example, supports all these tools out of the box, but it is unable to support Maven.
This slide visualizes a Visual Studio build building QT with 8 cores. We can see that performance was improved by 30% due to better dependency detection, task parallelization and ordering.
This slide visualizes a Visual Studio build building QT with 130 cores, of which 122 are remote cores from other machines. We can see that performance was improved by an additional 1000% due to better parallelization and resource utilization.
Let’s talk about more scenarios: how about an application that requires all data to be in-memory at the same time? For example, scientific applications, weather forecast simulations and so on.
Some applications are heavily dependent on DB queries, where most of the queries are read-only and only a minority are write operations. Such applications include CRM, ERP and SAP. It is possible to split the calculation between several processing units – which are processes – hence we have a multi-process application.
Sometimes we have a large dataset with few or no dependencies. In this case it is easier to write multi-process code. For example, code that does billing as a night job: it has to maintain bookkeeping for all activities of several customers at once, while there are no dependencies between the customers. This provides a great chance to increase performance by splitting small tasks across many CPUs.
In some cases we need to run many operations on the same input data. In those cases we can split the calculation into many small business units. One example is Sarin, which sells software that analyzes diamonds and other gems according to images of the gem. They had thousands of small calculations and were able to parallelize all the tasks and achieve a significant performance increase.
Another great example is an application that uses the GPU. GPU usage is by nature off-loading processing from the CPU to the GPU, where the more processing units the GPU owns, the faster the processing. Most graphics card manufacturers allow a seamless increase in performance using SLI configurations, which let you put more than one graphics card in the machine in order to increase the number of available GPU processing units, while the driver and the SDK make the increase seamless. An application made to run with GPUs is highly scalable.
What about streaming services? The NVIDIA Shield, which was announced recently, is a set-top box that allows anyone to play high-quality, processing-heavy graphic games in the comfort of their home using a remote farm of high-performance computers. The OnLive service is similar, enabling you to install the OnLive application on any machine – even an aged one – and play the best and most graphics-intensive games without having to add hardware. As for Netflix, I believe I don’t need to elaborate on that one.
When the output is remote we can always distribute the computation and stream out the results.
Why go multi-core? When you need more performance, just add another machine – no need to throw away the old ones. You also get real multi-core: you can actually write apps that run in parallel. This is no simulation; this is the real thing. Each process is independent of other processes in the system, so it is easier to implement stand-alone or independent components in the system. Last, multi-core does not necessarily mean not multi-threaded – an app can be both multi-core and multi-threaded.
Why not go multi-core? First, as mentioned earlier, each process is independent of the others, and it is hard to predict the behavior of the code. Second, we need to write our code in a way in which the inter-communication takes advantage of multi-core. While some support for multi-threaded code was added in C++11, for example, none was added to support multi-core.