Introduction to Multi-Core

  1. Introduction to Multi-Core
      A multi-core processor is an integrated circuit on which two or more processor cores have been placed.
      This leads to:
       o Enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks
      What changes are expected in software design:
       o To achieve competitive application performance on these new processors, many applications must be written (or rewritten) as parallel, multithreaded applications.
       o Multithreaded development can be difficult, expensive, time consuming, and error prone, and it requires new programming skill sets.
      Adding cores introduces additional overheads and latencies:
       o Execution is serialized between communicating and non-communicating cores (e.g. hardware barriers, fences, resource contention)
       o Various interdependent sources of latency and overhead:
          Architecture: cache coherency
          System: processor scheduling
          Application: synchronization
       o Sensitive to real workloads (e.g. data dependencies)
       o As the number of cores increases, the size of these overheads and latencies grows
      Is a multiprocessor the same as multi-core?
       o Multi-core: multiple CPU cores within a single processor chip
       o Multiprocessor: multiple processors within a single system
  2. From a software perspective, either term can be used.
     [Diagram: multiprocessor vs. multi-core]
  3. Example of a multi-core architecture: ARM MPCore
      Two basic models of multi-core:
       o Each core acts independently: "multiple single cores"
       o Cores cooperate with each other: "true multi-core"
  4. What is "multiple single cores"?
      Each core acts independently
       o Pros
          Simplifies porting from single-core systems
          Minimal interaction between cores: less overhead and a more predictable system
          No cache coherency issues between the cores
          Tools support may remain the same as for single core
          Good scalability, though this depends on hardware support
       o Cons
          Load balancing issues: some cores may be idle while others are overloaded
          The hardware must support this mode of operation by providing I/O queues for network interfaces
     What is true multi-core?
      Cores cooperate with each other
       o Pros
          Better possibilities for load balancing, meaning more effective use of system resources
          The L1 instruction cache can be used more efficiently (cache affinity)
       o Cons
          Porting from single core is typically more complicated
          Possible cache coherency issues between the cores
          The system becomes more complex, especially when dependencies exist between tasks; as a result, hard real-time scheduling is harder to achieve
  5. Examples of true multi-core designs: master-slave, SMP, ...
     Different flavors of multi-core
      SMP (Symmetric Multi-Processing)
       o Identical processor cores
       o Dynamic task allocation (each task can run on any processor)
       o Shared view of memory
          Synchronization and communication via shared memory
       o Normally a homogeneous CPU arrangement
      AMP (Asymmetric Multi-Processing)
       o Static task allocation (each processor is assigned a particular kind of task)
       o Distributed or common view of memory
          Synchronization and communication via a message-passing mechanism
       o Either homogeneous or heterogeneous CPU cores
       o Cache coherency requires special attention
      Master-slave MP architecture
       o The master core is responsible for all I/O operations and uses the other cores as slaves; it decides what task each core performs
       o Slave cores do not communicate with each other, only through the master core
  6. OS + Multi-core Design: Each CPU has its own OS
     • Statically allocate physical memory to each CPU
     • Each CPU runs its own independent OS
     • Peripherals are shared
     • Each CPU handles its own processes' system calls
     • Used in early multiprocessor systems
     • Simple to implement
     • Avoids concurrency issues by not sharing
     • Issues:
       1. Each processor has its own scheduling queue.
       2. Each processor has its own memory partition.
       3. Consistency is an issue with independent disk buffer caches and potentially shared files.
     OS + Master-Slave Multiprocessors
     • The OS mostly runs on a single fixed CPU
     • User-level applications run on the other CPUs
     • All system calls are passed to the master CPU for processing
     • Very little synchronization required
     • Simple to implement
     • A single centralized scheduler keeps all processors busy
     • Memory can be allocated as needed to all CPUs
  7. • Issues: the master CPU becomes the bottleneck.
     OS + SMP
     • The OS kernel runs on all processors, and load and resources are balanced across all processors.
     • One alternative: a single mutex (mutual exclusion object) that makes the entire kernel one large critical section; only one CPU can be in the kernel at a time; only slightly better than master-slave
     • Better alternative: identify independent parts of the kernel and make each of them its own critical section, which allows parallelism in the kernel
     • Issues: a difficult task; the code is mostly similar to uniprocessor code; the hard part is identifying independent parts that don't interfere with each other
     • CPUs connected via a shared bus to shared memory
     • Each processor has an L1 cache
     • Any task can run on any CPU; every CPU is equal as far as the system is concerned
     • No master-slave configuration
     • Each processor can access the entire memory map
     • Each processor is non-unique and of equal power
     Application porting to multi-core:
      Identify the threads (tasks) that can be executed concurrently by different cores
      How to choose these tasks?
       o Minimize inter-task dependencies
       o Each task should have real-time characteristics that are schedulable on a single core
       o Avoid tasks that are too short, because of the overhead
  8.   o Leave room for tuning at the implementation stage
       o Identify inter-task dependencies
       o Inter-task dependencies may cause performance degradation, as one core will have to wait for other cores, resulting in missed deadlines
       o Inter-task dependencies may affect your scheduler decisions
       o Define which management and I/O tasks you assign to a "master" core and what is shared between several cores
          Memory management can be done either by the master core or by all cores
          Ethernet and other I/O
          DMA
       o Define the scheduling policies
          Take cache and multi-core considerations into account:
           1. For example, it may be more efficient to co-schedule two tasks that use the same working set in the L2 cache
           2. Running several "big" working sets on different cores at the same time, thrashing each other in L2, can be painful
           3. Data cache affinity: sometimes it is worthwhile to give a task priority to run on the same core and take advantage of a "hot" cache
     What is cache coherency?
      Cache coherency is a state in which each processor in a multiprocessor system sees the same value for a data item in its cache as the value in system memory.
      This state is transparent to the software, but it affects software performance.
     For example:
     • Processors A and B both cache address x
     • A writes to x, updating its own cache
     • How does B find out?
  9. There are many cache coherence protocols, for example MESI.
     MESI
      Modified
       o The cached data has been modified and must be written back to memory
      Exclusive
       o No other processor has it cached; it can be modified
      Shared
       o Not modified; other processors have it cached; before changing it, the other processors must be told to invalidate their cache line
      Invalid
       o The cached line is no longer valid (e.g. some other processor has updated it)
     Specifics required to work with an MP core:
      Identification
       o CPUID to uniquely identify a CPU to software
       o Ability to indicate the need for memory coherency
      Can maintain memory coherency
       o Caches can participate in the MESI protocol
      Provides a consistent view of memory
       o With a defined memory ordering
       o Atomic and synchronization primitives
      Communication with peers
       o IPI (inter-processor interrupts)
       o Message passing
      Interrupt distribution
 10.   o Interrupt distribution unit controlling each processor's interrupt controller unit
     Multi-core/multiprocessor design issues:
      Cache coherency
      Design of multi-threaded applications for multi-core
       o Functional decomposition
       o Domain decomposition (independent data sets)
      Snooping (cache/memory snooping)
      Interrupt distribution
      Processor affinity
      Inter-processor interrupts
      Memory access
      Concurrency
       o Interrupts
       o Instructions/data
       o Memory/peripherals
      Memory consistency / memory ordering model (by hardware + by compiler optimization)
      SMP protection by OS/HW
       o Spinlocks
       o Atomic operations as the basis of all protection tools (ARM LL/SC operations)
      Debugging tools
      Performance
      Profiling
     Linux SMP design:
      Process affinity
 11.   o Each processor has a runqueue
       o A runqueue is the list of all active processes to be scheduled
      Load balancing
       o Shifts processes from an overloaded processor to another symmetric processor
       o Part of the scheduler
       o Should maintain processor affinity for cache efficiency
      Interrupt affinity
       o Requires help from the hardware interrupt distribution system (APIC)
       o The APIC directs an interrupt to only one of the cores
       o Linux provides a cpu_set interface to change the APIC behavior
      smp_processor_id
       o Returns the CPU identifier on which the current code is executing
      Per-CPU variables
       o A per-CPU memory region is defined at kernel start-up, where per-CPU variables are placed
       o Each variable is associated with a single core
       o Defining a variable as per-CPU creates an array of variables, one per CPU instance
      Spinlocks
       o Disabling preemption and interrupts will not help in an MP environment
      Big lock
       o Introduced in the 2.2 kernel to serialize access across the system
      What about BHs (bottom halves)?
       o Tasklets are executed on the processor that schedules them
 12. Linux SMP booting: