Multicore 101: Migrating Embedded Apps to Multicore with Linux
Joint presentation with Ian Forsyth of Freescale Semiconductor (2008)

Transcript

  • 1. Multicore 101: Migrating Embedded Applications to a Multicore Environment with Linux. Presented by MontaVista Software and Freescale Semiconductor: Ian Forsyth, Senior Enablement Architect, Freescale Semiconductor; Brad Dixon, Director of Product Management, MontaVista Software. Attend Vision for more in-depth multicore sessions: www.mvista.com/Vision
  • 2. Agenda ► The Challenge in Migrating Applications • The “Net Effect” • Changing networking topology • The multicore challenge ► Proposed Multicore Solutions • Combined hardware/software • Virtualization and hypervisor ► The Pathway to Migrating Your Applications • Contain, Exploit, Analyze, Optimize • Use the right tools ► Learn more and evaluate multicore solutions • Evaluate MontaVista TestDrive: Freescale + MontaVista Linux
  • 3. The “Net Effect”: Networking trends drive the need for more performance. [Diagram of application areas: Metro Carrier Edge Router; IMS Controller; SSL, IPSec, Firewall; Serving Node Router (GSN); Converged Networking; Storage Networks; IP Services; TelePresence; Enterprise Wireless Access Gateway; Access Point Aggregation; Integrated Services Routers; Unified Threat Management; Network Admission Control; Service Provider Routers]
  • 4. The Changing Networking Topology ► Layer 4-7 (application) processing in the network is now common ► Datacom deployments are increasingly integrated ► Both trends drive demand for higher computational capability from hardware vendors
  • 5. Why Multicore in Embedded Networks? ► Demand for differentiating features ► Advanced services are implemented in software running on general-purpose CPUs ► Frequency scaling of CPU cores is no longer viable, primarily due to power ► Multicore processors are viewed as the most viable approach [Chart: power vs. performance requirement for 1xCPU and nxCPU devices relative to the device hot-spot power limit]
  • 6. The Multicore Challenge: It’s All About the Software ► Multicore silicon devices have raced ahead of the embedded software market’s ability to support them ► Millions of lines of single-threaded legacy code will need to be rewritten in a parallel fashion in order to utilize multicore devices ► This creates a paradigm shift in how developers must think about and implement future programs ► There are no automated or “quick-fix” approaches for this software migration and paradigm shift; significant programmer effort is required ► Tools and support (simulators, compilers, OS, virtualization packages, performance profilers, debuggers, example applications, and training) will all be key to the widespread adoption of multicore solutions [Diagram: single-threaded legacy software facing multiple Power Architecture™ cores, each with its own I-Cache, D-Cache, and L2 cache]
  • 7. Multicore Tools and Solutions: Software Pyramid ► Market-specific multicore stacks, apps, and libraries; support for green-field development ► Support for standard and OS-dependent programming models, often leveraging multiprocessor ► Base multicore infrastructure: operating system, boot standards ► First-rate tools: debuggers, performance and trace analyzers, simulators, compilers [Pyramid labels: Stacks, N/W Accel, Early Code Partitioning, Hardware & Software Hypervisor, SMP/AMP OS’s, Advance Debug, Libraries]
  • 8. QorIQ™ Solution Platforms: Simulation to Hardware, Same Software [Diagram: the same applications, optimized high-speed drivers, and hypervisor run both on the Simics virtualized development environment (functional model and performance model, accessed through an API) and on Freescale QorIQ™ silicon, with an IDE (compiler/debugger/build tools); some components are marked as Freescale-supplied]
  • 9. Hybrid Functional/Performance Simulator [Diagram: functional and performance models of a system (CPUs, Ethernet, I/O, ROM, RAM, bus, hardware acceleration) connected through an API; along simulated time, high-speed functional-mode simulation with periodic checkpoints alternates with performance-mode simulation]
  • 10. Virtualization for Reduced Cycle Time ► A hybrid model: functional and performance • Functional: provides the programmer's view of the SoC; deterministic; non-invasive; single simulation environment ► Capabilities: control of time; control of cores; control of configuration; systematic control of validation and error; force and detect race conditions ► Scope: cores (e200, e300, e500, e600, …), SoCs (MPC8360/MPC8641D, MPC8548/MPC8572, …), multicore platform, boards, products and systems, optimized solutions ► Freescale, with Virtutech and MontaVista, provides a multicore development platform that accelerates software development before and after silicon availability
  • 11. MPC8641/40D Dual Core Block Diagram ► Dual e600 PowerPC cores @ 1.25/1.0 GHz • 1MB L2 cache w/ECC per core • 36-bit physical addressing ► System Unit • 64b DDR/DDR2 w/ECC • 4x 10/100/1000 Ethernet controllers ► High-speed Interfaces • 1x/4x SRIO (2.5GB/s) and x1/x2/x4/x8 PCI-Express (4GB/s) • OR two x1/x2/x4/x8 PCI-Express (8GB/s) ► Pin- and software-compatible with MC8641D ► Max Power (Watts) • 31.0 W @ 1.25 GHz • 21.0 W @ 1.00 GHz ► Production Availability • 0 to 105 °C: now • -40 to 105 °C: Q4 2008 ► MontaVista commercial support • Professional Edition 5.0 • Carrier Grade Edition 5.0
  • 12. QorIQ™ P4080 Multicore: It’s a smarter approach to multicore. Freescale’s Multicore Platform ► Innovative multicore micro-architecture for unprecedented computing efficiency, performance, and scalability • On-chip coherency fabric • Back-side cache per CPU core • On-demand application acceleration ► Multicore Simulation Environment for accurate, fast code development and debugging • Fully tap the capabilities of the multicore platform • Debug software, not hardware • Dynamic, real-time debug with non-intrusive capture ► 45-nm process technology for an industry-leading power-to-performance solution • Provides the highest instructions-per-cycle (IPC) and frequency for a given milliwatt/area ► Features • Eight e500mc cores • CoreNet™ scales to 32 cores • PCI Express® 2.0, 10GbE • PME 2.0, SEC 4.0 • Data path acceleration • Trust/secure boot • Hypervisor ► Also highlighted • Standardized debug • Virtualization with real applications • High-performance SoC • Advanced technology • Tier one partnerships • Outstanding ecosystem • MontaVista Linux support
  • 13. Datapath Acceleration Architecture (DPAA), QorIQ™ P4 Platform ► The Datapath Acceleration Architecture simultaneously enables a lower-complexity software environment and very high networking performance [Diagram: network interfaces feed the Frame Manager (FMan), Queue Manager (QMan), and Buffer Manager (BMan), which parse, classify, police, steer, manage congestion, enqueue, manage work queues, and stash context for the cores and accelerators]
  • 14. Multicore Operating Systems ► Wide variation of customer use-cases • Multiple operating systems utilized across cores on a single device • Proprietary, 3rd party, and Open Source multicore operating systems • Symmetric Multi-Processing (SMP) and Asymmetric Multi-Processing (AMP), often running concurrently • Often no OS, or an engineered light OS, used on forwarding/data plane cores ► Leverage Power Architecture™ technology’s 3rd party OS ecosystem • Freescale embedded Hypervisor • Freescale boot standards, including u-boot • Leverage open boot protocol and API standards (e.g. Power.org™) • Freescale Light Weight Executive (LWE) for run-to-completion data plane processing • Demonstrate performance and provide a reference example for customers [Diagram: MontaVista Linux® instances (AMP and SMP) for services and the control plane, and the Light Weight Executive on forwarding/data plane cores, each mapped onto Power Architecture™ cores]
  • 15. Light Weight Executive Summary ► The LWE provides a set of services and abstractions to an application ► Focus is on the run-to-completion model ► Freescale provides example applications to demonstrate the use of the LWE ► The LWE helps Freescale customers and partners develop functionality using cores as highly optimized accelerators [Diagram: an application on the Light Weight Executive interacting with software on other cores, e.g. running Linux®]
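The run-to-completion model can be illustrated with a generic polling loop: each packet is taken from a receive ring and pushed through every processing stage before the next packet is touched, with no locks, blocking calls, or context switches in between. This is a minimal sketch under stated assumptions, not the LWE API, which the slides do not show. `run_to_completion`, `decrement_ttl`, and the dict-based packets are hypothetical, and on a real data-plane core the loop would poll a hardware queue rather than a Python deque.

```python
from collections import deque


def run_to_completion(rx, stages, tx, budget):
    """Poll `rx` and finish each packet completely before taking the next one.

    Each stage (hypothetical signature) takes a packet and returns the
    processed packet, or None to drop it. Stages must never block. `budget`
    bounds packets handled per call so the caller can interleave housekeeping.
    Returns the number of packets taken from `rx`.
    """
    handled = 0
    while rx and handled < budget:
        pkt = rx.popleft()
        for stage in stages:
            pkt = stage(pkt)
            if pkt is None:  # the stage dropped the packet; stop processing it
                break
        if pkt is not None:
            tx.append(pkt)
        handled += 1
    return handled


# Example stage (hypothetical): decrement a TTL field, dropping expired packets.
def decrement_ttl(pkt):
    if pkt["ttl"] <= 1:
        return None
    return {**pkt, "ttl": pkt["ttl"] - 1}


rx_ring = deque([{"ttl": 5}, {"ttl": 1}, {"ttl": 3}])
tx_ring = []
run_to_completion(rx_ring, [decrement_ttl], tx_ring, budget=2)
```

After the call, the first packet has been forwarded with its TTL decremented, the second has been dropped, and the third is still waiting in `rx_ring` for the next poll.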
  • 16. Hypervisor Contrasts ► Freescale hypervisor implementation [one guest OS per CPU] • Requirement: isolation, performance • Implications: no more than one OS per core; the OS has direct control of high-speed peripherals ► Traditional hypervisor implementation [multiple guest OSes per CPU] • Requirement: solves the problem of under-utilized CPUs, plus isolation • Implications: more than one OS per core; added complexity; performance implications ► QorIQ™ P4080 hypervisor hardware assists in meeting both sets of requirements
  • 17. Natural Virtualization via QorIQ™ P4080 Datapath ► The datapath decouples cores and peripherals, allowing N cores to share M peripherals ► Peripherals are accessed through per-core “portals” ► Allows direct and efficient access by cores to many high-speed peripherals ► Cores can access the same network interface with no SW synchronization because each core has its own portal [Diagram: two Power Architecture™ cores, each with its own portal into the P4080 datapath, sharing one network interface]
  • 18. Solution ► Solution = Freescale software + ecosystem software + customer software [Diagram: software stacks on Freescale QorIQ™ silicon above the hypervisor: MontaVista Linux with drivers, IPC, high-level IPC, stacks, and applications; a Light Weight Executive (LWE) partition with drivers, IPC, stacks, and example apps; partition management; components are marked as Freescale, MontaVista, or 3rd party and/or customer]
  • 19. Market Analysis ► “Developers overwhelmingly voted for the chip's software-development tools as the most important thing when evaluating a new embedded processor.” ► “The most valuable feature of a chip isn't even the chip itself. Compilers and debuggers trump MIPS and megahertz.” (Jim Turley, ESD) Source: Embedded Systems Design survey
  • 20. Migrating to Multicore: What is the pathway? ►Contain ►Exploit ►Analyze ►Optimize
  • 21. Containment ► Goal: migrate the application codebase to the multicore platform without disruption ► Risk: concurrent execution will expose latent race conditions and synchronization issues ► Technique: use Linux's processor and interrupt affinity APIs to contain your application's threads and processes to a single core
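The processor-affinity half of the containment technique can be sketched with Python's `os.sched_setaffinity` and `os.sched_getaffinity`, which are thin Linux-only wrappers around `sched_setaffinity(2)` and `sched_getaffinity(2)`. C code would call those functions, or `pthread_setaffinity_np()`, directly. The helper name `contain_to_single_core` is hypothetical. It assumes it runs at start-up, before the application creates threads: on Linux, pid 0 means the calling thread, and threads created afterwards inherit its mask, so the whole application ends up on one core.

```python
import os


def contain_to_single_core(cpu=None):
    """Restrict the calling thread, and every thread it later creates, to one CPU.

    If `cpu` is None, the lowest-numbered CPU in the current allowed set is used.
    Returns the CPU that was chosen.
    """
    allowed = os.sched_getaffinity(0)  # CPUs this thread may currently run on
    if cpu is None:
        cpu = min(allowed)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})  # pid 0 = the calling thread on Linux
    return cpu
```

From the shell, util-linux `taskset -c 0 ./your_app` applies the same containment at launch, and `taskset -pc 0 <pid>` changes a running process. Interrupt affinity is configured separately through `/proc/irq/<N>/smp_affinity`.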
  • 22. Containment [Diagram: your app alongside housekeeping utilities]
  • 23. Containment [Diagram: your app and housekeeping utilities, shown before and after containment on a multicore processor]
  • 24. Containment [Diagram as on the previous slide] Benefits: ► Delay exposing latent concurrency defects ► Easily gain an efficiency boost by exploiting available cores ► Improve I/D/L2 cache efficiency by minimizing scheduler bounces
  • 25. Migration with Containment ► The designer can explicitly control which CPUs are permitted to handle particular threads and interrupts (shown on the Freescale 8641D multicore processor)
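Per-thread and per-interrupt control can be sketched the same way. Here a thread pins itself before doing work, so other threads keep their own masks, and an IRQ is steered by writing a hex CPU bitmask to `/proc/irq/<N>/smp_affinity`. The helper names are hypothetical. Writing the file requires root, some interrupts reject affinity changes, and a running `irqbalance` daemon may later rewrite the mask. The mask formatter assumes the kernel's comma-separated 32-bit-word format for systems with more than 32 CPUs.

```python
import os
import threading


def start_pinned_thread(cpu, target, *args):
    """Start a thread that pins itself to `cpu`, then runs target(*args)."""
    def body():
        os.sched_setaffinity(0, {cpu})  # pid 0 = this thread only, on Linux
        target(*args)
    t = threading.Thread(target=body)
    t.start()
    return t


def cpus_to_irq_mask(cpus):
    """Format a CPU set as the hex bitmask used by /proc/irq/<N>/smp_affinity.

    Masks wider than 32 bits are split into comma-separated 32-bit words,
    most significant word first.
    """
    if not cpus or min(cpus) < 0:
        raise ValueError("need a non-empty set of non-negative CPU numbers")
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while mask:
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
    words.reverse()
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


def set_irq_affinity(irq, cpus):
    """Steer interrupt `irq` to `cpus` (requires root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_irq_mask(cpus) + "\n")
```

For example, keeping the application on CPU 0 while routing a NIC interrupt to CPU 1 would be `set_irq_affinity(irq, {1})`, where `irq` is whatever number `/proc/interrupts` shows for that device.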
  • 26. A Quick Sidebar… ►Why SMP? ►Linux's long march to multicore ►On virtualization
  • 27. Why SMP? ► Multicore CPUs permit a number of processing scenarios ► SMP maximizes run-time flexibility to match CPU resources to the needs of the moment ► SMP ends up playing a role in many system architectures ► Combined with a hypervisor, SMP does not exclude any other design options
  • 28. Linux’s Long March to Multicore ► Linux has been multicore-ready for years ► The kernel, drivers, protocol stacks, and apps are ready ► As core counts scale, the focus shifts to exploiting multicore at the application layer
  • 29. On Virtualization… ► Difficulties applying virtualization to telecom/datacom • The isolation vs. latency trade-off • Hardware contention • I/O devices ► Hardware support minimizes virtualization overhead
  • 30. Sidebar Summary ► SMP is the natural way for Linux to exploit multicore processors ► Hypervisors can enable new kinds of flexibility ► New hardware features are making hypervisor-based architectures more efficient to use
  • 31. Migrating to Multicore: What is the Pathway? ►Contain • Migrate to multicore but contain code to a single core ►Exploit ►Analyze ►Optimize
  • 32. Exploit ► Goal: identify code that will benefit from multicore execution and modify it to exploit the available cores
  • 33. Application Architectures to Exploit Multicore ► Objective: scale efficiently across multiple cores so that more client work can be handled rapidly ► The key question is how to map client requests (or packets) to workers quickly and obtain a speed-up from multicore
  • 34. Application Characteristics ►Each request requires a small amount of work ►Requests are largely independent of each other ►Requires read-only access to a moderate amount of state ►Small amount of state may travel with the request ►Must be able to manage overload effectively
  • 35. Application Characteristics ►Each request requires a small amount of work ►Requests are largely independent of each other ►Requires read-only access to a moderate amount of state ►Small amount of state may travel with the request ►Must be able to manage overload effectively ►Some anti-patterns • Non-concurrent • Process/thread per client • Spawn process/thread per request • HPC message passing such as MPI
  • 36. Application Characteristics ►Each request requires a small amount of work ►Requests are largely independent of each other ►Requires read-only access to a moderate amount of state ►Small amount of state may travel with the request ►Must be able to manage overload effectively ►Some anti-patterns • Non-concurrent • Process/thread per client • Spawn process/thread per request • HPC message passing such as MPI ►For telecom/datacom applications, an event-driven architecture is ideal to facilitate multicore migration
  • 37. Sample Application Architecture (similar to that used by memcached and Apache) ► The dispatcher can handle overload, monitoring, etc. ► Multicore awareness is needed only for central services ► A pluggable dispatcher is feasible if planned correctly ► Managing global, per-service, per-session, and per-request state is the battleground for scalability
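The dispatcher-plus-workers architecture can be sketched as a fixed worker pool fed by one bounded queue per worker. Requests with the same key, such as a session or flow id, always land on the same worker, so per-session state needs no locks. A full queue is reported as overload instead of blocking the dispatcher, which matches the slide's points about keeping multicore awareness in the central service and managing overload there. This is a minimal sketch under those assumptions. The class and parameter names are hypothetical, and in CPython the GIL limits scaling of CPU-bound handlers, so read it as an architecture outline that maps directly onto pthreads in C.

```python
import os
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


class Dispatcher:
    """Event-driven dispatcher feeding a fixed pool of worker threads."""

    def __init__(self, handler, num_workers, queue_depth, cpus=None):
        self._handler = handler
        self._queues = [queue.Queue(maxsize=queue_depth) for _ in range(num_workers)]
        self._cpus = cpus  # optional list of CPUs; workers are pinned round-robin
        self.rejected = 0  # overload counter, updated only by the dispatching thread
        self._workers = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(num_workers)
        ]

    def start(self):
        for w in self._workers:
            w.start()

    def submit(self, key, request):
        """Non-blocking: returns False and counts overload if the worker's queue is full."""
        q = self._queues[hash(key) % len(self._queues)]
        try:
            q.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def stop(self):
        for q in self._queues:
            q.put(_STOP)  # blocks until there is room, so queued work still runs
        for w in self._workers:
            w.join()

    def _run(self, index):
        if self._cpus:
            os.sched_setaffinity(0, {self._cpus[index % len(self._cpus)]})
        q = self._queues[index]
        while True:
            item = q.get()
            if item is _STOP:
                return
            self._handler(item)
```

Keying by session trades perfect load balance for lock-free session state. Which trade-off is right depends on how the global, per-service, per-session, and per-request state is split.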
  • 38. Migrating to Multicore: What is the Pathway? ►Contain • Migrate to multicore but contain code to a single core ►Exploit • Use an event-driven architecture to add explicit functional parallelism ►Analyze ►Optimize
  • 39. Analyze ► Goal: understand multicore performance bottlenecks and diagnose unexpected faults ► Benchmark first: the bottlenecks may not be where you think they are
  • 40. Analysis Tools ► Profiling • Can be used for far more than CPU cycles per function or line • The e500mc core has a rich set of performance attributes it can monitor • MontaVista DevRocket can use oprofile to collect and correlate this data to your code ► Runtime Monitoring • “top” in SMP mode gives a broad overview of per-CPU statistics ► Tracing • Fine-grained, CPU-aware tracing
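The per-CPU view that `top` gives in SMP mode comes from the counters in `/proc/stat`, so a lightweight monitor can be sketched by sampling that file twice and comparing the snapshots. This sketch assumes the standard `cpuN user nice system idle iowait irq softirq steal guest guest_nice` layout. It sums only the first eight counters, because guest time is already included in user and nice, and it treats idle plus iowait as not busy. The function names are hypothetical.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} from /proc/stat content.

    Includes the aggregate "cpu" line as well as the per-CPU "cpuN" lines.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        counters = [int(v) for v in fields[1:9]]  # user .. steal
        total = sum(counters)
        idle = counters[3] + (counters[4] if len(counters) > 4 else 0)  # idle + iowait
        stats[fields[0]] = (total - idle, total)
    return stats


def cpu_utilization(before, after):
    """Per-CPU busy percentage between two parse_proc_stat() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result
```

In practice, read `/proc/stat` once, sleep for the sampling interval, read it again, and pass both parsed snapshots to `cpu_utilization`. A contained application shows up as one busy `cpuN` entry while the others stay mostly idle.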
  • 41-43. MontaVista DevRocket Analysis Tools [screenshots]
  • 44. CGE5 Only: Microstate Accounting ► Per-process and per-thread information • Time in nanoseconds • Time consumed since process start • See /proc/<PID>/task/<TID>/msa for per-thread information
    # cat /proc/1845/msa
    State: Interruptible
    Now: 2287392468035
    ONCPU_USER 1473381312
    ONCPU_SYS 3110032766
    INTERRUPTIBLE 1183737626438
    UNINTERRUPTIBLE 1011435
    INTERRUPTED 546291
    ACTIVEQUEUE 2217218048
    EXPIREDQUEUE 0
    STOPPED 0
    ZOMBIE 0
    SLP_POLL 0
    SLP_PAGING 0
    SLP_FUTEX 0
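Because the microstate-accounting output is a flat list of name/value pairs, it is easy to post-process. This sketch assumes the layout shown on the slide (a `State:` entry, a `Now:` entry, then one `STATE nanoseconds` pair per microstate, each value a single token) and parses it as token pairs, so it does not depend on the exact line breaks. The function names are hypothetical. Comparing the `ONCPU_*` totals with `ACTIVEQUEUE` (which, judging by its name, is time spent runnable on the scheduler's active queue) can hint at CPU contention.

```python
def parse_msa(text):
    """Parse microstate-accounting output into (state, now_ns, {STATE: ns})."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected name/value pairs")
    state, now, times = None, None, {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key == "State:":
            state = value
        elif key == "Now:":
            now = int(value)
        else:
            times[key] = int(value)
    return state, now, times


def on_cpu_ns(times):
    """Total time spent executing (user + system), in nanoseconds."""
    return times.get("ONCPU_USER", 0) + times.get("ONCPU_SYS", 0)
```

Applied to the sample output for PID 1845, this gives about 4.58 s on CPU and about 2.22 s in `ACTIVEQUEUE`.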
  • 45. “Multi-Anything” Debug ► Debug process, thread, and kernel context [Screenshot: DevRocket IDE]
  • 46. Migrating to Multicore: What is the Pathway? ►Contain • Migrate to multicore but contain code to a single core ►Exploit • Use an event-driven architecture to add explicit functional parallelism ►Analyze • Use available profiling, tracing, and performance monitoring tools and APIs ►Optimize
  • 47. Optimize ► Goal: get the most from the available multicore performance ► Focus attention on the areas where Amdahl's law indicates the most benefit can occur! ► Leverage data parallelism for CPU-bound computations ► Use interrupt and process/thread affinity to tune the system
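Amdahl's law puts a number on "where the most benefit can occur." If a fraction p of single-core run time can be spread across n cores, the best-case speed-up is S(n) = 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) no matter how many cores are added. The short helper below encodes the formula. The function name is illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n).

    `parallel_fraction` (p) is the share of single-core run time that can be
    spread across `cores` (n); the remaining 1 - p stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
```

For example, code that is 90% parallelizable reaches only about a 4.7x speed-up on the eight cores of a P4080, and can never exceed 10x. Profiling therefore needs to show which serial portions dominate before effort is spent parallelizing anything else.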
  • 48. Migrating to Multicore: What is the Pathway? ►Contain • Migrate to multicore but contain code to a single core ►Exploit • Use an event-driven architecture to add explicit functional parallelism ►Analyze • Use available profiling, tracing, and performance monitoring tools and APIs ►Optimize • Specialize cores as needed; explore other multicore optimizations
  • 49. MontaVista Support for Freescale Multicore ► Editions: Professional Edition 4.0 and 5.0, Carrier Grade Edition 4.0 and 5.0 ► Processors supported across these editions: 8572, 8641D, 8640D ► Freescale P4080 support is operating today on the Virtutech Simics simulator, in advance of hardware availability ► MontaVista offers comprehensive support for Freescale Power Architecture processors today
  • 50. Two Ways to Learn More About Multicore ► MontaVista Vision: where embedded Linux gets real. October 1-3, 2008, San Francisco, CA. For more information on in-depth multicore sessions, visit www.mvista.com/vision ► MontaVista TestDrive: evaluate Freescale multicore and MontaVista Linux for free at www.mvista.com/freescale/eval