This document discusses manycore programming and preparing for the manycore future. It begins by defining manycore as having more than 8 cores per chip. It emphasizes that hardware is changing and programming needs to change to take advantage of parallelism. It discusses task parallelism vs data parallelism and introduces frameworks like the Task Parallel Library and languages like F# that support functional programming approaches well-suited for manycore. It stresses designing applications for concurrency from the start.
2. Abstract: manycore
This talk will focus solution architects toward thinking about parallelism when designing applications and solutions:
- Threads vs. Tasks using TPL
- LINQ vs. PLINQ
- Object Oriented vs. Functional Programming
This talk will also compare programming languages, how languages differ when dealing with manycore programming, and the different advantages to these languages.
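The contrasts named in this abstract (threads vs. tasks, LINQ vs. PLINQ) can be illustrated with a small sketch. The deck targets .NET, so this is only an analogy: Python's `concurrent.futures` stands in for the Task Parallel Library, and a pool-backed `map` stands in for a PLINQ-style parallel query. The function names (`square`, `with_explicit_threads`, `with_tasks`, `sequential_query`, `parallel_query`) are hypothetical and not taken from the slides. In CPython, threads do not speed up pure-Python CPU work, so the sketch shows the programming model rather than a speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Stand-in for a unit of work (hypothetical example function)."""
    return n * n


def with_explicit_threads(values):
    """Thread style: create, start, and join each thread by hand."""
    results = [None] * len(values)

    def worker(i, v):
        results[i] = square(v)  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i, v))
               for i, v in enumerate(values)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def with_tasks(values, max_workers=4):
    """Task style: submit work to a pool and collect the futures' results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(square, v) for v in values]
        return [f.result() for f in futures]


def sequential_query(values):
    """LINQ-like query: filter, then project, one item at a time."""
    return [square(v) for v in values if v % 2 == 0]


def parallel_query(values, max_workers=4):
    """PLINQ-like query: same filter, with the projection mapped across a pool."""
    evens = [v for v in values if v % 2 == 0]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(square, evens))  # map preserves input order
```

The task version leaves scheduling to the pool and the caller only describes the work, which is the design shift the talk argues for; the query pair shows that the parallel form can keep the same shape as the sequential one.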
3. Patrick Gelsinger, Intel VP
February 2001, San Francisco, CA
2001 IEEE International Solid-State Circuits Conference (ISSCC)
- If scaling continues at the present pace, by 2005 high-speed processors would have the power density of a nuclear reactor, by 2010 a rocket nozzle, and by 2015 the surface of the sun.
- Intel stock dropped 8% on the next day.
- “Business as usual will not work in the future.”
4. The Power Wall: CPU Clock Speed
[Figure: CPU clock speed chart with regions labeled Single core, Multicore, and Manycore. From Katherine Yelick’s “Multicore: Fallout of a Hardware Revolution”.]
5. In 1965, Gordon Moore predicted exponential growth in the number of transistors per chip, based on the trend from 1959 to 1965.
Clock frequencies continued to increase exponentially until they hit the power wall in 2004 at around 3 to 4 GHz:
- 1971, Intel 4004 (first single-chip CPU): 740 kHz
- 1978, Intel 8086 (origin of x86): 4.77 MHz
- 1985, Intel 80386DX: 16 MHz
- 1993, Pentium P5: 66 MHz
- 1998, Pentium II: 450 MHz
- 2001, Pentium III (Tualatin): 1.4 GHz
- 2004, Pentium 4F: 3.6 GHz
- 2008, Core i7 (Extreme): 3.3 GHz
Intel is now doubling cores along with other improvements to continue to scale.
Slide annotations: Effect of the Power Wall; This trend continues even today; The Power Wall; Enter Manycore
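As a rough check on "exponentially", the two endpoints in the list above (740 kHz in 1971 and 3.6 GHz in 2004) imply an average doubling period. The sketch below assumes smooth exponential growth between those two points, which is a simplification and not a claim made by the slides; `doubling_time_years` is a hypothetical helper.

```python
import math

# Endpoints taken from the list above; the doubling-time framing is an assumption.
START_YEAR, START_HZ = 1971, 740e3   # Intel 4004, 740 kHz
END_YEAR, END_HZ = 2004, 3.6e9       # Pentium 4F, 3.6 GHz


def doubling_time_years(start_year, start_hz, end_year, end_hz):
    """Years per doubling, assuming smooth exponential growth between the endpoints."""
    doublings = math.log2(end_hz / start_hz)
    return (end_year - start_year) / doublings


years = doubling_time_years(START_YEAR, START_HZ, END_YEAR, END_HZ)
print(f"Clock speed doubled about every {years:.1f} years")
```

Under that assumption the clock rate doubled roughly every 2.7 years up to 2004, while the 2008 entry in the list is lower than the 2004 one, which is the flattening the slide calls the power wall.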
6. Agenda: Manycore Future
- Manycore, What is it?
- Manycore, Why should I care?
- Manycore, What do we do about it?
  - Frameworks: Task Parallel Library (Reactive Extensions and .NET 4)
  - Languages, paradigms, and language extensions: F#, functional programming, LINQ, PLINQ
  - Tools: Visual Studio 2010 Tools for Concurrency
8. Manycore, What is it?
- Single core: 1 processor on a chip die (1 socket)
  - Many past consumer and server CPUs (some current CPUs for lightweight low-power devices)
  - Including CPUs that support hyperthreading, but this is a grey area
- Multicore: 2 to 8 core processors per chip/socket
  - AMD Athlon 64 X2 (first dual-core desktop CPU, released in 2005)
  - Intel Core Duo, 2006 (32-bit, dual core, for laptops only)
    - Core Solo was a dual-core chip with one core disabled
  - Intel Core 2 (not multicore, instead a brand for 64-bit arch)
    - Core 2 Solo (1 core)
    - Core 2 Duo (2 cores)
    - Core 2 Quad (4 cores)
- Manycore: more than 8 cores per chip
  - Currently prototypes and R&D
9. High-end servers, 2001-2004. IBM servers: 2001, IBM POWER4 PowerPC for AS/400 and RS/6000, “world's first non-embedded dual-core processor”. Sun servers: 2004, UltraSPARC IV, “first multicore SPARC processor”. Desktops/laptops, 2005-2006: AMD Athlon 64 X2 (Manchester), May 2005, “first dual-core desktop CPU”; Intel Core Duo, Jan 2006; Intel Pentium (Allendale) dual core, Jan 2007. Windows servers, 2006: Intel Xeon (Paxville) dual core, Dec 2005; AMD Opteron (Denmark) dual core, March 2006; Intel Itanium 2 (Montecito) dual core, July 2006. Sony PlayStation 3, 2006: 9-core Cell processor (only 8 operational); Cell architecture jointly developed by Sony, Toshiba, and IBM. Multicore trends from servers to gaming consoles
10. Power Mac G5 - Mid 2003 2 x 1 core (single core) IBM PowerPC 970 Mac Pro - Mid 2006 2 x 2 core (dual core) Intel Xeon (Woodcrest) Mac Pro - Early 2008 2 x 4 core (quad core) Intel Xeon (Harpertown) In 5 years number of cores doubled twice on Apple’s high end graphics workstation From 2 to 4 to 8 Macintosh multicore trend
11. The chip is just designed for research efforts at the moment, according to an Intel spokesperson. "There are no product plans for this chip. We will never sell it so there won't be a price for it," the Intel spokesperson noted in an e-mail. "We will give about a hundred or more to industry partners like Microsoft and academia to help us research software development and learn on a real piece of hardware, [of] which nothing of its kind exists today." http://redmondmag.com/articles/2009/12/04/intel-unveils-48-core-cloud-computer-chip.aspx Microsoft said it had already put SCC into its development pipeline so it could exploit it in the future. http://news.bbc.co.uk/2/hi/technology/8392392.stm 48 Core Single-chip Cloud Computer (SCC)
13. Hardware is changing Programming needs to change to take advantage of new hardware Concurrent Programming Paradigm Shift Designing applications Developing applications Manycore, Why should I care?
14. “The computer industry is once again at a crossroads. Hardware concurrency, in the form of new manycore processors, together with growing software complexity, will require that the technology industry fundamentally rethink both the architecture of modern computers and the resulting software development paradigms.” Craig Mundie, Chief Research and Strategy Officer, Microsoft Corporation, June 2008. First paragraph of the Foreword of Joe Duffy’s preeminent tome “Concurrent Programming on Windows”. Concurrent Programming
15. Excerpt from Mark Reinhold’s Blog post: November 24, 2009 The free lunch is over. Multicore processors are not just coming—they’re here. Leveraging multiple cores requires writing scalable parallel programs, which is incredibly hard. Tools such as fork/join frameworks based on work-stealing algorithms make the task easier, but it still takes a fair bit of expertise and tuning. Bulk-data APIs such as parallel arrays allow computations to be expressed in terms of higher-level, SQL-like operations (e.g., filter, map, and reduce) which can be mapped automatically onto the fork-join paradigm. Working with parallel arrays in Java, unfortunately, requires lots of boilerplate code to solve even simple problems. Closures can eliminate that boilerplate. “It’s time to add them to Java.” http://blogs.sun.com/mr/entry/closures “There’s not a moment to lose!”
16. Herb Sutter 2005 Programs are not doubling in speed every couple of years for free anymore We need to start writing code to take advantage of many cores Currently painful and problematic to take advantage of many cores because of shared memory, locking, and other imperative programming techniques “The Free Lunch Is Over”
17. Is this just hype? Another Y2K scare? Fact: CPU’s are changing Programmers will learn to exploit new architectures Will you be one of them? Wait and see? You could just wait and let the tools catch up so you don’t have to think about it. Will that strategy work? Should you be concerned?
18. Tools and frameworks alone will not solve the manycore problem. Imperative programming, by its nature, has limitations when scaling in parallel. Imperative programming (C, C++, VB, Java, C#) requires locks and synchronization code to handle shared-memory reads and writes, which is not trivial and is difficult to debug. Tools and frameworks may help, but taking real advantage of them requires a different approach to the problem (a different paradigm). The Core Problem
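The shared-memory problem described on this slide can be sketched in a few lines of C#. This is an illustrative example rather than code from the talk; the class and method names (`SharedCounterDemo`, `CountUnsafe`, `CountWithLock`, `CountWithInterlocked`) are hypothetical. It assumes .NET 4 or later for `Parallel.For`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: three ways to count loop iterations in parallel.
public static class SharedCounterDemo
{
    // Unsynchronized: counter++ is a read-modify-write sequence, so
    // concurrent iterations can overwrite each other's updates.
    public static int CountUnsafe(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { counter++; });
        return counter; // may be lower than iterations
    }

    // A lock serializes every increment: correct, but the threads
    // now queue up on the same lock object.
    public static int CountWithLock(int iterations)
    {
        int counter = 0;
        object gate = new object();
        Parallel.For(0, iterations, i => { lock (gate) { counter++; } });
        return counter;
    }

    // An atomic increment avoids the explicit lock for this simple case.
    public static int CountWithInterlocked(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref counter); });
        return counter;
    }

    public static void Main()
    {
        const int n = 100000;
        Console.WriteLine("unsafe:      " + CountUnsafe(n) + " (can be below " + n + ")");
        Console.WriteLine("lock:        " + CountWithLock(n));
        Console.WriteLine("interlocked: " + CountWithInterlocked(n));
    }
}
```

The unsynchronized version compiles and runs without complaint, which is part of why such bugs are hard to find: the result is only sometimes wrong, and only on machines with more than one core.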
19. Some frameworks are designed to be single threaded, such as ASP.NET Best practices for ASP.NET applications recommend avoiding spawning new threads ASP.NET and IIS handle the multithreading and multiprocessing to take advantage of the many processors (and now many cores) on Web Servers and Application Servers Will this best practice remain true? Even when server CPU’s have hundreds or thousands of cores? Will it affect all programmers?
20. What do we do about it? (How do we prepare for Manycore)
21. Identify where the dependencies are Identify where you can parallelize Understand the tools, techniques, and approaches for solving the pieces Put them together to understand overall performance POC – Proof of Concept Test, test, test Performance goals up front Understand Problem Domain
22. Frameworks Task Parallel Library (TPL) Reactive Extensions for .NET 3.5 (Rx) Used to be called Parallel Extensions or PFx Baked into .NET 4 Programming paradigms, languages, and language extensions Functional programming F# LINQ and PLINQ Tools Visual Studio 2010 Tools for Concurrency Manycore, What do we do about it?
24. Concurrency or Concurrent computing Many independent requests Web Server, works on multi-threaded single core CPU Separate processes that may be executed in parallel More general than parallelism Parallelism or Parallel computing Processes are executed in parallel simultaneously Only possible with multiple processors or multiple cores Yuan Lin: compares to black and white photography vs. color, one is not a superset of the other http://www.touchdreams.net/blog/2008/12/21/more-on-concurrency-vs-parallelism/ Parallelism vs. Concurrency
25. Task Parallelism (aka function parallelism and control parallelism): distributing execution processes (threads/functions/tasks) across different parallel computing nodes (cores) http://msdn.microsoft.com/en-us/library/dd537609(VS.100).aspx Data Parallelism (aka loop-level parallelism): distributing data across different parallel computing nodes (cores); executing the same command over every element in a data structure http://msdn.microsoft.com/en-us/library/dd537608(VS.100).aspx Task vs. Data Parallelism. See MSDN for .NET 4, Parallel Programming, Data/Task Parallelism
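The distinction can be made concrete with a small C# sketch. It is not from the slides; the names `ParallelismKinds`, `SumAndMax`, and `SquareAll` are hypothetical, and the example assumes .NET 4 or later.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical demo class contrasting the two kinds of parallelism.
public static class ParallelismKinds
{
    // Task parallelism: two different operations run as separate tasks
    // over the same input. Returns { sum, max }.
    public static int[] SumAndMax(int[] data)
    {
        Task<int> sumTask = Task.Factory.StartNew(() => data.Sum());
        Task<int> maxTask = Task.Factory.StartNew(() => data.Max());
        return new[] { sumTask.Result, maxTask.Result };
    }

    // Data parallelism: the same operation is applied to every element,
    // and the runtime partitions the index range across cores.
    public static int[] SquareAll(int[] data)
    {
        int[] squares = new int[data.Length];
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, data.Length, i => { squares[i] = data[i] * data[i]; });
        return squares;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        int[] sumAndMax = SumAndMax(data);
        Console.WriteLine("sum = " + sumAndMax[0] + ", max = " + sumAndMax[1]);
        Console.WriteLine(string.Join(", ", SquareAll(data)));
    }
}
```

In the data-parallel method, each iteration touches a distinct array element, which is what makes the loop body safe to run concurrently without synchronization.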
28. How to use the TPL
Reference System.Threading. Use Visual Studio 2010 or .NET 4. For Visual Studio 2008, download the unsupported version for .NET 3.5 SP1 from Reactive Extensions for .NET (Rx): http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx
Create a “Task”:
    FileStream fs = new FileStream(fileName, FileMode.CreateNew);
    // Wrap the BeginWrite/EndWrite asynchronous pair in a Task.
    var task = Task.Factory.FromAsync(fs.BeginWrite, fs.EndWrite, bytes, 0, bytes.Length, null);
29. Task Parallelism with the TPL
Use the Task class:
    // Create a task and supply a user delegate
    // by using a lambda expression.
    var taskA = new Task(() => Console.WriteLine("Hello from taskA."));
    // Start the task.
    taskA.Start();
    // Output a message from the calling thread.
    Console.WriteLine("Hello from the calling thread.");
30. Getting return value from a Task
Use Task<TResult>:
    Task<double>[] taskArray = new Task<double>[]
    {
        Task<double>.Factory.StartNew(() => DoComputation1()),
        // May be written more conveniently like this:
        Task.Factory.StartNew(() => DoComputation2()),
        Task.Factory.StartNew(() => DoComputation3())
    };
    double[] results = new double[taskArray.Length];
    // Reading Result blocks until the corresponding task has finished.
    for (int i = 0; i < taskArray.Length; i++)
        results[i] = taskArray[i].Result;
31. Task resembles new thread or ThreadPool work item, but higher level of abstraction Tasks provide two primary benefits over Threads: More efficient and scalable use of system resources More programmatic control than is possible with a thread or work item Tasks vs. Threads
32. Behind the scenes, tasks are queued to the ThreadPool. The ThreadPool is now enhanced with algorithms (like hill-climbing) that determine and adjust to the number of threads that maximizes throughput. Tasks are relatively lightweight: you can create many of them to enable fine-grained parallelism. To complement this, widely known work-stealing algorithms are employed to provide load balancing. Tasks and the framework built around them provide a rich set of APIs that support waiting, cancellation, continuations, robust exception handling, detailed status, custom scheduling, and more. Tasks
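A few of the APIs listed on this slide (continuations, exception handling, and cancellation) can be sketched as follows. This is a minimal illustration, not code from the talk; the class and method names (`TaskFeatures`, `SquareThenAddOne`, `FailureMessage`, `CancelBeforeStart`) are hypothetical. It uses only `Task.Factory.StartNew`, `ContinueWith`, `Wait`, and `CancellationTokenSource` from .NET 4.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class for three Task features.
public static class TaskFeatures
{
    // Continuation: the second task runs after the first completes
    // and consumes its result.
    public static int SquareThenAddOne(int x)
    {
        Task<int> square = Task.Factory.StartNew(() => x * x);
        Task<int> plusOne = square.ContinueWith(t => t.Result + 1);
        return plusOne.Result;
    }

    // Exception handling: an exception thrown inside a task is rethrown
    // to the waiting thread wrapped in an AggregateException.
    public static string FailureMessage()
    {
        Action fail = () => { throw new InvalidOperationException("boom"); };
        Task failing = Task.Factory.StartNew(fail);
        try
        {
            failing.Wait();
            return "no failure";
        }
        catch (AggregateException ex)
        {
            return ex.InnerException.Message;
        }
    }

    // Cancellation: a task created with an already-cancelled token
    // never runs its delegate and ends in the Canceled state.
    public static bool CancelBeforeStart()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task t = Task.Factory.StartNew(() => { }, cts.Token);
        try { t.Wait(); }
        catch (AggregateException) { /* wraps a TaskCanceledException */ }
        return t.IsCanceled;
    }

    public static void Main()
    {
        Console.WriteLine(SquareThenAddOne(3));
        Console.WriteLine(FailureMessage());
        Console.WriteLine(CancelBeforeStart());
    }
}
```

None of these behaviors is available from a raw thread or ThreadPool work item without extra plumbing, which is the "more programmatic control" benefit claimed for tasks.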
33. Data Parallelism with the TPL
Instead of:
    for (int i = 0; i < matARows; i++)
    {
        for (int j = 0; j < matBCols; j++)
        {
            ...
        }
    }
Use:
    Parallel.For(0, matARows, i =>
    {
        for (int j = 0; j < matBCols; j++)
        {
            ...
        }
    }); // Parallel.For
34. Use Tasks, not Threads. Use Parallel.For in Data Parallelism scenarios. Or… use Async Workflows from F#, covered later. Use PLINQ, covered later. TPL Summary
36. 1930’s: lambda calculus (roots) 1956: IPL (Information Processing Language) “the first functional language” 1958: LISP “a functional flavored language” 1962: APL (A Programming Language) 1973: ML (Meta Language) 1983: SML (Standard ML) 1987: Caml (Categorical Abstract Machine Language ) and Haskell 1996: OCaml (Objective Caml) 2005: F# introduced to public by Microsoft Research 2010: F# is “productized” in the form of Visual Studio 2010 Functional programming has been around a long time (over 50 years)
37. Most functional languages encourage programmers to avoid side effects Haskell (a “pure” functional language) restricts side effects with a static type system A side effect Modifies some state Has observable interaction with calling functions Has observable interaction with the outside world Example: a function or method with no return value Functional programming is safe
38. Language Evolution (Simon Peyton Jones). C#, VB, Java, and C are imperative programming languages: very useful, but they can change the state of the world at any time, creating side effects. Nirvana! Useful and Safe. F#. Haskell is very safe, but not very useful: used heavily in research and academia, but rarely in business. http://channel9.msdn.com/posts/Charles/Simon-Peyton-Jones-Towards-a-Programming-Language-Nirvana/
39. When a function changes the state of the program Write to a file (that may be read later) Write to the screen Changing values of variables in memory (global variables or object state) Side Effect
40. Compare SQL to your favorite imperative programming language If you write a statement to store and query your data, you don’t need to specify how the system will need to store the data at a low level Example: table partitioning LINQ is an example of bringing functional programming to C# and VB through language extensions Functional Programming
41. Use lots of processes Avoid side effects Avoid sequential bottlenecks Write “small messages, big computations” code Efficient Multicore Programming Source: Joe Armstrong’s “Programming Erlang, Software for a Concurrent World” Section 20.1 “How to Make Programs Run Efficiently on a Multicore CPU”
43. Functional language developed by Microsoft Research, by Don Syme and his team, who productized Generics. Based on OCaml (influenced by C# and Haskell). History: 2002: F# language design started. 2005 January: F# 1.0.1 released to the public (not a product; integration with VS2003; works in .NET 1.0 through .NET 2.0 beta, Mono). 2005 November: F# 1.1.5 with VS 2005 RTM support. 2009 October: VS2010 Beta 2, CTP for VS2008 and non-Windows users. 2010: F# is “productized” and baked into VS 2010. What is F#
45. Parallel Computing and PDC09 (diagram of the parallel computing stack). Tools: Visual Studio 2010 (Parallel Debugger Windows, Profiler Concurrency Analysis); Microsoft Research (Race Detection, Fuzzing). Managed Languages: Axum, Visual F#. Managed Libraries: DryadLINQ, Parallel LINQ, Rx, Task Parallel Library, Data Structures. Native Libraries: Async Agents Library, Parallel Pattern Library, Data Structures. Managed Concurrency Runtime: ThreadPool. Native Concurrency Runtime: Task Scheduler, Resource Manager. Operating System: Threads, UMS Threads; HPC Server, Windows 7 / Server 2008 R2. Key: Research / Incubation; Visual Studio 2010 / .NET 4.
46. Functional programming has been around a long time Not new Long history Functional programming is safe A concern as we head toward manycore and cloud computing Functional programming is on the rise Why another language?
48. “F# is, technically speaking, neutral with respect to concurrency - it allows the programmer to exploit the many different techniques for concurrency and distribution supported by the .NET platform” F# FAQ: http://bit.ly/FSharpFAQ Functional programming is a primary technique for minimizing/isolating mutable state. Asynchronous workflows make it possible to write parallel programs in a “natural and compositional style”. F# and Multi-Core Programming
49. Interactive Scripting Good for prototyping Succinct = Less code Type Inference Strongly typed, strict (no dynamic typing) Automatic generalization (generics for free) Few type annotations 1st class functions (currying, lazy evaluations) Pattern matching Key Characteristics of F#
53. Difficult to turn existing sequential code into parallel code Must modify large portions of code to use threads explicitly Using shared state and locks is difficult Careful to avoid race conditions and deadlocks Two Problems Parallelizing Imperative Code http://www.manning.com/petricek/petricek_meapch1.pdf
57. From Seq to PSeq Matthew Podwysocki’s Blog http://weblogs.asp.net/podwysocki/archive/2009/02/23/adding-parallel-extensions-to-f.aspx Adding Parallel Extensions to F# for VS2010 Beta 2 Talbott Crowell’s Developer Blog http://talbottc.spaces.live.com/blog/cns!A6E0DA836D488CA6!396.entry Parallel Extensions to F#
59. Asynchronous Workflows Control.MailboxProcessor Task Based Programming using TPL Reactive Extensions “The Reactive Extensions can be used from any .NET language. In F#, .NET events are first-class values that implement the IObservable<out T> interface. In addition, F# provides a basic set of functions for composing observable collections and F# developers can leverage Rx to get a richer set of operators for composing events and other observable collections. ”S. Somasegar, Senior Vice President, Developer Division http://blogs.msdn.com/somasegar/archive/2009/11/18/reactive-extensions-for-net-rx.aspx F# Parallel Programming Options
60. Demo of Image Processor
Problem: resize a ton of images.
    let files = Directory.GetFiles(@"C:\images\original")
    for file in files do
        use image = Image.FromFile(file)
        use smallImage = ResizeImage(image)
        let destFileName = DestFileName("s1", file)
        smallImage.Save(destFileName)
61. Asynchronous Workflows
    let FetchAsync(file:string) =
        async {
            use stream = File.OpenRead(file)
            // let! awaits the read without blocking a thread.
            let! bytes = stream.AsyncRead(int stream.Length)
            use memstream = new MemoryStream(bytes.Length)
            memstream.Write(bytes, 0, bytes.Length)
            // Rewind so the image is decoded from the start of the stream.
            memstream.Seek(0L, SeekOrigin.Begin) |> ignore
            use image = Image.FromStream(memstream)
            use smallImage = ResizeImage(image)
            let destFileName = DestFileName("s2", file)
            smallImage.Save(destFileName)
        }
    let tasks = [for file in files -> FetchAsync(file)]
    let parallelTasks = Async.Parallel tasks
    Async.RunSynchronously parallelTasks |> ignore
64. LINQ
LINQ: declaratively specify what you want done, not how you want it done:
    var source = Enumerable.Range(1, 10000);
    var evenNums = from num in source
                   where Compute(num) > 0
                   select num;
Versus:
    var source = Enumerable.Range(1, 10000);
    var evenNums = new List<int>();
    foreach (var num in source)
        if (Compute(num) > 0)
            evenNums.Add(num);
65. If I put a counter in Compute(num), what will happen?
    var source = Enumerable.Range(1, 10000);
    var evenNums = from num in source
                   where Compute(num) > 0
                   select num;

    private static int Compute(int num)
    {
        counter++;
        if (num % 2 == 0) return 1;
        return 0;
    }
68. PLINQ = Parallel LINQ
LINQ: declaratively specify what you want done, not how you want it done:
    var source = Enumerable.Range(1, 10000);
    var evenNums = from num in source
                   where Compute(num) > 0
                   select num;
PLINQ: declaratively specify “As Parallel”. Under the hood, the framework will implement “the how” using TPL and threads:
    var source = Enumerable.Range(1, 10000);
    var evenNums = from num in source.AsParallel()
                   where Compute(num) > 0
                   select num;
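The counter question from the earlier slide becomes more pointed once the query runs in parallel. The sketch below is one possible answer, not the presenter's demo: the class `PlinqSideEffects` and its `Run` method are hypothetical, and `Compute` is adapted from the slide so that its counter update is atomic. It shows two things: the query does nothing until it is enumerated, and a side effect inside a PLINQ query has to be made thread-safe.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical demo class: a side-effecting predicate inside a PLINQ query.
public static class PlinqSideEffects
{
    private static int counter;

    // Adapted from the slide's Compute. A plain counter++ here would be a
    // data race under AsParallel(), so the increment is done atomically.
    private static int Compute(int num)
    {
        Interlocked.Increment(ref counter);
        return num % 2 == 0 ? 1 : 0;
    }

    // Returns { calls before enumeration, even numbers found, calls after }.
    public static int[] Run(int upTo)
    {
        counter = 0;
        var evenNums = from num in Enumerable.Range(1, upTo).AsParallel()
                       where Compute(num) > 0
                       select num;

        int callsBefore = counter;     // query is only defined, not executed
        int evens = evenNums.Count();  // enumeration runs Compute on many threads
        int callsAfter = counter;
        return new[] { callsBefore, evens, callsAfter };
    }

    public static void Main()
    {
        int[] r = Run(10000);
        Console.WriteLine("calls before enumeration: " + r[0]);
        Console.WriteLine("even numbers found:       " + r[1]);
        Console.WriteLine("calls after enumeration:  " + r[2]);
    }
}
```

Making the increment atomic keeps the count correct, but the cleaner fix in the functional style the talk advocates is to keep predicates free of side effects so that the query can be parallelized without any such care.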
71. Stephen Toub at PDC 2009, Senior Program Manager on the Parallel Computing Platform http://microsoftpdc.com/Sessions/P09-09
72. Views enable you to see how your multi-threaded application interacts with itself, the hardware, the operating system, and other processes on the host computer. Provides graphical, tabular, and textual data. Shows the temporal relationships between the threads in your program and the system as a whole. Concurrency Visualizer in Visual Studio 2010
73. Performance bottlenecks CPU underutilization Thread contention Thread migration Synchronization delays Areas of overlapped I/O and other info… Use Concurrency Visualizer to Locate
79. Tomas Petricek - F# Webcast (III.) - Using Asynchronous Workflows http://tomasp.net/blog/fsharp-webcast-async.aspx Luke Hoban - F# for Parallel and Asynchronous Programming http://microsoftpdc.com/Sessions/FT20 More info on Asynchronous Workflows
80. The Landscape of Parallel Computing Research: A View from Berkeley 2.0 by David Patterson http://science.officeisp.net/ManycoreComputingWorkshop07/Presentations/David%20Patterson.pdf Parallel Dwarfs http://paralleldwarfs.codeplex.com/ More Research
81. “The architect as we know him today is a product of the Renaissance.” (1) “But the medieval architect was a master craftsman (usually a mason or a carpenter by trade), one who could build as well as design, or at least ‘one trained in that craft even if he had ceased to ply his axe and chisel’ (2).” (1) “Not only is he hands on, like the agile architect, but we also learn from Arnold that the great Gothic cathedrals of Europe were built, not with BDUF, but with ENUF” (3)
(1) Dana Arnold, Reading Architectural History, 2002
(2) D. Knoop & G. P. Jones, The Medieval Mason, 1933
(3) Ian Cooper, Architects: Back to the future?, 2008 http://codebetter.com/blogs/ian_cooper/archive/2008/01/02/architects-back-to-the-future.aspx
The Architect
82. Thank you. Questions? Visit us at http://fsug.org. Architecting Solutions for the Manycore Future. Talbott Crowell, ThirdM.com, http://talbottc.spaces.live.com, Twitter: @Talbott and @fsug