The document summarizes SDN forwarding abstractions and two related papers. It discusses how programmable data planes and flexible matching and actions in hardware can support SDN. The first paper introduces the Reconfigurable Match Table (RMT) model which aims to make hardware switches flexible while keeping costs low. The second paper introduces the P4 language for programming protocol-independent packet processors to configure parsers, tables, and packet processing on switches in a target-independent way.
1. SDN Forwarding Abstractions
• Programmable data plane
• “Forwarding Metamorphosis: Fast Programmable Match-Action Processing in
Hardware for SDN”, SIGCOMM’2013.
• “P4: Programming Protocol-Independent Packet Processors”, SIGCOMM
Computer Communications Review, July 2014.
2. Forwarding Metamorphosis
The problem addressed in this paper: current hardware switches are rigid;
what is the cost of making them flexible enough to support the OpenFlow
forwarding abstraction?
Conventional switch chips are inflexible
SDN demands flexibility… sounds expensive…
How do we do it: the Reconfigurable Match Table (RMT) switch model
Flexibility costs less than 15%
3. Flexible match-action
Single match table
All headers in one table
To support flexibility, the table must store every combination of header values.
Wasteful
Multiple match tables
Smaller tables, each matching a subset of headers, in a pipeline of stages
Stage j can be made to depend on stage i, for i < j.
Existing switches have a small number (4-8) of tables whose widths, depths,
and execution order are fixed.
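The waste can be seen with a quick back-of-the-envelope count (illustrative numbers only, reusing the 128k L2 / 16k L3 table sizes from the fixed-function switch example):

```python
# Back-of-the-envelope sketch (illustrative, not from the paper): a single
# flat table matching L2 destination AND L3 prefix together must hold every
# combination, while a two-stage pipeline holds each table separately.

l2_entries = 128_000   # L2 exact-match entries (as in the fixed-switch example)
l3_entries = 16_000    # L3 prefixes

single_table = l2_entries * l3_entries   # every (L2, L3) combination
pipeline = l2_entries + l3_entries       # one stage per table

print(f"single table: {single_table:,} entries")   # 2,048,000,000
print(f"pipeline:     {pipeline:,} entries")       # 144,000
```

The pipeline also lets unrelated tables be sized independently, which is exactly the resource-allocation flexibility OpenFlow asks for.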
4. A fixed function switch
[Pipeline diagram: Data In → Parser → Stage 1 → Stage 2 → Stage 3 → Queues → Deparser → Data Out]
• Stage 1 (L2): L2 table, 128k x 48, exact match; action: set L2D
• Stage 2 (L3): L3 table, 16k x 32, longest prefix match; action: set L2D, dec TTL
• Stage 3 (ACL): ACL table, 4k, ternary match; action: permit/deny
• Table widths, depths, and functions are fixed; no room for new protocols or stages.
5. What flexibility does OpenFlow need?
• Multiple stages of match-action
• Flexible resource allocation
• Flexible header fields
• Flexible actions
6. Other alternatives for SDN to get the flexibility?
• Software? 100x slower
• NPUs? 10x slower, expensive
• FPGAs? 10x slower, expensive
• Need to keep up with 1Tb/s speed!!
7. What is in this paper?
How to design a flexible switch chip?
What does the flexibility cost?
8. Reconfigurable Match Tables (RMT)
An ideal RMT should allow a set of pipeline stages, each with a match
table of arbitrary depth and width.
RMT goes beyond a fixed-function MMT (multiple match table) switch in four ways:
Field definitions can be changed and new fields can be added.
The number, topology, widths, and depths of match tables can be specified,
subject only to an overall resource limit on the number of matched bits.
New actions can be defined.
Arbitrarily modified packets can be placed in specified queue(s).
The configuration should be done by an SDN controller.
9. RMT model, and header field flexibility
• The parser specifies the header fields of interest and generates a
4kb header vector (each field is placed at a specific location in the vector).
• This is what allows changing header fields and adding new headers.
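A minimal sketch of the idea, assuming a hypothetical field layout (the real parser and layout are configured hardware, not Python):

```python
# Sketch: a programmable parser copies the fields of interest into fixed
# slots of a fixed-size header vector, which later match stages index
# directly. The layout table below is a hypothetical configuration.

HEADER_VECTOR_BYTES = 512  # 4 kb vector, as in the RMT model

# field -> (offset in vector, width), listed in packet order
layout = {"eth.dst": (0, 6), "eth.src": (6, 6), "eth.type": (12, 2)}

def parse(pkt: bytes) -> bytearray:
    """Copy each configured field into its slot of the header vector."""
    vec = bytearray(HEADER_VECTOR_BYTES)
    cursor = 0
    for name, (offset, width) in layout.items():
        vec[offset:offset + width] = pkt[cursor:cursor + width]
        cursor += width
    return vec
```

Supporting a new header then amounts to reconfiguring `layout`, with no change to the match stages that consume the vector.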
10. The RMT model: flexibility in the number,
topology, widths and depths of match tables
• More physical stages than logical stages. A logical stage can be
mapped to one or more physical stages.
• Logical stages are specified by a table graph.
11. The RMT model: flexibility in the number,
topology, widths and depths of match tables
• Restrictions for realizability
• Physical pipeline stages
• Match restriction: a fixed number of physical stages (32)
• Packet header limits: the packet header vector has a fixed size (4kb)
• Memory restriction: an identical number of entries per physical stage
• Action restrictions: limits on the number and complexity of instructions
• Not generic instructions; actions only modify headers.
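One way to picture the logical-to-physical mapping under these restrictions is a greedy first-fit placement. This is only an illustrative sketch with assumed capacities, not the paper's algorithm; tables are assumed to be listed in dependency order:

```python
# Sketch (assumption: greedy first-fit; the real compiler is smarter):
# logical tables are packed into a fixed number of physical stages, each
# with an identical memory budget, and a table must land in a later stage
# than every table it depends on.

N_STAGES = 32              # fixed physical pipeline length in RMT
STAGE_CAPACITY = 106_000   # hypothetical per-stage entry budget

def place(tables, deps):
    """tables: {name: size}, in topological order; deps: {name: prereqs}."""
    stage_of, used = {}, [0] * N_STAGES
    for name, size in tables.items():
        earliest = max((stage_of[d] + 1 for d in deps.get(name, ())), default=0)
        for s in range(earliest, N_STAGES):
            if used[s] + size <= STAGE_CAPACITY:
                stage_of[name], used[s] = s, used[s] + size
                break
        else:
            raise ValueError(f"cannot place table {name!r}")
    return stage_of
```

Small independent tables can share a physical stage, while a large logical table would spill across several; both follow from the same budget check.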
18. RMT Switch Design
• 64 x 10Gb ports
• 960M packets/second
• 1GHz pipeline
• Programmable parser
• 32 Match/action stages
• Huge TCAM: 10x current chips
• 64K TCAM words x 640b
• SRAM hash tables for exact matches
• 128K words x 640b
• 224 action processors per stage
• All OpenFlow statistics counters
19. Cost of Configurability:
Comparison with Conventional Switch
• Many functions identical: I/O, data buffer, queueing…
• Make extra functions optional: statistics
• Memory dominates area
• Compare memory area/bit and bit count
• RMT must use memory bits efficiently to compete on cost
• Techniques for flexibility
• Match stage unit RAM configurability
• Ingress/egress resource sharing
• Table predication allows multiple tables per stage
• Match memory overhead reduction
• Match memory multi-word packing
20. Chip Comparison with Fixed Function Switches
Area:
Section                       % of chip   Extra cost
IO, buffer, queue, CPU, etc   37.0%       0.0%
Match memory & logic          54.3%       8.0%
VLIW action engine            7.4%        5.5%
Parser + deparser             1.3%        0.7%
Total extra area cost                     14.2%
Power:
Section                       % of chip   Extra cost
I/O                           26.0%       0.0%
Memory leakage                43.7%       4.0%
Logic leakage                 7.3%        2.5%
RAM active                    2.7%        0.4%
TCAM active                   3.5%        0.0%
Logic active                  16.8%       5.5%
Total extra power cost                    12.4%
10/20/14 Software Defined Networking (COMS 6998-10)
21. Conclusion
• How do we design a flexible chip?
• The RMT switch model
• Bring processing close to the memories:
• pipeline of many stages
• Bring the processing to the wires:
• 224 action CPUs per stage
• How much does it cost?
• 15%
• Many details of how this is designed in 28nm CMOS are in the paper
Source: P. Bosshart, TI
22. SDN Forwarding Abstractions
• Programmable data plane
• “Forwarding Metamorphosis: Fast Programmable Match-Action Processing in
Hardware for SDN”, SIGCOMM’2013.
• “P4: Programming Protocol-Independent Packet Processors”, SIGCOMM
Computer Communications Review, July 2014.
23. In the Beginning…
• OpenFlow was simple
• A single rule table
• Priority, pattern, actions, counters, timeouts
• Matching on any of 12 fields, e.g.,
• MAC addresses
• IP addresses
• Transport protocol
• Transport port numbers
24. Over the Past Five Years…
Version Date # Headers
OF 1.0 Dec 2009 12
OF 1.1 Feb 2011 15
OF 1.2 Dec 2011 36
OF 1.3 Jun 2012 40
OF 1.4 Oct 2013 41
Proliferation of header fields
Multiple stages of heterogeneous tables
Still not enough (e.g., VXLAN, NVGRE, STT, …)
25. Future SDN Switches
• Configurable packet parser
• Not tied to a specific header format
• Flexible match+action tables
• Multiple tables (in series and/or parallel)
• Able to match on all defined fields
• General packet-processing primitives
• Copy, add, remove, and modify
• For both header fields and meta-data
26. We Can Do This!
• New generation of switch ASICs
• Intel FlexPipe: programmable parser
• RMT [SIGCOMM’13]
• Cisco Doppler
• But, programming these chips is hard
• Custom, vendor-specific interfaces
• Low-level, akin to microcode programming
27. We need a higher-level interface
To tell the programmable switch how we want it to behave
28. Three Goals for the language
• Protocol independence
• Configure a packet parser
• Define a set of typed match+action tables
• Target independence
• Program without knowledge of switch details
• Rely on compiler to configure the target switch
• Reconfigurability
• Change parsing and processing in the field
33. P4 Concepts
• A P4 program contains definitions of the following:
Headers: describe the sequence and structure of a series of fields (field
widths and constraints on field values).
Parsers: a parser definition specifies how to identify headers and valid
header sequences within packets.
Tables: match+action tables are the mechanism for performing packet
processing. A P4 program defines the fields on which a table may match and
the actions it may execute.
Actions: P4 supports the construction of complex actions from simpler
protocol-independent primitives. Actions are available within match+action
tables.
Control programs: the control program determines the order in which match+
action tables are applied to a packet, describing the flow of control
between match+action tables.
34. P4 language by example
• Data-center routing
• Top-of-rack switches
• Two tiers of core switches
• Source routing by ToR
• Hierarchical tag (mTag)
• Pushed by the ToR
• Four one-byte fields
• Two hops up, two down
[Topology diagram: two ToR switches connected through two tiers of core
switches; the mTag fields up1/up2 select the path up, down1/down2 the path down]
35. Header Formats
• Header
• Ordered list of fields
• A field has a name and width
header ethernet {
    fields {
        dst_addr : 48;
        src_addr : 48;
        ethertype : 16;
    }
}
header mTag {
    fields {
        up1 : 8;
        up2 : 8;
        down1 : 8;
        down2 : 8;
        ethertype : 16;
    }
}
header vlan {
    fields {
        pcp : 3;
        cfi : 1;
        vid : 12;
        ethertype : 16;
    }
}
36. Parser
• State machine traversing the packet
• Extracting field values as it goes
parser start {
    ethernet;
}
parser ethernet {
    switch(ethertype) {
        case 0x8100 : vlan;
        case 0x9100 : vlan;
        case 0x800 : ipv4;
        . . .
    }
}
parser vlan {
    switch(ethertype) {
        case 0xaaaa : mTag;
        case 0x800 : ipv4;
        . . .
    }
}
parser mTag {
    switch(ethertype) {
        case 0x800 : ipv4;
        . . .
    }
}
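The parser above is exactly a state machine. A sketch of an interpreter for it, with the transition tables transcribed from the P4 fragment and ipv4 treated as a terminal state for brevity:

```python
# State machine transcribed from the P4 parser fragment: each state maps
# the current header's ethertype to the next header type.
transitions = {
    "ethernet": {0x8100: "vlan", 0x9100: "vlan", 0x800: "ipv4"},
    "vlan":     {0xAAAA: "mTag", 0x800: "ipv4"},
    "mTag":     {0x800: "ipv4"},
}

def parse_headers(ethertypes):
    """Follow the ethertype chain from the start state; return headers seen."""
    state, seen = "ethernet", ["ethernet"]
    for et in ethertypes:
        nxt = transitions.get(state, {}).get(et)
        if nxt is None:
            break
        seen.append(nxt)
        state = nxt
    return seen

# e.g. an mTag-encapsulated IPv4 packet inside a VLAN:
print(parse_headers([0x8100, 0xAAAA, 0x800]))  # ['ethernet', 'vlan', 'mTag', 'ipv4']
```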
37. Match-action Tables
• Describe each packet-processing stage
• What fields are matched, and in what way
• What action functions are performed
• (Optionally) a hint about max number of rules
table mTag_table {
    reads {
        ethernet.dst_addr : exact;
        vlan.vid : exact;
    }
    actions {
        add_mTag;
    }
    max_size : 20000;
}
table local_switching {
    // reads destination and checks if it is local
    // on a miss, control proceeds to mTag_table
}
table egress_check {
    // verifies egress is resolved
    // does not retag packets received with a tag
    // reads egress and whether the packet was mTagged
}
39. Control Flow
• Flow of control from one table to the next
• Collection of functions, conditionals, and tables
• For a ToR switch:
[Control-flow diagram for a ToR switch: packets from the core (with mTag) and
from local hosts (no mTag) pass through Source Check Table → Local Switching
Table → (on miss: not local) → mTag Table → Egress Check]
40. Control Flow
• Flow of control from one table to the next
• Collection of functions, conditionals, and tables
• Simple imperative representation
control main() {
    table(source_check);
    if (!defined(metadata.ingress_error)) {
        table(local_switching);
        if (!defined(metadata.outport)) {
            table(mTag_table);
        }
        table(egress_check);
    }
}
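The control flow can be mimicked in plain code. The table behaviors below are stand-ins (in a real switch they come from rules installed by the controller), following the source_check / local_switching / mTag_table / egress_check steps described in the speaker notes:

```python
# Sketch of the ToR control program; table actions here are hard-coded
# stand-ins for controller-installed rules.
def run_tor_pipeline(pkt, local_hosts):
    meta = {}

    # source_check: an mTag arriving on a host-facing port is a fault
    if pkt.get("mTag") and not pkt.get("from_core"):
        meta["ingress_error"] = True

    if "ingress_error" not in meta:
        # local_switching: resolve the out port if the destination is local
        if pkt["eth_dst"] in local_hosts:
            meta["outport"] = local_hosts[pkt["eth_dst"]]
        if "outport" not in meta:
            # mTag_table: local_switching missed, push an mTag toward the core
            pkt["mTag"] = (1, 2, 3, 4)  # hypothetical up1/up2/down1/down2 values
            meta["outport"] = "core"
        # egress_check: verify that an out port was resolved
        assert "outport" in meta
    return pkt, meta
```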
42. P4 Compiler
• Parser
• Programmable parser: translate to state machine
• Fixed parser: verify the description is consistent
• Control program
• Target-independent: table graph of dependencies
• Target-dependent: mapping to switch resources
• Rule translation
• Verify that rules agree with the (logical) table types
• Translate the rules to the physical tables
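The target-independent table graph mentioned above can be derived from which fields each table reads and writes. A sketch of one such rule, match dependencies (the field sets used in the test are hypothetical):

```python
# Sketch: table B has a match dependency on an earlier table A when B
# matches a field that A's actions may write, forcing B into a later
# physical stage. Tables without dependencies can run in parallel.
def table_graph(reads, writes):
    """reads/writes: {table: set of fields}, tables listed in program order."""
    names = list(reads)
    deps = {t: set() for t in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if writes.get(a, set()) & reads.get(b, set()):
                deps[b].add(a)
    return deps
```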
44. Compiling control programs
• And then to a target-specific back end
control main() {
    table(source_check);
    if (!defined(metadata.ingress_error)) {
        table(local_switching);
        if (!defined(metadata.outport)) {
            table(mTag_table);
        }
        table(egress_check);
    }
}
45. Conclusion
• OpenFlow 1.x
• Vendor-agnostic API
• But, only for fixed-function switches
• An alternate future
• Protocol independence
• Target independence
• Reconfigurability in the field
• P4 language: a straw-man proposal
• To trigger discussion and debate
• Much, much more work to do!
Editor's Notes
OF 1.4 did not stop for lack of wanting more, but just to put on the brakes.
This is natural and a sign of success of OpenFlow:
enable a wider range of controller apps
expose more of the capabilities of the switch
E.g., adding support for MPLS, inter-table meta-data, ARP/ICMP, IPv6, etc.
New encap formats arising much faster than vendors spin new hardware
This may sound like a pipe dream, and certainly this won’t happen overnight.
But, there are promising signs of progress in this direction…
Offering this kind of functionality at reasonable cost…
Two modes: (i) configuration and (ii) populating
Compiler configures the parser, lays out the tables (cognizant of switch resources and capabilities), and translates the rules to map to the hardware tables
The compiler could run directly on the switch (or at least some backend portion of the compiler would do so)
Compiler uses this to
- determine what kind of memory (width, height), and what kind (TCAM, SRAM)
- reject rules that violate the type
The “add_mTag” is an “action function”.
In other kinds of tables, multiple different action functions may be allowed.
Push mTag on the packet
Copy the ethertype of the VLAN header to correctly identify the next level of header
Make the VLAN header point to the mTag
Set the four fields of the mTag
Ensure the packet goes to the right egress port
Source check (one rule per port): verify mTag only on ports to the core (actions: fault, strip and record metadata, pass)
Local switching: checks if the destination is local, and otherwise goes to the mTag table
mTag table: add the mTag
Egress check: has someone set the outport? Verify egress is resolved, do not retag packets received with a tag, read egress, etc.
This is a rich area, and so far we have only touched the surface…
Parser:
Programmable parser: translate parser description into a state machine
Fixed parser: verify that the parser description is consistent with the target processor
Control program
Table graph: dependencies on processing order, opportunities for parallelism
Mapping to switch resources: physical tables, types of memory, sizes, combining tables
Rule translation
Verify rules agree with the types
Translate rules into the physical tables