
Radical step in computer architecture

ARCCN

Boris Babayan's report for the ARCCN seminar on 20 October 2016

Nearly all basic radical steps in architecture were
made by our team before anyone else in the industry
• “Carry-save arithmetic” – one of the two basic technologies still in use for
the main arithmetic primitive operations
– my student’s work (1954), presented at a university conference (1955).
• The best possible definition and implementation of architecture functionality
in the Elbrus computer (1978), widely used in our country,
including:
– High-level programming architecture support (not just support of existing
HLLs corrupted by outdated architecture) – without parallel execution
functionality (HW of that time was not ready for it);
not implemented so far in any existing computer
– A real HLL, EL-76 (1976), for Elbrus computers
– A clean, best possible OS kernel (no privileged mode) supporting real high-
level programming
• The Elbrus architecture, whose main goal was a real HLL (EL-76), with the Elbrus OS
kernel as a byproduct, fully solved the security problem, including the possibility
of supporting correctness proofs of user programs.
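The carry-save technique named in the first bullet can be illustrated in a few lines: three addends are reduced to a separate sum word and carry word, so carries never ripple until a single conventional add at the end. This is a toy software sketch of the idea, not the hardware circuit; the function names are invented for the example.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three addends to a (sum, carry) pair without propagating carries."""
    s = a ^ b ^ c                         # bitwise sum, carries ignored
    carry = (a & b) | (a & c) | (b & c)   # majority: positions generating a carry
    return s, carry << 1                  # each carry enters the next bit position

def add_many(values):
    """Sum a list with a carry-save chain; one carry-propagating add at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c                          # single conventional addition

print(add_many([13, 7, 22, 100]))  # 142
```

In hardware the payoff is that each carry-save stage is a constant-depth circuit, so long chains of additions (as in multipliers) avoid the carry-propagation delay of an ordinary adder at every step.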
OUR RADICAL STEPS (first in industry)
(cont.)
• The very first in-industry implementation of an OOO superscalar (Elbrus 1, 1978)
and, even more important, at an early stage (after the second generation of
Elbrus computers in 1985), abandoning the superscalar approach, showing its weak
points and starting the search for a more robust solution to the parallel execution problem.
• Successful implementation of a cluster-based VLIW architecture with fine-grained
parallel execution (Elbrus 3, end of the 1990s), probably the first in the industry.
• Suggestion and the first implementation of Binary Translation (BT) technology for
designing a new architecture built on radically new principles but binary
compatible with the old ones (Elbrus 3, end of the 1990s).
• Design and simulation of radically new principles of fine-grained parallel
architecture, and extension of an HLL (like EL-76) and an OS (like the Elbrus OS kernel) to
support them.
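The binary translation idea in the bullets above can be pictured with a toy sketch (not the actual Elbrus BT system): instructions of a hypothetical two-register guest ISA are translated once into host closures, after which the translated code runs natively. All names here are invented for the illustration.

```python
# A made-up guest program for a hypothetical two-register machine.
GUEST_PROGRAM = [
    ("LOAD", "r0", 6),     # r0 = 6
    ("LOAD", "r1", 7),     # r1 = 7
    ("MUL",  "r0", "r1"),  # r0 *= r1
]

def translate(program):
    """Translate a guest instruction sequence into one host function."""
    ops = []
    for instr in program:
        if instr[0] == "LOAD":
            _, reg, imm = instr
            ops.append(lambda regs, r=reg, v=imm: regs.__setitem__(r, v))
        elif instr[0] == "MUL":
            _, dst, src = instr
            ops.append(lambda regs, d=dst, s=src:
                       regs.__setitem__(d, regs[d] * regs[s]))
    def translated(regs):
        for op in ops:       # translated code: straight host execution,
            op(regs)         # no per-instruction decode on the hot path
        return regs
    return translated

regs = translate(GUEST_PROGRAM)({"r0": 0, "r1": 0})
print(regs["r0"])  # 42
```

A real BT system adds caching of translated blocks, handling of self-modifying code, and precise exception reconstruction; the point of the sketch is only the split between one-time translation and repeated native execution.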
General computer system
structure
Drawbacks of current superscalar (SS)
• Program conversion in SS is rather complicated:
parallel algorithm → sequential binary → implicitly parallel inside SS → sequential at retirement.
• SS has a performance limit (independent of available HW).
• Inability to use all available HW properly.
• A funny situation exists with the SMT mechanism – using SMT instead of the algorithm’s natural parallelism.
• Rather complicated VECTOR HW and MULTI-THREAD programming.
• Current architecture has corrupted all of today’s HLLs.
• Current architecture does not support dynamic data typing and object-oriented data memory.
This excludes the possibility of supporting good security and debugging facilities.
• The current organization of computations does not allow good optimization:
the compiler has no full information about the algorithm and the HW (corrupted HLL);
the cache structure of today’s architecture hides its internal organization, preventing the compiler
from optimizing its operation well.
• Today’s architecture is far from being universal.
• Etc.
An extremely important point here is that
all the above-mentioned drawbacks (including the HLL and OS ones) have a single source –
current architecture inherited, as its basic principles, the principles of ancient, early-days
computing with its strong HW size constraints.
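The dynamic data typing mentioned among the drawbacks can be pictured as hardware-tagged words, in the spirit of the Elbrus tagged architecture: every cell carries a type tag next to its value, so the machine itself faults on ill-typed operations instead of producing garbage. The sketch below is only a software analogy with invented names, not the real mechanism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedWord:
    """A memory word that carries its type tag alongside its value."""
    tag: str      # e.g. "int", "float", "descriptor"
    value: object

def typed_add(a: TaggedWord, b: TaggedWord) -> TaggedWord:
    """Addition checked by the 'hardware': a tag mismatch is a fault."""
    if a.tag != b.tag or a.tag not in ("int", "float"):
        raise TypeError(f"tag mismatch: {a.tag} + {b.tag}")
    return TaggedWord(a.tag, a.value + b.value)

x = typed_add(TaggedWord("int", 2), TaggedWord("int", 3))
print(x.value)  # 5
```

With tags in memory, misuse of a pointer as an integer (or of data as code) is detectable at the hardware level, which is the basis of the security and debugging claims in the slides.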
EARLY DAYS’ COMPUTING
Main constraint – shortage of HW → a single execution unit (EU) and a small linear memory
The execution unit was un-improvable:
carry-save and high-radix arithmetic
Therefore, the whole architecture was un-improvable and universal
within those constraints
Basic architecture decisions:
single-Instruction-Pointer binary (SIP)
simple unstructured linear memory (LM)
no data type support (No DT)
The binary was a sequence (SIP) of instructions for the main resource – the single EU
The argument of an instruction was the address of another resource – a memory location (LM)
No data type support (No DT) – shortage of resources
All execution optimization was the programmer’s job; he knew the algorithm and the HW
resources well. At that time both the algorithms to be executed and the HW were rather
simple, so the programmer was able to do his job very well
The input binary consists of instructions on how to use resources,
rather than a description of the algorithm.
The design was the best possible under those constraints.

Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...DianaGray10
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...ISPMAIndia
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17Ana-Maria Mihalceanu
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Adrian Sanabria
 

Recently uploaded (20)

21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
AI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementAI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvement
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
 

Radical step in computer architecture

  • 1. Radical step in computer architecture Boris Babayan
  • 2. Nearly all basic radical steps in architecture were made by our team before anybody in industry
    • "Carry save arithmetic" – one of the two basic technologies still in use for the main arithmetic primitive operations
      – my student's work (1954), presented at a university conference (1955).
    • The best possible definition and implementation of architecture functionality in the Elbrus computer (1978), widely used in our country, including:
      – High-level programming architecture support (not just support of existing HLLs corrupted by outdated architectures) – without parallel execution functionality (the HW of that time was not ready for it); not implemented so far in any existing computer
      – A real HLL, EL-76 (1976), for Elbrus computers
      – A clean, best possible OS kernel (no privilege mode) supporting real high-level programming
    • The Elbrus architecture, whose main goal was a real HLL (EL-76), with the Elbrus OS kernel as a byproduct, fully solved the security problem, including the possibility of supporting correctness proofs of user programs.
  • 3. OUR RADICAL STEPS (first in industry) (cont.)
    • The very first in-industry implementation of an OOO superscalar (Elbrus 1, 1978) and, even more important, abandoning the superscalar approach at an early stage (after the second generation of Elbrus computers in 1985), having shown its weak points, and starting the search for a more robust solution to the parallel execution problem.
    • Successful implementation of a cluster-based VLIW architecture with fine-grained parallel execution (Elbrus 3, end of the 90s), probably for the first time in the industry.
    • Suggestion and the first implementation of Binary Translation (BT) technology for designing a new architecture built on radically new principles but binary compatible with the old ones (Elbrus 3, end of the 90s).
    • Design and simulation of radically new principles of fine-grained parallel architecture, and extension of the HLL (like EL-76) and OS (like the Elbrus OS kernel) to support them.
  • 5. Drawbacks of current superscalar (SS)
    • Program conversion in SS is rather complicated: parallel algorithm → sequential binary → implicitly parallel inside SS → sequential at retirement.
    • SS has a performance limit (independent of the available HW).
    • Inability to use all available HW properly.
    • An odd situation exists with the SMT mechanism: SMT is used instead of the natural parallelism of the algorithm.
    • Rather complicated VECTOR HW and MULTI-THREAD programming.
    • Current architecture has corrupted all of today's HLLs.
    • Current architecture does not support dynamic data typing and object-oriented data memory. This excludes the possibility of supporting good security and debugging facilities.
    • The current organization of computations does not allow good optimization. The compiler has no full information about the algorithm or the HW (corrupted HLL). The cache structure of today's architecture hides its internal organization, preventing the compiler from optimizing its operation well.
    • Today's architecture is far from universal.
    • Etc.
    An extremely important point here is that all the above-mentioned drawbacks (including those of HLLs and OSes) have a single source: current architecture inherits as its basic principles those of ancient, early-days computing with strong HW size constraints.
  • 6. EARLY DAYS' COMPUTING
    Main constraint – shortage of HW → a single execution unit (EU) and small linear memory.
    The execution unit was un-improvable: carry-save and high-radix arithmetic. Therefore, the whole architecture was un-improvable and universal under those constraints.
    Basic architecture decisions:
    • Single Instruction Pointer binary (SIP)
    • Simple unstructured linear memory (LM)
    • No data type support (No DT)
    The binary was the sequence (SIP) of instructions for the main resource – the single EU. The argument of an instruction was the address of another resource – a memory location (LM). There was no data type support (No DT) – shortage of resources.
    All execution optimization was the programmer's job, and he knew both the algorithm and the HW resources well. At that time both the algorithms to be executed and the HW were rather simple, so the programmer was able to do this job very well. The input binary included instructions on how to use resources, rather than a description of the algorithm. The design was the best possible for those constraints.
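The carry-save arithmetic mentioned above can be illustrated with a minimal sketch (our illustration, not the original hardware design): a 3:2 compressor step reduces three addends to a sum word and a carry word with no carry propagation at all, deferring the single slow carry-propagate add to the very end.

```python
def carry_save_add(a, b, c, width=32):
    """One carry-save step (a 3:2 compressor): reduce three addends to
    a sum word and a carry word without propagating any carries."""
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask                            # per-bit sum
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask  # per-bit carries, shifted
    return s, cy

# The true total is recovered with a single carry-propagate add at the end:
s, cy = carry_save_add(5, 7, 9)
assert s + cy == 5 + 7 + 9
```

In a hardware multiplier, many partial products are folded through a tree of such compressors, so only one full-length carry chain is ever needed.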
  • 7. SUPERSCALAR (SS)
    With SS the situation became different:
    • No HW size constraint.
    • The main constraint is the requirement of user-level compatibility with old computers (SIP, LM, no dynamic data types).
    • Program size, HW complexity, and the optimization job became very big.
    The many drawbacks of superscalar presented above can be split into two areas:
    • Bad functionality (semantics of data and operations). Without supporting dynamic data types in HW it is impossible to correct this drawback; it is impossible to support real high-level programming and full security.
  • 8. SUPERSCALAR (SS) (cont.)
    • Bad performance. In SS, optimization is performed by the programmer, the language compiler, and the HW.
    Programmer:
    • Optimization is now too complicated for him, and he does not know the complicated HW.
    • Due to the corrupted HLL he cannot specify the results of optimization correctly.
    Compiler: optimization is the right job for it (and for it only), but SS gives it no good conditions for that:
    • Due to the corrupted HLL the compiler has no full information about the algorithm.
    • The compiler is not local to the model – it does not have enough information about the model HW either, including the cache structure, which is hidden from the compiler for compatibility reasons.
    HW (BPU, prefetching, eviction): this is the wrong job for it. The HW has no algorithm information, and the HW structure is not adjusted to the algorithm structure ("artificial binding").
  • 9. BEST POSSIBLE COMPUTER SYSTEM
    The radical step toward the Best Possible System (BPS) should move the design to the strongly opposite extreme – from care about resources to care about algorithms.
    Two BPS systems will be discussed:
    • UNCONSTRAINED BPS, with the only constraints being the algorithm itself and the specific model's HW resource sizes.
    • CONSTRAINED BPS, with the previous constraints plus user-level compatibility with x86 (or ARM, etc.).
    All mechanisms designed for the unconstrained BPS are best possible and should be used as the basis of the constrained BPS; in addition, a few mechanisms should be added for compatibility support. For this, the following requirements should be satisfied by the language, compiler, and HW of the unconstrained BPS.
  • 10. New language for BPS
    The compiler should have full information about the algorithm. That means the algorithm should be presented in a new universal language not corrupted by old architectures. The programmer's job is to optimize the algorithm only, not its execution; his responsibility is only to give full information about the algorithm to the compiler.
    This language should have at least three important features:
    • Support for presenting fine-grained parallel algorithms (parallelism)
    • The right functionality (semantics) of its elements, including dynamic data typing and the capability feature
    • The possibility to present exhaustive information about the algorithm
    The second feature was completely implemented in the EL-76 language, used in several generations of computers in our country.
  • 11. COMPILER for BPS
    Only the compiler can and should do optimization in the BPS, but it needs the following good conditions for that:
    • It should have full information about the algorithm. The programmer should provide it using the new language.
    • It should have full information about the HW model. The compiler should be local to the HW model. The distributable binary should be just a simple recoding of the new HLL without any optimizations. The compiler will use some dynamic information from execution to tune optimization dynamically.
    • The structure of the HW elements should be suitable for good optimization control by the compiler (see next slide).
    A compiler local to the model removes compatibility requirements from the HW, because the local compiler receives the binary and, if needed for HW improvement, the binary format can be changed together with the compiler.
  • 12. HW requirements for BPS
    HW in the BPS should not do any optimizations (BPU, prefetching, eviction, etc.) – it cannot do them well enough, since it has no algorithm information and cannot do complex reasoning for analysis at run time. It should allocate resources according to the compiler's instructions.
    The main point here is that the HW structure should avoid "artificial binding" (AB) such as SIP, cache lines, vectors in AVX, full virtual pages, etc. The data structure in HW should not contradict that of the algorithm. The data in HW should be like a Lego set, allowing the compiler to restructure it for optimization. The BPS should use an Elbrus-like object-oriented memory structure.
  • 13. CONSTRAINED BPS
    All past architectures reached an un-improvable state for their constraints. This is true of current SS as well. Therefore, at least a relaxation of the current constraints, while retaining user-level ISA compatibility (x86, ARM, etc.), is an absolutely necessary condition to step forward and build the constrained BPS.
    We cannot change the semantics of the current ISA. The only possibility is to change the binary presentation by means of BT. So the only possible step forward for a constrained computer architecture is the use of a BT system.
    With BT, the constrained BPS will use all the mechanisms of the unconstrained BPS, adding three more mechanisms to support the basic compatibility requirements (SIP, LM):
    • Retirement
    • Check Point
    • Memory Lock Table
    Unfortunately, for semantic compatibility reasons the constrained BPS cannot support security and aggressive procedure-level parallelization.
  • 15. In a constrained architecture the functionality (semantics) of all its elements (data and operations) is strongly determined by compatibility requirements.
    In this section we present the main functional features of the unconstrained computer system and its elements, developed in accordance with the approach described above. The implementation of mechanisms suitable for both constrained and unconstrained systems will be the subject of the following sections.
    Primitive data types and operations. Besides the traditional ones (integer, FP, etc.) they include Data and Functional Descriptors – DD and FD – references to objects and procedures.
    DYNAMIC PRIMITIVE DATA TYPES. For primitive data, HW supports data types together with values dynamically (with TAGs).
    TYPE SAFETY APPROACH. All primitive operations check the types of their arguments.
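The tagged-data idea can be sketched as follows (a toy model of our own, not the Elbrus word encoding): every word carries a hardware tag next to its value, and every primitive operation checks the tags of its arguments before acting.

```python
# Hypothetical tag names for this sketch; the real tag set is richer.
INT, FP, DD, FD = "int", "fp", "data_desc", "func_desc"

def make_word(tag, value):
    # A "word" is a (tag, value) pair, as if the tag bits traveled
    # alongside the value through memory and registers.
    return (tag, value)

def add_int(x, y):
    # Type-safe integer add: hardware would trap on a tag mismatch.
    (tx, vx), (ty, vy) = x, y
    if tx != INT or ty != INT:
        raise TypeError("tag check failed")
    return make_word(INT, vx + vy)
```

Because the check happens on every primitive operation, a forged integer can never be used where a descriptor (DD or FD) is required, which is what the capability mechanism below relies on.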
  • 16. User-defined data types (objects) functionality
    "Natural" requirements for the mechanism of user-defined data types (objects) and their implementation:
    1) Every procedure can generate a new data object and receive a reference (DD) to this new object.
    2) This procedure, using the received reference, can do anything possible with the new object:
      – Read data from the object
      – Read it as a full constant only
      – Update any element
      – Delete the object
    3) No other procedure can access this object just after it has been generated, but this procedure can give a reference to the object to any procedure it knows (has a reference to), with all of the rights listed above or a decreased set.
    4) Any procedure can generate a copy of a reference to any object it knows, possibly with decreased rights.
    5) After the object has been deleted, nobody can access it (all existing references become obsolete).
    This "natural" description quite uniquely identifies a rather simple HW implementation with very high overall execution efficiency (compared with traditional systems).
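A minimal sketch of requirements 1)–5) (our illustration, not the Elbrus implementation): a data descriptor (DD) carries access rights, copies may only decrease those rights, and deleting the object makes every outstanding DD stale.

```python
class Heap:
    def __init__(self):
        self.objects = {}          # object id -> storage
        self.next_id = 0

    def new_object(self, size):
        # Requirement 1): creation yields a full-rights DD.
        oid, self.next_id = self.next_id, self.next_id + 1
        self.objects[oid] = [0] * size
        return {"oid": oid, "read": True, "write": True, "delete": True}

    def _check(self, dd, right):
        if not dd[right] or dd["oid"] not in self.objects:
            raise PermissionError("stale or right-restricted DD")

    def read(self, dd, i):
        self._check(dd, "read")
        return self.objects[dd["oid"]][i]

    def write(self, dd, i, v):
        self._check(dd, "write")
        self.objects[dd["oid"]][i] = v

    def restrict(self, dd, *drop):
        # Requirement 4): copy a reference with equal or decreased rights.
        nd = dict(dd)
        for r in drop:
            nd[r] = False
        return nd

    def delete(self, dd):
        # Requirement 5): after deletion every existing DD goes stale.
        self._check(dd, "delete")
        del self.objects[dd["oid"]]
```

The key property is that rights only ever shrink along a chain of copies, which is exactly what makes the "natural" description enforceable in hardware.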
  • 17. User-defined data types (cont.)
    An object can have a user-defined Object Type Name (OTN). The OTN is also primitive data, assigned to the object by its creator. Primitive HW operations check the types of their arguments, and a procedure can also check the type of any object it is working with.
    A compaction algorithm – an efficient solution of the dangling pointer problem (compared with the less efficient Garbage Collection, GC) – was developed in the Elbrus computer. It should be used in the unconstrained BPS. With this approach the user (similarly to existing systems) explicitly kills the no-longer-needed object, which (unlike GC) immediately frees the physical (but, unfortunately, not the virtual) memory. When virtual memory is close to overflow, a background compaction algorithm scans the whole memory sequentially, deleting DDs of killed objects and decrementing the virtual addresses of still-alive objects, which results in a compacted virtual memory and the possibility to reuse all virtual memory freed from killed objects.
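The compaction pass might look roughly like this (a hedged sketch of our own reading, not the real Elbrus algorithm): one sequential scan slides live objects down the virtual space and neutralizes descriptors that still point at killed objects.

```python
def compact(objects, descriptors):
    """objects: list of (virt_addr, size, alive) tuples;
    descriptors: list of dicts {"virt": start_addr} referencing objects."""
    shift, remap, survivors = 0, {}, []
    for virt, size, alive in sorted(objects):
        if alive:
            # Decrement the virtual address of a still-alive object.
            remap[virt] = virt - shift
            survivors.append((virt - shift, size, True))
        else:
            shift += size              # reclaim virtual space of a killed object
    for d in descriptors:
        if d["virt"] in remap:
            d["virt"] = remap[d["virt"]]
        else:
            d["stale"] = True          # dangling reference neutralized
    return survivors
```

Because descriptors of killed objects are invalidated in the same pass, the freed virtual range can be reused with no risk of a dangling pointer resolving to new data.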
  • 18. Procedure mechanism (user-defined operations)
    Here we also discuss the "natural" requirements for the procedure construct to support language-level functionality consistent with "abstract algorithm" ideas.
    1) Any procedure can define another procedure and designate any information accessible to the original procedure as global data for the new procedure. In a real running program the only thing needed to define the new procedure is to generate (there is a special instruction for this in the ISA) a Functional Descriptor (FD), which allows calling the new procedure.
    2) The procedure which generated this FD can give it to any procedure it has access to, and this new owner can also call the new procedure (call only, without access to its global data, executable code, etc., which can be used by the called procedure only).
    3) The procedure which generates the FD includes in it the virtual address of the code to be executed when the new procedure is called, and the virtual address of the global data object which can be used by the instructions of the called procedure. Therefore, both references are included in the FD (a reference to the code and a reference to the global data).
    4) Any procedure which has the FD of the new procedure can call it and pass it parameters. Parameter passing is logically an atomic step: the new procedure does not start (not a single instruction of the callee is executed) before the caller has specified all parameters, and the caller has no access to the parameters passed to the callee after the call is executed.
    5) The caller can receive return data as a result of the procedure execution; these data can be used by the caller's code. Return value passing is atomic as well.
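The FD mechanism behaves much like a closure; a minimal sketch (our illustration, with hypothetical names) pairs a code reference with a global-data reference, and holding the FD grants only the right to call:

```python
def make_fd(code, global_data):
    # Requirement 3): an FD holds both references - code and globals.
    return {"code": code, "globals": global_data}

def call(fd, *params):
    # Requirement 4): the callee sees only its context - its own
    # globals plus the parameters passed atomically by the caller.
    return fd["code"](fd["globals"], *params)

# Example: a counter whose state is reachable only through its FD.
counter = {"n": 0}
def bump(globals_, k):
    globals_["n"] += k
    return globals_["n"]

fd = make_fd(bump, counter)
assert call(fd, 5) == 5
assert call(fd, 2) == 7
```

A holder of `fd` can increment the counter but has no way to reach `counter` itself, which is the inter-procedure protection the slide describes.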
  • 19. Procedure mechanism (user-defined operations) (cont.)
    An extremely important notion for a procedure is the procedure context – the only set of data which the called procedure can use. The called procedure can use nothing besides its context. The procedure context includes:
    • Global data given to the procedure by its creator
    • Parameter data from the caller
    • All data returned to the procedure by the procedures it has called
    The restriction of a procedure to context-only access is the result of the HW architecture features:
    • Dynamic data types and type-safe primitive operations
    • Strong support of the semantics of references (DD and FD)
    This is the foundation of capability technology, which ensures strong inter-procedure protection. Implementing all these features in HW is a rather simple and efficient job.
  • 20. Full solution of the security problem
    Strong inter-procedure protection ensures that no attacker can corrupt the functioning of the system SW (if it has no internal mistakes) or the model HW. An attacker cannot access any system data, as a result of the capability feature, simply because the attacker will never have any references (DD or FD) to system data: nobody can send them to him, and he is unable to "create" them artificially. He is also unable to do anything harmful without real references to the system SW.
    However, many security problems today result from an attacker exploiting mistakes in the user programs he works with. Logically, the only remedy here is the well-developed technology of program correctness proof. Yet with today's architectures (x86, ARM, etc.) even a procedure without any mistakes can be corrupted by an attacker due to the imperfect old architecture. This is not the case with a capability system, where a correctness proof gives a reliable result.
    The presented approach fully solves the security problem. This technology was fully implemented in the Elbrus computer about 40 years ago. Unfortunately, nobody is even close to this solution even now.
  • 22. Object-oriented memory (OOM)
    OOM was designed and used in two generations of the Elbrus computer with good results. Unfortunately, at that time there was no requirement for a cache, but the design can now easily be extended to caches. The current Narch design was made on a traditional memory and cache structure; however, that memory structure does not correspond to the above philosophy.
    The OOM design can be used in full in the unconstrained BPS. Unfortunately, it cannot be used for the memory system of the constrained BPS (Narch) for compatibility reasons, but it can be used in its cache system. Even for the constrained BPS, according to preliminary estimations, the OOM structure can decrease cache sizes by up to 2-3 times and nearly eliminate performance losses due to cache misses.
  • 23. Object-oriented memory (OOM) implementation
    The organization of physical memory and of all cache levels is, in general, the same; the following description applies to all of them.
    The size of the physical memory allocated for an object is equal to the object size. However, each allocated object is also placed in the virtual space, which has fixed-size pages. For each new object, virtual space is allocated from the beginning of a new page. If the object is smaller than the page, the end of the page's virtual space is empty (not used). If the object is bigger than a virtual page, a number of pages are allocated for it and the last one can be partially used. One of the main results of this organization is that each page can contain data of one object only; no page can ever contain data of more than one object. All free space is explicitly visible to the HW and the compiler (no "artificial binding").
    In memory, as well as in the caches, an arbitrary physical part of an object can be allocated (by the compiler local to the model) in a specific cache. All physical space (of variable size), both in memory and at all cache levels, is allocated dynamically. Therefore the free space is highly fragmented, and it is very difficult, if possible at all, to allocate a rather big piece of an object. We split objects into pages to cope with this problem. At the cache level, however, even the page size is too big from this viewpoint; therefore, the parts of an object allocated at the cache levels are split by the compiler local to the model into even smaller parts (all parts of the same virtual page).
  • 24. Object-oriented memory (OOM) implementation (cont.)
    The system supports special lists of all free spaces. Each list keeps the free areas of a certain set of sizes (most likely, powers of 2). Each free area is linked into one of the bidirectional lists through the first word of the free piece.
    OOM actually uses virtual object numbers instead of virtual memory addresses. Therefore, in the case of an object spanning many pages, all its pages have the same virtual object number. Full identification of a specific element of an object includes the virtual object number and the element's index inside the object; the descriptor, however, includes the virtual object number only.
    In OOM an object need not necessarily be present in memory; some objects can be generated, for example, in the Level 1 cache only, or in other cache levels.
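The size-class free lists described above can be sketched like this (our illustration; the real structure links free areas through their first word, which we model with plain Python lists): free areas are bucketed by power-of-two size class, and allocation takes the smallest class that fits, returning any remainder.

```python
import math

class FreeLists:
    def __init__(self, max_order=12):
        # One list per power-of-two size class.
        self.lists = {o: [] for o in range(max_order + 1)}

    def _order(self, size):
        return max(0, math.ceil(math.log2(size)))

    def release(self, addr, size):
        self.lists[self._order(size)].append((addr, size))

    def allocate(self, size):
        # Search upward from the smallest class that can hold the request.
        for o in range(self._order(size), max(self.lists) + 1):
            if self.lists[o]:
                addr, free_size = self.lists[o].pop()
                if free_size > size:
                    self.release(addr + size, free_size - size)  # remainder
                return addr
        return None          # exhausted: this is where compaction would run
```

Bucketing by size class keeps both release and allocation close to constant time, at the cost of the fragmentation the slide discusses.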
  • 25. Object-oriented memory (OOM) implementation (cont.)
    This memory/cache organization allows stronger compiler control over execution. The compiler knows all the program's semantic information and can do more sophisticated optimization. The compiler can preload the needed data to a high cache level, at first without assigning the more valuable register memory, and move the data from cache to a register only at the last moment. But now even preloading directly into a register can sometimes be a good alternative – we now have a big register file.
    This cache organization also allows accessing the first-level cache directly from an instruction by physical address, without using a virtual address and associative search. To do this, the base register (BR) can support a special mode in which it holds pointers to the physical location in the first-level cache together with the virtual address.
  • 26. Procedure mechanism (implementation)
    In the past we used the "strands" approach for this implementation. While the strand approach is substantially better than superscalar, it still allows dramatic improvement.
    In the strand implementation each strand is a HW resource. The parallelism level of a dynamically executed program varies depending on the dynamic resource situation; therefore, execution should be able to dynamically fork a new strand, which requires a new resource. Typically, a deadlock avoidance problem has to be solved in such a situation, and its static solution decreases performance. This is less dangerous for loops, because loops can be executed nearly without stopping and forking strands, but it is not so good for scalar code.
    Here we will discuss a substantially more advanced suggestion, which is good for scalar code and increases performance for loops as well. It can also be used in the constrained BPS (Narch), improving the already declared performance data for Narch.
  • 27. Procedure mechanism (implementation) (cont.)
    In the new approach, the code to be executed is presented as a fine-grained parallel graph with instructions in its nodes and dependencies presented by the arcs of the graph. The compiler splits this graph into a number of streams, similar to the strands in the current implementation. Instead of the frontend of the current design, the new approach has only a code buffer for the whole graph (not for separate streams).
    Four basic technologies are used here:
    • Register allocation, which is not so trivial in the case of fine-grained dynamic code execution
    • Speculative execution (control and data speculation) – the same as in Narch today
    • Dynamic execution of the parallel instruction graph by "workers"
    • Instruction graph loading into the instruction buffer
  • 28. Register allocation: DL/CL technology
    The scalar code (streams) graph can be crossed by both DL and CL lines. The code can have several DLs and CLs, each with a corresponding number – DLn and CLn. All instructions which cross a DL or CL include this information, so the HW knows when a specific line has been crossed.
    When some DLn has been crossed, it means that some register WEBs are already free (all reads and writes are finished) and can be reused. The registers freed at DLn can be used by the compiler in instructions after the corresponding CLn; therefore, the corresponding CLn can also be crossed by the corresponding streams. If some instruction marked with CLn is being executed and the corresponding DLn has not been crossed yet, this instruction waits until that happens. The program will execute correctly, but the execution time can be improved: dynamic feedback (in HW) collects information on whether any CL was waiting, and using this information the compiler can later recompile the procedure, lifting the corresponding DL a little. Eventually the program will run without any time lost waiting at CLs.
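A toy model of the DL/CL handshake (our sketch of the protocol as described, not the real HW): crossing DLn marks a set of register webs free; an instruction tagged CLn may issue only after DLn has been crossed, and stalls are counted as the dynamic feedback the compiler uses for recompilation.

```python
class DLCL:
    def __init__(self):
        self.crossed = set()   # DL numbers already crossed
        self.cl_waits = 0      # dynamic feedback for the compiler

    def cross_dl(self, n):
        # Crossing DLn: the registers freed at this line are reusable.
        self.crossed.add(n)

    def try_cl(self, n):
        # True: the CLn-tagged instruction may issue into the freed
        # registers; False: it must wait, and the stall is recorded.
        if n in self.crossed:
            return True
        self.cl_waits += 1
        return False
```

A nonzero `cl_waits` after a run is the signal for the compiler to lift the corresponding DL earlier in the next recompilation.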
• 29. Speculative execution (control and data) Branch execution in the new approach is similar to the previous one. The BT compiler (in the constrained version) or the high level language compiler (in the unconstrained version) generates a fine grained parallel binary for the HW. In a superscalar with BPU technology, all branches are critical and need predicted speculative execution, with performance losses on every misprediction. In our case, due to explicitly parallel execution, according to our statistics 80% of branches are not critical and can be executed without speculation. Even among critical branches, when the predicate is known well ahead, or has a very strong prediction by the compiler, there is no need for speculation. Critical branches with a late predicate and poor compiler prediction should execute both alternatives speculatively until the predicate is known. As a result, in our case we have no performance losses on branches at all. The situation with data speculation is similar.
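The "execute both alternatives" case can be sketched as eager evaluation of both arms with a late select; in HW the arms would run in parallel, while the Python below runs them sequentially purely for illustration (function names are hypothetical):

```python
# Hypothetical sketch of speculative execution of both branch
# alternatives: both arms are computed eagerly, and the late predicate
# only selects the result, so no misprediction flush can ever occur.
def eager_branch(then_fn, else_fn, predicate_fn):
    t = then_fn()          # in HW both alternatives execute in parallel;
    e = else_fn()          # here they run sequentially for illustration
    return t if predicate_fn() else e

# Example: abs(x) computed without waiting for the predicate
x = -7
print(eager_branch(lambda: -x, lambda: x, lambda: x < 0))  # 7
```

The 80% of non-critical branches on the slide would skip even this: their outcome does not gate any critical-path instruction, so they execute non-speculatively.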
• 30. Dynamic execution of the parallel instruction graph by “workers” For the constrained architecture, the compiler does all decoding itself instead of the HW; therefore, each instruction in the code is ready to be loaded into the corresponding execution unit. In the unconstrained case, each instruction also needs no decoding. For each instruction, the compiler calculates a “Priority Value Number” (PVN): the number of clocks from this instruction to the end of the scalar code along the longest path. The compiler presents the code as a number of dependent instruction sequences – “streams” (similar to the strands in the previous design). In this architecture, from the very beginning the processor executes not “single instruction pointer” sequential code, but the whole graph of the algorithm – all streams, with the explicitly parallel structure visible to the HW. To make this possible, the processor includes a code buffer besides the register file. The new technology removes the frontend from the HW entirely; there are many other advantages to this step as well. Since the code is executed in fine grained parallel mode, each register has an EMPTY/FULL (E/F) bit to prevent reading from an empty register and to make the reading instruction wait until the result is assigned.
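The PVN defined above is a longest-path length in the instruction DAG and can be computed by a memoized walk over successors; the graph below is a made-up example, not taken from the slides:

```python
# Sketch of the "Priority Value Number" (PVN) computation as defined on
# the slide: PVN(i) = clocks from instruction i to the end of the scalar
# code along the longest path.  Computed bottom-up with memoization.
from functools import lru_cache

# instruction -> (latency in clocks, list of successor instructions)
graph = {
    "a": (1, ["c"]),
    "b": (2, ["c"]),
    "c": (3, ["d"]),
    "d": (1, []),
}

@lru_cache(maxsize=None)
def pvn(instr):
    lat, succs = graph[instr]
    return lat + max((pvn(s) for s in succs), default=0)

# Workers would prefer ready instructions with the highest PVN,
# i.e. those on the critical path of the remaining code.
print(pvn("a"))  # 1 + 3 + 1 = 5
print(pvn("b"))  # 2 + 3 + 1 = 6
```

Since "b" has the longer remaining path, a worker would issue it ahead of "a" when both are ready.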
• 31. Dynamic execution of the parallel instruction graph by “workers” (cont.) Our engine has a number of “workers” in each cluster, whose job is to take the next instructions from the most important streams and to allocate them to the corresponding execution units. The number of workers in each cluster should be enough to keep all execution units busy every clock; our preliminary guess is that each cluster should have about 16 workers. A worker loads into the Reservation Station (RS) a candidate instruction that is ready to be executed: all argument registers are FULL (or the instruction that should generate the value has already been sent into the RS – this requires yet another bit, the RS bit, in each register) and the destination is EMPTY. Besides the E/F and RS bits, each register has (one byte) the head of the list of streams waiting for the result to be written into this register from some other stream. If at least one argument of the next instruction to be allocated is not ready, the worker stops working with this stream and puts the stream into the one-directional list of one of the not-ready registers. This requires two register assignments, which can be done in parallel; at this point the worker is free anyway and searches for any other stream ready to be handled.
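The E/F bit plus per-register wait list can be sketched as follows; this is a simplified software model (no RS bit, invented names), not the HW design itself:

```python
# Hypothetical toy model of a "worker": it tries to issue the head
# instruction of a stream; if an argument register is EMPTY, the stream
# is parked on that register's wait list and the worker moves on.
class Reg:
    def __init__(self):
        self.full = False          # the E/F bit
        self.value = None
        self.waiters = []          # streams parked on this register

regs = {"r1": Reg(), "r2": Reg()}

def try_issue(stream_id, args):
    for r in args:
        if not regs[r].full:
            regs[r].waiters.append(stream_id)   # park the stream
            return False
    return True                                  # ready: send to RS

def write_back(r, value):
    regs[r].full, regs[r].value = True, value
    woken, regs[r].waiters = regs[r].waiters, []
    return woken                                 # streams to rescan

print(try_issue("s0", ["r1"]))   # False: r1 still EMPTY, s0 parked
print(write_back("r1", 42))      # ['s0'] woken by the arriving result
print(try_issue("s0", ["r1"]))   # True: r1 is FULL now
```

Parking the stream on the not-ready register means no worker ever polls a stalled stream: the write-back of the result is what puts it back into circulation.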
• 32. Loading of the instruction graph into the instruction buffer The DL/CL technology also helps to solve the big-code problem. The code buffer needs an extension mechanism: when the code before CLn has been executed, it is necessary to load the next part of the code, between CLn and CLn+k. Similarly, when DLn is crossed, the whole code area above it can be freed. The size of the code between CLn and CLn+k is no bigger than the size of the register file.
• 33. Example: Structure of Recurrent Loop Dependencies  Use loop iteration parallelism (both intra-iteration and inter-iteration) as fully as possible.  Loop iteration analysis is performed by the compiler: – Find instructions which are self-dependent over iterations. – Find the groups of instructions which, being self-dependent, are also mutually dependent over the iterations (“rings” of data dependency). – The rest of the instructions form sequences, or graphs of dependent instructions (a number of “rows”). – The result of each row is either an output of the iteration (a STORE, for example), or is used by other row(s) or ring(s).  Each “ring” and/or “row” loop produces data which are consumed by other small loops. Each producer can have a number of consumers; however, producer and consumer should be connected through a buffer, giving the producer the possibility to go forward if the consumer is not yet ready to use the data.
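The "rings" described above correspond to the strongly connected components of the iteration dependence graph (loop-carried edges close the cycles); one standard way to find them is Tarjan's SCC algorithm. This is my reading of the slide, sketched with an invented example:

```python
# Sketch: the "rings" of data dependency are the strongly connected
# components of the dependence graph of one loop iteration, where
# loop-carried edges (including self-loops) close the cycles.
def sccs(graph):
    index, low, on_stack, stack, out = {}, {}, set(), [], []
    def dfs(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                dfs(w); low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v: break
            out.append(sorted(comp))
    for v in graph:
        if v not in index:
            dfs(v)
    return out

# Dependence graph of "s = s + a[i]":  the accumulator is a ring
# (self-dependent over iterations), the load and store are rows.
deps = {"load": [], "add": ["load", "add"], "store": ["add"]}
rings = [c for c in sccs(deps) if len(c) > 1 or c[0] in deps[c[0]]]
print(rings)  # [['add']]
```

Each ring then runs as its own small loop, feeding its consumers through the producer/consumer buffers mentioned on the slide.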
• 34. Basic components of computer technology, their current state and our involvement in their implementation (block diagram): 1. Primitive data types and operations introduction 2.1 User defined data types functionality – Objects introduction 2.2 User defined operations functionality – Procedure introduction 2.2.1 Intra & inter proc parallelism 3. “New” HLL introduction 3.0.1 Parallelism 4.1 User structural data architecture support – object oriented memory implementation 4.1.1 To be extended to cache 4.2 User “operations” – procedure implementation 4.2.1 Intra (fine grained) & inter procedure execution parallelism architecture implementation 5. “New” OS kernel introduction
• 35. The green parts of computer technology were fully implemented by our (Elbrus) team in a real design (1978) before anybody else in the technology. The yellow parts require moderate extensions of some of the green technologies to support fine grained parallelism. The red part is the introduction of intra (fine grained) & inter procedure parallelism; all basic decisions are well developed and need to be implemented in a real design.
• 36. The block diagram above includes all basic parts of computer technology and indicates their current states. 1. Introduction of primitive data types and operations The implementation of arithmetic is highlighted in green – over 60 years ago this implementation reached an un-improvable state.  Carry save algorithm – my student’s work in 1954, university presentation in 1955; the first western publication in 1956.  High radix arithmetic – James E. Robertson, mid 50s; I had a meeting with him in Moscow in 1958. 2.x Introduction of the functionality of user defined data types (Objects) & operations (Procedures) This functionality must be defined with the main, and maybe the only, basic goal: • To fully correspond to the natural meaning of these notions, without corruption by trying to pursue optimization, security or other goals. • If this job is not constrained by any compatibility requirements (especially with early days’ architectures), this approach ensures the best possible byproduct results for all these goals. This problem was fully solved in the Elbrus architecture (1978) and showed outstanding results in two generations of computers widely used in our country. Though it is difficult to prove this theoretically, it is rather evident that this approach is the best possible, simply because the above goal (the natural meaning of the functional elements) has only one solution.
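The carry save algorithm mentioned above reduces three addends to two without propagating carries, which is what makes it a basic primitive for fast multipliers; a minimal sketch:

```python
# Minimal sketch of carry-save addition: three numbers are reduced to
# two (partial sum + shifted carries) with no carry propagation, so many
# operands can be summed in constant-depth stages; only the final
# two-input add has to propagate carries.
def carry_save_add(a, b, c):
    partial_sum = a ^ b ^ c                       # bitwise sum, no carries
    carries = ((a & b) | (a & c) | (b & c)) << 1  # carry bits, shifted left
    return partial_sum, carries

s, c = carry_save_add(13, 7, 9)
assert s + c == 13 + 7 + 9      # the pair represents the true sum
print(s + c)  # 29
```

In a multiplier, a tree of such stages compresses all partial products before a single carry-propagating adder finishes the result.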
• 37. 2.2.1 Intra & inter proc parallelism Procedure definitions should be extended with intra (fine grained) & inter procedure parallel execution semantics. It was not possible to implement this in the Elbrus era, because the HW was unable to support it. This is part of the work to be done on the parallel architecture implementation; all basic approaches have already been suggested in our team. 3. “New” HLL introduction 3.0.1 HLL parallelism extension We have already implemented a new language for such a design in Elbrus (EL – 76). According to the declared general design principle, this language should be (and is) a language with dynamic data types and a type safety approach; it should be extended with parallel semantics. 4.1 Object oriented memory implementation Unlike superscalar memory and cache organization, object oriented memory allows the compiler to do efficient optimizations local to the model. Object oriented memory is fully implemented in Elbrus.
• 38. 4.1.1 To be extended to cache. In the Elbrus era there was no need to use caches. All suggestions in this area have already been made. 4.2 Procedure implementation For an advanced architecture, the procedure is a highly important feature. Elbrus made a very clean functional implementation of the procedure; the basic result is highly modular programming with strong inter-procedure protection. This is also a clean and best possible implementation. The main design step to be done here is its extension for intra & inter procedure parallelism support. 4.2.1 Intra (fine grained) & inter procedure execution parallelism implementation These are the main design efforts for finishing the design of the best possible architecture. Only about 10 years of progress in silicon technology were required to make it possible to implement a radically parallel architecture. Our team has reached this point with big past experience in this area: the industry-first real OOO superscalar (Elbrus 1, 2) – 1978; even more important, we found out that it is not the best approach and got rid of it after the second generation (Elbrus 2) – 1985; VLIW (Elbrus 3) with the first successful cluster – ~2000; strands (already at Intel) – 2007–2013; clean loop implementation based on strands – 2007–2013. All these approaches, while reaching good results, are not the best possible (including strands). Now we have suggested a radical improvement, close to Data Flow, both for scalar code and for loops (also, it seems, for the first time in industry).
• 39. 5. “New” OS kernel introduction Elbrus 1, 2 are the first and the best possible full implementation of this technology. Due to its basic principles, Elbrus did not need to use privileged mode programming even in the OS kernel. An OS kernel implementation with the same functionality is about four times simpler (smaller in size) compared with today’s OSs and can be implemented in application mode only.
• 40. Results • Elbrus, Narch and Narch+ were made strictly according to the approach presented in this paper, and the results are impressive. They are the results of the work and application of the widely used Elbrus 1, 2, 3 architectures and of detailed simulation of the future design. • This approach allows the implementation of an architecture unconstrained by any compatibility restriction (Narch+), or compatible with one of the existing architectures – x86, ARM, POWER, etc. – or even with all of them together in one HW model with BT (Narch).
• 41. Main results over the most powerful Intel processors: Narch • Extremely high performance both in single job (ST) and MT applications – unreachable for any existing architecture, and maybe reaching an absolutely un-improvable level – already shown on detailed simulation, before the introduction of all performance mechanisms: 2x+ on ST, 2x on MT with the same area; after finishing debugging: 3x–4x on ST, 2.5x–3x on MT with the same area. • Substantially better power efficiency and less area with the same performance: 20%–30% power efficiency, 60% area. • Much simpler architecture design. • Un-improvable by any current architecture, while fully compatible with x86, ARM or any other current architecture.
• 42. Main results over the most powerful Intel processors: Narch+ • Performance is many tens of times higher, both for ST and for MT. • Extremely simple and power efficient. • Substantially simpler and more reliable SW debugging (according to the Elbrus experience – by 10 times). • Full solution of the security problem for HW, OS and user programs (with correctness proof) – all attackers will be jobless. • Really universal, which is a rather important feature: no architecture after the very first vacuum tube computers has had this characteristic. It is very likely that after the introduction of Narch+ (if this happens), it will not be necessary to design a myriad of specialized architectures for graphics, computer vision, machine learning and so on. Narch+ will be an absolutely un-improvable architecture nearly from the very first design.