This document discusses the speedups and efficiency gains that dataflow architectures achieve over multicore and manycore systems for big data algorithms. It reports speedups of 20-200x and a 20-fold reduction in electricity costs, using hardware produced in Europe and software written by EU/WB programmers. Programming dataflow machines requires adapting algorithms to a different computational model and is more difficult than programming traditional architectures. Several application examples demonstrate the speedups, including geoscience, banking, modeling, and seismic imaging.
1. V. Milutinović, G. Rakocevic, S. Stojanović, and Z. Sustran
University of Belgrade
Oskar Mencer
Imperial College, London
Oliver Pell
Maxeler Technologies, London and Palo Alto
Michael Flynn
Stanford University, Palo Alto
Valentina E. Balas
Aurel Vlaicu University of Arad
2. For Big Data algorithms,
and for the same hardware price as before,
achieving:
a) a speedup of 20-200x
b) monthly electricity bills reduced 20 times
c) a size 20 times smaller
3. Absolutely all results achieved with:
a) all hardware produced in Europe,
specifically UK
b) all software generated by programmers
of EU and WB
4. ControlFlow (MultiFlow and ManyFlow):
Top500 ranks using Linpack (Japanese K,…)
DataFlow:
Coarse Grain (HEP) vs. Fine Grain (Maxeler)
5. Compiling below the machine code level brings speedups,
and also lower power, size, and cost.
The price to pay:
The machine is more difficult to program.
Consequently:
Ideal for WORM applications :)
Examples using Maxeler:
GeoPhysics (20-40), Banking (200-1000, with JP Morgan 20%),
M&C (New York City), Datamining (Google), …
11. $$t_{CPU} = N \cdot N_{OPS} \cdot C_{CPU} \cdot T_{clkCPU} / N_{coresCPU}$$
$$t_{GPU} = N \cdot N_{OPS} \cdot C_{GPU} \cdot T_{clkGPU} / N_{coresGPU}$$
$$t_{DF} = N_{OPS} \cdot C_{DF} \cdot T_{clkDF} + (N - 1) \cdot T_{clkDF} / N_{DF}$$
Assumptions:
1. Software includes enough parallelism to keep all cores busy
2. The only limiting factor is the number of cores.
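The three execution-time expressions on this slide can be evaluated numerically. The following is a minimal sketch, not the authors' code. The slide does not define its symbols, so the reading used here is an assumption: N is the number of data items, NOPS the operations per item, C the clock cycles per operation, Tclk the clock period, Ncores the core count, and NDF the number of parallel dataflow pipelines. The function names and the sample values are hypothetical.

```python
def control_flow_time(n, n_ops, cycles_per_op, t_clk, n_cores):
    """t = N * NOPS * C * Tclk / Ncores (same form for tCPU and tGPU)."""
    return n * n_ops * cycles_per_op * t_clk / n_cores


def dataflow_time(n, n_ops, cycles_per_op, t_clk, n_df):
    """t = NOPS * C_DF * Tclk_DF + (N - 1) * Tclk_DF / N_DF."""
    # First term: time until the first result leaves the pipeline.
    fill_latency = n_ops * cycles_per_op * t_clk
    # Second term: the remaining N - 1 results stream out over N_DF pipelines.
    streaming = (n - 1) * t_clk / n_df
    return fill_latency + streaming


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the formulas.
    n, n_ops = 1_000_000, 100
    t_cpu = control_flow_time(n, n_ops, cycles_per_op=1, t_clk=1 / 3e9, n_cores=8)
    t_df = dataflow_time(n, n_ops, cycles_per_op=1, t_clk=1 / 150e6, n_df=8)
    print(f"tCPU = {t_cpu:.6f} s, tDF = {t_df:.6f} s, ratio = {t_cpu / t_df:.1f}")
```

Under this reading, the dataflow time grows with N but not with N times NOPS, which is why a much slower clock can still win once the pipeline is full.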
19. Factor: 20
[Figure: MultiCore/ManyCore vs. DataFlow, with areas labeled Data Processing and Process Control]
20. MultiCore:
Explain what to do, to the driver
Caches, instruction buffers, and predictors needed
ManyCore:
Explain what to do, to many sub-drivers
Reduced caches and instruction buffers needed
DataFlow:
Make a field of processing gates: 1C+2nJava+3Java
No caches, etc. (300 students/year: BGD, BCN, LjU, ICL,…)
21. MultiCore:
Business as usual
ManyCore:
More difficult
DataFlow:
Much more difficult
Debugging both application and configuration code
22. MultiCore/ManyCore:
Several minutes
DataFlow:
Several hours for the real hardware
Fortunately, only several minutes for the simulator
The simulator supports
both the large JPMorgan machine
as well as the smallest “University Support” machine
Good news:
Tabula@2GHz
29. Revisiting the Top 500 SuperComputer Benchmarks
Our paper in Communications of the ACM
Revisiting all major Big Data DM algorithms
Massive static parallelism at low clock frequencies
Concurrency and communication
Concurrency between millions of tiny cores difficult,
“jitter” between cores will harm performance
at synchronization points
Reliability and fault tolerance
10-100x fewer nodes, failures much less often
Memory bandwidth and FLOP/byte ratio
Optimize data choreography, data movement,
and the algorithmic computation
30. Maxeler Hardware
CPUs plus DFEs: Intel Xeon CPU cores and up to 4 DFEs with 192GB of RAM
DFEs shared over Infiniband: Up to 8 DFEs with 384GB of RAM and dynamic allocation of DFEs to CPU servers
Low latency connectivity: Intel Xeon CPUs and 1-2 DFEs with up to six 10Gbit Ethernet connections
MaxWorkstation: Desktop development system
MaxCloud: On-demand scalable accelerated compute resource, hosted in London
31. Major Classes of Algorithms,
from the Computational Perspective
1. Coarse grained, stateful: Business
– CPU requires DFE for minutes or hours
2. Fine grained, transactional with shared database: DM
– CPU utilizes DFE for ms to s
– Many short computations, accessing common database data
3. Fine grained, stateless transactional: Science (FF)
– CPU requires DFE for ms to s
– Many short computations
32. Coarse Grained: Modeling
• Long runtime, but:
• Memory requirements change dramatically based on modelled frequency
• Number of DFEs allocated to a CPU process can be easily varied to increase available memory
• Streaming compression
• Boundary data exchanged over chassis MaxRing
[Figure: Timesteps (thousand), Domain points (billion), and Total computed points (trillion) vs. Peak Frequency (Hz)]
[Figure: Equivalent CPU cores vs. Number of MAX2 cards (1, 4, 8) for 15Hz, 30Hz, 45Hz, and 70Hz peak frequency]
33. Fine Grained, Shared Data: Monitoring
• DFE DRAM contains the database to be searched
• CPUs issue transactions find(x, db)
• Complex search function
– Text search against documents
– Shortest distance to coordinate (multi-dimensional)
– Smith Waterman sequence alignment for genomes
• Any CPU runs on any DFE
that has been loaded with the database
– MaxelerOS may add or remove DFEs
from the processing group to balance system demands
– New DFEs must be loaded with the search DB before use
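The transaction pattern on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not MaxelerOS code: the classes `DfeSketch` and `ProcessingGroup` and the round-robin dispatch policy are hypothetical, and the search function shown is the slide's "shortest distance to coordinate" example.

```python
import math
from itertools import cycle


class DfeSketch:
    """Hypothetical stand-in for one DFE; not a MaxelerOS API."""

    def __init__(self, name):
        self.name = name
        self.db = None  # the search database must be loaded before use

    def load(self, db):
        self.db = list(db)

    def find(self, x):
        """Return the database point closest to coordinate x."""
        if self.db is None:
            raise RuntimeError(f"{self.name}: database not loaded")
        return min(self.db, key=lambda p: math.dist(p, x))


class ProcessingGroup:
    """Round-robin dispatch over the DFEs currently in the group (assumed policy)."""

    def __init__(self, db):
        self.db = db
        self.dfes = []
        self._next = None

    def add(self, dfe):
        dfe.load(self.db)  # new DFEs are loaded with the search DB before joining
        self.dfes.append(dfe)
        self._next = cycle(self.dfes)

    def remove(self, dfe):
        self.dfes.remove(dfe)
        self._next = cycle(self.dfes)

    def find(self, x):
        if not self.dfes:
            raise RuntimeError("no DFE in the processing group")
        return next(self._next).find(x)
```

Because every DFE in the group holds the same database, any of them can serve any transaction, which is what lets the group grow or shrink without the callers noticing.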
34. Fine Grained, Stateless: The BSOP Control
• Analyse > 1,000,000 scenarios
• Many CPU processes run on many DFEs
– Each transaction executes on any DFE in the assigned group atomically
• ~50x MPC-X vs. multi-core x86 node
[Figure: CPU and DFE blocks. Labels: Market and instruments data; Loop over instruments; Random number generator and sampling of underliers; Price instruments using Black Scholes; Instrument values; Tail analysis on CPU]
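The stages named in this slide's diagram (sampling of underliers, Black Scholes pricing, tail analysis) can be sketched as follows. This is a minimal sketch under assumptions the slide does not state: the instrument is a European call, the underlier follows geometric Brownian motion, and the tail measure is a simple lower quantile. On the machine described here the per-scenario loop would run on the DFEs; the code below is ordinary Python with hypothetical function names.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call option."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def scenario_values(spot, strike, rate, vol, maturity, horizon, n_scenarios, seed=0):
    """Sample the underlier at the horizon, then reprice the option per scenario."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_scenarios):
        z = rng.gauss(0.0, 1.0)
        drift = (rate - 0.5 * vol**2) * horizon
        shocked = spot * math.exp(drift + vol * math.sqrt(horizon) * z)
        values.append(black_scholes_call(shocked, strike, rate, vol, maturity - horizon))
    return values


def tail_value(values, level=0.01):
    """Instrument value at the given lower-tail quantile."""
    ordered = sorted(values)
    return ordered[int(level * len(ordered))]
```

Each scenario is independent of the others, which matches the slide's description of stateless transactions that can execute on any DFE in the assigned group.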
37. The CRS Results
Performance of one MAX2 card vs. 1 CPU core
Land case (8 params), speedup of 230x
Marine case (6 params), speedup of 190x
[Figure panels: CPU Coherency; MAX2 Coherency]
38. Seismic Imaging
• Running on MaxNode servers
- 8 parallel compute pipelines per chip
- 150MHz => low power consumption!
- 30x faster than microprocessors
An Implementation of the Acoustic Wave Equation on FPGAs
T. Nemeth†, J. Stefani†, W. Liu†, R. Dimond‡, O. Pell‡, R. Ergas§
†Chevron, ‡Maxeler, §Formerly Chevron, SEG 2008
41. P. Marchetti et al, 2010
Trace Stacking: Speed-up 217
• DM for Monitoring and Control in Seismic processing
• Velocity independent / data driven method
to obtain a stack of traces, based on 8 parameters
– Search for every sample of each output trace
$$t_{hyp}^2 = \left(t_0 + \frac{2}{v_0}\,\mathbf{w}^T \mathbf{m}\right)^2 + \frac{2 t_0}{v_0}\left(\mathbf{m}^T H_{zy} K_N H_{zy}^T \mathbf{m} + \mathbf{h}^T H_{zy} K_{NIP} H_{zy}^T \mathbf{h}\right)$$
2 parameters (emergence angle & azimuth)
3 Normal wavefront parameters (KN,11; KN,12; KN,22)
3 NIP wavefront parameters (KNIP,11; KNIP,12; KNIP,22)
45. Conclusion: Nota Bene
This is about algorithmic changes,
to maximize
the algorithm to architecture match:
Data choreography,
process modifications,
and
decision precision.
The winning paradigm
of Big Data ExaScale?
46. The TriPeak
Siena
+ BSC
+ Imperial College
+ Maxeler
+ Belgrade
47. The TriPeak
MontBlanc = A ManyCore (NVidia) + a MultiCore (ARM)
Maxeler = A FineGrain DataFlow (FPGA)
How about a happy marriage?
MontBlanc (ompSS) and Maxeler (an accelerator)
In each happy marriage,
it is known who does what :)
The Big Data DM algorithms:
What part goes to MontBlanc and what to Maxeler?
48. Core of the Symbiotic Success
An intelligent DM algorithmic scheduler,
partially implemented for compile time,
and partially for run time.
At compile time:
Checking what part of code fits where
(MontBlanc or Maxeler): LoC 1M vs 2K vs 20K
At run time:
Rechecking the compile time decision,
based on the current data values.
49. Maxeler: Teaching (Google: prof vm)
VLSI, PowerPoints, Maxeler:
TEACHING,
Maxeler Veljko Explanations, August 2012
Maxeler Veljko Anegdotic,
Maxeler Oskar Talk, August 2012
Maxeler Forbes Article
Flyer by JP Morgan
Flyer by Maxeler HPC
Tutorial Slides by Sasha and Veljko: Practice (Current Update)
Paper, unconditionally accepted for Advances in Computers by Elsevier
Paper, unconditionally accepted for Communications of the ACM
Tutorial Slides by Oskar: Theory (7 parts)
Slides by Jacob, New York
Slides by Jacob, Alabama
Slides by Sasha: Practice (Current Update)
Maxeler in Meteorology
Maxeler in Mathematics
Examples generated in Belgrade and Worldwide
THE COURSE ALSO INCLUDES DARPA METHODOLOGY FOR MICROPROCESSOR DESIGN,
with an example
50. Maxeler: Research (Google: good method)
Structure of a Typical Research Paper: Scenario #1
[Comparison of Platforms for One Algorithm]
Curve A: MultiCore of approximately the same PurchasePrice
Curve B: ManyCore of approximately the same PurchasePrice
Curve C: Maxeler after a direct algorithm migration
Curve D: Maxeler after algorithmic improvements
Curve E: Maxeler after data choreography
Curve F: Maxeler after precision modifications
Structure of a Typical Research Paper: Scenario #2
[Ranking of Algorithms for One Application]
CurveSet A: Comparison of Algorithms on a MultiCore
CurveSet B: Comparison of Algorithms on a ManyCore
CurveSet C: Comparison on Maxeler, after a direct algorithm migration
CurveSet D: Comparison on Maxeler, after algorithmic improvements
CurveSet E: Comparison on Maxeler, after data choreography
CurveSet F: Comparison on Maxeler, after precision modifications
51. Maxeler: Topics (Google: HiPeac Berlin)
SRB (TR):
KG: Blood Flow
NS: Combinatorial Math
BG1: MiSANU Math
BG2: Meteos Meteorology
BG3: Physics (Gross Pitaevskii 3D real)
BG4: Physics (Gross Pitaevskii 3D imaginary)
(reusability with MPI/OpenMP vs effort to accelerate)
FP7 (Call 11):
University of Siena, Italy,
ICL, UK,
BSC, Spain,
QPLAN, Greece,
ETF, Serbia,
IJS, Slovenia, …