This document discusses simulating a CPU using Python. It describes the Epiphany manycore chip architecture, which has 16-64 simple cores that each have 32kB of RAM. The Revelation simulator of the Epiphany was written in RPython (statically typed Python) and has around 1,000 lines of code. It uses the Pydgin framework to generate instruction set simulators from simple architecture descriptions. The document outlines how to test the simulator by writing unit tests against individual instructions and integration tests against compiled ELF files, and compares traces with an existing reference simulator to check for differences.
This ppt show concept of Data Link Access, BSD Packet Filter, DLPI, Linux SOCK_PACKET, libpcap–Packet capture Library, libnet: Packet Creation and Injection Library
AI邊緣運算實作: TensorFlow Lite for MCU
https://bit.ly/3j2fIIt
[1]python程式設計
https://bit.ly/359cz4m
[2]AI機器學習&深度學習
http://bit.ly/2KDZZz4
[3]TensorFlow Lite for MCU
https://bit.ly/3j2fIIt
This talk is all about the Berkeley Packet Filters (BPF) and their uses in Linux.
Agenda:
* What is a BPF and why do we need it?
* Writing custom BPFs
* Notes on BPF implementation in the kernel
* Usage examples: SOCKET_FILTER & seccomp
Speaker:
Kfir Gollan, senior embedded software developer, Linux kernel hacker and software team leader.
This ppt show concept of Data Link Access, BSD Packet Filter, DLPI, Linux SOCK_PACKET, libpcap–Packet capture Library, libnet: Packet Creation and Injection Library
AI邊緣運算實作: TensorFlow Lite for MCU
https://bit.ly/3j2fIIt
[1]python程式設計
https://bit.ly/359cz4m
[2]AI機器學習&深度學習
http://bit.ly/2KDZZz4
[3]TensorFlow Lite for MCU
https://bit.ly/3j2fIIt
This talk is all about the Berkeley Packet Filters (BPF) and their uses in Linux.
Agenda:
* What is a BPF and why do we need it?
* Writing custom BPFs
* Notes on BPF implementation in the kernel
* Usage examples: SOCKET_FILTER & seccomp
Speaker:
Kfir Gollan, senior embedded software developer, Linux kernel hacker and software team leader.
An Open Discussion of RISC-V BitManip, trends, and comparisons _ ClaireRISC-V International
Join RISC-V BitManip industry leader Claire Xenia Wolf and Dr. James Cuff for an open and lively discussion with an interactive Q&A on RISC-V and BitManip including trends and comparisons with the existing architecture landscape including x86 and ARM and what specifically makes RISC-V unique.
Presentation done by Rokas Antanas Balevičius in .NET Crowd event on 2019 May.
Full material with source code could be found here:
https://github.com/RokasBalevicius/netcrowd-multithreading-your-way-out
Kernel Recipes 2014 - x86 instruction encoding and the nasty hacks we do in t...Anne Nicolas
I have always wanted to understand x86 instruction encoding in detail but never gotten around to it. Of course not, who has time nowadays?! So, in order to force me to do it, I decided to write an x86 instruction decoder.
This talk attempts to show what I have learned in the process and how instruction encoding is done on x86.
As a practical aspect, the decoder I’ve scratched together tries to verbosely show some of the crazy low-level hacks^Wtechniques we do in the Linux kernel like alternatives patching, jump labels, exception tables, etc – they have a lot to do with deep knowledge of x86 instructions and how code is generally laid out in the binary kernel image. Maybe this talk can help shed some light on the whole lowlevel fun that’s happening under the hood in the kernel and so many are missing out on. And maybe it’ll make it more interesting and palatable to people and they wont scare so fast anymore when we go deep into the bowels of the kernel and the machine.
Borislav Petkov, SUSE
Imagine you're tackling one of these evasive performance issues in the field, and your go-to monitoring checklist doesn't seem to cut it. There are plenty of suspects, but they are moving around rapidly and you need more logs, more data, more in-depth information to make a diagnosis. Maybe you've heard about DTrace, or even used it, and are yearning for a similar toolkit, which can plug dynamic tracing into a system that wasn't prepared or instrumented in any way.
Hopefully, you won't have to yearn for a lot longer. eBPF (extended Berkeley Packet Filters) is a kernel technology that enables a plethora of diagnostic scenarios by introducing dynamic, safe, low-overhead, efficient programs that run in the context of your live kernel. Sure, BPF programs can attach to sockets; but more interestingly, they can attach to kprobes and uprobes, static kernel tracepoints, and even user-mode static probes. And modern BPF programs have access to a wide set of instructions and data structures, which means you can collect valuable information and analyze it on-the-fly, without spilling it to huge files and reading them from user space.
In this talk, we will introduce BCC, the BPF Compiler Collection, which is an open set of tools and libraries for dynamic tracing on Linux. Some tools are easy and ready to use, such as execsnoop, fileslower, and memleak. Other tools such as trace and argdist require more sophistication and can be used as a Swiss Army knife for a variety of scenarios. We will spend most of the time demonstrating the power of modern dynamic tracing -- from memory leaks to static probes in Ruby, Node, and Java programs, from slow file I/O to monitoring network traffic. Finally, we will discuss building our own tools using the Python and Lua bindings to BCC, and its LLVM backend.
A short introduction on how functions work. Functions are the building blocks of any modern programming language. This tutorial shows you how functions are implemented and how the process stack plays an important role in supporting functions.
eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and provides a powerful aplatform to solve any Linux performance problem.
An Open Discussion of RISC-V BitManip, trends, and comparisons _ ClaireRISC-V International
Join RISC-V BitManip industry leader Claire Xenia Wolf and Dr. James Cuff for an open and lively discussion with an interactive Q&A on RISC-V and BitManip including trends and comparisons with the existing architecture landscape including x86 and ARM and what specifically makes RISC-V unique.
Presentation done by Rokas Antanas Balevičius in .NET Crowd event on 2019 May.
Full material with source code could be found here:
https://github.com/RokasBalevicius/netcrowd-multithreading-your-way-out
Kernel Recipes 2014 - x86 instruction encoding and the nasty hacks we do in t...Anne Nicolas
I have always wanted to understand x86 instruction encoding in detail but never gotten around to it. Of course not, who has time nowadays?! So, in order to force me to do it, I decided to write an x86 instruction decoder.
This talk attempts to show what I have learned in the process and how instruction encoding is done on x86.
As a practical aspect, the decoder I’ve scratched together tries to verbosely show some of the crazy low-level hacks^Wtechniques we do in the Linux kernel like alternatives patching, jump labels, exception tables, etc – they have a lot to do with deep knowledge of x86 instructions and how code is generally laid out in the binary kernel image. Maybe this talk can help shed some light on the whole lowlevel fun that’s happening under the hood in the kernel and so many are missing out on. And maybe it’ll make it more interesting and palatable to people and they wont scare so fast anymore when we go deep into the bowels of the kernel and the machine.
Borislav Petkov, SUSE
Imagine you're tackling one of these evasive performance issues in the field, and your go-to monitoring checklist doesn't seem to cut it. There are plenty of suspects, but they are moving around rapidly and you need more logs, more data, more in-depth information to make a diagnosis. Maybe you've heard about DTrace, or even used it, and are yearning for a similar toolkit, which can plug dynamic tracing into a system that wasn't prepared or instrumented in any way.
Hopefully, you won't have to yearn for a lot longer. eBPF (extended Berkeley Packet Filters) is a kernel technology that enables a plethora of diagnostic scenarios by introducing dynamic, safe, low-overhead, efficient programs that run in the context of your live kernel. Sure, BPF programs can attach to sockets; but more interestingly, they can attach to kprobes and uprobes, static kernel tracepoints, and even user-mode static probes. And modern BPF programs have access to a wide set of instructions and data structures, which means you can collect valuable information and analyze it on-the-fly, without spilling it to huge files and reading them from user space.
In this talk, we will introduce BCC, the BPF Compiler Collection, which is an open set of tools and libraries for dynamic tracing on Linux. Some tools are easy and ready to use, such as execsnoop, fileslower, and memleak. Other tools such as trace and argdist require more sophistication and can be used as a Swiss Army knife for a variety of scenarios. We will spend most of the time demonstrating the power of modern dynamic tracing -- from memory leaks to static probes in Ruby, Node, and Java programs, from slow file I/O to monitoring network traffic. Finally, we will discuss building our own tools using the Python and Lua bindings to BCC, and its LLVM backend.
A short introduction on how functions work. Functions are the building blocks of any modern programming language. This tutorial shows you how functions are implemented and how the process stack plays an important role in supporting functions.
eBPF is an exciting new technology that is poised to transform Linux performance engineering. eBPF enables users to dynamically and programatically trace any kernel or user space code path, safely and efficiently. However, understanding eBPF is not so simple. The goal of this talk is to give audiences a fundamental understanding of eBPF, how it interconnects existing Linux tracing technologies, and provides a powerful aplatform to solve any Linux performance problem.
This presentation introduces Data Plane Development Kit overview and basics. It is a part of a Network Programming Series.
First, the presentation focuses on the network performance challenges on the modern systems by comparing modern CPUs with modern 10 Gbps ethernet links. Then it touches memory hierarchy and kernel bottlenecks.
The following part explains the main DPDK techniques, like polling, bursts, hugepages and multicore processing.
DPDK overview explains how is the DPDK application is being initialized and run, touches lockless queues (rte_ring), memory pools (rte_mempool), memory buffers (rte_mbuf), hashes (rte_hash), cuckoo hashing, longest prefix match library (rte_lpm), poll mode drivers (PMDs) and kernel NIC interface (KNI).
At the end, there are few DPDK performance tips.
Tags: access time, burst, cache, dpdk, driver, ethernet, hub, hugepage, ip, kernel, lcore, linux, memory, pmd, polling, rss, softswitch, switch, userspace, xeon
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral IntegrationTalal Khaliq
A new descriptive method to use ARM AHB and APB Bus architecture to add new IP Cores and enhance functionality of 32 bit Processor (Leon3). AHB and APB addressing and GUI enhancement is also discussed.
Building Network Functions with eBPF & BCCKernel TLV
eBPF (Extended Berkeley Packet Filter) is an in-kernel virtual machine that allows running user-supplied sandboxed programs inside of the kernel. It is especially well-suited to network programs and it's possible to write programs that filter traffic, classify traffic and perform high-performance custom packet processing.
BCC (BPF Compiler Collection) is a toolkit for creating efficient kernel tracing and manipulation programs. It makes use of eBPF.
BCC provides an end-to-end workflow for developing eBPF programs and supplies Python bindings, making eBPF programs much easier to write.
Together, eBPF and BCC allow you to develop and deploy network functions safely and easily, focusing on your application logic (instead of kernel datapath integration).
In this session, we will introduce eBPF and BCC, explain how to implement a network function using BCC, discuss some real-life use-cases and show a live demonstration of the technology.
About the speaker
Shmulik Ladkani, Chief Technology Officer at Meta Networks,
Long time network veteran and kernel geek.
Shmulik started his career at Jungo (acquired by NDS/Cisco) implementing residential gateway software, focusing on embedded Linux, Linux kernel, networking and hardware/software integration.
Some billions of forwarded packets later, Shmulik left his position as Jungo's lead architect and joined Ravello Systems (acquired by Oracle) as tech lead, developing a virtual data center as a cloud-based service, focusing around virtualization systems, network virtualization and SDN.
Recently he co-founded Meta Networks where he's been busy architecting secure, multi-tenant, large-scale network infrastructure as a cloud-based service.
Finding Xori: Malware Analysis Triage with Automated DisassemblyPriyanka Aash
"In a world of high volume malware and limited researchers we need a dramatic improvement in our ability to process and analyze new and old malware at scale. Unfortunately what is currently available to the community is incredibly cost prohibitive or does not rise to the challenge. As malware authors and distributors share code and prepackaged tool kits, the corporate sponsored research community is dominated by solutions aimed at profit as opposed to augmenting capabilities available to the broader community. With that in mind, we are introducing our library for malware disassembly called Xori as an open source project. Xori is focused on helping reverse engineers analyze binaries, optimizing for time and effort spent per sample.
Xori is an automation-ready disassembly and static analysis library that consumes shellcode or PE binaries and provides triage analysis data. This Rust library emulates the stack, register states, and reference tables to identify suspicious functionality for manual analysis. Xori extracts structured data from binaries to use in machine learning and data science pipelines.
We will go over the pain-points of conventional open source disassemblers that Xori solves, examples of identifying suspicious functionality, and some of the interesting things we've done with the library. We invite everyone in the community to use it, help contribute and make it an increasingly valuable tool for researchers alike."
Chapter 1
Syllabus
Catalog Description: Computer structure, machine representation of data,
addressing and indexing, computation and control instructions, assembly
language and assemblers; procedures (subroutines) and data segments,
linkages and subroutine calling conventions, loaders; practical use of an
assembly language for computer implementation of illustrative examples.
Course Goals
0 Knowledge of the basic structure of microcomputers - registers, mem-
ory, addressing I/O devices, etc.
1 Knowledge of most non-privileged hardware instructions for the Ar-
chitecture being studied.
2 Ability to write small programs in assembly language
3 Knowledge of computer representations of data, and how to do simple
arithmetic in binary & hexadecimal, including conversions
4 Being able to implementing a moderately complicated algorithm in
assembler, with emphasis on efficiency.
5 Knowledge of procedure calling conventions and interfacing with high-
level languages.
Optional Text: Kip Irvine, Assembly Language for the IBM PC, Prentice
Hall, 4th or 5th edition
1
Additional References: Intel and DOS API documentation as presented
in Intel publications and online at www.x86.org; lecture notes (to be sup-
plied as we go).
Prerequisites by Topic. Working knowledge of some programming lan-
guage (102/103: C/C++); Minimal programming experience
Major Topics Covered in the Course:
1 Low-level and high-level languages; why learn assembler?
2 How does one study a new computer: the CPU, memory, addressing
modes, operation modes.
3 History of the Intel family of microprocessors.
4-5 Registers; simple arithmetic instructions; byte order; Arithmetic and
logical operations.
6 Implementing longer integer type support; carry and overflow.
7 Shifts, multiplication and division.
8 Memory layout.
9 Direct video memory access; discussion of the first project.
10 Assembler syntax; how to use the tools.
11-13 Conditional & unconditional jumps; loops; emulating high-level lan-
guage constructions; Stack; call and return; procedures
14-15 String instructions: effcient memory-to-memory operations.
16 Interrupts overview: interrupt table; how do interrupts work; classif-
cation.
17 Summary of the most important interrupts.
18-20 DOS interrupt; File I/O functions; file-copy program; discussion of
the second project
21 Interrupt handlers; keyboard drivers; timer-driven processes; viruses
and virus-protection software.
2
22 Debug interrupts; how do debuggers and profilers work.
23-24 (Optional).interfacing with high level languages; Protected mode fun-
damentals
Grading The grading is based on two projects, midterm project is 49%
and the final is 51%. Please note that the projects are individual, submitting
projects that are similar to submissions of others and/or are essentially
downloads from the Web would result in a fail.
Office Hours My hours this term for CSc 210 will be 3:45 ¶Ł 4:45 on
Mondays.
Zoom links:
11am https://ccny.zoom.us/j/8 ...
Concurrency and parallelism in Python are always hot topics. This talk will look the variety of forms of concurrency and parallelism. In particular this talk will give an overview of various forms of message-passing concurrency which have become popular in languages like Scala and Go. A Python library called python-csp which implements similar ideas in a Pythonic way will be introduced and we will look at how this style of programming can be used to avoid deadlocks, race hazards and "callback hell".
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
4. Adapteva Epiphany III chip
A new style of energy-efficient manycore
computer?
● 16 / 64 cores, theoretical maximum of 4095 cores
● 1 GHz
● <2 Watts
● Whole Parallella board: 5 Watts
● 19 GFLOPS / Watt
● 32 kB RAM / core
5. Packing code into that small space
● Most 32-bit Epiphany core instructions have
a 16-bit equivalent
● Compiler chooses the 16-bit instruction
whenever possible
6. Derek Lockhart, Berkin Ilbeyi, and Christopher Batten. (2015) Pydgin: Generating Fast Instruction Set
Simulators from Simple Architecture Descriptions with Meta-Tracing JIT Compilers. IEEE International
Symposium on Performance Analysis of Systems and Software (ISPASS).
7. Revelation: a new Epiphany sim
● Written using the Pydgin framework in
RPython (statically typed Python)
● All instructions implemented except some
multicore
● Small code base: ~1k Source Lines of Code
● http://www.revelation-sim.org/
Still a work in progress though (i.e. don’t expect
it to work!)
8. ● Clone pypy:
$ hg clone https://bitbucket.org/pypy/pypy
● Clone Pydgin:
$ git clone ...github.com/cornell-brg/pydgin.git
● Put Pydgin on your PYTHONPATH:
$ export PYTHONPATH=${PYTHONPATH}:.../pydgin/:
● Write a simple simulator!
Write your own Pydgin simulator!
13. Some oddities about the Epiphany
● The chip has a flat memory map. Everything
(registers, flags) is a location in RAM.
● Each core has 32kB local memory,
addressed as 0x0…0xFFFFF
● The same memory can be globally
addressed, and accessed by other cores, as
(COREID << 20) | local address, e.g.
0xF0408 (the PC) is globally 0x808F0408
for chip 0x808 (on row 32, column 8 of the
chip.
14. More oddities
● There are many instructions, almost all of
which have 16-bit and 32-bit versions. This
could lead to duplicate code.
● Thankfully, we can use closures to create all
the versions of an instruction implementation
that we need in one go:
16. Test, compile, iterate
To compile your simulator (with a JIT):
$ .../rpython/bin/rpython -Ojit sim.py
To compile with the ability to print an instruction
trace:
$ .../rpython/bin/rpython -Ojit sim.py
--debug
18. But you said TEST, compile, iterate!
● Compiling is slow, so you want to avoid
compiling every time you make a change to
the code base
● That means you want to test your RPython
code dynamically (untranslated)
● Because RPython is a subset of Python, you
can use Python test tools, such as py.test,
which means you can unit tests your
untranslated simulator
19. Unit tests
● “Proper” unit tests, which test small program
units (such as a single instruction)
● The Epiphany Architecture Reference
Manual has many examples (1 or more for
each instruction). Each of these is written
into an assembler file and used as the basis
of a (more complex) unit test.
● Currently >500 unit and integration tests in
Revelation, testing a very tiny part of the
state space!
23. Integration tests
● Integration tests typically load and execute a
full ELF file (compiled from C code). They
then check one or more of:
○ The machine state
○ The number of instructions executed
○ Output on STDOUT
● ELF files need to be cross-compiled with the
Epiphany SDK (e-gcc, etc.) and so are
checked into version control
26. Using a test oracle
● A test oracle is a source of truth, for example
an existing, complete simulator
● Adapteva e-sim:
○ https://github.com/adapteva/epiphany-cgen
○ https://github.com/adapteva/epiphany-gdb
32. Things I haven’t tried but might...
● American Fuzzy Lop:
○ http://lcamtuf.coredump.cx/afl/
● Hypothesis
○ http://hypothesis.works/
● ...any other form of automated testing