The document describes the memory management unit (MMU) of the LatticeMico32 processor. It has 1024 entries each for the instruction and data translation lookaside buffers (ITLB and DTLB). It supports adding, updating, and invalidating TLB entries as well as flushing the TLBs using control and status registers. The MMU can be activated and deactivated by setting bits in the processor status word register to enable virtual memory and memory protection capabilities.
This document provides instructions for compiling and testing WDT (Facebook's open-source data transfer library). It summarizes the steps to install prerequisites such as CMake and OpenSSL. It then describes compiling WDT from source, including issues encountered with specific library versions. The document compares WDT's transfer speed with SCP by sending a 5GB directory from one Ubuntu system to another. It notes that WDT, unlike SCP, requires a start port to be specified.
The document discusses Oracle database logging and redo operations. It describes how Oracle uses physiological logging to generate redo records from change vectors. Change vectors transition database blocks between versions. Redo records group change vectors and transition the overall database state. The document provides an example redo record for an INSERT statement, showing the change vectors for both the table and undo segments involved in the transaction.
The document discusses optimizing parallel reduction in CUDA. It presents 7 versions of a parallel reduction kernel with optimizations including interleaved vs sequential addressing to avoid bank conflicts, unrolling the last warp, adding the first reduction during global memory loads, and completely unrolling the reduction using C++ templates. Performance is improved from 2 GB/s to over 30 GB/s for a 4 million element reduction, reaching over 15x speedup by eliminating instruction overhead. Templates let the compiler specialize the kernel for each supported block size, and the host code selects the matching instantiation at launch.
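As a rough illustration of the sequential-addressing scheme mentioned above, the following Python sketch simulates one thread block's tree reduction on the CPU. It is not CUDA code and not one of the kernels from the slides; the function name `block_reduce` and the power-of-two length requirement are assumptions made for this example.

```python
def block_reduce(values):
    """Simulate a tree reduction with sequential addressing.

    Assumes len(values) is a power of two, as a typical CUDA block size is.
    """
    shared = list(values)  # stands in for the block's shared memory
    n = len(shared)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    stride = n // 2
    while stride > 0:
        # Each simulated thread tid < stride adds the element one stride away.
        # The active threads are contiguous, which is the property that
        # sequential addressing relies on in a real kernel.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2  # half as many threads are active in the next step
    return shared[0]
```

For eight elements the loop runs three steps (strides 4, 2, 1), so the number of steps grows with the logarithm of the block size.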
This document provides an overview of advanced RAC troubleshooting concepts by Riyaj Shamsudeen. It discusses key concepts related to cache coherency, single and multi-block reads and transfers in RAC, buffer changes when modifying data, and common wait events seen in RAC environments like gc cr block 2-way and gc cr block 3-way. The document is intended for experienced Oracle professionals and provides examples and demonstrations of the various RAC concepts discussed.
There are three levels of tracing in Vertica: Select, Session, and System. The Select level traces a single statement, Session traces all statements in a session, and System traces all queries across sessions. The trace output populates tables with query metadata and execution details. A long-running query was identified and traced to a costly GroupByHash operator. Creating a projection to presort the data enabled pipelining between operators, improving performance by 85%.
This document provides an introduction to Real Application Clusters (RAC) architecture by Riyaj Shamsudeen. It includes sections on RAC architecture, reasons for and against implementing RAC, clusterware commands, cache fusion, and row-level locking. The author discusses key RAC concepts such as shared-nothing vs shared-everything architecture, clusterware components, instance vs database definitions, and importance of workload segregation and application affinity in RAC deployments.
The document discusses several key concepts in Cassandra including gossip, memtables/SSTables, compaction, commitlogs, consistency levels, hinted handoff, anti-entropy, and read repair. It provides an overview of Cassandra's architecture, data distribution, replication strategies, and interfaces. It also covers node management tasks like adding, removing, and moving nodes as well as recovering nodes.
This document is the manual for D-ITG V. 2.6.1d, which is a network performance measurement tool developed by researchers at the University of Naples Federico II. It describes the usage and options of various programs that are part of the D-ITG platform, including ITGSend for generating network traffic, ITGRecv for receiving traffic, and ITGLog, ITGDec, and ITGplot for analyzing results. The manual provides examples of how to use each program to perform different types of network measurements and traffic generation.
This document discusses network considerations for Real Application Clusters (RAC). It describes the different network types used, including public, private, storage, and backup networks. It discusses protocols like TCP and UDP used for different traffic. It also covers concepts like network architecture, layers, MTU, jumbo frames, and tools for monitoring network performance like netstat, ping, and traceroute.
The RAC cluster was experiencing intermittent hangs lasting 10-15 minutes. Analysis showed high global buffer busy waits and log file sync waits across nodes. Further investigation revealed a background process on one node had been waiting for a CF lock for over 4 minutes, indicating a locking contention issue that was slowing down the entire cluster.
Master class "Logical Replication and Avito" / Konstantin Evteev, Mikhail Tyur..., Ontico
HighLoad++ 2017
Cape Town hall, November 7, 12:00
Abstract:
http://www.highload.ru/2017/abstracts/2868.html
At Avito, listings are stored in Postgres databases, and logical replication has been in active use there for many years. It is used to handle growth in data volume and in the number of queries, scaling and load distribution, delivery of data to the DWH and search subsystems, cross-database and cross-service data synchronization, and more.
...
Here are the key points about the C++11 memory model and ordering:
- The C++ memory model aims to balance performance and correctness for concurrent programs. It permits compiler and hardware optimizations while defining the behavior of programs that are free of data races.
- Operations on atomic types have memory ordering properties that restrict how instructions can be reordered with respect to other threads.
- A release fence prevents earlier reads and writes from being reordered after a later store. An acquire fence prevents later reads and writes from being reordered before an earlier load.
- For the code snippet shown, a thread reading flag needs to ensure it sees the write to data. This requires an acquire fence after loading flag, so that the subsequent read of data cannot be reordered before the load of flag.
So the correct answer is that it needs an acquire fence after loading flag.
1. The document discusses implementing distributed mclock in Ceph for quality of service (QoS). It describes implementing QoS units at the pool, RBD image, and universal levels.
2. It covers inserting delta/rho/phase parameters into Ceph classes for distributed mclock. Issues addressed include number of shards and background I/O.
3. An outstanding I/O based adaptive throttle is introduced to suspend mclock scheduling if the I/O load is too high. Testing showed it effectively maintained maximum throughput.
4. Future plans include improving the mclock algorithm, extending QoS to individual RBDs, adding metrics, and testing in various environments. Collaboration with the community is
These slides explain why Paxos is the only correct way to solve consensus problems in a distributed system.
They use several diagrams to show how Paxos is derived, step by step, from a naive replication algorithm into an immediately consistent replication algorithm.
The derivation starts with master-slave replication.
It is then refined into quorum-rw by adding a consistency constraint.
Finally, quorum-rw is refined into Paxos by adding an atomicity constraint.
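The consistency constraint in the quorum-rw step is commonly stated as an overlap rule: every read quorum must share at least one replica with every write quorum. A minimal sketch, assuming `n` replicas with write quorums of size `w` and read quorums of size `r` (these names and both functions are illustrative and not taken from the slides):

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True if any w-replica write quorum must intersect any r-replica read quorum."""
    return w + r > n

def brute_force_overlap(n, w, r):
    """Check the same property by enumerating every pair of quorums."""
    replicas = range(n)
    return all(
        set(write_q) & set(read_q)
        for write_q in combinations(replicas, w)
        for read_q in combinations(replicas, r)
    )
```

With `n = 3`, the choice `w = 2, r = 2` satisfies the rule, while `w = 1, r = 2` does not, because a read of two replicas can miss the single replica that was written.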
This document discusses Corosync and Pacemaker, which are open source cluster management and resource management tools. Corosync provides messaging, membership, and quorum services, while Pacemaker acts as a cluster resource manager that can monitor resources and move them between nodes as needed. Resources can be configured as clones, which have multiple instances across nodes, or as master-slave, where one node is active and others are backups. Pacemaker uses resource agents to monitor specific services and resources. The document provides examples of configuring and clustering an Apache web server and IP address.
Video games are written as a main loop: process player input, update the state of the game, render a new frame to the screen, repeat. They do this 60 times a second, with millisecond timing. Most monitoring tools are also written as loops: send a probe, wait for the response, update a data store, sleep. Often this is done pretty slowly, maybe once a second! In video games, if you can’t update fast enough, you skip the rendering step and the frame rate drops. With monitoring tools, if your loop takes too long you also stop logging data as often, and instead of choppy gameplay you get gaps in your graphs, often when you need that data the most!
Let’s use ping as an example and see how we can rewrite its main loop to function more like a video game, keeping a high frame rate.
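One possible shape for such a loop, which is not necessarily the approach taken in the talk, is to schedule probes against absolute deadlines and drop the ticks that are already overdue, the way a game drops frames. In this sketch the function `run_fixed_rate`, its parameters, and the injectable `clock` and `sleep` hooks are all assumptions made for illustration.

```python
import math
import time

def run_fixed_rate(probe, interval, ticks, clock=time.monotonic, sleep=time.sleep):
    """Call probe(k) for tick k at time start + k * interval, for k < ticks.

    Deadlines are absolute, so one slow probe does not delay every later one.
    Ticks whose deadline has already passed are skipped and returned.
    """
    start = clock()
    skipped = []
    k = 0
    while k < ticks:
        deadline = start + k * interval
        now = clock()
        if now < deadline:
            sleep(deadline - now)
        probe(k)
        # The next tick is the first one whose deadline has not yet passed.
        elapsed = clock() - start
        next_k = max(k + 1, math.ceil(elapsed / interval))
        skipped.extend(range(k + 1, min(next_k, ticks)))
        k = next_k
    return skipped
```

A caller could pass a function that sends one ping as `probe` and record the returned list of skipped ticks, so that gaps in the data are explicit instead of silent.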
This document discusses issues related to low latency and mechanical sympathy in a Java application for routing orders. It addresses GC pauses, power management, NUMA, and cache pollution. To achieve sub-millisecond performance, the document recommends tuning the GC, BIOS, OS, NUMA configuration, and avoiding cache pollution through thread affinity and CPU isolation. The goal is to optimize performance without changing application code through mechanical sympathy with the hardware.
These are the slides for a talk I gave to the Fredericksburg Linux User Group about Bitcoin and cryptocurrency in general on 2014-02-22. Audio is forthcoming from one of the attendees as a podcast.
In previous work, we proposed a new multi-versioning STM, adaptive object metadata (AOM), that substantially reduces both the memory and the performance overheads associated with transactional locations that are not under contention. AOM is an object-based design that follows the general JVSTM design, but it is adaptive because the metadata used for each transactional object changes over time, depending on how objects are accessed. We have now implemented a new version of AOM that is based on the lock-free version of JVSTM, and we have eliminated all the overheads of accessing objects in the compact layout during read-only transactions. To make the contention-free execution path free of any STM barrier, we duplicated the accessors of the transactional classes, so that one accesses the object fields directly and the other uses STM barriers.
The document describes a cache-aware hybrid sorter that is faster than the STL sort. It first radix sorts input streams into substreams that fit into the CPU cache. This is done in a cache-friendly manner by splitting streams based on cache size. The substreams are then merged using a loser tree merge, which has better memory access patterns than a heap-based priority queue. Testing showed the hybrid sort was 2-6 times faster than STL sort and scaled well on multi-core CPUs.
The document discusses the Disruptor, a high performance inter-thread messaging library created by LMAX. It introduces key concepts like the ring buffer, producer-consumer pattern, and consumer dependency graphs. It then explains the Disruptor architecture which uses a multithreaded producer to populate a ring buffer of event data which is consumed by handler threads in a specified order without blocking. Finally, it outlines examples of unicast, multicast and pipelined consumer configurations and references additional resources to learn more about the Disruptor and related concurrency concepts.
The research work that I describe in this dissertation is concerned with
the problem of shared-memory synchronization in large-scale
programs.
The difficulties of developing fine-grained lock-based synchronization
are well-known and many researchers have argued for the need of
alternative approaches.
Simply put, the main goal of my work is to provide an efficient
alternative to such approaches.
My proposal is based on Software Transactional Memory
(STM) and I implemented it in a well-known STM framework for
Java, Deuce STM.
To that end I propose a new approach that significantly lowers the
overhead caused by an STM in large-scale programs for which only a
small fraction of the memory is under contention. My solution
combines two novel optimization techniques in a synergistic way,
allowing us to get, for the first time, performance with an STM that
rivals the performance of the best lock-based approaches in some of
the more challenging benchmarks. My approach and experimental
results show that STMs may be the first efficient alternative to locks
for shared-memory synchronization in real-world-sized applications.
Presentation by Stefan Dziembowski, associate professor and leader of the Cryptology and Data Security Group at the University of Warsaw, given at the BIU workshop on Bitcoin. Covered exclusively by vpnMentor.com
The document discusses dense linear algebra solvers and algorithms. It provides an overview of existing software for dense linear algebra including LINPACK, EISPACK, LAPACK, ScaLAPACK, PLASMA, and MAGMA. It then discusses challenges with dense linear algebra on modern hardware including distributed memory, heterogeneity, and the high cost of communication. It introduces tile algorithms as an approach to address these challenges compared to traditional LAPACK algorithms.
Kernel Recipes 2015 - Porting Linux to a new processor architecture, Anne Nicolas
Getting the Linux kernel running on a new processor architecture is a difficult process. Worse still, there is not much documentation available describing the porting process.
After spending countless hours becoming almost fluent in many of the supported architectures, I discovered that a well-defined skeleton shared by the majority of ports exists. Such a skeleton can logically be split into two parts that intersect a great deal.
The first part is the boot code, meaning the architecture-specific code that is executed from the moment the kernel takes over from the bootloader until init is finally executed. The second part concerns the architecture-specific code that is regularly executed once the booting phase has been completed and the kernel is running normally. This second part includes starting new threads, dealing with hardware interrupts or software exceptions, copying data from/to user applications, serving system calls, and so on.
In this talk I will provide an overview of the procedure, or at least one possible procedure, that can be followed when porting the Linux kernel to a new processor architecture.
Joël Porquet – Joël was a post-doc at Pierre and Marie Curie University (UPMC) where he ported Linux to TSAR, an academic processor. He is now looking for new adventures.
A Brief Introduction of TiDB (Percona Live), PingCAP
TiDB is an open-source distributed SQL database that supports high availability, horizontal scalability, and consistent distributed transactions. It provides a MySQL compatible API and seamless online expansion. TiDB uses Raft for consensus and implements the MVCC model to support high concurrency. It also provides distributed transactions through a two-phase commit protocol. The architecture consists of a stateless SQL layer (TiDB) and a distributed transactional key-value storage (TiKV).
Managing Unstructured Data: Lobs in the World of JSON, Michael Rosenblum
This document discusses managing unstructured JSON data in Oracle databases. It describes how a company initially stored JSON files in VARCHAR2 columns, but then the files grew larger than 4000 characters requiring a change to CLOB storage. This change caused issues until developers understood that CLOBs have different access, storage, and processing mechanisms compared to VARCHAR2. The document provides an overview of CLOB architecture including data access, internal storage, caching, logging, and indexing. It emphasizes that properly understanding CLOBs is important when storing and manipulating JSON data in Oracle databases.
Porting NetBSD to the open source LatticeMico32 CPU, Yann Sionneau
In this talk, which I gave at the EHSM 2014 event ( http://ehsm.eu ), I explain what an MMU is and how it works. I then explain how I ported NetBSD (and EdgeBSD, which is a fork of NetBSD) to this open source LM32 CPU, to which I added an MMU.
TiDB and Amazon Aurora can be combined to provide analytics on transactional data without needing a separate data warehouse. The TiDB Data Migration (DM) tool migrates and replicates data from Aurora into TiDB for analytics queries. DM provides full data migration and incremental replication of binlog events from Aurora into TiDB. This allows transactional and analytical workloads to run against the same dataset without ETL pipelines.
The document describes the BAT Buffer Pool (BBP) in MonetDB. The BBP manages BATs (Binary Association Tables) in memory and persists them to disk. It handles administration, lookup, persistence, buffer management, recovery, unloading, reference counting, sharing, and synchronization of BATs. The BBP stores BAT metadata in a BBP.dir file and uses locks to control concurrent access to the in-memory BBP array.
This document describes the design and implementation of a MIPS processor using Verilog. It begins with an overview of the MIPS architecture and instruction set. It then provides Verilog code for the top-level processor module, controller, datapath, register file, ALU, and other components. Diagrams of the processor microarchitecture and multicycle controller state machine are also shown. The document focuses on hierarchically designing the MIPS processor using a structural Verilog approach and parameterized modules.
This document summarizes several myths about database redo, undo, commit, and rollback operations. It presents test cases and analysis to debunk the myths. The author is an experienced Oracle DBA who specializes in performance tuning and internals. Sample redo records are displayed and analyzed to explain how operations like rollback do generate redo. The document aims to clarify misunderstandings about the internal workings of Oracle's transaction and redo logging.
How to-mount-3 par-san-virtual-copy-onto-rhel-servers-by-Dusan-BaljevicCircling Cycle
The document describes how to mount a 3PAR virtual copy volume onto a RHEL server. It involves creating host definitions and exporting volumes from 3PAR to the server. The volumes are then mapped, formatted, and mounted. Finally, a virtual copy is created on 3PAR and exported to the server, where it is detected as a new volume.
The document discusses virtual memory and TLB misses. When a TLB miss occurs, the TLB entry for the requested translation must be loaded. This can be done by hardware reloading the TLB from the page table or via a software TLB miss handler. The handler looks up the page table entry, loads the TLB, and resumes execution. Demand paging is also covered, where non-resident pages are swapped to disk and reloaded on access to reduce memory usage based on locality of reference.
How Triton can help to reverse virtual machine based software protectionsJonathan Salwan
The first part of the talk is going to be an introduction to the Triton framework to expose its components and to explain how they work together. Then, the second part will include demonstrations on how it's possible to reverse virtual machine based protections using taint analysis, symbolic execution, SMT simplifications and LLVM-IR optimizations.
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB
Presented by Luke Lovett, Software Engineer, MongoDB
Experience level: Introductory
MongoDB and Hadoop work powerfully together as complementary technologies. Learn how the Hadoop connector allows you to leverage the power of MapReduce to process data sourced from your MongoDB cluster.
This document discusses configuring Microsoft Distributed Transaction Coordinator (MSDTC) in a SQL Server cluster. It describes setting up an Active Directory domain, installing the Failover Clustering feature and creating a Windows cluster. It also discusses configuring a dedicated MSDTC resource for the cluster, including adding a MSDTC resource to a SQL Server role and configuring its dependencies on a disk and hostname. The document provides best practices for using MSDTC in a clustered SQL Server environment.
The document discusses the SysTick timer in ARM Cortex-M3 microcontrollers. SysTick generates interrupts at regular intervals which allows an operating system to perform task switching for multiple tasks. It describes the registers used to configure SysTick including the control and status register, current value register, and reload value register. It explains how to set the clock source, reload value, enable interrupts and counter to start the downcounting timer.
This document summarizes the results of running a DBT2 benchmark test on an Oracle Cloud infrastructure with MySQL NDB Cluster. The benchmark achieved over 3 million transactions per minute by using 6 data nodes with 52 CPU cores each, 15 MySQL server nodes with 36 CPU cores each, and varying the number of replicas and node groups. The limiting factor was found to be the NDB data nodes. Over 4 million transactions per minute was achieved using a total of 770 MySQL server CPUs and 340 data node CPUs across 22 bare metal servers.
This document provides an overview and introduction to GlobalISel, including:
- What GlobalISel is and how it differs from SelectionDAG
- The key stages in the GlobalISel compilation flow
- Examples of basic arithmetic instruction selection and type legalization in GlobalISel
- Tips for building GlobalISel support in a backend, such as handling constants and register selection
Проведение криминалистической экспертизы и анализа руткит-программ на примере...Alex Matrosov
This document summarizes a presentation on analyzing the Win32/Olmarik(TDL4) rootkit through forensic examination and debugging techniques. It discusses the evolution of rootkits from x86 to x64 systems and techniques used by TDL rootkits to bypass security protections like driver signature enforcement. It also demonstrates tools like TdlFsReader that were developed to analyze the hidden TDL file system and decrypt encrypted files.
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...Positive Hack Days
В рамках мастер-класса будет рассмотрены следующие вопросы:
методы внедрения и работы руткита TDL4;
инструментарий и методы сбора данных для проведения криминалистической экспертизы зараженной машины;
отладка буткит-составляющей на ранней стадии загрузки системы с использованием эмулятора Bochs;
анализ зараженной машины при помощи WinDbg;
удаление руткита из системы после сбора всех необходимых данных.
Static scheduling machines like VLIW move instruction scheduling from hardware to compiler by packing multiple operations into single instructions. This simplifies hardware but requires compilers to explicitly represent dependencies. The Intel Itanium ISA follows the VLIW philosophy, using instruction bundles containing up to 3 operations. Itanium 2 improved on the original design with an 8-stage pipeline and register renaming to boost performance. Itanium supports both control and data speculation via techniques like speculative loads and the Advanced Load Address Table.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillLizaNolte
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
High performance Serverless Java on AWS- GoTo Amsterdam 2024Vadym Kazulkin
Java is for many years one of the most popular programming languages, but it used to have hard times in the Serverless community. Java is known for its high cold start times and high memory footprint, comparing to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption, cold start times for Java Serverless development on AWS including GraalVM (Native Image) and AWS own offering SnapStart based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients and measure their impact on cold and warm start times.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Session 1 - Intro to Robotic Process Automation.pdf
LatticeMico32 Memory Management Unit Documentation
Yann Sionneau
Version 1.0, June 2013
Contents
1 Overview
2 Features
3 TLB Layout
4 Interact with the TLB
4.1 Add or Update a TLB entry
4.2 Invalidate a TLB entry or Flush the entire TLB
4.3 A sum up of TLB actions
5 Interact with the MMU
5.1 Activate the MMU
5.2 Deactivate the MMU
6 TLB lookups
7 CSR registers special behaviours
1 Overview
This document describes the features of the LatticeMico32 MMU (Memory Management Unit) and how it can be configured.
This MMU is not part of the original LatticeMico32 CPU; it was added by Yann Sionneau with the help of the Milkymist community in general and Michael Walle in particular.
The LM32 MMU has been designed with simplicity in mind: KISS (Keep It Simple, Stupid) is the motto.
Only the minimum set of features needed to run a modern operating system such as Linux or *BSD, with virtual memory and memory protection, has been implemented.
The caches are designed to be VIPT (Virtually Indexed, Physically Tagged) so that the TLB lookup can take place in parallel with the cache lookup and the pipeline does not need to stall.
2 Features
• 1024-entry ITLB (Instruction Translation Lookaside Buffer)
• 1024-entry DTLB (Data Translation Lookaside Buffer)
• CPU exceptions generated upon:
    - ITLB miss
    - DTLB miss
    - DTLB page fault (writing to a read-only page)
• I/D TLB lookup in parallel with the I/D Cache lookup to avoid lookup penalties
• 4 kB pages
As you can see, it is quite minimalistic. Here is what this MMU does not feature:
• No hardware page tree walker
• No dirty or present bit
• No ASID (Address Space Identifier)
• No lockable TLB entries
• Only 1 page size supported: 4 kB
3 TLB Layout
Let vaddr be a 32-bit virtual address and paddr a 32-bit physical address.
vaddr[0] is the least significant bit and vaddr[31] the most significant bit.
vaddr[11:0] denotes the 12 least significant bits of vaddr.
Internally, the TLB is a direct-mapped, VIVT (Virtually Indexed, Virtually Tagged) cache.
When the LM32 core is synthesized with MMU support, the CPU pipeline Data and Instruction Caches become VIPT (Virtually Indexed, Physically Tagged) caches.
The TLB is indexed by vaddr[21:12], the 10 least significant bits of the virtual PFN (Page Frame Number).
A TLB entry holds: physical PFN, virtual tag, cache inhibit flag (DTLB only), read-only flag (DTLB only), valid bit.
More precisely:
• A valid DTLB entry: paddr[31:12], vaddr[31:22], paddr[2], paddr[1], 1
• An invalid DTLB entry: paddr[31:12], vaddr[31:22], paddr[2], paddr[1], 0
• A valid ITLB entry: paddr[31:12], vaddr[31:22], 1
• An invalid ITLB entry: paddr[31:12], vaddr[31:22], 0
The meaning of paddr[2] and paddr[1] is explained in section 4.1, which shows how to program the TLB using LM32 assembly instructions.
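As a rough illustration of the layout above, the following C sketch splits a 32-bit virtual address into the three fields the TLB cares about. The helper names (tlb_index, tlb_tag, page_offset) and the two constants are made up for this example; they are not part of any LM32 header.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u     /* 4 kB pages */
#define TLB_ENTRIES  1024u   /* entries in each TLB */

/* Hypothetical helper: vaddr[21:12], the index into the direct-mapped TLB */
static inline uint32_t tlb_index(uint32_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) & (TLB_ENTRIES - 1u);
}

/* Hypothetical helper: vaddr[31:22], the tag stored in the TLB entry */
static inline uint32_t tlb_tag(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Hypothetical helper: vaddr[11:0], the offset inside the 4 kB page */
static inline uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & ((1u << PAGE_SHIFT) - 1u);
}
```

With these helpers, 0x00401234 and 0x00801234 both select index 1 but carry tags 1 and 2. Two pages that are 4 MB apart therefore compete for the same TLB entry, which is what direct-mapped means here.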
4 Interact with the TLB
In order to interact with the TLB, three CSRs (Control and Status Registers) have been added to the LM32 CPU:
CSR           Description                                         R/W
TLBVADDR      Write the virtual PFN of the entry you want to      Read-Write
              update or invalidate, or a value that causes a
              TLB flush.
              Read the virtual PFN that caused a TLB miss or
              fault.
TLBPADDR      Write the physical PFN of the entry you want to     Write-only
              update.
TLBBADVADDR   Read the virtual address that caused the TLB        Read-only
              exception.
• TLBVADDR: holds a virtual address
• TLBPADDR: holds a physical address
A CSR can be written like this. The following code copies the content of the r1 register to the TLBVADDR CSR:
    wcsr TLBVADDR, r1
A CSR can be read like this. The following code copies the content of the TLBVADDR CSR to the r1 register:
    rcsr r1, TLBVADDR
4.1 Add or Update a TLB entry
First, make sure vaddr[2:0] == 000 (or 3'b0 in Verilog), as those 3 bits are used for other TLB operations.
Then, write the virtual address to the TLBVADDR CSR.
Next, bitwise-OR flags into the physical address to set paddr[2:0] according to your needs:
• paddr[2] set to 1 means the page will not be cached by the LM32 Data Cache (DTLB only)
• paddr[1] set to 1 means the page is read-only (DTLB only)
• paddr[0] set to 1 means you want to update the DTLB; use 0 for the ITLB
Finally, write the OR'ed physical address to the TLBPADDR CSR.
The write to the TLBPADDR CSR triggers the TLB entry update.
Code samples:
    #include <stdbool.h>

    #define PAGE_SIZE (1 << 12)
    #define PAGE_MASK (PAGE_SIZE - 1)

    void update_dtlb_entry(unsigned int vaddr, unsigned int paddr,
                           bool read_only, bool not_cached)
    {
        paddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        vaddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        paddr |= 1; // paddr[0] = 1: we are addressing the DTLB
        if (read_only)
            paddr |= 2; // paddr[1] = 1: read-only page
        if (not_cached)
            paddr |= 4; // paddr[2] = 1: page is not cached
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(vaddr));
        asm volatile ("wcsr TLBPADDR, %0" :: "r"(paddr)); // triggers the update
    }

    void update_itlb_entry(unsigned int vaddr, unsigned int paddr)
    {
        paddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        vaddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        // We don't set paddr[0], which means we are addressing the ITLB
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(vaddr));
        asm volatile ("wcsr TLBPADDR, %0" :: "r"(paddr)); // triggers the update
    }
4.2 Invalidate a TLB entry or Flush the entire TLB
First, bitwise-OR flags into the virtual address to set vaddr[2:0] according to your needs:
• vaddr[2] set to 1 triggers a flush of the entire selected TLB
• vaddr[1] set to 1 triggers the invalidation of the entry indexed by vaddr[21:12] inside the selected TLB
• vaddr[0] set to 1 means you want to operate on the DTLB; use 0 for the ITLB
The action is triggered by writing the OR'ed virtual address to the TLBVADDR CSR.
Code samples:
    #define PAGE_SIZE (1 << 12)
    #define PAGE_MASK (PAGE_SIZE - 1)

    void invalidate_dtlb_entry(unsigned int vaddr)
    {
        vaddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        /*
         * 1 because we are addressing DTLB
         * 2 because we want to invalidate a specific line
         */
        vaddr |= 1 | 2;
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(vaddr));
    }

    void invalidate_itlb_entry(unsigned int vaddr)
    {
        vaddr &= ~PAGE_MASK; // Make sure page offset is zeroed
        vaddr |= 2; // 2 because we want to invalidate a specific line
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(vaddr));
    }

    void flush_dtlb(void)
    {
        unsigned int cmd = 1 | 4; // 1: DTLB, 4: flush the entire TLB
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(cmd));
    }

    void flush_itlb(void)
    {
        unsigned int cmd = 4; // 4: flush the entire TLB (bit 0 clear: ITLB)
        asm volatile ("wcsr TLBVADDR, %0" :: "r"(cmd));
    }
4.3 A sum up of TLB actions
To summarize all possible TLB actions:
• Writing to TLBPADDR triggers the update of a TLB entry according to the content of TLBVADDR and TLBPADDR
• Writing to TLBVADDR either prepares the update of a TLB entry, if it is followed by a write to TLBPADDR, or immediately triggers an action determined by the bits vaddr[2:0] written to TLBVADDR. In the latter case, an invalidation applies to the TLB entry indexed by vaddr[21:12], while a flush applies to the entire selected TLB.
Possible actions triggered by writing to TLBVADDR:
vaddr[2:0]   Action
000          No operation; used when updating a TLB entry by writing to TLBPADDR
011          Invalidate DTLB entry indexed by vaddr[21:12]
010          Invalidate ITLB entry indexed by vaddr[21:12]
101          Flush DTLB
100          Flush ITLB
11x          Not deterministic; do not use until it is defined by a future MMU revision
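The table above can be read as a small decoder. The sketch below is only a host-side illustration of that decoding; the enum and function names are invented for this example. The value 001 is not listed in the table, so grouping it with the undefined cases is an assumption of this sketch, not something the MMU specification states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the actions listed in the table */
enum tlb_action {
    TLB_NOP,              /* 000: prepares an update through TLBPADDR */
    TLB_INVALIDATE_ITLB,  /* 010 */
    TLB_INVALIDATE_DTLB,  /* 011 */
    TLB_FLUSH_ITLB,       /* 100 */
    TLB_FLUSH_DTLB,       /* 101 */
    TLB_UNDEFINED         /* 11x, plus 001 (assumption: not in the table) */
};

/* Decode the action selected by the low 3 bits of a TLBVADDR write */
static inline enum tlb_action decode_tlbvaddr(uint32_t value)
{
    switch (value & 7u) {
    case 0u: return TLB_NOP;
    case 2u: return TLB_INVALIDATE_ITLB;
    case 3u: return TLB_INVALIDATE_DTLB;
    case 4u: return TLB_FLUSH_ITLB;
    case 5u: return TLB_FLUSH_DTLB;
    default: return TLB_UNDEFINED;
    }
}
```

For instance, the value written by invalidate_dtlb_entry for the page at 0x00401000 is 0x00401003, which decodes to TLB_INVALIDATE_DTLB.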
5 Interact with the MMU
In order to interact with the MMU, a new CSR (Control and Status Register) has been added: PSW (Processor Status Word).
Bits    Meaning
31:12   unused
11      BUSR: Breakpoint backup of USR
10      EUSR: Exception backup of USR
9       USR: User mode bit
8       BDTLBE: Breakpoint backup of DTLBE
7       EDTLBE: Exception backup of DTLBE
6       DTLBE: DTLB enabled
5       BITLBE: Breakpoint backup of ITLBE
4       EITLBE: Exception backup of ITLBE
3       ITLBE: ITLB enabled
2       IE.BIE (*)
1       IE.EIE (*)
0       IE.IE (*)
(*) PSW[2:0] is a true mirror of IE[2:0], as described in the LatticeMico32 Processor Reference Manual, p. 10, Table 5 "Fields of the IE CSR". Under all conditions, PSW[2:0] == IE[2:0]. The IE CSR is mirrored in the lower bits of the PSW CSR for compatibility reasons: old programs (unaware of the MMU) keep using the IE CSR, while newer programs can use PSW to deal with both the MMU and interrupts.
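For readers who prefer named masks over raw bit numbers, the PSW layout could be written down in C roughly as follows. The macro and function names are invented for this sketch; only the bit positions come from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro names; bit positions follow the PSW table */
#define PSW_IE      (1u << 0)
#define PSW_EIE     (1u << 1)
#define PSW_BIE     (1u << 2)
#define PSW_ITLBE   (1u << 3)
#define PSW_EITLBE  (1u << 4)
#define PSW_BITLBE  (1u << 5)
#define PSW_DTLBE   (1u << 6)
#define PSW_EDTLBE  (1u << 7)
#define PSW_BDTLBE  (1u << 8)
#define PSW_USR     (1u << 9)
#define PSW_EUSR    (1u << 10)
#define PSW_BUSR    (1u << 11)

/* Both TLB enable bits together */
#define PSW_MMU_ENABLE (PSW_ITLBE | PSW_DTLBE)

/* Returns 1 when both the ITLB and the DTLB are enabled in the given PSW value */
static inline int psw_mmu_enabled(uint32_t psw)
{
    return (psw & PSW_MMU_ENABLE) == PSW_MMU_ENABLE;
}
```

PSW_MMU_ENABLE evaluates to 72 (0x48), which is the constant used by the code samples in sections 5.1 and 5.2.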
5.1 Activate the MMU
The MMU is activated by enabling each TLB, that is, by writing 1 into PSW[ITLBE] and PSW[DTLBE].
    void enable_mmu(void)
    {
        // 72 = (1 << 6) | (1 << 3): DTLBE and ITLBE
        asm volatile ("rcsr r1, PSW\n\t"
                      "ori r1, r1, 72\n\t"
                      "wcsr PSW, r1" ::: "r1");
    }
5.2 Deactivate the MMU
The MMU is deactivated by disabling each TLB, that is, by writing 0 into PSW[ITLBE] and PSW[DTLBE].
    void disable_mmu(void)
    {
        // Clear DTLBE (bit 6) and ITLBE (bit 3), keep every other PSW bit
        unsigned int mask = ~(72);
        asm volatile ("rcsr r1, PSW\n\t"
                      "and r1, r1, %0\n\t"
                      "wcsr PSW, r1" :: "r"(mask) : "r1");
    }
6 TLB lookups
This section explains in detail how the TLB lookup takes place and what happens under which conditions.
If the TLBs are disabled, nothing special happens: the LM32 behaves as if it had been synthesized without MMU support (except for the presence of the PSW, TLBVADDR and TLBPADDR CSRs).
If the DTLB is enabled:
The DTLB lookup happens in parallel with the Data Cache lookup.
The DTLB is indexed by vaddr[21:12].
If the DTLB entry is invalid (i.e. its valid bit is cleared), the DTLB generates a DTLB miss exception.
If the DTLB entry is valid, the DTLB compares vaddr[31:22] with the DTLB entry tag. If this comparison fails, the DTLB generates a DTLB miss exception as well.
If the DTLB entry is valid and vaddr[31:22] matches the DTLB entry tag:
• If the memory access is a READ (lb, lbu, lh, lhu, lw):
    - The Data Cache compares the tag of its selected line with the paddr[31:12] extracted from the DTLB to check whether the access hits or misses the Data Cache
    - In case of a cache miss, the usual cache refill happens (using the physical address)
• If the memory access is a WRITE (sb, sh, sw):
    - The read-only flag contained in the DTLB entry is checked
        ∗ If it is set: a DTLB fault CPU exception is triggered
        ∗ If it is not set: the Data Cache does the same tag comparison as for a READ to check for a cache hit or miss
All these behaviours are summed up in the following table:
Exception             EID   Condition
ITLB miss             8     • ITLB entry is invalid
                            • ITLB entry tag does not match vaddr[31:22]
DTLB miss             9     • DTLB entry is invalid
                            • DTLB entry tag does not match vaddr[31:22]
DTLB fault            10    DTLB entry is valid
                            AND the entry tag matches vaddr[31:22]
                            AND the read-only bit is set
                            AND the CPU is doing a memory store
Privilege exception   11    PSW[USR] == 1 and one of the following
                            instructions is executed:
                            • iret
                            • bret
                            • wcsr
Within the Condition column, bullet points are combined with a logical OR, except for the DTLB fault, where the logical AND is explicitly specified.
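The DTLB rules in this section can be condensed into a small software model. This is a host-side sketch for reasoning about the behaviour, not the hardware implementation; the struct layout, the array and the function name are all illustrative assumptions, and the Data Cache side of the lookup is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum dtlb_result { DTLB_HIT, DTLB_MISS, DTLB_FAULT };

/* Software model of one DTLB entry (field names are illustrative) */
struct dtlb_entry {
    uint32_t pfn;           /* paddr[31:12] */
    uint32_t tag;           /* vaddr[31:22] */
    bool     cache_inhibit; /* paddr[2] at update time */
    bool     read_only;     /* paddr[1] at update time */
    bool     valid;
};

static struct dtlb_entry dtlb[1024];

/* Model of a DTLB lookup; on a hit, *paddr receives the translated address */
static enum dtlb_result dtlb_lookup(uint32_t vaddr, bool is_store, uint32_t *paddr)
{
    const struct dtlb_entry *e = &dtlb[(vaddr >> 12) & 0x3ffu];

    if (!e->valid || e->tag != (vaddr >> 22))
        return DTLB_MISS;                  /* EID 9 */
    if (is_store && e->read_only)
        return DTLB_FAULT;                 /* EID 10 */
    *paddr = (e->pfn << 12) | (vaddr & 0xfffu);
    return DTLB_HIT;
}
```

For example, with a valid read-only entry at index 1 holding tag 1 and PFN 0x12345, a load from 0x00401abc translates to 0x12345abc, a store to the same address faults, and a load from 0x00801abc (same index, different tag) misses.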
7 CSR registers special behaviours
Upon any exception (*), the PSW CSR is modified automatically by the CPU pipeline itself, and TLB exceptions also update the TLB CSRs:
• PSW[ITLBE] is saved in PSW[EITLBE] and the former is cleared
• PSW[DTLBE] is saved in PSW[EDTLBE] and the former is cleared
• PSW[USR] is saved in PSW[EUSR] and the former is cleared
• TLBVADDR is preloaded with the virtual PFN (page frame number) which caused the exception (in case of TLB miss or fault only)
    - TLBVADDR[0] is set to 1 when the exception is caused by the DTLB, else it is cleared
    - In case of DTLB miss or fault, TLBVADDR[31:12] is preloaded with the virtual PFN whose load or store operation caused the exception
    - In case of ITLB miss, TLBVADDR[31:12] is preloaded with the virtual PFN of the instruction whose fetch caused the exception
    - This mechanism allows for faster TLB miss handling because TLBVADDR already holds the right value
    - Since TLBVADDR is preloaded with the virtual PFN, the page offset bits (TLBVADDR[11:1]) are not set
• TLBBADVADDR (**) is written with a virtual address when an exception is caused by a TLB miss or fault
    - In case of ITLB miss, TLBBADVADDR[31:2] contains the PC address whose fetch triggered the ITLB miss exception. Instructions being 32-bit aligned, PC[1:0] is always 00.
    - In case of DTLB miss or fault, TLBBADVADDR[31:0] contains the virtual address whose load or store operation caused the exception
    - Unlike TLBVADDR, the TLBBADVADDR page offset bits are set according to what caused the exception
(*) In the LM32 pipeline, exceptions are taken in the eXecute stage, even though they may be triggered in the Fetch or Memory stage, for example. Load and store instructions therefore stall the pipeline for 1 cycle during the eXecute stage if the DTLB is activated.
(**) TLBBADVADDR has the same CSR ID as TLBPADDR. The former is read-only and the latter is write-only.
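To make the difference between the two registers concrete, the sketch below computes the values that the description above implies for a given faulting address. It is a model of the documented behaviour only; the function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TLBVADDR after a TLB miss or fault, as described above: virtual PFN in
 * bits 31:12, bit 0 set when the DTLB raised the exception, bits 11:1 clear. */
static inline uint32_t tlbvaddr_after_exception(uint32_t bad_vaddr, bool from_dtlb)
{
    return (bad_vaddr & ~0xfffu) | (from_dtlb ? 1u : 0u);
}

/* TLBBADVADDR keeps the full faulting address, page offset included. */
static inline uint32_t tlbbadvaddr_after_exception(uint32_t bad_vaddr)
{
    return bad_vaddr;
}
```

For a store to 0x00401abc that misses in the DTLB, this gives TLBVADDR = 0x00401001 and TLBBADVADDR = 0x00401abc. A miss handler could test bit 0 of TLBVADDR to tell which TLB raised the exception before looking up the page in its own page table.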
Upon any breakpoint hit, the PSW CSR is also modified by the CPU pipeline:
• PSW[ITLBE] is saved in PSW[BITLBE] and the former is cleared
• PSW[DTLBE] is saved in PSW[BDTLBE] and the former is cleared
• PSW[USR] is saved in PSW[BUSR] and the former is cleared
This means the MMU is turned off upon a CPU exception or breakpoint hit.
Upon return from exception (iret instruction), the PSW CSR is also modified by the CPU pipeline:
• PSW[ITLBE] is restored using the value from PSW[EITLBE]
• PSW[DTLBE] is restored using the value from PSW[EDTLBE]
• PSW[USR] is restored using the value from PSW[EUSR]
Upon return from breakpoint (bret instruction), the PSW CSR is also modified by the CPU pipeline:
• PSW[ITLBE] is restored using the value from PSW[BITLBE]
• PSW[DTLBE] is restored using the value from PSW[BDTLBE]
• PSW[USR] is restored using the value from PSW[BUSR]
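The save and restore rules above follow a regular pattern: each exception backup bit sits one position above its live bit, and each breakpoint backup bit two positions above. The following host-side sketch models that pattern. The names are hypothetical, the IE bits are not modelled, and leaving the backup bits untouched on return is an assumption, since this document does not say whether the hardware clears them.

```c
#include <assert.h>
#include <stdint.h>

#define PSW_ITLBE  (1u << 3)
#define PSW_DTLBE  (1u << 6)
#define PSW_USR    (1u << 9)
#define PSW_LIVE   (PSW_ITLBE | PSW_DTLBE | PSW_USR)

#define PSW_EXC_SHIFT 1u  /* E* backups: bits 4, 7, 10 */
#define PSW_BRK_SHIFT 2u  /* B* backups: bits 5, 8, 11 */

/* Exception or breakpoint entry: save the live bits, then clear them */
static inline uint32_t psw_enter(uint32_t psw, unsigned shift)
{
    uint32_t backup_mask = PSW_LIVE << shift;
    uint32_t saved = (psw & PSW_LIVE) << shift;
    return (psw & ~(PSW_LIVE | backup_mask)) | saved;
}

/* iret or bret: restore the live bits from the matching backups
 * (assumption: the backup bits themselves are left unchanged) */
static inline uint32_t psw_return(uint32_t psw, unsigned shift)
{
    uint32_t restored = (psw >> shift) & PSW_LIVE;
    return (psw & ~PSW_LIVE) | restored;
}
```

Starting from 0x248 (user mode with both TLBs enabled), psw_enter with PSW_EXC_SHIFT yields 0x490: the MMU is off and the CPU is in kernel mode inside the handler. psw_return with the same shift brings the three live bits back.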
Copyright notice
Copyright © 2013 Yann Sionneau.
Permission is granted to copy, distribute and/or modify this document under the terms of the BSD License.