Direct memory access, or DMA, is an approach to input/output that reduces CPU overhead and increases throughput for large or frequent I/O operations. It allows the CPU to perform computations in parallel with those data transfers.

Functioning

DMA lets an external input/output device use a controller that reads and writes data directly at main memory addresses on the machine. The CPU takes no part in the transfers themselves; the addresses and sizes are programmed by the CPU beforehand using programmed input/output (PIO).
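To make that PIO step concrete, here is a minimal C sketch of how a driver might program such a controller through memory-mapped registers. The register layout, base address, and names are invented for illustration; a real controller documents its own.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers; the base
   address and layout are invented for illustration. */
typedef struct {
    volatile uint32_t addr;   /* main-memory address of the buffer   */
    volatile uint32_t size;   /* number of bytes to transfer         */
    volatile uint32_t cmd;    /* writing 1 starts the transfer       */
    volatile uint32_t status; /* non-zero while a transfer is active */
} dma_regs_t;

#define DMA ((dma_regs_t *) 0xFFFF0000u)

/* Programmed I/O: the CPU writes the transfer parameters into the
   controller's registers with ordinary stores. */
void dma_start(uint32_t buffer_addr, uint32_t nbytes) {
    DMA->addr = buffer_addr;
    DMA->size = nbytes;
    DMA->cmd  = 1;   /* the controller now moves the data on its own */
}
```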

This is where the name comes from: the DMA controller in an I/O device directly accesses the machine's memory without consuming the CPU's cycles.

This results in shared memory access, which can be problematic for the same reasons asynchronous code can be in general. There are several ways to deal with this, but the general mechanism is to lock the CPU out of the region being written or read by the DMA controller while the transfer is in progress.
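In software terms, the simplest analogue of that lock-out is the CPU checking a busy bit before touching the shared buffer. A minimal sketch, reusing the hypothetical status register from the block above (real hardware enforces this at the bus level rather than by polling):

```c
#include <stdint.h>

/* Hypothetical DMA status register from the sketch above. */
#define DMA_STATUS ((volatile uint32_t *) 0xFFFF000Cu)

/* The shared buffer is off-limits while a transfer is in progress;
   spin until the controller reports that it is done. */
void wait_for_dma(void) {
    while (*DMA_STATUS != 0)
        ;  /* busy-wait; hardware would instead arbitrate the bus */
}
```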

The controller uses interrupts to signal the CPU when needed, usually on completion of a transfer.

Example

Non-testable

The example below includes information that is not testable; however, it illustrates the functioning of DMA, which is testable.

Scenario

A program wants to render a frame on your screen using the GPU. The CPU prepares the data, and the GPU uses DMA to read that data directly from main memory without bothering the CPU.

CPU Preparation

The CPU has a set of data it wants to pass off to the GPU to render: generally triangles, points, pixel data, etc. It stores this data in a known region of main memory. Vertices (3D points), for example, are stored in what is generally called a vertex buffer.

It then tells the GPU's DMA controller, via PIO, the size of the data and the address it lives at, as well as what to do with it. In this case, that means reading the data in RAM that the CPU just put there and copying it to the GPU's own local memory (VRAM).

The CPU then keeps on chugging along with its next task.
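Putting the preparation step together as a sketch in C: the GPU-side register names, addresses, and command code below are invented for illustration, but the shape of the sequence (fill a buffer, program the controller with PIO, move on) is the point.

```c
#include <stdint.h>

/* Hypothetical GPU DMA registers reachable via PIO. */
#define GPU_DMA_SRC   ((volatile uint32_t *) 0xFFFF1000u)
#define GPU_DMA_SIZE  ((volatile uint32_t *) 0xFFFF1004u)
#define GPU_DMA_CMD   ((volatile uint32_t *) 0xFFFF1008u)
#define CMD_COPY_TO_VRAM 1u

typedef struct { float x, y, z; } vertex_t;

/* The vertex buffer: a known region of main memory. */
#define NUM_VERTICES 3
static vertex_t vertex_buffer[NUM_VERTICES];

void submit_triangle(void) {
    /* 1. CPU prepares the data: one triangle's worth of vertices. */
    vertex_buffer[0] = (vertex_t){  0.0f,  1.0f, 0.0f };
    vertex_buffer[1] = (vertex_t){ -1.0f, -1.0f, 0.0f };
    vertex_buffer[2] = (vertex_t){  1.0f, -1.0f, 0.0f };

    /* 2. CPU tells the GPU's DMA controller, via PIO, where the data
          is, how big it is, and what to do with it. */
    *GPU_DMA_SRC  = (uint32_t)(uintptr_t) vertex_buffer;
    *GPU_DMA_SIZE = sizeof vertex_buffer;
    *GPU_DMA_CMD  = CMD_COPY_TO_VRAM;

    /* 3. CPU keeps chugging along; the copy to VRAM happens without it. */
}
```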

GPU Uses DMA

The GPU then performs the data transfer independently of the CPU. This happens through mechanisms we don't see in CPSC 213, but the important part is that the CPU does none of the copying work.

Completion

Once the GPU is done, it signals completion to the CPU by setting the interrupt flag and writing its device ID into interruptControllerID. The next time the CPU checks for interrupts, it will see that one is pending, pause its current work, and jump to the GPU's interrupt service routine using the interruptVectorBase.
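As a sketch of what that check looks like, here is C-style pseudocode for the CPU's fetch-execute loop. interruptControllerID and interruptVectorBase come from the course's machine model; the flag name, the 4-byte vector entries, and the helper functions are assumptions for illustration.

```c
#include <stdint.h>

extern volatile int      interruptFlag;         /* set by the device  */
extern volatile uint32_t interruptControllerID; /* ID of that device  */
extern uint32_t          interruptVectorBase;   /* table of ISR addrs */

extern uint32_t pc;                             /* program counter    */
extern void     execute_one_instruction(void);
extern uint32_t read_memory(uint32_t addr);
extern void     save_cpu_state(void);           /* registers, old pc  */

void cpu_loop(void) {
    for (;;) {
        execute_one_instruction();
        if (interruptFlag) {          /* checked between instructions */
            interruptFlag = 0;
            save_cpu_state();         /* so work can resume later     */
            /* Look up this device's ISR in the vector table and jump. */
            pc = read_memory(interruptVectorBase
                             + interruptControllerID * 4);
        }
    }
}
```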

Interrupt

The CPU saves its current state into memory before running the interrupt service routine, whatever it may be. Once the ISR completes, the CPU restores its registers from memory and continues along.
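A sketch of that save-and-restore shape, with invented helper functions (a real machine does parts of this in hardware and parts in the ISR's prologue and epilogue):

```c
#include <stdint.h>

#define NUM_REGS 8
static uint32_t saved_regs[NUM_REGS];   /* state goes into memory */

extern uint32_t read_register(int r);               /* hypothetical */
extern void     write_register(int r, uint32_t v);  /* hypothetical */
extern void     handle_gpu_completion(void);        /* the ISR body */

void gpu_isr(void) {
    for (int r = 0; r < NUM_REGS; r++)  /* save current state      */
        saved_regs[r] = read_register(r);

    handle_gpu_completion();            /* whatever it may be      */

    for (int r = 0; r < NUM_REGS; r++)  /* restore and continue    */
        write_register(r, saved_regs[r]);
}
```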

Non-testable

CPU involvement

In reality, as the example above illustrates, the CPU is involved to some extent. It is not involved in the movement of data itself, but it does configure the DMA controller, coordinate memory access and handle completion.

Locking mechanisms

Modern systems don’t actually lock the CPU out of the memory region; instead, they use mechanisms that coordinate access without tanking performance. Keywords to look up: bus arbitration, cache coherence protocols, IOMMU, and double buffering.
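Of those keywords, double buffering is the one that is easy to sketch in software: while the DMA controller drains one buffer, the CPU fills the other, so neither waits for exclusive access. The dma_start_async, dma_wait, and fill helpers below are invented for illustration.

```c
#include <stdint.h>

#define BUF_SIZE 4096
static uint8_t buffers[2][BUF_SIZE];

extern void dma_start_async(uint8_t *buf, uint32_t n); /* hypothetical */
extern void dma_wait(void);                            /* completion   */
extern void fill(uint8_t *buf, uint32_t n);            /* CPU-side work */

void stream(int nchunks) {
    int cur = 0;
    fill(buffers[cur], BUF_SIZE);            /* prime the first buffer */
    for (int i = 0; i < nchunks; i++) {
        dma_start_async(buffers[cur], BUF_SIZE); /* device drains this */
        cur = 1 - cur;
        if (i + 1 < nchunks)
            fill(buffers[cur], BUF_SIZE);    /* CPU fills the other    */
        dma_wait();                          /* sync before reuse      */
    }
}
```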

DMA Modes

There are different ways that DMA can share memory access with the CPU. In burst mode, the controller holds the bus for the entire transfer; in cycle stealing, it takes the bus one word at a time between CPU accesses; in demand mode, it transfers only while the device signals that it has data ready.

Problems with DMA

DMA has one severe issue: it interacts poorly with CPU caches, since it bypasses the CPU entirely. A DMA write to memory can leave the CPU's cache holding stale data, and a DMA read can pick up memory the CPU has modified but not yet written back.
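On systems without hardware coherence for DMA, the usual software workaround looks roughly like the sketch below: write dirty cache lines back to memory before the device reads a buffer, and discard cached copies before the CPU reads what the device wrote. The cache_flush, cache_invalidate, and DMA helpers are hypothetical; real systems expose equivalents through the OS or the hardware manual.

```c
#include <stdint.h>

extern void cache_flush(void *buf, uint32_t n);      /* write back   */
extern void cache_invalidate(void *buf, uint32_t n); /* discard      */
extern void dma_to_device(void *buf, uint32_t n);    /* blocking     */
extern void dma_from_device(void *buf, uint32_t n);  /* blocking     */

void send_buffer(void *buf, uint32_t n) {
    cache_flush(buf, n);      /* memory now matches the CPU's cache  */
    dma_to_device(buf, n);    /* device reads up-to-date memory      */
}

void receive_buffer(void *buf, uint32_t n) {
    dma_from_device(buf, n);  /* device writes straight to memory    */
    cache_invalidate(buf, n); /* drop stale lines before CPU reads   */
}
```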