The Journey of a Frame through a Linux Based System
This article describes, at a high level, the journey a frame takes through a Linux-based system.
NOTE This article does not cover Linux kernel performance issues and caveats. For more information on these, please see pushing the limits of kernel networking.
Let's first look at the path an ingress frame takes (Figure 1).
- Frame is received by the network adapter.
- Frame is moved (via DMA) to an RX ring buffer in kernel memory*.
- The NIC notifies the system that there is a new frame ready for processing by raising a hardware interrupt (IRQ)**.
- The IRQ is cleared on the NIC, and the kernel runs the device driver, which drains the RX ring via SoftIRQs.
- The SoftIRQs place the frames into a kernel data structure called an sk_buff or "skb" (i.e., socket buffer).
- The frame is passed up through the networking stack for further processing.
- The packet finally arrives at the receive socket buffer (for TCP, the free space in this buffer determines the advertised receive window).
- Application calls the read system call. Execution switches from user mode to kernel mode, and the data in the socket buffer is copied into the application's user-space memory.
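*RX ring buffer - the ring buffer is a fixed-size circular buffer (FIFO). Once every slot is in use there is no free space left: a generic circular buffer overwrites the oldest data, whereas a NIC will typically drop newly arriving frames until the driver frees up descriptors.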
**Interrupts - Hardware (also known as top-half) interrupts can be expensive in terms of CPU usage. Each one alerts the CPU to a high-priority condition, requiring the processor to suspend what it is currently doing, save its state, and run an interrupt handler to deal with the event. The hard interrupt handler therefore leaves the majority of packet reception to a software (also known as bottom-half) interrupt process, which can be scheduled more fairly.
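The ring buffer behaviour described in the first footnote can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the class name `RingBuffer`, its `overwrite` flag, and the `dropped` counter are hypothetical and do not correspond to any kernel or driver structure. The flag simply lets you compare the two possible overflow policies (overwrite the oldest entry, or drop the new one).

```python
class RingBuffer:
    """Fixed-size circular FIFO, loosely modelled on an RX ring (illustrative only)."""

    def __init__(self, size, overwrite=False):
        self.slots = [None] * size
        self.size = size
        self.head = 0        # index of the oldest frame (consumer side)
        self.count = 0       # number of occupied slots
        self.overwrite = overwrite
        self.dropped = 0     # frames lost because the ring was full

    def push(self, frame):
        """Producer side (the NIC's DMA engine in this analogy)."""
        if self.count == self.size:
            self.dropped += 1
            if not self.overwrite:
                return False                      # ring full: drop the new frame
            # Overwrite policy: discard the oldest frame to make room.
            self.head = (self.head + 1) % self.size
            self.count -= 1
        tail = (self.head + self.count) % self.size
        self.slots[tail] = frame
        self.count += 1
        return True

    def pop(self):
        """Consumer side (the driver draining the ring in this analogy)."""
        if self.count == 0:
            return None
        frame = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return frame
```

With a two-slot ring, pushing three frames loses exactly one of them under either policy; the policy only decides whether the casualty is the oldest frame or the newest.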
Figure 1 - High Level overview of components involved in Frame Reception/Delivery.
Now let's look at the path an egress frame would take (Figure 1).
- Application performs a write() on the socket.
- Frame is copied from the application's user space to the send socket buffer.
- Frame is passed down the stack to the output queue (qdisc).
- Frame is passed to the device driver, which moves it to a TX ring buffer.
- The device driver invokes the NIC DMA engine to transmit the frame onto the wire.
- Once transmission is complete, the device raises an interrupt to signal transmit completion.
- The device driver's IRQ handler runs and triggers SoftIRQs.
- SoftIRQs unmap the DMA regions and free the packet data.
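The application-facing ends of both paths, the write() that starts egress and the read that completes ingress, can be tried from user space. The following is a minimal sketch, not a full network test: it assumes a Unix-like system and uses `socket.socketpair()`, so no NIC, ring buffer, or qdisc is involved. It only illustrates the copies across the user/kernel boundary. The function name `send_and_receive` is hypothetical.

```python
import os
import socket


def send_and_receive(payload: bytes) -> bytes:
    """Write a payload into one socket and read it back from its peer."""
    # socketpair() returns two connected sockets that live entirely in the
    # kernel, so this never touches a network adapter.
    tx, rx = socket.socketpair()
    try:
        # write(): the kernel copies the payload out of user-space memory
        # into a kernel socket buffer.
        written = os.write(tx.fileno(), payload)

        # read(): the kernel copies data from the receiving socket's buffer
        # back into user-space memory. A stream read may return less than
        # requested, so loop until everything written has been collected.
        chunks = []
        remaining = written
        while remaining > 0:
            chunk = os.read(rx.fileno(), remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        tx.close()
        rx.close()
```

Calling `send_and_receive(b"hello frame")` returns the same bytes that were written. On a real network socket the same two system calls sit at either end of the egress and ingress steps listed above.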
Introduced in Linux kernel 2.4.20, NAPI (New API) reduces the number of hard IRQs raised by network adapters for ingress frames. It works by disabling hard IRQs from the NIC once frames start arriving; packets are then pulled from the RX ring buffer by the NAPI subsystem. Once the RX ring buffer is empty, IRQs from the NIC are re-enabled.
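The effect of this scheme on the interrupt count can be seen in a toy simulation. This is a sketch under simplifying assumptions, not kernel code: the class `NapiSim`, its `budget` parameter, and its counters are hypothetical names, and the model assumes the first frame raises a hard IRQ whose handler then masks further RX IRQs until a poll finds the ring empty.

```python
from collections import deque


class NapiSim:
    """Toy model of interrupt mitigation by polling (illustrative only)."""

    def __init__(self, budget=64):
        self.rx_ring = deque()     # stands in for the RX ring buffer
        self.irq_enabled = True    # whether the NIC may raise a hard IRQ
        self.hard_irqs = 0         # number of hard IRQs raised so far
        self.budget = budget       # max frames handled per poll call
        self.delivered = []        # frames handed up the stack

    def frame_arrives(self, frame):
        """NIC side: place the frame in the ring, raise an IRQ only if enabled."""
        self.rx_ring.append(frame)
        if self.irq_enabled:
            self.hard_irqs += 1
            # The hard IRQ handler masks further RX IRQs and defers the
            # real work to polling.
            self.irq_enabled = False

    def poll(self):
        """Polling side: pull up to `budget` frames from the ring."""
        done = 0
        while self.rx_ring and done < self.budget:
            self.delivered.append(self.rx_ring.popleft())
            done += 1
        if not self.rx_ring:
            # Ring drained: let the NIC interrupt again for the next frame.
            self.irq_enabled = True
        return done
```

In this model a burst of ten frames arriving back to back costs a single hard IRQ; the remaining frames are picked up by successive `poll()` calls, and only a frame that arrives after the ring has been drained raises a second one.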