Buffer management

The basic data structure conveying network data through the regular (in-band) network stack is the socket buffer, aka sk_buff. Dovetail extends its usage to conveying out-of-band I/O packets as well. The related changes fall into two categories:

Socket buffers may be shared between the in-band and out-of-band network stacks, particularly because Dovetail can handle hybrid setups in which a companion core provides an out-of-band network stack that still relies on stock NIC drivers to perform the physical I/O operations.


Allocating out-of-band buffers

A socket buffer is logically split in two parts: the actual payload data, and some metadata describing it which lives in the sk_buff structure. There are several ways to build a socket buffer for a network device using the kernel API, but they all boil down to allocating a struct sk_buff instance and some memory to store the associated payload separately, then pairing them together.

For this reason, Dovetail provides services to allocate sk_buff structures from the out-of-band stage, along with enabling out-of-band operations for the page pool API to allocate and release memory pages.

Out-of-band sk_buff pool

When the in-band network stack initializes and CONFIG_NET_OOB is enabled in the kernel, a pool of sk_buff structures is pre-allocated. This global pool can be accessed from any execution stage to allocate and release them. Typically, this pool is needed when:

  • the companion core wants to send an outgoing packet to the network via some device. In this case, it needs an sk_buff structure to store the metadata describing the associated payload.

  • an oob-capable network device driver wants to refill a buffer slot maintained for its (DMA) RX ring, after a buffer was consumed in order to pass an incoming packet to the network stack.

This pool is directly accessed by the get_oob_skb() and put_oob_skb() services in order to allocate and release sk_buff instances respectively.

The number of pre-allocated buffers in this pool is fixed at system boot and defaults to 1024. This value can be changed by passing the sysctl.net.max_oob_skb=<nr-buffers> argument on the kernel command line.
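The behavior of such a fixed-size pool can be pictured with a small user-space model. This is only a sketch of the semantics described above, not Dovetail's implementation: struct model_skb, MODEL_POOL_SIZE and the model_*() functions are hypothetical names, and the model leaves out the locking which makes the real services thread-safe and immune from stage preemption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the boot-time pool size (1024 by default in the kernel). */
#define MODEL_POOL_SIZE 4

/* Stand-in for struct sk_buff, reduced to a free-list link. */
struct model_skb {
    struct model_skb *next;
};

static struct model_skb pool_storage[MODEL_POOL_SIZE];
static struct model_skb *free_list;

/* Pre-allocate every buffer once; the pool never grows afterwards. */
static void model_pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < MODEL_POOL_SIZE; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* Models get_oob_skb(): pop a buffer, or return NULL if none is left. */
static struct model_skb *model_get_oob_skb(void)
{
    struct model_skb *skb = free_list;

    if (skb) {
        free_list = skb->next;
        skb->next = NULL;
    }
    return skb;
}

/* Models put_oob_skb(): give a buffer back to the pool. */
static void model_put_oob_skb(struct model_skb *skb)
{
    skb->next = free_list;
    free_list = skb;
}
```

Once MODEL_POOL_SIZE buffers are held, model_get_oob_skb() returns NULL until one of them is released, which is why callers of the real service have to handle a NULL return.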

Page pool API

Once CONFIG_NET_OOB - as implemented by Dovetail - is enabled in the kernel, the regular page pool API supports an out-of-band allocation mode. This mode is turned on by setting PP_FLAG_PAGE_OOB in the flags of the parameter block passed to page_pool_create() when creating a new pool. It applies to the whole memory space managed by that particular pool.

The regular page pool API applies to out-of-band pools, with the following restrictions:

  • the main difference with in-band pools stems from the buffer allocation strategy, which does not involve any cache refilling when out-of-band mode is enabled. In this case, the maximum number of available buffers throughout the lifetime of the pool must be fixed at creation time by setting a non-zero pool_size parameter in the page_pool_params structure. The per-pool fast cache is immediately and fully populated with the maximum number of buffers.

  • page fragments cannot be handled by out-of-band pools. Only full pages are returned by the allocator.

Setting PP_FLAG_PAGE_OOB does not prevent pages from being allocated from such a pool when running on the in-band stage. However, the usage restrictions which come with the out-of-band mode still apply in that case.
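As an illustration, a driver might prepare the parameter block of an out-of-band pool along these lines. This is a sketch built so that it compiles in user space: the struct page_pool_params shown here is a stand-in limited to three fields of the kernel structure, the value given to PP_FLAG_PAGE_OOB is a placeholder rather than the real one, and both helper functions are hypothetical. In a driver, the definitions would come from the kernel headers and the block would then be passed to page_pool_create().

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value; the real flag is defined by the kernel headers. */
#define PP_FLAG_PAGE_OOB (1u << 31)

/* Stand-in for the kernel structure, reduced to the fields used here. */
struct page_pool_params {
    unsigned int flags;
    unsigned int order;
    unsigned int pool_size;
};

/* Hypothetical helper: describe a pool of nr_pages out-of-band pages. */
static void rx_ring_fill_oob_params(struct page_pool_params *pp,
                                    unsigned int nr_pages)
{
    pp->flags = PP_FLAG_PAGE_OOB; /* whole pool runs in out-of-band mode */
    pp->order = 0;                /* full pages only, no fragments */
    pp->pool_size = nr_pages;     /* fixed for the lifetime of the pool */
}

/*
 * Hypothetical driver-side sanity check mirroring the documented rule:
 * an out-of-band pool needs a non-zero pool_size. What the kernel itself
 * does with a zero size is not covered by this sketch.
 */
static bool oob_params_ok(const struct page_pool_params *pp)
{
    if (!(pp->flags & PP_FLAG_PAGE_OOB))
        return true;
    return pp->pool_size > 0;
}
```

Since no cache refilling takes place in out-of-band mode, nr_pages should cover the largest number of pages the driver may hold at any time, for instance the depth of its RX ring plus the buffers in flight.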

When CONFIG_NET_OOB is turned on in the kernel configuration, the following buffer-related services are available:


struct sk_buff *get_oob_skb(void)

Allocate an sk_buff structure from the out-of-band pool, returning it to the caller, or NULL if no buffer is available. This call is thread-safe, and immune from stage preemption as well. Calling this service either from the in-band or out-of-band stage is safe, although its normal usage suggests a call from the latter.

The returned buffer is marked as coming from the out-of-band pool, which can be tested using the skb_is_oob() predicate. As a result, put_oob_skb() is automatically invoked when the last reference to this buffer is dropped by the in-band network stack (e.g. kfree_skb()). Otherwise, the companion core should call put_oob_skb() explicitly for the same purpose when it sees fit.


void put_oob_skb(struct sk_buff *skb)

Release skb to the out-of-band pool. This buffer must have been allocated by a previous call to get_oob_skb(). This call is thread-safe, and immune from stage preemption as well. Calling this service either from the in-band or out-of-band stage is safe.

  • skb

    The sk_buff structure to release.


bool skb_is_oob(struct sk_buff *skb)

Return true if skb was allocated from the out-of-band pool. The only way to turn this flag on for a buffer is a previous call to skb_mark_oob(), which get_oob_skb() issues for every buffer it returns.

  • skb

    The sk_buff structure to test.

This predicate is distinct from skb_has_oob_storage(), which checks whether the sk_buff structure conveys the metadata associated with a payload buffer dealt with by the out-of-band network stack.


void skb_mark_oob(struct sk_buff *skb)

Mark skb as managed by the out-of-band network stack. get_oob_skb() issues this call for every buffer it returns. This mark applies to the sk_buff structure only, not to the payload buffer which may be associated with skb; see the related note in skb_is_oob().

  • skb

    The sk_buff structure to mark.


bool skb_has_oob_storage(struct sk_buff *skb)

Return true if skb conveys the metadata associated with a payload buffer managed by the out-of-band network stack implemented by a companion core.

  • skb

    The sk_buff structure conveying the payload buffer to test.

This predicate is distinct from skb_is_oob(), which checks whether the sk_buff structure was allocated from the out-of-band pool for conveying the metadata (not the payload).


void skb_mark_oob_storage(struct sk_buff *skb)

Mark the payload buffer associated with skb as managed by the out-of-band network stack. This is distinct from marking the sk_buff metadata structure itself; see the related note in skb_has_oob_storage().

  • skb

    The sk_buff structure to mark.
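The difference between the two marks can be pictured with a small user-space model, in which each mark is an independent bit. How Dovetail actually stores these marks is not described here: struct model_skb, the MODEL_SKB_* bits and the model_*() functions are hypothetical stand-ins for the services documented above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: one bit per mark. */
#define MODEL_SKB_OOB         (1u << 0) /* metadata comes from the oob pool */
#define MODEL_SKB_OOB_STORAGE (1u << 1) /* payload is managed out-of-band */

/* Stand-in for struct sk_buff, reduced to its marks. */
struct model_skb {
    unsigned int marks;
};

/* Models skb_mark_oob(). */
static void model_mark_oob(struct model_skb *skb)
{
    skb->marks |= MODEL_SKB_OOB;
}

/* Models skb_is_oob(). */
static bool model_is_oob(const struct model_skb *skb)
{
    return (skb->marks & MODEL_SKB_OOB) != 0;
}

/* Models skb_mark_oob_storage(). */
static void model_mark_oob_storage(struct model_skb *skb)
{
    skb->marks |= MODEL_SKB_OOB_STORAGE;
}

/* Models skb_has_oob_storage(). */
static bool model_has_oob_storage(const struct model_skb *skb)
{
    return (skb->marks & MODEL_SKB_OOB_STORAGE) != 0;
}
```

In this model, setting one mark never affects the other, so a buffer may carry either mark, both, or none.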


void finalize_skb_inband(struct sk_buff *skb)

Release skb to the in-band network stack, which disposes of it as it sees fit. The companion core needs such a service in order to hand an unreferenced socket buffer back to the in-band stack for release. No reference may be pending on skb at the time of the call, however the payload buffer may still be attached to it.

  • skb

    The sk_buff structure to finalize.

This service must not be confused with kfree_skb() or its NAPI equivalent, which first drop a reference to skb, then finalize the buffer only if the reference count reached zero. Again, finalize_skb_inband() only deals with buffers which are guaranteed reference-free.


The Dovetail interface relies in part on the companion core for supporting out-of-band memory management, by means of the following weakly bound routines which the core must implement.


__weak void free_skb_oob(struct sk_buff *skb)

A companion core implementing this hook receives an unused socket buffer right after its last reference was dropped. That way, it is given a chance to either release the buffer immediately to its own buffer pool if applicable, or send it back to the in-band network stack if the payload data was allocated in-band, in which case the buffer should be disposed of from there. In other words, the in-band stack hands over unused socket buffers to the companion core using this call when either of the following conditions is true:

  • the caller runs on the out-of-band execution stage.

  • the payload buffer associated with skb is managed by the out-of-band network stack, not the in-band one (i.e. skb_has_oob_storage() yields true).

A typical implementation of this hook would proceed as follows:

  • Check whether skb conveys a payload maintained by the companion core (i.e. not by the in-band stack) using the skb_has_oob_storage() predicate. If so, then release the payload buffer attached to skb internally (to the core), unless it is still shared with other socket buffers (see skb->users refcounting).

  • Otherwise, if skb conveys a payload allocated by the in-band stack, then:

    • If currently running in-band, then finalize skb by an immediate call to finalize_skb_inband().

    • If currently running out-of-band, schedule a call to finalize_skb_inband() by any available means, in order for that call to happen the next time the in-band stage resumes. Using the irq_work deferral mechanism is a typical option for this purpose.
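The decision tree above can be sketched as a user-space model. Only the control flow is meant to be representative: struct sk_buff is a stand-in reduced to two fields, on_inband_stage replaces whatever stage test the core would use, the deferral queue replaces irq_work, core_release_payload() is a hypothetical core-specific routine, and the bodies of skb_has_oob_storage() and finalize_skb_inband() are stubs. The check for payloads still shared between socket buffers is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct sk_buff. */
struct sk_buff {
    bool oob_storage;     /* what skb_has_oob_storage() reports */
    struct sk_buff *next; /* link in the deferral queue */
};

static bool on_inband_stage = true; /* hypothetical stage indicator */
static int nr_core_released;        /* payloads released by the core */
static int nr_finalized;            /* buffers handed back in-band */
static struct sk_buff *deferred;    /* stand-in for irq_work deferral */

/* Stubs standing in for the services documented above. */
static bool skb_has_oob_storage(struct sk_buff *skb)
{
    return skb->oob_storage;
}

static void finalize_skb_inband(struct sk_buff *skb)
{
    (void)skb;
    nr_finalized++;
}

/* Hypothetical: release the payload to the core's own pool. */
static void core_release_payload(struct sk_buff *skb)
{
    (void)skb;
    nr_core_released++;
}

/* Runs once the in-band stage resumes, as an irq_work handler would. */
static void run_deferred_finalize(void)
{
    while (deferred) {
        struct sk_buff *skb = deferred;

        deferred = skb->next;
        finalize_skb_inband(skb);
    }
}

/* Model of the hook; the real one is a non-static override of the weak symbol. */
static void free_skb_oob(struct sk_buff *skb)
{
    if (skb_has_oob_storage(skb)) {
        core_release_payload(skb); /* payload belongs to the core */
        return;
    }

    if (on_inband_stage) {
        finalize_skb_inband(skb);  /* safe to finalize right away */
    } else {
        skb->next = deferred;      /* postpone until in-band resumes */
        deferred = skb;
    }
}
```

The deferral step matters because finalize_skb_inband() hands the buffer to the in-band stack, which cannot be entered from the out-of-band stage.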

  • skb

    The sk_buff structure to be released.

This routine may be called from any execution stage, in-band or out-of-band. Hard irqs are always enabled on call.


Last modified: Mon, 18 Nov 2024 15:05:47 +0100