The basic data structure conveying network data through the regular (in-band) network stack is the socket buffer, aka sk_buff. Dovetail extends its usage to conveying out-of-band I/O packets as well. The related changes fall into two categories:
enabling sk_buff allocation and release from the out-of-band Dovetail stage. This category of services is useful for adapting a stock NIC driver for dealing with out-of-band I/O traffic.
providing hooks for the companion core to interpose on specific events related to memory management, so that it can participate in managing the lifetime of the socket buffers. This category of services is useful for implementing a complete out-of-band network stack like the EVL core does.
Socket buffers may be shared between the in-band and out-of-band network stacks, particularly because Dovetail can handle hybrid setups with a companion core providing an out-of-band network stack which is still using stock NIC drivers for performing the physical I/O operations.
A socket buffer is logically split in two parts: the actual payload data, and some metadata describing it which lives in the sk_buff structure. There are several ways to build a socket buffer for a network device using the kernel API; they all boil down to allocating a struct sk_buff instance and some memory to store the associated payload separately, then pairing them together.
For this reason, Dovetail provides services to allocate sk_buff structures from the out-of-band stage, along with enabling out-of-band operations for the page pool API to allocate and release memory pages.
sk_buff pool
When the in-band network stack initializes and CONFIG_NET_OOB is enabled in the kernel, a pool of sk_buff structures is pre-allocated. This global pool can be accessed from any execution stage to allocate and release them. Typically, this pool is needed when:
the companion core wants to send an outgoing packet to the network via some device; in this case, it needs an sk_buff structure to store the metadata describing the associated payload.
an oob-capable network device driver wants to refill a buffer slot maintained for its (DMA) RX ring, after a buffer was consumed in order to pass an incoming packet to the network stack.
This pool is directly accessed by the get_oob_skb() and put_oob_skb() services in order to allocate and release sk_buff instances respectively.
The number of pre-allocated buffers in this pool is fixed at system boot and defaults to 1024. This value can be changed by passing the sysctl.net.max_oob_skb=<nr-buffers> argument on the kernel command line.
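The behavior of such a fixed-size pool can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every model_* name, the structure layout and the tiny capacity are hypothetical and chosen for illustration. The real pool is also safe to use from both execution stages, which this single-threaded model does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff, reduced to a free-list link. */
struct model_skb {
	struct model_skb *next;
};

#define MODEL_POOL_SIZE 4	/* illustration only; the documented default is 1024 */

static struct model_skb model_storage[MODEL_POOL_SIZE];
static struct model_skb *model_free_list;
static size_t model_free_count;

/* Pre-allocate every buffer once, mimicking what happens at boot. */
void model_pool_init(void)
{
	size_t n;

	model_free_list = NULL;
	model_free_count = 0;
	for (n = 0; n < MODEL_POOL_SIZE; n++) {
		model_storage[n].next = model_free_list;
		model_free_list = &model_storage[n];
		model_free_count++;
	}
}

/* Models get_oob_skb(): pop a buffer, or return NULL if none is left. */
struct model_skb *model_get_oob_skb(void)
{
	struct model_skb *skb = model_free_list;

	if (skb) {
		model_free_list = skb->next;
		skb->next = NULL;
		model_free_count--;
	}

	return skb;
}

/* Models put_oob_skb(): push a buffer back; the pool never grows. */
void model_put_oob_skb(struct model_skb *skb)
{
	assert(skb != NULL);
	assert(model_free_count < MODEL_POOL_SIZE);
	skb->next = model_free_list;
	model_free_list = skb;
	model_free_count++;
}
```

Because the capacity is fixed, a caller has to cope with a NULL return when the pool is exhausted, typically by dropping the packet or retrying once a buffer has been released.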
Once CONFIG_NET_OOB (as implemented by Dovetail) is enabled in the kernel, the regular page pool API supports an out-of-band allocation mode. This mode is turned on by setting PP_FLAG_PAGE_OOB in the flags passed to page_pool_create() via the parameter block to create a new pool. It applies to the whole memory space managed by that particular pool.
The regular page pool API applies to out-of-band pools, except for the following restrictions:
the main difference with in-band pools stems from the buffer allocation strategy, which does not involve any cache refilling when out-of-band mode is enabled. In this case, the maximum number of available buffers throughout the lifetime of the pool must be fixed at creation time by setting a non-zero pool_size parameter in the page_pool_params structure. The per-pool fast cache is immediately and fully populated with the maximum number of buffers.
page fragments cannot be handled by out-of-band pools. Only full pages are returned by the allocator.
Setting PP_FLAG_PAGE_OOB does not preclude allocating pages from such a pool when running on the in-band stage as well; however, the usage restrictions imposed on the user when the out-of-band mode is enabled still apply.
When CONFIG_NET_OOB is turned on in the kernel configuration, the following buffer-related services are available:
Allocate a sk_buff structure from the out-of-band pool, returning it to the caller, or NULL if no buffer is available. This call is thread-safe, and immune from stage preemption as well. Calling this service either from the in-band or out-of-band stage is safe, although its normal usage suggests a call from the latter.
The returned buffer is marked as coming from the out-of-band pool, which can be tested using the skb_is_oob() predicate. As a result, put_oob_skb() is automatically invoked when the last reference to this buffer is dropped by the in-band network stack (e.g. by kfree_skb()). Otherwise, the companion core should call put_oob_skb() explicitly for the same purpose when it sees fit.
Release skb to the out-of-band pool. This buffer must have been allocated by a previous call to get_oob_skb(). This call is thread-safe, and immune from stage preemption as well. Calling this service either from the in-band or out-of-band stage is safe.
The sk_buff structure to release.
Return true if skb was allocated from the out-of-band pool. A previous call to skb_mark_oob() is the only way to turn this flag on for a buffer, which get_oob_skb() issues for every buffer it returns.
The sk_buff structure to test.
This predicate is distinct from skb_has_oob_storage(), which checks whether the sk_buff structure conveys the metadata associated with a payload buffer dealt with by the out-of-band network stack.
Mark skb as managed by the out-of-band network stack. get_oob_skb() issues this call for every buffer it returns. This mark only covers the sk_buff structure, not the payload buffer which may be associated with skb; see the related note in skb_is_oob().
The sk_buff structure to mark.
Return true if skb conveys the metadata associated with a payload buffer managed by the out-of-band network stack implemented by a companion core.
The sk_buff structure conveying the payload buffer to test.
This predicate is distinct from skb_is_oob(), which checks whether the sk_buff structure was allocated from the out-of-band pool for conveying the metadata (not the payload).
Mark the payload buffer associated with skb as managed by the out-of-band network stack. This is distinct from marking the metadata structure skb itself; see the related note in skb_has_oob_storage().
The sk_buff structure to mark.
The companion core needs a way to release a sk_buff structure to some in-band pool, as the in-band stack would see fit. No reference may be pending on skb at the time of the call; however, the payload buffer may still be attached to it. The out-of-band network stack may use this service to pass an unreferenced socket buffer for release to the in-band stack.
The sk_buff structure to finalize.
This service must not be confused with kfree_skb() or its NAPI equivalent, which first drop a reference to skb, then finalize the buffer only if the reference count dropped to zero. Again, finalize_skb_inband() only deals with buffers which are guaranteed reference-free.
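The difference between the two release paths can be sketched with a user-space model. Everything named ref_* below is hypothetical and only mimics the documented contract: one path drops a reference before possibly finalizing, the other one requires that no reference is left on entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: a reference count and a state flag. */
struct ref_skb {
	int users;	/* pending references, modeled after skb->users */
	bool finalized;	/* set once the buffer has been handed back */
};

/* Common release step shared by both paths in this model. */
static void ref_release(struct ref_skb *skb)
{
	skb->finalized = true;
}

/* Models finalize_skb_inband(): the caller guarantees no reference is left. */
void ref_finalize_inband(struct ref_skb *skb)
{
	assert(skb->users == 0);
	ref_release(skb);
}

/* Models kfree_skb(): drop one reference first, release only on the last one. */
void ref_kfree(struct ref_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0)
		ref_release(skb);
}
```

Calling the model of finalize_skb_inband() on a buffer which still has users trips the assertion, which mirrors the requirement stated above.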
The Dovetail interface relies in part on the companion core for supporting out-of-band memory management by means of the following weakly bound routines, which the companion core must implement.
A companion core implementing this hook receives an unused socket buffer right after its last reference was dropped. That way, it is given a chance to either release the buffer immediately to its own buffer pool if applicable, or send it back to the in-band network stack if the payload data was allocated in-band, in which case the buffer should be disposed of from there. In other words, the in-band stack hands over unused socket buffers to the companion core using this call, when either of the following conditions is true:
the caller runs on the out-of-band execution stage.
the payload buffer associated with skb is managed by the out-of-band network stack, not the in-band one (i.e. skb_has_oob_storage() yields true).
A typical logic of this hook would be:
Check whether skb conveys a payload maintained by the companion core (i.e. not by the in-band stack) using the skb_has_oob_storage() predicate. If so, then release the payload buffer attached to skb internally (to the core), unless it is still shared with other socket buffers (see skb->users refcounting).
Otherwise, if skb conveys a payload allocated by the in-band stack, then:
If currently running in-band, then finalize skb by an immediate call to finalize_skb_inband().
If currently running out-of-band, schedule a call to finalize_skb_inband() by any available means, in order for that call to happen the next time the in-band stage resumes. Using the irq_work deferral mechanism is a typical option for this purpose.
The sk_buff structure to be released.
This routine may be called from any execution stage, in-band or out-of-band. Hard irqs are always enabled on call.