author | Paolo Bonzini <pbonzini@redhat.com> | 2017-06-06 16:55:19 +0200 |
---|---|---|
committer | Paolo Bonzini <pbonzini@redhat.com> | 2017-06-15 11:18:39 +0200 |
commit | d59157ea058b55b95f27675b33275ffe0f4c7bd6 | |
tree | 5bd8aae074365a816041cce7c5ab11cb003c7bee /docs/specs | |
parent | 067b913619ac36299be5ab23921fd19a0347df60 |
docs: create interop/ subdirectory
This is for the future interoperability & management guide. It includes
the QAPI docs, including the automatically generated ones, other socket
protocols (vhost-user, VNC), and the qcow2 file format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'docs/specs')
-rw-r--r-- | docs/specs/parallels.txt | 228 |
-rw-r--r-- | docs/specs/qcow2.txt | 581 |
-rw-r--r-- | docs/specs/qed_spec.txt | 138 |
-rw-r--r-- | docs/specs/vhost-user.txt | 620 |
4 files changed, 0 insertions, 1567 deletions
diff --git a/docs/specs/parallels.txt b/docs/specs/parallels.txt deleted file mode 100644 index e9271eba5d..0000000000 --- a/docs/specs/parallels.txt +++ /dev/null @@ -1,228 +0,0 @@ -= License = - -Copyright (c) 2015 Denis Lunev -Copyright (c) 2015 Vladimir Sementsov-Ogievskiy - -This work is licensed under the terms of the GNU GPL, version 2 or later. -See the COPYING file in the top-level directory. - -= Parallels Expandable Image File Format = - -A Parallels expandable image file consists of three consecutive parts: - * header - * BAT - * data area - -All numbers in a Parallels expandable image are stored in little-endian byte -order. - - -== Definitions == - - Sector A 512-byte data chunk. - - Cluster A data chunk of the size specified in the image header. - Currently, the default size is 1MiB (2048 sectors). In previous - versions, cluster sizes of 63 sectors, 256 and 252 kilobytes were - used. - - BAT Block Allocation Table, an entity that contains information for - guest-to-host I/O data address translation. - - -== Header == - -The header is placed at the start of an image and contains the following -fields: - -Bytes: - 0 - 15: magic - Must contain "WithoutFreeSpace" or "WithouFreSpacExt". - - 16 - 19: version - Must be 2. - - 20 - 23: heads - Disk geometry parameter for guest. - - 24 - 27: cylinders - Disk geometry parameter for guest. - - 28 - 31: tracks - Cluster size, in sectors. - - 32 - 35: nb_bat_entries - Disk size, in clusters (BAT size). - - 36 - 43: nb_sectors - Disk size, in sectors. - - For "WithoutFreeSpace" images: - Only the lowest 4 bytes are used. The highest 4 bytes must be - cleared in this case. - - For "WithouFreSpacExt" images, there are no such - restrictions. - - 44 - 47: in_use - Set to 0x746F6E59 when the image is opened by software in R/W - mode; set to 0x312e3276 when the image is closed. 
- - A zero in this field means that the image was opened by an old - version of the software that doesn't support Format Extension - (see below). - - Other values are not allowed. - - 48 - 51: data_off - An offset, in sectors, from the start of the file to the start of - the data area. - - For "WithoutFreeSpace" images: - - If data_off is zero, the offset is calculated as the end of BAT - table plus some padding to ensure sector size alignment. - - If data_off is non-zero, the offset should be aligned to sector - size. However it is recommended to align it to cluster size for - newly created images. - - For "WithouFreSpacExt" images: - data_off must be non-zero and aligned to cluster size. - - 52 - 55: flags - Miscellaneous flags. - - Bit 0: Empty Image bit. If set, the image should be - considered clear. - - Bits 1-31: Unused. - - 56 - 63: ext_off - Format Extension offset, an offset, in sectors, from the start of - the file to the start of the Format Extension Cluster. - - ext_off must meet the same requirements as cluster offsets - defined by BAT entries (see below). - - -== BAT == - -BAT is placed immediately after the image header. In the file, BAT is a -contiguous array of 32-bit unsigned little-endian integers with -(bat_entries * 4) bytes size. - -Each BAT entry contains an offset from the start of the file to the -corresponding cluster. The offset set in clusters for "WithouFreSpacExt" images -and in sectors for "WithoutFreeSpace" images. - -If a BAT entry is zero, the corresponding cluster is not allocated and should -be considered as filled with zeroes. - -Cluster offsets specified by BAT entries must meet the following requirements: - - the value must not be lower than data offset (provided by header.data_off - or calculated as specified above), - - the value must be lower than the desired file size, - - the value must be unique among all BAT entries, - - the result of (cluster offset - data offset) must be aligned to cluster - size. 
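
The BAT rules in the removed parallels.txt can be illustrated with a small Python sketch. It is not taken from the patch or from QEMU: the function names are hypothetical, `cluster_sectors` stands for the header's `tracks` field, and the sketch assumes that guest cluster n is described by BAT entry n, which the text implies (nb_bat_entries is the disk size in clusters) but does not state outright.

```python
SECTOR_SIZE = 512  # bytes, per the Definitions section


def bat_entry_to_file_offset(entry, magic, cluster_sectors):
    """Return the byte offset of a cluster in the file, or None if unallocated.

    cluster_sectors is the cluster size in sectors (header field `tracks`).
    """
    if entry == 0:
        return None  # not allocated: the cluster reads as zeroes
    if magic == b"WithouFreSpacExt":
        sectors = entry * cluster_sectors  # entry is counted in clusters
    elif magic == b"WithoutFreeSpace":
        sectors = entry  # entry is counted in sectors
    else:
        raise ValueError("unknown magic")
    return sectors * SECTOR_SIZE


def guest_to_file_offset(guest_offset, bat, magic, cluster_sectors):
    """Translate a guest byte offset; assumes BAT index == guest cluster index."""
    cluster_bytes = cluster_sectors * SECTOR_SIZE
    index, within = divmod(guest_offset, cluster_bytes)
    base = bat_entry_to_file_offset(bat[index], magic, cluster_sectors)
    return None if base is None else base + within
```

A `None` result means the caller should treat the range as zero-filled rather than read from the file.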
- - -== Data Area == - -The data area is an area from the data offset (provided by header.data_off or -calculated as specified above) to the end of the file. It represents a -contiguous array of clusters. Most of them are allocated by the BAT, some may -be allocated by the ext_off field in the header while other may be allocated by -extensions. All clusters allocated by ext_off and extensions should meet the -same requirements as clusters specified by BAT entries. - - -== Format Extension == - -The Format Extension is an area 1 cluster in size that provides additional -format features. This cluster is addressed by the ext_off field in the header. -The format of the Format Extension area is the following: - - 0 - 7: magic - Must be 0xAB234CEF23DCEA87 - - 8 - 23: m_CheckSum - The MD5 checksum of the entire Header Extension cluster except - the first 24 bytes. - - The above are followed by feature sections or "extensions". The last - extension must be "End of features" (see below). - -Each feature section has the following format: - - 0 - 7: magic - The identifier of the feature: - 0x0000000000000000 - End of features - 0x20385FAE252CB34A - Dirty bitmap - - 8 - 15: flags - External flags for extension: - - Bit 0: NECESSARY - If the software cannot load the extension (due to an - unknown magic number or error), the file should not be - changed. If this flag is unset and there is an error on - loading the extension, said extension should be dropped. - - Bit 1: TRANSIT - If there is an unknown extension with this flag set, - said extension should be left as is. - - If neither NECESSARY nor TRANSIT are set, the extension should be - dropped. - - 16 - 19: data_size - The size of the following feature data, in bytes. - - 20 - 23: unused32 - Align header to 8 bytes boundary. - - variable: data (data_size bytes) - - The above is followed by padding to the next 8 bytes boundary, then the - next extension starts. 
- - The last extension must be "End of features" with all the fields set to 0. - - -=== Dirty bitmaps feature === - -This feature provides a way of storing dirty bitmaps in the image. The fields -of its data area are: - - 0 - 7: size - The bitmap size, should be equal to disk size in sectors. - - 8 - 23: id - An identifier for backup consistency checking. - - 24 - 27: granularity - Bitmap granularity, in sectors. I.e., the number of sectors - corresponding to one bit of the bitmap. Granularity must be - a power of 2. - - 28 - 31: l1_size - The number of entries in the L1 table of the bitmap. - - variable: l1 (64 * l1_size bytes) - L1 offset table (in bytes) - -A dirty bitmap is stored using a one-level structure for the mapping to host -clusters - an L1 table. - -Given an offset in bytes into the bitmap data, the offset in bytes into the -image file can be obtained as follows: - - offset = l1_table[offset / cluster_size] + (offset % cluster_size) - -If an L1 table entry is 0, the corresponding cluster of the bitmap is assumed -to be zero. - -If an L1 table entry is 1, the corresponding cluster of the bitmap is assumed -to have all bits set. - -If an L1 table entry is not 0 or 1, it allocates a cluster from the data area. diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt deleted file mode 100644 index 80cdfd0e91..0000000000 --- a/docs/specs/qcow2.txt +++ /dev/null @@ -1,581 +0,0 @@ -== General == - -A qcow2 image file is organized in units of constant size, which are called -(host) clusters. A cluster is the unit in which all allocations are done, -both for actual guest data and for image metadata. - -Likewise, the virtual disk as seen by the guest is divided into (guest) -clusters of the same size. - -All numbers in qcow2 are stored in Big Endian byte order. 
- - -== Header == - -The first cluster of a qcow2 image contains the file header: - - Byte 0 - 3: magic - QCOW magic string ("QFI\xfb") - - 4 - 7: version - Version number (valid values are 2 and 3) - - 8 - 15: backing_file_offset - Offset into the image file at which the backing file name - is stored (NB: The string is not null terminated). 0 if the - image doesn't have a backing file. - - 16 - 19: backing_file_size - Length of the backing file name in bytes. Must not be - longer than 1023 bytes. Undefined if the image doesn't have - a backing file. - - 20 - 23: cluster_bits - Number of bits that are used for addressing an offset - within a cluster (1 << cluster_bits is the cluster size). - Must not be less than 9 (i.e. 512 byte clusters). - - Note: qemu as of today has an implementation limit of 2 MB - as the maximum cluster size and won't be able to open images - with larger cluster sizes. - - 24 - 31: size - Virtual disk size in bytes - - 32 - 35: crypt_method - 0 for no encryption - 1 for AES encryption - - 36 - 39: l1_size - Number of entries in the active L1 table - - 40 - 47: l1_table_offset - Offset into the image file at which the active L1 table - starts. Must be aligned to a cluster boundary. - - 48 - 55: refcount_table_offset - Offset into the image file at which the refcount table - starts. Must be aligned to a cluster boundary. - - 56 - 59: refcount_table_clusters - Number of clusters that the refcount table occupies - - 60 - 63: nb_snapshots - Number of snapshots contained in the image - - 64 - 71: snapshots_offset - Offset into the image file at which the snapshot table - starts. Must be aligned to a cluster boundary. - -If the version is 3 or higher, the header has the following additional fields. -For version 2, the values are assumed to be zero, unless specified otherwise -in the description of a field. - - 72 - 79: incompatible_features - Bitmask of incompatible features. An implementation must - fail to open an image if an unknown bit is set. 
- - Bit 0: Dirty bit. If this bit is set then refcounts - may be inconsistent, make sure to scan L1/L2 - tables to repair refcounts before accessing the - image. - - Bit 1: Corrupt bit. If this bit is set then any data - structure may be corrupt and the image must not - be written to (unless for regaining - consistency). - - Bits 2-63: Reserved (set to 0) - - 80 - 87: compatible_features - Bitmask of compatible features. An implementation can - safely ignore any unknown bits that are set. - - Bit 0: Lazy refcounts bit. If this bit is set then - lazy refcount updates can be used. This means - marking the image file dirty and postponing - refcount metadata updates. - - Bits 1-63: Reserved (set to 0) - - 88 - 95: autoclear_features - Bitmask of auto-clear features. An implementation may only - write to an image with unknown auto-clear features if it - clears the respective bits from this field first. - - Bit 0: Bitmaps extension bit - This bit indicates consistency for the bitmaps - extension data. - - It is an error if this bit is set without the - bitmaps extension present. - - If the bitmaps extension is present but this - bit is unset, the bitmaps extension data must be - considered inconsistent. - - Bits 1-63: Reserved (set to 0) - - 96 - 99: refcount_order - Describes the width of a reference count block entry (width - in bits: refcount_bits = 1 << refcount_order). For version 2 - images, the order is always assumed to be 4 - (i.e. refcount_bits = 16). - This value may not exceed 6 (i.e. refcount_bits = 64). - - 100 - 103: header_length - Length of the header structure in bytes. For version 2 - images, the length is always assumed to be 72 bytes. - -Directly after the image header, optional sections called header extensions can -be stored. 
Each extension has a structure like the following: - - Byte 0 - 3: Header extension type: - 0x00000000 - End of the header extension area - 0xE2792ACA - Backing file format name - 0x6803f857 - Feature name table - 0x23852875 - Bitmaps extension - other - Unknown header extension, can be safely - ignored - - 4 - 7: Length of the header extension data - - 8 - n: Header extension data - - n - m: Padding to round up the header extension size to the next - multiple of 8. - -Unless stated otherwise, each header extension type shall appear at most once -in the same image. - -If the image has a backing file then the backing file name should be stored in -the remaining space between the end of the header extension area and the end of -the first cluster. It is not allowed to store other data here, so that an -implementation can safely modify the header and add extensions without harming -data of compatible features that it doesn't support. Compatible features that -need space for additional data can use a header extension. - - -== Feature name table == - -The feature name table is an optional header extension that contains the name -for features used by the image. It can be used by applications that don't know -the respective feature (e.g. because the feature was introduced only later) to -display a useful error message. - -The number of entries in the feature name table is determined by the length of -the header extension data. Each entry look like this: - - Byte 0: Type of feature (select feature bitmap) - 0: Incompatible feature - 1: Compatible feature - 2: Autoclear feature - - 1: Bit number within the selected feature bitmap (valid - values: 0-63) - - 2 - 47: Feature name (padded with zeros, but not necessarily null - terminated if it has full length) - - -== Bitmaps extension == - -The bitmaps extension is an optional header extension. It provides the ability -to store bitmaps related to a virtual disk. 
For now, there is only one bitmap -type: the dirty tracking bitmap, which tracks virtual disk changes from some -point in time. - -The data of the extension should be considered consistent only if the -corresponding auto-clear feature bit is set, see autoclear_features above. - -The fields of the bitmaps extension are: - - Byte 0 - 3: nb_bitmaps - The number of bitmaps contained in the image. Must be - greater than or equal to 1. - - Note: Qemu currently only supports up to 65535 bitmaps per - image. - - 4 - 7: Reserved, must be zero. - - 8 - 15: bitmap_directory_size - Size of the bitmap directory in bytes. It is the cumulative - size of all (nb_bitmaps) bitmap headers. - - 16 - 23: bitmap_directory_offset - Offset into the image file at which the bitmap directory - starts. Must be aligned to a cluster boundary. - - -== Host cluster management == - -qcow2 manages the allocation of host clusters by maintaining a reference count -for each host cluster. A refcount of 0 means that the cluster is free, 1 means -that it is used, and >= 2 means that it is used and any write access must -perform a COW (copy on write) operation. - -The refcounts are managed in a two-level table. The first level is called -refcount table and has a variable size (which is stored in the header). The -refcount table can cover multiple clusters, however it needs to be contiguous -in the image file. - -It contains pointers to the second level structures which are called refcount -blocks and are exactly one cluster in size. 
- -Given a offset into the image file, the refcount of its cluster can be obtained -as follows: - - refcount_block_entries = (cluster_size * 8 / refcount_bits) - - refcount_block_index = (offset / cluster_size) % refcount_block_entries - refcount_table_index = (offset / cluster_size) / refcount_block_entries - - refcount_block = load_cluster(refcount_table[refcount_table_index]); - return refcount_block[refcount_block_index]; - -Refcount table entry: - - Bit 0 - 8: Reserved (set to 0) - - 9 - 63: Bits 9-63 of the offset into the image file at which the - refcount block starts. Must be aligned to a cluster - boundary. - - If this is 0, the corresponding refcount block has not yet - been allocated. All refcounts managed by this refcount block - are 0. - -Refcount block entry (x = refcount_bits - 1): - - Bit 0 - x: Reference count of the cluster. If refcount_bits implies a - sub-byte width, note that bit 0 means the least significant - bit in this context. - - -== Cluster mapping == - -Just as for refcounts, qcow2 uses a two-level structure for the mapping of -guest clusters to host clusters. They are called L1 and L2 table. - -The L1 table has a variable size (stored in the header) and may use multiple -clusters, however it must be contiguous in the image file. L2 tables are -exactly one cluster in size. - -Given a offset into the virtual disk, the offset into the image file can be -obtained as follows: - - l2_entries = (cluster_size / sizeof(uint64_t)) - - l2_index = (offset / cluster_size) % l2_entries - l1_index = (offset / cluster_size) / l2_entries - - l2_table = load_cluster(l1_table[l1_index]); - cluster_offset = l2_table[l2_index]; - - return cluster_offset + (offset % cluster_size) - -L1 table entry: - - Bit 0 - 8: Reserved (set to 0) - - 9 - 55: Bits 9-55 of the offset into the image file at which the L2 - table starts. Must be aligned to a cluster boundary. If the - offset is 0, the L2 table and all clusters described by this - L2 table are unallocated. 
- - 56 - 62: Reserved (set to 0) - - 63: 0 for an L2 table that is unused or requires COW, 1 if its - refcount is exactly one. This information is only accurate - in the active L1 table. - -L2 table entry: - - Bit 0 - 61: Cluster descriptor - - 62: 0 for standard clusters - 1 for compressed clusters - - 63: 0 for a cluster that is unused or requires COW, 1 if its - refcount is exactly one. This information is only accurate - in L2 tables that are reachable from the active L1 - table. - -Standard Cluster Descriptor: - - Bit 0: If set to 1, the cluster reads as all zeros. The host - cluster offset can be used to describe a preallocation, - but it won't be used for reading data from this cluster, - nor is data read from the backing file if the cluster is - unallocated. - - With version 2, this is always 0. - - 1 - 8: Reserved (set to 0) - - 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a - cluster boundary. If the offset is 0, the cluster is - unallocated. - - 56 - 61: Reserved (set to 0) - - -Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)): - - Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a - cluster boundary! - - x+1 - 61: Compressed size of the images in sectors of 512 bytes - -If a cluster is unallocated, read requests shall read the data from the backing -file (except if bit 0 in the Standard Cluster Descriptor is set). If there is -no backing file or the backing file is smaller than the image, they shall read -zeros for all parts that are not covered by the backing file. - - -== Snapshots == - -qcow2 supports internal snapshots. Their basic principle of operation is to -switch the active L1 table, so that a different set of host clusters are -exposed to the guest. - -When creating a snapshot, the L1 table should be copied and the refcount of all -L2 tables and clusters reachable from this L1 table must be increased, so that -a write causes a COW and isn't visible in other snapshots. 
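
The snapshot-creation rule just quoted (copy the L1 table, then raise the refcount of every reachable L2 table and cluster) can be sketched in Python over an in-memory model. The data layout and the name `create_snapshot` are assumptions made for illustration. The sketch handles standard clusters only (bit 62 clear) and does not update bit 63 of the entries, which a real implementation would also have to keep consistent with the new refcounts.

```python
COPIED = 1 << 63                          # bit 63 of L1/L2 entries
OFFSET_MASK = ((1 << 56) - 1) & ~0x1FF    # bits 9-55: cluster-aligned offset


def create_snapshot(l1_table, l2_tables, refcounts):
    """Copy the active L1 table and bump refcounts of everything it reaches.

    l1_table:  list of raw 64-bit L1 entries
    l2_tables: dict mapping L2 table offset -> list of raw 64-bit L2 entries
    refcounts: dict mapping host cluster offset -> reference count
    """
    snapshot_l1 = list(l1_table)
    for l1_entry in l1_table:
        l2_offset = l1_entry & OFFSET_MASK
        if l2_offset == 0:
            continue  # L2 table and everything below it is unallocated
        refcounts[l2_offset] += 1
        for l2_entry in l2_tables[l2_offset]:
            cluster = l2_entry & OFFSET_MASK
            if cluster:
                refcounts[cluster] += 1
    return snapshot_l1
```

Once the refcounts are 2 or more, a later guest write to any of these clusters has to go through copy-on-write, which is what keeps the snapshot's view unchanged.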
- -When loading a snapshot, bit 63 of all entries in the new active L1 table and -all L2 tables referenced by it must be reconstructed from the refcount table -as it doesn't need to be accurate in inactive L1 tables. - -A directory of all snapshots is stored in the snapshot table, a contiguous area -in the image file, whose starting offset and length are given by the header -fields snapshots_offset and nb_snapshots. The entries of the snapshot table -have variable length, depending on the length of ID, name and extra data. - -Snapshot table entry: - - Byte 0 - 7: Offset into the image file at which the L1 table for the - snapshot starts. Must be aligned to a cluster boundary. - - 8 - 11: Number of entries in the L1 table of the snapshots - - 12 - 13: Length of the unique ID string describing the snapshot - - 14 - 15: Length of the name of the snapshot - - 16 - 19: Time at which the snapshot was taken in seconds since the - Epoch - - 20 - 23: Subsecond part of the time at which the snapshot was taken - in nanoseconds - - 24 - 31: Time that the guest was running until the snapshot was - taken in nanoseconds - - 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. - If there is VM state, it starts at the first cluster - described by first L1 table entry that doesn't describe a - regular guest cluster (i.e. VM state is stored like guest - disk content, except that it is stored at offsets that are - larger than the virtual disk presented to the guest) - - 36 - 39: Size of extra data in the table entry (used for future - extensions of the format) - - variable: Extra data for future extensions. Unknown fields must be - ignored. Currently defined are (offset relative to snapshot - table entry): - - Byte 40 - 47: Size of the VM state in bytes. 0 if no VM - state is saved. If this field is present, - the 32-bit value in bytes 32-35 is ignored. 
- - Byte 48 - 55: Virtual disk size of the snapshot in bytes - - Version 3 images must include extra data at least up to - byte 55. - - variable: Unique ID string for the snapshot (not null terminated) - - variable: Name of the snapshot (not null terminated) - - variable: Padding to round up the snapshot table entry size to the - next multiple of 8. - - -== Bitmaps == - -As mentioned above, the bitmaps extension provides the ability to store bitmaps -related to a virtual disk. This section describes how these bitmaps are stored. - -All stored bitmaps are related to the virtual disk stored in the same image, so -each bitmap size is equal to the virtual disk size. - -Each bit of the bitmap is responsible for strictly defined range of the virtual -disk. For bit number bit_nr the corresponding range (in bytes) will be: - - [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1] - -Granularity is a property of the concrete bitmap, see below. - - -=== Bitmap directory === - -Each bitmap saved in the image is described in a bitmap directory entry. The -bitmap directory is a contiguous area in the image file, whose starting offset -and length are given by the header extension fields bitmap_directory_offset and -bitmap_directory_size. The entries of the bitmap directory have variable -length, depending on the lengths of the bitmap name and extra data. These -entries are also called bitmap headers. - -Structure of a bitmap directory entry: - - Byte 0 - 7: bitmap_table_offset - Offset into the image file at which the bitmap table - (described below) for the bitmap starts. Must be aligned to - a cluster boundary. - - 8 - 11: bitmap_table_size - Number of entries in the bitmap table of the bitmap. - - 12 - 15: flags - Bit - 0: in_use - The bitmap was not saved correctly and may be - inconsistent. 
- - 1: auto - The bitmap must reflect all changes of the virtual - disk by any application that would write to this qcow2 - file (including writes, snapshot switching, etc.). The - type of this bitmap must be 'dirty tracking bitmap'. - - 2: extra_data_compatible - This flags is meaningful when the extra data is - unknown to the software (currently any extra data is - unknown to Qemu). - If it is set, the bitmap may be used as expected, extra - data must be left as is. - If it is not set, the bitmap must not be used, but - both it and its extra data be left as is. - - Bits 3 - 31 are reserved and must be 0. - - 16: type - This field describes the sort of the bitmap. - Values: - 1: Dirty tracking bitmap - - Values 0, 2 - 255 are reserved. - - 17: granularity_bits - Granularity bits. Valid values: 0 - 63. - - Note: Qemu currently doesn't support granularity_bits - greater than 31. - - Granularity is calculated as - granularity = 1 << granularity_bits - - A bitmap's granularity is how many bytes of the image - accounts for one bit of the bitmap. - - 18 - 19: name_size - Size of the bitmap name. Must be non-zero. - - Note: Qemu currently doesn't support values greater than - 1023. - - 20 - 23: extra_data_size - Size of type-specific extra data. - - For now, as no extra data is defined, extra_data_size is - reserved and should be zero. If it is non-zero the - behavior is defined by extra_data_compatible flag. - - variable: extra_data - Extra data for the bitmap, occupying extra_data_size bytes. - Extra data must never contain references to clusters or in - some other way allocate additional clusters. - - variable: name - The name of the bitmap (not null terminated), occupying - name_size bytes. Must be unique among all bitmap names - within the bitmaps extension. - - variable: Padding to round up the bitmap directory entry size to the - next multiple of 8. All bytes of the padding must be zero. 
- - -=== Bitmap table === - -Each bitmap is stored using a one-level structure (as opposed to two-level -structures like for refcounts and guest clusters mapping) for the mapping of -bitmap data to host clusters. This structure is called the bitmap table. - -Each bitmap table has a variable size (stored in the bitmap directory entry) -and may use multiple clusters, however, it must be contiguous in the image -file. - -Structure of a bitmap table entry: - - Bit 0: Reserved and must be zero if bits 9 - 55 are non-zero. - If bits 9 - 55 are zero: - 0: Cluster should be read as all zeros. - 1: Cluster should be read as all ones. - - 1 - 8: Reserved and must be zero. - - 9 - 55: Bits 9 - 55 of the host cluster offset. Must be aligned to - a cluster boundary. If the offset is 0, the cluster is - unallocated; in that case, bit 0 determines how this - cluster should be treated during reads. - - 56 - 63: Reserved and must be zero. - - -=== Bitmap data === - -As noted above, bitmap data is stored in separate clusters, described by the -bitmap table. Given an offset (in bytes) into the bitmap data, the offset into -the image file can be obtained as follows: - - image_offset(bitmap_data_offset) = - bitmap_table[bitmap_data_offset / cluster_size] + - (bitmap_data_offset % cluster_size) - -This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see -above). - -Given an offset byte_nr into the virtual disk and the bitmap's granularity, the -bit offset into the image file to the corresponding bit of the bitmap can be -calculated like this: - - bit_offset(byte_nr) = - image_offset(byte_nr / granularity / 8) * 8 + - (byte_nr / granularity) % 8 - -If the size of the bitmap data is not a multiple of the cluster size then the -last cluster of the bitmap data contains some unused tail bits. These bits must -be zero. - - -=== Dirty tracking bitmaps === - -Bitmaps with 'type' field equal to one are dirty tracking bitmaps. 
- -When the virtual disk is in use dirty tracking bitmap may be 'enabled' or -'disabled'. While the bitmap is 'enabled', all writes to the virtual disk -should be reflected in the bitmap. A set bit in the bitmap means that the -corresponding range of the virtual disk (see above) was written to while the -bitmap was 'enabled'. An unset bit means that this range was not written to. - -The software doesn't have to sync the bitmap in the image file with its -representation in RAM after each write. Flag 'in_use' should be set while the -bitmap is not synced. - -In the image file the 'enabled' state is reflected by the 'auto' flag. If this -flag is set, the software must consider the bitmap as 'enabled' and start -tracking virtual disk changes to this bitmap from the first write to the -virtual disk. If this flag is not set then the bitmap is disabled. diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt deleted file mode 100644 index 7982e058b2..0000000000 --- a/docs/specs/qed_spec.txt +++ /dev/null @@ -1,138 +0,0 @@ -=Specification= - -The file format looks like this: - - +----------+----------+----------+-----+ - | cluster0 | cluster1 | cluster2 | ... | - +----------+----------+----------+-----+ - -The first cluster begins with the '''header'''. The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file. A regular cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''. L1 and L2 tables are composed of one or more contiguous clusters. - -Normally the file size will be a multiple of the cluster size. If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written. Legitimate extra information should use space between the header and the first regular cluster. - -All fields are little-endian. 
- -==Header== - Header { - uint32_t magic; /* QED\0 */ - - uint32_t cluster_size; /* in bytes */ - uint32_t table_size; /* for L1 and L2 tables, in clusters */ - uint32_t header_size; /* in clusters */ - - uint64_t features; /* format feature bits */ - uint64_t compat_features; /* compat feature bits */ - uint64_t autoclear_features; /* self-resetting feature bits */ - - uint64_t l1_table_offset; /* in bytes */ - uint64_t image_size; /* total logical image size, in bytes */ - - /* if (features & QED_F_BACKING_FILE) */ - uint32_t backing_filename_offset; /* in bytes from start of header */ - uint32_t backing_filename_size; /* in bytes */ - } - -Field descriptions: -* ''cluster_size'' must be a power of 2 in range [2^12, 2^26]. -* ''table_size'' must be a power of 2 in range [1, 16]. -* ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters. -* ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps. They work as follows: -** An image with unknown ''features'' bits enabled must not be opened. File format changes that are not backwards-compatible must use ''features'' bits. -** An image with unknown ''compat_features'' bits enabled can be opened safely. The unknown features are simply ignored and represent backwards-compatible changes to the file format. -** An image with unknown ''autoclear_features'' bits enable can be opened safely after clearing the unknown bits. This allows for backwards-compatible changes to the file format which degrade gracefully and can be re-enabled again by a new program later. -* ''l1_table_offset'' is the offset of the first byte of the L1 table in the image file and must be a multiple of ''cluster_size''. -* ''image_size'' is the block device size seen by the guest and must be a multiple of 512 bytes. -* ''backing_filename_offset'' and ''backing_filename_size'' describe a string in (byte offset, byte size) form. 
It is not NUL-terminated and has no alignment constraints. The string must be stored within the first ''header_size'' clusters. The backing filename may be an absolute path or relative to the image file. - -Feature bits: -* QED_F_BACKING_FILE = 0x01. The image uses a backing file. -* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use. -* QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk image and no file format autodetection should be attempted. This should be used to ensure that raw backing files are never detected as an image format if they happen to contain magic constants. - -There are currently no defined ''compat_features'' or ''autoclear_features'' bits. - -Fields predicated on a feature bit are only used when that feature is set. The fields always take up header space, regardless of whether or not the feature bit is set. - -==Tables== - -Tables provide the translation from logical offsets in the block device to cluster offsets in the file. - - #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t)) - - Table { - uint64_t offsets[TABLE_NOFFSETS]; - } - -The tables are organized as follows: - - +----------+ - | L1 table | - +----------+ - ,------' | '------. - +----------+ | +----------+ - | L2 table | ... | L2 table | - +----------+ +----------+ - ,------' | '------. - +----------+ | +----------+ - | Data | ... | Data | - +----------+ +----------+ - -A table is made up of one or more contiguous clusters. The table_size header field determines table size for an image file. For example, cluster_size=64 KB and table_size=4 results in 256 KB tables. - -The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table: - header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size - -L1, L2, and data cluster offsets must be aligned to header.cluster_size. The following offsets have special meanings: - -===L2 table offsets=== -* 0 - unallocated. 
The L2 table is not yet allocated.
-
-===Data cluster offsets===
-* 0 - unallocated. The data cluster is not yet allocated.
-* 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.
-
-Future format extensions may wish to store per-offset information. The least significant 12 bits of an offset are reserved for this purpose and must be set to zero. Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.
-
-===Unallocated L2 tables and data clusters===
-Reads to an unallocated area of the image file access the backing file. If there is no backing file, then zeroes are produced. The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.
-
-Writes to an unallocated area cause a new data cluster to be allocated, and a new L2 table if that is also unallocated. The new data cluster is populated with data from the backing file (or zeroes if no backing file) and the data being written.
-
-===Zero data clusters===
-Zero data clusters are a space-efficient way of storing zeroed regions of the image.
-
-Reads to a zero data cluster produce zeroes. Note that the difference between an unallocated and a zero data cluster is that zero data clusters stop the reading of contents from the backing file.
-
-Writes to a zero data cluster cause a new data cluster to be allocated. The new data cluster is populated with zeroes and the data being written.
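The read rules for unallocated and zero data clusters can be sketched as follows. This is only an illustration, not QEMU's implementation: the function name `read_cluster` and the use of in-memory `bytes` objects to stand in for the image file and the backing file are assumptions made for the example.

```python
UNALLOCATED = 0  # data cluster offset value: cluster not yet allocated
ZERO = 1         # data cluster offset value: contents are all zeroes


def read_cluster(entry, logical_offset, cluster_size, image, backing=None):
    """Return the cluster_size bytes a guest would read for one L2 entry.

    entry          -- the data cluster offset stored in the L2 table
    logical_offset -- cluster-aligned offset in the guest-visible device
    image          -- bytes standing in for the image file (hypothetical)
    backing        -- bytes standing in for the backing file, or None
    """
    if entry == ZERO:
        # Zero clusters never consult the backing file.
        return bytes(cluster_size)
    if entry == UNALLOCATED:
        if backing is None:
            return bytes(cluster_size)
        # Reads beyond the end of a short backing file produce zeroes.
        data = backing[logical_offset:logical_offset + cluster_size]
        return data.ljust(cluster_size, b"\0")
    # Allocated cluster: the entry is a byte offset into the image file.
    return image[entry:entry + cluster_size]
```

The only difference between the two special values is whether the backing file is consulted, which is why the zero case is checked first and returns without looking at `backing`.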
-
-===Logical offset translation===
-Logical offsets are translated into cluster offsets as follows:
-
-  table_bits table_bits    cluster_bits
-  <--------> <--------> <--------------->
- +----------+----------+-----------------+
- | L1 index | L2 index |   byte offset   |
- +----------+----------+-----------------+
-
-       Structure of a logical offset
-
- offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
-
- def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
-   l2_offset = l1_table[l1_index]
-   l2_table = load_table(l2_offset)
-   cluster_offset = l2_table[l2_index] & offset_mask
-   return cluster_offset + byte_offset
-
-==Consistency checking==
-
-This section is informational and included to provide background on the use of the QED_F_NEED_CHECK ''features'' bit.
-
-The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an operation that could leave the image in an inconsistent state if interrupted by a crash or power failure. A dirty image must be checked on open because its metadata may not be consistent.
-
-Consistency check includes the following invariants:
-# Each cluster is referenced once and only once. It is an inconsistency to have a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked if it has no references.
-# Offsets must be within the image file size and must be ''cluster_size'' aligned.
-# Table offsets must be at least ''table_size'' * ''cluster_size'' bytes from the end of the image file so that there is space for the entire table.
-
-The consistency check process starts from ''l1_table_offset'' and scans all L2 tables. After the check completes with no other errors besides leaks, the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
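The logical offset layout from the "Logical offset translation" section can be made concrete with a short sketch that splits a guest offset into its three fields. The function names are hypothetical, and the derivation of `table_bits` as log2 of TABLE_NOFFSETS is an inference from the diagram and the TABLE_NOFFSETS definition rather than something the specification spells out.

```python
def table_noffsets(cluster_size, table_size):
    # TABLE_NOFFSETS = table_size * cluster_size / sizeof(uint64_t)
    return table_size * cluster_size // 8


def split_logical_offset(offset, cluster_size, table_size):
    """Split a logical offset into (l1_index, l2_index, byte_offset).

    Assumes cluster_size and table_size are powers of two, as the header
    field descriptions require, so every field is a whole number of bits.
    """
    entries = table_noffsets(cluster_size, table_size)
    cluster_bits = cluster_size.bit_length() - 1  # log2(cluster_size)
    table_bits = entries.bit_length() - 1         # log2(TABLE_NOFFSETS)
    byte_offset = offset & (cluster_size - 1)
    l2_index = (offset >> cluster_bits) & (entries - 1)
    l1_index = offset >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset


def max_image_size(cluster_size, table_size):
    # Upper bound on header.image_size:
    # TABLE_NOFFSETS * TABLE_NOFFSETS * cluster_size
    entries = table_noffsets(cluster_size, table_size)
    return entries * entries * cluster_size
```

With the specification's own example of cluster_size=64 KB and table_size=4, each table holds 32768 offsets, so the L1 and L2 indexes are 15 bits each and the byte offset is 16 bits.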
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt deleted file mode 100644 index 481ab56e35..0000000000 --- a/docs/specs/vhost-user.txt +++ /dev/null @@ -1,620 +0,0 @@ -Vhost-user Protocol -=================== - -Copyright (c) 2014 Virtual Open Systems Sarl. - -This work is licensed under the terms of the GNU GPL, version 2 or later. -See the COPYING file in the top-level directory. -=================== - -This protocol is aiming to complement the ioctl interface used to control the -vhost implementation in the Linux kernel. It implements the control plane needed -to establish virtqueue sharing with a user space process on the same host. It -uses communication over a Unix domain socket to share file descriptors in the -ancillary data of the message. - -The protocol defines 2 sides of the communication, master and slave. Master is -the application that shares its virtqueues, in our case QEMU. Slave is the -consumer of the virtqueues. - -In the current implementation QEMU is the Master, and the Slave is intended to -be a software Ethernet switch running in user space, such as Snabbswitch. - -Master and slave can be either a client (i.e. connecting) or server (listening) -in the socket communication. - -Message Specification ---------------------- - -Note that all numbers are in the machine native byte order. A vhost-user message -consists of 3 header fields and a payload: - ------------------------------------- -| request | flags | size | payload | ------------------------------------- - - * Request: 32-bit type of the request - * Flags: 32-bit bit field: - - Lower 2 bits are the version (currently 0x01) - - Bit 2 is the reply flag - needs to be sent on each reply from the slave - - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for - details. 
- * Size - 32-bit size of the payload - - -Depending on the request type, payload can be: - - * A single 64-bit integer - ------- - | u64 | - ------- - - u64: a 64-bit unsigned integer - - * A vring state description - --------------- - | index | num | - --------------- - - Index: a 32-bit index - Num: a 32-bit number - - * A vring address description - -------------------------------------------------------------- - | index | flags | size | descriptor | used | available | log | - -------------------------------------------------------------- - - Index: a 32-bit vring index - Flags: a 32-bit vring flags - Descriptor: a 64-bit user address of the vring descriptor table - Used: a 64-bit user address of the vring used ring - Available: a 64-bit user address of the vring available ring - Log: a 64-bit guest address for logging - - * Memory regions description - --------------------------------------------------- - | num regions | padding | region0 | ... | region7 | - --------------------------------------------------- - - Num regions: a 32-bit number of regions - Padding: 32-bit - - A region is: - ----------------------------------------------------- - | guest address | size | user address | mmap offset | - ----------------------------------------------------- - - Guest address: a 64-bit guest address of the region - Size: a 64-bit size - User address: a 64-bit user address - mmap offset: 64-bit offset where region starts in the mapped memory - -* Log description - --------------------------- - | log size | log offset | - --------------------------- - log size: size of area used for logging - log offset: offset from start of supplied file descriptor - where logging starts (i.e. 
where guest address 0 would be logged) - - * An IOTLB message - --------------------------------------------------------- - | iova | size | user address | permissions flags | type | - --------------------------------------------------------- - - IOVA: a 64-bit I/O virtual address programmed by the guest - Size: a 64-bit size - User address: a 64-bit user address - Permissions: a 8-bit value: - - 0: No access - - 1: Read access - - 2: Write access - - 3: Read/Write access - Type: a 8-bit IOTLB message type: - - 1: IOTLB miss - - 2: IOTLB update - - 3: IOTLB invalidate - - 4: IOTLB access fail - -In QEMU the vhost-user message is implemented with the following struct: - -typedef struct VhostUserMsg { - VhostUserRequest request; - uint32_t flags; - uint32_t size; - union { - uint64_t u64; - struct vhost_vring_state state; - struct vhost_vring_addr addr; - VhostUserMemory memory; - VhostUserLog log; - struct vhost_iotlb_msg iotlb; - }; -} QEMU_PACKED VhostUserMsg; - -Communication -------------- - -The protocol for vhost-user is based on the existing implementation of vhost -for the Linux Kernel. Most messages that can be sent via the Unix domain socket -implementing vhost-user have an equivalent ioctl to the kernel implementation. - -The communication consists of master sending message requests and slave sending -message replies. Most of the requests don't require replies. Here is a list of -the ones that do: - - * VHOST_USER_GET_FEATURES - * VHOST_USER_GET_PROTOCOL_FEATURES - * VHOST_USER_GET_VRING_BASE - * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) - -[ Also see the section on REPLY_ACK protocol extension. 
] - -There are several messages that the master sends with file descriptors passed -in the ancillary data: - - * VHOST_USER_SET_MEM_TABLE - * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD) - * VHOST_USER_SET_LOG_FD - * VHOST_USER_SET_VRING_KICK - * VHOST_USER_SET_VRING_CALL - * VHOST_USER_SET_VRING_ERR - * VHOST_USER_SET_SLAVE_REQ_FD - -If Master is unable to send the full message or receives a wrong reply it will -close the connection. An optional reconnection mechanism can be implemented. - -Any protocol extensions are gated by protocol feature bits, -which allows full backwards compatibility on both master -and slave. -As older slaves don't support negotiating protocol features, -a feature bit was dedicated for this purpose: -#define VHOST_USER_F_PROTOCOL_FEATURES 30 - -Starting and stopping rings ----------------------- -Client must only process each ring when it is started. - -Client must only pass data between the ring and the -backend, when the ring is enabled. - -If ring is started but disabled, client must process the -ring without talking to the backend. - -For example, for a networking device, in the disabled state -client must not supply any new RX packets, but must process -and discard any TX packets. - -If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized -in an enabled state. - -If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized -in a disabled state. Client must not pass data to/from the backend until ring is enabled by -VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled by -VHOST_USER_SET_VRING_ENABLE with parameter 0. - -Each ring is initialized in a stopped state, client must not process it until -ring is started, or after it has been stopped. 
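The started/enabled rules above can be summarised as a small state model. This is a sketch under stated assumptions: `RingState`, `new_ring`, `may_process` and `may_use_backend` are hypothetical names, not part of QEMU or of any vhost-user library.

```python
from dataclasses import dataclass


@dataclass
class RingState:
    started: bool = False  # every ring is initialized in a stopped state
    enabled: bool = True


def new_ring(protocol_features_negotiated):
    # Without VHOST_USER_F_PROTOCOL_FEATURES the ring starts out enabled;
    # with it, the ring stays disabled until VHOST_USER_SET_VRING_ENABLE
    # is received with parameter 1.
    return RingState(started=False, enabled=not protocol_features_negotiated)


def may_process(ring):
    # The client may only touch the ring while it is started.
    return ring.started


def may_use_backend(ring):
    # Data may only flow between ring and backend when the ring is
    # both started and enabled.
    return ring.started and ring.enabled
```

A ring for which `may_process` is true but `may_use_backend` is false corresponds to the "started but disabled" case: for a networking device, TX packets are consumed and discarded and no RX packets are supplied.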
- -Client must start ring upon receiving a kick (that is, detecting that file -descriptor is readable) on the descriptor specified by -VHOST_USER_SET_VRING_KICK, and stop ring upon receiving -VHOST_USER_GET_VRING_BASE. - -While processing the rings (whether they are enabled or not), client must -support changing some configuration aspects on the fly. - -Multiple queue support ----------------------- - -Multiple queue is treated as a protocol extension, hence the slave has to -implement protocol features first. The multiple queues feature is supported -only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set. - -The max number of queues the slave supports can be queried with message -VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of -requested queues is bigger than that. - -As all queues share one connection, the master uses a unique index for each -queue in the sent message to identify a specified queue. One queue pair -is enabled initially. More queues are enabled dynamically, by sending -message VHOST_USER_SET_VRING_ENABLE. - -Migration ---------- - -During live migration, the master may need to track the modifications -the slave makes to the memory mapped regions. The client should mark -the dirty pages in a log. Once it complies to this logging, it may -declare the VHOST_F_LOG_ALL vhost feature. - -To start/stop logging of data/used ring writes, server may send messages -VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with -VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively. - -All the modifications to memory pointed by vring "descriptor" should -be marked. Modifications to "used" vring should be marked if -VHOST_VRING_F_LOG is part of ring's flags. - -Dirty pages are of size: -#define VHOST_LOG_PAGE 0x1000 - -The log memory fd is provided in the ancillary data of -VHOST_USER_SET_LOG_BASE message when the slave has -VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature. 
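To make the page granularity concrete, the sketch below marks every VHOST_LOG_PAGE-sized page touched by a write of a given length. It assumes one bit per page with the least significant bit first in each byte, which is the layout used by the specification's own pseudo-code; `mark_dirty` and the `bytearray` standing in for the mapped log are hypothetical. It is single-threaded and does not use the atomic operations a real client sharing the log would need.

```python
VHOST_LOG_PAGE = 0x1000  # dirty page size from the specification


def mark_dirty(log, addr, length):
    """Set the log bit of every page overlapping [addr, addr + length).

    log  -- bytearray standing in for the shared log area (hypothetical)
    addr -- guest physical address of the first byte written
    """
    if length <= 0:
        return
    first = addr // VHOST_LOG_PAGE
    last = (addr + length - 1) // VHOST_LOG_PAGE
    for page in range(first, last + 1):
        # NOTE: not atomic; a real client must use an atomic OR here.
        log[page // 8] |= 1 << (page % 8)
```

A write that straddles a page boundary dirties both pages, which is why the last byte of the range, not its length alone, determines the final page index.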
- -The size of the log is supplied as part of VhostUserMsg -which should be large enough to cover all known guest -addresses. Log starts at the supplied offset in the -supplied file descriptor. -The log covers from address 0 to the maximum of guest -regions. In pseudo-code, to mark page at "addr" as dirty: - -page = addr / VHOST_LOG_PAGE -log[page / 8] |= 1 << page % 8 - -Where addr is the guest physical address. - -Use atomic operations, as the log may be concurrently manipulated. - -Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG -is set for this ring), log_guest_addr should be used to calculate the log -offset: the write to first byte of the used ring is logged at this offset from -log start. Also note that this value might be outside the legal guest physical -address range (i.e. does not have to be covered by the VhostUserMemory table), -but the bit offset of the last byte of the ring must fall within -the size supplied by VhostUserLog. - -VHOST_USER_SET_LOG_FD is an optional message with an eventfd in -ancillary data, it may be used to inform the master that the log has -been modified. - -Once the source has finished migration, rings will be stopped by -the source. No further update must be done before rings are -restarted. - -IOMMU support -------------- - -When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master -sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG -requests to the slave with a struct vhost_iotlb_msg as payload. For update -events, the iotlb payload has to be filled with the update message type (2), -the I/O virtual address, the size, the user virtual address, and the -permissions flags. Addresses and size must be within vhost memory regions set -via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb -payload has to be filled with the invalidation message type (3), the I/O virtual -address and the size. 
On success, the slave is expected to reply with a zero
-payload, non-zero otherwise.
-
-The slave relies on the slave communication channel (see "Slave communication"
-section below) to send IOTLB miss and access failure events, by sending
-VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
-as payload. For miss events, the iotlb payload has to be filled with the miss
-message type (1), the I/O virtual address and the permissions flags. For access
-failure events, the iotlb payload has to be filled with the access failure
-message type (4), the I/O virtual address and the permissions flags.
-For synchronization purposes, the slave may rely on the reply-ack feature,
-so the master may send a reply when operation is completed if the reply-ack
-feature is negotiated and the slave requests a reply. For miss events, completed
-operation means either master sent an update message containing the IOTLB entry
-containing requested address and permission, or master sent nothing if the IOTLB
-miss message is invalid (invalid IOVA or permission).
-
-The master isn't expected to take the initiative to send IOTLB update messages,
-as the slave sends IOTLB miss messages for the guest virtual memory areas it
-needs to access.
-
-Slave communication
--------------------
-
-An optional communication channel is provided if the slave declares
-VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
-requests to the master.
-
-The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
-
-A slave may then send VHOST_USER_SLAVE_* messages to the master
-using this fd communication channel.
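The IOTLB payloads exchanged in both directions can be sketched with Python's struct module. The field order and widths follow the "An IOTLB message" layout in the message specification; the helper names are hypothetical, fields a message type does not use are set to zero as an assumption, and any trailing padding a C compiler might add to struct vhost_iotlb_msg is not modelled.

```python
import struct

# iova (u64), size (u64), user address (u64), permissions (u8), type (u8).
# "=" selects native byte order with no implicit padding.
IOTLB_FIELDS = struct.Struct("=QQQBB")

PERM_NONE, PERM_RO, PERM_WO, PERM_RW = 0, 1, 2, 3
IOTLB_MISS, IOTLB_UPDATE, IOTLB_INVALIDATE, IOTLB_ACCESS_FAIL = 1, 2, 3, 4


def iotlb_update(iova, size, uaddr, perm):
    # Master -> slave: install a translation.
    return IOTLB_FIELDS.pack(iova, size, uaddr, perm, IOTLB_UPDATE)


def iotlb_invalidate(iova, size):
    # Master -> slave: only the type, iova and size are meaningful.
    return IOTLB_FIELDS.pack(iova, size, 0, PERM_NONE, IOTLB_INVALIDATE)


def iotlb_miss(iova, perm):
    # Slave -> master: only the type, iova and permissions are meaningful.
    return IOTLB_FIELDS.pack(iova, 0, 0, perm, IOTLB_MISS)
```

The resulting bytes would be carried as the payload of VHOST_USER_IOTLB_MSG (master to slave) or VHOST_USER_SLAVE_IOTLB_MSG (slave to master), after the usual request, flags and size header fields.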
- -Protocol features ------------------ - -#define VHOST_USER_PROTOCOL_F_MQ 0 -#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 -#define VHOST_USER_PROTOCOL_F_RARP 2 -#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 -#define VHOST_USER_PROTOCOL_F_MTU 4 -#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 - -Master message types --------------------- - - * VHOST_USER_GET_FEATURES - - Id: 1 - Equivalent ioctl: VHOST_GET_FEATURES - Master payload: N/A - Slave payload: u64 - - Get from the underlying vhost implementation the features bitmask. - Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for - VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. - - * VHOST_USER_SET_FEATURES - - Id: 2 - Ioctl: VHOST_SET_FEATURES - Master payload: u64 - - Enable features in the underlying vhost implementation using a bitmask. - Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for - VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES. - - * VHOST_USER_GET_PROTOCOL_FEATURES - - Id: 15 - Equivalent ioctl: VHOST_GET_FEATURES - Master payload: N/A - Slave payload: u64 - - Get the protocol feature bitmask from the underlying vhost implementation. - Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in - VHOST_USER_GET_FEATURES. - Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support - this message even before VHOST_USER_SET_FEATURES was called. - - * VHOST_USER_SET_PROTOCOL_FEATURES - - Id: 16 - Ioctl: VHOST_SET_FEATURES - Master payload: u64 - - Enable protocol features in the underlying vhost implementation. - Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in - VHOST_USER_GET_FEATURES. - Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support - this message even before VHOST_USER_SET_FEATURES was called. - - * VHOST_USER_SET_OWNER - - Id: 3 - Equivalent ioctl: VHOST_SET_OWNER - Master payload: N/A - - Issued when a new connection is established. 
It sets the current Master - as an owner of the session. This can be used on the Slave as a - "session start" flag. - - * VHOST_USER_RESET_OWNER - - Id: 4 - Master payload: N/A - - This is no longer used. Used to be sent to request disabling - all rings, but some clients interpreted it to also discard - connection state (this interpretation would lead to bugs). - It is recommended that clients either ignore this message, - or use it to disable all rings. - - * VHOST_USER_SET_MEM_TABLE - - Id: 5 - Equivalent ioctl: VHOST_SET_MEM_TABLE - Master payload: memory regions description - - Sets the memory map regions on the slave so it can translate the vring - addresses. In the ancillary data there is an array of file descriptors - for each memory mapped region. The size and ordering of the fds matches - the number and ordering of memory regions. - - * VHOST_USER_SET_LOG_BASE - - Id: 6 - Equivalent ioctl: VHOST_SET_LOG_BASE - Master payload: u64 - Slave payload: N/A - - Sets logging shared memory space. - When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol - feature, the log memory fd is provided in the ancillary data of - VHOST_USER_SET_LOG_BASE message, the size and offset of shared - memory area provided in the message. - - - * VHOST_USER_SET_LOG_FD - - Id: 7 - Equivalent ioctl: VHOST_SET_LOG_FD - Master payload: N/A - - Sets the logging file descriptor, which is passed as ancillary data. - - * VHOST_USER_SET_VRING_NUM - - Id: 8 - Equivalent ioctl: VHOST_SET_VRING_NUM - Master payload: vring state description - - Set the size of the queue. - - * VHOST_USER_SET_VRING_ADDR - - Id: 9 - Equivalent ioctl: VHOST_SET_VRING_ADDR - Master payload: vring address description - Slave payload: N/A - - Sets the addresses of the different aspects of the vring. - - * VHOST_USER_SET_VRING_BASE - - Id: 10 - Equivalent ioctl: VHOST_SET_VRING_BASE - Master payload: vring state description - - Sets the base offset in the available vring. 
-
- * VHOST_USER_GET_VRING_BASE
-
-      Id: 11
-      Equivalent ioctl: VHOST_GET_VRING_BASE
-      Master payload: vring state description
-      Slave payload: vring state description
-
-      Get the available vring base offset.
-
- * VHOST_USER_SET_VRING_KICK
-
-      Id: 12
-      Equivalent ioctl: VHOST_SET_VRING_KICK
-      Master payload: u64
-
-      Set the event file descriptor for adding buffers to the vring. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data. This signals that polling should be used
-      instead of waiting for a kick.
-
- * VHOST_USER_SET_VRING_CALL
-
-      Id: 13
-      Equivalent ioctl: VHOST_SET_VRING_CALL
-      Master payload: u64
-
-      Set the event file descriptor to signal when buffers are used. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data. This signals that polling will be used
-      instead of waiting for the call.
-
- * VHOST_USER_SET_VRING_ERR
-
-      Id: 14
-      Equivalent ioctl: VHOST_SET_VRING_ERR
-      Master payload: u64
-
-      Set the event file descriptor to signal when an error occurs. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data.
-
- * VHOST_USER_GET_QUEUE_NUM
-
-      Id: 17
-      Equivalent ioctl: N/A
-      Master payload: N/A
-      Slave payload: u64
-
-      Query how many queues the backend supports. This request should be
-      sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol
-      features by VHOST_USER_GET_PROTOCOL_FEATURES.
-
- * VHOST_USER_SET_VRING_ENABLE
-
-      Id: 18
-      Equivalent ioctl: N/A
-      Master payload: vring state description
-
-      Signal slave to enable or disable corresponding vring.
- This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES - has been negotiated. - - * VHOST_USER_SEND_RARP - - Id: 19 - Equivalent ioctl: N/A - Master payload: u64 - - Ask vhost user backend to broadcast a fake RARP to notify the migration - is terminated for guest that does not support GUEST_ANNOUNCE. - Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in - VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP - is present in VHOST_USER_GET_PROTOCOL_FEATURES. - The first 6 bytes of the payload contain the mac address of the guest to - allow the vhost user backend to construct and broadcast the fake RARP. - - * VHOST_USER_NET_SET_MTU - - Id: 20 - Equivalent ioctl: N/A - Master payload: u64 - - Set host MTU value exposed to the guest. - This request should be sent only when VIRTIO_NET_F_MTU feature has been - successfully negotiated, VHOST_USER_F_PROTOCOL_FEATURES is present in - VHOST_USER_GET_FEATURES and protocol feature bit - VHOST_USER_PROTOCOL_F_NET_MTU is present in - VHOST_USER_GET_PROTOCOL_FEATURES. - If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond - with zero in case the specified MTU is valid, or non-zero otherwise. - - * VHOST_USER_SET_SLAVE_REQ_FD - - Id: 21 - Equivalent ioctl: N/A - Master payload: N/A - - Set the socket file descriptor for slave initiated requests. It is passed - in the ancillary data. - This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES - has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ - bit is present in VHOST_USER_GET_PROTOCOL_FEATURES. - If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond - with zero for success, non-zero otherwise. - - * VHOST_USER_IOTLB_MSG - - Id: 22 - Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) - Master payload: struct vhost_iotlb_msg - Slave payload: u64 - - Send IOTLB messages with struct vhost_iotlb_msg as payload. 
-      Master sends such requests to update and invalidate entries in the device
-      IOTLB. The slave has to acknowledge the request by sending zero as u64
-      payload for success, non-zero otherwise.
-      This request should be sent only when VIRTIO_F_IOMMU_PLATFORM feature
-      has been successfully negotiated.
-
-Slave message types
--------------------
-
- * VHOST_USER_SLAVE_IOTLB_MSG
-
-      Id: 1
-      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
-      Slave payload: struct vhost_iotlb_msg
-      Master payload: N/A
-
-      Send IOTLB messages with struct vhost_iotlb_msg as payload.
-      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
-      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
-      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
-      zero when operation is successfully completed, or non-zero otherwise.
-      This request should be sent only when VIRTIO_F_IOMMU_PLATFORM feature
-      has been successfully negotiated.
-
-VHOST_USER_PROTOCOL_F_REPLY_ACK:
--------------------------------
-The original vhost-user specification only demands replies for certain
-commands. This differs from the vhost protocol implementation where commands
-are sent over an ioctl() call and block until the client has completed.
-
-With this protocol extension negotiated, the sender (QEMU) can set the
-"need_reply" [Bit 3] flag to any command. This indicates that
-the client MUST respond with a Payload VhostUserMsg indicating success or
-failure. The payload should be set to zero on success or non-zero on failure,
-unless the message already has an explicit reply body.
-
-The response payload gives QEMU a deterministic indication of the result
-of the command. Today, QEMU is expected to terminate the main vhost-user
-loop upon receiving such errors. In future, qemu could be taught to be more
-resilient for selective requests.
-
-For the message types that already solicit a reply from the client, the
-presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
-no behavioural change. (See the 'Communication' section for details.)
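The header layout and the flag bits discussed for REPLY_ACK can be illustrated with a short sketch. It follows the message specification (three 32-bit header fields in native byte order, version in the lower two flag bits, bit 2 as the reply flag, bit 3 as need_reply); the function and constant names are hypothetical, and passing file descriptors in ancillary data is not shown.

```python
import struct

HEADER = struct.Struct("=III")  # request, flags, size (native byte order)

VERSION = 0x1          # current protocol version
VERSION_MASK = 0x3     # lower 2 bits of flags
FLAG_REPLY = 1 << 2    # set on each reply from the slave
FLAG_NEED_REPLY = 1 << 3  # only meaningful once REPLY_ACK is negotiated


def pack_message(request, payload=b"", reply=False, need_reply=False):
    flags = VERSION
    if reply:
        flags |= FLAG_REPLY
    if need_reply:
        flags |= FLAG_NEED_REPLY
    return HEADER.pack(request, flags, len(payload)) + payload


def unpack_header(data):
    request, flags, size = HEADER.unpack_from(data)
    return {
        "request": request,
        "version": flags & VERSION_MASK,
        "reply": bool(flags & FLAG_REPLY),
        "need_reply": bool(flags & FLAG_NEED_REPLY),
        "size": size,
    }
```

For example, a VHOST_USER_SET_FEATURES request (Id 2) that asks for an acknowledgement would be built as `pack_message(2, struct.pack("=Q", features), need_reply=True)`, where `features` is the negotiated bitmask.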