diff options
author | Shaohua Li <shli@fb.com> | 2017-05-01 14:09:21 -0700 |
---|---|---|
committer | Shaohua Li <shli@fb.com> | 2017-05-01 14:09:21 -0700 |
commit | e265eb3a30543a237b2ebc4e0422ac82e55b07e4 (patch) | |
tree | 5485bce4a0645e5e9b6ef4686bd390b7b2599ffb /Documentation | |
parent | 85724edecbdc19f53ed4b902fc3a32e4d1b61c9b (diff) | |
parent | b506335e5d2b4ec687dde392a3bdbf7601778f1d (diff) |
Merge branch 'md-next' into md-linus
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/md.rst | 32 | ||||
-rw-r--r-- | Documentation/md/md-cluster.txt | 2 | ||||
-rw-r--r-- | Documentation/md/raid5-ppl.txt | 44 |
3 files changed, 74 insertions, 4 deletions
diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst index 1e61bf50595c..84de718f24a4 100644 --- a/Documentation/admin-guide/md.rst +++ b/Documentation/admin-guide/md.rst @@ -276,14 +276,14 @@ All md devices contain: array creation it will default to 0, though starting the array as ``clean`` will set it much larger. - new_dev + new_dev This file can be written but not read. The value written should be a block device number as major:minor. e.g. 8:0 This will cause that device to be attached to the array, if it is available. It will then appear at md/dev-XXX (depending on the name of the device) and further configuration is then possible. - safe_mode_delay + safe_mode_delay When an md array has seen no write requests for a certain period of time, it will be marked as ``clean``. When another write request arrives, the array is marked as ``dirty`` before the write @@ -292,7 +292,7 @@ All md devices contain: period as a number of seconds. The default is 200msec (0.200). Writing a value of 0 disables safemode. - array_state + array_state This file contains a single word which describes the current state of the array. In many cases, the state can be set by writing the word for the desired state, however some states @@ -401,7 +401,30 @@ All md devices contain: once the array becomes non-degraded, and this fact has been recorded in the metadata. + consistency_policy + This indicates how the array maintains consistency in case of unexpected + shutdown. It can be: + none + Array has no redundancy information, e.g. raid0, linear. + + resync + Full resync is performed and all redundancy is regenerated when the + array is started after unclean shutdown. + + bitmap + Resync assisted by a write-intent bitmap. + + journal + For raid4/5/6, journal device is used to log transactions and replay + after unclean shutdown. + + ppl + For raid5 only, Partial Parity Log is used to close the write hole and + eliminate resync. + + The accepted values when writing to this file are ``ppl`` and ``resync``, + used to enable and disable PPL. As component devices are added to an md array, they appear in the ``md`` @@ -563,6 +586,9 @@ Each directory contains: adds bad blocks without acknowledging them. This is largely for testing. + ppl_sector, ppl_size + Location and size (in sectors) of the space used for Partial Parity Log + on this device. An active md device will also contain an entry for each active device diff --git a/Documentation/md/md-cluster.txt b/Documentation/md/md-cluster.txt index 38883276d31c..2663d49dd8a0 100644 --- a/Documentation/md/md-cluster.txt +++ b/Documentation/md/md-cluster.txt @@ -321,4 +321,4 @@ The algorithm is: There are somethings which are not supported by cluster MD yet. -- update size and change array_sectors. +- change array_sectors. diff --git a/Documentation/md/raid5-ppl.txt b/Documentation/md/raid5-ppl.txt new file mode 100644 index 000000000000..127072b09363 --- /dev/null +++ b/Documentation/md/raid5-ppl.txt @@ -0,0 +1,44 @@ +Partial Parity Log + +Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue +addressed by PPL is that after a dirty shutdown, parity of a particular stripe +may become inconsistent with data on other member disks. If the array is also +in degraded state, there is no way to recalculate parity, because one of the +disks is missing. This can lead to silent data corruption when rebuilding the +array or using it is as degraded - data calculated from parity for array blocks +that have not been touched by a write request during the unclean shutdown can +be incorrect. Such condition is known as the RAID5 Write Hole. Because of +this, md by default does not allow starting a dirty degraded array. + +Partial parity for a write operation is the XOR of stripe data chunks not +modified by this write. It is just enough data needed for recovering from the +write hole. XORing partial parity with the modified chunks produces parity for +the stripe, consistent with its state before the write operation, regardless of +which chunk writes have completed. If one of the not modified data disks of +this stripe is missing, this updated parity can be used to recover its +contents. PPL recovery is also performed when starting an array after an +unclean shutdown and all disks are available, eliminating the need to resync +the array. Because of this, using write-intent bitmap and PPL together is not +supported. + +When handling a write request PPL writes partial parity before new data and +parity are dispatched to disks. PPL is a distributed log - it is stored on +array member drives in the metadata area, on the parity drive of a particular +stripe. It does not require a dedicated journaling drive. Write performance is +reduced by up to 30%-40% but it scales with the number of drives in the array +and the journaling drive does not become a bottleneck or a single point of +failure. + +Unlike raid5-cache, the other solution in md for closing the write hole, PPL is +not a true journal. It does not protect from losing in-flight data, only from +silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is +performed for this stripe (parity is not updated). So it is possible to have +arbitrary data in the written part of a stripe if that disk is lost. In such +case the behavior is the same as in plain raid5. + +PPL is available for md version-1 metadata and external (specifically IMSM) +metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl. + +Currently, volatile write-back cache should be disabled on all member drives +when using PPL. Otherwise it cannot guarantee consistency in case of power +failure. |