Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/stable/sysfs-driver-mlxreg-io | 6
-rw-r--r-- Documentation/ABI/testing/sysfs-class-led-trigger-pattern | 51
-rw-r--r-- Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg | 18
-rw-r--r-- Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html | 26
-rw-r--r-- Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html | 6
-rw-r--r-- Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg | 2
-rw-r--r-- Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg | 8
-rw-r--r-- Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg | 6
-rw-r--r-- Documentation/RCU/Design/Requirements/Requirements.html | 20
-rw-r--r-- Documentation/RCU/stallwarn.txt | 15
-rw-r--r-- Documentation/RCU/torture.txt | 169
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 4
-rw-r--r-- Documentation/admin-guide/README.rst | 32
-rw-r--r-- Documentation/admin-guide/kernel-parameters.txt | 52
-rw-r--r-- Documentation/arm64/silicon-errata.txt | 2
-rw-r--r-- Documentation/bpf/bpf_design_QA.rst | 24
-rw-r--r-- Documentation/bpf/btf.rst | 848
-rw-r--r-- Documentation/bpf/index.rst | 7
-rw-r--r-- Documentation/core-api/refcount-vs-atomic.rst | 24
-rw-r--r-- Documentation/devicetree/bindings/Makefile | 6
-rw-r--r-- Documentation/devicetree/bindings/crypto/samsung-slimsss.txt | 19
-rw-r--r-- Documentation/devicetree/bindings/hwmon/ad741x.txt | 15
-rw-r--r-- Documentation/devicetree/bindings/hwmon/dps650ab.txt | 11
-rw-r--r-- Documentation/devicetree/bindings/hwmon/hih6130.txt | 12
-rw-r--r-- Documentation/devicetree/bindings/hwmon/ina3221.txt | 10
-rw-r--r-- Documentation/devicetree/bindings/hwmon/lm75.txt | 37
-rw-r--r-- Documentation/devicetree/bindings/hwmon/pwm-fan.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/fsl,irqsteer.txt | 11
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/loongson,ls1x-intc.txt | 24
-rw-r--r-- Documentation/devicetree/bindings/leds/common.txt | 12
-rw-r--r-- Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt | 49
-rw-r--r-- Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt | 17
-rw-r--r-- Documentation/devicetree/bindings/mips/lantiq/rcu-gphy.txt | 36
-rw-r--r-- Documentation/devicetree/bindings/mips/lantiq/rcu.txt | 18
-rw-r--r-- Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/mmc/mmc.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/mmc/ti-omap.txt | 28
-rw-r--r-- Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt | 60
-rw-r--r-- Documentation/devicetree/bindings/mtd/cadence-quadspi.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/mtd/mtk-quadspi.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/mtd/stm32-fmc2-nand.txt | 61
-rw-r--r-- Documentation/devicetree/bindings/net/btusb.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/net/dsa/ksz.txt | 145
-rw-r--r-- Documentation/devicetree/bindings/net/dsa/mt7530.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/net/fsl-enetc.txt | 69
-rw-r--r-- Documentation/devicetree/bindings/net/macb.txt | 4
-rw-r--r-- Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/net/mdio-mux-multiplexer.txt | 82
-rw-r--r-- Documentation/devicetree/bindings/net/mediatek-bluetooth.txt | 64
-rw-r--r-- Documentation/devicetree/bindings/net/nixge.txt | 72
-rw-r--r-- Documentation/devicetree/bindings/net/qcom,ethqos.txt | 64
-rw-r--r-- Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt | 19
-rw-r--r-- Documentation/devicetree/bindings/phy/phy-armada38x-comphy.txt | 40
-rw-r--r-- Documentation/devicetree/bindings/ptp/ptp-qoriq.txt | 5
-rw-r--r-- Documentation/devicetree/bindings/regulator/fan53555.txt | 3
-rw-r--r-- Documentation/devicetree/bindings/regulator/fixed-regulator.txt | 35
-rw-r--r-- Documentation/devicetree/bindings/regulator/fixed-regulator.yaml | 67
-rw-r--r-- Documentation/devicetree/bindings/regulator/max77650-regulator.txt | 41
-rw-r--r-- Documentation/devicetree/bindings/regulator/pfuze100.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/regulator/rohm,bd70528-regulator.txt | 68
-rw-r--r-- Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt | 38
-rw-r--r-- Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt | 6
-rw-r--r-- Documentation/devicetree/bindings/regulator/tps65218.txt | 9
-rw-r--r-- Documentation/devicetree/bindings/serio/olpc,ap-sp.txt | 4
-rw-r--r-- Documentation/devicetree/bindings/spi/atmel-quadspi.txt | 12
-rw-r--r-- Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-fsl-qspi.txt (renamed from Documentation/devicetree/bindings/mtd/fsl-quadspi.txt) | 18
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-nxp-fspi.txt | 39
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-sifive.txt | 37
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-sprd.txt | 7
-rw-r--r-- Documentation/devicetree/bindings/spi/spi-stm32.txt | 9
-rw-r--r-- Documentation/devicetree/bindings/timer/fsl,imxgpt.txt | 39
-rw-r--r-- Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt | 11
-rw-r--r-- Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt | 36
-rw-r--r-- Documentation/devicetree/bindings/timer/renesas,cmt.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/timer/renesas,tmu.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/trivial-devices.yaml | 2
-rw-r--r-- Documentation/driver-api/80211/mac80211.rst | 3
-rw-r--r-- Documentation/hwmon/lm85 | 9
-rw-r--r-- Documentation/networking/af_xdp.rst | 36
-rw-r--r-- Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst | 14
-rw-r--r-- Documentation/networking/device_drivers/intel/e100.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/e1000.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/e1000e.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/fm10k.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/i40e.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/iavf.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/ice.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/igb.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/igbvf.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/ixgb.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/ixgbe.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/intel/ixgbevf.rst | 1
-rw-r--r-- Documentation/networking/device_drivers/stmicro/stmmac.txt | 2
-rw-r--r-- Documentation/networking/devlink-health.txt | 86
-rw-r--r-- Documentation/networking/devlink-info-versions.rst | 43
-rw-r--r-- Documentation/networking/devlink-params-mlxsw.txt | 10
-rw-r--r-- Documentation/networking/dsa/dsa.txt | 23
-rw-r--r-- Documentation/networking/filter.txt | 33
-rw-r--r-- Documentation/networking/ieee802154.rst (renamed from Documentation/networking/ieee802154.txt) | 193
-rw-r--r-- Documentation/networking/index.rst | 4
-rw-r--r-- Documentation/networking/msg_zerocopy.rst | 2
-rw-r--r-- Documentation/networking/operstates.txt | 14
-rw-r--r-- Documentation/networking/phy.rst | 447
-rw-r--r-- Documentation/networking/phy.txt | 427
-rw-r--r-- Documentation/networking/sfp-phylink.rst | 268
-rw-r--r-- Documentation/networking/snmp_counter.rst | 295
-rw-r--r-- Documentation/networking/switchdev.txt | 37
-rw-r--r-- Documentation/networking/timestamping.txt | 43
-rw-r--r-- Documentation/power/energy-model.txt | 144
-rw-r--r-- Documentation/process/applying-patches.rst | 117
-rw-r--r-- Documentation/scheduler/sched-energy.txt | 425
-rw-r--r-- Documentation/spi/pxa2xx | 10
-rw-r--r-- Documentation/sysctl/fs.txt | 28
-rw-r--r-- Documentation/sysctl/kernel.txt | 12
-rw-r--r-- Documentation/sysctl/net.txt | 15
-rw-r--r-- Documentation/translations/it_IT/admin-guide/README.rst | 2
-rw-r--r-- Documentation/userspace-api/spec_ctrl.rst | 27
-rw-r--r-- Documentation/x86/resctrl_ui.txt | 2
120 files changed, 4348 insertions, 1259 deletions
diff --git a/Documentation/ABI/stable/sysfs-driver-mlxreg-io b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
index 9b642669cb16..169fe08a649b 100644
--- a/Documentation/ABI/stable/sysfs-driver-mlxreg-io
+++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io
@@ -24,7 +24,7 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
cpld3_version
Date: November 2018
-KernelVersion: 4.21
+KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show with which CPLD versions have been burned
on LED board.
@@ -35,7 +35,7 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
jtag_enable
Date: November 2018
-KernelVersion: 4.21
+KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files enable and disable the access to the JTAG domain.
By default access to the JTAG domain is disabled.
@@ -105,7 +105,7 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
reset_voltmon_upgrade_fail
Date: November 2018
-KernelVersion: 4.21
+KernelVersion: 5.0
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show the system reset cause, as following: ComEx
power fail, reset from ComEx, system platform reset, reset
diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
index 1e5d172e0646..bd92ef9d6faa 100644
--- a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
@@ -7,55 +7,10 @@ Description:
timer. It can do gradual dimming and step change of brightness.
The pattern is given by a series of tuples, of brightness and
- duration (ms). The LED is expected to traverse the series and
- each brightness value for the specified duration. Duration of
- 0 means brightness should immediately change to new value, and
- writing malformed pattern deactivates any active one.
+ duration (ms).
- 1. For gradual dimming, the dimming interval now is set as 50
- milliseconds. So the tuple with duration less than dimming
- interval (50ms) is treated as a step change of brightness,
- i.e. the subsequent brightness will be applied without adding
- intervening dimming intervals.
-
- The gradual dimming format of the software pattern values should be:
- "brightness_1 duration_1 brightness_2 duration_2 brightness_3
- duration_3 ...". For example:
-
- echo 0 1000 255 2000 > pattern
-
- It will make the LED go gradually from zero-intensity to max (255)
- intensity in 1000 milliseconds, then back to zero intensity in 2000
- milliseconds:
-
- LED brightness
- ^
- 255-| / \ / \ /
- | / \ / \ /
- | / \ / \ /
- | / \ / \ /
- 0-| / \/ \/
- +---0----1----2----3----4----5----6------------> time (s)
-
- 2. To make the LED go instantly from one brightness value to another,
- we should use zero-time lengths (the brightness must be same as
- the previous tuple's). So the format should be:
- "brightness_1 duration_1 brightness_1 0 brightness_2 duration_2
- brightness_2 0 ...". For example:
-
- echo 0 1000 0 0 255 2000 255 0 > pattern
-
- It will make the LED stay off for one second, then stay at max brightness
- for two seconds:
-
- LED brightness
- ^
- 255-| +---------+ +---------+
- | | | | |
- | | | | |
- | | | | |
- 0-| -----+ +----+ +----
- +---0----1----2----3----4----5----6------------> time (s)
+ The exact format is described in:
+ Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt
What: /sys/class/leds/<led>/hw_pattern
Date: September 2018
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg b/Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg
index e4233ac93c2b..6189ffcc6aff 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg
@@ -328,13 +328,13 @@
inkscape:window-height="1148"
id="namedview90"
showgrid="true"
- inkscape:zoom="0.80021373"
- inkscape:cx="462.49289"
- inkscape:cy="473.6718"
+ inkscape:zoom="0.69092787"
+ inkscape:cx="476.34085"
+ inkscape:cy="712.80957"
inkscape:window-x="770"
inkscape:window-y="24"
inkscape:window-maximized="0"
- inkscape:current-layer="g4114-9-3-9"
+ inkscape:current-layer="g4"
inkscape:snap-grids="false"
fit-margin-top="5"
fit-margin-right="5"
@@ -813,14 +813,18 @@
<text
sodipodi:linespacing="125%"
id="text4110-5-7-6-2-4-0"
- y="841.88086"
+ y="670.74316"
x="1460.1007"
style="font-size:267.24359131px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
xml:space="preserve"><tspan
- y="841.88086"
+ y="670.74316"
+ x="1460.1007"
+ sodipodi:role="line"
+ id="tspan4925-1-2-4-5">Request</tspan><tspan
+ y="1004.7976"
x="1460.1007"
sodipodi:role="line"
- id="tspan4925-1-2-4-5">reched_cpu()</tspan></text>
+ id="tspan3100">context switch</tspan></text>
</g>
</g>
</svg>
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
index 8e4f873b979f..19e7a5fb6b73 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
@@ -72,10 +72,10 @@ will ignore it because idle and offline CPUs are already residing
in quiescent states.
Otherwise, the expedited grace period will use
<tt>smp_call_function_single()</tt> to send the CPU an IPI, which
-is handled by <tt>sync_rcu_exp_handler()</tt>.
+is handled by <tt>rcu_exp_handler()</tt>.
<p>
-However, because this is preemptible RCU, <tt>sync_rcu_exp_handler()</tt>
+However, because this is preemptible RCU, <tt>rcu_exp_handler()</tt>
can check to see if the CPU is currently running in an RCU read-side
critical section.
If not, the handler can immediately report a quiescent state.
@@ -145,19 +145,18 @@ expedited grace period is shown in the following diagram:
<p><img src="ExpSchedFlow.svg" alt="ExpSchedFlow.svg" width="55%">
<p>
-As with RCU-preempt's <tt>synchronize_rcu_expedited()</tt>,
+As with RCU-preempt, RCU-sched's
<tt>synchronize_sched_expedited()</tt> ignores offline and
idle CPUs, again because they are in remotely detectable
quiescent states.
-However, the <tt>synchronize_rcu_expedited()</tt> handler
-is <tt>sync_sched_exp_handler()</tt>, and because the
+However, because the
<tt>rcu_read_lock_sched()</tt> and <tt>rcu_read_unlock_sched()</tt>
leave no trace of their invocation, in general it is not possible to tell
whether or not the current CPU is in an RCU read-side critical section.
-The best that <tt>sync_sched_exp_handler()</tt> can do is to check
+The best that RCU-sched's <tt>rcu_exp_handler()</tt> can do is to check
for idle, on the off-chance that the CPU went idle while the IPI
was in flight.
-If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports
+If the CPU is idle, then <tt>rcu_exp_handler()</tt> reports
the quiescent state.
<p> Otherwise, the handler forces a future context switch by setting the
@@ -298,19 +297,18 @@ Instead, the task pushing the grace period forward will include the
idle CPUs in the mask passed to <tt>rcu_report_exp_cpu_mult()</tt>.
<p>
-For RCU-sched, there is an additional check for idle in the IPI
-handler, <tt>sync_sched_exp_handler()</tt>.
+For RCU-sched, there is an additional check:
If the IPI has interrupted the idle loop, then
-<tt>sync_sched_exp_handler()</tt> invokes <tt>rcu_report_exp_rdp()</tt>
+<tt>rcu_exp_handler()</tt> invokes <tt>rcu_report_exp_rdp()</tt>
to report the corresponding quiescent state.
<p>
For RCU-preempt, there is no specific check for idle in the
-IPI handler (<tt>sync_rcu_exp_handler()</tt>), but because
+IPI handler (<tt>rcu_exp_handler()</tt>), but because
RCU read-side critical sections are not permitted within the
-idle loop, if <tt>sync_rcu_exp_handler()</tt> sees that the CPU is within
+idle loop, if <tt>rcu_exp_handler()</tt> sees that the CPU is within
RCU read-side critical section, the CPU cannot possibly be idle.
-Otherwise, <tt>sync_rcu_exp_handler()</tt> invokes
+Otherwise, <tt>rcu_exp_handler()</tt> invokes
<tt>rcu_report_exp_rdp()</tt> to report the corresponding quiescent
state, regardless of whether or not that quiescent state was due to
the CPU being idle.
@@ -625,6 +623,8 @@ checks, but only during the mid-boot dead zone.
<p>
With this refinement, synchronous grace periods can now be used from
task context pretty much any time during the life of the kernel.
+That is, aside from some points in the suspend, hibernate, or shutdown
+code path.
<h3><a name="Summary">
Summary</a></h3>
diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
index e4d94fba6c89..8d21af02b1f0 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
@@ -485,13 +485,13 @@ section that the grace period must wait on.
noted by <tt>rcu_node_context_switch()</tt> on the left.
On the other hand, if the CPU takes a scheduler-clock interrupt
while executing in usermode, a quiescent state will be noted by
-<tt>rcu_check_callbacks()</tt> on the right.
+<tt>rcu_sched_clock_irq()</tt> on the right.
Either way, the passage through a quiescent state will be noted
in a per-CPU variable.
<p>The next time an <tt>RCU_SOFTIRQ</tt> handler executes on
this CPU (for example, after the next scheduler-clock
-interrupt), <tt>__rcu_process_callbacks()</tt> will invoke
+interrupt), <tt>rcu_core()</tt> will invoke
<tt>rcu_check_quiescent_state()</tt>, which will notice the
recorded quiescent state, and invoke
<tt>rcu_report_qs_rdp()</tt>.
@@ -651,7 +651,7 @@ to end.
These callbacks are identified by <tt>rcu_advance_cbs()</tt>,
which is usually invoked by <tt>__note_gp_changes()</tt>.
As shown in the diagram below, this invocation can be triggered by
-the scheduling-clock interrupt (<tt>rcu_check_callbacks()</tt> on
+the scheduling-clock interrupt (<tt>rcu_sched_clock_irq()</tt> on
the left) or by idle entry (<tt>rcu_cleanup_after_idle()</tt> on
the right, but only for kernels build with
<tt>CONFIG_RCU_FAST_NO_HZ=y</tt>).
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg
index 832408313d93..3fcf0c17cef2 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg
@@ -349,7 +349,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_check_callbacks()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_sched_clock_irq()</text>
<rect
x="7069.6187"
y="5087.4678"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index acd73c7ad0f4..2bcd742d6e49 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -3902,7 +3902,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
- xml:space="preserve">rcu_check_callbacks()</text>
+ xml:space="preserve">rcu_sched_clock_irq()</text>
</g>
<g
transform="translate(-850.30204,55463.106)"
@@ -3924,7 +3924,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
- xml:space="preserve">rcu_process_callbacks()</text>
+ xml:space="preserve">rcu_core()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-5-3-27-0"
@@ -3933,7 +3933,7 @@
font-style="normal"
y="-4165.7954"
x="3745.7725"
- xml:space="preserve">rcu_check_quiescent_state())</text>
+ xml:space="preserve">rcu_check_quiescent_state()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-5-3-27-0-9"
@@ -4968,7 +4968,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-19"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_check_callbacks()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_sched_clock_irq()</text>
<rect
x="5314.2671"
y="82817.688"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
index 149bec2a4493..779c9ac31a52 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg
@@ -775,7 +775,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
- xml:space="preserve">rcu_check_callbacks()</text>
+ xml:space="preserve">rcu_sched_clock_irq()</text>
</g>
<g
transform="translate(399.7744,828.86448)"
@@ -797,7 +797,7 @@
font-style="normal"
y="-4418.6582"
x="3745.7725"
- xml:space="preserve">rcu_process_callbacks()</text>
+ xml:space="preserve">rcu_core()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-5-3-27-0"
@@ -806,7 +806,7 @@
font-style="normal"
y="-4165.7954"
x="3745.7725"
- xml:space="preserve">rcu_check_quiescent_state())</text>
+ xml:space="preserve">rcu_check_quiescent_state()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-5-3-27-0-9"
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 9fca73e03a98..5a9238a2883c 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -3099,7 +3099,7 @@ If you block forever in one of a given domain's SRCU read-side critical
sections, then that domain's grace periods will also be blocked forever.
Of course, one good way to block forever is to deadlock, which can
happen if any operation in a given domain's SRCU read-side critical
-section can block waiting, either directly or indirectly, for that domain's
+section can wait, either directly or indirectly, for that domain's
grace period to elapse.
For example, this results in a self-deadlock:
@@ -3139,12 +3139,18 @@ API, which, in combination with <tt>srcu_read_unlock()</tt>,
guarantees a full memory barrier.
<p>
-Also unlike other RCU flavors, SRCU's callbacks-wait function
-<tt>srcu_barrier()</tt> may be invoked from CPU-hotplug notifiers,
-though this is not necessarily a good idea.
-The reason that this is possible is that SRCU is insensitive
-to whether or not a CPU is online, which means that <tt>srcu_barrier()</tt>
-need not exclude CPU-hotplug operations.
+Also unlike other RCU flavors, <tt>synchronize_srcu()</tt> may <b>not</b>
+be invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
+periods make use of timers and the possibility of timers being temporarily
+&ldquo;stranded&rdquo; on the outgoing CPU.
+This stranding of timers means that timers posted to the outgoing CPU
+will not fire until late in the CPU-hotplug process.
+The problem is that if a notifier is waiting on an SRCU grace period,
+that grace period is waiting on a timer, and that timer is stranded on the
+outgoing CPU, then the notifier will never be awakened, in other words,
+deadlock has occurred.
+This same situation of course also prohibits <tt>srcu_barrier()</tt>
+from being invoked from CPU-hotplug notifiers.
<p>
SRCU also differs from other RCU flavors in that SRCU's expedited and
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 073dbc12d1ea..1ab70c37921f 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -219,17 +219,18 @@ an estimate of the total number of RCU callbacks queued across all CPUs
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU:
- 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
+ 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
The "last_accelerate:" prints the low-order 16 bits (in hex) of the
jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
-rcu_prepare_for_idle(). The "nonlazy_posted:" prints the number
-of non-lazy callbacks posted since the last call to rcu_needs_cpu().
-Finally, an "L" indicates that there are currently no non-lazy callbacks
-("." is printed otherwise, as shown above) and "D" indicates that
-dyntick-idle processing is enabled ("." is printed otherwise, for example,
-if disabled via the "nohz=" kernel boot parameter).
+rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback
+status, so that an "l" indicates that all callbacks were lazy at the start
+of the last idle period and an "L" indicates that there are currently
+no non-lazy callbacks (in both cases, "." is printed otherwise, as
+shown above) and "D" indicates that dyntick-idle processing is enabled
+("." is printed otherwise, for example, if disabled via the "nohz="
+kernel boot parameter).
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 55918b54808b..a41a0384d20c 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -10,173 +10,8 @@ status messages via printk(), which can be examined via the dmesg
command (perhaps grepping for "torture"). The test is started
when the module is loaded, and stops when the module is unloaded.
-
-MODULE PARAMETERS
-
-This module has the following parameters:
-
-fqs_duration Duration (in microseconds) of artificially induced bursts
- of force_quiescent_state() invocations. In RCU
- implementations having force_quiescent_state(), these
- bursts help force races between forcing a given grace
- period and that grace period ending on its own.
-
-fqs_holdoff Holdoff time (in microseconds) between consecutive calls
- to force_quiescent_state() within a burst.
-
-fqs_stutter Wait time (in seconds) between consecutive bursts
- of calls to force_quiescent_state().
-
-gp_normal Make the fake writers use normal synchronous grace-period
- primitives.
-
-gp_exp Make the fake writers use expedited synchronous grace-period
- primitives. If both gp_normal and gp_exp are set, or
- if neither gp_normal nor gp_exp are set, then randomly
- choose the primitive so that about 50% are normal and
- 50% expedited. By default, neither are set, which
- gives best overall test coverage.
-
-irqreader Says to invoke RCU readers from irq level. This is currently
- done via timers. Defaults to "1" for variants of RCU that
- permit this. (Or, more accurately, variants of RCU that do
- -not- permit this know to ignore this variable.)
-
-n_barrier_cbs If this is nonzero, RCU barrier testing will be conducted,
- in which case n_barrier_cbs specifies the number of
- RCU callbacks (and corresponding kthreads) to use for
- this testing. The value cannot be negative. If you
- specify this to be non-zero when torture_type indicates a
- synchronous RCU implementation (one for which a member of
- the synchronize_rcu() rather than the call_rcu() family is
- used -- see the documentation for torture_type below), an
- error will be reported and no testing will be carried out.
-
-nfakewriters This is the number of RCU fake writer threads to run. Fake
- writer threads repeatedly use the synchronous "wait for
- current readers" function of the interface selected by
- torture_type, with a delay between calls to allow for various
- different numbers of writers running in parallel.
- nfakewriters defaults to 4, which provides enough parallelism
- to trigger special cases caused by multiple writers, such as
- the synchronize_srcu() early return optimization.
-
-nreaders This is the number of RCU reading threads supported.
- The default is twice the number of CPUs. Why twice?
- To properly exercise RCU implementations with preemptible
- read-side critical sections.
-
-onoff_interval
- The number of seconds between each attempt to execute a
- randomly selected CPU-hotplug operation. Defaults to
- zero, which disables CPU hotplugging. In HOTPLUG_CPU=n
- kernels, rcutorture will silently refuse to do any
- CPU-hotplug operations regardless of what value is
- specified for onoff_interval.
-
-onoff_holdoff The number of seconds to wait until starting CPU-hotplug
- operations. This would normally only be used when
- rcutorture was built into the kernel and started
- automatically at boot time, in which case it is useful
- in order to avoid confusing boot-time code with CPUs
- coming and going.
-
-shuffle_interval
- The number of seconds to keep the test threads affinitied
- to a particular subset of the CPUs, defaults to 3 seconds.
- Used in conjunction with test_no_idle_hz.
-
-shutdown_secs The number of seconds to run the test before terminating
- the test and powering off the system. The default is
- zero, which disables test termination and system shutdown.
- This capability is useful for automated testing.
-
-stall_cpu The number of seconds that a CPU should be stalled while
- within both an rcu_read_lock() and a preempt_disable().
- This stall happens only once per rcutorture run.
- If you need multiple stalls, use modprobe and rmmod to
- repeatedly run rcutorture. The default for stall_cpu
- is zero, which prevents rcutorture from stalling a CPU.
-
- Note that attempts to rmmod rcutorture while the stall
- is ongoing will hang, so be careful what value you
- choose for this module parameter! In addition, too-large
- values for stall_cpu might well induce failures and
- warnings in other parts of the kernel. You have been
- warned!
-
-stall_cpu_holdoff
- The number of seconds to wait after rcutorture starts
- before stalling a CPU. Defaults to 10 seconds.
-
-stat_interval The number of seconds between output of torture
- statistics (via printk()). Regardless of the interval,
- statistics are printed when the module is unloaded.
- Setting the interval to zero causes the statistics to
- be printed -only- when the module is unloaded, and this
- is the default.
-
-stutter The length of time to run the test before pausing for this
- same period of time. Defaults to "stutter=5", so as
- to run and pause for (roughly) five-second intervals.
- Specifying "stutter=0" causes the test to run continuously
- without pausing, which is the old default behavior.
-
-test_boost Whether or not to test the ability of RCU to do priority
- boosting. Defaults to "test_boost=1", which performs
- RCU priority-inversion testing only if the selected
- RCU implementation supports priority boosting. Specifying
- "test_boost=0" never performs RCU priority-inversion
- testing. Specifying "test_boost=2" performs RCU
- priority-inversion testing even if the selected RCU
- implementation does not support RCU priority boosting,
- which can be used to test rcutorture's ability to
- carry out RCU priority-inversion testing.
-
-test_boost_interval
- The number of seconds in an RCU priority-inversion test
- cycle. Defaults to "test_boost_interval=7". It is
- usually wise for this value to be relatively prime to
- the value selected for "stutter".
-
-test_boost_duration
- The number of seconds to do RCU priority-inversion testing
- within any given "test_boost_interval". Defaults to
- "test_boost_duration=4".
-
-test_no_idle_hz Whether or not to test the ability of RCU to operate in
- a kernel that disables the scheduling-clock interrupt to
- idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
- Defaults to omitting this test.
-
-torture_type The type of RCU to test, with string values as follows:
-
- "rcu": rcu_read_lock(), rcu_read_unlock() and call_rcu(),
- along with expedited, synchronous, and polling
- variants.
-
- "rcu_bh": rcu_read_lock_bh(), rcu_read_unlock_bh(), and
- call_rcu_bh(), along with expedited and synchronous
- variants.
-
- "rcu_busted": This tests an intentionally incorrect version
- of RCU in order to help test rcutorture itself.
-
- "srcu": srcu_read_lock(), srcu_read_unlock() and
- call_srcu(), along with expedited and
- synchronous variants.
-
- "sched": preempt_disable(), preempt_enable(), and
- call_rcu_sched(), along with expedited,
- synchronous, and polling variants.
-
- "tasks": voluntary context switch and call_rcu_tasks(),
- along with expedited and synchronous variants.
-
- Defaults to "rcu".
-
-verbose Enable debug printk()s. Default is disabled.
-
+Module parameters are prefixed by "rcutorture." in
+Documentation/admin-guide/kernel-parameters.txt.
OUTPUT
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 4a6854318b17..1ace20815bb1 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -302,7 +302,7 @@ rcu_dereference()
must prohibit. The rcu_dereference_protected() variant takes
a lockdep expression to indicate which locks must be acquired
by the caller. If the indicated protection is not provided,
- a lockdep splat is emitted. See RCU/Design/Requirements.html
+ a lockdep splat is emitted. See RCU/Design/Requirements/Requirements.html
and the API's code comments for more details and example usage.
The following diagram shows how each API communicates among the
@@ -560,7 +560,7 @@ presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU. Both are way too simple for real-world use,
lacking both functionality and performance. However, they are useful
-in getting a feel for how RCU works. See kernel/rcupdate.c for a
+in getting a feel for how RCU works. See kernel/rcu/update.c for a
production-quality implementation, and see:
http://www.rdrop.com/users/paulmck/RCU
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index 0797eec76be1..47e577264198 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -1,9 +1,9 @@
.. _readme:
-Linux kernel release 4.x <http://kernel.org/>
+Linux kernel release 5.x <http://kernel.org/>
=============================================
-These are the release notes for Linux version 4. Read them carefully,
+These are the release notes for Linux version 5. Read them carefully,
as they tell you what this is all about, explain how to install the
kernel, and what to do if something goes wrong.
@@ -63,7 +63,7 @@ Installing the kernel source
directory where you have permissions (e.g. your home directory) and
unpack it::
- xz -cd linux-4.X.tar.xz | tar xvf -
+ xz -cd linux-5.x.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
@@ -72,26 +72,26 @@ Installing the kernel source
files. They should match the library, and not get messed up by
whatever the kernel-du-jour happens to be.
- - You can also upgrade between 4.x releases by patching. Patches are
+ - You can also upgrade between 5.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the
newer patch files, enter the top level directory of the kernel source
- (linux-4.X) and execute::
+ (linux-5.x) and execute::
- xz -cd ../patch-4.x.xz | patch -p1
+ xz -cd ../patch-5.x.xz | patch -p1
- Replace "x" for all versions bigger than the version "X" of your current
+ Replace "x" for all versions bigger than the version "x" of your current
source tree, **in_order**, and you should be ok. You may want to remove
the backup files (some-file-name~ or some-file-name.orig), and make sure
that there are no failed patches (some-file-name# or some-file-name.rej).
If there are, either you or I have made a mistake.
- Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
+ Unlike patches for the 5.x kernels, patches for the 5.x.y kernels
(also known as the -stable kernels) are not incremental but instead apply
- directly to the base 4.x kernel. For example, if your base kernel is 4.0
- and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
- and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
- want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
- patch -R) **before** applying the 4.0.3 patch. You can read more on this in
+ directly to the base 5.x kernel. For example, if your base kernel is 5.0
+ and you want to apply the 5.0.3 patch, you must not first apply the 5.0.1
+ and 5.0.2 patches. Similarly, if you are running kernel version 5.0.2 and
+ want to jump to 5.0.3, you must first reverse the 5.0.2 patch (that is,
+ patch -R) **before** applying the 5.0.3 patch. You can read more on this in
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
Alternatively, the script patch-kernel can be used to automate this
@@ -114,7 +114,7 @@ Installing the kernel source
Software requirements
---------------------
- Compiling and running the 4.x kernels requires up-to-date
+ Compiling and running the 5.x kernels requires up-to-date
versions of various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
required and how to get updates for these packages. Beware that using
@@ -132,12 +132,12 @@ Build directory for the kernel
place for the output files (including .config).
Example::
- kernel source code: /usr/src/linux-4.X
+ kernel source code: /usr/src/linux-5.x
build directory: /home/name/build/kernel
To configure and build the kernel, use::
- cd /usr/src/linux-4.X
+ cd /usr/src/linux-5.x
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b799bcf67d7b..a87418990d39 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -461,6 +461,11 @@
possible to determine what the correct size should be.
This option provides an override for these situations.
+ carrier_timeout=
+ [NET] Specifies amount of time (in seconds) that
+ the kernel should wait for a network carrier. By default
+ it waits 120 seconds.
+
ca_keys= [KEYS] This parameter identifies a specific key(s) on
the system trusted keyring to be used for certificate
trust validation.
@@ -1073,9 +1078,15 @@
specified address. The serial port must already be
setup and configured. Options are not yet supported.
+ efifb,[options]
+ Start an early, unaccelerated console on the EFI
+ memory mapped framebuffer (if available). On cache
+ coherent non-x86 systems that use system memory for
+ the framebuffer, pass the 'ram' option so that it is
+ mapped with the correct attributes.
+
earlyprintk= [X86,SH,ARM,M68k,S390]
earlyprintk=vga
- earlyprintk=efi
earlyprintk=sclp
earlyprintk=xen
earlyprintk=serial[,ttySn[,baudrate]]
@@ -1696,12 +1707,11 @@
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
not be supported.
- sm_off [Default Off]
- By default, scalable mode will be supported if the
+ sm_on [Default Off]
+ By default, scalable mode will be disabled even if the
hardware advertises that it has support for the scalable
mode translation. With this option set, scalable mode
- will not be used even on hardware which claims to support
- it.
+ will be used on hardware which claims to support it.
tboot_noforce [Default Off]
Do not force the Intel IOMMU enabled under tboot.
By default, tboot will force Intel IOMMU on, which
@@ -3654,19 +3664,6 @@
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
- rcutree.jiffies_till_sched_qs= [KNL]
- Set required age in jiffies for a
- given grace period before RCU starts
- soliciting quiescent-state help from
- rcu_note_context_switch(). If not specified, the
- kernel will calculate a value based on the most
- recent settings of rcutree.jiffies_till_first_fqs
- and rcutree.jiffies_till_next_fqs.
- This calculated value may be viewed in
- rcutree.jiffies_to_sched_qs. Any attempt to
- set rcutree.jiffies_to_sched_qs will be
- cheerfully overwritten.
-
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@@ -3678,6 +3675,20 @@
quiescent states. Units are jiffies, minimum
value is one, and maximum value is HZ.
+ rcutree.jiffies_till_sched_qs= [KNL]
+ Set required age in jiffies for a
+ given grace period before RCU starts
+ soliciting quiescent-state help from
+ rcu_note_context_switch() and cond_resched().
+ If not specified, the kernel will calculate
+ a value based on the most recent settings
+ of rcutree.jiffies_till_first_fqs
+ and rcutree.jiffies_till_next_fqs.
+ This calculated value may be viewed in
+ rcutree.jiffies_to_sched_qs. Any attempt to set
+ rcutree.jiffies_to_sched_qs will be cheerfully
+ overwritten.
+
rcutree.kthread_prio= [KNL,BOOT]
Set the SCHED_FIFO priority of the RCU per-CPU
kthreads (rcuc/N). This value is also used for
@@ -3721,6 +3732,11 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
+ rcutree.sysrq_rcu= [KNL]
+ Commandeer a sysrq key to dump out Tree RCU's
+ rcu_node tree with an eye towards determining
+ why a new grace period has not yet started.
+
rcuperf.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 1f09d043d086..ddb8ce5333ba 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -44,6 +44,8 @@ stable kernels.
| Implementor | Component | Erratum ID | Kconfig |
+----------------+-----------------+-----------------+-----------------------------+
+| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
+| | | | |
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index 7cc9e368c1e9..10453c627135 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -36,27 +36,27 @@ consideration important quirks of other architectures) and
defines calling convention that is compatible with C calling
convention of the linux kernel on those architectures.
-Q: can multiple return values be supported in the future?
+Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as return value.
-Q: can more than 5 function arguments be supported in the future?
+Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set.
(unlike x64 ISA that allows msft, cdecl and other conventions)
-Q: can BPF programs access instruction pointer or return address?
+Q: Can BPF programs access instruction pointer or return address?
-----------------------------------------------------------------
A: NO.
-Q: can BPF programs access stack pointer ?
+Q: Can BPF programs access stack pointer ?
------------------------------------------
A: NO.
Only frame pointer (register R10) is accessible.
From compiler point of view it's necessary to have stack pointer.
-For example LLVM defines register R11 as stack pointer in its
+For example, LLVM defines register R11 as stack pointer in its
BPF backend, but it makes sure that generated code never uses it.
Q: Does C-calling convention diminishes possible use cases?
@@ -66,8 +66,8 @@ A: YES.
BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets kernel call into
-BPF programs and programs call kernel helpers with zero overhead.
-As all of them were native C code. That is particularly the case
+BPF programs and programs call kernel helpers with zero overhead,
+as all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.
@@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.
-At least for now until BPF core has support for
+At least for now, until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
-jump tables, read only sections and all other normal constructs
+jump tables, read-only sections, and all other normal constructs
that C code can produce.
Q: Can loops be supported in a safe way?
@@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like?
A: This was necessary to avoid introducing flags into ISA which are
impossible to make generic and efficient across CPU architectures.
-Q: why BPF_DIV instruction doesn't map to x64 div?
+Q: Why BPF_DIV instruction doesn't map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also it
needs div-by-zero runtime check.
-Q: why there is no BPF_SDIV for signed divide operation?
+Q: Why there is no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. llvm errors in such case and
-prints a suggestion to use unsigned divide instead
+prints a suggestion to use unsigned divide instead.
Q: Why BPF has implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst
new file mode 100644
index 000000000000..9a60a5d60e38
--- /dev/null
+++ b/Documentation/bpf/btf.rst
@@ -0,0 +1,848 @@
+=====================
+BPF Type Format (BTF)
+=====================
+
+1. Introduction
+***************
+
+BTF (BPF Type Format) is the metadata format which encodes the debug info
+related to BPF program/map. The name BTF was used initially to describe data
+types. The BTF was later extended to include function info for defined
+subroutines, and line info for source/line information.
+
+The debug info is used for map pretty printing, function signatures, etc. The
+function signature enables better kernel symbols for bpf programs and
+functions. The line info helps generate source-annotated translated byte code,
+jited code, and verifier log.
+
+The BTF specification contains two parts,
+ * BTF kernel API
+ * BTF ELF file format
+
+The kernel API is the contract between user space and kernel. The kernel
+verifies the BTF info before using it. The ELF file format is a user space
+contract between ELF file and libbpf loader.
+
+The type and string sections are part of the BTF kernel API, describing the
+debug info (mostly types related) referenced by the bpf program. These two
+sections are discussed in detail in :ref:`BTF_Type_String`.
+
+.. _BTF_Type_String:
+
+2. BTF Type and String Encoding
+*******************************
+
+The file ``include/uapi/linux/btf.h`` provides a high-level definition of how
+types/strings are encoded.
+
+The beginning of data blob must be::
+
+ struct btf_header {
+ __u16 magic;
+ __u8 version;
+ __u8 flags;
+ __u32 hdr_len;
+
+ /* All offsets are in bytes relative to the end of this header */
+ __u32 type_off; /* offset of type section */
+ __u32 type_len; /* length of type section */
+ __u32 str_off; /* offset of string section */
+ __u32 str_len; /* length of string section */
+ };
+
+The magic is ``0xeB9F``, which has different encoding for big and little
+endian systems, and can be used to test whether BTF is generated for big- or
+little-endian target. The ``btf_header`` is designed to be extensible with
+``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is
+generated.
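
The endianness check described above can be sketched as follows. This is a minimal illustration, not kernel or libbpf code: the ``enum btf_endian`` values and the ``btf_blob_endianness()`` helper are hypothetical names introduced here, and only the magic value ``0xeB9F`` comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTF_MAGIC 0xeB9F

/* Hypothetical result codes; not part of any kernel or libbpf header. */
enum btf_endian {
	BTF_ENDIAN_INVALID = 0,
	BTF_ENDIAN_LITTLE,
	BTF_ENDIAN_BIG,
};

/*
 * Classify a data blob by the byte order in which the 16-bit magic at
 * the start of struct btf_header is stored.
 */
static enum btf_endian btf_blob_endianness(const uint8_t *blob, size_t len)
{
	if (blob == NULL || len < 2)
		return BTF_ENDIAN_INVALID;
	/* Little-endian targets store the low byte (0x9F) first. */
	if (blob[0] == (BTF_MAGIC & 0xff) && blob[1] == (BTF_MAGIC >> 8))
		return BTF_ENDIAN_LITTLE;
	/* Big-endian targets store the high byte (0xEB) first. */
	if (blob[0] == (BTF_MAGIC >> 8) && blob[1] == (BTF_MAGIC & 0xff))
		return BTF_ENDIAN_BIG;
	return BTF_ENDIAN_INVALID;
}
```

A loader could call such a helper before interpreting any other header field, since every multi-byte field in the blob uses the same byte order as the magic.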
+
+2.1 String Encoding
+===================
+
+The first string in the string section must be a null string. The rest of
+the string table is a concatenation of other null-terminated strings.
+
+2.2 Type Encoding
+=================
+
+The type id ``0`` is reserved for ``void`` type. The type section is parsed
+sequentially and type id is assigned to each recognized type starting from id
+``1``. Currently, the following types are supported::
+
+ #define BTF_KIND_INT 1 /* Integer */
+ #define BTF_KIND_PTR 2 /* Pointer */
+ #define BTF_KIND_ARRAY 3 /* Array */
+ #define BTF_KIND_STRUCT 4 /* Struct */
+ #define BTF_KIND_UNION 5 /* Union */
+ #define BTF_KIND_ENUM 6 /* Enumeration */
+ #define BTF_KIND_FWD 7 /* Forward */
+ #define BTF_KIND_TYPEDEF 8 /* Typedef */
+ #define BTF_KIND_VOLATILE 9 /* Volatile */
+ #define BTF_KIND_CONST 10 /* Const */
+ #define BTF_KIND_RESTRICT 11 /* Restrict */
+ #define BTF_KIND_FUNC 12 /* Function */
+ #define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
+
+Note that the type section encodes debug info, not just pure types.
+``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
+
+Each type contains the following common data::
+
+ struct btf_type {
+ __u32 name_off;
+ /* "info" bits arrangement
+ * bits 0-15: vlen (e.g. # of struct's members)
+ * bits 16-23: unused
+ * bits 24-27: kind (e.g. int, ptr, array...etc)
+ * bits 28-30: unused
+ * bit 31: kind_flag, currently used by
+ * struct, union and fwd
+ */
+ __u32 info;
+ /* "size" is used by INT, ENUM, STRUCT and UNION.
+ * "size" tells the size of the type it is describing.
+ *
+ * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
+ * FUNC and FUNC_PROTO.
+ * "type" is a type_id referring to another type.
+ */
+ union {
+ __u32 size;
+ __u32 type;
+ };
+ };
+
+For certain kinds, the common data are followed by kind-specific data. The
+``name_off`` in ``struct btf_type`` specifies the offset in the string table.
+The following sections detail encoding of each kind.
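
As an illustration of the "info" bits arrangement documented in ``struct btf_type``, the fields can be packed and unpacked as sketched below. The helper names are hypothetical and chosen for this example; the shifts and masks follow the bit layout given in the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* bits 0-15: vlen (e.g. number of struct members) */
static inline uint32_t btf_info_vlen(uint32_t info)
{
	return info & 0xffff;
}

/* bits 24-27: kind (BTF_KIND_INT, BTF_KIND_PTR, ...) */
static inline uint32_t btf_info_kind(uint32_t info)
{
	return (info >> 24) & 0x0f;
}

/* bit 31: kind_flag */
static inline uint32_t btf_info_kind_flag(uint32_t info)
{
	return info >> 31;
}

/* Hypothetical helper: build an "info" word, leaving unused bits zero. */
static inline uint32_t btf_info_pack(uint32_t kind, uint32_t vlen,
				     uint32_t kind_flag)
{
	assert(kind <= 0x0f && vlen <= 0xffff && kind_flag <= 1);
	return (kind_flag << 31) | (kind << 24) | vlen;
}
```

For example, a struct (kind 4) with three members and ``kind_flag`` set packs to ``0x84000003``.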
+
+2.2.1 BTF_KIND_INT
+~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: any valid offset
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_INT
+ * ``info.vlen``: 0
+ * ``size``: the size of the int type in bytes.
+
+``btf_type`` is followed by a ``u32`` with the following bits arrangement::
+
+ #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
+ #define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff0000) >> 16)
+ #define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff)
+
+The ``BTF_INT_ENCODING`` has the following attributes::
+
+ #define BTF_INT_SIGNED (1 << 0)
+ #define BTF_INT_CHAR (1 << 1)
+ #define BTF_INT_BOOL (1 << 2)
+
+The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or
+bool, for the int type. The char and bool encoding are mostly useful for
+pretty print. At most one encoding can be specified for the int type.
+
+The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int
+type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equal to 4.
+The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
+for the type. The maximum value of ``BTF_INT_BITS()`` is 128.
+
+The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
+for this int. For example, a bitfield struct member has:
+
+ * btf member bit offset 100 from the start of the structure,
+ * btf member pointing to an int type,
+ * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
+
+Then in the struct memory layout, this member will occupy ``4`` bits starting
+from bits ``100 + 2 = 102``.
+
+Alternatively, the bitfield struct member can be the following to access the
+same bits as the above:
+
+ * btf member bit offset 102,
+ * btf member pointing to an int type,
+ * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
+
+The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of
+bitfield encoding. Currently, both llvm and pahole generate
+``BTF_INT_OFFSET() = 0`` for all int types.
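
The two equivalent bitfield encodings above can be checked with a short sketch. The ``btf_int_pack()`` and ``btf_member_first_bit()`` helpers are hypothetical names introduced for this example; the macros repeat the definitions from this section, with the macro argument parenthesized.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_INT_ENCODING(VAL)	(((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL)	(((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL)	((VAL) & 0x000000ff)

#define BTF_INT_SIGNED	(1 << 0)

/* Hypothetical helper: build the u32 that follows btf_type for an int. */
static inline uint32_t btf_int_pack(uint32_t encoding, uint32_t offset,
				    uint32_t bits)
{
	assert(encoding <= 0x0f && offset <= 0xff && bits <= 128);
	return (encoding << 24) | (offset << 16) | bits;
}

/*
 * Hypothetical helper: first bit occupied by a struct member, given the
 * member's bit offset and the int data of the type it points to.
 */
static inline uint32_t btf_member_first_bit(uint32_t member_bit_offset,
					    uint32_t int_data)
{
	return member_bit_offset + BTF_INT_OFFSET(int_data);
}
```

With these helpers, member offset 100 combined with ``BTF_INT_OFFSET() = 2`` and member offset 102 combined with ``BTF_INT_OFFSET() = 0`` both resolve to bit 102, matching the example in the text.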
+
+2.2.2 BTF_KIND_PTR
+~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_PTR
+ * ``info.vlen``: 0
+ * ``type``: the pointee type of the pointer
+
+No additional type data follow ``btf_type``.
+
+2.2.3 BTF_KIND_ARRAY
+~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_ARRAY
+ * ``info.vlen``: 0
+ * ``size/type``: 0, not used
+
+``btf_type`` is followed by one ``struct btf_array``::
+
+ struct btf_array {
+ __u32 type;
+ __u32 index_type;
+ __u32 nelems;
+ };
+
+The ``struct btf_array`` encoding:
+ * ``type``: the element type
+ * ``index_type``: the index type
+ * ``nelems``: the number of elements for this array (``0`` is also allowed).
+
+The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``,
+``u64``, ``unsigned __int128``). The original design of including
+``index_type`` follows DWARF, which has an ``index_type`` for its array type.
+Currently in BTF, beyond type verification, the ``index_type`` is not used.
+
+The ``struct btf_array`` allows chaining through element type to represent
+multidimensional arrays. For example, for ``int a[5][6]``, the following type
+information illustrates the chaining:
+
+ * [1]: int
+ * [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
+ * [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``
+
+Currently, both pahole and llvm collapse multidimensional array into
+one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is
+equal to ``30``. This is because the original use case is map pretty print
+where the whole array is dumped out so one-dimensional array is enough. As
+more BTF usage is explored, pahole and llvm can be changed to generate proper
+chained representation for multidimensional arrays.
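
To make the relationship between the chained and the collapsed representation concrete, the sketch below walks a chain of array types and multiplies the ``nelems`` values. The ``struct toy_type`` table and the ``toy_total_elems()`` function are simplifications assumed for this example only; a real consumer would walk ``struct btf_type`` records in the type section instead.

```c
#include <assert.h>
#include <stdint.h>

struct btf_array {
	uint32_t type;
	uint32_t index_type;
	uint32_t nelems;
};

/* Hypothetical, simplified type table entry indexed by type id. */
struct toy_type {
	int is_array;		/* nonzero for BTF_KIND_ARRAY entries */
	struct btf_array array;	/* meaningful only when is_array is set */
};

/*
 * Follow btf_array.type links starting at "id" and return the total
 * number of elements, i.e. the nelems of the collapsed one-dimensional
 * array. Returns 1 for a non-array type and 0 if the chain loops.
 */
static uint64_t toy_total_elems(const struct toy_type *types,
				uint32_t ntypes, uint32_t id)
{
	uint64_t total = 1;
	uint32_t depth = 0;

	while (id < ntypes && types[id].is_array) {
		if (++depth > ntypes)
			return 0;	/* more hops than types: a cycle */
		total *= types[id].array.nelems;
		id = types[id].array.type;
	}
	return total;
}
```

For the ``int a[5][6]`` chain listed above, starting at type ``[3]`` yields 5 * 6 = 30, the same ``nelems`` that pahole and llvm currently emit for the collapsed array.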
+
+2.2.4 BTF_KIND_STRUCT
+~~~~~~~~~~~~~~~~~~~~~
+2.2.5 BTF_KIND_UNION
+~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0 or offset to a valid C identifier
+ * ``info.kind_flag``: 0 or 1
+ * ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION
+ * ``info.vlen``: the number of struct/union members
+ * ``size``: the size of the struct/union in bytes
+
+``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``::
+
+ struct btf_member {
+ __u32 name_off;
+ __u32 type;
+ __u32 offset;
+ };
+
+``struct btf_member`` encoding:
+ * ``name_off``: offset to a valid C identifier
+ * ``type``: the member type
+ * ``offset``: <see below>
+
+If the type info ``kind_flag`` is not set, the offset contains only bit offset
+of the member. Note that the base type of the bitfield can only be int or enum
+type. If the bitfield size is 32, the base type can be either int or enum
+type. If the bitfield size is not 32, the base type must be int, and int type
+``BTF_INT_BITS()`` encodes the bitfield size.
+
+If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member
+bitfield size and bit offset. The bitfield size and bit offset are calculated
+as follows::
+
+ #define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
+ #define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff)
+
+In this case, if the base type is an int type, it must be a regular int type:
+
+ * ``BTF_INT_OFFSET()`` must be 0.
+ * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.
+
+The following kernel patch introduced ``kind_flag`` and explained why both
+modes exist:
+
+ https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3
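
A small sketch of the ``kind_flag = 1`` encoding of ``btf_member.offset`` follows. The two macros are the ones given above; ``btf_member_offset_pack()`` is a hypothetical helper added only to show the inverse operation.

```c
#include <assert.h>
#include <stdint.h>

#define BTF_MEMBER_BITFIELD_SIZE(val)	((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val)	((val) & 0xffffff)

/*
 * Hypothetical helper: combine a bitfield size (upper 8 bits) and a bit
 * offset (lower 24 bits) into a btf_member.offset value.
 */
static inline uint32_t btf_member_offset_pack(uint32_t bitfield_size,
					      uint32_t bit_offset)
{
	assert(bitfield_size <= 0xff && bit_offset <= 0xffffff);
	return (bitfield_size << 24) | bit_offset;
}
```

For instance, a 4-bit bitfield starting at bit 102 is stored as ``0x04000066``, and the two macros recover 4 and 102 from that value.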
+
+2.2.6 BTF_KIND_ENUM
+~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0 or offset to a valid C identifier
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_ENUM
+ * ``info.vlen``: number of enum values
+ * ``size``: 4
+
+``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``::
+
+ struct btf_enum {
+ __u32 name_off;
+ __s32 val;
+ };
+
+The ``btf_enum`` encoding:
+ * ``name_off``: offset to a valid C identifier
+ * ``val``: any value
+
+2.2.7 BTF_KIND_FWD
+~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a valid C identifier
+ * ``info.kind_flag``: 0 for struct, 1 for union
+ * ``info.kind``: BTF_KIND_FWD
+ * ``info.vlen``: 0
+ * ``type``: 0
+
+No additional type data follow ``btf_type``.
+
+2.2.8 BTF_KIND_TYPEDEF
+~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a valid C identifier
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_TYPEDEF
+ * ``info.vlen``: 0
+ * ``type``: the type which can be referred to by name at ``name_off``
+
+No additional type data follow ``btf_type``.
+
+2.2.9 BTF_KIND_VOLATILE
+~~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_VOLATILE
+ * ``info.vlen``: 0
+ * ``type``: the type with ``volatile`` qualifier
+
+No additional type data follow ``btf_type``.
+
+2.2.10 BTF_KIND_CONST
+~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_CONST
+ * ``info.vlen``: 0
+ * ``type``: the type with ``const`` qualifier
+
+No additional type data follow ``btf_type``.
+
+2.2.11 BTF_KIND_RESTRICT
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_RESTRICT
+ * ``info.vlen``: 0
+ * ``type``: the type with ``restrict`` qualifier
+
+No additional type data follow ``btf_type``.
+
+2.2.12 BTF_KIND_FUNC
+~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a valid C identifier
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_FUNC
+ * ``info.vlen``: 0
+ * ``type``: a BTF_KIND_FUNC_PROTO type
+
+No additional type data follow ``btf_type``.
+
+A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose
+signature is defined by ``type``. The subprogram is thus an instance of that
+type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
+:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
+(ABI).
+
+2.2.13 BTF_KIND_FUNC_PROTO
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: 0
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_FUNC_PROTO
+ * ``info.vlen``: # of parameters
+ * ``type``: the return type
+
+``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``::
+
+ struct btf_param {
+ __u32 name_off;
+ __u32 type;
+ };
+
+If a BTF_KIND_FUNC_PROTO type is referred to by a BTF_KIND_FUNC type, then
+``btf_param.name_off`` must point to a valid C identifier except for the
+possible last argument representing the variable argument. The
+``btf_param.type`` refers to the parameter type.
+
+If the function has variable arguments, the last parameter is encoded with
+``name_off = 0`` and ``type = 0``.
+
+3. BTF Kernel API
+*****************
+
+The following bpf syscall commands involve BTF:
+ * BPF_BTF_LOAD: load a blob of BTF data into kernel
+ * BPF_MAP_CREATE: map creation with btf key and value type info.
+ * BPF_PROG_LOAD: prog load with btf function and line info.
+ * BPF_BTF_GET_FD_BY_ID: get a btf fd
+ * BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info
+ and other btf related info are returned.
+
+The workflow typically looks like:
+::
+
+ Application:
+ BPF_BTF_LOAD
+ |
+ v
+ BPF_MAP_CREATE and BPF_PROG_LOAD
+ |
+ V
+ ......
+
+ Introspection tool:
+ ......
+ BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's)
+ |
+ V
+ BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd)
+ |
+ V
+ BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id)
+ | |
+ V |
+ BPF_BTF_GET_FD_BY_ID (get btf_fd) |
+ | |
+ V |
+ BPF_OBJ_GET_INFO_BY_FD (get btf) |
+ | |
+ V V
+ pretty print types, dump func signatures and line info, etc.
+
+
+3.1 BPF_BTF_LOAD
+================
+
+Load a blob of BTF data into the kernel. A blob of data, described in
+:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
+is returned to user space.
+
+3.2 BPF_MAP_CREATE
+==================
+
+A map can be created with ``btf_fd`` and the specified key/value type ids::
+
+ __u32 btf_fd; /* fd pointing to a BTF type data */
+ __u32 btf_key_type_id; /* BTF type_id of the key */
+ __u32 btf_value_type_id; /* BTF type_id of the value */
+
+In libbpf, the map can be defined with extra annotation like below:
+::
+
+ struct bpf_map_def SEC("maps") btf_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(struct ipv_counts),
+ .max_entries = 4,
+ };
+ BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);
+
+Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and
+value types for the map. During ELF parsing, libbpf is able to extract
+key/value type_id's and assign them to BPF_MAP_CREATE attributes
+automatically.
+
+.. _BPF_Prog_Load:
+
+3.3 BPF_PROG_LOAD
+=================
+
+During prog_load, func_info and line_info can be passed to kernel with proper
+values for the following attributes:
+::
+
+ __u32 insn_cnt;
+ __aligned_u64 insns;
+ ......
+ __u32 prog_btf_fd; /* fd pointing to BTF type data */
+ __u32 func_info_rec_size; /* userspace bpf_func_info size */
+ __aligned_u64 func_info; /* func info */
+ __u32 func_info_cnt; /* number of bpf_func_info records */
+ __u32 line_info_rec_size; /* userspace bpf_line_info size */
+ __aligned_u64 line_info; /* line info */
+ __u32 line_info_cnt; /* number of bpf_line_info records */
+
+The func_info and line_info are arrays of the following structures, respectively::
+
+ struct bpf_func_info {
+ __u32 insn_off; /* [0, insn_cnt - 1] */
+ __u32 type_id; /* pointing to a BTF_KIND_FUNC type */
+ };
+ struct bpf_line_info {
+ __u32 insn_off; /* [0, insn_cnt - 1] */
+ __u32 file_name_off; /* offset to string table for the filename */
+ __u32 line_off; /* offset to string table for the source line */
+ __u32 line_col; /* line number and column number */
+ };
+
+func_info_rec_size is the size of each func_info record, and
+line_info_rec_size is the size of each line_info record. Passing the record
+size to the kernel makes it possible to extend the record itself in the future.
+
+Below are requirements for func_info:
+ * func_info[0].insn_off must be 0.
+ * the func_info insn_off is in strictly increasing order and matches
+ bpf func boundaries.
+
+Below are requirements for line_info:
+ * the first insn in each func must have a line_info record pointing to it.
+ * the line_info insn_off is in strictly increasing order.
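
The ordering requirements on func_info can be sketched as a user-space pre-check. This is only an illustration under stated assumptions: ``func_info_offsets_ok()`` is a hypothetical helper, it does not reproduce the kernel verifier's checks, and it cannot confirm that the offsets match the actual bpf function boundaries, which requires the program's instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bpf_func_info {
	uint32_t insn_off;	/* [0, insn_cnt - 1] */
	uint32_t type_id;	/* pointing to a BTF_KIND_FUNC type */
};

/*
 * Hypothetical pre-check: the first record must start at instruction 0,
 * every insn_off must lie in [0, insn_cnt - 1], and offsets must be
 * strictly increasing. An empty array is accepted vacuously.
 */
static bool func_info_offsets_ok(const struct bpf_func_info *fi,
				 uint32_t cnt, uint32_t insn_cnt)
{
	uint32_t i;

	if (cnt == 0)
		return true;
	if (fi[0].insn_off != 0)
		return false;
	for (i = 0; i < cnt; i++) {
		if (fi[i].insn_off >= insn_cnt)
			return false;
		if (i > 0 && fi[i].insn_off <= fi[i - 1].insn_off)
			return false;
	}
	return true;
}
```

The same strictly-increasing loop would apply to the ``insn_off`` values of line_info records, without the requirement that the first offset be 0 being stated in those terms.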
+
+For line_info, the line number and column number are defined as below:
+::
+
+ #define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10)
+ #define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
+
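
As a worked example of the ``line_col`` layout, the sketch below packs a line and column number and recovers them with the two macros defined above. ``bpf_line_col_pack()`` is a hypothetical helper introduced for this illustration; the layout it assumes (column in the low 10 bits, line number in the remaining 22 bits) follows directly from the macros.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_LINE_INFO_LINE_NUM(line_col)	((line_col) >> 10)
#define BPF_LINE_INFO_LINE_COL(line_col)	((line_col) & 0x3ff)

/* Hypothetical helper: the inverse of the two macros above. */
static inline uint32_t bpf_line_col_pack(uint32_t line, uint32_t col)
{
	assert(line <= 0x3fffff && col <= 0x3ff);
	return (line << 10) | col;
}
```

Line 42, column 7 is therefore stored as ``42 * 1024 + 7 = 43015``.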
+3.4 BPF_{PROG,MAP}_GET_NEXT_ID
+==============================
+
+In kernel, every loaded program, map or btf has a unique id. The id won't
+change during the lifetime of a program, map, or btf.
+
+The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for
+each command, to user space, for bpf program or maps, respectively, so an
+inspection tool can inspect all programs and maps.
+
+3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
+===============================
+
+An introspection tool cannot use an id directly to get details about a program
+or map. A file descriptor needs to be obtained first for reference-counting
+purposes.
+
+3.6 BPF_OBJ_GET_INFO_BY_FD
+==========================
+
+Once a program/map fd is acquired, an introspection tool can get detailed
+information about this fd from the kernel, some of which is BTF-related. For
+example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
+``bpf_prog_info`` returns ``btf_id``, func_info, and line_info for translated
+bpf byte codes, and jited_line_info.
+
+3.7 BPF_BTF_GET_FD_BY_ID
+========================
+
+With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf
+syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with
+command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the
+kernel with BPF_BTF_LOAD, can be retrieved.
+
+With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection
+tool has full btf knowledge and is able to pretty print map key/values, dump
+func signatures and line info, along with byte/jit codes.
+
+4. ELF File Format Interface
+****************************
+
+4.1 .BTF section
+================
+
+The .BTF section contains type and string data. The format of this section is
+the same as the one described in :ref:`BTF_Type_String`.
+
+.. _BTF_Ext_Section:
+
+4.2 .BTF.ext section
+====================
+
+The .BTF.ext section encodes func_info and line_info, which need loader
+manipulation before being loaded into the kernel.
+
+The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h``
+and ``tools/lib/bpf/btf.c``.
+
+The current header of .BTF.ext section::
+
+ struct btf_ext_header {
+ __u16 magic;
+ __u8 version;
+ __u8 flags;
+ __u32 hdr_len;
+
+ /* All offsets are in bytes relative to the end of this header */
+ __u32 func_info_off;
+ __u32 func_info_len;
+ __u32 line_info_off;
+ __u32 line_info_len;
+ };
+
+It is very similar to the .BTF section. Instead of type/string sections, it
+contains func_info and line_info sections. See :ref:`BPF_Prog_Load` for details
+about the func_info and line_info record formats.
+
+The func_info is organized as below::
+
+ func_info_rec_size
+ btf_ext_info_sec for section #1 /* func_info for section #1 */
+ btf_ext_info_sec for section #2 /* func_info for section #2 */
+ ...
+
+``func_info_rec_size`` specifies the size of the ``bpf_func_info`` structure
+when .BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a
+collection of func_info for each specific ELF section::
+
+ struct btf_ext_info_sec {
+ __u32 sec_name_off; /* offset to section name */
+ __u32 num_info;
+ /* Followed by num_info * record_size number of bytes */
+ __u8 data[0];
+ };
+
+Here, num_info must be greater than 0.
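
The layout described above (a 32-bit record size followed by back-to-back ``btf_ext_info_sec`` entries) can be walked as in the sketch below. It is an illustration under stated assumptions, not the libbpf implementation: ``count_info_records`` is a hypothetical name, the buffer is assumed to hold exactly one func_info (or line_info) region in host byte order, and only the two fixed header fields of ``btf_ext_info_sec`` are read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): walk one info region and return
 * the total number of records it holds, or -1 if the region is malformed.
 */
static long count_info_records(const uint8_t *buf, size_t len)
{
	uint32_t rec_size;
	size_t off = sizeof(rec_size);
	long total = 0;

	if (len < sizeof(rec_size))
		return -1;
	memcpy(&rec_size, buf, sizeof(rec_size));	/* leading record size */
	if (rec_size == 0)
		return -1;

	while (off < len) {
		uint32_t hdr[2];	/* sec_name_off, num_info */
		size_t data_len;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		if (hdr[1] == 0)	/* num_info must be greater than 0 */
			return -1;
		data_len = (size_t)hdr[1] * rec_size;
		if (len - off < data_len)
			return -1;
		off += data_len;	/* skip num_info * rec_size bytes */
		total += hdr[1];
	}
	return total;
}
```

Reading the header fields with ``memcpy`` avoids relying on the alignment of the section data within the ELF file.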
+
+The line_info is organized as below::
+
+ line_info_rec_size
+ btf_ext_info_sec for section #1 /* line_info for section #1 */
+ btf_ext_info_sec for section #2 /* line_info for section #2 */
+ ...
+
+``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when
+.BTF.ext is generated.
+
+The interpretation of ``bpf_func_info->insn_off`` and
+``bpf_line_info->insn_off`` differs between the kernel API and the ELF API. For
+the kernel API, ``insn_off`` is the instruction offset in units of ``struct
+bpf_insn``. For the ELF API, ``insn_off`` is the byte offset from the
+beginning of the section (``btf_ext_info_sec->sec_name_off``).
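
A loader therefore has to convert the ELF byte offset into an instruction index before passing the records to the kernel. The sketch below shows only that unit conversion; it assumes ``sizeof(struct bpf_insn)`` is 8 bytes and uses a local constant so that it stays self-contained. ``elf_off_to_insn_idx`` is a hypothetical name, and any further adjustment a loader makes when it places a section's code inside a larger program is out of scope here.

```c
#include <stdint.h>

/* Assumed size of struct bpf_insn in bytes (kept local for this sketch). */
#define BPF_INSN_SIZE	8u

/*
 * Hypothetical helper (illustration only): convert an ELF-style byte offset
 * into a kernel-style instruction index. Returns -1 if the byte offset does
 * not fall on an instruction boundary.
 */
static inline int64_t elf_off_to_insn_idx(uint32_t byte_off)
{
	if (byte_off % BPF_INSN_SIZE)
		return -1;
	return byte_off / BPF_INSN_SIZE;
}
```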
+
+5. Using BTF
+************
+
+5.1 bpftool map pretty print
+============================
+
+With BTF, the map key/value can be printed based on fields rather than simply
+raw bytes. This is especially valuable for large structures or if your data
+structure has bitfields. For example, for the following map::
+
+ enum A { A1, A2, A3, A4, A5 };
+ typedef enum A ___A;
+ struct tmp_t {
+ char a1:4;
+ int a2:4;
+ int :4;
+ __u32 a3:4;
+ int b;
+ ___A b1:4;
+ enum A b2:4;
+ };
+ struct bpf_map_def SEC("maps") tmpmap = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct tmp_t),
+ .max_entries = 1,
+ };
+ BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);
+
+bpftool is able to pretty print like below:
+::
+
+ [{
+ "key": 0,
+ "value": {
+ "a1": 0x2,
+ "a2": 0x4,
+ "a3": 0x6,
+ "b": 7,
+ "b1": 0x8,
+ "b2": 0xa
+ }
+ }
+ ]
+
+5.2 bpftool prog dump
+=====================
+
+The following is an example showing how func_info and line_info can help prog
+dump with better kernel symbol names, function prototypes and line
+information::
+
+ $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
+ [...]
+ int test_long_fname_2(struct dummy_tracepoint_args * arg):
+ bpf_prog_44a040bf25481309_test_long_fname_2:
+ ; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
+ 0: push %rbp
+ 1: mov %rsp,%rbp
+ 4: sub $0x30,%rsp
+ b: sub $0x28,%rbp
+ f: mov %rbx,0x0(%rbp)
+ 13: mov %r13,0x8(%rbp)
+ 17: mov %r14,0x10(%rbp)
+ 1b: mov %r15,0x18(%rbp)
+ 1f: xor %eax,%eax
+ 21: mov %rax,0x20(%rbp)
+ 25: xor %esi,%esi
+ ; int key = 0;
+ 27: mov %esi,-0x4(%rbp)
+ ; if (!arg->sock)
+ 2a: mov 0x8(%rdi),%rdi
+ ; if (!arg->sock)
+ 2e: cmp $0x0,%rdi
+ 32: je 0x0000000000000070
+ 34: mov %rbp,%rsi
+ ; counts = bpf_map_lookup_elem(&btf_map, &key);
+ [...]
+
+5.3 Verifier Log
+================
+
+The following is an example of how line_info can help debug a verification
+failure::
+
+ /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
+ * is modified as below.
+ */
+ data = (void *)(long)xdp->data;
+ data_end = (void *)(long)xdp->data_end;
+ /*
+ if (data + 4 > data_end)
+ return XDP_DROP;
+ */
+ *(u32 *)data = dst->dst;
+
+ $ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
+ ; data = (void *)(long)xdp->data;
+ 224: (79) r2 = *(u64 *)(r10 -112)
+ 225: (61) r2 = *(u32 *)(r2 +0)
+ ; *(u32 *)data = dst->dst;
+ 226: (63) *(u32 *)(r2 +0) = r1
+ invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
+ R2 offset is outside of the packet
+
+6. BTF Generation
+*****************
+
+You need the latest pahole
+
+ https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
+
+or llvm (8.0 or later). pahole acts as a dwarf2btf converter. It doesn't
+support .BTF.ext or the BTF_KIND_FUNC type yet. For example::
+
+ -bash-4.4$ cat t.c
+ struct t {
+ int a:2;
+ int b:3;
+ int c:2;
+ } g;
+ -bash-4.4$ gcc -c -O2 -g t.c
+ -bash-4.4$ pahole -JV t.o
+ File t.o:
+ [1] STRUCT t kind_flag=1 size=4 vlen=3
+ a type_id=2 bitfield_size=2 bits_offset=0
+ b type_id=2 bitfield_size=3 bits_offset=2
+ c type_id=2 bitfield_size=2 bits_offset=5
+ [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
+
+llvm is able to generate .BTF and .BTF.ext directly with -g, for the bpf target
+only. The assembly code (-S) shows the BTF encoding in assembly
+format::
+
+ -bash-4.4$ cat t2.c
+ typedef int __int32;
+ struct t2 {
+ int a2;
+ int (*f2)(char q1, __int32 q2, ...);
+ int (*f3)();
+ } g2;
+ int main() { return 0; }
+ int test() { return 0; }
+ -bash-4.4$ clang -c -g -O2 -target bpf t2.c
+ -bash-4.4$ readelf -S t2.o
+ ......
+ [ 8] .BTF PROGBITS 0000000000000000 00000247
+ 000000000000016e 0000000000000000 0 0 1
+ [ 9] .BTF.ext PROGBITS 0000000000000000 000003b5
+ 0000000000000060 0000000000000000 0 0 1
+ [10] .rel.BTF.ext REL 0000000000000000 000007e0
+ 0000000000000040 0000000000000010 16 9 8
+ ......
+ -bash-4.4$ clang -S -g -O2 -target bpf t2.c
+ -bash-4.4$ cat t2.s
+ ......
+ .section .BTF,"",@progbits
+ .short 60319 # 0xeb9f
+ .byte 1
+ .byte 0
+ .long 24
+ .long 0
+ .long 220
+ .long 220
+ .long 122
+ .long 0 # BTF_KIND_FUNC_PROTO(id = 1)
+ .long 218103808 # 0xd000000
+ .long 2
+ .long 83 # BTF_KIND_INT(id = 2)
+ .long 16777216 # 0x1000000
+ .long 4
+ .long 16777248 # 0x1000020
+ ......
+ .byte 0 # string offset=0
+ .ascii ".text" # string offset=1
+ .byte 0
+ .ascii "/home/yhs/tmp-pahole/t2.c" # string offset=7
+ .byte 0
+ .ascii "int main() { return 0; }" # string offset=33
+ .byte 0
+ .ascii "int test() { return 0; }" # string offset=58
+ .byte 0
+ .ascii "int" # string offset=83
+ ......
+ .section .BTF.ext,"",@progbits
+ .short 60319 # 0xeb9f
+ .byte 1
+ .byte 0
+ .long 24
+ .long 0
+ .long 28
+ .long 28
+ .long 44
+ .long 8 # FuncInfo
+ .long 1 # FuncInfo section string offset=1
+ .long 2
+ .long .Lfunc_begin0
+ .long 3
+ .long .Lfunc_begin1
+ .long 5
+ .long 16 # LineInfo
+ .long 1 # LineInfo section string offset=1
+ .long 2
+ .long .Ltmp0
+ .long 7
+ .long 33
+ .long 7182 # Line 7 Col 14
+ .long .Ltmp3
+ .long 7
+ .long 58
+ .long 8206 # Line 8 Col 14
+
+7. Testing
+**********
+
+The kernel bpf selftest `test_btf.c` provides an extensive set of BTF-related tests.
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 00a8450a602f..4e77932959cc 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -15,6 +15,13 @@ that goes into great technical depth about the BPF Architecture.
The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_.
+BPF Type Format (BTF)
+=====================
+
+.. toctree::
+ :maxdepth: 1
+
+ btf
Frequently asked questions (FAQ)
diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 322851bada16..976e85adffe8 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -54,6 +54,13 @@ must propagate to all other CPUs before the release operation
(A-cumulative property). This is implemented using
:c:func:`smp_store_release`.
+An ACQUIRE memory ordering guarantees that all post loads and
+stores (all po-later instructions) on the same CPU are
+completed after the acquire operation. It also guarantees that all
+po-later stores on the same CPU must propagate to all other CPUs
+after the acquire operation executes. This is implemented using
+:c:func:`smp_acquire__after_ctrl_dep`.
+
A control dependency (on success) for refcounters guarantees that
if a reference for an object was successfully obtained (reference
counter increment or addition happened, function returned true),
@@ -119,13 +126,24 @@ Memory ordering guarantees changes:
result of obtaining pointer to the object!
-case 5) - decrement-based RMW ops that return a value
------------------------------------------------------
+case 5) - generic dec/sub decrement-based RMW ops that return a value
+---------------------------------------------------------------------
Function changes:
* :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
* :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + ACQUIRE ordering on success
+
+
+case 6) other decrement-based RMW ops that return a value
+---------------------------------------------------------
+
+Function changes:
+
* no atomic counterpart --> :c:func:`refcount_dec_if_one`
* ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
@@ -136,7 +154,7 @@ Memory ordering guarantees changes:
.. note:: :c:func:`atomic_add_unless` only provides full order on success.
-case 6) - lock-based RMW
+case 7) - lock-based RMW
------------------------
Function changes:
diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile
index 6e5cef0ed6fb..50daa0b3b032 100644
--- a/Documentation/devicetree/bindings/Makefile
+++ b/Documentation/devicetree/bindings/Makefile
@@ -17,7 +17,11 @@ extra-y += $(DT_TMP_SCHEMA)
quiet_cmd_mk_schema = SCHEMA $@
cmd_mk_schema = $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(filter-out FORCE, $^)
-DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml')
+DT_DOCS = $(shell \
+ cd $(srctree)/$(src) && \
+ find * \( -name '*.yaml' ! -name $(DT_TMP_SCHEMA) \) \
+ )
+
DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS))
extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES))
diff --git a/Documentation/devicetree/bindings/crypto/samsung-slimsss.txt b/Documentation/devicetree/bindings/crypto/samsung-slimsss.txt
new file mode 100644
index 000000000000..7ec9a5a7727a
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/samsung-slimsss.txt
@@ -0,0 +1,19 @@
+Samsung SoC SlimSSS (Slim Security SubSystem) module
+
+The SlimSSS module in Exynos5433 SoC supports the following:
+-- Feeder (FeedCtrl)
+-- Advanced Encryption Standard (AES) with ECB,CBC,CTR,XTS and (CBC/XTS)/CTS
+-- SHA-1/SHA-256 and (SHA-1/SHA-256)/HMAC
+
+Required properties:
+
+- compatible : Should contain entry for slimSSS version:
+ - "samsung,exynos5433-slim-sss" for Exynos5433 SoC.
+- reg : Offset and length of the register set for the module
+- interrupts : interrupt specifiers of SlimSSS module interrupts (one feed
+ control interrupt).
+
+- clocks : list of clock phandle and specifier pairs for all clocks listed in
+ clock-names property.
+- clock-names : list of device clock input names; should contain "pclk" and
+ "aclk" for slim-sss in Exynos5433.
diff --git a/Documentation/devicetree/bindings/hwmon/ad741x.txt b/Documentation/devicetree/bindings/hwmon/ad741x.txt
new file mode 100644
index 000000000000..9102152c8410
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ad741x.txt
@@ -0,0 +1,15 @@
+* AD7416/AD7417/AD7418 Temperature Sensor Device Tree Bindings
+
+Required properties:
+- compatible: one of
+ "adi,ad7416"
+ "adi,ad7417"
+ "adi,ad7418"
+- reg: I2C address
+
+Example:
+
+hwmon@28 {
+ compatible = "adi,ad7418";
+ reg = <0x28>;
+};
diff --git a/Documentation/devicetree/bindings/hwmon/dps650ab.txt b/Documentation/devicetree/bindings/hwmon/dps650ab.txt
new file mode 100644
index 000000000000..76780e795899
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/dps650ab.txt
@@ -0,0 +1,11 @@
+Bindings for Delta Electronics DPS-650-AB power supply
+
+Required properties:
+- compatible : "delta,dps650ab"
+- reg : I2C address, one of 0x58, 0x59.
+
+Example:
+ dps650ab@58 {
+ compatible = "delta,dps650ab";
+ reg = <0x58>;
+ };
diff --git a/Documentation/devicetree/bindings/hwmon/hih6130.txt b/Documentation/devicetree/bindings/hwmon/hih6130.txt
new file mode 100644
index 000000000000..2c43837af4c2
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/hih6130.txt
@@ -0,0 +1,12 @@
+Honeywell Humidicon HIH-6130 humidity/temperature sensor
+--------------------------------------------------------
+
+Required node properties:
+- compatible : "honeywell,hih6130"
+- reg : the I2C address of the device. This is 0x27.
+
+Example:
+ hih6130@27 {
+ compatible = "honeywell,hih6130";
+ reg = <0x27>;
+ };
diff --git a/Documentation/devicetree/bindings/hwmon/ina3221.txt b/Documentation/devicetree/bindings/hwmon/ina3221.txt
index a7b25caa2b8e..fa63b6171407 100644
--- a/Documentation/devicetree/bindings/hwmon/ina3221.txt
+++ b/Documentation/devicetree/bindings/hwmon/ina3221.txt
@@ -6,6 +6,16 @@ Texas Instruments INA3221 Device Tree Bindings
- reg: I2C address
Optional properties:
+ - ti,single-shot: This chip has two power modes: single-shot (chip takes one
+ measurement and then shuts itself down) and continuous
+ (chip takes continuous measurements). The continuous mode is
+ more reliable and suits hardware-monitor type devices,
+ but the single-shot mode is more power-friendly and useful
+ for battery-powered devices that care about power consumption
+ but still need occasional measurements.
+ If this property is present, the single-shot mode is
+ used instead of the default continuous mode.
+
= The node contains optional child nodes for three channels =
= Each child node describes the information of input source =
diff --git a/Documentation/devicetree/bindings/hwmon/lm75.txt b/Documentation/devicetree/bindings/hwmon/lm75.txt
new file mode 100644
index 000000000000..12d8cf7cf592
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/lm75.txt
@@ -0,0 +1,37 @@
+*LM75 hwmon sensor.
+
+Required properties:
+- compatible: manufacturer and chip name, one of
+ "adi,adt75",
+ "dallas,ds1775",
+ "dallas,ds75",
+ "dallas,ds7505",
+ "gmt,g751",
+ "national,lm75",
+ "national,lm75a",
+ "national,lm75b",
+ "maxim,max6625",
+ "maxim,max6626",
+ "maxim,max31725",
+ "maxim,max31726",
+ "maxim,mcp980x",
+ "st,stds75",
+ "st,stlm75",
+ "microchip,tcn75",
+ "ti,tmp100",
+ "ti,tmp101",
+ "ti,tmp105",
+ "ti,tmp112",
+ "ti,tmp175",
+ "ti,tmp275",
+ "ti,tmp75",
+ "ti,tmp75c",
+
+- reg: I2C bus address of the device
+
+Example:
+
+sensor@48 {
+ compatible = "st,stlm75";
+ reg = <0x48>;
+};
diff --git a/Documentation/devicetree/bindings/hwmon/pwm-fan.txt b/Documentation/devicetree/bindings/hwmon/pwm-fan.txt
index c6d533202d3e..49ca5d83ed13 100644
--- a/Documentation/devicetree/bindings/hwmon/pwm-fan.txt
+++ b/Documentation/devicetree/bindings/hwmon/pwm-fan.txt
@@ -6,6 +6,9 @@ Required properties:
- cooling-levels : PWM duty cycle values in a range from 0 to 255
which correspond to thermal cooling states
+Optional properties:
+- fan-supply : phandle to the regulator that provides power to the fan
+
Example:
fan0: pwm-fan {
compatible = "pwm-fan";
diff --git a/Documentation/devicetree/bindings/interrupt-controller/fsl,irqsteer.txt b/Documentation/devicetree/bindings/interrupt-controller/fsl,irqsteer.txt
index 45790ce6f5b9..582991c426ee 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/fsl,irqsteer.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/fsl,irqsteer.txt
@@ -6,8 +6,9 @@ Required properties:
- "fsl,imx8m-irqsteer"
- "fsl,imx-irqsteer"
- reg: Physical base address and size of registers.
-- interrupts: Should contain the parent interrupt line used to multiplex the
- input interrupts.
+- interrupts: Should contain up to 8 parent interrupt lines used to
+ multiplex the input interrupts. They should be specified sequentially
+ from output 0 to 7.
- clocks: Should contain one clock for entry in clock-names
see Documentation/devicetree/bindings/clock/clock-bindings.txt
- clock-names:
@@ -16,8 +17,8 @@ Required properties:
- #interrupt-cells: Specifies the number of cells needed to encode an
interrupt source. The value must be 1.
- fsl,channel: The output channel that all input IRQs should be steered into.
-- fsl,irq-groups: Number of IRQ groups managed by this controller instance.
- Each group manages 64 input interrupts.
+- fsl,num-irqs: Number of input interrupts of this channel.
+ Should be a multiple of 32, up to a maximum of 512 interrupts.
Example:
@@ -28,7 +29,7 @@ Example:
clocks = <&clk IMX8MQ_CLK_DISP_APB_ROOT>;
clock-names = "ipg";
fsl,channel = <0>;
- fsl,irq-groups = <1>;
+ fsl,num-irqs = <64>;
interrupt-controller;
#interrupt-cells = <1>;
};
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,ls1x-intc.txt b/Documentation/devicetree/bindings/interrupt-controller/loongson,ls1x-intc.txt
new file mode 100644
index 000000000000..a63ed9fcb535
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,ls1x-intc.txt
@@ -0,0 +1,24 @@
+Loongson ls1x Interrupt Controller
+
+Required properties:
+
+- compatible : should be "loongson,ls1x-intc".
+
+- reg : Specifies base physical address and size of the registers.
+- interrupt-controller : Identifies the node as an interrupt controller
+- #interrupt-cells : Specifies the number of cells needed to encode an
+ interrupt source. The value shall be 2.
+- interrupts : Specifies the CPU interrupt the controller is connected to.
+
+Example:
+
+intc: interrupt-controller@1fd01040 {
+ compatible = "loongson,ls1x-intc";
+ reg = <0x1fd01040 0x18>;
+
+ interrupt-controller;
+ #interrupt-cells = <2>;
+
+ interrupt-parent = <&cpu_intc>;
+ interrupts = <2>;
+};
diff --git a/Documentation/devicetree/bindings/leds/common.txt b/Documentation/devicetree/bindings/leds/common.txt
index aa1399814a2a..70876ac11367 100644
--- a/Documentation/devicetree/bindings/leds/common.txt
+++ b/Documentation/devicetree/bindings/leds/common.txt
@@ -37,6 +37,18 @@ Optional properties for child nodes:
"ide-disk" - LED indicates IDE disk activity (deprecated),
in new implementations use "disk-activity"
"timer" - LED flashes at a fixed, configurable rate
+ "pattern" - LED alters the brightness for the specified duration with one
+ software timer (requires "led-pattern" property)
+
+- led-pattern : Array of integers with default pattern for certain triggers.
+ Each trigger may parse this property differently:
+ - one-shot : two numbers specifying delay on and delay off (in ms),
+ - timer : two numbers specifying delay on and delay off (in ms),
+ - pattern : the pattern is given by a series of tuples, of
+ brightness and duration (in ms). The exact format is
+ described in:
+ Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt
+
- led-max-microamp : Maximum LED supply current in microamperes. This property
can be made mandatory for the board configurations
diff --git a/Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt b/Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt
new file mode 100644
index 000000000000..d3696680bfc8
--- /dev/null
+++ b/Documentation/devicetree/bindings/leds/leds-trigger-pattern.txt
@@ -0,0 +1,49 @@
+* Pattern format for LED pattern trigger
+
+The pattern is given by a series of tuples, of brightness and duration (ms).
+The LED is expected to traverse the series, applying each brightness value for
+the specified duration. A duration of 0 means the brightness should immediately
+change to the new value, and writing a malformed pattern deactivates any active one.
+
+1. For gradual dimming, the dimming interval is currently set to 50
+milliseconds. A tuple with a duration shorter than the dimming interval (50 ms)
+is therefore treated as a step change of brightness, i.e. the subsequent
+brightness will be applied without adding intervening dimming intervals.
+
+The gradual dimming format of the software pattern values should be:
+"brightness_1 duration_1 brightness_2 duration_2 brightness_3 duration_3 ...".
+For example (using sysfs interface):
+
+echo 0 1000 255 2000 > pattern
+
+It will make the LED go gradually from zero-intensity to max (255) intensity in
+1000 milliseconds, then back to zero intensity in 2000 milliseconds:
+
+LED brightness
+ ^
+255-| / \ / \ /
+ | / \ / \ /
+ | / \ / \ /
+ | / \ / \ /
+ 0-| / \/ \/
+ +---0----1----2----3----4----5----6------------> time (s)
+
+2. To make the LED go instantly from one brightness value to another, use
+zero-length durations (the brightness must be the same as the previous
+tuple's). So the format should be: "brightness_1 duration_1 brightness_1 0
+brightness_2 duration_2 brightness_2 0 ...".
+For example (using sysfs interface):
+
+echo 0 1000 0 0 255 2000 255 0 > pattern
+
+It will make the LED stay off for one second, then stay at max brightness for
+two seconds:
+
+LED brightness
+ ^
+255-| +---------+ +---------+
+ | | | | |
+ | | | | |
+ | | | | |
+ 0-| -----+ +----+ +----
+ +---0----1----2----3----4----5----6------------> time (s)
diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
index a4b056761eaa..d5f68ac78d15 100644
--- a/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
+++ b/Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt
@@ -23,6 +23,20 @@ Required properties:
Optional properties:
- clock-output-names : Should contain name for output clock.
+- rohm,reset-snvs-powered : Transfer BD718x7 to SNVS state at reset.
+
+The BD718x7 supports two different HW states as reset target states. The
+states are called SNVS and READY. In the READY state all the PMIC power
+outputs go down and the OTP is reloaded. In the SNVS state all other logic and
+external devices apart from the SNVS power domain are shut off. Please refer
+to the NXP i.MX8 documentation for further information regarding the SNVS
+state. When a reset is done via the SNVS state the PMIC OTP data is not
+reloaded. This causes power outputs that have been under SW control to stay
+down when the reset has switched the power state to SNVS. If the reset is done
+via the READY state the power outputs are returned to HW control by OTP
+loading. Thus the reset target state is set to READY by default. If the SNVS
+state is used, the boot-critical regulators must have the regulator-always-on
+and regulator-boot-on properties set in the regulator node.
Example:
@@ -43,6 +57,7 @@ Example:
#clock-cells = <0>;
clocks = <&osc 0>;
clock-output-names = "bd71837-32k-out";
+ rohm,reset-snvs-powered;
regulators {
buck1: BUCK1 {
@@ -50,8 +65,10 @@ Example:
regulator-min-microvolt = <700000>;
regulator-max-microvolt = <1300000>;
regulator-boot-on;
+ regulator-always-on;
regulator-ramp-delay = <1250>;
};
+ // [...]
};
};
diff --git a/Documentation/devicetree/bindings/mips/lantiq/rcu-gphy.txt b/Documentation/devicetree/bindings/mips/lantiq/rcu-gphy.txt
deleted file mode 100644
index a0c19bd1ce66..000000000000
--- a/Documentation/devicetree/bindings/mips/lantiq/rcu-gphy.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-Lantiq XWAY SoC GPHY binding
-============================
-
-This binding describes a software-defined ethernet PHY, provided by the RCU
-module on newer Lantiq XWAY SoCs (xRX200 and newer).
-
--------------------------------------------------------------------------------
-Required properties:
-- compatible : Should be one of
- "lantiq,xrx200a1x-gphy"
- "lantiq,xrx200a2x-gphy"
- "lantiq,xrx300-gphy"
- "lantiq,xrx330-gphy"
-- reg : Addrress of the GPHY FW load address register
-- resets : Must reference the RCU GPHY reset bit
-- reset-names : One entry, value must be "gphy" or optional "gphy2"
-- clocks : A reference to the (PMU) GPHY clock gate
-
-Optional properties:
-- lantiq,gphy-mode : GPHY_MODE_GE (default) or GPHY_MODE_FE as defined in
- <dt-bindings/mips/lantiq_xway_gphy.h>
-
-
--------------------------------------------------------------------------------
-Example for the GPHys on the xRX200 SoCs:
-
-#include <dt-bindings/mips/lantiq_rcu_gphy.h>
- gphy0: gphy@20 {
- compatible = "lantiq,xrx200a2x-gphy";
- reg = <0x20 0x4>;
-
- resets = <&reset0 31 30>, <&reset1 7 7>;
- reset-names = "gphy", "gphy2";
- clocks = <&pmu0 XRX200_PMU_GATE_GPHY>;
- lantiq,gphy-mode = <GPHY_MODE_GE>;
- };
diff --git a/Documentation/devicetree/bindings/mips/lantiq/rcu.txt b/Documentation/devicetree/bindings/mips/lantiq/rcu.txt
index 7f0822b4beae..58d51f480c9e 100644
--- a/Documentation/devicetree/bindings/mips/lantiq/rcu.txt
+++ b/Documentation/devicetree/bindings/mips/lantiq/rcu.txt
@@ -26,24 +26,6 @@ Example of the RCU bindings on a xRX200 SoC:
ranges = <0x0 0x203000 0x100>;
big-endian;
- gphy0: gphy@20 {
- compatible = "lantiq,xrx200a2x-gphy";
- reg = <0x20 0x4>;
-
- resets = <&reset0 31 30>, <&reset1 7 7>;
- reset-names = "gphy", "gphy2";
- lantiq,gphy-mode = <GPHY_MODE_GE>;
- };
-
- gphy1: gphy@68 {
- compatible = "lantiq,xrx200a2x-gphy";
- reg = <0x68 0x4>;
-
- resets = <&reset0 29 28>, <&reset1 6 6>;
- reset-names = "gphy", "gphy2";
- lantiq,gphy-mode = <GPHY_MODE_GE>;
- };
-
reset0: reset-controller@10 {
compatible = "lantiq,xrx200-reset";
reg = <0x10 4>, <0x14 4>;
diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
index 9201a7d8d7b0..540c65ed9cba 100644
--- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
+++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
@@ -15,6 +15,7 @@ Required properties:
"fsl,imx6q-usdhc"
"fsl,imx6sl-usdhc"
"fsl,imx6sx-usdhc"
+ "fsl,imx6ull-usdhc"
"fsl,imx7d-usdhc"
"fsl,imx8qxp-usdhc"
diff --git a/Documentation/devicetree/bindings/mmc/mmc.txt b/Documentation/devicetree/bindings/mmc/mmc.txt
index f5a0923b34ca..cdbcfd3a4ff2 100644
--- a/Documentation/devicetree/bindings/mmc/mmc.txt
+++ b/Documentation/devicetree/bindings/mmc/mmc.txt
@@ -62,6 +62,8 @@ Optional properties:
be referred to mmc-pwrseq-simple.txt. But now it's reused as a tunable delay
waiting for I/O signalling and card power supply to be stable, regardless of
whether pwrseq-simple is used. Default to 10ms if no available.
+- supports-cqe : The presence of this property indicates that the corresponding
+ MMC host controller supports the HW command queue feature.
*NOTE* on CD and WP polarity. To use common for all SD/MMC host controllers line
polarity properties, we have to fix the meaning of the "normal" and "inverted"
diff --git a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
index 32b4b4e41923..2cecdc71d94c 100644
--- a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
+++ b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
@@ -39,12 +39,16 @@ sdhci@c8000200 {
bus-width = <8>;
};
-Optional properties for Tegra210 and Tegra186:
+Optional properties for Tegra210, Tegra186 and Tegra194:
- pinctrl-names, pinctrl-0, pinctrl-1 : Specify pad voltage
configurations. Valid pinctrl-names are "sdmmc-3v3" and "sdmmc-1v8"
for controllers supporting multiple voltage levels. The order of names
should correspond to the pin configuration states in pinctrl-0 and
pinctrl-1.
+- pinctrl-names : "sdmmc-3v3-drv" and "sdmmc-1v8-drv" are applicable for
+ Tegra210 where pad config registers are in the pinmux register domain
+ for pull-up-strength and pull-down-strength values configuration when
+ using pads at 3V3 and 1V8 levels.
- nvidia,only-1-8-v : The presence of this property indicates that the
controller operates at a 1.8 V fixed I/O voltage.
- nvidia,pad-autocal-pull-up-offset-3v3,
diff --git a/Documentation/devicetree/bindings/mmc/ti-omap.txt b/Documentation/devicetree/bindings/mmc/ti-omap.txt
index 8de579969763..02fd31cf361d 100644
--- a/Documentation/devicetree/bindings/mmc/ti-omap.txt
+++ b/Documentation/devicetree/bindings/mmc/ti-omap.txt
@@ -24,31 +24,3 @@ Examples:
dmas = <&sdma 61 &sdma 62>;
dma-names = "tx", "rx";
};
-
-* TI MMC host controller for OMAP1 and 2420
-
-The MMC Host Controller on TI OMAP1 and 2420 family provides
-an interface for MMC, SD, and SDIO types of memory cards.
-
-This file documents differences between the core properties described
-by mmc.txt and the properties used by the omap mmc driver.
-
-Note that this driver will not work with omap2430 or later omaps,
-please see the omap hsmmc driver for the current omaps.
-
-Required properties:
-- compatible: Must be "ti,omap2420-mmc", for OMAP2420 controllers
-- ti,hwmods: For 2420, must be "msdi<n>", where n is controller
- instance starting 1
-
-Examples:
-
- msdi1: mmc@4809c000 {
- compatible = "ti,omap2420-mmc";
- ti,hwmods = "msdi1";
- reg = <0x4809c000 0x80>;
- interrupts = <83>;
- dmas = <&sdma 61 &sdma 62>;
- dma-names = "tx", "rx";
- };
-
diff --git a/Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt b/Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt
new file mode 100644
index 000000000000..3983c11e062c
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt
@@ -0,0 +1,60 @@
+Amlogic NAND Flash Controller (NFC) for GXBB/GXL/AXG family SoCs
+
+This file documents the properties in addition to those available in
+the MTD NAND bindings.
+
+Required properties:
+- compatible : contains one of:
+ - "amlogic,meson-gxl-nfc"
+ - "amlogic,meson-axg-nfc"
+- clocks :
+ A list of phandle + clock-specifier pairs for the clocks listed
+ in clock-names.
+
+- clock-names: Should contain the following:
+ "core" - NFC module gate clock
+ "device" - device clock from eMMC sub clock controller
+ "rx" - rx clock phase
+ "tx" - tx clock phase
+
+- amlogic,mmc-syscon : Required for NAND clocks, it's shared with SD/eMMC
+ controller port C
+
+Optional child nodes:
+Child nodes represent the available NAND chips.
+
+Other properties:
+see Documentation/devicetree/bindings/mtd/nand.txt for generic bindings.
+
+Example for the AXG SoC:
+
+ sd_emmc_c_clkc: mmc@7000 {
+ compatible = "amlogic,meson-axg-mmc-clkc", "syscon";
+ reg = <0x0 0x7000 0x0 0x800>;
+ };
+
+ nand-controller@7800 {
+ compatible = "amlogic,meson-axg-nfc";
+ reg = <0x0 0x7800 0x0 0x100>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ interrupts = <GIC_SPI 34 IRQ_TYPE_EDGE_RISING>;
+
+ clocks = <&clkc CLKID_SD_EMMC_C>,
+ <&sd_emmc_c_clkc CLKID_MMC_DIV>,
+ <&sd_emmc_c_clkc CLKID_MMC_PHASE_RX>,
+ <&sd_emmc_c_clkc CLKID_MMC_PHASE_TX>;
+ clock-names = "core", "device", "rx", "tx";
+ amlogic,mmc-syscon = <&sd_emmc_c_clkc>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&nand_pins>;
+
+ nand@0 {
+ reg = <0>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+
+ nand-on-flash-bbt;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt b/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
index bb2075df9b38..4345c3a6f530 100644
--- a/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
@@ -4,6 +4,7 @@ Required properties:
- compatible : should be one of the following:
Generic default - "cdns,qspi-nor".
For TI 66AK2G SoC - "ti,k2g-qspi", "cdns,qspi-nor".
+ For TI AM654 SoC - "ti,am654-ospi", "cdns,qspi-nor".
- reg : Contains two entries, each of which is a tuple consisting of a
physical address and length. The first entry is the address and
length of the controller register set. The second entry is the
diff --git a/Documentation/devicetree/bindings/mtd/mtk-quadspi.txt b/Documentation/devicetree/bindings/mtd/mtk-quadspi.txt
index 56d3668e2c50..a12e3b5c495d 100644
--- a/Documentation/devicetree/bindings/mtd/mtk-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/mtk-quadspi.txt
@@ -1,4 +1,4 @@
-* Serial NOR flash controller for MTK MT81xx (and similar)
+* Serial NOR flash controller for MediaTek SoCs
Required properties:
- compatible: For mt8173, compatible should be "mediatek,mt8173-nor",
@@ -10,6 +10,7 @@ Required properties:
"mediatek,mt2712-nor", "mediatek,mt8173-nor"
"mediatek,mt7622-nor", "mediatek,mt8173-nor"
"mediatek,mt7623-nor", "mediatek,mt8173-nor"
+ "mediatek,mt7629-nor", "mediatek,mt8173-nor"
"mediatek,mt8173-nor"
- reg: physical base address and length of the controller's register
- clocks: the phandle of the clocks needed by the nor controller
diff --git a/Documentation/devicetree/bindings/mtd/stm32-fmc2-nand.txt b/Documentation/devicetree/bindings/mtd/stm32-fmc2-nand.txt
new file mode 100644
index 000000000000..ad2bef826582
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/stm32-fmc2-nand.txt
@@ -0,0 +1,61 @@
+STMicroelectronics Flexible Memory Controller 2 (FMC2)
+NAND Interface
+
+Required properties:
+- compatible: Should be one of:
+ * st,stm32mp15-fmc2
+- reg: NAND flash controller memory areas.
+ First region contains the register location.
+ Regions 2 to 4 respectively contain the data, command,
+ and address space for CS0.
+ Regions 5 to 7 contain the same areas for CS1.
+- interrupts: The interrupt number
+- pinctrl-0: Standard Pinctrl phandle (see: pinctrl/pinctrl-bindings.txt)
+- clocks: The clock needed by the NAND flash controller
+
+Optional properties:
+- resets: Reference to a reset controller asserting the FMC controller
+- dmas: DMA specifiers (see: dma/stm32-mdma.txt)
+- dma-names: Must be "tx", "rx" and "ecc"
+
+* NAND device bindings:
+
+Required properties:
+- reg: describes the CS lines assigned to the NAND device.
+
+Optional properties:
+- nand-on-flash-bbt: see nand.txt
+- nand-ecc-strength: see nand.txt
+- nand-ecc-step-size: see nand.txt
+
+The following ECC strength and step size are currently supported:
+ - nand-ecc-strength = <1>, nand-ecc-step-size = <512> (Hamming)
+ - nand-ecc-strength = <4>, nand-ecc-step-size = <512> (BCH4)
+ - nand-ecc-strength = <8>, nand-ecc-step-size = <512> (BCH8) (default)
+
+Example:
+
+ fmc: nand-controller@58002000 {
+ compatible = "st,stm32mp15-fmc2";
+ reg = <0x58002000 0x1000>,
+ <0x80000000 0x1000>,
+ <0x88010000 0x1000>,
+ <0x88020000 0x1000>,
+ <0x81000000 0x1000>,
+ <0x89010000 0x1000>,
+ <0x89020000 0x1000>;
+ interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&rcc FMC_K>;
+ resets = <&rcc FMC_R>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&fmc_pins_a>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ nand@0 {
+ reg = <0>;
+ nand-on-flash-bbt;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
index 37d67926dd6d..b1ad6ee68e90 100644
--- a/Documentation/devicetree/bindings/net/btusb.txt
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -9,6 +9,9 @@ Required properties:
(more may be added later) are:
"usb1286,204e" (Marvell 8997)
+ "usbcf3,e300" (Qualcomm QCA6174A)
+ "usb4ca,301a" (Qualcomm QCA6174A (Lite-On))
+
Also, vendors that use btusb may have device additional properties, e.g:
Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt b/Documentation/devicetree/bindings/net/dsa/ksz.txt
index 0f407fb371ce..e7db7268fd0f 100644
--- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
+++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
@@ -7,6 +7,11 @@ Required properties:
of the following:
- "microchip,ksz9477"
- "microchip,ksz9897"
+ - "microchip,ksz9896"
+ - "microchip,ksz9567"
+ - "microchip,ksz8565"
+ - "microchip,ksz9893"
+ - "microchip,ksz9563"
Optional properties:
@@ -19,58 +24,96 @@ Examples:
Ethernet switch connected via SPI to the host, CPU port wired to eth0:
- eth0: ethernet@10001000 {
- fixed-link {
- speed = <1000>;
- full-duplex;
- };
- };
+ eth0: ethernet@10001000 {
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
- spi1: spi@f8008000 {
- pinctrl-0 = <&pinctrl_spi_ksz>;
- cs-gpios = <&pioC 25 0>;
- id = <1>;
+ spi1: spi@f8008000 {
+ pinctrl-0 = <&pinctrl_spi_ksz>;
+ cs-gpios = <&pioC 25 0>;
+ id = <1>;
- ksz9477: ksz9477@0 {
- compatible = "microchip,ksz9477";
- reg = <0>;
+ ksz9477: ksz9477@0 {
+ compatible = "microchip,ksz9477";
+ reg = <0>;
- spi-max-frequency = <44000000>;
- spi-cpha;
- spi-cpol;
+ spi-max-frequency = <44000000>;
+ spi-cpha;
+ spi-cpol;
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
- port@0 {
- reg = <0>;
- label = "lan1";
- };
- port@1 {
- reg = <1>;
- label = "lan2";
- };
- port@2 {
- reg = <2>;
- label = "lan3";
- };
- port@3 {
- reg = <3>;
- label = "lan4";
- };
- port@4 {
- reg = <4>;
- label = "lan5";
- };
- port@5 {
- reg = <5>;
- label = "cpu";
- ethernet = <&eth0>;
- fixed-link {
- speed = <1000>;
- full-duplex;
- };
- };
- };
- };
- };
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+ port@2 {
+ reg = <2>;
+ label = "lan3";
+ };
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+ port@4 {
+ reg = <4>;
+ label = "lan5";
+ };
+ port@5 {
+ reg = <5>;
+ label = "cpu";
+ ethernet = <&eth0>;
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
+ };
+ };
+ ksz8565: ksz8565@0 {
+ compatible = "microchip,ksz8565";
+ reg = <0>;
+
+ spi-max-frequency = <44000000>;
+ spi-cpha;
+ spi-cpol;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ port@0 {
+ reg = <0>;
+ label = "lan1";
+ };
+ port@1 {
+ reg = <1>;
+ label = "lan2";
+ };
+ port@2 {
+ reg = <2>;
+ label = "lan3";
+ };
+ port@3 {
+ reg = <3>;
+ label = "lan4";
+ };
+ port@6 {
+ reg = <6>;
+ label = "cpu";
+ ethernet = <&eth0>;
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
index aa3527f71fdc..47aa205ee0bd 100644
--- a/Documentation/devicetree/bindings/net/dsa/mt7530.txt
+++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
@@ -3,12 +3,16 @@ Mediatek MT7530 Ethernet switch
Required properties:
-- compatible: Must be compatible = "mediatek,mt7530";
+- compatible: may be compatible = "mediatek,mt7530"
+ or compatible = "mediatek,mt7621"
- #address-cells: Must be 1.
- #size-cells: Must be 0.
- mediatek,mcm: Boolean; if defined, indicates that either MT7530 is the part
on multi-chip module belong to MT7623A has or the remotely standalone
chip as the function MT7623N reference board provided for.
+
+If compatible mediatek,mt7530 is set then the following properties are required:
+
- core-supply: Phandle to the regulator node necessary for the core power.
- io-supply: Phandle to the regulator node necessary for the I/O power.
See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt
diff --git a/Documentation/devicetree/bindings/net/fsl-enetc.txt b/Documentation/devicetree/bindings/net/fsl-enetc.txt
new file mode 100644
index 000000000000..c812e25ae90f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/fsl-enetc.txt
@@ -0,0 +1,69 @@
+* ENETC ethernet device tree bindings
+
+Depending on board design and ENETC port type (internal or
+external) there are two supported link modes specified by
+the device tree bindings below.
+
+Required properties:
+
+- reg : Specifies PCIe Device Number and Function
+ Number of the ENETC endpoint device, according
+ to parent node bindings.
+- compatible : Should be "fsl,enetc".
+
+1) The ENETC external port is connected to an MDIO configurable PHY:
+
+In this case, the ENETC node should include a "mdio" sub-node
+that in turn should contain the "ethernet-phy" node describing the
+external phy. The properties below are required; their bindings are
+already defined in ethernet.txt or phy.txt, under
+Documentation/devicetree/bindings/net/*.
+
+Required:
+
+- phy-handle : Phandle to a PHY on the MDIO bus.
+ Defined in ethernet.txt.
+
+- phy-connection-type : Defined in ethernet.txt.
+
+- mdio : "mdio" node, defined in mdio.txt.
+
+- ethernet-phy : "ethernet-phy" node, defined in phy.txt.
+
+Example:
+
+ ethernet@0,0 {
+ compatible = "fsl,enetc";
+ reg = <0x000000 0 0 0 0>;
+ phy-handle = <&sgmii_phy0>;
+ phy-connection-type = "sgmii";
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ sgmii_phy0: ethernet-phy@2 {
+ reg = <0x2>;
+ };
+ };
+ };
+
+2) The ENETC port is an internal port or has a fixed-link external
+connection:
+
+In this case, the ENETC port node defines a fixed link connection,
+as specified by "fixed-link.txt", under
+Documentation/devicetree/bindings/net/*.
+
+Required:
+
+- fixed-link : "fixed-link" node, defined in "fixed-link.txt".
+
+Example:
+ ethernet@0,2 {
+ compatible = "fsl,enetc";
+ reg = <0x000200 0 0 0 0>;
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 3e17ac1d5d58..174f292d8a3e 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -3,8 +3,8 @@
Required properties:
- compatible: Should be "cdns,[<chip>-]{macb|gem}"
Use "cdns,at91rm9200-emac" Atmel at91rm9200 SoC.
- Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs or the 10/100Mbit IP
- available on sama5d3 SoCs.
+ Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs.
+ Use "cdns,sam9x60-macb" for Microchip sam9x60 SoC.
Use "cdns,np4-macb" for NP4 SoC devices.
Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index bedcfd5a52cd..691f886cfc4a 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -19,7 +19,7 @@ Optional properties:
"marvell,armada-370-neta" and 9800B for others.
- clock-names: List of names corresponding to clocks property; shall be
"core" for core clock and "bus" for the optional bus clock.
-
+- phys: comphy for the ethernet port, see ../phy/phy-bindings.txt
Optional properties (valid only for Armada XP/38x):
diff --git a/Documentation/devicetree/bindings/net/mdio-mux-multiplexer.txt b/Documentation/devicetree/bindings/net/mdio-mux-multiplexer.txt
new file mode 100644
index 000000000000..534e38058fe0
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux-multiplexer.txt
@@ -0,0 +1,82 @@
+Properties for an MDIO bus multiplexer consumer device
+
+This is a special case of MDIO mux when MDIO mux is defined as a consumer
+of a mux producer device. The mux producer can be of any type like mmio mux
+producer, gpio mux producer or generic register based mux producer.
+
+Required properties in addition to the MDIO Bus multiplexer properties:
+
+- compatible : should be "mdio-mux-multiplexer"
+- mux-controls : mux controller node to use for operating the mux
+- mdio-parent-bus : phandle to the parent MDIO bus.
+
+Each child node of the MDIO bus multiplexer consumer device represents an MDIO
+bus.
+
+For more information, please refer to
+Documentation/devicetree/bindings/mux/mux-controller.txt
+and Documentation/devicetree/bindings/net/mdio-mux.txt
+
+Example:
+In the example below, the mux producer and consumer are separate nodes.
+
+&i2c0 {
+ fpga@66 { // fpga connected to i2c
+ compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c",
+ "simple-mfd";
+ reg = <0x66>;
+
+ mux: mux-controller { // Mux Producer
+ compatible = "reg-mux";
+ #mux-control-cells = <1>;
+ mux-reg-masks = <0x54 0xf8>, /* 0: reg 0x54, bits 7:3 */
+ <0x54 0x07>; /* 1: reg 0x54, bits 2:0 */
+ };
+ };
+};
+
+mdio-mux-1 { // Mux consumer
+ compatible = "mdio-mux-multiplexer";
+ mux-controls = <&mux 0>;
+ mdio-parent-bus = <&emdio1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio@0 {
+ reg = <0x0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ mdio@8 {
+ reg = <0x8>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ ..
+ ..
+};
+
+mdio-mux-2 { // Mux consumer
+ compatible = "mdio-mux-multiplexer";
+ mux-controls = <&mux 1>;
+ mdio-parent-bus = <&emdio2>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio@0 {
+ reg = <0x0>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ mdio@1 {
+ reg = <0x1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ };
+
+ ..
+ ..
+};
diff --git a/Documentation/devicetree/bindings/net/mediatek-bluetooth.txt b/Documentation/devicetree/bindings/net/mediatek-bluetooth.txt
index 14ceb2a5b4e8..41a7dcc80f5b 100644
--- a/Documentation/devicetree/bindings/net/mediatek-bluetooth.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-bluetooth.txt
@@ -33,3 +33,67 @@ Example:
clock-names = "ref";
};
};
+
+MediaTek UART based Bluetooth Devices
+=====================================
+
+This device is a serial device attached to a UART and thus must be a
+child node of the UART serial node.
+
+Please refer to the following documents for generic properties:
+
+ Documentation/devicetree/bindings/serial/slave-device.txt
+
+Required properties:
+
+- compatible: Must be
+ "mediatek,mt7663u-bluetooth": for MT7663U device
+ "mediatek,mt7668u-bluetooth": for MT7668U device
+- vcc-supply: Main voltage regulator
+- pinctrl-names: Should be "default", "runtime"
+- pinctrl-0: Should contain UART RXD low when the device is powered up to
+ enter proper bootstrap mode.
+- pinctrl-1: Should contain UART mode pin ctrl
+
+Optional properties:
+
+- reset-gpios: GPIO used to reset the device; its initial state is kept low.
+ If the GPIO is missing, the board-level design must guarantee
+ the reset state instead.
+- current-speed: Current baud rate of the device; defaults to 921600
+
+Example:
+
+ uart1_pins_boot: uart1-default {
+ pins-dat {
+ pinmux = <MT7623_PIN_81_URXD1_FUNC_GPIO81>;
+ output-low;
+ };
+ };
+
+ uart1_pins_runtime: uart1-runtime {
+ pins-dat {
+ pinmux = <MT7623_PIN_81_URXD1_FUNC_URXD1>,
+ <MT7623_PIN_82_UTXD1_FUNC_UTXD1>;
+ };
+ };
+
+ uart1: serial@11003000 {
+ compatible = "mediatek,mt7623-uart",
+ "mediatek,mt6577-uart";
+ reg = <0 0x11003000 0 0x400>;
+ interrupts = <GIC_SPI 52 IRQ_TYPE_LEVEL_LOW>;
+ clocks = <&pericfg CLK_PERI_UART1_SEL>,
+ <&pericfg CLK_PERI_UART1>;
+ clock-names = "baud", "bus";
+
+ bluetooth {
+ compatible = "mediatek,mt7663u-bluetooth";
+ vcc-supply = <&reg_5v>;
+ reset-gpios = <&pio 24 GPIO_ACTIVE_LOW>;
+ pinctrl-names = "default", "runtime";
+ pinctrl-0 = <&uart1_pins_boot>;
+ pinctrl-1 = <&uart1_pins_runtime>;
+ current-speed = <921600>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/nixge.txt b/Documentation/devicetree/bindings/net/nixge.txt
index e55af7f0881a..85d7240a9b20 100644
--- a/Documentation/devicetree/bindings/net/nixge.txt
+++ b/Documentation/devicetree/bindings/net/nixge.txt
@@ -1,17 +1,53 @@
* NI XGE Ethernet controller
Required properties:
-- compatible: Should be "ni,xge-enet-2.00"
-- reg: Address and length of the register set for the device
+- compatible: Should be "ni,xge-enet-3.00", but can be "ni,xge-enet-2.00" for
+ older device trees with DMA engines co-located in the address map,
+ with the one reg entry to describe the whole device.
+- reg: Address and length of the register set for the device. It contains the
+ information of registers in the same order as described by reg-names.
+- reg-names: Should contain the reg names
+ "dma": DMA engine control and status region
+ "ctrl": MDIO and PHY control and status region
- interrupts: Should contain tx and rx interrupt
- interrupt-names: Should be "rx" and "tx"
- phy-mode: See ethernet.txt file in the same directory.
-- phy-handle: See ethernet.txt file in the same directory.
- nvmem-cells: Phandle of nvmem cell containing the MAC address
- nvmem-cell-names: Should be "address"
+Optional properties:
+- mdio subnode to indicate presence of MDIO controller
+- fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
+ Use instead of phy-handle.
+- phy-handle: See ethernet.txt file in the same directory.
+
Examples (10G generic PHY):
nixge0: ethernet@40000000 {
+ compatible = "ni,xge-enet-3.00";
+ reg = <0x40000000 0x4000
+ 0x41002000 0x2000>;
+ reg-names = "dma", "ctrl";
+
+ nvmem-cells = <&eth1_addr>;
+ nvmem-cell-names = "address";
+
+ interrupts = <0 29 IRQ_TYPE_LEVEL_HIGH>, <0 30 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "rx", "tx";
+ interrupt-parent = <&intc>;
+
+ phy-mode = "xgmii";
+ phy-handle = <&ethernet_phy1>;
+
+ mdio {
+ ethernet_phy1: ethernet-phy@4 {
+ compatible = "ethernet-phy-ieee802.3-c45";
+ reg = <4>;
+ };
+ };
+ };
+
+Examples (10G generic PHY, no MDIO):
+ nixge0: ethernet@40000000 {
compatible = "ni,xge-enet-2.00";
reg = <0x40000000 0x6000>;
@@ -24,9 +60,33 @@ Examples (10G generic PHY):
phy-mode = "xgmii";
phy-handle = <&ethernet_phy1>;
+ };
+
+Examples (1G generic fixed-link + MDIO):
+ nixge0: ethernet@40000000 {
+ compatible = "ni,xge-enet-2.00";
+ reg = <0x40000000 0x6000>;
- ethernet_phy1: ethernet-phy@4 {
- compatible = "ethernet-phy-ieee802.3-c45";
- reg = <4>;
+ nvmem-cells = <&eth1_addr>;
+ nvmem-cell-names = "address";
+
+ interrupts = <0 29 IRQ_TYPE_LEVEL_HIGH>, <0 30 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "rx", "tx";
+ interrupt-parent = <&intc>;
+
+ phy-mode = "xgmii";
+
+ fixed-link {
+ speed = <1000>;
+ pause;
+ link-gpios = <&gpio0 63 GPIO_ACTIVE_HIGH>;
+ };
+
+ mdio {
+ ethernet_phy1: ethernet-phy@4 {
+ compatible = "ethernet-phy-ieee802.3-c22";
+ reg = <4>;
+ };
};
+
};
diff --git a/Documentation/devicetree/bindings/net/qcom,ethqos.txt b/Documentation/devicetree/bindings/net/qcom,ethqos.txt
new file mode 100644
index 000000000000..fcf5035810b5
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom,ethqos.txt
@@ -0,0 +1,64 @@
+Qualcomm Ethernet ETHQOS device
+
+This documents the dwmac based ethernet device which supports Gigabit
+ethernet for version v2.3.0 onwards.
+
+This device has the following properties:
+
+Required properties:
+
+- compatible: Should be "qcom,qcs404-ethqos"
+
+- reg: Address and length of the register set for the device
+
+- reg-names: Should contain register names "stmmaceth", "rgmii"
+
+- clocks: Should contain phandle to clocks
+
+- clock-names: Should contain clock names "stmmaceth", "pclk",
+ "ptp_ref", "rgmii"
+
+- interrupts: Should contain phandle to interrupts
+
+- interrupt-names: Should contain interrupt names "macirq", "eth_lpi"
+
+The rest of the properties are defined in the stmmac.txt file in the same directory
+
+
+Example:
+
+ethernet: ethernet@7a80000 {
+ compatible = "qcom,qcs404-ethqos";
+ reg = <0x07a80000 0x10000>,
+ <0x07a96000 0x100>;
+ reg-names = "stmmaceth", "rgmii";
+ clock-names = "stmmaceth", "pclk", "ptp_ref", "rgmii";
+ clocks = <&gcc GCC_ETH_AXI_CLK>,
+ <&gcc GCC_ETH_SLAVE_AHB_CLK>,
+ <&gcc GCC_ETH_PTP_CLK>,
+ <&gcc GCC_ETH_RGMII_CLK>;
+ interrupts = <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "macirq", "eth_lpi";
+ snps,reset-gpio = <&tlmm 60 GPIO_ACTIVE_LOW>;
+ snps,reset-active-low;
+
+ snps,txpbl = <8>;
+ snps,rxpbl = <2>;
+ snps,aal;
+ snps,tso;
+
+ phy-handle = <&phy1>;
+ phy-mode = "rgmii";
+
+ mdio {
+ #address-cells = <0x1>;
+ #size-cells = <0x0>;
+ compatible = "snps,dwmac-mdio";
+ phy1: phy@4 {
+ device_type = "ethernet-phy";
+ reg = <0x4>;
+ };
+ };
+
+};
diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
index 0c17a0ec9b7b..7b9a776230c0 100644
--- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
+++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt
@@ -4,6 +4,13 @@ This node provides properties for configuring the MediaTek mt76xx wireless
device. The node is expected to be specified as a child node of the PCI
controller to which the wireless chip is connected.
+Alternatively, it can specify the wireless part of the MT7628/MT7688 SoC.
+For SoC, use the compatible string "mediatek,mt7628-wmac" and the following
+properties:
+
+- reg: Address and length of the register set for the device.
+- interrupts: Main device interrupt
+
Optional properties:
- mac-address: See ethernet.txt in the parent directory
@@ -30,3 +37,15 @@ Optional nodes:
};
};
};
+
+MT7628 example:
+
+wmac: wmac@10300000 {
+ compatible = "mediatek,mt7628-wmac";
+ reg = <0x10300000 0x100000>;
+
+ interrupt-parent = <&cpuintc>;
+ interrupts = <6>;
+
+ mediatek,mtd-eeprom = <&factory 0x0000>;
+};
diff --git a/Documentation/devicetree/bindings/phy/phy-armada38x-comphy.txt b/Documentation/devicetree/bindings/phy/phy-armada38x-comphy.txt
new file mode 100644
index 000000000000..ad49e5c01334
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-armada38x-comphy.txt
@@ -0,0 +1,40 @@
+mvebu armada 38x comphy driver
+------------------------------
+
+This comphy controller can be found on Marvell Armada 38x. It provides a
+number of shared PHYs used by various interfaces (network, sata, usb,
+PCIe...).
+
+Required properties:
+
+- compatible: should be "marvell,armada-380-comphy"
+- reg: should contain the comphy register location and length.
+- #address-cells: should be 1.
+- #size-cells: should be 0.
+
+A sub-node is required for each comphy lane provided by the comphy.
+
+Required properties (child nodes):
+
+- reg: comphy lane number.
+- #phy-cells : from the generic phy bindings, must be 1. Defines the
+ input port to use for a given comphy lane.
+
+Example:
+
+ comphy: phy@18300 {
+ compatible = "marvell,armada-380-comphy";
+ reg = <0x18300 0x100>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpm_comphy0: phy@0 {
+ reg = <0>;
+ #phy-cells = <1>;
+ };
+
+ cpm_comphy1: phy@1 {
+ reg = <1>;
+ #phy-cells = <1>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
index c5d0e7998e2b..454c937076a2 100644
--- a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
+++ b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
@@ -17,6 +17,11 @@ Clock Properties:
- fsl,tmr-fiper1 Fixed interval period pulse generator.
- fsl,tmr-fiper2 Fixed interval period pulse generator.
- fsl,max-adj Maximum frequency adjustment in parts per billion.
+ - fsl,extts-fifo The presence of this property indicates hardware
+ support for the external trigger stamp FIFO.
+ - little-endian The presence of this property indicates the 1588 timer
+ IP block is in little-endian mode. The default endian mode
+ is big-endian.
These properties set the operational parameters for the PTP
clock. You must choose these carefully for the clock to work right.
diff --git a/Documentation/devicetree/bindings/regulator/fan53555.txt b/Documentation/devicetree/bindings/regulator/fan53555.txt
index 54a3f2c80e3a..e7fc045281d1 100644
--- a/Documentation/devicetree/bindings/regulator/fan53555.txt
+++ b/Documentation/devicetree/bindings/regulator/fan53555.txt
@@ -1,7 +1,8 @@
Binding for Fairchild FAN53555 regulators
Required properties:
- - compatible: one of "fcs,fan53555", "silergy,syr827", "silergy,syr828"
+ - compatible: one of "fcs,fan53555", "fcs,fan53526", "silergy,syr827" or
+ "silergy,syr828"
- reg: I2C address
Optional properties:
diff --git a/Documentation/devicetree/bindings/regulator/fixed-regulator.txt b/Documentation/devicetree/bindings/regulator/fixed-regulator.txt
deleted file mode 100644
index 0c2a6c8a1536..000000000000
--- a/Documentation/devicetree/bindings/regulator/fixed-regulator.txt
+++ /dev/null
@@ -1,35 +0,0 @@
-Fixed Voltage regulators
-
-Required properties:
-- compatible: Must be "regulator-fixed";
-- regulator-name: Defined in regulator.txt as optional, but required here.
-
-Optional properties:
-- gpio: gpio to use for enable control
-- startup-delay-us: startup time in microseconds
-- enable-active-high: Polarity of GPIO is Active high
-If this property is missing, the default assumed is Active low.
-- gpio-open-drain: GPIO is open drain type.
- If this property is missing then default assumption is false.
--vin-supply: Input supply name.
-
-Any property defined as part of the core regulator
-binding, defined in regulator.txt, can also be used.
-However a fixed voltage regulator is expected to have the
-regulator-min-microvolt and regulator-max-microvolt
-to be the same.
-
-Example:
-
- abc: fixedregulator@0 {
- compatible = "regulator-fixed";
- regulator-name = "fixed-supply";
- regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
- gpio = <&gpio1 16 0>;
- startup-delay-us = <70000>;
- enable-active-high;
- regulator-boot-on;
- gpio-open-drain;
- vin-supply = <&parent_reg>;
- };
diff --git a/Documentation/devicetree/bindings/regulator/fixed-regulator.yaml b/Documentation/devicetree/bindings/regulator/fixed-regulator.yaml
new file mode 100644
index 000000000000..d289c2f7455a
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/fixed-regulator.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/regulator/fixed-regulator.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Fixed Voltage regulators
+
+maintainers:
+ - Liam Girdwood <lgirdwood@gmail.com>
+ - Mark Brown <broonie@kernel.org>
+
+description:
+ Any property defined as part of the core regulator binding, defined in
+ regulator.txt, can also be used. However a fixed voltage regulator is
+ expected to have regulator-min-microvolt and regulator-max-microvolt
+ set to the same value.
+
+properties:
+ compatible:
+ const: regulator-fixed
+
+ regulator-name: true
+
+ gpio:
+ description: gpio to use for enable control
+ maxItems: 1
+
+ startup-delay-us:
+ description: startup time in microseconds
+ $ref: /schemas/types.yaml#/definitions/uint32
+
+ enable-active-high:
+ description:
+ Polarity of GPIO is Active high. If this property is missing,
+ the default assumed is Active low.
+ type: boolean
+
+ gpio-open-drain:
+ description:
+ GPIO is open drain type. If this property is missing then default
+ assumption is false.
+ type: boolean
+
+ vin-supply:
+ description: Input supply phandle.
+ $ref: /schemas/types.yaml#/definitions/phandle
+
+required:
+ - compatible
+ - regulator-name
+
+examples:
+ - |
+ reg_1v8: regulator-1v8 {
+ compatible = "regulator-fixed";
+ regulator-name = "1v8";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ gpio = <&gpio1 16 0>;
+ startup-delay-us = <70000>;
+ enable-active-high;
+ regulator-boot-on;
+ gpio-open-drain;
+ vin-supply = <&parent_reg>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/regulator/max77650-regulator.txt b/Documentation/devicetree/bindings/regulator/max77650-regulator.txt
new file mode 100644
index 000000000000..f1cbe813c30f
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/max77650-regulator.txt
@@ -0,0 +1,41 @@
+Regulator driver for MAX77650 PMIC from Maxim Integrated.
+
+This module is part of the MAX77650 MFD device. For more details
+see Documentation/devicetree/bindings/mfd/max77650.txt.
+
+The regulator controller is represented as a sub-node of the PMIC node
+on the device tree.
+
+The device has a single LDO regulator and a SIMO buck-boost regulator with
+three independent power rails.
+
+Required properties:
+--------------------
+- compatible: Must be "maxim,max77650-regulator"
+
+Each rail must be instantiated under the regulators subnode of the top PMIC
+node. Up to four regulators can be defined. For standard regulator properties
+refer to Documentation/devicetree/bindings/regulator/regulator.txt.
+
+Available regulator compatible strings are: "ldo", "sbb0", "sbb1", "sbb2".
+
+Example:
+--------
+
+ regulators {
+ compatible = "maxim,max77650-regulator";
+
+ max77650_ldo: regulator@0 {
+ regulator-compatible = "ldo";
+ regulator-name = "max77650-ldo";
+ regulator-min-microvolt = <1350000>;
+ regulator-max-microvolt = <2937500>;
+ };
+
+ max77650_sbb0: regulator@1 {
+ regulator-compatible = "sbb0";
+ regulator-name = "max77650-sbb0";
+ regulator-min-microvolt = <800000>;
+ regulator-max-microvolt = <1587500>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt b/Documentation/devicetree/bindings/regulator/pfuze100.txt
index f9be1acf891c..4d3b12b92cb3 100644
--- a/Documentation/devicetree/bindings/regulator/pfuze100.txt
+++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt
@@ -8,7 +8,7 @@ Optional properties:
- fsl,pfuze-support-disable-sw: Boolean, if present disable all unused switch
regulators to save power consumption. Attention, ensure that all important
regulators (e.g. DDR ref, DDR supply) has set the "regulator-always-on"
- property. If not present, the switched regualtors are always on and can't be
+ property. If not present, the switched regulators are always on and can't be
disabled. This binding is a workaround to keep backward compatibility with
old dtb's which rely on the fact that the switched regulators are always on
and don't mark them explicit as "regulator-always-on".
diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd70528-regulator.txt b/Documentation/devicetree/bindings/regulator/rohm,bd70528-regulator.txt
new file mode 100644
index 000000000000..698cfc3bc3dd
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd70528-regulator.txt
@@ -0,0 +1,68 @@
+ROHM BD70528 Power Management Integrated Circuit regulator bindings
+
+Required properties:
+ - regulator-name: should be "buck1", "buck2", "buck3", "ldo1", "ldo2", "ldo3",
+ "led_ldo1", "led_ldo2"
+
+List of regulators provided by this controller. The BD70528 regulators node
+should be a sub-node of the BD70528 MFD node. See the BD70528 MFD bindings at
+Documentation/devicetree/bindings/mfd/rohm,bd70528-pmic.txt
+
+The valid names for BD70528 regulator nodes are:
+BUCK1, BUCK2, BUCK3, LDO1, LDO2, LDO3, LED_LDO1, LED_LDO2
+
+Optional properties:
+- Any optional property defined in bindings/regulator/regulator.txt
+
+Example:
+regulators {
+ buck1: BUCK1 {
+ regulator-name = "buck1";
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <3400000>;
+ regulator-boot-on;
+ regulator-ramp-delay = <125>;
+ };
+ buck2: BUCK2 {
+ regulator-name = "buck2";
+ regulator-min-microvolt = <1200000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-boot-on;
+ regulator-ramp-delay = <125>;
+ };
+ buck3: BUCK3 {
+ regulator-name = "buck3";
+ regulator-min-microvolt = <800000>;
+ regulator-max-microvolt = <1800000>;
+ regulator-boot-on;
+ regulator-ramp-delay = <250>;
+ };
+ ldo1: LDO1 {
+ regulator-name = "ldo1";
+ regulator-min-microvolt = <1650000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-boot-on;
+ };
+ ldo2: LDO2 {
+ regulator-name = "ldo2";
+ regulator-min-microvolt = <1650000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-boot-on;
+ };
+
+ ldo3: LDO3 {
+ regulator-name = "ldo3";
+ regulator-min-microvolt = <1650000>;
+ regulator-max-microvolt = <3300000>;
+ };
+ led_ldo1: LED_LDO1 {
+ regulator-name = "led_ldo1";
+ regulator-min-microvolt = <200000>;
+ regulator-max-microvolt = <300000>;
+ };
+ led_ldo2: LED_LDO2 {
+ regulator-name = "led_ldo2";
+ regulator-min-microvolt = <200000>;
+ regulator-max-microvolt = <300000>;
+ };
+};
diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
index 4b98ca26e61a..cbce62c22b60 100644
--- a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt
@@ -27,8 +27,38 @@ BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6
LDO1, LDO2, LDO3, LDO4, LDO5, LDO6
Optional properties:
+- rohm,dvs-run-voltage : PMIC default "RUN" state voltage in uV.
+ See the table below for bucks which support this.
+- rohm,dvs-idle-voltage : PMIC default "IDLE" state voltage in uV.
+ See the table below for bucks which support this.
+- rohm,dvs-suspend-voltage : PMIC default "SUSPEND" state voltage in uV.
+ See the table below for bucks which support this.
- Any optional property defined in bindings/regulator/regulator.txt
+Supported default DVS states:
+
+BD71837:
+buck | dvs-run-voltage | dvs-idle-voltage | dvs-suspend-voltage
+-----------------------------------------------------------------------------
+1 | supported | supported | supported
+----------------------------------------------------------------------------
+2 | supported | supported | not supported
+----------------------------------------------------------------------------
+3 | supported | not supported | not supported
+----------------------------------------------------------------------------
+4 | supported | not supported | not supported
+----------------------------------------------------------------------------
+rest | not supported | not supported | not supported
+
+BD71847:
+buck | dvs-run-voltage | dvs-idle-voltage | dvs-suspend-voltage
+-----------------------------------------------------------------------------
+1 | supported | supported | supported
+----------------------------------------------------------------------------
+2 | supported | supported | not supported
+----------------------------------------------------------------------------
+rest | not supported | not supported | not supported
+
Example:
regulators {
buck1: BUCK1 {
@@ -36,7 +66,11 @@ regulators {
regulator-min-microvolt = <700000>;
regulator-max-microvolt = <1300000>;
regulator-boot-on;
+ regulator-always-on;
regulator-ramp-delay = <1250>;
+ rohm,dvs-run-voltage = <900000>;
+ rohm,dvs-idle-voltage = <850000>;
+ rohm,dvs-suspend-voltage = <800000>;
};
buck2: BUCK2 {
regulator-name = "buck2";
@@ -45,18 +79,22 @@ regulators {
regulator-boot-on;
regulator-always-on;
regulator-ramp-delay = <1250>;
+ rohm,dvs-run-voltage = <1000000>;
+ rohm,dvs-idle-voltage = <900000>;
};
buck3: BUCK3 {
regulator-name = "buck3";
regulator-min-microvolt = <700000>;
regulator-max-microvolt = <1300000>;
regulator-boot-on;
+ rohm,dvs-run-voltage = <1000000>;
};
buck4: BUCK4 {
regulator-name = "buck4";
regulator-min-microvolt = <700000>;
regulator-max-microvolt = <1300000>;
regulator-boot-on;
+ rohm,dvs-run-voltage = <1000000>;
};
buck5: BUCK5 {
regulator-name = "buck5";
diff --git a/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt b/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt
index a3f476240565..6189df71ea98 100644
--- a/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/st,stpmic1-regulator.txt
@@ -23,16 +23,14 @@ Switches are fixed voltage regulators with only enable/disable capability.
Optional properties:
- st,mask-reset: mask reset for this regulator: the regulator configuration
is maintained during pmic reset.
-- regulator-pull-down: enable high pull down
- if not specified light pull down is used
- regulator-over-current-protection:
if set, all regulators are switched off in case of over-current detection
on this regulator,
if not set, the driver only sends an over-current event.
-- interrupt-parent: phandle to the parent interrupt controller
- interrupts: index of current limit detection interrupt
- <regulator>-supply: phandle to the parent supply/regulator node
each regulator supply can be described except vref_ddr.
+- regulator-active-discharge: can be used on pwr_sw1 and pwr_sw2.
Example:
regulators {
@@ -43,7 +41,6 @@ regulators {
vdd_core: buck1 {
regulator-name = "vdd_core";
interrupts = <IT_CURLIM_BUCK1 0>;
- interrupt-parent = <&pmic>;
st,mask-reset;
regulator-pull-down;
regulator-min-microvolt = <700000>;
@@ -53,7 +50,6 @@ regulators {
v3v3: buck4 {
regulator-name = "v3v3";
interrupts = <IT_CURLIM_BUCK4 0>;
- interrupt-parent = <&mypmic>;
regulator-min-microvolt = <3300000>;
regulator-max-microvolt = <3300000>;
diff --git a/Documentation/devicetree/bindings/regulator/tps65218.txt b/Documentation/devicetree/bindings/regulator/tps65218.txt
index 02f0e9bbfbf8..54aded3b78e2 100644
--- a/Documentation/devicetree/bindings/regulator/tps65218.txt
+++ b/Documentation/devicetree/bindings/regulator/tps65218.txt
@@ -71,8 +71,13 @@ tps65218: tps65218@24 {
regulator-always-on;
};
+ ls2: regulator-ls2 {
+ regulator-min-microamp = <100000>;
+ regulator-max-microamp = <1000000>;
+ };
+
ls3: regulator-ls3 {
- regulator-min-microvolt = <100000>;
- regulator-max-microvolt = <1000000>;
+ regulator-min-microamp = <100000>;
+ regulator-max-microamp = <1000000>;
};
};
diff --git a/Documentation/devicetree/bindings/serio/olpc,ap-sp.txt b/Documentation/devicetree/bindings/serio/olpc,ap-sp.txt
index 36603419d6f8..0e72183f52bc 100644
--- a/Documentation/devicetree/bindings/serio/olpc,ap-sp.txt
+++ b/Documentation/devicetree/bindings/serio/olpc,ap-sp.txt
@@ -4,14 +4,10 @@ Required properties:
- compatible : "olpc,ap-sp"
- reg : base address and length of SoC's WTM registers
- interrupts : SP-AP interrupt
-- clocks : phandle + clock-specifier for the clock that drives the WTM
-- clock-names: should be "sp"
Example:
ap-sp@d4290000 {
compatible = "olpc,ap-sp";
reg = <0xd4290000 0x1000>;
interrupts = <40>;
- clocks = <&soc_clocks MMP2_CLK_SP>;
- clock-names = "sp";
}
diff --git a/Documentation/devicetree/bindings/spi/atmel-quadspi.txt b/Documentation/devicetree/bindings/spi/atmel-quadspi.txt
index b93c1e2f25dd..7c40ea694352 100644
--- a/Documentation/devicetree/bindings/spi/atmel-quadspi.txt
+++ b/Documentation/devicetree/bindings/spi/atmel-quadspi.txt
@@ -1,14 +1,19 @@
* Atmel Quad Serial Peripheral Interface (QSPI)
Required properties:
-- compatible: Should be "atmel,sama5d2-qspi".
+- compatible: Should be one of the following:
+ - "atmel,sama5d2-qspi"
+ - "microchip,sam9x60-qspi"
- reg: Should contain the locations and lengths of the base registers
and the mapped memory.
- reg-names: Should contain the resource reg names:
- qspi_base: configuration register address space
- qspi_mmap: memory mapped address space
- interrupts: Should contain the interrupt for the device.
-- clocks: The phandle of the clock needed by the QSPI controller.
+- clocks: Should reference the peripheral clock and the QSPI system
+ clock if available.
+- clock-names: Should contain "pclk" for the peripheral clock and "qspick"
+ for the system clock when available.
- #address-cells: Should be <1>.
- #size-cells: Should be <0>.
@@ -19,7 +24,8 @@ spi@f0020000 {
reg = <0xf0020000 0x100>, <0xd0000000 0x8000000>;
reg-names = "qspi_base", "qspi_mmap";
interrupts = <52 IRQ_TYPE_LEVEL_HIGH 7>;
- clocks = <&spi0_clk>;
+ clocks = <&pmc PMC_TYPE_PERIPHERAL 52>;
+ clock-names = "pclk";
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
diff --git a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
index e3c48b20b1a6..2d3264140cc5 100644
--- a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
+++ b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
@@ -10,6 +10,7 @@ Required properties:
- "fsl,imx35-cspi" for SPI compatible with the one integrated on i.MX35
- "fsl,imx51-ecspi" for SPI compatible with the one integrated on i.MX51
- "fsl,imx53-ecspi" for SPI compatible with the one integrated on i.MX53 and later Soc
+ - "fsl,imx8mq-ecspi" for SPI compatible with the one integrated on i.MX8M
- reg : Offset and length of the register set for the device
- interrupts : Should contain CSPI/eCSPI interrupt
- clocks : Clock specifiers for both ipg and per clocks.
diff --git a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt b/Documentation/devicetree/bindings/spi/spi-fsl-qspi.txt
index 483e9cfac1b1..e8f1d627d288 100644
--- a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
+++ b/Documentation/devicetree/bindings/spi/spi-fsl-qspi.txt
@@ -14,15 +14,13 @@ Required properties:
- clocks : The clocks needed by the QuadSPI controller
- clock-names : Should contain the name of the clocks: "qspi_en" and "qspi".
-Optional properties:
- - fsl,qspi-has-second-chip: The controller has two buses, bus A and bus B.
- Each bus can be connected with two NOR flashes.
- Most of the time, each bus only has one NOR flash
- connected, this is the default case.
- But if there are two NOR flashes connected to the
- bus, you should enable this property.
- (Please check the board's schematic.)
- - big-endian : That means the IP register is big endian
+Required SPI slave node properties:
+ - reg: There are two buses (A and B) with two chip selects each.
+ This encodes to which bus and CS the flash is connected:
+ <0>: Bus A, CS 0
+ <1>: Bus A, CS 1
+ <2>: Bus B, CS 0
+ <3>: Bus B, CS 1
Example:
@@ -40,7 +38,7 @@ qspi0: quadspi@40044000 {
};
};
-Example showing the usage of two SPI NOR devices:
+Example showing the usage of two SPI NOR devices on bus A:
&qspi2 {
pinctrl-names = "default";
diff --git a/Documentation/devicetree/bindings/spi/spi-nxp-fspi.txt b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.txt
new file mode 100644
index 000000000000..2cd67eb727d4
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.txt
@@ -0,0 +1,39 @@
+* NXP Flex Serial Peripheral Interface (FSPI)
+
+Required properties:
+ - compatible : Should be "nxp,lx2160a-fspi"
+ - reg : First contains the register location and length,
+ Second contains the memory mapping address and length
+ - reg-names : Should contain the resource reg names:
+ - fspi_base: configuration register address space
+ - fspi_mmap: memory mapped address space
+ - interrupts : Should contain the interrupt for the device
+
+Required SPI slave node properties:
+ - reg : There are two buses (A and B) with two chip selects each.
+ This encodes to which bus and CS the flash is connected:
+ - <0>: Bus A, CS 0
+ - <1>: Bus A, CS 1
+ - <2>: Bus B, CS 0
+ - <3>: Bus B, CS 1
+
+Example showing the usage of two SPI NOR slave devices on bus A:
+
+fspi0: spi@20c0000 {
+ compatible = "nxp,lx2160a-fspi";
+ reg = <0x0 0x20c0000 0x0 0x10000>, <0x0 0x20000000 0x0 0x10000000>;
+ reg-names = "fspi_base", "fspi_mmap";
+ interrupts = <0 25 0x4>; /* Level high type */
+ clocks = <&clockgen 4 3>, <&clockgen 4 3>;
+ clock-names = "fspi_en", "fspi";
+
+ mt35xu512aba0: flash@0 {
+ reg = <0>;
+ ....
+ };
+
+ mt35xu512aba1: flash@1 {
+ reg = <1>;
+ ....
+ };
+};
diff --git a/Documentation/devicetree/bindings/spi/spi-sifive.txt b/Documentation/devicetree/bindings/spi/spi-sifive.txt
new file mode 100644
index 000000000000..3f5c6e438972
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi-sifive.txt
@@ -0,0 +1,37 @@
+SiFive SPI controller Device Tree Bindings
+------------------------------------------
+
+Required properties:
+- compatible : Should be "sifive,<chip>-spi" and "sifive,spi<version>".
+ Supported compatible strings are:
+ "sifive,fu540-c000-spi" for the SiFive SPI v0 as integrated
+ onto the SiFive FU540 chip, and "sifive,spi0" for the SiFive
+ SPI v0 IP block with no chip integration tweaks.
+ Please refer to sifive-blocks-ip-versioning.txt for details
+- reg : Physical base address and size of SPI registers map
+ A second (optional) range can indicate memory mapped flash
+- interrupts : Must contain one entry
+- interrupt-parent : Must be core interrupt controller
+- clocks : Must reference the frequency given to the controller
+- #address-cells : Must be '1', indicating which CS to use
+- #size-cells : Must be '0'
+
+Optional properties:
+- sifive,fifo-depth : Depth of hardware queues; defaults to 8
+- sifive,max-bits-per-word : Maximum bits per word; defaults to 8
+
+SPI RTL that corresponds to the IP block version numbers can be found here:
+https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/spi
+
+Example:
+ spi: spi@10040000 {
+ compatible = "sifive,fu540-c000-spi", "sifive,spi0";
+ reg = <0x0 0x10040000 0x0 0x1000 0x0 0x20000000 0x0 0x10000000>;
+ interrupt-parent = <&plic>;
+ interrupts = <51>;
+ clocks = <&tlclk>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ sifive,fifo-depth = <8>;
+ sifive,max-bits-per-word = <8>;
+ };
diff --git a/Documentation/devicetree/bindings/spi/spi-sprd.txt b/Documentation/devicetree/bindings/spi/spi-sprd.txt
index bad211a19da4..3c7eacce0ee3 100644
--- a/Documentation/devicetree/bindings/spi/spi-sprd.txt
+++ b/Documentation/devicetree/bindings/spi/spi-sprd.txt
@@ -14,6 +14,11 @@ Required properties:
address on the SPI bus. Should be set to 1.
- #size-cells: Should be set to 0.
+Optional properties:
+- dma-names: Should contain the names of the DMA channels used by the SPI.
+- dmas: Should contain the DMA channels and DMA slave ids used by the SPI,
+ sorted in the same order as the dma-names property.
+
Example:
spi0: spi@70a00000{
compatible = "sprd,sc9860-spi";
@@ -21,6 +26,8 @@ spi0: spi@70a00000{
interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>;
clock-names = "spi", "source","enable";
clocks = <&clk_spi0>, <&ext_26m>, <&clk_ap_apb_gates 5>;
+ dma-names = "rx_chn", "tx_chn";
+ dmas = <&apdma 11 11>, <&apdma 12 12>;
#address-cells = <1>;
#size-cells = <0>;
};
diff --git a/Documentation/devicetree/bindings/spi/spi-stm32.txt b/Documentation/devicetree/bindings/spi/spi-stm32.txt
index 1b3fa2c119d5..d82755c63eaf 100644
--- a/Documentation/devicetree/bindings/spi/spi-stm32.txt
+++ b/Documentation/devicetree/bindings/spi/spi-stm32.txt
@@ -7,7 +7,9 @@ from 4 to 32-bit data size. Although it can be configured as master or slave,
only master is supported by the driver.
Required properties:
-- compatible: Must be "st,stm32h7-spi".
+- compatible: Should be one of:
+ "st,stm32h7-spi"
+ "st,stm32f4-spi"
- reg: Offset and length of the device's register set.
- interrupts: Must contain the interrupt id.
- clocks: Must contain an entry for spiclk (which feeds the internal clock
@@ -30,8 +32,9 @@ Child nodes represent devices on the SPI bus
See ../spi/spi-bus.txt
Optional properties:
-- st,spi-midi-ns: (Master Inter-Data Idleness) minimum time delay in
- nanoseconds inserted between two consecutive data frames.
+- st,spi-midi-ns: Only for STM32H7, (Master Inter-Data Idleness) minimum time
+ delay in nanoseconds inserted between two consecutive data
+ frames.
Example:
diff --git a/Documentation/devicetree/bindings/timer/fsl,imxgpt.txt b/Documentation/devicetree/bindings/timer/fsl,imxgpt.txt
index 9809b11f7180..5d8fd5b52598 100644
--- a/Documentation/devicetree/bindings/timer/fsl,imxgpt.txt
+++ b/Documentation/devicetree/bindings/timer/fsl,imxgpt.txt
@@ -2,17 +2,44 @@ Freescale i.MX General Purpose Timer (GPT)
Required properties:
-- compatible : should be "fsl,<soc>-gpt"
-- reg : Specifies base physical address and size of the registers.
-- interrupts : A list of 4 interrupts; one per timer channel.
-- clocks : The clocks provided by the SoC to drive the timer.
+- compatible : should be one of the following:
+ for i.MX1:
+ - "fsl,imx1-gpt";
+ for i.MX21:
+ - "fsl,imx21-gpt";
+ for i.MX27:
+ - "fsl,imx27-gpt", "fsl,imx21-gpt";
+ for i.MX31:
+ - "fsl,imx31-gpt";
+ for i.MX25:
+ - "fsl,imx25-gpt", "fsl,imx31-gpt";
+ for i.MX50:
+ - "fsl,imx50-gpt", "fsl,imx31-gpt";
+ for i.MX51:
+ - "fsl,imx51-gpt", "fsl,imx31-gpt";
+ for i.MX53:
+ - "fsl,imx53-gpt", "fsl,imx31-gpt";
+ for i.MX6Q:
+ - "fsl,imx6q-gpt", "fsl,imx31-gpt";
+ for i.MX6DL:
+ - "fsl,imx6dl-gpt";
+ for i.MX6SL:
+ - "fsl,imx6sl-gpt", "fsl,imx6dl-gpt";
+ for i.MX6SX:
+ - "fsl,imx6sx-gpt", "fsl,imx6dl-gpt";
+- reg : specifies base physical address and size of the registers.
+- interrupts : should be the gpt interrupt.
+- clocks : the clocks provided by the SoC to drive the timer, must contain
+ an entry for each entry in clock-names.
+- clock-names : must include "ipg" entry first, then "per" entry.
Example:
gpt1: timer@10003000 {
- compatible = "fsl,imx27-gpt", "fsl,imx1-gpt";
+ compatible = "fsl,imx27-gpt", "fsl,imx21-gpt";
reg = <0x10003000 0x1000>;
interrupts = <26>;
- clocks = <&clks 46>, <&clks 61>;
+ clocks = <&clks IMX27_CLK_GPT1_IPG_GATE>,
+ <&clks IMX27_CLK_PER1_GATE>;
clock-names = "ipg", "per";
};
diff --git a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
index 18d4d0166c76..ff7c567a7972 100644
--- a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
+++ b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
@@ -1,7 +1,7 @@
-Mediatek Timers
+MediaTek Timers
---------------
-Mediatek SoCs have two different timers on different platforms,
+MediaTek SoCs have two different timers on different platforms,
- GPT (General Purpose Timer)
- SYST (System Timer)
@@ -9,6 +9,7 @@ The proper timer will be selected automatically by driver.
Required properties:
- compatible should contain:
+ For those SoCs that use GPT
* "mediatek,mt2701-timer" for MT2701 compatible timers (GPT)
* "mediatek,mt6580-timer" for MT6580 compatible timers (GPT)
* "mediatek,mt6589-timer" for MT6589 compatible timers (GPT)
@@ -17,7 +18,11 @@ Required properties:
* "mediatek,mt8135-timer" for MT8135 compatible timers (GPT)
* "mediatek,mt8173-timer" for MT8173 compatible timers (GPT)
* "mediatek,mt6577-timer" for MT6577 and all above compatible timers (GPT)
- * "mediatek,mt6765-timer" for MT6765 compatible timers (SYST)
+
+ For those SoCs that use SYST
+ * "mediatek,mt7629-timer" for MT7629 compatible timers (SYST)
+ * "mediatek,mt6765-timer" for MT6765 and all above compatible timers (SYST)
+
- reg: Should contain location and length for timer register.
- clocks: Should contain system clock.
diff --git a/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt b/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt
new file mode 100644
index 000000000000..032cda96fe0d
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt
@@ -0,0 +1,36 @@
+NVIDIA Tegra210 timer
+
+The Tegra210 timer provides fourteen 29-bit timer counters and one 32-bit
+timestamp counter. The TMRs run at either a fixed 1 MHz clock rate derived
+from the oscillator clock (TMR0-TMR9) or directly at the oscillator clock
+(TMR10-TMR13). Each TMR can be programmed to generate one-shot, periodic,
+or watchdog interrupts.
+
+Required properties:
+- compatible : "nvidia,tegra210-timer".
+- reg : Specifies base physical address and size of the registers.
+- interrupts : A list of 14 interrupts, one for each timer channel, 0 through
+ 13.
+- clocks : Must contain one entry, for the module clock.
+ See ../clocks/clock-bindings.txt for details.
+
+timer@60005000 {
+ compatible = "nvidia,tegra210-timer";
+ reg = <0x0 0x60005000 0x0 0x400>;
+ interrupts = <GIC_SPI 156 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 42 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 121 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 152 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 153 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 154 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 155 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 177 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 179 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&tegra_car TEGRA210_CLK_TIMER>;
+ clock-names = "timer";
+};
diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index 862a80f0380a..c0594450e9ef 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -32,6 +32,8 @@ Required Properties:
- "renesas,r8a77470-cmt1" for the 48-bit CMT1 device included in r8a77470.
- "renesas,r8a774a1-cmt0" for the 32-bit CMT0 device included in r8a774a1.
- "renesas,r8a774a1-cmt1" for the 48-bit CMT1 device included in r8a774a1.
+ - "renesas,r8a774c0-cmt0" for the 32-bit CMT0 device included in r8a774c0.
+ - "renesas,r8a774c0-cmt1" for the 48-bit CMT1 device included in r8a774c0.
- "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790.
- "renesas,r8a7790-cmt1" for the 48-bit CMT1 device included in r8a7790.
- "renesas,r8a7791-cmt0" for the 32-bit CMT0 device included in r8a7791.
diff --git a/Documentation/devicetree/bindings/timer/renesas,tmu.txt b/Documentation/devicetree/bindings/timer/renesas,tmu.txt
index 4ddff85837da..13ad07416bdd 100644
--- a/Documentation/devicetree/bindings/timer/renesas,tmu.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,tmu.txt
@@ -10,6 +10,7 @@ Required Properties:
- compatible: must contain one or more of the following:
- "renesas,tmu-r8a7740" for the r8a7740 TMU
+ - "renesas,tmu-r8a774c0" for the r8a774c0 TMU
- "renesas,tmu-r8a7778" for the r8a7778 TMU
- "renesas,tmu-r8a7779" for the r8a7779 TMU
- "renesas,tmu-r8a77970" for the r8a77970 TMU
diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index cc64ec63a6ad..d79fb22bde39 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -322,6 +322,8 @@ properties:
- ti,ads7830
# Temperature Monitoring and Fan Control
- ti,amc6821
+ # Temperature sensor with integrated fan control
+ - ti,lm96000
# I2C Touch-Screen Controller
- ti,tsc2003
# Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface
diff --git a/Documentation/driver-api/80211/mac80211.rst b/Documentation/driver-api/80211/mac80211.rst
index 85a8335e80b6..eab40bcf3987 100644
--- a/Documentation/driver-api/80211/mac80211.rst
+++ b/Documentation/driver-api/80211/mac80211.rst
@@ -126,6 +126,9 @@ functions/definitions
:functions: ieee80211_rx_status
.. kernel-doc:: include/net/mac80211.h
+ :functions: mac80211_rx_encoding_flags
+
+.. kernel-doc:: include/net/mac80211.h
:functions: mac80211_rx_flags
.. kernel-doc:: include/net/mac80211.h
diff --git a/Documentation/hwmon/lm85 b/Documentation/hwmon/lm85
index 7c49feaa79d2..2329c383efe4 100644
--- a/Documentation/hwmon/lm85
+++ b/Documentation/hwmon/lm85
@@ -3,9 +3,13 @@ Kernel driver lm85
Supported chips:
* National Semiconductor LM85 (B and C versions)
- Prefix: 'lm85'
+ Prefix: 'lm85b' or 'lm85c'
Addresses scanned: I2C 0x2c, 0x2d, 0x2e
Datasheet: http://www.national.com/pf/LM/LM85.html
+ * Texas Instruments LM96000
+ Prefix: 'lm9600'
+ Addresses scanned: I2C 0x2c, 0x2d, 0x2e
+ Datasheet: http://www.ti.com/lit/ds/symlink/lm96000.pdf
* Analog Devices ADM1027
Prefix: 'adm1027'
Addresses scanned: I2C 0x2c, 0x2d, 0x2e
@@ -136,6 +140,9 @@ of voltage and temperature channels.
SMSC EMC6D103S is similar to EMC6D103, but does not support pwm#_auto_pwm_minctl
and temp#_auto_temp_off.
+The LM96000 supports additional high frequency PWM modes (22.5 kHz, 24 kHz,
+25.7 kHz, 27.7 kHz and 30 kHz), which can be configured on a per-PWM basis.
+
Hardware Configurations
-----------------------
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 4ae4f9d8f8fe..e14d7d40fc75 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -295,6 +295,41 @@ using::
For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
can be displayed with "-h", as usual.
+FAQ
+=======
+
+Q: I am not seeing any traffic on the socket. What am I doing wrong?
+
+A: When a netdev of a physical NIC is initialized, Linux usually
+ allocates one Rx and Tx queue pair per core. So on an 8 core system,
+ queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
+ bind call or the xsk_socket__create libbpf function call, you
+ specify a specific queue id to bind to and it is only the traffic
+ towards that queue you are going to get on your socket. So in the
+ example above, if you bind to queue 0, you are NOT going to get any
+ traffic that is distributed to queues 1 through 7. If you are
+ lucky, you will see the traffic, but usually it will end up on one
+ of the queues you have not bound to.
+
+ There are a number of ways to solve the problem of getting the
+ traffic you want to the queue id you bound to. If you want to see
+ all the traffic, you can force the netdev to only have 1 queue, queue
+ id 0, and then bind to queue 0. You can use ethtool to do this::
+
+ sudo ethtool -L <interface> combined 1
+
+ If you want to only see part of the traffic, you can program the
+ NIC through ethtool to steer your traffic to a single queue id
+ that you can bind your XDP socket to. Here is one example in which
+ UDP traffic to and from port 4242 is sent to queue 2::
+
+ sudo ethtool -N <interface> rx-flow-hash udp4 fn
+ sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
+ 4242 action 2
+
+ A number of other ways are possible, all depending on the capabilities of
+ the NIC you have.
+
Credits
=======
@@ -309,4 +344,3 @@ Credits
- Michael S. Tsirkin
- Qi Z Zhang
- Willem de Bruijn
-
diff --git a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
index a188466b6698..5045df990a4c 100644
--- a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
@@ -27,11 +27,12 @@ Driver Overview
The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
provides services that:
- A) allow other drivers, such as the Ethernet driver, to enqueue and dequeue
+
+ A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
frames for their respective objects
- B) allow drivers to register callbacks for data availability notifications
+ B. allow drivers to register callbacks for data availability notifications
when data becomes available on a queue or channel
- C) allow drivers to manage hardware buffer pools
+ C. allow drivers to manage hardware buffer pools
The Linux DPIO driver consists of 3 primary components--
DPIO object driver-- fsl-mc driver that manages the DPIO object
@@ -140,11 +141,10 @@ QBman portal interface (qbman-portal.c)
The qbman-portal component provides APIs to do the low level hardware
bit twiddling for operations such as:
- -initializing Qman software portals
-
- -building and sending portal commands
- -portal interrupt configuration and processing
+ - initializing Qman software portals
+ - building and sending portal commands
+ - portal interrupt configuration and processing
The qbman-portal APIs are not public to other drivers, and are
only used by dpio-service.
diff --git a/Documentation/networking/device_drivers/intel/e100.rst b/Documentation/networking/device_drivers/intel/e100.rst
index 5e2839b4ec92..2b9f4887beda 100644
--- a/Documentation/networking/device_drivers/intel/e100.rst
+++ b/Documentation/networking/device_drivers/intel/e100.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+==============================================================
Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
==============================================================
diff --git a/Documentation/networking/device_drivers/intel/e1000.rst b/Documentation/networking/device_drivers/intel/e1000.rst
index 6379d4d20771..956560b6e745 100644
--- a/Documentation/networking/device_drivers/intel/e1000.rst
+++ b/Documentation/networking/device_drivers/intel/e1000.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+===========================================================
Linux* Base Driver for Intel(R) Ethernet Network Connection
===========================================================
diff --git a/Documentation/networking/device_drivers/intel/e1000e.rst b/Documentation/networking/device_drivers/intel/e1000e.rst
index 33554e5416c5..01999f05509c 100644
--- a/Documentation/networking/device_drivers/intel/e1000e.rst
+++ b/Documentation/networking/device_drivers/intel/e1000e.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+======================================================
Linux* Driver for Intel(R) Ethernet Network Connection
======================================================
diff --git a/Documentation/networking/device_drivers/intel/fm10k.rst b/Documentation/networking/device_drivers/intel/fm10k.rst
index bf5e5942f28d..ac3269e34f55 100644
--- a/Documentation/networking/device_drivers/intel/fm10k.rst
+++ b/Documentation/networking/device_drivers/intel/fm10k.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+==============================================================
Linux* Base Driver for Intel(R) Ethernet Multi-host Controller
==============================================================
diff --git a/Documentation/networking/device_drivers/intel/i40e.rst b/Documentation/networking/device_drivers/intel/i40e.rst
index 0cc16c525d10..848fd388fa6e 100644
--- a/Documentation/networking/device_drivers/intel/i40e.rst
+++ b/Documentation/networking/device_drivers/intel/i40e.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+==================================================================
Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series
==================================================================
diff --git a/Documentation/networking/device_drivers/intel/iavf.rst b/Documentation/networking/device_drivers/intel/iavf.rst
index f8b42b64eb28..2d0c3baa1752 100644
--- a/Documentation/networking/device_drivers/intel/iavf.rst
+++ b/Documentation/networking/device_drivers/intel/iavf.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+==================================================================
Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function
==================================================================
diff --git a/Documentation/networking/device_drivers/intel/ice.rst b/Documentation/networking/device_drivers/intel/ice.rst
index 4d118b827bbb..c220aa2711c6 100644
--- a/Documentation/networking/device_drivers/intel/ice.rst
+++ b/Documentation/networking/device_drivers/intel/ice.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+===================================================================
Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series
===================================================================
diff --git a/Documentation/networking/device_drivers/intel/igb.rst b/Documentation/networking/device_drivers/intel/igb.rst
index e87a4a72ea2d..fc8cfaa5dcfa 100644
--- a/Documentation/networking/device_drivers/intel/igb.rst
+++ b/Documentation/networking/device_drivers/intel/igb.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+===========================================================
Linux* Base Driver for Intel(R) Ethernet Network Connection
===========================================================
diff --git a/Documentation/networking/device_drivers/intel/igbvf.rst b/Documentation/networking/device_drivers/intel/igbvf.rst
index a8a9ffa4f8d3..9cddabe8108e 100644
--- a/Documentation/networking/device_drivers/intel/igbvf.rst
+++ b/Documentation/networking/device_drivers/intel/igbvf.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+============================================================
Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet
============================================================
diff --git a/Documentation/networking/device_drivers/intel/ixgb.rst b/Documentation/networking/device_drivers/intel/ixgb.rst
index 8bd80e27843d..945018207a92 100644
--- a/Documentation/networking/device_drivers/intel/ixgb.rst
+++ b/Documentation/networking/device_drivers/intel/ixgb.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+=====================================================================
Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
=====================================================================
diff --git a/Documentation/networking/device_drivers/intel/ixgbe.rst b/Documentation/networking/device_drivers/intel/ixgbe.rst
index 86d887a63606..c7d25483fedb 100644
--- a/Documentation/networking/device_drivers/intel/ixgbe.rst
+++ b/Documentation/networking/device_drivers/intel/ixgbe.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+=============================================================================
Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
=============================================================================
diff --git a/Documentation/networking/device_drivers/intel/ixgbevf.rst b/Documentation/networking/device_drivers/intel/ixgbevf.rst
index 56cde6366c2f..5d4977360157 100644
--- a/Documentation/networking/device_drivers/intel/ixgbevf.rst
+++ b/Documentation/networking/device_drivers/intel/ixgbevf.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+
+=============================================================
Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet
=============================================================
diff --git a/Documentation/networking/device_drivers/stmicro/stmmac.txt b/Documentation/networking/device_drivers/stmicro/stmmac.txt
index 2bb07078f535..1ae979fd90d2 100644
--- a/Documentation/networking/device_drivers/stmicro/stmmac.txt
+++ b/Documentation/networking/device_drivers/stmicro/stmmac.txt
@@ -267,7 +267,7 @@ static struct fixed_phy_status stmmac0_fixed_phy_status = {
During the board's device_init we can configure the first
MAC for fixed_link by calling:
- fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status, -1);
+ fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status);
and the second one, with a real PHY device attached to the bus,
by using the stmmac_mdio_bus_data structure (to provide the id, the
reset procedure etc).
diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 000000000000..1db3fbea0831
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad has happened to a PCI device:
+- Provide alert debug information
+- Self healing
+- If the problem needs vendor support, provide a way to gather all needed
+ debugging information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+A device driver creates a "health reporter" for each error/health type.
+An error/health type can be known/generic (e.g. PCI error, FW error, rx/tx
+error) or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health report handling is done by devlink.
+A device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+ * A log is sent to the kernel trace events buffer
+ * Health status and statistics are updated for the reporter instance
+ * An object dump is taken and saved at the reporter instance (as long as
+ there is no other dump already stored)
+ * An auto recovery attempt is made, depending on:
+ - Auto-recovery configuration
+ - Grace period vs. time passed since the last recovery
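The auto-recovery decision in the last step can be sketched in C as follows. This is only an illustration under stated assumptions: the struct, its field names and should_auto_recover() are hypothetical and are not taken from the devlink implementation, timestamps are plain millisecond counters, and the exact comparison against the grace period is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-reporter settings; names are illustrative only. */
struct reporter_cfg {
	bool auto_recover;        /* auto-recovery configuration */
	uint64_t grace_period_ms; /* minimum spacing between recoveries */
	uint64_t last_recover_ms; /* timestamp of the last recovery */
};

/*
 * Returns true when an automatic recovery attempt would be made at time
 * now_ms: auto-recovery must be enabled and at least grace_period_ms must
 * have passed since the last recovery (assumed comparison).
 * The caller is expected to pass now_ms >= cfg->last_recover_ms.
 */
static bool should_auto_recover(const struct reporter_cfg *cfg, uint64_t now_ms)
{
	if (!cfg->auto_recover)
		return false;
	return now_ms - cfg->last_recover_ms >= cfg->grace_period_ms;
}
```

With a grace period of 500 ms and a last recovery at t=1000 ms, this sketch skips recovery for an error reported at t=1200 ms and attempts it for one reported at t=1500 ms.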
+
+The user interface:
+The user can access/change each reporter's parameters and driver specific
+callbacks via devlink, e.g. per error type (per health reporter):
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+ Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+ Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+ Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+ Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
+ Retrieves the last stored dump. Devlink health
+ saves a single dump. If a dump is not already stored by devlink
+ for this reporter, devlink generates a new dump.
+ Dump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
+ Clears the last saved dump file for the specified reporter.
+
+
+ netlink
+ +--------------------------+
+ | |
+ | + |
+ | | |
+ +--------------------------+
+ |request for ops
+ |(diagnose,
+ mlx5_core devlink |recover,
+ |dump)
++--------+ +--------------------------+
+| | | reporter| |
+| | | +---------v----------+ |
+| | ops execution | | | |
+| <----------------------------------+ | |
+| | | | | |
+| | | + ^------------------+ |
+| | | | request for ops |
+| | | | (recover, dump) |
+| | | | |
+| | | +-+------------------+ |
+| | health report | | health handler | |
+| +-------------------------------> | |
+| | | +--------------------+ |
+| | health reporter create | |
+| +----------------------------> |
++--------+ +--------------------------+
diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink-info-versions.rst
new file mode 100644
index 000000000000..c79ad8593383
--- /dev/null
+++ b/Documentation/networking/devlink-info-versions.rst
@@ -0,0 +1,43 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=====================
+Devlink info versions
+=====================
+
+board.id
+========
+
+Unique identifier of the board design.
+
+board.rev
+=========
+
+Board design revision.
+
+board.manufacture
+=================
+
+An identifier of the company or the facility which produced the part.
+
+fw.mgmt
+=======
+
+Control unit firmware version. This firmware is responsible for
+housekeeping tasks, PHY control etc. but not the packet-by-packet data path
+operation.
+
+fw.app
+======
+
+Data path microcode controlling high-speed packet processing.
+
+fw.undi
+=======
+
+UNDI software; may include the UEFI driver, firmware or both.
+
+fw.ncsi
+=======
+
+Version of the software responsible for supporting/handling the
+Network Controller Sideband Interface.
diff --git a/Documentation/networking/devlink-params-mlxsw.txt b/Documentation/networking/devlink-params-mlxsw.txt
new file mode 100644
index 000000000000..c63ea9fc7009
--- /dev/null
+++ b/Documentation/networking/devlink-params-mlxsw.txt
@@ -0,0 +1,10 @@
+fw_load_policy [DEVICE, GENERIC]
+ Configuration mode: driverinit
+
+acl_region_rehash_interval [DEVICE, DRIVER-SPECIFIC]
+ Sets an interval for periodic ACL region rehashes.
+ The value is in milliseconds; the minimum value is "3000".
+ Value "0" disables the periodic work.
+ The first rehash is run right after the value is set.
+ Type: u32
+ Configuration mode: runtime
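The value rules for acl_region_rehash_interval can be restated as a small C check. This is a sketch, not driver code: the macro and both helper names are hypothetical, and it only encodes what the text above says (u32 milliseconds, "0" disables the periodic work, any other value must be at least 3000).

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum non-zero interval in milliseconds, per the description above. */
#define REHASH_INTERVAL_MIN_MS 3000u

/* Hypothetical helper: 0 is accepted (disables the work), else >= 3000. */
static bool rehash_interval_valid(uint32_t ms)
{
	return ms == 0 || ms >= REHASH_INTERVAL_MIN_MS;
}

/* Hypothetical helper: periodic rehash runs only for a non-zero interval. */
static bool rehash_enabled(uint32_t ms)
{
	return ms != 0;
}
```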
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 25170ad7d25b..43ef767bc440 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -236,19 +236,6 @@ description.
Design limitations
==================
-DSA is a platform device driver
--------------------------------
-
-DSA is implemented as a DSA platform device driver which is convenient because
-it will register the entire DSA switch tree attached to a master network device
-in one-shot, facilitating the device creation and simplifying the device driver
-model a bit, this comes however with a number of limitations:
-
-- building DSA and its switch drivers as modules is currently not working
-- the device driver parenting does not necessarily reflect the original
- bus/device the switch can be created from
-- supporting non-MDIO and non-MMIO (platform) switches is not possible
-
Limits on the number of devices and ports
-----------------------------------------
@@ -533,16 +520,12 @@ Bridge VLAN filtering
function that the driver has to call for each VLAN the given port is a member
of. A switchdev object is used to carry the VID and bridge flags.
-- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
- installation of a Forwarding Database entry. If the operation is not
- supported, this function should return -EOPNOTSUPP to inform the bridge code
- to fallback to a software implementation. No hardware setup must be done in
- this function. See port_fdb_add for this and details.
-
- port_fdb_add: bridge layer function invoked when the bridge wants to install a
Forwarding Database entry, the switch hardware should be programmed with the
specified address in the specified VLAN Id in the forwarding database
- associated with this VLAN ID
+ associated with this VLAN ID. If the operation is not supported, this
+ function should return -EOPNOTSUPP to inform the bridge code to fall back to
+ a software implementation.
Note: VLAN ID 0 corresponds to the port private database, which, in the context
of DSA, would be its port-based VLAN, used by the associated bridge device.
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 2196b824e96c..319e5e041f38 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -464,10 +464,11 @@ breakpoints: 0 1
JIT compiler
------------
-The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC,
-ARM, ARM64, MIPS and s390 and can be enabled through CONFIG_BPF_JIT. The JIT
-compiler is transparently invoked for each attached filter from user space
-or for internal kernel users if it has been previously enabled by root:
+The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
+PowerPC, ARM, ARM64, MIPS, RISC-V and s390, which can be enabled through
+CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
+attached filter from user space or for internal kernel users if it has
+been previously enabled by root:
echo 1 > /proc/sys/net/core/bpf_jit_enable
@@ -603,9 +604,10 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g.
skb pointer). All constraints and restrictions from bpf_check_classic() apply
before a conversion to the new layout is being done behind the scenes!
-Currently, the classic BPF format is being used for JITing on most 32-bit
-architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform
-JIT compilation from eBPF instruction set.
+Currently, the classic BPF format is being used for JITing on most
+32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
+sparc64, arm32 and riscv (RV64G) perform JIT compilation from the eBPF
+instruction set.
Some core changes of the new internal format:
@@ -827,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
-Internal BPF can used as generic assembler for last step performance
+Internal BPF can be used as a generic assembler for last step performance
optimizations, socket filters and seccomp are using it as assembler. Tracing
filters may use it as assembler to generate code from kernel. In kernel usage
may not be bounded by security considerations, since generated internal BPF code
@@ -865,7 +867,7 @@ Three LSB bits store instruction class which is one of:
BPF_STX 0x03 BPF_STX 0x03
BPF_ALU 0x04 BPF_ALU 0x04
BPF_JMP 0x05 BPF_JMP 0x05
- BPF_RET 0x06 [ class 6 unused, for future if needed ]
+ BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
@@ -902,9 +904,9 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */
-If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
+If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
- BPF_JA 0x00
+ BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10
BPF_JGT 0x20
BPF_JGE 0x30
@@ -912,8 +914,8 @@ If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
BPF_JNE 0x50 /* eBPF only: jump != */
BPF_JSGT 0x60 /* eBPF only: signed '>' */
BPF_JSGE 0x70 /* eBPF only: signed '>=' */
- BPF_CALL 0x80 /* eBPF only: function call */
- BPF_EXIT 0x90 /* eBPF only: function return */
+ BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */
+ BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */
BPF_JLT 0xa0 /* eBPF only: unsigned '<' */
BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */
BPF_JSLT 0xc0 /* eBPF only: signed '<' */
@@ -936,8 +938,9 @@ Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
-value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently
-unused and reserved for future use.
+value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
+BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
+operands for the comparisons instead.
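The split between BPF_JMP and BPF_JMP32 can be illustrated with a short C sketch. The constants are copied from the tables above; BPF_CLASS() and BPF_OP() mask the three LSB bits and the upper four bits of the opcode byte. The helpers jmp_operand_width() and op_is_jmp_only() are hypothetical names introduced here for illustration, and the 64-bit width for BPF_JMP is an assumption about eBPF rather than something stated in this hunk.

```c
#include <stdint.h>

/* Values copied from the tables above (eBPF column). */
#define BPF_JMP   0x05
#define BPF_JMP32 0x06
#define BPF_JA    0x00
#define BPF_CALL  0x80
#define BPF_EXIT  0x90

#define BPF_CLASS(code) ((code) & 0x07) /* three LSB bits: instruction class */
#define BPF_OP(code)    ((code) & 0xf0) /* upper four bits: operation */

/*
 * Width in bits of the comparison operands of a jump instruction,
 * or 0 if the opcode does not belong to a jump class.
 */
static int jmp_operand_width(uint8_t code)
{
	switch (BPF_CLASS(code)) {
	case BPF_JMP:
		return 64; /* assumed: eBPF BPF_JMP compares full registers */
	case BPF_JMP32:
		return 32;
	default:
		return 0;
	}
}

/* Returns 1 for operations marked "BPF_JMP only" in the table above. */
static int op_is_jmp_only(uint8_t code)
{
	uint8_t op = BPF_OP(code);

	return op == BPF_JA || op == BPF_CALL || op == BPF_EXIT;
}
```

For example, the opcode byte 0x16 has class 6 (BPF_JMP32) and operation 0x10 (BPF_JEQ), so the comparison uses 32-bit operands, while 0x15 is the same test in the BPF_JMP class.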
For load and store instructions the 8-bit 'code' field is divided as:
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.rst
index e74d8e1da0e2..36ca823a1122 100644
--- a/Documentation/networking/ieee802154.txt
+++ b/Documentation/networking/ieee802154.rst
@@ -1,54 +1,79 @@
-
- Linux IEEE 802.15.4 implementation
-
+===============================
+IEEE 802.15.4 Developer's Guide
+===============================
Introduction
============
The IEEE 802.15.4 working group focuses on standardization of the bottom
two layers: Medium Access Control (MAC) and Physical access (PHY). And there
are mainly two options available for upper layers:
- - ZigBee - proprietary protocol from the ZigBee Alliance
- - 6LoWPAN - IPv6 networking over low rate personal area networks
+
+- ZigBee - proprietary protocol from the ZigBee Alliance
+- 6LoWPAN - IPv6 networking over low rate personal area networks
The goal of the Linux-wpan is to provide a complete implementation
of the IEEE 802.15.4 and 6LoWPAN protocols. IEEE 802.15.4 is a stack
of protocols for organizing Low-Rate Wireless Personal Area Networks.
The stack is composed of three main parts:
- - IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
- the generic Linux networking stack to transfer IEEE 802.15.4 data
- messages and a special protocol over netlink for configuration/management
- - MAC - provides access to shared channel and reliable data delivery
- - PHY - represents device drivers
+- IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
+ the generic Linux networking stack to transfer IEEE 802.15.4 data
+ messages and a special protocol over netlink for configuration/management
+- MAC - provides access to shared channel and reliable data delivery
+- PHY - represents device drivers
Socket API
==========
-int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
-.....
+.. c:function:: int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
The address family, socket addresses etc. are defined in the
include/net/af_ieee802154.h header or in the special header
in the userspace package (see either http://wpan.cakelab.org/ or the
git tree at https://github.com/linux-wpan/wpan-tools).
+6LoWPAN Linux implementation
+============================
+
+The IEEE 802.15.4 standard specifies an MTU of 127 bytes, yielding about 80
+octets of actual MAC payload once security is turned on, on a wireless link
+with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format
+[RFC4944] was specified to carry IPv6 datagrams over such constrained links,
+taking into account limited bandwidth, memory, or energy resources that are
+expected in applications such as wireless Sensor Networks. [RFC4944] defines
+a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header
+to support the IPv6 minimum MTU requirement [RFC2460], and stateless header
+compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the
+relatively large IPv6 and UDP headers down to (in the best case) several bytes.
+
+In September 2011 the standard update was published - [RFC6282].
+It deprecates HC1 and HC2 compression and defines IPHC encoding format which is
+used in this Linux implementation.
+
+All the code related to 6LoWPAN can be found in net/6lowpan/*
+and net/ieee802154/6lowpan/*
+
+To set up a 6LoWPAN interface you need to:
+
+1. Add an IEEE 802.15.4 interface and set the channel and PAN ID;
+2. Add a 6LoWPAN interface with a command like::
+
+     # ip link add link wpan0 name lowpan0 type lowpan
+
+3. Bring up the 'lowpan0' interface
-Kernel side
-=============
+Drivers
+=======
Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
- exports a management (e.g. MLME) and data API.
+exports a management (e.g. MLME) and data API.
2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
- possibly with some kinds of acceleration like automatic CRC computation and
- comparation, automagic ACK handling, address matching, etc.
+possibly with some kinds of acceleration like automatic CRC computation and
+comparison, automagic ACK handling, address matching, etc.
Those types of devices require different approach to be hooked into Linux kernel.
-
HardMAC
-=======
+-------
See the header include/net/ieee802154_netdev.h. You have to implement Linux
net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
@@ -64,9 +89,8 @@ net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
All other fields are required.
-
SoftMAC
-=======
+-------
The MAC is the middle layer in the IEEE 802.15.4 Linux stack. This moment it
provides interface for drivers registration and management of slave interfaces.
@@ -79,99 +103,78 @@ This layer is going to be extended soon.
See header include/net/mac802154.h and several drivers in
drivers/net/ieee802154/.
+Fake drivers
+------------
+
+In addition there is a driver available which simulates a real device with
+SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
+makes it possible to test and debug the stack without real hardware.
Device drivers API
==================
The include/net/mac802154.h defines following functions:
- - struct ieee802154_hw *
- ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops):
- allocation of IEEE 802.15.4 compatible hardware device
- - void ieee802154_free_hw(struct ieee802154_hw *hw):
- freeing allocated hardware device
+.. c:function:: struct ieee802154_hw *ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
- - int ieee802154_register_hw(struct ieee802154_hw *hw):
- register PHY which is the allocated hardware device, in the system
+Allocation of IEEE 802.15.4 compatible device.
- - void ieee802154_unregister_hw(struct ieee802154_hw *hw):
- freeing registered PHY
+.. c:function:: void ieee802154_free_hw(struct ieee802154_hw *hw)
- - void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb,
- u8 lqi):
- telling 802.15.4 module there is a new received frame in the skb with
- the RF Link Quality Indicator (LQI) from the hardware device
+Freeing allocated device.
- - void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
- bool ifs_handling):
- telling 802.15.4 module the frame in the skb is or going to be
- transmitted through the hardware device
+.. c:function:: int ieee802154_register_hw(struct ieee802154_hw *hw)
+
+Register PHY in the system.
+
+.. c:function:: void ieee802154_unregister_hw(struct ieee802154_hw *hw)
+
+Freeing registered PHY.
+
+.. c:function:: void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb, u8 lqi)
+
+Telling 802.15.4 module there is a new received frame in the skb with
+the RF Link Quality Indicator (LQI) from the hardware device.
+
+.. c:function:: void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb, bool ifs_handling)
+
+Telling 802.15.4 module the frame in the skb has been or is going to be
+transmitted through the hardware device.
The device driver must implement the following callbacks in the IEEE 802.15.4
-operations structure at least:
-struct ieee802154_ops {
- ...
- int (*start)(struct ieee802154_hw *hw);
- void (*stop)(struct ieee802154_hw *hw);
- ...
- int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
- int (*ed)(struct ieee802154_hw *hw, u8 *level);
- int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
- ...
-};
-
- - int start(struct ieee802154_hw *hw):
- handler that 802.15.4 module calls for the hardware device initialization.
-
- - void stop(struct ieee802154_hw *hw):
- handler that 802.15.4 module calls for the hardware device cleanup.
-
- - int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb):
- handler that 802.15.4 module calls for each frame in the skb going to be
- transmitted through the hardware device.
-
- - int ed(struct ieee802154_hw *hw, u8 *level):
- handler that 802.15.4 module calls for Energy Detection from the hardware
- device.
-
- - int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel):
- set radio for listening on specific channel of the hardware device.
+operations structure at least::
-Moreover IEEE 802.15.4 device operations structure should be filled.
+ struct ieee802154_ops {
+ ...
+ int (*start)(struct ieee802154_hw *hw);
+ void (*stop)(struct ieee802154_hw *hw);
+ ...
+ int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
+ int (*ed)(struct ieee802154_hw *hw, u8 *level);
+ int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
+ ...
+ };
-Fake drivers
-============
+.. c:function:: int start(struct ieee802154_hw *hw)
-In addition there is a driver available which simulates a real device with
-SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
-provides a possibility to test and debug the stack without usage of real hardware.
+Handler that 802.15.4 module calls for the hardware device initialization.
-See sources in drivers/net/ieee802154 folder for more details.
+.. c:function:: void stop(struct ieee802154_hw *hw)
+Handler that 802.15.4 module calls for the hardware device cleanup.
-6LoWPAN Linux implementation
-============================
+.. c:function:: int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb)
-The IEEE 802.15.4 standard specifies an MTU of 127 bytes, yielding about 80
-octets of actual MAC payload once security is turned on, on a wireless link
-with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format
-[RFC4944] was specified to carry IPv6 datagrams over such constrained links,
-taking into account limited bandwidth, memory, or energy resources that are
-expected in applications such as wireless Sensor Networks. [RFC4944] defines
-a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header
-to support the IPv6 minimum MTU requirement [RFC2460], and stateless header
-compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the
-relatively large IPv6 and UDP headers down to (in the best case) several bytes.
+Handler that 802.15.4 module calls for each frame in the skb going to be
+transmitted through the hardware device.
-In September 2011 the standard update was published - [RFC6282].
-It deprecates HC1 and HC2 compression and defines IPHC encoding format which is
-used in this Linux implementation.
+.. c:function:: int ed(struct ieee802154_hw *hw, u8 *level)
-All the code related to 6lowpan you may find in files: net/6lowpan/*
-and net/ieee802154/6lowpan/*
+Handler that 802.15.4 module calls for Energy Detection from the hardware
+device.
-To setup a 6LoWPAN interface you need:
-1. Add IEEE802.15.4 interface and set channel and PAN ID;
-2. Add 6lowpan interface by command like:
- # ip link add link wpan0 name lowpan0 type lowpan
-3. Bring up 'lowpan0' interface
+.. c:function:: int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel)
+
+Set radio for listening on specific channel of the hardware device.
+
+Moreover, the IEEE 802.15.4 device operations structure should be filled in.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 59e86de662cd..f0da1b001514 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -24,11 +24,15 @@ Contents:
device_drivers/intel/i40e
device_drivers/intel/iavf
device_drivers/intel/ice
+ devlink-info-versions
+ ieee802154
kapi
z8530book
msg_zerocopy
failover
net_failover
+ phy
+ sfp-phylink
alias
bridge
snmp_counter
diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst
index fe46d4867e2d..18c1415e7bfa 100644
--- a/Documentation/networking/msg_zerocopy.rst
+++ b/Documentation/networking/msg_zerocopy.rst
@@ -7,7 +7,7 @@ Intro
=====
The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
-The feature is currently implemented for TCP sockets.
+The feature is currently implemented for TCP and UDP sockets.
Opportunity and Caveats
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index 355c6d8ef8ad..b203d1334822 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -22,8 +22,9 @@ and changeable from userspace under certain rules.
2. Querying from userspace
Both admin and operational state can be queried via the netlink
-operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK
-to be notified of updates. This is important for setting from userspace.
+operation RTM_GETLINK. It is also possible to subscribe to RTNLGRP_LINK
+to be notified of updates while the interface is admin up. This is
+important for setting from userspace.
These values contain interface state:
@@ -101,8 +102,9 @@ because some driver controlled protocol establishment has to
complete. Corresponding functions are netif_dormant_on() to set the
flag, netif_dormant_off() to clear it and netif_dormant() to query.
-On device allocation, networking core sets the flags equivalent to
-netif_carrier_ok() and !netif_dormant().
+On device allocation, both flags __LINK_STATE_NOCARRIER and
+__LINK_STATE_DORMANT are cleared, so the effective state is equivalent
+to netif_carrier_ok() and !netif_dormant().
Whenever the driver CHANGES one of these flags, a workqueue event is
@@ -133,11 +135,11 @@ netif_carrier_ok() && !netif_dormant() is set by the
driver. Afterwards, the userspace application can set IFLA_OPERSTATE
to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
netif_carrier_off() or netif_dormant_on(). Changes made by userspace
-are multicasted on the netlink group RTMGRP_LINK.
+are multicast on the netlink group RTNLGRP_LINK.
So basically a 802.1X supplicant interacts with the kernel like this:
--subscribe to RTMGRP_LINK
+-subscribe to RTNLGRP_LINK
-set IFLA_LINKMODE to 1 via RTM_SETLINK
-query RTM_GETLINK once to get initial state
-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst
new file mode 100644
index 000000000000..0dd90d7df5ec
--- /dev/null
+++ b/Documentation/networking/phy.rst
@@ -0,0 +1,447 @@
+=====================
+PHY Abstraction Layer
+=====================
+
+Purpose
+=======
+
+Most network devices consist of a set of registers which provide an interface
+to a MAC layer, which communicates with the physical connection through a
+PHY. The PHY concerns itself with negotiating link parameters with the link
+partner on the other side of the network connection (typically, an ethernet
+cable), and provides a register interface to allow drivers to determine what
+settings were chosen, and to configure what settings are allowed.
+
+While these devices are distinct from the network devices, and conform to a
+standard layout for the registers, it has been common practice to integrate
+the PHY management code with the network driver. This has resulted in large
+amounts of redundant code. Also, on embedded systems with multiple (and
+sometimes quite different) ethernet controllers connected to the same
+management bus, it is difficult to ensure safe use of the bus.
+
+Since the PHYs are devices, and the management busses through which they are
+accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
+In doing so, it has these goals:
+
+#. Increase code-reuse
+#. Increase overall code-maintainability
+#. Speed development time for new network drivers, and for new systems
+
+Basically, this layer is meant to provide an interface to PHY devices which
+allows network driver writers to write as little code as possible, while
+still providing a full feature set.
+
+The MDIO bus
+============
+
+Most network devices are connected to a PHY by means of a management bus.
+Different devices use different busses (though some share common interfaces).
+In order to take advantage of the PAL, each bus interface needs to be
+registered as a distinct device.
+
+#. read and write functions must be implemented. Their prototypes are::
+
+ int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
+ int read(struct mii_bus *bus, int mii_id, int regnum);
+
+ mii_id is the address on the bus for the PHY, and regnum is the register
+ number. These functions are guaranteed not to be called from interrupt
+ time, so it is safe for them to block, waiting for an interrupt to signal
+ the operation is complete.
+
+#. A reset function is optional. This is used to return the bus to an
+ initialized state.
+
+#. A probe function is needed. This function should set up anything the bus
+ driver needs, setup the mii_bus structure, and register with the PAL using
+ mdiobus_register. Similarly, there's a remove function to undo all of
+ that (use mdiobus_unregister).
+
+#. Like any driver, the device_driver structure must be configured, and init
+ and exit functions are used to register the driver.
+
+#. The bus must also be declared somewhere as a device, and registered.
+
+As an example for how one driver implemented an mdio bus driver, see
+drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
+for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
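The read and write callbacks described in the list above can be mimicked in user space to show the calling convention. This is a sketch only: struct mock_mii_bus is a hypothetical stand-in for the kernel's struct mii_bus, the register file is a plain array instead of real hardware, and returning a negative value on error is an assumption of this example.

```c
#include <stdint.h>

#define MOCK_PHYS 32 /* addresses available on the mock bus */
#define MOCK_REGS 32 /* registers per PHY in this mock */

/* Hypothetical stand-in for struct mii_bus: one register file per PHY. */
struct mock_mii_bus {
	uint16_t regs[MOCK_PHYS][MOCK_REGS];
};

static int mock_in_range(int mii_id, int regnum)
{
	return mii_id >= 0 && mii_id < MOCK_PHYS &&
	       regnum >= 0 && regnum < MOCK_REGS;
}

/* Same parameter order as the write prototype above; returns 0 on success. */
static int mock_write(struct mock_mii_bus *bus, int mii_id, int regnum,
		      uint16_t value)
{
	if (!mock_in_range(mii_id, regnum))
		return -1;
	bus->regs[mii_id][regnum] = value;
	return 0;
}

/* Same parameter order as the read prototype above; returns the value. */
static int mock_read(struct mock_mii_bus *bus, int mii_id, int regnum)
{
	if (!mock_in_range(mii_id, regnum))
		return -1;
	return bus->regs[mii_id][regnum];
}
```

Here mii_id selects the PHY address on the bus and regnum the register, as in the description above; a real bus driver would perform the MDIO transaction at these points and could block while waiting for it to complete.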
+
+(RG)MII/electrical interface considerations
+===========================================
+
+The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+electrical signal interface using a synchronous 125 MHz clock signal and several
+data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+sink) have enough setup and hold times to sample the data lines correctly. The
+PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+the PHY driver, and optionally the MAC driver, implement the required delay. The
+values of phy_interface_t must be understood from the perspective of the PHY
+device itself, leading to the following:
+
+* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+ internal delay by itself; it assumes that either the Ethernet MAC (if capable)
+ or the PCB traces insert the correct 1.5-2ns delay
+
+* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+ for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+ for the receive data lines (RXD[3:0]) processed by the PHY device
+
+* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+ both transmit AND receive data lines from/to the PHY device
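+
+The four modes above can be summarised as a small lookup. This is a userspace
+sketch only: enum rgmii_mode, struct rgmii_delays and phy_side_delays() are
+hypothetical names standing in for the PHY_INTERFACE_MODE_RGMII* values, and
+the function merely restates which delays the PHY is expected to insert.
+

```c
#include <stdbool.h>

/* Hypothetical local stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode { RGMII, RGMII_ID, RGMII_RXID, RGMII_TXID };

/* Which internal delays the PHY itself is expected to add. */
struct rgmii_delays {
    bool rx; /* delay on the receive data lines (RXD[3:0]) */
    bool tx; /* delay on the transmit data lines (TXD[3:0]) */
};

static struct rgmii_delays phy_side_delays(enum rgmii_mode mode)
{
    struct rgmii_delays d = { false, false };

    switch (mode) {
    case RGMII_ID:   /* PHY delays both directions */
        d.rx = true;
        d.tx = true;
        break;
    case RGMII_RXID: /* PHY delays receive only */
        d.rx = true;
        break;
    case RGMII_TXID: /* PHY delays transmit only */
        d.tx = true;
        break;
    case RGMII:      /* PHY adds nothing; MAC or PCB must do it */
        break;
    }
    return d;
}
```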
+
+Whenever possible, use the PHY side RGMII delay for these reasons:
+
+* PHY devices may offer sub-nanosecond granularity in how they allow a
+ receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+ precision may be required to account for differences in PCB trace lengths
+
+* PHY devices are typically qualified for a large range of applications
+ (industrial, medical, automotive...), and they provide a constant and
+ reliable delay across temperature/pressure/voltage ranges
+
+* Because PHY device drivers in PHYLIB are reusable by nature, a driver that
+ can correctly configure a specified delay enables more designs with similar
+ delay requirements to operate correctly
+
+For cases where the PHY is not capable of providing this delay, but the
+Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+configured correctly in order to provide the required transmit and/or receive
+side delay from the perspective of the PHY device. Conversely, if the Ethernet
+MAC driver looks at the phy_interface_t value, for any other mode but
+PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+disabled.
+
+In case neither the Ethernet MAC, nor the PHY are capable of providing the
+required delays, as defined per the RGMII standard, several options may be
+available:
+
+* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+ set of pins' strength, delays, and voltage; and it may be a suitable
+ option to insert the expected 2ns RGMII delay.
+
+* Modifying the PCB design to include a fixed delay (e.g: using a specifically
+ designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+-----------------------------------------
+
+When there is an RGMII delay mismatch between the Ethernet MAC and the PHY, this
+will most likely result in the clock and data line signals being unstable when
+the PHY or MAC take a snapshot of these signals to translate them into logical
+1 or 0 states and reconstruct the data being transmitted/received. Typical
+symptoms include:
+
+* Transmission/reception partially works, and there is frequent or occasional
+ packet loss observed
+
+* Ethernet MAC may report some or all packets ingressing with an FCS/CRC error,
+ or just discard them all
+
+* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+ (since there is enough setup/hold time in that case)
+
+Connecting to a PHY
+===================
+
+Sometime during startup, the network driver needs to establish a connection
+between the PHY device, and the network device. At this time, the PHY's bus
+and drivers need to all have been loaded, so it is ready for the connection.
+At this point, there are several ways to connect to the PHY:
+
+#. The PAL handles everything, and only calls the network driver when
+ the link state changes, so it can react.
+
+#. The PAL handles everything except interrupts (usually because the
+ controller has the interrupt registers).
+
+#. The PAL handles everything, but checks in with the driver every second,
+ allowing the network driver to react first to any changes before the PAL
+ does.
+
+#. The PAL serves only as a library of functions, with the network device
+ manually calling functions to update status, and configure the PHY.
+
+
+Letting the PHY Abstraction Layer do Everything
+===============================================
+
+If you choose option 1 (the hope is that every driver can, though the PAL aims
+to remain useful to drivers that can't), connecting to the PHY is simple:
+
+First, you need a function to react to changes in the link state. This
+function follows this protocol::
+
+ static void adjust_link(struct net_device *dev);
+
+Next, you need to know the device name of the PHY connected to this device.
+The name will look something like, "0:00", where the first number is the
+bus id, and the second is the PHY's address on that bus. Typically,
+the bus is responsible for making its ID unique.
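+
+A minimal userspace sketch of how such a name could be built is shown below.
+format_phy_name() is a hypothetical helper, and the "%s:%02x" format (bus id,
+then a two-digit hexadecimal address) is an assumption modelled on the "0:00"
+example above.
+

```c
#include <stdio.h>

/*
 * Hypothetical helper: joins a bus id and a PHY address into a name of the
 * form "<bus id>:<two hex digits>". Returns what snprintf() returns.
 */
static int format_phy_name(char *buf, size_t len, const char *bus_id,
                           unsigned int addr)
{
    return snprintf(buf, len, "%s:%02x", bus_id, addr);
}
```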
+
+Now, to connect, just call this function::
+
+ phydev = phy_connect(dev, phy_name, &adjust_link, interface);
+
+*phydev* is a pointer to the phy_device structure which represents the PHY.
+If phy_connect is successful, it will return the pointer. dev, here, is the
+pointer to your net_device. Once done, this function will have started the
+PHY's software state machine, and registered for the PHY's interrupt, if it
+has one. The phydev structure will be populated with information about the
+current state, though the PHY will not yet be truly operational at this
+point.
+
+PHY-specific flags should be set in phydev->dev_flags prior to the call
+to phy_connect() such that the underlying PHY driver can check for flags
+and perform specific operations based on them.
+This is useful if the system has put hardware restrictions on
+the PHY/controller, of which the PHY needs to be aware.
+
+*interface* is a phy_interface_t value which specifies the connection type
+used between the controller and the PHY. Examples are GMII, MII,
+RGMII, and SGMII. For a full list, see include/linux/phy.h.
+
+Now just make sure that phydev->supported and phydev->advertising have any
+values pruned from them which don't make sense for your controller (a 10/100
+controller may be connected to a gigabit capable PHY, so you would need to
+mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
+for these bitfields. Note that you should not SET any bits, except the
+SUPPORTED_Pause and SUPPORTED_Asym_Pause bits (see below), or the PHY may get
+put into an unsupported state.
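+
+The pruning step can be sketched as a plain bit mask operation. This is a
+userspace illustration that assumes the legacy 32-bit bitfield described
+above; the bit positions are copied here for illustration only (the
+definitions in include/linux/ethtool.h are authoritative), and
+prune_for_fast_ethernet() is a hypothetical helper name.
+

```c
#include <stdint.h>

/* Illustrative copies; see include/linux/ethtool.h for the real ones. */
#define SUPPORTED_10baseT_Half   (1u << 0)
#define SUPPORTED_10baseT_Full   (1u << 1)
#define SUPPORTED_100baseT_Half  (1u << 2)
#define SUPPORTED_100baseT_Full  (1u << 3)
#define SUPPORTED_1000baseT_Half (1u << 4)
#define SUPPORTED_1000baseT_Full (1u << 5)
#define SUPPORTED_Pause          (1u << 13)

/* Hypothetical helper: clear the gigabit modes a 10/100 MAC cannot use. */
static uint32_t prune_for_fast_ethernet(uint32_t supported)
{
    return supported & ~(SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full);
}
```

+The helper only clears bits, in line with the advice not to set anything other
+than the pause bits.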
+
+Lastly, once the controller is ready to handle network traffic, you call
+phy_start(phydev). This tells the PAL that you are ready, and configures the
+PHY to connect to the network. If the MAC interrupt of your network driver
+also handles PHY status changes, just set phydev->irq to PHY_IGNORE_INTERRUPT
+before you call phy_start and use phy_mac_interrupt() from the network
+driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL.
+phy_start() enables the PHY interrupts (if applicable) and starts the
+phylib state machine.
+
+When you want to disconnect from the network (even if just briefly), you call
+phy_stop(phydev). This function also stops the phylib state machine and
+disables PHY interrupts.
+
+Pause frames / flow control
+===========================
+
+The PHY does not participate directly in flow control/pause frames except by
+making sure that the SUPPORTED_Pause and SUPPORTED_Asym_Pause bits are set in
+MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+controller supports such a thing. Since flow control/pause frames generation
+involves the Ethernet MAC driver, it is recommended that this driver takes care
+of properly indicating advertisement and support for such features by setting
+the SUPPORTED_Pause and SUPPORTED_Asym_Pause bits accordingly. This can be done
+either before or after phy_connect() and/or as a result of implementing the
+ethtool::set_pauseparam feature.
+
+
+Keeping Close Tabs on the PAL
+=============================
+
+It is possible that the PAL's built-in state machine needs a little help to
+keep your network device and the PHY properly in sync. If so, you can
+register a helper function when connecting to the PHY, which will be called
+every second before the state machine reacts to any changes. To do this, you
+need to manually call phy_attach() and phy_prepare_link(), and then call
+phy_start_machine() with the second argument set to point to your special
+handler.
+
+Currently there are no examples of how to use this functionality, and testing
+on it has been limited because the author does not have any drivers which use
+it (they all use option 1). So Caveat Emptor.
+
+Doing it all yourself
+=====================
+
+There's a remote chance that the PAL's built-in state machine cannot track
+the complex interactions between the PHY and your network device. If this is
+so, you can simply call phy_attach(), and not call phy_start_machine or
+phy_prepare_link(). This will mean that phydev->state is entirely yours to
+handle (phy_start and phy_stop toggle between some of the states, so you
+might need to avoid them).
+
+An effort has been made to make sure that useful functionality can be
+accessed without the state-machine running, and most of these functions are
+descended from functions which did not interact with a complex state-machine.
+However, again, no effort has been made so far to test running without the
+state machine, so tryer beware.
+
+Here is a brief rundown of the functions::
+
+ int phy_read(struct phy_device *phydev, u16 regnum);
+ int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
+
+Simple read/write primitives. They invoke the bus's read/write function
+pointers.
+::
+
+ void phy_print_status(struct phy_device *phydev);
+
+A convenience function to print out the PHY status neatly.
+::
+
+ void phy_request_interrupt(struct phy_device *phydev);
+
+Requests the IRQ for the PHY interrupts.
+::
+
+ struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
+ phy_interface_t interface);
+
+Attaches a network device to a particular PHY, binding the PHY to a generic
+driver if none was found during bus initialization.
+::
+
+ int phy_start_aneg(struct phy_device *phydev);
+
+Using variables inside the phydev structure, either configures advertising
+and resets autonegotiation, or disables autonegotiation, and configures
+forced settings.
+::
+
+ static inline int phy_read_status(struct phy_device *phydev);
+
+Fills the phydev structure with up-to-date information about the current
+settings in the PHY.
+::
+
+ int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
+
+Ethtool convenience functions.
+::
+
+ int phy_mii_ioctl(struct phy_device *phydev,
+ struct mii_ioctl_data *mii_data, int cmd);
+
+The MII ioctl. Note that this function will completely screw up the state
+machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
+use this only to write registers which are not standard, and don't set off
+a renegotiation.
+
+PHY Device Drivers
+==================
+
+With the PHY Abstraction Layer, adding support for new PHYs is
+quite easy. In some cases, no work is required at all! However,
+many PHYs require a little hand-holding to get up-and-running.
+
+Generic PHY driver
+------------------
+
+If the desired PHY doesn't have any errata, quirks, or special
+features you want to support, then it may be best to not add
+support, and let the PHY Abstraction Layer's Generic PHY Driver
+do all of the work.
+
+Writing a PHY driver
+--------------------
+
+If you do need to write a PHY driver, the first thing to do is
+make sure it can be matched with an appropriate PHY device.
+This is done during bus initialization by reading the device's
+UID (stored in registers 2 and 3), then comparing it to each
+driver's phy_id field by ANDing it with each driver's
+phy_id_mask field. Also, it needs a name. Here's an example::
+
+ static struct phy_driver dm9161_driver = {
+ .phy_id = 0x0181b880,
+ .name = "Davicom DM9161E",
+ .phy_id_mask = 0x0ffffff0,
+ ...
+ };
+
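+The masked comparison described above can be sketched in userspace as follows.
+struct id_match and phy_id_matches() are hypothetical names used only for this
+illustration; the values are the ones from the Davicom example.
+

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a driver entry: just the two matching fields. */
struct id_match {
    uint32_t phy_id;
    uint32_t phy_id_mask;
};

/* True when the device UID equals the driver's phy_id under the mask. */
static bool phy_id_matches(const struct id_match *drv, uint32_t dev_uid)
{
    return (dev_uid & drv->phy_id_mask) == (drv->phy_id & drv->phy_id_mask);
}
```

+With a mask of 0x0ffffff0, the low nibble (typically a revision number) and
+the top nibble are ignored, so one entry covers several revisions of a part.
+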
+Next, you need to specify what features (speed, duplex, autoneg,
+etc) your PHY device and driver support. Most PHYs support
+PHY_BASIC_FEATURES, but you can look in include/mii.h for other
+features.
+
+Each driver consists of a number of function pointers, documented
+in include/linux/phy.h under the phy_driver structure.
+
+Of these, only config_aneg and read_status are required to be
+assigned by the driver code. The rest are optional. Also, it is
+preferred to use the generic phy driver's versions of these two
+functions if at all possible: genphy_read_status and
+genphy_config_aneg. If this is not possible, it is likely that
+you only need to perform some actions before and after invoking
+these functions, and so your functions will wrap the generic
+ones.
+
+Feel free to look at the Marvell, Cicada, and Davicom drivers in
+drivers/net/phy/ for examples (the lxt and qsemi drivers have
+not been tested as of this writing).
+
+The PHY's MMD register accesses are handled by the PAL framework
+by default, but can be overridden by a specific PHY driver if
+required. This could be the case if a PHY was released for
+manufacturing before the MMD PHY register definitions were
+standardized by the IEEE. Most modern PHYs will be able to use
+the generic PAL framework for accessing the PHY's MMD registers.
+An example of such usage is for Energy Efficient Ethernet support,
+implemented in the PAL. This support uses the PAL to access MMD
+registers for EEE query and configuration if the PHY supports
+the IEEE standard access mechanisms, or can use the PHY's specific
+access interfaces if overridden by the specific PHY driver. See
+the Micrel driver in drivers/net/phy/ for an example of how this
+can be implemented.
+
+Board Fixups
+============
+
+Sometimes the specific interaction between the platform and the PHY requires
+special handling. For instance, to change where the PHY's clock input is,
+or to add a delay to account for latency issues in the data path. In order
+to support such contingencies, the PHY Layer allows platform code to register
+fixups to be run when the PHY is brought up (or subsequently reset).
+
+When the PHY Layer brings up a PHY it checks to see if there are any fixups
+registered for it, matching based on UID (contained in the PHY device's phy_id
+field) and the bus identifier (contained in phydev->dev.bus_id). Both must
+match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
+wildcards for the bus ID and UID, respectively.
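+
+The matching rule can be sketched in userspace as follows. struct fixup,
+fixup_applies(), ANY_BUS_ID and ANY_UID are hypothetical names; the two
+wildcard values are placeholders standing in for PHY_ANY_ID and PHY_ANY_UID
+and should not be read as the kernel's definitions.
+

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ANY_BUS_ID "MATCH ANY PHY" /* placeholder for PHY_ANY_ID */
#define ANY_UID    0xffffffffu     /* placeholder for PHY_ANY_UID */

/* Hypothetical fixup record holding only the matching criteria. */
struct fixup {
    const char *bus_id;
    uint32_t uid;
    uint32_t uid_mask;
};

/* Both criteria must match, unless the fixup uses a wildcard for one. */
static bool fixup_applies(const struct fixup *f, const char *bus_id,
                          uint32_t uid)
{
    bool bus_ok = strcmp(f->bus_id, ANY_BUS_ID) == 0 ||
                  strcmp(f->bus_id, bus_id) == 0;
    bool uid_ok = f->uid == ANY_UID ||
                  (f->uid & f->uid_mask) == (uid & f->uid_mask);

    return bus_ok && uid_ok;
}
```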
+
+When a match is found, the PHY layer will invoke the run function associated
+with the fixup. This function is passed a pointer to the phy_device of
+interest. It should therefore only operate on that PHY.
+
+The platform code can either register the fixup using phy_register_fixup()::
+
+ int phy_register_fixup(const char *phy_id,
+ u32 phy_uid, u32 phy_uid_mask,
+ int (*run)(struct phy_device *));
+
+Or using one of the two stubs, phy_register_fixup_for_uid() and
+phy_register_fixup_for_id()::
+
+ int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
+ int (*run)(struct phy_device *));
+ int phy_register_fixup_for_id(const char *phy_id,
+ int (*run)(struct phy_device *));
+
+The stubs set one of the two matching criteria, and set the other one to
+match anything.
+
+When phy_register_fixup() or \*_for_uid()/\*_for_id() is called from a module,
+the fixup must be unregistered, and its allocated memory freed, on unload.
+
+Call one of the following functions before unloading the module::
+
+ int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
+ int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
+ int phy_unregister_fixup_for_id(const char *phy_id);
+
+Standards
+=========
+
+IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+RGMII v1.3:
+http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+RGMII v2.0:
+http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
deleted file mode 100644
index bdec0f700bc1..000000000000
--- a/Documentation/networking/phy.txt
+++ /dev/null
@@ -1,427 +0,0 @@
-
--------
-PHY Abstraction Layer
-(Updated 2008-04-08)
-
-Purpose
-
- Most network devices consist of set of registers which provide an interface
- to a MAC layer, which communicates with the physical connection through a
- PHY. The PHY concerns itself with negotiating link parameters with the link
- partner on the other side of the network connection (typically, an ethernet
- cable), and provides a register interface to allow drivers to determine what
- settings were chosen, and to configure what settings are allowed.
-
- While these devices are distinct from the network devices, and conform to a
- standard layout for the registers, it has been common practice to integrate
- the PHY management code with the network driver. This has resulted in large
- amounts of redundant code. Also, on embedded systems with multiple (and
- sometimes quite different) ethernet controllers connected to the same
- management bus, it is difficult to ensure safe use of the bus.
-
- Since the PHYs are devices, and the management busses through which they are
- accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
- In doing so, it has these goals:
-
- 1) Increase code-reuse
- 2) Increase overall code-maintainability
- 3) Speed development time for new network drivers, and for new systems
-
- Basically, this layer is meant to provide an interface to PHY devices which
- allows network driver writers to write as little code as possible, while
- still providing a full feature set.
-
-The MDIO bus
-
- Most network devices are connected to a PHY by means of a management bus.
- Different devices use different busses (though some share common interfaces).
- In order to take advantage of the PAL, each bus interface needs to be
- registered as a distinct device.
-
- 1) read and write functions must be implemented. Their prototypes are:
-
- int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
- int read(struct mii_bus *bus, int mii_id, int regnum);
-
- mii_id is the address on the bus for the PHY, and regnum is the register
- number. These functions are guaranteed not to be called from interrupt
- time, so it is safe for them to block, waiting for an interrupt to signal
- the operation is complete
-
- 2) A reset function is optional. This is used to return the bus to an
- initialized state.
-
- 3) A probe function is needed. This function should set up anything the bus
- driver needs, setup the mii_bus structure, and register with the PAL using
- mdiobus_register. Similarly, there's a remove function to undo all of
- that (use mdiobus_unregister).
-
- 4) Like any driver, the device_driver structure must be configured, and init
- exit functions are used to register the driver.
-
- 5) The bus must also be declared somewhere as a device, and registered.
-
- As an example for how one driver implemented an mdio bus driver, see
- drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
- for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
-
-(RG)MII/electrical interface considerations
-
- The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
- electrical signal interface using a synchronous 125Mhz clock signal and several
- data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
- between the clock line (RXC or TXC) and the data lines to let the PHY (clock
- sink) have enough setup and hold times to sample the data lines correctly. The
- PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
- the PHY driver and optionally the MAC driver, implement the required delay. The
- values of phy_interface_t must be understood from the perspective of the PHY
- device itself, leading to the following:
-
- * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
- internal delay by itself, it assumes that either the Ethernet MAC (if capable
- or the PCB traces) insert the correct 1.5-2ns delay
-
- * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
- for the transmit data lines (TXD[3:0]) processed by the PHY device
-
- * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
- for the receive data lines (RXD[3:0]) processed by the PHY device
-
- * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
- both transmit AND receive data lines from/to the PHY device
-
- Whenever possible, use the PHY side RGMII delay for these reasons:
-
- * PHY devices may offer sub-nanosecond granularity in how they allow a
- receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
- precision may be required to account for differences in PCB trace lengths
-
- * PHY devices are typically qualified for a large range of applications
- (industrial, medical, automotive...), and they provide a constant and
- reliable delay across temperature/pressure/voltage ranges
-
- * PHY device drivers in PHYLIB being reusable by nature, being able to
- configure correctly a specified delay enables more designs with similar delay
- requirements to be operate correctly
-
- For cases where the PHY is not capable of providing this delay, but the
- Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
- should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
- configured correctly in order to provide the required transmit and/or receive
- side delay from the perspective of the PHY device. Conversely, if the Ethernet
- MAC driver looks at the phy_interface_t value, for any other mode but
- PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
- disabled.
-
- In case neither the Ethernet MAC, nor the PHY are capable of providing the
- required delays, as defined per the RGMII standard, several options may be
- available:
-
- * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
- set of pins'strength, delays, and voltage; and it may be a suitable
- option to insert the expected 2ns RGMII delay.
-
- * Modifying the PCB design to include a fixed delay (e.g: using a specifically
- designed serpentine), which may not require software configuration at all.
-
-Common problems with RGMII delay mismatch
-
- When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
- will most likely result in the clock and data line signals to be unstable when
- the PHY or MAC take a snapshot of these signals to translate them into logical
- 1 or 0 states and reconstruct the data being transmitted/received. Typical
- symptoms include:
-
- * Transmission/reception partially works, and there is frequent or occasional
- packet loss observed
-
- * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
- or just discard them all
-
- * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
- (since there is enough setup/hold time in that case)
-
-
-Connecting to a PHY
-
- Sometime during startup, the network driver needs to establish a connection
- between the PHY device, and the network device. At this time, the PHY's bus
- and drivers need to all have been loaded, so it is ready for the connection.
- At this point, there are several ways to connect to the PHY:
-
- 1) The PAL handles everything, and only calls the network driver when
- the link state changes, so it can react.
-
- 2) The PAL handles everything except interrupts (usually because the
- controller has the interrupt registers).
-
- 3) The PAL handles everything, but checks in with the driver every second,
- allowing the network driver to react first to any changes before the PAL
- does.
-
- 4) The PAL serves only as a library of functions, with the network device
- manually calling functions to update status, and configure the PHY
-
-
-Letting the PHY Abstraction Layer do Everything
-
- If you choose option 1 (The hope is that every driver can, but to still be
- useful to drivers that can't), connecting to the PHY is simple:
-
- First, you need a function to react to changes in the link state. This
- function follows this protocol:
-
- static void adjust_link(struct net_device *dev);
-
- Next, you need to know the device name of the PHY connected to this device.
- The name will look something like, "0:00", where the first number is the
- bus id, and the second is the PHY's address on that bus. Typically,
- the bus is responsible for making its ID unique.
-
- Now, to connect, just call this function:
-
- phydev = phy_connect(dev, phy_name, &adjust_link, interface);
-
- phydev is a pointer to the phy_device structure which represents the PHY. If
- phy_connect is successful, it will return the pointer. dev, here, is the
- pointer to your net_device. Once done, this function will have started the
- PHY's software state machine, and registered for the PHY's interrupt, if it
- has one. The phydev structure will be populated with information about the
- current state, though the PHY will not yet be truly operational at this
- point.
-
- PHY-specific flags should be set in phydev->dev_flags prior to the call
- to phy_connect() such that the underlying PHY driver can check for flags
- and perform specific operations based on them.
- This is useful if the system has put hardware restrictions on
- the PHY/controller, of which the PHY needs to be aware.
-
- interface is a u32 which specifies the connection type used
- between the controller and the PHY. Examples are GMII, MII,
- RGMII, and SGMII. For a full list, see include/linux/phy.h
-
- Now just make sure that phydev->supported and phydev->advertising have any
- values pruned from them which don't make sense for your controller (a 10/100
- controller may be connected to a gigabit capable PHY, so you would need to
- mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, except the
- SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
- put into an unsupported state.
-
- Lastly, once the controller is ready to handle network traffic, you call
- phy_start(phydev). This tells the PAL that you are ready, and configures the
- PHY to connect to the network. If you want to handle your own interrupts,
- just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start.
- Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL.
-
- When you want to disconnect from the network (even if just briefly), you call
- phy_stop(phydev).
-
-Pause frames / flow control
-
- The PHY does not participate directly in flow control/pause frames except by
- making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
- MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
- controller supports such a thing. Since flow control/pause frames generation
- involves the Ethernet MAC driver, it is recommended that this driver takes care
- of properly indicating advertisement and support for such features by setting
- the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
- either before or after phy_connect() and/or as a result of implementing the
- ethtool::set_pauseparam feature.
-
-
-Keeping Close Tabs on the PAL
-
- It is possible that the PAL's built-in state machine needs a little help to
- keep your network device and the PHY properly in sync. If so, you can
- register a helper function when connecting to the PHY, which will be called
- every second before the state machine reacts to any changes. To do this, you
- need to manually call phy_attach() and phy_prepare_link(), and then call
- phy_start_machine() with the second argument set to point to your special
- handler.
-
- Currently there are no examples of how to use this functionality, and testing
- on it has been limited because the author does not have any drivers which use
- it (they all use option 1). So Caveat Emptor.
-
-Doing it all yourself
-
- There's a remote chance that the PAL's built-in state machine cannot track
- the complex interactions between the PHY and your network device. If this is
- so, you can simply call phy_attach(), and not call phy_start_machine or
- phy_prepare_link(). This will mean that phydev->state is entirely yours to
- handle (phy_start and phy_stop toggle between some of the states, so you
- might need to avoid them).
-
- An effort has been made to make sure that useful functionality can be
- accessed without the state-machine running, and most of these functions are
- descended from functions which did not interact with a complex state-machine.
- However, again, no effort has been made so far to test running without the
- state machine, so tryer beware.
-
- Here is a brief rundown of the functions:
-
- int phy_read(struct phy_device *phydev, u16 regnum);
- int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
-
- Simple read/write primitives. They invoke the bus's read/write function
- pointers.
-
- void phy_print_status(struct phy_device *phydev);
-
- A convenience function to print out the PHY status neatly.
-
- int phy_start_interrupts(struct phy_device *phydev);
- int phy_stop_interrupts(struct phy_device *phydev);
-
- Requests the IRQ for the PHY interrupts, then enables them for
- start, or disables then frees them for stop.
-
- struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
- phy_interface_t interface);
-
- Attaches a network device to a particular PHY, binding the PHY to a generic
- driver if none was found during bus initialization.
-
- int phy_start_aneg(struct phy_device *phydev);
-
- Using variables inside the phydev structure, either configures advertising
- and resets autonegotiation, or disables autonegotiation, and configures
- forced settings.
-
- static inline int phy_read_status(struct phy_device *phydev);
-
- Fills the phydev structure with up-to-date information about the current
- settings in the PHY.
-
- int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
-
- Ethtool convenience functions.
-
- int phy_mii_ioctl(struct phy_device *phydev,
- struct mii_ioctl_data *mii_data, int cmd);
-
- The MII ioctl. Note that this function will completely screw up the state
- machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
- use this only to write registers which are not standard, and don't set off
- a renegotiation.
-
-
-PHY Device Drivers
-
- With the PHY Abstraction Layer, adding support for new PHYs is
- quite easy. In some cases, no work is required at all! However,
- many PHYs require a little hand-holding to get up-and-running.
-
-Generic PHY driver
-
- If the desired PHY doesn't have any errata, quirks, or special
- features you want to support, then it may be best to not add
- support, and let the PHY Abstraction Layer's Generic PHY Driver
- do all of the work.
-
-Writing a PHY driver
-
- If you do need to write a PHY driver, the first thing to do is
- make sure it can be matched with an appropriate PHY device.
- This is done during bus initialization by reading the device's
- UID (stored in registers 2 and 3), then comparing it to each
- driver's phy_id field by ANDing it with each driver's
- phy_id_mask field. Also, it needs a name. Here's an example:
-
- static struct phy_driver dm9161_driver = {
- .phy_id = 0x0181b880,
- .name = "Davicom DM9161E",
- .phy_id_mask = 0x0ffffff0,
- ...
- }
-
- Next, you need to specify what features (speed, duplex, autoneg,
- etc) your PHY device and driver support. Most PHYs support
- PHY_BASIC_FEATURES, but you can look in include/mii.h for other
- features.
-
- Each driver consists of a number of function pointers, documented
- in include/linux/phy.h under the phy_driver structure.
-
- Of these, only config_aneg and read_status are required to be
- assigned by the driver code. The rest are optional. Also, it is
- preferred to use the generic phy driver's versions of these two
- functions if at all possible: genphy_read_status and
- genphy_config_aneg. If this is not possible, it is likely that
- you only need to perform some actions before and after invoking
- these functions, and so your functions will wrap the generic
- ones.
-
- Feel free to look at the Marvell, Cicada, and Davicom drivers in
- drivers/net/phy/ for examples (the lxt and qsemi drivers have
- not been tested as of this writing).
-
- The PHY's MMD register accesses are handled by the PAL framework
- by default, but can be overridden by a specific PHY driver if
- required. This could be the case if a PHY was released for
- manufacturing before the MMD PHY register definitions were
- standardized by the IEEE. Most modern PHYs will be able to use
- the generic PAL framework for accessing the PHY's MMD registers.
- An example of such usage is for Energy Efficient Ethernet support,
- implemented in the PAL. This support uses the PAL to access MMD
- registers for EEE query and configuration if the PHY supports
- the IEEE standard access mechanisms, or can use the PHY's specific
- access interfaces if overridden by the specific PHY driver. See
- the Micrel driver in drivers/net/phy/ for an example of how this
- can be implemented.
-
-Board Fixups
-
- Sometimes the specific interaction between the platform and the PHY requires
- special handling. For instance, to change where the PHY's clock input is,
- or to add a delay to account for latency issues in the data path. In order
- to support such contingencies, the PHY Layer allows platform code to register
- fixups to be run when the PHY is brought up (or subsequently reset).
-
- When the PHY Layer brings up a PHY it checks to see if there are any fixups
- registered for it, matching based on UID (contained in the PHY device's phy_id
- field) and the bus identifier (contained in phydev->dev.bus_id). Both must
- match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
- wildcards for the bus ID and UID, respectively.
-
- When a match is found, the PHY layer will invoke the run function associated
- with the fixup. This function is passed a pointer to the phy_device of
- interest. It should therefore only operate on that PHY.
-
- The platform code can either register the fixup using phy_register_fixup():
-
- int phy_register_fixup(const char *phy_id,
- u32 phy_uid, u32 phy_uid_mask,
- int (*run)(struct phy_device *));
-
- Or using one of the two stubs, phy_register_fixup_for_uid() and
- phy_register_fixup_for_id():
-
- int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
- int (*run)(struct phy_device *));
- int phy_register_fixup_for_id(const char *phy_id,
- int (*run)(struct phy_device *));
-
- The stubs set one of the two matching criteria, and set the other one to
- match anything.
-
- When phy_register_fixup() or *_for_uid()/*_for_id() is called from a module,
- the module must unregister the fixup and free the allocated memory.
-
- Call one of the following functions before unloading the module.
-
- int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
- int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
- int phy_unregister_fixup_for_id(const char *phy_id);
-
-Standards
-
- IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
- http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
-
- RGMII v1.3:
- http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
-
- RGMII v2.0:
- http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
new file mode 100644
index 000000000000..5bd26cb07244
--- /dev/null
+++ b/Documentation/networking/sfp-phylink.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+phylink
+=======
+
+Overview
+========
+
+phylink is a mechanism to support hot-pluggable networking modules
+without needing to re-initialise the adapter on hot-plug events.
+
+phylink supports conventional phylib-based setups, fixed link setups
+and SFP (Small Form-factor Pluggable) modules at present.
+
+Modes of operation
+==================
+
+phylink has several modes of operation, which depend on the firmware
+settings.
+
+1. PHY mode
+
+ In PHY mode, we use phylib to read the current link settings from
+ the PHY, and pass them to the MAC driver. We expect the MAC driver
+ to configure exactly the modes that are specified without any
+ negotiation being enabled on the link.
+
+2. Fixed mode
+
+ Fixed mode is the same as PHY mode as far as the MAC driver is
+ concerned.
+
+3. In-band mode
+
+ In-band mode is used with 802.3z, SGMII and similar interface modes,
+ and we are expecting to use and honor the in-band negotiation or
+ control word sent across the serdes channel.
+
+For example, this means that:
+
+.. code-block:: none
+
+ &eth {
+ phy = <&phy>;
+ phy-mode = "sgmii";
+ };
+
+does not use in-band SGMII signalling. The PHY is expected to follow
+exactly the settings given to it in its :c:func:`mac_config` function.
+The link should be forced up or down appropriately in the
+:c:func:`mac_link_up` and :c:func:`mac_link_down` functions.
+
+.. code-block:: none
+
+ &eth {
+ managed = "in-band-status";
+ phy = <&phy>;
+ phy-mode = "sgmii";
+ };
+
+uses in-band mode, where results from the PHY's negotiation are passed
+to the MAC through the SGMII control word, and the MAC is expected to
+acknowledge the control word. The :c:func:`mac_link_up` and
+:c:func:`mac_link_down` functions must not force the MAC side link
+up and down.
+
+Rough guide to converting a network driver to sfp/phylink
+=========================================================
+
+This guide briefly describes how to convert a network driver from
+phylib to the sfp/phylink support. Please send patches to improve
+this documentation.
+
+1. Optionally split the network driver's phylib update function into
+ three parts dealing with link-down, link-up and reconfiguring the
+ MAC settings. This can be done as a separate preparation commit.
+
+ An example of this preparation can be found in git commit fc548b991fb0.
+
+2. Replace::
+
+ select FIXED_PHY
+ select PHYLIB
+
+ with::
+
+ select PHYLINK
+
+ in the driver's Kconfig stanza.
+
+3. Add::
+
+ #include <linux/phylink.h>
+
+ to the driver's list of header files.
+
+4. Add::
+
+ struct phylink *phylink;
+
+ to the driver's private data structure. We shall refer to the
+ driver's private data pointer as ``priv`` below, and the driver's
+ private data structure as ``struct foo_priv``.
+
+5. Replace the following functions:
+
+ .. flat-table::
+ :header-rows: 1
+ :widths: 1 1
+ :stub-columns: 0
+
+ * - Original function
+ - Replacement function
+ * - phy_start(phydev)
+ - phylink_start(priv->phylink)
+ * - phy_stop(phydev)
+ - phylink_stop(priv->phylink)
+ * - phy_mii_ioctl(phydev, ifr, cmd)
+ - phylink_mii_ioctl(priv->phylink, ifr, cmd)
+ * - phy_ethtool_get_wol(phydev, wol)
+ - phylink_ethtool_get_wol(priv->phylink, wol)
+ * - phy_ethtool_set_wol(phydev, wol)
+ - phylink_ethtool_set_wol(priv->phylink, wol)
+ * - phy_disconnect(phydev)
+ - phylink_disconnect_phy(priv->phylink)
+
+ Please note that some of these functions must be called under the
+ rtnl lock, and will warn if not. This will normally be the case,
+ except if these are called from the driver suspend/resume paths.
+
+6. Add/replace ksettings get/set methods with:
+
+ .. code-block:: c
+
+ static int foo_ethtool_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_set(priv->phylink, cmd);
+ }
+
+ static int foo_ethtool_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_get(priv->phylink, cmd);
+ }
+
+7. Replace the call to::
+
+ phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface);
+
+ and associated code with a call to::
+
+ err = phylink_of_phy_connect(priv->phylink, node, flags);
+
+ For the most part, ``flags`` can be zero; these flags are passed to
+ the of_phy_attach() inside this function call if a PHY is specified
+ in the DT node ``node``.
+
+ ``node`` should be the DT node which contains the network phy property,
+ fixed link properties, and will also contain the sfp property.
+
+ The setup of fixed links should also be removed; these are handled
+ internally by phylink.
+
+ of_phy_connect() was also passed a function pointer for link updates.
+ This function is replaced by a different form of MAC updates
+ described below in (8).
+
+ Manipulation of the PHY's supported/advertised happens within phylink
+ based on the validate callback, see below in (8).
+
+ Note that the driver no longer needs to store the ``phy_interface``,
+ and also note that ``phy_interface`` becomes a dynamic property,
+ just like the speed, duplex etc. settings.
+
+ Finally, note that the MAC driver has no direct access to the PHY
+ anymore; that is because in the phylink model, the PHY can be
+ dynamic.
+
+8. Add a :c:type:`struct phylink_mac_ops <phylink_mac_ops>` instance to
+ the driver, which is a table of function pointers, and implement
+ these functions. The old link update function for
+ :c:func:`of_phy_connect` becomes three methods: :c:func:`mac_link_up`,
+ :c:func:`mac_link_down`, and :c:func:`mac_config`. If step 1 was
+ performed, then the functionality will have been split there.
+
+ It is important that if in-band negotiation is used,
+ :c:func:`mac_link_up` and :c:func:`mac_link_down` do not prevent the
+ in-band negotiation from completing, since these functions are called
+ when the in-band link state changes - otherwise the link will never
+ come up.
+
+ The :c:func:`validate` method should mask the supplied supported mask,
+ and ``state->advertising`` with the supported ethtool link modes.
+ These are the new ethtool link modes, so bitmask operations must be
+ used. For an example, see drivers/net/ethernet/marvell/mvneta.c.
+
+ The :c:func:`mac_link_state` method is used to read the link state
+ from the MAC, and report back the settings that the MAC is currently
+ using. This is particularly important for in-band negotiation
+ methods such as 1000base-X and SGMII.
+
+ The :c:func:`mac_config` method is used to update the MAC with the
+ requested state, and must avoid unnecessarily taking the link down
+ when making changes to the MAC configuration. This means the
+ function should modify the state and only take the link down when
+ absolutely necessary to change the MAC configuration. An example
+ of how to do this can be found in :c:func:`mvneta_mac_config` in
+ drivers/net/ethernet/marvell/mvneta.c.
+
+ For further information on these methods, please see the inline
+ documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`.
+
+9. Remove calls to of_parse_phandle() for the PHY,
+ of_phy_register_fixed_link() for fixed links etc. from the probe
+ function, and replace with:
+
+ .. code-block:: c
+
+ struct phylink *phylink;
+
+ phylink = phylink_create(dev, node, phy_mode, &phylink_ops);
+ if (IS_ERR(phylink)) {
+ err = PTR_ERR(phylink);
+ /* fail probe: unwind and return err */
+ }
+
+ priv->phylink = phylink;
+
+ and arrange to destroy the phylink in the probe failure path as
+ appropriate and the removal path too by calling:
+
+ .. code-block:: c
+
+ phylink_destroy(priv->phylink);
+
+10. Arrange for MAC link state interrupts to be forwarded into
+ phylink, via:
+
+ .. code-block:: c
+
+ phylink_mac_change(priv->phylink, link_is_up);
+
+ where ``link_is_up`` is true if the link is currently up or false
+ otherwise.
+
+11. Verify that the driver does not call::
+
+ netif_carrier_on()
+ netif_carrier_off()
+
+ as these will interfere with phylink's tracking of the link state,
+ and cause phylink to omit calls via the :c:func:`mac_link_up` and
+ :c:func:`mac_link_down` methods.
+
+Network drivers should call phylink_stop() and phylink_start() via their
+suspend/resume paths, which ensures that the appropriate
+:c:type:`struct phylink_mac_ops <phylink_mac_ops>` methods are called
+as necessary.
+
+For information describing the SFP cage in DT, please see the binding
+documentation in the kernel source tree
+``Documentation/devicetree/bindings/net/sff,sfp.txt``
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index fe8f741193be..52b026be028f 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -1,16 +1,17 @@
-===========
+============
SNMP counter
-===========
+============
This document explains the meaning of SNMP counters.
General IPv4 counters
-====================
+=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.
* IpInReceives
+
Defined in `RFC1213 ipInReceives`_
.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26
@@ -23,6 +24,7 @@ and so on). It indicates the number of aggregated segments after
GRO/LRO.
* IpInDelivers
+
Defined in `RFC1213 ipInDelivers`_
.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28
@@ -33,6 +35,7 @@ supported protocols will be delivered, if someone listens on the raw
socket, all valid IP packets will be delivered.
* IpOutRequests
+
Defined in `RFC1213 ipOutRequests`_
.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28
@@ -42,6 +45,7 @@ multicast packets, and would always be updated together with
IpExtOutOctets.
* IpExtInOctets and IpExtOutOctets
+
They are Linux kernel extensions, no RFC definitions. Please note,
RFC1213 indeed defines ifInOctets and ifOutOctets, but they
are different things. The ifInOctets and ifOutOctets include the MAC
@@ -49,6 +53,7 @@ layer header size but IpExtInOctets and IpExtOutOctets don't, they
only include the IP layer header and the IP layer data.
* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts
+
They indicate the number of four kinds of ECN IP packets, please refer
to `Explicit Congestion Notification`_ for more details.
@@ -60,6 +65,7 @@ for the same packet, you might find that IpInReceives count 1, but
IpExtInNoECTPkts counts 2 or more.
* IpInHdrErrors
+
Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
dropped due to the IP header error. It might happen in both IP input
and IP forward paths.
@@ -67,6 +73,7 @@ and IP forward paths.
.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpInAddrErrors
+
Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
scenarios: (1) The IP address is invalid. (2) The destination IP
address is not a local address and IP forwarding is not enabled
@@ -74,6 +81,7 @@ address is not a local address and IP forwarding is not enabled
.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInNoRoutes
+
This counter means the packet is dropped when the IP stack receives a
packet and can't find a route for it from the route table. It might
happen when IP forwarding is enabled and the destination IP address is
@@ -81,6 +89,7 @@ not a local address and there is no route for the destination IP
address.
* IpInUnknownProtos
+
Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
layer 4 protocol is unsupported by kernel. If an application is using
raw socket, kernel will always deliver the packet to the raw socket
@@ -89,10 +98,12 @@ and this counter won't be increased.
.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInTruncatedPkts
+
For IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header.
* IpInDiscards
+
Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped
in the IP receiving path and due to kernel internal reasons (e.g. not
enough memory).
@@ -100,20 +111,23 @@ enough memory).
.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutDiscards
+
Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
dropped in the IP sending path and due to kernel internal reasons.
.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutNoRoutes
+
Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
dropped in the IP sending path and no route is found for it.
.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29
ICMP counters
-============
+=============
* IcmpInMsgs and IcmpOutMsgs
+
Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_
.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
@@ -126,6 +140,7 @@ IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program.
* ICMP named types
+
| These counters include most of common ICMP types, they are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
@@ -180,6 +195,7 @@ straightforward. The 'In' counter means kernel receives such a packet
and the 'Out' counter means kernel sends such a packet.
* ICMP numeric types
+
They are IcmpMsgInType[N] and IcmpMsgOutType[N], the [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definition could be found in the `ICMP parameters`_
@@ -192,12 +208,14 @@ IcmpMsgOutType8 would increase 1. And if kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 would increase 1.
* IcmpInCsumErrors
+
This counter indicates the checksum of the ICMP packet is
wrong. Kernel verifies the checksum after updating the IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has bad checksum, the
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.
* IcmpInErrors and IcmpOutErrors
+
Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_
.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
@@ -209,7 +227,7 @@ and the sending packet path use IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too.
relationship of the ICMP counters
--------------------------------
+---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal or larger than IcmpInMsgs. When kernel
@@ -229,8 +247,9 @@ IcmpInMsgs should be less than the sum of IcmpMsgOutType[N] plus
IcmpInErrors.
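The two relations stated at the start of this section can be checked
against a snapshot of the counters. The following is only a sketch: the
helper names are hypothetical, and the caller is assumed to have already
parsed the counter values (for example from /proc/net/snmp) into arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n per-type counters, e.g. all IcmpMsgOutType[N] values. */
static uint64_t sum_counters(const uint64_t *per_type, size_t n)
{
	uint64_t total = 0;

	assert(per_type != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += per_type[i];
	return total;
}

/* Sending side: the sum of IcmpMsgOutType[N] should equal IcmpOutMsgs. */
static bool icmp_out_consistent(const uint64_t *out_types, size_t n,
				uint64_t out_msgs)
{
	return sum_counters(out_types, n) == out_msgs;
}

/*
 * Receiving side: the sum of IcmpMsgInType[N] plus IcmpInErrors should
 * be equal to or larger than IcmpInMsgs.
 */
static bool icmp_in_consistent(const uint64_t *in_types, size_t n,
			       uint64_t in_errors, uint64_t in_msgs)
{
	return sum_counters(in_types, n) + in_errors >= in_msgs;
}
```

Because the counters are not read atomically, a snapshot taken on a busy
host may fail these checks briefly; that is an assumption about the way
the values are sampled, not a statement about the kernel.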
General TCP counters
-==================
+====================
* TcpInSegs
+
Defined in `RFC1213 tcpInSegs`_
.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48
@@ -247,6 +266,7 @@ isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase 1.
* TcpOutSegs
+
Defined in `RFC1213 tcpOutSegs`_
.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48
@@ -258,6 +278,7 @@ GSO, so if a packet would be split to 2 by GSO, TcpOutSegs will
increase 2.
* TcpActiveOpens
+
Defined in `RFC1213 tcpActiveOpens`_
.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@@ -267,6 +288,7 @@ state. Every time TcpActiveOpens increases 1, TcpOutSegs should always
increase 1.
* TcpPassiveOpens
+
Defined in `RFC1213 tcpPassiveOpens`_
.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@@ -275,6 +297,7 @@ It means the TCP layer receives a SYN, replies a SYN+ACK, come into
the SYN-RCVD state.
* TcpExtTCPRcvCoalesce
+
When packets are received by the TCP layer and are not yet read by the
application, the TCP layer will try to merge them. This counter
indicates how many packets are merged in such a situation. If GRO is
@@ -282,12 +305,14 @@ enabled, lots of packets would be merged by GRO, these packets
wouldn't be counted to TcpExtTCPRcvCoalesce.
* TcpExtTCPAutoCorking
+
When sending packets, the TCP layer will try to merge small packets to
a bigger one. This counter increases by 1 for every packet merged in
such a situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/
* TcpExtTCPOrigDataSent
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::
@@ -297,6 +322,7 @@ explaination below::
more useful to track the TCP retransmission rate.
* TCPSynRetrans
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::
@@ -304,6 +330,7 @@ explaination below::
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
* TCPFastOpenActiveFail
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::
@@ -313,6 +340,7 @@ explaination below::
.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
* TcpExtListenOverflows and TcpExtListenDrops
+
When kernel receives a SYN from a client, and if the TCP accept queue
is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows.
At the same time kernel will also add 1 to TcpExtListenDrops. When a
@@ -336,17 +364,22 @@ time client replies ACK, this socket will get another chance to move
to the accept queue.
+TCP Fast Open
+=============
* TcpEstabResets
+
Defined in `RFC1213 tcpEstabResets`_.
.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
* TcpAttemptFails
+
Defined in `RFC1213 tcpAttemptFails`_.
.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
* TcpOutRsts
+
Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
the 'segments sent containing the RST flag', but in the Linux kernel, this
counter indicates the segments the kernel tried to send. The sending
@@ -354,6 +387,30 @@ process might be failed due to some errors (e.g. memory alloc failed).
.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
+* TcpExtTCPSpuriousRtxHostQueues
+
+When the TCP stack wants to retransmit a packet, and finds that packet
+is not lost in the network, but the packet is not sent yet, the TCP
+stack would give up the retransmission and update this counter. It
+might happen if a packet stays too long in a qdisc or driver
+queue.
+
+* TcpEstabResets
+
+The socket receives a RST packet in the Established or CloseWait state.
+
+* TcpExtTCPKeepAlive
+
+This counter indicates how many keepalive packets were sent. Keepalive
+is not enabled by default. A userspace program could enable it by
+setting the SO_KEEPALIVE socket option.
+
+* TcpExtTCPSpuriousRTOs
+
+The spurious retransmission timeout detected by the `F-RTO`_
+algorithm.
+
+.. _F-RTO: https://tools.ietf.org/html/rfc5682
TCP Fast Path
=============
@@ -389,20 +446,23 @@ will disable the fast path at first, and try to enable it after kernel
receives packets.
* TcpExtTCPPureAcks and TcpExtTCPHPAcks
+
If a packet has the ACK flag set and carries no data, it is a pure ACK
packet. If kernel handles it in the fast path, TcpExtTCPHPAcks will
increase by 1; if kernel handles it in the slow path, TcpExtTCPPureAcks
will increase by 1.
* TcpExtTCPHPHits
+
If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase 1.
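The rules for these three counters can be summarised as a small decision
function. This is a sketch with hypothetical names, not kernel code. It
assumes the packet has the ACK flag set, and it returns HP_NONE for a
data packet handled in the slow path, because the text above does not
name a counter for that case.

```c
#include <stdbool.h>

enum hp_counter {
	HP_NONE,		/* none of the three counters applies */
	HP_PURE_ACKS,		/* TcpExtTCPPureAcks */
	HP_ACKS,		/* TcpExtTCPHPAcks */
	HP_HITS,		/* TcpExtTCPHPHits */
};

/* Pick the counter for one received packet with the ACK flag set. */
static enum hp_counter classify_packet(bool has_data, bool fast_path)
{
	if (!has_data)	/* pure ACK */
		return fast_path ? HP_ACKS : HP_PURE_ACKS;
	return fast_path ? HP_HITS : HP_NONE;
}
```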
TCP abort
-========
+=========
* TcpExtTCPAbortOnData
+
It means TCP layer has data in flight, but needs to close the
connection. So TCP layer sends a RST to the other side, indicating the
connection is not closed gracefully. An easy way to increase this
@@ -421,11 +481,13 @@ when the application closes a connection, kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter.
* TcpExtTCPAbortOnClose
+
This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
kernel will send a RST to the other side of the TCP connection.
* TcpExtTCPAbortOnMemory
+
When an application closes a TCP connection, kernel still needs to track
the connection and let it complete the TCP disconnect process. E.g. an
app calls the close method of a socket, kernel sends fin to the other
@@ -447,10 +509,12 @@ the tcp_mem. Please refer the tcp_mem section in the `TCP man page`_:
* TcpExtTCPAbortOnTimeout
+
This counter will increase when any of the TCP timers expires. In such
a situation, kernel won't send a RST; it just gives up the connection.
* TcpExtTCPAbortOnLinger
+
When a TCP connection comes into FIN_WAIT_2 state, instead of waiting
for the fin packet from the other side, kernel could send a RST and
delete the socket immediately. This is not the default behavior of
@@ -458,6 +522,7 @@ Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let kernel follow this behavior.
* TcpExtTCPAbortFailed
+
The kernel TCP layer will send RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased.
@@ -465,7 +530,7 @@ TcpExtTCPAbortFailed will be increased.
.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50
TCP Hybrid Slow Start
-====================
+=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
@@ -481,23 +546,27 @@ relate with the Hybrid Slow Start algorithm.
.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf
* TcpExtTCPHystartTrainDetect
+
How many times the ACK train length threshold is detected.
* TcpExtTCPHystartTrainCwnd
+
The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect gives the average CWND detected by the
ACK train length.
* TcpExtTCPHystartDelayDetect
+
How many times the packet delay threshold is detected.
* TcpExtTCPHystartDelayCwnd
+
The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect gives the average CWND detected by the
packet delay.
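The averages mentioned for the two Cwnd counters are plain divisions of
a Cwnd sum by the matching Detect count. A minimal sketch follows; the
function name is hypothetical, and returning 0 when nothing has been
detected yet is a choice made here to avoid dividing by zero.

```c
#include <stdint.h>

/*
 * Average CWND at detection time, e.g. TcpExtTCPHystartTrainCwnd
 * divided by TcpExtTCPHystartTrainDetect. Integer division truncates.
 */
static uint64_t hystart_avg_cwnd(uint64_t cwnd_sum, uint64_t detect_count)
{
	if (detect_count == 0)
		return 0;	/* no detection yet, no average */
	return cwnd_sum / detect_count;
}
```

The same helper applies to TcpExtTCPHystartDelayCwnd and
TcpExtTCPHystartDelayDetect.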
TCP retransmission and congestion control
-======================================
+=========================================
The TCP protocol has two retransmission mechanisms: SACK and fast
recovery. They are exclusive with each other. When SACK is enabled,
the kernel TCP stack would use SACK, or kernel would use fast
@@ -516,12 +585,14 @@ https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf
.. _RFC6582: https://tools.ietf.org/html/rfc6582
* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery
+
When the congestion control comes into Recovery state, if sack is
used, TcpExtTCPSackRecovery increases 1, if sack is not used,
TcpExtTCPRenoRecovery increases 1. These two counters mean the TCP
stack begins to retransmit the lost packets.
* TcpExtTCPSACKReneging
+
A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit this packet. In this
situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
@@ -532,6 +603,7 @@ the RTO expires for this packet, then the sender assumes this packet
has been dropped by the receiver.
* TcpExtTCPRenoReorder
+
The reorder packet is detected by fast recovery. It would only be used
if SACK is disabled. The fast recovery algorithm detects reordering by
the duplicate ACK number. E.g., if retransmission is triggered, and
@@ -542,6 +614,7 @@ order packet. Thus the sender would find more ACks than its
expectation, and the sender knows out of order occurs.
* TcpExtTCPTSReorder
+
The reorder packet is detected when a hole is filled. E.g., assume the
sender sends packet 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which will
@@ -551,6 +624,7 @@ fill the hole), two conditions will let TcpExtTCPTSReorder increase
than the retransmission timestamp.
* TcpExtTCPSACKReorder
+
The reorder packet detected by SACK. The SACK has two methods to
detect reorder: (1) DSACK is received by the sender. It means the
sender sends the same packet more than once. And the only reason
@@ -562,6 +636,29 @@ packet yet, the sender would know packet 4 is out of order. The TCP
stack of kernel will increase TcpExtTCPSACKReorder for both of the
above scenarios.
+* TcpExtTCPSlowStartRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is 'Loss'.
+
+* TcpExtTCPFastRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is not 'Loss'.
+
+* TcpExtTCPLostRetransmit
+
+A SACK points out that a retransmission packet is lost again.
+
+* TcpExtTCPRetransFail
+
+The TCP stack tries to deliver a retransmission packet to lower layers
+but the lower layers return an error.
+
+* TcpExtTCPSynRetrans
+
+The TCP stack retransmits a SYN packet.
+
DSACK
=====
The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@@ -574,10 +671,12 @@ sender side.
.. _RFC2883 : https://tools.ietf.org/html/rfc2883
* TcpExtTCPDSACKOldSent
+
The TCP stack receives a duplicate packet which has been acked, so it
sends a DSACK to the sender.
* TcpExtTCPDSACKOfoSent
+
The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender.
@@ -586,6 +685,7 @@ The TCP stack receives a DSACK, which indicates an acknowledged
duplicate packet is received.
* TcpExtTCPDSACKOfoRecv
+
The TCP stack receives a DSACK, which indicates an out of order
duplicate packet is received.
@@ -640,23 +740,26 @@ A skb should be shifted or merged, but the TCP stack doesn't do it for
some reasons.
TCP out of order
-===============
+================
* TcpExtTCPOFOQueue
+
The TCP layer receives an out of order packet and has enough memory
to queue it.
* TcpExtTCPOFODrop
+
The TCP layer receives an out of order packet but doesn't have enough
memory, so drops it. Such packets won't be counted into
TcpExtTCPOFOQueue.
* TcpExtTCPOFOMerge
+
The received out of order packet overlaps with the previous
packet. The overlapping part will be dropped. All of TcpExtTCPOFOMerge
packets will also be counted into TcpExtTCPOFOQueue.
TCP PAWS
-=======
+========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP
timestamps. For detailed information, please refer to the `timestamp wiki`_
@@ -666,13 +769,15 @@ and the `RFC of PAWS`_.
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
* TcpExtPAWSActive
+
Packets are dropped by PAWS in Syn-Sent status.
* TcpExtPAWSEstab
+
Packets are dropped by PAWS in any status other than Syn-Sent.
TCP ACK skip
-===========
+============
In some scenarios, kernel would avoid sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When kernel decides to skip an ACK
@@ -684,6 +789,7 @@ it has no data.
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
* TcpExtTCPACKSkippedSynRecv
+
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is
waiting for an ACK. Generally, the TCP stack doesn't need to send ACK
@@ -697,6 +803,7 @@ increase TcpExtTCPACKSkippedSynRecv.
* TcpExtTCPACKSkippedPAWS
+
The ACK is skipped because the PAWS (Protection Against Wrapped Sequence
numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
or Time-Wait statuses, the skipped ACK would be counted to
@@ -705,18 +812,22 @@ TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
would be counted to TcpExtTCPACKSkippedPAWS.
* TcpExtTCPACKSkippedSeq
+
The sequence number is out of window and the timestamp passes the PAWS
check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait.
* TcpExtTCPACKSkippedFinWait2
+
The ACK is skipped in Fin-Wait-2 status, the reason would be either
PAWS check fails or the received sequence number is out of window.
* TcpExtTCPACKSkippedTimeWait
+
The ACK is skipped in Time-Wait status, the reason would be either
PAWS check failed or the received sequence number is out of window.
* TcpExtTCPACKSkippedChallenge
+
The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines
3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
@@ -729,8 +840,9 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
TCP receive window
-=================
+==================
* TcpExtTCPWantZeroWindowAdv
+
Depending on current memory usage, the TCP stack tries to set receive
window to zero. But the receive window might still be a no-zero
value. For example, if the previous window size is 10, and the TCP
@@ -738,14 +850,16 @@ stack receives 3 bytes, the current window size would be 7 even if the
window size calculated by the memory usage is zero.
* TcpExtTCPToZeroWindowAdv
+
The TCP receive window is set to zero from a no-zero value.
* TcpExtTCPFromZeroWindowAdv
+
The TCP receive window is set to no-zero value from zero.
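The TcpExtTCPWantZeroWindowAdv example above (previous window 10, 3 bytes
received, advertised window 7) can be reproduced with a small sketch. It
assumes the rule that the right edge of an already advertised window is
not moved back; the function name advertised_window() is hypothetical and
the kernel's real calculation also deals with window scaling.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the window to advertise.
 *
 * prev_win: window advertised previously
 * received: bytes received since that advertisement
 * mem_win:  window the stack would like to offer, based on memory usage
 *
 * The part of the previous window that the peer has not used yet is
 * still promised to it, so the result never drops below that amount.
 */
uint32_t advertised_window(uint32_t prev_win, uint32_t received,
			   uint32_t mem_win)
{
	uint32_t remaining = received < prev_win ? prev_win - received : 0;

	return mem_win > remaining ? mem_win : remaining;
}
```

In this model advertised_window(10, 3, 0) is 7: the stack wants a zero
window, but 7 bytes of the earlier offer are still outstanding.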
Delayed ACK
-==========
+===========
The TCP Delayed ACK is a technique which is used for reducing the
packet count in the network. For more details, please refer the
`Delayed ACK wiki`_
@@ -753,10 +867,12 @@ packet count in the network. For more details, please refer the
.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
* TcpExtDelayedACKs
+
A delayed ACK timer expires. The TCP stack will send a pure ACK packet
and exit the delayed ACK mode.
* TcpExtDelayedACKLocked
+
A delayed ACK timer expires, but the TCP stack can't send an ACK
immediately due to the socket is locked by a userspace program. The
TCP stack will send a pure ACK later (after the userspace program
@@ -765,29 +881,152 @@ TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
mode.
* TcpExtDelayedACKLost
+
It will be updated when the TCP stack receives a packet which has been
ACKed. A Delayed ACK loss might cause this issue, but it would also be
triggered by other reasons, such as a packet is duplicated in the
network.
Tail Loss Probe (TLP)
-===================
+=====================
TLP is an algorithm which is used to detect TCP packet loss. For more
details, please refer the `TLP paper`_.
.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
* TcpExtTCPLossProbes
+
A TLP probe packet is sent.
* TcpExtTCPLossProbeRecovery
+
A packet loss is detected and recovered by TLP.
+TCP Fast Open
+=============
+TCP Fast Open is a technology which allows data transfer before the
+3-way handshake completes. Please refer to the `TCP Fast Open wiki`_ for a
+general description.
+
+.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
+
+* TcpExtTCPFastOpenActive
+
+When the TCP stack receives an ACK packet in the SYN-SENT status, and
+the ACK packet acknowledges the data in the SYN packet, the TCP stack
+understands that the TFO cookie is accepted by the other side, then it
+updates this counter.
+
+* TcpExtTCPFastOpenActiveFail
+
+This counter indicates that the TCP stack initiated a TCP Fast Open,
+but it failed. This counter is updated in three scenarios: (1) the
+other side doesn't acknowledge the data in the SYN packet; (2) the SYN
+packet which carries the TFO cookie times out at least once; (3)
+after the 3-way handshake, the retransmission timeout happens
+net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
+fast open after the handshake.
+
+* TcpExtTCPFastOpenPassive
+
+This counter indicates how many times the TCP stack accepts the fast
+open request.
+
+* TcpExtTCPFastOpenPassiveFail
+
+This counter indicates how many times the TCP stack rejects the fast
+open request. The cause is either an invalid TFO cookie or an error
+that the TCP stack hits while creating the socket.
+
+* TcpExtTCPFastOpenListenOverflow
+
+When the pending fast open request number is larger than
+fastopenq->max_qlen, the TCP stack will reject the fast open request
+and update this counter. When this counter is updated, the TCP stack
+won't update TcpExtTCPFastOpenPassive or
+TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
+TCP_FASTOPEN socket option and it cannot be larger than
+net.core.somaxconn. For example:
+
+setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
+
+* TcpExtTCPFastOpenCookieReqd
+
+This counter indicates how many times a client wants to request a TFO
+cookie.
+
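For context, the TCP_FASTOPEN setsockopt() call shown under
TcpExtTCPFastOpenListenOverflow could sit in a listener set up roughly as
follows. This is a minimal sketch under stated assumptions: the helper
name tfo_listen() is hypothetical, the socket is bound to loopback on an
ephemeral port, and IPPROTO_TCP is used as the level (on Linux it has the
same value as SOL_TCP). Fast Open must also be enabled for servers through
net.ipv4.tcp_fastopen before requests are actually accepted.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP listener on 127.0.0.1 (ephemeral
 * port) whose Fast Open queue holds at most qlen pending requests.
 * Returns the listening fd, or -1 on failure.
 */
int tfo_listen(int qlen)
{
	struct sockaddr_in addr;
	int sfd;

	if (qlen < 0) {
		errno = EINVAL;
		return -1;
	}

	sfd = socket(AF_INET, SOCK_STREAM, 0);
	if (sfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a port */

	/* Set the Fast Open queue length once the socket is listening. */
	if (bind(sfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(sfd, SOMAXCONN) < 0 ||
	    setsockopt(sfd, IPPROTO_TCP, TCP_FASTOPEN,
		       &qlen, sizeof(qlen)) < 0) {
		close(sfd);
		return -1;
	}

	return sfd;
}
```

A caller would use the returned descriptor with accept() as usual; the
qlen value only bounds the number of pending Fast Open requests.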
+SYN cookies
+===========
+SYN cookies are used to mitigate SYN floods. For details, please refer
+to the `SYN cookies wiki`_.
+
+.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
+
+* TcpExtSyncookiesSent
+
+It indicates how many SYN cookies are sent.
+
+* TcpExtSyncookiesRecv
+
+How many reply packets of the SYN cookies the TCP stack receives.
+
+* TcpExtSyncookiesFailed
+
+The MSS decoded from the SYN cookie is invalid. When this counter is
+updated, the received packet won't be treated as a SYN cookie and the
+TcpExtSyncookiesRecv counter won't be updated.
+
+Challenge ACK
+=============
+For details of challenge ACK, please refer to the explanation of
+TcpExtTCPACKSkippedChallenge.
+
+* TcpExtTCPChallengeACK
+
+The number of challenge acks sent.
+
+* TcpExtTCPSYNChallenge
+
+The number of challenge acks sent in response to SYN packets. After
+updating this counter, the TCP stack might send a challenge ACK and
+update the TcpExtTCPChallengeACK counter, or it might skip sending
+the challenge and update the TcpExtTCPACKSkippedChallenge counter.
+
+prune
+=====
+When a socket is under memory pressure, the TCP stack will try to
+reclaim memory from the receiving queue and out of order queue. One of
+the reclaiming methods is 'collapse', which means allocating a big skb,
+copying the contiguous skbs to the single big skb, and freeing these
+contiguous skbs.
+
+* TcpExtPruneCalled
+
+The TCP stack tries to reclaim memory for a socket. After updating this
+counter, the TCP stack will try to collapse the out of order queue and
+the receiving queue. If the memory is still not enough, the TCP stack
+will try to discard packets from the out of order queue (and update the
+TcpExtOfoPruned counter).
+
+* TcpExtOfoPruned
+
+The TCP stack tries to discard packets from the out of order queue.
+
+* TcpExtRcvPruned
+
+After 'collapse' and discarding packets from the out of order queue, if
+the actually used memory is still larger than the max allowed memory,
+this counter will be updated. It means the 'prune' fails.
+
+* TcpExtTCPRcvCollapsed
+
+This counter indicates how many skbs are freed during 'collapse'.
+
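The TcpExt counters described above are exported by the kernel in
/proc/net/netstat as pairs of lines: a header line with the counter names
and a line with the values, both prefixed with "TcpExt:". Tools such as
nstat join the prefix and the name, so TcpExtPruneCalled appears in the
file as PruneCalled. The sketch below shows one way to look a counter up;
the helper names netstat_lookup() and read_tcpext() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a header/value line pair such as
 *   "TcpExt: PruneCalled OfoPruned ..."
 *   "TcpExt: 7 3 ..."
 * Returns 0 and stores the value in *out, or -1 if the name is absent.
 */
int netstat_lookup(const char *names, const char *values,
		   const char *counter, unsigned long long *out)
{
	assert(names && values && counter && out);

	while (*names && *values) {
		size_t nlen = strcspn(names, " \n");
		size_t vlen = strcspn(values, " \n");

		if (nlen == strlen(counter) && !strncmp(names, counter, nlen)) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}
		/* Advance both lines to their next token in lockstep. */
		names += nlen;
		values += vlen;
		names += strspn(names, " \n");
		values += strspn(values, " \n");
	}
	return -1;
}

/* Read a TcpExt counter (name without the "TcpExt" prefix) from procfs. */
int read_tcpext(const char *counter, unsigned long long *out)
{
	char names[4096], values[4096];
	FILE *f = fopen("/proc/net/netstat", "r");
	int ret = -1;

	if (!f)
		return -1;
	while (fgets(names, sizeof(names), f) &&
	       fgets(values, sizeof(values), f)) {
		if (!strncmp(names, "TcpExt:", 7)) {
			ret = netstat_lookup(names, values, counter, out);
			break;
		}
	}
	fclose(f);
	return ret;
}
```

For example, read_tcpext("PruneCalled", &v) would fetch the value that
nstat reports as TcpExtPruneCalled, as an absolute count since boot.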
examples
-=======
+========
ping test
---------
+---------
Run the ping command against the public dns server 8.8.8.8::
nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
@@ -831,7 +1070,7 @@ and its corresponding Echo Reply packet are constructed by:
So the IpExtInOctets and IpExtOutOctets are 20+16+48=84.
tcp 3-way handshake
-------------------
+-------------------
On server side, we run::
nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
@@ -873,7 +1112,7 @@ ACK, so client sent 2 packets, received 1 packet, TcpInSegs increased
1, TcpOutSegs increased 2.
TCP normal traffic
------------------
+------------------
Run nc on server::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@@ -996,7 +1235,7 @@ and the packet received from client qualified for fast path, so it
was counted into 'TcpExtTCPHPHits'.
TcpExtTCPAbortOnClose
---------------------
+---------------------
On the server side, we run below python script::
import socket
@@ -1030,7 +1269,7 @@ If we run tcpdump on the server side, we could find the server sent a
RST after we type Ctrl-C.
TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
------------------------------------------------
+---------------------------------------------------
Below is an example which let the orphan socket count be higher than
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on client::
@@ -1152,7 +1391,7 @@ FIN_WAIT_1 state finally. So we wait for a few minutes, we could find
TcpExtTCPAbortOnTimeout 10 0.0
TcpExtTCPAbortOnLinger
----------------------
+----------------------
The server side code::
nstatuser@nstat-b:~$ cat server_linger.py
@@ -1197,7 +1436,7 @@ After run client_linger.py, check the output of nstat::
TcpExtTCPAbortOnLinger 1 0.0
TcpExtTCPRcvCoalesce
--------------------
+--------------------
On the server, we run a program which listen on TCP port 9000, but
doesn't read any data::
@@ -1257,7 +1496,7 @@ the receiving queue. So the TCP layer merged the two packets, and we
could find the TcpExtTCPRcvCoalesce increased 1.
TcpExtListenOverflows and TcpExtListenDrops
-----------------------------------------
+-------------------------------------------
On server, run the nc command, listen on port 9000::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@@ -1305,7 +1544,7 @@ TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped, the client was retrying.
IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
-----------------------------------------------
+-------------------------------------------------
server A IP address: 192.168.122.250
server B IP address: 192.168.122.251
Prepare on server A, add a route to server B::
@@ -1400,7 +1639,7 @@ a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.
TcpExtTCPACKSkippedSynRecv
-------------------------
+--------------------------
In this test, we send 3 same SYN packets from client to server. The
first SYN will let server create a socket, set it to Syn-Recv status,
and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK
@@ -1448,7 +1687,7 @@ Check snmp cunter on nstat-b::
As we expected, TcpExtTCPACKSkippedSynRecv is 1.
TcpExtTCPACKSkippedPAWS
-----------------------
+-----------------------
To trigger PAWS, we could send an old SYN.
On nstat-b, let nc listen on port 9000::
@@ -1485,7 +1724,7 @@ failed, the nstat-b replied an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
TcpExtTCPACKSkippedSeq
---------------------
+----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
timestamp (to pass PAWS check) but the sequence number is out of
window. The linux TCP stack would avoid to skip if the packet has
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 82236a17b5e6..86174ce8cd13 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -92,11 +92,11 @@ device.
Switch ID
^^^^^^^^^
-The switchdev driver must implement the switchdev op switchdev_port_attr_get
-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
-physical ID for each port of a switch. The ID must be unique between switches
-on the same system. The ID does not need to be unique between switches on
-different systems.
+The switchdev driver must implement the net_device operation
+ndo_get_port_parent_id for each port netdev, returning the same physical ID for
+each port of a switch. The ID must be unique between switches on the same
+system. The ID does not need to be unique between switches on different
+systems.
The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.
@@ -196,7 +196,7 @@ The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the mac/vlan/port tuples. The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call:
- err = call_switchdev_notifiers(val, dev, info);
+ err = call_switchdev_notifiers(val, dev, info, extack);
Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On
@@ -232,10 +232,8 @@ Learning_sync attribute enables syncing of the learned/forgotten FDB entry to
the bridge's FDB. It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.
-To support learning and learning_sync port attributes, the driver implements
-switchdev op switchdev_port_attr_get/set for
-SWITCHDEV_ATTR_PORT_ID_BRIDGE_FLAGS. The driver should initialize the attributes
-to the hardware defaults.
+To support learning, the driver implements switchdev op
+switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_{PRE}_BRIDGE_FLAGS.
FDB Ageing
^^^^^^^^^^
@@ -373,22 +371,3 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops
for the routes as arp_tbl updates. The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
-
-Transaction item queue
-^^^^^^^^^^^^^^^^^^^^^^
-
-For switchdev ops attr_set and obj_add, there is a 2 phase transaction model
-used. First phase is to "prepare" anything needed, including various checks,
-memory allocation, etc. The goal is to handle the stuff that is not unlikely
-to fail here. The second phase is to "commit" the actual changes.
-
-Switchdev provides an infrastructure for sharing items (for example memory
-allocations) between the two phases.
-
-The object created by a driver in "prepare" phase and it is queued up by:
-switchdev_trans_item_enqueue()
-During the "commit" phase, the driver gets the object by:
-switchdev_trans_item_dequeue()
-
-If a transaction is aborted during "prepare" phase, switchdev code will handle
-cleanup of the queued-up objects.
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 9d1432e0aaa8..bbdaf8990031 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -6,11 +6,21 @@ The interfaces for receiving network packages timestamps are:
* SO_TIMESTAMP
Generates a timestamp for each incoming packet in (not necessarily
monotonic) system time. Reports the timestamp via recvmsg() in a
- control message as struct timeval (usec resolution).
+ control message in usec resolution.
+ SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
+ based on the architecture type and time_t representation of libc.
+ Control message format is in struct __kernel_old_timeval for
+ SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
+ SO_TIMESTAMP_NEW options respectively.
* SO_TIMESTAMPNS
Same timestamping mechanism as SO_TIMESTAMP, but reports the
- timestamp as struct timespec (nsec resolution).
+ timestamp as struct timespec in nsec resolution.
+ SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
+ based on the architecture type and time_t representation of libc.
+ Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
+ and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
+ respectively.
* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
Only for multicast:approximate transmit timestamp obtained by
@@ -22,7 +32,7 @@ The interfaces for receiving network packages timestamps are:
timestamps for stream sockets.
-1.1 SO_TIMESTAMP:
+1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW):
This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in
@@ -31,15 +41,25 @@ same is true for all early receive timestamp options.
For interface details, see `man 7 socket`.
+Use SO_TIMESTAMP_NEW to always get the timestamp in
+struct __kernel_sock_timeval format.
-1.2 SO_TIMESTAMPNS:
+SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
+
+1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW):
This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (us).
+Use SO_TIMESTAMPNS_NEW to always get the timestamp in
+struct __kernel_timespec format.
+
+SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
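As an illustration of the receive path, the sketch below enables
SO_TIMESTAMPNS on a UDP socket, sends one datagram to itself over
loopback, and reads the timestamp from the SCM_TIMESTAMPNS control
message. It is a minimal example under stated assumptions: the helper
name recv_timestamp_ns() is hypothetical, and it relies on libc mapping
SO_TIMESTAMPNS to the variant whose payload matches the userspace
struct timespec.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#ifndef SCM_TIMESTAMPNS
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#endif

/*
 * Hypothetical helper: send a datagram to ourselves and store its
 * receive timestamp in *ts. Returns 0 on success, -1 on failure.
 */
int recv_timestamp_ns(struct timespec *ts)
{
	union {		/* union keeps the buffer aligned for cmsghdr */
		char buf[CMSG_SPACE(sizeof(struct timespec))];
		struct cmsghdr align;
	} ctrl;
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char data[16];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int on = 1, ret = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* Bind to an ephemeral port, learn it, then enable timestamps. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &alen) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on)) < 0 ||
	    sendto(fd, "ping", 4, 0,
		   (struct sockaddr *)&addr, sizeof(addr)) != 4)
		goto out;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	if (recvmsg(fd, &msg, 0) < 0)
		goto out;

	/* Walk the control messages and pick out the timestamp. */
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_TIMESTAMPNS) {
			memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
			ret = 0;
		}
	}
out:
	close(fd);
	return ret;
}
```

The same pattern applies to SO_TIMESTAMP, with SCM_TIMESTAMP as the
control message type and a timeval payload instead of a timespec.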
-1.3 SO_TIMESTAMPING:
+1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW):
Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In
@@ -323,10 +343,23 @@ SO_TIMESTAMP and SO_TIMESTAMPNS records can be retrieved.
These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type
+For SO_TIMESTAMPING_OLD:
+
struct scm_timestamping {
struct timespec ts[3];
};
+For SO_TIMESTAMPING_NEW:
+
+struct scm_timestamping64 {
+ struct __kernel_timespec ts[3];
+};
+
+Use SO_TIMESTAMPING_NEW to always get the timestamp in
+struct scm_timestamping64 format.
+
+SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
+on 32 bit machines.
+
The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2].
diff --git a/Documentation/power/energy-model.txt b/Documentation/power/energy-model.txt
new file mode 100644
index 000000000000..a2b0ae4c76bd
--- /dev/null
+++ b/Documentation/power/energy-model.txt
@@ -0,0 +1,144 @@
+ ====================
+ Energy Model of CPUs
+ ====================
+
+1. Overview
+-----------
+
+The Energy Model (EM) framework serves as an interface between drivers knowing
+the power consumed by CPUs at various performance levels, and the kernel
+subsystems willing to use that information to make energy-aware decisions.
+
+The source of the information about the power consumed by CPUs can vary greatly
+from one platform to another. These power costs can be estimated using
+devicetree data in some cases. In others, the firmware will know better.
+Alternatively, userspace might be best positioned. And so on. To avoid having
+each and every client subsystem re-implement support for each and every
+possible source of information on its own, the EM framework intervenes as an
+abstraction layer which standardizes the format of power cost tables in the
+kernel, thereby avoiding redundant work.
+
+The figure below depicts an example of drivers (Arm-specific here, but the
+approach is applicable to any architecture) providing power costs to the EM
+framework, and interested clients reading the data from it.
+
+ +---------------+ +-----------------+ +---------------+
+ | Thermal (IPA) | | Scheduler (EAS) | | Other |
+ +---------------+ +-----------------+ +---------------+
+ | | em_pd_energy() |
+ | | em_cpu_get() |
+ +---------+ | +---------+
+ | | |
+ v v v
+ +---------------------+
+ | Energy Model |
+ | Framework |
+ +---------------------+
+ ^ ^ ^
+ | | | em_register_perf_domain()
+ +----------+ | +---------+
+ | | |
+ +---------------+ +---------------+ +--------------+
+ | cpufreq-dt | | arm_scmi | | Other |
+ +---------------+ +---------------+ +--------------+
+ ^ ^ ^
+ | | |
+ +--------------+ +---------------+ +--------------+
+ | Device Tree | | Firmware | | ? |
+ +--------------+ +---------------+ +--------------+
+
+The EM framework manages power cost tables per 'performance domain' in the
+system. A performance domain is a group of CPUs whose performance is scaled
+together. Performance domains generally have a 1-to-1 mapping with CPUFreq
+policies. All CPUs in a performance domain are required to have the same
+micro-architecture. CPUs in different performance domains can have different
+micro-architectures.
+
+
+2. Core APIs
+------------
+
+ 2.1 Config options
+
+CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
+
+
+ 2.2 Registration of performance domains
+
+Drivers are expected to register performance domains into the EM framework by
+calling the following API:
+
+ int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
+ struct em_data_callback *cb);
+
+Drivers must specify the CPUs of the performance domains using the cpumask
+argument, and provide a callback function returning <frequency, power> tuples
+for each capacity state. The callback function provided by the driver is free
+to fetch data from any relevant location (DT, firmware, ...), and by any means
+deemed necessary. See Section 3. for an example of a driver implementing this
+callback, and kernel/power/energy_model.c for further documentation on this
+API.
+
+
+ 2.3 Accessing performance domains
+
+Subsystems interested in the energy model of a CPU can retrieve it using the
+em_cpu_get() API. The energy model tables are allocated once upon creation of
+the performance domains, and kept in memory untouched.
+
+The energy consumed by a performance domain can be estimated using the
+em_pd_energy() API. The estimation is performed assuming that the schedutil
+CPUfreq governor is in use.
+
+More details about the above APIs can be found in include/linux/energy_model.h.
+
+
+3. Example driver
+-----------------
+
+This section provides a simple example of a CPUFreq driver registering a
+performance domain in the Energy Model framework using the (fake) 'foo'
+protocol. The driver implements an est_power() function to be provided to the
+EM framework.
+
+ -> drivers/cpufreq/foo_cpufreq.c
+
+static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
+{
+	long freq, power;
+
+	/* Use the 'foo' protocol to ceil the frequency */
+	freq = foo_get_freq_ceil(cpu, *KHz);
+	if (freq < 0)
+		return freq;
+
+	/* Estimate the power cost for the CPU at the relevant freq. */
+	power = foo_estimate_power(cpu, freq);
+	if (power < 0)
+		return power;
+
+	/* Return the values to the EM framework */
+	*mW = power;
+	*KHz = freq;
+
+	return 0;
+}
+
+static int foo_cpufreq_init(struct cpufreq_policy *policy)
+{
+	struct em_data_callback em_cb = EM_DATA_CB(est_power);
+	int nr_opp, ret;
+
+	/* Do the actual CPUFreq init work ... */
+	ret = do_foo_cpufreq_init(policy);
+	if (ret)
+		return ret;
+
+	/* Find the number of OPPs for this policy */
+	nr_opp = foo_get_nr_opp(policy);
+
+	/* And register the new performance domain */
+	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+
+	return 0;
+}
diff --git a/Documentation/process/applying-patches.rst b/Documentation/process/applying-patches.rst
index dc2ddc345044..fbb9297e6360 100644
--- a/Documentation/process/applying-patches.rst
+++ b/Documentation/process/applying-patches.rst
@@ -216,14 +216,14 @@ You can use the ``interdiff`` program (http://cyberelk.net/tim/patchutils/) to
generate a patch representing the differences between two patches and then
apply the result.
-This will let you move from something like 4.7.2 to 4.7.3 in a single
+This will let you move from something like 5.7.2 to 5.7.3 in a single
step. The -z flag to interdiff will even let you feed it patches in gzip or
bzip2 compressed form directly without the use of zcat or bzcat or manual
decompression.
-Here's how you'd go from 4.7.2 to 4.7.3 in a single step::
+Here's how you'd go from 5.7.2 to 5.7.3 in a single step::
- interdiff -z ../patch-4.7.2.gz ../patch-4.7.3.gz | patch -p1
+ interdiff -z ../patch-5.7.2.gz ../patch-5.7.3.gz | patch -p1
Although interdiff may save you a step or two you are generally advised to
do the additional steps since interdiff can get things wrong in some cases.
@@ -245,62 +245,67 @@ The patches are available at http://kernel.org/
Most recent patches are linked from the front page, but they also have
specific homes.
-The 4.x.y (-stable) and 4.x patches live at
+The 5.x.y (-stable) and 5.x patches live at
- https://www.kernel.org/pub/linux/kernel/v4.x/
+ https://www.kernel.org/pub/linux/kernel/v5.x/
-The -rc patches live at
+The -rc patches are not stored on the webserver but are generated on
+demand from git tags such as
- https://www.kernel.org/pub/linux/kernel/v4.x/testing/
+ https://git.kernel.org/torvalds/p/v5.1-rc1/v5.0
+The stable -rc patches live at
-The 4.x kernels
+ https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/
+
+
+The 5.x kernels
===============
These are the base stable releases released by Linus. The highest numbered
release is the most recent.
If regressions or other serious flaws are found, then a -stable fix patch
-will be released (see below) on top of this base. Once a new 4.x base
+will be released (see below) on top of this base. Once a new 5.x base
kernel is released, a patch is made available that is a delta between the
-previous 4.x kernel and the new one.
+previous 5.x kernel and the new one.
-To apply a patch moving from 4.6 to 4.7, you'd do the following (note
-that such patches do **NOT** apply on top of 4.x.y kernels but on top of the
-base 4.x kernel -- if you need to move from 4.x.y to 4.x+1 you need to
-first revert the 4.x.y patch).
+To apply a patch moving from 5.6 to 5.7, you'd do the following (note
+that such patches do **NOT** apply on top of 5.x.y kernels but on top of the
+base 5.x kernel -- if you need to move from 5.x.y to 5.x+1 you need to
+first revert the 5.x.y patch).
Here are some examples::
- # moving from 4.6 to 4.7
+ # moving from 5.6 to 5.7
- $ cd ~/linux-4.6 # change to kernel source dir
- $ patch -p1 < ../patch-4.7 # apply the 4.7 patch
+ $ cd ~/linux-5.6 # change to kernel source dir
+ $ patch -p1 < ../patch-5.7 # apply the 5.7 patch
$ cd ..
- $ mv linux-4.6 linux-4.7 # rename source dir
+ $ mv linux-5.6 linux-5.7 # rename source dir
- # moving from 4.6.1 to 4.7
+ # moving from 5.6.1 to 5.7
- $ cd ~/linux-4.6.1 # change to kernel source dir
- $ patch -p1 -R < ../patch-4.6.1 # revert the 4.6.1 patch
- # source dir is now 4.6
- $ patch -p1 < ../patch-4.7 # apply new 4.7 patch
+ $ cd ~/linux-5.6.1 # change to kernel source dir
+ $ patch -p1 -R < ../patch-5.6.1 # revert the 5.6.1 patch
+ # source dir is now 5.6
+ $ patch -p1 < ../patch-5.7 # apply new 5.7 patch
$ cd ..
- $ mv linux-4.6.1 linux-4.7 # rename source dir
+ $ mv linux-5.6.1 linux-5.7 # rename source dir
-The 4.x.y kernels
+The 5.x.y kernels
=================
Kernels with 3-digit versions are -stable kernels. They contain small(ish)
critical fixes for security problems or significant regressions discovered
-in a given 4.x kernel.
+in a given 5.x kernel.
This is the recommended branch for users who want the most recent stable
kernel and are not interested in helping test development/experimental
versions.
-If no 4.x.y kernel is available, then the highest numbered 4.x kernel is
+If no 5.x.y kernel is available, then the highest numbered 5.x kernel is
the current stable kernel.
.. note::
@@ -308,23 +313,23 @@ the current stable kernel.
The -stable team usually do make incremental patches available as well
as patches against the latest mainline release, but I only cover the
non-incremental ones below. The incremental ones can be found at
- https://www.kernel.org/pub/linux/kernel/v4.x/incr/
+ https://www.kernel.org/pub/linux/kernel/v5.x/incr/
-These patches are not incremental, meaning that for example the 4.7.3
-patch does not apply on top of the 4.7.2 kernel source, but rather on top
-of the base 4.7 kernel source.
+These patches are not incremental, meaning that for example the 5.7.3
+patch does not apply on top of the 5.7.2 kernel source, but rather on top
+of the base 5.7 kernel source.
-So, in order to apply the 4.7.3 patch to your existing 4.7.2 kernel
-source you have to first back out the 4.7.2 patch (so you are left with a
-base 4.7 kernel source) and then apply the new 4.7.3 patch.
+So, in order to apply the 5.7.3 patch to your existing 5.7.2 kernel
+source you have to first back out the 5.7.2 patch (so you are left with a
+base 5.7 kernel source) and then apply the new 5.7.3 patch.
Here's a small example::
- $ cd ~/linux-4.7.2 # change to the kernel source dir
- $ patch -p1 -R < ../patch-4.7.2 # revert the 4.7.2 patch
- $ patch -p1 < ../patch-4.7.3 # apply the new 4.7.3 patch
+ $ cd ~/linux-5.7.2 # change to the kernel source dir
+ $ patch -p1 -R < ../patch-5.7.2 # revert the 5.7.2 patch
+ $ patch -p1 < ../patch-5.7.3 # apply the new 5.7.3 patch
$ cd ..
- $ mv linux-4.7.2 linux-4.7.3 # rename the kernel source dir
+ $ mv linux-5.7.2 linux-5.7.3 # rename the kernel source dir
The -rc kernels
===============
@@ -343,38 +348,38 @@ This is a good branch to run for people who want to help out testing
development kernels but do not want to run some of the really experimental
stuff (such people should see the sections about -next and -mm kernels below).
-The -rc patches are not incremental, they apply to a base 4.x kernel, just
-like the 4.x.y patches described above. The kernel version before the -rcN
+The -rc patches are not incremental, they apply to a base 5.x kernel, just
+like the 5.x.y patches described above. The kernel version before the -rcN
suffix denotes the version of the kernel that this -rc kernel will eventually
turn into.
-So, 4.8-rc5 means that this is the fifth release candidate for the 4.8
-kernel and the patch should be applied on top of the 4.7 kernel source.
+So, 5.8-rc5 means that this is the fifth release candidate for the 5.8
+kernel and the patch should be applied on top of the 5.7 kernel source.
Here are 3 examples of how to apply these patches::
- # first an example of moving from 4.7 to 4.8-rc3
+ # first an example of moving from 5.7 to 5.8-rc3
- $ cd ~/linux-4.7 # change to the 4.7 source dir
- $ patch -p1 < ../patch-4.8-rc3 # apply the 4.8-rc3 patch
+ $ cd ~/linux-5.7 # change to the 5.7 source dir
+ $ patch -p1 < ../patch-5.8-rc3 # apply the 5.8-rc3 patch
$ cd ..
- $ mv linux-4.7 linux-4.8-rc3 # rename the source dir
+ $ mv linux-5.7 linux-5.8-rc3 # rename the source dir
- # now let's move from 4.8-rc3 to 4.8-rc5
+ # now let's move from 5.8-rc3 to 5.8-rc5
- $ cd ~/linux-4.8-rc3 # change to the 4.8-rc3 dir
- $ patch -p1 -R < ../patch-4.8-rc3 # revert the 4.8-rc3 patch
- $ patch -p1 < ../patch-4.8-rc5 # apply the new 4.8-rc5 patch
+ $ cd ~/linux-5.8-rc3 # change to the 5.8-rc3 dir
+ $ patch -p1 -R < ../patch-5.8-rc3 # revert the 5.8-rc3 patch
+ $ patch -p1 < ../patch-5.8-rc5 # apply the new 5.8-rc5 patch
$ cd ..
- $ mv linux-4.8-rc3 linux-4.8-rc5 # rename the source dir
+ $ mv linux-5.8-rc3 linux-5.8-rc5 # rename the source dir
- # finally let's try and move from 4.7.3 to 4.8-rc5
+ # finally let's try and move from 5.7.3 to 5.8-rc5
- $ cd ~/linux-4.7.3 # change to the kernel source dir
- $ patch -p1 -R < ../patch-4.7.3 # revert the 4.7.3 patch
- $ patch -p1 < ../patch-4.8-rc5 # apply new 4.8-rc5 patch
+ $ cd ~/linux-5.7.3 # change to the kernel source dir
+ $ patch -p1 -R < ../patch-5.7.3 # revert the 5.7.3 patch
+ $ patch -p1 < ../patch-5.8-rc5 # apply new 5.8-rc5 patch
$ cd ..
- $ mv linux-4.7.3 linux-4.8-rc5 # rename the kernel source dir
+ $ mv linux-5.7.3 linux-5.8-rc5 # rename the kernel source dir
The -mm patches and the linux-next tree
diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 000000000000..197d81f4b836
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,425 @@
+ =======================
+ Energy Aware Scheduling
+ =======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document introduces how EAS works,
+explains the main design decisions behind it, and details what is needed to
+get it to run.
+
+Before going any further, please note that at the time of writing:
+
+ /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize:
+
+ performance [inst/s]
+ --------------------
+ power [W]
+
+which is equivalent to minimizing:
+
+ energy [J]
+ -----------
+ instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for further details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+ Let us consider a platform with 12 CPUs, split in 3 performance domains
+ (pd0, pd4 and pd8), organized as follows:
+
+ CPUs: 0 1 2 3 4 5 6 7 8 9 10 11
+ PDs: |--pd0--|--pd4--|---pd8---|
+ RDs: |----rd1----|-----rd2-----|
+
+ Now, consider that userspace decided to split the system with two
+ exclusive cpusets, hence creating two independent root domains, each
+ containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+ above figure. Since pd4 intersects with both rd1 and rd2, it will be
+ present in the linked list '->pd' attached to each of them:
+ * rd1->pd: pd0 -> pd4
+ * rd2->pd: pd4 -> pd8
+
+ Please note that the scheduler will create two duplicate list nodes for
+ pd4 (one for each list). However, both just hold a pointer to the same
+ shared data structure of the EM framework.
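The list construction in Example 1 can be sketched in a few lines of Python. This is only an illustration of the intersection rule described above, not the scheduler's implementation; the names PERF_DOMAINS, ROOT_DOMAINS and pd_list_for are made up for this sketch.

```python
# Illustrative sketch of Example 1 (names are hypothetical, not kernel symbols).
# Each performance domain and root domain is modelled as a set of CPU ids.
PERF_DOMAINS = {
    "pd0": set(range(0, 4)),    # CPUs 0-3
    "pd4": set(range(4, 8)),    # CPUs 4-7
    "pd8": set(range(8, 12)),   # CPUs 8-11
}
ROOT_DOMAINS = {
    "rd1": set(range(0, 6)),    # CPUs 0-5
    "rd2": set(range(6, 12)),   # CPUs 6-11
}


def pd_list_for(rd_span):
    """Return the performance domains that intersect the given root domain span."""
    return [name for name, cpus in PERF_DOMAINS.items() if cpus & rd_span]
```

Calling pd_list_for() on each root domain span yields pd0 and pd4 for rd1, and pd4 and pd8 for rd2, so pd4 appears in both lists, as in the example.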
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to do the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+ Let us consider a (fake) platform with 2 independent performance domains
+ composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+ are big.
+
+ The scheduler must decide where to place a task P whose util_avg = 200
+ and prev_cpu = 0.
+
+ The current utilization landscape of the CPUs is depicted on the graph
+ below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively.
+ Each performance domain has three Operating Performance Points (OPPs).
+ The CPU capacity and power cost associated with each OPP are listed in
+ the Energy Model table. The util_avg of P is shown on the figures
+ below as 'PP'.
+
+ CPU util.
+ 1024 - - - - - - - Energy Model
+ +-----------+-------------+
+ | Little | Big |
+ 768 ============= +-----+-----+------+------+
+ | Cap | Pwr | Cap | Pwr |
+ +-----+-----+------+------+
+ 512 =========== - ##- - - - - | 170 | 50 | 512 | 400 |
+ ## ## | 341 | 150 | 768 | 800 |
+ 341 -PP - - - - ## ## | 512 | 300 | 1024 | 1700 |
+ PP ## ## +-----+-----+------+------+
+ 170 -## - - - - ## ##
+ ## ## ## ##
+ ------------ -------------
+ CPU0 CPU1 CPU2 CPU3
+
+ Current OPP: ===== Other OPP: - - - util_avg (100 each): ##
+
+
+ find_energy_efficient_cpu() will first look for the CPUs with the
+ maximum spare capacity in the two performance domains. In this example,
+ CPU1 and CPU3. Then it will estimate the energy of the system if P was
+ placed on either of them, and check if that would save some energy
+ compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+ (which is coherent with the behaviour of the schedutil CPUFreq
+ governor, see Section 6. for more details on this topic).
+
+ Case 1. P is migrated to CPU1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ 1024 - - - - - - -
+
+ Energy calculation:
+ 768 ============= * CPU0: 200 / 341 * 150 = 88
+ * CPU1: 300 / 341 * 150 = 131
+ * CPU2: 600 / 768 * 800 = 625
+ 512 - - - - - - - ##- - - - - * CPU3: 500 / 768 * 800 = 520
+ ## ## => total_energy = 1364
+ 341 =========== ## ##
+ PP ## ##
+ 170 -## - - PP- ## ##
+ ## ## ## ##
+ ------------ -------------
+ CPU0 CPU1 CPU2 CPU3
+
+
+ Case 2. P is migrated to CPU3
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ 1024 - - - - - - -
+
+ Energy calculation:
+ 768 ============= * CPU0: 200 / 341 * 150 = 88
+ * CPU1: 100 / 341 * 150 = 43
+ PP * CPU2: 600 / 768 * 800 = 625
+ 512 - - - - - - - ##- - -PP - * CPU3: 700 / 768 * 800 = 729
+ ## ## => total_energy = 1485
+ 341 =========== ## ##
+ ## ##
+ 170 -## - - - - ## ##
+ ## ## ## ##
+ ------------ -------------
+ CPU0 CPU1 CPU2 CPU3
+
+
+ Case 3. P stays on prev_cpu / CPU 0
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ 1024 - - - - - - -
+
+ Energy calculation:
+ 768 ============= * CPU0: 400 / 512 * 300 = 234
+ * CPU1: 100 / 512 * 300 = 58
+ * CPU2: 600 / 768 * 800 = 625
+ 512 =========== - ##- - - - - * CPU3: 500 / 768 * 800 = 520
+ ## ## => total_energy = 1437
+ 341 -PP - - - - ## ##
+ PP ## ##
+ 170 -## - - - - ## ##
+ ## ## ## ##
+ ------------ -------------
+ CPU0 CPU1 CPU2 CPU3
+
+
+ From these calculations, Case 1 has the lowest total energy. So CPU 1
+ is the best candidate from an energy-efficiency standpoint.
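The three cases of Example 2 can be reproduced with a short Python sketch. It assumes, as the example does, that each performance domain runs at the lowest OPP whose capacity covers its busiest CPU, and that a CPU's energy is util / capacity * power at that OPP. All names are made up for this illustration and it is not the kernel's compute_energy(). It uses floating-point arithmetic, so the totals differ by a unit or two from the rounded figures in the example.

```python
# Energy Model from Example 2: (capacity, power) per OPP, lowest OPP first.
EM = {
    "little": [(170, 50), (341, 150), (512, 300)],
    "big": [(512, 400), (768, 800), (1024, 1700)],
}
DOMAIN_CPUS = {"little": [0, 1], "big": [2, 3]}
BASE_UTIL = {0: 400, 1: 100, 2: 600, 3: 500}  # CPU0's 400 includes task P
TASK_UTIL = 200
PREV_CPU = 0


def domain_energy(domain, util):
    """Estimate one domain's energy, assuming the OPP follows the busiest CPU."""
    cpus = DOMAIN_CPUS[domain]
    max_util = max(util[c] for c in cpus)
    # Lowest OPP whose capacity is enough for the busiest CPU of the domain.
    cap, power = next((c, p) for c, p in EM[domain] if c >= max_util)
    return sum(util[c] / cap * power for c in cpus)


def total_energy(dst_cpu):
    """Estimate the system energy if task P runs on dst_cpu."""
    util = dict(BASE_UTIL)
    util[PREV_CPU] -= TASK_UTIL   # remove P from where it last ran
    util[dst_cpu] += TASK_UTIL    # 'simulate' the migration
    return sum(domain_energy(d, util) for d in EM)


def best_cpu(candidates=(0, 1, 3)):
    """Pick the candidate (prev_cpu or a max-spare-capacity CPU) with least energy."""
    return min(candidates, key=total_energy)
```

With these inputs best_cpu() selects CPU 1, which matches Case 1 above.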
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+off executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too much trouble. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severely harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+ a. there is some idle time on all CPUs, so the utilization signals used by
+ EAS are likely to accurately represent the 'size' of the various tasks
+ in the system;
+ b. all tasks should already be provided with enough CPU capacity,
+ regardless of their nice values;
+ c. since there is spare capacity, all tasks must be blocking/sleeping
+ regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This better respects
+the nice values of tasks.
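The 80% tipping point can be sketched as follows. This is a simplified model written for this document, not the kernel's code: the function names are hypothetical, and 'util' is assumed to already include the capacity used by all scheduling classes and IRQ.

```python
def cpu_overutilized(util, capacity):
    """True if util is above 80% of capacity (integer form of util/capacity > 4/5)."""
    return util * 5 > capacity * 4


def rd_overutilized(cpus):
    """True if any (util, capacity) pair in a root domain is over-utilized.

    A single over-utilized CPU raises the flag for the whole root domain.
    """
    return any(cpu_overutilized(util, cap) for util, cap in cpus)
```

For a CPU of capacity 512 the threshold sits at 409.6, so a utilization of 409 leaves EAS active while 410 flips the root domain to over-utilized.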
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+ 6.1 - Asymmetric CPU topology
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+ 6.2 - Energy Model presence
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+ 6.3 - Energy Model complexity
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+ C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (e.g. for two perf. domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
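As a quick way to check a platform against this limit, the formula can be evaluated by hand or with a small sketch like the one below. The function names are made up for illustration, and the threshold is the value quoted above.

```python
EM_MAX_COMPLEXITY = 2048  # threshold quoted in the text, at the time of writing


def em_complexity(nd, nc, ns):
    """C = Nd * (Nc + Ns): perf. domains, CPUs, total number of OPPs."""
    return nd * (nc + ns)


def eas_can_start(nd, nc, ns):
    """EAS does not start on a root domain whose C is higher than the threshold."""
    return em_complexity(nd, nc, ns) <= EM_MAX_COMPLEXITY
```

For instance, a root domain with 2 performance domains, 8 CPUs and 8 OPPs in total gives C = 2 * (8 + 8) = 32, far below the limit.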
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+ 1. split your system into separate, smaller, root domains using exclusive
+ cpusets and enable EAS locally on each of them. This option has the
+ benefit of working out of the box but the drawback of preventing load
+ balance between root domains, which can result in an unbalanced system
+ overall;
+ 2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+ hence enabling it to cope with larger EMs in reasonable time.
+
+
+ 6.4 - Schedutil governor
+
+EAS tries to predict at which OPP the CPUs will be running in the near future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+ 6.5 - Scale-invariant utilization signals
+
+In order to make accurate predictions across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale_{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+ 6.6 - Multithreading (SMT)
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.
diff --git a/Documentation/spi/pxa2xx b/Documentation/spi/pxa2xx
index 13a0b7fb192f..551325b66b23 100644
--- a/Documentation/spi/pxa2xx
+++ b/Documentation/spi/pxa2xx
@@ -21,15 +21,15 @@ Typically a SPI master is defined in the arch/.../mach-*/board-*.c as a
"platform device". The master configuration is passed to the driver via a table
found in include/linux/spi/pxa2xx_spi.h:
-struct pxa2xx_spi_master {
+struct pxa2xx_spi_controller {
u16 num_chipselect;
u8 enable_dma;
};
-The "pxa2xx_spi_master.num_chipselect" field is used to determine the number of
+The "pxa2xx_spi_controller.num_chipselect" field is used to determine the number of
slave device (chips) attached to this SPI master.
-The "pxa2xx_spi_master.enable_dma" field informs the driver that SSP DMA should
+The "pxa2xx_spi_controller.enable_dma" field informs the driver that SSP DMA should
be used. This causes the driver to acquire two DMA channels: rx_channel and
tx_channel. The rx_channel has a higher DMA service priority than the tx_channel.
See the "PXA2xx Developer Manual" section "DMA Controller".
@@ -51,7 +51,7 @@ static struct resource pxa_spi_nssp_resources[] = {
},
};
-static struct pxa2xx_spi_master pxa_nssp_master_info = {
+static struct pxa2xx_spi_controller pxa_nssp_master_info = {
.num_chipselect = 1, /* Matches the number of chips attached to NSSP */
.enable_dma = 1, /* Enables NSSP DMA */
};
@@ -206,7 +206,7 @@ DMA and PIO I/O Support
-----------------------
The pxa2xx_spi driver supports both DMA and interrupt driven PIO message
transfers. The driver defaults to PIO mode and DMA transfers must be enabled
-by setting the "enable_dma" flag in the "pxa2xx_spi_master" structure. The DMA
+by setting the "enable_dma" flag in the "pxa2xx_spi_controller" structure. The DMA
mode supports both coherent and stream based DMA mappings.
The following logic is used to determine the type of I/O to be used on
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 819caf8ca05f..ebc679bcb2dc 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -56,26 +56,34 @@ of any kernel data structures.
dentry-state:
-From linux/fs/dentry.c:
+From linux/include/linux/dcache.h:
--------------------------------------------------------------
-struct {
+struct dentry_stat_t dentry_stat {
int nr_dentry;
int nr_unused;
int age_limit; /* age in seconds */
int want_pages; /* pages requested by system */
- int dummy[2];
-} dentry_stat = {0, 0, 45, 0,};
---------------------------------------------------------------
-
-Dentries are dynamically allocated and deallocated, and
-nr_dentry seems to be 0 all the time. Hence it's safe to
-assume that only nr_unused, age_limit and want_pages are
-used. Nr_unused seems to be exactly what its name says.
+ int nr_negative; /* # of unused negative dentries */
+ int dummy; /* Reserved for future use */
+};
+--------------------------------------------------------------
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
+
Age_limit is the age in seconds after which dcache entries
can be reclaimed when memory is short and want_pages is
nonzero when shrink_dcache_pages() has been called and the
dcache isn't pruned yet.
+nr_negative shows the number of unused dentries that are also
+negative dentries, i.e. dentries which do not map to any file.
+Instead, they help speed up the rejection of lookups for
+non-existent files requested by users.
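The six fields can be read from /proc/sys/fs/dentry-state in the order of the structure above. The following Python sketch parses such a line; the DentryState and parse_dentry_state names are made up for this illustration.

```python
from collections import namedtuple

# Field order follows the structure shown above.
DentryState = namedtuple(
    "DentryState",
    "nr_dentry nr_unused age_limit want_pages nr_negative dummy",
)


def parse_dentry_state(text):
    """Parse the whitespace-separated contents of dentry-state."""
    values = [int(v) for v in text.split()]
    if len(values) != len(DentryState._fields):
        raise ValueError("expected 6 fields, got %d" % len(values))
    return DentryState(*values)
```

Passing the contents of /proc/sys/fs/dentry-state to parse_dentry_state() gives named access to each counter; since nr_dentry counts active plus unused dentries, nr_dentry - nr_unused is the number of dentries in active use.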
+
==============================================================
dquot-max & dquot-nr:
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index c0527d8a468a..379063e58326 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -79,6 +79,7 @@ show up in /proc/sys/kernel:
- reboot-cmd [ SPARC only ]
- rtsig-max
- rtsig-nr
+- sched_energy_aware
- seccomp/ ==> Documentation/userspace-api/seccomp_filter.rst
- sem
- sem_next_id [ sysv ipc ]
@@ -890,6 +891,17 @@ rtsig-nr shows the number of RT signals currently queued.
==============================================================
+sched_energy_aware:
+
+Enables/disables Energy Aware Scheduling (EAS). EAS starts
+automatically on platforms where it can run (that is,
+platforms with asymmetric CPU topologies and having an Energy
+Model available). If your platform happens to meet the
+requirements for EAS but you do not want to use it, change
+this value to 0.
+
+==============================================================
+
sched_schedstats:
Enables/disables scheduler statistics. Enabling this feature
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 2793d4eac55f..2ae91d3873bb 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -52,6 +52,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
- sparc64
- mips64
- s390x
+ - riscv
And the older cBPF JIT supported on the following archs:
- mips
@@ -291,6 +292,20 @@ user space is responsible for creating them if needed.
Default : 0 (for compatibility reasons)
+devconf_inherit_init_net
+----------------------------
+
+Controls if a new network namespace should inherit all current
+settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
+default, we keep the current behavior: for IPv4 we inherit all current
+settings from init_net and for IPv6 we reset all settings to default.
+
+If set to 1, both IPv4 and IPv6 settings are forced to inherit from
+current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
+forced to reset to their default values.
+
+Default : 0 (for compatibility reasons)
+
2. /proc/sys/net/unix - Parameters for Unix domain sockets
-------------------------------------------------------
diff --git a/Documentation/translations/it_IT/admin-guide/README.rst b/Documentation/translations/it_IT/admin-guide/README.rst
index 80f5ffc94a9e..b37166817842 100644
--- a/Documentation/translations/it_IT/admin-guide/README.rst
+++ b/Documentation/translations/it_IT/admin-guide/README.rst
@@ -4,7 +4,7 @@
.. _it_readme:
-Rilascio del kernel Linux 4.x <http://kernel.org/>
+Rilascio del kernel Linux 5.x <http://kernel.org/>
===================================================
.. warning::
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index c4dbe6f7cdae..1129c7550a48 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -28,18 +28,20 @@ PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bits 0-3 with
the following meaning:
-==== ===================== ===================================================
-Bit Define Description
-==== ===================== ===================================================
-0 PR_SPEC_PRCTL Mitigation can be controlled per task by
- PR_SET_SPECULATION_CTRL.
-1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
- disabled.
-2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
- enabled.
-3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A
- subsequent prctl(..., PR_SPEC_ENABLE) will fail.
-==== ===================== ===================================================
+==== ====================== ==================================================
+Bit Define Description
+==== ====================== ==================================================
+0 PR_SPEC_PRCTL Mitigation can be controlled per task by
+ PR_SET_SPECULATION_CTRL.
+1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
+ disabled.
+2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
+ enabled.
+3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A
+ subsequent prctl(..., PR_SPEC_ENABLE) will fail.
+4 PR_SPEC_DISABLE_NOEXEC Same as PR_SPEC_DISABLE, but the state will be
+ cleared on :manpage:`execve(2)`.
+==== ====================== ==================================================
If all bits are 0 the CPU is not affected by the speculation misfeature.
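As an illustration of the bit layout in the table above, the following Python sketch decodes a PR_GET_SPECULATION_CTRL return value into the names of the bits that are set. The decode_spec_ctrl helper is hypothetical and only mirrors the table; it does not call prctl(2) itself.

```python
# Bit positions as listed in the table above.
SPEC_CTRL_BITS = [
    (0, "PR_SPEC_PRCTL"),
    (1, "PR_SPEC_ENABLE"),
    (2, "PR_SPEC_DISABLE"),
    (3, "PR_SPEC_FORCE_DISABLE"),
    (4, "PR_SPEC_DISABLE_NOEXEC"),
]


def decode_spec_ctrl(value):
    """Return the names of the bits set in a PR_GET_SPECULATION_CTRL result."""
    return [name for bit, name in SPEC_CTRL_BITS if value & (1 << bit)]
```

An empty list corresponds to the case where all bits are 0, i.e. the CPU is not affected by the speculation misfeature.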
@@ -92,6 +94,7 @@ Speculation misfeature controls
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE_NOEXEC, 0, 0);
- PR_SPEC_INDIR_BRANCH: Indirect Branch Speculation in User Processes
(Mitigate Spectre V2 style attacks against user processes)
diff --git a/Documentation/x86/resctrl_ui.txt b/Documentation/x86/resctrl_ui.txt
index e8e8d14d3c4e..c1f95b59e14d 100644
--- a/Documentation/x86/resctrl_ui.txt
+++ b/Documentation/x86/resctrl_ui.txt
@@ -9,7 +9,7 @@ Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>
Vikas Shivappa <vikas.shivappa@intel.com>
-This feature is enabled by the CONFIG_X86_RESCTRL and the x86 /proc/cpuinfo
+This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:
RDT (Resource Director Technology) Allocation - "rdt_a"
CAT (Cache Allocation Technology) - "cat_l3", "cat_l2"