summaryrefslogtreecommitdiff
path: root/drivers/cxl
AgeCommit message (Collapse)AuthorFilesLines
2022-11-04cxl/region: Recycle region idsDan Williams1-0/+20
At region creation time the next region-id is atomically cached so that there is predictability of region device names. If that region is destroyed and then a new one is created the region id increments. That ends up looking like a memory leak, or is otherwise surprising that identifiers roll forward even after destroying all previously created regions. Try to reuse rather than free old region ids at region release time. While this fixes a cosmetic issue, the needlessly advancing memory region-id gives the appearance of a memory leak, hence the "Fixes" tag, but no "Cc: stable" tag. Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Fixes: 779dd20cfb56 ("cxl/region: Add region creation support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752186062.947915.13200195701224993317.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix 'distance' calculation with passthrough portsDan Williams3-3/+19
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/pmem: Fix cxl_pmem_region and cxl_memdev leakDan Williams3-37/+68
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks. Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations. Cc: <stable@vger.kernel.org> Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1] Reported-by: Vishal Verma <vishal.l.verma@intel.com> Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix cxl_region leak, cleanup targets at region deleteDan Williams1-0/+11
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time. Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down. Cc: <stable@vger.kernel.org> Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752183055.947915.17681995648556534844.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix region HPA ordering validationDan Williams1-0/+3
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result: devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90 store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453 Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166752182461.947915.497032805239915067.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-03cxl/pmem: Use size_add() against integer overflowYu Zhe1-1/+1
"struct_size() + n" may cause a integer overflow, use size_add() to handle it. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Link: https://lore.kernel.org/r/20220927070247.23148-1-yuzhe@nfschina.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-01cxl/region: Fix decoder allocation crashVishal Verma1-26/+41
When an intermediate port's decoders have been exhausted by existing regions, and creating a new region with the port in question in it's hierarchical path is attempted, cxl_port_attach_region() fails to find a port decoder (as would be expected), and drops into the failure / cleanup path. However, during cleanup of the region reference, a sanity check attempts to dereference the decoder, which in the above case didn't exist. This causes a NULL pointer dereference BUG. To fix this, refactor the decoder allocation and de-allocation into helper routines, and in this 'free' routine, check that the decoder, @cxld, is valid before attempting any operations on it. Cc: <stable@vger.kernel.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Link: https://lore.kernel.org/r/20221101074100.1732003-1-vishal.l.verma@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-10-20cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.Jonathan Cameron1-1/+1
Writes to the device must include an offset and size as defined in CXL 2.0 8.2.9.5.2.4 Set LSA (Opcode 4103h) Fixes tag is non obvious as this code has been through several reworks and variable names + wasn't in use until the addition of the region code. Due to a bug in QEMU CXL emulation this overrun resulted in QEMU crashing. Reported-by: Bobo WL <lmw.bobo@gmail.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Fixes: 60b8f17215de ("cxl/pmem: Translate NVDIMM label commands to CXL label commands") Link: https://lore.kernel.org/r/20220815154044.24733-3-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-10-20cxl/region: Fix null pointer dereference due to pass through decoder commitJonathan Cameron1-1/+2
Not all decoders have a commit callback. The CXL specification allows a host bridge with a single root port to have no explicit HDM decoders. Currently the region driver assumes there are none. As such the CXL core creates a special pass through decoder instance without a commit callback. Prior to this patch, the ->commit() callback was called unconditionally. Thus a configuration with 1 Host Bridge, 1 Root Port, 1 switch with multiple downstream ports below which there are multiple CXL type 3 devices results in a situation where committing the region causes a null pointer dereference. Reported-by: Bobo WL <lmw.bobo@gmail.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20220818164210.2084-1-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-10-20cxl/mbox: Add a check on input payload sizeJonathan Cameron1-1/+1
A bug in the LSA code resulted in transfers slightly larger than the mailbox size. Let us make it easier to catch similar issues in future by adding a low level check. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220815154044.24733-2-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/hdm: Fix skip allocations vs multiple pmem allocationsDan Williams1-1/+10
Vishal notes that when attempting to define a second pmem region on a device the DPA allocation fails with a message of the form: decoder11.1: failed to reserve skipped space Recall that the skip setting is used when there is a pmem allocation in the presence of free ram DPA space. The first pmem allocation skips over the free ram and subsequent pmem allocations do not require a skip. The bug is that a skip is still attempted and the DPA reservation code flags the double skip allocation conflict. Fixes: cf880423b6a0 ("cxl/hdm: Add support for allocating DPA to an endpoint decoder") Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973754730.1558392.15466392461645857658.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Disallow region granularity != window granularityDan Williams1-6/+7
The endpoint decode granularity must be <= the window granularity otherwise capacity in the endpoints is lost in the decode. Consider an attempt to have a region granularity of 512 with 4 devices within a window that maps 2 host bridges at a granularity of 256 bytes: HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x100 0 0 0 0x300 0x100 1 0 2 0x400 0x200 0 1 1 0x500 0x200 1 1 3 0x600 0x300 0 1 1 0x700 0x300 1 1 3 0x800 0x400 0 0 0 0x900 0x400 1 0 2 0xA00 0x500 0 0 0 0xB00 0x500 1 0 2 Notice how endpoint0 maps HPA 0x0 and 0x200 correctly, but then at HPA 0x800 it results in DPA 0x200-0x400 on being skipped. Fix this by restricing the region granularity to be equal to the window granularity resulting in the following for a x4 region under a x2 window at a granularity of 256. HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x0 0 1 1 0x300 0x0 1 1 3 0x400 0x100 0 0 0 0x500 0x100 1 0 2 0x600 0x100 0 1 1 0x700 0x100 1 1 3 Not that it ever made practical sense to support region granularity > window granularity. The window rotates host bridges causing endpoints to never see a consecutive stream of requests at the desired granularity without breaks to issue cycles to the other host bridge. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973127171.1526540.9923273539049172976.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix x1 interleave to greater than x1 interleave routingDan Williams1-1/+5
In cases where the decode fans out as it traverses downstream, the interleave granularity needs to increment to identify the port selector bits out of the remaining address bits. For example, recall that with an x2 parent port intereleave (IW == 1), the downstream decode for children of those ports will either see address bit IG+8 always set, or address bit IG+8 always clear. So if the child port needs to select a downstream port it can only use address bits starting at IG+9 (where IG and IW are the CXL encoded values for interleave granularity (ilog2(ig) - 8) and ways (ilog2(iw))). When the parent port interleave is x1 no such masking occurs and the child port can maintain the granularity that was routed to the parent port. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126583.1526540.657948655360009242.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Move HPA setup to cxl_region_attach()Dan Williams2-26/+24
A recent bug fix added the setup of the endpoint decoder interleave geometry settings to cxl_region_attach(). Move the HPA setup there as well to keep all endpoint decoder parameter setting in a central location. For symmetry, move endpoint HPA teardown to cxl_region_detach(), and for switches move HPA setup / teardown to cxl_port_{setup,reset}_targets(). Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126020.1526540.14701949254436069807.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix decoder interleave programmingDan Williams1-0/+3
Jonathan notes: "Curiously interleave ways = 1 for the EPs which is obviously wrong" ...while testing the latest CXL development branch on QEMU. It turns out the region creation process failed to program the endpoint decoders. This was missed because the default settings of x1 at 4K intereleave still results in the region appearing to function. Jonathan caught the bug by reverse mapping the translations that need to happen for the QEMU support. Link: https://lore.kernel.org/r/62e95fdf9f6e2_30440294e4@dwillia2-xfh.jf.intel.com.notmuch Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165951146336.967013.11160153960900111443.stgit@dwillia2-xfh.jf.intel.com Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: describe targets and nr_targets members of cxl_region_paramsBagas Sanjaya1-0/+2
Sphinx reported undescribed parameters in cxl_region_params struct: ./drivers/cxl/cxl.h:376: warning: Function parameter or member 'targets' not described in 'cxl_region_params' ./drivers/cxl/cxl.h:376: warning: Function parameter or member 'nr_targets' not described in 'cxl_region_params' Describe these members. Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220804075448.98241-3-bagasdotme@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/regions: add padding for cxl_rr_ep_add nested listsBagas Sanjaya1-0/+3
Sphinx reported indentation warnings: Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:732: WARNING: Unexpected indentation. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:733: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:735: WARNING: Unexpected indentation. These warnings above are due to missing blank line padding in the nested list in kernel-doc comment for cxl_rr_ep_add(). Add the paddings to fix the warnings. Fixes: 384e624bb211b4 ("cxl/region: Attach endpoint decoders") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220804075448.98241-2-bagasdotme@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix IS_ERR() vs NULL checkDan Carpenter1-2/+2
The nvdimm_pmem_region_create() function returns NULL on error. It does not return error pointers. Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/Yuo65lq2WtfdGJ0X@kili Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix region reference target accountingDan Williams1-28/+43
Dan reports: The error handling in cxl_port_attach_region() looks like it might have a similar bug. The cxl_rr->nr_targets++; might want a --. That function is more complicated. Indeed cxl_rr->nr_targets leaks when cxl_rr_ep_add() fails, but that flow is not clear. Fix the bug and the clarity by separating the 'new' region-reference case from the 'extend' region-reference case. This also moves the host-physical-address (HPA) validation, that the HPA of a new region being accounted to the port is greater than the HPA of all other regions associated with the port, to alloc_region_ref(). Introduce @nr_targets_inc to track when the error exit path needs to clean up cxl_rr->nr_targets. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/165939482134.252363.1915691883146696327.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix region commit uninitialized variable warningDan Williams1-17/+13
0day robot reports: drivers/cxl/core/region.c:196 cxl_region_decode_commit() error: uninitialized symbol 'rc'. The re-checking of loop termination conditions to determine "success" makes it hard to see that @rc is initialized in all cases. Remove those to make it explicit that @rc reflects a commit error and that the rest of logic is concerned with unwinding committed decoders. This change potentially results in cxl_region_decode_reset() being called with @count == 0 where it was not called before, but cxl_region_decode_reset() treats that as a nop. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/165951148105.967013.14191992449932268431.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix port setup uninitialized variable warningsDan Williams1-3/+22
0day robot reports: drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'eiw'. drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'peig'. drivers/cxl/core/region.c:1068 cxl_port_setup_targets() error: uninitialized symbol 'peiw'. ...which are all valid reports. Add debug statement to consume the, albeit unexpected, errors. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165951147487.967013.929590444907251028.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: Stop initializing interleave granularityDan Williams1-4/+0
In preparation for a patch that validates that the region ways setting is compatible with the granularity setting, the initial granularity setting needs to start at zero to indicate "unset". Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/165853777484.2430596.3423921169034844397.stgit@dwillia2-xfh.jf.intel.com [djbw: fix up unused variable] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetimeDan Williams1-2/+5
After adding support for emulating platform firmware established DPA reservations, the cxl-topology.sh [1] unit test started crashing with the following signature: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> __cxl_dpa_release+0x1b/0xd0 [cxl_core] cxl_dpa_release+0x1d/0x30 [cxl_core] release_nodes+0x63/0x90 devres_release_all+0x88/0xc0 ...i.e. a use after free of a 'struct cxl_endpoint_decoder' object. This results from the ordering of init_hdm_decoder() before add_hdm_decoder() where, at release time, the decoder is unregistered and released before the DPA reservation. Fix this by extending the life of the object until all DPA reservations have been released which also preserves platform decoder settings being settled by the time the decoder is published in sysfs (KOBJ_ADD time). Note that the @len == 0 case in __cxl_dpa_reserve() is avoided in practice as this function is only called for committed decoders and new non-zero DPA allocations. Link: https://github.com/pmem/ndctl/blob/pending/test/cxl-topology.sh [1] Fixes: 9c57cde0dcbd ("cxl/hdm: Enumerate allocated DPA") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/165896020625.3546860.12390103413706292760.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/acpi: Minimize granularity for x1 interleavesDan Williams2-0/+8
The kernel enforces that region granularity is >= to the top-level interleave-granularity for the given CXL window. However, when the CXL window interleave is x1, i.e. non-interleaved at the host bridge level, then the specified granularity does not matter. Override the window specified granularity to the CXL minimum so that any valid region granularity is >= to the root granularity. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com [djbw: add CXL_DECODER_MIN_GRANULARITY per vishal] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: Delete 'region' attribute from root decodersDan Williams1-1/+2
For switch and endpoint decoders the relationship of decoders to regions is 1:1. However, for root decoders the relationship is 1:N. Also, regions are already children of root decoders, so the 1:N relationship is observed by walking the following glob: /sys/bus/cxl/devices/$decoder/region* Hide the vestigial 'region' attribute for root decoders. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/165853776328.2430596.4647259305040072751.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/acpi: Autoload driver for 'cxl_acpi' test devicesDan Williams1-0/+7
In support of CXL unit tests in the ndctl project, arrange for the cxl_acpi driver to load in response to the registration of cxl_test devices. Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/165853775783.2430596.13637998086505316619.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: decrement ->nr_targets on error in cxl_region_attach()Dan Carpenter1-1/+3
The ++ needs a match -- on the clean up path. If the p->nr_targets value gets to be more than 16 it leads to uninitialized data in cxl_port_setup_targets(). drivers/cxl/core/region.c:995 cxl_port_setup_targets() error: uninitialized symbol 'eiw'. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuepCvUAoCtdpcoO@kili Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: prevent underflow in ways_to_cxl()Dan Carpenter2-3/+4
The "ways" variable comes from the user. The ways_to_cxl() function has an upper bound but it doesn't check for negatives. Make the "ways" variable an unsigned int to fix this bug. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yueo3NV2hFCXx1iV@kili [djbw: fixup interleave_ways_store() to only accept unsigned input] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: uninitialized variable in alloc_hpa()Dan Carpenter1-1/+1
This should check "p->res" instead of "res" (which is uninitialized). Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yueor88I/DkVSOtL@kili Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Introduce cxl_pmem_region objectsDan Williams6-5/+420
The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/pmem: Fix offline_nvdimm_bus() to offline by bridgeDan Williams2-4/+18
Be careful to only disable cxl_pmem objects related to a given cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL domains and disables more than is expected. Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784339569.1758207.1557084545278004577.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Add region driver boiler plateDan Williams4-1/+66
The CXL region driver is responsible for routing fully formed CXL regions to one of libnvdimm, for persistent memory regions, device-dax for volatile memory regions, or just act as an enumeration placeholder if the region was setup and configuration locked by platform firmware. In the platform-firmware-setup case the expectation is that region is already accounted in the system memory map, i.e. already enabled as "System RAM". For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and take no further action. Given this driver is just a small / simple router, include it in the core rather than its own module. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/hdm: Commit decoder state to hardwareDan Williams4-11/+424
After all the soft validation of the region has completed, convey the region configuration to hardware while being careful to commit decoders in specification mandated order. In addition to programming the endpoint decoder base-address, interleave ways and granularity, the switch decoder target lists are also established. While the kernel can enforce spec-mandated commit order, it can not enforce spec-mandated reset order. For example, the kernel can't stop someone from removing an endpoint device that is occupying decoderN in a switch decoder where decoderN+1 is also committed. To reset decoderN, decoderN+1 must be torn down first. That "tear down the world" implementation is saved for a follow-on patch. Callback operations are provided for the 'commit' and 'reset' operations. While those callbacks may prove useful for CXL accelerators (Type-2 devices with memory) the primary motivation is to enable a simple way for cxl_test to intercept those operations. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Program target listsDan Williams4-11/+259
Once the region's interleave geometry (ways, granularity, size) is established and all the endpoint decoder targets are assigned, the next phase is to program all the intermediate decoders. Specifically, each CXL switch in the path between the endpoint and its CXL host-bridge (including the logical switch internal to the host-bridge) needs to have its decoders programmed and the target list order assigned. The difficulty in this implementation lies in determining which endpoint decoder ordering combinations are valid. Consider the cxl_test case of 2 host bridges, each of those host-bridges attached to 2 switches, and each of those switches attached to 2 endpoints for a potential 8-way interleave. The x2 interleave at the host-bridge level requires that all even numbered endpoint decoder positions be located on the "left" hand side of the topology tree, and the odd numbered positions on the other. The endpoints that are peers on the same switch need to have a position that can be routed with a dedicated address bit per-endpoint. See check_last_peer() for the details. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Attach endpoint decodersDan Williams5-12/+394
CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/acpi: Add a host-bridge index lookup mechanismDan Williams2-0/+18
The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple methods to determine which host bridge provides access to a given endpoint relative to that device's position in the interleave. The "Interleave Arithmetic" defines either a "standard modulo" / round-random algorithm, or "xormap" based algorithm which can be defined as a non-linear transform. Given that there are already more options beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL specific, provide a callback for the region provisioning code to map endpoint positions back to expected host bridge id (cxl_dport target). For now just support the simple modulo math case and save the xormap for a follow-on change. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-14-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Enable the assignment of endpoint decoders to regionsDan Williams5-2/+321
The region provisioning process involves allocating DPA to a set of endpoint decoders, and HPA plus the region geometry to a region device. Then the decoder is assigned to the region. At this point several validation steps can be performed to validate that the decoder is suitable to participate in the region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Allocate HPA capacity to regionsDan Williams3-1/+154
After a region's interleave parameters (ways and granularity) are set, add a way for regions to allocate HPA (host physical address space) from the free capacity in their parent root-decoder. The allocator for this capacity reuses the 'struct resource' based allocator used for CONFIG_DEVICE_PRIVATE. Once the tuple of "ways, granularity, [uuid], and size" is set the region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE state which is a precursor to allowing endpoint decoders to be added to a region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Add interleave geometry attributesBen Widawsky2-0/+167
Add ABI to allow the number of devices that comprise a region to be set as well as the interleave granularity for the region. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: reword changelog] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Add a 'uuid' attributeBen Widawsky2-0/+143
The process of provisioning a region involves triggering the creation of a new region object, pouring in the configuration, and then binding that configured object to the region driver to start its operation. For persistent memory regions the CXL specification mandates that it identified by a uuid. Add an ABI for userspace to specify a region's uuid. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: simplify locking] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/region: Add region creation supportBen Widawsky6-0/+274
CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/mem: Enumerate port targets before adding endpointsDan Williams3-29/+47
The port scanning algorithm in devm_cxl_enumerate_ports() walks up the topology and adds cxl_port objects starting from the root down to the endpoint. When those ports are initially created they know all their dports, but they do not know the downstream cxl_port instance that represents the next descendant in the topology. Rework create_endpoint() into devm_cxl_add_endpoint() that enumerates the downstream cxl_port topology into each port's 'struct cxl_ep' record for each endpoint it that the port is an ancestor. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-7-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/hdm: Add sysfs attributes for interleave ways + granularityBen Widawsky1-0/+23
The region provisioning flow involves selecting interleave ways + granularity settings for a region, and then programming the decoder topology to meet those constraints, if possible. For example, root decoders set the minimum interleave ways + granularity for any hosted regions. Given decoder programming is not atomic and collisions can occur between multiple requesting regions userspace will be responsible for conflict resolution and it needs these attributes to make those decisions. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784332235.1758207.7185062713652694607.stgit@dwillia2-xfh.jf.intel.com [djbw: reword changelog, make read-only, add sysfs ABI documentaion] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/port: Move dport tracking to an xarrayDan Williams3-56/+47
Reduce the complexity and the overhead of walking the topology to determine endpoint connectivity to root decoder interleave configurations. Note that cxl_detach_ep(), after it determines that the last @ep has departed and decides to delete the port, now needs to walk the dport array with the device_lock() held to remove entries. Previously list_splice_init() could be used atomically delete all dport entries at once and then perform entry tear down outside the lock. There is no list_splice_init() equivalent for the xarray. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784331647.1758207.6345820282285119339.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/port: Move 'cxl_ep' references to an xarray per portDan Williams2-34/+30
In preparation for region provisioning that needs to walk the topology by endpoints, use an xarray to record endpoint interest in a given port. In addition to being more space and time efficient it also reduces the complexity of the implementation by moving locking internal to the xarray implementation. It also allows for a single cxl_ep reference to be recorded in multiple xarrays. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-2-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/port: Record parent dport when adding portsDan Williams4-20/+27
At the time that cxl_port instances are being created, cache the dport from the parent port that points to this new child port. This will be useful for region provisioning when walking the tree to calculate decoder targets, and saves rewalking the dport list after the fact to build this information. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-1-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/port: Record dport in endpoint referencesDan Williams2-17/+37
Recall that the primary role of the cxl_mem driver is to probe if the given endpoint is connected to a CXL port topology. In that process it walks its device ancestry to its PCI root port. If that root port is also a CXL root port then the probe process adds cxl_port object instances at switch in the path between to the root and the endpoint. As those cxl_port instances are added, or if a previous enumeration attempt already created the port, a 'struct cxl_ep' instance is registered with that port to track the endpoints interested in that port. At the time the cxl_ep is registered the downstream egress path from the port to the endpoint is known. Take the opportunity to record that information as it will be needed for dynamic programming of decoder targets during region provisioning. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784329944.1758207.15203961796832072116.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/hdm: Add support for allocating DPA to an endpoint decoderDan Williams3-1/+259
The region provisioning flow will roughly follow a sequence of: 1/ Allocate DPA to a set of decoders 2/ Allocate HPA to a region 3/ Associate decoders with a region and validate that the DPA allocations and topologies match the parameters of the region. For now, this change (step 1) arranges for DPA capacity to be allocated and deleted from non-committed decoders based on the decoder's mode / partition selection. Capacity is allocated from the lowest DPA in the partition and any 'pmem' allocation blocks out all remaining ram capacity in its 'skip' setting. DPA allocations are enforced in decoder instance order. I.e. decoder N + 1 always starts at a higher DPA than instance N, and deleting allocations must proceed from the highest-instance allocated decoder to the lowest. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784329399.1758207.16732038126938632700.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/hdm: Track next decoder to allocateDan Williams3-0/+18
The CXL specification enforces that endpoint decoders are committed in hw instance id order. In preparation for adding dynamic DPA allocation, record the hw instance id in endpoint decoders, and enforce allocations to occur in hw instance id order. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784328827.1758207.9627538529944559954.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/hdm: Add 'mode' attribute to decoder objectsDan Williams3-0/+39
Recall that the Device Physical Address (DPA) space of a CXL Memory Expander is potentially partitioned into a volatile and persistent portion. A decoder maps a Host Physical Address (HPA) range to a DPA range and that translation depends on the value of all previous (lower instance number) decoders before the current one. In preparation for allowing dynamic provisioning of regions, decoders need an ABI to indicate which DPA partition a decoder targets. This ABI needs to be prepared for the possibility that some other agent committed and locked a decoder that spans the partition boundary. Add 'decoderX.Y/mode' to endpoint decoders that indicates which partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder currently spans the partition boundary. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165603881967.551046.6007594190951596439.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>