summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2013-02-21i5100_edac: connect fault injection to debugfs nodeNiklas Söderlund1-1/+70
Create a debugfs direcotry i5100_edac/mcX for each memory controller and add nodes to control how fault injection is preformed. After configuring an injection using inject_channel, inject_deviceptr1, inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel trigger the injection by writing anything to inject_enable. Example of a CE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Example of UE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2 echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1 echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Sometimes it is needed to enable the injection more then once (echo to the inject_enable node) for the injection to happen, I am not sure why. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21i5100_edac: add fault injection codeNiklas Söderlund1-0/+87
Add fault injection based on information datasheet for i5100, see 1. In addition to the i5100 datasheet some missing information on injection functions where found through experimentation and the i7300 datasheet, see 2. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf [2] Intel 7300 Chipset MemoryController Hub (MCH) Doc.Nr: 318082 http://www.intel.com/assets/pdf/datasheet/318082.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21i5100_edac: probe for device 19 function 0Niklas Söderlund1-1/+27
Probe and store the device handle for the device 19 function 0 during driver initialization. The device is used during fault injection. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21edac: only create sdram_scrub_rate where supportedMauro Carvalho Chehab1-10/+19
Currently, sdram_scrub_rate sysfs node is created even if the device doesn't support get/set the scub rate. Change the logic to only create this device node when the operation is supported. Reported-by: Felipe Balbi <balbi@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21i3200_edac: Fix the logic that detects filled memoriesMauro Carvalho Chehab1-9/+12
After running a series of tests on an HP DL320, filled with different memory sizes, it was noticed that, when filled with just one DIMM on such hardware, the driver wrongly detects twice the memory, and thinks that both channels 0 and 1 are filled. It seems to be partially caused by the BIOS and partially by the driver. The i3200_edac current logic would be working fine if the BIOS were disabling the unused second channel when just one DIMM is connected, in order to do power-saving, as recommended on this chipset's datasheet. However, the BIOS on this particular machine doesn't do it: [ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode [ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled So, the driver were assuming that 2 channels are enabled (well, they are, but the second is unused). Combined with that, I found two issues at the logic that creates the EDAC data, that were failing when the two channels are not equally filled (AFAICT, that happens only when just 1 DIMM is plugged). The first one is that a 0 at DRB means that nothing is filled. The driver's logic, however, do some calculation with that. The second one is that the logic that fills the DIMM data currently assumes that both channels are equally filled. I tested the system already with the current configuration and my patch and it is now working fine. So, for a 2R single DIMM 2Gb memory at dimm slot 01 (channel 0), it is now displaying: [ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0 [ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0 [ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0 [ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0 ... [ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb [ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb and the corresponding sysfs nodes are now properly filled. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21i3200_edac: Add more debug to the driverMauro Carvalho Chehab1-2/+14
Currently, it is not possible to know, when debug is enabled, if the driver is using 2 DIMMS per channel mode or not. It is not possible to know the values of the drbs registers, used to identify the memory rank sizes. Add debug for both, as it helps to track issues on the driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20Merge tag 'v3.8-rc7' into nextMauro Carvalho Chehab36-144/+802
Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...
2013-01-30EDAC: Fix kcalloc argument orderJoe Perches1-3/+3
First number, then size. Signed-off-by: Joe Perches <joe@perches.com> Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-30EDAC: Test correct variable in ->store functionDan Carpenter1-1/+1
We're testing for ->show but calling ->store(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-09Merge tag 'edac_fixes_for_3.8' of ↵Linus Torvalds2-18/+9
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fixes from Borislav Petkov: "Two error path fixes causing a crash and a Kconfig fix for an issue which spilled all EDAC suboptions into the 'Device Drivers' menu." * tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Cleanup device deregistering path EDAC: Fix EDAC Kconfig menu EDAC: Fix kernel panic on module unloading
2013-01-07EDAC: Cleanup device deregistering pathLans Zhang1-12/+6
Use device_unregister to replace put_device + device_del for cleanup, and fix the potential use after free. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07EDAC: Fix EDAC Kconfig menuBorislav Petkov1-5/+3
After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering the "Device Drivers" toplevel menu in menuconfig, the suboptions behind EDAC appeared merged with the rest of the device drivers types. This was because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig bool which was defined after the menu definition. When pushing EDAC_SUPPORT up, before the menu definition, the variable is defined earlier and the above menuconfig artifact doesn't happen. Drop a useless menuconfig comment while at it. Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07EDAC: Fix kernel panic on module unloadingKonstantin Khlebnikov1-2/+1
This patch fixes use-after-free and double-free bugs in edac_mc_sysfs_exit(). mci_pdev has single reference and put_device() calls mc_attr_release() which calls kfree(). The following device_del() works with already released memory. An another kfree() in edac_mc_sysfs_exit() releses the same memory again. Great. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: stable@vger.kernel.org # 3.[67] Cc: Denis Kirjanov <kirjanov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-03Drivers: edac: remove __dev* attributes.Greg Kroah-Hartman31-128/+111
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-21i7core_edac: fix kernel crash on unloading i7core_edac.Lans Zhang1-1/+1
It is easy to trigger this crash on 3.7.0: root@intel_westmere_ep-3:~# modprobe -r i7core_edac EDAC PCI: Removed device 0 for i7core_edac EDAC PCI controller: DEV 0000:fe:03.0 EDAC MC: Removed device 1 for i7core_edac.c i7 core #1: DEV 0000:fe:03.0 EDAC PCI: Removed device 1 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0 EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0 BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 IP: [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 PGD 1eaae7067 PUD 1e96e4067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: minix acpi_cpufreq freq_table mperf ioatdma processor edac_core(-) iTCO_wdt coretemp evdev hwmon lpc_ich dca mfd_core crc32c_intel ioapic [last unloaded: i7core_edac] CPU 3 Pid: 1268, comm: modprobe Not tainted 3.7.0-WR5.0.1.0_standard+ #30 Intel Corporation S5520HC/S5520HC RIP: 0010:[<ffffffff82069ee9>] [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 RSP: 0018:ffff8801eb12de28 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000000f0 RCX: 00000000ffffffff RDX: ffff88012b452800 RSI: 0000000000000002 RDI: 00000000000000f0 RBP: ffff8801eb12de68 R08: 0000000000000000 R09: ffffea0004ad1118 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8801eb12dee8 R14: ffff88012b452800 R15: 000000000060e518 FS: 00007f9ea95a9700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000110 CR3: 00000001262f1000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1268, threadinfo ffff8801eb12c000, task ffff8801e8421690) Stack: ffff88012c802a00 ffff88012b445ec0 ffff88012c802300 ffff88012b452800 0000000000000000 ffff8801eb12dee8 000000000060e080 000000000060e518 ffff8801eb12de78 ffffffff82069f56 ffff8801eb12dea8 ffffffff824ead7c Call Trace: [<ffffffff82069f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff824ead7c>] device_del+0x3c/0x1d0 [<ffffffffa00095a8>] edac_mc_sysfs_exit+0x1c/0x2f [edac_core] [<ffffffffa000961c>] edac_exit+0x4f/0x56 [edac_core] [<ffffffff820a3d2a>] sys_delete_module+0x17a/0x240 [<ffffffff8212da7c>] ? vm_munmap+0x5c/0x80 [<ffffffff82877682>] system_call_fastpath+0x16/0x1b Code: 90 90 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 31 c0 49 89 d6 48 89 fb <48> 8b 57 20 49 89 f5 41 89 cf 4c 8d 67 20 48 85 d2 74 2c 4c 89 RIP [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 RSP <ffff8801eb12de28> CR2: 0000000000000110 ---[ end trace b69acf12ccad1c0d ]--- Usually, edac_subsys is grabbed one time by pci at initialization. But edac_subsys may be released several times if multiple pci MCs exist. The fix just makes the operations balanced. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21i7core_edac: fix erroneous size of static arrayNiklas Söderlund1-4/+4
Remove size from lookup arrays and mark them as const. Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Niklas Söderlund <niso@kth.se> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21sb_edac: add a missing /n on a debug messageMauro Carvalho Chehab1-1/+1
[ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-21edac: edac_mc no longer deals with kobjects directlyShaun Ruffell1-7/+0
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and a comment that does not reflect the code anymore. Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-12-14Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds6-1/+685
Pull MIPS updates from Ralf Baechle: "The MIPS bits for 3.8. This also includes a bunch fixes that were sitting in the linux-mips.org git tree for a long time. This pull request contains updates to several OCTEON drivers and the board support code for BCM47XX, BCM63XX, XLP, XLR, XLS, lantiq, Loongson1B, updates to the SSB bus support, MIPS kexec code and adds support for kdump. When pulling this, there are two expected merge conflicts in include/linux/bcma/bcma_driver_chipcommon.h which are trivial to resolve, just remove the conflict markers and keep both alternatives." * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (90 commits) MIPS: PMC-Sierra Yosemite: Remove support. VIDEO: Newport Fix console crashes MIPS: wrppmc: Fix build of PCI code. MIPS: IP22/IP28: Fix build of EISA code. MIPS: RB532: Fix build of prom code. MIPS: PowerTV: Fix build. MIPS: IP27: Correct fucked grammar in ops-bridge.c MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled MIPS: Fix potencial corruption MIPS: Fix for warning from FPU emulation code MIPS: Handle COP3 Unusable exception as COP1X for FP emulation MIPS: Fix poweroff failure when HOTPLUG_CPU configured. MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y MIPS: Remove unused smvp.h MIPS/EDAC: Improve OCTEON EDAC support. MIPS: OCTEON: Add definitions for OCTEON memory contoller registers. MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian. MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree. MIPS: Remove usage of CEVT_R4K_LIB config option. ...
2012-12-13MIPS/EDAC: Improve OCTEON EDAC support.David Daney5-321/+348
Some initialization errors are reported with the existing OCTEON EDAC support patch. Also some parts have more than one memory controller. Fix the errors and add multiple controllers if present. Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-12MIPS: Cavium: Add EDAC support.Ralf Baechle7-1/+658
Drivers for EDAC on Cavium. Supported subsystems are: o CPU primary caches. These are parity protected only, so only error reporting. o Second level cache - ECC protected, provides SECDED. o Memory: ECC / SECDEC if used with suitable DRAM modules. The driver will will only initialize if ECC is enabled on a system so is safe to run on non-ECC memory. o PCI: Parity error reporting Since it is very hard to test this sort of code the implementation is very conservative and uses polling where possible for now. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
2012-12-11Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds12-446/+451
Pull EDAC fixes from Borislav Petkov: - EDAC core error path fix, from Denis Kirjanov. - Generalization of AMD MCE bank names and some minor error reporting improvements. - EDAC core cleanups and simplifications, from Wei Yongjun. - amd64_edac fixes for sysfs-reported values, from Josh Hunt. - some heavy amd64_edac error reporting path shaving, leading to removing a bunch of code. - amd64_edac error injection method improvements. - EDAC core cleanups and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits) EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code EDAC: Handle error path in edac_mc_sysfs_init() properly MCE, AMD: Dump error status MCE, AMD: Report decoded error type first MCE, AMD: Dump CPU f/m/s triple with the error MCE, AMD: Remove functional unit references EDAC: Convert to use simple_open() EDAC, Calxeda highbank: Convert to use simple_open() EDAC: Fix mc size reported in sysfs EDAC: Fix csrow size reported in sysfs EDAC: Pass mci parent EDAC: Add memory controller flags amd64_edac: Fix csrows size and pages computation amd64_edac: Use DBAM_DIMM macro amd64_edac: Fix K8 chip select reporting amd64_edac: Reorganize error reporting path amd64_edac: Do not check whether error address is valid amd64_edac: Improve error injection amd64_edac: Cleanup error injection code amd64_edac: Small fixlets and cleanups ...
2012-12-04EDAC, pci_sysfs: Use for_each_pci_dev to simplify the codeWei Yongjun1-8/+4
Use for_each_pci_dev to simplify the code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> [Boris: cleanup comments and drop loop brackets] Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edacLinus Torvalds4-17/+22
Pull EDAC fixes from Mauro Carvalho Chehab: "One EDAC core fix, and a few driver fixes (i7300, i9275x, i7core)." * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: i7core_edac: fix panic when accessing sysfs files i7300_edac: Fix error flag testing edac: Fix the dimm filling for csrows-based layouts i82975x_edac: Fix dimm label initialization
2012-11-28EDAC: Handle error path in edac_mc_sysfs_init() properlyDenis Kirjanov1-2/+15
Make sure proper deregistration happens on all error paths in edac_mc_sysfs_init. Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> [ Boris: cleanup and concretize commit message ] Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28MCE, AMD: Dump error statusBorislav Petkov2-6/+22
Dump error status after decoding the error which describes the error disposition. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Report decoded error type firstBorislav Petkov1-25/+25
Instead of starting with the error details, report the decoded, readable error type first. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Dump CPU f/m/s triple with the errorBorislav Petkov1-4/+6
It is very useful to have the family/model/stepping with the reported error so dump it. This saves us asking the bug reporter about it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28MCE, AMD: Remove functional unit referencesBorislav Petkov2-97/+94
Having the functional unit names in each bank decode is only misleading as this code supports multiple families and there's no guarantee the mapping between FUs and MCE banks will stay the same. And also, knowing the functional unit name doesn't help much since you end up looking at the respective BKDG anyway. So drop all FU references and use the MC bank numbers instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Convert to use simple_open()Wei Yongjun1-7/+1
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC, Calxeda highbank: Convert to use simple_open()Wei Yongjun1-7/+1
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix mc size reported in sysfsJosh Hunt1-4/+8
This is the complement to previous commit "EDAC: Fix csrow size reported in sysfs". This fixes the memory controller size reporting on csrow-based memory controllers. The csrow size is already combined for both channels. Without this patch memory size is reported doubled. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix csrow size reported in sysfsBorislav Petkov2-0/+4
On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Pass mci parentBorislav Petkov1-0/+1
Initialize the mem_ctl_info descriptor of a csrow properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Add memory controller flagsBorislav Petkov1-0/+1
The first flag is ->csbased and will be used in common EDAC code later. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Fix csrows size and pages computationBorislav Petkov1-21/+27
Make sure code pays attention to K8 having only one DCT, reformat and cleanup code, correct debug messages, remove unused code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Use DBAM_DIMM macroBorislav Petkov2-2/+2
Instead of open-coding it, use the DBAM_DIMM macro in amd64_csrow_nr_pages() which we have already. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Fix K8 chip select reportingBorislav Petkov1-6/+3
This basically reverts 603adaf6b3e3 ("amd64_edac: fix K8 chip select reporting") because it was a clumsy workaround for DIMM sizes reporting on K8 which got superceded by a much more correct one with 41d8bfaba70 ("amd64_edac: Improve DRAM address mapping") without removing the prior one. Remove it now finally. Reported-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Reorganize error reporting pathBorislav Petkov2-124/+89
Rewrite CE/UE paths so that they use the same code and drop additional code duplication in handle_ue. Add a struct err_info which collects required info for the error reporting. This, in turn, helps slimming all edac_mc_handle_error() calls down to one. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Do not check whether error address is validBorislav Petkov1-21/+0
All families report a valid error address when encountering a DRAM ECC error so no need to check it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Improve error injectionBorislav Petkov3-10/+41
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine does this by injecting the error in the next non-cached access. This takes relatively long time on a normal system so that in order for us to expedite it, we disable the caches around the injection. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Cleanup error injection codeBorislav Petkov2-69/+60
Invert kstrtoul return value testing and win one indentation level. Also, shorten up macro names so that the lines can fit into 80 cols. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28amd64_edac: Small fixlets and cleanupsBorislav Petkov1-8/+5
amd64_get_dram_hole_info: remove local variable 'base'. sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize constants formatting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Handle empty msg strings when reporting errorsBorislav Petkov1-19/+25
A reported error could look like this [ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6) with two spaces back-to-back due to the msg argument of edac_mc_handle_error being passed on empty by the specific drivers. Handle that. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Remove useless assignment of error typeBorislav Petkov1-5/+2
The tracepoint decodes the error type later anyway so remove a useless assignment to the temporary p which gets overwritten later anyway. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Boundary-check edac_debug_levelBorislav Petkov2-11/+24
Only levels [0:4] are allowed so enforce that. Also, while at it, massage Kconfig text and add valid debug levels range to the module parameter description. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Respect operational state in edac_pci.cBorislav Petkov1-1/+2
Currently, we unconditionally enable PCI polling and we don't look at the edac_op_state module parameter. Make this dependent on the parameter setting supplied on the command line. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28i7core_edac: fix panic when accessing sysfs filesPrarit Bhargava1-3/+3
The i7core_edac addrmatch_dev and chancounts_dev have sysfs files associated with them. The sysfs files, however, are coded so that the parent device is is the mci device. This is incorrect and the mci struct should be obtained through the addrmatch_dev and chancounts_dev device's private data field which is populated in i7core_create_sysfs_devices(). Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-10-30EDAC: Change Boris' email addressBorislav Petkov3-4/+4
My @amd.com address will be invalid soon so move to private email address. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1351532410-4887-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-25i7300_edac: Fix error flag testingJean Delvare1-4/+4
* Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so that the callers get the result they expect. * Fix definition of FERR_FAT_FBD_ERR_MASK. * Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on register FERR_NF_FBD. We were lucky they have the same definition. This fixes kernel bug #44131: https://bugzilla.kernel.org/show_bug.cgi?id=44131 Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>