summaryrefslogtreecommitdiff
path: root/kernel/kexec.c
AgeCommit message (Collapse)AuthorFilesLines
2009-05-24PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUsRafael J. Wysocki1-2/+0
We shouldn't hold dpm_list_mtx while executing [disable|enable]_nonboot_cpus(), because theoretically this may lead to a deadlock as shown by the following example (provided by Johannes Berg): CPU 3 CPU 2 CPU 1 suspend/hibernate something: rtnl_lock() device_pm_lock() -> mutex_lock(&dpm_list_mtx) mutex_lock(&dpm_list_mtx) linkwatch_work -> rtnl_lock() disable_nonboot_cpus() -> flush CPU 3 workqueue Fortunately, device drivers are supposed to stop any activities that might lead to the registration of new device objects way before disable_nonboot_cpus() is called, so it shouldn't be necessary to hold dpm_list_mtx over the entire late part of device suspend and early part of device resume. Thus, during the late suspend and the early resume of devices acquire dpm_list_mtx only when dpm_list is going to be traversed and release it right after that. This patch is reported to fix the regressions tracked as http://bugzilla.kernel.org/show_bug.cgi?id=13245. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Miles Lane <miles.lane@gmail.com> Tested-by: Ming Lei <tom.leiming@gmail.com>
2009-04-02kexec: vmcoreinfo_data[] can become staticDmitri Vorobiev1-1/+1
The vmcoreinfo_data[] array is not used outside of kernel/kexec.c, and can therefore become static. This patch adds the relevant keyword to the definition of the array. Noticed by sparse. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02kexec: add dmesg log symbols to /proc/vmcoreinfo listsNeil Horman1-0/+1
It would be nice to be able to extract the dmesg log from a vmcore file without needing to keep the debug symbols for the running kernel handy all the time. We have a facility to do this in /proc/vmcore. This patch adds the log_buf and log_end symbols to the vmcoreinfo area so that tools (like makedumpfile) can easily extract the dmesg logs from a vmcore image. [akpm@linux-foundation.org: several fixes and cleanups] [akpm@linux-foundation.org: fix unused log_buf_kexec_setup()] [akpm@linux-foundation.org: build fix] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30kexec: Change kexec jump code orderingRafael J. Wysocki1-10/+9
Change the ordering of the kexec jump code so that the nonboot CPUs are disabled after calling device drivers' "late suspend" methods. This change reflects the recent modifications of the power management code that is also used by kexec jump. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-30PM: Rework handling of interrupts during suspend-resumeRafael J. Wysocki1-4/+4
Use the functions introduced in by the previous patch, suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(), to rework the handling of interrupts during suspend (hibernation) and resume. Namely, interrupts will only be disabled on the CPU right before suspending sysdevs, while device drivers will be prevented from receiving interrupts, with the help of the new helper function, before their "late" suspend callbacks run (and analogously during resume). In addition, since the device interrups are now disabled before the CPU has turned all interrupts off and the CPU will ACK the interrupts setting the IRQ_PENDING bit for them, check in sysdev_suspend() if any wake-up interrupts are pending and abort suspend if that's the case. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-02-22Merge branch 'linus' into x86/apicIngo Molnar1-0/+7
Conflicts: arch/x86/mach-default/setup.c Semantic conflict resolution: arch/x86/kernel/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22PM: Split up sysdev_[suspend|resume] from device_power_[down|up]Rafael J. Wysocki1-0/+7
Move the sysdev_suspend/resume from the callee to the callers, with no real change in semantics, so that we can rework the disabling of interrupts during suspend/hibernation. This is based on an earlier patch from Linus. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-10elf: add ELF_CORE_COPY_KERNEL_REGS()Tejun Heo1-1/+1
ELF core dump is used for both user land core dump and kernel crash dump. Depending on architecture, register might need to be accessed differently for userland and kernel. Allow architectures to define ELF_CORE_COPY_KERNEL_REGS() and use different operation for kernel register dump. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14[CVE-2009-0029] System call wrappers part 07Heiko Carstens1-3/+2
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2009-01-01cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: coreRusty Russell1-1/+1
Impact: cleanup In future, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: James Morris <jmorris@namei.org> Cc: Eric Biederman <ebiederm@xmission.com>
2008-10-20kexec: fix crash_save_vmcoreinfo_init build problemLuck, Tony1-0/+1
This fixes kernel/kexec.c: In function 'crash_save_vmcoreinfo_init': kernel/kexec.c:1374: error: 'vmlist' undeclared (first use in this function) kernel/kexec.c:1374: error: (Each undeclared identifier is reported only once kernel/kexec.c:1374: error: for each function it appears in.) kernel/kexec.c:1410: error: invalid use of undefined type 'struct vm_struct' make[1]: *** [kernel/kexec.o] Error 1 Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20kdump: add vmlist.addr to vmcoreinfo for x86 vmalloc translation.Ken'ichi Ohmichi1-0/+2
Add the symbols 'vmlist' and offset 'vm_struct.addr' to the vmcoreinfo[1] data for i386 vmalloc translation. makedumpfile[2] needs VMALLOC_START value for distinguishing a vmalloc address or not, because it should choose suitable translation method. If applying this patch, makedumpfile will be able to take VMALLOC_START value from 'vmlist.addr'. vmcoreinfo[1]: The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile[2] uses it to distinguish unnecessary pages and creates a small dumpfile. makedumpfile[2]: dump filtering command https://sourceforge.net/projects/makedumpfile/ Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-23kexec: fix segmentation fault in kimage_add_entryJonathan Steel1-1/+7
A segmentation fault can occur in kimage_add_entry in kexec.c when loading a kernel image into memory. The fault occurs because a page is requested by calling kimage_alloc_page with gfp_mask GFP_KERNEL and the function may actually return a page with gfp_mask GFP_HIGHUSER. The high mem page is returned because it was swapped with the kernel page due to the kernel page being a page that will shortly be copied to. This patch ensures that kimage_alloc_page returns a page that was created with the correct gfp flags. I have verified the change and fixed the whitespace damage of the original patch. Jonathan did a great job of tracking this down after he hit the problem. -- Eric Signed-off-by: Jonathan Steel <jon.steel@esentire.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec: use a mutex for locking rather than xchg()Andrew Morton1-24/+10
Functionally the same, but more conventional. Cc: Huang Ying <ying.huang@intel.com> Tested-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec jump: fix for ftraceHuang Ying1-2/+0
Ftrace depends on some processor state that we destroyed during kexec and restored by restore_processor_state(). So save_processor_state() and restore_processor_state() are moved into machine_kexec() and ftrace is restored after restore_processor_state(). Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec jump: in sync with hibernation implementationHuang Ying1-0/+2
Add device_pm_lock() and device_pm_unlock() in kernel_kexec() in sync with current hibernation implementation. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec jump: remove duplication of kexec_restart_prepare()Huang Ying1-5/+1
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the code. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZEHuang Ying1-3/+3
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control page is used for not only code on some platform. For example in kexec jump, it is used for data and stack too. [akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion] Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec jump: clean up #ifdef and commentsHuang Ying1-9/+8
Move if (kexec_image->preserve_context) { ... } into #ifdef CONFIG_KEXEC_JUMP to make code looks cleaner. Fix no longer correct comments of kernel_kexec(). Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec()Huang Ying1-1/+2
kernel/kexec.c: In function 'kernel_kexec': kernel/kexec.c:1506: warning: value computed is not used Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26kexec jump: save/restore device stateHuang Ying1-0/+39
This patch implements devices state save/restore before after kexec. This patch together with features in kexec_jump patch can be used for following: - A simple hibernation implementation without ACPI support. You can kexec a hibernating kernel, save the memory image of original system and shutdown the system. When resuming, you restore the memory image of original system via ordinary kexec load then jump back. - Kernel/system debug through making system snapshot. You can make system snapshot, jump back, do some thing and make another system snapshot. - Cooperative multi-kernel/system. With kexec jump, you can switch between several kernels/systems quickly without boot process except the first time. This appears like swap a whole kernel/system out/in. - A general method to call program in physical mode (paging turning off). This can be used to invoke BIOS code under Linux. The following user-space tools can be used with kexec jump: - kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 - makedumpfile with patches are used as memory image saving tool, it can exclude free pages from original kernel memory image file. The patches and the precompiled makedumpfile can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10 - An initramfs image can be used as the root file system of kexeced kernel. An initramfs image built with "BuildRoot" can be downloaded from the following URL: initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz All user space tools above are included in the initramfs image. Usage example of simple hibernation: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_RELOCATABLE=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y CONFIG_PM=y CONFIG_HIBERNATION=y CONFIG_KEXEC_JUMP=y 2. Build an initramfs image contains kexec-tool and makedumpfile, or download the pre-built initramfs image, called rootfs.gz in following text. 3. Prepare a partition to save memory image of original kernel, called hibernating partition in following text. 4. Boot kernel compiled in step 1 (kernel A). 5. In the kernel A, load kernel compiled in step 1 (kernel B) with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 --mem-max=0xffffff --initrd=rootfs.gz 6. Boot the kernel B with following shell command line: /sbin/kexec -e 7. The kernel B will boot as normal kexec. In kernel B the memory image of kernel A can be saved into hibernating partition as follow: jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='` echo $jump_back_entry > kexec_jump_back_entry cp /proc/vmcore dump.elf Then you can shutdown the machine as normal. 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as root file system. 9. In kernel C, load the memory image of kernel A as follow: /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf 10. Jump back to the kernel A as follow: /sbin/kexec -e Then, kernel A is resumed. Implementation point: To support jumping between two kernels, before jumping to (executing) the new kernel and jumping back to the original kernel, the devices are put into quiescent state, and the state of devices and CPU is saved. After jumping back from kexeced kernel and jumping to the new kernel, the state of devices and CPU are restored accordingly. The devices/CPU state save/restore code of software suspend is called to implement corresponding function. Known issues: - Because the segment number supported by sys_kexec_load is limited, hibernation image with many segments may not be load. This is planned to be eliminated by adding a new flag to sys_kexec_load to make a image can be loaded with multiple sys_kexec_load invoking. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26kexec jumpHuang Ying1-0/+57
This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26kernel/kexec.c: make 'kimage_terminate' voidWANG Cong1-6/+2
Since kimage_terminate() always returns 0, make it void. Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01kexec: make extended crashkernel= syntax less confusingMichael Ellerman1-1/+1
The extended crashkernel syntax is a little confusing in the way it handles ranges. eg: crashkernel=512M-2G:64M,2G-:128M Means if the machine has between 512M and 2G of memory the crash region should be 64M, and if the machine has 2G of memory the region should be 64M. Only if the machine has more than 2G memory will 128M be allocated. Although that semantic is correct, it is somewhat baffling. Instead I propose that the end of the range means the first address past the end of the range, ie: 512M up to but not including 2G. [bwalle@suse.de: clarify inclusive/exclusive in crashkernel commandline in documentation] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Bernhard Walle <bwalle@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28vmcoreinfo: add page flags valuesKen'ichi Ohmichi1-0/+3
Add some values of page flags to the vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. An old makedumpfile (v1.2.4 or before) had assumed some values of page flags internally, and this implementation could not follow the change of these values. For example, Christoph Lameter is changing these values by the follwing patch: http://lkml.org/lkml/2008/2/29/463 So a new makedumpfile (v1.2.5) came to need these values and I created this patch to let the kernel output them. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-18kernel: Remove unnecessary inclusions of asm/semaphore.hMatthew Wilcox1-1/+0
None of these files use any of the functionality promised by asm/semaphore.h. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-02-07vmcoreinfo: add "VMCOREINFO_" to all the call for vmcoreinfo_append_str()Ken'ichi Ohmichi1-2/+2
For readability, all the calls to vmcoreinfo_append_str() are changed to macros having a prefix "VMCOREINFO_". This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0584.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07vmcoreinfo: rename vmcoreinfo's macros returning the sizeKen'ichi Ohmichi1-7/+7
This patchset is for the vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. This patch: VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always returning the size of the struct with a given name. This change would allow VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not be used exclusively for typedefs. This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08vmcoreinfo: add the array length of "free_list" for filtering free pagesKen'ichi Ohmichi1-0/+1
This patch adds the array length of "free_area.free_list" to the vmcoreinfo data so that makedumpfile (dump filtering command) can exclude all free pages in linux-2.6.24. makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. In 2.6.24-rc1 or later, the free_area.free_list is an array which has one list for each migrate types instead of a single list. makedumpfile needs the array length of "free_area.free_list" and the vmcoreinfo data should contain it. Signed-off-by: Huang Ying <ying.huang@intel.com> Tested-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Extended crashkernel command lineBernhard Walle1-0/+166
This patch adds a extended crashkernel syntax that makes the value of reserved system RAM dependent on the system RAM itself: crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] range=start-[end] For example: crashkernel=512M-2G:64M,2G-:128M The motivation comes from distributors that configure their crashkernel command line automatically with some configuration tool (YaST, you know ;)). Of course that tool knows the value of System RAM, but if the user removes RAM, then the system becomes unbootable or at least unusable and error handling is very difficult. This series implements this change for i386, x86_64, ia64, ppc64 and sh. That should be all platforms that support kdump in current mainline. I tested all platforms except sh due to the lack of a sh processor. This patch: This is the generic part of the patch. It adds a parse_crashkernel() function in kernel/kexec.c that is called by the architecture specific code that actually reserves the memory. That function takes the whole command line and looks itself for "crashkernel=" in it. If there are multiple occurrences, then the last one is taken. The advantage is that if you have a bootloader like lilo or elilo which allows you to append a command line parameter but not to remove one (like in GRUB), then you can add another crashkernel value for testing at the boot command line and this one overwrites the command line in the configuration then. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19pid namespaces: define is_global_init() and is_container_init()Serge E. Hallyn1-1/+1
is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sparse pointer use of zero as nullStephen Hemminger1-2/+2
Get rid of sparse related warnings from places that use integer as NULL pointer. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Ian Kent <raven@themaw.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macrosKen'ichi Ohmichi1-34/+34
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is impossible to grep for them. So these names should be changed. This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_dataKen'ichi Ohmichi1-0/+2
[2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data. The dump filetering command 'makedumpfile'(v1.1.6 or before) had assumed the above values, and it was not good from the reliability viewpoint. So makedumpfile v1.2.0 came to need these values and I created the patch to let the kernel output them. makedumpfile site: https://sourceforge.net/projects/makedumpfile/ Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: cleanup the coding style according to Andrew's commentsKen'ichi Ohmichi1-5/+5
[1/3] Cleanup the coding style according to Andrew's comments: http://lists.infradead.org/pipermail/kexec/2007-August/000522.html - vmcoreinfo_append_str() should have suitable __attribute__s so that the compiler can check its use. - vmcoreinfo_max_size should have size_t. - Use get_seconds() instead of xtime.tv_sec. - Use init_uts_ns.name.release instead of UTS_RELEASE. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add vmcoreinfoKen'ichi Ohmichi1-0/+110
This patch set frees the restriction that makedumpfile users should install a vmlinux file (including the debugging information) into each system. makedumpfile command is the dump filtering feature for kdump. It creates a small dumpfile by filtering unnecessary pages for the analysis. To distinguish unnecessary pages, it needs a vmlinux file including the debugging information. These days, the debugging package becomes a huge file, and it is hard to install it into each system. To solve the problem, kdump developers discussed it at lkml and kexec-ml. As the result, we reached the conclusion that necessary information for dump filtering (called "vmcoreinfo") should be embedded into the first kernel file and it should be accessed through /proc/vmcore during the second kernel. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html) Dan Aloni created the patch set for the above implementation. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html) And I updated it for multi architectures and memory models. (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html) Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Clean up duplicate includes in kernel/Jesper Juhl1-1/+0
This patch cleans up duplicate includes in kernel/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08kdump/kexec: calculate note size at compile timeSimon Horman1-2/+2
Currently the size of the per-cpu region reserved to save crash notes is set by the per-architecture value MAX_NOTE_BYTES. Which in turn is currently set to 1024 on all supported architectures. While testing ia64 I recently discovered that this value is in fact too small. The particular setup I was using actually needs 1172 bytes. This lead to very tedious failure mode where the tail of one elf note would overwrite the head of another if they ended up being alocated sequentially by kmalloc, which was often the case. It seems to me that a far better approach is to caclculate the size that the area needs to be. This patch does just that. If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is needed then this should be as easy as making MAX_NOTE_BYTES larger in arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I think that the approach in this patch is a much more robust idea. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-07Merge branch 'release' of ↵Linus Torvalds1-0/+1
master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] replace kmalloc+memset with kzalloc [IA64] resolve name clash by renaming is_available_memory() [IA64] Need export for csum_ipv6_magic [IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP [PATCH] Add support for type argument in PAL_GET_PSTATE [IA64] tidy up return value of ip_fast_csum [IA64] implement csum_ipv6_magic for ia64. [IA64] More Itanium PAL spec updates [IA64] Update processor_info features [IA64] Add se bit to Processor State Parameter structure [IA64] Add dp bit to cache and bus check structs [IA64] SN: Correctly update smp_affinty mask [IA64] sparse cleanups [IA64] IA64 Kexec/kdump
2006-12-07[IA64] IA64 Kexec/kdumpZou Nan hai1-0/+1
Changes and updates. 1. Remove fake rendz path and related code according to discuss with Khalid Aziz. 2. fc.i offset fix in relocate_kernel.S. 3. iospic shutdown code eoi and mask race fix from Fujitsu. 4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner. 5. Send slave to SAL slave loop patch from Jay Lan. 6. Kdump on non-recoverable MCA event patch from Jay Lan 7. Use CTL_UNNUMBERED in kdump_on_init sysctl. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-12-07[PATCH] Kexec / Kdump: Unify elf note codeMagnus Damm1-0/+56
The elf note saving code is currently duplicated over several architectures. This cleanup patch simply adds code to a common file and then replaces the arch-specific code with calls to the newly added code. The only drawback with this approach is that s390 doesn't fully support kexec-on-panic which for that arch leads to introduction of unused code. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] kernel core: replace kmalloc+memset with kzallocBurman Yan1-2/+1
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29[PATCH] kexec warning fixRoland McGrath1-2/+4
This fixes a couple of compiler warnings, and adds paranoia checks as well. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29[PATCH] pidspace: is_init()Sukadev Bhattiprolu1-1/+1
This is an updated version of Eric Biederman's is_init() patch. (http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and replaces a few more instances of ->pid == 1 with is_init(). Further, is_init() checks pid and thus removes dependency on Eric's other patches for now. Eric's original description: There are a lot of places in the kernel where we test for init because we give it special properties. Most significantly init must not die. This results in code all over the kernel test ->pid == 1. Introduce is_init to capture this case. With multiple pid spaces for all of the cases affected we are looking for only the first process on the system, not some other process that has pid == 1. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: <lxc-devel@lists.sourceforge.net> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.David Wilder1-4/+2
With this patch, kdump uses the firmware soft-reset NMI for two purposes: 1) Initiate the kdump (take a crash dump) by issuing a soft-reset. 2) Break a CPU out of a deadlock condition that is detected during kdump processing. When a soft-reset is initiated each CPU will enter system_reset_exception() and set its corresponding bit in the global bit-array cpus_in_sr then call die(). When die() finds the CPU's bit set in cpu_in_sr crash_kexec() is called to initiate a crash dump. The first CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs are "secondary CPUs". The secondary CPU's pass through to crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs to enter via soft-reset then boots the kdump kernel (see crash_soft_reset_check()) When the system crashes due to a panic or exception, crash_kexec() is called by panic() or die(). The crashing CPU sends an IPI to all other CPUs to notify them of the pending shutdown. If a CPU is in a deadlock or hung state with interrupts disabled, the IPI will not be delivered. The result being, that the kdump kernel is not booted. This problem is solved with the use of a firmware generated soft-reset. After the crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to enter crash_ipi_callback(). A CPU signifies its entry to crash_ipi_callback() by setting its corresponding bit in the cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The operator is then prompted to generate a soft-reset to break the deadlock. Each CPU enters the soft reset handler as described above. Two conditions must be handled at this point: 1) The system crashed because the operator generated a soft-reset. See 2) The system had crashed before the soft-reset was generated ( in the case of a Panic or oops). The first CPU to enter crash_kexec() uses the state of the kexec_lock to determine this state. If kexec_lock is already held then condition 2 is true and crash_kexec_secondary() is called, else; this CPU is flagged as the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds as described above. Each additional CPUs responding to the soft-reset will pass through crash_kexec() to kexec_secondary(). All secondary CPUs call crash_ipi_callback() readying them self's for the shutdown. When ready they clear their bit in cpus_in_sr. The crashing CPU waits in kexec_secondary() until all other CPUs have cleared their bits in cpus_in_sr. The kexec kernel boot is then started. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-23[PATCH] Add a sysfs file to determine if a kexec kernel is loadedJeff Moyer1-3/+3
Create two files in /sys/kernel, kexec_loaded and kexec_crash_loaded. Each file contains a simple boolean value indicating whether the relevant kernel has been loaded into memory. The motivation for this is geared around support. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11[PATCH] move capable() to capability.hRandy.Dunlap1-0/+1
- Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] kdump: save registers early (inline functions)Vivek Goyal1-1/+3
- If system panics then cpu register states are captured through funciton crash_get_current_regs(). This is not a inline function hence a stack frame is pushed on to the stack and then cpu register state is captured. Later this frame is popped and new frames are pushed (machine_kexec). - In theory this is not very right as we are capturing register states for a frame and that frame is no more valid. This seems to have created back trace problems for ppc64. - This patch fixes it up. The very first thing it does after entering crash_kexec() is to capture the register states. Anyway we don't want the back trace beyond crash_kexec(). crash_get_current_regs() has been made inline - crash_setup_regs() is the top architecture dependent function which should be responsible for capturing the register states as well as to do some architecture dependent tricks. For ex. fixing up ss and esp for i386. crash_setup_regs() has also been made inline to ensure no new call frame is pushed onto stack. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] kdump: dynamic per cpu allocation of memory for saving cpu registersVivek Goyal1-0/+16
- In case of system crash, current state of cpu registers is saved in memory in elf note format. So far memory for storing elf notes was being allocated statically for NR_CPUS. - This patch introduces dynamic allocation of memory for storing elf notes. It uses alloc_percpu() interface. This should lead to better memory usage. - Introduced based on Andi Kleen's and Eric W. Biederman's suggestions. - This patch also moves memory allocation for elf notes from architecture dependent portion to architecture independent portion. Now crash_notes is architecture independent. The whole idea is that size of memory to be allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and allocation of this memory can be architecture independent. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: split page table lockHugh Dickins1-2/+2
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with a many-threaded application which concurrently initializes different parts of a large anonymous area. This patch corrects that, by using a separate spinlock per page table page, to guard the page table entries in that page, instead of using the mm's single page_table_lock. (But even then, page_table_lock is still used to guard page table allocation, and anon_vma allocation.) In this implementation, the spinlock is tucked inside the struct page of the page table page: with a BUILD_BUG_ON in case it overflows - which it would in the case of 32-bit PA-RISC with spinlock debugging enabled. Splitting the lock is not quite for free: another cacheline access. Ideally, I suppose we would use split ptlock only for multi-threaded processes on multi-cpu machines; but deciding that dynamically would have its own costs. So for now enable it by config, at some number of cpus - since the Kconfig language doesn't support inequalities, let preprocessor compare that with NR_CPUS. But I don't think it's worth being user-configurable: for good testing of both split and unsplit configs, split now at 4 cpus, and perhaps change that to 8 later. There is a benefit even for singly threaded processes: kswapd can be attacking one part of the mm while another part is busy faulting. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>