summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-07-05Merge branch 'work.memdup_user' of ↵Linus Torvalds2-39/+20
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull memdup_user() conversions from Al Viro: "A fairly self-contained series - hunting down open-coded memdup_user() and memdup_user_nul() instances" * 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bpf: don't open-code memdup_user() kimage_file_prepare_segments(): don't open-code memdup_user() ethtool: don't open-code memdup_user() do_ip_setsockopt(): don't open-code memdup_user() do_ipv6_setsockopt(): don't open-code memdup_user() irda: don't open-code memdup_user() xfrm_user_policy(): don't open-code memdup_user() ima_write_policy(): don't open-code memdup_user_nul() sel_write_validatetrans(): don't open-code memdup_user_nul()
2017-07-05Merge branch 'timers-compat' of ↵Linus Torvalds7-147/+241
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull timer-related user access updates from Al Viro: "Continuation of timers-related stuff (there had been more, but my parts of that series are already merged via timers/core). This is more of y2038 work by Deepa Dinamani, partially disrupted by the unification of native and compat timers-related syscalls" * 'timers-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: posix_clocks: Use get_itimerspec64() and put_itimerspec64() timerfd: Use get_itimerspec64() and put_itimerspec64() nanosleep: Use get_timespec64() and put_timespec64() posix-timers: Use get_timespec64() and put_timespec64() posix-stubs: Conditionally include COMPAT_SYS_NI defines time: introduce {get,put}_itimerspec64 time: add get_timespec64 and put_timespec64
2017-07-05Merge branch 'work.sys_wait' of ↵Linus Torvalds3-219/+170
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull wait syscall updates from Al Viro: "Consolidating sys_wait* and compat counterparts. Gets rid of set_fs()/double-copy mess, simplifies the whole thing (lifting the copyouts to the syscalls means less headache in the part that does actual work - fewer failure exits, to start with), gets rid of the overhead of field-by-field __put_user()" * 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: osf_wait4: switch to kernel_wait4() waitid(): switch copyout of siginfo to unsafe_put_user() wait_task_zombie: consolidate info logics kill wait_noreap_copyout() lift getrusage() from wait_noreap_copyout() waitid(2): leave copyout of siginfo to syscall itself kernel_wait4()/kernel_waitid(): delay copying status to userland wait4(2)/waitid(2): separate copying rusage to userland move compat wait4 and waitid next to native variants
2017-07-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds10-149/+786
Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...
2017-07-05Merge branch 'next' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: - a major update for AppArmor. From JJ: * several bug fixes and cleanups * the patch to add symlink support to securityfs that was floated on the list earlier and the apparmorfs changes that make use of securityfs symlinks * it introduces the domain labeling base code that Ubuntu has been carrying for several years, with several cleanups applied. And it converts the current mediation over to using the domain labeling base, which brings domain stacking support with it. This finally will bring the base upstream code in line with Ubuntu and provide a base to upstream the new feature work that Ubuntu carries. * This does _not_ contain any of the newer apparmor mediation features/controls (mount, signals, network, keys, ...) that Ubuntu is currently carrying, all of which will be RFC'd on top of this. - Notable also is the Infiniband work in SELinux, and the new file:map permission. From Paul: "While we're down to 21 patches for v4.13 (it was 31 for v4.12), the diffstat jumps up tremendously with over 2k of line changes. Almost all of these changes are the SELinux/IB work done by Daniel Jurgens; some other noteworthy changes include a NFS v4.2 labeling fix, a new file:map permission, and reporting of policy capabilities on policy load" There's also now genfscon labeling support for tracefs, which was lost in v4.1 with the separation from debugfs. - Smack incorporates a safer socket check in file_receive, and adds a cap_capable call in privilege check. - TPM as usual has a bunch of fixes and enhancements. - Multiple calls to security_add_hooks() can now be made for the same LSM, to allow LSMs to have hook declarations across multiple files. - IMA now supports different "ima_appraise=" modes (eg. log, fix) from the boot command line. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (126 commits) apparmor: put back designators in struct initialisers seccomp: Switch from atomic_t to recount_t seccomp: Adjust selftests to avoid double-join seccomp: Clean up core dump logic IMA: update IMA policy documentation to include pcr= option ima: Log the same audit cause whenever a file has no signature ima: Simplify policy_func_show. integrity: Small code improvements ima: fix get_binary_runtime_size() ima: use ima_parse_buf() to parse template data ima: use ima_parse_buf() to parse measurements headers ima: introduce ima_parse_buf() ima: Add cgroups2 to the defaults list ima: use memdup_user_nul ima: fix up #endif comments IMA: Correct Kconfig dependencies for hash selection ima: define is_ima_appraise_enabled() ima: define Kconfig IMA_APPRAISE_BOOTPARAM option ima: define a set of appraisal rules requiring file signatures ima: extend the "ima_policy" boot command line to support multiple policies ...
2017-07-05Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds3-49/+53
Pull audit updates from Paul Moore: "Things are relatively quiet on the audit front for v4.13, just five patches for a total diffstat of 102 lines. There are two patches from Richard to consistently record the POSIX capabilities and add the ambient capability information as well. I also chipped in two patches to fix a race condition with the auditd tracking code and ensure we don't skip sending any records to the audit multicast group. Finally a single style fix that I accepted because I must have been in a good mood that day. Everything passes our test suite, and should be relatively harmless, please merge for v4.13" * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit: audit: make sure we never skip the multicast broadcast audit: fix a race condition with the auditd tracking code audit: style fix audit: add ambient capabilities to CAPSET and BPRM_FCAPS records audit: unswing cap_* fields in PATH records
2017-07-05Merge branch 'for-linus' of ↵Linus Torvalds3-14/+47
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Store printk() messages into the main log buffer directly even in NMI when the lock is available. It is the best effort to print even large chunk of text. It is handy, for example, when all ftrace messages are printed during the system panic in NMI. - Add missing annotations to calm down compiler warnings * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: add __printf attributes to internal functions printk: Use the main logbuf in NMI when logbuf_lock is available
2017-07-04Merge tag 'pm-4.13-rc1' of ↵Linus Torvalds4-14/+36
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The big ticket items here are the rework of suspend-to-idle in order to add proper support for power button wakeup from it on recent Dell laptops and the rework of interfaces exporting the current CPU frequency on x86. In addition to that, support for a few new pieces of hardware is added, the PCI/ACPI device wakeup infrastructure is simplified significantly and the wakeup IRQ framework is fixed to unbreak the IRQ bus locking infrastructure. Also, there are some functional improvements for intel_pstate, tools updates and small fixes and cleanups all over. Specifics: - Rework suspend-to-idle to allow it to take wakeup events signaled by the EC into account on ACPI-based platforms in order to properly support power button wakeup from suspend-to-idle on recent Dell laptops (Rafael Wysocki). That includes the core suspend-to-idle code rework, support for the Low Power S0 _DSM interface, and support for the ACPI INT0002 Virtual GPIO device from Hans de Goede (required for USB keyboard wakeup from suspend-to-idle to work on some machines). - Stop trying to export the current CPU frequency via /proc/cpuinfo on x86 as that is inaccurate and confusing (Len Brown). - Rework the way in which the current CPU frequency is exported by the kernel (over the cpufreq sysfs interface) on x86 systems with the APERF and MPERF registers by always using values read from these registers, when available, to compute the current frequency regardless of which cpufreq driver is in use (Len Brown). - Rework the PCI/ACPI device wakeup infrastructure to remove the questionable and artificial distinction between "devices that can wake up the system from sleep states" and "devices that can generate wakeup signals in the working state" from it, which allows the code to be simplified quite a bit (Rafael Wysocki). - Fix the wakeup IRQ framework by making it use SRCU instead of RCU which doesn't allow sleeping in the read-side critical sections, but which in turn is expected to be allowed by the IRQ bus locking infrastructure (Thomas Gleixner). - Modify some computations in the intel_pstate driver to avoid rounding errors resulting from them (Srinivas Pandruvada). - Reduce the overhead of the intel_pstate driver in the HWP (hardware-managed P-states) mode and when the "performance" P-state selection algorithm is in use by making it avoid registering scheduler callbacks in those cases (Len Brown). - Rework the energy_performance_preference sysfs knob in intel_pstate by changing the values that correspond to different symbolic hint names used by it (Len Brown). - Make it possible to use more than one cpuidle driver at the same time on ARM (Daniel Lezcano). - Make it possible to prevent the cpuidle menu governor from using the 0 state by disabling it via sysfs (Nicholas Piggin). - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on AMD systems (Yazen Ghannam). - Make the CPPC cpufreq driver take the lowest nonlinear performance information into account (Prashanth Prakash). - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q driver and clean up the sfi, exynos5440 and intel_pstate drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael Wysocki, Tao Wang). - Fix a few minor issues in the generic power domains (genpd) framework and clean it up somewhat (Krzysztof Kozlowski, Mikko Perttunen, Viresh Kumar). - Fix a couple of minor issues in the operating performance points (OPP) framework and clean it up somewhat (Viresh Kumar). - Fix a CONFIG dependency in the hibernation core and clean it up slightly (Balbir Singh, Arvind Yadav, BaoJun Luo). - Add rk3228 support to the rockchip-io adaptive voltage scaling (AVS) driver (David Wu). - Fix an incorrect bit shift operation in the RAPL power capping driver (Adam Lessnau). - Add support for the EPP field in the HWP (hardware managed P-states) control register, HWP.EPP, to the x86_energy_perf_policy tool and update msr-index.h with HWP.EPP values (Len Brown). - Fix some minor issues in the turbostat tool (Len Brown). - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a minor issue in it (Sherry Hurwitz). - Assorted cleanups, mostly related to the constification of some data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof Kozlowski)" * tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits) cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes PM: hibernate: constify attribute_group structures. cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style PM / Domains: Fix missing default_power_down_ok comment PM / Domains: Fix unsafe iteration over modified list of domains PM / Domains: Fix unsafe iteration over modified list of domain providers PM / Domains: Fix unsafe iteration over modified list of device links PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device PM / Domains: Call driver's noirq callbacks PM / core: Drop run_wake flag from struct dev_pm_info PCI / PM: Simplify device wakeup settings code PCI / PM: Drop pme_interrupt flag from struct pci_dev ACPI / PM: Consolidate device wakeup settings code ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags PM / QoS: constify *_attribute_group. PM / AVS: rockchip-io: add io selectors and supplies for rk3228 powercap/RAPL: prevent overridding bits outside of the mask PM / sysfs: Constify attribute groups ...
2017-07-03Merge tag 'docs-4.13' of git://git.lwn.net/linuxLinus Torvalds6-27/+27
Pull documentation updates from Jonathan Corbet: "There has been a fair amount of activity in the docs tree this time around. Highlights include: - Conversion of a bunch of security documentation into RST - The conversion of the remaining DocBook templates by The Amazing Mauro Machine. We can now drop the entire DocBook build chain. - The usual collection of fixes and minor updates" * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits) scripts/kernel-doc: handle DECLARE_HASHTABLE Documentation: atomic_ops.txt is core-api/atomic_ops.rst Docs: clean up some DocBook loose ends Make the main documentation title less Geocities Docs: Use kernel-figure in vidioc-g-selection.rst Docs: fix table problems in ras.rst Docs: Fix breakage with Sphinx 1.5 and upper Docs: Include the Latex "ifthen" package doc/kokr/howto: Only send regression fixes after -rc1 docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters doc: Document suitability of IBM Verse for kernel development Doc: fix a markup error in coding-style.rst docs: driver-api: i2c: remove some outdated information Documentation: DMA API: fix a typo in a function name Docs: Insert missing space to separate link from text doc/ko_KR/memory-barriers: Update control-dependencies example Documentation, kbuild: fix typo "minimun" -> "minimum" docs: Fix some formatting issues in request-key.rst doc: ReSTify keys-trusted-encrypted.txt doc: ReSTify keys-request-key.txt ...
2017-07-03Merge tag 'char-misc-4.13-rc1' of ↵Linus Torvalds2-2/+14
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here is the "big" char/misc driver patchset for 4.13-rc1. Lots of stuff in here, a large thunderbolt update, w1 driver header reorg, the new mux driver subsystem, google firmware driver updates, and a raft of other smaller things. Full details in the shortlog. All of these have been in linux-next for a while with the only reported issue being a merge problem with this tree and the jc-docs tree in the w1 documentation area" * tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (147 commits) misc: apds990x: Use sysfs_match_string() helper mei: drop unreachable code in mei_start mei: validate the message header only in first fragment. DocBook: w1: Update W1 file locations and names in DocBook mux: adg792a: always require I2C support nvmem: rockchip-efuse: add support for rk322x-efuse nvmem: core: add locking to nvmem_find_cell nvmem: core: Call put_device() in nvmem_unregister() nvmem: core: fix leaks on registration errors nvmem: correct Broadcom OTP controller driver writes w1: Add subsystem kernel public interface drivers/fsi: Add module license to core driver drivers/fsi: Use asynchronous slave mode drivers/fsi: Add hub master support drivers/fsi: Add SCOM FSI client device driver drivers/fsi/gpio: Add tracepoints for GPIO master drivers/fsi: Add GPIO based FSI master drivers/fsi: Document FSI master sysfs files in ABI drivers/fsi: Add error handling for slave drivers/fsi: Add tracepoints for low-level operations ...
2017-07-03Merge tag 'driver-core-4.13-rc1' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core update for 4.13-rc1. The large majority of this is a lot of cleanup of old fields in the driver core structures and their remaining usages in random drivers. All of those fixes have been reviewed by the various subsystem maintainers. There's also some small firmware updates in here, a new kobject uevent api interface that makes userspace interaction easier, and a few other minor things. All of these have been in linux-next for a long while with no reported issues" * tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits) arm: mach-rpc: ecard: fix build error zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO() driver-core: remove struct bus_type.dev_attrs powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type powerpc: vio: use dev_groups and not dev_attrs for bus_type USB: usbip: convert to use DRIVER_ATTR_RW s390: drivers: convert to use DRIVER_ATTR_RO/WO platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW pcmcia: ds: convert to use DRIVER_ATTR_RO wireless: ipw2x00: convert to use DRIVER_ATTR_RW net: ehea: convert to use DRIVER_ATTR_RO net: caif: convert to use DRIVER_ATTR_RO TTY: hvc: convert to use DRIVER_ATTR_RW PCI: pci-driver: convert to use DRIVER_ATTR_WO IB: nes: convert to use DRIVER_ATTR_RW HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups arm: ecard: fix dev_groups patch typo tty: serdev: use dev_groups and not dev_attrs for bus_type sparc: vio: use dev_groups and not dev_attrs for bus_type hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type ...
2017-07-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds6-236/+244
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...
2017-07-03Merge branch 'irq-core-for-linus' of ↵Linus Torvalds19-264/+1755
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The irq department delivers: - Expand the generic infrastructure handling the irq migration on CPU hotplug and convert X86 over to it. (Thomas Gleixner) Aside of consolidating code this is a preparatory change for: - Finalizing the affinity management for multi-queue devices. The main change here is to shut down interrupts which are affine to a outgoing CPU and reenabling them when the CPU comes online again. That avoids moving interrupts pointlessly around and breaking and reestablishing affinities for no value. (Christoph Hellwig) Note: This contains also the BLOCK-MQ and NVME changes which depend on the rework of the irq core infrastructure. Jens acked them and agreed that they should go with the irq changes. - Consolidation of irq domain code (Marc Zyngier) - State tracking consolidation in the core code (Jeffy Chen) - Add debug infrastructure for hierarchical irq domains (Thomas Gleixner) - Infrastructure enhancement for managing generic interrupt chips via devmem (Bartosz Golaszewski) - Constification work all over the place (Tobias Klauser) - Two new interrupt controller drivers for MVEBU (Thomas Petazzoni) - The usual set of fixes, updates and enhancements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) irqchip/or1k-pic: Fix interrupt acknowledgement irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity nvme: Allocate queues for all possible CPUs blk-mq: Create hctx for each present CPU blk-mq: Include all present CPUs in the default queue mapping genirq: Avoid unnecessary low level irq function calls genirq: Set irq masked state when initializing irq_desc genirq/timings: Add infrastructure for estimating the next interrupt arrival time genirq/timings: Add infrastructure to track the interrupt timings genirq/debugfs: Remove pointless NULL pointer check irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID irqchip/gic-v3-its: Add ACPI NUMA node mapping irqchip/gic-v3-its-platform-msi: Make of_device_ids const irqchip/gic-v3-its: Make of_device_ids const irqchip/irq-mvebu-icu: Add new driver for Marvell ICU irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU genirq/irqdomain: Remove auto-recursive hierarchy support irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access ...
2017-07-03Merge branch 'timers-core-for-linus' of ↵Linus Torvalds13-1241/+1198
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather large update for timers/timekeeping: - compat syscall consolidation (Al Viro) - Posix timer consolidation (Christoph Helwig / Thomas Gleixner) - Cleanup of the device tree based initialization for clockevents and clocksources (Daniel Lezcano) - Consolidation of the FTTMR010 clocksource/event driver (Linus Walleij) - The usual set of small fixes and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits) timers: Make the cpu base lock raw clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()' clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap clocksource/drivers/tcb_clksrc: Make IO endian agnostic clocksource/drivers/sun4i: Switch to the timer-of common init clocksource/drivers/timer-of: Fix invalid iomap check Revert "ktime: Simplify ktime_compare implementation" clocksource/drivers: Fix uninitialized variable use in timer_of_init kselftests: timers: Add test for frequency step kselftests: timers: Fix inconsistency-check to not ignore first timestamp time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Clean up CLOCK_MONOTONIC_RAW time handling posix-cpu-timers: Make timespec to nsec conversion safe itimer: Make timeval to nsec conversion range limited timers: Fix parameter description of try_to_del_timer_sync() ktime: Simplify ktime_compare implementation clocksource/drivers/fttmr010: Factor out clock read code clocksource/drivers/fttmr010: Implement delay timer clocksource/drivers: Add timer-of common init routine clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume ...
2017-07-03Merge tag 'm68k-for-v4.13-tag1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - NuBus improvements and cleanups - defconfig updates - Fix debugger syscall restart interactions, leading to the global removal of ptrace_signal_deliver() * tag 'm68k-for-v4.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Remove ptrace_signal_deliver m68k/defconfig: Update defconfigs for v4.12-rc1 nubus: Fix pointer validation nubus: Remove slot zero probe
2017-07-03Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds2-23/+42
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull nohz updates from Ingo Molnar: "The main changes in this cycle relate to fixing another bad (but sporadic and hard to detect) interaction between the dynticks scheduler tick and hrtimers, plus related improvements to better detection and handling of similar problems - by Frédéric Weisbecker" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: Fix spurious warning when hrtimer and clockevent get out of sync nohz: Fix buggy tick delay on IRQ storms nohz: Reset next_tick cache even when the timer has no regs nohz: Fix collision between tick and other hrtimers, again nohz: Add hrtimer sanity check
2017-07-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds25-1585/+2413
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
2017-07-03Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Most of the changes are for tooling, the main changes in this cycle were: - Improve Intel-PT hardware tracing support, both on the kernel and on the tooling side: PTWRITE instruction support, power events for C-state tracing, etc. (Adrian Hunter) - Add support to measure SMI cost to the x86 architecture, with tooling support in 'perf stat' (Kan Liang) - Support function filtering in 'perf ftrace', plus related improvements (Namhyung Kim) - Allow adding and removing fields to the default 'perf script' columns, using + or - as field prefixes to do so (Andi Kleen) - Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso' (Mark Santaniello) - Add perf tooling unwind support for PowerPC (Paolo Bonzini) - ... and various other improvements as well" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits) perf auxtrace: Add CPU filter support perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC perf intel-pt: Update documentation to include new ptwrite and power events perf intel-pt: Add example script for power events and PTWRITE perf intel-pt: Synthesize new power and "ptwrite" events perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting perf intel-pt: Factor out intel_pt_set_event_name() perf intel-pt: Tidy messages into called function intel_pt_synth_event() perf intel-pt: Tidy Intel PT evsel lookup into separate function perf intel-pt: Join needlessly wrapped lines perf intel-pt: Remove unused instructions_sample_period perf intel-pt: Factor out common code synthesizing event samples perf script: Add synthesized Intel PT power and ptwrite events perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static perf script: Add 'synth' field for synthesized event payloads perf auxtrace: Add itrace option to output power events perf auxtrace: Add itrace option to output ptwrite events tools include: Add byte-swapping macros to kernel.h perf script: Add 'synth' event type for synthesized events x86/insn: perf tools: Add new ptwrite instruction ...
2017-07-03Merge branch 'locking-core-for-linus' of ↵Linus Torvalds4-10/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Add CONFIG_REFCOUNT_FULL=y to allow the disabling of the 'full' (robustness checked) refcount_t implementation with slightly lower runtime overhead. (Kees Cook) The lighter weight variant is the default. The two variants use the same API. Having this variant was a precondition by some maintainers to merge refcount_t cleanups. - Add lockdep support for rtmutexes (Peter Zijlstra) - liblockdep fixes and improvements (Sasha Levin, Ben Hutchings) - ... misc fixes and improvements" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) locking/refcount: Remove the half-implemented refcount_sub() API locking/refcount: Create unchecked atomic_t implementation locking/rtmutex: Don't initialize lockdep when not required locking/selftest: Add RT-mutex support locking/selftest: Remove the bad unlock ordering test rt_mutex: Add lockdep annotations MAINTAINERS: Claim atomic*_t maintainership locking/x86: Remove the unused atomic_inc_short() methd tools/lib/lockdep: Remove private kernel headers tools/lib/lockdep: Hide liblockdep output from test results tools/lib/lockdep: Add dummy current_gfp_context() tools/include: Add IS_ERR_OR_NULL to err.h tools/lib/lockdep: Add empty __is_[module,kernel]_percpu_address tools/lib/lockdep: Include err.h tools/include: Add (mostly) empty include/linux/sched/mm.h tools/lib/lockdep: Use LDFLAGS tools/lib/lockdep: Remove double-quotes from soname tools/lib/lockdep: Fix object file paths used in an out-of-tree build tools/lib/lockdep: Fix compilation for 4.11 tools/lib/lockdep: Don't mix fd-based and stream IO ...
2017-07-03Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds20-2300/+1248
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The sole purpose of these changes is to shrink and simplify the RCU code base, which has suffered from creeping bloat over the past couple of years. The end result is a net removal of ~2700 lines of code: 79 files changed, 1496 insertions(+), 4211 deletions(-) Plus there's a marked reduction in the Kconfig space complexity as well, here's the number of matches on 'grep RCU' in the .config: before after x86-defconfig 17 15 x86-allmodconfig 33 20" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits) rcu: Remove RCU CPU stall warnings from Tiny RCU rcu: Remove event tracing from Tiny RCU rcu: Move RCU debug Kconfig options to kernel/rcu rcu: Move RCU non-debug Kconfig options to kernel/rcu rcu: Eliminate NOCBs CPU-state Kconfig options rcu: Remove debugfs tracing srcu: Remove Classic SRCU srcu: Fix rcutorture-statistics typo rcu: Remove SPARSE_RCU_POINTER Kconfig option rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option rcu: Remove typecheck() from RCU locking wrapper functions rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h rcu: Remove nohz_full full-system-idle state machine rcu: Remove the RCU_KTHREAD_PRIO Kconfig option rcu: Remove *_SLOW_* Kconfig options srcu: Use rnp->lock wrappers to replace explicit memory barriers rcu: Move rnp->lock wrappers for SRCU use rcu: Convert rnp->lock wrappers to macros for SRCU use rcu: Refactor #includes from include/linux/rcupdate.h bcm47xx: Fix build regression ...
2017-07-03Merge branch 'core-objtool-for-linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "This is an extensive rewrite of the objdump tool to track all stack pointer modifications through the machine instructions of disassembled functions found in kernel .o files. This re-design removes the prior dependency on CONFIG_FRAME_POINTERS, with the goal to prepare the tool to generate kernel debuginfo data in the future. There's also an increase in checking/tracking robustness as a side effect as well. No (intended) changes to existing functionality" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Silence warnings for functions which use IRET objtool: Implement stack validation 2.0 objtool, x86: Add several functions and files to the objtool whitelist objtool: Move checking code to check.c
2017-07-03Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-blockLinus Torvalds2-9/+9
Pull core block/IO updates from Jens Axboe: "This is the main pull request for the block layer for 4.13. Not a huge round in terms of features, but there's a lot of churn related to some core cleanups. Note this depends on the UUID tree pull request, that Christoph already sent out. This pull request contains: - A series from Christoph, unifying the error/stats codes in the block layer. We now use blk_status_t everywhere, instead of using different schemes for different places. - Also from Christoph, some cleanups around request allocation and IO scheduler interactions in blk-mq. - And yet another series from Christoph, cleaning up how we handle and do bounce buffering in the block layer. - A blk-mq debugfs series from Bart, further improving on the support we have for exporting internal information to aid debugging IO hangs or stalls. - Also from Bart, a series that cleans up the request initialization differences across types of devices. - A series from Goldwyn Rodrigues, allowing the block layer to return failure if we will block and the user asked for non-blocking. - Patch from Hannes for supporting setting loop devices block size to that of the underlying device. - Two series of patches from Javier, fixing various issues with lightnvm, particular around pblk. - A series from me, adding support for write hints. This comes with NVMe support as well, so applications can help guide data placement on flash to improve performance, latencies, and write amplification. - A series from Ming, improving and hardening blk-mq support for stopping/starting and quiescing hardware queues. - Two pull requests for NVMe updates. Nothing major on the feature side, but lots of cleanups and bug fixes. From the usual crew. - A series from Neil Brown, greatly improving the bio rescue set support. Most notably, this kills the bio rescue work queues, if we don't really need them. - Lots of other little bug fixes that are all over the place" * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits) lightnvm: pblk: set line bitmap check under debug lightnvm: pblk: verify that cache read is still valid lightnvm: pblk: add initialization check lightnvm: pblk: remove target using async. I/Os lightnvm: pblk: use vmalloc for GC data buffer lightnvm: pblk: use right metadata buffer for recovery lightnvm: pblk: schedule if data is not ready lightnvm: pblk: remove unused return variable lightnvm: pblk: fix double-free on pblk init lightnvm: pblk: fix bad le64 assignations nvme: Makefile: remove dead build rule blk-mq: map all HWQ also in hyperthreaded system nvmet-rdma: register ib_client to not deadlock in device removal nvme_fc: fix error recovery on link down. nvmet_fc: fix crashes on bad opcodes nvme_fc: Fix crash when nvme controller connection fails. nvme_fc: replace ioabort msleep loop with completion nvme_fc: fix double calls to nvme_cleanup_cmd() nvme-fabrics: verify that a controller returns the correct NQN nvme: simplify nvme_dev_attrs_are_visible ...
2017-07-03Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuidLinus Torvalds1-2/+2
Pull uuid subsystem from Christoph Hellwig: "This is the new uuid subsystem, in which Amir, Andy and I have started consolidating our uuid/guid helpers and improving the types used for them. Note that various other subsystems have pulled in this tree, so I'd like it to go in early. UUID/GUID summary: - introduce the new uuid_t/guid_t types that are going to replace the somewhat confusing uuid_be/uuid_le types and make the terminology fit the various specs, as well as the userspace libuuid library. (me, based on a previous version from Amir) - consolidated generic uuid/guid helper functions lifted from XFS and libnvdimm (Amir and me) - conversions to the new types and helpers (Amir, Andy and me)" * tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits) ACPI: hns_dsaf_acpi_dsm_guid can be static mmc: sdhci-pci: make guid intel_dsm_guid static uuid: Take const on input of uuid_is_null() and guid_is_null() thermal: int340x_thermal: fix compile after the UUID API switch thermal: int340x_thermal: Switch to use new generic UUID API acpi: always include uuid.h ACPI: Switch to use generic guid_t in acpi_evaluate_dsm() ACPI / extlog: Switch to use new generic UUID API ACPI / bus: Switch to use new generic UUID API ACPI / APEI: Switch to use new generic UUID API acpi, nfit: Switch to use new generic UUID API MAINTAINERS: add uuid entry tmpfs: generate random sb->s_uuid scsi_debug: switch to uuid_t nvme: switch to uuid_t sysctl: switch to use uuid_t partitions/ldm: switch to use uuid_t overlayfs: use uuid_t instead of uuid_be fs: switch ->s_uuid to uuid_t ima/policy: switch to use uuid_t ...
2017-07-03Merge branch 'for-4.13' into for-linusPetr Mladek3-14/+47
2017-07-03Merge branch 'acpi-pm'Rafael J. Wysocki2-7/+30
* acpi-pm: PM / core: Drop run_wake flag from struct dev_pm_info PCI / PM: Simplify device wakeup settings code PCI / PM: Drop pme_interrupt flag from struct pci_dev ACPI / PM: Consolidate device wakeup settings code ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems platform: x86: intel-hid: Wake up the system from suspend-to-idle platform: x86: intel-vbtn: Wake up the system from suspend-to-idle ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle platform/x86: Add driver for ACPI INT0002 Virtual GPIO device PCI / PM: Restore PME Enable if skipping wakeup setup PM / sleep: Print timing information if debug is enabled ACPI / PM: Clean up device wakeup enable/disable code ACPI / PM: Change log level of wakeup-related message USB / PCI / PM: Allow the PCI core to do the resume cleanup ACPI / PM: Run wakeup notify handlers synchronously Conflicts: drivers/base/power/main.c
2017-07-03Merge branch 'pm-sleep'Rafael J. Wysocki2-7/+6
* pm-sleep: PM: hibernate: constify attribute_group structures. PM / hibernate: Drop redundant parameter of swsusp_alloc() PM / hibernate: Use CONFIG_HAVE_SET_MEMORY for include condition x86/power/64: Use char arrays for asm function names
2017-07-03Merge branch 'uuid-types'Rafael J. Wysocki1-2/+2
Merge 'uuid-types' from git://git.infradead.org/users/hch/uuid.git
2017-07-03bpf, verifier: add additional patterns to evaluate_reg_imm_aluJohn Fastabend1-0/+62
Currently the verifier does not track imm across alu operations when the source register is of unknown type. This adds additional pattern matching to catch this and track imm. We've seen LLVM generating this pattern while working on cilium. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: extend bpf_trace_printk to support %iJohn Fastabend1-3/+4
Currently, bpf_trace_printk does not support common formatting symbol '%i' however vsprintf does and is what eventually gets called by bpf helper. If users are used to '%i' and currently make use of it, then bpf_trace_printk will just return with error without dumping anything to the trace pipe, so just add support for '%i' to the helper. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: export whether tail call has jited ownerDaniel Borkmann1-1/+6
We do export through fdinfo already whether a prog is JITed or not, given a program load can fail in case of either prog or tail call map has JITed property, but neither both are JITed or not JITed, we can facilitate error reporting in loaders like iproute2 through exporting owner_jited of tail call map. We already do export owner_prog_type through this facility, so parser can pick up both for comparison. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: simplify narrower ctx accessDaniel Borkmann2-63/+46
This work tries to make the semantics and code around the narrower ctx access a bit easier to follow. Right now everything is done inside the .is_valid_access(). Offset matching is done differently for read/write types, meaning writes don't support narrower access and thus matching only on offsetof(struct foo, bar) is enough whereas for read case that supports narrower access we must check for offsetof(struct foo, bar) + offsetof(struct foo, bar) + sizeof(<bar>) - 1 for each of the cases. For read cases of individual members that don't support narrower access (like packet pointers or skb->cb[] case which has its own narrow access logic), we check as usual only offsetof(struct foo, bar) like in write case. Then, for the case where narrower access is allowed, we also need to set the aux info for the access. Meaning, ctx_field_size and converted_op_size have to be set. First is the original field size e.g. sizeof(<bar>) as in above example from the user facing ctx, and latter one is the target size after actual rewrite happened, thus for the kernel facing ctx. Also here we need the range match and we need to keep track changing convert_ctx_access() and converted_op_size from is_valid_access() as both are not at the same location. We can simplify the code a bit: check_ctx_access() becomes simpler in that we only store ctx_field_size as a meta data and later in convert_ctx_accesses() we fetch the target_size right from the location where we do convert. Should the verifier be misconfigured we do reject for BPF_WRITE cases or target_size that are not provided. For the subsystems, we always work on ranges in is_valid_access() and add small helpers for ranges and narrow access, convert_ctx_accesses() sets target_size for the relevant instruction. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: BPF support for sock_opsLawrence Brakmo2-0/+42
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.). It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. The program will be called at appropriate times to set relevant connections parameters such as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection information such as IP addresses, port numbers, etc. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some distinct advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it oes not require application changes and it can be updated easily at any time. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. The purpose of this new program type is to simplify setting connection parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is easy to use facebook's internal IPv6 addresses to determine if both hosts of a connection are in the same datacenter. Therefore, it is easy to write a BPF program to choose a small SYN RTO value when both hosts are in the same datacenter. This patch only contains the framework to support the new BPF program type, following patches add the functionality to set various connection parameters. This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS and a new bpf syscall command to load a new program of this type: BPF_PROG_LOAD_SOCKET_OPS. Two new corresponding structs (one for the kernel one for the user/BPF program): /* kernel version */ struct bpf_sock_ops_kern { struct sock *sk; __u32 op; union { __u32 reply; __u32 replylong[4]; }; }; /* user version * Some fields are in network byte order reflecting the sock struct * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to * convert them to host byte order. */ struct bpf_sock_ops { __u32 op; union { __u32 reply; __u32 replylong[4]; }; __u32 family; __u32 remote_ip4; /* In network byte order */ __u32 local_ip4; /* In network byte order */ __u32 remote_ip6[4]; /* In network byte order */ __u32 local_ip6[4]; /* In network byte order */ __u32 remote_port; /* In network byte order */ __u32 local_port; /* In host byte horder */ }; Currently there are two types of ops. The first type expects the BPF program to return a value which is then used by the caller (or a negative value to indicate the operation is not supported). The second type expects state changes to be done by the BPF program, for example through a setsockopt BPF helper function, and they ignore the return value. The reply fields of the bpf_sockt_ops struct are there in case a bpf program needs to return a value larger than an integer. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge tag 'trace-v4.12-rc5' of ↵Linus Torvalds5-14/+24
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull last-minute tracing fixes from Steven Rostedt: "Two fixes: One is for a crash when using the :mod: trace probe command into stack_trace_filter. This bug was introduced during the last merge window. The other was there forever. It's a small bug that makes it impossible to name a module function for kprobes when the module starts with a digit" * tag 'trace-v4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Allow to create probe with a module name starting with a digit ftrace: Fix regression with module command in stack_trace_filter
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-32/+66
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30objtool, x86: Add several functions and files to the objtool whitelistJosh Poimboeuf1-1/+3
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30posix_clocks: Use get_itimerspec64() and put_itimerspec64()Deepa Dinamani1-28/+16
Usage of these apis and their compat versions makes the syscalls: timer_settime and timer_gettime and their compat implementations simpler. This patch also serves as a preparatory patch for changing syscalls to use new time_t data types to support the y2038 effort by isolating the processing of user pointers through these apis. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-30nanosleep: Use get_timespec64() and put_timespec64()Deepa Dinamani4-37/+25
Usage of these apis and their compat versions makes the syscalls: clock_nanosleep and nanosleep and their compat implementations simpler. This is a preparatory patch to isolate data conversions to struct timespec64 at userspace boundaries. This helps contain the changes needed to transition to new y2038 safe types. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-30posix-timers: Use get_timespec64() and put_timespec64()Deepa Dinamani2-76/+70
Usage of these apis and their compat versions makes the syscalls: clock_gettime, clock_settime, clock_getres and their compat implementations simpler. This is a preparatory patch to isolate data conversions to struct timespec64 at userspace boundaries. This helps contain the changes needed to transition to new y2038 safe types. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-30sched/cputime: Refactor the cputime_adjust() codeGustavo A. R. Silva1-11/+5
Address a Coverity false positive, which is caused by overly convoluted code: Value assigned to variable 'utime' at line 619:utime = rtime; is overwritten at line 642:utime = rtime - stime; before it can be used. This makes such variable assignment useless. Remove this variable assignment and refactor the code related. Addresses-Coverity-ID: 1371643 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Frans Klaver <fransklaver@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/20170629184128.GA5271@embeddedgus Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30cpu/hotplug: Constify attribute_group structuresArvind Yadav1-2/+2
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const: File size before: text data bss dec hex filename 12582 15361 20 27963 6d3b kernel/cpu.o File size After adding 'const': text data bss dec hex filename 12710 15265 20 27995 6d5b kernel/cpu.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: anna-maria@linutronix.de Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: rcochran@linutronix.de Link: http://lkml.kernel.org/r/f9079e94e12b36d245e7adbf67d312bc5d0250c6.1498737970.git.arvind.yadav.cs@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30sched/debug: Expose the number of RT/DL tasks that can migrateDaniel Bristot de Oliveira1-2/+15
Add the value of the rt_rq.rt_nr_migratory and dl_rq.dl_nr_migratory to the sched_debug output, for instance: rt_rq[0]: .rt_nr_running : 2 .rt_nr_migratory : 1 <--- Like this .rt_throttled : 0 .rt_time : 828.645877 .rt_runtime : 1000.000000 This is useful to debug problems related to the RT/DL schedulers. This also fixes the format of some variables, that were unsigned, rather than signed. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/7896f71cada54ee7dd8507bb666063a2e051c3d4.1498482127.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30bpf: don't open-code memdup_user()Al Viro1-29/+16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-30kimage_file_prepare_segments(): don't open-code memdup_user()Al Viro1-10/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29tracing/kprobes: Allow to create probe with a module name starting with a digitSabrina Dubroca1-9/+5
Always try to parse an address, since kstrtoul() will safely fail when given a symbol as input. If that fails (which will be the case for a symbol), try to parse a symbol instead. This allows creating a probe such as: p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0 Which is necessary for this command to work: perf probe -m 8021q -a vlan_gro_receive Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net Cc: stable@vger.kernel.org Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+5
Pull networking fixes from David Miller: 1) Need to access netdev->num_rx_queues behind an accessor in netvsc driver otherwise the build breaks with some configs, from Arnd Bergmann. 2) Add dummy xfrm_dev_event() so that build doesn't fail when CONFIG_XFRM_OFFLOAD is not set. From Hangbin Liu. 3) Don't OOPS when pfkey_msg2xfrm_state() signals an erros, from Dan Carpenter. 4) Fix MCDI command size for filter operations in sfc driver, from Martin Habets. 5) Fix UFO segmenting so that we don't calculate incorrect checksums, from Michal Kubecek. 6) When ipv6 datagram connects fail, reset destination address and port. From Wei Wang. 7) TCP disconnect must reset the cached receive DST, from WANG Cong. 8) Fix sign extension bug on 32-bit in dev_get_stats(), from Eric Dumazet. 9) fman driver has to depend on HAS_DMA, from Madalin Bucur. 10) Fix bpf pointer leak with xadd in verifier, from Daniel Borkmann. 11) Fix negative page counts with GFO, from Michal Kubecek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) sfc: fix attempt to translate invalid filter ID net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() bpf: prevent leaking pointer via xadd on unpriviledged arcnet: com20020-pci: add missing pdev setup in netdev structure arcnet: com20020-pci: fix dev_id calculation arcnet: com20020: remove needless base_addr assignment Trivial fix to spelling mistake in arc_printk message arcnet: change irq handler to lock irqsave rocker: move dereference before free mlxsw: spectrum_router: Fix NULL pointer dereference net: sched: Fix one possible panic when no destroy callback virtio-net: serialize tx routine during reset net: usb: asix88179_178a: Add support for the Belkin B2B128 fsl/fman: add dependency on HAS_DMA net: prevent sign extension in dev_get_stats() tcp: reset sk_rx_dst in tcp_disconnect() net: ipv6: reset daddr and dport in sk if connect() fails bnx2x: Don't log mc removal needlessly bnxt_en: Fix netpoll handling. bnxt_en: Add missing logic to handle TPA end error conditions. ...
2017-06-29PM: hibernate: constify attribute_group structures.Arvind Yadav1-1/+1
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 6332 488 308 7128 1bd8 kernel/power/hibernate.o File size After adding 'const': text data bss dec hex filename 6396 424 308 7128 1bd8 kernel/power/hibernate.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-29bpf: prevent leaking pointer via xadd on unpriviledgedDaniel Borkmann1-0/+5
Leaking kernel addresses on unpriviledged is generally disallowed, for example, verifier rejects the following: 0: (b7) r0 = 0 1: (18) r2 = 0xffff897e82304400 3: (7b) *(u64 *)(r1 +48) = r2 R2 leaks addr into ctx Doing pointer arithmetic on them is also forbidden, so that they don't turn into unknown value and then get leaked out. However, there's xadd as a special case, where we don't check the src reg for being a pointer register, e.g. the following will pass: 0: (b7) r0 = 0 1: (7b) *(u64 *)(r1 +48) = r0 2: (18) r2 = 0xffff897e82304400 ; map 4: (db) lock *(u64 *)(r1 +48) += r2 5: (95) exit We could store the pointer into skb->cb, loose the type context, and then read it out from there again to leak it eventually out of a map value. Or more easily in a different variant, too: 0: (bf) r6 = r1 1: (7a) *(u64 *)(r10 -8) = 0 2: (bf) r2 = r10 3: (07) r2 += -8 4: (18) r1 = 0x0 6: (85) call bpf_map_lookup_elem#1 7: (15) if r0 == 0x0 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp 8: (b7) r3 = 0 9: (7b) *(u64 *)(r0 +0) = r3 10: (db) lock *(u64 *)(r0 +0) += r6 11: (b7) r0 = 0 12: (95) exit from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp 11: (b7) r0 = 0 12: (95) exit Prevent this by checking xadd src reg for pointer types. Also add a couple of test cases related to this. Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29bpf: Fix out-of-bound access on interpreters[]Martin KaFai Lau1-1/+3
The index is off-by-one when fp->aux->stack_depth has already been rounded up to 32. In particular, if stack_depth is 512, the index will be 16. The fix is to round_up and then takes -1 instead of round_down. [ 22.318680] ================================================================== [ 22.319745] BUG: KASAN: global-out-of-bounds in bpf_prog_select_runtime+0x48a/0x670 [ 22.320737] Read of size 8 at addr ffffffff82aadae0 by task sockex3/1946 [ 22.321646] [ 22.321858] CPU: 1 PID: 1946 Comm: sockex3 Tainted: G W 4.12.0-rc6-01680-g2ee87db3a287 #22 [ 22.323061] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014 [ 22.324260] Call Trace: [ 22.324612] dump_stack+0x67/0x99 [ 22.325081] print_address_description+0x1e8/0x290 [ 22.325734] ? bpf_prog_select_runtime+0x48a/0x670 [ 22.326360] kasan_report+0x265/0x350 [ 22.326860] __asan_report_load8_noabort+0x19/0x20 [ 22.327484] bpf_prog_select_runtime+0x48a/0x670 [ 22.328109] bpf_prog_load+0x626/0xd40 [ 22.328637] ? __bpf_prog_charge+0xc0/0xc0 [ 22.329222] ? check_nnp_nosuid.isra.61+0x100/0x100 [ 22.329890] ? __might_fault+0xf6/0x1b0 [ 22.330446] ? lock_acquire+0x360/0x360 [ 22.331013] SyS_bpf+0x67c/0x24d0 [ 22.331491] ? trace_hardirqs_on+0xd/0x10 [ 22.332049] ? __getnstimeofday64+0xaf/0x1c0 [ 22.332635] ? bpf_prog_get+0x20/0x20 [ 22.333135] ? __audit_syscall_entry+0x300/0x600 [ 22.333770] ? syscall_trace_enter+0x540/0xdd0 [ 22.334339] ? exit_to_usermode_loop+0xe0/0xe0 [ 22.334950] ? do_syscall_64+0x48/0x410 [ 22.335446] ? bpf_prog_get+0x20/0x20 [ 22.335954] do_syscall_64+0x181/0x410 [ 22.336454] entry_SYSCALL64_slow_path+0x25/0x25 [ 22.337121] RIP: 0033:0x7f263fe81f19 [ 22.337618] RSP: 002b:00007ffd9a3440c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141 [ 22.338619] RAX: ffffffffffffffda RBX: 0000000000aac5fb RCX: 00007f263fe81f19 [ 22.339600] RDX: 0000000000000030 RSI: 00007ffd9a3440d0 RDI: 0000000000000005 [ 22.340470] RBP: 0000000000a9a1e0 R08: 0000000000a9a1e0 R09: 0000009d00000001 [ 22.341430] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000010000 [ 22.342411] R13: 0000000000a9a023 R14: 0000000000000001 R15: 0000000000000003 [ 22.343369] [ 22.343593] The buggy address belongs to the variable: [ 22.344241] interpreters+0x80/0x980 [ 22.344708] [ 22.344908] Memory state around the buggy address: [ 22.345556] ffffffff82aad980: 00 00 00 04 fa fa fa fa 04 fa fa fa fa fa fa fa [ 22.346449] ffffffff82aada00: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 00 [ 22.347361] >ffffffff82aada80: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa [ 22.348301] ^ [ 22.349142] ffffffff82aadb00: 00 01 fa fa fa fa fa fa 00 00 00 00 00 00 00 00 [ 22.350058] ffffffff82aadb80: 00 00 07 fa fa fa fa fa 00 00 05 fa fa fa fa fa [ 22.350984] ================================================================== Fixes: b870aa901f4b ("bpf: use different interpreter depending on required stack size") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29bpf: Add syscall lookup support for fd array and htabMartin KaFai Lau5-3/+67
This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS. The lookup returns a prog-id or map-id to the userspace. The userspace can then use the BPF_PROG_GET_FD_BY_ID or BPF_MAP_GET_FD_BY_ID to get a fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29ftrace: Fix regression with module command in stack_trace_filterSteven Rostedt (VMware)4-5/+19
When doing the following command: # echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter it triggered a crash. This happened with the clean up of probes. It required all callers to the regex function (doing ftrace filtering) to have ops->private be a pointer to a trace_array. But for the stack tracer, that is not the case. Allow for the ops->private to be NULL, and change the function command callbacks to handle the trace_array pointer being NULL as well. Fixes: d2afd57a4b96 ("tracing/ftrace: Allow instances to have their own function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>