From 03d6a74b5f85ff46f20e1382982b7f4860f5fec6 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 22 Sep 2009 11:09:12 -0400 Subject: nfsd: fix Documentation typo Caught by Benny, thanks! Signed-off-by: J. Bruce Fields --- Documentation/filesystems/nfs41-server.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt index 5920fe26e6ff..1f95e7731886 100644 --- a/Documentation/filesystems/nfs41-server.txt +++ b/Documentation/filesystems/nfs41-server.txt @@ -41,7 +41,7 @@ interoperability problems with future clients. Known issues: conformant with the spec (for example, we don't use kerberos on the backchannel correctly). - no trunking support: no clients currently take advantage of - trunking, but this is a mandatory failure, and its use is + trunking, but this is a mandatory feature, and its use is recommended to clients in a number of places. (E.g. to ensure timely renewal in case an existing connection's retry timeouts have gotten too long; see section 8.3 of the draft.) -- cgit v1.2.3 From ddc04fd4d5163aee9ebdb38a56c365b602e2b7b7 Mon Sep 17 00:00:00 2001 From: Andy Adamson Date: Wed, 23 Sep 2009 21:32:21 -0400 Subject: nfsd41: use sv_max_mesg for forechannel max sizes ca_maxresponsesize and ca_maxrequest size include the RPC header. sv_max_mesg is sv_max_payolad plus a page for overhead and is used in svc_init_buffer to allocate server buffer space for both the request and reply. Note that this means we can service an RPC compound that requires ca_maxrequestsize (MAXWRITE) or ca_max_responsesize (MAXREAD) but that we do not support an RPC compound that requires both ca_maxrequestsize and ca_maxresponsesize. Signed-off-by: Andy Adamson [bfields@citi.umich.edu: more documentation updates] Signed-off-by: J. Bruce Fields --- Documentation/filesystems/nfs41-server.txt | 7 +++++++ fs/nfsd/nfs4state.c | 3 ++- 2 files changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt index 1f95e7731886..1bd0d0c05171 100644 --- a/Documentation/filesystems/nfs41-server.txt +++ b/Documentation/filesystems/nfs41-server.txt @@ -213,3 +213,10 @@ The following cases aren't supported yet: DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. * DESTROY_SESSION MUST be the final operation in the COMPOUND request. +Nonstandard compound limitations: +* No support for a sessions fore channel RPC compound that requires both a + ca_maxrequestsize request and a ca_maxresponsesize reply, so we may + fail to live up to the promise we made in CREATE_SESSION fore channel + negotiation. +* No more than one IO operation (read, write, readdir) allowed per + compound. diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2153f9bdbebd..fcb9817881a1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -477,13 +477,14 @@ static int set_forechannel_drc_size(struct nfsd4_channel_attrs *fchan) /* * fchan holds the client values on input, and the server values on output + * sv_max_mesg is the maximum payload plus one page for overhead. */ static int init_forechannel_attrs(struct svc_rqst *rqstp, struct nfsd4_channel_attrs *session_fchan, struct nfsd4_channel_attrs *fchan) { int status = 0; - __u32 maxcount = svc_max_payload(rqstp); + __u32 maxcount = nfsd_serv->sv_max_mesg; /* headerpadsz set to zero in encode routine */ -- cgit v1.2.3 From 78eb00cc574d3dbf8e6bed804948a89e8110a064 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Thu, 15 Oct 2009 02:20:22 -0700 Subject: slub: allow stats to be cleared When collecting slub stats for particular workloads, it's necessary to collect each statistic for all caches before the job is even started because the counters are usually greater than zero just from boot and initialization. This allows a statistic to be cleared on each cpu by writing '0' to its sysfs file. This creates a baseline for statistics of interest before the workload is started. Setting a statistic to a particular value is not supported, so all values written to these files other than '0' returns -EINVAL. Cc: Christoph Lameter Signed-off-by: David Rientjes Signed-off-by: Pekka Enberg --- Documentation/ABI/testing/sysfs-kernel-slab | 109 +++++++++++++++------------- mm/slub.c | 18 ++++- 2 files changed, 75 insertions(+), 52 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab index 6dcf75e594fb..8b093f8222d3 100644 --- a/Documentation/ABI/testing/sysfs-kernel-slab +++ b/Documentation/ABI/testing/sysfs-kernel-slab @@ -45,8 +45,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The alloc_fastpath file is read-only and specifies how many - objects have been allocated using the fast path. + The alloc_fastpath file shows how many objects have been + allocated using the fast path. It can be written to clear the + current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/alloc_from_partial @@ -55,9 +56,10 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The alloc_from_partial file is read-only and specifies how - many times a cpu slab has been full and it has been refilled - by using a slab from the list of partially used slabs. + The alloc_from_partial file shows how many times a cpu slab has + been full and it has been refilled by using a slab from the list + of partially used slabs. It can be written to clear the current + count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/alloc_refill @@ -66,9 +68,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The alloc_refill file is read-only and specifies how many - times the per-cpu freelist was empty but there were objects - available as the result of remote cpu frees. + The alloc_refill file shows how many times the per-cpu freelist + was empty but there were objects available as the result of + remote cpu frees. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/alloc_slab @@ -77,8 +79,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The alloc_slab file is read-only and specifies how many times - a new slab had to be allocated from the page allocator. + The alloc_slab file is shows how many times a new slab had to + be allocated from the page allocator. It can be written to + clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/alloc_slowpath @@ -87,9 +90,10 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The alloc_slowpath file is read-only and specifies how many - objects have been allocated using the slow path because of a - refill or allocation from a partial or new slab. + The alloc_slowpath file shows how many objects have been + allocated using the slow path because of a refill or + allocation from a partial or new slab. It can be written to + clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/cache_dma @@ -117,10 +121,11 @@ KernelVersion: 2.6.31 Contact: Pekka Enberg , Christoph Lameter Description: - The file cpuslab_flush is read-only and specifies how many - times a cache's cpu slabs have been flushed as the result of - destroying or shrinking a cache, a cpu going offline, or as - the result of forcing an allocation from a certain node. + The file cpuslab_flush shows how many times a cache's cpu slabs + have been flushed as the result of destroying or shrinking a + cache, a cpu going offline, or as the result of forcing an + allocation from a certain node. It can be written to clear the + current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/ctor @@ -139,8 +144,8 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file deactivate_empty is read-only and specifies how many - times an empty cpu slab was deactivated. + The deactivate_empty file shows how many times an empty cpu slab + was deactivated. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/deactivate_full @@ -149,8 +154,8 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file deactivate_full is read-only and specifies how many - times a full cpu slab was deactivated. + The deactivate_full file shows how many times a full cpu slab + was deactivated. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/deactivate_remote_frees @@ -159,9 +164,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file deactivate_remote_frees is read-only and specifies how - many times a cpu slab has been deactivated and contained free - objects that were freed remotely. + The deactivate_remote_frees file shows how many times a cpu slab + has been deactivated and contained free objects that were freed + remotely. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/deactivate_to_head @@ -170,9 +175,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file deactivate_to_head is read-only and specifies how - many times a partial cpu slab was deactivated and added to the - head of its node's partial list. + The deactivate_to_head file shows how many times a partial cpu + slab was deactivated and added to the head of its node's partial + list. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/deactivate_to_tail @@ -181,9 +186,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file deactivate_to_tail is read-only and specifies how - many times a partial cpu slab was deactivated and added to the - tail of its node's partial list. + The deactivate_to_tail file shows how many times a partial cpu + slab was deactivated and added to the tail of its node's partial + list. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/destroy_by_rcu @@ -201,9 +206,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file free_add_partial is read-only and specifies how many - times an object has been freed in a full slab so that it had to - added to its node's partial list. + The free_add_partial file shows how many times an object has + been freed in a full slab so that it had to added to its node's + partial list. It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/free_calls @@ -222,9 +227,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The free_fastpath file is read-only and specifies how many - objects have been freed using the fast path because it was an - object from the cpu slab. + The free_fastpath file shows how many objects have been freed + using the fast path because it was an object from the cpu slab. + It can be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/free_frozen @@ -233,9 +238,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The free_frozen file is read-only and specifies how many - objects have been freed to a frozen slab (i.e. a remote cpu - slab). + The free_frozen file shows how many objects have been freed to + a frozen slab (i.e. a remote cpu slab). It can be written to + clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/free_remove_partial @@ -244,9 +249,10 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The file free_remove_partial is read-only and specifies how - many times an object has been freed to a now-empty slab so - that it had to be removed from its node's partial list. + The free_remove_partial file shows how many times an object has + been freed to a now-empty slab so that it had to be removed from + its node's partial list. It can be written to clear the current + count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/free_slab @@ -255,8 +261,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The free_slab file is read-only and specifies how many times an - empty slab has been freed back to the page allocator. + The free_slab file shows how many times an empty slab has been + freed back to the page allocator. It can be written to clear + the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/free_slowpath @@ -265,9 +272,9 @@ KernelVersion: 2.6.25 Contact: Pekka Enberg , Christoph Lameter Description: - The free_slowpath file is read-only and specifies how many - objects have been freed using the slow path (i.e. to a full or - partial slab). + The free_slowpath file shows how many objects have been freed + using the slow path (i.e. to a full or partial slab). It can + be written to clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/hwcache_align @@ -346,10 +353,10 @@ KernelVersion: 2.6.26 Contact: Pekka Enberg , Christoph Lameter Description: - The file order_fallback is read-only and specifies how many - times an allocation of a new slab has not been possible at the - cache's order and instead fallen back to its minimum possible - order. + The order_fallback file shows how many times an allocation of a + new slab has not been possible at the cache's order and instead + fallen back to its minimum possible order. It can be written to + clear the current count. Available when CONFIG_SLUB_STATS is enabled. What: /sys/kernel/slab/cache/partial diff --git a/mm/slub.c b/mm/slub.c index 4996fc719552..ac0ca4c0d054 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4371,12 +4371,28 @@ static int show_stat(struct kmem_cache *s, char *buf, enum stat_item si) return len + sprintf(buf + len, "\n"); } +static void clear_stat(struct kmem_cache *s, enum stat_item si) +{ + int cpu; + + for_each_online_cpu(cpu) + get_cpu_slab(s, cpu)->stat[si] = 0; +} + #define STAT_ATTR(si, text) \ static ssize_t text##_show(struct kmem_cache *s, char *buf) \ { \ return show_stat(s, buf, si); \ } \ -SLAB_ATTR_RO(text); \ +static ssize_t text##_store(struct kmem_cache *s, \ + const char *buf, size_t length) \ +{ \ + if (buf[0] != '0') \ + return -EINVAL; \ + clear_stat(s, si); \ + return length; \ +} \ +SLAB_ATTR(text); \ STAT_ATTR(ALLOC_FASTPATH, alloc_fastpath); STAT_ATTR(ALLOC_SLOWPATH, alloc_slowpath); -- cgit v1.2.3 From dc7a08166f3a5f23e79e839a8a88849bd3397c32 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 27 Oct 2009 14:41:35 -0400 Subject: nfs: new subdir Documentation/filesystems/nfs We're adding enough nfs documentation that it may as well have its own subdirectory. Acked-by: Randy Dunlap Signed-off-by: J. Bruce Fields --- Documentation/filesystems/00-INDEX | 10 +- Documentation/filesystems/Exporting | 147 -------------- Documentation/filesystems/nfs-rdma.txt | 271 ------------------------- Documentation/filesystems/nfs.txt | 98 --------- Documentation/filesystems/nfs/00-INDEX | 12 ++ Documentation/filesystems/nfs/Exporting | 147 ++++++++++++++ Documentation/filesystems/nfs/nfs-rdma.txt | 271 +++++++++++++++++++++++++ Documentation/filesystems/nfs/nfs.txt | 98 +++++++++ Documentation/filesystems/nfs/nfs41-server.txt | 222 ++++++++++++++++++++ Documentation/filesystems/nfs/nfsroot.txt | 270 ++++++++++++++++++++++++ Documentation/filesystems/nfs41-server.txt | 222 -------------------- Documentation/filesystems/nfsroot.txt | 270 ------------------------ Documentation/filesystems/porting | 2 +- Documentation/kernel-parameters.txt | 6 +- fs/cifs/export.c | 2 +- fs/exportfs/expfs.c | 2 +- fs/isofs/export.c | 2 +- fs/nfs/Kconfig | 2 +- include/linux/exportfs.h | 2 +- net/ipv4/Kconfig | 6 +- net/ipv4/ipconfig.c | 2 +- 21 files changed, 1035 insertions(+), 1029 deletions(-) delete mode 100644 Documentation/filesystems/Exporting delete mode 100644 Documentation/filesystems/nfs-rdma.txt delete mode 100644 Documentation/filesystems/nfs.txt create mode 100644 Documentation/filesystems/nfs/00-INDEX create mode 100644 Documentation/filesystems/nfs/Exporting create mode 100644 Documentation/filesystems/nfs/nfs-rdma.txt create mode 100644 Documentation/filesystems/nfs/nfs.txt create mode 100644 Documentation/filesystems/nfs/nfs41-server.txt create mode 100644 Documentation/filesystems/nfs/nfsroot.txt delete mode 100644 Documentation/filesystems/nfs41-server.txt delete mode 100644 Documentation/filesystems/nfsroot.txt (limited to 'Documentation') diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index f15621ee5599..482151c883a5 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -1,7 +1,5 @@ 00-INDEX - this file (info on some of the filesystems supported by linux). -Exporting - - explanation of how to make filesystems exportable. Locking - info on locking rules as they pertain to Linux VFS. 9p.txt @@ -66,12 +64,8 @@ mandatory-locking.txt - info on the Linux implementation of Sys V mandatory file locking. ncpfs.txt - info on Novell Netware(tm) filesystem using NCP protocol. -nfs41-server.txt - - info on the Linux server implementation of NFSv4 minor version 1. -nfs-rdma.txt - - how to install and setup the Linux NFS/RDMA client and server software. -nfsroot.txt - - short guide on setting up a diskless box with NFS root filesystem. +nfs/ + - nfs-related documentation. nilfs2.txt - info and mount options for the NILFS2 filesystem. ntfs.txt diff --git a/Documentation/filesystems/Exporting b/Documentation/filesystems/Exporting deleted file mode 100644 index 87019d2b5981..000000000000 --- a/Documentation/filesystems/Exporting +++ /dev/null @@ -1,147 +0,0 @@ - -Making Filesystems Exportable -============================= - -Overview --------- - -All filesystem operations require a dentry (or two) as a starting -point. Local applications have a reference-counted hold on suitable -dentries via open file descriptors or cwd/root. However remote -applications that access a filesystem via a remote filesystem protocol -such as NFS may not be able to hold such a reference, and so need a -different way to refer to a particular dentry. As the alternative -form of reference needs to be stable across renames, truncates, and -server-reboot (among other things, though these tend to be the most -problematic), there is no simple answer like 'filename'. - -The mechanism discussed here allows each filesystem implementation to -specify how to generate an opaque (outside of the filesystem) byte -string for any dentry, and how to find an appropriate dentry for any -given opaque byte string. -This byte string will be called a "filehandle fragment" as it -corresponds to part of an NFS filehandle. - -A filesystem which supports the mapping between filehandle fragments -and dentries will be termed "exportable". - - - -Dcache Issues -------------- - -The dcache normally contains a proper prefix of any given filesystem -tree. This means that if any filesystem object is in the dcache, then -all of the ancestors of that filesystem object are also in the dcache. -As normal access is by filename this prefix is created naturally and -maintained easily (by each object maintaining a reference count on -its parent). - -However when objects are included into the dcache by interpreting a -filehandle fragment, there is no automatic creation of a path prefix -for the object. This leads to two related but distinct features of -the dcache that are not needed for normal filesystem access. - -1/ The dcache must sometimes contain objects that are not part of the - proper prefix. i.e that are not connected to the root. -2/ The dcache must be prepared for a newly found (via ->lookup) directory - to already have a (non-connected) dentry, and must be able to move - that dentry into place (based on the parent and name in the - ->lookup). This is particularly needed for directories as - it is a dcache invariant that directories only have one dentry. - -To implement these features, the dcache has: - -a/ A dentry flag DCACHE_DISCONNECTED which is set on - any dentry that might not be part of the proper prefix. - This is set when anonymous dentries are created, and cleared when a - dentry is noticed to be a child of a dentry which is in the proper - prefix. - -b/ A per-superblock list "s_anon" of dentries which are the roots of - subtrees that are not in the proper prefix. These dentries, as - well as the proper prefix, need to be released at unmount time. As - these dentries will not be hashed, they are linked together on the - d_hash list_head. - -c/ Helper routines to allocate anonymous dentries, and to help attach - loose directory dentries at lookup time. They are: - d_alloc_anon(inode) will return a dentry for the given inode. - If the inode already has a dentry, one of those is returned. - If it doesn't, a new anonymous (IS_ROOT and - DCACHE_DISCONNECTED) dentry is allocated and attached. - In the case of a directory, care is taken that only one dentry - can ever be attached. - d_splice_alias(inode, dentry) will make sure that there is a - dentry with the same name and parent as the given dentry, and - which refers to the given inode. - If the inode is a directory and already has a dentry, then that - dentry is d_moved over the given dentry. - If the passed dentry gets attached, care is taken that this is - mutually exclusive to a d_alloc_anon operation. - If the passed dentry is used, NULL is returned, else the used - dentry is returned. This corresponds to the calling pattern of - ->lookup. - - -Filesystem Issues ------------------ - -For a filesystem to be exportable it must: - - 1/ provide the filehandle fragment routines described below. - 2/ make sure that d_splice_alias is used rather than d_add - when ->lookup finds an inode for a given parent and name. - Typically the ->lookup routine will end with a: - - return d_splice_alias(inode, dentry); - } - - - - A file system implementation declares that instances of the filesystem -are exportable by setting the s_export_op field in the struct -super_block. This field must point to a "struct export_operations" -struct which has the following members: - - encode_fh (optional) - Takes a dentry and creates a filehandle fragment which can later be used - to find or create a dentry for the same object. The default - implementation creates a filehandle fragment that encodes a 32bit inode - and generation number for the inode encoded, and if necessary the - same information for the parent. - - fh_to_dentry (mandatory) - Given a filehandle fragment, this should find the implied object and - create a dentry for it (possibly with d_alloc_anon). - - fh_to_parent (optional but strongly recommended) - Given a filehandle fragment, this should find the parent of the - implied object and create a dentry for it (possibly with d_alloc_anon). - May fail if the filehandle fragment is too small. - - get_parent (optional but strongly recommended) - When given a dentry for a directory, this should return a dentry for - the parent. Quite possibly the parent dentry will have been allocated - by d_alloc_anon. The default get_parent function just returns an error - so any filehandle lookup that requires finding a parent will fail. - ->lookup("..") is *not* used as a default as it can leave ".." entries - in the dcache which are too messy to work with. - - get_name (optional) - When given a parent dentry and a child dentry, this should find a name - in the directory identified by the parent dentry, which leads to the - object identified by the child dentry. If no get_name function is - supplied, a default implementation is provided which uses vfs_readdir - to find potential names, and matches inode numbers to find the correct - match. - - -A filehandle fragment consists of an array of 1 or more 4byte words, -together with a one byte "type". -The decode_fh routine should not depend on the stated size that is -passed to it. This size may be larger than the original filehandle -generated by encode_fh, in which case it will have been padded with -nuls. Rather, the encode_fh routine should choose a "type" which -indicates the decode_fh how much of the filehandle is valid, and how -it should be interpreted. diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt deleted file mode 100644 index e386f7e4bcee..000000000000 --- a/Documentation/filesystems/nfs-rdma.txt +++ /dev/null @@ -1,271 +0,0 @@ -################################################################################ -# # -# NFS/RDMA README # -# # -################################################################################ - - Author: NetApp and Open Grid Computing - Date: May 29, 2008 - -Table of Contents -~~~~~~~~~~~~~~~~~ - - Overview - - Getting Help - - Installation - - Check RDMA and NFS Setup - - NFS/RDMA Setup - -Overview -~~~~~~~~ - - This document describes how to install and setup the Linux NFS/RDMA client - and server software. - - The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server - was first included in the following release, Linux 2.6.25. - - In our testing, we have obtained excellent performance results (full 10Gbit - wire bandwidth at minimal client CPU) under many workloads. The code passes - the full Connectathon test suite and operates over both Infiniband and iWARP - RDMA adapters. - -Getting Help -~~~~~~~~~~~~ - - If you get stuck, you can ask questions on the - - nfs-rdma-devel@lists.sourceforge.net - - mailing list. - -Installation -~~~~~~~~~~~~ - - These instructions are a step by step guide to building a machine for - use with NFS/RDMA. - - - Install an RDMA device - - Any device supported by the drivers in drivers/infiniband/hw is acceptable. - - Testing has been performed using several Mellanox-based IB cards, the - Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. - - - Install a Linux distribution and tools - - The first kernel release to contain both the NFS/RDMA client and server was - Linux 2.6.25 Therefore, a distribution compatible with this and subsequent - Linux kernel release should be installed. - - The procedures described in this document have been tested with - distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). - - - Install nfs-utils-1.1.2 or greater on the client - - An NFS/RDMA mount point can be obtained by using the mount.nfs command in - nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils - version with support for NFS/RDMA mounts, but for various reasons we - recommend using nfs-utils-1.1.2 or greater). To see which version of - mount.nfs you are using, type: - - $ /sbin/mount.nfs -V - - If the version is less than 1.1.2 or the command does not exist, - you should install the latest version of nfs-utils. - - Download the latest package from: - - http://www.kernel.org/pub/linux/utils/nfs - - Uncompress the package and follow the installation instructions. - - If you will not need the idmapper and gssd executables (you do not need - these to create an NFS/RDMA enabled mount command), the installation - process can be simplified by disabling these features when running - configure: - - $ ./configure --disable-gss --disable-nfsv4 - - To build nfs-utils you will need the tcp_wrappers package installed. For - more information on this see the package's README and INSTALL files. - - After building the nfs-utils package, there will be a mount.nfs binary in - the utils/mount directory. This binary can be used to initiate NFS v2, v3, - or v4 mounts. To initiate a v4 mount, the binary must be called - mount.nfs4. The standard technique is to create a symlink called - mount.nfs4 to mount.nfs. - - This mount.nfs binary should be installed at /sbin/mount.nfs as follows: - - $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs - - In this location, mount.nfs will be invoked automatically for NFS mounts - by the system mount command. - - NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed - on the NFS client machine. You do not need this specific version of - nfs-utils on the server. Furthermore, only the mount.nfs command from - nfs-utils-1.1.2 is needed on the client. - - - Install a Linux kernel with NFS/RDMA - - The NFS/RDMA client and server are both included in the mainline Linux - kernel version 2.6.25 and later. This and other versions of the 2.6 Linux - kernel can be found at: - - ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ - - Download the sources and place them in an appropriate location. - - - Configure the RDMA stack - - Make sure your kernel configuration has RDMA support enabled. Under - Device Drivers -> InfiniBand support, update the kernel configuration - to enable InfiniBand support [NOTE: the option name is misleading. Enabling - InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. - - Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or - iWARP adapter support (amso, cxgb3, etc.). - - If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. - - - Configure the NFS client and server - - Your kernel configuration must also have NFS file system support and/or - NFS server support enabled. These and other NFS related configuration - options can be found under File Systems -> Network File Systems. - - - Build, install, reboot - - The NFS/RDMA code will be enabled automatically if NFS and RDMA - are turned on. The NFS/RDMA client and server are configured via the hidden - SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The - value of SUNRPC_XPRT_RDMA will be: - - - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client - and server will not be built - - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, - in this case the NFS/RDMA client and server will be built as modules - - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client - and server will be built into the kernel - - Therefore, if you have followed the steps above and turned no NFS and RDMA, - the NFS/RDMA client and server will be built. - - Build a new kernel, install it, boot it. - -Check RDMA and NFS Setup -~~~~~~~~~~~~~~~~~~~~~~~~ - - Before configuring the NFS/RDMA software, it is a good idea to test - your new kernel to ensure that the kernel is working correctly. - In particular, it is a good idea to verify that the RDMA stack - is functioning as expected and standard NFS over TCP/IP and/or UDP/IP - is working properly. - - - Check RDMA Setup - - If you built the RDMA components as modules, load them at - this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel - card: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - - If you are using InfiniBand, make sure there is a Subnet Manager (SM) - running on the network. If your IB switch has an embedded SM, you can - use it. Otherwise, you will need to run an SM, such as OpenSM, on one - of your end nodes. - - If an SM is running on your network, you should see the following: - - $ cat /sys/class/infiniband/driverX/ports/1/state - 4: ACTIVE - - where driverX is mthca0, ipath5, ehca3, etc. - - To further test the InfiniBand software stack, use IPoIB (this - assumes you have two IB hosts named host1 and host2): - - host1$ ifconfig ib0 a.b.c.x - host2$ ifconfig ib0 a.b.c.y - host1$ ping a.b.c.y - host2$ ping a.b.c.x - - For other device types, follow the appropriate procedures. - - - Check NFS Setup - - For the NFS components enabled above (client and/or server), - test their functionality over standard Ethernet using TCP/IP or UDP/IP. - -NFS/RDMA Setup -~~~~~~~~~~~~~~ - - We recommend that you use two machines, one to act as the client and - one to act as the server. - - One time configuration: - - - On the server system, configure the /etc/exports file and - start the NFS/RDMA server. - - Exports entries with the following formats have been tested: - - /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) - /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) - - The IP address(es) is(are) the client's IPoIB address for an InfiniBand - HCA or the cleint's iWARP address(es) for an RNIC. - - NOTE: The "insecure" option must be used because the NFS/RDMA client does - not use a reserved port. - - Each time a machine boots: - - - Load and configure the RDMA drivers - - For InfiniBand using a Mellanox adapter: - - $ modprobe ib_mthca - $ modprobe ib_ipoib - $ ifconfig ib0 a.b.c.d - - NOTE: use unique addresses for the client and server - - - Start the NFS server - - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA transport module: - - $ modprobe svcrdma - - Regardless of how the server was built (module or built-in), start the - server: - - $ /etc/init.d/nfs start - - or - - $ service nfs start - - Instruct the server to listen on the RDMA transport: - - $ echo rdma 20049 > /proc/fs/nfsd/portlist - - - On the client system - - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in - kernel config), load the RDMA client module: - - $ modprobe xprtrdma.ko - - Regardless of how the client was built (module or built-in), use this - command to mount the NFS/RDMA server: - - $ mount -o rdma,port=20049 :/ /mnt - - To verify that the mount is using RDMA, run "cat /proc/mounts" and check - the "proto" field for the given mount. - - Congratulations! You're using NFS/RDMA! diff --git a/Documentation/filesystems/nfs.txt b/Documentation/filesystems/nfs.txt deleted file mode 100644 index f50f26ce6cd0..000000000000 --- a/Documentation/filesystems/nfs.txt +++ /dev/null @@ -1,98 +0,0 @@ - -The NFS client -============== - -The NFS version 2 protocol was first documented in RFC1094 (March 1989). -Since then two more major releases of NFS have been published, with NFSv3 -being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April -2003). - -The Linux NFS client currently supports all the above published versions, -and work is in progress on adding support for minor version 1 of the NFSv4 -protocol. - -The purpose of this document is to provide information on some of the -upcall interfaces that are used in order to provide the NFS client with -some of the information that it requires in order to fully comply with -the NFS spec. - -The DNS resolver -================ - -NFSv4 allows for one server to refer the NFS client to data that has been -migrated onto another server by means of the special "fs_locations" -attribute. See - http://tools.ietf.org/html/rfc3530#section-6 -and - http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 - -The fs_locations information can take the form of either an ip address and -a path, or a DNS hostname and a path. The latter requires the NFS client to -do a DNS lookup in order to mount the new volume, and hence the need for an -upcall to allow userland to provide this service. - -Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual -/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: - - (1) The process checks the dns_resolve cache to see if it contains a - valid entry. If so, it returns that entry and exits. - - (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' - (may be changed using the 'nfs.cache_getent' kernel boot parameter) - is run, with two arguments: - - the cache name, "dns_resolve" - - the hostname to resolve - - (3) After looking up the corresponding ip address, the helper script - writes the result into the rpc_pipefs pseudo-file - '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' - in the following (text) format: - - " \n" - - Where is in the usual IPv4 (123.456.78.90) or IPv6 - (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. - is identical to the second argument of the helper - script, and is the 'time to live' of this cache entry (in - units of seconds). - - Note: If is invalid, say the string "0", then a negative - entry is created, which will cause the kernel to treat the hostname - as having no valid DNS translation. - - - - -A basic sample /sbin/nfs_cache_getent -===================================== - -#!/bin/bash -# -ttl=600 -# -cut=/usr/bin/cut -getent=/usr/bin/getent -rpc_pipefs=/var/lib/nfs/rpc_pipefs -# -die() -{ - echo "Usage: $0 cache_name entry_name" - exit 1 -} - -[ $# -lt 2 ] && die -cachename="$1" -cache_path=${rpc_pipefs}/cache/${cachename}/channel - -case "${cachename}" in - dns_resolve) - name="$2" - result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" - [ -z "${result}" ] && result="0" - ;; - *) - die - ;; -esac -echo "${result} ${name} ${ttl}" >${cache_path} - diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX new file mode 100644 index 000000000000..6ff3d212027b --- /dev/null +++ b/Documentation/filesystems/nfs/00-INDEX @@ -0,0 +1,12 @@ +00-INDEX + - this file (nfs-related documentation). +Exporting + - explanation of how to make filesystems exportable. +nfs.txt + - nfs client, and DNS resolution for fs_locations. +nfs41-server.txt + - info on the Linux server implementation of NFSv4 minor version 1. +nfs-rdma.txt + - how to install and setup the Linux NFS/RDMA client and server software +nfsroot.txt + - short guide on setting up a diskless box with NFS root filesystem. diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting new file mode 100644 index 000000000000..87019d2b5981 --- /dev/null +++ b/Documentation/filesystems/nfs/Exporting @@ -0,0 +1,147 @@ + +Making Filesystems Exportable +============================= + +Overview +-------- + +All filesystem operations require a dentry (or two) as a starting +point. Local applications have a reference-counted hold on suitable +dentries via open file descriptors or cwd/root. However remote +applications that access a filesystem via a remote filesystem protocol +such as NFS may not be able to hold such a reference, and so need a +different way to refer to a particular dentry. As the alternative +form of reference needs to be stable across renames, truncates, and +server-reboot (among other things, though these tend to be the most +problematic), there is no simple answer like 'filename'. + +The mechanism discussed here allows each filesystem implementation to +specify how to generate an opaque (outside of the filesystem) byte +string for any dentry, and how to find an appropriate dentry for any +given opaque byte string. +This byte string will be called a "filehandle fragment" as it +corresponds to part of an NFS filehandle. + +A filesystem which supports the mapping between filehandle fragments +and dentries will be termed "exportable". + + + +Dcache Issues +------------- + +The dcache normally contains a proper prefix of any given filesystem +tree. This means that if any filesystem object is in the dcache, then +all of the ancestors of that filesystem object are also in the dcache. +As normal access is by filename this prefix is created naturally and +maintained easily (by each object maintaining a reference count on +its parent). + +However when objects are included into the dcache by interpreting a +filehandle fragment, there is no automatic creation of a path prefix +for the object. This leads to two related but distinct features of +the dcache that are not needed for normal filesystem access. + +1/ The dcache must sometimes contain objects that are not part of the + proper prefix. i.e that are not connected to the root. +2/ The dcache must be prepared for a newly found (via ->lookup) directory + to already have a (non-connected) dentry, and must be able to move + that dentry into place (based on the parent and name in the + ->lookup). This is particularly needed for directories as + it is a dcache invariant that directories only have one dentry. + +To implement these features, the dcache has: + +a/ A dentry flag DCACHE_DISCONNECTED which is set on + any dentry that might not be part of the proper prefix. + This is set when anonymous dentries are created, and cleared when a + dentry is noticed to be a child of a dentry which is in the proper + prefix. + +b/ A per-superblock list "s_anon" of dentries which are the roots of + subtrees that are not in the proper prefix. These dentries, as + well as the proper prefix, need to be released at unmount time. As + these dentries will not be hashed, they are linked together on the + d_hash list_head. + +c/ Helper routines to allocate anonymous dentries, and to help attach + loose directory dentries at lookup time. They are: + d_alloc_anon(inode) will return a dentry for the given inode. + If the inode already has a dentry, one of those is returned. + If it doesn't, a new anonymous (IS_ROOT and + DCACHE_DISCONNECTED) dentry is allocated and attached. + In the case of a directory, care is taken that only one dentry + can ever be attached. + d_splice_alias(inode, dentry) will make sure that there is a + dentry with the same name and parent as the given dentry, and + which refers to the given inode. + If the inode is a directory and already has a dentry, then that + dentry is d_moved over the given dentry. + If the passed dentry gets attached, care is taken that this is + mutually exclusive to a d_alloc_anon operation. + If the passed dentry is used, NULL is returned, else the used + dentry is returned. This corresponds to the calling pattern of + ->lookup. + + +Filesystem Issues +----------------- + +For a filesystem to be exportable it must: + + 1/ provide the filehandle fragment routines described below. + 2/ make sure that d_splice_alias is used rather than d_add + when ->lookup finds an inode for a given parent and name. + Typically the ->lookup routine will end with a: + + return d_splice_alias(inode, dentry); + } + + + + A file system implementation declares that instances of the filesystem +are exportable by setting the s_export_op field in the struct +super_block. This field must point to a "struct export_operations" +struct which has the following members: + + encode_fh (optional) + Takes a dentry and creates a filehandle fragment which can later be used + to find or create a dentry for the same object. The default + implementation creates a filehandle fragment that encodes a 32bit inode + and generation number for the inode encoded, and if necessary the + same information for the parent. + + fh_to_dentry (mandatory) + Given a filehandle fragment, this should find the implied object and + create a dentry for it (possibly with d_alloc_anon). + + fh_to_parent (optional but strongly recommended) + Given a filehandle fragment, this should find the parent of the + implied object and create a dentry for it (possibly with d_alloc_anon). + May fail if the filehandle fragment is too small. + + get_parent (optional but strongly recommended) + When given a dentry for a directory, this should return a dentry for + the parent. Quite possibly the parent dentry will have been allocated + by d_alloc_anon. The default get_parent function just returns an error + so any filehandle lookup that requires finding a parent will fail. + ->lookup("..") is *not* used as a default as it can leave ".." entries + in the dcache which are too messy to work with. + + get_name (optional) + When given a parent dentry and a child dentry, this should find a name + in the directory identified by the parent dentry, which leads to the + object identified by the child dentry. If no get_name function is + supplied, a default implementation is provided which uses vfs_readdir + to find potential names, and matches inode numbers to find the correct + match. + + +A filehandle fragment consists of an array of 1 or more 4byte words, +together with a one byte "type". +The decode_fh routine should not depend on the stated size that is +passed to it. This size may be larger than the original filehandle +generated by encode_fh, in which case it will have been padded with +nuls. Rather, the encode_fh routine should choose a "type" which +indicates the decode_fh how much of the filehandle is valid, and how +it should be interpreted. diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt new file mode 100644 index 000000000000..e386f7e4bcee --- /dev/null +++ b/Documentation/filesystems/nfs/nfs-rdma.txt @@ -0,0 +1,271 @@ +################################################################################ +# # +# NFS/RDMA README # +# # +################################################################################ + + Author: NetApp and Open Grid Computing + Date: May 29, 2008 + +Table of Contents +~~~~~~~~~~~~~~~~~ + - Overview + - Getting Help + - Installation + - Check RDMA and NFS Setup + - NFS/RDMA Setup + +Overview +~~~~~~~~ + + This document describes how to install and setup the Linux NFS/RDMA client + and server software. + + The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server + was first included in the following release, Linux 2.6.25. + + In our testing, we have obtained excellent performance results (full 10Gbit + wire bandwidth at minimal client CPU) under many workloads. The code passes + the full Connectathon test suite and operates over both Infiniband and iWARP + RDMA adapters. + +Getting Help +~~~~~~~~~~~~ + + If you get stuck, you can ask questions on the + + nfs-rdma-devel@lists.sourceforge.net + + mailing list. + +Installation +~~~~~~~~~~~~ + + These instructions are a step by step guide to building a machine for + use with NFS/RDMA. + + - Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + + - Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25 Therefore, a distribution compatible with this and subsequent + Linux kernel release should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + + - Install nfs-utils-1.1.2 or greater on the client + + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: + + $ /sbin/mount.nfs -V + + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. + + Download the latest package from: + + http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions. + + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: + + $ ./configure --disable-gss --disable-nfsv4 + + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. + + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount command. + + NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.2 is needed on the client. + + - Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the 2.6 Linux + kernel can be found at: + + ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ + + Download the sources and place them in an appropriate location. + + - Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. + + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + + - Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + + - Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned no NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +~~~~~~~~~~~~~~~~~~~~~~~~ + + Before configuring the NFS/RDMA software, it is a good idea to test + your new kernel to ensure that the kernel is working correctly. + In particular, it is a good idea to verify that the RDMA stack + is functioning as expected and standard NFS over TCP/IP and/or UDP/IP + is working properly. + + - Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + $ modprobe ib_mthca + $ modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network. If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + $ cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + host1$ ifconfig ib0 a.b.c.x + host2$ ifconfig ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x + + For other device types, follow the appropriate procedures. + + - Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +~~~~~~~~~~~~~~ + + We recommend that you use two machines, one to act as the client and + one to act as the server. + + One time configuration: + + - On the server system, configure the /etc/exports file and + start the NFS/RDMA server. + + Exports entries with the following formats have been tested: + + /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) + /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) + + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the cleint's iWARP address(es) for an RNIC. + + NOTE: The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. + + Each time a machine boots: + + - Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ifconfig ib0 a.b.c.d + + NOTE: use unique addresses for the client and server + + - Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: + + $ modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the + server: + + $ /etc/init.d/nfs start + + or + + $ service nfs start + + Instruct the server to listen on the RDMA transport: + + $ echo rdma 20049 > /proc/fs/nfsd/portlist + + - On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: + + $ modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: + + $ mount -o rdma,port=20049 :/ /mnt + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt new file mode 100644 index 000000000000..f50f26ce6cd0 --- /dev/null +++ b/Documentation/filesystems/nfs/nfs.txt @@ -0,0 +1,98 @@ + +The NFS client +============== + +The NFS version 2 protocol was first documented in RFC1094 (March 1989). +Since then two more major releases of NFS have been published, with NFSv3 +being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April +2003). + +The Linux NFS client currently supports all the above published versions, +and work is in progress on adding support for minor version 1 of the NFSv4 +protocol. + +The purpose of this document is to provide information on some of the +upcall interfaces that are used in order to provide the NFS client with +some of the information that it requires in order to fully comply with +the NFS spec. + +The DNS resolver +================ + +NFSv4 allows for one server to refer the NFS client to data that has been +migrated onto another server by means of the special "fs_locations" +attribute. See + http://tools.ietf.org/html/rfc3530#section-6 +and + http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 + +The fs_locations information can take the form of either an ip address and +a path, or a DNS hostname and a path. The latter requires the NFS client to +do a DNS lookup in order to mount the new volume, and hence the need for an +upcall to allow userland to provide this service. + +Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual +/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: + + (1) The process checks the dns_resolve cache to see if it contains a + valid entry. If so, it returns that entry and exits. + + (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' + (may be changed using the 'nfs.cache_getent' kernel boot parameter) + is run, with two arguments: + - the cache name, "dns_resolve" + - the hostname to resolve + + (3) After looking up the corresponding ip address, the helper script + writes the result into the rpc_pipefs pseudo-file + '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' + in the following (text) format: + + " \n" + + Where is in the usual IPv4 (123.456.78.90) or IPv6 + (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. + is identical to the second argument of the helper + script, and is the 'time to live' of this cache entry (in + units of seconds). + + Note: If is invalid, say the string "0", then a negative + entry is created, which will cause the kernel to treat the hostname + as having no valid DNS translation. + + + + +A basic sample /sbin/nfs_cache_getent +===================================== + +#!/bin/bash +# +ttl=600 +# +cut=/usr/bin/cut +getent=/usr/bin/getent +rpc_pipefs=/var/lib/nfs/rpc_pipefs +# +die() +{ + echo "Usage: $0 cache_name entry_name" + exit 1 +} + +[ $# -lt 2 ] && die +cachename="$1" +cache_path=${rpc_pipefs}/cache/${cachename}/channel + +case "${cachename}" in + dns_resolve) + name="$2" + result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" + [ -z "${result}" ] && result="0" + ;; + *) + die + ;; +esac +echo "${result} ${name} ${ttl}" >${cache_path} + diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt new file mode 100644 index 000000000000..1bd0d0c05171 --- /dev/null +++ b/Documentation/filesystems/nfs/nfs41-server.txt @@ -0,0 +1,222 @@ +NFSv4.1 Server Implementation + +Server support for minorversion 1 can be controlled using the +/proc/fs/nfsd/versions control file. The string output returned +by reading this file will contain either "+4.1" or "-4.1" +correspondingly. + +Currently, server support for minorversion 1 is disabled by default. +It can be enabled at run time by writing the string "+4.1" to +the /proc/fs/nfsd/versions control file. Note that to write this +control file, the nfsd service must be taken down. Use your user-mode +nfs-utils to set this up; see rpc.nfsd(8) + +(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and +"-4", respectively. Therefore, code meant to work on both new and old +kernels must turn 4.1 on or off *before* turning support for version 4 +on or off; rpc.nfsd does this correctly.) + +The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based +on the latest NFSv4.1 Internet Draft: +http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 + +From the many new features in NFSv4.1 the current implementation +focuses on the mandatory-to-implement NFSv4.1 Sessions, providing +"exactly once" semantics and better control and throttling of the +resources allocated for each client. + +Other NFSv4.1 features, Parallel NFS operations in particular, +are still under development out of tree. +See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design +for more information. + +The current implementation is intended for developers only: while it +does support ordinary file operations on clients we have tested against +(including the linux client), it is incomplete in ways which may limit +features unexpectedly, cause known bugs in rare cases, or cause +interoperability problems with future clients. Known issues: + + - gss support is questionable: currently mounts with kerberos + from a linux client are possible, but we aren't really + conformant with the spec (for example, we don't use kerberos + on the backchannel correctly). + - no trunking support: no clients currently take advantage of + trunking, but this is a mandatory feature, and its use is + recommended to clients in a number of places. (E.g. to ensure + timely renewal in case an existing connection's retry timeouts + have gotten too long; see section 8.3 of the draft.) + Therefore, lack of this feature may cause future clients to + fail. + - Incomplete backchannel support: incomplete backchannel gss + support and no support for BACKCHANNEL_CTL mean that + callbacks (hence delegations and layouts) may not be + available and clients confused by the incomplete + implementation may fail. + - Server reboot recovery is unsupported; if the server reboots, + clients may fail. + - We do not support SSV, which provides security for shared + client-server state (thus preventing unauthorized tampering + with locks and opens, for example). It is mandatory for + servers to support this, though no clients use it yet. + - Mandatory operations which we do not support, such as + DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and + TEST_STATEID, are not currently used by clients, but will be + (and the spec recommends their uses in common cases), and + clients should not be expected to know how to recover from the + case where they are not supported. This will eventually cause + interoperability failures. + +In addition, some limitations are inherited from the current NFSv4 +implementation: + + - Incomplete delegation enforcement: if a file is renamed or + unlinked, a client holding a delegation may continue to + indefinitely allow opens of the file under the old name. + +The table below, taken from the NFSv4.1 document, lists +the operations that are mandatory to implement (REQ), optional +(OPT), and NFSv4.0 operations that are required not to implement (MNI) +in minor version 1. The first column indicates the operations that +are not supported yet by the linux server implementation. + +The OPTIONAL features identified and their abbreviations are as follows: + pNFS Parallel NFS + FDELG File Delegations + DDELG Directory Delegations + +The following abbreviations indicate the linux server implementation status. + I Implemented NFSv4.1 operations. + NS Not Supported. + NS* unimplemented optional feature. + P pNFS features implemented out of tree. + PNS pNFS features that are not supported yet (out of tree). + +Operations + + +----------------------+------------+--------------+----------------+ + | Operation | REQ, REC, | Feature | Definition | + | | OPT, or | (REQ, REC, | | + | | MNI | or OPT) | | + +----------------------+------------+--------------+----------------+ + | ACCESS | REQ | | Section 18.1 | +NS | BACKCHANNEL_CTL | REQ | | Section 18.33 | +NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | + | CLOSE | REQ | | Section 18.2 | + | COMMIT | REQ | | Section 18.3 | + | CREATE | REQ | | Section 18.4 | +I | CREATE_SESSION | REQ | | Section 18.36 | +NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | + | DELEGRETURN | OPT | FDELG, | Section 18.6 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS | DESTROY_CLIENTID | REQ | | Section 18.50 | +I | DESTROY_SESSION | REQ | | Section 18.37 | +I | EXCHANGE_ID | REQ | | Section 18.35 | +NS | FREE_STATEID | REQ | | Section 18.38 | + | GETATTR | REQ | | Section 18.7 | +P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | +P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | + | GETFH | REQ | | Section 18.8 | +NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | +P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | +P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | +P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | + | LINK | OPT | | Section 18.9 | + | LOCK | REQ | | Section 18.10 | + | LOCKT | REQ | | Section 18.11 | + | LOCKU | REQ | | Section 18.12 | + | LOOKUP | REQ | | Section 18.13 | + | LOOKUPP | REQ | | Section 18.14 | + | NVERIFY | REQ | | Section 18.15 | + | OPEN | REQ | | Section 18.16 | +NS*| OPENATTR | OPT | | Section 18.17 | + | OPEN_CONFIRM | MNI | | N/A | + | OPEN_DOWNGRADE | REQ | | Section 18.18 | + | PUTFH | REQ | | Section 18.19 | + | PUTPUBFH | REQ | | Section 18.20 | + | PUTROOTFH | REQ | | Section 18.21 | + | READ | REQ | | Section 18.22 | + | READDIR | REQ | | Section 18.23 | + | READLINK | OPT | | Section 18.24 | +NS | RECLAIM_COMPLETE | REQ | | Section 18.51 | + | RELEASE_LOCKOWNER | MNI | | N/A | + | REMOVE | REQ | | Section 18.25 | + | RENAME | REQ | | Section 18.26 | + | RENEW | MNI | | N/A | + | RESTOREFH | REQ | | Section 18.27 | + | SAVEFH | REQ | | Section 18.28 | + | SECINFO | REQ | | Section 18.29 | +NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | + | | | layout (REQ) | Section 13.12 | +I | SEQUENCE | REQ | | Section 18.46 | + | SETATTR | REQ | | Section 18.30 | + | SETCLIENTID | MNI | | N/A | + | SETCLIENTID_CONFIRM | MNI | | N/A | +NS | SET_SSV | REQ | | Section 18.47 | +NS | TEST_STATEID | REQ | | Section 18.48 | + | VERIFY | REQ | | Section 18.31 | +NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | + | WRITE | REQ | | Section 18.32 | + +Callback Operations + + +-------------------------+-----------+-------------+---------------+ + | Operation | REQ, REC, | Feature | Definition | + | | OPT, or | (REQ, REC, | | + | | MNI | or OPT) | | + +-------------------------+-----------+-------------+---------------+ + | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | +P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | +NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | +P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | +NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | +NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | + | CB_RECALL | OPT | FDELG, | Section 20.2 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS | CB_RECALL_SLOT | REQ | | Section 20.8 | +NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | + | | | (REQ) | | +I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | + | | | DDELG, pNFS | | + | | | (REQ) | | +NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | + | | | DDELG, pNFS | | + | | | (REQ) | | + +-------------------------+-----------+-------------+---------------+ + +Implementation notes: + +DELEGPURGE: +* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or + CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that + persist across client reboots). Thus we need not implement this for + now. + +EXCHANGE_ID: +* only SP4_NONE state protection supported +* implementation ids are ignored + +CREATE_SESSION: +* backchannel attributes are ignored +* backchannel security parameters are ignored + +SEQUENCE: +* no support for dynamic slot table renegotiation (optional) + +nfsv4.1 COMPOUND rules: +The following cases aren't supported yet: +* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION, + DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. +* DESTROY_SESSION MUST be the final operation in the COMPOUND request. + +Nonstandard compound limitations: +* No support for a sessions fore channel RPC compound that requires both a + ca_maxrequestsize request and a ca_maxresponsesize reply, so we may + fail to live up to the promise we made in CREATE_SESSION fore channel + negotiation. +* No more than one IO operation (read, write, readdir) allowed per + compound. diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt new file mode 100644 index 000000000000..3ba0b945aaf8 --- /dev/null +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -0,0 +1,270 @@ +Mounting the root filesystem via NFS (nfsroot) +=============================================== + +Written 1996 by Gero Kuhlmann +Updated 1997 by Martin Mares +Updated 2006 by Nico Schottelius +Updated 2006 by Horms + + + +In order to use a diskless system, such as an X-terminal or printer server +for example, it is necessary for the root filesystem to be present on a +non-disk device. This may be an initramfs (see Documentation/filesystems/ +ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a +filesystem mounted via NFS. The following text describes on how to use NFS +for the root filesystem. For the rest of this text 'client' means the +diskless system, and 'server' means the NFS server. + + + + +1.) Enabling nfsroot capabilities + ----------------------------- + +In order to use nfsroot, NFS client support needs to be selected as +built-in during configuration. Once this has been selected, the nfsroot +option will become available, which should also be selected. + +In the networking options, kernel level autoconfiguration can be selected, +along with the types of autoconfiguration to support. Selecting all of +DHCP, BOOTP and RARP is safe. + + + + +2.) Kernel command line + ------------------- + +When the kernel has been loaded by a boot loader (see below) it needs to be +told what root fs device to use. And in the case of nfsroot, where to find +both the server and the name of the directory on the server to mount as root. +This can be established using the following kernel command line parameters: + + +root=/dev/nfs + + This is necessary to enable the pseudo-NFS-device. Note that it's not a + real device but just a synonym to tell the kernel to use NFS instead of + a real device. + + +nfsroot=[:][,] + + If the `nfsroot' parameter is NOT given on the command line, + the default "/tftpboot/%s" will be used. + + Specifies the IP address of the NFS server. + The default address is determined by the `ip' parameter + (see below). This parameter allows the use of different + servers for IP autoconfiguration and NFS. + + Name of the directory on the server to mount as root. + If there is a "%s" token in the string, it will be + replaced by the ASCII-representation of the client's + IP address. + + Standard NFS options. All options are separated by commas. + The following defaults are used: + port = as given by server portmap daemon + rsize = 4096 + wsize = 4096 + timeo = 7 + retrans = 3 + acregmin = 3 + acregmax = 60 + acdirmin = 30 + acdirmax = 60 + flags = hard, nointr, noposix, cto, ac + + +ip=:::::: + + This parameter tells the kernel how to configure IP addresses of devices + and also how to set up the IP routing table. It was originally called + `nfsaddrs', but now the boot-time IP configuration works independently of + NFS, so it was renamed to `ip' and the old name remained as an alias for + compatibility reasons. + + If this parameter is missing from the kernel command line, all fields are + assumed to be empty, and the defaults mentioned below apply. In general + this means that the kernel tries to configure everything using + autoconfiguration. + + The parameter can appear alone as the value to the `ip' + parameter (without all the ':' characters before). If the value is + "ip=off" or "ip=none", no autoconfiguration will take place, otherwise + autoconfiguration will take place. The most common way to use this + is "ip=dhcp". + + IP address of the client. + + Default: Determined using autoconfiguration. + + IP address of the NFS server. If RARP is used to determine + the client address and this parameter is NOT empty only + replies from the specified server are accepted. + + Only required for NFS root. That is autoconfiguration + will not be triggered if it is missing and NFS root is not + in operation. + + Default: Determined using autoconfiguration. + The address of the autoconfiguration server is used. + + IP address of a gateway if the server is on a different subnet. + + Default: Determined using autoconfiguration. + + Netmask for local network interface. If unspecified + the netmask is derived from the client IP address assuming + classful addressing. + + Default: Determined using autoconfiguration. + + Name of the client. May be supplied by autoconfiguration, + but its absence will not trigger autoconfiguration. + + Default: Client IP address is used in ASCII notation. + + Name of network device to use. + + Default: If the host only has one device, it is used. + Otherwise the device is determined using + autoconfiguration. This is done by sending + autoconfiguration requests out of all devices, + and using the device that received the first reply. + + Method to use for autoconfiguration. In the case of options + which specify multiple autoconfiguration protocols, + requests are sent using all protocols, and the first one + to reply is used. + + Only autoconfiguration protocols that have been compiled + into the kernel will be used, regardless of the value of + this option. + + off or none: don't use autoconfiguration + (do static IP assignment instead) + on or any: use any protocol available in the kernel + (default) + dhcp: use DHCP + bootp: use BOOTP + rarp: use RARP + both: use both BOOTP and RARP but not DHCP + (old option kept for backwards compatibility) + + Default: any + + + + +3.) Boot Loader + ---------- + +To get the kernel into memory different approaches can be used. +They depend on various facilities being available: + + +3.1) Booting from a floppy using syslinux + + When building kernels, an easy way to create a boot floppy that uses + syslinux is to use the zdisk or bzdisk make targets which use zimage + and bzimage images respectively. Both targets accept the + FDARGS parameter which can be used to set the kernel command line. + + e.g. + make bzdisk FDARGS="root=/dev/nfs" + + Note that the user running this command will need to have + access to the floppy drive device, /dev/fd0 + + For more information on syslinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + N.B: Previously it was possible to write a kernel directly to + a floppy using dd, configure the boot device using rdev, and + boot using the resulting floppy. Linux no longer supports this + method of booting. + +3.2) Booting from a cdrom using isolinux + + When building kernels, an easy way to create a bootable cdrom that + uses isolinux is to use the isoimage target which uses a bzimage + image. Like zdisk and bzdisk, this target accepts the FDARGS + parameter which can be used to set the kernel command line. + + e.g. + make isoimage FDARGS="root=/dev/nfs" + + The resulting iso image will be arch//boot/image.iso + This can be written to a cdrom using a variety of tools including + cdrecord. + + e.g. + cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + +3.2) Using LILO + When using LILO all the necessary command line parameters may be + specified using the 'append=' directive in the LILO configuration + file. + + However, to use the 'root=' directive you also need to create + a dummy root device, which may be removed after LILO is run. + + mknod /dev/boot255 c 0 255 + + For information on configuring LILO, please refer to its documentation. + +3.3) Using GRUB + When using GRUB, kernel parameter are simply appended after the kernel + specification: kernel + +3.4) Using loadlin + loadlin may be used to boot Linux from a DOS command prompt without + requiring a local hard disk to mount as root. This has not been + thoroughly tested by the authors of this document, but in general + it should be possible configure the kernel command line similarly + to the configuration of LILO. + + Please refer to the loadlin documentation for further information. + +3.5) Using a boot ROM + This is probably the most elegant way of booting a diskless client. + With a boot ROM the kernel is loaded using the TFTP protocol. The + authors of this document are not aware of any no commercial boot + ROMs that support booting Linux over the network. However, there + are two free implementations of a boot ROM, netboot-nfs and + etherboot, both of which are available on sunsite.unc.edu, and both + of which contain everything you need to boot a diskless Linux client. + +3.6) Using pxelinux + Pxelinux may be used to boot linux using the PXE boot loader + which is present on many modern network cards. + + When using pxelinux, the kernel image is specified using + "kernel ". The nfsroot parameters + are passed to the kernel by adding them to the "append" line. + It is common to use serial console in conjunction with pxeliunx, + see Documentation/serial-console.txt for more information. + + For more information on isolinux, including how to create bootdisks + for prebuilt kernels, see http://syslinux.zytor.com/ + + + + +4.) Credits + ------- + + The nfsroot code in the kernel and the RARP support have been written + by Gero Kuhlmann . + + The rest of the IP layer autoconfiguration code has been written + by Martin Mares . + + In order to write the initial version of nfsroot I would like to thank + Jens-Uwe Mager for his help. diff --git a/Documentation/filesystems/nfs41-server.txt b/Documentation/filesystems/nfs41-server.txt deleted file mode 100644 index 1bd0d0c05171..000000000000 --- a/Documentation/filesystems/nfs41-server.txt +++ /dev/null @@ -1,222 +0,0 @@ -NFSv4.1 Server Implementation - -Server support for minorversion 1 can be controlled using the -/proc/fs/nfsd/versions control file. The string output returned -by reading this file will contain either "+4.1" or "-4.1" -correspondingly. - -Currently, server support for minorversion 1 is disabled by default. -It can be enabled at run time by writing the string "+4.1" to -the /proc/fs/nfsd/versions control file. Note that to write this -control file, the nfsd service must be taken down. Use your user-mode -nfs-utils to set this up; see rpc.nfsd(8) - -(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and -"-4", respectively. Therefore, code meant to work on both new and old -kernels must turn 4.1 on or off *before* turning support for version 4 -on or off; rpc.nfsd does this correctly.) - -The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based -on the latest NFSv4.1 Internet Draft: -http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 - -From the many new features in NFSv4.1 the current implementation -focuses on the mandatory-to-implement NFSv4.1 Sessions, providing -"exactly once" semantics and better control and throttling of the -resources allocated for each client. - -Other NFSv4.1 features, Parallel NFS operations in particular, -are still under development out of tree. -See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design -for more information. - -The current implementation is intended for developers only: while it -does support ordinary file operations on clients we have tested against -(including the linux client), it is incomplete in ways which may limit -features unexpectedly, cause known bugs in rare cases, or cause -interoperability problems with future clients. Known issues: - - - gss support is questionable: currently mounts with kerberos - from a linux client are possible, but we aren't really - conformant with the spec (for example, we don't use kerberos - on the backchannel correctly). - - no trunking support: no clients currently take advantage of - trunking, but this is a mandatory feature, and its use is - recommended to clients in a number of places. (E.g. to ensure - timely renewal in case an existing connection's retry timeouts - have gotten too long; see section 8.3 of the draft.) - Therefore, lack of this feature may cause future clients to - fail. - - Incomplete backchannel support: incomplete backchannel gss - support and no support for BACKCHANNEL_CTL mean that - callbacks (hence delegations and layouts) may not be - available and clients confused by the incomplete - implementation may fail. - - Server reboot recovery is unsupported; if the server reboots, - clients may fail. - - We do not support SSV, which provides security for shared - client-server state (thus preventing unauthorized tampering - with locks and opens, for example). It is mandatory for - servers to support this, though no clients use it yet. - - Mandatory operations which we do not support, such as - DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and - TEST_STATEID, are not currently used by clients, but will be - (and the spec recommends their uses in common cases), and - clients should not be expected to know how to recover from the - case where they are not supported. This will eventually cause - interoperability failures. - -In addition, some limitations are inherited from the current NFSv4 -implementation: - - - Incomplete delegation enforcement: if a file is renamed or - unlinked, a client holding a delegation may continue to - indefinitely allow opens of the file under the old name. - -The table below, taken from the NFSv4.1 document, lists -the operations that are mandatory to implement (REQ), optional -(OPT), and NFSv4.0 operations that are required not to implement (MNI) -in minor version 1. The first column indicates the operations that -are not supported yet by the linux server implementation. - -The OPTIONAL features identified and their abbreviations are as follows: - pNFS Parallel NFS - FDELG File Delegations - DDELG Directory Delegations - -The following abbreviations indicate the linux server implementation status. - I Implemented NFSv4.1 operations. - NS Not Supported. - NS* unimplemented optional feature. - P pNFS features implemented out of tree. - PNS pNFS features that are not supported yet (out of tree). - -Operations - - +----------------------+------------+--------------+----------------+ - | Operation | REQ, REC, | Feature | Definition | - | | OPT, or | (REQ, REC, | | - | | MNI | or OPT) | | - +----------------------+------------+--------------+----------------+ - | ACCESS | REQ | | Section 18.1 | -NS | BACKCHANNEL_CTL | REQ | | Section 18.33 | -NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 | - | CLOSE | REQ | | Section 18.2 | - | COMMIT | REQ | | Section 18.3 | - | CREATE | REQ | | Section 18.4 | -I | CREATE_SESSION | REQ | | Section 18.36 | -NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 | - | DELEGRETURN | OPT | FDELG, | Section 18.6 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS | DESTROY_CLIENTID | REQ | | Section 18.50 | -I | DESTROY_SESSION | REQ | | Section 18.37 | -I | EXCHANGE_ID | REQ | | Section 18.35 | -NS | FREE_STATEID | REQ | | Section 18.38 | - | GETATTR | REQ | | Section 18.7 | -P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | -P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | - | GETFH | REQ | | Section 18.8 | -NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | -P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | -P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | -P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | - | LINK | OPT | | Section 18.9 | - | LOCK | REQ | | Section 18.10 | - | LOCKT | REQ | | Section 18.11 | - | LOCKU | REQ | | Section 18.12 | - | LOOKUP | REQ | | Section 18.13 | - | LOOKUPP | REQ | | Section 18.14 | - | NVERIFY | REQ | | Section 18.15 | - | OPEN | REQ | | Section 18.16 | -NS*| OPENATTR | OPT | | Section 18.17 | - | OPEN_CONFIRM | MNI | | N/A | - | OPEN_DOWNGRADE | REQ | | Section 18.18 | - | PUTFH | REQ | | Section 18.19 | - | PUTPUBFH | REQ | | Section 18.20 | - | PUTROOTFH | REQ | | Section 18.21 | - | READ | REQ | | Section 18.22 | - | READDIR | REQ | | Section 18.23 | - | READLINK | OPT | | Section 18.24 | -NS | RECLAIM_COMPLETE | REQ | | Section 18.51 | - | RELEASE_LOCKOWNER | MNI | | N/A | - | REMOVE | REQ | | Section 18.25 | - | RENAME | REQ | | Section 18.26 | - | RENEW | MNI | | N/A | - | RESTOREFH | REQ | | Section 18.27 | - | SAVEFH | REQ | | Section 18.28 | - | SECINFO | REQ | | Section 18.29 | -NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | - | | | layout (REQ) | Section 13.12 | -I | SEQUENCE | REQ | | Section 18.46 | - | SETATTR | REQ | | Section 18.30 | - | SETCLIENTID | MNI | | N/A | - | SETCLIENTID_CONFIRM | MNI | | N/A | -NS | SET_SSV | REQ | | Section 18.47 | -NS | TEST_STATEID | REQ | | Section 18.48 | - | VERIFY | REQ | | Section 18.31 | -NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | - | WRITE | REQ | | Section 18.32 | - -Callback Operations - - +-------------------------+-----------+-------------+---------------+ - | Operation | REQ, REC, | Feature | Definition | - | | OPT, or | (REQ, REC, | | - | | MNI | or OPT) | | - +-------------------------+-----------+-------------+---------------+ - | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | -P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | -NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | -P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | -NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | -NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | - | CB_RECALL | OPT | FDELG, | Section 20.2 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS*| CB_RECALL_ANY | OPT | FDELG, | Section 20.6 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS | CB_RECALL_SLOT | REQ | | Section 20.8 | -NS*| CB_RECALLABLE_OBJ_AVAIL | OPT | DDELG, pNFS | Section 20.7 | - | | | (REQ) | | -I | CB_SEQUENCE | OPT | FDELG, | Section 20.9 | - | | | DDELG, pNFS | | - | | | (REQ) | | -NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 | - | | | DDELG, pNFS | | - | | | (REQ) | | - +-------------------------+-----------+-------------+---------------+ - -Implementation notes: - -DELEGPURGE: -* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or - CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that - persist across client reboots). Thus we need not implement this for - now. - -EXCHANGE_ID: -* only SP4_NONE state protection supported -* implementation ids are ignored - -CREATE_SESSION: -* backchannel attributes are ignored -* backchannel security parameters are ignored - -SEQUENCE: -* no support for dynamic slot table renegotiation (optional) - -nfsv4.1 COMPOUND rules: -The following cases aren't supported yet: -* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION, - DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. -* DESTROY_SESSION MUST be the final operation in the COMPOUND request. - -Nonstandard compound limitations: -* No support for a sessions fore channel RPC compound that requires both a - ca_maxrequestsize request and a ca_maxresponsesize reply, so we may - fail to live up to the promise we made in CREATE_SESSION fore channel - negotiation. -* No more than one IO operation (read, write, readdir) allowed per - compound. diff --git a/Documentation/filesystems/nfsroot.txt b/Documentation/filesystems/nfsroot.txt deleted file mode 100644 index 3ba0b945aaf8..000000000000 --- a/Documentation/filesystems/nfsroot.txt +++ /dev/null @@ -1,270 +0,0 @@ -Mounting the root filesystem via NFS (nfsroot) -=============================================== - -Written 1996 by Gero Kuhlmann -Updated 1997 by Martin Mares -Updated 2006 by Nico Schottelius -Updated 2006 by Horms - - - -In order to use a diskless system, such as an X-terminal or printer server -for example, it is necessary for the root filesystem to be present on a -non-disk device. This may be an initramfs (see Documentation/filesystems/ -ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a -filesystem mounted via NFS. The following text describes on how to use NFS -for the root filesystem. For the rest of this text 'client' means the -diskless system, and 'server' means the NFS server. - - - - -1.) Enabling nfsroot capabilities - ----------------------------- - -In order to use nfsroot, NFS client support needs to be selected as -built-in during configuration. Once this has been selected, the nfsroot -option will become available, which should also be selected. - -In the networking options, kernel level autoconfiguration can be selected, -along with the types of autoconfiguration to support. Selecting all of -DHCP, BOOTP and RARP is safe. - - - - -2.) Kernel command line - ------------------- - -When the kernel has been loaded by a boot loader (see below) it needs to be -told what root fs device to use. And in the case of nfsroot, where to find -both the server and the name of the directory on the server to mount as root. -This can be established using the following kernel command line parameters: - - -root=/dev/nfs - - This is necessary to enable the pseudo-NFS-device. Note that it's not a - real device but just a synonym to tell the kernel to use NFS instead of - a real device. - - -nfsroot=[:][,] - - If the `nfsroot' parameter is NOT given on the command line, - the default "/tftpboot/%s" will be used. - - Specifies the IP address of the NFS server. - The default address is determined by the `ip' parameter - (see below). This parameter allows the use of different - servers for IP autoconfiguration and NFS. - - Name of the directory on the server to mount as root. - If there is a "%s" token in the string, it will be - replaced by the ASCII-representation of the client's - IP address. - - Standard NFS options. All options are separated by commas. - The following defaults are used: - port = as given by server portmap daemon - rsize = 4096 - wsize = 4096 - timeo = 7 - retrans = 3 - acregmin = 3 - acregmax = 60 - acdirmin = 30 - acdirmax = 60 - flags = hard, nointr, noposix, cto, ac - - -ip=:::::: - - This parameter tells the kernel how to configure IP addresses of devices - and also how to set up the IP routing table. It was originally called - `nfsaddrs', but now the boot-time IP configuration works independently of - NFS, so it was renamed to `ip' and the old name remained as an alias for - compatibility reasons. - - If this parameter is missing from the kernel command line, all fields are - assumed to be empty, and the defaults mentioned below apply. In general - this means that the kernel tries to configure everything using - autoconfiguration. - - The parameter can appear alone as the value to the `ip' - parameter (without all the ':' characters before). If the value is - "ip=off" or "ip=none", no autoconfiguration will take place, otherwise - autoconfiguration will take place. The most common way to use this - is "ip=dhcp". - - IP address of the client. - - Default: Determined using autoconfiguration. - - IP address of the NFS server. If RARP is used to determine - the client address and this parameter is NOT empty only - replies from the specified server are accepted. - - Only required for NFS root. That is autoconfiguration - will not be triggered if it is missing and NFS root is not - in operation. - - Default: Determined using autoconfiguration. - The address of the autoconfiguration server is used. - - IP address of a gateway if the server is on a different subnet. - - Default: Determined using autoconfiguration. - - Netmask for local network interface. If unspecified - the netmask is derived from the client IP address assuming - classful addressing. - - Default: Determined using autoconfiguration. - - Name of the client. May be supplied by autoconfiguration, - but its absence will not trigger autoconfiguration. - - Default: Client IP address is used in ASCII notation. - - Name of network device to use. - - Default: If the host only has one device, it is used. - Otherwise the device is determined using - autoconfiguration. This is done by sending - autoconfiguration requests out of all devices, - and using the device that received the first reply. - - Method to use for autoconfiguration. In the case of options - which specify multiple autoconfiguration protocols, - requests are sent using all protocols, and the first one - to reply is used. - - Only autoconfiguration protocols that have been compiled - into the kernel will be used, regardless of the value of - this option. - - off or none: don't use autoconfiguration - (do static IP assignment instead) - on or any: use any protocol available in the kernel - (default) - dhcp: use DHCP - bootp: use BOOTP - rarp: use RARP - both: use both BOOTP and RARP but not DHCP - (old option kept for backwards compatibility) - - Default: any - - - - -3.) Boot Loader - ---------- - -To get the kernel into memory different approaches can be used. -They depend on various facilities being available: - - -3.1) Booting from a floppy using syslinux - - When building kernels, an easy way to create a boot floppy that uses - syslinux is to use the zdisk or bzdisk make targets which use zimage - and bzimage images respectively. Both targets accept the - FDARGS parameter which can be used to set the kernel command line. - - e.g. - make bzdisk FDARGS="root=/dev/nfs" - - Note that the user running this command will need to have - access to the floppy drive device, /dev/fd0 - - For more information on syslinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - N.B: Previously it was possible to write a kernel directly to - a floppy using dd, configure the boot device using rdev, and - boot using the resulting floppy. Linux no longer supports this - method of booting. - -3.2) Booting from a cdrom using isolinux - - When building kernels, an easy way to create a bootable cdrom that - uses isolinux is to use the isoimage target which uses a bzimage - image. Like zdisk and bzdisk, this target accepts the FDARGS - parameter which can be used to set the kernel command line. - - e.g. - make isoimage FDARGS="root=/dev/nfs" - - The resulting iso image will be arch//boot/image.iso - This can be written to a cdrom using a variety of tools including - cdrecord. - - e.g. - cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - -3.2) Using LILO - When using LILO all the necessary command line parameters may be - specified using the 'append=' directive in the LILO configuration - file. - - However, to use the 'root=' directive you also need to create - a dummy root device, which may be removed after LILO is run. - - mknod /dev/boot255 c 0 255 - - For information on configuring LILO, please refer to its documentation. - -3.3) Using GRUB - When using GRUB, kernel parameter are simply appended after the kernel - specification: kernel - -3.4) Using loadlin - loadlin may be used to boot Linux from a DOS command prompt without - requiring a local hard disk to mount as root. This has not been - thoroughly tested by the authors of this document, but in general - it should be possible configure the kernel command line similarly - to the configuration of LILO. - - Please refer to the loadlin documentation for further information. - -3.5) Using a boot ROM - This is probably the most elegant way of booting a diskless client. - With a boot ROM the kernel is loaded using the TFTP protocol. The - authors of this document are not aware of any no commercial boot - ROMs that support booting Linux over the network. However, there - are two free implementations of a boot ROM, netboot-nfs and - etherboot, both of which are available on sunsite.unc.edu, and both - of which contain everything you need to boot a diskless Linux client. - -3.6) Using pxelinux - Pxelinux may be used to boot linux using the PXE boot loader - which is present on many modern network cards. - - When using pxelinux, the kernel image is specified using - "kernel ". The nfsroot parameters - are passed to the kernel by adding them to the "append" line. - It is common to use serial console in conjunction with pxeliunx, - see Documentation/serial-console.txt for more information. - - For more information on isolinux, including how to create bootdisks - for prebuilt kernels, see http://syslinux.zytor.com/ - - - - -4.) Credits - ------- - - The nfsroot code in the kernel and the RARP support have been written - by Gero Kuhlmann . - - The rest of the IP layer autoconfiguration code has been written - by Martin Mares . - - In order to write the initial version of nfsroot I would like to thank - Jens-Uwe Mager for his help. diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 92b888d540a6..a7e9746ee7ea 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -140,7 +140,7 @@ Callers of notify_change() need ->i_mutex now. New super_block field "struct export_operations *s_export_op" for explicit support for exporting, e.g. via NFS. The structure is fully documented at its declaration in include/linux/fs.h, and in -Documentation/filesystems/Exporting. +Documentation/filesystems/nfs/Exporting. Briefly it allows for the definition of decode_fh and encode_fh operations to encode and decode filehandles, and allows the filesystem to use diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9107b387e91f..dab0f04b4264 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1017,7 +1017,7 @@ and is between 256 and 4096 characters. It is defined in the file No delay ip= [IP_PNP] - See Documentation/filesystems/nfsroot.txt. + See Documentation/filesystems/nfs/nfsroot.txt. ip2= [HW] Set IO/IRQ pairs for up to 4 IntelliPort boards See comment before ip2_setup() in @@ -1538,10 +1538,10 @@ and is between 256 and 4096 characters. It is defined in the file going to be removed in 2.6.29. nfsaddrs= [NFS] - See Documentation/filesystems/nfsroot.txt. + See Documentation/filesystems/nfs/nfsroot.txt. nfsroot= [NFS] nfs root filesystem for disk-less boxes. - See Documentation/filesystems/nfsroot.txt. + See Documentation/filesystems/nfs/nfsroot.txt. nfs.callback_tcpport= [NFS] set the TCP port on which the NFSv4 callback diff --git a/fs/cifs/export.c b/fs/cifs/export.c index 75949d6a5f1b..6177f7cca16a 100644 --- a/fs/cifs/export.c +++ b/fs/cifs/export.c @@ -24,7 +24,7 @@ */ /* - * See Documentation/filesystems/Exporting + * See Documentation/filesystems/nfs/Exporting * and examples in fs/exportfs * * Since cifs is a network file system, an "fsid" must be included for diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 197c7db583c7..e9e175949a63 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -6,7 +6,7 @@ * and for mapping back from file handles to dentries. * * For details on why we do all the strange and hairy things in here - * take a look at Documentation/filesystems/Exporting. + * take a look at Documentation/filesystems/nfs/Exporting. */ #include #include diff --git a/fs/isofs/export.c b/fs/isofs/export.c index e81a30593ba9..ed752cb38474 100644 --- a/fs/isofs/export.c +++ b/fs/isofs/export.c @@ -9,7 +9,7 @@ * * The following files are helpful: * - * Documentation/filesystems/Exporting + * Documentation/filesystems/nfs/Exporting * fs/exportfs/expfs.c. */ diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig index 2a77bc25d5af..59e5673b4597 100644 --- a/fs/nfs/Kconfig +++ b/fs/nfs/Kconfig @@ -90,7 +90,7 @@ config ROOT_NFS If you want your system to mount its root file system via NFS, choose Y here. This is common practice for managing systems without local permanent storage. For details, read - . + . Most people say N here. diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 27e772cefb6a..dc12f416a49f 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -97,7 +97,7 @@ struct fid { * @get_name: find the name for a given inode in a given directory * @get_parent: find the parent of a given directory * - * See Documentation/filesystems/Exporting for details on how to use + * See Documentation/filesystems/nfs/Exporting for details on how to use * this interface correctly. * * encode_fh: diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index 70491d9035eb..0c94a1ac2946 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -166,7 +166,7 @@ config IP_PNP_DHCP If unsure, say Y. Note that if you want to use DHCP, a DHCP server must be operating on your network. Read - for details. + for details. config IP_PNP_BOOTP bool "IP: BOOTP support" @@ -181,7 +181,7 @@ config IP_PNP_BOOTP does BOOTP itself, providing all necessary information on the kernel command line, you can say N here. If unsure, say Y. Note that if you want to use BOOTP, a BOOTP server must be operating on your network. - Read for details. + Read for details. config IP_PNP_RARP bool "IP: RARP support" @@ -194,7 +194,7 @@ config IP_PNP_RARP older protocol which is being obsoleted by BOOTP and DHCP), say Y here. Note that if you want to use RARP, a RARP server must be operating on your network. Read - for details. + for details. # not yet ready.. # bool ' IP: ARP support' CONFIG_IP_PNP_ARP diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index f8d04c256454..7dcbf4706099 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -1447,7 +1447,7 @@ late_initcall(ip_auto_config); /* * Decode any IP configuration options in the "ip=" or "nfsaddrs=" kernel - * command line parameter. See Documentation/filesystems/nfsroot.txt. + * command line parameter. See Documentation/filesystems/nfs/nfsroot.txt. */ static int __init ic_proto_name(char *name) { -- cgit v1.2.3 From 3d8e3ad879441ae14c5957b933028daf39d252b0 Mon Sep 17 00:00:00 2001 From: Frans Pop Date: Mon, 26 Oct 2009 08:39:02 +0100 Subject: thermal: add sanity check for the passive attribute Values below 1000 milli-celsius don't make sense and can cause the system to go into a thermal heart attack: the actual temperature will always be lower and thus the system will be throttled down to its lowest setting. An additional problem is that values below 1000 will show as 0 in /proc/acpi/thermal/TZx/trip_points:passive. cat passive 0 echo -n 90 >passive bash: echo: write error: Invalid argument echo -n 90000 >passive cat passive 90000 Signed-off-by: Frans Pop Acked-by: Zhang Rui Signed-off-by: Andrew Morton Signed-off-by: Len Brown --- Documentation/thermal/sysfs-api.txt | 1 + drivers/thermal/thermal_sys.c | 6 ++++++ 2 files changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt index a87dc277a5ca..cb3d15bc1aeb 100644 --- a/Documentation/thermal/sysfs-api.txt +++ b/Documentation/thermal/sysfs-api.txt @@ -206,6 +206,7 @@ passive passive trip point for the zone. Activation is done by polling with an interval of 1 second. Unit: millidegrees Celsius + Valid values: 0 (disabled) or greater than 1000 RW, Optional ***************************** diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c index 6f8d8f971212..310e40ab00b6 100644 --- a/drivers/thermal/thermal_sys.c +++ b/drivers/thermal/thermal_sys.c @@ -225,6 +225,12 @@ passive_store(struct device *dev, struct device_attribute *attr, if (!sscanf(buf, "%d\n", &state)) return -EINVAL; + /* sanity check: values below 1000 millicelcius don't make sense + * and can cause the system to go into a thermal heart attack + */ + if (state && state < 1000) + return -EINVAL; + if (state && !tz->forced_passive) { mutex_lock(&thermal_list_lock); list_for_each_entry(cdev, &thermal_cdev_list, node) { -- cgit v1.2.3 From ea4878a24d7e6a467d369b962bab95bd6a12cbe0 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 6 Nov 2009 13:59:43 -0500 Subject: nfs: move more to Documentation/filesystems/nfs Oops: I missed two files in the first commit that created this directory. Signed-off-by: J. Bruce Fields --- Documentation/filesystems/00-INDEX | 2 - Documentation/filesystems/knfsd-stats.txt | 159 -------------------- Documentation/filesystems/nfs/00-INDEX | 4 + Documentation/filesystems/nfs/knfsd-stats.txt | 159 ++++++++++++++++++++ Documentation/filesystems/nfs/rpc-cache.txt | 202 ++++++++++++++++++++++++++ Documentation/filesystems/rpc-cache.txt | 202 -------------------------- 6 files changed, 365 insertions(+), 363 deletions(-) delete mode 100644 Documentation/filesystems/knfsd-stats.txt create mode 100644 Documentation/filesystems/nfs/knfsd-stats.txt create mode 100644 Documentation/filesystems/nfs/rpc-cache.txt delete mode 100644 Documentation/filesystems/rpc-cache.txt (limited to 'Documentation') diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 482151c883a5..658154f52557 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -84,8 +84,6 @@ relay.txt - info on relay, for efficient streaming from kernel to user space. romfs.txt - description of the ROMFS filesystem. -rpc-cache.txt - - introduction to the caching mechanisms in the sunrpc layer. seq_file.txt - how to use the seq_file API sharedsubtree.txt diff --git a/Documentation/filesystems/knfsd-stats.txt b/Documentation/filesystems/knfsd-stats.txt deleted file mode 100644 index 64ced5149d37..000000000000 --- a/Documentation/filesystems/knfsd-stats.txt +++ /dev/null @@ -1,159 +0,0 @@ - -Kernel NFS Server Statistics -============================ - -This document describes the format and semantics of the statistics -which the kernel NFS server makes available to userspace. These -statistics are available in several text form pseudo files, each of -which is described separately below. - -In most cases you don't need to know these formats, as the nfsstat(8) -program from the nfs-utils distribution provides a helpful command-line -interface for extracting and printing them. - -All the files described here are formatted as a sequence of text lines, -separated by newline '\n' characters. Lines beginning with a hash -'#' character are comments intended for humans and should be ignored -by parsing routines. All other lines contain a sequence of fields -separated by whitespace. - -/proc/fs/nfsd/pool_stats ------------------------- - -This file is available in kernels from 2.6.30 onwards, if the -/proc/fs/nfsd filesystem is mounted (it almost always should be). - -The first line is a comment which describes the fields present in -all the other lines. The other lines present the following data as -a sequence of unsigned decimal numeric fields. One line is shown -for each NFS thread pool. - -All counters are 64 bits wide and wrap naturally. There is no way -to zero these counters, instead applications should do their own -rate conversion. - -pool - The id number of the NFS thread pool to which this line applies. - This number does not change. - - Thread pool ids are a contiguous set of small integers starting - at zero. The maximum value depends on the thread pool mode, but - currently cannot be larger than the number of CPUs in the system. - Note that in the default case there will be a single thread pool - which contains all the nfsd threads and all the CPUs in the system, - and thus this file will have a single line with a pool id of "0". - -packets-arrived - Counts how many NFS packets have arrived. More precisely, this - is the number of times that the network stack has notified the - sunrpc server layer that new data may be available on a transport - (e.g. an NFS or UDP socket or an NFS/RDMA endpoint). - - Depending on the NFS workload patterns and various network stack - effects (such as Large Receive Offload) which can combine packets - on the wire, this may be either more or less than the number - of NFS calls received (which statistic is available elsewhere). - However this is a more accurate and less workload-dependent measure - of how much CPU load is being placed on the sunrpc server layer - due to NFS network traffic. - -sockets-enqueued - Counts how many times an NFS transport is enqueued to wait for - an nfsd thread to service it, i.e. no nfsd thread was considered - available. - - The circumstance this statistic tracks indicates that there was NFS - network-facing work to be done but it couldn't be done immediately, - thus introducing a small delay in servicing NFS calls. The ideal - rate of change for this counter is zero; significantly non-zero - values may indicate a performance limitation. - - This can happen either because there are too few nfsd threads in the - thread pool for the NFS workload (the workload is thread-limited), - or because the NFS workload needs more CPU time than is available in - the thread pool (the workload is CPU-limited). In the former case, - configuring more nfsd threads will probably improve the performance - of the NFS workload. In the latter case, the sunrpc server layer is - already choosing not to wake idle nfsd threads because there are too - many nfsd threads which want to run but cannot, so configuring more - nfsd threads will make no difference whatsoever. The overloads-avoided - statistic (see below) can be used to distinguish these cases. - -threads-woken - Counts how many times an idle nfsd thread is woken to try to - receive some data from an NFS transport. - - This statistic tracks the circumstance where incoming - network-facing NFS work is being handled quickly, which is a good - thing. The ideal rate of change for this counter will be close - to but less than the rate of change of the packets-arrived counter. - -overloads-avoided - Counts how many times the sunrpc server layer chose not to wake an - nfsd thread, despite the presence of idle nfsd threads, because - too many nfsd threads had been recently woken but could not get - enough CPU time to actually run. - - This statistic counts a circumstance where the sunrpc layer - heuristically avoids overloading the CPU scheduler with too many - runnable nfsd threads. The ideal rate of change for this counter - is zero. Significant non-zero values indicate that the workload - is CPU limited. Usually this is associated with heavy CPU usage - on all the CPUs in the nfsd thread pool. - - If a sustained large overloads-avoided rate is detected on a pool, - the top(1) utility should be used to check for the following - pattern of CPU usage on all the CPUs associated with the given - nfsd thread pool. - - - %us ~= 0 (as you're *NOT* running applications on your NFS server) - - - %wa ~= 0 - - - %id ~= 0 - - - %sy + %hi + %si ~= 100 - - If this pattern is seen, configuring more nfsd threads will *not* - improve the performance of the workload. If this patten is not - seen, then something more subtle is wrong. - -threads-timedout - Counts how many times an nfsd thread triggered an idle timeout, - i.e. was not woken to handle any incoming network packets for - some time. - - This statistic counts a circumstance where there are more nfsd - threads configured than can be used by the NFS workload. This is - a clue that the number of nfsd threads can be reduced without - affecting performance. Unfortunately, it's only a clue and not - a strong indication, for a couple of reasons: - - - Currently the rate at which the counter is incremented is quite - slow; the idle timeout is 60 minutes. Unless the NFS workload - remains constant for hours at a time, this counter is unlikely - to be providing information that is still useful. - - - It is usually a wise policy to provide some slack, - i.e. configure a few more nfsds than are currently needed, - to allow for future spikes in load. - - -Note that incoming packets on NFS transports will be dealt with in -one of three ways. An nfsd thread can be woken (threads-woken counts -this case), or the transport can be enqueued for later attention -(sockets-enqueued counts this case), or the packet can be temporarily -deferred because the transport is currently being used by an nfsd -thread. This last case is not very interesting and is not explicitly -counted, but can be inferred from the other counters thus: - -packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken ) - - -More ----- -Descriptions of the other statistics file should go here. - - -Greg Banks -26 Mar 2009 diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index 6ff3d212027b..2f68cd688769 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX @@ -2,6 +2,8 @@ - this file (nfs-related documentation). Exporting - explanation of how to make filesystems exportable. +knfsd-stats.txt + - statistics which the NFS server makes available to user space. nfs.txt - nfs client, and DNS resolution for fs_locations. nfs41-server.txt @@ -10,3 +12,5 @@ nfs-rdma.txt - how to install and setup the Linux NFS/RDMA client and server software nfsroot.txt - short guide on setting up a diskless box with NFS root filesystem. +rpc-cache.txt + - introduction to the caching mechanisms in the sunrpc layer. diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.txt new file mode 100644 index 000000000000..64ced5149d37 --- /dev/null +++ b/Documentation/filesystems/nfs/knfsd-stats.txt @@ -0,0 +1,159 @@ + +Kernel NFS Server Statistics +============================ + +This document describes the format and semantics of the statistics +which the kernel NFS server makes available to userspace. These +statistics are available in several text form pseudo files, each of +which is described separately below. + +In most cases you don't need to know these formats, as the nfsstat(8) +program from the nfs-utils distribution provides a helpful command-line +interface for extracting and printing them. + +All the files described here are formatted as a sequence of text lines, +separated by newline '\n' characters. Lines beginning with a hash +'#' character are comments intended for humans and should be ignored +by parsing routines. All other lines contain a sequence of fields +separated by whitespace. + +/proc/fs/nfsd/pool_stats +------------------------ + +This file is available in kernels from 2.6.30 onwards, if the +/proc/fs/nfsd filesystem is mounted (it almost always should be). + +The first line is a comment which describes the fields present in +all the other lines. The other lines present the following data as +a sequence of unsigned decimal numeric fields. One line is shown +for each NFS thread pool. + +All counters are 64 bits wide and wrap naturally. There is no way +to zero these counters, instead applications should do their own +rate conversion. + +pool + The id number of the NFS thread pool to which this line applies. + This number does not change. + + Thread pool ids are a contiguous set of small integers starting + at zero. The maximum value depends on the thread pool mode, but + currently cannot be larger than the number of CPUs in the system. + Note that in the default case there will be a single thread pool + which contains all the nfsd threads and all the CPUs in the system, + and thus this file will have a single line with a pool id of "0". + +packets-arrived + Counts how many NFS packets have arrived. More precisely, this + is the number of times that the network stack has notified the + sunrpc server layer that new data may be available on a transport + (e.g. an NFS or UDP socket or an NFS/RDMA endpoint). + + Depending on the NFS workload patterns and various network stack + effects (such as Large Receive Offload) which can combine packets + on the wire, this may be either more or less than the number + of NFS calls received (which statistic is available elsewhere). + However this is a more accurate and less workload-dependent measure + of how much CPU load is being placed on the sunrpc server layer + due to NFS network traffic. + +sockets-enqueued + Counts how many times an NFS transport is enqueued to wait for + an nfsd thread to service it, i.e. no nfsd thread was considered + available. + + The circumstance this statistic tracks indicates that there was NFS + network-facing work to be done but it couldn't be done immediately, + thus introducing a small delay in servicing NFS calls. The ideal + rate of change for this counter is zero; significantly non-zero + values may indicate a performance limitation. + + This can happen either because there are too few nfsd threads in the + thread pool for the NFS workload (the workload is thread-limited), + or because the NFS workload needs more CPU time than is available in + the thread pool (the workload is CPU-limited). In the former case, + configuring more nfsd threads will probably improve the performance + of the NFS workload. In the latter case, the sunrpc server layer is + already choosing not to wake idle nfsd threads because there are too + many nfsd threads which want to run but cannot, so configuring more + nfsd threads will make no difference whatsoever. The overloads-avoided + statistic (see below) can be used to distinguish these cases. + +threads-woken + Counts how many times an idle nfsd thread is woken to try to + receive some data from an NFS transport. + + This statistic tracks the circumstance where incoming + network-facing NFS work is being handled quickly, which is a good + thing. The ideal rate of change for this counter will be close + to but less than the rate of change of the packets-arrived counter. + +overloads-avoided + Counts how many times the sunrpc server layer chose not to wake an + nfsd thread, despite the presence of idle nfsd threads, because + too many nfsd threads had been recently woken but could not get + enough CPU time to actually run. + + This statistic counts a circumstance where the sunrpc layer + heuristically avoids overloading the CPU scheduler with too many + runnable nfsd threads. The ideal rate of change for this counter + is zero. Significant non-zero values indicate that the workload + is CPU limited. Usually this is associated with heavy CPU usage + on all the CPUs in the nfsd thread pool. + + If a sustained large overloads-avoided rate is detected on a pool, + the top(1) utility should be used to check for the following + pattern of CPU usage on all the CPUs associated with the given + nfsd thread pool. + + - %us ~= 0 (as you're *NOT* running applications on your NFS server) + + - %wa ~= 0 + + - %id ~= 0 + + - %sy + %hi + %si ~= 100 + + If this pattern is seen, configuring more nfsd threads will *not* + improve the performance of the workload. If this patten is not + seen, then something more subtle is wrong. + +threads-timedout + Counts how many times an nfsd thread triggered an idle timeout, + i.e. was not woken to handle any incoming network packets for + some time. + + This statistic counts a circumstance where there are more nfsd + threads configured than can be used by the NFS workload. This is + a clue that the number of nfsd threads can be reduced without + affecting performance. Unfortunately, it's only a clue and not + a strong indication, for a couple of reasons: + + - Currently the rate at which the counter is incremented is quite + slow; the idle timeout is 60 minutes. Unless the NFS workload + remains constant for hours at a time, this counter is unlikely + to be providing information that is still useful. + + - It is usually a wise policy to provide some slack, + i.e. configure a few more nfsds than are currently needed, + to allow for future spikes in load. + + +Note that incoming packets on NFS transports will be dealt with in +one of three ways. An nfsd thread can be woken (threads-woken counts +this case), or the transport can be enqueued for later attention +(sockets-enqueued counts this case), or the packet can be temporarily +deferred because the transport is currently being used by an nfsd +thread. This last case is not very interesting and is not explicitly +counted, but can be inferred from the other counters thus: + +packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken ) + + +More +---- +Descriptions of the other statistics file should go here. + + +Greg Banks +26 Mar 2009 diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.txt new file mode 100644 index 000000000000..8a382bea6808 --- /dev/null +++ b/Documentation/filesystems/nfs/rpc-cache.txt @@ -0,0 +1,202 @@ + This document gives a brief introduction to the caching +mechanisms in the sunrpc layer that is used, in particular, +for NFS authentication. + +CACHES +====== +The caching replaces the old exports table and allows for +a wide variety of values to be caches. + +There are a number of caches that are similar in structure though +quite possibly very different in content and use. There is a corpus +of common code for managing these caches. + +Examples of caches that are likely to be needed are: + - mapping from IP address to client name + - mapping from client name and filesystem to export options + - mapping from UID to list of GIDs, to work around NFS's limitation + of 16 gids. + - mappings between local UID/GID and remote UID/GID for sites that + do not have uniform uid assignment + - mapping from network identify to public key for crypto authentication. + +The common code handles such things as: + - general cache lookup with correct locking + - supporting 'NEGATIVE' as well as positive entries + - allowing an EXPIRED time on cache items, and removing + items after they expire, and are no longer in-use. + - making requests to user-space to fill in cache entries + - allowing user-space to directly set entries in the cache + - delaying RPC requests that depend on as-yet incomplete + cache entries, and replaying those requests when the cache entry + is complete. + - clean out old entries as they expire. + +Creating a Cache +---------------- + +1/ A cache needs a datum to store. This is in the form of a + structure definition that must contain a + struct cache_head + as an element, usually the first. + It will also contain a key and some content. + Each cache element is reference counted and contains + expiry and update times for use in cache management. +2/ A cache needs a "cache_detail" structure that + describes the cache. This stores the hash table, some + parameters for cache management, and some operations detailing how + to work with particular cache items. + The operations requires are: + struct cache_head *alloc(void) + This simply allocates appropriate memory and returns + a pointer to the cache_detail embedded within the + structure + void cache_put(struct kref *) + This is called when the last reference to an item is + dropped. The pointer passed is to the 'ref' field + in the cache_head. cache_put should release any + references create by 'cache_init' and, if CACHE_VALID + is set, any references created by cache_update. + It should then release the memory allocated by + 'alloc'. + int match(struct cache_head *orig, struct cache_head *new) + test if the keys in the two structures match. Return + 1 if they do, 0 if they don't. + void init(struct cache_head *orig, struct cache_head *new) + Set the 'key' fields in 'new' from 'orig'. This may + include taking references to shared objects. + void update(struct cache_head *orig, struct cache_head *new) + Set the 'content' fileds in 'new' from 'orig'. + int cache_show(struct seq_file *m, struct cache_detail *cd, + struct cache_head *h) + Optional. Used to provide a /proc file that lists the + contents of a cache. This should show one item, + usually on just one line. + int cache_request(struct cache_detail *cd, struct cache_head *h, + char **bpp, int *blen) + Format a request to be send to user-space for an item + to be instantiated. *bpp is a buffer of size *blen. + bpp should be moved forward over the encoded message, + and *blen should be reduced to show how much free + space remains. Return 0 on success or <0 if not + enough room or other problem. + int cache_parse(struct cache_detail *cd, char *buf, int len) + A message from user space has arrived to fill out a + cache entry. It is in 'buf' of length 'len'. + cache_parse should parse this, find the item in the + cache with sunrpc_cache_lookup, and update the item + with sunrpc_cache_update. + + +3/ A cache needs to be registered using cache_register(). This + includes it on a list of caches that will be regularly + cleaned to discard old data. + +Using a cache +------------- + +To find a value in a cache, call sunrpc_cache_lookup passing a pointer +to the cache_head in a sample item with the 'key' fields filled in. +This will be passed to ->match to identify the target entry. If no +entry is found, a new entry will be create, added to the cache, and +marked as not containing valid data. + +The item returned is typically passed to cache_check which will check +if the data is valid, and may initiate an up-call to get fresh data. +cache_check will return -ENOENT in the entry is negative or if an up +call is needed but not possible, -EAGAIN if an upcall is pending, +or 0 if the data is valid; + +cache_check can be passed a "struct cache_req *". This structure is +typically embedded in the actual request and can be used to create a +deferred copy of the request (struct cache_deferred_req). This is +done when the found cache item is not uptodate, but the is reason to +believe that userspace might provide information soon. When the cache +item does become valid, the deferred copy of the request will be +revisited (->revisit). It is expected that this method will +reschedule the request for processing. + +The value returned by sunrpc_cache_lookup can also be passed to +sunrpc_cache_update to set the content for the item. A second item is +passed which should hold the content. If the item found by _lookup +has valid data, then it is discarded and a new item is created. This +saves any user of an item from worrying about content changing while +it is being inspected. If the item found by _lookup does not contain +valid data, then the content is copied across and CACHE_VALID is set. + +Populating a cache +------------------ + +Each cache has a name, and when the cache is registered, a directory +with that name is created in /proc/net/rpc + +This directory contains a file called 'channel' which is a channel +for communicating between kernel and user for populating the cache. +This directory may later contain other files of interacting +with the cache. + +The 'channel' works a bit like a datagram socket. Each 'write' is +passed as a whole to the cache for parsing and interpretation. +Each cache can treat the write requests differently, but it is +expected that a message written will contain: + - a key + - an expiry time + - a content. +with the intention that an item in the cache with the give key +should be create or updated to have the given content, and the +expiry time should be set on that item. + +Reading from a channel is a bit more interesting. When a cache +lookup fails, or when it succeeds but finds an entry that may soon +expire, a request is lodged for that cache item to be updated by +user-space. These requests appear in the channel file. + +Successive reads will return successive requests. +If there are no more requests to return, read will return EOF, but a +select or poll for read will block waiting for another request to be +added. + +Thus a user-space helper is likely to: + open the channel. + select for readable + read a request + write a response + loop. + +If it dies and needs to be restarted, any requests that have not been +answered will still appear in the file and will be read by the new +instance of the helper. + +Each cache should define a "cache_parse" method which takes a message +written from user-space and processes it. It should return an error +(which propagates back to the write syscall) or 0. + +Each cache should also define a "cache_request" method which +takes a cache item and encodes a request into the buffer +provided. + +Note: If a cache has no active readers on the channel, and has had not +active readers for more than 60 seconds, further requests will not be +added to the channel but instead all lookups that do not find a valid +entry will fail. This is partly for backward compatibility: The +previous nfs exports table was deemed to be authoritative and a +failed lookup meant a definite 'no'. + +request/response format +----------------------- + +While each cache is free to use it's own format for requests +and responses over channel, the following is recommended as +appropriate and support routines are available to help: +Each request or response record should be printable ASCII +with precisely one newline character which should be at the end. +Fields within the record should be separated by spaces, normally one. +If spaces, newlines, or nul characters are needed in a field they +much be quoted. two mechanisms are available: +1/ If a field begins '\x' then it must contain an even number of + hex digits, and pairs of these digits provide the bytes in the + field. +2/ otherwise a \ in the field must be followed by 3 octal digits + which give the code for a byte. Other characters are treated + as them selves. At the very least, space, newline, nul, and + '\' must be quoted in this way. diff --git a/Documentation/filesystems/rpc-cache.txt b/Documentation/filesystems/rpc-cache.txt deleted file mode 100644 index 8a382bea6808..000000000000 --- a/Documentation/filesystems/rpc-cache.txt +++ /dev/null @@ -1,202 +0,0 @@ - This document gives a brief introduction to the caching -mechanisms in the sunrpc layer that is used, in particular, -for NFS authentication. - -CACHES -====== -The caching replaces the old exports table and allows for -a wide variety of values to be caches. - -There are a number of caches that are similar in structure though -quite possibly very different in content and use. There is a corpus -of common code for managing these caches. - -Examples of caches that are likely to be needed are: - - mapping from IP address to client name - - mapping from client name and filesystem to export options - - mapping from UID to list of GIDs, to work around NFS's limitation - of 16 gids. - - mappings between local UID/GID and remote UID/GID for sites that - do not have uniform uid assignment - - mapping from network identify to public key for crypto authentication. - -The common code handles such things as: - - general cache lookup with correct locking - - supporting 'NEGATIVE' as well as positive entries - - allowing an EXPIRED time on cache items, and removing - items after they expire, and are no longer in-use. - - making requests to user-space to fill in cache entries - - allowing user-space to directly set entries in the cache - - delaying RPC requests that depend on as-yet incomplete - cache entries, and replaying those requests when the cache entry - is complete. - - clean out old entries as they expire. - -Creating a Cache ----------------- - -1/ A cache needs a datum to store. This is in the form of a - structure definition that must contain a - struct cache_head - as an element, usually the first. - It will also contain a key and some content. - Each cache element is reference counted and contains - expiry and update times for use in cache management. -2/ A cache needs a "cache_detail" structure that - describes the cache. This stores the hash table, some - parameters for cache management, and some operations detailing how - to work with particular cache items. - The operations requires are: - struct cache_head *alloc(void) - This simply allocates appropriate memory and returns - a pointer to the cache_detail embedded within the - structure - void cache_put(struct kref *) - This is called when the last reference to an item is - dropped. The pointer passed is to the 'ref' field - in the cache_head. cache_put should release any - references create by 'cache_init' and, if CACHE_VALID - is set, any references created by cache_update. - It should then release the memory allocated by - 'alloc'. - int match(struct cache_head *orig, struct cache_head *new) - test if the keys in the two structures match. Return - 1 if they do, 0 if they don't. - void init(struct cache_head *orig, struct cache_head *new) - Set the 'key' fields in 'new' from 'orig'. This may - include taking references to shared objects. - void update(struct cache_head *orig, struct cache_head *new) - Set the 'content' fileds in 'new' from 'orig'. - int cache_show(struct seq_file *m, struct cache_detail *cd, - struct cache_head *h) - Optional. Used to provide a /proc file that lists the - contents of a cache. This should show one item, - usually on just one line. - int cache_request(struct cache_detail *cd, struct cache_head *h, - char **bpp, int *blen) - Format a request to be send to user-space for an item - to be instantiated. *bpp is a buffer of size *blen. - bpp should be moved forward over the encoded message, - and *blen should be reduced to show how much free - space remains. Return 0 on success or <0 if not - enough room or other problem. - int cache_parse(struct cache_detail *cd, char *buf, int len) - A message from user space has arrived to fill out a - cache entry. It is in 'buf' of length 'len'. - cache_parse should parse this, find the item in the - cache with sunrpc_cache_lookup, and update the item - with sunrpc_cache_update. - - -3/ A cache needs to be registered using cache_register(). This - includes it on a list of caches that will be regularly - cleaned to discard old data. - -Using a cache -------------- - -To find a value in a cache, call sunrpc_cache_lookup passing a pointer -to the cache_head in a sample item with the 'key' fields filled in. -This will be passed to ->match to identify the target entry. If no -entry is found, a new entry will be create, added to the cache, and -marked as not containing valid data. - -The item returned is typically passed to cache_check which will check -if the data is valid, and may initiate an up-call to get fresh data. -cache_check will return -ENOENT in the entry is negative or if an up -call is needed but not possible, -EAGAIN if an upcall is pending, -or 0 if the data is valid; - -cache_check can be passed a "struct cache_req *". This structure is -typically embedded in the actual request and can be used to create a -deferred copy of the request (struct cache_deferred_req). This is -done when the found cache item is not uptodate, but the is reason to -believe that userspace might provide information soon. When the cache -item does become valid, the deferred copy of the request will be -revisited (->revisit). It is expected that this method will -reschedule the request for processing. - -The value returned by sunrpc_cache_lookup can also be passed to -sunrpc_cache_update to set the content for the item. A second item is -passed which should hold the content. If the item found by _lookup -has valid data, then it is discarded and a new item is created. This -saves any user of an item from worrying about content changing while -it is being inspected. If the item found by _lookup does not contain -valid data, then the content is copied across and CACHE_VALID is set. - -Populating a cache ------------------- - -Each cache has a name, and when the cache is registered, a directory -with that name is created in /proc/net/rpc - -This directory contains a file called 'channel' which is a channel -for communicating between kernel and user for populating the cache. -This directory may later contain other files of interacting -with the cache. - -The 'channel' works a bit like a datagram socket. Each 'write' is -passed as a whole to the cache for parsing and interpretation. -Each cache can treat the write requests differently, but it is -expected that a message written will contain: - - a key - - an expiry time - - a content. -with the intention that an item in the cache with the give key -should be create or updated to have the given content, and the -expiry time should be set on that item. - -Reading from a channel is a bit more interesting. When a cache -lookup fails, or when it succeeds but finds an entry that may soon -expire, a request is lodged for that cache item to be updated by -user-space. These requests appear in the channel file. - -Successive reads will return successive requests. -If there are no more requests to return, read will return EOF, but a -select or poll for read will block waiting for another request to be -added. - -Thus a user-space helper is likely to: - open the channel. - select for readable - read a request - write a response - loop. - -If it dies and needs to be restarted, any requests that have not been -answered will still appear in the file and will be read by the new -instance of the helper. - -Each cache should define a "cache_parse" method which takes a message -written from user-space and processes it. It should return an error -(which propagates back to the write syscall) or 0. - -Each cache should also define a "cache_request" method which -takes a cache item and encodes a request into the buffer -provided. - -Note: If a cache has no active readers on the channel, and has had not -active readers for more than 60 seconds, further requests will not be -added to the channel but instead all lookups that do not find a valid -entry will fail. This is partly for backward compatibility: The -previous nfs exports table was deemed to be authoritative and a -failed lookup meant a definite 'no'. - -request/response format ------------------------ - -While each cache is free to use it's own format for requests -and responses over channel, the following is recommended as -appropriate and support routines are available to help: -Each request or response record should be printable ASCII -with precisely one newline character which should be at the end. -Fields within the record should be separated by spaces, normally one. -If spaces, newlines, or nul characters are needed in a field they -much be quoted. two mechanisms are available: -1/ If a field begins '\x' then it must contain an even number of - hex digits, and pairs of these digits provide the bytes in the - field. -2/ otherwise a \ in the field must be followed by 3 octal digits - which give the code for a byte. Other characters are treated - as them selves. At the very least, space, newline, nul, and - '\' must be quoted in this way. -- cgit v1.2.3 From 8c68e2f7885b22f0a63bf087752a46b690d6b6ea Mon Sep 17 00:00:00 2001 From: Anton Vorontsov Date: Wed, 16 Sep 2009 01:44:00 +0400 Subject: powerpc/86xx: Add power management support for MPC8610HPCD boards This patch adds needed nodes and properties to support suspend/resume on the MPC8610HPCD boards. There is a dedicated switch (SW9) that is used to wake up the boards. By default the SW9 button is routed to IRQ8, but could be re-routed (via PIXIS) to sreset. With 'no_console_suspend' kernel command line argument specified, the board is also able to wakeup upon serial port input. Signed-off-by: Anton Vorontsov Acked-by: Scott Wood [dts] Signed-off-by: Kumar Gala --- Documentation/powerpc/dts-bindings/fsl/board.txt | 4 ++ arch/powerpc/boot/dts/mpc8610_hpcd.dts | 26 +++++++++++++ arch/powerpc/platforms/86xx/mpc8610_hpcd.c | 48 ++++++++++++++++++++++-- 3 files changed, 74 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/fsl/board.txt b/Documentation/powerpc/dts-bindings/fsl/board.txt index e8b5bc24d0ac..39e941515a36 100644 --- a/Documentation/powerpc/dts-bindings/fsl/board.txt +++ b/Documentation/powerpc/dts-bindings/fsl/board.txt @@ -20,12 +20,16 @@ Required properities: - compatible : should be "fsl,fpga-pixis". - reg : should contain the address and the length of the FPPGA register set. +- interrupt-parent: should specify phandle for the interrupt controller. +- interrupts : should specify event (wakeup) IRQ. Example (MPC8610HPCD): board-control@e8000000 { compatible = "fsl,fpga-pixis"; reg = <0xe8000000 32>; + interrupt-parent = <&mpic>; + interrupts = <8 8>; }; * Freescale BCSR GPIO banks diff --git a/arch/powerpc/boot/dts/mpc8610_hpcd.dts b/arch/powerpc/boot/dts/mpc8610_hpcd.dts index f468d215f716..9535ce68caae 100644 --- a/arch/powerpc/boot/dts/mpc8610_hpcd.dts +++ b/arch/powerpc/boot/dts/mpc8610_hpcd.dts @@ -35,6 +35,8 @@ i-cache-line-size = <32>; d-cache-size = <32768>; // L1 i-cache-size = <32768>; // L1 + sleep = <&pmc 0x00008000 0 // core + &pmc 0x00004000 0>; // timebase timebase-frequency = <0>; // From uboot bus-frequency = <0>; // From uboot clock-frequency = <0>; // From uboot @@ -60,6 +62,7 @@ 5 0 0xe8480000 0x00008000 6 0 0xe84c0000 0x00008000 3 0 0xe8000000 0x00000020>; + sleep = <&pmc 0x08000000 0>; flash@0,0 { compatible = "cfi-flash"; @@ -105,6 +108,8 @@ compatible = "fsl,fpga-pixis"; reg = <3 0 0x20>; ranges = <0 3 0 0x20>; + interrupt-parent = <&mpic>; + interrupts = <8 8>; sdcsr_pio: gpio-controller@a { #gpio-cells = <2>; @@ -163,6 +168,7 @@ reg = <0x3100 0x100>; interrupts = <43 2>; interrupt-parent = <&mpic>; + sleep = <&pmc 0x00000004 0>; dfsrr; }; @@ -174,6 +180,7 @@ clock-frequency = <0>; interrupts = <42 2>; interrupt-parent = <&mpic>; + sleep = <&pmc 0x00000002 0>; }; serial1: serial@4600 { @@ -184,6 +191,7 @@ clock-frequency = <0>; interrupts = <42 2>; interrupt-parent = <&mpic>; + sleep = <&pmc 0x00000008 0>; }; spi@7000 { @@ -196,6 +204,7 @@ interrupt-parent = <&mpic>; mode = "cpu"; gpios = <&sdcsr_pio 7 0>; + sleep = <&pmc 0x00000800 0>; mmc-slot@0 { compatible = "fsl,mpc8610hpcd-mmc-slot", @@ -213,6 +222,7 @@ reg = <0x2c000 100>; interrupts = <72 2>; interrupt-parent = <&mpic>; + sleep = <&pmc 0x04000000 0>; }; mpic: interrupt-controller@40000 { @@ -241,9 +251,18 @@ }; global-utilities@e0000 { + #address-cells = <1>; + #size-cells = <1>; compatible = "fsl,mpc8610-guts"; reg = <0xe0000 0x1000>; + ranges = <0 0xe0000 0x1000>; fsl,has-rstcr; + + pmc: power@70 { + compatible = "fsl,mpc8610-pmc", + "fsl,mpc8641d-pmc"; + reg = <0x70 0x20>; + }; }; wdt@e4000 { @@ -262,6 +281,7 @@ fsl,playback-dma = <&dma00>; fsl,capture-dma = <&dma01>; fsl,fifo-depth = <8>; + sleep = <&pmc 0 0x08000000>; }; ssi@16100 { @@ -271,6 +291,7 @@ interrupt-parent = <&mpic>; interrupts = <63 2>; fsl,fifo-depth = <8>; + sleep = <&pmc 0 0x04000000>; }; dma@21300 { @@ -280,6 +301,7 @@ cell-index = <0>; reg = <0x21300 0x4>; /* DMA general status register */ ranges = <0x0 0x21100 0x200>; + sleep = <&pmc 0x00000400 0>; dma00: dma-channel@0 { compatible = "fsl,mpc8610-dma-channel", @@ -322,6 +344,7 @@ cell-index = <1>; reg = <0xc300 0x4>; /* DMA general status register */ ranges = <0x0 0xc100 0x200>; + sleep = <&pmc 0x00000200 0>; dma-channel@0 { compatible = "fsl,mpc8610-dma-channel", @@ -369,6 +392,7 @@ bus-range = <0 0>; ranges = <0x02000000 0x0 0x80000000 0x80000000 0x0 0x10000000 0x01000000 0x0 0x00000000 0xe1000000 0x0 0x00100000>; + sleep = <&pmc 0x80000000 0>; clock-frequency = <33333333>; interrupt-parent = <&mpic>; interrupts = <24 2>; @@ -398,6 +422,7 @@ bus-range = <1 3>; ranges = <0x02000000 0x0 0xa0000000 0xa0000000 0x0 0x10000000 0x01000000 0x0 0x00000000 0xe3000000 0x0 0x00100000>; + sleep = <&pmc 0x40000000 0>; clock-frequency = <33333333>; interrupt-parent = <&mpic>; interrupts = <26 2>; @@ -474,6 +499,7 @@ 0x0000 0 0 4 &mpic 7 1>; interrupt-parent = <&mpic>; interrupts = <25 2>; + sleep = <&pmc 0x20000000 0>; clock-frequency = <33333333>; }; }; diff --git a/arch/powerpc/platforms/86xx/mpc8610_hpcd.c b/arch/powerpc/platforms/86xx/mpc8610_hpcd.c index 627908a4cd77..5abe137f6309 100644 --- a/arch/powerpc/platforms/86xx/mpc8610_hpcd.c +++ b/arch/powerpc/platforms/86xx/mpc8610_hpcd.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -41,10 +42,46 @@ #include "mpc86xx.h" +static struct device_node *pixis_node; static unsigned char *pixis_bdcfg0, *pixis_arch; +#ifdef CONFIG_SUSPEND +static irqreturn_t mpc8610_sw9_irq(int irq, void *data) +{ + pr_debug("%s: PIXIS' event (sw9/wakeup) IRQ handled\n", __func__); + return IRQ_HANDLED; +} + +static void __init mpc8610_suspend_init(void) +{ + int irq; + int ret; + + if (!pixis_node) + return; + + irq = irq_of_parse_and_map(pixis_node, 0); + if (!irq) { + pr_err("%s: can't map pixis event IRQ.\n", __func__); + return; + } + + ret = request_irq(irq, mpc8610_sw9_irq, 0, "sw9/wakeup", NULL); + if (ret) { + pr_err("%s: can't request pixis event IRQ: %d\n", + __func__, ret); + irq_dispose_mapping(irq); + } + + enable_irq_wake(irq); +} +#else +static inline void mpc8610_suspend_init(void) { } +#endif /* CONFIG_SUSPEND */ + static struct of_device_id __initdata mpc8610_ids[] = { { .compatible = "fsl,mpc8610-immr", }, + { .compatible = "fsl,mpc8610-guts", }, { .compatible = "simple-bus", }, { .compatible = "gianfar", }, {} @@ -55,6 +92,9 @@ static int __init mpc8610_declare_of_platform_devices(void) /* Firstly, register PIXIS GPIOs. */ simple_gpiochip_init("fsl,fpga-pixis-gpio-bank"); + /* Enable wakeup on PIXIS' event IRQ. */ + mpc8610_suspend_init(); + /* Without this call, the SSI device driver won't get probed. */ of_platform_bus_probe(NULL, mpc8610_ids, NULL); @@ -250,10 +290,10 @@ static void __init mpc86xx_hpcd_setup_arch(void) diu_ops.set_sysfs_monitor_port = mpc8610hpcd_set_sysfs_monitor_port; #endif - np = of_find_compatible_node(NULL, NULL, "fsl,fpga-pixis"); - if (np) { - of_address_to_resource(np, 0, &r); - of_node_put(np); + pixis_node = of_find_compatible_node(NULL, NULL, "fsl,fpga-pixis"); + if (pixis_node) { + of_address_to_resource(pixis_node, 0, &r); + of_node_put(pixis_node); pixis = ioremap(r.start, 32); if (!pixis) { printk(KERN_ERR "Err: can't map FPGA cfg register!\n"); -- cgit v1.2.3 From 13b600b59df287a175b1476d2d588ab935092b58 Mon Sep 17 00:00:00 2001 From: Albrecht Dreß Date: Fri, 13 Nov 2009 11:09:30 -0700 Subject: mpc52xx/wdt: OF property to enable the WDT on boot MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the "fsl,wdt-on-boot" OF property as to reserve a GPT as WDT which may be a requirement in safety-related (e.g. ISO/EN 61508) applications. Signed-off-by: Albrecht Dreß Signed-off-by: Grant Likely --- Documentation/powerpc/dts-bindings/fsl/mpc5200.txt | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt index 8447fd7090d0..ddd5ee32ea63 100644 --- a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt +++ b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt @@ -103,7 +103,22 @@ fsl,mpc5200-gpt nodes --------------------- On the mpc5200 and 5200b, GPT0 has a watchdog timer function. If the board design supports the internal wdt, then the device node for GPT0 should -include the empty property 'fsl,has-wdt'. +include the empty property 'fsl,has-wdt'. Note that this does not activate +the watchdog. The timer will function as a GPT if the timer api is used, and +it will function as watchdog if the watchdog device is used. The watchdog +mode has priority over the gpt mode, i.e. if the watchdog is activated, any +gpt api call to this timer will fail with -EBUSY. + +If you add the property + fsl,wdt-on-boot = ; +GPT0 will be marked as in-use watchdog, i.e. blocking every gpt access to it. +If n>0, the watchdog is started with a timeout of n seconds. If n=0, the +configuration of the watchdog is not touched. This is useful in two cases: +- just mark GPT0 as watchdog, blocking gpt accesses, and configure it later; +- do not touch a configuration assigned by the boot loader which supervises + the boot process itself. + +The watchdog will respect the CONFIG_WATCHDOG_NOWAYOUT option. An mpc5200-gpt can be used as a single line GPIO controller. To do so, add the following properties to the gpt node: -- cgit v1.2.3 From 5328e635315734d42080de9a5a1ee87bf4cae0a4 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Thu, 19 Nov 2009 14:25:42 -0500 Subject: ext4: make trim/discard optional (and off by default) It is anticipated that when sb_issue_discard starts doing real work on trim-capable devices, we may see issues. Make this mount-time optional, and default it to off until we know that things are working out OK. Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 6 ++++++ fs/ext4/ext4.h | 1 + fs/ext4/mballoc.c | 21 +++++++++++++-------- fs/ext4/super.c | 14 +++++++++++++- 4 files changed, 33 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 6d94e0696f8c..26904ff6d61b 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -353,6 +353,12 @@ noauto_da_alloc replacing existing files via patterns such as system crashes before the delayed allocation blocks are forced to disk. +discard Controls whether ext4 should issue discard/TRIM +nodiscard(*) commands to the underlying block device when + blocks are freed. This is useful for SSD devices + and sparse/thinly-provisioned LUNs, but it is off + by default until sufficient testing has been done. + Data Mode ========= There are 3 different data modes: diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 8825515eeddd..05ce38b981cb 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -750,6 +750,7 @@ struct ext4_inode_info { #define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */ #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */ #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */ +#define EXT4_MOUNT_DISCARD 0x40000000 /* Issue DISCARD requests */ #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt #define set_opt(o, opt) o |= EXT4_MOUNT_##opt diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index bba12824defa..6e5a23a2cc25 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2529,7 +2529,6 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) struct ext4_group_info *db; int err, count = 0, count2 = 0; struct ext4_free_data *entry; - ext4_fsblk_t discard_block; struct list_head *l, *ltmp; list_for_each_safe(l, ltmp, &txn->t_private_list) { @@ -2559,13 +2558,19 @@ static void release_blocks_on_commit(journal_t *journal, transaction_t *txn) page_cache_release(e4b.bd_bitmap_page); } ext4_unlock_group(sb, entry->group); - discard_block = (ext4_fsblk_t) entry->group * EXT4_BLOCKS_PER_GROUP(sb) - + entry->start_blk - + le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block); - trace_ext4_discard_blocks(sb, (unsigned long long)discard_block, - entry->count); - sb_issue_discard(sb, discard_block, entry->count); - + if (test_opt(sb, DISCARD)) { + ext4_fsblk_t discard_block; + struct ext4_super_block *es = EXT4_SB(sb)->s_es; + + discard_block = (ext4_fsblk_t)entry->group * + EXT4_BLOCKS_PER_GROUP(sb) + + entry->start_blk + + le32_to_cpu(es->s_first_data_block); + trace_ext4_discard_blocks(sb, + (unsigned long long)discard_block, + entry->count); + sb_issue_discard(sb, discard_block, entry->count); + } kmem_cache_free(ext4_free_ext_cachep, entry); ext4_mb_release_desc(&e4b); } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index f2d5ec77c1e9..2b382f6f80c1 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -899,6 +899,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) if (test_opt(sb, NO_AUTO_DA_ALLOC)) seq_puts(seq, ",noauto_da_alloc"); + if (test_opt(sb, DISCARD)) + seq_puts(seq, ",discard"); + ext4_show_quota_options(seq, sb); return 0; @@ -1079,7 +1082,8 @@ enum { Opt_usrquota, Opt_grpquota, Opt_i_version, Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_block_validity, Opt_noblock_validity, - Opt_inode_readahead_blks, Opt_journal_ioprio + Opt_inode_readahead_blks, Opt_journal_ioprio, + Opt_discard, Opt_nodiscard, }; static const match_table_t tokens = { @@ -1144,6 +1148,8 @@ static const match_table_t tokens = { {Opt_auto_da_alloc, "auto_da_alloc=%u"}, {Opt_auto_da_alloc, "auto_da_alloc"}, {Opt_noauto_da_alloc, "noauto_da_alloc"}, + {Opt_discard, "discard"}, + {Opt_nodiscard, "nodiscard"}, {Opt_err, NULL}, }; @@ -1565,6 +1571,12 @@ set_qf_format: else set_opt(sbi->s_mount_opt,NO_AUTO_DA_ALLOC); break; + case Opt_discard: + set_opt(sbi->s_mount_opt, DISCARD); + break; + case Opt_nodiscard: + clear_opt(sbi->s_mount_opt, DISCARD); + break; default: ext4_msg(sb, KERN_ERR, "Unrecognized mount option \"%s\" " -- cgit v1.2.3 From e3bb52ae2bb9573e84c17b8e3560378d13a5c798 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Thu, 19 Nov 2009 14:28:50 -0500 Subject: ext4: make "norecovery" an alias for "noload" Users on the linux-ext4 list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext4. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 4 ++-- fs/ext4/super.c | 4 ++++ 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 26904ff6d61b..af6885c3c821 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -153,8 +153,8 @@ journal_dev=devnum When the external journal device's major/minor numbers identified through its new major/minor numbers encoded in devnum. -noload Don't load the journal on mounting. Note that - if the filesystem was not unmounted cleanly, +norecovery Don't load the journal on mounting. Note that +noload if the filesystem was not unmounted cleanly, skipping the journal replay will lead to the filesystem containing inconsistencies that can lead to any number of problems. diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 2b382f6f80c1..5a2db612950b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -902,6 +902,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) if (test_opt(sb, DISCARD)) seq_puts(seq, ",discard"); + if (test_opt(sb, NOLOAD)) + seq_puts(seq, ",norecovery"); + ext4_show_quota_options(seq, sb); return 0; @@ -1108,6 +1111,7 @@ static const match_table_t tokens = { {Opt_acl, "acl"}, {Opt_noacl, "noacl"}, {Opt_noload, "noload"}, + {Opt_noload, "norecovery"}, {Opt_nobh, "nobh"}, {Opt_bh, "bh"}, {Opt_commit, "commit=%u"}, -- cgit v1.2.3 From 91f1953bf3243a4215b57d8e2f317a7035924de7 Mon Sep 17 00:00:00 2001 From: Jiro SEKIBA Date: Thu, 12 Nov 2009 14:07:26 +0900 Subject: nilfs2: Using nobarrier option instead of barrier=off Since most of fs using nofoobar style option, modified barrier=off option as nobarrier. Signed-off-by: Jiro SEKIBA Signed-off-by: Ryusuke Konishi --- Documentation/filesystems/nilfs2.txt | 3 +-- fs/nilfs2/super.c | 15 +++++---------- 2 files changed, 6 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index 01539f410676..cbd877978c80 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -49,8 +49,7 @@ Mount options NILFS2 supports the following mount options: (*) == default -barrier=on(*) This enables/disables barriers. barrier=off disables - it, barrier=on enables it. +nobarrier Disables barriers. errors=continue(*) Keep going on a filesystem error. errors=remount-ro Remount the filesystem read-only on an error. errors=panic Panic and halt the machine if an error occurs. diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 644e66727dd0..02dcbb008673 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -490,7 +490,7 @@ static int nilfs_show_options(struct seq_file *seq, struct vfsmount *vfs) struct nilfs_sb_info *sbi = NILFS_SB(sb); if (!nilfs_test_opt(sbi, BARRIER)) - seq_printf(seq, ",barrier=off"); + seq_printf(seq, ",nobarrier"); if (nilfs_test_opt(sbi, SNAPSHOT)) seq_printf(seq, ",cp=%llu", (unsigned long long int)sbi->s_snapshot_cno); @@ -568,7 +568,7 @@ static const struct export_operations nilfs_export_ops = { enum { Opt_err_cont, Opt_err_panic, Opt_err_ro, - Opt_barrier, Opt_snapshot, Opt_order, + Opt_nobarrier, Opt_snapshot, Opt_order, Opt_err, }; @@ -576,7 +576,7 @@ static match_table_t tokens = { {Opt_err_cont, "errors=continue"}, {Opt_err_panic, "errors=panic"}, {Opt_err_ro, "errors=remount-ro"}, - {Opt_barrier, "barrier=%s"}, + {Opt_nobarrier, "nobarrier"}, {Opt_snapshot, "cp=%u"}, {Opt_order, "order=%s"}, {Opt_err, NULL} @@ -612,13 +612,8 @@ static int parse_options(char *options, struct super_block *sb) token = match_token(p, tokens, args); switch (token) { - case Opt_barrier: - if (match_bool(&args[0], &option)) - return 0; - if (option) - nilfs_set_opt(sbi, BARRIER); - else - nilfs_clear_opt(sbi, BARRIER); + case Opt_nobarrier: + nilfs_clear_opt(sbi, BARRIER); break; case Opt_order: if (strcmp(args[0].from, "relaxed") == 0) -- cgit v1.2.3 From 0234576d041b9b2cc7043691ea61d2c2ca597aaa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Fri, 20 Nov 2009 03:28:01 +0900 Subject: nilfs2: add norecovery mount option This adds "norecovery" mount option which disables temporal write access to read-only mounts or snapshots during mount/recovery. Without this option, write access will be even performed for those types of mounts; the temporal write access is needed to mount root file system read-only after an unclean shutdown. This option will be helpful when user wants to prevent any write access to the device. Signed-off-by: Ryusuke Konishi Cc: Eric Sandeen --- Documentation/filesystems/nilfs2.txt | 4 ++++ fs/nilfs2/super.c | 16 +++++++++++++++- fs/nilfs2/the_nilfs.c | 20 ++++++++++++++++++-- include/linux/nilfs2_fs.h | 2 ++ 4 files changed, 39 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index cbd877978c80..4949fcaa6b6a 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -70,6 +70,10 @@ order=strict Apply strict in-order semantics that preserves sequence blocks. That means, it is guaranteed that no overtaking of events occurs in the recovered file system after a crash. +norecovery Disable recovery of the filesystem on mount. + This disables every write access on the device for + read-only mounts or snapshots. This option will fail + for r/w mounts on an unclean volume. NILFS2 usage ============ diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 990ead43a833..5403b3ef3a42 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -479,6 +479,8 @@ static int nilfs_show_options(struct seq_file *seq, struct vfsmount *vfs) seq_printf(seq, ",errors=panic"); if (nilfs_test_opt(sbi, STRICT_ORDER)) seq_printf(seq, ",order=strict"); + if (nilfs_test_opt(sbi, NORECOVERY)) + seq_printf(seq, ",norecovery"); return 0; } @@ -547,7 +549,7 @@ static const struct export_operations nilfs_export_ops = { enum { Opt_err_cont, Opt_err_panic, Opt_err_ro, - Opt_nobarrier, Opt_snapshot, Opt_order, + Opt_nobarrier, Opt_snapshot, Opt_order, Opt_norecovery, Opt_err, }; @@ -558,6 +560,7 @@ static match_table_t tokens = { {Opt_nobarrier, "nobarrier"}, {Opt_snapshot, "cp=%u"}, {Opt_order, "order=%s"}, + {Opt_norecovery, "norecovery"}, {Opt_err, NULL} }; @@ -608,6 +611,9 @@ static int parse_options(char *options, struct super_block *sb) sbi->s_snapshot_cno = option; nilfs_set_opt(sbi, SNAPSHOT); break; + case Opt_norecovery: + nilfs_set_opt(sbi, NORECOVERY); + break; default: printk(KERN_ERR "NILFS: Unrecognized mount option \"%s\"\n", p); @@ -863,6 +869,14 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data) goto restore_opts; } + if (!nilfs_valid_fs(nilfs)) { + printk(KERN_WARNING "NILFS (device %s): couldn't " + "remount because the filesystem is in an " + "incomplete recovery state.\n", sb->s_id); + err = -EINVAL; + goto restore_opts; + } + if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY)) goto out; if (*flags & MS_RDONLY) { diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index 890a8d3886cf..6241e1722efc 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -264,8 +264,14 @@ int load_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi) int valid_fs = nilfs_valid_fs(nilfs); int err; - if (nilfs_loaded(nilfs)) - return 0; + if (nilfs_loaded(nilfs)) { + if (valid_fs || + ((s_flags & MS_RDONLY) && nilfs_test_opt(sbi, NORECOVERY))) + return 0; + printk(KERN_ERR "NILFS: the filesystem is in an incomplete " + "recovery state.\n"); + return -EINVAL; + } if (!valid_fs) { printk(KERN_WARNING "NILFS warning: mounting unchecked fs\n"); @@ -295,6 +301,11 @@ int load_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi) goto skip_recovery; if (s_flags & MS_RDONLY) { + if (nilfs_test_opt(sbi, NORECOVERY)) { + printk(KERN_INFO "NILFS: norecovery option specified. " + "skipping roll-forward recovery\n"); + goto skip_recovery; + } if (really_read_only) { printk(KERN_ERR "NILFS: write access " "unavailable, cannot proceed.\n"); @@ -302,6 +313,11 @@ int load_nilfs(struct the_nilfs *nilfs, struct nilfs_sb_info *sbi) goto failed_unload; } sbi->s_super->s_flags &= ~MS_RDONLY; + } else if (nilfs_test_opt(sbi, NORECOVERY)) { + printk(KERN_ERR "NILFS: recovery cancelled because norecovery " + "option was specified for a read/write mount\n"); + err = -EINVAL; + goto failed_unload; } err = nilfs_recover_logical_segments(nilfs, sbi, &ri); diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h index 72289d2bb341..3fe02cf8b65a 100644 --- a/include/linux/nilfs2_fs.h +++ b/include/linux/nilfs2_fs.h @@ -151,6 +151,8 @@ struct nilfs_super_root { #define NILFS_MOUNT_BARRIER 0x1000 /* Use block barriers */ #define NILFS_MOUNT_STRICT_ORDER 0x2000 /* Apply strict in-order semantics also for data */ +#define NILFS_MOUNT_NORECOVERY 0x4000 /* Disable write access during + mount-time recovery */ /** -- cgit v1.2.3 From 3aa565f53c396914a9406388efaa238e9c937fc6 Mon Sep 17 00:00:00 2001 From: Gautham R Shenoy Date: Thu, 29 Oct 2009 19:22:53 +0000 Subject: powerpc/pseries: Add hooks to put the CPU into an appropriate offline state When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand the CPU back to the resource pool. This path is used for DLPAR which will cause a change in the LPAR configuration which will be visible outside. This patch changes the default state a CPU is put into when it is offlined. On platforms which support ceding the processor to the hypervisor with latency hint specifier value, during a cpu offline operation, instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor while passing a latency hint specifier value. The Hypervisor can use this hint to provide better energy savings. Also, during the offline operation, the control of the vCPU remains with the LPAR as oppposed to returning it to the resource pool. The patch achieves this by creating an infrastructure to set the preferred_offline_state() which can be either - CPU_STATE_OFFLINE: which is the current behaviour of calling rtas_stop_self() - CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency hint specifier. The codepath which wants to perform a DLPAR operation can set the preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking cpu_down(). The patch also provides a boot-time command line argument to disable/enable CPU_STATE_INACTIVE. Signed-off-by: Gautham R Shenoy Signed-off-by: Nathan Fontenot Signed-off-by: Benjamin Herrenschmidt --- Documentation/cpu-hotplug.txt | 6 + arch/powerpc/platforms/pseries/hotplug-cpu.c | 182 ++++++++++++++++++++++-- arch/powerpc/platforms/pseries/offline_states.h | 18 +++ arch/powerpc/platforms/pseries/smp.c | 19 +++ 4 files changed, 216 insertions(+), 9 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/offline_states.h (limited to 'Documentation') diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 9d620c153b04..4d4a644b505e 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -49,6 +49,12 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using additional_cpus=n (*) Use this to limit hotpluggable cpus. This option sets cpu_possible_map = cpu_present_map + additional_cpus +cede_offline={"off","on"} Use this option to disable/enable putting offlined + processors to an extended H_CEDE state on + supported pseries platforms. + If nothing is specified, + cede_offline is set to "on". + (*) Option valid only for following architectures - ia64 diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index ebff6d9a4e39..6ea4698d9176 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -30,6 +30,7 @@ #include #include "xics.h" #include "plpar_wrappers.h" +#include "offline_states.h" /* This version can't take the spinlock, because it never returns */ static struct rtas_args rtas_stop_self_args = { @@ -39,6 +40,55 @@ static struct rtas_args rtas_stop_self_args = { .rets = &rtas_stop_self_args.args[0], }; +static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) = + CPU_STATE_OFFLINE; +static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE; + +static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE; + +static int cede_offline_enabled __read_mostly = 1; + +/* + * Enable/disable cede_offline when available. + */ +static int __init setup_cede_offline(char *str) +{ + if (!strcmp(str, "off")) + cede_offline_enabled = 0; + else if (!strcmp(str, "on")) + cede_offline_enabled = 1; + else + return 0; + return 1; +} + +__setup("cede_offline=", setup_cede_offline); + +enum cpu_state_vals get_cpu_current_state(int cpu) +{ + return per_cpu(current_state, cpu); +} + +void set_cpu_current_state(int cpu, enum cpu_state_vals state) +{ + per_cpu(current_state, cpu) = state; +} + +enum cpu_state_vals get_preferred_offline_state(int cpu) +{ + return per_cpu(preferred_offline_state, cpu); +} + +void set_preferred_offline_state(int cpu, enum cpu_state_vals state) +{ + per_cpu(preferred_offline_state, cpu) = state; +} + +void set_default_offline_state(int cpu) +{ + per_cpu(preferred_offline_state, cpu) = default_offline_state; +} + static void rtas_stop_self(void) { struct rtas_args *args = &rtas_stop_self_args; @@ -56,11 +106,61 @@ static void rtas_stop_self(void) static void pseries_mach_cpu_die(void) { + unsigned int cpu = smp_processor_id(); + unsigned int hwcpu = hard_smp_processor_id(); + u8 cede_latency_hint = 0; + local_irq_disable(); idle_task_exit(); xics_teardown_cpu(); - unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow())); - rtas_stop_self(); + + if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) { + set_cpu_current_state(cpu, CPU_STATE_INACTIVE); + cede_latency_hint = 2; + + get_lppaca()->idle = 1; + if (!get_lppaca()->shared_proc) + get_lppaca()->donate_dedicated_cpu = 1; + + printk(KERN_INFO + "cpu %u (hwid %u) ceding for offline with hint %d\n", + cpu, hwcpu, cede_latency_hint); + while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) { + extended_cede_processor(cede_latency_hint); + printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n", + cpu, hwcpu); + printk(KERN_INFO + "Decrementer value = %x Timebase value = %llx\n", + get_dec(), get_tb()); + } + + printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n", + cpu, hwcpu); + + if (!get_lppaca()->shared_proc) + get_lppaca()->donate_dedicated_cpu = 0; + get_lppaca()->idle = 0; + } + + if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) { + unregister_slb_shadow(hwcpu, __pa(get_slb_shadow())); + + /* + * NOTE: Calling start_secondary() here for now to + * start new context. + * However, need to do it cleanly by resetting the + * stack pointer. + */ + start_secondary(); + + } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { + + set_cpu_current_state(cpu, CPU_STATE_OFFLINE); + unregister_slb_shadow(hard_smp_processor_id(), + __pa(get_slb_shadow())); + rtas_stop_self(); + } + /* Should never get here... */ BUG(); for(;;); @@ -106,18 +206,43 @@ static int pseries_cpu_disable(void) return 0; } +/* + * pseries_cpu_die: Wait for the cpu to die. + * @cpu: logical processor id of the CPU whose death we're awaiting. + * + * This function is called from the context of the thread which is performing + * the cpu-offline. Here we wait for long enough to allow the cpu in question + * to self-destroy so that the cpu-offline thread can send the CPU_DEAD + * notifications. + * + * OTOH, pseries_mach_cpu_die() is called by the @cpu when it wants to + * self-destruct. + */ static void pseries_cpu_die(unsigned int cpu) { int tries; - int cpu_status; + int cpu_status = 1; unsigned int pcpu = get_hard_smp_processor_id(cpu); - for (tries = 0; tries < 25; tries++) { - cpu_status = query_cpu_stopped(pcpu); - if (cpu_status == 0 || cpu_status == -1) - break; - cpu_relax(); + if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) { + cpu_status = 1; + for (tries = 0; tries < 1000; tries++) { + if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) { + cpu_status = 0; + break; + } + cpu_relax(); + } + } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { + + for (tries = 0; tries < 25; tries++) { + cpu_status = query_cpu_stopped(pcpu); + if (cpu_status == 0 || cpu_status == -1) + break; + cpu_relax(); + } } + if (cpu_status != 0) { printk("Querying DEAD? cpu %i (%i) shows %i\n", cpu, pcpu, cpu_status); @@ -252,10 +377,41 @@ static struct notifier_block pseries_smp_nb = { .notifier_call = pseries_smp_notifier, }; +#define MAX_CEDE_LATENCY_LEVELS 4 +#define CEDE_LATENCY_PARAM_LENGTH 10 +#define CEDE_LATENCY_PARAM_MAX_LENGTH \ + (MAX_CEDE_LATENCY_LEVELS * CEDE_LATENCY_PARAM_LENGTH * sizeof(char)) +#define CEDE_LATENCY_TOKEN 45 + +static char cede_parameters[CEDE_LATENCY_PARAM_MAX_LENGTH]; + +static int parse_cede_parameters(void) +{ + int call_status; + + memset(cede_parameters, 0, CEDE_LATENCY_PARAM_MAX_LENGTH); + call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1, + NULL, + CEDE_LATENCY_TOKEN, + __pa(cede_parameters), + CEDE_LATENCY_PARAM_MAX_LENGTH); + + if (call_status != 0) + printk(KERN_INFO "CEDE_LATENCY: \ + %s %s Error calling get-system-parameter(0x%x)\n", + __FILE__, __func__, call_status); + else + printk(KERN_INFO "CEDE_LATENCY: \ + get-system-parameter successful.\n"); + + return call_status; +} + static int __init pseries_cpu_hotplug_init(void) { struct device_node *np; const char *typep; + int cpu; for_each_node_by_name(np, "interrupt-controller") { typep = of_get_property(np, "compatible", NULL); @@ -283,8 +439,16 @@ static int __init pseries_cpu_hotplug_init(void) smp_ops->cpu_die = pseries_cpu_die; /* Processors can be added/removed only on LPAR */ - if (firmware_has_feature(FW_FEATURE_LPAR)) + if (firmware_has_feature(FW_FEATURE_LPAR)) { pSeries_reconfig_notifier_register(&pseries_smp_nb); + cpu_maps_update_begin(); + if (cede_offline_enabled && parse_cede_parameters() == 0) { + default_offline_state = CPU_STATE_INACTIVE; + for_each_online_cpu(cpu) + set_default_offline_state(cpu); + } + cpu_maps_update_done(); + } return 0; } diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h new file mode 100644 index 000000000000..22574e0d9d91 --- /dev/null +++ b/arch/powerpc/platforms/pseries/offline_states.h @@ -0,0 +1,18 @@ +#ifndef _OFFLINE_STATES_H_ +#define _OFFLINE_STATES_H_ + +/* Cpu offline states go here */ +enum cpu_state_vals { + CPU_STATE_OFFLINE, + CPU_STATE_INACTIVE, + CPU_STATE_ONLINE, + CPU_MAX_OFFLINE_STATES +}; + +extern enum cpu_state_vals get_cpu_current_state(int cpu); +extern void set_cpu_current_state(int cpu, enum cpu_state_vals state); +extern enum cpu_state_vals get_preferred_offline_state(int cpu); +extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state); +extern void set_default_offline_state(int cpu); +extern int start_secondary(void); +#endif diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 440000cc7130..8868c012268a 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -48,6 +48,7 @@ #include "plpar_wrappers.h" #include "pseries.h" #include "xics.h" +#include "offline_states.h" /* @@ -84,6 +85,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu) /* Fixup atomic count: it exited inside IRQ handler. */ task_thread_info(paca[lcpu].__current)->preempt_count = 0; + if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE) + goto out; + /* * If the RTAS start-cpu token does not exist then presume the * cpu is already spinning. @@ -98,6 +102,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu) return 0; } +out: return 1; } @@ -111,12 +116,16 @@ static void __devinit smp_xics_setup_cpu(int cpu) vpa_init(cpu); cpu_clear(cpu, of_spin_map); + set_cpu_current_state(cpu, CPU_STATE_ONLINE); + set_default_offline_state(cpu); } #endif /* CONFIG_XICS */ static void __devinit smp_pSeries_kick_cpu(int nr) { + long rc; + unsigned long hcpuid; BUG_ON(nr < 0 || nr >= NR_CPUS); if (!smp_startup_cpu(nr)) @@ -128,6 +137,16 @@ static void __devinit smp_pSeries_kick_cpu(int nr) * the processor will continue on to secondary_start */ paca[nr].cpu_start = 1; + + set_preferred_offline_state(nr, CPU_STATE_ONLINE); + + if (get_cpu_current_state(nr) == CPU_STATE_INACTIVE) { + hcpuid = get_hard_smp_processor_id(nr); + rc = plpar_hcall_norets(H_PROD, hcpuid); + if (rc != H_SUCCESS) + panic("Error: Prod to wake up processor %d Ret= %ld\n", + nr, rc); + } } static int smp_pSeries_cpu_bootable(unsigned int nr) -- cgit v1.2.3 From 0cda8b91f2e096bbef1cb05f23c42e423eae7728 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Wed, 21 Oct 2009 21:45:46 -0600 Subject: [CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/ This is a complex interface and is already described in Documentation/cpu-freq/, especially in the user-guide.txt file. No need to copy/paste all that information. Let's just alert the reader to the presence of the user-guide. Signed-off-by: Alex Chiang Signed-off-by: Dave Jones --- Documentation/ABI/testing/sysfs-devices-system-cpu | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index a703b9e9aeb9..974e29f5da86 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -136,6 +136,24 @@ Description: Discover cpuidle policy and mechanism See files in Documentation/cpuidle/ for more information. +What: /sys/devices/system/cpu/cpu#/cpufreq/* +Date: pre-git history +Contact: cpufreq@vger.kernel.org +Description: Discover and change clock speed of CPUs + + Clock scaling allows you to change the clock speed of the + CPUs on the fly. This is a nice method to save battery + power, because the lower the clock speed, the less power + the CPU consumes. + + There are many knobs to tweak in this directory. + + See files in Documentation/cpu-freq/ for more information. + + In particular, read Documentation/cpu-freq/user-guide.txt + to learn how to control the knobs. + + What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X Date: August 2008 KernelVersion: 2.6.27 -- cgit v1.2.3 From bbe237aafeaae37a1088f2a95ebe81ff81d9e646 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Thu, 12 Nov 2009 16:06:45 +0000 Subject: [CPUFREQ] Document units for transition latency They're documented in the header but not in Documentation. Signed-off-by: Mark Brown Signed-off-by: Dave Jones --- Documentation/cpu-freq/cpu-drivers.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 75a58d14d3cf..6c30e930c122 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -92,9 +92,9 @@ policy->cpuinfo.max_freq - the minimum and maximum frequency (in kHz) which is supported by this CPU policy->cpuinfo.transition_latency the time it takes on this CPU to - switch between two frequencies (if - appropriate, else specify - CPUFREQ_ETERNAL) + switch between two frequencies in + nanoseconds (if appropriate, else + specify CPUFREQ_ETERNAL) policy->cur The current operating frequency of this CPU (if appropriate) -- cgit v1.2.3 From e2f74f355e9e2914483db10c05d70e69e0b7ae04 Mon Sep 17 00:00:00 2001 From: Thomas Renninger Date: Thu, 19 Nov 2009 12:31:01 +0100 Subject: [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface This interface is mainly intended (and implemented) for ACPI _PPC BIOS frequency limitations, but other cpufreq drivers can also use it for similar use-cases. Why is this needed: Currently it's not obvious why cpufreq got limited. People see cpufreq/scaling_max_freq reduced, but this could have happened by: - any userspace prog writing to scaling_max_freq - thermal limitations - hardware (_PPC in ACPI case) limitiations Therefore export bios_limit (in kHz) to: - Point the user that it's the BIOS (broken or intended) which limits frequency - Export it as a sysfs interface for userspace progs. While this was a rarely used feature on laptops, there will appear more and more server implemenations providing "Green IT" features like allowing the service processor to limit the frequency. People want to know about HW/BIOS frequency limitations. All ACPI P-state driven cpufreq drivers are covered with this patch: - powernow-k8 - powernow-k7 - acpi-cpufreq Tested with a patched DSDT which limits the first two cores (_PPC returns 1) via _PPC, exposed by bios_limit: # echo 2200000 >cpu2/cpufreq/scaling_max_freq # cat cpu*/cpufreq/scaling_max_freq 2600000 2600000 2200000 2200000 # #scaling_max_freq shows general user/thermal/BIOS limitations # cat cpu*/cpufreq/bios_limit 2600000 2600000 2800000 2800000 # #bios_limit only shows the HW/BIOS limitation CC: Pallipadi Venkatesh CC: Len Brown CC: davej@codemonkey.org.uk CC: linux@dominikbrodowski.net Signed-off-by: Thomas Renninger Signed-off-by: Dave Jones --- Documentation/cpu-freq/user-guide.txt | 11 +++++++++++ arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 17 +++++++++-------- arch/x86/kernel/cpu/cpufreq/powernow-k7.c | 19 +++++++++++-------- arch/x86/kernel/cpu/cpufreq/powernow-k8.c | 17 +++++++++-------- drivers/acpi/processor_perflib.c | 13 +++++++++++++ drivers/cpufreq/cpufreq.c | 21 +++++++++++++++++++++ include/acpi/processor.h | 6 ++++++ include/linux/cpufreq.h | 1 + 8 files changed, 81 insertions(+), 24 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 2a5b850847c0..04f6b32993e6 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -203,6 +203,17 @@ scaling_cur_freq : Current frequency of the CPU as determined by the frequency the kernel thinks the CPU runs at. +bios_limit : If the BIOS tells the OS to limit a CPU to + lower frequencies, the user can read out the + maximum available frequency from this file. + This typically can happen through (often not + intended) BIOS settings, restrictions + triggered through a service processor or other + BIOS/HW based implementations. + This does not cover thermal ACPI limitations + which can be detected through the generic + thermal driver. + If you have selected the "userspace" governor which allows you to set the CPU operating frequency to a specific value, you can read out the current frequency in diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c index 8b581d3905cb..d2e7c77c1ea4 100644 --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c @@ -764,14 +764,15 @@ static struct freq_attr *acpi_cpufreq_attr[] = { }; static struct cpufreq_driver acpi_cpufreq_driver = { - .verify = acpi_cpufreq_verify, - .target = acpi_cpufreq_target, - .init = acpi_cpufreq_cpu_init, - .exit = acpi_cpufreq_cpu_exit, - .resume = acpi_cpufreq_resume, - .name = "acpi-cpufreq", - .owner = THIS_MODULE, - .attr = acpi_cpufreq_attr, + .verify = acpi_cpufreq_verify, + .target = acpi_cpufreq_target, + .bios_limit = acpi_processor_get_bios_limit, + .init = acpi_cpufreq_cpu_init, + .exit = acpi_cpufreq_cpu_exit, + .resume = acpi_cpufreq_resume, + .name = "acpi-cpufreq", + .owner = THIS_MODULE, + .attr = acpi_cpufreq_attr, }; static int __init acpi_cpufreq_init(void) diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k7.c b/arch/x86/kernel/cpu/cpufreq/powernow-k7.c index d47c775eb0ab..9a97116f89e5 100644 --- a/arch/x86/kernel/cpu/cpufreq/powernow-k7.c +++ b/arch/x86/kernel/cpu/cpufreq/powernow-k7.c @@ -714,14 +714,17 @@ static struct freq_attr *powernow_table_attr[] = { }; static struct cpufreq_driver powernow_driver = { - .verify = powernow_verify, - .target = powernow_target, - .get = powernow_get, - .init = powernow_cpu_init, - .exit = powernow_cpu_exit, - .name = "powernow-k7", - .owner = THIS_MODULE, - .attr = powernow_table_attr, + .verify = powernow_verify, + .target = powernow_target, + .get = powernow_get, +#ifdef CONFIG_X86_POWERNOW_K7_ACPI + .bios_limit = acpi_processor_get_bios_limit, +#endif + .init = powernow_cpu_init, + .exit = powernow_cpu_exit, + .name = "powernow-k7", + .owner = THIS_MODULE, + .attr = powernow_table_attr, }; static int __init powernow_init(void) diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c index f30d25383940..a9df9441a9a2 100644 --- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c +++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c @@ -1398,14 +1398,15 @@ static struct freq_attr *powernow_k8_attr[] = { }; static struct cpufreq_driver cpufreq_amd64_driver = { - .verify = powernowk8_verify, - .target = powernowk8_target, - .init = powernowk8_cpu_init, - .exit = __devexit_p(powernowk8_cpu_exit), - .get = powernowk8_get, - .name = "powernow-k8", - .owner = THIS_MODULE, - .attr = powernow_k8_attr, + .verify = powernowk8_verify, + .target = powernowk8_target, + .bios_limit = acpi_processor_get_bios_limit, + .init = powernowk8_cpu_init, + .exit = __devexit_p(powernowk8_cpu_exit), + .get = powernowk8_get, + .name = "powernow-k8", + .owner = THIS_MODULE, + .attr = powernow_k8_attr, }; /* driver entry point for init */ diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c index 8ba0ed0b9ddb..01e366d2b6fb 100644 --- a/drivers/acpi/processor_perflib.c +++ b/drivers/acpi/processor_perflib.c @@ -167,6 +167,19 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr) return cpufreq_update_policy(pr->id); } +int acpi_processor_get_bios_limit(int cpu, unsigned int *limit) +{ + struct acpi_processor *pr; + + pr = per_cpu(processors, cpu); + if (!pr || !pr->performance || !pr->performance->state_count) + return -ENODEV; + *limit = pr->performance->states[pr->performance_platform_limit]. + core_frequency * 1000; + return 0; +} +EXPORT_SYMBOL(acpi_processor_get_bios_limit); + void acpi_processor_ppc_init(void) { if (!cpufreq_register_notifier diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 5b9b1c8c4950..f20668c09ce0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -647,6 +647,21 @@ static ssize_t show_scaling_setspeed(struct cpufreq_policy *policy, char *buf) return policy->governor->show_setspeed(policy, buf); } +/** + * show_scaling_driver - show the current cpufreq HW/BIOS limitation + */ +static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf) +{ + unsigned int limit; + int ret; + if (cpufreq_driver->bios_limit) { + ret = cpufreq_driver->bios_limit(policy->cpu, &limit); + if (!ret) + return sprintf(buf, "%u\n", limit); + } + return sprintf(buf, "%u\n", policy->cpuinfo.max_freq); +} + #define define_one_ro(_name) \ static struct freq_attr _name = \ __ATTR(_name, 0444, show_##_name, NULL) @@ -666,6 +681,7 @@ define_one_ro(cpuinfo_transition_latency); define_one_ro(scaling_available_governors); define_one_ro(scaling_driver); define_one_ro(scaling_cur_freq); +define_one_ro(bios_limit); define_one_ro(related_cpus); define_one_ro(affected_cpus); define_one_rw(scaling_min_freq); @@ -905,6 +921,11 @@ static int cpufreq_add_dev_interface(unsigned int cpu, if (ret) goto err_out_kobj_put; } + if (cpufreq_driver->bios_limit) { + ret = sysfs_create_file(&policy->kobj, &bios_limit.attr); + if (ret) + goto err_out_kobj_put; + } spin_lock_irqsave(&cpufreq_driver_lock, flags); for_each_cpu(j, policy->cpus) { diff --git a/include/acpi/processor.h b/include/acpi/processor.h index 740ac3ad8fd0..8b668ead6d6e 100644 --- a/include/acpi/processor.h +++ b/include/acpi/processor.h @@ -295,6 +295,7 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx void acpi_processor_ppc_init(void); void acpi_processor_ppc_exit(void); int acpi_processor_ppc_has_changed(struct acpi_processor *pr); +extern int acpi_processor_get_bios_limit(int cpu, unsigned int *limit); #else static inline void acpi_processor_ppc_init(void) { @@ -316,6 +317,11 @@ static inline int acpi_processor_ppc_has_changed(struct acpi_processor *pr) } return 0; } +static inline int acpi_processor_get_bios_limit(int cpu, unsigned int *limit) +{ + return -ENODEV; +} + #endif /* CONFIG_CPU_FREQ */ /* in processor_throttling.c */ diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 79a2340d83cd..4de02b10007f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -232,6 +232,7 @@ struct cpufreq_driver { /* optional */ unsigned int (*getavg) (struct cpufreq_policy *policy, unsigned int cpu); + int (*bios_limit) (int cpu, unsigned int *limit); int (*exit) (struct cpufreq_policy *policy); int (*suspend) (struct cpufreq_policy *policy, pm_message_t pmsg); -- cgit v1.2.3 From f5754bfd107b08edddaf871d676ec6fe0792d07a Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Thu, 19 Nov 2009 13:54:05 -0800 Subject: lockstat: Add usage info to Documentation/lockstat.txt Add usage info to Documentation/lockstat.txt Signed-off-by: Frank Rowand Signed-off-by: Peter Zijlstra LKML-Reference: <4B05BE7D.30500@am.sony.com> Signed-off-by: Ingo Molnar --- Documentation/lockstat.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index 9cb9138f7a79..65f4c795015d 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -62,8 +62,20 @@ applicable). It also tracks 4 contention points per class. A contention point is a call site that had to wait on lock acquisition. + - CONFIGURATION + +Lock statistics are enabled via CONFIG_LOCK_STATS. + - USAGE +Enable collection of statistics: + +# echo 1 >/proc/sys/kernel/lock_stat + +Disable collection of statistics: + +# echo 0 >/proc/sys/kernel/lock_stat + Look at the current lock statistics: ( line numbers not part of actual output, done for clarity in the explanation -- cgit v1.2.3 From 64f16603eae17e869d5fc8a60ae987394190e639 Mon Sep 17 00:00:00 2001 From: Tilman Schmidt Date: Sat, 5 Dec 2009 08:54:20 +0000 Subject: gigaset: documentation amendments Various additions and improvements to the Gigaset driver's README file, and added comments to its userspace visible include file. Signed-off-by: Tilman Schmidt Signed-off-by: David S. Miller --- Documentation/isdn/README.gigaset | 116 ++++++++++++++++++++++++++++++-------- include/linux/gigaset_dev.h | 22 +++++--- 2 files changed, 106 insertions(+), 32 deletions(-) (limited to 'Documentation') diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset index 0fc9831d7ecb..794941fc9493 100644 --- a/Documentation/isdn/README.gigaset +++ b/Documentation/isdn/README.gigaset @@ -68,22 +68,38 @@ GigaSet 307x Device Driver for troubleshooting or to pass module parameters. The module ser_gigaset provides a serial line discipline N_GIGASET_M101 - which drives the device through the regular serial line driver. It must - be attached to the serial line to which the M101 is connected with the - ldattach(8) command (requires util-linux-ng release 2.14 or later), for - example: - ldattach GIGASET_M101 /dev/ttyS1 + which uses the regular serial port driver to access the device, and must + therefore be attached to the serial device to which the M101 is connected. + The ldattach(8) command (included in util-linux-ng release 2.14 or later) + can be used for that purpose, for example: + ldattach GIGASET_M101 /dev/ttyS1 This will open the device file, attach the line discipline to it, and then sleep in the background, keeping the device open so that the line discipline remains active. To deactivate it, kill the daemon, for example with - killall ldattach + killall ldattach before disconnecting the device. To have this happen automatically at system startup/shutdown on an LSB compatible system, create and activate an appropriate LSB startup script /etc/init.d/gigaset. (The init name 'gigaset' is officially assigned to this project by LANANA.) Alternatively, just add the 'ldattach' command line to /etc/rc.local. + The modules accept the following parameters: + + Module Parameter Meaning + + gigaset debug debug level (see section 3.2.) + + startmode initial operation mode (see section 2.5.): + bas_gigaset ) 1=ISDN4linux/CAPI (default), 0=Unimodem + ser_gigaset ) + usb_gigaset ) cidmode initial Call-ID mode setting (see section + 2.5.): 1=on (default), 0=off + + Depending on your distribution you may want to create a separate module + configuration file /etc/modprobe.d/gigaset for these, or add them to a + custom file like /etc/modprobe.conf.local. + 2.2. Device nodes for user space programs ------------------------------------ The device can be accessed from user space (eg. by the user space tools @@ -93,11 +109,48 @@ GigaSet 307x Device Driver - /dev/ttyGU0 for M105 (USB data boxes) - /dev/ttyGB0 for the base driver (direct USB connection) - You can also select a "default device" which is used by the frontends when + If you connect more than one device of a type, they will get consecutive + device nodes, eg. /dev/ttyGU1 for a second M105. + + You can also set a "default device" for the user space tools to use when no device node is given as parameter, by creating a symlink /dev/ttyG to one of them, eg.: - ln -s /dev/ttyGB0 /dev/ttyG + ln -s /dev/ttyGB0 /dev/ttyG + + The devices accept the following device specific ioctl calls + (defined in gigaset_dev.h): + + ioctl(int fd, GIGASET_REDIR, int *cmd); + If cmd==1, the device is set to be controlled exclusively through the + character device node; access from the ISDN subsystem is blocked. + If cmd==0, the device is set to be used from the ISDN subsystem and does + not communicate through the character device node. + + ioctl(int fd, GIGASET_CONFIG, int *cmd); + (ser_gigaset and usb_gigaset only) + If cmd==1, the device is set to adapter configuration mode where commands + are interpreted by the M10x DECT adapter itself instead of being + forwarded to the base station. In this mode, the device accepts the + commands described in Siemens document "AT-Kommando Alignment M10x Data" + for setting the operation mode, associating with a base station and + querying parameters like field strengh and signal quality. + Note that there is no ioctl command for leaving adapter configuration + mode and returning to regular operation. In order to leave adapter + configuration mode, write the command ATO to the device. + + ioctl(int fd, GIGASET_BRKCHARS, unsigned char brkchars[6]); + (usb_gigaset only) + Set the break characters on an M105's internal serial adapter to the six + bytes stored in brkchars[]. Unused bytes should be set to zero. + + ioctl(int fd, GIGASET_VERSION, unsigned version[4]); + Retrieve version information from the driver. version[0] must be set to + one of: + - GIGVER_DRIVER: retrieve driver version + - GIGVER_COMPAT: retrieve interface compatibility version + - GIGVER_FWBASE: retrieve the firmware version of the base + Upon return, version[] is filled with the requested version information. 2.3. ISDN4linux ---------- @@ -113,15 +166,24 @@ GigaSet 307x Device Driver Connection State: 0, Response: -1 gigaset_process_response: resp_code -1 in ConState 0 ! Timeout occurred - you might need to use unimodem mode. (see section 2.5.) + you probably need to use unimodem mode. (see section 2.5.) 2.4. CAPI ---- If the driver is compiled with CAPI support (kernel configuration option GIGASET_CAPI, experimental) it can also be used with CAPI 2.0 kernel and - user space applications. ISDN4Linux is supported in this configuration + user space applications. For user space access, the module capi.ko must + be loaded. The capiinit command (included in the capi4k-utils package) + does this for you. + + The CAPI variant of the driver supports legacy ISDN4Linux applications via the capidrv compatibility driver. The kernel module capidrv.ko must - be loaded explicitly ("modprobe capidrv") if needed. + be loaded explicitly with the command + modprobe capidrv + if needed, and cannot be unloaded again without unloading the driver + first. (These are limitations of capidrv.) + + The note about unimodem mode in the preceding section applies here, too. 2.5. Unimodem mode ------------- @@ -134,9 +196,14 @@ GigaSet 307x Device Driver You can switch back using gigacontr --mode isdn - You can also load the driver using e.g. - modprobe usb_gigaset startmode=0 - to prevent the driver from starting in "isdn4linux mode". + You can also put the driver directly into Unimodem mode when it's loaded, + by passing the module parameter startmode=0 to the hardware specific + module, e.g. + modprobe usb_gigaset startmode=0 + or by adding a line like + options usb_gigaset startmode=0 + to an appropriate module configuration file, like /etc/modprobe.d/gigaset + or /etc/modprobe.conf.local. In this mode the device works like a modem connected to a serial port (the /dev/ttyGU0, ... mentioned above) which understands the commands @@ -164,9 +231,8 @@ GigaSet 307x Device Driver options ppp_async flag_time=0 - to /etc/modprobe.conf. If your distribution has some local module - configuration file like /etc/modprobe.conf.local, - using that should be preferred. + to an appropriate module configuration file, like /etc/modprobe.d/gigaset + or /etc/modprobe.conf.local. 2.6. Call-ID (CID) mode ------------------ @@ -189,12 +255,13 @@ GigaSet 307x Device Driver settings (CID mode). - If you have several DECT data devices (M10x) which you want to use in turn, select Unimodem mode by passing the parameter "cidmode=0" to - the driver ("modprobe usb_gigaset cidmode=0" or modprobe.conf). + the appropriate driver module (ser_gigaset or usb_gigaset). If you want both of these at once, you are out of luck. - You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode - setting (ttyGxy is ttyGU0 or ttyGB0). + You can also use the tty class parameter "cidmode" of the device to + change its CID mode while the driver is loaded, eg. + echo 0 > /sys/class/tty/ttyGU0/cidmode 2.7. Unregistered Wireless Devices (M101/M105) ----------------------------------------- @@ -208,7 +275,7 @@ GigaSet 307x Device Driver driver. In that situation, a restricted set of functions is available which includes, in particular, those necessary for registering the device to a base or for switching it between Fixed Part and Portable Part - modes. + modes. See the gigacontr(8) manpage for details. 3. Troubleshooting --------------- @@ -222,9 +289,7 @@ GigaSet 307x Device Driver options isdn dialtimeout=15 - to /etc/modprobe.conf. If your distribution has some local module - configuration file like /etc/modprobe.conf.local, - using that should be preferred. + to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file. Problem: Your isdn script aborts with a message about isdnlog. @@ -264,7 +329,8 @@ GigaSet 307x Device Driver The initial value can be set using the debug parameter when loading the module "gigaset", e.g. by adding a line options gigaset debug=0 - to /etc/modprobe.conf, ... + to your module configuration file, eg. /etc/modprobe.d/gigaset or + /etc/modprobe.conf.local. Generated debugging information can be found - as output of the command diff --git a/include/linux/gigaset_dev.h b/include/linux/gigaset_dev.h index 5dc4a316ca37..258ba82937e7 100644 --- a/include/linux/gigaset_dev.h +++ b/include/linux/gigaset_dev.h @@ -16,15 +16,23 @@ #include +/* The magic IOCTL value for this interface. */ #define GIGASET_IOCTL 0x47 -#define GIGVER_DRIVER 0 -#define GIGVER_COMPAT 1 -#define GIGVER_FWBASE 2 +/* enable/disable device control via character device (lock out ISDN subsys) */ +#define GIGASET_REDIR _IOWR(GIGASET_IOCTL, 0, int) -#define GIGASET_REDIR _IOWR (GIGASET_IOCTL, 0, int) -#define GIGASET_CONFIG _IOWR (GIGASET_IOCTL, 1, int) -#define GIGASET_BRKCHARS _IOW (GIGASET_IOCTL, 2, unsigned char[6]) //FIXME [6] okay? -#define GIGASET_VERSION _IOWR (GIGASET_IOCTL, 3, unsigned[4]) +/* enable adapter configuration mode (M10x only) */ +#define GIGASET_CONFIG _IOWR(GIGASET_IOCTL, 1, int) + +/* set break characters (M105 only) */ +#define GIGASET_BRKCHARS _IOW(GIGASET_IOCTL, 2, unsigned char[6]) + +/* get version information selected by arg[0] */ +#define GIGASET_VERSION _IOWR(GIGASET_IOCTL, 3, unsigned[4]) +/* values for GIGASET_VERSION arg[0] */ +#define GIGVER_DRIVER 0 /* get driver version */ +#define GIGVER_COMPAT 1 /* get interface compatibility version */ +#define GIGVER_FWBASE 2 /* get base station firmware version */ #endif -- cgit v1.2.3 From 12633e803a2a556f6469e0933d08233d0844a2d9 Mon Sep 17 00:00:00 2001 From: Nathan Fontenot Date: Wed, 25 Nov 2009 17:23:25 +0000 Subject: sysfs/cpu: Add probe/release files Version 3 of this patch is updated with documentation added to Documentation/ABI. There are no changes to any of the C code from v2 of the patch. In order to support kernel DLPAR of CPU resources we need to provide an interface to add (probe) and remove (release) the resource from the system. This patch Creates new generic probe and release sysfs files to facilitate cpu probe/release. The probe/release interface provides for allowing each arch to supply their own routines for implementing the backend of adding and removing cpus to/from the system. This also creates the powerpc specific stubs to handle the arch callouts from writes to the sysfs files. The creation and use of these files is regulated by the CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the capability will have the files created. Signed-off-by: Nathan Fontenot Signed-off-by: Benjamin Herrenschmidt --- Documentation/ABI/testing/sysfs-devices-system-cpu | 15 ++++++++++ arch/powerpc/Kconfig | 4 +++ arch/powerpc/include/asm/machdep.h | 5 ++++ arch/powerpc/kernel/sysfs.c | 19 +++++++++++++ drivers/base/cpu.c | 32 ++++++++++++++++++++++ include/linux/cpu.h | 2 ++ 6 files changed, 77 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index a703b9e9aeb9..d868a11c94a5 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -62,6 +62,21 @@ Description: CPU topology files that describe kernel limits related to See Documentation/cputopology.txt for more information. +What: /sys/devices/system/cpu/probe + /sys/devices/system/cpu/release +Date: November 2009 +Contact: Linux kernel mailing list +Description: Dynamic addition and removal of CPU's. This is not hotplug + removal, this is meant complete removal/addition of the CPU + from the system. + + probe: writes to this file will dynamically add a CPU to the + system. Information written to the file to add CPU's is + architecture specific. + + release: writes to this file dynamically remove a CPU from + the system. Information writtento the file to remove CPU's + is architecture specific. What: /sys/devices/system/cpu/cpu#/node Date: October 2009 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 5dbd375a3f2a..0df57466e783 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -320,6 +320,10 @@ config HOTPLUG_CPU Say N if you are unsure. +config ARCH_CPU_PROBE_RELEASE + def_bool y + depends on HOTPLUG_CPU + config ARCH_ENABLE_MEMORY_HOTPLUG def_bool y diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 9efa2be78331..9f0fc9e6ce0d 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -266,6 +266,11 @@ struct machdep_calls { void (*suspend_disable_irqs)(void); void (*suspend_enable_irqs)(void); #endif + +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE + ssize_t (*cpu_probe)(const char *, size_t); + ssize_t (*cpu_release)(const char *, size_t); +#endif }; extern void e500_idle(void); diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 956ab33fd73f..e235e52dc4fe 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -461,6 +461,25 @@ static void unregister_cpu_online(unsigned int cpu) cacheinfo_cpu_offline(cpu); } + +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE +ssize_t arch_cpu_probe(const char *buf, size_t count) +{ + if (ppc_md.cpu_probe) + return ppc_md.cpu_probe(buf, count); + + return -EINVAL; +} + +ssize_t arch_cpu_release(const char *buf, size_t count) +{ + if (ppc_md.cpu_release) + return ppc_md.cpu_release(buf, count); + + return -EINVAL; +} +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ + #endif /* CONFIG_HOTPLUG_CPU */ static int __cpuinit sysfs_cpu_notify(struct notifier_block *self, diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index e62a4ccea54d..7c03af7b84a9 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -72,6 +72,38 @@ void unregister_cpu(struct cpu *cpu) per_cpu(cpu_sys_devices, logical_cpu) = NULL; return; } + +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE +static ssize_t cpu_probe_store(struct class *class, const char *buf, + size_t count) +{ + return arch_cpu_probe(buf, count); +} + +static ssize_t cpu_release_store(struct class *class, const char *buf, + size_t count) +{ + return arch_cpu_release(buf, count); +} + +static CLASS_ATTR(probe, S_IWUSR, NULL, cpu_probe_store); +static CLASS_ATTR(release, S_IWUSR, NULL, cpu_release_store); + +int __init cpu_probe_release_init(void) +{ + int rc; + + rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj, + &class_attr_probe.attr); + if (!rc) + rc = sysfs_create_file(&cpu_sysdev_class.kset.kobj, + &class_attr_release.attr); + + return rc; +} +device_initcall(cpu_probe_release_init); +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ + #else /* ... !CONFIG_HOTPLUG_CPU */ static inline void register_cpu_control(struct cpu *cpu) { diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 47536197ffdd..c972f7ccb7d3 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -43,6 +43,8 @@ extern int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls); #ifdef CONFIG_HOTPLUG_CPU extern void unregister_cpu(struct cpu *cpu); +extern ssize_t arch_cpu_probe(const char *, size_t); +extern ssize_t arch_cpu_release(const char *, size_t); #endif struct notifier_block; -- cgit v1.2.3 From 4d1a7c122aeae6ae9732be0a32f5e199fff63fb7 Mon Sep 17 00:00:00 2001 From: Tomi Valkeinen Date: Tue, 4 Aug 2009 15:47:11 +0300 Subject: OMAP: DSS2: Documentation for DSS2 Signed-off-by: Tomi Valkeinen --- Documentation/arm/OMAP/DSS | 317 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 317 insertions(+) create mode 100644 Documentation/arm/OMAP/DSS (limited to 'Documentation') diff --git a/Documentation/arm/OMAP/DSS b/Documentation/arm/OMAP/DSS new file mode 100644 index 000000000000..0af0e9eed5d6 --- /dev/null +++ b/Documentation/arm/OMAP/DSS @@ -0,0 +1,317 @@ +OMAP2/3 Display Subsystem +------------------------- + +This is an almost total rewrite of the OMAP FB driver in drivers/video/omap +(let's call it DSS1). The main differences between DSS1 and DSS2 are DSI, +TV-out and multiple display support, but there are lots of small improvements +also. + +The DSS2 driver (omapdss module) is in arch/arm/plat-omap/dss/, and the FB, +panel and controller drivers are in drivers/video/omap2/. DSS1 and DSS2 live +currently side by side, you can choose which one to use. + +Features +-------- + +Working and tested features include: + +- MIPI DPI (parallel) output +- MIPI DSI output in command mode +- MIPI DBI (RFBI) output +- SDI output +- TV output +- All pieces can be compiled as a module or inside kernel +- Use DISPC to update any of the outputs +- Use CPU to update RFBI or DSI output +- OMAP DISPC planes +- RGB16, RGB24 packed, RGB24 unpacked +- YUV2, UYVY +- Scaling +- Adjusting DSS FCK to find a good pixel clock +- Use DSI DPLL to create DSS FCK + +Tested boards include: +- OMAP3 SDP board +- Beagle board +- N810 + +omapdss driver +-------------- + +The DSS driver does not itself have any support for Linux framebuffer, V4L or +such like the current ones, but it has an internal kernel API that upper level +drivers can use. + +The DSS driver models OMAP's overlays, overlay managers and displays in a +flexible way to enable non-common multi-display configuration. In addition to +modelling the hardware overlays, omapdss supports virtual overlays and overlay +managers. These can be used when updating a display with CPU or system DMA. + +Panel and controller drivers +---------------------------- + +The drivers implement panel or controller specific functionality and are not +usually visible to users except through omapfb driver. They register +themselves to the DSS driver. + +omapfb driver +------------- + +The omapfb driver implements arbitrary number of standard linux framebuffers. +These framebuffers can be routed flexibly to any overlays, thus allowing very +dynamic display architecture. + +The driver exports some omapfb specific ioctls, which are compatible with the +ioctls in the old driver. + +The rest of the non standard features are exported via sysfs. Whether the final +implementation will use sysfs, or ioctls, is still open. + +V4L2 drivers +------------ + +V4L2 is being implemented in TI. + +From omapdss point of view the V4L2 drivers should be similar to framebuffer +driver. + +Architecture +-------------------- + +Some clarification what the different components do: + + - Framebuffer is a memory area inside OMAP's SRAM/SDRAM that contains the + pixel data for the image. Framebuffer has width and height and color + depth. + - Overlay defines where the pixels are read from and where they go on the + screen. The overlay may be smaller than framebuffer, thus displaying only + part of the framebuffer. The position of the overlay may be changed if + the overlay is smaller than the display. + - Overlay manager combines the overlays in to one image and feeds them to + display. + - Display is the actual physical display device. + +A framebuffer can be connected to multiple overlays to show the same pixel data +on all of the overlays. Note that in this case the overlay input sizes must be +the same, but, in case of video overlays, the output size can be different. Any +framebuffer can be connected to any overlay. + +An overlay can be connected to one overlay manager. Also DISPC overlays can be +connected only to DISPC overlay managers, and virtual overlays can be only +connected to virtual overlays. + +An overlay manager can be connected to one display. There are certain +restrictions which kinds of displays an overlay manager can be connected: + + - DISPC TV overlay manager can be only connected to TV display. + - Virtual overlay managers can only be connected to DBI or DSI displays. + - DISPC LCD overlay manager can be connected to all displays, except TV + display. + +Sysfs +----- +The sysfs interface is mainly used for testing. I don't think sysfs +interface is the best for this in the final version, but I don't quite know +what would be the best interfaces for these things. + +The sysfs interface is divided to two parts: DSS and FB. + +/sys/class/graphics/fb? directory: +mirror 0=off, 1=on +rotate Rotation 0-3 for 0, 90, 180, 270 degrees +rotate_type 0 = DMA rotation, 1 = VRFB rotation +overlays List of overlay numbers to which framebuffer pixels go +phys_addr Physical address of the framebuffer +virt_addr Virtual address of the framebuffer +size Size of the framebuffer + +/sys/devices/platform/omapdss/overlay? directory: +enabled 0=off, 1=on +input_size width,height (ie. the framebuffer size) +manager Destination overlay manager name +name +output_size width,height +position x,y +screen_width width +global_alpha global alpha 0-255 0=transparent 255=opaque + +/sys/devices/platform/omapdss/manager? directory: +display Destination display +name +alpha_blending_enabled 0=off, 1=on +trans_key_enabled 0=off, 1=on +trans_key_type gfx-destination, video-source +trans_key_value transparency color key (RGB24) +default_color default background color (RGB24) + +/sys/devices/platform/omapdss/display? directory: +ctrl_name Controller name +mirror 0=off, 1=on +update_mode 0=off, 1=auto, 2=manual +enabled 0=off, 1=on +name +rotate Rotation 0-3 for 0, 90, 180, 270 degrees +timings Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw) + When writing, two special timings are accepted for tv-out: + "pal" and "ntsc" +panel_name +tear_elim Tearing elimination 0=off, 1=on + +There are also some debugfs files at /omapdss/ which show information +about clocks and registers. + +Examples +-------- + +The following definitions have been made for the examples below: + +ovl0=/sys/devices/platform/omapdss/overlay0 +ovl1=/sys/devices/platform/omapdss/overlay1 +ovl2=/sys/devices/platform/omapdss/overlay2 + +mgr0=/sys/devices/platform/omapdss/manager0 +mgr1=/sys/devices/platform/omapdss/manager1 + +lcd=/sys/devices/platform/omapdss/display0 +dvi=/sys/devices/platform/omapdss/display1 +tv=/sys/devices/platform/omapdss/display2 + +fb0=/sys/class/graphics/fb0 +fb1=/sys/class/graphics/fb1 +fb2=/sys/class/graphics/fb2 + +Default setup on OMAP3 SDP +-------------------------- + +Here's the default setup on OMAP3 SDP board. All planes go to LCD. DVI +and TV-out are not in use. The columns from left to right are: +framebuffers, overlays, overlay managers, displays. Framebuffers are +handled by omapfb, and the rest by the DSS. + +FB0 --- GFX -\ DVI +FB1 --- VID1 --+- LCD ---- LCD +FB2 --- VID2 -/ TV ----- TV + +Example: Switch from LCD to DVI +---------------------- + +w=`cat $dvi/timings | cut -d "," -f 2 | cut -d "/" -f 1` +h=`cat $dvi/timings | cut -d "," -f 3 | cut -d "/" -f 1` + +echo "0" > $lcd/enabled +echo "" > $mgr0/display +fbset -fb /dev/fb0 -xres $w -yres $h -vxres $w -vyres $h +# at this point you have to switch the dvi/lcd dip-switch from the omap board +echo "dvi" > $mgr0/display +echo "1" > $dvi/enabled + +After this the configuration looks like: + +FB0 --- GFX -\ -- DVI +FB1 --- VID1 --+- LCD -/ LCD +FB2 --- VID2 -/ TV ----- TV + +Example: Clone GFX overlay to LCD and TV +------------------------------- + +w=`cat $tv/timings | cut -d "," -f 2 | cut -d "/" -f 1` +h=`cat $tv/timings | cut -d "," -f 3 | cut -d "/" -f 1` + +echo "0" > $ovl0/enabled +echo "0" > $ovl1/enabled + +echo "" > $fb1/overlays +echo "0,1" > $fb0/overlays + +echo "$w,$h" > $ovl1/output_size +echo "tv" > $ovl1/manager + +echo "1" > $ovl0/enabled +echo "1" > $ovl1/enabled + +echo "1" > $tv/enabled + +After this the configuration looks like (only relevant parts shown): + +FB0 +-- GFX ---- LCD ---- LCD + \- VID1 ---- TV ---- TV + +Misc notes +---------- + +OMAP FB allocates the framebuffer memory using the OMAP VRAM allocator. + +Using DSI DPLL to generate pixel clock it is possible produce the pixel clock +of 86.5MHz (max possible), and with that you get 1280x1024@57 output from DVI. + +Rotation and mirroring currently only supports RGB565 and RGB8888 modes. VRFB +does not support mirroring. + +VRFB rotation requires much more memory than non-rotated framebuffer, so you +probably need to increase your vram setting before using VRFB rotation. Also, +many applications may not work with VRFB if they do not pay attention to all +framebuffer parameters. + +Kernel boot arguments +--------------------- + +vram= + - Amount of total VRAM to preallocate. For example, "10M". omapfb + allocates memory for framebuffers from VRAM. + +omapfb.mode=:[,...] + - Default video mode for specified displays. For example, + "dvi:800x400MR-24@60". See drivers/video/modedb.c. + There are also two special modes: "pal" and "ntsc" that + can be used to tv out. + +omapfb.vram=:[@][,...] + - VRAM allocated for a framebuffer. Normally omapfb allocates vram + depending on the display size. With this you can manually allocate + more or define the physical address of each framebuffer. For example, + "1:4M" to allocate 4M for fb1. + +omapfb.debug= + - Enable debug printing. You have to have OMAPFB debug support enabled + in kernel config. + +omapfb.test= + - Draw test pattern to framebuffer whenever framebuffer settings change. + You need to have OMAPFB debug support enabled in kernel config. + +omapfb.vrfb= + - Use VRFB rotation for all framebuffers. + +omapfb.rotate= + - Default rotation applied to all framebuffers. + 0 - 0 degree rotation + 1 - 90 degree rotation + 2 - 180 degree rotation + 3 - 270 degree rotation + +omapfb.mirror= + - Default mirror for all framebuffers. Only works with DMA rotation. + +omapdss.def_disp= + - Name of default display, to which all overlays will be connected. + Common examples are "lcd" or "tv". + +omapdss.debug= + - Enable debug printing. You have to have DSS debug support enabled in + kernel config. + +TODO +---- + +DSS locking + +Error checking +- Lots of checks are missing or implemented just as BUG() + +System DMA update for DSI +- Can be used for RGB16 and RGB24P modes. Probably not for RGB24U (how + to skip the empty byte?) + +OMAP1 support +- Not sure if needed + -- cgit v1.2.3 From 347a26860e2293b1347996876d3550499c7bb31f Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Wed, 9 Dec 2009 01:36:24 +0000 Subject: thinkpad-acpi: issue backlight class events Take advantage of the new events capabilities of the backlight class to notify userspace of backlight changes. This depends on "backlight: Allow drivers to update the core, and generate events on changes", by Matthew Garrett. Signed-off-by: Henrique de Moraes Holschuh Cc: Matthew Garrett Cc: Richard Purdie Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 51 ++++++++------------------------- drivers/platform/x86/thinkpad_acpi.c | 24 +++++++++++++++- 2 files changed, 35 insertions(+), 40 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index aafcaa634191..f5056c7fb5be 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -460,6 +460,8 @@ event code Key Notes For Lenovo ThinkPads with a new BIOS, it has to be handled either by the ACPI OSI, or by userspace. + The driver does the right thing, + never mess with this. 0x1011 0x10 FN+END Brightness down. See brightness up for details. @@ -582,46 +584,15 @@ with hotkey_report_mode. Brightness hotkey notes: -These are the current sane choices for brightness key mapping in -thinkpad-acpi: +Don't mess with the brightness hotkeys in a Thinkpad. If you want +notifications for OSD, use the sysfs backlight class event support. -For IBM and Lenovo models *without* ACPI backlight control (the ones on -which thinkpad-acpi will autoload its backlight interface by default, -and on which ACPI video does not export a backlight interface): - -1. Don't enable or map the brightness hotkeys in thinkpad-acpi, as - these older firmware versions unfortunately won't respect the hotkey - mask for brightness keys anyway, and always reacts to them. This - usually work fine, unless X.org drivers are doing something to block - the BIOS. In that case, use (3) below. This is the default mode of - operation. - -2. Enable the hotkeys, but map them to something else that is NOT - KEY_BRIGHTNESS_UP/DOWN or any other keycode that would cause - userspace to try to change the backlight level, and use that as an - on-screen-display hint. - -3. IF AND ONLY IF X.org drivers find a way to block the firmware from - automatically changing the brightness, enable the hotkeys and map - them to KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN, and feed that to - something that calls xbacklight. thinkpad-acpi will not be able to - change brightness in that case either, so you should disable its - backlight interface. - -For Lenovo models *with* ACPI backlight control: - -1. Load up ACPI video and use that. ACPI video will report ACPI - events for brightness change keys. Do not mess with thinkpad-acpi - defaults in this case. thinkpad-acpi should not have anything to do - with backlight events in a scenario where ACPI video is loaded: - brightness hotkeys must be disabled, and the backlight interface is - to be kept disabled as well. This is the default mode of operation. - -2. Do *NOT* load up ACPI video, enable the hotkeys in thinkpad-acpi, - and map them to KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN. Process - these keys on userspace somehow (e.g. by calling xbacklight). - The driver will do this automatically if it detects that ACPI video - has been disabled. +The driver will issue KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN events +automatically for the cases were userspace has to do something to +implement brightness changes. When you override these events, you will +either fail to handle properly the ThinkPads that require explicit +action to change backlight brightness, or the ThinkPads that require +that no action be taken to work properly. Bluetooth @@ -1465,3 +1436,5 @@ Sysfs interface changelog: and it is always able to disable hot keys. Very old thinkpads are properly supported. hotkey_bios_mask is deprecated and marked for removal. + +0x020600: Marker for backlight change event support. diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 6160813d906a..44061367a10d 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -22,7 +22,7 @@ */ #define TPACPI_VERSION "0.23" -#define TPACPI_SYSFS_VERSION 0x020500 +#define TPACPI_SYSFS_VERSION 0x020600 /* * Changelog: @@ -6073,6 +6073,12 @@ static int brightness_get(struct backlight_device *bd) return status & TP_EC_BACKLIGHT_LVLMSK; } +static void tpacpi_brightness_notify_change(void) +{ + backlight_force_update(ibm_backlight_device, + BACKLIGHT_UPDATE_HOTKEY); +} + static struct backlight_ops ibm_backlight_data = { .get_brightness = brightness_get, .update_status = brightness_update_status, @@ -6227,6 +6233,12 @@ static int __init brightness_init(struct ibm_init_struct *iibm) ibm_backlight_device->props.brightness = b & TP_EC_BACKLIGHT_LVLMSK; backlight_update_status(ibm_backlight_device); + vdbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_BRGHT, + "brightness: registering brightness hotkeys " + "as change notification\n"); + tpacpi_hotkey_driver_mask_set(hotkey_driver_mask + | TP_ACPI_HKEY_BRGHTUP_MASK + | TP_ACPI_HKEY_BRGHTDWN_MASK);; return 0; } @@ -6303,6 +6315,9 @@ static int brightness_write(char *buf) * Doing it this way makes the syscall restartable in case of EINTR */ rc = brightness_set(level); + if (!rc && ibm_backlight_device) + backlight_force_update(ibm_backlight_device, + BACKLIGHT_UPDATE_SYSFS); return (rc == -EINTR)? -ERESTARTSYS : rc; } @@ -7702,6 +7717,13 @@ static struct ibm_struct fan_driver_data = { */ static void tpacpi_driver_event(const unsigned int hkey_event) { + if (ibm_backlight_device) { + switch (hkey_event) { + case TP_HKEY_EV_BRGHT_UP: + case TP_HKEY_EV_BRGHT_DOWN: + tpacpi_brightness_notify_change(); + } + } } -- cgit v1.2.3 From 728900f6fa7142e07a67d10d862bcb774d7a3493 Mon Sep 17 00:00:00 2001 From: Corentin Chary Date: Thu, 3 Dec 2009 07:45:17 +0000 Subject: asus-laptop: schedule display_get and lcd_switch for removal Signed-off-by: Corentin Chary Signed-off-by: Len Brown --- Documentation/feature-removal-schedule.txt | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index bc693fffabe0..d1c9e1763cfb 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -489,3 +489,23 @@ Why: With the recent innovations in CPU hardware acceleration technologies Who: Alok N Kataria ---------------------------- + +What: Support for lcd_switch and display_get in asus-laptop driver +When: March 2010 +Why: These two features use non-standard interfaces. There are the + only features that really need multiple path to guess what's + the right method name on a specific laptop. + + Removing them will allow to remove a lot of code an significantly + clean the drivers. + + This will affect the backlight code which won't be able to know + if the backlight is on or off. The platform display file will also be + write only (like the one in eeepc-laptop). + + This should'nt affect a lot of user because they usually know + when their display is on or off. + +Who: Corentin Chary + +---------------------------- -- cgit v1.2.3 From f7111821e51a58ee0f548f256743121ce1b365ae Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 9 Dec 2009 14:21:36 -0800 Subject: IB: Fix typo in ipoib.txt Delete extra words in "is to takes advantage of". Signed-off-by: Bart Van Assche Signed-off-by: Roland Dreier --- Documentation/infiniband/ipoib.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 6d40f00b358c..64eeb55d0c09 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -36,11 +36,11 @@ Datagram vs Connected modes fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes. In connected mode, the IB RC (Reliable Connected) transport is used. - Connected mode is to takes advantage of the connected nature of the - IB transport and allows an MTU up to the maximal IP packet size of - 64K, which reduces the number of IP packets needed for handling - large UDP datagrams, TCP segments, etc and increases the performance - for large messages. + Connected mode takes advantage of the connected nature of the IB + transport and allows an MTU up to the maximal IP packet size of 64K, + which reduces the number of IP packets needed for handling large UDP + datagrams, TCP segments, etc and increases the performance for large + messages. In connected mode, the interface's UD QP is still used for multicast and communication with peers that don't support connected mode. In -- cgit v1.2.3 From 9f249162fbf1aea1335e13c57fd5355d00e07a47 Mon Sep 17 00:00:00 2001 From: Thadeu Lima de Souza Cascardo Date: Mon, 27 Jul 2009 13:26:32 -0300 Subject: trivial: some small fixes in exofs documentation Add exofs.txt to filesystems Documentation index and fix some typos, identation and grammar. Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: Boaz Harrosh --- Documentation/filesystems/00-INDEX | 2 ++ Documentation/filesystems/exofs.txt | 23 ++++++++++++----------- 2 files changed, 14 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index f15621ee5599..7001782ab932 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -36,6 +36,8 @@ dnotify.txt - info about directory notification in Linux. ecryptfs.txt - docs on eCryptfs: stacked cryptographic filesystem for Linux. +exofs.txt + - info, usage, mount options, design about EXOFS. ext2.txt - info, mount options and specifications for the Ext2 filesystem. ext3.txt diff --git a/Documentation/filesystems/exofs.txt b/Documentation/filesystems/exofs.txt index 0ced74c2f73c..abd2a9b5b787 100644 --- a/Documentation/filesystems/exofs.txt +++ b/Documentation/filesystems/exofs.txt @@ -60,13 +60,13 @@ USAGE mkfs.exofs --pid=65536 --format /dev/osd0 - The --format is optional if not specified no OSD_FORMAT will be - preformed and a clean file system will be created in the specified pid, + The --format is optional. If not specified, no OSD_FORMAT will be + performed and a clean file system will be created in the specified pid, in the available space of the target. (Use --format=size_in_meg to limit the total LUN space available) - If pid already exist it will be deleted and a new one will be created in it's - place. Be careful. + If pid already exists, it will be deleted and a new one will be created in + its place. Be careful. An exofs lives inside a single OSD partition. You can create multiple exofs filesystems on the same device using multiple pids. @@ -81,7 +81,7 @@ USAGE 7. For reference (See do-exofs example script): do-exofs start - an example of how to perform the above steps. - do-exofs stop - an example of how to unmount the file system. + do-exofs stop - an example of how to unmount the file system. do-exofs format - an example of how to format and mkfs a new exofs. 8. Extra compilation flags (uncomment in fs/exofs/Kbuild): @@ -104,8 +104,8 @@ Where: exofs specific options: Options are separated by commas (,) pid= - The partition number to mount/create as container of the filesystem. - This option is mandatory - to= - Timeout in ticks for a single command + This option is mandatory. + to= - Timeout in ticks for a single command. default is (60 * HZ) [for debugging only] =============================================================================== @@ -116,7 +116,7 @@ DESIGN with a special ID (defined in common.h). Information included in the file system control block is used to fill the in-memory superblock structure at mount time. This object is created before - the file system is used by mkexofs.c It contains information such as: + the file system is used by mkexofs.c. It contains information such as: - The file system's magic number - The next inode number to be allocated @@ -134,8 +134,8 @@ DESIGN attributes. This applies to both regular files and other types (directories, device files, symlinks, etc.). -* Credentials are generated per object (inode and superblock) when they is - created in memory (read off disk or created). The credential works for all +* Credentials are generated per object (inode and superblock) when they are + created in memory (read from disk or created). The credential works for all operations and is used as long as the object remains in memory. * Async OSD operations are used whenever possible, but the target may execute @@ -145,7 +145,8 @@ DESIGN from executing in reverse order: - The following are handled with the OBJ_CREATED and OBJ_2BCREATED flags. OBJ_CREATED is set when we know the object exists on the OSD - - in create's callback function, and when we successfully do a read_inode. + in create's callback function, and when we successfully do a + read_inode. OBJ_2BCREATED is set in the beginning of the create function, so we know that we should wait. - create/delete: delete should wait until the object is created -- cgit v1.2.3 From 94004ed726f38a841cc51f97c4a3f9eda9fbd0d9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 30 Sep 2009 22:16:33 +0200 Subject: kill wait_on_page_writeback_range All callers really want the more logical filemap_fdatawait_range interface, so convert them to use it and merge wait_on_page_writeback_range into filemap_fdatawait_range. Signed-off-by: Christoph Hellwig Signed-off-by: Jan Kara --- Documentation/filesystems/vfs.txt | 2 +- fs/jbd2/commit.c | 2 +- fs/sync.c | 8 ++----- include/linux/fs.h | 2 -- mm/filemap.c | 49 +++++++++++---------------------------- 5 files changed, 18 insertions(+), 45 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 623f094c9d8d..3de2f32edd90 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -472,7 +472,7 @@ __sync_single_inode) to check if ->writepages has been successful in writing out the whole address_space. The Writeback tag is used by filemap*wait* and sync_page* functions, -via wait_on_page_writeback_range, to wait for all writeback to +via filemap_fdatawait_range, to wait for all writeback to complete. While waiting ->sync_page (if defined) will be called on each page that is found to require writeback. diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index d4cfd6d2779e..c5edc13ccdd0 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -286,7 +286,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal, if (err) { /* * Because AS_EIO is cleared by - * wait_on_page_writeback_range(), set it again so + * filemap_fdatawait_range(), set it again so * that user process can get -EIO from fsync(). */ set_bit(AS_EIO, diff --git a/fs/sync.c b/fs/sync.c index b75ca68dc081..36752a683481 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -453,9 +453,7 @@ int do_sync_mapping_range(struct address_space *mapping, loff_t offset, ret = 0; if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) { - ret = wait_on_page_writeback_range(mapping, - offset >> PAGE_CACHE_SHIFT, - endbyte >> PAGE_CACHE_SHIFT); + ret = filemap_fdatawait_range(mapping, offset, endbyte); if (ret < 0) goto out; } @@ -468,9 +466,7 @@ int do_sync_mapping_range(struct address_space *mapping, loff_t offset, } if (flags & SYNC_FILE_RANGE_WAIT_AFTER) { - ret = wait_on_page_writeback_range(mapping, - offset >> PAGE_CACHE_SHIFT, - endbyte >> PAGE_CACHE_SHIFT); + ret = filemap_fdatawait_range(mapping, offset, endbyte); } out: return ret; diff --git a/include/linux/fs.h b/include/linux/fs.h index 891f7d642e5c..a057f48eb156 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2091,8 +2091,6 @@ extern int filemap_fdatawait_range(struct address_space *, loff_t lstart, extern int filemap_write_and_wait(struct address_space *mapping); extern int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend); -extern int wait_on_page_writeback_range(struct address_space *mapping, - pgoff_t start, pgoff_t end); extern int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end, int sync_mode); extern int filemap_fdatawrite_range(struct address_space *mapping, diff --git a/mm/filemap.c b/mm/filemap.c index c3d3506ecaba..8b4d88f9249e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -260,27 +260,27 @@ int filemap_flush(struct address_space *mapping) EXPORT_SYMBOL(filemap_flush); /** - * wait_on_page_writeback_range - wait for writeback to complete - * @mapping: target address_space - * @start: beginning page index - * @end: ending page index + * filemap_fdatawait_range - wait for writeback to complete + * @mapping: address space structure to wait for + * @start_byte: offset in bytes where the range starts + * @end_byte: offset in bytes where the range ends (inclusive) * - * Wait for writeback to complete against pages indexed by start->end - * inclusive + * Walk the list of under-writeback pages of the given address space + * in the given range and wait for all of them. */ -int wait_on_page_writeback_range(struct address_space *mapping, - pgoff_t start, pgoff_t end) +int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, + loff_t end_byte) { + pgoff_t index = start_byte >> PAGE_CACHE_SHIFT; + pgoff_t end = end_byte >> PAGE_CACHE_SHIFT; struct pagevec pvec; int nr_pages; int ret = 0; - pgoff_t index; - if (end < start) + if (end_byte < start_byte) return 0; pagevec_init(&pvec, 0); - index = start; while ((index <= end) && (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_WRITEBACK, @@ -310,25 +310,6 @@ int wait_on_page_writeback_range(struct address_space *mapping, return ret; } - -/** - * filemap_fdatawait_range - wait for all under-writeback pages to complete in a given range - * @mapping: address space structure to wait for - * @start: offset in bytes where the range starts - * @end: offset in bytes where the range ends (inclusive) - * - * Walk the list of under-writeback pages of the given address space - * in the given range and wait for all of them. - * - * This is just a simple wrapper so that callers don't have to convert offsets - * to page indexes themselves - */ -int filemap_fdatawait_range(struct address_space *mapping, loff_t start, - loff_t end) -{ - return wait_on_page_writeback_range(mapping, start >> PAGE_CACHE_SHIFT, - end >> PAGE_CACHE_SHIFT); -} EXPORT_SYMBOL(filemap_fdatawait_range); /** @@ -345,8 +326,7 @@ int filemap_fdatawait(struct address_space *mapping) if (i_size == 0) return 0; - return wait_on_page_writeback_range(mapping, 0, - (i_size - 1) >> PAGE_CACHE_SHIFT); + return filemap_fdatawait_range(mapping, 0, i_size - 1); } EXPORT_SYMBOL(filemap_fdatawait); @@ -393,9 +373,8 @@ int filemap_write_and_wait_range(struct address_space *mapping, WB_SYNC_ALL); /* See comment of filemap_write_and_wait() */ if (err != -EIO) { - int err2 = wait_on_page_writeback_range(mapping, - lstart >> PAGE_CACHE_SHIFT, - lend >> PAGE_CACHE_SHIFT); + int err2 = filemap_fdatawait_range(mapping, + lstart, lend); if (!err) err = err2; } -- cgit v1.2.3 From dee1d3b6270a7cf5cc65c493a2ab4ebaad1a1caf Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Mon, 16 Nov 2009 16:50:49 -0600 Subject: ext3: make "norecovery" an alias for "noload" Users on the list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext3. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen Signed-off-by: Jan Kara --- Documentation/filesystems/ext3.txt | 4 ++-- fs/ext3/super.c | 4 ++++ 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index 05d5cf1d743f..867c5b50cb42 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt @@ -32,8 +32,8 @@ journal_dev=devnum When the external journal device's major/minor numbers identified through its new major/minor numbers encoded in devnum. -noload Don't load the journal on mounting. Note that this forces - mount of inconsistent filesystem, which can lead to +norecovery Don't load the journal on mounting. Note that this forces +noload mount of inconsistent filesystem, which can lead to various problems. data=journal All data are committed into the journal prior to being diff --git a/fs/ext3/super.c b/fs/ext3/super.c index ca3068fd2346..c1d128d765b6 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -636,6 +636,9 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) if (test_opt(sb, DATA_ERR_ABORT)) seq_puts(seq, ",data_err=abort"); + if (test_opt(sb, NOLOAD)) + seq_puts(seq, ",norecovery"); + ext3_show_quota_options(seq, sb); return 0; @@ -818,6 +821,7 @@ static const match_table_t tokens = { {Opt_reservation, "reservation"}, {Opt_noreservation, "noreservation"}, {Opt_noload, "noload"}, + {Opt_noload, "norecovery"}, {Opt_nobh, "nobh"}, {Opt_bh, "bh"}, {Opt_commit, "commit=%u"}, -- cgit v1.2.3 From d698aa4500aa3ca9559142060caf0f79da998744 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Thu, 10 Dec 2009 23:52:30 +0000 Subject: dm snapshot: add merge target The snapshot-merge target allows a snapshot to be merged back into the snapshot's origin device. One anticipated use of snapshot merging is the rollback of filesystems to back out problematic system upgrades. This patch adds snapshot-merge target management to both dm_snapshot_init() and dm_snapshot_exit(). As an initial place-holder, snapshot-merge is identical to the snapshot target. Documentation is provided. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Alasdair G Kergon --- Documentation/device-mapper/snapshot.txt | 60 +++++++++++++++++++++++++++++--- drivers/md/dm-snap.c | 53 +++++++++++++++++++++------- 2 files changed, 96 insertions(+), 17 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt index a5009c8300f3..e3a77b215135 100644 --- a/Documentation/device-mapper/snapshot.txt +++ b/Documentation/device-mapper/snapshot.txt @@ -8,13 +8,19 @@ the block device which are also writable without interfering with the original content; *) To create device "forks", i.e. multiple different versions of the same data stream. +*) To merge a snapshot of a block device back into the snapshot's origin +device. +In the first two cases, dm copies only the chunks of data that get +changed and uses a separate copy-on-write (COW) block device for +storage. -In both cases, dm copies only the chunks of data that get changed and -uses a separate copy-on-write (COW) block device for storage. +For snapshot merge the contents of the COW storage are merged back into +the origin device. -There are two dm targets available: snapshot and snapshot-origin. +There are three dm targets available: +snapshot, snapshot-origin, and snapshot-merge. *) snapshot-origin @@ -40,8 +46,25 @@ The difference is that for transient snapshots less metadata must be saved on disk - they can be kept in memory by the kernel. -How this is used by LVM2 -======================== +* snapshot-merge + +takes the same table arguments as the snapshot target except it only +works with persistent snapshots. This target assumes the role of the +"snapshot-origin" target and must not be loaded if the "snapshot-origin" +is still present for . + +Creates a merging snapshot that takes control of the changed chunks +stored in the of an existing snapshot, through a handover +procedure, and merges these chunks back into the . Once merging +has started (in the background) the may be opened and the merge +will continue while I/O is flowing to it. Changes to the are +deferred until the merging snapshot's corresponding chunk(s) have been +merged. Once merging has started the snapshot device, associated with +the "snapshot" target, will return -EIO when accessed. + + +How snapshot is used by LVM2 +============================ When you create the first LVM2 snapshot of a volume, four dm devices are used: 1) a device containing the original mapping table of the source volume; @@ -72,3 +95,30 @@ brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base + +How snapshot-merge is used by LVM2 +================================== +A merging snapshot assumes the role of the "snapshot-origin" while +merging. As such the "snapshot-origin" is replaced with +"snapshot-merge". The "-real" device is not changed and the "-cow" +device is renamed to -cow to aid LVM2's cleanup of the +merging snapshot after it completes. The "snapshot" that hands over its +COW device to the "snapshot-merge" is deactivated (unless using lvchange +--refresh); but if it is left active it will simply return I/O errors. + +A snapshot will merge into its origin with the following command: + +lvconvert --merge volumeGroup/snap + +we'll now have this situation: + +# dmsetup table|grep volumeGroup + +volumeGroup-base-real: 0 2097152 linear 8:19 384 +volumeGroup-base-cow: 0 204800 linear 8:19 2097536 +volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16 + +# ls -lL /dev/mapper/volumeGroup-* +brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real +brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow +brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 288994ee7142..446827f98236 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -25,6 +25,11 @@ #define DM_MSG_PREFIX "snapshots" +static const char dm_snapshot_merge_target_name[] = "snapshot-merge"; + +#define dm_target_is_snapshot_merge(ti) \ + ((ti)->type->name == dm_snapshot_merge_target_name) + /* * The percentage increment we will wake up users at */ @@ -1743,6 +1748,21 @@ static struct target_type snapshot_target = { .iterate_devices = snapshot_iterate_devices, }; +static struct target_type merge_target = { + .name = dm_snapshot_merge_target_name, + .version = {1, 0, 0}, + .module = THIS_MODULE, + .ctr = snapshot_ctr, + .dtr = snapshot_dtr, + .map = snapshot_map, + .end_io = snapshot_end_io, + .postsuspend = snapshot_postsuspend, + .preresume = snapshot_preresume, + .resume = snapshot_resume, + .status = snapshot_status, + .iterate_devices = snapshot_iterate_devices, +}; + static int __init dm_snapshot_init(void) { int r; @@ -1754,7 +1774,7 @@ static int __init dm_snapshot_init(void) } r = dm_register_target(&snapshot_target); - if (r) { + if (r < 0) { DMERR("snapshot target register failed %d", r); goto bad_register_snapshot_target; } @@ -1762,34 +1782,40 @@ static int __init dm_snapshot_init(void) r = dm_register_target(&origin_target); if (r < 0) { DMERR("Origin target register failed %d", r); - goto bad1; + goto bad_register_origin_target; + } + + r = dm_register_target(&merge_target); + if (r < 0) { + DMERR("Merge target register failed %d", r); + goto bad_register_merge_target; } r = init_origin_hash(); if (r) { DMERR("init_origin_hash failed."); - goto bad2; + goto bad_origin_hash; } exception_cache = KMEM_CACHE(dm_exception, 0); if (!exception_cache) { DMERR("Couldn't create exception cache."); r = -ENOMEM; - goto bad3; + goto bad_exception_cache; } pending_cache = KMEM_CACHE(dm_snap_pending_exception, 0); if (!pending_cache) { DMERR("Couldn't create pending cache."); r = -ENOMEM; - goto bad4; + goto bad_pending_cache; } tracked_chunk_cache = KMEM_CACHE(dm_snap_tracked_chunk, 0); if (!tracked_chunk_cache) { DMERR("Couldn't create cache to track chunks in use."); r = -ENOMEM; - goto bad5; + goto bad_tracked_chunk_cache; } ksnapd = create_singlethread_workqueue("ksnapd"); @@ -1803,19 +1829,21 @@ static int __init dm_snapshot_init(void) bad_pending_pool: kmem_cache_destroy(tracked_chunk_cache); -bad5: +bad_tracked_chunk_cache: kmem_cache_destroy(pending_cache); -bad4: +bad_pending_cache: kmem_cache_destroy(exception_cache); -bad3: +bad_exception_cache: exit_origin_hash(); -bad2: +bad_origin_hash: + dm_unregister_target(&merge_target); +bad_register_merge_target: dm_unregister_target(&origin_target); -bad1: +bad_register_origin_target: dm_unregister_target(&snapshot_target); - bad_register_snapshot_target: dm_exception_store_exit(); + return r; } @@ -1825,6 +1853,7 @@ static void __exit dm_snapshot_exit(void) dm_unregister_target(&snapshot_target); dm_unregister_target(&origin_target); + dm_unregister_target(&merge_target); exit_origin_hash(); kmem_cache_destroy(pending_cache); -- cgit v1.2.3 From a1a541d86f50a9957beeedb122a035870d602647 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Wed, 2 Dec 2009 13:31:00 +0800 Subject: ACPI: support customizing ACPI control methods at runtime Introduce a new debugfs I/F (/sys/kernel/debug/acpi/custom_method) for ACPI, which can be used to customize the ACPI control methods at runtime. We can use this to debug the AML code level bugs instead of overriding the whole DSDT table, without rebuilding/rebooting kernel any more. Detailed description about how to use this debugfs I/F is stated in Documentation/acpi/method-customizing.txt Signed-off-by: Zhang Rui Signed-off-by: Len Brown --- Documentation/acpi/method-customizing.txt | 66 ++++++++++++++++++++++++ drivers/acpi/debug.c | 83 ++++++++++++++++++++++++++++++- 2 files changed, 148 insertions(+), 1 deletion(-) create mode 100644 Documentation/acpi/method-customizing.txt (limited to 'Documentation') diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt new file mode 100644 index 000000000000..e628cd23ca80 --- /dev/null +++ b/Documentation/acpi/method-customizing.txt @@ -0,0 +1,66 @@ +Linux ACPI Custom Control Method How To +======================================= + +Written by Zhang Rui + + +Linux supports customizing ACPI control methods at runtime. + +Users can use this to +1. override an existing method which may not work correctly, + or just for debugging purposes. +2. insert a completely new method in order to create a missing + method such as _OFF, _ON, _STA, _INI, etc. +For these cases, it is far simpler to dynamically install a single +control method rather than override the entire DSDT, because kernel +rebuild/reboot is not needed and test result can be got in minutes. + +Note: Only ACPI METHOD can be overridden, any other object types like + "Device", "OperationRegion", are not recognized. +Note: The same ACPI control method can be overridden for many times, + and it's always the latest one that used by Linux/kernel. + +1. override an existing method + a) get the ACPI table via ACPI sysfs I/F. e.g. to get the DSDT, + just run "cat /sys/firmware/acpi/tables/DSDT > /tmp/dsdt.dat" + b) disassemble the table by running "iasl -d dsdt.dat". + c) rewrite the ASL code of the method and save it in a new file, + d) package the new file (psr.asl) to an ACPI table format. + Here is an example of a customized \_SB._AC._PSR method, + + DefinitionBlock ("", "SSDT", 1, "", "", 0x20080715) + { + External (ACON) + + Method (\_SB_.AC._PSR, 0, NotSerialized) + { + Store ("In AC _PSR", Debug) + Return (ACON) + } + } + Note that the full pathname of the method in ACPI namespace + should be used. + And remember to use "External" to declare external objects. + e) assemble the file to generate the AML code of the method. + e.g. "iasl psr.asl" (psr.aml is generated as a result) + f) mount debugfs by "mount -t debugfs none /sys/kernel/debug" + g) override the old method via the debugfs by running + "cat /tmp/psr.aml > /sys/kernel/debug/acpi/custom_method" + +2. insert a new method + This is easier than overriding an existing method. + We just need to create the ASL code of the method we want to + insert and then follow the step c) ~ g) in section 1. + +3. undo your changes + The "undo" operation is not supported for a new inserted method + right now, i.e. we can not remove a method currently. + For an overrided method, in order to undo your changes, please + save a copy of the method original ASL code in step c) section 1, + and redo step c) ~ g) to override the method with the original one. + + +Note: We can use a kernel with multiple custom ACPI method running, + But each individual write to debugfs can implement a SINGLE + method override. i.e. if we want to insert/override multiple + ACPI methods, we need to redo step c) ~ g) for multiple times. diff --git a/drivers/acpi/debug.c b/drivers/acpi/debug.c index 8a690c3b8e23..0bedd1554f2f 100644 --- a/drivers/acpi/debug.c +++ b/drivers/acpi/debug.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -195,6 +196,79 @@ static int param_get_trace_state(char *buffer, struct kernel_param *kp) module_param_call(trace_state, param_set_trace_state, param_get_trace_state, NULL, 0644); +/* -------------------------------------------------------------------------- + DebugFS Interface + -------------------------------------------------------------------------- */ + +static ssize_t cm_write(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + static char *buf; + static int uncopied_bytes; + struct acpi_table_header table; + acpi_status status; + + if (!(*ppos)) { + /* parse the table header to get the table length */ + if (count <= sizeof(struct acpi_table_header)) + return -EINVAL; + if (copy_from_user(&table, user_buf, + sizeof(struct acpi_table_header))) + return -EFAULT; + uncopied_bytes = table.length; + buf = kzalloc(uncopied_bytes, GFP_KERNEL); + if (!buf) + return -ENOMEM; + } + + if (uncopied_bytes < count) { + kfree(buf); + return -EINVAL; + } + + if (copy_from_user(buf + (*ppos), user_buf, count)) { + kfree(buf); + return -EFAULT; + } + + uncopied_bytes -= count; + *ppos += count; + + if (!uncopied_bytes) { + status = acpi_install_method(buf); + kfree(buf); + if (ACPI_FAILURE(status)) + return -EINVAL; + } + + return count; +} + +static const struct file_operations cm_fops = { + .write = cm_write, +}; + +static int acpi_debugfs_init(void) +{ + struct dentry *acpi_dir, *cm_dentry; + + acpi_dir = debugfs_create_dir("acpi", NULL); + if (!acpi_dir) + goto err; + + cm_dentry = debugfs_create_file("custom_method", S_IWUGO, + acpi_dir, NULL, &cm_fops); + if (!cm_dentry) + goto err; + + return 0; + +err: + if (acpi_dir) + debugfs_remove(acpi_dir); + return -EINVAL; +} + /* -------------------------------------------------------------------------- FS Interface (/proc) -------------------------------------------------------------------------- */ @@ -286,7 +360,7 @@ static const struct file_operations acpi_system_debug_proc_fops = { }; #endif -int __init acpi_debug_init(void) +int __init acpi_procfs_init(void) { #ifdef CONFIG_ACPI_PROCFS struct proc_dir_entry *entry; @@ -321,3 +395,10 @@ int __init acpi_debug_init(void) return 0; #endif } + +int __init acpi_debug_init(void) +{ + acpi_debugfs_init(); + acpi_procfs_init(); + return 0; +} -- cgit v1.2.3 From 08d3c18e6674c5d46e4333a462b1e2e4c4ded1d4 Mon Sep 17 00:00:00 2001 From: Julie Zhu Date: Mon, 21 Sep 2009 16:08:19 -0600 Subject: USB: Add support for Xilinx USB host controller Add bus glue driver for Xilinx USB host controller. The controller can be configured as HS only or HS/FS hybrid. The driver uses the device tree file to configure the driver according to the setting in the hardware system. This driver has been tested with usbtest using the NET2280 PCI card. Signed-off-by: Julie Zhu Signed-off-by: John Linn Acked-by: Grant Likely Signed-off-by: Greg Kroah-Hartman --- Documentation/powerpc/dts-bindings/xilinx.txt | 11 + drivers/usb/host/Kconfig | 15 +- drivers/usb/host/ehci-hcd.c | 5 + drivers/usb/host/ehci-xilinx-of.c | 300 ++++++++++++++++++++++++++ 4 files changed, 329 insertions(+), 2 deletions(-) create mode 100644 drivers/usb/host/ehci-xilinx-of.c (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/xilinx.txt b/Documentation/powerpc/dts-bindings/xilinx.txt index 80339fe4300b..ea68046bb9cb 100644 --- a/Documentation/powerpc/dts-bindings/xilinx.txt +++ b/Documentation/powerpc/dts-bindings/xilinx.txt @@ -292,4 +292,15 @@ - reg-offset : A value of 3 is required - reg-shift : A value of 2 is required + vii) Xilinx USB Host controller + + The Xilinx USB host controller is EHCI compatible but with a different + base address for the EHCI registers, and it is always a big-endian + USB Host controller. The hardware can be configured as high speed only, + or high speed/full speed hybrid. + + Required properties: + - xlnx,support-usb-fs: A value 0 means the core is built as high speed + only. A value 1 means the core also supports + full speed devices. diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig index 9b43b226817f..6d97e039cdbb 100644 --- a/drivers/usb/host/Kconfig +++ b/drivers/usb/host/Kconfig @@ -90,14 +90,25 @@ config USB_EHCI_TT_NEWSCHED config USB_EHCI_BIG_ENDIAN_MMIO bool - depends on USB_EHCI_HCD && (PPC_CELLEB || PPC_PS3 || 440EPX || ARCH_IXP4XX) + depends on USB_EHCI_HCD && (PPC_CELLEB || PPC_PS3 || 440EPX || ARCH_IXP4XX || XPS_USB_HCD_XILINX) default y config USB_EHCI_BIG_ENDIAN_DESC bool - depends on USB_EHCI_HCD && (440EPX || ARCH_IXP4XX) + depends on USB_EHCI_HCD && (440EPX || ARCH_IXP4XX || XPS_USB_HCD_XILINX) default y +config XPS_USB_HCD_XILINX + bool "Use Xilinx usb host EHCI controller core" + depends on USB_EHCI_HCD && (PPC32 || MICROBLAZE) + select USB_EHCI_BIG_ENDIAN_DESC + select USB_EHCI_BIG_ENDIAN_MMIO + ---help--- + Xilinx xps USB host controller core is EHCI compilant and has + transaction translator built-in. It can be configured to either + support both high speed and full speed devices, or high speed + devices only. + config USB_EHCI_FSL bool "Support for Freescale on-chip EHCI USB controller" depends on USB_EHCI_HCD && FSL_SOC diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index d8f4aaa616f2..e2afb2b5faf8 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -1120,6 +1120,11 @@ MODULE_LICENSE ("GPL"); #define OF_PLATFORM_DRIVER ehci_hcd_ppc_of_driver #endif +#ifdef CONFIG_XPS_USB_HCD_XILINX +#include "ehci-xilinx-of.c" +#define OF_PLATFORM_DRIVER ehci_hcd_xilinx_of_driver +#endif + #ifdef CONFIG_PLAT_ORION #include "ehci-orion.c" #define PLATFORM_DRIVER ehci_orion_driver diff --git a/drivers/usb/host/ehci-xilinx-of.c b/drivers/usb/host/ehci-xilinx-of.c new file mode 100644 index 000000000000..a5861531ad3e --- /dev/null +++ b/drivers/usb/host/ehci-xilinx-of.c @@ -0,0 +1,300 @@ +/* + * EHCI HCD (Host Controller Driver) for USB. + * + * Bus Glue for Xilinx EHCI core on the of_platform bus + * + * Copyright (c) 2009 Xilinx, Inc. + * + * Based on "ehci-ppc-of.c" by Valentine Barshak + * and "ehci-ppc-soc.c" by Stefan Roese + * and "ohci-ppc-of.c" by Sylvain Munaut + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + * + */ + +#include + +#include +#include + +/** + * ehci_xilinx_of_setup - Initialize the device for ehci_reset() + * @hcd: Pointer to the usb_hcd device to which the host controller bound + * + * called during probe() after chip reset completes. + */ +static int ehci_xilinx_of_setup(struct usb_hcd *hcd) +{ + struct ehci_hcd *ehci = hcd_to_ehci(hcd); + int retval; + + retval = ehci_halt(ehci); + if (retval) + return retval; + + retval = ehci_init(hcd); + if (retval) + return retval; + + ehci->sbrn = 0x20; + + return ehci_reset(ehci); +} + +/** + * ehci_xilinx_port_handed_over - hand the port out if failed to enable it + * @hcd: Pointer to the usb_hcd device to which the host controller bound + * @portnum:Port number to which the device is attached. + * + * This function is used as a place to tell the user that the Xilinx USB host + * controller does support LS devices. And in an HS only configuration, it + * does not support FS devices either. It is hoped that this can help a + * confused user. + * + * There are cases when the host controller fails to enable the port due to, + * for example, insufficient power that can be supplied to the device from + * the USB bus. In those cases, the messages printed here are not helpful. + */ +static int ehci_xilinx_port_handed_over(struct usb_hcd *hcd, int portnum) +{ + dev_warn(hcd->self.controller, "port %d cannot be enabled\n", portnum); + if (hcd->has_tt) { + dev_warn(hcd->self.controller, + "Maybe you have connected a low speed device?\n"); + + dev_warn(hcd->self.controller, + "We do not support low speed devices\n"); + } else { + dev_warn(hcd->self.controller, + "Maybe your device is not a high speed device?\n"); + dev_warn(hcd->self.controller, + "The USB host controller does not support full speed " + "nor low speed devices\n"); + dev_warn(hcd->self.controller, + "You can reconfigure the host controller to have " + "full speed support\n"); + } + + return 0; +} + + +static const struct hc_driver ehci_xilinx_of_hc_driver = { + .description = hcd_name, + .product_desc = "OF EHCI", + .hcd_priv_size = sizeof(struct ehci_hcd), + + /* + * generic hardware linkage + */ + .irq = ehci_irq, + .flags = HCD_MEMORY | HCD_USB2, + + /* + * basic lifecycle operations + */ + .reset = ehci_xilinx_of_setup, + .start = ehci_run, + .stop = ehci_stop, + .shutdown = ehci_shutdown, + + /* + * managing i/o requests and associated device resources + */ + .urb_enqueue = ehci_urb_enqueue, + .urb_dequeue = ehci_urb_dequeue, + .endpoint_disable = ehci_endpoint_disable, + + /* + * scheduling support + */ + .get_frame_number = ehci_get_frame, + + /* + * root hub support + */ + .hub_status_data = ehci_hub_status_data, + .hub_control = ehci_hub_control, +#ifdef CONFIG_PM + .bus_suspend = ehci_bus_suspend, + .bus_resume = ehci_bus_resume, +#endif + .relinquish_port = NULL, + .port_handed_over = ehci_xilinx_port_handed_over, + + .clear_tt_buffer_complete = ehci_clear_tt_buffer_complete, +}; + +/** + * ehci_hcd_xilinx_of_probe - Probe method for the USB host controller + * @op: pointer to the of_device to which the host controller bound + * @match: pointer to of_device_id structure, not used + * + * This function requests resources and sets up appropriate properties for the + * host controller. Because the Xilinx USB host controller can be configured + * as HS only or HS/FS only, it checks the configuration in the device tree + * entry, and sets an appropriate value for hcd->has_tt. + */ +static int __devinit +ehci_hcd_xilinx_of_probe(struct of_device *op, const struct of_device_id *match) +{ + struct device_node *dn = op->node; + struct usb_hcd *hcd; + struct ehci_hcd *ehci; + struct resource res; + int irq; + int rv; + int *value; + + if (usb_disabled()) + return -ENODEV; + + dev_dbg(&op->dev, "initializing XILINX-OF USB Controller\n"); + + rv = of_address_to_resource(dn, 0, &res); + if (rv) + return rv; + + hcd = usb_create_hcd(&ehci_xilinx_of_hc_driver, &op->dev, + "XILINX-OF USB"); + if (!hcd) + return -ENOMEM; + + hcd->rsrc_start = res.start; + hcd->rsrc_len = res.end - res.start + 1; + + if (!request_mem_region(hcd->rsrc_start, hcd->rsrc_len, hcd_name)) { + printk(KERN_ERR __FILE__ ": request_mem_region failed\n"); + rv = -EBUSY; + goto err_rmr; + } + + irq = irq_of_parse_and_map(dn, 0); + if (irq == NO_IRQ) { + printk(KERN_ERR __FILE__ ": irq_of_parse_and_map failed\n"); + rv = -EBUSY; + goto err_irq; + } + + hcd->regs = ioremap(hcd->rsrc_start, hcd->rsrc_len); + if (!hcd->regs) { + printk(KERN_ERR __FILE__ ": ioremap failed\n"); + rv = -ENOMEM; + goto err_ioremap; + } + + ehci = hcd_to_ehci(hcd); + + /* This core always has big-endian register interface and uses + * big-endian memory descriptors. + */ + ehci->big_endian_mmio = 1; + ehci->big_endian_desc = 1; + + /* Check whether the FS support option is selected in the hardware. + */ + value = (int *)of_get_property(dn, "xlnx,support-usb-fs", NULL); + if (value && (*value == 1)) { + ehci_dbg(ehci, "USB host controller supports FS devices\n"); + hcd->has_tt = 1; + } else { + ehci_dbg(ehci, + "USB host controller is HS only\n"); + hcd->has_tt = 0; + } + + /* Debug registers are at the first 0x100 region + */ + ehci->caps = hcd->regs + 0x100; + ehci->regs = hcd->regs + 0x100 + + HC_LENGTH(ehci_readl(ehci, &ehci->caps->hc_capbase)); + + /* cache this readonly data; minimize chip reads */ + ehci->hcs_params = ehci_readl(ehci, &ehci->caps->hcs_params); + + rv = usb_add_hcd(hcd, irq, 0); + if (rv == 0) + return 0; + + iounmap(hcd->regs); + +err_ioremap: +err_irq: + release_mem_region(hcd->rsrc_start, hcd->rsrc_len); +err_rmr: + usb_put_hcd(hcd); + + return rv; +} + +/** + * ehci_hcd_xilinx_of_remove - shutdown hcd and release resources + * @op: pointer to of_device structure that is to be removed + * + * Remove the hcd structure, and release resources that has been requested + * during probe. + */ +static int ehci_hcd_xilinx_of_remove(struct of_device *op) +{ + struct usb_hcd *hcd = dev_get_drvdata(&op->dev); + dev_set_drvdata(&op->dev, NULL); + + dev_dbg(&op->dev, "stopping XILINX-OF USB Controller\n"); + + usb_remove_hcd(hcd); + + iounmap(hcd->regs); + release_mem_region(hcd->rsrc_start, hcd->rsrc_len); + + usb_put_hcd(hcd); + + return 0; +} + +/** + * ehci_hcd_xilinx_of_shutdown - shutdown the hcd + * @op: pointer to of_device structure that is to be removed + * + * Properly shutdown the hcd, call driver's shutdown routine. + */ +static int ehci_hcd_xilinx_of_shutdown(struct of_device *op) +{ + struct usb_hcd *hcd = dev_get_drvdata(&op->dev); + + if (hcd->driver->shutdown) + hcd->driver->shutdown(hcd); + + return 0; +} + + +static struct of_device_id ehci_hcd_xilinx_of_match[] = { + {.compatible = "xlnx,xps-usb-host-1.00.a",}, + {}, +}; +MODULE_DEVICE_TABLE(of, ehci_hcd_xilinx_of_match); + +static struct of_platform_driver ehci_hcd_xilinx_of_driver = { + .name = "xilinx-of-ehci", + .match_table = ehci_hcd_xilinx_of_match, + .probe = ehci_hcd_xilinx_of_probe, + .remove = ehci_hcd_xilinx_of_remove, + .shutdown = ehci_hcd_xilinx_of_shutdown, + .driver = { + .name = "xilinx-of-ehci", + .owner = THIS_MODULE, + }, +}; -- cgit v1.2.3 From c3f22d92a1249665d4cd87a68a4078a56002c3ab Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Mon, 12 Oct 2009 15:45:18 +0000 Subject: USB: wusb: add wusb_phy_rate sysfs file to host controllers Add the wusb_phy_rate sysfs file to Wireless USB host controllers. This sets the maximum PHY rate that will be used for all connected devices. Signed-off-by: David Vrabel Signed-off-by: Greg Kroah-Hartman --- .../ABI/testing/sysfs-class-uwb_rc-wusbhc | 13 +++++++++ drivers/usb/host/whci/qset.c | 24 ++++++++++++++-- drivers/usb/host/whci/whci-hc.h | 9 +----- drivers/usb/wusbcore/wusbhc.c | 32 ++++++++++++++++++++++ drivers/usb/wusbcore/wusbhc.h | 1 + 5 files changed, 68 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc index 4e8106f7cfd9..25b1e751b777 100644 --- a/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc +++ b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc @@ -23,3 +23,16 @@ Description: Since this relates to security (specifically, the lifetime of PTKs and GTKs) it should not be changed from the default. + +What: /sys/class/uwb_rc/uwbN/wusbhc/wusb_phy_rate +Date: August 2009 +KernelVersion: 2.6.32 +Contact: David Vrabel +Description: + The maximum PHY rate to use for all connected devices. + This is only of limited use for testing and + development as the hardware's automatic rate + adaptation is better then this simple control. + + Refer to [ECMA-368] section 10.3.1.1 for the value to + use. diff --git a/drivers/usb/host/whci/qset.c b/drivers/usb/host/whci/qset.c index 08280869ed1c..39e855a55c63 100644 --- a/drivers/usb/host/whci/qset.c +++ b/drivers/usb/host/whci/qset.c @@ -49,11 +49,13 @@ struct whc_qset *qset_alloc(struct whc *whc, gfp_t mem_flags) * state * @urb: an urb for a transfer to this endpoint */ -static void qset_fill_qh(struct whc_qset *qset, struct urb *urb) +static void qset_fill_qh(struct whc *whc, struct whc_qset *qset, struct urb *urb) { struct usb_device *usb_dev = urb->dev; + struct wusb_dev *wusb_dev = usb_dev->wusb_dev; struct usb_wireless_ep_comp_descriptor *epcd; bool is_out; + uint8_t phy_rate; is_out = usb_pipeout(urb->pipe); @@ -68,6 +70,22 @@ static void qset_fill_qh(struct whc_qset *qset, struct urb *urb) qset->max_burst = 1; } + /* + * Initial PHY rate is 53.3 Mbit/s for control endpoints or + * the maximum supported by the device for other endpoints + * (unless limited by the user). + */ + if (usb_pipecontrol(urb->pipe)) + phy_rate = UWB_PHY_RATE_53; + else { + uint16_t phy_rates; + + phy_rates = le16_to_cpu(wusb_dev->wusb_cap_descr->wPHYRates); + phy_rate = fls(phy_rates) - 1; + if (phy_rate > whc->wusbhc.phy_rate) + phy_rate = whc->wusbhc.phy_rate; + } + qset->qh.info1 = cpu_to_le32( QH_INFO1_EP(usb_pipeendpoint(urb->pipe)) | (is_out ? QH_INFO1_DIR_OUT : QH_INFO1_DIR_IN) @@ -87,7 +105,7 @@ static void qset_fill_qh(struct whc_qset *qset, struct urb *urb) * strength and can presumably guess the Tx power required * from that? */ qset->qh.info3 = cpu_to_le32( - QH_INFO3_TX_RATE_53_3 + QH_INFO3_TX_RATE(phy_rate) | QH_INFO3_TX_PWR(0) /* 0 == max power */ ); @@ -149,7 +167,7 @@ struct whc_qset *get_qset(struct whc *whc, struct urb *urb, qset->ep = urb->ep; urb->ep->hcpriv = qset; - qset_fill_qh(qset, urb); + qset_fill_qh(whc, qset, urb); } return qset; } diff --git a/drivers/usb/host/whci/whci-hc.h b/drivers/usb/host/whci/whci-hc.h index d5e5c3aacced..4d4cbc0730bf 100644 --- a/drivers/usb/host/whci/whci-hc.h +++ b/drivers/usb/host/whci/whci-hc.h @@ -172,14 +172,7 @@ struct whc_qhead { #define QH_INFO3_MAX_DELAY(d) ((d) << 0) /* maximum stream delay in 125 us units (isoc only) */ #define QH_INFO3_INTERVAL(i) ((i) << 16) /* segment interval in 125 us units (isoc only) */ -#define QH_INFO3_TX_RATE_53_3 (0 << 24) -#define QH_INFO3_TX_RATE_80 (1 << 24) -#define QH_INFO3_TX_RATE_106_7 (2 << 24) -#define QH_INFO3_TX_RATE_160 (3 << 24) -#define QH_INFO3_TX_RATE_200 (4 << 24) -#define QH_INFO3_TX_RATE_320 (5 << 24) -#define QH_INFO3_TX_RATE_400 (6 << 24) -#define QH_INFO3_TX_RATE_480 (7 << 24) +#define QH_INFO3_TX_RATE(r) ((r) << 24) /* PHY rate (see [ECMA-368] section 10.3.1.1) */ #define QH_INFO3_TX_PWR(p) ((p) << 29) /* transmit power (see [WUSB] section 5.2.1.2) */ #define QH_STATUS_FLOW_CTRL (1 << 15) diff --git a/drivers/usb/wusbcore/wusbhc.c b/drivers/usb/wusbcore/wusbhc.c index ee6256f23636..eab86e4bc770 100644 --- a/drivers/usb/wusbcore/wusbhc.c +++ b/drivers/usb/wusbcore/wusbhc.c @@ -147,10 +147,40 @@ static ssize_t wusb_chid_store(struct device *dev, } static DEVICE_ATTR(wusb_chid, 0644, wusb_chid_show, wusb_chid_store); + +static ssize_t wusb_phy_rate_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct wusbhc *wusbhc = usbhc_dev_to_wusbhc(dev); + + return sprintf(buf, "%d\n", wusbhc->phy_rate); +} + +static ssize_t wusb_phy_rate_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t size) +{ + struct wusbhc *wusbhc = usbhc_dev_to_wusbhc(dev); + uint8_t phy_rate; + ssize_t result; + + result = sscanf(buf, "%hhu", &phy_rate); + if (result != 1) + return -EINVAL; + if (phy_rate >= UWB_PHY_RATE_INVALID) + return -EINVAL; + + wusbhc->phy_rate = phy_rate; + return size; +} +static DEVICE_ATTR(wusb_phy_rate, 0644, wusb_phy_rate_show, wusb_phy_rate_store); + /* Group all the WUSBHC attributes */ static struct attribute *wusbhc_attrs[] = { &dev_attr_wusb_trust_timeout.attr, &dev_attr_wusb_chid.attr, + &dev_attr_wusb_phy_rate.attr, NULL, }; @@ -177,6 +207,8 @@ int wusbhc_create(struct wusbhc *wusbhc) int result = 0; wusbhc->trust_timeout = WUSB_TRUST_TIMEOUT_MS; + wusbhc->phy_rate = UWB_PHY_RATE_INVALID - 1; + mutex_init(&wusbhc->mutex); result = wusbhc_mmcie_create(wusbhc); if (result < 0) diff --git a/drivers/usb/wusbcore/wusbhc.h b/drivers/usb/wusbcore/wusbhc.h index 797c2453a35b..fd2fd4e277e1 100644 --- a/drivers/usb/wusbcore/wusbhc.h +++ b/drivers/usb/wusbcore/wusbhc.h @@ -253,6 +253,7 @@ struct wusbhc { unsigned trust_timeout; /* in jiffies */ struct wusb_ckhdid chid; + uint8_t phy_rate; struct wuie_host_info *wuie_host_info; struct mutex mutex; /* locks everything else */ -- cgit v1.2.3 From fb34d53752d5bec5acc73422e462a9c68aeeaa2a Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 13 Nov 2009 11:53:59 -0500 Subject: USB: remove the auto_pm flag This patch (as1302) removes the auto_pm flag from struct usb_device. The flag's only purpose was to distinguish between autosuspends and external suspends, but that information is now available in the pm_message_t argument passed to suspend methods. Signed-off-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- Documentation/usb/power-management.txt | 9 +++++---- drivers/bluetooth/btusb.c | 2 +- drivers/hid/usbhid/hid-core.c | 8 ++++---- drivers/net/wimax/i2400m/usb.c | 7 ++----- drivers/usb/core/driver.c | 4 ---- drivers/usb/serial/option.c | 2 +- drivers/usb/serial/sierra.c | 2 +- include/linux/usb.h | 2 -- 8 files changed, 14 insertions(+), 22 deletions(-) (limited to 'Documentation') diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt index ad642615ad4c..8817368203d6 100644 --- a/Documentation/usb/power-management.txt +++ b/Documentation/usb/power-management.txt @@ -423,15 +423,16 @@ an URB had completed too recently. External suspend calls should never be allowed to fail in this way, only autosuspend calls. The driver can tell them apart by checking -udev->auto_pm; this flag will be set to 1 for internal PM events -(autosuspend or autoresume) and 0 for external PM events. +the PM_EVENT_AUTO bit in the message.event argument to the suspend +method; this bit will be set for internal PM events (autosuspend) and +clear for external PM events. Many of the ingredients in the autosuspend framework are oriented towards interfaces: The usb_interface structure contains the pm_usage_cnt field, and the usb_autopm_* routines take an interface pointer as their argument. But somewhat confusingly, a few of the -pieces (usb_mark_last_busy() and udev->auto_pm) use the usb_device -structure instead. Drivers need to keep this straight; they can call +pieces (i.e., usb_mark_last_busy()) use the usb_device structure +instead. Drivers need to keep this straight; they can call interface_to_usbdev() to find the device structure for a given interface. diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 44bc8bbabf54..4d2905996751 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -1066,7 +1066,7 @@ static int btusb_suspend(struct usb_interface *intf, pm_message_t message) return 0; spin_lock_irq(&data->txlock); - if (!(interface_to_usbdev(intf)->auto_pm && data->tx_in_flight)) { + if (!((message.event & PM_EVENT_AUTO) && data->tx_in_flight)) { set_bit(BTUSB_SUSPENDING, &data->flags); spin_unlock_irq(&data->txlock); } else { diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c index 0258289f3b3e..e2997a8d5e1b 100644 --- a/drivers/hid/usbhid/hid-core.c +++ b/drivers/hid/usbhid/hid-core.c @@ -1253,10 +1253,9 @@ static int hid_suspend(struct usb_interface *intf, pm_message_t message) { struct hid_device *hid = usb_get_intfdata(intf); struct usbhid_device *usbhid = hid->driver_data; - struct usb_device *udev = interface_to_usbdev(intf); int status; - if (udev->auto_pm) { + if (message.event & PM_EVENT_AUTO) { spin_lock_irq(&usbhid->lock); /* Sync with error handler */ if (!test_bit(HID_RESET_PENDING, &usbhid->iofl) && !test_bit(HID_CLEAR_HALT, &usbhid->iofl) @@ -1281,7 +1280,7 @@ static int hid_suspend(struct usb_interface *intf, pm_message_t message) return -EIO; } - if (!ignoreled && udev->auto_pm) { + if (!ignoreled && (message.event & PM_EVENT_AUTO)) { spin_lock_irq(&usbhid->lock); if (test_bit(HID_LED_ON, &usbhid->iofl)) { spin_unlock_irq(&usbhid->lock); @@ -1294,7 +1293,8 @@ static int hid_suspend(struct usb_interface *intf, pm_message_t message) hid_cancel_delayed_stuff(usbhid); hid_cease_io(usbhid); - if (udev->auto_pm && test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) { + if ((message.event & PM_EVENT_AUTO) && + test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) { /* lost race against keypresses */ status = hid_start_in(hid); if (status < 0) diff --git a/drivers/net/wimax/i2400m/usb.c b/drivers/net/wimax/i2400m/usb.c index 47e84ef355c5..3b48681f8a0d 100644 --- a/drivers/net/wimax/i2400m/usb.c +++ b/drivers/net/wimax/i2400m/usb.c @@ -579,7 +579,7 @@ void i2400mu_disconnect(struct usb_interface *iface) * * As well, the device might refuse going to sleep for whichever * reason. In this case we just fail. For system suspend/hibernate, - * we *can't* fail. We look at usb_dev->auto_pm to see if the + * we *can't* fail. We check PM_EVENT_AUTO to see if the * suspend call comes from the USB stack or from the system and act * in consequence. * @@ -591,14 +591,11 @@ int i2400mu_suspend(struct usb_interface *iface, pm_message_t pm_msg) int result = 0; struct device *dev = &iface->dev; struct i2400mu *i2400mu = usb_get_intfdata(iface); -#ifdef CONFIG_PM - struct usb_device *usb_dev = i2400mu->usb_dev; -#endif unsigned is_autosuspend = 0; struct i2400m *i2400m = &i2400mu->i2400m; #ifdef CONFIG_PM - if (usb_dev->auto_pm > 0) + if (pm_msg.event & PM_EVENT_AUTO) is_autosuspend = 1; #endif diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 4f864472c5c4..8016a296010e 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1341,7 +1341,6 @@ static int usb_autopm_do_device(struct usb_device *udev, int inc_usage_cnt) int status = 0; usb_pm_lock(udev); - udev->auto_pm = 1; udev->pm_usage_cnt += inc_usage_cnt; WARN_ON(udev->pm_usage_cnt < 0); if (inc_usage_cnt) @@ -1473,7 +1472,6 @@ static int usb_autopm_do_interface(struct usb_interface *intf, if (intf->condition == USB_INTERFACE_UNBOUND) status = -ENODEV; else { - udev->auto_pm = 1; atomic_add(inc_usage_cnt, &intf->pm_usage_cnt); udev->last_busy = jiffies; if (inc_usage_cnt >= 0 && @@ -1707,7 +1705,6 @@ int usb_external_suspend_device(struct usb_device *udev, pm_message_t msg) do_unbind_rebind(udev, DO_UNBIND); usb_pm_lock(udev); - udev->auto_pm = 0; status = usb_suspend_both(udev, msg); usb_pm_unlock(udev); return status; @@ -1730,7 +1727,6 @@ int usb_external_resume_device(struct usb_device *udev, pm_message_t msg) int status; usb_pm_lock(udev); - udev->auto_pm = 0; status = usb_resume_both(udev, msg); udev->last_busy = jiffies; usb_pm_unlock(udev); diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 0d46bbec44b7..8751ec79a159 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1313,7 +1313,7 @@ static int option_suspend(struct usb_serial *serial, pm_message_t message) dbg("%s entered", __func__); - if (serial->dev->auto_pm) { + if (message.event & PM_EVENT_AUTO) { spin_lock_irq(&intfdata->susp_lock); b = intfdata->in_flight; spin_unlock_irq(&intfdata->susp_lock); diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c index c5c41aed106d..ac1b6449fb6a 100644 --- a/drivers/usb/serial/sierra.c +++ b/drivers/usb/serial/sierra.c @@ -1005,7 +1005,7 @@ static int sierra_suspend(struct usb_serial *serial, pm_message_t message) struct sierra_intf_private *intfdata; int b; - if (serial->dev->auto_pm) { + if (message.event & PM_EVENT_AUTO) { intfdata = serial->private; spin_lock_irq(&intfdata->susp_lock); b = intfdata->in_flight; diff --git a/include/linux/usb.h b/include/linux/usb.h index 6e91ee4f5b81..4b6f6db544ee 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -429,7 +429,6 @@ struct usb_tt; * @last_busy: time of last use * @autosuspend_delay: in jiffies * @connect_time: time device was first connected - * @auto_pm: autosuspend/resume in progress * @do_remote_wakeup: remote wakeup should be enabled * @reset_resume: needs reset instead of resume * @autosuspend_disabled: autosuspend disabled by the user @@ -514,7 +513,6 @@ struct usb_device { int autosuspend_delay; unsigned long connect_time; - unsigned auto_pm:1; unsigned do_remote_wakeup:1; unsigned reset_resume:1; unsigned autosuspend_disabled:1; -- cgit v1.2.3 From 8e4ceb38eb5bbaef22fc00abe9bc11e26bea2ab5 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Mon, 7 Dec 2009 13:01:37 -0500 Subject: USB: prepare for changover to Runtime PM framework This patch (as1303) revises the USB Power Management infrastructure to make it compatible with the new driver-model Runtime PM framework: Drivers are no longer allowed to access intf->pm_usage_cnt directly; the PM framework manages its own usage counters. usb_autopm_set_interface() is eliminated, because it directly sets intf->pm_usage_cnt. usb_autopm_enable() and usb_autopm_disable() are eliminated, because they call usb_autopm_set_interface(). usb_autopm_get_interface_no_resume() and usb_autopm_put_interface_no_suspend() are added. They correspond to pm_runtime_get_noresume() and pm_runtime_put_noidle() in the PM framework. The power/level attribute no longer accepts "suspend", only "on" and "auto". The PM framework doesn't allow devices to be forced into a suspended mode. The hub driver contains the only code that violates the new guidelines. It is updated to use the new interface routines instead. Signed-off-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- Documentation/usb/power-management.txt | 60 ++++++++++++++++------------------ drivers/usb/core/driver.c | 31 ------------------ drivers/usb/core/hub.c | 45 +++++++++++++++++-------- drivers/usb/core/sysfs.c | 25 +++----------- include/linux/usb.h | 26 ++++++--------- 5 files changed, 74 insertions(+), 113 deletions(-) (limited to 'Documentation') diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt index 8817368203d6..c7c1dc2f8017 100644 --- a/Documentation/usb/power-management.txt +++ b/Documentation/usb/power-management.txt @@ -2,7 +2,7 @@ Alan Stern - October 5, 2007 + November 10, 2009 @@ -123,9 +123,9 @@ relevant attribute files are: wakeup, level, and autosuspend. power/level - This file contains one of three words: "on", "auto", - or "suspend". You can write those words to the file - to change the device's setting. + This file contains one of two words: "on" or "auto". + You can write those words to the file to change the + device's setting. "on" means that the device should be resumed and autosuspend is not allowed. (Of course, system @@ -134,10 +134,10 @@ relevant attribute files are: wakeup, level, and autosuspend. "auto" is the normal state in which the kernel is allowed to autosuspend and autoresume the device. - "suspend" means that the device should remain - suspended, and autoresume is not allowed. (But remote - wakeup may still be allowed, since it is controlled - separately by the power/wakeup attribute.) + (In kernels up to 2.6.32, you could also specify + "suspend", meaning that the device should remain + suspended and autoresume was not allowed. This + setting is no longer supported.) power/autosuspend @@ -313,13 +313,14 @@ three of the methods listed above. In addition, a driver indicates that it supports autosuspend by setting the .supports_autosuspend flag in its usb_driver structure. It is then responsible for informing the USB core whenever one of its interfaces becomes busy or idle. The -driver does so by calling these five functions: +driver does so by calling these six functions: int usb_autopm_get_interface(struct usb_interface *intf); void usb_autopm_put_interface(struct usb_interface *intf); - int usb_autopm_set_interface(struct usb_interface *intf); int usb_autopm_get_interface_async(struct usb_interface *intf); void usb_autopm_put_interface_async(struct usb_interface *intf); + void usb_autopm_get_interface_no_resume(struct usb_interface *intf); + void usb_autopm_put_interface_no_suspend(struct usb_interface *intf); The functions work by maintaining a counter in the usb_interface structure. When intf->pm_usage_count is > 0 then the interface is @@ -331,11 +332,13 @@ considered to be idle, and the kernel may autosuspend the device. associated with the device itself rather than any of its interfaces. This field is used only by the USB core.) -The driver owns intf->pm_usage_count; it can modify the value however -and whenever it likes. A nice aspect of the non-async usb_autopm_* -routines is that the changes they make are protected by the usb_device -structure's PM mutex (udev->pm_mutex); however drivers may change -pm_usage_count without holding the mutex. Drivers using the async +Drivers must not modify intf->pm_usage_count directly; its value +should be changed only be using the functions listed above. Drivers +are responsible for insuring that the overall change to pm_usage_count +during their lifetime balances out to 0 (it may be necessary for the +disconnect method to call usb_autopm_put_interface() one or more times +to fulfill this requirement). The first two routines use the PM mutex +in struct usb_device for mutual exclusion; drivers using the async routines are responsible for their own synchronization and mutual exclusion. @@ -347,11 +350,6 @@ exclusion. attempts an autosuspend if the new value is <= 0 and the device isn't suspended. - usb_autopm_set_interface() leaves pm_usage_count alone. - It attempts an autoresume if the value is > 0 and the device - is suspended, and it attempts an autosuspend if the value is - <= 0 and the device isn't suspended. - usb_autopm_get_interface_async() and usb_autopm_put_interface_async() do almost the same things as their non-async counterparts. The differences are: they do @@ -360,13 +358,11 @@ exclusion. such as an URB's completion handler, but when they return the device will not generally not yet be in the desired state. -There also are a couple of utility routines drivers can use: - - usb_autopm_enable() sets pm_usage_cnt to 0 and then calls - usb_autopm_set_interface(), which will attempt an autosuspend. - - usb_autopm_disable() sets pm_usage_cnt to 1 and then calls - usb_autopm_set_interface(), which will attempt an autoresume. + usb_autopm_get_interface_no_resume() and + usb_autopm_put_interface_no_suspend() merely increment or + decrement the pm_usage_count value; they do not attempt to + carry out an autoresume or an autosuspend. Hence they can be + called in an atomic context. The conventional usage pattern is that a driver calls usb_autopm_get_interface() in its open routine and @@ -400,11 +396,11 @@ though, setting this flag won't cause the kernel to autoresume it. Normally a driver would set this flag in its probe method, at which time the device is guaranteed not to be autosuspended.) -The usb_autopm_* routines have to run in a sleepable process context; -they must not be called from an interrupt handler or while holding a -spinlock. In fact, the entire autosuspend mechanism is not well geared -toward interrupt-driven operation. However there is one thing a -driver can do in an interrupt handler: +The synchronous usb_autopm_* routines have to run in a sleepable +process context; they must not be called from an interrupt handler or +while holding a spinlock. In fact, the entire autosuspend mechanism +is not well geared toward interrupt-driven operation. However there +is one thing a driver can do in an interrupt handler: usb_mark_last_busy(struct usb_device *udev); diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 8016a296010e..7a05bab73960 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -948,8 +948,6 @@ static int usb_resume_device(struct usb_device *udev, pm_message_t msg) done: dev_vdbg(&udev->dev, "%s: status %d\n", __func__, status); - if (status == 0) - udev->autoresume_disabled = 0; return status; } @@ -1280,11 +1278,6 @@ static int usb_resume_both(struct usb_device *udev, pm_message_t msg) /* Propagate the resume up the tree, if necessary */ if (udev->state == USB_STATE_SUSPENDED) { - if ((msg.event & PM_EVENT_AUTO) && - udev->autoresume_disabled) { - status = -EPERM; - goto done; - } if (parent) { status = usb_autoresume_device(parent); if (status == 0) { @@ -1638,8 +1631,6 @@ int usb_autopm_get_interface_async(struct usb_interface *intf) if (intf->condition == USB_INTERFACE_UNBOUND) status = -ENODEV; - else if (udev->autoresume_disabled) - status = -EPERM; else { atomic_inc(&intf->pm_usage_cnt); if (atomic_read(&intf->pm_usage_cnt) > 0 && @@ -1652,28 +1643,6 @@ int usb_autopm_get_interface_async(struct usb_interface *intf) } EXPORT_SYMBOL_GPL(usb_autopm_get_interface_async); -/** - * usb_autopm_set_interface - set a USB interface's autosuspend state - * @intf: the usb_interface whose state should be set - * - * This routine sets the autosuspend state of @intf's device according - * to @intf's usage counter, which the caller must have set previously. - * If the counter is <= 0, the device is autosuspended (if it isn't - * already suspended and if nothing else prevents the autosuspend). If - * the counter is > 0, the device is autoresumed (if it isn't already - * awake). - */ -int usb_autopm_set_interface(struct usb_interface *intf) -{ - int status; - - status = usb_autopm_do_interface(intf, 0); - dev_vdbg(&intf->dev, "%s: status %d cnt %d\n", - __func__, status, atomic_read(&intf->pm_usage_cnt)); - return status; -} -EXPORT_SYMBOL_GPL(usb_autopm_set_interface); - #else void usb_autosuspend_work(struct work_struct *work) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 5413d712cae0..b38fd6730e2a 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -71,6 +71,7 @@ struct usb_hub { unsigned mA_per_port; /* current for each child */ + unsigned init_done:1; unsigned limited_power:1; unsigned quiescing:1; unsigned disconnected:1; @@ -375,12 +376,13 @@ static void kick_khubd(struct usb_hub *hub) { unsigned long flags; - /* Suppress autosuspend until khubd runs */ - atomic_set(&to_usb_interface(hub->intfdev)->pm_usage_cnt, 1); - spin_lock_irqsave(&hub_event_lock, flags); if (!hub->disconnected && list_empty(&hub->event_list)) { list_add_tail(&hub->event_list, &hub_event_list); + + /* Suppress autosuspend until khubd runs */ + usb_autopm_get_interface_no_resume( + to_usb_interface(hub->intfdev)); wake_up(&khubd_wait); } spin_unlock_irqrestore(&hub_event_lock, flags); @@ -665,7 +667,7 @@ int usb_remove_device(struct usb_device *udev) } enum hub_activation_type { - HUB_INIT, HUB_INIT2, HUB_INIT3, + HUB_INIT, HUB_INIT2, HUB_INIT3, /* INITs must come first */ HUB_POST_RESET, HUB_RESUME, HUB_RESET_RESUME, }; @@ -710,8 +712,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) msecs_to_jiffies(delay)); /* Suppress autosuspend until init is done */ - atomic_set(&to_usb_interface(hub->intfdev)-> - pm_usage_cnt, 1); + usb_autopm_get_interface_no_resume( + to_usb_interface(hub->intfdev)); return; /* Continues at init2: below */ } else { hub_power_on(hub, true); @@ -818,6 +820,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) } init3: hub->quiescing = 0; + hub->init_done = 1; status = usb_submit_urb(hub->urb, GFP_NOIO); if (status < 0) @@ -827,6 +830,10 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) /* Scan all ports that need attention */ kick_khubd(hub); + + /* Allow autosuspend if it was suppressed */ + if (type <= HUB_INIT3) + usb_autopm_put_interface_async(to_usb_interface(hub->intfdev)); } /* Implement the continuations for the delays above */ @@ -854,6 +861,11 @@ static void hub_quiesce(struct usb_hub *hub, enum hub_quiescing_type type) int i; cancel_delayed_work_sync(&hub->init_work); + if (!hub->init_done) { + hub->init_done = 1; + usb_autopm_put_interface_no_suspend( + to_usb_interface(hub->intfdev)); + } /* khubd and related activity won't re-trigger */ hub->quiescing = 1; @@ -1176,7 +1188,10 @@ static void hub_disconnect(struct usb_interface *intf) /* Take the hub off the event list and don't let it be added again */ spin_lock_irq(&hub_event_lock); - list_del_init(&hub->event_list); + if (!list_empty(&hub->event_list)) { + list_del_init(&hub->event_list); + usb_autopm_put_interface_no_suspend(intf); + } hub->disconnected = 1; spin_unlock_irq(&hub_event_lock); @@ -3235,7 +3250,7 @@ static void hub_events(void) * disconnected while waiting for the lock to succeed. */ usb_lock_device(hdev); if (unlikely(hub->disconnected)) - goto loop; + goto loop2; /* If the hub has died, clean up after it */ if (hdev->state == USB_STATE_NOTATTACHED) { @@ -3384,11 +3399,15 @@ static void hub_events(void) } } -loop_autopm: - /* Allow autosuspend if we're not going to run again */ - if (list_empty(&hub->event_list)) - usb_autopm_enable(intf); -loop: + loop_autopm: + /* Balance the usb_autopm_get_interface() above */ + usb_autopm_put_interface_no_suspend(intf); + loop: + /* Balance the usb_autopm_get_interface_no_resume() in + * kick_khubd() and allow autosuspend. + */ + usb_autopm_put_interface(intf); + loop2: usb_unlock_device(hdev); kref_put(&hub->kref, hub_release); diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c index ae763974be25..15477008b631 100644 --- a/drivers/usb/core/sysfs.c +++ b/drivers/usb/core/sysfs.c @@ -327,7 +327,6 @@ static DEVICE_ATTR(autosuspend, S_IRUGO | S_IWUSR, static const char on_string[] = "on"; static const char auto_string[] = "auto"; -static const char suspend_string[] = "suspend"; static ssize_t show_level(struct device *dev, struct device_attribute *attr, char *buf) @@ -335,13 +334,8 @@ show_level(struct device *dev, struct device_attribute *attr, char *buf) struct usb_device *udev = to_usb_device(dev); const char *p = auto_string; - if (udev->state == USB_STATE_SUSPENDED) { - if (udev->autoresume_disabled) - p = suspend_string; - } else { - if (udev->autosuspend_disabled) - p = on_string; - } + if (udev->state != USB_STATE_SUSPENDED && udev->autosuspend_disabled) + p = on_string; return sprintf(buf, "%s\n", p); } @@ -353,7 +347,7 @@ set_level(struct device *dev, struct device_attribute *attr, int len = count; char *cp; int rc = 0; - int old_autosuspend_disabled, old_autoresume_disabled; + int old_autosuspend_disabled; cp = memchr(buf, '\n', count); if (cp) @@ -361,7 +355,6 @@ set_level(struct device *dev, struct device_attribute *attr, usb_lock_device(udev); old_autosuspend_disabled = udev->autosuspend_disabled; - old_autoresume_disabled = udev->autoresume_disabled; /* Setting the flags without calling usb_pm_lock is a subject to * races, but who cares... @@ -369,28 +362,18 @@ set_level(struct device *dev, struct device_attribute *attr, if (len == sizeof on_string - 1 && strncmp(buf, on_string, len) == 0) { udev->autosuspend_disabled = 1; - udev->autoresume_disabled = 0; rc = usb_external_resume_device(udev, PMSG_USER_RESUME); } else if (len == sizeof auto_string - 1 && strncmp(buf, auto_string, len) == 0) { udev->autosuspend_disabled = 0; - udev->autoresume_disabled = 0; rc = usb_external_resume_device(udev, PMSG_USER_RESUME); - } else if (len == sizeof suspend_string - 1 && - strncmp(buf, suspend_string, len) == 0) { - udev->autosuspend_disabled = 0; - udev->autoresume_disabled = 1; - rc = usb_external_suspend_device(udev, PMSG_USER_SUSPEND); - } else rc = -EINVAL; - if (rc) { + if (rc) udev->autosuspend_disabled = old_autosuspend_disabled; - udev->autoresume_disabled = old_autoresume_disabled; - } usb_unlock_device(udev); return (rc < 0 ? rc : count); } diff --git a/include/linux/usb.h b/include/linux/usb.h index 4b6f6db544ee..6af3581e1114 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -432,7 +432,6 @@ struct usb_tt; * @do_remote_wakeup: remote wakeup should be enabled * @reset_resume: needs reset instead of resume * @autosuspend_disabled: autosuspend disabled by the user - * @autoresume_disabled: autoresume disabled by the user * @skip_sys_resume: skip the next system resume * @wusb_dev: if this is a Wireless USB device, link to the WUSB * specific data for the device. @@ -516,7 +515,6 @@ struct usb_device { unsigned do_remote_wakeup:1; unsigned reset_resume:1; unsigned autosuspend_disabled:1; - unsigned autoresume_disabled:1; unsigned skip_sys_resume:1; #endif struct wusb_dev *wusb_dev; @@ -542,22 +540,20 @@ extern struct usb_device *usb_find_device(u16 vendor_id, u16 product_id); /* USB autosuspend and autoresume */ #ifdef CONFIG_USB_SUSPEND -extern int usb_autopm_set_interface(struct usb_interface *intf); extern int usb_autopm_get_interface(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); -static inline void usb_autopm_enable(struct usb_interface *intf) +static inline void usb_autopm_get_interface_no_resume( + struct usb_interface *intf) { - atomic_set(&intf->pm_usage_cnt, 0); - usb_autopm_set_interface(intf); + atomic_inc(&intf->pm_usage_cnt); } - -static inline void usb_autopm_disable(struct usb_interface *intf) +static inline void usb_autopm_put_interface_no_suspend( + struct usb_interface *intf) { - atomic_set(&intf->pm_usage_cnt, 1); - usb_autopm_set_interface(intf); + atomic_dec(&intf->pm_usage_cnt); } static inline void usb_mark_last_busy(struct usb_device *udev) @@ -567,12 +563,8 @@ static inline void usb_mark_last_busy(struct usb_device *udev) #else -static inline int usb_autopm_set_interface(struct usb_interface *intf) -{ return 0; } - static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } - static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } @@ -580,9 +572,11 @@ static inline void usb_autopm_put_interface(struct usb_interface *intf) { } static inline void usb_autopm_put_interface_async(struct usb_interface *intf) { } -static inline void usb_autopm_enable(struct usb_interface *intf) +static inline void usb_autopm_get_interface_no_resume( + struct usb_interface *intf) { } -static inline void usb_autopm_disable(struct usb_interface *intf) +static inline void usb_autopm_put_interface_no_suspend( + struct usb_interface *intf) { } static inline void usb_mark_last_busy(struct usb_device *udev) { } -- cgit v1.2.3 From a0bb108112a872c0b0c4b3ef4974f95fb75b155d Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Mon, 7 Dec 2009 16:39:16 -0500 Subject: USB: usb-storage: add BAD_SENSE flag This patch (as1311) fixes a problem in usb-storage: Some devices are pretty broken when it comes to reporting sense data. The information they send back indicates that they have more than 18 bytes of sense data available, but when the system asks for more than 18 they fail or hang. The symptom is that probing fails with multiple resets. The patch adds a new BAD_SENSE flag to indicate that usb-storage should never ask for more than 18 bytes of sense data. The flag can be set in an unusual_devs entry or via the "quirks=" module parameter, and it is set automatically whenever a REQUEST SENSE command for more than 18 bytes fails or times out. An unusual_devs entry is added for the Agfa photo frame, which uses a Prolific chip having this bug. Signed-off-by: Alan Stern Tested-by: Daniel Kukula Cc: stable Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 2 ++ drivers/usb/storage/transport.c | 17 +++++++++++++---- drivers/usb/storage/unusual_devs.h | 7 +++++++ drivers/usb/storage/usb.c | 3 +++ include/linux/usb_usual.h | 4 +++- 5 files changed, 28 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 777dc8a32df8..3f886e298f62 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2663,6 +2663,8 @@ and is between 256 and 4096 characters. It is defined in the file to a common usb-storage quirk flag as follows: a = SANE_SENSE (collect more than 18 bytes of sense data); + b = BAD_SENSE (don't collect more than 18 + bytes of sense data); c = FIX_CAPACITY (decrease the reported device capacity by one sector); h = CAPACITY_HEURISTICS (decrease the diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c index 589f6b4404f0..cc313d16d727 100644 --- a/drivers/usb/storage/transport.c +++ b/drivers/usb/storage/transport.c @@ -666,10 +666,11 @@ void usb_stor_invoke_transport(struct scsi_cmnd *srb, struct us_data *us) * to wait for at least one CHECK_CONDITION to determine * SANE_SENSE support */ - if ((srb->cmnd[0] == ATA_16 || srb->cmnd[0] == ATA_12) && + if (unlikely((srb->cmnd[0] == ATA_16 || srb->cmnd[0] == ATA_12) && result == USB_STOR_TRANSPORT_GOOD && !(us->fflags & US_FL_SANE_SENSE) && - !(srb->cmnd[2] & 0x20)) { + !(us->fflags & US_FL_BAD_SENSE) && + !(srb->cmnd[2] & 0x20))) { US_DEBUGP("-- SAT supported, increasing auto-sense\n"); us->fflags |= US_FL_SANE_SENSE; } @@ -718,6 +719,12 @@ Retry_Sense: if (test_bit(US_FLIDX_TIMED_OUT, &us->dflags)) { US_DEBUGP("-- auto-sense aborted\n"); srb->result = DID_ABORT << 16; + + /* If SANE_SENSE caused this problem, disable it */ + if (sense_size != US_SENSE_SIZE) { + us->fflags &= ~US_FL_SANE_SENSE; + us->fflags |= US_FL_BAD_SENSE; + } goto Handle_Errors; } @@ -727,10 +734,11 @@ Retry_Sense: * (small) sense request. This fixes some USB GSM modems */ if (temp_result == USB_STOR_TRANSPORT_FAILED && - (us->fflags & US_FL_SANE_SENSE) && - sense_size != US_SENSE_SIZE) { + sense_size != US_SENSE_SIZE) { US_DEBUGP("-- auto-sense failure, retry small sense\n"); sense_size = US_SENSE_SIZE; + us->fflags &= ~US_FL_SANE_SENSE; + us->fflags |= US_FL_BAD_SENSE; goto Retry_Sense; } @@ -754,6 +762,7 @@ Retry_Sense: */ if (srb->sense_buffer[7] > (US_SENSE_SIZE - 8) && !(us->fflags & US_FL_SANE_SENSE) && + !(us->fflags & US_FL_BAD_SENSE) && (srb->sense_buffer[0] & 0x7C) == 0x70) { US_DEBUGP("-- SANE_SENSE support enabled\n"); us->fflags |= US_FL_SANE_SENSE; diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h index d4f034ebaa8a..64a0a2c27e12 100644 --- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -818,6 +818,13 @@ UNUSUAL_DEV( 0x066f, 0x8000, 0x0001, 0x0001, US_SC_DEVICE, US_PR_DEVICE, NULL, US_FL_FIX_CAPACITY ), +/* Reported by Daniel Kukula */ +UNUSUAL_DEV( 0x067b, 0x1063, 0x0100, 0x0100, + "Prolific Technology, Inc.", + "Prolific Storage Gadget", + US_SC_DEVICE, US_PR_DEVICE, NULL, + US_FL_BAD_SENSE ), + /* Reported by Rogerio Brito */ UNUSUAL_DEV( 0x067b, 0x2317, 0x0001, 0x001, "Prolific Technology, Inc.", diff --git a/drivers/usb/storage/usb.c b/drivers/usb/storage/usb.c index 1599d86154c4..f5c0264caa33 100644 --- a/drivers/usb/storage/usb.c +++ b/drivers/usb/storage/usb.c @@ -463,6 +463,9 @@ static void adjust_quirks(struct us_data *us) case 'a': f |= US_FL_SANE_SENSE; break; + case 'b': + f |= US_FL_BAD_SENSE; + break; case 'c': f |= US_FL_FIX_CAPACITY; break; diff --git a/include/linux/usb_usual.h b/include/linux/usb_usual.h index 3d15fb9bc116..a4b947e470a5 100644 --- a/include/linux/usb_usual.h +++ b/include/linux/usb_usual.h @@ -56,7 +56,9 @@ US_FLAG(SANE_SENSE, 0x00008000) \ /* Sane Sense (> 18 bytes) */ \ US_FLAG(CAPACITY_OK, 0x00010000) \ - /* READ CAPACITY response is correct */ + /* READ CAPACITY response is correct */ \ + US_FLAG(BAD_SENSE, 0x00020000) \ + /* Bad Sense (never more than 18 bytes) */ #define US_FLAG(name, value) US_FL_##name = value , enum { US_DO_ALL_FLAGS }; -- cgit v1.2.3 From 0c7a2b72746a96f999fd2728520d03d94879be69 Mon Sep 17 00:00:00 2001 From: CHENG Renquan Date: Sun, 22 Nov 2009 01:28:52 +0800 Subject: USB: add remove_id sysfs attr for usb drivers Accroding commit 0994375e, which is adding remove_id sysfs attr for pci drivers, for management tools dynamically bind/unbind a pci/usb devices to a specified drivers; with this patch, the management tools can be simplied. And the original code didn't handle the failure of usb_create_newid_file, fixed in this patch. Signed-off-by: CHENG Renquan Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-bus-usb | 13 +++++ drivers/usb/core/driver.c | 100 +++++++++++++++++++++++++++++--- 2 files changed, 104 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index 7772928ee48f..deb6b489e4e5 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -144,3 +144,16 @@ Description: Write a 1 to force the device to disconnect (equivalent to unplugging a wired USB device). + +What: /sys/bus/usb/drivers/.../remove_id +Date: November 2009 +Contact: CHENG Renquan +Description: + Writing a device ID to this file will remove an ID + that was dynamically added via the new_id sysfs entry. + The format for the device ID is: + idVendor idProduct. After successfully + removing an ID, the driver will no longer support the + device. This is useful to ensure auto probing won't + match the driver to the device. For example: + # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 7a05bab73960..60a45f1e3a67 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -83,6 +83,47 @@ static ssize_t store_new_id(struct device_driver *driver, } static DRIVER_ATTR(new_id, S_IWUSR, NULL, store_new_id); +/** + * store_remove_id - remove a USB device ID from this driver + * @driver: target device driver + * @buf: buffer for scanning device ID data + * @count: input size + * + * Removes a dynamic usb device ID from this driver. + */ +static ssize_t +store_remove_id(struct device_driver *driver, const char *buf, size_t count) +{ + struct usb_dynid *dynid, *n; + struct usb_driver *usb_driver = to_usb_driver(driver); + u32 idVendor = 0; + u32 idProduct = 0; + int fields = 0; + int retval = 0; + + fields = sscanf(buf, "%x %x", &idVendor, &idProduct); + if (fields < 2) + return -EINVAL; + + spin_lock(&usb_driver->dynids.lock); + list_for_each_entry_safe(dynid, n, &usb_driver->dynids.list, node) { + struct usb_device_id *id = &dynid->id; + if ((id->idVendor == idVendor) && + (id->idProduct == idProduct)) { + list_del(&dynid->node); + kfree(dynid); + retval = 0; + break; + } + } + spin_unlock(&usb_driver->dynids.lock); + + if (retval) + return retval; + return count; +} +static DRIVER_ATTR(remove_id, S_IWUSR, NULL, store_remove_id); + static int usb_create_newid_file(struct usb_driver *usb_drv) { int error = 0; @@ -107,6 +148,21 @@ static void usb_remove_newid_file(struct usb_driver *usb_drv) &driver_attr_new_id); } +static int +usb_create_removeid_file(struct usb_driver *drv) +{ + int error = 0; + if (drv->probe != NULL) + error = driver_create_file(&drv->drvwrap.driver, + &driver_attr_remove_id); + return error; +} + +static void usb_remove_removeid_file(struct usb_driver *drv) +{ + driver_remove_file(&drv->drvwrap.driver, &driver_attr_remove_id); +} + static void usb_free_dynids(struct usb_driver *usb_drv) { struct usb_dynid *dynid, *n; @@ -128,6 +184,16 @@ static void usb_remove_newid_file(struct usb_driver *usb_drv) { } +static int +usb_create_removeid_file(struct usb_driver *drv) +{ + return 0; +} + +static void usb_remove_removeid_file(struct usb_driver *drv) +{ +} + static inline void usb_free_dynids(struct usb_driver *usb_drv) { } @@ -774,19 +840,34 @@ int usb_register_driver(struct usb_driver *new_driver, struct module *owner, INIT_LIST_HEAD(&new_driver->dynids.list); retval = driver_register(&new_driver->drvwrap.driver); + if (retval) + goto out; - if (!retval) { - pr_info("%s: registered new interface driver %s\n", + usbfs_update_special(); + + retval = usb_create_newid_file(new_driver); + if (retval) + goto out_newid; + + retval = usb_create_removeid_file(new_driver); + if (retval) + goto out_removeid; + + pr_info("%s: registered new interface driver %s\n", usbcore_name, new_driver->name); - usbfs_update_special(); - usb_create_newid_file(new_driver); - } else { - printk(KERN_ERR "%s: error %d registering interface " - " driver %s\n", - usbcore_name, retval, new_driver->name); - } +out: return retval; + +out_removeid: + usb_remove_newid_file(new_driver); +out_newid: + driver_unregister(&new_driver->drvwrap.driver); + + printk(KERN_ERR "%s: error %d registering interface " + " driver %s\n", + usbcore_name, retval, new_driver->name); + goto out; } EXPORT_SYMBOL_GPL(usb_register_driver); @@ -806,6 +887,7 @@ void usb_deregister(struct usb_driver *driver) pr_info("%s: deregistering interface driver %s\n", usbcore_name, driver->name); + usb_remove_removeid_file(driver); usb_remove_newid_file(driver); usb_free_dynids(driver); driver_unregister(&driver->drvwrap.driver); -- cgit v1.2.3 From d75757abd01672608289dbed2755bdcf822fb592 Mon Sep 17 00:00:00 2001 From: "H. Peter Anvin" Date: Fri, 11 Dec 2009 14:23:44 -0800 Subject: doc: Add documentation for bootloader_{type,version} Add documentation for kernel/bootloader_type and kernel/bootloader_version to sysctl/kernel.txt. This should really have been done a long time ago. Signed-off-by: H. Peter Anvin Cc: Shen Feng --- Documentation/sysctl/kernel.txt | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 8f7a0e73ef44..3894eaa23486 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -19,6 +19,8 @@ Currently, these files might (depending on your configuration) show up in /proc/sys/kernel: - acpi_video_flags - acct +- bootloader_type [ X86 only ] +- bootloader_version [ X86 only ] - callhome [ S390 only ] - auto_msgmni - core_pattern @@ -93,6 +95,35 @@ valid for 30 seconds. ============================================================== +bootloader_type: + +x86 bootloader identification + +This gives the bootloader type number as indicated by the bootloader, +shifted left by 4, and OR'd with the low four bits of the bootloader +version. The reason for this encoding is that this used to match the +type_of_loader field in the kernel header; the encoding is kept for +backwards compatibility. That is, if the full bootloader type number +is 0x15 and the full version number is 0x234, this file will contain +the value 340 = 0x154. + +See the type_of_loader and ext_loader_type fields in +Documentation/x86/boot.txt for additional information. + +============================================================== + +bootloader_version: + +x86 bootloader version + +The complete bootloader version number. In the example above, this +file will contain the value 564 = 0x234. + +See the type_of_loader and ext_loader_ver fields in +Documentation/x86/boot.txt for additional information. + +============================================================== + callhome: Controls the kernel's callhome behavior in case of a kernel panic. -- cgit v1.2.3 From f53a2ade0bb9f2a81f473e6469155172a96b7c38 Mon Sep 17 00:00:00 2001 From: Alan Cox Date: Fri, 9 Oct 2009 12:56:41 +0100 Subject: tty: esp: remove broken driver The ESP driver has been marked broken for years. It's an old ISA device that clearly nobody cares about any more. Remove it Signed-off-by: Alan Cox Signed-off-by: Greg Kroah-Hartman --- Documentation/serial/hayes-esp.txt | 154 --- drivers/char/Kconfig | 13 - drivers/char/Makefile | 1 - drivers/char/esp.c | 2533 ------------------------------------ include/linux/Kbuild | 1 - include/linux/hayesesp.h | 114 -- 6 files changed, 2816 deletions(-) delete mode 100644 Documentation/serial/hayes-esp.txt delete mode 100644 drivers/char/esp.c delete mode 100644 include/linux/hayesesp.h (limited to 'Documentation') diff --git a/Documentation/serial/hayes-esp.txt b/Documentation/serial/hayes-esp.txt deleted file mode 100644 index 09b5d5856758..000000000000 --- a/Documentation/serial/hayes-esp.txt +++ /dev/null @@ -1,154 +0,0 @@ -HAYES ESP DRIVER VERSION 2.1 - -A big thanks to the people at Hayes, especially Alan Adamson. Their support -has enabled me to provide enhancements to the driver. - -Please report your experiences with this driver to me (arobinso@nyx.net). I -am looking for both positive and negative feedback. - -*** IMPORTANT CHANGES FOR 2.1 *** -Support for PIO mode. Five situations will cause PIO mode to be used: -1) A multiport card is detected. PIO mode will always be used. (8 port cards -do not support DMA). -2) The DMA channel is set to an invalid value (anything other than 1 or 3). -3) The DMA buffer/channel could not be allocated. The port will revert to PIO -mode until it is reopened. -4) Less than a specified number of bytes need to be transferred to/from the -FIFOs. PIO mode will be used for that transfer only. -5) A port needs to do a DMA transfer and another port is already using the -DMA channel. PIO mode will be used for that transfer only. - -Since the Hayes ESP seems to conflict with other cards (notably sound cards) -when using DMA, DMA is turned off by default. To use DMA, it must be turned -on explicitly, either with the "dma=" option described below or with -setserial. A multiport card can be forced into DMA mode by using setserial; -however, most multiport cards don't support DMA. - -The latest version of setserial allows the enhanced configuration of the ESP -card to be viewed and modified. -*** - -This package contains the files needed to compile a module to support the Hayes -ESP card. The drivers are basically a modified version of the serial drivers. - -Features: - -- Uses the enhanced mode of the ESP card, allowing a wider range of - interrupts and features than compatibility mode -- Uses DMA and 16 bit PIO mode to transfer data to and from the ESP's FIFOs, - reducing CPU load -- Supports primary and secondary ports - - -If the driver is compiled as a module, the IRQs to use can be specified by -using the irq= option. The format is: - -irq=[0x100],[0x140],[0x180],[0x200],[0x240],[0x280],[0x300],[0x380] - -The address in brackets is the base address of the card. The IRQ of -nonexistent cards can be set to 0. If an IRQ of a card that does exist is set -to 0, the driver will attempt to guess at the correct IRQ. For example, to set -the IRQ of the card at address 0x300 to 12, the insmod command would be: - -insmod esp irq=0,0,0,0,0,0,12,0 - -The custom divisor can be set by using the divisor= option. The format is the -same as for the irq= option. Each divisor value is a series of hex digits, -with each digit representing the divisor to use for a corresponding port. The -divisor value is constructed RIGHT TO LEFT. Specifying a nonzero divisor value -will automatically set the spd_cust flag. To calculate the divisor to use for -a certain baud rate, divide the port's base baud (generally 921600) by the -desired rate. For example, to set the divisor of the primary port at 0x300 to -4 and the divisor of the secondary port at 0x308 to 8, the insmod command would -be: - -insmod esp divisor=0,0,0,0,0,0,0x84,0 - -The dma= option can be used to set the DMA channel. The channel can be either -1 or 3. Specifying any other value will force the driver to use PIO mode. -For example, to set the DMA channel to 3, the insmod command would be: - -insmod esp dma=3 - -The rx_trigger= and tx_trigger= options can be used to set the FIFO trigger -levels. They specify when the ESP card should send an interrupt. Larger -values will decrease the number of interrupts; however, a value too high may -result in data loss. Valid values are 1 through 1023, with 768 being the -default. For example, to set the receive trigger level to 512 bytes and the -transmit trigger level to 700 bytes, the insmod command would be: - -insmod esp rx_trigger=512 tx_trigger=700 - -The flow_off= and flow_on= options can be used to set the hardware flow off/ -flow on levels. The flow on level must be lower than the flow off level, and -the flow off level should be higher than rx_trigger. Valid values are 1 -through 1023, with 1016 being the default flow off level and 944 being the -default flow on level. For example, to set the flow off level to 1000 bytes -and the flow on level to 935 bytes, the insmod command would be: - -insmod esp flow_off=1000 flow_on=935 - -The rx_timeout= option can be used to set the receive timeout value. This -value indicates how long after receiving the last character that the ESP card -should wait before signalling an interrupt. Valid values are 0 though 255, -with 128 being the default. A value too high will increase latency, and a -value too low will cause unnecessary interrupts. For example, to set the -receive timeout to 255, the insmod command would be: - -insmod esp rx_timeout=255 - -The pio_threshold= option sets the threshold (in number of characters) for -using PIO mode instead of DMA mode. For example, if this value is 32, -transfers of 32 bytes or less will always use PIO mode. - -insmod esp pio_threshold=32 - -Multiple options can be listed on the insmod command line by separating each -option with a space. For example: - -insmod esp dma=3 trigger=512 - -The esp module can be automatically loaded when needed. To cause this to -happen, add the following lines to /etc/modprobe.conf (replacing the last line -with options for your configuration): - -alias char-major-57 esp -alias char-major-58 esp -options esp irq=0,0,0,0,0,0,3,0 divisor=0,0,0,0,0,0,0x4,0 - -You may also need to run 'depmod -a'. - -Devices must be created manually. To create the devices, note the output from -the module after it is inserted. The output will appear in the location where -kernel messages usually appear (usually /var/adm/messages). Create two devices -for each 'tty' mentioned, one with major of 57 and the other with major of 58. -The minor number should be the same as the tty number reported. The commands -would be (replace ? with the tty number): - -mknod /dev/ttyP? c 57 ? -mknod /dev/cup? c 58 ? - -For example, if the following line appears: - -Oct 24 18:17:23 techno kernel: ttyP8 at 0x0140 (irq = 3) is an ESP primary port - -...two devices should be created: - -mknod /dev/ttyP8 c 57 8 -mknod /dev/cup8 c 58 8 - -You may need to set the permissions on the devices: - -chmod 666 /dev/ttyP* -chmod 666 /dev/cup* - -The ESP module and the serial module should not conflict (they can be used at -the same time). After the ESP module has been loaded the ports on the ESP card -will no longer be accessible by the serial driver. - -If I/O errors are experienced when accessing the port, check for IRQ and DMA -conflicts ('cat /proc/interrupts' and 'cat /proc/dma' for a list of IRQs and -DMAs currently in use). - -Enjoy! -Andrew J. Robinson diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index 6aad99ec4e0f..6f31c9472100 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -201,19 +201,6 @@ config DIGIEPCA To compile this driver as a module, choose M here: the module will be called epca. -config ESPSERIAL - tristate "Hayes ESP serial port support" - depends on SERIAL_NONSTANDARD && ISA && ISA_DMA_API && BROKEN - help - This is a driver which supports Hayes ESP serial ports. Both single - port cards and multiport cards are supported. Make sure to read - . - - To compile this driver as a module, choose M here: the - module will be called esp. - - If unsure, say N. - config MOXA_INTELLIO tristate "Moxa Intellio support" depends on SERIAL_NONSTANDARD && (ISA || EISA || PCI) diff --git a/drivers/char/Makefile b/drivers/char/Makefile index 19a79dd79eee..f957edf7e45d 100644 --- a/drivers/char/Makefile +++ b/drivers/char/Makefile @@ -18,7 +18,6 @@ obj-$(CONFIG_CONSOLE_TRANSLATIONS) += consolemap.o consolemap_deftbl.o obj-$(CONFIG_HW_CONSOLE) += vt.o defkeymap.o obj-$(CONFIG_AUDIT) += tty_audit.o obj-$(CONFIG_MAGIC_SYSRQ) += sysrq.o -obj-$(CONFIG_ESPSERIAL) += esp.o obj-$(CONFIG_MVME147_SCC) += generic_serial.o vme_scc.o obj-$(CONFIG_MVME162_SCC) += generic_serial.o vme_scc.o obj-$(CONFIG_BVME6000_SCC) += generic_serial.o vme_scc.o diff --git a/drivers/char/esp.c b/drivers/char/esp.c deleted file mode 100644 index b19d43cd9542..000000000000 --- a/drivers/char/esp.c +++ /dev/null @@ -1,2533 +0,0 @@ -/* - * esp.c - driver for Hayes ESP serial cards - * - * --- Notices from serial.c, upon which this driver is based --- - * - * Copyright (C) 1991, 1992 Linus Torvalds - * - * Extensively rewritten by Theodore Ts'o, 8/16/92 -- 9/14/92. Now - * much more extensible to support other serial cards based on the - * 16450/16550A UART's. Added support for the AST FourPort and the - * Accent Async board. - * - * set_serial_info fixed to set the flags, custom divisor, and uart - * type fields. Fix suggested by Michael K. Johnson 12/12/92. - * - * 11/95: TIOCMIWAIT, TIOCGICOUNT by Angelo Haritsis - * - * 03/96: Modularised by Angelo Haritsis - * - * rs_set_termios fixed to look also for changes of the input - * flags INPCK, BRKINT, PARMRK, IGNPAR and IGNBRK. - * Bernd Anhäupl 05/17/96. - * - * --- End of notices from serial.c --- - * - * Support for the ESP serial card by Andrew J. Robinson - * (Card detection routine taken from a patch - * by Dennis J. Boylan). Patches to allow use with 2.1.x contributed - * by Chris Faylor. - * - * Most recent changes: (Andrew J. Robinson) - * Support for PIO mode. This allows the driver to work properly with - * multiport cards. - * - * Arnaldo Carvalho de Melo - - * several cleanups, use module_init/module_exit, etc - * - * This module exports the following rs232 io functions: - * - * int espserial_init(void); - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -#include -#include -#include - -#include - -#define NR_PORTS 64 /* maximum number of ports */ -#define NR_PRIMARY 8 /* maximum number of primary ports */ -#define REGION_SIZE 8 /* size of io region to request */ - -/* The following variables can be set by giving module options */ -static int irq[NR_PRIMARY]; /* IRQ for each base port */ -static unsigned int divisor[NR_PRIMARY]; /* custom divisor for each port */ -static unsigned int dma = ESP_DMA_CHANNEL; /* DMA channel */ -static unsigned int rx_trigger = ESP_RX_TRIGGER; -static unsigned int tx_trigger = ESP_TX_TRIGGER; -static unsigned int flow_off = ESP_FLOW_OFF; -static unsigned int flow_on = ESP_FLOW_ON; -static unsigned int rx_timeout = ESP_RX_TMOUT; -static unsigned int pio_threshold = ESP_PIO_THRESHOLD; - -MODULE_LICENSE("GPL"); - -module_param_array(irq, int, NULL, 0); -module_param_array(divisor, uint, NULL, 0); -module_param(dma, uint, 0); -module_param(rx_trigger, uint, 0); -module_param(tx_trigger, uint, 0); -module_param(flow_off, uint, 0); -module_param(flow_on, uint, 0); -module_param(rx_timeout, uint, 0); -module_param(pio_threshold, uint, 0); - -/* END */ - -static char *dma_buffer; -static int dma_bytes; -static struct esp_pio_buffer *free_pio_buf; - -#define DMA_BUFFER_SZ 1024 - -#define WAKEUP_CHARS 1024 - -static char serial_name[] __initdata = "ESP serial driver"; -static char serial_version[] __initdata = "2.2"; - -static struct tty_driver *esp_driver; - -/* - * Serial driver configuration section. Here are the various options: - * - * SERIAL_PARANOIA_CHECK - * Check the magic number for the esp_structure where - * ever possible. - */ - -#undef SERIAL_PARANOIA_CHECK -#define SERIAL_DO_RESTART - -#undef SERIAL_DEBUG_INTR -#undef SERIAL_DEBUG_OPEN -#undef SERIAL_DEBUG_FLOW - -#if defined(MODULE) && defined(SERIAL_DEBUG_MCOUNT) -#define DBG_CNT(s) printk(KERN_DEBUG "(%s): [%x] refc=%d, serc=%d, ttyc=%d -> %s\n", \ - tty->name, info->port.flags, \ - serial_driver.refcount, \ - info->port.count, tty->count, s) -#else -#define DBG_CNT(s) -#endif - -static struct esp_struct *ports; - -static void change_speed(struct esp_struct *info); -static void rs_wait_until_sent(struct tty_struct *, int); - -/* - * The ESP card has a clock rate of 14.7456 MHz (that is, 2**ESPC_SCALE - * times the normal 1.8432 Mhz clock of most serial boards). - */ -#define BASE_BAUD ((1843200 / 16) * (1 << ESPC_SCALE)) - -/* Standard COM flags (except for COM4, because of the 8514 problem) */ -#define STD_COM_FLAGS (ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST) - -static inline int serial_paranoia_check(struct esp_struct *info, - char *name, const char *routine) -{ -#ifdef SERIAL_PARANOIA_CHECK - static const char badmagic[] = KERN_WARNING - "Warning: bad magic number for serial struct (%s) in %s\n"; - static const char badinfo[] = KERN_WARNING - "Warning: null esp_struct for (%s) in %s\n"; - - if (!info) { - printk(badinfo, name, routine); - return 1; - } - if (info->magic != ESP_MAGIC) { - printk(badmagic, name, routine); - return 1; - } -#endif - return 0; -} - -static inline unsigned int serial_in(struct esp_struct *info, int offset) -{ - return inb(info->io_port + offset); -} - -static inline void serial_out(struct esp_struct *info, int offset, - unsigned char value) -{ - outb(value, info->io_port+offset); -} - -/* - * ------------------------------------------------------------ - * rs_stop() and rs_start() - * - * This routines are called before setting or resetting tty->stopped. - * They enable or disable transmitter interrupts, as necessary. - * ------------------------------------------------------------ - */ -static void rs_stop(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_stop")) - return; - - spin_lock_irqsave(&info->lock, flags); - if (info->IER & UART_IER_THRI) { - info->IER &= ~UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } - spin_unlock_irqrestore(&info->lock, flags); -} - -static void rs_start(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_start")) - return; - - spin_lock_irqsave(&info->lock, flags); - if (info->xmit_cnt && info->xmit_buf && !(info->IER & UART_IER_THRI)) { - info->IER |= UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } - spin_unlock_irqrestore(&info->lock, flags); -} - -/* - * ---------------------------------------------------------------------- - * - * Here starts the interrupt handling routines. All of the following - * subroutines are declared as inline and are folded into - * rs_interrupt(). They were separated out for readability's sake. - * - * Note: rs_interrupt() is a "fast" interrupt, which means that it - * runs with interrupts turned off. People who may want to modify - * rs_interrupt() should try to keep the interrupt handler as fast as - * possible. After you are done making modifications, it is not a bad - * idea to do: - * - * gcc -S -DKERNEL -Wall -Wstrict-prototypes -O6 -fomit-frame-pointer serial.c - * - * and look at the resulting assemble code in serial.s. - * - * - Ted Ts'o (tytso@mit.edu), 7-Mar-93 - * ----------------------------------------------------------------------- - */ - -static DEFINE_SPINLOCK(pio_lock); - -static inline struct esp_pio_buffer *get_pio_buffer(void) -{ - struct esp_pio_buffer *buf; - unsigned long flags; - - spin_lock_irqsave(&pio_lock, flags); - if (free_pio_buf) { - buf = free_pio_buf; - free_pio_buf = buf->next; - } else { - buf = kmalloc(sizeof(struct esp_pio_buffer), GFP_ATOMIC); - } - spin_unlock_irqrestore(&pio_lock, flags); - return buf; -} - -static inline void release_pio_buffer(struct esp_pio_buffer *buf) -{ - unsigned long flags; - spin_lock_irqsave(&pio_lock, flags); - buf->next = free_pio_buf; - free_pio_buf = buf; - spin_unlock_irqrestore(&pio_lock, flags); -} - -static inline void receive_chars_pio(struct esp_struct *info, int num_bytes) -{ - struct tty_struct *tty = info->port.tty; - int i; - struct esp_pio_buffer *pio_buf; - struct esp_pio_buffer *err_buf; - unsigned char status_mask; - - pio_buf = get_pio_buffer(); - - if (!pio_buf) - return; - - err_buf = get_pio_buffer(); - - if (!err_buf) { - release_pio_buffer(pio_buf); - return; - } - - status_mask = (info->read_status_mask >> 2) & 0x07; - - for (i = 0; i < num_bytes - 1; i += 2) { - *((unsigned short *)(pio_buf->data + i)) = - inw(info->io_port + UART_ESI_RX); - err_buf->data[i] = serial_in(info, UART_ESI_RWS); - err_buf->data[i + 1] = (err_buf->data[i] >> 3) & status_mask; - err_buf->data[i] &= status_mask; - } - - if (num_bytes & 0x0001) { - pio_buf->data[num_bytes - 1] = serial_in(info, UART_ESI_RX); - err_buf->data[num_bytes - 1] = - (serial_in(info, UART_ESI_RWS) >> 3) & status_mask; - } - - /* make sure everything is still ok since interrupts were enabled */ - tty = info->port.tty; - - if (!tty) { - release_pio_buffer(pio_buf); - release_pio_buffer(err_buf); - info->stat_flags &= ~ESP_STAT_RX_TIMEOUT; - return; - } - - status_mask = (info->ignore_status_mask >> 2) & 0x07; - - for (i = 0; i < num_bytes; i++) { - if (!(err_buf->data[i] & status_mask)) { - int flag = 0; - - if (err_buf->data[i] & 0x04) { - flag = TTY_BREAK; - if (info->port.flags & ASYNC_SAK) - do_SAK(tty); - } else if (err_buf->data[i] & 0x02) - flag = TTY_FRAME; - else if (err_buf->data[i] & 0x01) - flag = TTY_PARITY; - tty_insert_flip_char(tty, pio_buf->data[i], flag); - } - } - - tty_schedule_flip(tty); - - info->stat_flags &= ~ESP_STAT_RX_TIMEOUT; - release_pio_buffer(pio_buf); - release_pio_buffer(err_buf); -} - -static void program_isa_dma(int dma, int dir, unsigned long addr, int len) -{ - unsigned long flags; - - flags = claim_dma_lock(); - disable_dma(dma); - clear_dma_ff(dma); - set_dma_mode(dma, dir); - set_dma_addr(dma, addr); - set_dma_count(dma, len); - enable_dma(dma); - release_dma_lock(flags); -} - -static void receive_chars_dma(struct esp_struct *info, int num_bytes) -{ - info->stat_flags &= ~ESP_STAT_RX_TIMEOUT; - dma_bytes = num_bytes; - info->stat_flags |= ESP_STAT_DMA_RX; - - program_isa_dma(dma, DMA_MODE_READ, isa_virt_to_bus(dma_buffer), - dma_bytes); - serial_out(info, UART_ESI_CMD1, ESI_START_DMA_RX); -} - -static inline void receive_chars_dma_done(struct esp_struct *info, - int status) -{ - struct tty_struct *tty = info->port.tty; - int num_bytes; - unsigned long flags; - - flags = claim_dma_lock(); - disable_dma(dma); - clear_dma_ff(dma); - - info->stat_flags &= ~ESP_STAT_DMA_RX; - num_bytes = dma_bytes - get_dma_residue(dma); - release_dma_lock(flags); - - info->icount.rx += num_bytes; - - if (num_bytes > 0) { - tty_insert_flip_string(tty, dma_buffer, num_bytes - 1); - - status &= (0x1c & info->read_status_mask); - - /* Is the status significant or do we throw the last byte ? */ - if (!(status & info->ignore_status_mask)) { - int statflag = 0; - - if (status & 0x10) { - statflag = TTY_BREAK; - (info->icount.brk)++; - if (info->port.flags & ASYNC_SAK) - do_SAK(tty); - } else if (status & 0x08) { - statflag = TTY_FRAME; - info->icount.frame++; - } else if (status & 0x04) { - statflag = TTY_PARITY; - info->icount.parity++; - } - tty_insert_flip_char(tty, dma_buffer[num_bytes - 1], - statflag); - } - tty_schedule_flip(tty); - } - - if (dma_bytes != num_bytes) { - num_bytes = dma_bytes - num_bytes; - dma_bytes = 0; - receive_chars_dma(info, num_bytes); - } else - dma_bytes = 0; -} - -/* Caller must hold info->lock */ - -static inline void transmit_chars_pio(struct esp_struct *info, - int space_avail) -{ - int i; - struct esp_pio_buffer *pio_buf; - - pio_buf = get_pio_buffer(); - - if (!pio_buf) - return; - - while (space_avail && info->xmit_cnt) { - if (info->xmit_tail + space_avail <= ESP_XMIT_SIZE) { - memcpy(pio_buf->data, - &(info->xmit_buf[info->xmit_tail]), - space_avail); - } else { - i = ESP_XMIT_SIZE - info->xmit_tail; - memcpy(pio_buf->data, - &(info->xmit_buf[info->xmit_tail]), i); - memcpy(&(pio_buf->data[i]), info->xmit_buf, - space_avail - i); - } - - info->xmit_cnt -= space_avail; - info->xmit_tail = (info->xmit_tail + space_avail) & - (ESP_XMIT_SIZE - 1); - - for (i = 0; i < space_avail - 1; i += 2) { - outw(*((unsigned short *)(pio_buf->data + i)), - info->io_port + UART_ESI_TX); - } - - if (space_avail & 0x0001) - serial_out(info, UART_ESI_TX, - pio_buf->data[space_avail - 1]); - - if (info->xmit_cnt) { - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_TX_AVAIL); - space_avail = serial_in(info, UART_ESI_STAT1) << 8; - space_avail |= serial_in(info, UART_ESI_STAT2); - - if (space_avail > info->xmit_cnt) - space_avail = info->xmit_cnt; - } - } - - if (info->xmit_cnt < WAKEUP_CHARS) { - if (info->port.tty) - tty_wakeup(info->port.tty); - -#ifdef SERIAL_DEBUG_INTR - printk("THRE..."); -#endif - - if (info->xmit_cnt <= 0) { - info->IER &= ~UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, - ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } - } - - release_pio_buffer(pio_buf); -} - -/* Caller must hold info->lock */ -static inline void transmit_chars_dma(struct esp_struct *info, int num_bytes) -{ - dma_bytes = num_bytes; - - if (info->xmit_tail + dma_bytes <= ESP_XMIT_SIZE) { - memcpy(dma_buffer, &(info->xmit_buf[info->xmit_tail]), - dma_bytes); - } else { - int i = ESP_XMIT_SIZE - info->xmit_tail; - memcpy(dma_buffer, &(info->xmit_buf[info->xmit_tail]), - i); - memcpy(&(dma_buffer[i]), info->xmit_buf, dma_bytes - i); - } - - info->xmit_cnt -= dma_bytes; - info->xmit_tail = (info->xmit_tail + dma_bytes) & (ESP_XMIT_SIZE - 1); - - if (info->xmit_cnt < WAKEUP_CHARS) { - if (info->port.tty) - tty_wakeup(info->port.tty); - -#ifdef SERIAL_DEBUG_INTR - printk("THRE..."); -#endif - - if (info->xmit_cnt <= 0) { - info->IER &= ~UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } - } - - info->stat_flags |= ESP_STAT_DMA_TX; - - program_isa_dma(dma, DMA_MODE_WRITE, isa_virt_to_bus(dma_buffer), - dma_bytes); - serial_out(info, UART_ESI_CMD1, ESI_START_DMA_TX); -} - -static inline void transmit_chars_dma_done(struct esp_struct *info) -{ - int num_bytes; - unsigned long flags; - - flags = claim_dma_lock(); - disable_dma(dma); - clear_dma_ff(dma); - - num_bytes = dma_bytes - get_dma_residue(dma); - info->icount.tx += dma_bytes; - release_dma_lock(flags); - - if (dma_bytes != num_bytes) { - dma_bytes -= num_bytes; - memmove(dma_buffer, dma_buffer + num_bytes, dma_bytes); - - program_isa_dma(dma, DMA_MODE_WRITE, - isa_virt_to_bus(dma_buffer), dma_bytes); - - serial_out(info, UART_ESI_CMD1, ESI_START_DMA_TX); - } else { - dma_bytes = 0; - info->stat_flags &= ~ESP_STAT_DMA_TX; - } -} - -static void check_modem_status(struct esp_struct *info) -{ - int status; - - serial_out(info, UART_ESI_CMD1, ESI_GET_UART_STAT); - status = serial_in(info, UART_ESI_STAT2); - - if (status & UART_MSR_ANY_DELTA) { - /* update input line counters */ - if (status & UART_MSR_TERI) - info->icount.rng++; - if (status & UART_MSR_DDSR) - info->icount.dsr++; - if (status & UART_MSR_DDCD) - info->icount.dcd++; - if (status & UART_MSR_DCTS) - info->icount.cts++; - wake_up_interruptible(&info->port.delta_msr_wait); - } - - if ((info->port.flags & ASYNC_CHECK_CD) && (status & UART_MSR_DDCD)) { -#if (defined(SERIAL_DEBUG_OPEN) || defined(SERIAL_DEBUG_INTR)) - printk("ttys%d CD now %s...", info->line, - (status & UART_MSR_DCD) ? "on" : "off"); -#endif - if (status & UART_MSR_DCD) - wake_up_interruptible(&info->port.open_wait); - else { -#ifdef SERIAL_DEBUG_OPEN - printk("scheduling hangup..."); -#endif - tty_hangup(info->port.tty); - } - } -} - -/* - * This is the serial driver's interrupt routine - */ -static irqreturn_t rs_interrupt_single(int irq, void *dev_id) -{ - struct esp_struct *info; - unsigned err_status; - unsigned int scratch; - -#ifdef SERIAL_DEBUG_INTR - printk("rs_interrupt_single(%d)...", irq); -#endif - info = (struct esp_struct *)dev_id; - err_status = 0; - scratch = serial_in(info, UART_ESI_SID); - - spin_lock(&info->lock); - - if (!info->port.tty) { - spin_unlock(&info->lock); - return IRQ_NONE; - } - - if (scratch & 0x04) { /* error */ - serial_out(info, UART_ESI_CMD1, ESI_GET_ERR_STAT); - err_status = serial_in(info, UART_ESI_STAT1); - serial_in(info, UART_ESI_STAT2); - - if (err_status & 0x01) - info->stat_flags |= ESP_STAT_RX_TIMEOUT; - - if (err_status & 0x20) /* UART status */ - check_modem_status(info); - - if (err_status & 0x80) /* Start break */ - wake_up_interruptible(&info->break_wait); - } - - if ((scratch & 0x88) || /* DMA completed or timed out */ - (err_status & 0x1c) /* receive error */) { - if (info->stat_flags & ESP_STAT_DMA_RX) - receive_chars_dma_done(info, err_status); - else if (info->stat_flags & ESP_STAT_DMA_TX) - transmit_chars_dma_done(info); - } - - if (!(info->stat_flags & (ESP_STAT_DMA_RX | ESP_STAT_DMA_TX)) && - ((scratch & 0x01) || (info->stat_flags & ESP_STAT_RX_TIMEOUT)) && - (info->IER & UART_IER_RDI)) { - int num_bytes; - - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_RX_AVAIL); - num_bytes = serial_in(info, UART_ESI_STAT1) << 8; - num_bytes |= serial_in(info, UART_ESI_STAT2); - - num_bytes = tty_buffer_request_room(info->port.tty, num_bytes); - - if (num_bytes) { - if (dma_bytes || - (info->stat_flags & ESP_STAT_USE_PIO) || - (num_bytes <= info->config.pio_threshold)) - receive_chars_pio(info, num_bytes); - else - receive_chars_dma(info, num_bytes); - } - } - - if (!(info->stat_flags & (ESP_STAT_DMA_RX | ESP_STAT_DMA_TX)) && - (scratch & 0x02) && (info->IER & UART_IER_THRI)) { - if ((info->xmit_cnt <= 0) || info->port.tty->stopped) { - info->IER &= ~UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } else { - int num_bytes; - - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_TX_AVAIL); - num_bytes = serial_in(info, UART_ESI_STAT1) << 8; - num_bytes |= serial_in(info, UART_ESI_STAT2); - - if (num_bytes > info->xmit_cnt) - num_bytes = info->xmit_cnt; - - if (num_bytes) { - if (dma_bytes || - (info->stat_flags & ESP_STAT_USE_PIO) || - (num_bytes <= info->config.pio_threshold)) - transmit_chars_pio(info, num_bytes); - else - transmit_chars_dma(info, num_bytes); - } - } - } - - info->last_active = jiffies; - -#ifdef SERIAL_DEBUG_INTR - printk("end.\n"); -#endif - spin_unlock(&info->lock); - return IRQ_HANDLED; -} - -/* - * ------------------------------------------------------------------- - * Here ends the serial interrupt routines. - * ------------------------------------------------------------------- - */ - -/* - * --------------------------------------------------------------- - * Low level utility subroutines for the serial driver: routines to - * figure out the appropriate timeout for an interrupt chain, routines - * to initialize and startup a serial port, and routines to shutdown a - * serial port. Useful stuff like that. - * - * Caller should hold lock - * --------------------------------------------------------------- - */ - -static void esp_basic_init(struct esp_struct *info) -{ - /* put ESPC in enhanced mode */ - serial_out(info, UART_ESI_CMD1, ESI_SET_MODE); - - if (info->stat_flags & ESP_STAT_NEVER_DMA) - serial_out(info, UART_ESI_CMD2, 0x01); - else - serial_out(info, UART_ESI_CMD2, 0x31); - - /* disable interrupts for now */ - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, 0x00); - - /* set interrupt and DMA channel */ - serial_out(info, UART_ESI_CMD1, ESI_SET_IRQ); - - if (info->stat_flags & ESP_STAT_NEVER_DMA) - serial_out(info, UART_ESI_CMD2, 0x01); - else - serial_out(info, UART_ESI_CMD2, (dma << 4) | 0x01); - - serial_out(info, UART_ESI_CMD1, ESI_SET_ENH_IRQ); - - if (info->line % 8) /* secondary port */ - serial_out(info, UART_ESI_CMD2, 0x0d); /* shared */ - else if (info->irq == 9) - serial_out(info, UART_ESI_CMD2, 0x02); - else - serial_out(info, UART_ESI_CMD2, info->irq); - - /* set error status mask (check this) */ - serial_out(info, UART_ESI_CMD1, ESI_SET_ERR_MASK); - - if (info->stat_flags & ESP_STAT_NEVER_DMA) - serial_out(info, UART_ESI_CMD2, 0xa1); - else - serial_out(info, UART_ESI_CMD2, 0xbd); - - serial_out(info, UART_ESI_CMD2, 0x00); - - /* set DMA timeout */ - serial_out(info, UART_ESI_CMD1, ESI_SET_DMA_TMOUT); - serial_out(info, UART_ESI_CMD2, 0xff); - - /* set FIFO trigger levels */ - serial_out(info, UART_ESI_CMD1, ESI_SET_TRIGGER); - serial_out(info, UART_ESI_CMD2, info->config.rx_trigger >> 8); - serial_out(info, UART_ESI_CMD2, info->config.rx_trigger); - serial_out(info, UART_ESI_CMD2, info->config.tx_trigger >> 8); - serial_out(info, UART_ESI_CMD2, info->config.tx_trigger); - - /* Set clock scaling and wait states */ - serial_out(info, UART_ESI_CMD1, ESI_SET_PRESCALAR); - serial_out(info, UART_ESI_CMD2, 0x04 | ESPC_SCALE); - - /* set reinterrupt pacing */ - serial_out(info, UART_ESI_CMD1, ESI_SET_REINTR); - serial_out(info, UART_ESI_CMD2, 0xff); -} - -static int startup(struct esp_struct *info) -{ - unsigned long flags; - int retval = 0; - unsigned int num_chars; - - spin_lock_irqsave(&info->lock, flags); - - if (info->port.flags & ASYNC_INITIALIZED) - goto out; - - if (!info->xmit_buf) { - info->xmit_buf = (unsigned char *)get_zeroed_page(GFP_ATOMIC); - retval = -ENOMEM; - if (!info->xmit_buf) - goto out; - } - -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "starting up ttys%d (irq %d)...", - info->line, info->irq); -#endif - - /* Flush the RX buffer. Using the ESI flush command may cause */ - /* wild interrupts, so read all the data instead. */ - - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_RX_AVAIL); - num_chars = serial_in(info, UART_ESI_STAT1) << 8; - num_chars |= serial_in(info, UART_ESI_STAT2); - - while (num_chars > 1) { - inw(info->io_port + UART_ESI_RX); - num_chars -= 2; - } - - if (num_chars) - serial_in(info, UART_ESI_RX); - - /* set receive character timeout */ - serial_out(info, UART_ESI_CMD1, ESI_SET_RX_TIMEOUT); - serial_out(info, UART_ESI_CMD2, info->config.rx_timeout); - - /* clear all flags except the "never DMA" flag */ - info->stat_flags &= ESP_STAT_NEVER_DMA; - - if (info->stat_flags & ESP_STAT_NEVER_DMA) - info->stat_flags |= ESP_STAT_USE_PIO; - - spin_unlock_irqrestore(&info->lock, flags); - - /* - * Allocate the IRQ - */ - - retval = request_irq(info->irq, rs_interrupt_single, IRQF_SHARED, - "esp serial", info); - - if (retval) { - if (capable(CAP_SYS_ADMIN)) { - if (info->port.tty) - set_bit(TTY_IO_ERROR, - &info->port.tty->flags); - retval = 0; - } - goto out_unlocked; - } - - if (!(info->stat_flags & ESP_STAT_USE_PIO) && !dma_buffer) { - dma_buffer = (char *)__get_dma_pages( - GFP_KERNEL, get_order(DMA_BUFFER_SZ)); - - /* use PIO mode if DMA buf/chan cannot be allocated */ - if (!dma_buffer) - info->stat_flags |= ESP_STAT_USE_PIO; - else if (request_dma(dma, "esp serial")) { - free_pages((unsigned long)dma_buffer, - get_order(DMA_BUFFER_SZ)); - dma_buffer = NULL; - info->stat_flags |= ESP_STAT_USE_PIO; - } - - } - - info->MCR = UART_MCR_DTR | UART_MCR_RTS | UART_MCR_OUT2; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, info->MCR); - - /* - * Finally, enable interrupts - */ - /* info->IER = UART_IER_MSI | UART_IER_RLSI | UART_IER_RDI; */ - info->IER = UART_IER_RLSI | UART_IER_RDI | UART_IER_DMA_TMOUT | - UART_IER_DMA_TC; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - - if (info->port.tty) - clear_bit(TTY_IO_ERROR, &info->port.tty->flags); - info->xmit_cnt = info->xmit_head = info->xmit_tail = 0; - spin_unlock_irqrestore(&info->lock, flags); - - /* - * Set up the tty->alt_speed kludge - */ - if (info->port.tty) { - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_HI) - info->port.tty->alt_speed = 57600; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_VHI) - info->port.tty->alt_speed = 115200; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_SHI) - info->port.tty->alt_speed = 230400; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_WARP) - info->port.tty->alt_speed = 460800; - } - - /* - * set the speed of the serial port - */ - change_speed(info); - info->port.flags |= ASYNC_INITIALIZED; - return 0; - -out: - spin_unlock_irqrestore(&info->lock, flags); -out_unlocked: - return retval; -} - -/* - * This routine will shutdown a serial port; interrupts are disabled, and - * DTR is dropped if the hangup on close termio flag is on. - */ -static void shutdown(struct esp_struct *info) -{ - unsigned long flags, f; - - if (!(info->port.flags & ASYNC_INITIALIZED)) - return; - -#ifdef SERIAL_DEBUG_OPEN - printk("Shutting down serial port %d (irq %d)....", info->line, - info->irq); -#endif - - spin_lock_irqsave(&info->lock, flags); - /* - * clear delta_msr_wait queue to avoid mem leaks: we may free the irq - * here so the queue might never be waken up - */ - wake_up_interruptible(&info->port.delta_msr_wait); - wake_up_interruptible(&info->break_wait); - - /* stop a DMA transfer on the port being closed */ - /* DMA lock is higher priority always */ - if (info->stat_flags & (ESP_STAT_DMA_RX | ESP_STAT_DMA_TX)) { - f = claim_dma_lock(); - disable_dma(dma); - clear_dma_ff(dma); - release_dma_lock(f); - - dma_bytes = 0; - } - - /* - * Free the IRQ - */ - free_irq(info->irq, info); - - if (dma_buffer) { - struct esp_struct *current_port = ports; - - while (current_port) { - if ((current_port != info) && - (current_port->port.flags & ASYNC_INITIALIZED)) - break; - - current_port = current_port->next_port; - } - - if (!current_port) { - free_dma(dma); - free_pages((unsigned long)dma_buffer, - get_order(DMA_BUFFER_SZ)); - dma_buffer = NULL; - } - } - - if (info->xmit_buf) { - free_page((unsigned long) info->xmit_buf); - info->xmit_buf = NULL; - } - - info->IER = 0; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, 0x00); - - if (!info->port.tty || (info->port.tty->termios->c_cflag & HUPCL)) - info->MCR &= ~(UART_MCR_DTR|UART_MCR_RTS); - - info->MCR &= ~UART_MCR_OUT2; - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, info->MCR); - - if (info->port.tty) - set_bit(TTY_IO_ERROR, &info->port.tty->flags); - - info->port.flags &= ~ASYNC_INITIALIZED; - spin_unlock_irqrestore(&info->lock, flags); -} - -/* - * This routine is called to set the UART divisor registers to match - * the specified baud rate for a serial port. - */ -static void change_speed(struct esp_struct *info) -{ - unsigned short port; - int quot = 0; - unsigned cflag, cval; - int baud, bits; - unsigned char flow1 = 0, flow2 = 0; - unsigned long flags; - - if (!info->port.tty || !info->port.tty->termios) - return; - cflag = info->port.tty->termios->c_cflag; - port = info->io_port; - - /* byte size and parity */ - switch (cflag & CSIZE) { - case CS5: cval = 0x00; bits = 7; break; - case CS6: cval = 0x01; bits = 8; break; - case CS7: cval = 0x02; bits = 9; break; - case CS8: cval = 0x03; bits = 10; break; - default: cval = 0x00; bits = 7; break; - } - if (cflag & CSTOPB) { - cval |= 0x04; - bits++; - } - if (cflag & PARENB) { - cval |= UART_LCR_PARITY; - bits++; - } - if (!(cflag & PARODD)) - cval |= UART_LCR_EPAR; -#ifdef CMSPAR - if (cflag & CMSPAR) - cval |= UART_LCR_SPAR; -#endif - baud = tty_get_baud_rate(info->port.tty); - if (baud == 38400 && - ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_CUST)) - quot = info->custom_divisor; - else { - if (baud == 134) /* Special case since 134 is really 134.5 */ - quot = (2*BASE_BAUD / 269); - else if (baud) - quot = BASE_BAUD / baud; - } - /* If the quotient is ever zero, default to 9600 bps */ - if (!quot) - quot = BASE_BAUD / 9600; - - if (baud) { - /* Actual rate */ - baud = BASE_BAUD/quot; - tty_encode_baud_rate(info->port.tty, baud, baud); - } - info->timeout = ((1024 * HZ * bits * quot) / BASE_BAUD) + (HZ / 50); - - /* CTS flow control flag and modem status interrupts */ - /* info->IER &= ~UART_IER_MSI; */ - if (cflag & CRTSCTS) { - info->port.flags |= ASYNC_CTS_FLOW; - /* info->IER |= UART_IER_MSI; */ - flow1 = 0x04; - flow2 = 0x10; - } else - info->port.flags &= ~ASYNC_CTS_FLOW; - if (cflag & CLOCAL) - info->port.flags &= ~ASYNC_CHECK_CD; - else - info->port.flags |= ASYNC_CHECK_CD; - - /* - * Set up parity check flag - */ - info->read_status_mask = UART_LSR_OE | UART_LSR_THRE | UART_LSR_DR; - if (I_INPCK(info->port.tty)) - info->read_status_mask |= UART_LSR_FE | UART_LSR_PE; - if (I_BRKINT(info->port.tty) || I_PARMRK(info->port.tty)) - info->read_status_mask |= UART_LSR_BI; - - info->ignore_status_mask = 0; -#if 0 - /* This should be safe, but for some broken bits of hardware... */ - if (I_IGNPAR(info->port.tty)) { - info->ignore_status_mask |= UART_LSR_PE | UART_LSR_FE; - info->read_status_mask |= UART_LSR_PE | UART_LSR_FE; - } -#endif - if (I_IGNBRK(info->port.tty)) { - info->ignore_status_mask |= UART_LSR_BI; - info->read_status_mask |= UART_LSR_BI; - /* - * If we're ignore parity and break indicators, ignore - * overruns too. (For real raw support). - */ - if (I_IGNPAR(info->port.tty)) { - info->ignore_status_mask |= UART_LSR_OE | \ - UART_LSR_PE | UART_LSR_FE; - info->read_status_mask |= UART_LSR_OE | \ - UART_LSR_PE | UART_LSR_FE; - } - } - - if (I_IXOFF(info->port.tty)) - flow1 |= 0x81; - - spin_lock_irqsave(&info->lock, flags); - /* set baud */ - serial_out(info, UART_ESI_CMD1, ESI_SET_BAUD); - serial_out(info, UART_ESI_CMD2, quot >> 8); - serial_out(info, UART_ESI_CMD2, quot & 0xff); - - /* set data bits, parity, etc. */ - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_LCR); - serial_out(info, UART_ESI_CMD2, cval); - - /* Enable flow control */ - serial_out(info, UART_ESI_CMD1, ESI_SET_FLOW_CNTL); - serial_out(info, UART_ESI_CMD2, flow1); - serial_out(info, UART_ESI_CMD2, flow2); - - /* set flow control characters (XON/XOFF only) */ - if (I_IXOFF(info->port.tty)) { - serial_out(info, UART_ESI_CMD1, ESI_SET_FLOW_CHARS); - serial_out(info, UART_ESI_CMD2, START_CHAR(info->port.tty)); - serial_out(info, UART_ESI_CMD2, STOP_CHAR(info->port.tty)); - serial_out(info, UART_ESI_CMD2, 0x10); - serial_out(info, UART_ESI_CMD2, 0x21); - switch (cflag & CSIZE) { - case CS5: - serial_out(info, UART_ESI_CMD2, 0x1f); - break; - case CS6: - serial_out(info, UART_ESI_CMD2, 0x3f); - break; - case CS7: - case CS8: - serial_out(info, UART_ESI_CMD2, 0x7f); - break; - default: - serial_out(info, UART_ESI_CMD2, 0xff); - break; - } - } - - /* Set high/low water */ - serial_out(info, UART_ESI_CMD1, ESI_SET_FLOW_LVL); - serial_out(info, UART_ESI_CMD2, info->config.flow_off >> 8); - serial_out(info, UART_ESI_CMD2, info->config.flow_off); - serial_out(info, UART_ESI_CMD2, info->config.flow_on >> 8); - serial_out(info, UART_ESI_CMD2, info->config.flow_on); - - spin_unlock_irqrestore(&info->lock, flags); -} - -static int rs_put_char(struct tty_struct *tty, unsigned char ch) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - int ret = 0; - - if (serial_paranoia_check(info, tty->name, "rs_put_char")) - return 0; - - if (!info->xmit_buf) - return 0; - - spin_lock_irqsave(&info->lock, flags); - if (info->xmit_cnt < ESP_XMIT_SIZE - 1) { - info->xmit_buf[info->xmit_head++] = ch; - info->xmit_head &= ESP_XMIT_SIZE-1; - info->xmit_cnt++; - ret = 1; - } - spin_unlock_irqrestore(&info->lock, flags); - return ret; -} - -static void rs_flush_chars(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_flush_chars")) - return; - - spin_lock_irqsave(&info->lock, flags); - - if (info->xmit_cnt <= 0 || tty->stopped || !info->xmit_buf) - goto out; - - if (!(info->IER & UART_IER_THRI)) { - info->IER |= UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } -out: - spin_unlock_irqrestore(&info->lock, flags); -} - -static int rs_write(struct tty_struct *tty, - const unsigned char *buf, int count) -{ - int c, t, ret = 0; - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_write")) - return 0; - - if (!info->xmit_buf) - return 0; - - while (1) { - /* Thanks to R. Wolff for suggesting how to do this with */ - /* interrupts enabled */ - - c = count; - t = ESP_XMIT_SIZE - info->xmit_cnt - 1; - - if (t < c) - c = t; - - t = ESP_XMIT_SIZE - info->xmit_head; - - if (t < c) - c = t; - - if (c <= 0) - break; - - memcpy(info->xmit_buf + info->xmit_head, buf, c); - - info->xmit_head = (info->xmit_head + c) & (ESP_XMIT_SIZE-1); - info->xmit_cnt += c; - buf += c; - count -= c; - ret += c; - } - - spin_lock_irqsave(&info->lock, flags); - - if (info->xmit_cnt && !tty->stopped && !(info->IER & UART_IER_THRI)) { - info->IER |= UART_IER_THRI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - } - - spin_unlock_irqrestore(&info->lock, flags); - return ret; -} - -static int rs_write_room(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - int ret; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_write_room")) - return 0; - - spin_lock_irqsave(&info->lock, flags); - - ret = ESP_XMIT_SIZE - info->xmit_cnt - 1; - if (ret < 0) - ret = 0; - spin_unlock_irqrestore(&info->lock, flags); - return ret; -} - -static int rs_chars_in_buffer(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - - if (serial_paranoia_check(info, tty->name, "rs_chars_in_buffer")) - return 0; - return info->xmit_cnt; -} - -static void rs_flush_buffer(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_flush_buffer")) - return; - spin_lock_irqsave(&info->lock, flags); - info->xmit_cnt = info->xmit_head = info->xmit_tail = 0; - spin_unlock_irqrestore(&info->lock, flags); - tty_wakeup(tty); -} - -/* - * ------------------------------------------------------------ - * rs_throttle() - * - * This routine is called by the upper-layer tty layer to signal that - * incoming characters should be throttled. - * ------------------------------------------------------------ - */ -static void rs_throttle(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; -#ifdef SERIAL_DEBUG_THROTTLE - char buf[64]; - - printk("throttle %s: %d....\n", tty_name(tty, buf), - tty_chars_in_buffer(tty)); -#endif - - if (serial_paranoia_check(info, tty->name, "rs_throttle")) - return; - - spin_lock_irqsave(&info->lock, flags); - info->IER &= ~UART_IER_RDI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - serial_out(info, UART_ESI_CMD1, ESI_SET_RX_TIMEOUT); - serial_out(info, UART_ESI_CMD2, 0x00); - spin_unlock_irqrestore(&info->lock, flags); -} - -static void rs_unthrottle(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; -#ifdef SERIAL_DEBUG_THROTTLE - char buf[64]; - - printk(KERN_DEBUG "unthrottle %s: %d....\n", tty_name(tty, buf), - tty_chars_in_buffer(tty)); -#endif - - if (serial_paranoia_check(info, tty->name, "rs_unthrottle")) - return; - - spin_lock_irqsave(&info->lock, flags); - info->IER |= UART_IER_RDI; - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - serial_out(info, UART_ESI_CMD1, ESI_SET_RX_TIMEOUT); - serial_out(info, UART_ESI_CMD2, info->config.rx_timeout); - spin_unlock_irqrestore(&info->lock, flags); -} - -/* - * ------------------------------------------------------------ - * rs_ioctl() and friends - * ------------------------------------------------------------ - */ - -static int get_serial_info(struct esp_struct *info, - struct serial_struct __user *retinfo) -{ - struct serial_struct tmp; - - lock_kernel(); - memset(&tmp, 0, sizeof(tmp)); - tmp.type = PORT_16550A; - tmp.line = info->line; - tmp.port = info->io_port; - tmp.irq = info->irq; - tmp.flags = info->port.flags; - tmp.xmit_fifo_size = 1024; - tmp.baud_base = BASE_BAUD; - tmp.close_delay = info->close_delay; - tmp.closing_wait = info->closing_wait; - tmp.custom_divisor = info->custom_divisor; - tmp.hub6 = 0; - unlock_kernel(); - if (copy_to_user(retinfo, &tmp, sizeof(*retinfo))) - return -EFAULT; - return 0; -} - -static int get_esp_config(struct esp_struct *info, - struct hayes_esp_config __user *retinfo) -{ - struct hayes_esp_config tmp; - - if (!retinfo) - return -EFAULT; - - memset(&tmp, 0, sizeof(tmp)); - lock_kernel(); - tmp.rx_timeout = info->config.rx_timeout; - tmp.rx_trigger = info->config.rx_trigger; - tmp.tx_trigger = info->config.tx_trigger; - tmp.flow_off = info->config.flow_off; - tmp.flow_on = info->config.flow_on; - tmp.pio_threshold = info->config.pio_threshold; - tmp.dma_channel = (info->stat_flags & ESP_STAT_NEVER_DMA ? 0 : dma); - unlock_kernel(); - - return copy_to_user(retinfo, &tmp, sizeof(*retinfo)) ? -EFAULT : 0; -} - -static int set_serial_info(struct esp_struct *info, - struct serial_struct __user *new_info) -{ - struct serial_struct new_serial; - struct esp_struct old_info; - unsigned int change_irq; - int retval = 0; - struct esp_struct *current_async; - - if (copy_from_user(&new_serial, new_info, sizeof(new_serial))) - return -EFAULT; - old_info = *info; - - if ((new_serial.type != PORT_16550A) || - (new_serial.hub6) || - (info->io_port != new_serial.port) || - (new_serial.baud_base != BASE_BAUD) || - (new_serial.irq > 15) || - (new_serial.irq < 2) || - (new_serial.irq == 6) || - (new_serial.irq == 8) || - (new_serial.irq == 13)) - return -EINVAL; - - change_irq = new_serial.irq != info->irq; - - if (change_irq && (info->line % 8)) - return -EINVAL; - - if (!capable(CAP_SYS_ADMIN)) { - if (change_irq || - (new_serial.close_delay != info->close_delay) || - ((new_serial.flags & ~ASYNC_USR_MASK) != - (info->port.flags & ~ASYNC_USR_MASK))) - return -EPERM; - info->port.flags = ((info->port.flags & ~ASYNC_USR_MASK) | - (new_serial.flags & ASYNC_USR_MASK)); - info->custom_divisor = new_serial.custom_divisor; - } else { - if (new_serial.irq == 2) - new_serial.irq = 9; - - if (change_irq) { - current_async = ports; - - while (current_async) { - if ((current_async->line >= info->line) && - (current_async->line < (info->line + 8))) { - if (current_async == info) { - if (current_async->port.count > 1) - return -EBUSY; - } else if (current_async->port.count) - return -EBUSY; - } - - current_async = current_async->next_port; - } - } - - /* - * OK, past this point, all the error checking has been done. - * At this point, we start making changes..... - */ - - info->port.flags = ((info->port.flags & ~ASYNC_FLAGS) | - (new_serial.flags & ASYNC_FLAGS)); - info->custom_divisor = new_serial.custom_divisor; - info->close_delay = new_serial.close_delay * HZ/100; - info->closing_wait = new_serial.closing_wait * HZ/100; - - if (change_irq) { - /* - * We need to shutdown the serial port at the old - * port/irq combination. - */ - shutdown(info); - - current_async = ports; - - while (current_async) { - if ((current_async->line >= info->line) && - (current_async->line < (info->line + 8))) - current_async->irq = new_serial.irq; - - current_async = current_async->next_port; - } - - serial_out(info, UART_ESI_CMD1, ESI_SET_ENH_IRQ); - if (info->irq == 9) - serial_out(info, UART_ESI_CMD2, 0x02); - else - serial_out(info, UART_ESI_CMD2, info->irq); - } - } - - if (info->port.flags & ASYNC_INITIALIZED) { - if (((old_info.port.flags & ASYNC_SPD_MASK) != - (info->port.flags & ASYNC_SPD_MASK)) || - (old_info.custom_divisor != info->custom_divisor)) { - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_HI) - info->port.tty->alt_speed = 57600; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_VHI) - info->port.tty->alt_speed = 115200; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_SHI) - info->port.tty->alt_speed = 230400; - if ((info->port.flags & ASYNC_SPD_MASK) == ASYNC_SPD_WARP) - info->port.tty->alt_speed = 460800; - change_speed(info); - } - } else - retval = startup(info); - - return retval; -} - -static int set_esp_config(struct esp_struct *info, - struct hayes_esp_config __user *new_info) -{ - struct hayes_esp_config new_config; - unsigned int change_dma; - int retval = 0; - struct esp_struct *current_async; - unsigned long flags; - - /* Perhaps a non-sysadmin user should be able to do some of these */ - /* operations. I haven't decided yet. */ - - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - - if (copy_from_user(&new_config, new_info, sizeof(new_config))) - return -EFAULT; - - if ((new_config.flow_on >= new_config.flow_off) || - (new_config.rx_trigger < 1) || - (new_config.tx_trigger < 1) || - (new_config.flow_off < 1) || - (new_config.flow_on < 1) || - (new_config.rx_trigger > 1023) || - (new_config.tx_trigger > 1023) || - (new_config.flow_off > 1023) || - (new_config.flow_on > 1023) || - (new_config.pio_threshold < 0) || - (new_config.pio_threshold > 1024)) - return -EINVAL; - - if ((new_config.dma_channel != 1) && (new_config.dma_channel != 3)) - new_config.dma_channel = 0; - - if (info->stat_flags & ESP_STAT_NEVER_DMA) - change_dma = new_config.dma_channel; - else - change_dma = (new_config.dma_channel != dma); - - if (change_dma) { - if (new_config.dma_channel) { - /* PIO mode to DMA mode transition OR */ - /* change current DMA channel */ - current_async = ports; - - while (current_async) { - if (current_async == info) { - if (current_async->port.count > 1) - return -EBUSY; - } else if (current_async->port.count) - return -EBUSY; - - current_async = current_async->next_port; - } - - shutdown(info); - dma = new_config.dma_channel; - info->stat_flags &= ~ESP_STAT_NEVER_DMA; - - /* all ports must use the same DMA channel */ - - spin_lock_irqsave(&info->lock, flags); - current_async = ports; - - while (current_async) { - esp_basic_init(current_async); - current_async = current_async->next_port; - } - spin_unlock_irqrestore(&info->lock, flags); - } else { - /* DMA mode to PIO mode only */ - if (info->port.count > 1) - return -EBUSY; - - shutdown(info); - spin_lock_irqsave(&info->lock, flags); - info->stat_flags |= ESP_STAT_NEVER_DMA; - esp_basic_init(info); - spin_unlock_irqrestore(&info->lock, flags); - } - } - - info->config.pio_threshold = new_config.pio_threshold; - - if ((new_config.flow_off != info->config.flow_off) || - (new_config.flow_on != info->config.flow_on)) { - info->config.flow_off = new_config.flow_off; - info->config.flow_on = new_config.flow_on; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_SET_FLOW_LVL); - serial_out(info, UART_ESI_CMD2, new_config.flow_off >> 8); - serial_out(info, UART_ESI_CMD2, new_config.flow_off); - serial_out(info, UART_ESI_CMD2, new_config.flow_on >> 8); - serial_out(info, UART_ESI_CMD2, new_config.flow_on); - spin_unlock_irqrestore(&info->lock, flags); - } - - if ((new_config.rx_trigger != info->config.rx_trigger) || - (new_config.tx_trigger != info->config.tx_trigger)) { - info->config.rx_trigger = new_config.rx_trigger; - info->config.tx_trigger = new_config.tx_trigger; - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_SET_TRIGGER); - serial_out(info, UART_ESI_CMD2, - new_config.rx_trigger >> 8); - serial_out(info, UART_ESI_CMD2, new_config.rx_trigger); - serial_out(info, UART_ESI_CMD2, - new_config.tx_trigger >> 8); - serial_out(info, UART_ESI_CMD2, new_config.tx_trigger); - spin_unlock_irqrestore(&info->lock, flags); - } - - if (new_config.rx_timeout != info->config.rx_timeout) { - info->config.rx_timeout = new_config.rx_timeout; - spin_lock_irqsave(&info->lock, flags); - - if (info->IER & UART_IER_RDI) { - serial_out(info, UART_ESI_CMD1, - ESI_SET_RX_TIMEOUT); - serial_out(info, UART_ESI_CMD2, - new_config.rx_timeout); - } - - spin_unlock_irqrestore(&info->lock, flags); - } - - if (!(info->port.flags & ASYNC_INITIALIZED)) - retval = startup(info); - - return retval; -} - -/* - * get_lsr_info - get line status register info - * - * Purpose: Let user call ioctl() to get info when the UART physically - * is emptied. On bus types like RS485, the transmitter must - * release the bus after transmitting. This must be done when - * the transmit shift register is empty, not be done when the - * transmit holding register is empty. This functionality - * allows an RS485 driver to be written in user space. - */ -static int get_lsr_info(struct esp_struct *info, unsigned int __user *value) -{ - unsigned char status; - unsigned int result; - unsigned long flags; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_GET_UART_STAT); - status = serial_in(info, UART_ESI_STAT1); - spin_unlock_irqrestore(&info->lock, flags); - result = ((status & UART_LSR_TEMT) ? TIOCSER_TEMT : 0); - return put_user(result, value); -} - - -static int esp_tiocmget(struct tty_struct *tty, struct file *file) -{ - struct esp_struct *info = tty->driver_data; - unsigned char control, status; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, __func__)) - return -ENODEV; - if (tty->flags & (1 << TTY_IO_ERROR)) - return -EIO; - - control = info->MCR; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_GET_UART_STAT); - status = serial_in(info, UART_ESI_STAT2); - spin_unlock_irqrestore(&info->lock, flags); - - return ((control & UART_MCR_RTS) ? TIOCM_RTS : 0) - | ((control & UART_MCR_DTR) ? TIOCM_DTR : 0) - | ((status & UART_MSR_DCD) ? TIOCM_CAR : 0) - | ((status & UART_MSR_RI) ? TIOCM_RNG : 0) - | ((status & UART_MSR_DSR) ? TIOCM_DSR : 0) - | ((status & UART_MSR_CTS) ? TIOCM_CTS : 0); -} - -static int esp_tiocmset(struct tty_struct *tty, struct file *file, - unsigned int set, unsigned int clear) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, __func__)) - return -ENODEV; - if (tty->flags & (1 << TTY_IO_ERROR)) - return -EIO; - - spin_lock_irqsave(&info->lock, flags); - - if (set & TIOCM_RTS) - info->MCR |= UART_MCR_RTS; - if (set & TIOCM_DTR) - info->MCR |= UART_MCR_DTR; - - if (clear & TIOCM_RTS) - info->MCR &= ~UART_MCR_RTS; - if (clear & TIOCM_DTR) - info->MCR &= ~UART_MCR_DTR; - - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, info->MCR); - - spin_unlock_irqrestore(&info->lock, flags); - return 0; -} - -/* - * rs_break() --- routine which turns the break handling on or off - */ -static int esp_break(struct tty_struct *tty, int break_state) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "esp_break")) - return -EINVAL; - - if (break_state == -1) { - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_ISSUE_BREAK); - serial_out(info, UART_ESI_CMD2, 0x01); - spin_unlock_irqrestore(&info->lock, flags); - - /* FIXME - new style wait needed here */ - interruptible_sleep_on(&info->break_wait); - } else { - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_ISSUE_BREAK); - serial_out(info, UART_ESI_CMD2, 0x00); - spin_unlock_irqrestore(&info->lock, flags); - } - return 0; -} - -static int rs_ioctl(struct tty_struct *tty, struct file *file, - unsigned int cmd, unsigned long arg) -{ - struct esp_struct *info = tty->driver_data; - struct async_icount cprev, cnow; /* kernel counter temps */ - struct serial_icounter_struct __user *p_cuser; /* user space */ - void __user *argp = (void __user *)arg; - unsigned long flags; - int ret; - - if (serial_paranoia_check(info, tty->name, "rs_ioctl")) - return -ENODEV; - - if ((cmd != TIOCGSERIAL) && (cmd != TIOCSSERIAL) && - (cmd != TIOCSERCONFIG) && (cmd != TIOCSERGWILD) && - (cmd != TIOCSERSWILD) && (cmd != TIOCSERGSTRUCT) && - (cmd != TIOCMIWAIT) && (cmd != TIOCGICOUNT) && - (cmd != TIOCGHAYESESP) && (cmd != TIOCSHAYESESP)) { - if (tty->flags & (1 << TTY_IO_ERROR)) - return -EIO; - } - - switch (cmd) { - case TIOCGSERIAL: - return get_serial_info(info, argp); - case TIOCSSERIAL: - lock_kernel(); - ret = set_serial_info(info, argp); - unlock_kernel(); - return ret; - case TIOCSERGWILD: - return put_user(0L, (unsigned long __user *)argp); - case TIOCSERGETLSR: /* Get line status register */ - return get_lsr_info(info, argp); - case TIOCSERSWILD: - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - return 0; - /* - * Wait for any of the 4 modem inputs (DCD,RI,DSR,CTS) to change - * - mask passed in arg for lines of interest - * (use |'ed TIOCM_RNG/DSR/CD/CTS for masking) - * Caller should use TIOCGICOUNT to see which one it was - */ - case TIOCMIWAIT: - spin_lock_irqsave(&info->lock, flags); - cprev = info->icount; /* note the counters on entry */ - spin_unlock_irqrestore(&info->lock, flags); - while (1) { - /* FIXME: convert to new style wakeup */ - interruptible_sleep_on(&info->port.delta_msr_wait); - /* see if a signal did it */ - if (signal_pending(current)) - return -ERESTARTSYS; - spin_lock_irqsave(&info->lock, flags); - cnow = info->icount; /* atomic copy */ - spin_unlock_irqrestore(&info->lock, flags); - if (cnow.rng == cprev.rng && - cnow.dsr == cprev.dsr && - cnow.dcd == cprev.dcd && - cnow.cts == cprev.cts) - return -EIO; /* no change => error */ - if (((arg & TIOCM_RNG) && - (cnow.rng != cprev.rng)) || - ((arg & TIOCM_DSR) && - (cnow.dsr != cprev.dsr)) || - ((arg & TIOCM_CD) && - (cnow.dcd != cprev.dcd)) || - ((arg & TIOCM_CTS) && - (cnow.cts != cprev.cts))) { - return 0; - } - cprev = cnow; - } - /* NOTREACHED */ - /* - * Get counter of input serial line interrupts (DCD,RI,DSR,CTS) - * Return: write counters to the user passed counter struct - * NB: both 1->0 and 0->1 transitions are counted except for - * RI where only 0->1 is counted. - */ - case TIOCGICOUNT: - spin_lock_irqsave(&info->lock, flags); - cnow = info->icount; - spin_unlock_irqrestore(&info->lock, flags); - p_cuser = argp; - if (put_user(cnow.cts, &p_cuser->cts) || - put_user(cnow.dsr, &p_cuser->dsr) || - put_user(cnow.rng, &p_cuser->rng) || - put_user(cnow.dcd, &p_cuser->dcd)) - return -EFAULT; - return 0; - case TIOCGHAYESESP: - return get_esp_config(info, argp); - case TIOCSHAYESESP: - lock_kernel(); - ret = set_esp_config(info, argp); - unlock_kernel(); - return ret; - default: - return -ENOIOCTLCMD; - } - return 0; -} - -static void rs_set_termios(struct tty_struct *tty, struct ktermios *old_termios) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - change_speed(info); - - spin_lock_irqsave(&info->lock, flags); - - /* Handle transition to B0 status */ - if ((old_termios->c_cflag & CBAUD) && - !(tty->termios->c_cflag & CBAUD)) { - info->MCR &= ~(UART_MCR_DTR|UART_MCR_RTS); - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, info->MCR); - } - - /* Handle transition away from B0 status */ - if (!(old_termios->c_cflag & CBAUD) && - (tty->termios->c_cflag & CBAUD)) { - info->MCR |= (UART_MCR_DTR | UART_MCR_RTS); - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, info->MCR); - } - - spin_unlock_irqrestore(&info->lock, flags); - - /* Handle turning of CRTSCTS */ - if ((old_termios->c_cflag & CRTSCTS) && - !(tty->termios->c_cflag & CRTSCTS)) { - rs_start(tty); - } -} - -/* - * ------------------------------------------------------------ - * rs_close() - * - * This routine is called when the serial port gets closed. First, we - * wait for the last remaining data to be sent. Then, we unlink its - * async structure from the interrupt chain if necessary, and we free - * that IRQ if nothing is left in the chain. - * ------------------------------------------------------------ - */ -static void rs_close(struct tty_struct *tty, struct file *filp) -{ - struct esp_struct *info = tty->driver_data; - unsigned long flags; - - if (!info || serial_paranoia_check(info, tty->name, "rs_close")) - return; - - spin_lock_irqsave(&info->lock, flags); - - if (tty_hung_up_p(filp)) { - DBG_CNT("before DEC-hung"); - goto out; - } - -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "rs_close ttys%d, count = %d\n", - info->line, info->port.count); -#endif - if (tty->count == 1 && info->port.count != 1) { - /* - * Uh, oh. tty->count is 1, which means that the tty - * structure will be freed. Info->count should always - * be one in these conditions. If it's greater than - * one, we've got real problems, since it means the - * serial port won't be shutdown. - */ - printk(KERN_DEBUG "rs_close: bad serial port count; tty->count is 1, info->port.count is %d\n", info->port.count); - info->port.count = 1; - } - if (--info->port.count < 0) { - printk(KERN_ERR "rs_close: bad serial port count for ttys%d: %d\n", - info->line, info->port.count); - info->port.count = 0; - } - if (info->port.count) { - DBG_CNT("before DEC-2"); - goto out; - } - info->port.flags |= ASYNC_CLOSING; - - spin_unlock_irqrestore(&info->lock, flags); - /* - * Now we wait for the transmit buffer to clear; and we notify - * the line discipline to only process XON/XOFF characters. - */ - tty->closing = 1; - if (info->closing_wait != ASYNC_CLOSING_WAIT_NONE) - tty_wait_until_sent(tty, info->closing_wait); - /* - * At this point we stop accepting input. To do this, we - * disable the receive line status interrupts, and tell the - * interrupt driver to stop checking the data ready bit in the - * line status register. - */ - /* info->IER &= ~UART_IER_RLSI; */ - info->IER &= ~UART_IER_RDI; - info->read_status_mask &= ~UART_LSR_DR; - if (info->port.flags & ASYNC_INITIALIZED) { - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_SET_SRV_MASK); - serial_out(info, UART_ESI_CMD2, info->IER); - - /* disable receive timeout */ - serial_out(info, UART_ESI_CMD1, ESI_SET_RX_TIMEOUT); - serial_out(info, UART_ESI_CMD2, 0x00); - - spin_unlock_irqrestore(&info->lock, flags); - - /* - * Before we drop DTR, make sure the UART transmitter - * has completely drained; this is especially - * important if there is a transmit FIFO! - */ - rs_wait_until_sent(tty, info->timeout); - } - shutdown(info); - rs_flush_buffer(tty); - tty_ldisc_flush(tty); - tty->closing = 0; - info->port.tty = NULL; - - if (info->port.blocked_open) { - if (info->close_delay) - msleep_interruptible(jiffies_to_msecs(info->close_delay)); - wake_up_interruptible(&info->port.open_wait); - } - info->port.flags &= ~(ASYNC_NORMAL_ACTIVE|ASYNC_CLOSING); - wake_up_interruptible(&info->port.close_wait); - return; - -out: - spin_unlock_irqrestore(&info->lock, flags); -} - -static void rs_wait_until_sent(struct tty_struct *tty, int timeout) -{ - struct esp_struct *info = tty->driver_data; - unsigned long orig_jiffies, char_time; - unsigned long flags; - - if (serial_paranoia_check(info, tty->name, "rs_wait_until_sent")) - return; - - orig_jiffies = jiffies; - char_time = ((info->timeout - HZ / 50) / 1024) / 5; - - if (!char_time) - char_time = 1; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_TX_AVAIL); - - while ((serial_in(info, UART_ESI_STAT1) != 0x03) || - (serial_in(info, UART_ESI_STAT2) != 0xff)) { - - spin_unlock_irqrestore(&info->lock, flags); - msleep_interruptible(jiffies_to_msecs(char_time)); - - if (signal_pending(current)) - return; - - if (timeout && time_after(jiffies, orig_jiffies + timeout)) - return; - - spin_lock_irqsave(&info->lock, flags); - serial_out(info, UART_ESI_CMD1, ESI_NO_COMMAND); - serial_out(info, UART_ESI_CMD1, ESI_GET_TX_AVAIL); - } - spin_unlock_irqrestore(&info->lock, flags); - set_current_state(TASK_RUNNING); -} - -/* - * esp_hangup() --- called by tty_hangup() when a hangup is signaled. - */ -static void esp_hangup(struct tty_struct *tty) -{ - struct esp_struct *info = tty->driver_data; - - if (serial_paranoia_check(info, tty->name, "esp_hangup")) - return; - - rs_flush_buffer(tty); - shutdown(info); - info->port.count = 0; - info->port.flags &= ~ASYNC_NORMAL_ACTIVE; - info->port.tty = NULL; - wake_up_interruptible(&info->port.open_wait); -} - -static int esp_carrier_raised(struct tty_port *port) -{ - struct esp_struct *info = container_of(port, struct esp_struct, port); - serial_out(info, UART_ESI_CMD1, ESI_GET_UART_STAT); - if (serial_in(info, UART_ESI_STAT2) & UART_MSR_DCD) - return 1; - return 0; -} - -/* - * ------------------------------------------------------------ - * esp_open() and friends - * ------------------------------------------------------------ - */ -static int block_til_ready(struct tty_struct *tty, struct file *filp, - struct esp_struct *info) -{ - DECLARE_WAITQUEUE(wait, current); - int retval; - int do_clocal = 0; - unsigned long flags; - int cd; - struct tty_port *port = &info->port; - - /* - * If the device is in the middle of being closed, then block - * until it's done, and then try again. - */ - if (tty_hung_up_p(filp) || - (port->flags & ASYNC_CLOSING)) { - if (port->flags & ASYNC_CLOSING) - interruptible_sleep_on(&port->close_wait); -#ifdef SERIAL_DO_RESTART - if (port->flags & ASYNC_HUP_NOTIFY) - return -EAGAIN; - else - return -ERESTARTSYS; -#else - return -EAGAIN; -#endif - } - - /* - * If non-blocking mode is set, or the port is not enabled, - * then make the check up front and then exit. - */ - if ((filp->f_flags & O_NONBLOCK) || - (tty->flags & (1 << TTY_IO_ERROR))) { - port->flags |= ASYNC_NORMAL_ACTIVE; - return 0; - } - - if (tty->termios->c_cflag & CLOCAL) - do_clocal = 1; - - /* - * Block waiting for the carrier detect and the line to become - * free (i.e., not in use by the callout). While we are in - * this loop, port->count is dropped by one, so that - * rs_close() knows when to free things. We restore it upon - * exit, either normal or abnormal. - */ - retval = 0; - add_wait_queue(&port->open_wait, &wait); -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "block_til_ready before block: ttys%d, count = %d\n", - info->line, port->count); -#endif - spin_lock_irqsave(&info->lock, flags); - if (!tty_hung_up_p(filp)) - port->count--; - port->blocked_open++; - while (1) { - if ((tty->termios->c_cflag & CBAUD)) { - unsigned int scratch; - - serial_out(info, UART_ESI_CMD1, ESI_READ_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - scratch = serial_in(info, UART_ESI_STAT1); - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, - scratch | UART_MCR_DTR | UART_MCR_RTS); - } - set_current_state(TASK_INTERRUPTIBLE); - if (tty_hung_up_p(filp) || - !(port->flags & ASYNC_INITIALIZED)) { -#ifdef SERIAL_DO_RESTART - if (port->flags & ASYNC_HUP_NOTIFY) - retval = -EAGAIN; - else - retval = -ERESTARTSYS; -#else - retval = -EAGAIN; -#endif - break; - } - - cd = tty_port_carrier_raised(port); - - if (!(port->flags & ASYNC_CLOSING) && - (do_clocal)) - break; - if (signal_pending(current)) { - retval = -ERESTARTSYS; - break; - } -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "block_til_ready blocking: ttys%d, count = %d\n", - info->line, port->count); -#endif - spin_unlock_irqrestore(&info->lock, flags); - schedule(); - spin_lock_irqsave(&info->lock, flags); - } - set_current_state(TASK_RUNNING); - remove_wait_queue(&port->open_wait, &wait); - if (!tty_hung_up_p(filp)) - port->count++; - port->blocked_open--; - spin_unlock_irqrestore(&info->lock, flags); -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "block_til_ready after blocking: ttys%d, count = %d\n", - info->line, port->count); -#endif - if (retval) - return retval; - port->flags |= ASYNC_NORMAL_ACTIVE; - return 0; -} - -/* - * This routine is called whenever a serial port is opened. It - * enables interrupts for a serial port, linking in its async structure into - * the IRQ chain. It also performs the serial-specific - * initialization for the tty structure. - */ -static int esp_open(struct tty_struct *tty, struct file *filp) -{ - struct esp_struct *info; - int retval, line; - unsigned long flags; - - line = tty->index; - if ((line < 0) || (line >= NR_PORTS)) - return -ENODEV; - - /* find the port in the chain */ - - info = ports; - - while (info && (info->line != line)) - info = info->next_port; - - if (!info) { - serial_paranoia_check(info, tty->name, "esp_open"); - return -ENODEV; - } - -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "esp_open %s, count = %d\n", tty->name, info->port.count); -#endif - spin_lock_irqsave(&info->lock, flags); - info->port.count++; - tty->driver_data = info; - info->port.tty = tty; - - spin_unlock_irqrestore(&info->lock, flags); - - /* - * Start up serial port - */ - retval = startup(info); - if (retval) - return retval; - - retval = block_til_ready(tty, filp, info); - if (retval) { -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "esp_open returning after block_til_ready with %d\n", - retval); -#endif - return retval; - } -#ifdef SERIAL_DEBUG_OPEN - printk(KERN_DEBUG "esp_open %s successful...", tty->name); -#endif - return 0; -} - -/* - * --------------------------------------------------------------------- - * espserial_init() and friends - * - * espserial_init() is called at boot-time to initialize the serial driver. - * --------------------------------------------------------------------- - */ - -/* - * This routine prints out the appropriate serial driver version - * number, and identifies which options were configured into this - * driver. - */ - -static void __init show_serial_version(void) -{ - printk(KERN_INFO "%s version %s (DMA %u)\n", - serial_name, serial_version, dma); -} - -/* - * This routine is called by espserial_init() to initialize a specific serial - * port. - */ -static int autoconfig(struct esp_struct *info) -{ - int port_detected = 0; - unsigned long flags; - - if (!request_region(info->io_port, REGION_SIZE, "esp serial")) - return -EIO; - - spin_lock_irqsave(&info->lock, flags); - /* - * Check for ESP card - */ - - if (serial_in(info, UART_ESI_BASE) == 0xf3) { - serial_out(info, UART_ESI_CMD1, 0x00); - serial_out(info, UART_ESI_CMD1, 0x01); - - if ((serial_in(info, UART_ESI_STAT2) & 0x70) == 0x20) { - port_detected = 1; - - if (!(info->irq)) { - serial_out(info, UART_ESI_CMD1, 0x02); - - if (serial_in(info, UART_ESI_STAT1) & 0x01) - info->irq = 3; - else - info->irq = 4; - } - - - /* put card in enhanced mode */ - /* this prevents access through */ - /* the "old" IO ports */ - esp_basic_init(info); - - /* clear out MCR */ - serial_out(info, UART_ESI_CMD1, ESI_WRITE_UART); - serial_out(info, UART_ESI_CMD2, UART_MCR); - serial_out(info, UART_ESI_CMD2, 0x00); - } - } - if (!port_detected) - release_region(info->io_port, REGION_SIZE); - - spin_unlock_irqrestore(&info->lock, flags); - return (port_detected); -} - -static const struct tty_operations esp_ops = { - .open = esp_open, - .close = rs_close, - .write = rs_write, - .put_char = rs_put_char, - .flush_chars = rs_flush_chars, - .write_room = rs_write_room, - .chars_in_buffer = rs_chars_in_buffer, - .flush_buffer = rs_flush_buffer, - .ioctl = rs_ioctl, - .throttle = rs_throttle, - .unthrottle = rs_unthrottle, - .set_termios = rs_set_termios, - .stop = rs_stop, - .start = rs_start, - .hangup = esp_hangup, - .break_ctl = esp_break, - .wait_until_sent = rs_wait_until_sent, - .tiocmget = esp_tiocmget, - .tiocmset = esp_tiocmset, -}; - -static const struct tty_port_operations esp_port_ops = { - .esp_carrier_raised, -}; - -/* - * The serial driver boot-time initialization code! - */ -static int __init espserial_init(void) -{ - int i, offset; - struct esp_struct *info; - struct esp_struct *last_primary = NULL; - int esp[] = { 0x100, 0x140, 0x180, 0x200, 0x240, 0x280, 0x300, 0x380 }; - - esp_driver = alloc_tty_driver(NR_PORTS); - if (!esp_driver) - return -ENOMEM; - - for (i = 0; i < NR_PRIMARY; i++) { - if (irq[i] != 0) { - if ((irq[i] < 2) || (irq[i] > 15) || (irq[i] == 6) || - (irq[i] == 8) || (irq[i] == 13)) - irq[i] = 0; - else if (irq[i] == 2) - irq[i] = 9; - } - } - - if ((dma != 1) && (dma != 3)) - dma = 0; - - if ((rx_trigger < 1) || (rx_trigger > 1023)) - rx_trigger = 768; - - if ((tx_trigger < 1) || (tx_trigger > 1023)) - tx_trigger = 768; - - if ((flow_off < 1) || (flow_off > 1023)) - flow_off = 1016; - - if ((flow_on < 1) || (flow_on > 1023)) - flow_on = 944; - - if ((rx_timeout < 0) || (rx_timeout > 255)) - rx_timeout = 128; - - if (flow_on >= flow_off) - flow_on = flow_off - 1; - - show_serial_version(); - - /* Initialize the tty_driver structure */ - - esp_driver->owner = THIS_MODULE; - esp_driver->name = "ttyP"; - esp_driver->major = ESP_IN_MAJOR; - esp_driver->minor_start = 0; - esp_driver->type = TTY_DRIVER_TYPE_SERIAL; - esp_driver->subtype = SERIAL_TYPE_NORMAL; - esp_driver->init_termios = tty_std_termios; - esp_driver->init_termios.c_cflag = - B9600 | CS8 | CREAD | HUPCL | CLOCAL; - esp_driver->init_termios.c_ispeed = 9600; - esp_driver->init_termios.c_ospeed = 9600; - esp_driver->flags = TTY_DRIVER_REAL_RAW; - tty_set_operations(esp_driver, &esp_ops); - if (tty_register_driver(esp_driver)) { - printk(KERN_ERR "Couldn't register esp serial driver"); - put_tty_driver(esp_driver); - return 1; - } - - info = kzalloc(sizeof(struct esp_struct), GFP_KERNEL); - - if (!info) { - printk(KERN_ERR "Couldn't allocate memory for esp serial device information\n"); - tty_unregister_driver(esp_driver); - put_tty_driver(esp_driver); - return 1; - } - - spin_lock_init(&info->lock); - /* rx_trigger, tx_trigger are needed by autoconfig */ - info->config.rx_trigger = rx_trigger; - info->config.tx_trigger = tx_trigger; - - i = 0; - offset = 0; - - do { - tty_port_init(&info->port); - info->port.ops = &esp_port_ops; - info->io_port = esp[i] + offset; - info->irq = irq[i]; - info->line = (i * 8) + (offset / 8); - - if (!autoconfig(info)) { - i++; - offset = 0; - continue; - } - - info->custom_divisor = (divisor[i] >> (offset / 2)) & 0xf; - info->port.flags = STD_COM_FLAGS; - if (info->custom_divisor) - info->port.flags |= ASYNC_SPD_CUST; - info->magic = ESP_MAGIC; - info->close_delay = 5*HZ/10; - info->closing_wait = 30*HZ; - info->config.rx_timeout = rx_timeout; - info->config.flow_on = flow_on; - info->config.flow_off = flow_off; - info->config.pio_threshold = pio_threshold; - info->next_port = ports; - init_waitqueue_head(&info->break_wait); - ports = info; - printk(KERN_INFO "ttyP%d at 0x%04x (irq = %d) is an ESP ", - info->line, info->io_port, info->irq); - - if (info->line % 8) { - printk("secondary port\n"); - /* 8 port cards can't do DMA */ - info->stat_flags |= ESP_STAT_NEVER_DMA; - - if (last_primary) - last_primary->stat_flags |= ESP_STAT_NEVER_DMA; - } else { - printk("primary port\n"); - last_primary = info; - irq[i] = info->irq; - } - - if (!dma) - info->stat_flags |= ESP_STAT_NEVER_DMA; - - info = kzalloc(sizeof(struct esp_struct), GFP_KERNEL); - if (!info) { - printk(KERN_ERR "Couldn't allocate memory for esp serial device information\n"); - /* allow use of the already detected ports */ - return 0; - } - - spin_lock_init(&info->lock); - /* rx_trigger, tx_trigger are needed by autoconfig */ - info->config.rx_trigger = rx_trigger; - info->config.tx_trigger = tx_trigger; - - if (offset == 56) { - i++; - offset = 0; - } else { - offset += 8; - } - } while (i < NR_PRIMARY); - - /* free the last port memory allocation */ - kfree(info); - - return 0; -} - -static void __exit espserial_exit(void) -{ - int e1; - struct esp_struct *temp_async; - struct esp_pio_buffer *pio_buf; - - e1 = tty_unregister_driver(esp_driver); - if (e1) - printk(KERN_ERR "esp: failed to unregister driver (%d)\n", e1); - put_tty_driver(esp_driver); - - while (ports) { - if (ports->io_port) - release_region(ports->io_port, REGION_SIZE); - temp_async = ports->next_port; - kfree(ports); - ports = temp_async; - } - - if (dma_buffer) - free_pages((unsigned long)dma_buffer, - get_order(DMA_BUFFER_SZ)); - - while (free_pio_buf) { - pio_buf = free_pio_buf->next; - kfree(free_pio_buf); - free_pio_buf = pio_buf; - } -} - -module_init(espserial_init); -module_exit(espserial_exit); diff --git a/include/linux/Kbuild b/include/linux/Kbuild index 5a5385749e16..f72914db2a11 100644 --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -214,7 +214,6 @@ unifdef-y += futex.h unifdef-y += fs.h unifdef-y += gameport.h unifdef-y += generic_serial.h -unifdef-y += hayesesp.h unifdef-y += hdlcdrv.h unifdef-y += hdlc.h unifdef-y += hdreg.h diff --git a/include/linux/hayesesp.h b/include/linux/hayesesp.h deleted file mode 100644 index 92b08cfe4a75..000000000000 --- a/include/linux/hayesesp.h +++ /dev/null @@ -1,114 +0,0 @@ -#ifndef HAYESESP_H -#define HAYESESP_H - -struct hayes_esp_config { - short flow_on; - short flow_off; - short rx_trigger; - short tx_trigger; - short pio_threshold; - unsigned char rx_timeout; - char dma_channel; -}; - -#ifdef __KERNEL__ - -#define ESP_DMA_CHANNEL 0 -#define ESP_RX_TRIGGER 768 -#define ESP_TX_TRIGGER 768 -#define ESP_FLOW_OFF 1016 -#define ESP_FLOW_ON 944 -#define ESP_RX_TMOUT 128 -#define ESP_PIO_THRESHOLD 32 - -#define ESP_IN_MAJOR 57 /* major dev # for dial in */ -#define ESP_OUT_MAJOR 58 /* major dev # for dial out */ -#define ESPC_SCALE 3 -#define UART_ESI_BASE 0x00 -#define UART_ESI_SID 0x01 -#define UART_ESI_RX 0x02 -#define UART_ESI_TX 0x02 -#define UART_ESI_CMD1 0x04 -#define UART_ESI_CMD2 0x05 -#define UART_ESI_STAT1 0x04 -#define UART_ESI_STAT2 0x05 -#define UART_ESI_RWS 0x07 - -#define UART_IER_DMA_TMOUT 0x80 -#define UART_IER_DMA_TC 0x08 - -#define ESI_SET_IRQ 0x04 -#define ESI_SET_DMA_TMOUT 0x05 -#define ESI_SET_SRV_MASK 0x06 -#define ESI_SET_ERR_MASK 0x07 -#define ESI_SET_FLOW_CNTL 0x08 -#define ESI_SET_FLOW_CHARS 0x09 -#define ESI_SET_FLOW_LVL 0x0a -#define ESI_SET_TRIGGER 0x0b -#define ESI_SET_RX_TIMEOUT 0x0c -#define ESI_SET_FLOW_TMOUT 0x0d -#define ESI_WRITE_UART 0x0e -#define ESI_READ_UART 0x0f -#define ESI_SET_MODE 0x10 -#define ESI_GET_ERR_STAT 0x12 -#define ESI_GET_UART_STAT 0x13 -#define ESI_GET_RX_AVAIL 0x14 -#define ESI_GET_TX_AVAIL 0x15 -#define ESI_START_DMA_RX 0x16 -#define ESI_START_DMA_TX 0x17 -#define ESI_ISSUE_BREAK 0x1a -#define ESI_FLUSH_RX 0x1b -#define ESI_FLUSH_TX 0x1c -#define ESI_SET_BAUD 0x1d -#define ESI_SET_ENH_IRQ 0x1f -#define ESI_SET_REINTR 0x20 -#define ESI_SET_PRESCALAR 0x23 -#define ESI_NO_COMMAND 0xff - -#define ESP_STAT_RX_TIMEOUT 0x01 -#define ESP_STAT_DMA_RX 0x02 -#define ESP_STAT_DMA_TX 0x04 -#define ESP_STAT_NEVER_DMA 0x08 -#define ESP_STAT_USE_PIO 0x10 - -#define ESP_MAGIC 0x53ee -#define ESP_XMIT_SIZE 4096 - -struct esp_struct { - int magic; - struct tty_port port; - spinlock_t lock; - int io_port; - int irq; - int read_status_mask; - int ignore_status_mask; - int timeout; - int stat_flags; - int custom_divisor; - int close_delay; - unsigned short closing_wait; - unsigned short closing_wait2; - int IER; /* Interrupt Enable Register */ - int MCR; /* Modem control register */ - unsigned long last_active; - int line; - unsigned char *xmit_buf; - int xmit_head; - int xmit_tail; - int xmit_cnt; - wait_queue_head_t break_wait; - struct async_icount icount; /* kernel counters for the 4 input interrupts */ - struct hayes_esp_config config; /* port configuration */ - struct esp_struct *next_port; /* For the linked list */ -}; - -struct esp_pio_buffer { - unsigned char data[1024]; - struct esp_pio_buffer *next; -}; - -#endif /* __KERNEL__ */ - - -#endif /* ESP_H */ - -- cgit v1.2.3 From 7e11a0fb3b7ab83871b7a56c7a67c603283ec4b9 Mon Sep 17 00:00:00 2001 From: Tilman Schmidt Date: Wed, 4 Nov 2009 16:04:52 -0800 Subject: tty: docs: serial/tty, add to ldisc methods A small addition to the ldisc method descriptions. Signed-off-by: Tilman Schmidt Signed-off-by: Randy Dunlap Acked-by: Alan Cox Signed-off-by: Greg Kroah-Hartman --- Documentation/serial/tty.txt | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt index 8e65c4498c52..5e5349a4fcd2 100644 --- a/Documentation/serial/tty.txt +++ b/Documentation/serial/tty.txt @@ -42,7 +42,8 @@ TTY side interfaces: open() - Called when the line discipline is attached to the terminal. No other call into the line discipline for this tty will occur until it - completes successfully. Can sleep. + completes successfully. Returning an error will + prevent the ldisc from being attached. Can sleep. close() - This is called on a terminal when the line discipline is being unplugged. At the point of @@ -52,7 +53,7 @@ close() - This is called on a terminal when the line hangup() - Called when the tty line is hung up. The line discipline should cease I/O to the tty. No further calls into the ldisc code will occur. - Can sleep. + The return value is ignored. Can sleep. write() - A process is writing data through the line discipline. Multiple write calls are serialized @@ -83,6 +84,10 @@ ioctl() - Called when an ioctl is handed to the tty layer that might be for the ldisc. Multiple ioctl calls may occur in parallel. May sleep. +compat_ioctl() - Called when a 32 bit ioctl is handed to the tty layer + that might be for the ldisc. Multiple ioctl calls + may occur in parallel. May sleep. + Driver Side Interfaces: receive_buf() - Hand buffers of bytes from the driver to the ldisc -- cgit v1.2.3 From 15ac7afe515631ec36966b1cf632a87276536f57 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 11 Dec 2009 16:16:32 -0800 Subject: omap: mux: Add new style pin multiplexing code for omap3 Initially only for 34xx. This code allows us to: - Make the code more generic as the omap internal signal names can stay the same across omap generations for some devices - Map mux registers to GPIO registers that is needed for dynamic muxing of pins during off-idle - Override bootloader mux values via kernel cmdline using omap_mux=some.signa1=0x1234,some.signal2=0x1234 - View and set the mux registers via debugfs if CONFIG_DEBUG_FS is enabled Cc: Mike Rapoport Cc: Benoit Cousson Signed-off-by: Paul Walmsley Signed-off-by: Tony Lindgren --- Documentation/kernel-parameters.txt | 5 + arch/arm/mach-omap2/mux.c | 444 +++++++++++++++++++++++++++++++++++- arch/arm/mach-omap2/mux.h | 160 +++++++++++++ arch/arm/plat-omap/mux.c | 8 +- 4 files changed, 611 insertions(+), 6 deletions(-) create mode 100644 arch/arm/mach-omap2/mux.h (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 777dc8a32df8..4f62fcbce108 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1787,6 +1787,11 @@ and is between 256 and 4096 characters. It is defined in the file waiting for the ACK, so if this is set too high interrupts *may* be lost! + omap_mux= [OMAP] Override bootloader pin multiplexing. + Format: ... + For example, to override I2C bus2: + omap_mux=i2c2_scl.i2c2_scl=0x100,i2c2_sda.i2c2_sda=0x100 + opl3= [HW,OSS] Format: diff --git a/arch/arm/mach-omap2/mux.c b/arch/arm/mach-omap2/mux.c index 64250c504c64..b082b504de8b 100644 --- a/arch/arm/mach-omap2/mux.c +++ b/arch/arm/mach-omap2/mux.c @@ -27,18 +27,23 @@ #include #include #include +#include #include #include #include -#ifdef CONFIG_OMAP_MUX +#include "mux.h" #define OMAP_MUX_BASE_OFFSET 0x30 /* Offset from CTRL_BASE */ #define OMAP_MUX_BASE_SZ 0x5ca -static struct omap_mux_cfg arch_mux_cfg; +struct omap_mux_entry { + struct omap_mux mux; + struct list_head node; +}; + static void __iomem *mux_base; static inline u16 omap_mux_read(u16 reg) @@ -57,6 +62,10 @@ static inline void omap_mux_write(u16 val, u16 reg) __raw_writew(val, mux_base + reg); } +#ifdef CONFIG_OMAP_MUX + +static struct omap_mux_cfg arch_mux_cfg; + /* NOTE: See mux.h for the enumeration */ #ifdef CONFIG_ARCH_OMAP24XX @@ -667,8 +676,8 @@ int __init omap2_mux_init(void) mux_pbase = OMAP2420_CTRL_BASE + OMAP_MUX_BASE_OFFSET; else if (cpu_is_omap2430()) mux_pbase = OMAP243X_CTRL_BASE + OMAP_MUX_BASE_OFFSET; - else if (cpu_is_omap34xx()) - mux_pbase = OMAP343X_CTRL_BASE + OMAP_MUX_BASE_OFFSET; + else + return -ENODEV; mux_base = ioremap(mux_pbase, OMAP_MUX_BASE_SZ); if (!mux_base) { @@ -689,4 +698,431 @@ int __init omap2_mux_init(void) return omap_mux_register(&arch_mux_cfg); } +#endif /* CONFIG_OMAP_MUX */ + +/*----------------------------------------------------------------------------*/ + +#ifdef CONFIG_ARCH_OMAP34XX + +static LIST_HEAD(muxmodes); +static DEFINE_MUTEX(muxmode_mutex); + +#ifdef CONFIG_OMAP_MUX + +static char *omap_mux_options; + +int __init omap_mux_init_gpio(int gpio, int val) +{ + struct omap_mux_entry *e; + int found = 0; + + if (!gpio) + return -EINVAL; + + list_for_each_entry(e, &muxmodes, node) { + struct omap_mux *m = &e->mux; + if (gpio == m->gpio) { + u16 old_mode; + u16 mux_mode; + + old_mode = omap_mux_read(m->reg_offset); + mux_mode = val & ~(OMAP_MUX_NR_MODES - 1); + mux_mode |= OMAP_MUX_MODE4; + printk(KERN_DEBUG "mux: Setting signal " + "%s.gpio%i 0x%04x -> 0x%04x\n", + m->muxnames[0], gpio, old_mode, mux_mode); + omap_mux_write(mux_mode, m->reg_offset); + found++; + } + } + + if (found == 1) + return 0; + + if (found > 1) { + printk(KERN_ERR "mux: Multiple gpio paths for gpio%i\n", gpio); + return -EINVAL; + } + + printk(KERN_ERR "mux: Could not set gpio%i\n", gpio); + + return -ENODEV; +} + +int __init omap_mux_init_signal(char *muxname, int val) +{ + struct omap_mux_entry *e; + char *m0_name = NULL, *mode_name = NULL; + int found = 0; + + mode_name = strchr(muxname, '.'); + if (mode_name) { + *mode_name = '\0'; + mode_name++; + m0_name = muxname; + } else { + mode_name = muxname; + } + + list_for_each_entry(e, &muxmodes, node) { + struct omap_mux *m = &e->mux; + char *m0_entry = m->muxnames[0]; + int i; + + if (m0_name && strcmp(m0_name, m0_entry)) + continue; + + for (i = 0; i < OMAP_MUX_NR_MODES; i++) { + char *mode_cur = m->muxnames[i]; + + if (!mode_cur) + continue; + + if (!strcmp(mode_name, mode_cur)) { + u16 old_mode; + u16 mux_mode; + + old_mode = omap_mux_read(m->reg_offset); + mux_mode = val | i; + printk(KERN_DEBUG "mux: Setting signal " + "%s.%s 0x%04x -> 0x%04x\n", + m0_entry, muxname, old_mode, mux_mode); + omap_mux_write(mux_mode, m->reg_offset); + found++; + } + } + } + + if (found == 1) + return 0; + + if (found > 1) { + printk(KERN_ERR "mux: Multiple signal paths (%i) for %s\n", + found, muxname); + return -EINVAL; + } + + printk(KERN_ERR "mux: Could not set signal %s\n", muxname); + + return -ENODEV; +} + +static void __init omap_mux_free_names(struct omap_mux *m) +{ + int i; + + for (i = 0; i < OMAP_MUX_NR_MODES; i++) + kfree(m->muxnames[i]); + +#ifdef CONFIG_DEBUG_FS + for (i = 0; i < OMAP_MUX_NR_SIDES; i++) + kfree(m->balls[i]); +#endif + +} + +/* Free all data except for GPIO pins unless CONFIG_DEBUG_FS is set */ +static int __init omap_mux_late_init(void) +{ + struct omap_mux_entry *e, *tmp; + + list_for_each_entry_safe(e, tmp, &muxmodes, node) { + struct omap_mux *m = &e->mux; + u16 mode = omap_mux_read(m->reg_offset); + + if (OMAP_MODE_GPIO(mode)) + continue; + +#ifndef CONFIG_DEBUG_FS + mutex_lock(&muxmode_mutex); + list_del(&e->node); + mutex_unlock(&muxmode_mutex); + omap_mux_free_names(m); + kfree(m); #endif + + } + + return 0; +} +late_initcall(omap_mux_late_init); + +static void __init omap_mux_package_fixup(struct omap_mux *p, + struct omap_mux *superset) +{ + while (p->reg_offset != OMAP_MUX_TERMINATOR) { + struct omap_mux *s = superset; + int found = 0; + + while (s->reg_offset != OMAP_MUX_TERMINATOR) { + if (s->reg_offset == p->reg_offset) { + *s = *p; + found++; + break; + } + s++; + } + if (!found) + printk(KERN_ERR "mux: Unknown entry offset 0x%x\n", + p->reg_offset); + p++; + } +} + +#ifdef CONFIG_DEBUG_FS + +static void __init omap_mux_package_init_balls(struct omap_ball *b, + struct omap_mux *superset) +{ + while (b->reg_offset != OMAP_MUX_TERMINATOR) { + struct omap_mux *s = superset; + int found = 0; + + while (s->reg_offset != OMAP_MUX_TERMINATOR) { + if (s->reg_offset == b->reg_offset) { + s->balls[0] = b->balls[0]; + s->balls[1] = b->balls[1]; + found++; + break; + } + s++; + } + if (!found) + printk(KERN_ERR "mux: Unknown ball offset 0x%x\n", + b->reg_offset); + b++; + } +} + +#else /* CONFIG_DEBUG_FS */ + +static inline void omap_mux_package_init_balls(struct omap_ball *b, + struct omap_mux *superset) +{ +} + +#endif /* CONFIG_DEBUG_FS */ + +static int __init omap_mux_setup(char *options) +{ + if (!options) + return 0; + + omap_mux_options = options; + + return 1; +} +__setup("omap_mux=", omap_mux_setup); + +/* + * Note that the omap_mux=some.signal1=0x1234,some.signal2=0x1234 + * cmdline options only override the bootloader values. + * During development, please enable CONFIG_DEBUG_FS, and use the + * signal specific entries under debugfs. + */ +static void __init omap_mux_set_cmdline_signals(void) +{ + char *options, *next_opt, *token; + + if (!omap_mux_options) + return; + + options = kmalloc(strlen(omap_mux_options) + 1, GFP_KERNEL); + if (!options) + return; + + strcpy(options, omap_mux_options); + next_opt = options; + + while ((token = strsep(&next_opt, ",")) != NULL) { + char *keyval, *name; + unsigned long val; + + keyval = token; + name = strsep(&keyval, "="); + if (name) { + int res; + + res = strict_strtoul(keyval, 0x10, &val); + if (res < 0) + continue; + + omap_mux_init_signal(name, (u16)val); + } + } + + kfree(options); +} + +static void __init omap_mux_set_board_signals(struct omap_board_mux *board_mux) +{ + while (board_mux->reg_offset != OMAP_MUX_TERMINATOR) { + omap_mux_write(board_mux->value, board_mux->reg_offset); + board_mux++; + } +} + +static int __init omap_mux_copy_names(struct omap_mux *src, + struct omap_mux *dst) +{ + int i; + + for (i = 0; i < OMAP_MUX_NR_MODES; i++) { + if (src->muxnames[i]) { + dst->muxnames[i] = + kmalloc(strlen(src->muxnames[i]) + 1, + GFP_KERNEL); + if (!dst->muxnames[i]) + goto free; + strcpy(dst->muxnames[i], src->muxnames[i]); + } + } + +#ifdef CONFIG_DEBUG_FS + for (i = 0; i < OMAP_MUX_NR_SIDES; i++) { + if (src->balls[i]) { + dst->balls[i] = + kmalloc(strlen(src->balls[i]) + 1, + GFP_KERNEL); + if (!dst->balls[i]) + goto free; + strcpy(dst->balls[i], src->balls[i]); + } + } +#endif + + return 0; + +free: + omap_mux_free_names(dst); + return -ENOMEM; + +} + +#endif /* CONFIG_OMAP_MUX */ + +static u16 omap_mux_get_by_gpio(int gpio) +{ + struct omap_mux_entry *e; + u16 offset = OMAP_MUX_TERMINATOR; + + list_for_each_entry(e, &muxmodes, node) { + struct omap_mux *m = &e->mux; + if (m->gpio == gpio) { + offset = m->reg_offset; + break; + } + } + + return offset; +} + +/* Needed for dynamic muxing of GPIO pins for off-idle */ +u16 omap_mux_get_gpio(int gpio) +{ + u16 offset; + + offset = omap_mux_get_by_gpio(gpio); + if (offset == OMAP_MUX_TERMINATOR) { + printk(KERN_ERR "mux: Could not get gpio%i\n", gpio); + return offset; + } + + return omap_mux_read(offset); +} + +/* Needed for dynamic muxing of GPIO pins for off-idle */ +void omap_mux_set_gpio(u16 val, int gpio) +{ + u16 offset; + + offset = omap_mux_get_by_gpio(gpio); + if (offset == OMAP_MUX_TERMINATOR) { + printk(KERN_ERR "mux: Could not set gpio%i\n", gpio); + return; + } + + omap_mux_write(val, offset); +} + +static struct omap_mux * __init omap_mux_list_add(struct omap_mux *src) +{ + struct omap_mux_entry *entry; + struct omap_mux *m; + + entry = kzalloc(sizeof(struct omap_mux_entry), GFP_KERNEL); + if (!entry) + return NULL; + + m = &entry->mux; + memcpy(m, src, sizeof(struct omap_mux_entry)); + +#ifdef CONFIG_OMAP_MUX + if (omap_mux_copy_names(src, m)) { + kfree(entry); + return NULL; + } +#endif + + mutex_lock(&muxmode_mutex); + list_add_tail(&entry->node, &muxmodes); + mutex_unlock(&muxmode_mutex); + + return m; +} + +/* + * Note if CONFIG_OMAP_MUX is not selected, we will only initialize + * the GPIO to mux offset mapping that is needed for dynamic muxing + * of GPIO pins for off-idle. + */ +static void __init omap_mux_init_list(struct omap_mux *superset) +{ + while (superset->reg_offset != OMAP_MUX_TERMINATOR) { + struct omap_mux *entry; + +#ifndef CONFIG_OMAP_MUX + /* Skip pins that are not muxed as GPIO by bootloader */ + if (!OMAP_MODE_GPIO(omap_mux_read(superset->reg_offset))) { + superset++; + continue; + } +#endif + + entry = omap_mux_list_add(superset); + if (!entry) { + printk(KERN_ERR "mux: Could not add entry\n"); + return; + } + superset++; + } +} + +int __init omap_mux_init(u32 mux_pbase, u32 mux_size, + struct omap_mux *superset, + struct omap_mux *package_subset, + struct omap_board_mux *board_mux, + struct omap_ball *package_balls) +{ + if (mux_base) + return -EBUSY; + + mux_base = ioremap(mux_pbase, mux_size); + if (!mux_base) { + printk(KERN_ERR "mux: Could not ioremap\n"); + return -ENODEV; + } + +#ifdef CONFIG_OMAP_MUX + omap_mux_package_fixup(package_subset, superset); + omap_mux_package_init_balls(package_balls, superset); + omap_mux_set_cmdline_signals(); + omap_mux_set_board_signals(board_mux); +#endif + + omap_mux_init_list(superset); + + return 0; +} + +#endif /* CONFIG_ARCH_OMAP34XX */ diff --git a/arch/arm/mach-omap2/mux.h b/arch/arm/mach-omap2/mux.h new file mode 100644 index 000000000000..bebe9cc60858 --- /dev/null +++ b/arch/arm/mach-omap2/mux.h @@ -0,0 +1,160 @@ +/* + * Copyright (C) 2009 Nokia + * Copyright (C) 2009 Texas Instruments + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#define OMAP_MUX_TERMINATOR 0xffff + +/* 34xx mux mode options for each pin. See TRM for options */ +#define OMAP_MUX_MODE0 0 +#define OMAP_MUX_MODE1 1 +#define OMAP_MUX_MODE2 2 +#define OMAP_MUX_MODE3 3 +#define OMAP_MUX_MODE4 4 +#define OMAP_MUX_MODE5 5 +#define OMAP_MUX_MODE6 6 +#define OMAP_MUX_MODE7 7 + +/* 24xx/34xx mux bit defines */ +#define OMAP_PULL_ENA (1 << 3) +#define OMAP_PULL_UP (1 << 4) +#define OMAP_ALTELECTRICALSEL (1 << 5) + +/* 34xx specific mux bit defines */ +#define OMAP_INPUT_EN (1 << 8) +#define OMAP_OFF_EN (1 << 9) +#define OMAP_OFFOUT_EN (1 << 10) +#define OMAP_OFFOUT_VAL (1 << 11) +#define OMAP_OFF_PULL_EN (1 << 12) +#define OMAP_OFF_PULL_UP (1 << 13) +#define OMAP_WAKEUP_EN (1 << 14) + +/* Active pin states */ +#define OMAP_PIN_OUTPUT 0 +#define OMAP_PIN_INPUT OMAP_INPUT_EN +#define OMAP_PIN_INPUT_PULLUP (OMAP_PULL_ENA | OMAP_INPUT_EN \ + | OMAP_PULL_UP) +#define OMAP_PIN_INPUT_PULLDOWN (OMAP_PULL_ENA | OMAP_INPUT_EN) + +/* Off mode states */ +#define OMAP_PIN_OFF_NONE 0 +#define OMAP_PIN_OFF_OUTPUT_HIGH (OMAP_OFF_EN | OMAP_OFFOUT_EN \ + | OMAP_OFFOUT_VAL) +#define OMAP_PIN_OFF_OUTPUT_LOW (OMAP_OFF_EN | OMAP_OFFOUT_EN) +#define OMAP_PIN_OFF_INPUT_PULLUP (OMAP_OFF_EN | OMAP_OFF_PULL_EN \ + | OMAP_OFF_PULL_UP) +#define OMAP_PIN_OFF_INPUT_PULLDOWN (OMAP_OFF_EN | OMAP_OFF_PULL_EN) +#define OMAP_PIN_OFF_WAKEUPENABLE OMAP_WAKEUP_EN + +#define OMAP_MODE_GPIO(x) (((x) & OMAP_MUX_MODE7) == OMAP_MUX_MODE4) + +/* Flags for omap_mux_init */ +#define OMAP_PACKAGE_MASK 0xffff +#define OMAP_PACKAGE_CUS 3 /* 423-pin 0.65 */ +#define OMAP_PACKAGE_CBB 2 /* 515-pin 0.40 0.50 */ +#define OMAP_PACKAGE_CBC 1 /* 515-pin 0.50 0.65 */ + + +#define OMAP_MUX_NR_MODES 8 /* Available modes */ +#define OMAP_MUX_NR_SIDES 2 /* Bottom & top */ + +/** + * struct omap_mux - data for omap mux register offset and it's value + * @reg_offset: mux register offset from the mux base + * @gpio: GPIO number + * @muxnames: available signal modes for a ball + */ +struct omap_mux { + u16 reg_offset; + u16 gpio; +#ifdef CONFIG_OMAP_MUX + char *muxnames[OMAP_MUX_NR_MODES]; +#ifdef CONFIG_DEBUG_FS + char *balls[OMAP_MUX_NR_SIDES]; +#endif +#endif +}; + +/** + * struct omap_ball - data for balls on omap package + * @reg_offset: mux register offset from the mux base + * @balls: available balls on the package + */ +struct omap_ball { + u16 reg_offset; + char *balls[OMAP_MUX_NR_SIDES]; +}; + +/** + * struct omap_board_mux - data for initializing mux registers + * @reg_offset: mux register offset from the mux base + * @mux_value: desired mux value to set + */ +struct omap_board_mux { + u16 reg_offset; + u16 value; +}; + +#if defined(CONFIG_OMAP_MUX) && defined(CONFIG_ARCH_OMAP34XX) + +/** + * omap_mux_init_gpio - initialize a signal based on the GPIO number + * @gpio: GPIO number + * @val: Options for the mux register value + */ +int omap_mux_init_gpio(int gpio, int val); + +/** + * omap_mux_init_signal - initialize a signal based on the signal name + * @muxname: Mux name in mode0_name.signal_name format + * @val: Options for the mux register value + */ +int omap_mux_init_signal(char *muxname, int val); + +#else + +static inline int omap_mux_init_gpio(int gpio, int val) +{ + return 0; +} +static inline int omap_mux_init_signal(char *muxname, int val) +{ + return 0; +} + +#endif + +/** + * omap_mux_get_gpio() - get mux register value based on GPIO number + * @gpio: GPIO number + * + */ +u16 omap_mux_get_gpio(int gpio); + +/** + * omap_mux_set_gpio() - set mux register value based on GPIO number + * @val: New mux register value + * @gpio: GPIO number + * + */ +void omap_mux_set_gpio(u16 val, int gpio); + +/** + * omap3_mux_init() - initialize mux system with board specific set + * @board_mux: Board specific mux table + * @flags: OMAP package type used for the board + */ +int omap3_mux_init(struct omap_board_mux *board_mux, int flags); + +/** + * omap_mux_init - private mux init function, do not call + */ +int omap_mux_init(u32 mux_pbase, u32 mux_size, + struct omap_mux *superset, + struct omap_mux *package_subset, + struct omap_board_mux *board_mux, + struct omap_ball *package_balls); diff --git a/arch/arm/plat-omap/mux.c b/arch/arm/plat-omap/mux.c index 05aebcad215b..06703635ace1 100644 --- a/arch/arm/plat-omap/mux.c +++ b/arch/arm/plat-omap/mux.c @@ -54,8 +54,12 @@ int __init_or_module omap_cfg_reg(const unsigned long index) { struct pin_config *reg; - if (cpu_is_omap44xx()) - return 0; + if (cpu_is_omap34xx() || cpu_is_omap44xx()) { + printk(KERN_ERR "mux: Broken omap_cfg_reg(%lu) entry\n", + index); + WARN_ON(1); + return -EINVAL; + } if (mux_cfg == NULL) { printk(KERN_ERR "Pin mux table not initialized\n"); -- cgit v1.2.3 From 12458ea06efd7b44281e68fe59c950ec7d59c649 Mon Sep 17 00:00:00 2001 From: Anatolij Gustschin Date: Fri, 11 Dec 2009 21:24:44 -0700 Subject: ppc440spe-adma: adds updated ppc440spe adma driver This patch adds new version of the PPC440SPe ADMA driver. Signed-off-by: Yuri Tikhonov Signed-off-by: Anatolij Gustschin Signed-off-by: Dan Williams --- .../powerpc/dts-bindings/4xx/ppc440spe-adma.txt | 93 + arch/powerpc/include/asm/async_tx.h | 47 + arch/powerpc/include/asm/dcr-regs.h | 23 + drivers/dma/Kconfig | 11 + drivers/dma/Makefile | 1 + drivers/dma/ppc4xx/Makefile | 1 + drivers/dma/ppc4xx/adma.c | 5027 ++++++++++++++++++++ drivers/dma/ppc4xx/adma.h | 195 + drivers/dma/ppc4xx/dma.h | 223 + drivers/dma/ppc4xx/xor.h | 110 + 10 files changed, 5731 insertions(+) create mode 100644 Documentation/powerpc/dts-bindings/4xx/ppc440spe-adma.txt create mode 100644 arch/powerpc/include/asm/async_tx.h create mode 100644 drivers/dma/ppc4xx/Makefile create mode 100644 drivers/dma/ppc4xx/adma.c create mode 100644 drivers/dma/ppc4xx/adma.h create mode 100644 drivers/dma/ppc4xx/dma.h create mode 100644 drivers/dma/ppc4xx/xor.h (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/4xx/ppc440spe-adma.txt b/Documentation/powerpc/dts-bindings/4xx/ppc440spe-adma.txt new file mode 100644 index 000000000000..515ebcf1b97d --- /dev/null +++ b/Documentation/powerpc/dts-bindings/4xx/ppc440spe-adma.txt @@ -0,0 +1,93 @@ +PPC440SPe DMA/XOR (DMA Controller and XOR Accelerator) + +Device nodes needed for operation of the ppc440spe-adma driver +are specified hereby. These are I2O/DMA, DMA and XOR nodes +for DMA engines and Memory Queue Module node. The latter is used +by ADMA driver for configuration of RAID-6 H/W capabilities of +the PPC440SPe. In addition to the nodes and properties described +below, the ranges property of PLB node must specify ranges for +DMA devices. + + i) The I2O node + + Required properties: + + - compatible : "ibm,i2o-440spe"; + - reg : + - dcr-reg : + + Example: + + I2O: i2o@400100000 { + compatible = "ibm,i2o-440spe"; + reg = <0x00000004 0x00100000 0x100>; + dcr-reg = <0x060 0x020>; + }; + + + ii) The DMA node + + Required properties: + + - compatible : "ibm,dma-440spe"; + - cell-index : 1 cell, hardware index of the DMA engine + (typically 0x0 and 0x1 for DMA0 and DMA1) + - reg : + - dcr-reg : + - interrupts : . + - interrupt-parent : needed for interrupt mapping + + Example: + + DMA0: dma0@400100100 { + compatible = "ibm,dma-440spe"; + cell-index = <0>; + reg = <0x00000004 0x00100100 0x100>; + dcr-reg = <0x060 0x020>; + interrupt-parent = <&DMA0>; + interrupts = <0 1>; + #interrupt-cells = <1>; + #address-cells = <0>; + #size-cells = <0>; + interrupt-map = < + 0 &UIC0 0x14 4 + 1 &UIC1 0x16 4>; + }; + + + iii) XOR Accelerator node + + Required properties: + + - compatible : "amcc,xor-accelerator"; + - reg : + - interrupts : + - interrupt-parent : for interrupt mapping + + Example: + + xor-accel@400200000 { + compatible = "amcc,xor-accelerator"; + reg = <0x00000004 0x00200000 0x400>; + interrupt-parent = <&UIC1>; + interrupts = <0x1f 4>; + }; + + + iv) Memory Queue Module node + + Required properties: + + - compatible : "ibm,mq-440spe"; + - dcr-reg : + + Example: + + MQ0: mq { + compatible = "ibm,mq-440spe"; + dcr-reg = <0x040 0x020>; + }; + diff --git a/arch/powerpc/include/asm/async_tx.h b/arch/powerpc/include/asm/async_tx.h new file mode 100644 index 000000000000..8b2dc55d01ab --- /dev/null +++ b/arch/powerpc/include/asm/async_tx.h @@ -0,0 +1,47 @@ +/* + * Copyright (C) 2008-2009 DENX Software Engineering. + * + * Author: Yuri Tikhonov + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 + * Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * The full GNU General Public License is included in this distribution in the + * file called COPYING. + */ +#ifndef _ASM_POWERPC_ASYNC_TX_H_ +#define _ASM_POWERPC_ASYNC_TX_H_ + +#if defined(CONFIG_440SPe) || defined(CONFIG_440SP) +extern struct dma_chan * +ppc440spe_async_tx_find_best_channel(enum dma_transaction_type cap, + struct page **dst_lst, int dst_cnt, struct page **src_lst, + int src_cnt, size_t src_sz); + +#define async_tx_find_channel(dep, cap, dst_lst, dst_cnt, src_lst, \ + src_cnt, src_sz) \ + ppc440spe_async_tx_find_best_channel(cap, dst_lst, dst_cnt, src_lst, \ + src_cnt, src_sz) +#else + +#define async_tx_find_channel(dep, type, dst, dst_count, src, src_count, len) \ + __async_tx_find_channel(dep, type) + +struct dma_chan * +__async_tx_find_channel(struct async_submit_ctl *submit, + enum dma_transaction_type tx_type); + +#endif + +#endif diff --git a/arch/powerpc/include/asm/dcr-regs.h b/arch/powerpc/include/asm/dcr-regs.h index 828e3aa1f2fc..380274de429f 100644 --- a/arch/powerpc/include/asm/dcr-regs.h +++ b/arch/powerpc/include/asm/dcr-regs.h @@ -157,4 +157,27 @@ #define L2C_SNP_SSR_32G 0x0000f000 #define L2C_SNP_ESR 0x00000800 +/* + * DCR register offsets for 440SP/440SPe I2O/DMA controller. + * The base address is configured in the device tree. + */ +#define DCRN_I2O0_IBAL 0x006 +#define DCRN_I2O0_IBAH 0x007 +#define I2O_REG_ENABLE 0x00000001 /* Enable I2O/DMA access */ + +/* 440SP/440SPe Software Reset DCR */ +#define DCRN_SDR0_SRST 0x0200 +#define DCRN_SDR0_SRST_I2ODMA (0x80000000 >> 15) /* Reset I2O/DMA */ + +/* 440SP/440SPe Memory Queue DCR offsets */ +#define DCRN_MQ0_XORBA 0x04 +#define DCRN_MQ0_CF2H 0x06 +#define DCRN_MQ0_CFBHL 0x0f +#define DCRN_MQ0_BAUH 0x10 + +/* HB/LL Paths Configuration Register */ +#define MQ0_CFBHL_TPLM 28 +#define MQ0_CFBHL_HBCL 23 +#define MQ0_CFBHL_POLY 15 + #endif /* __DCR_REGS_H__ */ diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 24cdd20fe462..fe93d70f2e37 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -116,6 +116,17 @@ config COH901318 help Enable support for ST-Ericsson COH 901 318 DMA. +config AMCC_PPC440SPE_ADMA + tristate "AMCC PPC440SPe ADMA support" + depends on 440SPe || 440SP + select DMA_ENGINE + select ARCH_HAS_ASYNC_TX_FIND_CHANNEL + help + Enable support for the AMCC PPC440SPe RAID engines. + +config ARCH_HAS_ASYNC_TX_FIND_CHANNEL + bool + config DMA_ENGINE bool diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index 4db768e09cf3..807053d48232 100644 --- a/drivers/dma/Makefile +++ b/drivers/dma/Makefile @@ -11,3 +11,4 @@ obj-$(CONFIG_MX3_IPU) += ipu/ obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o obj-$(CONFIG_SH_DMAE) += shdma.o obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o +obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/ diff --git a/drivers/dma/ppc4xx/Makefile b/drivers/dma/ppc4xx/Makefile new file mode 100644 index 000000000000..b3d259b3e52a --- /dev/null +++ b/drivers/dma/ppc4xx/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += adma.o diff --git a/drivers/dma/ppc4xx/adma.c b/drivers/dma/ppc4xx/adma.c new file mode 100644 index 000000000000..0a3478e910f0 --- /dev/null +++ b/drivers/dma/ppc4xx/adma.c @@ -0,0 +1,5027 @@ +/* + * Copyright (C) 2006-2009 DENX Software Engineering. + * + * Author: Yuri Tikhonov + * + * Further porting to arch/powerpc by + * Anatolij Gustschin + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 + * Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * The full GNU General Public License is included in this distribution in the + * file called COPYING. + */ + +/* + * This driver supports the asynchrounous DMA copy and RAID engines available + * on the AMCC PPC440SPe Processors. + * Based on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x) + * ADMA driver written by D.Williams. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "adma.h" + +enum ppc_adma_init_code { + PPC_ADMA_INIT_OK = 0, + PPC_ADMA_INIT_MEMRES, + PPC_ADMA_INIT_MEMREG, + PPC_ADMA_INIT_ALLOC, + PPC_ADMA_INIT_COHERENT, + PPC_ADMA_INIT_CHANNEL, + PPC_ADMA_INIT_IRQ1, + PPC_ADMA_INIT_IRQ2, + PPC_ADMA_INIT_REGISTER +}; + +static char *ppc_adma_errors[] = { + [PPC_ADMA_INIT_OK] = "ok", + [PPC_ADMA_INIT_MEMRES] = "failed to get memory resource", + [PPC_ADMA_INIT_MEMREG] = "failed to request memory region", + [PPC_ADMA_INIT_ALLOC] = "failed to allocate memory for adev " + "structure", + [PPC_ADMA_INIT_COHERENT] = "failed to allocate coherent memory for " + "hardware descriptors", + [PPC_ADMA_INIT_CHANNEL] = "failed to allocate memory for channel", + [PPC_ADMA_INIT_IRQ1] = "failed to request first irq", + [PPC_ADMA_INIT_IRQ2] = "failed to request second irq", + [PPC_ADMA_INIT_REGISTER] = "failed to register dma async device", +}; + +static enum ppc_adma_init_code +ppc440spe_adma_devices[PPC440SPE_ADMA_ENGINES_NUM]; + +struct ppc_dma_chan_ref { + struct dma_chan *chan; + struct list_head node; +}; + +/* The list of channels exported by ppc440spe ADMA */ +struct list_head +ppc440spe_adma_chan_list = LIST_HEAD_INIT(ppc440spe_adma_chan_list); + +/* This flag is set when want to refetch the xor chain in the interrupt + * handler + */ +static u32 do_xor_refetch; + +/* Pointer to DMA0, DMA1 CP/CS FIFO */ +static void *ppc440spe_dma_fifo_buf; + +/* Pointers to last submitted to DMA0, DMA1 CDBs */ +static struct ppc440spe_adma_desc_slot *chan_last_sub[3]; +static struct ppc440spe_adma_desc_slot *chan_first_cdb[3]; + +/* Pointer to last linked and submitted xor CB */ +static struct ppc440spe_adma_desc_slot *xor_last_linked; +static struct ppc440spe_adma_desc_slot *xor_last_submit; + +/* This array is used in data-check operations for storing a pattern */ +static char ppc440spe_qword[16]; + +static atomic_t ppc440spe_adma_err_irq_ref; +static dcr_host_t ppc440spe_mq_dcr_host; +static unsigned int ppc440spe_mq_dcr_len; + +/* Since RXOR operations use the common register (MQ0_CF2H) for setting-up + * the block size in transactions, then we do not allow to activate more than + * only one RXOR transactions simultaneously. So use this var to store + * the information about is RXOR currently active (PPC440SPE_RXOR_RUN bit is + * set) or not (PPC440SPE_RXOR_RUN is clear). + */ +static unsigned long ppc440spe_rxor_state; + +/* These are used in enable & check routines + */ +static u32 ppc440spe_r6_enabled; +static struct ppc440spe_adma_chan *ppc440spe_r6_tchan; +static struct completion ppc440spe_r6_test_comp; + +static int ppc440spe_adma_dma2rxor_prep_src( + struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_rxor *cursor, int index, + int src_cnt, u32 addr); +static void ppc440spe_adma_dma2rxor_set_src( + struct ppc440spe_adma_desc_slot *desc, + int index, dma_addr_t addr); +static void ppc440spe_adma_dma2rxor_set_mult( + struct ppc440spe_adma_desc_slot *desc, + int index, u8 mult); + +#ifdef ADMA_LL_DEBUG +#define ADMA_LL_DBG(x) ({ if (1) x; 0; }) +#else +#define ADMA_LL_DBG(x) ({ if (0) x; 0; }) +#endif + +static void print_cb(struct ppc440spe_adma_chan *chan, void *block) +{ + struct dma_cdb *cdb; + struct xor_cb *cb; + int i; + + switch (chan->device->id) { + case 0: + case 1: + cdb = block; + + pr_debug("CDB at %p [%d]:\n" + "\t attr 0x%02x opc 0x%02x cnt 0x%08x\n" + "\t sg1u 0x%08x sg1l 0x%08x\n" + "\t sg2u 0x%08x sg2l 0x%08x\n" + "\t sg3u 0x%08x sg3l 0x%08x\n", + cdb, chan->device->id, + cdb->attr, cdb->opc, le32_to_cpu(cdb->cnt), + le32_to_cpu(cdb->sg1u), le32_to_cpu(cdb->sg1l), + le32_to_cpu(cdb->sg2u), le32_to_cpu(cdb->sg2l), + le32_to_cpu(cdb->sg3u), le32_to_cpu(cdb->sg3l) + ); + break; + case 2: + cb = block; + + pr_debug("CB at %p [%d]:\n" + "\t cbc 0x%08x cbbc 0x%08x cbs 0x%08x\n" + "\t cbtah 0x%08x cbtal 0x%08x\n" + "\t cblah 0x%08x cblal 0x%08x\n", + cb, chan->device->id, + cb->cbc, cb->cbbc, cb->cbs, + cb->cbtah, cb->cbtal, + cb->cblah, cb->cblal); + for (i = 0; i < 16; i++) { + if (i && !cb->ops[i].h && !cb->ops[i].l) + continue; + pr_debug("\t ops[%2d]: h 0x%08x l 0x%08x\n", + i, cb->ops[i].h, cb->ops[i].l); + } + break; + } +} + +static void print_cb_list(struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *iter) +{ + for (; iter; iter = iter->hw_next) + print_cb(chan, iter->hw_desc); +} + +static void prep_dma_xor_dbg(int id, dma_addr_t dst, dma_addr_t *src, + unsigned int src_cnt) +{ + int i; + + pr_debug("\n%s(%d):\nsrc: ", __func__, id); + for (i = 0; i < src_cnt; i++) + pr_debug("\t0x%016llx ", src[i]); + pr_debug("dst:\n\t0x%016llx\n", dst); +} + +static void prep_dma_pq_dbg(int id, dma_addr_t *dst, dma_addr_t *src, + unsigned int src_cnt) +{ + int i; + + pr_debug("\n%s(%d):\nsrc: ", __func__, id); + for (i = 0; i < src_cnt; i++) + pr_debug("\t0x%016llx ", src[i]); + pr_debug("dst: "); + for (i = 0; i < 2; i++) + pr_debug("\t0x%016llx ", dst[i]); +} + +static void prep_dma_pqzero_sum_dbg(int id, dma_addr_t *src, + unsigned int src_cnt, + const unsigned char *scf) +{ + int i; + + pr_debug("\n%s(%d):\nsrc(coef): ", __func__, id); + if (scf) { + for (i = 0; i < src_cnt; i++) + pr_debug("\t0x%016llx(0x%02x) ", src[i], scf[i]); + } else { + for (i = 0; i < src_cnt; i++) + pr_debug("\t0x%016llx(no) ", src[i]); + } + + pr_debug("dst: "); + for (i = 0; i < 2; i++) + pr_debug("\t0x%016llx ", src[src_cnt + i]); +} + +/****************************************************************************** + * Command (Descriptor) Blocks low-level routines + ******************************************************************************/ +/** + * ppc440spe_desc_init_interrupt - initialize the descriptor for INTERRUPT + * pseudo operation + */ +static void ppc440spe_desc_init_interrupt(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan) +{ + struct xor_cb *p; + + switch (chan->device->id) { + case PPC440SPE_XOR_ID: + p = desc->hw_desc; + memset(desc->hw_desc, 0, sizeof(struct xor_cb)); + /* NOP with Command Block Complete Enable */ + p->cbc = XOR_CBCR_CBCE_BIT; + break; + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + memset(desc->hw_desc, 0, sizeof(struct dma_cdb)); + /* NOP with interrupt */ + set_bit(PPC440SPE_DESC_INT, &desc->flags); + break; + default: + printk(KERN_ERR "Unsupported id %d in %s\n", chan->device->id, + __func__); + break; + } +} + +/** + * ppc440spe_desc_init_null_xor - initialize the descriptor for NULL XOR + * pseudo operation + */ +static void ppc440spe_desc_init_null_xor(struct ppc440spe_adma_desc_slot *desc) +{ + memset(desc->hw_desc, 0, sizeof(struct xor_cb)); + desc->hw_next = NULL; + desc->src_cnt = 0; + desc->dst_cnt = 1; +} + +/** + * ppc440spe_desc_init_xor - initialize the descriptor for XOR operation + */ +static void ppc440spe_desc_init_xor(struct ppc440spe_adma_desc_slot *desc, + int src_cnt, unsigned long flags) +{ + struct xor_cb *hw_desc = desc->hw_desc; + + memset(desc->hw_desc, 0, sizeof(struct xor_cb)); + desc->hw_next = NULL; + desc->src_cnt = src_cnt; + desc->dst_cnt = 1; + + hw_desc->cbc = XOR_CBCR_TGT_BIT | src_cnt; + if (flags & DMA_PREP_INTERRUPT) + /* Enable interrupt on completion */ + hw_desc->cbc |= XOR_CBCR_CBCE_BIT; +} + +/** + * ppc440spe_desc_init_dma2pq - initialize the descriptor for PQ + * operation in DMA2 controller + */ +static void ppc440spe_desc_init_dma2pq(struct ppc440spe_adma_desc_slot *desc, + int dst_cnt, int src_cnt, unsigned long flags) +{ + struct xor_cb *hw_desc = desc->hw_desc; + + memset(desc->hw_desc, 0, sizeof(struct xor_cb)); + desc->hw_next = NULL; + desc->src_cnt = src_cnt; + desc->dst_cnt = dst_cnt; + memset(desc->reverse_flags, 0, sizeof(desc->reverse_flags)); + desc->descs_per_op = 0; + + hw_desc->cbc = XOR_CBCR_TGT_BIT; + if (flags & DMA_PREP_INTERRUPT) + /* Enable interrupt on completion */ + hw_desc->cbc |= XOR_CBCR_CBCE_BIT; +} + +#define DMA_CTRL_FLAGS_LAST DMA_PREP_FENCE +#define DMA_PREP_ZERO_P (DMA_CTRL_FLAGS_LAST << 1) +#define DMA_PREP_ZERO_Q (DMA_PREP_ZERO_P << 1) + +/** + * ppc440spe_desc_init_dma01pq - initialize the descriptors for PQ operation + * with DMA0/1 + */ +static void ppc440spe_desc_init_dma01pq(struct ppc440spe_adma_desc_slot *desc, + int dst_cnt, int src_cnt, unsigned long flags, + unsigned long op) +{ + struct dma_cdb *hw_desc; + struct ppc440spe_adma_desc_slot *iter; + u8 dopc; + + /* Common initialization of a PQ descriptors chain */ + set_bits(op, &desc->flags); + desc->src_cnt = src_cnt; + desc->dst_cnt = dst_cnt; + + /* WXOR MULTICAST if both P and Q are being computed + * MV_SG1_SG2 if Q only + */ + dopc = (desc->dst_cnt == DMA_DEST_MAX_NUM) ? + DMA_CDB_OPC_MULTICAST : DMA_CDB_OPC_MV_SG1_SG2; + + list_for_each_entry(iter, &desc->group_list, chain_node) { + hw_desc = iter->hw_desc; + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + + if (likely(!list_is_last(&iter->chain_node, + &desc->group_list))) { + /* set 'next' pointer */ + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, chain_node); + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + } else { + /* this is the last descriptor. + * this slot will be pasted from ADMA level + * each time it wants to configure parameters + * of the transaction (src, dst, ...) + */ + iter->hw_next = NULL; + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &iter->flags); + else + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + } + } + + /* Set OPS depending on WXOR/RXOR type of operation */ + if (!test_bit(PPC440SPE_DESC_RXOR, &desc->flags)) { + /* This is a WXOR only chain: + * - first descriptors are for zeroing destinations + * if PPC440SPE_ZERO_P/Q set; + * - descriptors remained are for GF-XOR operations. + */ + iter = list_first_entry(&desc->group_list, + struct ppc440spe_adma_desc_slot, + chain_node); + + if (test_bit(PPC440SPE_ZERO_P, &desc->flags)) { + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + } + + if (test_bit(PPC440SPE_ZERO_Q, &desc->flags)) { + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + } + + list_for_each_entry_from(iter, &desc->group_list, chain_node) { + hw_desc = iter->hw_desc; + hw_desc->opc = dopc; + } + } else { + /* This is either RXOR-only or mixed RXOR/WXOR */ + + /* The first 1 or 2 slots in chain are always RXOR, + * if need to calculate P & Q, then there are two + * RXOR slots; if only P or only Q, then there is one + */ + iter = list_first_entry(&desc->group_list, + struct ppc440spe_adma_desc_slot, + chain_node); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + + if (desc->dst_cnt == DMA_DEST_MAX_NUM) { + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + } + + /* The remaining descs (if any) are WXORs */ + if (test_bit(PPC440SPE_DESC_WXOR, &desc->flags)) { + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + list_for_each_entry_from(iter, &desc->group_list, + chain_node) { + hw_desc = iter->hw_desc; + hw_desc->opc = dopc; + } + } + } +} + +/** + * ppc440spe_desc_init_dma01pqzero_sum - initialize the descriptor + * for PQ_ZERO_SUM operation + */ +static void ppc440spe_desc_init_dma01pqzero_sum( + struct ppc440spe_adma_desc_slot *desc, + int dst_cnt, int src_cnt) +{ + struct dma_cdb *hw_desc; + struct ppc440spe_adma_desc_slot *iter; + int i = 0; + u8 dopc = (dst_cnt == 2) ? DMA_CDB_OPC_MULTICAST : + DMA_CDB_OPC_MV_SG1_SG2; + /* + * Initialize starting from 2nd or 3rd descriptor dependent + * on dst_cnt. First one or two slots are for cloning P + * and/or Q to chan->pdest and/or chan->qdest as we have + * to preserve original P/Q. + */ + iter = list_first_entry(&desc->group_list, + struct ppc440spe_adma_desc_slot, chain_node); + iter = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, chain_node); + + if (dst_cnt > 1) { + iter = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, chain_node); + } + /* initialize each source descriptor in chain */ + list_for_each_entry_from(iter, &desc->group_list, chain_node) { + hw_desc = iter->hw_desc; + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->src_cnt = 0; + iter->dst_cnt = 0; + + /* This is a ZERO_SUM operation: + * - descriptors starting from 2nd or 3rd + * descriptor are for GF-XOR operations; + * - remaining descriptors are for checking the result + */ + if (i++ < src_cnt) + /* MV_SG1_SG2 if only Q is being verified + * MULTICAST if both P and Q are being verified + */ + hw_desc->opc = dopc; + else + /* DMA_CDB_OPC_DCHECK128 operation */ + hw_desc->opc = DMA_CDB_OPC_DCHECK128; + + if (likely(!list_is_last(&iter->chain_node, + &desc->group_list))) { + /* set 'next' pointer */ + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + } else { + /* this is the last descriptor. + * this slot will be pasted from ADMA level + * each time it wants to configure parameters + * of the transaction (src, dst, ...) + */ + iter->hw_next = NULL; + /* always enable interrupt generation since we get + * the status of pqzero from the handler + */ + set_bit(PPC440SPE_DESC_INT, &iter->flags); + } + } + desc->src_cnt = src_cnt; + desc->dst_cnt = dst_cnt; +} + +/** + * ppc440spe_desc_init_memcpy - initialize the descriptor for MEMCPY operation + */ +static void ppc440spe_desc_init_memcpy(struct ppc440spe_adma_desc_slot *desc, + unsigned long flags) +{ + struct dma_cdb *hw_desc = desc->hw_desc; + + memset(desc->hw_desc, 0, sizeof(struct dma_cdb)); + desc->hw_next = NULL; + desc->src_cnt = 1; + desc->dst_cnt = 1; + + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &desc->flags); + else + clear_bit(PPC440SPE_DESC_INT, &desc->flags); + + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; +} + +/** + * ppc440spe_desc_init_memset - initialize the descriptor for MEMSET operation + */ +static void ppc440spe_desc_init_memset(struct ppc440spe_adma_desc_slot *desc, + int value, unsigned long flags) +{ + struct dma_cdb *hw_desc = desc->hw_desc; + + memset(desc->hw_desc, 0, sizeof(struct dma_cdb)); + desc->hw_next = NULL; + desc->src_cnt = 1; + desc->dst_cnt = 1; + + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &desc->flags); + else + clear_bit(PPC440SPE_DESC_INT, &desc->flags); + + hw_desc->sg1u = hw_desc->sg1l = cpu_to_le32((u32)value); + hw_desc->sg3u = hw_desc->sg3l = cpu_to_le32((u32)value); + hw_desc->opc = DMA_CDB_OPC_DFILL128; +} + +/** + * ppc440spe_desc_set_src_addr - set source address into the descriptor + */ +static void ppc440spe_desc_set_src_addr(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, + int src_idx, dma_addr_t addrh, + dma_addr_t addrl) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + phys_addr_t addr64, tmplow, tmphi; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + if (!addrh) { + addr64 = addrl; + tmphi = (addr64 >> 32); + tmplow = (addr64 & 0xFFFFFFFF); + } else { + tmphi = addrh; + tmplow = addrl; + } + dma_hw_desc = desc->hw_desc; + dma_hw_desc->sg1l = cpu_to_le32((u32)tmplow); + dma_hw_desc->sg1u |= cpu_to_le32((u32)tmphi); + break; + case PPC440SPE_XOR_ID: + xor_hw_desc = desc->hw_desc; + xor_hw_desc->ops[src_idx].l = addrl; + xor_hw_desc->ops[src_idx].h |= addrh; + break; + } +} + +/** + * ppc440spe_desc_set_src_mult - set source address mult into the descriptor + */ +static void ppc440spe_desc_set_src_mult(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, u32 mult_index, + int sg_index, unsigned char mult_value) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + u32 *psgu; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + + switch (sg_index) { + /* for RXOR operations set multiplier + * into source cued address + */ + case DMA_CDB_SG_SRC: + psgu = &dma_hw_desc->sg1u; + break; + /* for WXOR operations set multiplier + * into destination cued address(es) + */ + case DMA_CDB_SG_DST1: + psgu = &dma_hw_desc->sg2u; + break; + case DMA_CDB_SG_DST2: + psgu = &dma_hw_desc->sg3u; + break; + default: + BUG(); + } + + *psgu |= cpu_to_le32(mult_value << mult_index); + break; + case PPC440SPE_XOR_ID: + xor_hw_desc = desc->hw_desc; + break; + default: + BUG(); + } +} + +/** + * ppc440spe_desc_set_dest_addr - set destination address into the descriptor + */ +static void ppc440spe_desc_set_dest_addr(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, + dma_addr_t addrh, dma_addr_t addrl, + u32 dst_idx) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + phys_addr_t addr64, tmphi, tmplow; + u32 *psgu, *psgl; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + if (!addrh) { + addr64 = addrl; + tmphi = (addr64 >> 32); + tmplow = (addr64 & 0xFFFFFFFF); + } else { + tmphi = addrh; + tmplow = addrl; + } + dma_hw_desc = desc->hw_desc; + + psgu = dst_idx ? &dma_hw_desc->sg3u : &dma_hw_desc->sg2u; + psgl = dst_idx ? &dma_hw_desc->sg3l : &dma_hw_desc->sg2l; + + *psgl = cpu_to_le32((u32)tmplow); + *psgu |= cpu_to_le32((u32)tmphi); + break; + case PPC440SPE_XOR_ID: + xor_hw_desc = desc->hw_desc; + xor_hw_desc->cbtal = addrl; + xor_hw_desc->cbtah |= addrh; + break; + } +} + +/** + * ppc440spe_desc_set_byte_count - set number of data bytes involved + * into the operation + */ +static void ppc440spe_desc_set_byte_count(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, + u32 byte_count) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + dma_hw_desc->cnt = cpu_to_le32(byte_count); + break; + case PPC440SPE_XOR_ID: + xor_hw_desc = desc->hw_desc; + xor_hw_desc->cbbc = byte_count; + break; + } +} + +/** + * ppc440spe_desc_set_rxor_block_size - set RXOR block size + */ +static inline void ppc440spe_desc_set_rxor_block_size(u32 byte_count) +{ + /* assume that byte_count is aligned on the 512-boundary; + * thus write it directly to the register (bits 23:31 are + * reserved there). + */ + dcr_write(ppc440spe_mq_dcr_host, DCRN_MQ0_CF2H, byte_count); +} + +/** + * ppc440spe_desc_set_dcheck - set CHECK pattern + */ +static void ppc440spe_desc_set_dcheck(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, u8 *qword) +{ + struct dma_cdb *dma_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + iowrite32(qword[0], &dma_hw_desc->sg3l); + iowrite32(qword[4], &dma_hw_desc->sg3u); + iowrite32(qword[8], &dma_hw_desc->sg2l); + iowrite32(qword[12], &dma_hw_desc->sg2u); + break; + default: + BUG(); + } +} + +/** + * ppc440spe_xor_set_link - set link address in xor CB + */ +static void ppc440spe_xor_set_link(struct ppc440spe_adma_desc_slot *prev_desc, + struct ppc440spe_adma_desc_slot *next_desc) +{ + struct xor_cb *xor_hw_desc = prev_desc->hw_desc; + + if (unlikely(!next_desc || !(next_desc->phys))) { + printk(KERN_ERR "%s: next_desc=0x%p; next_desc->phys=0x%llx\n", + __func__, next_desc, + next_desc ? next_desc->phys : 0); + BUG(); + } + + xor_hw_desc->cbs = 0; + xor_hw_desc->cblal = next_desc->phys; + xor_hw_desc->cblah = 0; + xor_hw_desc->cbc |= XOR_CBCR_LNK_BIT; +} + +/** + * ppc440spe_desc_set_link - set the address of descriptor following this + * descriptor in chain + */ +static void ppc440spe_desc_set_link(struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *prev_desc, + struct ppc440spe_adma_desc_slot *next_desc) +{ + unsigned long flags; + struct ppc440spe_adma_desc_slot *tail = next_desc; + + if (unlikely(!prev_desc || !next_desc || + (prev_desc->hw_next && prev_desc->hw_next != next_desc))) { + /* If previous next is overwritten something is wrong. + * though we may refetch from append to initiate list + * processing; in this case - it's ok. + */ + printk(KERN_ERR "%s: prev_desc=0x%p; next_desc=0x%p; " + "prev->hw_next=0x%p\n", __func__, prev_desc, + next_desc, prev_desc ? prev_desc->hw_next : 0); + BUG(); + } + + local_irq_save(flags); + + /* do s/w chaining both for DMA and XOR descriptors */ + prev_desc->hw_next = next_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + break; + case PPC440SPE_XOR_ID: + /* bind descriptor to the chain */ + while (tail->hw_next) + tail = tail->hw_next; + xor_last_linked = tail; + + if (prev_desc == xor_last_submit) + /* do not link to the last submitted CB */ + break; + ppc440spe_xor_set_link(prev_desc, next_desc); + break; + } + + local_irq_restore(flags); +} + +/** + * ppc440spe_desc_get_src_addr - extract the source address from the descriptor + */ +static u32 ppc440spe_desc_get_src_addr(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, int src_idx) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + /* May have 0, 1, 2, or 3 sources */ + switch (dma_hw_desc->opc) { + case DMA_CDB_OPC_NO_OP: + case DMA_CDB_OPC_DFILL128: + return 0; + case DMA_CDB_OPC_DCHECK128: + if (unlikely(src_idx)) { + printk(KERN_ERR "%s: try to get %d source for" + " DCHECK128\n", __func__, src_idx); + BUG(); + } + return le32_to_cpu(dma_hw_desc->sg1l); + case DMA_CDB_OPC_MULTICAST: + case DMA_CDB_OPC_MV_SG1_SG2: + if (unlikely(src_idx > 2)) { + printk(KERN_ERR "%s: try to get %d source from" + " DMA descr\n", __func__, src_idx); + BUG(); + } + if (src_idx) { + if (le32_to_cpu(dma_hw_desc->sg1u) & + DMA_CUED_XOR_WIN_MSK) { + u8 region; + + if (src_idx == 1) + return le32_to_cpu( + dma_hw_desc->sg1l) + + desc->unmap_len; + + region = (le32_to_cpu( + dma_hw_desc->sg1u)) >> + DMA_CUED_REGION_OFF; + + region &= DMA_CUED_REGION_MSK; + switch (region) { + case DMA_RXOR123: + return le32_to_cpu( + dma_hw_desc->sg1l) + + (desc->unmap_len << 1); + case DMA_RXOR124: + return le32_to_cpu( + dma_hw_desc->sg1l) + + (desc->unmap_len * 3); + case DMA_RXOR125: + return le32_to_cpu( + dma_hw_desc->sg1l) + + (desc->unmap_len << 2); + default: + printk(KERN_ERR + "%s: try to" + " get src3 for region %02x" + "PPC440SPE_DESC_RXOR12?\n", + __func__, region); + BUG(); + } + } else { + printk(KERN_ERR + "%s: try to get %d" + " source for non-cued descr\n", + __func__, src_idx); + BUG(); + } + } + return le32_to_cpu(dma_hw_desc->sg1l); + default: + printk(KERN_ERR "%s: unknown OPC 0x%02x\n", + __func__, dma_hw_desc->opc); + BUG(); + } + return le32_to_cpu(dma_hw_desc->sg1l); + case PPC440SPE_XOR_ID: + /* May have up to 16 sources */ + xor_hw_desc = desc->hw_desc; + return xor_hw_desc->ops[src_idx].l; + } + return 0; +} + +/** + * ppc440spe_desc_get_dest_addr - extract the destination address from the + * descriptor + */ +static u32 ppc440spe_desc_get_dest_addr(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, int idx) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + + if (likely(!idx)) + return le32_to_cpu(dma_hw_desc->sg2l); + return le32_to_cpu(dma_hw_desc->sg3l); + case PPC440SPE_XOR_ID: + xor_hw_desc = desc->hw_desc; + return xor_hw_desc->cbtal; + } + return 0; +} + +/** + * ppc440spe_desc_get_src_num - extract the number of source addresses from + * the descriptor + */ +static u32 ppc440spe_desc_get_src_num(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan) +{ + struct dma_cdb *dma_hw_desc; + struct xor_cb *xor_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_hw_desc = desc->hw_desc; + + switch (dma_hw_desc->opc) { + case DMA_CDB_OPC_NO_OP: + case DMA_CDB_OPC_DFILL128: + return 0; + case DMA_CDB_OPC_DCHECK128: + return 1; + case DMA_CDB_OPC_MV_SG1_SG2: + case DMA_CDB_OPC_MULTICAST: + /* + * Only for RXOR operations we have more than + * one source + */ + if (le32_to_cpu(dma_hw_desc->sg1u) & + DMA_CUED_XOR_WIN_MSK) { + /* RXOR op, there are 2 or 3 sources */ + if (((le32_to_cpu(dma_hw_desc->sg1u) >> + DMA_CUED_REGION_OFF) & + DMA_CUED_REGION_MSK) == DMA_RXOR12) { + /* RXOR 1-2 */ + return 2; + } else { + /* RXOR 1-2-3/1-2-4/1-2-5 */ + return 3; + } + } + return 1; + default: + printk(KERN_ERR "%s: unknown OPC 0x%02x\n", + __func__, dma_hw_desc->opc); + BUG(); + } + case PPC440SPE_XOR_ID: + /* up to 16 sources */ + xor_hw_desc = desc->hw_desc; + return xor_hw_desc->cbc & XOR_CDCR_OAC_MSK; + default: + BUG(); + } + return 0; +} + +/** + * ppc440spe_desc_get_dst_num - get the number of destination addresses in + * this descriptor + */ +static u32 ppc440spe_desc_get_dst_num(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan) +{ + struct dma_cdb *dma_hw_desc; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* May be 1 or 2 destinations */ + dma_hw_desc = desc->hw_desc; + switch (dma_hw_desc->opc) { + case DMA_CDB_OPC_NO_OP: + case DMA_CDB_OPC_DCHECK128: + return 0; + case DMA_CDB_OPC_MV_SG1_SG2: + case DMA_CDB_OPC_DFILL128: + return 1; + case DMA_CDB_OPC_MULTICAST: + if (desc->dst_cnt == 2) + return 2; + else + return 1; + default: + printk(KERN_ERR "%s: unknown OPC 0x%02x\n", + __func__, dma_hw_desc->opc); + BUG(); + } + case PPC440SPE_XOR_ID: + /* Always only 1 destination */ + return 1; + default: + BUG(); + } + return 0; +} + +/** + * ppc440spe_desc_get_link - get the address of the descriptor that + * follows this one + */ +static inline u32 ppc440spe_desc_get_link(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan) +{ + if (!desc->hw_next) + return 0; + + return desc->hw_next->phys; +} + +/** + * ppc440spe_desc_is_aligned - check alignment + */ +static inline int ppc440spe_desc_is_aligned( + struct ppc440spe_adma_desc_slot *desc, int num_slots) +{ + return (desc->idx & (num_slots - 1)) ? 0 : 1; +} + +/** + * ppc440spe_chan_xor_slot_count - get the number of slots necessary for + * XOR operation + */ +static int ppc440spe_chan_xor_slot_count(size_t len, int src_cnt, + int *slots_per_op) +{ + int slot_cnt; + + /* each XOR descriptor provides up to 16 source operands */ + slot_cnt = *slots_per_op = (src_cnt + XOR_MAX_OPS - 1)/XOR_MAX_OPS; + + if (likely(len <= PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT)) + return slot_cnt; + + printk(KERN_ERR "%s: len %d > max %d !!\n", + __func__, len, PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT); + BUG(); + return slot_cnt; +} + +/** + * ppc440spe_dma2_pq_slot_count - get the number of slots necessary for + * DMA2 PQ operation + */ +static int ppc440spe_dma2_pq_slot_count(dma_addr_t *srcs, + int src_cnt, size_t len) +{ + signed long long order = 0; + int state = 0; + int addr_count = 0; + int i; + for (i = 1; i < src_cnt; i++) { + dma_addr_t cur_addr = srcs[i]; + dma_addr_t old_addr = srcs[i-1]; + switch (state) { + case 0: + if (cur_addr == old_addr + len) { + /* direct RXOR */ + order = 1; + state = 1; + if (i == src_cnt-1) + addr_count++; + } else if (old_addr == cur_addr + len) { + /* reverse RXOR */ + order = -1; + state = 1; + if (i == src_cnt-1) + addr_count++; + } else { + state = 3; + } + break; + case 1: + if (i == src_cnt-2 || (order == -1 + && cur_addr != old_addr - len)) { + order = 0; + state = 0; + addr_count++; + } else if (cur_addr == old_addr + len*order) { + state = 2; + if (i == src_cnt-1) + addr_count++; + } else if (cur_addr == old_addr + 2*len) { + state = 2; + if (i == src_cnt-1) + addr_count++; + } else if (cur_addr == old_addr + 3*len) { + state = 2; + if (i == src_cnt-1) + addr_count++; + } else { + order = 0; + state = 0; + addr_count++; + } + break; + case 2: + order = 0; + state = 0; + addr_count++; + break; + } + if (state == 3) + break; + } + if (src_cnt <= 1 || (state != 1 && state != 2)) { + pr_err("%s: src_cnt=%d, state=%d, addr_count=%d, order=%lld\n", + __func__, src_cnt, state, addr_count, order); + for (i = 0; i < src_cnt; i++) + pr_err("\t[%d] 0x%llx \n", i, srcs[i]); + BUG(); + } + + return (addr_count + XOR_MAX_OPS - 1) / XOR_MAX_OPS; +} + + +/****************************************************************************** + * ADMA channel low-level routines + ******************************************************************************/ + +static u32 +ppc440spe_chan_get_current_descriptor(struct ppc440spe_adma_chan *chan); +static void ppc440spe_chan_append(struct ppc440spe_adma_chan *chan); + +/** + * ppc440spe_adma_device_clear_eot_status - interrupt ack to XOR or DMA engine + */ +static void ppc440spe_adma_device_clear_eot_status( + struct ppc440spe_adma_chan *chan) +{ + struct dma_regs *dma_reg; + struct xor_regs *xor_reg; + u8 *p = chan->device->dma_desc_pool_virt; + struct dma_cdb *cdb; + u32 rv, i; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* read FIFO to ack */ + dma_reg = chan->device->dma_reg; + while ((rv = ioread32(&dma_reg->csfpl))) { + i = rv & DMA_CDB_ADDR_MSK; + cdb = (struct dma_cdb *)&p[i - + (u32)chan->device->dma_desc_pool]; + + /* Clear opcode to ack. This is necessary for + * ZeroSum operations only + */ + cdb->opc = 0; + + if (test_bit(PPC440SPE_RXOR_RUN, + &ppc440spe_rxor_state)) { + /* probably this is a completed RXOR op, + * get pointer to CDB using the fact that + * physical and virtual addresses of CDB + * in pools have the same offsets + */ + if (le32_to_cpu(cdb->sg1u) & + DMA_CUED_XOR_BASE) { + /* this is a RXOR */ + clear_bit(PPC440SPE_RXOR_RUN, + &ppc440spe_rxor_state); + } + } + + if (rv & DMA_CDB_STATUS_MSK) { + /* ZeroSum check failed + */ + struct ppc440spe_adma_desc_slot *iter; + dma_addr_t phys = rv & ~DMA_CDB_MSK; + + /* + * Update the status of corresponding + * descriptor. + */ + list_for_each_entry(iter, &chan->chain, + chain_node) { + if (iter->phys == phys) + break; + } + /* + * if cannot find the corresponding + * slot it's a bug + */ + BUG_ON(&iter->chain_node == &chan->chain); + + if (iter->xor_check_result) { + if (test_bit(PPC440SPE_DESC_PCHECK, + &iter->flags)) { + *iter->xor_check_result |= + SUM_CHECK_P_RESULT; + } else + if (test_bit(PPC440SPE_DESC_QCHECK, + &iter->flags)) { + *iter->xor_check_result |= + SUM_CHECK_Q_RESULT; + } else + BUG(); + } + } + } + + rv = ioread32(&dma_reg->dsts); + if (rv) { + pr_err("DMA%d err status: 0x%x\n", + chan->device->id, rv); + /* write back to clear */ + iowrite32(rv, &dma_reg->dsts); + } + break; + case PPC440SPE_XOR_ID: + /* reset status bits to ack */ + xor_reg = chan->device->xor_reg; + rv = ioread32be(&xor_reg->sr); + iowrite32be(rv, &xor_reg->sr); + + if (rv & (XOR_IE_ICBIE_BIT|XOR_IE_ICIE_BIT|XOR_IE_RPTIE_BIT)) { + if (rv & XOR_IE_RPTIE_BIT) { + /* Read PLB Timeout Error. + * Try to resubmit the CB + */ + u32 val = ioread32be(&xor_reg->ccbalr); + + iowrite32be(val, &xor_reg->cblalr); + + val = ioread32be(&xor_reg->crsr); + iowrite32be(val | XOR_CRSR_XAE_BIT, + &xor_reg->crsr); + } else + pr_err("XOR ERR 0x%x status\n", rv); + break; + } + + /* if the XORcore is idle, but there are unprocessed CBs + * then refetch the s/w chain here + */ + if (!(ioread32be(&xor_reg->sr) & XOR_SR_XCP_BIT) && + do_xor_refetch) + ppc440spe_chan_append(chan); + break; + } +} + +/** + * ppc440spe_chan_is_busy - get the channel status + */ +static int ppc440spe_chan_is_busy(struct ppc440spe_adma_chan *chan) +{ + struct dma_regs *dma_reg; + struct xor_regs *xor_reg; + int busy = 0; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_reg = chan->device->dma_reg; + /* if command FIFO's head and tail pointers are equal and + * status tail is the same as command, then channel is free + */ + if (ioread16(&dma_reg->cpfhp) != ioread16(&dma_reg->cpftp) || + ioread16(&dma_reg->cpftp) != ioread16(&dma_reg->csftp)) + busy = 1; + break; + case PPC440SPE_XOR_ID: + /* use the special status bit for the XORcore + */ + xor_reg = chan->device->xor_reg; + busy = (ioread32be(&xor_reg->sr) & XOR_SR_XCP_BIT) ? 1 : 0; + break; + } + + return busy; +} + +/** + * ppc440spe_chan_set_first_xor_descriptor - init XORcore chain + */ +static void ppc440spe_chan_set_first_xor_descriptor( + struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *next_desc) +{ + struct xor_regs *xor_reg = chan->device->xor_reg; + + if (ioread32be(&xor_reg->sr) & XOR_SR_XCP_BIT) + printk(KERN_INFO "%s: Warn: XORcore is running " + "when try to set the first CDB!\n", + __func__); + + xor_last_submit = xor_last_linked = next_desc; + + iowrite32be(XOR_CRSR_64BA_BIT, &xor_reg->crsr); + + iowrite32be(next_desc->phys, &xor_reg->cblalr); + iowrite32be(0, &xor_reg->cblahr); + iowrite32be(ioread32be(&xor_reg->cbcr) | XOR_CBCR_LNK_BIT, + &xor_reg->cbcr); + + chan->hw_chain_inited = 1; +} + +/** + * ppc440spe_dma_put_desc - put DMA0,1 descriptor to FIFO. + * called with irqs disabled + */ +static void ppc440spe_dma_put_desc(struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *desc) +{ + u32 pcdb; + struct dma_regs *dma_reg = chan->device->dma_reg; + + pcdb = desc->phys; + if (!test_bit(PPC440SPE_DESC_INT, &desc->flags)) + pcdb |= DMA_CDB_NO_INT; + + chan_last_sub[chan->device->id] = desc; + + ADMA_LL_DBG(print_cb(chan, desc->hw_desc)); + + iowrite32(pcdb, &dma_reg->cpfpl); +} + +/** + * ppc440spe_chan_append - update the h/w chain in the channel + */ +static void ppc440spe_chan_append(struct ppc440spe_adma_chan *chan) +{ + struct xor_regs *xor_reg; + struct ppc440spe_adma_desc_slot *iter; + struct xor_cb *xcb; + u32 cur_desc; + unsigned long flags; + + local_irq_save(flags); + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + cur_desc = ppc440spe_chan_get_current_descriptor(chan); + + if (likely(cur_desc)) { + iter = chan_last_sub[chan->device->id]; + BUG_ON(!iter); + } else { + /* first peer */ + iter = chan_first_cdb[chan->device->id]; + BUG_ON(!iter); + ppc440spe_dma_put_desc(chan, iter); + chan->hw_chain_inited = 1; + } + + /* is there something new to append */ + if (!iter->hw_next) + break; + + /* flush descriptors from the s/w queue to fifo */ + list_for_each_entry_continue(iter, &chan->chain, chain_node) { + ppc440spe_dma_put_desc(chan, iter); + if (!iter->hw_next) + break; + } + break; + case PPC440SPE_XOR_ID: + /* update h/w links and refetch */ + if (!xor_last_submit->hw_next) + break; + + xor_reg = chan->device->xor_reg; + /* the last linked CDB has to generate an interrupt + * that we'd be able to append the next lists to h/w + * regardless of the XOR engine state at the moment of + * appending of these next lists + */ + xcb = xor_last_linked->hw_desc; + xcb->cbc |= XOR_CBCR_CBCE_BIT; + + if (!(ioread32be(&xor_reg->sr) & XOR_SR_XCP_BIT)) { + /* XORcore is idle. Refetch now */ + do_xor_refetch = 0; + ppc440spe_xor_set_link(xor_last_submit, + xor_last_submit->hw_next); + + ADMA_LL_DBG(print_cb_list(chan, + xor_last_submit->hw_next)); + + xor_last_submit = xor_last_linked; + iowrite32be(ioread32be(&xor_reg->crsr) | + XOR_CRSR_RCBE_BIT | XOR_CRSR_64BA_BIT, + &xor_reg->crsr); + } else { + /* XORcore is running. Refetch later in the handler */ + do_xor_refetch = 1; + } + + break; + } + + local_irq_restore(flags); +} + +/** + * ppc440spe_chan_get_current_descriptor - get the currently executed descriptor + */ +static u32 +ppc440spe_chan_get_current_descriptor(struct ppc440spe_adma_chan *chan) +{ + struct dma_regs *dma_reg; + struct xor_regs *xor_reg; + + if (unlikely(!chan->hw_chain_inited)) + /* h/w descriptor chain is not initialized yet */ + return 0; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_reg = chan->device->dma_reg; + return ioread32(&dma_reg->acpl) & (~DMA_CDB_MSK); + case PPC440SPE_XOR_ID: + xor_reg = chan->device->xor_reg; + return ioread32be(&xor_reg->ccbalr); + } + return 0; +} + +/** + * ppc440spe_chan_run - enable the channel + */ +static void ppc440spe_chan_run(struct ppc440spe_adma_chan *chan) +{ + struct xor_regs *xor_reg; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* DMAs are always enabled, do nothing */ + break; + case PPC440SPE_XOR_ID: + /* drain write buffer */ + xor_reg = chan->device->xor_reg; + + /* fetch descriptor pointed to in */ + iowrite32be(XOR_CRSR_64BA_BIT | XOR_CRSR_XAE_BIT, + &xor_reg->crsr); + break; + } +} + +/****************************************************************************** + * ADMA device level + ******************************************************************************/ + +static void ppc440spe_chan_start_null_xor(struct ppc440spe_adma_chan *chan); +static int ppc440spe_adma_alloc_chan_resources(struct dma_chan *chan); + +static dma_cookie_t +ppc440spe_adma_tx_submit(struct dma_async_tx_descriptor *tx); + +static void ppc440spe_adma_set_dest(struct ppc440spe_adma_desc_slot *tx, + dma_addr_t addr, int index); +static void +ppc440spe_adma_memcpy_xor_set_src(struct ppc440spe_adma_desc_slot *tx, + dma_addr_t addr, int index); + +static void +ppc440spe_adma_pq_set_dest(struct ppc440spe_adma_desc_slot *tx, + dma_addr_t *paddr, unsigned long flags); +static void +ppc440spe_adma_pq_set_src(struct ppc440spe_adma_desc_slot *tx, + dma_addr_t addr, int index); +static void +ppc440spe_adma_pq_set_src_mult(struct ppc440spe_adma_desc_slot *tx, + unsigned char mult, int index, int dst_pos); +static void +ppc440spe_adma_pqzero_sum_set_dest(struct ppc440spe_adma_desc_slot *tx, + dma_addr_t paddr, dma_addr_t qaddr); + +static struct page *ppc440spe_rxor_srcs[32]; + +/** + * ppc440spe_can_rxor - check if the operands may be processed with RXOR + */ +static int ppc440spe_can_rxor(struct page **srcs, int src_cnt, size_t len) +{ + int i, order = 0, state = 0; + int idx = 0; + + if (unlikely(!(src_cnt > 1))) + return 0; + + BUG_ON(src_cnt > ARRAY_SIZE(ppc440spe_rxor_srcs)); + + /* Skip holes in the source list before checking */ + for (i = 0; i < src_cnt; i++) { + if (!srcs[i]) + continue; + ppc440spe_rxor_srcs[idx++] = srcs[i]; + } + src_cnt = idx; + + for (i = 1; i < src_cnt; i++) { + char *cur_addr = page_address(ppc440spe_rxor_srcs[i]); + char *old_addr = page_address(ppc440spe_rxor_srcs[i - 1]); + + switch (state) { + case 0: + if (cur_addr == old_addr + len) { + /* direct RXOR */ + order = 1; + state = 1; + } else if (old_addr == cur_addr + len) { + /* reverse RXOR */ + order = -1; + state = 1; + } else + goto out; + break; + case 1: + if ((i == src_cnt - 2) || + (order == -1 && cur_addr != old_addr - len)) { + order = 0; + state = 0; + } else if ((cur_addr == old_addr + len * order) || + (cur_addr == old_addr + 2 * len) || + (cur_addr == old_addr + 3 * len)) { + state = 2; + } else { + order = 0; + state = 0; + } + break; + case 2: + order = 0; + state = 0; + break; + } + } + +out: + if (state == 1 || state == 2) + return 1; + + return 0; +} + +/** + * ppc440spe_adma_device_estimate - estimate the efficiency of processing + * the operation given on this channel. It's assumed that 'chan' is + * capable to process 'cap' type of operation. + * @chan: channel to use + * @cap: type of transaction + * @dst_lst: array of destination pointers + * @dst_cnt: number of destination operands + * @src_lst: array of source pointers + * @src_cnt: number of source operands + * @src_sz: size of each source operand + */ +static int ppc440spe_adma_estimate(struct dma_chan *chan, + enum dma_transaction_type cap, struct page **dst_lst, int dst_cnt, + struct page **src_lst, int src_cnt, size_t src_sz) +{ + int ef = 1; + + if (cap == DMA_PQ || cap == DMA_PQ_VAL) { + /* If RAID-6 capabilities were not activated don't try + * to use them + */ + if (unlikely(!ppc440spe_r6_enabled)) + return -1; + } + /* In the current implementation of ppc440spe ADMA driver it + * makes sense to pick out only pq case, because it may be + * processed: + * (1) either using Biskup method on DMA2; + * (2) or on DMA0/1. + * Thus we give a favour to (1) if the sources are suitable; + * else let it be processed on one of the DMA0/1 engines. + * In the sum_product case where destination is also the + * source process it on DMA0/1 only. + */ + if (cap == DMA_PQ && chan->chan_id == PPC440SPE_XOR_ID) { + + if (dst_cnt == 1 && src_cnt == 2 && dst_lst[0] == src_lst[1]) + ef = 0; /* sum_product case, process on DMA0/1 */ + else if (ppc440spe_can_rxor(src_lst, src_cnt, src_sz)) + ef = 3; /* override (DMA0/1 + idle) */ + else + ef = 0; /* can't process on DMA2 if !rxor */ + } + + /* channel idleness increases the priority */ + if (likely(ef) && + !ppc440spe_chan_is_busy(to_ppc440spe_adma_chan(chan))) + ef++; + + return ef; +} + +struct dma_chan * +ppc440spe_async_tx_find_best_channel(enum dma_transaction_type cap, + struct page **dst_lst, int dst_cnt, struct page **src_lst, + int src_cnt, size_t src_sz) +{ + struct dma_chan *best_chan = NULL; + struct ppc_dma_chan_ref *ref; + int best_rank = -1; + + if (unlikely(!src_sz)) + return NULL; + if (src_sz > PAGE_SIZE) { + /* + * should a user of the api ever pass > PAGE_SIZE requests + * we sort out cases where temporary page-sized buffers + * are used. + */ + switch (cap) { + case DMA_PQ: + if (src_cnt == 1 && dst_lst[1] == src_lst[0]) + return NULL; + if (src_cnt == 2 && dst_lst[1] == src_lst[1]) + return NULL; + break; + case DMA_PQ_VAL: + case DMA_XOR_VAL: + return NULL; + default: + break; + } + } + + list_for_each_entry(ref, &ppc440spe_adma_chan_list, node) { + if (dma_has_cap(cap, ref->chan->device->cap_mask)) { + int rank; + + rank = ppc440spe_adma_estimate(ref->chan, cap, dst_lst, + dst_cnt, src_lst, src_cnt, src_sz); + if (rank > best_rank) { + best_rank = rank; + best_chan = ref->chan; + } + } + } + + return best_chan; +} +EXPORT_SYMBOL_GPL(ppc440spe_async_tx_find_best_channel); + +/** + * ppc440spe_get_group_entry - get group entry with index idx + * @tdesc: is the last allocated slot in the group. + */ +static struct ppc440spe_adma_desc_slot * +ppc440spe_get_group_entry(struct ppc440spe_adma_desc_slot *tdesc, u32 entry_idx) +{ + struct ppc440spe_adma_desc_slot *iter = tdesc->group_head; + int i = 0; + + if (entry_idx < 0 || entry_idx >= (tdesc->src_cnt + tdesc->dst_cnt)) { + printk("%s: entry_idx %d, src_cnt %d, dst_cnt %d\n", + __func__, entry_idx, tdesc->src_cnt, tdesc->dst_cnt); + BUG(); + } + + list_for_each_entry(iter, &tdesc->group_list, chain_node) { + if (i++ == entry_idx) + break; + } + return iter; +} + +/** + * ppc440spe_adma_free_slots - flags descriptor slots for reuse + * @slot: Slot to free + * Caller must hold &ppc440spe_chan->lock while calling this function + */ +static void ppc440spe_adma_free_slots(struct ppc440spe_adma_desc_slot *slot, + struct ppc440spe_adma_chan *chan) +{ + int stride = slot->slots_per_op; + + while (stride--) { + slot->slots_per_op = 0; + slot = list_entry(slot->slot_node.next, + struct ppc440spe_adma_desc_slot, + slot_node); + } +} + +static void ppc440spe_adma_unmap(struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *desc) +{ + u32 src_cnt, dst_cnt; + dma_addr_t addr; + + /* + * get the number of sources & destination + * included in this descriptor and unmap + * them all + */ + src_cnt = ppc440spe_desc_get_src_num(desc, chan); + dst_cnt = ppc440spe_desc_get_dst_num(desc, chan); + + /* unmap destinations */ + if (!(desc->async_tx.flags & DMA_COMPL_SKIP_DEST_UNMAP)) { + while (dst_cnt--) { + addr = ppc440spe_desc_get_dest_addr( + desc, chan, dst_cnt); + dma_unmap_page(chan->device->dev, + addr, desc->unmap_len, + DMA_FROM_DEVICE); + } + } + + /* unmap sources */ + if (!(desc->async_tx.flags & DMA_COMPL_SKIP_SRC_UNMAP)) { + while (src_cnt--) { + addr = ppc440spe_desc_get_src_addr( + desc, chan, src_cnt); + dma_unmap_page(chan->device->dev, + addr, desc->unmap_len, + DMA_TO_DEVICE); + } + } +} + +/** + * ppc440spe_adma_run_tx_complete_actions - call functions to be called + * upon completion + */ +static dma_cookie_t ppc440spe_adma_run_tx_complete_actions( + struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan, + dma_cookie_t cookie) +{ + int i; + + BUG_ON(desc->async_tx.cookie < 0); + if (desc->async_tx.cookie > 0) { + cookie = desc->async_tx.cookie; + desc->async_tx.cookie = 0; + + /* call the callback (must not sleep or submit new + * operations to this channel) + */ + if (desc->async_tx.callback) + desc->async_tx.callback( + desc->async_tx.callback_param); + + /* unmap dma addresses + * (unmap_single vs unmap_page?) + * + * actually, ppc's dma_unmap_page() functions are empty, so + * the following code is just for the sake of completeness + */ + if (chan && chan->needs_unmap && desc->group_head && + desc->unmap_len) { + struct ppc440spe_adma_desc_slot *unmap = + desc->group_head; + /* assume 1 slot per op always */ + u32 slot_count = unmap->slot_cnt; + + /* Run through the group list and unmap addresses */ + for (i = 0; i < slot_count; i++) { + BUG_ON(!unmap); + ppc440spe_adma_unmap(chan, unmap); + unmap = unmap->hw_next; + } + } + } + + /* run dependent operations */ + dma_run_dependencies(&desc->async_tx); + + return cookie; +} + +/** + * ppc440spe_adma_clean_slot - clean up CDB slot (if ack is set) + */ +static int ppc440spe_adma_clean_slot(struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_adma_chan *chan) +{ + /* the client is allowed to attach dependent operations + * until 'ack' is set + */ + if (!async_tx_test_ack(&desc->async_tx)) + return 0; + + /* leave the last descriptor in the chain + * so we can append to it + */ + if (list_is_last(&desc->chain_node, &chan->chain) || + desc->phys == ppc440spe_chan_get_current_descriptor(chan)) + return 1; + + if (chan->device->id != PPC440SPE_XOR_ID) { + /* our DMA interrupt handler clears opc field of + * each processed descriptor. For all types of + * operations except for ZeroSum we do not actually + * need ack from the interrupt handler. ZeroSum is a + * special case since the result of this operation + * is available from the handler only, so if we see + * such type of descriptor (which is unprocessed yet) + * then leave it in chain. + */ + struct dma_cdb *cdb = desc->hw_desc; + if (cdb->opc == DMA_CDB_OPC_DCHECK128) + return 1; + } + + dev_dbg(chan->device->common.dev, "\tfree slot %llx: %d stride: %d\n", + desc->phys, desc->idx, desc->slots_per_op); + + list_del(&desc->chain_node); + ppc440spe_adma_free_slots(desc, chan); + return 0; +} + +/** + * __ppc440spe_adma_slot_cleanup - this is the common clean-up routine + * which runs through the channel CDBs list until reach the descriptor + * currently processed. When routine determines that all CDBs of group + * are completed then corresponding callbacks (if any) are called and slots + * are freed. + */ +static void __ppc440spe_adma_slot_cleanup(struct ppc440spe_adma_chan *chan) +{ + struct ppc440spe_adma_desc_slot *iter, *_iter, *group_start = NULL; + dma_cookie_t cookie = 0; + u32 current_desc = ppc440spe_chan_get_current_descriptor(chan); + int busy = ppc440spe_chan_is_busy(chan); + int seen_current = 0, slot_cnt = 0, slots_per_op = 0; + + dev_dbg(chan->device->common.dev, "ppc440spe adma%d: %s\n", + chan->device->id, __func__); + + if (!current_desc) { + /* There were no transactions yet, so + * nothing to clean + */ + return; + } + + /* free completed slots from the chain starting with + * the oldest descriptor + */ + list_for_each_entry_safe(iter, _iter, &chan->chain, + chain_node) { + dev_dbg(chan->device->common.dev, "\tcookie: %d slot: %d " + "busy: %d this_desc: %#llx next_desc: %#x " + "cur: %#x ack: %d\n", + iter->async_tx.cookie, iter->idx, busy, iter->phys, + ppc440spe_desc_get_link(iter, chan), current_desc, + async_tx_test_ack(&iter->async_tx)); + prefetch(_iter); + prefetch(&_iter->async_tx); + + /* do not advance past the current descriptor loaded into the + * hardware channel,subsequent descriptors are either in process + * or have not been submitted + */ + if (seen_current) + break; + + /* stop the search if we reach the current descriptor and the + * channel is busy, or if it appears that the current descriptor + * needs to be re-read (i.e. has been appended to) + */ + if (iter->phys == current_desc) { + BUG_ON(seen_current++); + if (busy || ppc440spe_desc_get_link(iter, chan)) { + /* not all descriptors of the group have + * been completed; exit. + */ + break; + } + } + + /* detect the start of a group transaction */ + if (!slot_cnt && !slots_per_op) { + slot_cnt = iter->slot_cnt; + slots_per_op = iter->slots_per_op; + if (slot_cnt <= slots_per_op) { + slot_cnt = 0; + slots_per_op = 0; + } + } + + if (slot_cnt) { + if (!group_start) + group_start = iter; + slot_cnt -= slots_per_op; + } + + /* all the members of a group are complete */ + if (slots_per_op != 0 && slot_cnt == 0) { + struct ppc440spe_adma_desc_slot *grp_iter, *_grp_iter; + int end_of_chain = 0; + + /* clean up the group */ + slot_cnt = group_start->slot_cnt; + grp_iter = group_start; + list_for_each_entry_safe_from(grp_iter, _grp_iter, + &chan->chain, chain_node) { + + cookie = ppc440spe_adma_run_tx_complete_actions( + grp_iter, chan, cookie); + + slot_cnt -= slots_per_op; + end_of_chain = ppc440spe_adma_clean_slot( + grp_iter, chan); + if (end_of_chain && slot_cnt) { + /* Should wait for ZeroSum completion */ + if (cookie > 0) + chan->completed_cookie = cookie; + return; + } + + if (slot_cnt == 0 || end_of_chain) + break; + } + + /* the group should be complete at this point */ + BUG_ON(slot_cnt); + + slots_per_op = 0; + group_start = NULL; + if (end_of_chain) + break; + else + continue; + } else if (slots_per_op) /* wait for group completion */ + continue; + + cookie = ppc440spe_adma_run_tx_complete_actions(iter, chan, + cookie); + + if (ppc440spe_adma_clean_slot(iter, chan)) + break; + } + + BUG_ON(!seen_current); + + if (cookie > 0) { + chan->completed_cookie = cookie; + pr_debug("\tcompleted cookie %d\n", cookie); + } + +} + +/** + * ppc440spe_adma_tasklet - clean up watch-dog initiator + */ +static void ppc440spe_adma_tasklet(unsigned long data) +{ + struct ppc440spe_adma_chan *chan = (struct ppc440spe_adma_chan *) data; + + spin_lock_nested(&chan->lock, SINGLE_DEPTH_NESTING); + __ppc440spe_adma_slot_cleanup(chan); + spin_unlock(&chan->lock); +} + +/** + * ppc440spe_adma_slot_cleanup - clean up scheduled initiator + */ +static void ppc440spe_adma_slot_cleanup(struct ppc440spe_adma_chan *chan) +{ + spin_lock_bh(&chan->lock); + __ppc440spe_adma_slot_cleanup(chan); + spin_unlock_bh(&chan->lock); +} + +/** + * ppc440spe_adma_alloc_slots - allocate free slots (if any) + */ +static struct ppc440spe_adma_desc_slot *ppc440spe_adma_alloc_slots( + struct ppc440spe_adma_chan *chan, int num_slots, + int slots_per_op) +{ + struct ppc440spe_adma_desc_slot *iter = NULL, *_iter; + struct ppc440spe_adma_desc_slot *alloc_start = NULL; + struct list_head chain = LIST_HEAD_INIT(chain); + int slots_found, retry = 0; + + + BUG_ON(!num_slots || !slots_per_op); + /* start search from the last allocated descrtiptor + * if a contiguous allocation can not be found start searching + * from the beginning of the list + */ +retry: + slots_found = 0; + if (retry == 0) + iter = chan->last_used; + else + iter = list_entry(&chan->all_slots, + struct ppc440spe_adma_desc_slot, + slot_node); + list_for_each_entry_safe_continue(iter, _iter, &chan->all_slots, + slot_node) { + prefetch(_iter); + prefetch(&_iter->async_tx); + if (iter->slots_per_op) { + slots_found = 0; + continue; + } + + /* start the allocation if the slot is correctly aligned */ + if (!slots_found++) + alloc_start = iter; + + if (slots_found == num_slots) { + struct ppc440spe_adma_desc_slot *alloc_tail = NULL; + struct ppc440spe_adma_desc_slot *last_used = NULL; + + iter = alloc_start; + while (num_slots) { + int i; + /* pre-ack all but the last descriptor */ + if (num_slots != slots_per_op) + async_tx_ack(&iter->async_tx); + + list_add_tail(&iter->chain_node, &chain); + alloc_tail = iter; + iter->async_tx.cookie = 0; + iter->hw_next = NULL; + iter->flags = 0; + iter->slot_cnt = num_slots; + iter->xor_check_result = NULL; + for (i = 0; i < slots_per_op; i++) { + iter->slots_per_op = slots_per_op - i; + last_used = iter; + iter = list_entry(iter->slot_node.next, + struct ppc440spe_adma_desc_slot, + slot_node); + } + num_slots -= slots_per_op; + } + alloc_tail->group_head = alloc_start; + alloc_tail->async_tx.cookie = -EBUSY; + list_splice(&chain, &alloc_tail->group_list); + chan->last_used = last_used; + return alloc_tail; + } + } + if (!retry++) + goto retry; + + /* try to free some slots if the allocation fails */ + tasklet_schedule(&chan->irq_tasklet); + return NULL; +} + +/** + * ppc440spe_adma_alloc_chan_resources - allocate pools for CDB slots + */ +static int ppc440spe_adma_alloc_chan_resources(struct dma_chan *chan) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *slot = NULL; + char *hw_desc; + int i, db_sz; + int init; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + init = ppc440spe_chan->slots_allocated ? 0 : 1; + chan->chan_id = ppc440spe_chan->device->id; + + /* Allocate descriptor slots */ + i = ppc440spe_chan->slots_allocated; + if (ppc440spe_chan->device->id != PPC440SPE_XOR_ID) + db_sz = sizeof(struct dma_cdb); + else + db_sz = sizeof(struct xor_cb); + + for (; i < (ppc440spe_chan->device->pool_size / db_sz); i++) { + slot = kzalloc(sizeof(struct ppc440spe_adma_desc_slot), + GFP_KERNEL); + if (!slot) { + printk(KERN_INFO "SPE ADMA Channel only initialized" + " %d descriptor slots", i--); + break; + } + + hw_desc = (char *) ppc440spe_chan->device->dma_desc_pool_virt; + slot->hw_desc = (void *) &hw_desc[i * db_sz]; + dma_async_tx_descriptor_init(&slot->async_tx, chan); + slot->async_tx.tx_submit = ppc440spe_adma_tx_submit; + INIT_LIST_HEAD(&slot->chain_node); + INIT_LIST_HEAD(&slot->slot_node); + INIT_LIST_HEAD(&slot->group_list); + slot->phys = ppc440spe_chan->device->dma_desc_pool + i * db_sz; + slot->idx = i; + + spin_lock_bh(&ppc440spe_chan->lock); + ppc440spe_chan->slots_allocated++; + list_add_tail(&slot->slot_node, &ppc440spe_chan->all_slots); + spin_unlock_bh(&ppc440spe_chan->lock); + } + + if (i && !ppc440spe_chan->last_used) { + ppc440spe_chan->last_used = + list_entry(ppc440spe_chan->all_slots.next, + struct ppc440spe_adma_desc_slot, + slot_node); + } + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: allocated %d descriptor slots\n", + ppc440spe_chan->device->id, i); + + /* initialize the channel and the chain with a null operation */ + if (init) { + switch (ppc440spe_chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + ppc440spe_chan->hw_chain_inited = 0; + /* Use WXOR for self-testing */ + if (!ppc440spe_r6_tchan) + ppc440spe_r6_tchan = ppc440spe_chan; + break; + case PPC440SPE_XOR_ID: + ppc440spe_chan_start_null_xor(ppc440spe_chan); + break; + default: + BUG(); + } + ppc440spe_chan->needs_unmap = 1; + } + + return (i > 0) ? i : -ENOMEM; +} + +/** + * ppc440spe_desc_assign_cookie - assign a cookie + */ +static dma_cookie_t ppc440spe_desc_assign_cookie( + struct ppc440spe_adma_chan *chan, + struct ppc440spe_adma_desc_slot *desc) +{ + dma_cookie_t cookie = chan->common.cookie; + + cookie++; + if (cookie < 0) + cookie = 1; + chan->common.cookie = desc->async_tx.cookie = cookie; + return cookie; +} + +/** + * ppc440spe_rxor_set_region_data - + */ +static void ppc440spe_rxor_set_region(struct ppc440spe_adma_desc_slot *desc, + u8 xor_arg_no, u32 mask) +{ + struct xor_cb *xcb = desc->hw_desc; + + xcb->ops[xor_arg_no].h |= mask; +} + +/** + * ppc440spe_rxor_set_src - + */ +static void ppc440spe_rxor_set_src(struct ppc440spe_adma_desc_slot *desc, + u8 xor_arg_no, dma_addr_t addr) +{ + struct xor_cb *xcb = desc->hw_desc; + + xcb->ops[xor_arg_no].h |= DMA_CUED_XOR_BASE; + xcb->ops[xor_arg_no].l = addr; +} + +/** + * ppc440spe_rxor_set_mult - + */ +static void ppc440spe_rxor_set_mult(struct ppc440spe_adma_desc_slot *desc, + u8 xor_arg_no, u8 idx, u8 mult) +{ + struct xor_cb *xcb = desc->hw_desc; + + xcb->ops[xor_arg_no].h |= mult << (DMA_CUED_MULT1_OFF + idx * 8); +} + +/** + * ppc440spe_adma_check_threshold - append CDBs to h/w chain if threshold + * has been achieved + */ +static void ppc440spe_adma_check_threshold(struct ppc440spe_adma_chan *chan) +{ + dev_dbg(chan->device->common.dev, "ppc440spe adma%d: pending: %d\n", + chan->device->id, chan->pending); + + if (chan->pending >= PPC440SPE_ADMA_THRESHOLD) { + chan->pending = 0; + ppc440spe_chan_append(chan); + } +} + +/** + * ppc440spe_adma_tx_submit - submit new descriptor group to the channel + * (it's not necessary that descriptors will be submitted to the h/w + * chains too right now) + */ +static dma_cookie_t ppc440spe_adma_tx_submit(struct dma_async_tx_descriptor *tx) +{ + struct ppc440spe_adma_desc_slot *sw_desc; + struct ppc440spe_adma_chan *chan = to_ppc440spe_adma_chan(tx->chan); + struct ppc440spe_adma_desc_slot *group_start, *old_chain_tail; + int slot_cnt; + int slots_per_op; + dma_cookie_t cookie; + + sw_desc = tx_to_ppc440spe_adma_slot(tx); + + group_start = sw_desc->group_head; + slot_cnt = group_start->slot_cnt; + slots_per_op = group_start->slots_per_op; + + spin_lock_bh(&chan->lock); + + cookie = ppc440spe_desc_assign_cookie(chan, sw_desc); + + if (unlikely(list_empty(&chan->chain))) { + /* first peer */ + list_splice_init(&sw_desc->group_list, &chan->chain); + chan_first_cdb[chan->device->id] = group_start; + } else { + /* isn't first peer, bind CDBs to chain */ + old_chain_tail = list_entry(chan->chain.prev, + struct ppc440spe_adma_desc_slot, + chain_node); + list_splice_init(&sw_desc->group_list, + &old_chain_tail->chain_node); + /* fix up the hardware chain */ + ppc440spe_desc_set_link(chan, old_chain_tail, group_start); + } + + /* increment the pending count by the number of operations */ + chan->pending += slot_cnt / slots_per_op; + ppc440spe_adma_check_threshold(chan); + spin_unlock_bh(&chan->lock); + + dev_dbg(chan->device->common.dev, + "ppc440spe adma%d: %s cookie: %d slot: %d tx %p\n", + chan->device->id, __func__, + sw_desc->async_tx.cookie, sw_desc->idx, sw_desc); + + return cookie; +} + +/** + * ppc440spe_adma_prep_dma_interrupt - prepare CDB for a pseudo DMA operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_interrupt( + struct dma_chan *chan, unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc, *group_start; + int slot_cnt, slots_per_op; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s\n", ppc440spe_chan->device->id, + __func__); + + spin_lock_bh(&ppc440spe_chan->lock); + slot_cnt = slots_per_op = 1; + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, + slots_per_op); + if (sw_desc) { + group_start = sw_desc->group_head; + ppc440spe_desc_init_interrupt(group_start, ppc440spe_chan); + group_start->unmap_len = 0; + sw_desc->async_tx.flags = flags; + } + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc ? &sw_desc->async_tx : NULL; +} + +/** + * ppc440spe_adma_prep_dma_memcpy - prepare CDB for a MEMCPY operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_memcpy( + struct dma_chan *chan, dma_addr_t dma_dest, + dma_addr_t dma_src, size_t len, unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc, *group_start; + int slot_cnt, slots_per_op; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + if (unlikely(!len)) + return NULL; + + BUG_ON(unlikely(len > PPC440SPE_ADMA_DMA_MAX_BYTE_COUNT)); + + spin_lock_bh(&ppc440spe_chan->lock); + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s len: %u int_en %d\n", + ppc440spe_chan->device->id, __func__, len, + flags & DMA_PREP_INTERRUPT ? 1 : 0); + slot_cnt = slots_per_op = 1; + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, + slots_per_op); + if (sw_desc) { + group_start = sw_desc->group_head; + ppc440spe_desc_init_memcpy(group_start, flags); + ppc440spe_adma_set_dest(group_start, dma_dest, 0); + ppc440spe_adma_memcpy_xor_set_src(group_start, dma_src, 0); + ppc440spe_desc_set_byte_count(group_start, ppc440spe_chan, len); + sw_desc->unmap_len = len; + sw_desc->async_tx.flags = flags; + } + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc ? &sw_desc->async_tx : NULL; +} + +/** + * ppc440spe_adma_prep_dma_memset - prepare CDB for a MEMSET operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_memset( + struct dma_chan *chan, dma_addr_t dma_dest, int value, + size_t len, unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc, *group_start; + int slot_cnt, slots_per_op; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + if (unlikely(!len)) + return NULL; + + BUG_ON(unlikely(len > PPC440SPE_ADMA_DMA_MAX_BYTE_COUNT)); + + spin_lock_bh(&ppc440spe_chan->lock); + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s cal: %u len: %u int_en %d\n", + ppc440spe_chan->device->id, __func__, value, len, + flags & DMA_PREP_INTERRUPT ? 1 : 0); + + slot_cnt = slots_per_op = 1; + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, + slots_per_op); + if (sw_desc) { + group_start = sw_desc->group_head; + ppc440spe_desc_init_memset(group_start, value, flags); + ppc440spe_adma_set_dest(group_start, dma_dest, 0); + ppc440spe_desc_set_byte_count(group_start, ppc440spe_chan, len); + sw_desc->unmap_len = len; + sw_desc->async_tx.flags = flags; + } + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc ? &sw_desc->async_tx : NULL; +} + +/** + * ppc440spe_adma_prep_dma_xor - prepare CDB for a XOR operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_xor( + struct dma_chan *chan, dma_addr_t dma_dest, + dma_addr_t *dma_src, u32 src_cnt, size_t len, + unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc, *group_start; + int slot_cnt, slots_per_op; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + ADMA_LL_DBG(prep_dma_xor_dbg(ppc440spe_chan->device->id, + dma_dest, dma_src, src_cnt)); + if (unlikely(!len)) + return NULL; + BUG_ON(unlikely(len > PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT)); + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s src_cnt: %d len: %u int_en: %d\n", + ppc440spe_chan->device->id, __func__, src_cnt, len, + flags & DMA_PREP_INTERRUPT ? 1 : 0); + + spin_lock_bh(&ppc440spe_chan->lock); + slot_cnt = ppc440spe_chan_xor_slot_count(len, src_cnt, &slots_per_op); + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, + slots_per_op); + if (sw_desc) { + group_start = sw_desc->group_head; + ppc440spe_desc_init_xor(group_start, src_cnt, flags); + ppc440spe_adma_set_dest(group_start, dma_dest, 0); + while (src_cnt--) + ppc440spe_adma_memcpy_xor_set_src(group_start, + dma_src[src_cnt], src_cnt); + ppc440spe_desc_set_byte_count(group_start, ppc440spe_chan, len); + sw_desc->unmap_len = len; + sw_desc->async_tx.flags = flags; + } + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc ? &sw_desc->async_tx : NULL; +} + +static inline void +ppc440spe_desc_set_xor_src_cnt(struct ppc440spe_adma_desc_slot *desc, + int src_cnt); +static void ppc440spe_init_rxor_cursor(struct ppc440spe_rxor *cursor); + +/** + * ppc440spe_adma_init_dma2rxor_slot - + */ +static void ppc440spe_adma_init_dma2rxor_slot( + struct ppc440spe_adma_desc_slot *desc, + dma_addr_t *src, int src_cnt) +{ + int i; + + /* initialize CDB */ + for (i = 0; i < src_cnt; i++) { + ppc440spe_adma_dma2rxor_prep_src(desc, &desc->rxor_cursor, i, + desc->src_cnt, (u32)src[i]); + } +} + +/** + * ppc440spe_dma01_prep_mult - + * for Q operation where destination is also the source + */ +static struct ppc440spe_adma_desc_slot *ppc440spe_dma01_prep_mult( + struct ppc440spe_adma_chan *ppc440spe_chan, + dma_addr_t *dst, int dst_cnt, dma_addr_t *src, int src_cnt, + const unsigned char *scf, size_t len, unsigned long flags) +{ + struct ppc440spe_adma_desc_slot *sw_desc = NULL; + unsigned long op = 0; + int slot_cnt; + + set_bit(PPC440SPE_DESC_WXOR, &op); + slot_cnt = 2; + + spin_lock_bh(&ppc440spe_chan->lock); + + /* use WXOR, each descriptor occupies one slot */ + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, 1); + if (sw_desc) { + struct ppc440spe_adma_chan *chan; + struct ppc440spe_adma_desc_slot *iter; + struct dma_cdb *hw_desc; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + set_bits(op, &sw_desc->flags); + sw_desc->src_cnt = src_cnt; + sw_desc->dst_cnt = dst_cnt; + /* First descriptor, zero data in the destination and copy it + * to q page using MULTICAST transfer. + */ + iter = list_first_entry(&sw_desc->group_list, + struct ppc440spe_adma_desc_slot, + chain_node); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + /* set 'next' pointer */ + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MULTICAST; + + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, dst[0], 0); + ppc440spe_desc_set_dest_addr(iter, chan, 0, dst[1], 1); + ppc440spe_desc_set_src_addr(iter, chan, 0, DMA_CUED_XOR_HB, + src[0]); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, len); + iter->unmap_len = len; + + /* + * Second descriptor, multiply data from the q page + * and store the result in real destination. + */ + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->hw_next = NULL; + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &iter->flags); + else + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + ppc440spe_desc_set_src_addr(iter, chan, 0, + DMA_CUED_XOR_HB, dst[1]); + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, dst[0], 0); + + ppc440spe_desc_set_src_mult(iter, chan, DMA_CUED_MULT1_OFF, + DMA_CDB_SG_DST1, scf[0]); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, len); + iter->unmap_len = len; + sw_desc->async_tx.flags = flags; + } + + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc; +} + +/** + * ppc440spe_dma01_prep_sum_product - + * Dx = A*(P+Pxy) + B*(Q+Qxy) operation where destination is also + * the source. + */ +static struct ppc440spe_adma_desc_slot *ppc440spe_dma01_prep_sum_product( + struct ppc440spe_adma_chan *ppc440spe_chan, + dma_addr_t *dst, dma_addr_t *src, int src_cnt, + const unsigned char *scf, size_t len, unsigned long flags) +{ + struct ppc440spe_adma_desc_slot *sw_desc = NULL; + unsigned long op = 0; + int slot_cnt; + + set_bit(PPC440SPE_DESC_WXOR, &op); + slot_cnt = 3; + + spin_lock_bh(&ppc440spe_chan->lock); + + /* WXOR, each descriptor occupies one slot */ + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, 1); + if (sw_desc) { + struct ppc440spe_adma_chan *chan; + struct ppc440spe_adma_desc_slot *iter; + struct dma_cdb *hw_desc; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + set_bits(op, &sw_desc->flags); + sw_desc->src_cnt = src_cnt; + sw_desc->dst_cnt = 1; + /* 1st descriptor, src[1] data to q page and zero destination */ + iter = list_first_entry(&sw_desc->group_list, + struct ppc440spe_adma_desc_slot, + chain_node); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MULTICAST; + + ppc440spe_desc_set_dest_addr(iter, chan, DMA_CUED_XOR_BASE, + *dst, 0); + ppc440spe_desc_set_dest_addr(iter, chan, 0, + ppc440spe_chan->qdest, 1); + ppc440spe_desc_set_src_addr(iter, chan, 0, DMA_CUED_XOR_HB, + src[1]); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, len); + iter->unmap_len = len; + + /* 2nd descriptor, multiply src[1] data and store the + * result in destination */ + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + /* set 'next' pointer */ + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &iter->flags); + else + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + ppc440spe_desc_set_src_addr(iter, chan, 0, DMA_CUED_XOR_HB, + ppc440spe_chan->qdest); + ppc440spe_desc_set_dest_addr(iter, chan, DMA_CUED_XOR_BASE, + *dst, 0); + ppc440spe_desc_set_src_mult(iter, chan, DMA_CUED_MULT1_OFF, + DMA_CDB_SG_DST1, scf[1]); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, len); + iter->unmap_len = len; + + /* + * 3rd descriptor, multiply src[0] data and xor it + * with destination + */ + iter = list_first_entry(&iter->chain_node, + struct ppc440spe_adma_desc_slot, + chain_node); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->hw_next = NULL; + if (flags & DMA_PREP_INTERRUPT) + set_bit(PPC440SPE_DESC_INT, &iter->flags); + else + clear_bit(PPC440SPE_DESC_INT, &iter->flags); + + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + ppc440spe_desc_set_src_addr(iter, chan, 0, DMA_CUED_XOR_HB, + src[0]); + ppc440spe_desc_set_dest_addr(iter, chan, DMA_CUED_XOR_BASE, + *dst, 0); + ppc440spe_desc_set_src_mult(iter, chan, DMA_CUED_MULT1_OFF, + DMA_CDB_SG_DST1, scf[0]); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, len); + iter->unmap_len = len; + sw_desc->async_tx.flags = flags; + } + + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc; +} + +static struct ppc440spe_adma_desc_slot *ppc440spe_dma01_prep_pq( + struct ppc440spe_adma_chan *ppc440spe_chan, + dma_addr_t *dst, int dst_cnt, dma_addr_t *src, int src_cnt, + const unsigned char *scf, size_t len, unsigned long flags) +{ + int slot_cnt; + struct ppc440spe_adma_desc_slot *sw_desc = NULL, *iter; + unsigned long op = 0; + unsigned char mult = 1; + + pr_debug("%s: dst_cnt %d, src_cnt %d, len %d\n", + __func__, dst_cnt, src_cnt, len); + /* select operations WXOR/RXOR depending on the + * source addresses of operators and the number + * of destinations (RXOR support only Q-parity calculations) + */ + set_bit(PPC440SPE_DESC_WXOR, &op); + if (!test_and_set_bit(PPC440SPE_RXOR_RUN, &ppc440spe_rxor_state)) { + /* no active RXOR; + * do RXOR if: + * - there are more than 1 source, + * - len is aligned on 512-byte boundary, + * - source addresses fit to one of 4 possible regions. + */ + if (src_cnt > 1 && + !(len & MQ0_CF2H_RXOR_BS_MASK) && + (src[0] + len) == src[1]) { + /* may do RXOR R1 R2 */ + set_bit(PPC440SPE_DESC_RXOR, &op); + if (src_cnt != 2) { + /* may try to enhance region of RXOR */ + if ((src[1] + len) == src[2]) { + /* do RXOR R1 R2 R3 */ + set_bit(PPC440SPE_DESC_RXOR123, + &op); + } else if ((src[1] + len * 2) == src[2]) { + /* do RXOR R1 R2 R4 */ + set_bit(PPC440SPE_DESC_RXOR124, &op); + } else if ((src[1] + len * 3) == src[2]) { + /* do RXOR R1 R2 R5 */ + set_bit(PPC440SPE_DESC_RXOR125, + &op); + } else { + /* do RXOR R1 R2 */ + set_bit(PPC440SPE_DESC_RXOR12, + &op); + } + } else { + /* do RXOR R1 R2 */ + set_bit(PPC440SPE_DESC_RXOR12, &op); + } + } + + if (!test_bit(PPC440SPE_DESC_RXOR, &op)) { + /* can not do this operation with RXOR */ + clear_bit(PPC440SPE_RXOR_RUN, + &ppc440spe_rxor_state); + } else { + /* can do; set block size right now */ + ppc440spe_desc_set_rxor_block_size(len); + } + } + + /* Number of necessary slots depends on operation type selected */ + if (!test_bit(PPC440SPE_DESC_RXOR, &op)) { + /* This is a WXOR only chain. Need descriptors for each + * source to GF-XOR them with WXOR, and need descriptors + * for each destination to zero them with WXOR + */ + slot_cnt = src_cnt; + + if (flags & DMA_PREP_ZERO_P) { + slot_cnt++; + set_bit(PPC440SPE_ZERO_P, &op); + } + if (flags & DMA_PREP_ZERO_Q) { + slot_cnt++; + set_bit(PPC440SPE_ZERO_Q, &op); + } + } else { + /* Need 1/2 descriptor for RXOR operation, and + * need (src_cnt - (2 or 3)) for WXOR of sources + * remained (if any) + */ + slot_cnt = dst_cnt; + + if (flags & DMA_PREP_ZERO_P) + set_bit(PPC440SPE_ZERO_P, &op); + if (flags & DMA_PREP_ZERO_Q) + set_bit(PPC440SPE_ZERO_Q, &op); + + if (test_bit(PPC440SPE_DESC_RXOR12, &op)) + slot_cnt += src_cnt - 2; + else + slot_cnt += src_cnt - 3; + + /* Thus we have either RXOR only chain or + * mixed RXOR/WXOR + */ + if (slot_cnt == dst_cnt) + /* RXOR only chain */ + clear_bit(PPC440SPE_DESC_WXOR, &op); + } + + spin_lock_bh(&ppc440spe_chan->lock); + /* for both RXOR/WXOR each descriptor occupies one slot */ + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, 1); + if (sw_desc) { + ppc440spe_desc_init_dma01pq(sw_desc, dst_cnt, src_cnt, + flags, op); + + /* setup dst/src/mult */ + pr_debug("%s: set dst descriptor 0, 1: 0x%016llx, 0x%016llx\n", + __func__, dst[0], dst[1]); + ppc440spe_adma_pq_set_dest(sw_desc, dst, flags); + while (src_cnt--) { + ppc440spe_adma_pq_set_src(sw_desc, src[src_cnt], + src_cnt); + + /* NOTE: "Multi = 0 is equivalent to = 1" as it + * stated in 440SPSPe_RAID6_Addendum_UM_1_17.pdf + * doesn't work for RXOR with DMA0/1! Instead, multi=0 + * leads to zeroing source data after RXOR. + * So, for P case set-up mult=1 explicitly. + */ + if (!(flags & DMA_PREP_PQ_DISABLE_Q)) + mult = scf[src_cnt]; + ppc440spe_adma_pq_set_src_mult(sw_desc, + mult, src_cnt, dst_cnt - 1); + } + + /* Setup byte count foreach slot just allocated */ + sw_desc->async_tx.flags = flags; + list_for_each_entry(iter, &sw_desc->group_list, + chain_node) { + ppc440spe_desc_set_byte_count(iter, + ppc440spe_chan, len); + iter->unmap_len = len; + } + } + spin_unlock_bh(&ppc440spe_chan->lock); + + return sw_desc; +} + +static struct ppc440spe_adma_desc_slot *ppc440spe_dma2_prep_pq( + struct ppc440spe_adma_chan *ppc440spe_chan, + dma_addr_t *dst, int dst_cnt, dma_addr_t *src, int src_cnt, + const unsigned char *scf, size_t len, unsigned long flags) +{ + int slot_cnt, descs_per_op; + struct ppc440spe_adma_desc_slot *sw_desc = NULL, *iter; + unsigned long op = 0; + unsigned char mult = 1; + + BUG_ON(!dst_cnt); + /*pr_debug("%s: dst_cnt %d, src_cnt %d, len %d\n", + __func__, dst_cnt, src_cnt, len);*/ + + spin_lock_bh(&ppc440spe_chan->lock); + descs_per_op = ppc440spe_dma2_pq_slot_count(src, src_cnt, len); + if (descs_per_op < 0) { + spin_unlock_bh(&ppc440spe_chan->lock); + return NULL; + } + + /* depending on number of sources we have 1 or 2 RXOR chains */ + slot_cnt = descs_per_op * dst_cnt; + + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, 1); + if (sw_desc) { + op = slot_cnt; + sw_desc->async_tx.flags = flags; + list_for_each_entry(iter, &sw_desc->group_list, chain_node) { + ppc440spe_desc_init_dma2pq(iter, dst_cnt, src_cnt, + --op ? 0 : flags); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, + len); + iter->unmap_len = len; + + ppc440spe_init_rxor_cursor(&(iter->rxor_cursor)); + iter->rxor_cursor.len = len; + iter->descs_per_op = descs_per_op; + } + op = 0; + list_for_each_entry(iter, &sw_desc->group_list, chain_node) { + op++; + if (op % descs_per_op == 0) + ppc440spe_adma_init_dma2rxor_slot(iter, src, + src_cnt); + if (likely(!list_is_last(&iter->chain_node, + &sw_desc->group_list))) { + /* set 'next' pointer */ + iter->hw_next = + list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + ppc440spe_xor_set_link(iter, iter->hw_next); + } else { + /* this is the last descriptor. */ + iter->hw_next = NULL; + } + } + + /* fixup head descriptor */ + sw_desc->dst_cnt = dst_cnt; + if (flags & DMA_PREP_ZERO_P) + set_bit(PPC440SPE_ZERO_P, &sw_desc->flags); + if (flags & DMA_PREP_ZERO_Q) + set_bit(PPC440SPE_ZERO_Q, &sw_desc->flags); + + /* setup dst/src/mult */ + ppc440spe_adma_pq_set_dest(sw_desc, dst, flags); + + while (src_cnt--) { + /* handle descriptors (if dst_cnt == 2) inside + * the ppc440spe_adma_pq_set_srcxxx() functions + */ + ppc440spe_adma_pq_set_src(sw_desc, src[src_cnt], + src_cnt); + if (!(flags & DMA_PREP_PQ_DISABLE_Q)) + mult = scf[src_cnt]; + ppc440spe_adma_pq_set_src_mult(sw_desc, + mult, src_cnt, dst_cnt - 1); + } + } + spin_unlock_bh(&ppc440spe_chan->lock); + ppc440spe_desc_set_rxor_block_size(len); + return sw_desc; +} + +/** + * ppc440spe_adma_prep_dma_pq - prepare CDB (group) for a GF-XOR operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_pq( + struct dma_chan *chan, dma_addr_t *dst, dma_addr_t *src, + unsigned int src_cnt, const unsigned char *scf, + size_t len, unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc = NULL; + int dst_cnt = 0; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + ADMA_LL_DBG(prep_dma_pq_dbg(ppc440spe_chan->device->id, + dst, src, src_cnt)); + BUG_ON(!len); + BUG_ON(unlikely(len > PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT)); + BUG_ON(!src_cnt); + + if (src_cnt == 1 && dst[1] == src[0]) { + dma_addr_t dest[2]; + + /* dst[1] is real destination (Q) */ + dest[0] = dst[1]; + /* this is the page to multicast source data to */ + dest[1] = ppc440spe_chan->qdest; + sw_desc = ppc440spe_dma01_prep_mult(ppc440spe_chan, + dest, 2, src, src_cnt, scf, len, flags); + return sw_desc ? &sw_desc->async_tx : NULL; + } + + if (src_cnt == 2 && dst[1] == src[1]) { + sw_desc = ppc440spe_dma01_prep_sum_product(ppc440spe_chan, + &dst[1], src, 2, scf, len, flags); + return sw_desc ? &sw_desc->async_tx : NULL; + } + + if (!(flags & DMA_PREP_PQ_DISABLE_P)) { + BUG_ON(!dst[0]); + dst_cnt++; + flags |= DMA_PREP_ZERO_P; + } + + if (!(flags & DMA_PREP_PQ_DISABLE_Q)) { + BUG_ON(!dst[1]); + dst_cnt++; + flags |= DMA_PREP_ZERO_Q; + } + + BUG_ON(!dst_cnt); + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s src_cnt: %d len: %u int_en: %d\n", + ppc440spe_chan->device->id, __func__, src_cnt, len, + flags & DMA_PREP_INTERRUPT ? 1 : 0); + + switch (ppc440spe_chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + sw_desc = ppc440spe_dma01_prep_pq(ppc440spe_chan, + dst, dst_cnt, src, src_cnt, scf, + len, flags); + break; + + case PPC440SPE_XOR_ID: + sw_desc = ppc440spe_dma2_prep_pq(ppc440spe_chan, + dst, dst_cnt, src, src_cnt, scf, + len, flags); + break; + } + + return sw_desc ? &sw_desc->async_tx : NULL; +} + +/** + * ppc440spe_adma_prep_dma_pqzero_sum - prepare CDB group for + * a PQ_ZERO_SUM operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_pqzero_sum( + struct dma_chan *chan, dma_addr_t *pq, dma_addr_t *src, + unsigned int src_cnt, const unsigned char *scf, size_t len, + enum sum_check_flags *pqres, unsigned long flags) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *sw_desc, *iter; + dma_addr_t pdest, qdest; + int slot_cnt, slots_per_op, idst, dst_cnt; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + + if (flags & DMA_PREP_PQ_DISABLE_P) + pdest = 0; + else + pdest = pq[0]; + + if (flags & DMA_PREP_PQ_DISABLE_Q) + qdest = 0; + else + qdest = pq[1]; + + ADMA_LL_DBG(prep_dma_pqzero_sum_dbg(ppc440spe_chan->device->id, + src, src_cnt, scf)); + + /* Always use WXOR for P/Q calculations (two destinations). + * Need 1 or 2 extra slots to verify results are zero. + */ + idst = dst_cnt = (pdest && qdest) ? 2 : 1; + + /* One additional slot per destination to clone P/Q + * before calculation (we have to preserve destinations). + */ + slot_cnt = src_cnt + dst_cnt * 2; + slots_per_op = 1; + + spin_lock_bh(&ppc440spe_chan->lock); + sw_desc = ppc440spe_adma_alloc_slots(ppc440spe_chan, slot_cnt, + slots_per_op); + if (sw_desc) { + ppc440spe_desc_init_dma01pqzero_sum(sw_desc, dst_cnt, src_cnt); + + /* Setup byte count for each slot just allocated */ + sw_desc->async_tx.flags = flags; + list_for_each_entry(iter, &sw_desc->group_list, chain_node) { + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, + len); + iter->unmap_len = len; + } + + if (pdest) { + struct dma_cdb *hw_desc; + struct ppc440spe_adma_chan *chan; + + iter = sw_desc->group_head; + chan = to_ppc440spe_adma_chan(iter->async_tx.chan); + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + iter->src_cnt = 0; + iter->dst_cnt = 0; + ppc440spe_desc_set_dest_addr(iter, chan, 0, + ppc440spe_chan->pdest, 0); + ppc440spe_desc_set_src_addr(iter, chan, 0, 0, pdest); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, + len); + iter->unmap_len = 0; + /* override pdest to preserve original P */ + pdest = ppc440spe_chan->pdest; + } + if (qdest) { + struct dma_cdb *hw_desc; + struct ppc440spe_adma_chan *chan; + + iter = list_first_entry(&sw_desc->group_list, + struct ppc440spe_adma_desc_slot, + chain_node); + chan = to_ppc440spe_adma_chan(iter->async_tx.chan); + + if (pdest) { + iter = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + } + + memset(iter->hw_desc, 0, sizeof(struct dma_cdb)); + iter->hw_next = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + hw_desc = iter->hw_desc; + hw_desc->opc = DMA_CDB_OPC_MV_SG1_SG2; + iter->src_cnt = 0; + iter->dst_cnt = 0; + ppc440spe_desc_set_dest_addr(iter, chan, 0, + ppc440spe_chan->qdest, 0); + ppc440spe_desc_set_src_addr(iter, chan, 0, 0, qdest); + ppc440spe_desc_set_byte_count(iter, ppc440spe_chan, + len); + iter->unmap_len = 0; + /* override qdest to preserve original Q */ + qdest = ppc440spe_chan->qdest; + } + + /* Setup destinations for P/Q ops */ + ppc440spe_adma_pqzero_sum_set_dest(sw_desc, pdest, qdest); + + /* Setup zero QWORDs into DCHECK CDBs */ + idst = dst_cnt; + list_for_each_entry_reverse(iter, &sw_desc->group_list, + chain_node) { + /* + * The last CDB corresponds to Q-parity check, + * the one before last CDB corresponds + * P-parity check + */ + if (idst == DMA_DEST_MAX_NUM) { + if (idst == dst_cnt) { + set_bit(PPC440SPE_DESC_QCHECK, + &iter->flags); + } else { + set_bit(PPC440SPE_DESC_PCHECK, + &iter->flags); + } + } else { + if (qdest) { + set_bit(PPC440SPE_DESC_QCHECK, + &iter->flags); + } else { + set_bit(PPC440SPE_DESC_PCHECK, + &iter->flags); + } + } + iter->xor_check_result = pqres; + + /* + * set it to zero, if check fail then result will + * be updated + */ + *iter->xor_check_result = 0; + ppc440spe_desc_set_dcheck(iter, ppc440spe_chan, + ppc440spe_qword); + + if (!(--dst_cnt)) + break; + } + + /* Setup sources and mults for P/Q ops */ + list_for_each_entry_continue_reverse(iter, &sw_desc->group_list, + chain_node) { + struct ppc440spe_adma_chan *chan; + u32 mult_dst; + + chan = to_ppc440spe_adma_chan(iter->async_tx.chan); + ppc440spe_desc_set_src_addr(iter, chan, 0, + DMA_CUED_XOR_HB, + src[src_cnt - 1]); + if (qdest) { + mult_dst = (dst_cnt - 1) ? DMA_CDB_SG_DST2 : + DMA_CDB_SG_DST1; + ppc440spe_desc_set_src_mult(iter, chan, + DMA_CUED_MULT1_OFF, + mult_dst, + scf[src_cnt - 1]); + } + if (!(--src_cnt)) + break; + } + } + spin_unlock_bh(&ppc440spe_chan->lock); + return sw_desc ? &sw_desc->async_tx : NULL; +} + +/** + * ppc440spe_adma_prep_dma_xor_zero_sum - prepare CDB group for + * XOR ZERO_SUM operation + */ +static struct dma_async_tx_descriptor *ppc440spe_adma_prep_dma_xor_zero_sum( + struct dma_chan *chan, dma_addr_t *src, unsigned int src_cnt, + size_t len, enum sum_check_flags *result, unsigned long flags) +{ + struct dma_async_tx_descriptor *tx; + dma_addr_t pq[2]; + + /* validate P, disable Q */ + pq[0] = src[0]; + pq[1] = 0; + flags |= DMA_PREP_PQ_DISABLE_Q; + + tx = ppc440spe_adma_prep_dma_pqzero_sum(chan, pq, &src[1], + src_cnt - 1, 0, len, + result, flags); + return tx; +} + +/** + * ppc440spe_adma_set_dest - set destination address into descriptor + */ +static void ppc440spe_adma_set_dest(struct ppc440spe_adma_desc_slot *sw_desc, + dma_addr_t addr, int index) +{ + struct ppc440spe_adma_chan *chan; + + BUG_ON(index >= sw_desc->dst_cnt); + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* to do: support transfers lengths > + * PPC440SPE_ADMA_DMA/XOR_MAX_BYTE_COUNT + */ + ppc440spe_desc_set_dest_addr(sw_desc->group_head, + chan, 0, addr, index); + break; + case PPC440SPE_XOR_ID: + sw_desc = ppc440spe_get_group_entry(sw_desc, index); + ppc440spe_desc_set_dest_addr(sw_desc, + chan, 0, addr, index); + break; + } +} + +static void ppc440spe_adma_pq_zero_op(struct ppc440spe_adma_desc_slot *iter, + struct ppc440spe_adma_chan *chan, dma_addr_t addr) +{ + /* To clear destinations update the descriptor + * (P or Q depending on index) as follows: + * addr is destination (0 corresponds to SG2): + */ + ppc440spe_desc_set_dest_addr(iter, chan, DMA_CUED_XOR_BASE, addr, 0); + + /* ... and the addr is source: */ + ppc440spe_desc_set_src_addr(iter, chan, 0, DMA_CUED_XOR_HB, addr); + + /* addr is always SG2 then the mult is always DST1 */ + ppc440spe_desc_set_src_mult(iter, chan, DMA_CUED_MULT1_OFF, + DMA_CDB_SG_DST1, 1); +} + +/** + * ppc440spe_adma_pq_set_dest - set destination address into descriptor + * for the PQXOR operation + */ +static void ppc440spe_adma_pq_set_dest(struct ppc440spe_adma_desc_slot *sw_desc, + dma_addr_t *addrs, unsigned long flags) +{ + struct ppc440spe_adma_desc_slot *iter; + struct ppc440spe_adma_chan *chan; + dma_addr_t paddr, qaddr; + dma_addr_t addr = 0, ppath, qpath; + int index = 0, i; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + + if (flags & DMA_PREP_PQ_DISABLE_P) + paddr = 0; + else + paddr = addrs[0]; + + if (flags & DMA_PREP_PQ_DISABLE_Q) + qaddr = 0; + else + qaddr = addrs[1]; + + if (!paddr || !qaddr) + addr = paddr ? paddr : qaddr; + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* walk through the WXOR source list and set P/Q-destinations + * for each slot: + */ + if (!test_bit(PPC440SPE_DESC_RXOR, &sw_desc->flags)) { + /* This is WXOR-only chain; may have 1/2 zero descs */ + if (test_bit(PPC440SPE_ZERO_P, &sw_desc->flags)) + index++; + if (test_bit(PPC440SPE_ZERO_Q, &sw_desc->flags)) + index++; + + iter = ppc440spe_get_group_entry(sw_desc, index); + if (addr) { + /* one destination */ + list_for_each_entry_from(iter, + &sw_desc->group_list, chain_node) + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, addr, 0); + } else { + /* two destinations */ + list_for_each_entry_from(iter, + &sw_desc->group_list, chain_node) { + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, paddr, 0); + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, qaddr, 1); + } + } + + if (index) { + /* To clear destinations update the descriptor + * (1st,2nd, or both depending on flags) + */ + index = 0; + if (test_bit(PPC440SPE_ZERO_P, + &sw_desc->flags)) { + iter = ppc440spe_get_group_entry( + sw_desc, index++); + ppc440spe_adma_pq_zero_op(iter, chan, + paddr); + } + + if (test_bit(PPC440SPE_ZERO_Q, + &sw_desc->flags)) { + iter = ppc440spe_get_group_entry( + sw_desc, index++); + ppc440spe_adma_pq_zero_op(iter, chan, + qaddr); + } + + return; + } + } else { + /* This is RXOR-only or RXOR/WXOR mixed chain */ + + /* If we want to include destination into calculations, + * then make dest addresses cued with mult=1 (XOR). + */ + ppath = test_bit(PPC440SPE_ZERO_P, &sw_desc->flags) ? + DMA_CUED_XOR_HB : + DMA_CUED_XOR_BASE | + (1 << DMA_CUED_MULT1_OFF); + qpath = test_bit(PPC440SPE_ZERO_Q, &sw_desc->flags) ? + DMA_CUED_XOR_HB : + DMA_CUED_XOR_BASE | + (1 << DMA_CUED_MULT1_OFF); + + /* Setup destination(s) in RXOR slot(s) */ + iter = ppc440spe_get_group_entry(sw_desc, index++); + ppc440spe_desc_set_dest_addr(iter, chan, + paddr ? ppath : qpath, + paddr ? paddr : qaddr, 0); + if (!addr) { + /* two destinations */ + iter = ppc440spe_get_group_entry(sw_desc, + index++); + ppc440spe_desc_set_dest_addr(iter, chan, + qpath, qaddr, 0); + } + + if (test_bit(PPC440SPE_DESC_WXOR, &sw_desc->flags)) { + /* Setup destination(s) in remaining WXOR + * slots + */ + iter = ppc440spe_get_group_entry(sw_desc, + index); + if (addr) { + /* one destination */ + list_for_each_entry_from(iter, + &sw_desc->group_list, + chain_node) + ppc440spe_desc_set_dest_addr( + iter, chan, + DMA_CUED_XOR_BASE, + addr, 0); + + } else { + /* two destinations */ + list_for_each_entry_from(iter, + &sw_desc->group_list, + chain_node) { + ppc440spe_desc_set_dest_addr( + iter, chan, + DMA_CUED_XOR_BASE, + paddr, 0); + ppc440spe_desc_set_dest_addr( + iter, chan, + DMA_CUED_XOR_BASE, + qaddr, 1); + } + } + } + + } + break; + + case PPC440SPE_XOR_ID: + /* DMA2 descriptors have only 1 destination, so there are + * two chains - one for each dest. + * If we want to include destination into calculations, + * then make dest addresses cued with mult=1 (XOR). + */ + ppath = test_bit(PPC440SPE_ZERO_P, &sw_desc->flags) ? + DMA_CUED_XOR_HB : + DMA_CUED_XOR_BASE | + (1 << DMA_CUED_MULT1_OFF); + + qpath = test_bit(PPC440SPE_ZERO_Q, &sw_desc->flags) ? + DMA_CUED_XOR_HB : + DMA_CUED_XOR_BASE | + (1 << DMA_CUED_MULT1_OFF); + + iter = ppc440spe_get_group_entry(sw_desc, 0); + for (i = 0; i < sw_desc->descs_per_op; i++) { + ppc440spe_desc_set_dest_addr(iter, chan, + paddr ? ppath : qpath, + paddr ? paddr : qaddr, 0); + iter = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + } + + if (!addr) { + /* Two destinations; setup Q here */ + iter = ppc440spe_get_group_entry(sw_desc, + sw_desc->descs_per_op); + for (i = 0; i < sw_desc->descs_per_op; i++) { + ppc440spe_desc_set_dest_addr(iter, + chan, qpath, qaddr, 0); + iter = list_entry(iter->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + } + } + + break; + } +} + +/** + * ppc440spe_adma_pq_zero_sum_set_dest - set destination address into descriptor + * for the PQ_ZERO_SUM operation + */ +static void ppc440spe_adma_pqzero_sum_set_dest( + struct ppc440spe_adma_desc_slot *sw_desc, + dma_addr_t paddr, dma_addr_t qaddr) +{ + struct ppc440spe_adma_desc_slot *iter, *end; + struct ppc440spe_adma_chan *chan; + dma_addr_t addr = 0; + int idx; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + + /* walk through the WXOR source list and set P/Q-destinations + * for each slot + */ + idx = (paddr && qaddr) ? 2 : 1; + /* set end */ + list_for_each_entry_reverse(end, &sw_desc->group_list, + chain_node) { + if (!(--idx)) + break; + } + /* set start */ + idx = (paddr && qaddr) ? 2 : 1; + iter = ppc440spe_get_group_entry(sw_desc, idx); + + if (paddr && qaddr) { + /* two destinations */ + list_for_each_entry_from(iter, &sw_desc->group_list, + chain_node) { + if (unlikely(iter == end)) + break; + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, paddr, 0); + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, qaddr, 1); + } + } else { + /* one destination */ + addr = paddr ? paddr : qaddr; + list_for_each_entry_from(iter, &sw_desc->group_list, + chain_node) { + if (unlikely(iter == end)) + break; + ppc440spe_desc_set_dest_addr(iter, chan, + DMA_CUED_XOR_BASE, addr, 0); + } + } + + /* The remaining descriptors are DATACHECK. These have no need in + * destination. Actually, these destinations are used there + * as sources for check operation. So, set addr as source. + */ + ppc440spe_desc_set_src_addr(end, chan, 0, 0, addr ? addr : paddr); + + if (!addr) { + end = list_entry(end->chain_node.next, + struct ppc440spe_adma_desc_slot, chain_node); + ppc440spe_desc_set_src_addr(end, chan, 0, 0, qaddr); + } +} + +/** + * ppc440spe_desc_set_xor_src_cnt - set source count into descriptor + */ +static inline void ppc440spe_desc_set_xor_src_cnt( + struct ppc440spe_adma_desc_slot *desc, + int src_cnt) +{ + struct xor_cb *hw_desc = desc->hw_desc; + + hw_desc->cbc &= ~XOR_CDCR_OAC_MSK; + hw_desc->cbc |= src_cnt; +} + +/** + * ppc440spe_adma_pq_set_src - set source address into descriptor + */ +static void ppc440spe_adma_pq_set_src(struct ppc440spe_adma_desc_slot *sw_desc, + dma_addr_t addr, int index) +{ + struct ppc440spe_adma_chan *chan; + dma_addr_t haddr = 0; + struct ppc440spe_adma_desc_slot *iter = NULL; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + /* DMA0,1 may do: WXOR, RXOR, RXOR+WXORs chain + */ + if (test_bit(PPC440SPE_DESC_RXOR, &sw_desc->flags)) { + /* RXOR-only or RXOR/WXOR operation */ + int iskip = test_bit(PPC440SPE_DESC_RXOR12, + &sw_desc->flags) ? 2 : 3; + + if (index == 0) { + /* 1st slot (RXOR) */ + /* setup sources region (R1-2-3, R1-2-4, + * or R1-2-5) + */ + if (test_bit(PPC440SPE_DESC_RXOR12, + &sw_desc->flags)) + haddr = DMA_RXOR12 << + DMA_CUED_REGION_OFF; + else if (test_bit(PPC440SPE_DESC_RXOR123, + &sw_desc->flags)) + haddr = DMA_RXOR123 << + DMA_CUED_REGION_OFF; + else if (test_bit(PPC440SPE_DESC_RXOR124, + &sw_desc->flags)) + haddr = DMA_RXOR124 << + DMA_CUED_REGION_OFF; + else if (test_bit(PPC440SPE_DESC_RXOR125, + &sw_desc->flags)) + haddr = DMA_RXOR125 << + DMA_CUED_REGION_OFF; + else + BUG(); + haddr |= DMA_CUED_XOR_BASE; + iter = ppc440spe_get_group_entry(sw_desc, 0); + } else if (index < iskip) { + /* 1st slot (RXOR) + * shall actually set source address only once + * instead of first + */ + iter = NULL; + } else { + /* 2nd/3d and next slots (WXOR); + * skip first slot with RXOR + */ + haddr = DMA_CUED_XOR_HB; + iter = ppc440spe_get_group_entry(sw_desc, + index - iskip + sw_desc->dst_cnt); + } + } else { + int znum = 0; + + /* WXOR-only operation; skip first slots with + * zeroing destinations + */ + if (test_bit(PPC440SPE_ZERO_P, &sw_desc->flags)) + znum++; + if (test_bit(PPC440SPE_ZERO_Q, &sw_desc->flags)) + znum++; + + haddr = DMA_CUED_XOR_HB; + iter = ppc440spe_get_group_entry(sw_desc, + index + znum); + } + + if (likely(iter)) { + ppc440spe_desc_set_src_addr(iter, chan, 0, haddr, addr); + + if (!index && + test_bit(PPC440SPE_DESC_RXOR, &sw_desc->flags) && + sw_desc->dst_cnt == 2) { + /* if we have two destinations for RXOR, then + * setup source in the second descr too + */ + iter = ppc440spe_get_group_entry(sw_desc, 1); + ppc440spe_desc_set_src_addr(iter, chan, 0, + haddr, addr); + } + } + break; + + case PPC440SPE_XOR_ID: + /* DMA2 may do Biskup */ + iter = sw_desc->group_head; + if (iter->dst_cnt == 2) { + /* both P & Q calculations required; set P src here */ + ppc440spe_adma_dma2rxor_set_src(iter, index, addr); + + /* this is for Q */ + iter = ppc440spe_get_group_entry(sw_desc, + sw_desc->descs_per_op); + } + ppc440spe_adma_dma2rxor_set_src(iter, index, addr); + break; + } +} + +/** + * ppc440spe_adma_memcpy_xor_set_src - set source address into descriptor + */ +static void ppc440spe_adma_memcpy_xor_set_src( + struct ppc440spe_adma_desc_slot *sw_desc, + dma_addr_t addr, int index) +{ + struct ppc440spe_adma_chan *chan; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + sw_desc = sw_desc->group_head; + + if (likely(sw_desc)) + ppc440spe_desc_set_src_addr(sw_desc, chan, index, 0, addr); +} + +/** + * ppc440spe_adma_dma2rxor_inc_addr - + */ +static void ppc440spe_adma_dma2rxor_inc_addr( + struct ppc440spe_adma_desc_slot *desc, + struct ppc440spe_rxor *cursor, int index, int src_cnt) +{ + cursor->addr_count++; + if (index == src_cnt - 1) { + ppc440spe_desc_set_xor_src_cnt(desc, cursor->addr_count); + } else if (cursor->addr_count == XOR_MAX_OPS) { + ppc440spe_desc_set_xor_src_cnt(desc, cursor->addr_count); + cursor->addr_count = 0; + cursor->desc_count++; + } +} + +/** + * ppc440spe_adma_dma2rxor_prep_src - setup RXOR types in DMA2 CDB + */ +static int ppc440spe_adma_dma2rxor_prep_src( + struct ppc440spe_adma_desc_slot *hdesc, + struct ppc440spe_rxor *cursor, int index, + int src_cnt, u32 addr) +{ + int rval = 0; + u32 sign; + struct ppc440spe_adma_desc_slot *desc = hdesc; + int i; + + for (i = 0; i < cursor->desc_count; i++) { + desc = list_entry(hdesc->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + } + + switch (cursor->state) { + case 0: + if (addr == cursor->addrl + cursor->len) { + /* direct RXOR */ + cursor->state = 1; + cursor->xor_count++; + if (index == src_cnt-1) { + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR12 << DMA_CUED_REGION_OFF); + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + } else if (cursor->addrl == addr + cursor->len) { + /* reverse RXOR */ + cursor->state = 1; + cursor->xor_count++; + set_bit(cursor->addr_count, &desc->reverse_flags[0]); + if (index == src_cnt-1) { + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR12 << DMA_CUED_REGION_OFF); + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + } else { + printk(KERN_ERR "Cannot build " + "DMA2 RXOR command block.\n"); + BUG(); + } + break; + case 1: + sign = test_bit(cursor->addr_count, + desc->reverse_flags) + ? -1 : 1; + if (index == src_cnt-2 || (sign == -1 + && addr != cursor->addrl - 2*cursor->len)) { + cursor->state = 0; + cursor->xor_count = 1; + cursor->addrl = addr; + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR12 << DMA_CUED_REGION_OFF); + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } else if (addr == cursor->addrl + 2*sign*cursor->len) { + cursor->state = 2; + cursor->xor_count = 0; + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR123 << DMA_CUED_REGION_OFF); + if (index == src_cnt-1) { + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + } else if (addr == cursor->addrl + 3*cursor->len) { + cursor->state = 2; + cursor->xor_count = 0; + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR124 << DMA_CUED_REGION_OFF); + if (index == src_cnt-1) { + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + } else if (addr == cursor->addrl + 4*cursor->len) { + cursor->state = 2; + cursor->xor_count = 0; + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR125 << DMA_CUED_REGION_OFF); + if (index == src_cnt-1) { + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + } else { + cursor->state = 0; + cursor->xor_count = 1; + cursor->addrl = addr; + ppc440spe_rxor_set_region(desc, + cursor->addr_count, + DMA_RXOR12 << DMA_CUED_REGION_OFF); + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + break; + case 2: + cursor->state = 0; + cursor->addrl = addr; + cursor->xor_count++; + if (index) { + ppc440spe_adma_dma2rxor_inc_addr( + desc, cursor, index, src_cnt); + } + break; + } + + return rval; +} + +/** + * ppc440spe_adma_dma2rxor_set_src - set RXOR source address; it's assumed that + * ppc440spe_adma_dma2rxor_prep_src() has already done prior this call + */ +static void ppc440spe_adma_dma2rxor_set_src( + struct ppc440spe_adma_desc_slot *desc, + int index, dma_addr_t addr) +{ + struct xor_cb *xcb = desc->hw_desc; + int k = 0, op = 0, lop = 0; + + /* get the RXOR operand which corresponds to index addr */ + while (op <= index) { + lop = op; + if (k == XOR_MAX_OPS) { + k = 0; + desc = list_entry(desc->chain_node.next, + struct ppc440spe_adma_desc_slot, chain_node); + xcb = desc->hw_desc; + + } + if ((xcb->ops[k++].h & (DMA_RXOR12 << DMA_CUED_REGION_OFF)) == + (DMA_RXOR12 << DMA_CUED_REGION_OFF)) + op += 2; + else + op += 3; + } + + BUG_ON(k < 1); + + if (test_bit(k-1, desc->reverse_flags)) { + /* reverse operand order; put last op in RXOR group */ + if (index == op - 1) + ppc440spe_rxor_set_src(desc, k - 1, addr); + } else { + /* direct operand order; put first op in RXOR group */ + if (index == lop) + ppc440spe_rxor_set_src(desc, k - 1, addr); + } +} + +/** + * ppc440spe_adma_dma2rxor_set_mult - set RXOR multipliers; it's assumed that + * ppc440spe_adma_dma2rxor_prep_src() has already done prior this call + */ +static void ppc440spe_adma_dma2rxor_set_mult( + struct ppc440spe_adma_desc_slot *desc, + int index, u8 mult) +{ + struct xor_cb *xcb = desc->hw_desc; + int k = 0, op = 0, lop = 0; + + /* get the RXOR operand which corresponds to index mult */ + while (op <= index) { + lop = op; + if (k == XOR_MAX_OPS) { + k = 0; + desc = list_entry(desc->chain_node.next, + struct ppc440spe_adma_desc_slot, + chain_node); + xcb = desc->hw_desc; + + } + if ((xcb->ops[k++].h & (DMA_RXOR12 << DMA_CUED_REGION_OFF)) == + (DMA_RXOR12 << DMA_CUED_REGION_OFF)) + op += 2; + else + op += 3; + } + + BUG_ON(k < 1); + if (test_bit(k-1, desc->reverse_flags)) { + /* reverse order */ + ppc440spe_rxor_set_mult(desc, k - 1, op - index - 1, mult); + } else { + /* direct order */ + ppc440spe_rxor_set_mult(desc, k - 1, index - lop, mult); + } +} + +/** + * ppc440spe_init_rxor_cursor - + */ +static void ppc440spe_init_rxor_cursor(struct ppc440spe_rxor *cursor) +{ + memset(cursor, 0, sizeof(struct ppc440spe_rxor)); + cursor->state = 2; +} + +/** + * ppc440spe_adma_pq_set_src_mult - set multiplication coefficient into + * descriptor for the PQXOR operation + */ +static void ppc440spe_adma_pq_set_src_mult( + struct ppc440spe_adma_desc_slot *sw_desc, + unsigned char mult, int index, int dst_pos) +{ + struct ppc440spe_adma_chan *chan; + u32 mult_idx, mult_dst; + struct ppc440spe_adma_desc_slot *iter = NULL, *iter1 = NULL; + + chan = to_ppc440spe_adma_chan(sw_desc->async_tx.chan); + + switch (chan->device->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + if (test_bit(PPC440SPE_DESC_RXOR, &sw_desc->flags)) { + int region = test_bit(PPC440SPE_DESC_RXOR12, + &sw_desc->flags) ? 2 : 3; + + if (index < region) { + /* RXOR multipliers */ + iter = ppc440spe_get_group_entry(sw_desc, + sw_desc->dst_cnt - 1); + if (sw_desc->dst_cnt == 2) + iter1 = ppc440spe_get_group_entry( + sw_desc, 0); + + mult_idx = DMA_CUED_MULT1_OFF + (index << 3); + mult_dst = DMA_CDB_SG_SRC; + } else { + /* WXOR multiplier */ + iter = ppc440spe_get_group_entry(sw_desc, + index - region + + sw_desc->dst_cnt); + mult_idx = DMA_CUED_MULT1_OFF; + mult_dst = dst_pos ? DMA_CDB_SG_DST2 : + DMA_CDB_SG_DST1; + } + } else { + int znum = 0; + + /* WXOR-only; + * skip first slots with destinations (if ZERO_DST has + * place) + */ + if (test_bit(PPC440SPE_ZERO_P, &sw_desc->flags)) + znum++; + if (test_bit(PPC440SPE_ZERO_Q, &sw_desc->flags)) + znum++; + + iter = ppc440spe_get_group_entry(sw_desc, index + znum); + mult_idx = DMA_CUED_MULT1_OFF; + mult_dst = dst_pos ? DMA_CDB_SG_DST2 : DMA_CDB_SG_DST1; + } + + if (likely(iter)) { + ppc440spe_desc_set_src_mult(iter, chan, + mult_idx, mult_dst, mult); + + if (unlikely(iter1)) { + /* if we have two destinations for RXOR, then + * we've just set Q mult. Set-up P now. + */ + ppc440spe_desc_set_src_mult(iter1, chan, + mult_idx, mult_dst, 1); + } + + } + break; + + case PPC440SPE_XOR_ID: + iter = sw_desc->group_head; + if (sw_desc->dst_cnt == 2) { + /* both P & Q calculations required; set P mult here */ + ppc440spe_adma_dma2rxor_set_mult(iter, index, 1); + + /* and then set Q mult */ + iter = ppc440spe_get_group_entry(sw_desc, + sw_desc->descs_per_op); + } + ppc440spe_adma_dma2rxor_set_mult(iter, index, mult); + break; + } +} + +/** + * ppc440spe_adma_free_chan_resources - free the resources allocated + */ +static void ppc440spe_adma_free_chan_resources(struct dma_chan *chan) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + struct ppc440spe_adma_desc_slot *iter, *_iter; + int in_use_descs = 0; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + ppc440spe_adma_slot_cleanup(ppc440spe_chan); + + spin_lock_bh(&ppc440spe_chan->lock); + list_for_each_entry_safe(iter, _iter, &ppc440spe_chan->chain, + chain_node) { + in_use_descs++; + list_del(&iter->chain_node); + } + list_for_each_entry_safe_reverse(iter, _iter, + &ppc440spe_chan->all_slots, slot_node) { + list_del(&iter->slot_node); + kfree(iter); + ppc440spe_chan->slots_allocated--; + } + ppc440spe_chan->last_used = NULL; + + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d %s slots_allocated %d\n", + ppc440spe_chan->device->id, + __func__, ppc440spe_chan->slots_allocated); + spin_unlock_bh(&ppc440spe_chan->lock); + + /* one is ok since we left it on there on purpose */ + if (in_use_descs > 1) + printk(KERN_ERR "SPE: Freeing %d in use descriptors!\n", + in_use_descs - 1); +} + +/** + * ppc440spe_adma_is_complete - poll the status of an ADMA transaction + * @chan: ADMA channel handle + * @cookie: ADMA transaction identifier + */ +static enum dma_status ppc440spe_adma_is_complete(struct dma_chan *chan, + dma_cookie_t cookie, dma_cookie_t *done, dma_cookie_t *used) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + dma_cookie_t last_used; + dma_cookie_t last_complete; + enum dma_status ret; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + last_used = chan->cookie; + last_complete = ppc440spe_chan->completed_cookie; + + if (done) + *done = last_complete; + if (used) + *used = last_used; + + ret = dma_async_is_complete(cookie, last_complete, last_used); + if (ret == DMA_SUCCESS) + return ret; + + ppc440spe_adma_slot_cleanup(ppc440spe_chan); + + last_used = chan->cookie; + last_complete = ppc440spe_chan->completed_cookie; + + if (done) + *done = last_complete; + if (used) + *used = last_used; + + return dma_async_is_complete(cookie, last_complete, last_used); +} + +/** + * ppc440spe_adma_eot_handler - end of transfer interrupt handler + */ +static irqreturn_t ppc440spe_adma_eot_handler(int irq, void *data) +{ + struct ppc440spe_adma_chan *chan = data; + + dev_dbg(chan->device->common.dev, + "ppc440spe adma%d: %s\n", chan->device->id, __func__); + + tasklet_schedule(&chan->irq_tasklet); + ppc440spe_adma_device_clear_eot_status(chan); + + return IRQ_HANDLED; +} + +/** + * ppc440spe_adma_err_handler - DMA error interrupt handler; + * do the same things as a eot handler + */ +static irqreturn_t ppc440spe_adma_err_handler(int irq, void *data) +{ + struct ppc440spe_adma_chan *chan = data; + + dev_dbg(chan->device->common.dev, + "ppc440spe adma%d: %s\n", chan->device->id, __func__); + + tasklet_schedule(&chan->irq_tasklet); + ppc440spe_adma_device_clear_eot_status(chan); + + return IRQ_HANDLED; +} + +/** + * ppc440spe_test_callback - called when test operation has been done + */ +static void ppc440spe_test_callback(void *unused) +{ + complete(&ppc440spe_r6_test_comp); +} + +/** + * ppc440spe_adma_issue_pending - flush all pending descriptors to h/w + */ +static void ppc440spe_adma_issue_pending(struct dma_chan *chan) +{ + struct ppc440spe_adma_chan *ppc440spe_chan; + + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + dev_dbg(ppc440spe_chan->device->common.dev, + "ppc440spe adma%d: %s %d \n", ppc440spe_chan->device->id, + __func__, ppc440spe_chan->pending); + + if (ppc440spe_chan->pending) { + ppc440spe_chan->pending = 0; + ppc440spe_chan_append(ppc440spe_chan); + } +} + +/** + * ppc440spe_chan_start_null_xor - initiate the first XOR operation (DMA engines + * use FIFOs (as opposite to chains used in XOR) so this is a XOR + * specific operation) + */ +static void ppc440spe_chan_start_null_xor(struct ppc440spe_adma_chan *chan) +{ + struct ppc440spe_adma_desc_slot *sw_desc, *group_start; + dma_cookie_t cookie; + int slot_cnt, slots_per_op; + + dev_dbg(chan->device->common.dev, + "ppc440spe adma%d: %s\n", chan->device->id, __func__); + + spin_lock_bh(&chan->lock); + slot_cnt = ppc440spe_chan_xor_slot_count(0, 2, &slots_per_op); + sw_desc = ppc440spe_adma_alloc_slots(chan, slot_cnt, slots_per_op); + if (sw_desc) { + group_start = sw_desc->group_head; + list_splice_init(&sw_desc->group_list, &chan->chain); + async_tx_ack(&sw_desc->async_tx); + ppc440spe_desc_init_null_xor(group_start); + + cookie = chan->common.cookie; + cookie++; + if (cookie <= 1) + cookie = 2; + + /* initialize the completed cookie to be less than + * the most recently used cookie + */ + chan->completed_cookie = cookie - 1; + chan->common.cookie = sw_desc->async_tx.cookie = cookie; + + /* channel should not be busy */ + BUG_ON(ppc440spe_chan_is_busy(chan)); + + /* set the descriptor address */ + ppc440spe_chan_set_first_xor_descriptor(chan, sw_desc); + + /* run the descriptor */ + ppc440spe_chan_run(chan); + } else + printk(KERN_ERR "ppc440spe adma%d" + " failed to allocate null descriptor\n", + chan->device->id); + spin_unlock_bh(&chan->lock); +} + +/** + * ppc440spe_test_raid6 - test are RAID-6 capabilities enabled successfully. + * For this we just perform one WXOR operation with the same source + * and destination addresses, the GF-multiplier is 1; so if RAID-6 + * capabilities are enabled then we'll get src/dst filled with zero. + */ +static int ppc440spe_test_raid6(struct ppc440spe_adma_chan *chan) +{ + struct ppc440spe_adma_desc_slot *sw_desc, *iter; + struct page *pg; + char *a; + dma_addr_t dma_addr, addrs[2]; + unsigned long op = 0; + int rval = 0; + + set_bit(PPC440SPE_DESC_WXOR, &op); + + pg = alloc_page(GFP_KERNEL); + if (!pg) + return -ENOMEM; + + spin_lock_bh(&chan->lock); + sw_desc = ppc440spe_adma_alloc_slots(chan, 1, 1); + if (sw_desc) { + /* 1 src, 1 dsr, int_ena, WXOR */ + ppc440spe_desc_init_dma01pq(sw_desc, 1, 1, 1, op); + list_for_each_entry(iter, &sw_desc->group_list, chain_node) { + ppc440spe_desc_set_byte_count(iter, chan, PAGE_SIZE); + iter->unmap_len = PAGE_SIZE; + } + } else { + rval = -EFAULT; + spin_unlock_bh(&chan->lock); + goto exit; + } + spin_unlock_bh(&chan->lock); + + /* Fill the test page with ones */ + memset(page_address(pg), 0xFF, PAGE_SIZE); + dma_addr = dma_map_page(chan->device->dev, pg, 0, + PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Setup addresses */ + ppc440spe_adma_pq_set_src(sw_desc, dma_addr, 0); + ppc440spe_adma_pq_set_src_mult(sw_desc, 1, 0, 0); + addrs[0] = dma_addr; + addrs[1] = 0; + ppc440spe_adma_pq_set_dest(sw_desc, addrs, DMA_PREP_PQ_DISABLE_Q); + + async_tx_ack(&sw_desc->async_tx); + sw_desc->async_tx.callback = ppc440spe_test_callback; + sw_desc->async_tx.callback_param = NULL; + + init_completion(&ppc440spe_r6_test_comp); + + ppc440spe_adma_tx_submit(&sw_desc->async_tx); + ppc440spe_adma_issue_pending(&chan->common); + + wait_for_completion(&ppc440spe_r6_test_comp); + + /* Now check if the test page is zeroed */ + a = page_address(pg); + if ((*(u32 *)a) == 0 && memcmp(a, a+4, PAGE_SIZE-4) == 0) { + /* page is zero - RAID-6 enabled */ + rval = 0; + } else { + /* RAID-6 was not enabled */ + rval = -EINVAL; + } +exit: + __free_page(pg); + return rval; +} + +static void ppc440spe_adma_init_capabilities(struct ppc440spe_adma_device *adev) +{ + switch (adev->id) { + case PPC440SPE_DMA0_ID: + case PPC440SPE_DMA1_ID: + dma_cap_set(DMA_MEMCPY, adev->common.cap_mask); + dma_cap_set(DMA_INTERRUPT, adev->common.cap_mask); + dma_cap_set(DMA_MEMSET, adev->common.cap_mask); + dma_cap_set(DMA_PQ, adev->common.cap_mask); + dma_cap_set(DMA_PQ_VAL, adev->common.cap_mask); + dma_cap_set(DMA_XOR_VAL, adev->common.cap_mask); + break; + case PPC440SPE_XOR_ID: + dma_cap_set(DMA_XOR, adev->common.cap_mask); + dma_cap_set(DMA_PQ, adev->common.cap_mask); + dma_cap_set(DMA_INTERRUPT, adev->common.cap_mask); + adev->common.cap_mask = adev->common.cap_mask; + break; + } + + /* Set base routines */ + adev->common.device_alloc_chan_resources = + ppc440spe_adma_alloc_chan_resources; + adev->common.device_free_chan_resources = + ppc440spe_adma_free_chan_resources; + adev->common.device_is_tx_complete = ppc440spe_adma_is_complete; + adev->common.device_issue_pending = ppc440spe_adma_issue_pending; + + /* Set prep routines based on capability */ + if (dma_has_cap(DMA_MEMCPY, adev->common.cap_mask)) { + adev->common.device_prep_dma_memcpy = + ppc440spe_adma_prep_dma_memcpy; + } + if (dma_has_cap(DMA_MEMSET, adev->common.cap_mask)) { + adev->common.device_prep_dma_memset = + ppc440spe_adma_prep_dma_memset; + } + if (dma_has_cap(DMA_XOR, adev->common.cap_mask)) { + adev->common.max_xor = XOR_MAX_OPS; + adev->common.device_prep_dma_xor = + ppc440spe_adma_prep_dma_xor; + } + if (dma_has_cap(DMA_PQ, adev->common.cap_mask)) { + switch (adev->id) { + case PPC440SPE_DMA0_ID: + dma_set_maxpq(&adev->common, + DMA0_FIFO_SIZE / sizeof(struct dma_cdb), 0); + break; + case PPC440SPE_DMA1_ID: + dma_set_maxpq(&adev->common, + DMA1_FIFO_SIZE / sizeof(struct dma_cdb), 0); + break; + case PPC440SPE_XOR_ID: + adev->common.max_pq = XOR_MAX_OPS * 3; + break; + } + adev->common.device_prep_dma_pq = + ppc440spe_adma_prep_dma_pq; + } + if (dma_has_cap(DMA_PQ_VAL, adev->common.cap_mask)) { + switch (adev->id) { + case PPC440SPE_DMA0_ID: + adev->common.max_pq = DMA0_FIFO_SIZE / + sizeof(struct dma_cdb); + break; + case PPC440SPE_DMA1_ID: + adev->common.max_pq = DMA1_FIFO_SIZE / + sizeof(struct dma_cdb); + break; + } + adev->common.device_prep_dma_pq_val = + ppc440spe_adma_prep_dma_pqzero_sum; + } + if (dma_has_cap(DMA_XOR_VAL, adev->common.cap_mask)) { + switch (adev->id) { + case PPC440SPE_DMA0_ID: + adev->common.max_xor = DMA0_FIFO_SIZE / + sizeof(struct dma_cdb); + break; + case PPC440SPE_DMA1_ID: + adev->common.max_xor = DMA1_FIFO_SIZE / + sizeof(struct dma_cdb); + break; + } + adev->common.device_prep_dma_xor_val = + ppc440spe_adma_prep_dma_xor_zero_sum; + } + if (dma_has_cap(DMA_INTERRUPT, adev->common.cap_mask)) { + adev->common.device_prep_dma_interrupt = + ppc440spe_adma_prep_dma_interrupt; + } + pr_info("%s: AMCC(R) PPC440SP(E) ADMA Engine: " + "( %s%s%s%s%s%s%s)\n", + dev_name(adev->dev), + dma_has_cap(DMA_PQ, adev->common.cap_mask) ? "pq " : "", + dma_has_cap(DMA_PQ_VAL, adev->common.cap_mask) ? "pq_val " : "", + dma_has_cap(DMA_XOR, adev->common.cap_mask) ? "xor " : "", + dma_has_cap(DMA_XOR_VAL, adev->common.cap_mask) ? "xor_val " : "", + dma_has_cap(DMA_MEMCPY, adev->common.cap_mask) ? "memcpy " : "", + dma_has_cap(DMA_MEMSET, adev->common.cap_mask) ? "memset " : "", + dma_has_cap(DMA_INTERRUPT, adev->common.cap_mask) ? "intr " : ""); +} + +static int ppc440spe_adma_setup_irqs(struct ppc440spe_adma_device *adev, + struct ppc440spe_adma_chan *chan, + int *initcode) +{ + struct device_node *np; + int ret; + + np = container_of(adev->dev, struct of_device, dev)->node; + if (adev->id != PPC440SPE_XOR_ID) { + adev->err_irq = irq_of_parse_and_map(np, 1); + if (adev->err_irq == NO_IRQ) { + dev_warn(adev->dev, "no err irq resource?\n"); + *initcode = PPC_ADMA_INIT_IRQ2; + adev->err_irq = -ENXIO; + } else + atomic_inc(&ppc440spe_adma_err_irq_ref); + } else { + adev->err_irq = -ENXIO; + } + + adev->irq = irq_of_parse_and_map(np, 0); + if (adev->irq == NO_IRQ) { + dev_err(adev->dev, "no irq resource\n"); + *initcode = PPC_ADMA_INIT_IRQ1; + ret = -ENXIO; + goto err_irq_map; + } + dev_dbg(adev->dev, "irq %d, err irq %d\n", + adev->irq, adev->err_irq); + + ret = request_irq(adev->irq, ppc440spe_adma_eot_handler, + 0, dev_driver_string(adev->dev), chan); + if (ret) { + dev_err(adev->dev, "can't request irq %d\n", + adev->irq); + *initcode = PPC_ADMA_INIT_IRQ1; + ret = -EIO; + goto err_req1; + } + + /* only DMA engines have a separate error IRQ + * so it's Ok if err_irq < 0 in XOR engine case. + */ + if (adev->err_irq > 0) { + /* both DMA engines share common error IRQ */ + ret = request_irq(adev->err_irq, + ppc440spe_adma_err_handler, + IRQF_SHARED, + dev_driver_string(adev->dev), + chan); + if (ret) { + dev_err(adev->dev, "can't request irq %d\n", + adev->err_irq); + *initcode = PPC_ADMA_INIT_IRQ2; + ret = -EIO; + goto err_req2; + } + } + + if (adev->id == PPC440SPE_XOR_ID) { + /* enable XOR engine interrupts */ + iowrite32be(XOR_IE_CBCIE_BIT | XOR_IE_ICBIE_BIT | + XOR_IE_ICIE_BIT | XOR_IE_RPTIE_BIT, + &adev->xor_reg->ier); + } else { + u32 mask, enable; + + np = of_find_compatible_node(NULL, NULL, "ibm,i2o-440spe"); + if (!np) { + pr_err("%s: can't find I2O device tree node\n", + __func__); + ret = -ENODEV; + goto err_req2; + } + adev->i2o_reg = of_iomap(np, 0); + if (!adev->i2o_reg) { + pr_err("%s: failed to map I2O registers\n", __func__); + of_node_put(np); + ret = -EINVAL; + goto err_req2; + } + of_node_put(np); + /* Unmask 'CS FIFO Attention' interrupts and + * enable generating interrupts on errors + */ + enable = (adev->id == PPC440SPE_DMA0_ID) ? + ~(I2O_IOPIM_P0SNE | I2O_IOPIM_P0EM) : + ~(I2O_IOPIM_P1SNE | I2O_IOPIM_P1EM); + mask = ioread32(&adev->i2o_reg->iopim) & enable; + iowrite32(mask, &adev->i2o_reg->iopim); + } + return 0; + +err_req2: + free_irq(adev->irq, chan); +err_req1: + irq_dispose_mapping(adev->irq); +err_irq_map: + if (adev->err_irq > 0) { + if (atomic_dec_and_test(&ppc440spe_adma_err_irq_ref)) + irq_dispose_mapping(adev->err_irq); + } + return ret; +} + +static void ppc440spe_adma_release_irqs(struct ppc440spe_adma_device *adev, + struct ppc440spe_adma_chan *chan) +{ + u32 mask, disable; + + if (adev->id == PPC440SPE_XOR_ID) { + /* disable XOR engine interrupts */ + mask = ioread32be(&adev->xor_reg->ier); + mask &= ~(XOR_IE_CBCIE_BIT | XOR_IE_ICBIE_BIT | + XOR_IE_ICIE_BIT | XOR_IE_RPTIE_BIT); + iowrite32be(mask, &adev->xor_reg->ier); + } else { + /* disable DMAx engine interrupts */ + disable = (adev->id == PPC440SPE_DMA0_ID) ? + (I2O_IOPIM_P0SNE | I2O_IOPIM_P0EM) : + (I2O_IOPIM_P1SNE | I2O_IOPIM_P1EM); + mask = ioread32(&adev->i2o_reg->iopim) | disable; + iowrite32(mask, &adev->i2o_reg->iopim); + } + free_irq(adev->irq, chan); + irq_dispose_mapping(adev->irq); + if (adev->err_irq > 0) { + free_irq(adev->err_irq, chan); + if (atomic_dec_and_test(&ppc440spe_adma_err_irq_ref)) { + irq_dispose_mapping(adev->err_irq); + iounmap(adev->i2o_reg); + } + } +} + +/** + * ppc440spe_adma_probe - probe the asynch device + */ +static int __devinit ppc440spe_adma_probe(struct of_device *ofdev, + const struct of_device_id *match) +{ + struct device_node *np = ofdev->node; + struct resource res; + struct ppc440spe_adma_device *adev; + struct ppc440spe_adma_chan *chan; + struct ppc_dma_chan_ref *ref, *_ref; + int ret = 0, initcode = PPC_ADMA_INIT_OK; + const u32 *idx; + int len; + void *regs; + u32 id, pool_size; + + if (of_device_is_compatible(np, "amcc,xor-accelerator")) { + id = PPC440SPE_XOR_ID; + /* As far as the XOR engine is concerned, it does not + * use FIFOs but uses linked list. So there is no dependency + * between pool size to allocate and the engine configuration. + */ + pool_size = PAGE_SIZE << 1; + } else { + /* it is DMA0 or DMA1 */ + idx = of_get_property(np, "cell-index", &len); + if (!idx || (len != sizeof(u32))) { + dev_err(&ofdev->dev, "Device node %s has missing " + "or invalid cell-index property\n", + np->full_name); + return -EINVAL; + } + id = *idx; + /* DMA0,1 engines use FIFO to maintain CDBs, so we + * should allocate the pool accordingly to size of this + * FIFO. Thus, the pool size depends on the FIFO depth: + * how much CDBs pointers the FIFO may contain then so + * much CDBs we should provide in the pool. + * That is + * CDB size = 32B; + * CDBs number = (DMA0_FIFO_SIZE >> 3); + * Pool size = CDBs number * CDB size = + * = (DMA0_FIFO_SIZE >> 3) << 5 = DMA0_FIFO_SIZE << 2. + */ + pool_size = (id == PPC440SPE_DMA0_ID) ? + DMA0_FIFO_SIZE : DMA1_FIFO_SIZE; + pool_size <<= 2; + } + + if (of_address_to_resource(np, 0, &res)) { + dev_err(&ofdev->dev, "failed to get memory resource\n"); + initcode = PPC_ADMA_INIT_MEMRES; + ret = -ENODEV; + goto out; + } + + if (!request_mem_region(res.start, resource_size(&res), + dev_driver_string(&ofdev->dev))) { + dev_err(&ofdev->dev, "failed to request memory region " + "(0x%016llx-0x%016llx)\n", + (u64)res.start, (u64)res.end); + initcode = PPC_ADMA_INIT_MEMREG; + ret = -EBUSY; + goto out; + } + + /* create a device */ + adev = kzalloc(sizeof(*adev), GFP_KERNEL); + if (!adev) { + dev_err(&ofdev->dev, "failed to allocate device\n"); + initcode = PPC_ADMA_INIT_ALLOC; + ret = -ENOMEM; + goto err_adev_alloc; + } + + adev->id = id; + adev->pool_size = pool_size; + /* allocate coherent memory for hardware descriptors */ + adev->dma_desc_pool_virt = dma_alloc_coherent(&ofdev->dev, + adev->pool_size, &adev->dma_desc_pool, + GFP_KERNEL); + if (adev->dma_desc_pool_virt == NULL) { + dev_err(&ofdev->dev, "failed to allocate %d bytes of coherent " + "memory for hardware descriptors\n", + adev->pool_size); + initcode = PPC_ADMA_INIT_COHERENT; + ret = -ENOMEM; + goto err_dma_alloc; + } + dev_dbg(&ofdev->dev, "allocted descriptor pool virt 0x%p phys 0x%llx\n", + adev->dma_desc_pool_virt, (u64)adev->dma_desc_pool); + + regs = ioremap(res.start, resource_size(&res)); + if (!regs) { + dev_err(&ofdev->dev, "failed to ioremap regs!\n"); + goto err_regs_alloc; + } + + if (adev->id == PPC440SPE_XOR_ID) { + adev->xor_reg = regs; + /* Reset XOR */ + iowrite32be(XOR_CRSR_XASR_BIT, &adev->xor_reg->crsr); + iowrite32be(XOR_CRSR_64BA_BIT, &adev->xor_reg->crrr); + } else { + size_t fifo_size = (adev->id == PPC440SPE_DMA0_ID) ? + DMA0_FIFO_SIZE : DMA1_FIFO_SIZE; + adev->dma_reg = regs; + /* DMAx_FIFO_SIZE is defined in bytes, + * - is defined in number of CDB pointers (8byte). + * DMA FIFO Length = CSlength + CPlength, where + * CSlength = CPlength = (fsiz + 1) * 8. + */ + iowrite32(DMA_FIFO_ENABLE | ((fifo_size >> 3) - 2), + &adev->dma_reg->fsiz); + /* Configure DMA engine */ + iowrite32(DMA_CFG_DXEPR_HP | DMA_CFG_DFMPP_HP | DMA_CFG_FALGN, + &adev->dma_reg->cfg); + /* Clear Status */ + iowrite32(~0, &adev->dma_reg->dsts); + } + + adev->dev = &ofdev->dev; + adev->common.dev = &ofdev->dev; + INIT_LIST_HEAD(&adev->common.channels); + dev_set_drvdata(&ofdev->dev, adev); + + /* create a channel */ + chan = kzalloc(sizeof(*chan), GFP_KERNEL); + if (!chan) { + dev_err(&ofdev->dev, "can't allocate channel structure\n"); + initcode = PPC_ADMA_INIT_CHANNEL; + ret = -ENOMEM; + goto err_chan_alloc; + } + + spin_lock_init(&chan->lock); + INIT_LIST_HEAD(&chan->chain); + INIT_LIST_HEAD(&chan->all_slots); + chan->device = adev; + chan->common.device = &adev->common; + list_add_tail(&chan->common.device_node, &adev->common.channels); + tasklet_init(&chan->irq_tasklet, ppc440spe_adma_tasklet, + (unsigned long)chan); + + /* allocate and map helper pages for async validation or + * async_mult/async_sum_product operations on DMA0/1. + */ + if (adev->id != PPC440SPE_XOR_ID) { + chan->pdest_page = alloc_page(GFP_KERNEL); + chan->qdest_page = alloc_page(GFP_KERNEL); + if (!chan->pdest_page || + !chan->qdest_page) { + if (chan->pdest_page) + __free_page(chan->pdest_page); + if (chan->qdest_page) + __free_page(chan->qdest_page); + ret = -ENOMEM; + goto err_page_alloc; + } + chan->pdest = dma_map_page(&ofdev->dev, chan->pdest_page, 0, + PAGE_SIZE, DMA_BIDIRECTIONAL); + chan->qdest = dma_map_page(&ofdev->dev, chan->qdest_page, 0, + PAGE_SIZE, DMA_BIDIRECTIONAL); + } + + ref = kmalloc(sizeof(*ref), GFP_KERNEL); + if (ref) { + ref->chan = &chan->common; + INIT_LIST_HEAD(&ref->node); + list_add_tail(&ref->node, &ppc440spe_adma_chan_list); + } else { + dev_err(&ofdev->dev, "failed to allocate channel reference!\n"); + ret = -ENOMEM; + goto err_ref_alloc; + } + + ret = ppc440spe_adma_setup_irqs(adev, chan, &initcode); + if (ret) + goto err_irq; + + ppc440spe_adma_init_capabilities(adev); + + ret = dma_async_device_register(&adev->common); + if (ret) { + initcode = PPC_ADMA_INIT_REGISTER; + dev_err(&ofdev->dev, "failed to register dma device\n"); + goto err_dev_reg; + } + + goto out; + +err_dev_reg: + ppc440spe_adma_release_irqs(adev, chan); +err_irq: + list_for_each_entry_safe(ref, _ref, &ppc440spe_adma_chan_list, node) { + if (chan == to_ppc440spe_adma_chan(ref->chan)) { + list_del(&ref->node); + kfree(ref); + } + } +err_ref_alloc: + if (adev->id != PPC440SPE_XOR_ID) { + dma_unmap_page(&ofdev->dev, chan->pdest, + PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_unmap_page(&ofdev->dev, chan->qdest, + PAGE_SIZE, DMA_BIDIRECTIONAL); + __free_page(chan->pdest_page); + __free_page(chan->qdest_page); + } +err_page_alloc: + kfree(chan); +err_chan_alloc: + if (adev->id == PPC440SPE_XOR_ID) + iounmap(adev->xor_reg); + else + iounmap(adev->dma_reg); +err_regs_alloc: + dma_free_coherent(adev->dev, adev->pool_size, + adev->dma_desc_pool_virt, + adev->dma_desc_pool); +err_dma_alloc: + kfree(adev); +err_adev_alloc: + release_mem_region(res.start, resource_size(&res)); +out: + if (id < PPC440SPE_ADMA_ENGINES_NUM) + ppc440spe_adma_devices[id] = initcode; + + return ret; +} + +/** + * ppc440spe_adma_remove - remove the asynch device + */ +static int __devexit ppc440spe_adma_remove(struct of_device *ofdev) +{ + struct ppc440spe_adma_device *adev = dev_get_drvdata(&ofdev->dev); + struct device_node *np = ofdev->node; + struct resource res; + struct dma_chan *chan, *_chan; + struct ppc_dma_chan_ref *ref, *_ref; + struct ppc440spe_adma_chan *ppc440spe_chan; + + dev_set_drvdata(&ofdev->dev, NULL); + if (adev->id < PPC440SPE_ADMA_ENGINES_NUM) + ppc440spe_adma_devices[adev->id] = -1; + + dma_async_device_unregister(&adev->common); + + list_for_each_entry_safe(chan, _chan, &adev->common.channels, + device_node) { + ppc440spe_chan = to_ppc440spe_adma_chan(chan); + ppc440spe_adma_release_irqs(adev, ppc440spe_chan); + tasklet_kill(&ppc440spe_chan->irq_tasklet); + if (adev->id != PPC440SPE_XOR_ID) { + dma_unmap_page(&ofdev->dev, ppc440spe_chan->pdest, + PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_unmap_page(&ofdev->dev, ppc440spe_chan->qdest, + PAGE_SIZE, DMA_BIDIRECTIONAL); + __free_page(ppc440spe_chan->pdest_page); + __free_page(ppc440spe_chan->qdest_page); + } + list_for_each_entry_safe(ref, _ref, &ppc440spe_adma_chan_list, + node) { + if (ppc440spe_chan == + to_ppc440spe_adma_chan(ref->chan)) { + list_del(&ref->node); + kfree(ref); + } + } + list_del(&chan->device_node); + kfree(ppc440spe_chan); + } + + dma_free_coherent(adev->dev, adev->pool_size, + adev->dma_desc_pool_virt, adev->dma_desc_pool); + if (adev->id == PPC440SPE_XOR_ID) + iounmap(adev->xor_reg); + else + iounmap(adev->dma_reg); + of_address_to_resource(np, 0, &res); + release_mem_region(res.start, resource_size(&res)); + kfree(adev); + return 0; +} + +/* + * /sys driver interface to enable h/w RAID-6 capabilities + * Files created in e.g. /sys/devices/plb.0/400100100.dma0/driver/ + * directory are "devices", "enable" and "poly". + * "devices" shows available engines. + * "enable" is used to enable RAID-6 capabilities or to check + * whether these has been activated. + * "poly" allows setting/checking used polynomial (for PPC440SPe only). + */ + +static ssize_t show_ppc440spe_devices(struct device_driver *dev, char *buf) +{ + ssize_t size = 0; + int i; + + for (i = 0; i < PPC440SPE_ADMA_ENGINES_NUM; i++) { + if (ppc440spe_adma_devices[i] == -1) + continue; + size += snprintf(buf + size, PAGE_SIZE - size, + "PPC440SP(E)-ADMA.%d: %s\n", i, + ppc_adma_errors[ppc440spe_adma_devices[i]]); + } + return size; +} + +static ssize_t show_ppc440spe_r6enable(struct device_driver *dev, char *buf) +{ + return snprintf(buf, PAGE_SIZE, + "PPC440SP(e) RAID-6 capabilities are %sABLED.\n", + ppc440spe_r6_enabled ? "EN" : "DIS"); +} + +static ssize_t store_ppc440spe_r6enable(struct device_driver *dev, + const char *buf, size_t count) +{ + unsigned long val; + + if (!count || count > 11) + return -EINVAL; + + if (!ppc440spe_r6_tchan) + return -EFAULT; + + /* Write a key */ + sscanf(buf, "%lx", &val); + dcr_write(ppc440spe_mq_dcr_host, DCRN_MQ0_XORBA, val); + isync(); + + /* Verify whether it really works now */ + if (ppc440spe_test_raid6(ppc440spe_r6_tchan) == 0) { + pr_info("PPC440SP(e) RAID-6 has been activated " + "successfully\n"); + ppc440spe_r6_enabled = 1; + } else { + pr_info("PPC440SP(e) RAID-6 hasn't been activated!" + " Error key ?\n"); + ppc440spe_r6_enabled = 0; + } + return count; +} + +static ssize_t show_ppc440spe_r6poly(struct device_driver *dev, char *buf) +{ + ssize_t size = 0; + u32 reg; + +#ifdef CONFIG_440SP + /* 440SP has fixed polynomial */ + reg = 0x4d; +#else + reg = dcr_read(ppc440spe_mq_dcr_host, DCRN_MQ0_CFBHL); + reg >>= MQ0_CFBHL_POLY; + reg &= 0xFF; +#endif + + size = snprintf(buf, PAGE_SIZE, "PPC440SP(e) RAID-6 driver " + "uses 0x1%02x polynomial.\n", reg); + return size; +} + +static ssize_t store_ppc440spe_r6poly(struct device_driver *dev, + const char *buf, size_t count) +{ + unsigned long reg, val; + +#ifdef CONFIG_440SP + /* 440SP uses default 0x14D polynomial only */ + return -EINVAL; +#endif + + if (!count || count > 6) + return -EINVAL; + + /* e.g., 0x14D or 0x11D */ + sscanf(buf, "%lx", &val); + + if (val & ~0x1FF) + return -EINVAL; + + val &= 0xFF; + reg = dcr_read(ppc440spe_mq_dcr_host, DCRN_MQ0_CFBHL); + reg &= ~(0xFF << MQ0_CFBHL_POLY); + reg |= val << MQ0_CFBHL_POLY; + dcr_write(ppc440spe_mq_dcr_host, DCRN_MQ0_CFBHL, reg); + + return count; +} + +static DRIVER_ATTR(devices, S_IRUGO, show_ppc440spe_devices, NULL); +static DRIVER_ATTR(enable, S_IRUGO | S_IWUSR, show_ppc440spe_r6enable, + store_ppc440spe_r6enable); +static DRIVER_ATTR(poly, S_IRUGO | S_IWUSR, show_ppc440spe_r6poly, + store_ppc440spe_r6poly); + +/* + * Common initialisation for RAID engines; allocate memory for + * DMAx FIFOs, perform configuration common for all DMA engines. + * Further DMA engine specific configuration is done at probe time. + */ +static int ppc440spe_configure_raid_devices(void) +{ + struct device_node *np; + struct resource i2o_res; + struct i2o_regs __iomem *i2o_reg; + dcr_host_t i2o_dcr_host; + unsigned int dcr_base, dcr_len; + int i, ret; + + np = of_find_compatible_node(NULL, NULL, "ibm,i2o-440spe"); + if (!np) { + pr_err("%s: can't find I2O device tree node\n", + __func__); + return -ENODEV; + } + + if (of_address_to_resource(np, 0, &i2o_res)) { + of_node_put(np); + return -EINVAL; + } + + i2o_reg = of_iomap(np, 0); + if (!i2o_reg) { + pr_err("%s: failed to map I2O registers\n", __func__); + of_node_put(np); + return -EINVAL; + } + + /* Get I2O DCRs base */ + dcr_base = dcr_resource_start(np, 0); + dcr_len = dcr_resource_len(np, 0); + if (!dcr_base && !dcr_len) { + pr_err("%s: can't get DCR registers base/len!\n", + np->full_name); + of_node_put(np); + iounmap(i2o_reg); + return -ENODEV; + } + + i2o_dcr_host = dcr_map(np, dcr_base, dcr_len); + if (!DCR_MAP_OK(i2o_dcr_host)) { + pr_err("%s: failed to map DCRs!\n", np->full_name); + of_node_put(np); + iounmap(i2o_reg); + return -ENODEV; + } + of_node_put(np); + + /* Provide memory regions for DMA's FIFOs: I2O, DMA0 and DMA1 share + * the base address of FIFO memory space. + * Actually we need twice more physical memory than programmed in the + * register (because there are two FIFOs for each DMA: CP and CS) + */ + ppc440spe_dma_fifo_buf = kmalloc((DMA0_FIFO_SIZE + DMA1_FIFO_SIZE) << 1, + GFP_KERNEL); + if (!ppc440spe_dma_fifo_buf) { + pr_err("%s: DMA FIFO buffer allocation failed.\n", __func__); + iounmap(i2o_reg); + dcr_unmap(i2o_dcr_host, dcr_len); + return -ENOMEM; + } + + /* + * Configure h/w + */ + /* Reset I2O/DMA */ + mtdcri(SDR0, DCRN_SDR0_SRST, DCRN_SDR0_SRST_I2ODMA); + mtdcri(SDR0, DCRN_SDR0_SRST, 0); + + /* Setup the base address of mmaped registers */ + dcr_write(i2o_dcr_host, DCRN_I2O0_IBAH, (u32)(i2o_res.start >> 32)); + dcr_write(i2o_dcr_host, DCRN_I2O0_IBAL, (u32)(i2o_res.start) | + I2O_REG_ENABLE); + dcr_unmap(i2o_dcr_host, dcr_len); + + /* Setup FIFO memory space base address */ + iowrite32(0, &i2o_reg->ifbah); + iowrite32(((u32)__pa(ppc440spe_dma_fifo_buf)), &i2o_reg->ifbal); + + /* set zero FIFO size for I2O, so the whole + * ppc440spe_dma_fifo_buf is used by DMAs. + * DMAx_FIFOs will be configured while probe. + */ + iowrite32(0, &i2o_reg->ifsiz); + iounmap(i2o_reg); + + /* To prepare WXOR/RXOR functionality we need access to + * Memory Queue Module DCRs (finally it will be enabled + * via /sys interface of the ppc440spe ADMA driver). + */ + np = of_find_compatible_node(NULL, NULL, "ibm,mq-440spe"); + if (!np) { + pr_err("%s: can't find MQ device tree node\n", + __func__); + ret = -ENODEV; + goto out_free; + } + + /* Get MQ DCRs base */ + dcr_base = dcr_resource_start(np, 0); + dcr_len = dcr_resource_len(np, 0); + if (!dcr_base && !dcr_len) { + pr_err("%s: can't get DCR registers base/len!\n", + np->full_name); + ret = -ENODEV; + goto out_mq; + } + + ppc440spe_mq_dcr_host = dcr_map(np, dcr_base, dcr_len); + if (!DCR_MAP_OK(ppc440spe_mq_dcr_host)) { + pr_err("%s: failed to map DCRs!\n", np->full_name); + ret = -ENODEV; + goto out_mq; + } + of_node_put(np); + ppc440spe_mq_dcr_len = dcr_len; + + /* Set HB alias */ + dcr_write(ppc440spe_mq_dcr_host, DCRN_MQ0_BAUH, DMA_CUED_XOR_HB); + + /* Set: + * - LL transaction passing limit to 1; + * - Memory controller cycle limit to 1; + * - Galois Polynomial to 0x14d (default) + */ + dcr_write(ppc440spe_mq_dcr_host, DCRN_MQ0_CFBHL, + (1 << MQ0_CFBHL_TPLM) | (1 << MQ0_CFBHL_HBCL) | + (PPC440SPE_DEFAULT_POLY << MQ0_CFBHL_POLY)); + + atomic_set(&ppc440spe_adma_err_irq_ref, 0); + for (i = 0; i < PPC440SPE_ADMA_ENGINES_NUM; i++) + ppc440spe_adma_devices[i] = -1; + + return 0; + +out_mq: + of_node_put(np); +out_free: + kfree(ppc440spe_dma_fifo_buf); + return ret; +} + +static struct of_device_id __devinitdata ppc440spe_adma_of_match[] = { + { .compatible = "ibm,dma-440spe", }, + { .compatible = "amcc,xor-accelerator", }, + {}, +}; +MODULE_DEVICE_TABLE(of, ppc440spe_adma_of_match); + +static struct of_platform_driver ppc440spe_adma_driver = { + .match_table = ppc440spe_adma_of_match, + .probe = ppc440spe_adma_probe, + .remove = __devexit_p(ppc440spe_adma_remove), + .driver = { + .name = "PPC440SP(E)-ADMA", + .owner = THIS_MODULE, + }, +}; + +static __init int ppc440spe_adma_init(void) +{ + int ret; + + ret = ppc440spe_configure_raid_devices(); + if (ret) + return ret; + + ret = of_register_platform_driver(&ppc440spe_adma_driver); + if (ret) { + pr_err("%s: failed to register platform driver\n", + __func__); + goto out_reg; + } + + /* Initialization status */ + ret = driver_create_file(&ppc440spe_adma_driver.driver, + &driver_attr_devices); + if (ret) + goto out_dev; + + /* RAID-6 h/w enable entry */ + ret = driver_create_file(&ppc440spe_adma_driver.driver, + &driver_attr_enable); + if (ret) + goto out_en; + + /* GF polynomial to use */ + ret = driver_create_file(&ppc440spe_adma_driver.driver, + &driver_attr_poly); + if (!ret) + return ret; + + driver_remove_file(&ppc440spe_adma_driver.driver, + &driver_attr_enable); +out_en: + driver_remove_file(&ppc440spe_adma_driver.driver, + &driver_attr_devices); +out_dev: + /* User will not be able to enable h/w RAID-6 */ + pr_err("%s: failed to create RAID-6 driver interface\n", + __func__); + of_unregister_platform_driver(&ppc440spe_adma_driver); +out_reg: + dcr_unmap(ppc440spe_mq_dcr_host, ppc440spe_mq_dcr_len); + kfree(ppc440spe_dma_fifo_buf); + return ret; +} + +static void __exit ppc440spe_adma_exit(void) +{ + driver_remove_file(&ppc440spe_adma_driver.driver, + &driver_attr_poly); + driver_remove_file(&ppc440spe_adma_driver.driver, + &driver_attr_enable); + driver_remove_file(&ppc440spe_adma_driver.driver, + &driver_attr_devices); + of_unregister_platform_driver(&ppc440spe_adma_driver); + dcr_unmap(ppc440spe_mq_dcr_host, ppc440spe_mq_dcr_len); + kfree(ppc440spe_dma_fifo_buf); +} + +arch_initcall(ppc440spe_adma_init); +module_exit(ppc440spe_adma_exit); + +MODULE_AUTHOR("Yuri Tikhonov "); +MODULE_DESCRIPTION("PPC440SPE ADMA Engine Driver"); +MODULE_LICENSE("GPL"); diff --git a/drivers/dma/ppc4xx/adma.h b/drivers/dma/ppc4xx/adma.h new file mode 100644 index 000000000000..8ada5a812e3b --- /dev/null +++ b/drivers/dma/ppc4xx/adma.h @@ -0,0 +1,195 @@ +/* + * 2006-2009 (C) DENX Software Engineering. + * + * Author: Yuri Tikhonov + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed "as is" without any warranty of + * any kind, whether express or implied. + */ + +#ifndef _PPC440SPE_ADMA_H +#define _PPC440SPE_ADMA_H + +#include +#include "dma.h" +#include "xor.h" + +#define to_ppc440spe_adma_chan(chan) \ + container_of(chan, struct ppc440spe_adma_chan, common) +#define to_ppc440spe_adma_device(dev) \ + container_of(dev, struct ppc440spe_adma_device, common) +#define tx_to_ppc440spe_adma_slot(tx) \ + container_of(tx, struct ppc440spe_adma_desc_slot, async_tx) + +/* Default polynomial (for 440SP is only available) */ +#define PPC440SPE_DEFAULT_POLY 0x4d + +#define PPC440SPE_ADMA_ENGINES_NUM (XOR_ENGINES_NUM + DMA_ENGINES_NUM) + +#define PPC440SPE_ADMA_WATCHDOG_MSEC 3 +#define PPC440SPE_ADMA_THRESHOLD 1 + +#define PPC440SPE_DMA0_ID 0 +#define PPC440SPE_DMA1_ID 1 +#define PPC440SPE_XOR_ID 2 + +#define PPC440SPE_ADMA_DMA_MAX_BYTE_COUNT 0xFFFFFFUL +/* this is the XOR_CBBCR width */ +#define PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT (1 << 31) +#define PPC440SPE_ADMA_ZERO_SUM_MAX_BYTE_COUNT PPC440SPE_ADMA_XOR_MAX_BYTE_COUNT + +#define PPC440SPE_RXOR_RUN 0 + +#define MQ0_CF2H_RXOR_BS_MASK 0x1FF + +#undef ADMA_LL_DEBUG + +/** + * struct ppc440spe_adma_device - internal representation of an ADMA device + * @dev: device + * @dma_reg: base for DMAx register access + * @xor_reg: base for XOR register access + * @i2o_reg: base for I2O register access + * @id: HW ADMA Device selector + * @dma_desc_pool_virt: base of DMA descriptor region (CPU address) + * @dma_desc_pool: base of DMA descriptor region (DMA address) + * @pool_size: size of the pool + * @irq: DMAx or XOR irq number + * @err_irq: DMAx error irq number + * @common: embedded struct dma_device + */ +struct ppc440spe_adma_device { + struct device *dev; + struct dma_regs __iomem *dma_reg; + struct xor_regs __iomem *xor_reg; + struct i2o_regs __iomem *i2o_reg; + int id; + void *dma_desc_pool_virt; + dma_addr_t dma_desc_pool; + size_t pool_size; + int irq; + int err_irq; + struct dma_device common; +}; + +/** + * struct ppc440spe_adma_chan - internal representation of an ADMA channel + * @lock: serializes enqueue/dequeue operations to the slot pool + * @device: parent device + * @chain: device chain view of the descriptors + * @common: common dmaengine channel object members + * @all_slots: complete domain of slots usable by the channel + * @pending: allows batching of hardware operations + * @completed_cookie: identifier for the most recently completed operation + * @slots_allocated: records the actual size of the descriptor slot pool + * @hw_chain_inited: h/w descriptor chain initialization flag + * @irq_tasklet: bottom half where ppc440spe_adma_slot_cleanup runs + * @needs_unmap: if buffers should not be unmapped upon final processing + * @pdest_page: P destination page for async validate operation + * @qdest_page: Q destination page for async validate operation + * @pdest: P dma addr for async validate operation + * @qdest: Q dma addr for async validate operation + */ +struct ppc440spe_adma_chan { + spinlock_t lock; + struct ppc440spe_adma_device *device; + struct list_head chain; + struct dma_chan common; + struct list_head all_slots; + struct ppc440spe_adma_desc_slot *last_used; + int pending; + dma_cookie_t completed_cookie; + int slots_allocated; + int hw_chain_inited; + struct tasklet_struct irq_tasklet; + u8 needs_unmap; + struct page *pdest_page; + struct page *qdest_page; + dma_addr_t pdest; + dma_addr_t qdest; +}; + +struct ppc440spe_rxor { + u32 addrl; + u32 addrh; + int len; + int xor_count; + int addr_count; + int desc_count; + int state; +}; + +/** + * struct ppc440spe_adma_desc_slot - PPC440SPE-ADMA software descriptor + * @phys: hardware address of the hardware descriptor chain + * @group_head: first operation in a transaction + * @hw_next: pointer to the next descriptor in chain + * @async_tx: support for the async_tx api + * @slot_node: node on the iop_adma_chan.all_slots list + * @chain_node: node on the op_adma_chan.chain list + * @group_list: list of slots that make up a multi-descriptor transaction + * for example transfer lengths larger than the supported hw max + * @unmap_len: transaction bytecount + * @hw_desc: virtual address of the hardware descriptor chain + * @stride: currently chained or not + * @idx: pool index + * @slot_cnt: total slots used in an transaction (group of operations) + * @src_cnt: number of sources set in this descriptor + * @dst_cnt: number of destinations set in the descriptor + * @slots_per_op: number of slots per operation + * @descs_per_op: number of slot per P/Q operation see comment + * for ppc440spe_prep_dma_pqxor function + * @flags: desc state/type + * @reverse_flags: 1 if a corresponding rxor address uses reversed address order + * @xor_check_result: result of zero sum + * @crc32_result: result crc calculation + */ +struct ppc440spe_adma_desc_slot { + dma_addr_t phys; + struct ppc440spe_adma_desc_slot *group_head; + struct ppc440spe_adma_desc_slot *hw_next; + struct dma_async_tx_descriptor async_tx; + struct list_head slot_node; + struct list_head chain_node; /* node in channel ops list */ + struct list_head group_list; /* list */ + unsigned int unmap_len; + void *hw_desc; + u16 stride; + u16 idx; + u16 slot_cnt; + u8 src_cnt; + u8 dst_cnt; + u8 slots_per_op; + u8 descs_per_op; + unsigned long flags; + unsigned long reverse_flags[8]; + +#define PPC440SPE_DESC_INT 0 /* generate interrupt on complete */ +#define PPC440SPE_ZERO_P 1 /* clear P destionaion */ +#define PPC440SPE_ZERO_Q 2 /* clear Q destination */ +#define PPC440SPE_COHERENT 3 /* src/dst are coherent */ + +#define PPC440SPE_DESC_WXOR 4 /* WXORs are in chain */ +#define PPC440SPE_DESC_RXOR 5 /* RXOR is in chain */ + +#define PPC440SPE_DESC_RXOR123 8 /* CDB for RXOR123 operation */ +#define PPC440SPE_DESC_RXOR124 9 /* CDB for RXOR124 operation */ +#define PPC440SPE_DESC_RXOR125 10 /* CDB for RXOR125 operation */ +#define PPC440SPE_DESC_RXOR12 11 /* CDB for RXOR12 operation */ +#define PPC440SPE_DESC_RXOR_REV 12 /* CDB has srcs in reversed order */ + +#define PPC440SPE_DESC_PCHECK 13 +#define PPC440SPE_DESC_QCHECK 14 + +#define PPC440SPE_DESC_RXOR_MSK 0x3 + + struct ppc440spe_rxor rxor_cursor; + + union { + u32 *xor_check_result; + u32 *crc32_result; + }; +}; + +#endif /* _PPC440SPE_ADMA_H */ diff --git a/drivers/dma/ppc4xx/dma.h b/drivers/dma/ppc4xx/dma.h new file mode 100644 index 000000000000..bcde2df2f373 --- /dev/null +++ b/drivers/dma/ppc4xx/dma.h @@ -0,0 +1,223 @@ +/* + * 440SPe's DMA engines support header file + * + * 2006-2009 (C) DENX Software Engineering. + * + * Author: Yuri Tikhonov + * + * This file is licensed under the term of the GNU General Public License + * version 2. The program licensed "as is" without any warranty of any + * kind, whether express or implied. + */ + +#ifndef _PPC440SPE_DMA_H +#define _PPC440SPE_DMA_H + +#include + +/* Number of elements in the array with statical CDBs */ +#define MAX_STAT_DMA_CDBS 16 +/* Number of DMA engines available on the contoller */ +#define DMA_ENGINES_NUM 2 + +/* Maximum h/w supported number of destinations */ +#define DMA_DEST_MAX_NUM 2 + +/* FIFO's params */ +#define DMA0_FIFO_SIZE 0x1000 +#define DMA1_FIFO_SIZE 0x1000 +#define DMA_FIFO_ENABLE (1<<12) + +/* DMA Configuration Register. Data Transfer Engine PLB Priority: */ +#define DMA_CFG_DXEPR_LP (0<<26) +#define DMA_CFG_DXEPR_HP (3<<26) +#define DMA_CFG_DXEPR_HHP (2<<26) +#define DMA_CFG_DXEPR_HHHP (1<<26) + +/* DMA Configuration Register. DMA FIFO Manager PLB Priority: */ +#define DMA_CFG_DFMPP_LP (0<<23) +#define DMA_CFG_DFMPP_HP (3<<23) +#define DMA_CFG_DFMPP_HHP (2<<23) +#define DMA_CFG_DFMPP_HHHP (1<<23) + +/* DMA Configuration Register. Force 64-byte Alignment */ +#define DMA_CFG_FALGN (1 << 19) + +/*UIC0:*/ +#define D0CPF_INT (1<<12) +#define D0CSF_INT (1<<11) +#define D1CPF_INT (1<<10) +#define D1CSF_INT (1<<9) +/*UIC1:*/ +#define DMAE_INT (1<<9) + +/* I2O IOP Interrupt Mask Register */ +#define I2O_IOPIM_P0SNE (1<<3) +#define I2O_IOPIM_P0EM (1<<5) +#define I2O_IOPIM_P1SNE (1<<6) +#define I2O_IOPIM_P1EM (1<<8) + +/* DMA CDB fields */ +#define DMA_CDB_MSK (0xF) +#define DMA_CDB_64B_ADDR (1<<2) +#define DMA_CDB_NO_INT (1<<3) +#define DMA_CDB_STATUS_MSK (0x3) +#define DMA_CDB_ADDR_MSK (0xFFFFFFF0) + +/* DMA CDB OpCodes */ +#define DMA_CDB_OPC_NO_OP (0x00) +#define DMA_CDB_OPC_MV_SG1_SG2 (0x01) +#define DMA_CDB_OPC_MULTICAST (0x05) +#define DMA_CDB_OPC_DFILL128 (0x24) +#define DMA_CDB_OPC_DCHECK128 (0x23) + +#define DMA_CUED_XOR_BASE (0x10000000) +#define DMA_CUED_XOR_HB (0x00000008) + +#ifdef CONFIG_440SP +#define DMA_CUED_MULT1_OFF 0 +#define DMA_CUED_MULT2_OFF 8 +#define DMA_CUED_MULT3_OFF 16 +#define DMA_CUED_REGION_OFF 24 +#define DMA_CUED_XOR_WIN_MSK (0xFC000000) +#else +#define DMA_CUED_MULT1_OFF 2 +#define DMA_CUED_MULT2_OFF 10 +#define DMA_CUED_MULT3_OFF 18 +#define DMA_CUED_REGION_OFF 26 +#define DMA_CUED_XOR_WIN_MSK (0xF0000000) +#endif + +#define DMA_CUED_REGION_MSK 0x3 +#define DMA_RXOR123 0x0 +#define DMA_RXOR124 0x1 +#define DMA_RXOR125 0x2 +#define DMA_RXOR12 0x3 + +/* S/G addresses */ +#define DMA_CDB_SG_SRC 1 +#define DMA_CDB_SG_DST1 2 +#define DMA_CDB_SG_DST2 3 + +/* + * DMAx engines Command Descriptor Block Type + */ +struct dma_cdb { + /* + * Basic CDB structure (Table 20-17, p.499, 440spe_um_1_22.pdf) + */ + u8 pad0[2]; /* reserved */ + u8 attr; /* attributes */ + u8 opc; /* opcode */ + u32 sg1u; /* upper SG1 address */ + u32 sg1l; /* lower SG1 address */ + u32 cnt; /* SG count, 3B used */ + u32 sg2u; /* upper SG2 address */ + u32 sg2l; /* lower SG2 address */ + u32 sg3u; /* upper SG3 address */ + u32 sg3l; /* lower SG3 address */ +}; + +/* + * DMAx hardware registers (p.515 in 440SPe UM 1.22) + */ +struct dma_regs { + u32 cpfpl; + u32 cpfph; + u32 csfpl; + u32 csfph; + u32 dsts; + u32 cfg; + u8 pad0[0x8]; + u16 cpfhp; + u16 cpftp; + u16 csfhp; + u16 csftp; + u8 pad1[0x8]; + u32 acpl; + u32 acph; + u32 s1bpl; + u32 s1bph; + u32 s2bpl; + u32 s2bph; + u32 s3bpl; + u32 s3bph; + u8 pad2[0x10]; + u32 earl; + u32 earh; + u8 pad3[0x8]; + u32 seat; + u32 sead; + u32 op; + u32 fsiz; +}; + +/* + * I2O hardware registers (p.528 in 440SPe UM 1.22) + */ +struct i2o_regs { + u32 ists; + u32 iseat; + u32 isead; + u8 pad0[0x14]; + u32 idbel; + u8 pad1[0xc]; + u32 ihis; + u32 ihim; + u8 pad2[0x8]; + u32 ihiq; + u32 ihoq; + u8 pad3[0x8]; + u32 iopis; + u32 iopim; + u32 iopiq; + u8 iopoq; + u8 pad4[3]; + u16 iiflh; + u16 iiflt; + u16 iiplh; + u16 iiplt; + u16 ioflh; + u16 ioflt; + u16 ioplh; + u16 ioplt; + u32 iidc; + u32 ictl; + u32 ifcpp; + u8 pad5[0x4]; + u16 mfac0; + u16 mfac1; + u16 mfac2; + u16 mfac3; + u16 mfac4; + u16 mfac5; + u16 mfac6; + u16 mfac7; + u16 ifcfh; + u16 ifcht; + u8 pad6[0x4]; + u32 iifmc; + u32 iodb; + u32 iodbc; + u32 ifbal; + u32 ifbah; + u32 ifsiz; + u32 ispd0; + u32 ispd1; + u32 ispd2; + u32 ispd3; + u32 ihipl; + u32 ihiph; + u32 ihopl; + u32 ihoph; + u32 iiipl; + u32 iiiph; + u32 iiopl; + u32 iioph; + u32 ifcpl; + u32 ifcph; + u8 pad7[0x8]; + u32 iopt; +}; + +#endif /* _PPC440SPE_DMA_H */ diff --git a/drivers/dma/ppc4xx/xor.h b/drivers/dma/ppc4xx/xor.h new file mode 100644 index 000000000000..daed7384daac --- /dev/null +++ b/drivers/dma/ppc4xx/xor.h @@ -0,0 +1,110 @@ +/* + * 440SPe's XOR engines support header file + * + * 2006-2009 (C) DENX Software Engineering. + * + * Author: Yuri Tikhonov + * + * This file is licensed under the term of the GNU General Public License + * version 2. The program licensed "as is" without any warranty of any + * kind, whether express or implied. + */ + +#ifndef _PPC440SPE_XOR_H +#define _PPC440SPE_XOR_H + +#include + +/* Number of XOR engines available on the contoller */ +#define XOR_ENGINES_NUM 1 + +/* Number of operands supported in the h/w */ +#define XOR_MAX_OPS 16 + +/* + * XOR Command Block Control Register bits + */ +#define XOR_CBCR_LNK_BIT (1<<31) /* link present */ +#define XOR_CBCR_TGT_BIT (1<<30) /* target present */ +#define XOR_CBCR_CBCE_BIT (1<<29) /* command block compete enable */ +#define XOR_CBCR_RNZE_BIT (1<<28) /* result not zero enable */ +#define XOR_CBCR_XNOR_BIT (1<<15) /* XOR/XNOR */ +#define XOR_CDCR_OAC_MSK (0x7F) /* operand address count */ + +/* + * XORCore Status Register bits + */ +#define XOR_SR_XCP_BIT (1<<31) /* core processing */ +#define XOR_SR_ICB_BIT (1<<17) /* invalid CB */ +#define XOR_SR_IC_BIT (1<<16) /* invalid command */ +#define XOR_SR_IPE_BIT (1<<15) /* internal parity error */ +#define XOR_SR_RNZ_BIT (1<<2) /* result not Zero */ +#define XOR_SR_CBC_BIT (1<<1) /* CB complete */ +#define XOR_SR_CBLC_BIT (1<<0) /* CB list complete */ + +/* + * XORCore Control Set and Reset Register bits + */ +#define XOR_CRSR_XASR_BIT (1<<31) /* soft reset */ +#define XOR_CRSR_XAE_BIT (1<<30) /* enable */ +#define XOR_CRSR_RCBE_BIT (1<<29) /* refetch CB enable */ +#define XOR_CRSR_PAUS_BIT (1<<28) /* pause */ +#define XOR_CRSR_64BA_BIT (1<<27) /* 64/32 CB format */ +#define XOR_CRSR_CLP_BIT (1<<25) /* continue list processing */ + +/* + * XORCore Interrupt Enable Register + */ +#define XOR_IE_ICBIE_BIT (1<<17) /* Invalid Command Block IRQ Enable */ +#define XOR_IE_ICIE_BIT (1<<16) /* Invalid Command IRQ Enable */ +#define XOR_IE_RPTIE_BIT (1<<14) /* Read PLB Timeout Error IRQ Enable */ +#define XOR_IE_CBCIE_BIT (1<<1) /* CB complete interrupt enable */ +#define XOR_IE_CBLCI_BIT (1<<0) /* CB list complete interrupt enable */ + +/* + * XOR Accelerator engine Command Block Type + */ +struct xor_cb { + /* + * Basic 64-bit format XOR CB (Table 19-1, p.463, 440spe_um_1_22.pdf) + */ + u32 cbc; /* control */ + u32 cbbc; /* byte count */ + u32 cbs; /* status */ + u8 pad0[4]; /* reserved */ + u32 cbtah; /* target address high */ + u32 cbtal; /* target address low */ + u32 cblah; /* link address high */ + u32 cblal; /* link address low */ + struct { + u32 h; + u32 l; + } __attribute__ ((packed)) ops[16]; +} __attribute__ ((packed)); + +/* + * XOR hardware registers Table 19-3, UM 1.22 + */ +struct xor_regs { + u32 op_ar[16][2]; /* operand address[0]-high,[1]-low registers */ + u8 pad0[352]; /* reserved */ + u32 cbcr; /* CB control register */ + u32 cbbcr; /* CB byte count register */ + u32 cbsr; /* CB status register */ + u8 pad1[4]; /* reserved */ + u32 cbtahr; /* operand target address high register */ + u32 cbtalr; /* operand target address low register */ + u32 cblahr; /* CB link address high register */ + u32 cblalr; /* CB link address low register */ + u32 crsr; /* control set register */ + u32 crrr; /* control reset register */ + u32 ccbahr; /* current CB address high register */ + u32 ccbalr; /* current CB address low register */ + u32 plbr; /* PLB configuration register */ + u32 ier; /* interrupt enable register */ + u32 pecr; /* parity error count register */ + u32 sr; /* status register */ + u32 revidr; /* revision ID register */ +}; + +#endif /* _PPC440SPE_XOR_H */ -- cgit v1.2.3 From 86ad53f8aa051f5cf8783e28a804adc3ebc391f7 Mon Sep 17 00:00:00 2001 From: Albert Herranz Date: Sat, 12 Dec 2009 06:31:35 +0000 Subject: powerpc: gamecube: device tree Add a device tree source file for the Nintendo GameCube video game console. Signed-off-by: Albert Herranz Acked-by: Segher Boessenkool Signed-off-by: Grant Likely --- .../powerpc/dts-bindings/nintendo/gamecube.txt | 109 ++++++++++++++++++++ arch/powerpc/boot/dts/gamecube.dts | 114 +++++++++++++++++++++ 2 files changed, 223 insertions(+) create mode 100644 Documentation/powerpc/dts-bindings/nintendo/gamecube.txt create mode 100644 arch/powerpc/boot/dts/gamecube.dts (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/nintendo/gamecube.txt b/Documentation/powerpc/dts-bindings/nintendo/gamecube.txt new file mode 100644 index 000000000000..b558585b1aaf --- /dev/null +++ b/Documentation/powerpc/dts-bindings/nintendo/gamecube.txt @@ -0,0 +1,109 @@ + +Nintendo GameCube device tree +============================= + +1) The "flipper" node + + This node represents the multi-function "Flipper" chip, which packages + many of the devices found in the Nintendo GameCube. + + Required properties: + + - compatible : Should be "nintendo,flipper" + +1.a) The Video Interface (VI) node + + Represents the interface between the graphics processor and a external + video encoder. + + Required properties: + + - compatible : should be "nintendo,flipper-vi" + - reg : should contain the VI registers location and length + - interrupts : should contain the VI interrupt + +1.b) The Processor Interface (PI) node + + Represents the data and control interface between the main processor + and graphics and audio processor. + + Required properties: + + - compatible : should be "nintendo,flipper-pi" + - reg : should contain the PI registers location and length + +1.b.i) The "Flipper" interrupt controller node + + Represents the interrupt controller within the "Flipper" chip. + The node for the "Flipper" interrupt controller must be placed under + the PI node. + + Required properties: + + - compatible : should be "nintendo,flipper-pic" + +1.c) The Digital Signal Procesor (DSP) node + + Represents the digital signal processor interface, designed to offload + audio related tasks. + + Required properties: + + - compatible : should be "nintendo,flipper-dsp" + - reg : should contain the DSP registers location and length + - interrupts : should contain the DSP interrupt + +1.c.i) The Auxiliary RAM (ARAM) node + + Represents the non cpu-addressable ram designed mainly to store audio + related information. + The ARAM node must be placed under the DSP node. + + Required properties: + + - compatible : should be "nintendo,flipper-aram" + - reg : should contain the ARAM start (zero-based) and length + +1.d) The Disk Interface (DI) node + + Represents the interface used to communicate with mass storage devices. + + Required properties: + + - compatible : should be "nintendo,flipper-di" + - reg : should contain the DI registers location and length + - interrupts : should contain the DI interrupt + +1.e) The Audio Interface (AI) node + + Represents the interface to the external 16-bit stereo digital-to-analog + converter. + + Required properties: + + - compatible : should be "nintendo,flipper-ai" + - reg : should contain the AI registers location and length + - interrupts : should contain the AI interrupt + +1.f) The Serial Interface (SI) node + + Represents the interface to the four single bit serial interfaces. + The SI is a proprietary serial interface used normally to control gamepads. + It's NOT a RS232-type interface. + + Required properties: + + - compatible : should be "nintendo,flipper-si" + - reg : should contain the SI registers location and length + - interrupts : should contain the SI interrupt + +1.g) The External Interface (EXI) node + + Represents the multi-channel SPI-like interface. + + Required properties: + + - compatible : should be "nintendo,flipper-exi" + - reg : should contain the EXI registers location and length + - interrupts : should contain the EXI interrupt + diff --git a/arch/powerpc/boot/dts/gamecube.dts b/arch/powerpc/boot/dts/gamecube.dts new file mode 100644 index 000000000000..ef3be0e58b02 --- /dev/null +++ b/arch/powerpc/boot/dts/gamecube.dts @@ -0,0 +1,114 @@ +/* + * arch/powerpc/boot/dts/gamecube.dts + * + * Nintendo GameCube platform device tree source + * Copyright (C) 2007-2009 The GameCube Linux Team + * Copyright (C) 2007,2008,2009 Albert Herranz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + */ + +/dts-v1/; + +/ { + model = "nintendo,gamecube"; + compatible = "nintendo,gamecube"; + #address-cells = <1>; + #size-cells = <1>; + + chosen { + bootargs = "root=/dev/gcnsda2 rootwait udbg-immortal"; + }; + + memory { + device_type = "memory"; + reg = <0x00000000 0x01800000>; + }; + + cpus { + #address-cells = <1>; + #size-cells = <0>; + + PowerPC,gekko@0 { + device_type = "cpu"; + reg = <0>; + clock-frequency = <486000000>; /* 486MHz */ + bus-frequency = <162000000>; /* 162MHz core-to-bus 3x */ + timebase-frequency = <40500000>; /* 162MHz / 4 */ + i-cache-line-size = <32>; + d-cache-line-size = <32>; + i-cache-size = <32768>; + d-cache-size = <32768>; + }; + }; + + /* devices contained int the flipper chipset */ + flipper { + #address-cells = <1>; + #size-cells = <1>; + compatible = "nintendo,flipper"; + ranges = <0x0c000000 0x0c000000 0x00010000>; + interrupt-parent = <&PIC>; + + video@0c002000 { + compatible = "nintendo,flipper-vi"; + reg = <0x0c002000 0x100>; + interrupts = <8>; + }; + + processor-interface@0c003000 { + compatible = "nintendo,flipper-pi"; + reg = <0x0c003000 0x100>; + + PIC: pic { + #interrupt-cells = <1>; + compatible = "nintendo,flipper-pic"; + interrupt-controller; + }; + }; + + dsp@0c005000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "nintendo,flipper-dsp"; + reg = <0x0c005000 0x200>; + interrupts = <6>; + + memory@0 { + compatible = "nintendo,flipper-aram"; + reg = <0 0x1000000>; /* 16MB */ + }; + }; + + disk@0c006000 { + compatible = "nintendo,flipper-di"; + reg = <0x0c006000 0x40>; + interrupts = <2>; + }; + + audio@0c006c00 { + compatible = "nintendo,flipper-ai"; + reg = <0x0c006c00 0x20>; + interrupts = <6>; + }; + + gamepad-controller@0c006400 { + compatible = "nintendo,flipper-si"; + reg = <0x0c006400 0x100>; + interrupts = <3>; + }; + + /* External Interface bus */ + exi@0c006800 { + compatible = "nintendo,flipper-exi"; + reg = <0x0c006800 0x40>; + virtual-reg = <0x0c006800>; + interrupts = <4>; + }; + }; +}; + -- cgit v1.2.3 From 7a09116c016611effe7037c8346c5f42a7416718 Mon Sep 17 00:00:00 2001 From: Albert Herranz Date: Sat, 12 Dec 2009 06:31:44 +0000 Subject: powerpc: wii: device tree Add a device tree source file for the Nintendo Wii video game console. Signed-off-by: Albert Herranz Acked-by: Segher Boessenkool Acked-by: Benjamin Herrenschmidt Signed-off-by: Grant Likely --- .../powerpc/dts-bindings/nintendo/wii.txt | 184 +++++++++++++++++ arch/powerpc/boot/dts/wii.dts | 218 +++++++++++++++++++++ 2 files changed, 402 insertions(+) create mode 100644 Documentation/powerpc/dts-bindings/nintendo/wii.txt create mode 100644 arch/powerpc/boot/dts/wii.dts (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/nintendo/wii.txt b/Documentation/powerpc/dts-bindings/nintendo/wii.txt new file mode 100644 index 000000000000..a7e155a023b8 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/nintendo/wii.txt @@ -0,0 +1,184 @@ + +Nintendo Wii device tree +======================== + +0) The root node + + This node represents the Nintendo Wii video game console. + + Required properties: + + - model : Should be "nintendo,wii" + - compatible : Should be "nintendo,wii" + +1) The "hollywood" node + + This node represents the multi-function "Hollywood" chip, which packages + many of the devices found in the Nintendo Wii. + + Required properties: + + - compatible : Should be "nintendo,hollywood" + +1.a) The Video Interface (VI) node + + Represents the interface between the graphics processor and a external + video encoder. + + Required properties: + + - compatible : should be "nintendo,hollywood-vi","nintendo,flipper-vi" + - reg : should contain the VI registers location and length + - interrupts : should contain the VI interrupt + +1.b) The Processor Interface (PI) node + + Represents the data and control interface between the main processor + and graphics and audio processor. + + Required properties: + + - compatible : should be "nintendo,hollywood-pi","nintendo,flipper-pi" + - reg : should contain the PI registers location and length + +1.b.i) The "Flipper" interrupt controller node + + Represents the "Flipper" interrupt controller within the "Hollywood" chip. + The node for the "Flipper" interrupt controller must be placed under + the PI node. + + Required properties: + + - #interrupt-cells : <1> + - compatible : should be "nintendo,flipper-pic" + - interrupt-controller + +1.c) The Digital Signal Procesor (DSP) node + + Represents the digital signal processor interface, designed to offload + audio related tasks. + + Required properties: + + - compatible : should be "nintendo,hollywood-dsp","nintendo,flipper-dsp" + - reg : should contain the DSP registers location and length + - interrupts : should contain the DSP interrupt + +1.d) The Serial Interface (SI) node + + Represents the interface to the four single bit serial interfaces. + The SI is a proprietary serial interface used normally to control gamepads. + It's NOT a RS232-type interface. + + Required properties: + + - compatible : should be "nintendo,hollywood-si","nintendo,flipper-si" + - reg : should contain the SI registers location and length + - interrupts : should contain the SI interrupt + +1.e) The Audio Interface (AI) node + + Represents the interface to the external 16-bit stereo digital-to-analog + converter. + + Required properties: + + - compatible : should be "nintendo,hollywood-ai","nintendo,flipper-ai" + - reg : should contain the AI registers location and length + - interrupts : should contain the AI interrupt + +1.f) The External Interface (EXI) node + + Represents the multi-channel SPI-like interface. + + Required properties: + + - compatible : should be "nintendo,hollywood-exi","nintendo,flipper-exi" + - reg : should contain the EXI registers location and length + - interrupts : should contain the EXI interrupt + +1.g) The Open Host Controller Interface (OHCI) nodes + + Represent the USB 1.x Open Host Controller Interfaces. + + Required properties: + + - compatible : should be "nintendo,hollywood-usb-ohci","usb-ohci" + - reg : should contain the OHCI registers location and length + - interrupts : should contain the OHCI interrupt + +1.h) The Enhanced Host Controller Interface (EHCI) node + + Represents the USB 2.0 Enhanced Host Controller Interface. + + Required properties: + + - compatible : should be "nintendo,hollywood-usb-ehci","usb-ehci" + - reg : should contain the EHCI registers location and length + - interrupts : should contain the EHCI interrupt + +1.i) The Secure Digital Host Controller Interface (SDHCI) nodes + + Represent the Secure Digital Host Controller Interfaces. + + Required properties: + + - compatible : should be "nintendo,hollywood-sdhci","sdhci" + - reg : should contain the SDHCI registers location and length + - interrupts : should contain the SDHCI interrupt + +1.j) The Inter-Processsor Communication (IPC) node + + Represent the Inter-Processor Communication interface. This interface + enables communications between the Broadway and the Starlet processors. + + - compatible : should be "nintendo,hollywood-ipc" + - reg : should contain the IPC registers location and length + - interrupts : should contain the IPC interrupt + +1.k) The "Hollywood" interrupt controller node + + Represents the "Hollywood" interrupt controller within the + "Hollywood" chip. + + Required properties: + + - #interrupt-cells : <1> + - compatible : should be "nintendo,hollywood-pic" + - reg : should contain the controller registers location and length + - interrupt-controller + - interrupts : should contain the cascade interrupt of the "flipper" pic + - interrupt-parent: should contain the phandle of the "flipper" pic + +1.l) The General Purpose I/O (GPIO) controller node + + Represents the dual access 32 GPIO controller interface. + + Required properties: + + - #gpio-cells : <2> + - compatible : should be "nintendo,hollywood-gpio" + - reg : should contain the IPC registers location and length + - gpio-controller + +1.m) The control node + + Represents the control interface used to setup several miscellaneous + settings of the "Hollywood" chip like boot memory mappings, resets, + disk interface mode, etc. + + Required properties: + + - compatible : should be "nintendo,hollywood-control" + - reg : should contain the control registers location and length + +1.n) The Disk Interface (DI) node + + Represents the interface used to communicate with mass storage devices. + + Required properties: + + - compatible : should be "nintendo,hollywood-di" + - reg : should contain the DI registers location and length + - interrupts : should contain the DI interrupt + diff --git a/arch/powerpc/boot/dts/wii.dts b/arch/powerpc/boot/dts/wii.dts new file mode 100644 index 000000000000..77528c9a8dbd --- /dev/null +++ b/arch/powerpc/boot/dts/wii.dts @@ -0,0 +1,218 @@ +/* + * arch/powerpc/boot/dts/wii.dts + * + * Nintendo Wii platform device tree source + * Copyright (C) 2008-2009 The GameCube Linux Team + * Copyright (C) 2008,2009 Albert Herranz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + */ + +/dts-v1/; + +/* + * This is commented-out for now. + * Until a later patch is merged, the kernel can use only the first + * contiguous RAM range and will BUG() if the memreserve is outside + * that range. + */ +/*/memreserve/ 0x10000000 0x0004000;*/ /* DSP RAM */ + +/ { + model = "nintendo,wii"; + compatible = "nintendo,wii"; + #address-cells = <1>; + #size-cells = <1>; + + chosen { + bootargs = "root=/dev/mmcblk0p2 rootwait udbg-immortal"; + }; + + memory { + device_type = "memory"; + reg = <0x00000000 0x01800000 /* MEM1 24MB 1T-SRAM */ + 0x10000000 0x04000000>; /* MEM2 64MB GDDR3 */ + }; + + cpus { + #address-cells = <1>; + #size-cells = <0>; + + PowerPC,broadway@0 { + device_type = "cpu"; + reg = <0>; + clock-frequency = <729000000>; /* 729MHz */ + bus-frequency = <243000000>; /* 243MHz core-to-bus 3x */ + timebase-frequency = <60750000>; /* 243MHz / 4 */ + i-cache-line-size = <32>; + d-cache-line-size = <32>; + i-cache-size = <32768>; + d-cache-size = <32768>; + }; + }; + + /* devices contained in the hollywood chipset */ + hollywood { + #address-cells = <1>; + #size-cells = <1>; + compatible = "nintendo,hollywood"; + ranges = <0x0c000000 0x0c000000 0x01000000 + 0x0d000000 0x0d000000 0x00800000 + 0x0d800000 0x0d800000 0x00800000>; + interrupt-parent = <&PIC0>; + + video@0c002000 { + compatible = "nintendo,hollywood-vi", + "nintendo,flipper-vi"; + reg = <0x0c002000 0x100>; + interrupts = <8>; + }; + + processor-interface@0c003000 { + compatible = "nintendo,hollywood-pi", + "nintendo,flipper-pi"; + reg = <0x0c003000 0x100>; + + PIC0: pic0 { + #interrupt-cells = <1>; + compatible = "nintendo,flipper-pic"; + interrupt-controller; + }; + }; + + dsp@0c005000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "nintendo,hollywood-dsp", + "nintendo,flipper-dsp"; + reg = <0x0c005000 0x200>; + interrupts = <6>; + }; + + gamepad-controller@0d006400 { + compatible = "nintendo,hollywood-si", + "nintendo,flipper-si"; + reg = <0x0d006400 0x100>; + interrupts = <3>; + }; + + audio@0c006c00 { + compatible = "nintendo,hollywood-ai", + "nintendo,flipper-ai"; + reg = <0x0d006c00 0x20>; + interrupts = <6>; + }; + + /* External Interface bus */ + exi@0d006800 { + compatible = "nintendo,hollywood-exi", + "nintendo,flipper-exi"; + reg = <0x0d006800 0x40>; + virtual-reg = <0x0d006800>; + interrupts = <4>; + }; + + usb@0d040000 { + compatible = "nintendo,hollywood-usb-ehci", + "usb-ehci"; + reg = <0x0d040000 0x100>; + interrupts = <4>; + interrupt-parent = <&PIC1>; + }; + + usb@0d050000 { + compatible = "nintendo,hollywood-usb-ohci", + "usb-ohci"; + reg = <0x0d050000 0x100>; + interrupts = <5>; + interrupt-parent = <&PIC1>; + }; + + usb@0d060000 { + compatible = "nintendo,hollywood-usb-ohci", + "usb-ohci"; + reg = <0x0d060000 0x100>; + interrupts = <6>; + interrupt-parent = <&PIC1>; + }; + + sd@0d070000 { + compatible = "nintendo,hollywood-sdhci", + "sdhci"; + reg = <0x0d070000 0x200>; + interrupts = <7>; + interrupt-parent = <&PIC1>; + }; + + sdio@0d080000 { + compatible = "nintendo,hollywood-sdhci", + "sdhci"; + reg = <0x0d080000 0x200>; + interrupts = <8>; + interrupt-parent = <&PIC1>; + }; + + ipc@0d000000 { + compatible = "nintendo,hollywood-ipc"; + reg = <0x0d000000 0x10>; + interrupts = <30>; + interrupt-parent = <&PIC1>; + }; + + PIC1: pic1@0d800030 { + #interrupt-cells = <1>; + compatible = "nintendo,hollywood-pic"; + reg = <0x0d800030 0x10>; + interrupt-controller; + interrupts = <14>; + }; + + GPIO: gpio@0d8000c0 { + #gpio-cells = <2>; + compatible = "nintendo,hollywood-gpio"; + reg = <0x0d8000c0 0x40>; + gpio-controller; + + /* + * This is commented out while a standard binding + * for i2c over gpio is defined. + */ + /* + i2c-video { + #address-cells = <1>; + #size-cells = <0>; + compatible = "i2c-gpio"; + + gpios = <&GPIO 15 0 + &GPIO 14 0>; + clock-frequency = <250000>; + no-clock-stretching; + scl-is-open-drain; + sda-is-open-drain; + sda-enforce-dir; + + AVE: audio-video-encoder@70 { + compatible = "nintendo,wii-audio-video-encoder"; + reg = <0x70>; + }; + }; + */ + }; + + control@0d800100 { + compatible = "nintendo,hollywood-control"; + reg = <0x0d800100 0x300>; + }; + + disk@0d806000 { + compatible = "nintendo,hollywood-di"; + reg = <0x0d806000 0x40>; + interrupts = <2>; + }; + }; +}; + -- cgit v1.2.3 From 43a705076e51c5af21ec4260a35699775ea298f5 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 14 Dec 2009 12:49:55 +1100 Subject: md: support updating bitmap parameters via sysfs. A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', or 'file' or 'sector offset from metadata'. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always over-ridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time. Signed-off-by: NeilBrown Reviewed-by: Andre Noll --- Documentation/md.txt | 29 ++++++++ drivers/md/bitmap.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++ drivers/md/md.c | 5 +- drivers/md/md.h | 3 +- 4 files changed, 240 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index 4edd39ec7db9..18fad6876228 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -296,6 +296,35 @@ All md devices contain: active-idle like active, but no writes have been seen for a while (safe_mode_delay). + bitmap/location + This indicates where the write-intent bitmap for the array is + stored. + It can be one of "none", "file" or "[+-]N". + "file" may later be extended to "file:/file/name" + "[+-]N" means that many sectors from the start of the metadata. + This is replicated on all devices. For arrays with externally + managed metadata, the offset is from the beginning of the + device. + bitmap/chunksize + The size, in bytes, of the chunk which will be represented by a + single bit. For RAID456, it is a portion of an individual + device. For RAID10, it is a portion of the array. For RAID1, it + is both (they come to the same thing). + bitmap/time_base + The time, in seconds, between looking for bits in the bitmap to + be cleared. In the current implementation, a bit will be cleared + between 2 and 3 times "time_base" after all the covered blocks + are known to be in-sync. + bitmap/backlog + When write-mostly devices are active in a RAID1, write requests + to those devices proceed in the background - the filesystem (or + other user of the device) does not have to wait for them. + 'backlog' sets a limit on the number of concurrent background + writes. If there are more than this, new writes will by + synchronous. + + + As component devices are added to an md array, they appear in the 'md' directory as new directories named diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 958865445f06..24fff75b2191 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -510,6 +510,9 @@ void bitmap_update_sb(struct bitmap *bitmap) bitmap->events_cleared = bitmap->mddev->events; sb->events_cleared = cpu_to_le64(bitmap->events_cleared); } + /* Just in case these have been changed via sysfs: */ + sb->daemon_sleep = cpu_to_le32(bitmap->mddev->bitmap_info.daemon_sleep/HZ); + sb->write_behind = cpu_to_le32(bitmap->mddev->bitmap_info.max_write_behind); kunmap_atomic(sb, KM_USER0); write_page(bitmap, bitmap->sb_page, 1); } @@ -1714,6 +1717,208 @@ int bitmap_create(mddev_t *mddev) return err; } +static ssize_t +location_show(mddev_t *mddev, char *page) +{ + ssize_t len; + if (mddev->bitmap_info.file) { + len = sprintf(page, "file"); + } else if (mddev->bitmap_info.offset) { + len = sprintf(page, "%+lld", (long long)mddev->bitmap_info.offset); + } else + len = sprintf(page, "none"); + len += sprintf(page+len, "\n"); + return len; +} + +static ssize_t +location_store(mddev_t *mddev, const char *buf, size_t len) +{ + + if (mddev->pers) { + if (!mddev->pers->quiesce) + return -EBUSY; + if (mddev->recovery || mddev->sync_thread) + return -EBUSY; + } + + if (mddev->bitmap || mddev->bitmap_info.file || + mddev->bitmap_info.offset) { + /* bitmap already configured. Only option is to clear it */ + if (strncmp(buf, "none", 4) != 0) + return -EBUSY; + if (mddev->pers) { + mddev->pers->quiesce(mddev, 1); + bitmap_destroy(mddev); + mddev->pers->quiesce(mddev, 0); + } + mddev->bitmap_info.offset = 0; + if (mddev->bitmap_info.file) { + struct file *f = mddev->bitmap_info.file; + mddev->bitmap_info.file = NULL; + restore_bitmap_write_access(f); + fput(f); + } + } else { + /* No bitmap, OK to set a location */ + long long offset; + if (strncmp(buf, "none", 4) == 0) + /* nothing to be done */; + else if (strncmp(buf, "file:", 5) == 0) { + /* Not supported yet */ + return -EINVAL; + } else { + int rv; + if (buf[0] == '+') + rv = strict_strtoll(buf+1, 10, &offset); + else + rv = strict_strtoll(buf, 10, &offset); + if (rv) + return rv; + if (offset == 0) + return -EINVAL; + if (mddev->major_version == 0 && + offset != mddev->bitmap_info.default_offset) + return -EINVAL; + mddev->bitmap_info.offset = offset; + if (mddev->pers) { + mddev->pers->quiesce(mddev, 1); + rv = bitmap_create(mddev); + if (rv) { + bitmap_destroy(mddev); + mddev->bitmap_info.offset = 0; + } + mddev->pers->quiesce(mddev, 0); + if (rv) + return rv; + } + } + } + if (!mddev->external) { + /* Ensure new bitmap info is stored in + * metadata promptly. + */ + set_bit(MD_CHANGE_DEVS, &mddev->flags); + md_wakeup_thread(mddev->thread); + } + return len; +} + +static struct md_sysfs_entry bitmap_location = +__ATTR(location, S_IRUGO|S_IWUSR, location_show, location_store); + +static ssize_t +timeout_show(mddev_t *mddev, char *page) +{ + ssize_t len; + unsigned long secs = mddev->bitmap_info.daemon_sleep / HZ; + unsigned long jifs = mddev->bitmap_info.daemon_sleep % HZ; + + len = sprintf(page, "%lu", secs); + if (jifs) + len += sprintf(page+len, ".%03u", jiffies_to_msecs(jifs)); + len += sprintf(page+len, "\n"); + return len; +} + +static ssize_t +timeout_store(mddev_t *mddev, const char *buf, size_t len) +{ + /* timeout can be set at any time */ + unsigned long timeout; + int rv = strict_strtoul_scaled(buf, &timeout, 4); + if (rv) + return rv; + + /* just to make sure we don't overflow... */ + if (timeout >= LONG_MAX / HZ) + return -EINVAL; + + timeout = timeout * HZ / 10000; + + if (timeout >= MAX_SCHEDULE_TIMEOUT) + timeout = MAX_SCHEDULE_TIMEOUT-1; + if (timeout < 1) + timeout = 1; + mddev->bitmap_info.daemon_sleep = timeout; + if (mddev->thread) { + /* if thread->timeout is MAX_SCHEDULE_TIMEOUT, then + * the bitmap is all clean and we don't need to + * adjust the timeout right now + */ + if (mddev->thread->timeout < MAX_SCHEDULE_TIMEOUT) { + mddev->thread->timeout = timeout; + md_wakeup_thread(mddev->thread); + } + } + return len; +} + +static struct md_sysfs_entry bitmap_timeout = +__ATTR(time_base, S_IRUGO|S_IWUSR, timeout_show, timeout_store); + +static ssize_t +backlog_show(mddev_t *mddev, char *page) +{ + return sprintf(page, "%lu\n", mddev->bitmap_info.max_write_behind); +} + +static ssize_t +backlog_store(mddev_t *mddev, const char *buf, size_t len) +{ + unsigned long backlog; + int rv = strict_strtoul(buf, 10, &backlog); + if (rv) + return rv; + if (backlog > COUNTER_MAX) + return -EINVAL; + mddev->bitmap_info.max_write_behind = backlog; + return len; +} + +static struct md_sysfs_entry bitmap_backlog = +__ATTR(backlog, S_IRUGO|S_IWUSR, backlog_show, backlog_store); + +static ssize_t +chunksize_show(mddev_t *mddev, char *page) +{ + return sprintf(page, "%lu\n", mddev->bitmap_info.chunksize); +} + +static ssize_t +chunksize_store(mddev_t *mddev, const char *buf, size_t len) +{ + /* Can only be changed when no bitmap is active */ + int rv; + unsigned long csize; + if (mddev->bitmap) + return -EBUSY; + rv = strict_strtoul(buf, 10, &csize); + if (rv) + return rv; + if (csize < 512 || + !is_power_of_2(csize)) + return -EINVAL; + mddev->bitmap_info.chunksize = csize; + return len; +} + +static struct md_sysfs_entry bitmap_chunksize = +__ATTR(chunksize, S_IRUGO|S_IWUSR, chunksize_show, chunksize_store); + +static struct attribute *md_bitmap_attrs[] = { + &bitmap_location.attr, + &bitmap_timeout.attr, + &bitmap_backlog.attr, + &bitmap_chunksize.attr, + NULL +}; +struct attribute_group md_bitmap_group = { + .name = "bitmap", + .attrs = md_bitmap_attrs, +}; + + /* the bitmap API -- for raid personalities */ EXPORT_SYMBOL(bitmap_startwrite); EXPORT_SYMBOL(bitmap_endwrite); diff --git a/drivers/md/md.c b/drivers/md/md.c index 93287f88f1f4..859edbf8c9b0 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4018,6 +4018,7 @@ static void mddev_delayed_delete(struct work_struct *ws) mddev->sysfs_action = NULL; mddev->private = NULL; } + sysfs_remove_group(&mddev->kobj, &md_bitmap_group); kobject_del(&mddev->kobj); kobject_put(&mddev->kobj); } @@ -4109,6 +4110,8 @@ static int md_alloc(dev_t dev, char *name) disk->disk_name); error = 0; } + if (sysfs_create_group(&mddev->kobj, &md_bitmap_group)) + printk(KERN_DEBUG "pointless warning\n"); abort: mutex_unlock(&disks_mutex); if (!error) { @@ -4434,7 +4437,7 @@ static int deny_bitmap_write_access(struct file * file) return 0; } -static void restore_bitmap_write_access(struct file *file) +void restore_bitmap_write_access(struct file *file) { struct inode *inode = file->f_mapping->host; diff --git a/drivers/md/md.h b/drivers/md/md.h index b5c306925d94..fce02073f1a4 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -372,7 +372,7 @@ struct md_sysfs_entry { ssize_t (*show)(mddev_t *, char *); ssize_t (*store)(mddev_t *, const char *, size_t); }; - +extern struct attribute_group md_bitmap_group; static inline char * mdname (mddev_t * mddev) { @@ -465,5 +465,6 @@ extern int md_check_no_bitmap(mddev_t *mddev); extern int md_integrity_register(mddev_t *mddev); extern void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev); extern int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale); +extern void restore_bitmap_write_access(struct file *file); #endif /* _MD_MD_H */ -- cgit v1.2.3 From ece5cff0da9e696c360fff592cb5f51b6419e4d6 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 14 Dec 2009 12:49:56 +1100 Subject: md: Support write-intent bitmaps with externally managed metadata. In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Config must be done via sysfs and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again. Signed-off-by: NeilBrown --- Documentation/md.txt | 16 ++++++ drivers/md/bitmap.c | 142 ++++++++++++++++++++++++++++++++++++++++++--------- drivers/md/bitmap.h | 11 +--- drivers/md/md.h | 1 + 4 files changed, 137 insertions(+), 33 deletions(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index 18fad6876228..21d26fb5d02b 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -322,6 +322,22 @@ All md devices contain: 'backlog' sets a limit on the number of concurrent background writes. If there are more than this, new writes will by synchronous. + bitmap/metadata + This can be either 'internal' or 'external'. + 'internal' is the default and means the metadata for the bitmap + is stored in the first 256 bytes of the allocated space and is + managed by the md module. + 'external' means that bitmap metadata is managed externally to + the kernel (i.e. by some userspace program) + bitmap/can_clear + This is either 'true' or 'false'. If 'true', then bits in the + bitmap will be cleared when the corresponding blocks are thought + to be in-sync. If 'false', bits will never be cleared. + This is automatically set to 'false' if a write happens on a + degraded array, or if the array becomes degraded during a write. + When metadata is managed externally, it should be set to true + once the array becomes non-degraded, and this fact has been + recorded in the metadata. diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index 62958491f329..de5c42df8d17 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -497,6 +497,8 @@ void bitmap_update_sb(struct bitmap *bitmap) if (!bitmap || !bitmap->mddev) /* no bitmap for this array */ return; + if (bitmap->mddev->bitmap_info.external) + return; spin_lock_irqsave(&bitmap->lock, flags); if (!bitmap->sb_page) { /* no superblock */ spin_unlock_irqrestore(&bitmap->lock, flags); @@ -676,16 +678,26 @@ static int bitmap_mask_state(struct bitmap *bitmap, enum bitmap_state bits, * general bitmap file operations */ +/* + * on-disk bitmap: + * + * Use one bit per "chunk" (block set). We do the disk I/O on the bitmap + * file a page at a time. There's a superblock at the start of the file. + */ /* calculate the index of the page that contains this bit */ -static inline unsigned long file_page_index(unsigned long chunk) +static inline unsigned long file_page_index(struct bitmap *bitmap, unsigned long chunk) { - return CHUNK_BIT_OFFSET(chunk) >> PAGE_BIT_SHIFT; + if (!bitmap->mddev->bitmap_info.external) + chunk += sizeof(bitmap_super_t) << 3; + return chunk >> PAGE_BIT_SHIFT; } /* calculate the (bit) offset of this bit within a page */ -static inline unsigned long file_page_offset(unsigned long chunk) +static inline unsigned long file_page_offset(struct bitmap *bitmap, unsigned long chunk) { - return CHUNK_BIT_OFFSET(chunk) & (PAGE_BITS - 1); + if (!bitmap->mddev->bitmap_info.external) + chunk += sizeof(bitmap_super_t) << 3; + return chunk & (PAGE_BITS - 1); } /* @@ -698,8 +710,9 @@ static inline unsigned long file_page_offset(unsigned long chunk) static inline struct page *filemap_get_page(struct bitmap *bitmap, unsigned long chunk) { - if (file_page_index(chunk) >= bitmap->file_pages) return NULL; - return bitmap->filemap[file_page_index(chunk) - file_page_index(0)]; + if (file_page_index(bitmap, chunk) >= bitmap->file_pages) return NULL; + return bitmap->filemap[file_page_index(bitmap, chunk) + - file_page_index(bitmap, 0)]; } @@ -722,7 +735,7 @@ static void bitmap_file_unmap(struct bitmap *bitmap) spin_unlock_irqrestore(&bitmap->lock, flags); while (pages--) - if (map[pages]->index != 0) /* 0 is sb_page, release it below */ + if (map[pages] != sb_page) /* 0 is sb_page, release it below */ free_buffers(map[pages]); kfree(map); kfree(attr); @@ -833,7 +846,7 @@ static void bitmap_file_set_bit(struct bitmap *bitmap, sector_t block) page = filemap_get_page(bitmap, chunk); if (!page) return; - bit = file_page_offset(chunk); + bit = file_page_offset(bitmap, chunk); /* set the bit */ kaddr = kmap_atomic(page, KM_USER0); @@ -931,14 +944,17 @@ static int bitmap_init_from_disk(struct bitmap *bitmap, sector_t start) "recovery\n", bmname(bitmap)); bytes = (chunks + 7) / 8; + if (!bitmap->mddev->bitmap_info.external) + bytes += sizeof(bitmap_super_t); - num_pages = (bytes + sizeof(bitmap_super_t) + PAGE_SIZE - 1) / PAGE_SIZE; + + num_pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE; - if (file && i_size_read(file->f_mapping->host) < bytes + sizeof(bitmap_super_t)) { + if (file && i_size_read(file->f_mapping->host) < bytes) { printk(KERN_INFO "%s: bitmap file too short %lu < %lu\n", bmname(bitmap), (unsigned long) i_size_read(file->f_mapping->host), - bytes + sizeof(bitmap_super_t)); + bytes); goto err; } @@ -959,17 +975,16 @@ static int bitmap_init_from_disk(struct bitmap *bitmap, sector_t start) for (i = 0; i < chunks; i++) { int b; - index = file_page_index(i); - bit = file_page_offset(i); + index = file_page_index(bitmap, i); + bit = file_page_offset(bitmap, i); if (index != oldindex) { /* this is a new page, read it in */ int count; /* unmap the old page, we're done with it */ if (index == num_pages-1) - count = bytes + sizeof(bitmap_super_t) - - index * PAGE_SIZE; + count = bytes - index * PAGE_SIZE; else count = PAGE_SIZE; - if (index == 0) { + if (index == 0 && bitmap->sb_page) { /* * if we're here then the superblock page * contains some bits (PAGE_SIZE != sizeof sb) @@ -1164,7 +1179,8 @@ void bitmap_daemon_work(mddev_t *mddev) /* We are possibly going to clear some bits, so make * sure that events_cleared is up-to-date. */ - if (bitmap->need_sync) { + if (bitmap->need_sync && + bitmap->mddev->bitmap_info.external == 0) { bitmap_super_t *sb; bitmap->need_sync = 0; sb = kmap_atomic(bitmap->sb_page, KM_USER0); @@ -1174,7 +1190,8 @@ void bitmap_daemon_work(mddev_t *mddev) write_page(bitmap, bitmap->sb_page, 1); } spin_lock_irqsave(&bitmap->lock, flags); - clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN); + if (!bitmap->need_sync) + clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN); } bmc = bitmap_get_counter(bitmap, (sector_t)j << CHUNK_BLOCK_SHIFT(bitmap), @@ -1189,7 +1206,7 @@ void bitmap_daemon_work(mddev_t *mddev) if (*bmc == 2) { *bmc=1; /* maybe clear the bit next time */ set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN); - } else if (*bmc == 1) { + } else if (*bmc == 1 && !bitmap->need_sync) { /* we can clear the bit */ *bmc = 0; bitmap_count_page(bitmap, @@ -1199,9 +1216,11 @@ void bitmap_daemon_work(mddev_t *mddev) /* clear the bit */ paddr = kmap_atomic(page, KM_USER0); if (bitmap->flags & BITMAP_HOSTENDIAN) - clear_bit(file_page_offset(j), paddr); + clear_bit(file_page_offset(bitmap, j), + paddr); else - ext2_clear_bit(file_page_offset(j), paddr); + ext2_clear_bit(file_page_offset(bitmap, j), + paddr); kunmap_atomic(paddr, KM_USER0); } } else @@ -1356,6 +1375,7 @@ void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long secto bitmap->events_cleared < bitmap->mddev->events) { bitmap->events_cleared = bitmap->mddev->events; bitmap->need_sync = 1; + sysfs_notify_dirent(bitmap->sysfs_can_clear); } if (!success && ! (*bmc & NEEDED_MASK)) @@ -1613,6 +1633,9 @@ void bitmap_destroy(mddev_t *mddev) if (mddev->thread) mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT; + if (bitmap->sysfs_can_clear) + sysfs_put(bitmap->sysfs_can_clear); + bitmap_free(bitmap); } @@ -1629,6 +1652,7 @@ int bitmap_create(mddev_t *mddev) struct file *file = mddev->bitmap_info.file; int err; sector_t start; + struct sysfs_dirent *bm; BUILD_BUG_ON(sizeof(bitmap_super_t) != 256); @@ -1648,6 +1672,13 @@ int bitmap_create(mddev_t *mddev) bitmap->mddev = mddev; + bm = sysfs_get_dirent(mddev->kobj.sd, "bitmap"); + if (bm) { + bitmap->sysfs_can_clear = sysfs_get_dirent(bm, "can_clear"); + sysfs_put(bm); + } else + bitmap->sysfs_can_clear = NULL; + bitmap->file = file; if (file) { get_file(file); @@ -1658,7 +1689,16 @@ int bitmap_create(mddev_t *mddev) vfs_fsync(file, file->f_dentry, 1); } /* read superblock from bitmap file (this sets mddev->bitmap_info.chunksize) */ - err = bitmap_read_sb(bitmap); + if (!mddev->bitmap_info.external) + err = bitmap_read_sb(bitmap); + else { + err = 0; + if (mddev->bitmap_info.chunksize == 0 || + mddev->bitmap_info.daemon_sleep == 0) + /* chunksize and time_base need to be + * set first. */ + err = -EINVAL; + } if (err) goto error; @@ -1777,7 +1817,8 @@ location_store(mddev_t *mddev, const char *buf, size_t len) return rv; if (offset == 0) return -EINVAL; - if (mddev->major_version == 0 && + if (mddev->bitmap_info.external == 0 && + mddev->major_version == 0 && offset != mddev->bitmap_info.default_offset) return -EINVAL; mddev->bitmap_info.offset = offset; @@ -1906,11 +1947,66 @@ chunksize_store(mddev_t *mddev, const char *buf, size_t len) static struct md_sysfs_entry bitmap_chunksize = __ATTR(chunksize, S_IRUGO|S_IWUSR, chunksize_show, chunksize_store); +static ssize_t metadata_show(mddev_t *mddev, char *page) +{ + return sprintf(page, "%s\n", (mddev->bitmap_info.external + ? "external" : "internal")); +} + +static ssize_t metadata_store(mddev_t *mddev, const char *buf, size_t len) +{ + if (mddev->bitmap || + mddev->bitmap_info.file || + mddev->bitmap_info.offset) + return -EBUSY; + if (strncmp(buf, "external", 8) == 0) + mddev->bitmap_info.external = 1; + else if (strncmp(buf, "internal", 8) == 0) + mddev->bitmap_info.external = 0; + else + return -EINVAL; + return len; +} + +static struct md_sysfs_entry bitmap_metadata = +__ATTR(metadata, S_IRUGO|S_IWUSR, metadata_show, metadata_store); + +static ssize_t can_clear_show(mddev_t *mddev, char *page) +{ + int len; + if (mddev->bitmap) + len = sprintf(page, "%s\n", (mddev->bitmap->need_sync ? + "false" : "true")); + else + len = sprintf(page, "\n"); + return len; +} + +static ssize_t can_clear_store(mddev_t *mddev, const char *buf, size_t len) +{ + if (mddev->bitmap == NULL) + return -ENOENT; + if (strncmp(buf, "false", 5) == 0) + mddev->bitmap->need_sync = 1; + else if (strncmp(buf, "true", 4) == 0) { + if (mddev->degraded) + return -EBUSY; + mddev->bitmap->need_sync = 0; + } else + return -EINVAL; + return len; +} + +static struct md_sysfs_entry bitmap_can_clear = +__ATTR(can_clear, S_IRUGO|S_IWUSR, can_clear_show, can_clear_store); + static struct attribute *md_bitmap_attrs[] = { &bitmap_location.attr, &bitmap_timeout.attr, &bitmap_backlog.attr, &bitmap_chunksize.attr, + &bitmap_metadata.attr, + &bitmap_can_clear.attr, NULL }; struct attribute_group md_bitmap_group = { diff --git a/drivers/md/bitmap.h b/drivers/md/bitmap.h index 50ee4240f5db..cb821d76d1b4 100644 --- a/drivers/md/bitmap.h +++ b/drivers/md/bitmap.h @@ -118,16 +118,6 @@ typedef __u16 bitmap_counter_t; (CHUNK_BLOCK_SHIFT(bitmap) + PAGE_COUNTER_SHIFT - 1) #define PAGEPTR_BLOCK_MASK(bitmap) (PAGEPTR_BLOCK_RATIO(bitmap) - 1) -/* - * on-disk bitmap: - * - * Use one bit per "chunk" (block set). We do the disk I/O on the bitmap - * file a page at a time. There's a superblock at the start of the file. - */ - -/* map chunks (bits) to file pages - offset by the size of the superblock */ -#define CHUNK_BIT_OFFSET(chunk) ((chunk) + (sizeof(bitmap_super_t) << 3)) - #endif /* @@ -250,6 +240,7 @@ struct bitmap { wait_queue_head_t write_wait; wait_queue_head_t overflow_wait; + struct sysfs_dirent *sysfs_can_clear; }; /* the bitmap API */ diff --git a/drivers/md/md.h b/drivers/md/md.h index fce02073f1a4..d9138885b87f 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -296,6 +296,7 @@ struct mddev_s unsigned long chunksize; unsigned long daemon_sleep; /* how many seconds between updates? */ unsigned long max_write_behind; /* write-behind mode */ + int external; } bitmap_info; struct list_head all_mddevs; -- cgit v1.2.3 From 06e3c817b750c131a20e82eed57a17841ea88ed2 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Sat, 12 Dec 2009 21:17:12 -0700 Subject: md: add 'recovery_start' per-device sysfs attribute Enable external metadata arrays to manage rebuild checkpointing via a md/dev-XXX/recovery_start attribute which reflects rdev->recovery_offset Also update resync_start_store to allow 'none' to be written, for consistency. Signed-off-by: Dan Williams Signed-off-by: NeilBrown --- Documentation/md.txt | 27 +++++++++++++++++++++++---- drivers/md/md.c | 41 ++++++++++++++++++++++++++++++++++++++++- 2 files changed, 63 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index 21d26fb5d02b..188f4768f1d5 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -233,9 +233,9 @@ All md devices contain: resync_start The point at which resync should start. If no resync is needed, - this will be a very large number. At array creation it will - default to 0, though starting the array as 'clean' will - set it much larger. + this will be a very large number (or 'none' since 2.6.30-rc1). At + array creation it will default to 0, though starting the array as + 'clean' will set it much larger. new_dev This file can be written but not read. The value written should @@ -379,8 +379,9 @@ Each directory contains: Writing "writemostly" sets the writemostly flag. Writing "-writemostly" clears the writemostly flag. Writing "blocked" sets the "blocked" flag. - Writing "-blocked" clear the "blocked" flag and allows writes + Writing "-blocked" clears the "blocked" flag and allows writes to complete. + Writing "in_sync" sets the in_sync flag. This file responds to select/poll. Any change to 'faulty' or 'blocked' causes an event. @@ -417,6 +418,24 @@ Each directory contains: array. If a value less than the current component_size is written, it will be rejected. + recovery_start + + When the device is not 'in_sync', this records the number of + sectors from the start of the device which are known to be + correct. This is normally zero, but during a recovery + operation is will steadily increase, and if the recovery is + interrupted, restoring this value can cause recovery to + avoid repeating the earlier blocks. With v1.x metadata, this + value is saved and restored automatically. + + This can be set whenever the device is not an active member of + the array, either before the array is activated, or before + the 'slot' is set. + + Setting this to 'none' is equivalent to setting 'in_sync'. + Setting to any other value also clears the 'in_sync' flag. + + An active md device will also contain and entry for each active device in the array. These are named diff --git a/drivers/md/md.c b/drivers/md/md.c index ea64a68e9c75..e1f3c1715cca 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2551,12 +2551,49 @@ rdev_size_store(mdk_rdev_t *rdev, const char *buf, size_t len) static struct rdev_sysfs_entry rdev_size = __ATTR(size, S_IRUGO|S_IWUSR, rdev_size_show, rdev_size_store); + +static ssize_t recovery_start_show(mdk_rdev_t *rdev, char *page) +{ + unsigned long long recovery_start = rdev->recovery_offset; + + if (test_bit(In_sync, &rdev->flags) || + recovery_start == MaxSector) + return sprintf(page, "none\n"); + + return sprintf(page, "%llu\n", recovery_start); +} + +static ssize_t recovery_start_store(mdk_rdev_t *rdev, const char *buf, size_t len) +{ + unsigned long long recovery_start; + + if (cmd_match(buf, "none")) + recovery_start = MaxSector; + else if (strict_strtoull(buf, 10, &recovery_start)) + return -EINVAL; + + if (rdev->mddev->pers && + rdev->raid_disk >= 0) + return -EBUSY; + + rdev->recovery_offset = recovery_start; + if (recovery_start == MaxSector) + set_bit(In_sync, &rdev->flags); + else + clear_bit(In_sync, &rdev->flags); + return len; +} + +static struct rdev_sysfs_entry rdev_recovery_start = +__ATTR(recovery_start, S_IRUGO|S_IWUSR, recovery_start_show, recovery_start_store); + static struct attribute *rdev_default_attrs[] = { &rdev_state.attr, &rdev_errors.attr, &rdev_slot.attr, &rdev_offset.attr, &rdev_size.attr, + &rdev_recovery_start.attr, NULL, }; static ssize_t @@ -3101,7 +3138,9 @@ resync_start_store(mddev_t *mddev, const char *buf, size_t len) if (mddev->pers) return -EBUSY; - if (!*buf || (*e && *e != '\n')) + if (cmd_match(buf, "none")) + n = MaxSector; + else if (!*buf || (*e && *e != '\n')) return -EINVAL; mddev->recovery_cp = n; -- cgit v1.2.3 From 06f8bda8324fa8bf39eed81d8b3df08063a37696 Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Mon, 14 Dec 2009 11:06:15 +0900 Subject: x86: Remove usedac in feature-removal-schedule.txt The reason of removal, "replaced by allowdac and no dac combination" is incorrect. There is no way to do the same thing with "allowdac" and "nodac" combination. The usedac option enables us to stop via_no_dac() setting forbid_dac to 1. That is, someone who uses VIA bridges can use DAC with this option even if some of VIA bridges seem to be broken about DAC. Signed-off-by: FUJITA Tomonori Acked-by: WANG Cong Cc: gcosta@redhat.com LKML-Reference: <20091214104423X.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar --- Documentation/feature-removal-schedule.txt | 7 ------- 1 file changed, 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 2a4d77946c7d..eb2c138c277c 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -291,13 +291,6 @@ Who: Michael Buesch --------------------------- -What: usedac i386 kernel parameter -When: 2.6.27 -Why: replaced by allowdac and no dac combination -Who: Glauber Costa - ---------------------------- - What: print_fn_descriptor_symbol() When: October 2009 Why: The %pF vsprintf format provides the same functionality in a -- cgit v1.2.3 From 7a92263705435d046d37a0990d0edfcb517f7ad3 Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Mon, 14 Dec 2009 14:52:10 +0100 Subject: netfilter: xtables: document minimal required version For both .33 and .32-stable. Signed-off-by: Jan Engelhardt Cc: stable@kernel.org Signed-off-by: Patrick McHardy --- Documentation/Changes | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/Changes b/Documentation/Changes index 6d0f1efc5bf6..f08b313cd235 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -49,6 +49,8 @@ o oprofile 0.9 # oprofiled --version o udev 081 # udevinfo -V o grub 0.93 # grub --version o mcelog 0.6 +o iptables 1.4.1 # iptables -V + Kernel compilation ================== -- cgit v1.2.3 From fb0bbb92d42d5bd0ab224605444efdfed06d6934 Mon Sep 17 00:00:00 2001 From: William Allen Simpson Date: Sun, 13 Dec 2009 15:12:46 -0500 Subject: Documentation: rw_lock lessons learned In recent months, two different network projects erroneously strayed down the rw_lock path. Update the Documentation based upon comments by Eric Dumazet and Paul E. McKenney in those threads. Further updates await somebody else with more expertise. Changes: - Merged with extensive content by Stephen Hemminger. - Fix one of the comments by Linus Torvalds. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Paul E. McKenney Signed-off-by: Linus Torvalds --- Documentation/spinlocks.txt | 184 ++++++++++++++++++++------------------------ 1 file changed, 84 insertions(+), 100 deletions(-) (limited to 'Documentation') diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt index 619699dde593..178c831b907d 100644 --- a/Documentation/spinlocks.txt +++ b/Documentation/spinlocks.txt @@ -1,73 +1,8 @@ -SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED defeat lockdep state tracking and -are hence deprecated. +Lesson 1: Spin locks -Please use DEFINE_SPINLOCK()/DEFINE_RWLOCK() or -__SPIN_LOCK_UNLOCKED()/__RW_LOCK_UNLOCKED() as appropriate for static -initialization. - -Most of the time, you can simply turn: - - static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; - -into: - - static DEFINE_SPINLOCK(xxx_lock); - -Static structure member variables go from: - - struct foo bar { - .lock = SPIN_LOCK_UNLOCKED; - }; - -to: - - struct foo bar { - .lock = __SPIN_LOCK_UNLOCKED(bar.lock); - }; - -Declaration of static rw_locks undergo a similar transformation. - -Dynamic initialization, when necessary, may be performed as -demonstrated below. - - spinlock_t xxx_lock; - rwlock_t xxx_rw_lock; - - static int __init xxx_init(void) - { - spin_lock_init(&xxx_lock); - rwlock_init(&xxx_rw_lock); - ... - } - - module_init(xxx_init); - -The following discussion is still valid, however, with the dynamic -initialization of spinlocks or with DEFINE_SPINLOCK, etc., used -instead of SPIN_LOCK_UNLOCKED. - ------------------------ - -On Fri, 2 Jan 1998, Doug Ledford wrote: -> -> I'm working on making the aic7xxx driver more SMP friendly (as well as -> importing the latest FreeBSD sequencer code to have 7895 support) and wanted -> to get some info from you. The goal here is to make the various routines -> SMP safe as well as UP safe during interrupts and other manipulating -> routines. So far, I've added a spin_lock variable to things like my queue -> structs. Now, from what I recall, there are some spin lock functions I can -> use to lock these spin locks from other use as opposed to a (nasty) -> save_flags(); cli(); stuff; restore_flags(); construct. Where do I find -> these routines and go about making use of them? Do they only lock on a -> per-processor basis or can they also lock say an interrupt routine from -> mucking with a queue if the queue routine was manipulating it when the -> interrupt occurred, or should I still use a cli(); based construct on that -> one? - -See . The basic version is: - - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; +The most basic primitive for locking is spinlock. +static DEFINE_SPINLOCK(xxx_lock); unsigned long flags; @@ -75,13 +10,11 @@ See . The basic version is: ... critical section here .. spin_unlock_irqrestore(&xxx_lock, flags); -and the above is always safe. It will disable interrupts _locally_, but the +The above is always safe. It will disable interrupts _locally_, but the spinlock itself will guarantee the global lock, so it will guarantee that there is only one thread-of-control within the region(s) protected by that -lock. - -Note that it works well even under UP - the above sequence under UP -essentially is just the same as doing a +lock. This works well even under UP. The above sequence under UP +essentially is just the same as doing unsigned long flags; @@ -91,15 +24,13 @@ essentially is just the same as doing a so the code does _not_ need to worry about UP vs SMP issues: the spinlocks work correctly under both (and spinlocks are actually more efficient on -architectures that allow doing the "save_flags + cli" in one go because I -don't export that interface normally). +architectures that allow doing the "save_flags + cli" in one operation). + + NOTE! Implications of spin_locks for memory are further described in: -NOTE NOTE NOTE! The reason the spinlock is so much faster than a global -interrupt lock under SMP is exactly because it disables interrupts only on -the local CPU. The spin-lock is safe only when you _also_ use the lock -itself to do locking across CPU's, which implies that EVERYTHING that -touches a shared variable has to agree about the spinlock they want to -use. + Documentation/memory-barriers.txt + (5) LOCK operations. + (6) UNLOCK operations. The above is usually pretty simple (you usually need and want only one spinlock for most things - using more than one spinlock can make things a @@ -120,20 +51,24 @@ and another sequence that does then they are NOT mutually exclusive, and the critical regions can happen at the same time on two different CPU's. That's fine per se, but the critical regions had better be critical for different things (ie they -can't stomp on each other). +can't stomp on each other). The above is a problem mainly if you end up mixing code - for example the routines in ll_rw_block() tend to use cli/sti to protect the atomicity of their actions, and if a driver uses spinlocks instead then you should -think about issues like the above.. +think about issues like the above. This is really the only really hard part about spinlocks: once you start using spinlocks they tend to expand to areas you might not have noticed before, because you have to make sure the spinlocks correctly protect the shared data structures _everywhere_ they are used. The spinlocks are most -easily added to places that are completely independent of other code (ie -internal driver data structures that nobody else ever touches, for -example). +easily added to places that are completely independent of other code (for +example, internal driver data structures that nobody else ever touches). + + NOTE! The spin-lock is safe only when you _also_ use the lock itself + to do locking across CPU's, which implies that EVERYTHING that + touches a shared variable has to agree about the spinlock they want + to use. ---- @@ -141,13 +76,17 @@ Lesson 2: reader-writer spinlocks. If your data accesses have a very natural pattern where you usually tend to mostly read from the shared variables, the reader-writer locks -(rw_lock) versions of the spinlocks are often nicer. They allow multiple +(rw_lock) versions of the spinlocks are sometimes useful. They allow multiple readers to be in the same critical region at once, but if somebody wants -to change the variables it has to get an exclusive write lock. The -routines look the same as above: +to change the variables it has to get an exclusive write lock. - rwlock_t xxx_lock = RW_LOCK_UNLOCKED; + NOTE! reader-writer locks require more atomic memory operations than + simple spinlocks. Unless the reader critical section is long, you + are better off just using spinlocks. +The routines look the same as above: + + rwlock_t xxx_lock = RW_LOCK_UNLOCKED; unsigned long flags; @@ -159,18 +98,21 @@ routines look the same as above: .. read and write exclusive access to the info ... write_unlock_irqrestore(&xxx_lock, flags); -The above kind of lock is useful for complex data structures like linked -lists etc, especially when you know that most of the work is to just -traverse the list searching for entries without changing the list itself, -for example. Then you can use the read lock for that kind of list -traversal, which allows many concurrent readers. Anything that _changes_ -the list will have to get the write lock. +The above kind of lock may be useful for complex data structures like +linked lists, especially searching for entries without changing the list +itself. The read lock allows many concurrent readers. Anything that +_changes_ the list will have to get the write lock. + + NOTE! RCU is better for list traversal, but requires careful + attention to design detail (see Documentation/RCU/listRCU.txt). -Note: you cannot "upgrade" a read-lock to a write-lock, so if you at _any_ +Also, you cannot "upgrade" a read-lock to a write-lock, so if you at _any_ time need to do any changes (even if you don't do it every time), you have -to get the write-lock at the very beginning. I could fairly easily add a -primitive to create a "upgradeable" read-lock, but it hasn't been an issue -yet. Tell me if you'd want one. +to get the write-lock at the very beginning. + + NOTE! We are working hard to remove reader-writer spinlocks in most + cases, so please don't add a new one without consensus. (Instead, see + Documentation/RCU/rcu.txt for complete information.) ---- @@ -233,4 +175,46 @@ indeed), while write-locks need to protect themselves against interrupts. Linus +---- + +Reference information: + +For dynamic initialization, use spin_lock_init() or rwlock_init() as +appropriate: + + spinlock_t xxx_lock; + rwlock_t xxx_rw_lock; + + static int __init xxx_init(void) + { + spin_lock_init(&xxx_lock); + rwlock_init(&xxx_rw_lock); + ... + } + + module_init(xxx_init); + +For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or +__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate. + +SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED are deprecated. These interfere +with lockdep state tracking. + +Most of the time, you can simply turn: + static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; +into: + static DEFINE_SPINLOCK(xxx_lock); + +Static structure member variables go from: + + struct foo bar { + .lock = SPIN_LOCK_UNLOCKED; + }; + +to: + struct foo bar { + .lock = __SPIN_LOCK_UNLOCKED(bar.lock); + }; + +Declaration of static rw_locks undergo a similar transformation. -- cgit v1.2.3 From c3813d6af177fab19e322f3114b1f64fbcf08d71 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Mon, 14 Dec 2009 21:17:25 +0100 Subject: i2c: Get rid of struct i2c_client_address_data Struct i2c_client_address_data only contains one field at this point, which makes its usefulness questionable. Get rid of it and pass simple address lists around instead. Signed-off-by: Jean Delvare Tested-by: Wolfram Sang --- Documentation/i2c/writing-clients | 2 +- drivers/hwmon/adm1021.c | 2 +- drivers/hwmon/adm1025.c | 2 +- drivers/hwmon/adm1026.c | 2 +- drivers/hwmon/adm1029.c | 2 +- drivers/hwmon/adm1031.c | 2 +- drivers/hwmon/adm9240.c | 2 +- drivers/hwmon/ads7828.c | 2 +- drivers/hwmon/adt7462.c | 2 +- drivers/hwmon/adt7470.c | 2 +- drivers/hwmon/adt7473.c | 2 +- drivers/hwmon/adt7475.c | 2 +- drivers/hwmon/asb100.c | 2 +- drivers/hwmon/atxp1.c | 2 +- drivers/hwmon/dme1737.c | 2 +- drivers/hwmon/ds1621.c | 2 +- drivers/hwmon/f75375s.c | 2 +- drivers/hwmon/fschmd.c | 2 +- drivers/hwmon/gl518sm.c | 2 +- drivers/hwmon/gl520sm.c | 2 +- drivers/hwmon/lm63.c | 2 +- drivers/hwmon/lm73.c | 2 +- drivers/hwmon/lm75.c | 2 +- drivers/hwmon/lm77.c | 2 +- drivers/hwmon/lm78.c | 2 +- drivers/hwmon/lm80.c | 2 +- drivers/hwmon/lm83.c | 2 +- drivers/hwmon/lm85.c | 2 +- drivers/hwmon/lm87.c | 2 +- drivers/hwmon/lm90.c | 2 +- drivers/hwmon/lm92.c | 2 +- drivers/hwmon/lm93.c | 2 +- drivers/hwmon/lm95241.c | 2 +- drivers/hwmon/max1619.c | 2 +- drivers/hwmon/max6650.c | 2 +- drivers/hwmon/pcf8591.c | 2 +- drivers/hwmon/smsc47m192.c | 2 +- drivers/hwmon/thmc50.c | 2 +- drivers/hwmon/tmp401.c | 2 +- drivers/hwmon/tmp421.c | 2 +- drivers/hwmon/w83781d.c | 2 +- drivers/hwmon/w83791d.c | 2 +- drivers/hwmon/w83792d.c | 2 +- drivers/hwmon/w83793.c | 2 +- drivers/hwmon/w83l785ts.c | 2 +- drivers/hwmon/w83l786ng.c | 2 +- drivers/i2c/i2c-core.c | 15 +++++++------ drivers/misc/eeprom/eeprom.c | 2 +- drivers/misc/ics932s401.c | 2 +- include/linux/i2c.h | 44 ++++++++++----------------------------- 50 files changed, 66 insertions(+), 89 deletions(-) (limited to 'Documentation') diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 7860aafb483d..0a74603eb671 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -44,7 +44,7 @@ static struct i2c_driver foo_driver = { /* if device autodetection is needed: */ .class = I2C_CLASS_SOMETHING, .detect = foo_detect, - .address_data = &addr_data, + .address_list = normal_i2c, .shutdown = foo_shutdown, /* optional */ .suspend = foo_suspend, /* optional */ diff --git a/drivers/hwmon/adm1021.c b/drivers/hwmon/adm1021.c index 3ebcdd9c25a5..d7af039a4d43 100644 --- a/drivers/hwmon/adm1021.c +++ b/drivers/hwmon/adm1021.c @@ -130,7 +130,7 @@ static struct i2c_driver adm1021_driver = { .remove = adm1021_remove, .id_table = adm1021_id, .detect = adm1021_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static ssize_t show_temp(struct device *dev, diff --git a/drivers/hwmon/adm1025.c b/drivers/hwmon/adm1025.c index 357c9ffa147e..e17651083b09 100644 --- a/drivers/hwmon/adm1025.c +++ b/drivers/hwmon/adm1025.c @@ -137,7 +137,7 @@ static struct i2c_driver adm1025_driver = { .remove = adm1025_remove, .id_table = adm1025_id, .detect = adm1025_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/adm1026.c b/drivers/hwmon/adm1026.c index 8deb17a402da..85bf23aea7db 100644 --- a/drivers/hwmon/adm1026.c +++ b/drivers/hwmon/adm1026.c @@ -319,7 +319,7 @@ static struct i2c_driver adm1026_driver = { .remove = adm1026_remove, .id_table = adm1026_id, .detect = adm1026_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int adm1026_read_value(struct i2c_client *client, u8 reg) diff --git a/drivers/hwmon/adm1029.c b/drivers/hwmon/adm1029.c index 9bc9dbcacdbd..a006ae5fbd2b 100644 --- a/drivers/hwmon/adm1029.c +++ b/drivers/hwmon/adm1029.c @@ -142,7 +142,7 @@ static struct i2c_driver adm1029_driver = { .remove = adm1029_remove, .id_table = adm1029_id, .detect = adm1029_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/adm1031.c b/drivers/hwmon/adm1031.c index cebfbf6926da..1e02799b870e 100644 --- a/drivers/hwmon/adm1031.c +++ b/drivers/hwmon/adm1031.c @@ -125,7 +125,7 @@ static struct i2c_driver adm1031_driver = { .remove = adm1031_remove, .id_table = adm1031_id, .detect = adm1031_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static inline u8 adm1031_read_value(struct i2c_client *client, u8 reg) diff --git a/drivers/hwmon/adm9240.c b/drivers/hwmon/adm9240.c index 9316e074d690..d9942e74ed4a 100644 --- a/drivers/hwmon/adm9240.c +++ b/drivers/hwmon/adm9240.c @@ -156,7 +156,7 @@ static struct i2c_driver adm9240_driver = { .remove = adm9240_remove, .id_table = adm9240_id, .detect = adm9240_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* per client data */ diff --git a/drivers/hwmon/ads7828.c b/drivers/hwmon/ads7828.c index b50893111532..3827ce4be071 100644 --- a/drivers/hwmon/ads7828.c +++ b/drivers/hwmon/ads7828.c @@ -183,7 +183,7 @@ static struct i2c_driver ads7828_driver = { .remove = ads7828_remove, .id_table = ads7828_id, .detect = ads7828_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* Return 0 if detection is successful, -ENODEV otherwise */ diff --git a/drivers/hwmon/adt7462.c b/drivers/hwmon/adt7462.c index 30cf002c677f..325700428ef0 100644 --- a/drivers/hwmon/adt7462.c +++ b/drivers/hwmon/adt7462.c @@ -256,7 +256,7 @@ static struct i2c_driver adt7462_driver = { .remove = adt7462_remove, .id_table = adt7462_id, .detect = adt7462_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/adt7470.c b/drivers/hwmon/adt7470.c index 9ffe5c6e495d..33aa0fa3e990 100644 --- a/drivers/hwmon/adt7470.c +++ b/drivers/hwmon/adt7470.c @@ -196,7 +196,7 @@ static struct i2c_driver adt7470_driver = { .remove = adt7470_remove, .id_table = adt7470_id, .detect = adt7470_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/adt7473.c b/drivers/hwmon/adt7473.c index 78fadaa431e2..1535733ddf19 100644 --- a/drivers/hwmon/adt7473.c +++ b/drivers/hwmon/adt7473.c @@ -184,7 +184,7 @@ static struct i2c_driver adt7473_driver = { .remove = adt7473_remove, .id_table = adt7473_id, .detect = adt7473_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/adt7475.c b/drivers/hwmon/adt7475.c index 3c0571551a9f..1fb8940428c6 100644 --- a/drivers/hwmon/adt7475.c +++ b/drivers/hwmon/adt7475.c @@ -1412,7 +1412,7 @@ static struct i2c_driver adt7475_driver = { .remove = adt7475_remove, .id_table = adt7475_id, .detect = adt7475_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static void adt7475_read_hystersis(struct i2c_client *client) diff --git a/drivers/hwmon/asb100.c b/drivers/hwmon/asb100.c index 507e116d5456..a92512a4a366 100644 --- a/drivers/hwmon/asb100.c +++ b/drivers/hwmon/asb100.c @@ -230,7 +230,7 @@ static struct i2c_driver asb100_driver = { .remove = asb100_remove, .id_table = asb100_id, .detect = asb100_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* 7 Voltages */ diff --git a/drivers/hwmon/atxp1.c b/drivers/hwmon/atxp1.c index 6b7459745b66..b0c3051d8a4f 100644 --- a/drivers/hwmon/atxp1.c +++ b/drivers/hwmon/atxp1.c @@ -67,7 +67,7 @@ static struct i2c_driver atxp1_driver = { .remove = atxp1_remove, .id_table = atxp1_id, .detect = atxp1_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; struct atxp1_data { diff --git a/drivers/hwmon/dme1737.c b/drivers/hwmon/dme1737.c index 7024617c1194..a3af09f9dbad 100644 --- a/drivers/hwmon/dme1737.c +++ b/drivers/hwmon/dme1737.c @@ -2318,7 +2318,7 @@ static struct i2c_driver dme1737_i2c_driver = { .remove = dme1737_i2c_remove, .id_table = dme1737_id, .detect = dme1737_i2c_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* --------------------------------------------------------------------- diff --git a/drivers/hwmon/ds1621.c b/drivers/hwmon/ds1621.c index 5fde2f5139fd..dfa4329090d4 100644 --- a/drivers/hwmon/ds1621.c +++ b/drivers/hwmon/ds1621.c @@ -321,7 +321,7 @@ static struct i2c_driver ds1621_driver = { .remove = ds1621_remove, .id_table = ds1621_id, .detect = ds1621_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init ds1621_init(void) diff --git a/drivers/hwmon/f75375s.c b/drivers/hwmon/f75375s.c index 2ffcf56b6f4e..f8992c935666 100644 --- a/drivers/hwmon/f75375s.c +++ b/drivers/hwmon/f75375s.c @@ -135,7 +135,7 @@ static struct i2c_driver f75375_driver = { .remove = f75375_remove, .id_table = f75375_id, .detect = f75375_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static inline int f75375_read8(struct i2c_client *client, u8 reg) diff --git a/drivers/hwmon/fschmd.c b/drivers/hwmon/fschmd.c index bce18e0f1d61..4eebbbeba2d1 100644 --- a/drivers/hwmon/fschmd.c +++ b/drivers/hwmon/fschmd.c @@ -251,7 +251,7 @@ static struct i2c_driver fschmd_driver = { .remove = fschmd_remove, .id_table = fschmd_id, .detect = fschmd_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/gl518sm.c b/drivers/hwmon/gl518sm.c index 34f83c6a3f45..e9407acd72cb 100644 --- a/drivers/hwmon/gl518sm.c +++ b/drivers/hwmon/gl518sm.c @@ -162,7 +162,7 @@ static struct i2c_driver gl518_driver = { .remove = gl518_remove, .id_table = gl518_id, .detect = gl518_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/gl520sm.c b/drivers/hwmon/gl520sm.c index d03ba692fc42..c0ec8c28731e 100644 --- a/drivers/hwmon/gl520sm.c +++ b/drivers/hwmon/gl520sm.c @@ -104,7 +104,7 @@ static struct i2c_driver gl520_driver = { .remove = gl520_remove, .id_table = gl520_id, .detect = gl520_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* Client data */ diff --git a/drivers/hwmon/lm63.c b/drivers/hwmon/lm63.c index 26844fc4a66d..1426a455071c 100644 --- a/drivers/hwmon/lm63.c +++ b/drivers/hwmon/lm63.c @@ -156,7 +156,7 @@ static struct i2c_driver lm63_driver = { .remove = lm63_remove, .id_table = lm63_id, .detect = lm63_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/lm73.c b/drivers/hwmon/lm73.c index e610da9bd80c..fb6ab9a9a608 100644 --- a/drivers/hwmon/lm73.c +++ b/drivers/hwmon/lm73.c @@ -182,7 +182,7 @@ static struct i2c_driver lm73_driver = { .remove = lm73_remove, .id_table = lm73_ids, .detect = lm73_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* module glue */ diff --git a/drivers/hwmon/lm75.c b/drivers/hwmon/lm75.c index 8fd759d28ddf..ce2423cd8198 100644 --- a/drivers/hwmon/lm75.c +++ b/drivers/hwmon/lm75.c @@ -295,7 +295,7 @@ static struct i2c_driver lm75_driver = { .remove = lm75_remove, .id_table = lm75_ids, .detect = lm75_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /*-----------------------------------------------------------------------*/ diff --git a/drivers/hwmon/lm77.c b/drivers/hwmon/lm77.c index 6373ab2cd952..b6105e570b0e 100644 --- a/drivers/hwmon/lm77.c +++ b/drivers/hwmon/lm77.c @@ -91,7 +91,7 @@ static struct i2c_driver lm77_driver = { .remove = lm77_remove, .id_table = lm77_id, .detect = lm77_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* straight from the datasheet */ diff --git a/drivers/hwmon/lm78.c b/drivers/hwmon/lm78.c index f58850a9d9e0..cd6a9ea921f6 100644 --- a/drivers/hwmon/lm78.c +++ b/drivers/hwmon/lm78.c @@ -173,7 +173,7 @@ static struct i2c_driver lm78_driver = { .remove = lm78_i2c_remove, .id_table = lm78_i2c_id, .detect = lm78_i2c_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static struct platform_driver lm78_isa_driver = { diff --git a/drivers/hwmon/lm80.c b/drivers/hwmon/lm80.c index e3222f3d4a5e..1cf5ff5bfa43 100644 --- a/drivers/hwmon/lm80.c +++ b/drivers/hwmon/lm80.c @@ -159,7 +159,7 @@ static struct i2c_driver lm80_driver = { .remove = lm80_remove, .id_table = lm80_id, .detect = lm80_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/lm83.c b/drivers/hwmon/lm83.c index bfb7477cb6b3..b80ae182f851 100644 --- a/drivers/hwmon/lm83.c +++ b/drivers/hwmon/lm83.c @@ -145,7 +145,7 @@ static struct i2c_driver lm83_driver = { .remove = lm83_remove, .id_table = lm83_id, .detect = lm83_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/lm85.c b/drivers/hwmon/lm85.c index f5fc45ac6fe5..d29bd34a265e 100644 --- a/drivers/hwmon/lm85.c +++ b/drivers/hwmon/lm85.c @@ -356,7 +356,7 @@ static struct i2c_driver lm85_driver = { .remove = lm85_remove, .id_table = lm85_id, .detect = lm85_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; diff --git a/drivers/hwmon/lm87.c b/drivers/hwmon/lm87.c index d545b8596b05..60d34bc578cb 100644 --- a/drivers/hwmon/lm87.c +++ b/drivers/hwmon/lm87.c @@ -184,7 +184,7 @@ static struct i2c_driver lm87_driver = { .remove = lm87_remove, .id_table = lm87_id, .detect = lm87_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c index 3acbacadac77..3e916ac97ea8 100644 --- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -191,7 +191,7 @@ static struct i2c_driver lm90_driver = { .remove = lm90_remove, .id_table = lm90_id, .detect = lm90_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/lm92.c b/drivers/hwmon/lm92.c index da354222468d..b582b3b7fdee 100644 --- a/drivers/hwmon/lm92.c +++ b/drivers/hwmon/lm92.c @@ -416,7 +416,7 @@ static struct i2c_driver lm92_driver = { .remove = lm92_remove, .id_table = lm92_id, .detect = lm92_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init sensors_lm92_init(void) diff --git a/drivers/hwmon/lm93.c b/drivers/hwmon/lm93.c index 7b9c97b72eb8..d160dfdb513f 100644 --- a/drivers/hwmon/lm93.c +++ b/drivers/hwmon/lm93.c @@ -2616,7 +2616,7 @@ static struct i2c_driver lm93_driver = { .remove = lm93_remove, .id_table = lm93_id, .detect = lm93_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init lm93_init(void) diff --git a/drivers/hwmon/lm95241.c b/drivers/hwmon/lm95241.c index 05ede4137e22..55e3bfd49706 100644 --- a/drivers/hwmon/lm95241.c +++ b/drivers/hwmon/lm95241.c @@ -460,7 +460,7 @@ static struct i2c_driver lm95241_driver = { .remove = lm95241_remove, .id_table = lm95241_id, .detect = lm95241_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init sensors_lm95241_init(void) diff --git a/drivers/hwmon/max1619.c b/drivers/hwmon/max1619.c index 4baf94efd372..94cea29f157b 100644 --- a/drivers/hwmon/max1619.c +++ b/drivers/hwmon/max1619.c @@ -113,7 +113,7 @@ static struct i2c_driver max1619_driver = { .remove = max1619_remove, .id_table = max1619_id, .detect = max1619_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/max6650.c b/drivers/hwmon/max6650.c index fd5d1acfcc95..c7c126cf22dd 100644 --- a/drivers/hwmon/max6650.c +++ b/drivers/hwmon/max6650.c @@ -141,7 +141,7 @@ static struct i2c_driver max6650_driver = { .remove = max6650_remove, .id_table = max6650_id, .detect = max6650_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/pcf8591.c b/drivers/hwmon/pcf8591.c index 4355aada01f2..c19e61bd393c 100644 --- a/drivers/hwmon/pcf8591.c +++ b/drivers/hwmon/pcf8591.c @@ -299,7 +299,7 @@ static struct i2c_driver pcf8591_driver = { .class = I2C_CLASS_HWMON, /* Nearest choice */ .detect = pcf8591_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init pcf8591_init(void) diff --git a/drivers/hwmon/smsc47m192.c b/drivers/hwmon/smsc47m192.c index 1683fc76759f..34df2e2ee28a 100644 --- a/drivers/hwmon/smsc47m192.c +++ b/drivers/hwmon/smsc47m192.c @@ -135,7 +135,7 @@ static struct i2c_driver smsc47m192_driver = { .remove = smsc47m192_remove, .id_table = smsc47m192_id, .detect = smsc47m192_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* Voltages */ diff --git a/drivers/hwmon/thmc50.c b/drivers/hwmon/thmc50.c index 02ac0d4323a4..866d66507596 100644 --- a/drivers/hwmon/thmc50.c +++ b/drivers/hwmon/thmc50.c @@ -108,7 +108,7 @@ static struct i2c_driver thmc50_driver = { .remove = thmc50_remove, .id_table = thmc50_id, .detect = thmc50_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static ssize_t show_analog_out(struct device *dev, diff --git a/drivers/hwmon/tmp401.c b/drivers/hwmon/tmp401.c index 7cf1d541a5fa..ed086491cc9a 100644 --- a/drivers/hwmon/tmp401.c +++ b/drivers/hwmon/tmp401.c @@ -123,7 +123,7 @@ static struct i2c_driver tmp401_driver = { .remove = tmp401_remove, .id_table = tmp401_id, .detect = tmp401_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/tmp421.c b/drivers/hwmon/tmp421.c index 34eb34c548ff..018ad028c179 100644 --- a/drivers/hwmon/tmp421.c +++ b/drivers/hwmon/tmp421.c @@ -322,7 +322,7 @@ static struct i2c_driver tmp421_driver = { .remove = tmp421_remove, .id_table = tmp421_id, .detect = tmp421_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init tmp421_init(void) diff --git a/drivers/hwmon/w83781d.c b/drivers/hwmon/w83781d.c index 44704d2dee63..bfaa888f6e47 100644 --- a/drivers/hwmon/w83781d.c +++ b/drivers/hwmon/w83781d.c @@ -1536,7 +1536,7 @@ static struct i2c_driver w83781d_driver = { .remove = w83781d_remove, .id_table = w83781d_ids, .detect = w83781d_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/w83791d.c b/drivers/hwmon/w83791d.c index b3e91b6e651a..e059cf0471b0 100644 --- a/drivers/hwmon/w83791d.c +++ b/drivers/hwmon/w83791d.c @@ -355,7 +355,7 @@ static struct i2c_driver w83791d_driver = { .remove = w83791d_remove, .id_table = w83791d_id, .detect = w83791d_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* following are the sysfs callback functions */ diff --git a/drivers/hwmon/w83792d.c b/drivers/hwmon/w83792d.c index 03b836cdafa6..c6f198a3d924 100644 --- a/drivers/hwmon/w83792d.c +++ b/drivers/hwmon/w83792d.c @@ -328,7 +328,7 @@ static struct i2c_driver w83792d_driver = { .remove = w83792d_remove, .id_table = w83792d_id, .detect = w83792d_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static inline long in_count_from_reg(int nr, struct w83792d_data *data) diff --git a/drivers/hwmon/w83793.c b/drivers/hwmon/w83793.c index acf35e6e2ab9..ed32b18fbc42 100644 --- a/drivers/hwmon/w83793.c +++ b/drivers/hwmon/w83793.c @@ -252,7 +252,7 @@ static struct i2c_driver w83793_driver = { .remove = w83793_remove, .id_table = w83793_id, .detect = w83793_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static ssize_t diff --git a/drivers/hwmon/w83l785ts.c b/drivers/hwmon/w83l785ts.c index ec6e4b7fb741..81c59937cf0d 100644 --- a/drivers/hwmon/w83l785ts.c +++ b/drivers/hwmon/w83l785ts.c @@ -108,7 +108,7 @@ static struct i2c_driver w83l785ts_driver = { .remove = w83l785ts_remove, .id_table = w83l785ts_id, .detect = w83l785ts_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; /* diff --git a/drivers/hwmon/w83l786ng.c b/drivers/hwmon/w83l786ng.c index 12a5fd67bee0..a427347ae82b 100644 --- a/drivers/hwmon/w83l786ng.c +++ b/drivers/hwmon/w83l786ng.c @@ -168,7 +168,7 @@ static struct i2c_driver w83l786ng_driver = { .remove = w83l786ng_remove, .id_table = w83l786ng_id, .detect = w83l786ng_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static u8 diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c index c1047d644d8f..9065c7238b5e 100644 --- a/drivers/i2c/i2c-core.c +++ b/drivers/i2c/i2c-core.c @@ -1214,13 +1214,13 @@ static int i2c_detect_address(struct i2c_client *temp_client, static int i2c_detect(struct i2c_adapter *adapter, struct i2c_driver *driver) { - const struct i2c_client_address_data *address_data; + const unsigned short *address_list; struct i2c_client *temp_client; int i, err = 0; int adap_id = i2c_adapter_id(adapter); - address_data = driver->address_data; - if (!driver->detect || !address_data) + address_list = driver->address_list; + if (!driver->detect || !address_list) return 0; /* Set up a temporary client to help detect callback */ @@ -1235,7 +1235,7 @@ static int i2c_detect(struct i2c_adapter *adapter, struct i2c_driver *driver) /* Stop here if we can't use SMBUS_QUICK */ if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_QUICK)) { - if (address_data->normal_i2c[0] == I2C_CLIENT_END) + if (address_list[0] == I2C_CLIENT_END) goto exit_free; dev_warn(&adapter->dev, "SMBus Quick command not supported, " @@ -1244,11 +1244,10 @@ static int i2c_detect(struct i2c_adapter *adapter, struct i2c_driver *driver) goto exit_free; } - for (i = 0; address_data->normal_i2c[i] != I2C_CLIENT_END; i += 1) { + for (i = 0; address_list[i] != I2C_CLIENT_END; i += 1) { dev_dbg(&adapter->dev, "found normal entry for adapter %d, " - "addr 0x%02x\n", adap_id, - address_data->normal_i2c[i]); - temp_client->addr = address_data->normal_i2c[i]; + "addr 0x%02x\n", adap_id, address_list[i]); + temp_client->addr = address_list[i]; err = i2c_detect_address(temp_client, driver); if (err) goto exit_free; diff --git a/drivers/misc/eeprom/eeprom.c b/drivers/misc/eeprom/eeprom.c index 2c428f464539..3dc5e3db2c12 100644 --- a/drivers/misc/eeprom/eeprom.c +++ b/drivers/misc/eeprom/eeprom.c @@ -232,7 +232,7 @@ static struct i2c_driver eeprom_driver = { .class = I2C_CLASS_DDC | I2C_CLASS_SPD, .detect = eeprom_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static int __init eeprom_init(void) diff --git a/drivers/misc/ics932s401.c b/drivers/misc/ics932s401.c index 75097ec43edd..d8a84718d687 100644 --- a/drivers/misc/ics932s401.c +++ b/drivers/misc/ics932s401.c @@ -125,7 +125,7 @@ static struct i2c_driver ics932s401_driver = { .remove = ics932s401_remove, .id_table = ics932s401_id, .detect = ics932s401_detect, - .address_data = &addr_data, + .address_list = normal_i2c, }; static struct ics932s401_data *ics932s401_update_device(struct device *dev) diff --git a/include/linux/i2c.h b/include/linux/i2c.h index f6f2c080ba67..fb9df1416ad5 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -110,7 +110,7 @@ extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client *client, * @driver: Device driver model driver * @id_table: List of I2C devices supported by this driver * @detect: Callback for device detection - * @address_data: The I2C addresses to probe (for detect) + * @address_list: The I2C addresses to probe (for detect) * @clients: List of detected clients we created (for i2c-core use only) * * The driver.owner field should be set to the module owner of this driver. @@ -162,7 +162,7 @@ struct i2c_driver { /* Device detection callback for automatic device creation */ int (*detect)(struct i2c_client *, struct i2c_board_info *); - const struct i2c_client_address_data *address_data; + const unsigned short *address_list; struct list_head clients; }; #define to_i2c_driver(d) container_of(d, struct i2c_driver, driver) @@ -391,14 +391,6 @@ static inline void i2c_unlock_adapter(struct i2c_adapter *adapter) #define I2C_CLASS_DDC (1<<3) /* DDC bus on graphics adapters */ #define I2C_CLASS_SPD (1<<7) /* SPD EEPROMs and similar */ -/* i2c_client_address_data is the struct for holding default client - * addresses for a driver and for the parameters supplied on the - * command line - */ -struct i2c_client_address_data { - const unsigned short *normal_i2c; -}; - /* Internal numbers to terminate lists */ #define I2C_CLIENT_END 0xfffeU @@ -610,48 +602,34 @@ union i2c_smbus_data { module_param_array(var, short, &var##_num, 0); \ MODULE_PARM_DESC(var, desc) -#define I2C_CLIENT_INSMOD_COMMON \ -static const struct i2c_client_address_data addr_data = { \ - .normal_i2c = normal_i2c, \ -} - /* These are the ones you want to use in your own drivers. Pick the one which matches the number of devices the driver differenciates between. */ -#define I2C_CLIENT_INSMOD \ -I2C_CLIENT_INSMOD_COMMON +#define I2C_CLIENT_INSMOD #define I2C_CLIENT_INSMOD_1(chip1) \ -enum chips { any_chip, chip1 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1 } #define I2C_CLIENT_INSMOD_2(chip1, chip2) \ -enum chips { any_chip, chip1, chip2 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1, chip2 } #define I2C_CLIENT_INSMOD_3(chip1, chip2, chip3) \ -enum chips { any_chip, chip1, chip2, chip3 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1, chip2, chip3 } #define I2C_CLIENT_INSMOD_4(chip1, chip2, chip3, chip4) \ -enum chips { any_chip, chip1, chip2, chip3, chip4 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1, chip2, chip3, chip4 } #define I2C_CLIENT_INSMOD_5(chip1, chip2, chip3, chip4, chip5) \ -enum chips { any_chip, chip1, chip2, chip3, chip4, chip5 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1, chip2, chip3, chip4, chip5 } #define I2C_CLIENT_INSMOD_6(chip1, chip2, chip3, chip4, chip5, chip6) \ -enum chips { any_chip, chip1, chip2, chip3, chip4, chip5, chip6 }; \ -I2C_CLIENT_INSMOD_COMMON +enum chips { any_chip, chip1, chip2, chip3, chip4, chip5, chip6 } #define I2C_CLIENT_INSMOD_7(chip1, chip2, chip3, chip4, chip5, chip6, chip7) \ enum chips { any_chip, chip1, chip2, chip3, chip4, chip5, chip6, \ - chip7 }; \ -I2C_CLIENT_INSMOD_COMMON + chip7 } #define I2C_CLIENT_INSMOD_8(chip1, chip2, chip3, chip4, chip5, chip6, chip7, chip8) \ enum chips { any_chip, chip1, chip2, chip3, chip4, chip5, chip6, \ - chip7, chip8 }; \ -I2C_CLIENT_INSMOD_COMMON + chip7, chip8 } #endif /* __KERNEL__ */ #endif /* _LINUX_I2C_H */ -- cgit v1.2.3 From aaa225e0aa96691964a5f0518c3c16b954d400cb Mon Sep 17 00:00:00 2001 From: Michael Hennerich Date: Wed, 14 Oct 2009 13:43:53 +0000 Subject: Blackfin: punt cache lock documentation The cache lock code was unused and punted, so punt the documentation too. Signed-off-by: Michael Hennerich Signed-off-by: Mike Frysinger --- Documentation/blackfin/00-INDEX | 3 -- Documentation/blackfin/cache-lock.txt | 48 -------------------------------- Documentation/blackfin/cachefeatures.txt | 10 ------- 3 files changed, 61 deletions(-) delete mode 100644 Documentation/blackfin/cache-lock.txt (limited to 'Documentation') diff --git a/Documentation/blackfin/00-INDEX b/Documentation/blackfin/00-INDEX index d6840a91e1e1..c34e12440fec 100644 --- a/Documentation/blackfin/00-INDEX +++ b/Documentation/blackfin/00-INDEX @@ -1,9 +1,6 @@ 00-INDEX - This file -cache-lock.txt - - HOWTO for blackfin cache locking. - cachefeatures.txt - Supported cache features. diff --git a/Documentation/blackfin/cache-lock.txt b/Documentation/blackfin/cache-lock.txt deleted file mode 100644 index 88ba1e6c31c3..000000000000 --- a/Documentation/blackfin/cache-lock.txt +++ /dev/null @@ -1,48 +0,0 @@ -/* - * File: Documentation/blackfin/cache-lock.txt - * Based on: - * Author: - * - * Created: - * Description: This file contains the simple DMA Implementation for Blackfin - * - * Rev: $Id: cache-lock.txt 2384 2006-11-01 04:12:43Z magicyang $ - * - * Modified: - * Copyright 2004-2006 Analog Devices Inc. - * - * Bugs: Enter bugs at http://blackfin.uclinux.org/ - * - */ - -How to lock your code in cache in uClinux/blackfin --------------------------------------------------- - -There are only a few steps required to lock your code into the cache. -Currently you can lock the code by Way. - -Below are the interface provided for locking the cache. - - -1. cache_grab_lock(int Ways); - -This function grab the lock for locking your code into the cache specified -by Ways. - - -2. cache_lock(int Ways); - -This function should be called after your critical code has been executed. -Once the critical code exits, the code is now loaded into the cache. This -function locks the code into the cache. - - -So, the example sequence will be: - - cache_grab_lock(WAY0_L); /* Grab the lock */ - - critical_code(); /* Execute the code of interest */ - - cache_lock(WAY0_L); /* Lock the cache */ - -Where WAY0_L signifies WAY0 locking. diff --git a/Documentation/blackfin/cachefeatures.txt b/Documentation/blackfin/cachefeatures.txt index 0fbec23becb5..75de51f94515 100644 --- a/Documentation/blackfin/cachefeatures.txt +++ b/Documentation/blackfin/cachefeatures.txt @@ -41,16 +41,6 @@ icplb_flush(); dcplb_flush(); - - Locking the cache. - - cache_grab_lock(); - cache_lock(); - - Please refer linux-2.6.x/Documentation/blackfin/cache-lock.txt for how to - lock the cache. - - Locking the cache is optional feature. - - Miscellaneous cache functions. flush_cache_all(); -- cgit v1.2.3 From 4b60779d5ea76908c3bc82d93280b733335fce48 Mon Sep 17 00:00:00 2001 From: Mike Frysinger Date: Wed, 21 Oct 2009 21:22:35 +0000 Subject: Blackfin: add an example showing how to use the gptimers API Signed-off-by: Mike Frysinger --- Documentation/blackfin/Makefile | 6 +++ Documentation/blackfin/gptimers-example.c | 83 +++++++++++++++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100644 Documentation/blackfin/Makefile create mode 100644 Documentation/blackfin/gptimers-example.c (limited to 'Documentation') diff --git a/Documentation/blackfin/Makefile b/Documentation/blackfin/Makefile new file mode 100644 index 000000000000..773dbb103f1c --- /dev/null +++ b/Documentation/blackfin/Makefile @@ -0,0 +1,6 @@ +obj-m := gptimers-example.o + +all: modules + +modules clean: + $(MAKE) -C ../.. SUBDIRS=$(PWD) $@ diff --git a/Documentation/blackfin/gptimers-example.c b/Documentation/blackfin/gptimers-example.c new file mode 100644 index 000000000000..b1bd6340e748 --- /dev/null +++ b/Documentation/blackfin/gptimers-example.c @@ -0,0 +1,83 @@ +/* + * Simple gptimers example + * http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:drivers:gptimers + * + * Copyright 2007-2009 Analog Devices Inc. + * + * Licensed under the GPL-2 or later. + */ + +#include +#include + +#include +#include + +/* ... random driver includes ... */ + +#define DRIVER_NAME "gptimer_example" + +struct gptimer_data { + uint32_t period, width; +}; +static struct gptimer_data data; + +/* ... random driver state ... */ + +static irqreturn_t gptimer_example_irq(int irq, void *dev_id) +{ + struct gptimer_data *data = dev_id; + + /* make sure it was our timer which caused the interrupt */ + if (!get_gptimer_intr(TIMER5_id)) + return IRQ_NONE; + + /* read the width/period values that were captured for the waveform */ + data->width = get_gptimer_pwidth(TIMER5_id); + data->period = get_gptimer_period(TIMER5_id); + + /* acknowledge the interrupt */ + clear_gptimer_intr(TIMER5_id); + + /* tell the upper layers we took care of things */ + return IRQ_HANDLED; +} + +/* ... random driver code ... */ + +static int __init gptimer_example_init(void) +{ + int ret; + + /* grab the peripheral pins */ + ret = peripheral_request(P_TMR5, DRIVER_NAME); + if (ret) { + printk(KERN_NOTICE DRIVER_NAME ": peripheral request failed\n"); + return ret; + } + + /* grab the IRQ for the timer */ + ret = request_irq(IRQ_TIMER5, gptimer_example_irq, IRQF_SHARED, DRIVER_NAME, &data); + if (ret) { + printk(KERN_NOTICE DRIVER_NAME ": IRQ request failed\n"); + peripheral_free(P_TMR5); + return ret; + } + + /* setup the timer and enable it */ + set_gptimer_config(TIMER5_id, WDTH_CAP | PULSE_HI | PERIOD_CNT | IRQ_ENA); + enable_gptimers(TIMER5bit); + + return 0; +} +module_init(gptimer_example_init); + +static void __exit gptimer_example_exit(void) +{ + disable_gptimers(TIMER5bit); + free_irq(IRQ_TIMER5, &data); + peripheral_free(P_TMR5); +} +module_exit(gptimer_example_exit); + +MODULE_LICENSE("BSD"); -- cgit v1.2.3 From 3428838d8e88fb766143c913cf3e777b000f8e07 Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Mon, 14 Dec 2009 17:57:48 -0800 Subject: page-types: constify read only arrays Signed-off-by: Tommi Rantala Cc: Randy Dunlap Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index ea44ea502da1..59a4614f4b90 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -100,7 +100,7 @@ #define BIT(name) (1ULL << KPF_##name) #define BITS_COMPOUND (BIT(COMPOUND_HEAD) | BIT(COMPOUND_TAIL)) -static char *page_flag_names[] = { +static const char *page_flag_names[] = { [KPF_LOCKED] = "L:locked", [KPF_ERROR] = "E:error", [KPF_REFERENCED] = "R:referenced", @@ -173,7 +173,7 @@ static int kpageflags_fd; static int opt_hwpoison; static int opt_unpoison; -static char *hwpoison_debug_fs = "/debug/hwpoison"; +static const char hwpoison_debug_fs[] = "/debug/hwpoison"; static int hwpoison_inject_fd; static int hwpoison_forget_fd; @@ -885,7 +885,7 @@ static void parse_bits_mask(const char *optarg) } -static struct option opts[] = { +static const struct option opts[] = { { "raw" , 0, NULL, 'r' }, { "pid" , 1, NULL, 'p' }, { "file" , 1, NULL, 'f' }, -- cgit v1.2.3 From f1327bf18c8737aeb85226f02f102dfa065fddde Mon Sep 17 00:00:00 2001 From: Roel Kluin Date: Mon, 14 Dec 2009 17:57:49 -0800 Subject: page-types: unsigned cannot be less than 0 in add_page() If not signed, testing of the read() return value in this function will not work. Signed-off-by: Roel Kluin Cc: Wu Fengguang Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 59a4614f4b90..2279b616a7ac 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -560,7 +560,7 @@ static void walk_pfn(unsigned long voffset, { uint64_t buf[KPAGEFLAGS_BATCH]; unsigned long batch; - unsigned long pages; + long pages; unsigned long i; while (count) { -- cgit v1.2.3 From dcfe730c60030fb76e2794a8960c6bd84c6c6163 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Mon, 14 Dec 2009 17:57:52 -0800 Subject: page-types: learn to describe flags directly from command line Teach page-types to describe page flags directly from the command line. Why is this useful? For instance, if you're using memory hotplug and see this in /var/log/messages: kernel: removing from LRU failed 3836dd0/1/1e00000000000010 It would be nice to decode those page flags without staring at the source. Example usage and output: # Documentation/vm/page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty # Documentation/vm/page-types -d anon 0x0000000000001000 ____________a_____________________ anonymous # Documentation/vm/page-types -d anon,0x10 0x0000000000001010 ____D_______a_____________________ dirty,anonymous [achiang@hp.com: documentation] Signed-off-by: Alex Chiang Signed-off-by: Wu Fengguang Cc: Andi Kleen Cc: Haicheng Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 2279b616a7ac..8c28be6a47cb 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -674,6 +674,7 @@ static void usage(void) printf( "page-types [options]\n" " -r|--raw Raw mode, for kernel developers\n" +" -d|--describe flags Describe flags\n" " -a|--addr addr-spec Walk a range of pages\n" " -b|--bits bits-spec Walk pages with specified bits\n" " -p|--pid pid Walk process address space\n" @@ -686,6 +687,10 @@ static void usage(void) " -X|--hwpoison hwpoison pages\n" " -x|--unpoison unpoison pages\n" " -h|--help Show this usage message\n" +"flags:\n" +" 0x10 bitfield format, e.g.\n" +" anon bit-name, e.g.\n" +" 0x10,anon comma-separated list, e.g.\n" "addr-spec:\n" " N one page at offset N (unit: pages)\n" " N+M pages range from N to N+M-1\n" @@ -884,6 +889,15 @@ static void parse_bits_mask(const char *optarg) add_bits_filter(mask, bits); } +static void describe_flags(const char *optarg) +{ + uint64_t flags = parse_flag_names(optarg, 0); + + printf("0x%016llx\t%s\t%s\n", + (unsigned long long)flags, + page_flag_name(flags), + page_flag_longname(flags)); +} static const struct option opts[] = { { "raw" , 0, NULL, 'r' }, @@ -891,6 +905,7 @@ static const struct option opts[] = { { "file" , 1, NULL, 'f' }, { "addr" , 1, NULL, 'a' }, { "bits" , 1, NULL, 'b' }, + { "describe" , 1, NULL, 'd' }, { "list" , 0, NULL, 'l' }, { "list-each" , 0, NULL, 'L' }, { "no-summary", 0, NULL, 'N' }, @@ -907,7 +922,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); while ((c = getopt_long(argc, argv, - "rp:f:a:b:lLNXxh", opts, NULL)) != -1) { + "rp:f:a:b:d:lLNXxh", opts, NULL)) != -1) { switch (c) { case 'r': opt_raw = 1; @@ -924,6 +939,10 @@ int main(int argc, char *argv[]) case 'b': parse_bits_mask(optarg); break; + case 'd': + opt_no_summary = 1; + describe_flags(optarg); + break; case 'l': opt_list = 1; break; -- cgit v1.2.3 From 9fdcd886ab407dbe3f854579417e6b036222a468 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Mon, 14 Dec 2009 17:57:53 -0800 Subject: page-types: whitespace alignment Align the output when page-type -h is invoked. Signed-off-by: Alex Chiang Acked-by: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 46 +++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 8c28be6a47cb..19ca23c773a6 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -673,35 +673,35 @@ static void usage(void) printf( "page-types [options]\n" -" -r|--raw Raw mode, for kernel developers\n" +" -r|--raw Raw mode, for kernel developers\n" " -d|--describe flags Describe flags\n" -" -a|--addr addr-spec Walk a range of pages\n" -" -b|--bits bits-spec Walk pages with specified bits\n" -" -p|--pid pid Walk process address space\n" +" -a|--addr addr-spec Walk a range of pages\n" +" -b|--bits bits-spec Walk pages with specified bits\n" +" -p|--pid pid Walk process address space\n" #if 0 /* planned features */ -" -f|--file filename Walk file address space\n" +" -f|--file filename Walk file address space\n" #endif -" -l|--list Show page details in ranges\n" -" -L|--list-each Show page details one by one\n" -" -N|--no-summary Don't show summay info\n" -" -X|--hwpoison hwpoison pages\n" -" -x|--unpoison unpoison pages\n" -" -h|--help Show this usage message\n" +" -l|--list Show page details in ranges\n" +" -L|--list-each Show page details one by one\n" +" -N|--no-summary Don't show summay info\n" +" -X|--hwpoison hwpoison pages\n" +" -x|--unpoison unpoison pages\n" +" -h|--help Show this usage message\n" "flags:\n" -" 0x10 bitfield format, e.g.\n" -" anon bit-name, e.g.\n" -" 0x10,anon comma-separated list, e.g.\n" +" 0x10 bitfield format, e.g.\n" +" anon bit-name, e.g.\n" +" 0x10,anon comma-separated list, e.g.\n" "addr-spec:\n" -" N one page at offset N (unit: pages)\n" -" N+M pages range from N to N+M-1\n" -" N,M pages range from N to M-1\n" -" N, pages range from N to end\n" -" ,M pages range from 0 to M-1\n" +" N one page at offset N (unit: pages)\n" +" N+M pages range from N to N+M-1\n" +" N,M pages range from N to M-1\n" +" N, pages range from N to end\n" +" ,M pages range from 0 to M-1\n" "bits-spec:\n" -" bit1,bit2 (flags & (bit1|bit2)) != 0\n" -" bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n" -" bit1,~bit2 (flags & (bit1|bit2)) == bit1\n" -" =bit1,bit2 flags == (bit1|bit2)\n" +" bit1,bit2 (flags & (bit1|bit2)) != 0\n" +" bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n" +" bit1,~bit2 (flags & (bit1|bit2)) == bit1\n" +" =bit1,bit2 flags == (bit1|bit2)\n" "bit-names:\n" ); -- cgit v1.2.3 From bb86a7338b7a864c03e1736a8d370b10254b0300 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Mon, 14 Dec 2009 17:57:54 -0800 Subject: page-types: exit early when invoked with -d|--describe On a system with large amount of memory (256GB), invoking page-types can take quite a long time, which is unreasonable considering the user only wants a description of the flags: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m34.285s user 0m1.966s sys 0m32.313s This is because we still walk the entire address range. Exiting early seems like a reasonble solution: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m0.007s user 0m0.001s sys 0m0.005s Signed-off-by: Alex Chiang Cc: Andi Kleen Cc: Haicheng Li Acked-by: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 19ca23c773a6..7a7d9bab32ef 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -940,9 +940,8 @@ int main(int argc, char *argv[]) parse_bits_mask(optarg); break; case 'd': - opt_no_summary = 1; describe_flags(optarg); - break; + exit(0); case 'l': opt_list = 1; break; -- cgit v1.2.3 From 267b4c281b4a43c8f3d965c791d3a7fd62448733 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 14 Dec 2009 17:58:30 -0800 Subject: hugetlb: update hugetlb documentation for NUMA controls Update the kernel huge tlb documentation to describe the numa memory policy based huge page management. Additionaly, the patch includes a fair amount of rework to improve consistency, eliminate duplication and set the context for documenting the memory policy interaction. Signed-off-by: Lee Schermerhorn Acked-by: David Rientjes Acked-by: Mel Gorman Reviewed-by: Andi Kleen Cc: KAMEZAWA Hiroyuki Cc: Lee Schermerhorn Cc: Randy Dunlap Cc: Nishanth Aravamudan Cc: Adam Litke Cc: Andy Whitcroft Cc: Eric Whitney Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/hugetlbpage.txt | 261 ++++++++++++++++++++++++++------------- 1 file changed, 176 insertions(+), 85 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index 82a7bd1800b2..01c3108d2e31 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -11,23 +11,21 @@ This optimization is more critical now as bigger and bigger physical memories (several GBs) are more readily available. Users can use the huge page support in Linux kernel by either using the mmap -system call or standard SYSv shared memory system calls (shmget, shmat). +system call or standard SYSV shared memory system calls (shmget, shmat). First the Linux kernel needs to be built with the CONFIG_HUGETLBFS (present under "File systems") and CONFIG_HUGETLB_PAGE (selected automatically when CONFIG_HUGETLBFS is selected) configuration options. -The kernel built with huge page support should show the number of configured -huge pages in the system by running the "cat /proc/meminfo" command. +The /proc/meminfo file provides information about the total number of +persistent hugetlb pages in the kernel's huge page pool. It also displays +information about the number of free, reserved and surplus huge pages and the +default huge page size. The huge page size is needed for generating the +proper alignment and size of the arguments to system calls that map huge page +regions. -/proc/meminfo also provides information about the total number of hugetlb -pages configured in the kernel. It also displays information about the -number of free hugetlb pages at any time. It also displays information about -the configured huge page size - this is needed for generating the proper -alignment and size of the arguments to the above system calls. - -The output of "cat /proc/meminfo" will have lines like: +The output of "cat /proc/meminfo" will include lines like: ..... HugePages_Total: vvv @@ -53,59 +51,63 @@ HugePages_Surp is short for "surplus," and is the number of huge pages in /proc/filesystems should also show a filesystem of type "hugetlbfs" configured in the kernel. -/proc/sys/vm/nr_hugepages indicates the current number of configured hugetlb -pages in the kernel. Super user can dynamically request more (or free some -pre-configured) huge pages. -The allocation (or deallocation) of hugetlb pages is possible only if there are -enough physically contiguous free pages in system (freeing of huge pages is -possible only if there are enough hugetlb pages free that can be transferred -back to regular memory pool). +/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge +pages in the kernel's huge page pool. "Persistent" huge pages will be +returned to the huge page pool when freed by a task. A user with root +privileges can dynamically allocate more or free some persistent huge pages +by increasing or decreasing the value of 'nr_hugepages'. -Pages that are used as hugetlb pages are reserved inside the kernel and cannot -be used for other purposes. +Pages that are used as huge pages are reserved inside the kernel and cannot +be used for other purposes. Huge pages cannot be swapped out under +memory pressure. -Once the kernel with Hugetlb page support is built and running, a user can -use either the mmap system call or shared memory system calls to start using -the huge pages. It is required that the system administrator preallocate -enough memory for huge page purposes. +Once a number of huge pages have been pre-allocated to the kernel huge page +pool, a user with appropriate privilege can use either the mmap system call +or shared memory system calls to use the huge pages. See the discussion of +Using Huge Pages, below. -The administrator can preallocate huge pages on the kernel boot command line by -specifying the "hugepages=N" parameter, where 'N' = the number of huge pages -requested. This is the most reliable method for preallocating huge pages as -memory has not yet become fragmented. +The administrator can allocate persistent huge pages on the kernel boot +command line by specifying the "hugepages=N" parameter, where 'N' = the +number of huge pages requested. This is the most reliable method of +allocating huge pages as memory has not yet become fragmented. -Some platforms support multiple huge page sizes. To preallocate huge pages +Some platforms support multiple huge page sizes. To allocate huge pages of a specific size, one must preceed the huge pages boot command parameters with a huge page size selection parameter "hugepagesz=". must be specified in bytes with optional scale suffix [kKmMgG]. The default huge page size may be selected with the "default_hugepagesz=" boot parameter. -/proc/sys/vm/nr_hugepages indicates the current number of configured [default -size] hugetlb pages in the kernel. Super user can dynamically request more -(or free some pre-configured) huge pages. - -Use the following command to dynamically allocate/deallocate default sized -huge pages: +When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages +indicates the current number of pre-allocated huge pages of the default size. +Thus, one can use the following command to dynamically allocate/deallocate +default sized persistent huge pages: echo 20 > /proc/sys/vm/nr_hugepages -This command will try to configure 20 default sized huge pages in the system. +This command will try to adjust the number of default sized huge pages in the +huge page pool to 20, allocating or freeing huge pages, as required. + On a NUMA platform, the kernel will attempt to distribute the huge page pool -over the all on-line nodes. These huge pages, allocated when nr_hugepages -is increased, are called "persistent huge pages". +over all the set of allowed nodes specified by the NUMA memory policy of the +task that modifies nr_hugepages. The default for the allowed nodes--when the +task has default memory policy--is all on-line nodes. Allowed nodes with +insufficient available, contiguous memory for a huge page will be silently +skipped when allocating persistent huge pages. See the discussion below of +the interaction of task memory policy, cpusets and per node attributes with +the allocation and freeing of persistent huge pages. The success or failure of huge page allocation depends on the amount of -physically contiguous memory that is preset in system at the time of the +physically contiguous memory that is present in system at the time of the allocation attempt. If the kernel is unable to allocate huge pages from some nodes in a NUMA system, it will attempt to make up the difference by allocating extra pages on other nodes with sufficient available contiguous memory, if any. -System administrators may want to put this command in one of the local rc init -files. This will enable the kernel to request huge pages early in the boot -process when the possibility of getting physical contiguous pages is still -very high. Administrators can verify the number of huge pages actually -allocated by checking the sysctl or meminfo. To check the per node +System administrators may want to put this command in one of the local rc +init files. This will enable the kernel to allocate huge pages early in +the boot process when the possibility of getting physical contiguous pages +is still very high. Administrators can verify the number of huge pages +actually allocated by checking the sysctl or meminfo. To check the per node distribution of huge pages in a NUMA system, use: cat /sys/devices/system/node/node*/meminfo | fgrep Huge @@ -113,45 +115,47 @@ distribution of huge pages in a NUMA system, use: /proc/sys/vm/nr_overcommit_hugepages specifies how large the pool of huge pages can grow, if more huge pages than /proc/sys/vm/nr_hugepages are requested by applications. Writing any non-zero value into this file -indicates that the hugetlb subsystem is allowed to try to obtain "surplus" -huge pages from the buddy allocator, when the normal pool is exhausted. As -these surplus huge pages go out of use, they are freed back to the buddy -allocator. +indicates that the hugetlb subsystem is allowed to try to obtain that +number of "surplus" huge pages from the kernel's normal page pool, when the +persistent huge page pool is exhausted. As these surplus huge pages become +unused, they are freed back to the kernel's normal page pool. -When increasing the huge page pool size via nr_hugepages, any surplus +When increasing the huge page pool size via nr_hugepages, any existing surplus pages will first be promoted to persistent huge pages. Then, additional huge pages will be allocated, if necessary and if possible, to fulfill -the new huge page pool size. +the new persistent huge page pool size. -The administrator may shrink the pool of preallocated huge pages for +The administrator may shrink the pool of persistent huge pages for the default huge page size by setting the nr_hugepages sysctl to a smaller value. The kernel will attempt to balance the freeing of huge pages -across all on-line nodes. Any free huge pages on the selected nodes will -be freed back to the buddy allocator. - -Caveat: Shrinking the pool via nr_hugepages such that it becomes less -than the number of huge pages in use will convert the balance to surplus -huge pages even if it would exceed the overcommit value. As long as -this condition holds, however, no more surplus huge pages will be -allowed on the system until one of the two sysctls are increased -sufficiently, or the surplus huge pages go out of use and are freed. +across all nodes in the memory policy of the task modifying nr_hugepages. +Any free huge pages on the selected nodes will be freed back to the kernel's +normal page pool. + +Caveat: Shrinking the persistent huge page pool via nr_hugepages such that +it becomes less than the number of huge pages in use will convert the balance +of the in-use huge pages to surplus huge pages. This will occur even if +the number of surplus pages it would exceed the overcommit value. As long as +this condition holds--that is, until nr_hugepages+nr_overcommit_hugepages is +increased sufficiently, or the surplus huge pages go out of use and are freed-- +no more surplus huge pages will be allowed to be allocated. With support for multiple huge page pools at run-time available, much of -the huge page userspace interface has been duplicated in sysfs. The above -information applies to the default huge page size which will be -controlled by the /proc interfaces for backwards compatibility. The root -huge page control directory in sysfs is: +the huge page userspace interface in /proc/sys/vm has been duplicated in sysfs. +The /proc interfaces discussed above have been retained for backwards +compatibility. The root huge page control directory in sysfs is: /sys/kernel/mm/hugepages For each huge page size supported by the running kernel, a subdirectory -will exist, of the form +will exist, of the form: hugepages-${size}kB Inside each of these directories, the same set of files will exist: nr_hugepages + nr_hugepages_mempolicy nr_overcommit_hugepages free_hugepages resv_hugepages @@ -159,6 +163,101 @@ Inside each of these directories, the same set of files will exist: which function as described above for the default huge page-sized case. + +Interaction of Task Memory Policy with Huge Page Allocation/Freeing + +Whether huge pages are allocated and freed via the /proc interface or +the /sysfs interface using the nr_hugepages_mempolicy attribute, the NUMA +nodes from which huge pages are allocated or freed are controlled by the +NUMA memory policy of the task that modifies the nr_hugepages_mempolicy +sysctl or attribute. When the nr_hugepages attribute is used, mempolicy +is ignored. + +The recommended method to allocate or free huge pages to/from the kernel +huge page pool, using the nr_hugepages example above, is: + + numactl --interleave echo 20 \ + >/proc/sys/vm/nr_hugepages_mempolicy + +or, more succinctly: + + numactl -m echo 20 >/proc/sys/vm/nr_hugepages_mempolicy + +This will allocate or free abs(20 - nr_hugepages) to or from the nodes +specified in , depending on whether number of persistent huge pages +is initially less than or greater than 20, respectively. No huge pages will be +allocated nor freed on any node not included in the specified . + +When adjusting the persistent hugepage count via nr_hugepages_mempolicy, any +memory policy mode--bind, preferred, local or interleave--may be used. The +resulting effect on persistent huge page allocation is as follows: + +1) Regardless of mempolicy mode [see Documentation/vm/numa_memory_policy.txt], + persistent huge pages will be distributed across the node or nodes + specified in the mempolicy as if "interleave" had been specified. + However, if a node in the policy does not contain sufficient contiguous + memory for a huge page, the allocation will not "fallback" to the nearest + neighbor node with sufficient contiguous memory. To do this would cause + undesirable imbalance in the distribution of the huge page pool, or + possibly, allocation of persistent huge pages on nodes not allowed by + the task's memory policy. + +2) One or more nodes may be specified with the bind or interleave policy. + If more than one node is specified with the preferred policy, only the + lowest numeric id will be used. Local policy will select the node where + the task is running at the time the nodes_allowed mask is constructed. + For local policy to be deterministic, the task must be bound to a cpu or + cpus in a single node. Otherwise, the task could be migrated to some + other node at any time after launch and the resulting node will be + indeterminate. Thus, local policy is not very useful for this purpose. + Any of the other mempolicy modes may be used to specify a single node. + +3) The nodes allowed mask will be derived from any non-default task mempolicy, + whether this policy was set explicitly by the task itself or one of its + ancestors, such as numactl. This means that if the task is invoked from a + shell with non-default policy, that policy will be used. One can specify a + node list of "all" with numactl --interleave or --membind [-m] to achieve + interleaving over all nodes in the system or cpuset. + +4) Any task mempolicy specifed--e.g., using numactl--will be constrained by + the resource limits of any cpuset in which the task runs. Thus, there will + be no way for a task with non-default policy running in a cpuset with a + subset of the system nodes to allocate huge pages outside the cpuset + without first moving to a cpuset that contains all of the desired nodes. + +5) Boot-time huge page allocation attempts to distribute the requested number + of huge pages over all on-lines nodes. + +Per Node Hugepages Attributes + +A subset of the contents of the root huge page control directory in sysfs, +described above, has been replicated under each "node" system device in: + + /sys/devices/system/node/node[0-9]*/hugepages/ + +Under this directory, the subdirectory for each supported huge page size +contains the following attribute files: + + nr_hugepages + free_hugepages + surplus_hugepages + +The free_' and surplus_' attribute files are read-only. They return the number +of free and surplus [overcommitted] huge pages, respectively, on the parent +node. + +The nr_hugepages attribute returns the total number of huge pages on the +specified node. When this attribute is written, the number of persistent huge +pages on the parent node will be adjusted to the specified value, if sufficient +resources exist, regardless of the task's mempolicy or cpuset constraints. + +Note that the number of overcommit and reserve pages remain global quantities, +as we don't know until fault time, when the faulting task's mempolicy is +applied, from which node the huge page allocation will be attempted. + + +Using Huge Pages + If the user applications are going to request huge pages using mmap system call, then it is required that system administrator mount a file system of type hugetlbfs: @@ -206,9 +305,11 @@ map_hugetlb.c. * requesting huge pages. * * For the ia64 architecture, the Linux kernel reserves Region number 4 for - * huge pages. That means the addresses starting with 0x800000... will need - * to be specified. Specifying a fixed address is not required on ppc64, - * i386 or x86_64. + * huge pages. That means that if one requires a fixed address, a huge page + * aligned address starting with 0x800000... will be required. If a fixed + * address is not required, the kernel will select an address in the proper + * range. + * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. * * Note: The default shared memory limit is quite low on many kernels, * you may need to increase it via: @@ -237,14 +338,8 @@ map_hugetlb.c. #define dprintf(x) printf(x) -/* Only ia64 requires this */ -#ifdef __ia64__ -#define ADDR (void *)(0x8000000000000000UL) -#define SHMAT_FLAGS (SHM_RND) -#else -#define ADDR (void *)(0x0UL) +#define ADDR (void *)(0x0UL) /* let kernel choose address */ #define SHMAT_FLAGS (0) -#endif int main(void) { @@ -302,10 +397,12 @@ int main(void) * example, the app is requesting memory of size 256MB that is backed by * huge pages. * - * For ia64 architecture, Linux kernel reserves Region number 4 for huge pages. - * That means the addresses starting with 0x800000... will need to be - * specified. Specifying a fixed address is not required on ppc64, i386 - * or x86_64. + * For the ia64 architecture, the Linux kernel reserves Region number 4 for + * huge pages. That means that if one requires a fixed address, a huge page + * aligned address starting with 0x800000... will be required. If a fixed + * address is not required, the kernel will select an address in the proper + * range. + * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. */ #include #include @@ -317,14 +414,8 @@ int main(void) #define LENGTH (256UL*1024*1024) #define PROTECTION (PROT_READ | PROT_WRITE) -/* Only ia64 requires this */ -#ifdef __ia64__ -#define ADDR (void *)(0x8000000000000000UL) -#define FLAGS (MAP_SHARED | MAP_FIXED) -#else -#define ADDR (void *)(0x0UL) +#define ADDR (void *)(0x0UL) /* let kernel choose address */ #define FLAGS (MAP_SHARED) -#endif void check_bytes(char *addr) { -- cgit v1.2.3 From 9b5e5d0fdc91b73bba8cf5e0fbe3521a953e4e4d Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 14 Dec 2009 17:58:32 -0800 Subject: hugetlb: use only nodes with memory for huge pages Register per node hstate sysfs attributes only for nodes with memory. Global replacement of 'all online nodes" with "all nodes with memory" in mm/hugetlb.c. Suggested by David Rientjes. A subsequent patch will handle adding/removing of per node hstate sysfs attributes when nodes transition to/from memoryless state via memory hotplug. NOTE: this patch has not been tested with memoryless nodes. Signed-off-by: Lee Schermerhorn Reviewed-by: Andi Kleen Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Randy Dunlap Cc: Nishanth Aravamudan Acked-by: David Rientjes Cc: Adam Litke Cc: Andy Whitcroft Cc: Eric Whitney Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/hugetlbpage.txt | 12 ++++++------ mm/hugetlb.c | 35 ++++++++++++++++++----------------- 2 files changed, 24 insertions(+), 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index 01c3108d2e31..6a8e4667ab38 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -90,11 +90,11 @@ huge page pool to 20, allocating or freeing huge pages, as required. On a NUMA platform, the kernel will attempt to distribute the huge page pool over all the set of allowed nodes specified by the NUMA memory policy of the task that modifies nr_hugepages. The default for the allowed nodes--when the -task has default memory policy--is all on-line nodes. Allowed nodes with -insufficient available, contiguous memory for a huge page will be silently -skipped when allocating persistent huge pages. See the discussion below of -the interaction of task memory policy, cpusets and per node attributes with -the allocation and freeing of persistent huge pages. +task has default memory policy--is all on-line nodes with memory. Allowed +nodes with insufficient available, contiguous memory for a huge page will be +silently skipped when allocating persistent huge pages. See the discussion +below of the interaction of task memory policy, cpusets and per node attributes +with the allocation and freeing of persistent huge pages. The success or failure of huge page allocation depends on the amount of physically contiguous memory that is present in system at the time of the @@ -226,7 +226,7 @@ resulting effect on persistent huge page allocation is as follows: without first moving to a cpuset that contains all of the desired nodes. 5) Boot-time huge page allocation attempts to distribute the requested number - of huge pages over all on-lines nodes. + of huge pages over all on-lines nodes with memory. Per Node Hugepages Attributes diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 544f7bcb615e..b4a263512cb7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -942,14 +942,14 @@ static void return_unused_surplus_pages(struct hstate *h, /* * We want to release as many surplus pages as possible, spread - * evenly across all nodes. Iterate across all nodes until we - * can no longer free unreserved surplus pages. This occurs when - * the nodes with surplus pages have no free pages. - * free_pool_huge_page() will balance the the frees across the - * on-line nodes for us and will handle the hstate accounting. + * evenly across all nodes with memory. Iterate across these nodes + * until we can no longer free unreserved surplus pages. This occurs + * when the nodes with surplus pages have no free pages. + * free_pool_huge_page() will balance the the freed pages across the + * on-line nodes with memory and will handle the hstate accounting. */ while (nr_pages--) { - if (!free_pool_huge_page(h, &node_online_map, 1)) + if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1)) break; } } @@ -1053,14 +1053,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, int __weak alloc_bootmem_huge_page(struct hstate *h) { struct huge_bootmem_page *m; - int nr_nodes = nodes_weight(node_online_map); + int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]); while (nr_nodes) { void *addr; addr = __alloc_bootmem_node_nopanic( NODE_DATA(hstate_next_node_to_alloc(h, - &node_online_map)), + &node_states[N_HIGH_MEMORY])), huge_page_size(h), huge_page_size(h), 0); if (addr) { @@ -1115,7 +1115,8 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) if (h->order >= MAX_ORDER) { if (!alloc_bootmem_huge_page(h)) break; - } else if (!alloc_fresh_huge_page(h, &node_online_map)) + } else if (!alloc_fresh_huge_page(h, + &node_states[N_HIGH_MEMORY])) break; } h->max_huge_pages = i; @@ -1388,7 +1389,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy, h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed); - if (nodes_allowed != &node_online_map) + if (nodes_allowed != &node_states[N_HIGH_MEMORY]) NODEMASK_FREE(nodes_allowed); return len; @@ -1610,7 +1611,7 @@ void hugetlb_unregister_node(struct node *node) struct node_hstate *nhs = &node_hstates[node->sysdev.id]; if (!nhs->hugepages_kobj) - return; + return; /* no hstate attributes */ for_each_hstate(h) if (nhs->hstate_kobjs[h - hstates]) { @@ -1675,15 +1676,15 @@ void hugetlb_register_node(struct node *node) } /* - * hugetlb init time: register hstate attributes for all registered - * node sysdevs. All on-line nodes should have registered their - * associated sysdev by the time the hugetlb module initializes. + * hugetlb init time: register hstate attributes for all registered node + * sysdevs of nodes that have memory. All on-line nodes should have + * registered their associated sysdev by this time. */ static void hugetlb_register_all_nodes(void) { int nid; - for (nid = 0; nid < nr_node_ids; nid++) { + for_each_node_state(nid, N_HIGH_MEMORY) { struct node *node = &node_devices[nid]; if (node->sysdev.id == nid) hugetlb_register_node(node); @@ -1777,8 +1778,8 @@ void __init hugetlb_add_hstate(unsigned order) h->free_huge_pages = 0; for (i = 0; i < MAX_NUMNODES; ++i) INIT_LIST_HEAD(&h->hugepage_freelists[i]); - h->next_nid_to_alloc = first_node(node_online_map); - h->next_nid_to_free = first_node(node_online_map); + h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]); + h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]); snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", huge_page_size(h)/1024); -- cgit v1.2.3 From 4faf8d950ec438c49ae4526b897c30f8a2cad741 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 14 Dec 2009 17:58:35 -0800 Subject: hugetlb: handle memory hot-plug events Register per node hstate attributes only for nodes with memory. As suggested by David Rientjes. With Memory Hotplug, memory can be added to a memoryless node and a node with memory can become memoryless. Therefore, add a memory on/off-line notifier callback to [un]register a node's attributes on transition to/from memoryless state. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn Reviewed-by: Andi Kleen Acked-by: David Rientjes Cc: KAMEZAWA Hiroyuki Cc: Lee Schermerhorn Cc: Mel Gorman Cc: Randy Dunlap Cc: Nishanth Aravamudan Cc: Adam Litke Cc: Andy Whitcroft Cc: Eric Whitney Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/hugetlbpage.txt | 3 ++- drivers/base/node.c | 53 ++++++++++++++++++++++++++++++++++++---- 2 files changed, 50 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index 6a8e4667ab38..bc31636973e3 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt @@ -231,7 +231,8 @@ resulting effect on persistent huge page allocation is as follows: Per Node Hugepages Attributes A subset of the contents of the root huge page control directory in sysfs, -described above, has been replicated under each "node" system device in: +described above, will be replicated under each the system device of each +NUMA node with memory in: /sys/devices/system/node/node[0-9]*/hugepages/ diff --git a/drivers/base/node.c b/drivers/base/node.c index f502711d28db..9e218a6d4a5b 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -177,8 +177,8 @@ static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL); /* * hugetlbfs per node attributes registration interface: * When/if hugetlb[fs] subsystem initializes [sometime after this module], - * it will register its per node attributes for all nodes online at that - * time. It will also call register_hugetlbfs_with_node(), below, to + * it will register its per node attributes for all online nodes with + * memory. It will also call register_hugetlbfs_with_node(), below, to * register its attribute registration functions with this node driver. * Once these hooks have been initialized, the node driver will call into * the hugetlb module to [un]register attributes for hot-plugged nodes. @@ -188,7 +188,8 @@ static node_registration_func_t __hugetlb_unregister_node; static inline void hugetlb_register_node(struct node *node) { - if (__hugetlb_register_node) + if (__hugetlb_register_node && + node_state(node->sysdev.id, N_HIGH_MEMORY)) __hugetlb_register_node(node); } @@ -233,6 +234,7 @@ int register_node(struct node *node, int num, struct node *parent) sysdev_create_file(&node->sysdev, &attr_distance); scan_unevictable_register_node(node); + hugetlb_register_node(node); } return error; @@ -254,7 +256,7 @@ void unregister_node(struct node *node) sysdev_remove_file(&node->sysdev, &attr_distance); scan_unevictable_unregister_node(node); - hugetlb_unregister_node(node); + hugetlb_unregister_node(node); /* no-op, if memoryless node */ sysdev_unregister(&node->sysdev); } @@ -384,8 +386,45 @@ static int link_mem_sections(int nid) } return err; } + +/* + * Handle per node hstate attribute [un]registration on transistions + * to/from memoryless state. + */ + +static int node_memory_callback(struct notifier_block *self, + unsigned long action, void *arg) +{ + struct memory_notify *mnb = arg; + int nid = mnb->status_change_nid; + + switch (action) { + case MEM_ONLINE: /* memory successfully brought online */ + if (nid != NUMA_NO_NODE) + hugetlb_register_node(&node_devices[nid]); + break; + case MEM_OFFLINE: /* or offline */ + if (nid != NUMA_NO_NODE) + hugetlb_unregister_node(&node_devices[nid]); + break; + case MEM_GOING_ONLINE: + case MEM_GOING_OFFLINE: + case MEM_CANCEL_ONLINE: + case MEM_CANCEL_OFFLINE: + default: + break; + } + + return NOTIFY_OK; +} #else static int link_mem_sections(int nid) { return 0; } + +static inline int node_memory_callback(struct notifier_block *self, + unsigned long action, void *arg) +{ + return NOTIFY_OK; +} #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ int register_one_node(int nid) @@ -499,13 +538,17 @@ static int node_states_init(void) return err; } +#define NODE_CALLBACK_PRI 2 /* lower than SLAB */ static int __init register_node_type(void) { int ret; ret = sysdev_class_register(&node_class); - if (!ret) + if (!ret) { ret = node_states_init(); + hotplug_memory_notifier(node_memory_callback, + NODE_CALLBACK_PRI); + } /* * Note: we're not going to unregister the node class if we fail -- cgit v1.2.3 From dee5d0d518defd0337a41f1a504428c9acc87be5 Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Mon, 14 Dec 2009 17:59:05 -0800 Subject: mm: add numa node symlink for memory section in sysfs Commit c04fc586c (mm: show node to memory section relationship with symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang Cc: Gary Hade Cc: Badari Pulavarty Cc: Ingo Molnar Acked-by: David Rientjes Cc: Greg KH Cc: Randy Dunlap Cc: David Rientjes Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ABI/testing/sysfs-devices-memory | 14 +++++++++++++- Documentation/memory-hotplug.txt | 11 +++++++---- drivers/base/node.c | 11 ++++++++++- 3 files changed, 30 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory index 9fe91c02ee40..bf1627b02a03 100644 --- a/Documentation/ABI/testing/sysfs-devices-memory +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -60,6 +60,19 @@ Description: Users: hotplug memory remove tools https://w3.opensource.ibm.com/projects/powerpc-utils/ + +What: /sys/devices/system/memoryX/nodeY +Date: October 2009 +Contact: Linux Memory Management list +Description: + When CONFIG_NUMA is enabled, a symbolic link that + points to the corresponding NUMA node directory. + + For example, the following symbolic link is created for + memory section 9 on node0: + /sys/devices/system/memory/memory9/node0 -> ../../node/node0 + + What: /sys/devices/system/node/nodeX/memoryY Date: September 2008 Contact: Gary Hade @@ -70,4 +83,3 @@ Description: memory section directory. For example, the following symbolic link is created for memory section 9 on node0. /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 - diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt index bbc8a6a36921..57e7e9cc1870 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/memory-hotplug.txt @@ -160,12 +160,15 @@ Under each section, you can see 4 files. NOTE: These directories/files appear after physical memory hotplug phase. -If CONFIG_NUMA is enabled the -/sys/devices/system/memory/memoryXXX memory section -directories can also be accessed via symbolic links located in -the /sys/devices/system/node/node* directories. For example: +If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed +via symbolic links located in the /sys/devices/system/node/node* directories. + +For example: /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 +A backlink will also be created: +/sys/devices/system/memory/memory9/node0 -> ../../node/node0 + -------------------------------- 4. Physical memory hot-add phase -------------------------------- diff --git a/drivers/base/node.c b/drivers/base/node.c index 54e5d8eaf70e..44eed11bbdf3 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -312,6 +312,7 @@ static int get_nid_for_pfn(unsigned long pfn) /* register memory section under specified node if it spans that node */ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid) { + int ret; unsigned long pfn, sect_start_pfn, sect_end_pfn; if (!mem_blk) @@ -328,9 +329,15 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid) continue; if (page_nid != nid) continue; - return sysfs_create_link_nowarn(&node_devices[nid].sysdev.kobj, + ret = sysfs_create_link_nowarn(&node_devices[nid].sysdev.kobj, &mem_blk->sysdev.kobj, kobject_name(&mem_blk->sysdev.kobj)); + if (ret) + return ret; + + return sysfs_create_link_nowarn(&mem_blk->sysdev.kobj, + &node_devices[nid].sysdev.kobj, + kobject_name(&node_devices[nid].sysdev.kobj)); } /* mem section does not span the specified node */ return 0; @@ -359,6 +366,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk) continue; sysfs_remove_link(&node_devices[nid].sysdev.kobj, kobject_name(&mem_blk->sysdev.kobj)); + sysfs_remove_link(&mem_blk->sysdev.kobj, + kobject_name(&node_devices[nid].sysdev.kobj)); } return 0; } -- cgit v1.2.3 From cba5dd7fa535b7684cba68e17ac8be5b0083dc3d Mon Sep 17 00:00:00 2001 From: Alex Chiang Date: Mon, 14 Dec 2009 17:59:09 -0800 Subject: Documentation: ABI: /sys/devices/system/cpu/cpu#/node Describe NUMA node symlink created for CPUs when CONFIG_NUMA is set. Signed-off-by: Alex Chiang Cc: Greg KH Cc: Randy Dunlap Cc: Gary Hade Cc: Badari Pulavarty Cc: Ingo Molnar Cc: David Rientjes Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ABI/testing/sysfs-devices-system-cpu | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 2aae06fcbed7..84a710f87c64 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -92,6 +92,20 @@ Description: Discover NUMA node a CPU belongs to /sys/devices/system/cpu/cpu42/node2 -> ../../node/node2 +What: /sys/devices/system/cpu/cpu#/node +Date: October 2009 +Contact: Linux memory management mailing list +Description: Discover NUMA node a CPU belongs to + + When CONFIG_NUMA is enabled, a symbolic link that points + to the corresponding NUMA node directory. + + For example, the following symlink is created for cpu42 + in NUMA node 2: + + /sys/devices/system/cpu/cpu42/node2 -> ../../node/node2 + + What: /sys/devices/system/cpu/cpu#/topology/core_id /sys/devices/system/cpu/cpu#/topology/core_siblings /sys/devices/system/cpu/cpu#/topology/core_siblings_list -- cgit v1.2.3 From d0f209f68f80f9a152799760c230019e7f270b2a Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Dec 2009 17:59:34 -0800 Subject: ksm: remove unswappable max_kernel_pages Now that ksm pages are swappable, and the known holes plugged, remove mention of unswappable kernel pages from KSM documentation and comments. Remove the totalram_pages/4 initialization of max_kernel_pages. In fact, remove max_kernel_pages altogether - we can reinstate it if removal turns out to break someone's script; but if we later want to limit KSM's memory usage, limiting the stable nodes would not be an effective approach. Signed-off-by: Hugh Dickins Cc: Izik Eidus Cc: Andrea Arcangeli Cc: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/ksm.txt | 22 +++++++--------------- mm/Kconfig | 2 +- mm/ksm.c | 41 ++--------------------------------------- 3 files changed, 10 insertions(+), 55 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index 262d8e6793a3..b392e496f816 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -16,9 +16,9 @@ by sharing the data common between them. But it can be useful to any application which generates many instances of the same data. KSM only merges anonymous (private) pages, never pagecache (file) pages. -KSM's merged pages are at present locked into kernel memory for as long -as they are shared: so cannot be swapped out like the user pages they -replace (but swapping KSM pages should follow soon in a later release). +KSM's merged pages were originally locked into kernel memory, but can now +be swapped out just like other user pages (but sharing is broken when they +are swapped back in: ksmd must rediscover their identity and merge again). KSM only operates on those areas of address space which an application has advised to be likely candidates for merging, by using the madvise(2) @@ -44,20 +44,12 @@ includes unmapped gaps (though working on the intervening mapped areas), and might fail with EAGAIN if not enough memory for internal structures. Applications should be considerate in their use of MADV_MERGEABLE, -restricting its use to areas likely to benefit. KSM's scans may use -a lot of processing power, and its kernel-resident pages are a limited -resource. Some installations will disable KSM for these reasons. +restricting its use to areas likely to benefit. KSM's scans may use a lot +of processing power: some installations will disable KSM for that reason. The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, readable by all but writable only by root: -max_kernel_pages - set to maximum number of kernel pages that KSM may use - e.g. "echo 100000 > /sys/kernel/mm/ksm/max_kernel_pages" - Value 0 imposes no limit on the kernel pages KSM may use; - but note that any process using MADV_MERGEABLE can cause - KSM to allocate these pages, unswappable until it exits. - Default: quarter of memory (chosen to not pin too much) - pages_to_scan - how many present pages to scan before ksmd goes to sleep e.g. "echo 100 > /sys/kernel/mm/ksm/pages_to_scan" Default: 100 (chosen for demonstration purposes) @@ -75,7 +67,7 @@ run - set 0 to stop ksmd from running but keep merged pages, The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: -pages_shared - how many shared unswappable kernel pages KSM is using +pages_shared - how many shared pages are being used pages_sharing - how many more sites are sharing them i.e. how much saved pages_unshared - how many pages unique but repeatedly checked for merging pages_volatile - how many pages changing too fast to be placed in a tree @@ -87,4 +79,4 @@ pages_volatile embraces several different kinds of activity, but a high proportion there would also indicate poor use of madvise MADV_MERGEABLE. Izik Eidus, -Hugh Dickins, 24 Sept 2009 +Hugh Dickins, 17 Nov 2009 diff --git a/mm/Kconfig b/mm/Kconfig index d4b5fff6ea09..2310984591ed 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -212,7 +212,7 @@ config KSM Enable Kernel Samepage Merging: KSM periodically scans those areas of an application's address space that an app has advised may be mergeable. When it finds pages of identical content, it replaces - the many instances by a single resident page with that content, so + the many instances by a single page with that content, so saving memory until one or another app needs to modify the content. Recommended for use with KVM, or with other duplicative applications. See Documentation/vm/ksm.txt for more information: KSM is inactive diff --git a/mm/ksm.c b/mm/ksm.c index d4c228a9d278..56a0da1f9979 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -179,9 +179,6 @@ static unsigned long ksm_pages_unshared; /* The number of rmap_items in use: to calculate pages_volatile */ static unsigned long ksm_rmap_items; -/* Limit on the number of unswappable pages used */ -static unsigned long ksm_max_kernel_pages; - /* Number of pages ksmd should scan in one batch */ static unsigned int ksm_thread_pages_to_scan = 100; @@ -943,14 +940,6 @@ static struct page *try_to_merge_two_pages(struct rmap_item *rmap_item, { int err; - /* - * The number of nodes in the stable tree - * is the number of kernel pages that we hold. - */ - if (ksm_max_kernel_pages && - ksm_max_kernel_pages <= ksm_pages_shared) - return NULL; - err = try_to_merge_with_ksm_page(rmap_item, page, NULL); if (!err) { err = try_to_merge_with_ksm_page(tree_rmap_item, @@ -1850,8 +1839,8 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr, /* * KSM_RUN_MERGE sets ksmd running, and 0 stops it running. * KSM_RUN_UNMERGE stops it running and unmerges all rmap_items, - * breaking COW to free the unswappable pages_shared (but leaves - * mm_slots on the list for when ksmd may be set running again). + * breaking COW to free the pages_shared (but leaves mm_slots + * on the list for when ksmd may be set running again). */ mutex_lock(&ksm_thread_mutex); @@ -1876,29 +1865,6 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr, } KSM_ATTR(run); -static ssize_t max_kernel_pages_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - int err; - unsigned long nr_pages; - - err = strict_strtoul(buf, 10, &nr_pages); - if (err) - return -EINVAL; - - ksm_max_kernel_pages = nr_pages; - - return count; -} - -static ssize_t max_kernel_pages_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sprintf(buf, "%lu\n", ksm_max_kernel_pages); -} -KSM_ATTR(max_kernel_pages); - static ssize_t pages_shared_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -1948,7 +1914,6 @@ static struct attribute *ksm_attrs[] = { &sleep_millisecs_attr.attr, &pages_to_scan_attr.attr, &run_attr.attr, - &max_kernel_pages_attr.attr, &pages_shared_attr.attr, &pages_sharing_attr.attr, &pages_unshared_attr.attr, @@ -1968,8 +1933,6 @@ static int __init ksm_init(void) struct task_struct *ksm_thread; int err; - ksm_max_kernel_pages = totalram_pages / 4; - err = ksm_slab_init(); if (err) goto out; -- cgit v1.2.3 From ea637639591def87a54cea811cbac796980cb30d Mon Sep 17 00:00:00 2001 From: Jie Zhang Date: Mon, 14 Dec 2009 18:00:02 -0800 Subject: nommu: fix malloc performance by adding uninitialized flag The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang Signed-off-by: Robin Getz Signed-off-by: Mike Frysinger Signed-off-by: David Howells Acked-by: Paul Mundt Acked-by: Greg Ungerer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/nommu-mmap.txt | 26 ++++++++++++++++++++++++++ fs/binfmt_elf_fdpic.c | 3 ++- include/asm-generic/mman-common.h | 5 +++++ init/Kconfig | 22 ++++++++++++++++++++++ mm/nommu.c | 8 +++++--- 5 files changed, 60 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt index b565e8279d13..8e1ddec2c78a 100644 --- a/Documentation/nommu-mmap.txt +++ b/Documentation/nommu-mmap.txt @@ -119,6 +119,32 @@ FURTHER NOTES ON NO-MMU MMAP granule but will only discard the excess if appropriately configured as this has an effect on fragmentation. + (*) The memory allocated by a request for an anonymous mapping will normally + be cleared by the kernel before being returned in accordance with the + Linux man pages (ver 2.22 or later). + + In the MMU case this can be achieved with reasonable performance as + regions are backed by virtual pages, with the contents only being mapped + to cleared physical pages when a write happens on that specific page + (prior to which, the pages are effectively mapped to the global zero page + from which reads can take place). This spreads out the time it takes to + initialize the contents of a page - depending on the write-usage of the + mapping. + + In the no-MMU case, however, anonymous mappings are backed by physical + pages, and the entire map is cleared at allocation time. This can cause + significant delays during a userspace malloc() as the C library does an + anonymous mapping and the kernel then does a memset for the entire map. + + However, for memory that isn't required to be precleared - such as that + returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to + indicate to the kernel that it shouldn't bother clearing the memory before + returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled + to permit this, otherwise the flag will be ignored. + + uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this + to allocate the brk and stack region. + (*) A list of all the private copy and anonymous mappings on the system is visible through /proc/maps in no-MMU mode. diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index 38502c67987c..79d2b1aa389f 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -380,7 +380,8 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm, down_write(¤t->mm->mmap_sem); current->mm->start_brk = do_mmap(NULL, 0, stack_size, PROT_READ | PROT_WRITE | PROT_EXEC, - MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, + MAP_PRIVATE | MAP_ANONYMOUS | + MAP_UNINITIALIZED | MAP_GROWSDOWN, 0); if (IS_ERR_VALUE(current->mm->start_brk)) { diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h index 5ee13b2fd223..20111265afd8 100644 --- a/include/asm-generic/mman-common.h +++ b/include/asm-generic/mman-common.h @@ -19,6 +19,11 @@ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x20 /* don't use a file */ +#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED +# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */ +#else +# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ +#endif #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff --git a/init/Kconfig b/init/Kconfig index 54c655ce9c04..a23da9f01803 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1079,6 +1079,28 @@ config SLOB endchoice +config MMAP_ALLOW_UNINITIALIZED + bool "Allow mmapped anonymous memory to be uninitialized" + depends on EMBEDDED && !MMU + default n + help + Normally, and according to the Linux spec, anonymous memory obtained + from mmap() has it's contents cleared before it is passed to + userspace. Enabling this config option allows you to request that + mmap() skip that if it is given an MAP_UNINITIALIZED flag, thus + providing a huge performance boost. If this option is not enabled, + then the flag will be ignored. + + This is taken advantage of by uClibc's malloc(), and also by + ELF-FDPIC binfmt's brk and stack allocator. + + Because of the obvious security issues, this option should only be + enabled on embedded devices where you control what is run in + userspace. Since that isn't generally a problem on no-MMU systems, + it is normally safe to say Y here. + + See Documentation/nommu-mmap.txt for more information. + config PROFILING bool "Profiling support (EXPERIMENTAL)" help diff --git a/mm/nommu.c b/mm/nommu.c index 9876fa0c3ad3..8687973462bb 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1143,9 +1143,6 @@ static int do_mmap_private(struct vm_area_struct *vma, if (ret < rlen) memset(base + ret, 0, rlen - ret); - } else { - /* if it's an anonymous mapping, then just clear it */ - memset(base, 0, rlen); } return 0; @@ -1343,6 +1340,11 @@ unsigned long do_mmap_pgoff(struct file *file, goto error_just_free; add_nommu_region(region); + /* clear anonymous mappings that don't ask for uninitialized data */ + if (!vma->vm_file && !(flags & MAP_UNINITIALIZED)) + memset((void *)region->vm_start, 0, + region->vm_end - region->vm_start); + /* okay... we have a mapping; now we have to register it */ result = vma->vm_start; -- cgit v1.2.3 From 4614a696bd1c3a9af3a08f0e5874830a85b889d4 Mon Sep 17 00:00:00 2001 From: john stultz Date: Mon, 14 Dec 2009 18:00:05 -0800 Subject: procfs: allow threads to rename siblings via /proc/pid/tasks/tid/comm Setting a thread's comm to be something unique is a very useful ability and is helpful for debugging complicated threaded applications. However currently the only way to set a thread name is for the thread to name itself via the PR_SET_NAME prctl. However, there may be situations where it would be advantageous for a thread dispatcher to be naming the threads its managing, rather then having the threads self-describe themselves. This sort of behavior is available on other systems via the pthread_setname_np() interface. This patch exports a task's comm via proc/pid/comm and proc/pid/task/tid/comm interfaces, and allows thread siblings to write to these values. [akpm@linux-foundation.org: cleanups] Signed-off-by: John Stultz Cc: Andi Kleen Cc: Arjan van de Ven Cc: Mike Fulton Cc: Sean Foley Cc: Darren Hart Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 9 +++++ fs/exec.c | 9 +++++ fs/proc/base.c | 68 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 94b9f2056f4c..220cc6376ef8 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -38,6 +38,7 @@ Table of Contents 3.3 /proc//io - Display the IO accounting fields 3.4 /proc//coredump_filter - Core dump filtering settings 3.5 /proc//mountinfo - Information about mounts + 3.6 /proc//comm & /proc//task//comm ------------------------------------------------------------------------------ @@ -1409,3 +1410,11 @@ For more information on mount propagation see: Documentation/filesystems/sharedsubtree.txt + +3.6 /proc//comm & /proc//task//comm +-------------------------------------------------------- +These files provide a method to access a tasks comm value. It also allows for +a task to set its own or one of its thread siblings comm value. The comm value +is limited in size compared to the cmdline value, so writing anything longer +then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated +comm value. diff --git a/fs/exec.c b/fs/exec.c index c0c636e34f60..623a5cc3076a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -923,6 +923,15 @@ char *get_task_comm(char *buf, struct task_struct *tsk) void set_task_comm(struct task_struct *tsk, char *buf) { task_lock(tsk); + + /* + * Threads may access current->comm without holding + * the task lock, so write the string carefully. + * Readers without a lock may see incomplete new + * names but are safe from non-terminating string reads. + */ + memset(tsk->comm, 0, TASK_COMM_LEN); + wmb(); strlcpy(tsk->comm, buf, sizeof(tsk->comm)); task_unlock(tsk); perf_event_comm(tsk); diff --git a/fs/proc/base.c b/fs/proc/base.c index af643b5aefe8..4df4a464a919 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1265,6 +1265,72 @@ static const struct file_operations proc_pid_sched_operations = { #endif +static ssize_t comm_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct task_struct *p; + char buffer[TASK_COMM_LEN]; + + memset(buffer, 0, sizeof(buffer)); + if (count > sizeof(buffer) - 1) + count = sizeof(buffer) - 1; + if (copy_from_user(buffer, buf, count)) + return -EFAULT; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + if (same_thread_group(current, p)) + set_task_comm(p, buffer); + else + count = -EINVAL; + + put_task_struct(p); + + return count; +} + +static int comm_show(struct seq_file *m, void *v) +{ + struct inode *inode = m->private; + struct task_struct *p; + + p = get_proc_task(inode); + if (!p) + return -ESRCH; + + task_lock(p); + seq_printf(m, "%s\n", p->comm); + task_unlock(p); + + put_task_struct(p); + + return 0; +} + +static int comm_open(struct inode *inode, struct file *filp) +{ + int ret; + + ret = single_open(filp, comm_show, NULL); + if (!ret) { + struct seq_file *m = filp->private_data; + + m->private = inode; + } + return ret; +} + +static const struct file_operations proc_pid_set_comm_operations = { + .open = comm_open, + .read = seq_read, + .write = comm_write, + .llseek = seq_lseek, + .release = single_release, +}; + /* * We added or removed a vma mapping the executable. The vmas are only mapped * during exec and are not mapped with the mmap system call. @@ -2504,6 +2570,7 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_SCHED_DEBUG REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations), #endif + REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), #ifdef CONFIG_HAVE_ARCH_TRACEHOOK INF("syscall", S_IRUSR, proc_pid_syscall), #endif @@ -2838,6 +2905,7 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_SCHED_DEBUG REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations), #endif + REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), #ifdef CONFIG_HAVE_ARCH_TRACEHOOK INF("syscall", S_IRUSR, proc_pid_syscall), #endif -- cgit v1.2.3 From 4eb174bee6f8623fed1af0072f1bebfc3b513a52 Mon Sep 17 00:00:00 2001 From: Michael Hennerich Date: Mon, 14 Dec 2009 18:00:15 -0800 Subject: ad525x_dpot: new driver for AD525x digital potentiometers This driver supports the non-volatile digital potentiometers via I2C: AD5258, AD5259, AD5251, AD5252, AD5253, AD5254, and AD5255 It provides a sysfs interface to each device for reading/writing which is documented in Documentation/misc-devices/ad525x_dpot.txt. Signed-off-by: Michael Hennerich Signed-off-by: Chris Verges Signed-off-by: Mike Frysinger Cc: Jean Delvare Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/misc-devices/ad525x_dpot.txt | 57 +++ drivers/misc/Kconfig | 14 + drivers/misc/Makefile | 1 + drivers/misc/ad525x_dpot.c | 666 +++++++++++++++++++++++++++++ 4 files changed, 738 insertions(+) create mode 100644 Documentation/misc-devices/ad525x_dpot.txt create mode 100644 drivers/misc/ad525x_dpot.c (limited to 'Documentation') diff --git a/Documentation/misc-devices/ad525x_dpot.txt b/Documentation/misc-devices/ad525x_dpot.txt new file mode 100644 index 000000000000..0c9413b1cbf3 --- /dev/null +++ b/Documentation/misc-devices/ad525x_dpot.txt @@ -0,0 +1,57 @@ +--------------------------------- + AD525x Digital Potentiometers +--------------------------------- + +The ad525x_dpot driver exports a simple sysfs interface. This allows you to +work with the immediate resistance settings as well as update the saved startup +settings. Access to the factory programmed tolerance is also provided, but +interpretation of this settings is required by the end application according to +the specific part in use. + +--------- + Files +--------- + +Each dpot device will have a set of eeprom, rdac, and tolerance files. How +many depends on the actual part you have, as will the range of allowed values. + +The eeprom files are used to program the startup value of the device. + +The rdac files are used to program the immediate value of the device. + +The tolerance files are the read-only factory programmed tolerance settings +and may vary greatly on a part-by-part basis. For exact interpretation of +this field, please consult the datasheet for your part. This is presented +as a hex file for easier parsing. + +----------- + Example +----------- + +Locate the device in your sysfs tree. This is probably easiest by going into +the common i2c directory and locating the device by the i2c slave address. + + # ls /sys/bus/i2c/devices/ + 0-0022 0-0027 0-002f + +So assuming the device in question is on the first i2c bus and has the slave +address of 0x2f, we descend (unrelated sysfs entries have been trimmed). + + # ls /sys/bus/i2c/devices/0-002f/ + eeprom0 rdac0 tolerance0 + +You can use simple reads/writes to access these files: + + # cd /sys/bus/i2c/devices/0-002f/ + + # cat eeprom0 + 0 + # echo 10 > eeprom0 + # cat eeprom0 + 10 + + # cat rdac0 + 5 + # echo 3 > rdac0 + # cat rdac0 + 3 diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 2c16ca6501d5..0c7f4c5dd07f 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -13,6 +13,20 @@ menuconfig MISC_DEVICES if MISC_DEVICES +config AD525X_DPOT + tristate "Analog Devices AD525x Digital Potentiometers" + depends on I2C && SYSFS + help + If you say yes here, you get support for the Analog Devices + AD5258, AD5259, AD5251, AD5252, AD5253, AD5254 and AD5255 + digital potentiometer chips. + + See Documentation/misc-devices/ad525x_dpot.txt for the + userspace interface. + + This driver can also be built as a module. If so, the module + will be called ad525x_dpot. + config ATMEL_PWM tristate "Atmel AT32/AT91 PWM support" depends on AVR32 || ARCH_AT91SAM9263 || ARCH_AT91SAM9RL || ARCH_AT91CAP9 diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 906a0edcea40..18c8418a5132 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -4,6 +4,7 @@ obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_HDPU_FEATURES) += hdpuftrs/ +obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o obj-$(CONFIG_ATMEL_PWM) += atmel_pwm.o obj-$(CONFIG_ATMEL_SSC) += atmel-ssc.o obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o diff --git a/drivers/misc/ad525x_dpot.c b/drivers/misc/ad525x_dpot.c new file mode 100644 index 000000000000..30a59f2bacd2 --- /dev/null +++ b/drivers/misc/ad525x_dpot.c @@ -0,0 +1,666 @@ +/* + * ad525x_dpot: Driver for the Analog Devices AD525x digital potentiometers + * Copyright (c) 2009 Analog Devices, Inc. + * Author: Michael Hennerich + * + * DEVID #Wipers #Positions Resistor Options (kOhm) + * AD5258 1 64 1, 10, 50, 100 + * AD5259 1 256 5, 10, 50, 100 + * AD5251 2 64 1, 10, 50, 100 + * AD5252 2 256 1, 10, 50, 100 + * AD5255 3 512 25, 250 + * AD5253 4 64 1, 10, 50, 100 + * AD5254 4 256 1, 10, 50, 100 + * + * See Documentation/misc-devices/ad525x_dpot.txt for more info. + * + * derived from ad5258.c + * Copyright (c) 2009 Cyber Switching, Inc. + * Author: Chris Verges + * + * derived from ad5252.c + * Copyright (c) 2006 Michael Hennerich + * + * Licensed under the GPL-2 or later. + */ + +#include +#include +#include +#include +#include +#include +#include + +#define DRIVER_NAME "ad525x_dpot" +#define DRIVER_VERSION "0.1" + +enum dpot_devid { + AD5258_ID, + AD5259_ID, + AD5251_ID, + AD5252_ID, + AD5253_ID, + AD5254_ID, + AD5255_ID, +}; + +#define AD5258_MAX_POSITION 64 +#define AD5259_MAX_POSITION 256 +#define AD5251_MAX_POSITION 64 +#define AD5252_MAX_POSITION 256 +#define AD5253_MAX_POSITION 64 +#define AD5254_MAX_POSITION 256 +#define AD5255_MAX_POSITION 512 + +#define AD525X_RDAC0 0 +#define AD525X_RDAC1 1 +#define AD525X_RDAC2 2 +#define AD525X_RDAC3 3 + +#define AD525X_REG_TOL 0x18 +#define AD525X_TOL_RDAC0 (AD525X_REG_TOL | AD525X_RDAC0) +#define AD525X_TOL_RDAC1 (AD525X_REG_TOL | AD525X_RDAC1) +#define AD525X_TOL_RDAC2 (AD525X_REG_TOL | AD525X_RDAC2) +#define AD525X_TOL_RDAC3 (AD525X_REG_TOL | AD525X_RDAC3) + +/* RDAC-to-EEPROM Interface Commands */ +#define AD525X_I2C_RDAC (0x00 << 5) +#define AD525X_I2C_EEPROM (0x01 << 5) +#define AD525X_I2C_CMD (0x80) + +#define AD525X_DEC_ALL_6DB (AD525X_I2C_CMD | (0x4 << 3)) +#define AD525X_INC_ALL_6DB (AD525X_I2C_CMD | (0x9 << 3)) +#define AD525X_DEC_ALL (AD525X_I2C_CMD | (0x6 << 3)) +#define AD525X_INC_ALL (AD525X_I2C_CMD | (0xB << 3)) + +static s32 ad525x_read(struct i2c_client *client, u8 reg); +static s32 ad525x_write(struct i2c_client *client, u8 reg, u8 value); + +/* + * Client data (each client gets its own) + */ + +struct dpot_data { + struct mutex update_lock; + unsigned rdac_mask; + unsigned max_pos; + unsigned devid; +}; + +/* sysfs functions */ + +static ssize_t sysfs_show_reg(struct device *dev, + struct device_attribute *attr, char *buf, u32 reg) +{ + struct i2c_client *client = to_i2c_client(dev); + struct dpot_data *data = i2c_get_clientdata(client); + s32 value; + + mutex_lock(&data->update_lock); + value = ad525x_read(client, reg); + mutex_unlock(&data->update_lock); + + if (value < 0) + return -EINVAL; + /* + * Let someone else deal with converting this ... + * the tolerance is a two-byte value where the MSB + * is a sign + integer value, and the LSB is a + * decimal value. See page 18 of the AD5258 + * datasheet (Rev. A) for more details. + */ + + if (reg & AD525X_REG_TOL) + return sprintf(buf, "0x%04x\n", value & 0xFFFF); + else + return sprintf(buf, "%u\n", value & data->rdac_mask); +} + +static ssize_t sysfs_set_reg(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count, u32 reg) +{ + struct i2c_client *client = to_i2c_client(dev); + struct dpot_data *data = i2c_get_clientdata(client); + unsigned long value; + int err; + + err = strict_strtoul(buf, 10, &value); + if (err) + return err; + + if (value > data->rdac_mask) + value = data->rdac_mask; + + mutex_lock(&data->update_lock); + ad525x_write(client, reg, value); + if (reg & AD525X_I2C_EEPROM) + msleep(26); /* Sleep while the EEPROM updates */ + mutex_unlock(&data->update_lock); + + return count; +} + +static ssize_t sysfs_do_cmd(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count, u32 reg) +{ + struct i2c_client *client = to_i2c_client(dev); + struct dpot_data *data = i2c_get_clientdata(client); + + mutex_lock(&data->update_lock); + ad525x_write(client, reg, 0); + mutex_unlock(&data->update_lock); + + return count; +} + +/* ------------------------------------------------------------------------- */ + +static ssize_t show_rdac0(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC0); +} + +static ssize_t set_rdac0(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_RDAC | AD525X_RDAC0); +} + +static DEVICE_ATTR(rdac0, S_IWUSR | S_IRUGO, show_rdac0, set_rdac0); + +static ssize_t show_eeprom0(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC0); +} + +static ssize_t set_eeprom0(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_EEPROM | AD525X_RDAC0); +} + +static DEVICE_ATTR(eeprom0, S_IWUSR | S_IRUGO, show_eeprom0, set_eeprom0); + +static ssize_t show_tolerance0(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, + AD525X_I2C_EEPROM | AD525X_TOL_RDAC0); +} + +static DEVICE_ATTR(tolerance0, S_IRUGO, show_tolerance0, NULL); + +/* ------------------------------------------------------------------------- */ + +static ssize_t show_rdac1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC1); +} + +static ssize_t set_rdac1(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_RDAC | AD525X_RDAC1); +} + +static DEVICE_ATTR(rdac1, S_IWUSR | S_IRUGO, show_rdac1, set_rdac1); + +static ssize_t show_eeprom1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC1); +} + +static ssize_t set_eeprom1(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_EEPROM | AD525X_RDAC1); +} + +static DEVICE_ATTR(eeprom1, S_IWUSR | S_IRUGO, show_eeprom1, set_eeprom1); + +static ssize_t show_tolerance1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, + AD525X_I2C_EEPROM | AD525X_TOL_RDAC1); +} + +static DEVICE_ATTR(tolerance1, S_IRUGO, show_tolerance1, NULL); + +/* ------------------------------------------------------------------------- */ + +static ssize_t show_rdac2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC2); +} + +static ssize_t set_rdac2(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_RDAC | AD525X_RDAC2); +} + +static DEVICE_ATTR(rdac2, S_IWUSR | S_IRUGO, show_rdac2, set_rdac2); + +static ssize_t show_eeprom2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC2); +} + +static ssize_t set_eeprom2(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_EEPROM | AD525X_RDAC2); +} + +static DEVICE_ATTR(eeprom2, S_IWUSR | S_IRUGO, show_eeprom2, set_eeprom2); + +static ssize_t show_tolerance2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, + AD525X_I2C_EEPROM | AD525X_TOL_RDAC2); +} + +static DEVICE_ATTR(tolerance2, S_IRUGO, show_tolerance2, NULL); + +/* ------------------------------------------------------------------------- */ + +static ssize_t show_rdac3(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC3); +} + +static ssize_t set_rdac3(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_RDAC | AD525X_RDAC3); +} + +static DEVICE_ATTR(rdac3, S_IWUSR | S_IRUGO, show_rdac3, set_rdac3); + +static ssize_t show_eeprom3(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC3); +} + +static ssize_t set_eeprom3(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_set_reg(dev, attr, buf, count, + AD525X_I2C_EEPROM | AD525X_RDAC3); +} + +static DEVICE_ATTR(eeprom3, S_IWUSR | S_IRUGO, show_eeprom3, set_eeprom3); + +static ssize_t show_tolerance3(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_show_reg(dev, attr, buf, + AD525X_I2C_EEPROM | AD525X_TOL_RDAC3); +} + +static DEVICE_ATTR(tolerance3, S_IRUGO, show_tolerance3, NULL); + +static struct attribute *ad525x_attributes_wipers[4][4] = { + { + &dev_attr_rdac0.attr, + &dev_attr_eeprom0.attr, + &dev_attr_tolerance0.attr, + NULL + }, { + &dev_attr_rdac1.attr, + &dev_attr_eeprom1.attr, + &dev_attr_tolerance1.attr, + NULL + }, { + &dev_attr_rdac2.attr, + &dev_attr_eeprom2.attr, + &dev_attr_tolerance2.attr, + NULL + }, { + &dev_attr_rdac3.attr, + &dev_attr_eeprom3.attr, + &dev_attr_tolerance3.attr, + NULL + } +}; + +static const struct attribute_group ad525x_group_wipers[] = { + {.attrs = ad525x_attributes_wipers[AD525X_RDAC0]}, + {.attrs = ad525x_attributes_wipers[AD525X_RDAC1]}, + {.attrs = ad525x_attributes_wipers[AD525X_RDAC2]}, + {.attrs = ad525x_attributes_wipers[AD525X_RDAC3]}, +}; + +/* ------------------------------------------------------------------------- */ + +static ssize_t set_inc_all(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_do_cmd(dev, attr, buf, count, AD525X_INC_ALL); +} + +static DEVICE_ATTR(inc_all, S_IWUSR, NULL, set_inc_all); + +static ssize_t set_dec_all(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_do_cmd(dev, attr, buf, count, AD525X_DEC_ALL); +} + +static DEVICE_ATTR(dec_all, S_IWUSR, NULL, set_dec_all); + +static ssize_t set_inc_all_6db(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_do_cmd(dev, attr, buf, count, AD525X_INC_ALL_6DB); +} + +static DEVICE_ATTR(inc_all_6db, S_IWUSR, NULL, set_inc_all_6db); + +static ssize_t set_dec_all_6db(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + return sysfs_do_cmd(dev, attr, buf, count, AD525X_DEC_ALL_6DB); +} + +static DEVICE_ATTR(dec_all_6db, S_IWUSR, NULL, set_dec_all_6db); + +static struct attribute *ad525x_attributes_commands[] = { + &dev_attr_inc_all.attr, + &dev_attr_dec_all.attr, + &dev_attr_inc_all_6db.attr, + &dev_attr_dec_all_6db.attr, + NULL +}; + +static const struct attribute_group ad525x_group_commands = { + .attrs = ad525x_attributes_commands, +}; + +/* ------------------------------------------------------------------------- */ + +/* i2c device functions */ + +/** + * ad525x_read - return the value contained in the specified register + * on the AD5258 device. + * @client: value returned from i2c_new_device() + * @reg: the register to read + * + * If the tolerance register is specified, 2 bytes are returned. + * Otherwise, 1 byte is returned. A negative value indicates an error + * occurred while reading the register. + */ +static s32 ad525x_read(struct i2c_client *client, u8 reg) +{ + struct dpot_data *data = i2c_get_clientdata(client); + + if ((reg & AD525X_REG_TOL) || (data->max_pos > 256)) + return i2c_smbus_read_word_data(client, (reg & 0xF8) | + ((reg & 0x7) << 1)); + else + return i2c_smbus_read_byte_data(client, reg); +} + +/** + * ad525x_write - store the given value in the specified register on + * the AD5258 device. + * @client: value returned from i2c_new_device() + * @reg: the register to write + * @value: the byte to store in the register + * + * For certain instructions that do not require a data byte, "NULL" + * should be specified for the "value" parameter. These instructions + * include NOP, RESTORE_FROM_EEPROM, and STORE_TO_EEPROM. + * + * A negative return value indicates an error occurred while reading + * the register. + */ +static s32 ad525x_write(struct i2c_client *client, u8 reg, u8 value) +{ + struct dpot_data *data = i2c_get_clientdata(client); + + /* Only write the instruction byte for certain commands */ + if (reg & AD525X_I2C_CMD) + return i2c_smbus_write_byte(client, reg); + + if (data->max_pos > 256) + return i2c_smbus_write_word_data(client, (reg & 0xF8) | + ((reg & 0x7) << 1), value); + else + /* All other registers require instruction + data bytes */ + return i2c_smbus_write_byte_data(client, reg, value); +} + +static int ad525x_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + struct device *dev = &client->dev; + struct dpot_data *data; + int err = 0; + + dev_dbg(dev, "%s\n", __func__); + + if (!i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_BYTE)) { + dev_err(dev, "missing I2C functionality for this driver\n"); + goto exit; + } + + data = kzalloc(sizeof(struct dpot_data), GFP_KERNEL); + if (!data) { + err = -ENOMEM; + goto exit; + } + + i2c_set_clientdata(client, data); + mutex_init(&data->update_lock); + + switch (id->driver_data) { + case AD5258_ID: + data->max_pos = AD5258_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + break; + case AD5259_ID: + data->max_pos = AD5259_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + break; + case AD5251_ID: + data->max_pos = AD5251_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5252_ID: + data->max_pos = AD5252_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5253_ID: + data->max_pos = AD5253_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC2]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5254_ID: + data->max_pos = AD5254_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC2]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5255_ID: + data->max_pos = AD5255_MAX_POSITION; + err = sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + err |= sysfs_create_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC2]); + err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands); + break; + default: + err = -ENODEV; + goto exit_free; + } + + if (err) { + dev_err(dev, "failed to register sysfs hooks\n"); + goto exit_free; + } + + data->devid = id->driver_data; + data->rdac_mask = data->max_pos - 1; + + dev_info(dev, "%s %d-Position Digital Potentiometer registered\n", + id->name, data->max_pos); + + return 0; + +exit_free: + kfree(data); + i2c_set_clientdata(client, NULL); +exit: + dev_err(dev, "failed to create client\n"); + return err; +} + +static int __devexit ad525x_remove(struct i2c_client *client) +{ + struct dpot_data *data = i2c_get_clientdata(client); + struct device *dev = &client->dev; + + switch (data->devid) { + case AD5258_ID: + case AD5259_ID: + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + break; + case AD5251_ID: + case AD5252_ID: + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + sysfs_remove_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5253_ID: + case AD5254_ID: + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC2]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC3]); + sysfs_remove_group(&dev->kobj, &ad525x_group_commands); + break; + case AD5255_ID: + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC0]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC1]); + sysfs_remove_group(&dev->kobj, + &ad525x_group_wipers[AD525X_RDAC2]); + sysfs_remove_group(&dev->kobj, &ad525x_group_commands); + break; + } + + i2c_set_clientdata(client, NULL); + kfree(data); + + return 0; +} + +static const struct i2c_device_id ad525x_idtable[] = { + {"ad5258", AD5258_ID}, + {"ad5259", AD5259_ID}, + {"ad5251", AD5251_ID}, + {"ad5252", AD5252_ID}, + {"ad5253", AD5253_ID}, + {"ad5254", AD5254_ID}, + {"ad5255", AD5255_ID}, + {} +}; + +MODULE_DEVICE_TABLE(i2c, ad525x_idtable); + +static struct i2c_driver ad525x_driver = { + .driver = { + .owner = THIS_MODULE, + .name = DRIVER_NAME, + }, + .id_table = ad525x_idtable, + .probe = ad525x_probe, + .remove = __devexit_p(ad525x_remove), +}; + +static int __init ad525x_init(void) +{ + return i2c_add_driver(&ad525x_driver); +} + +module_init(ad525x_init); + +static void __exit ad525x_exit(void) +{ + i2c_del_driver(&ad525x_driver); +} + +module_exit(ad525x_exit); + +MODULE_AUTHOR("Chris Verges , " + "Michael Hennerich , "); +MODULE_DESCRIPTION("AD5258/9 digital potentiometer driver"); +MODULE_LICENSE("GPL"); +MODULE_VERSION(DRIVER_VERSION); -- cgit v1.2.3 From 948c1e2521979c332b21b623414cf258150f214e Mon Sep 17 00:00:00 2001 From: Amerigo Wang Date: Mon, 14 Dec 2009 18:00:22 -0800 Subject: kallsyms: remove deprecated print_fn_descriptor_symbol() According to feature-removal-schedule.txt, it is the time to remove print_fn_descriptor_symbol(). And a quick grep shows that it no longer has any callers. Signed-off-by: WANG Cong Cc: Bjorn Helgaas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/feature-removal-schedule.txt | 9 --------- include/linux/kallsyms.h | 12 ------------ 2 files changed, 21 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index eb2c138c277c..21ab9357326d 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -291,15 +291,6 @@ Who: Michael Buesch --------------------------- -What: print_fn_descriptor_symbol() -When: October 2009 -Why: The %pF vsprintf format provides the same functionality in a - simpler way. print_fn_descriptor_symbol() is deprecated but - still present to give out-of-tree modules time to change. -Who: Bjorn Helgaas - ---------------------------- - What: /sys/o2cb symlink When: January 2010 Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 792274269f2b..d8e9b3d1c23c 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -107,18 +107,6 @@ static inline void print_symbol(const char *fmt, unsigned long addr) __builtin_extract_return_addr((void *)addr)); } -/* - * Pretty-print a function pointer. This function is deprecated. - * Please use the "%pF" vsprintf format instead. - */ -static inline void __deprecated print_fn_descriptor_symbol(const char *fmt, void *addr) -{ -#if defined(CONFIG_IA64) || defined(CONFIG_PPC64) - addr = *(void **)addr; -#endif - print_symbol(fmt, (unsigned long)addr); -} - static inline void print_ip_sym(unsigned long ip) { printk("[<%p>] %pS\n", (void *) ip, (void *) ip); -- cgit v1.2.3 From 41e9a062361de204d3710038925ae7f356ebb40d Mon Sep 17 00:00:00 2001 From: Daniel J Blueman Date: Mon, 14 Dec 2009 18:01:37 -0800 Subject: hwmon: w83627ehf updates Add control of fan minimum turn-on output levels, decoupling it from the fan turn-off output level. Add control of rate of change of fan output level. These in turn allow lower turn-off rotor speed and smoother transitions for better thermal and acoustic control authority. Add support for constant fan speed and proportional-response operations modes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Daniel J Blueman Cc: Jean Delvare Cc: David Hubbard Cc: Hans de Goede Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/hwmon/w83627ehf | 10 ++++-- drivers/hwmon/w83627ehf.c | 72 +++++++++++++++++++++++++++++++------------ 2 files changed, 61 insertions(+), 21 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf index 02b74899edaf..b7e42ec4b26b 100644 --- a/Documentation/hwmon/w83627ehf +++ b/Documentation/hwmon/w83627ehf @@ -81,8 +81,14 @@ pwm[1-4] - this file stores PWM duty cycle or DC value (fan speed) in range: 0 (stop) to 255 (full) pwm[1-4]_enable - this file controls mode of fan/temperature control: - * 1 Manual Mode, write to pwm file any value 0-255 (full speed) - * 2 Thermal Cruise + * 1 Manual mode, write to pwm file any value 0-255 (full speed) + * 2 "Thermal Cruise" mode + * 3 "Fan Speed Cruise" mode + * 4 "Smart Fan III" mode + +pwm[1-4]_mode - controls if output is PWM or DC level + * 0 DC output (0 - 12v) + * 1 PWM output Thermal Cruise mode ------------------- diff --git a/drivers/hwmon/w83627ehf.c b/drivers/hwmon/w83627ehf.c index bb5e78748783..0dcaba9b7189 100644 --- a/drivers/hwmon/w83627ehf.c +++ b/drivers/hwmon/w83627ehf.c @@ -5,6 +5,7 @@ Copyright (C) 2006 Yuan Mu (Winbond), Rudolf Marek David Hubbard + Daniel J Blueman Shamelessly ripped from the w83627hf driver Copyright (C) 2003 Mark Studebaker @@ -177,12 +178,15 @@ static const u16 W83627EHF_REG_TEMP_CONFIG[] = { 0x152, 0x252 }; #define W83627EHF_REG_ALARM3 0x45B /* SmartFan registers */ +#define W83627EHF_REG_FAN_STEPUP_TIME 0x0f +#define W83627EHF_REG_FAN_STEPDOWN_TIME 0x0e + /* DC or PWM output fan configuration */ static const u8 W83627EHF_REG_PWM_ENABLE[] = { 0x04, /* SYS FAN0 output mode and PWM mode */ 0x04, /* CPU FAN0 output mode and PWM mode */ 0x12, /* AUX FAN mode */ - 0x62, /* CPU fan1 mode */ + 0x62, /* CPU FAN1 mode */ }; static const u8 W83627EHF_PWM_MODE_SHIFT[] = { 0, 1, 0, 6 }; @@ -193,10 +197,12 @@ static const u8 W83627EHF_REG_PWM[] = { 0x01, 0x03, 0x11, 0x61 }; static const u8 W83627EHF_REG_TARGET[] = { 0x05, 0x06, 0x13, 0x63 }; static const u8 W83627EHF_REG_TOLERANCE[] = { 0x07, 0x07, 0x14, 0x62 }; - /* Advanced Fan control, some values are common for all fans */ -static const u8 W83627EHF_REG_FAN_MIN_OUTPUT[] = { 0x08, 0x09, 0x15, 0x64 }; -static const u8 W83627EHF_REG_FAN_STOP_TIME[] = { 0x0C, 0x0D, 0x17, 0x66 }; +static const u8 W83627EHF_REG_FAN_START_OUTPUT[] = { 0x0a, 0x0b, 0x16, 0x65 }; +static const u8 W83627EHF_REG_FAN_STOP_OUTPUT[] = { 0x08, 0x09, 0x15, 0x64 }; +static const u8 W83627EHF_REG_FAN_STOP_TIME[] = { 0x0c, 0x0d, 0x17, 0x66 }; +static const u8 W83627EHF_REG_FAN_MAX_OUTPUT[] = { 0xff, 0x67, 0xff, 0x69 }; +static const u8 W83627EHF_REG_FAN_STEP_OUTPUT[] = { 0xff, 0x68, 0xff, 0x6a }; /* * Conversions @@ -295,14 +301,19 @@ struct w83627ehf_data { u8 pwm_mode[4]; /* 0->DC variable voltage, 1->PWM variable duty cycle */ u8 pwm_enable[4]; /* 1->manual - 2->thermal cruise (also called SmartFan I) */ + 2->thermal cruise mode (also called SmartFan I) + 3->fan speed cruise mode + 4->variable thermal cruise (also called SmartFan III) */ u8 pwm_num; /* number of pwm */ u8 pwm[4]; u8 target_temp[4]; u8 tolerance[4]; - u8 fan_min_output[4]; /* minimum fan speed */ - u8 fan_stop_time[4]; + u8 fan_start_output[4]; /* minimum fan speed when spinning up */ + u8 fan_stop_output[4]; /* minimum fan speed when spinning down */ + u8 fan_stop_time[4]; /* time at minimum before disabling fan */ + u8 fan_max_output[4]; /* maximum fan speed */ + u8 fan_step_output[4]; /* rate of change output value */ u8 vid; u8 vrm; @@ -529,8 +540,10 @@ static struct w83627ehf_data *w83627ehf_update_device(struct device *dev) & 3) + 1; data->pwm[i] = w83627ehf_read_value(data, W83627EHF_REG_PWM[i]); - data->fan_min_output[i] = w83627ehf_read_value(data, - W83627EHF_REG_FAN_MIN_OUTPUT[i]); + data->fan_start_output[i] = w83627ehf_read_value(data, + W83627EHF_REG_FAN_START_OUTPUT[i]); + data->fan_stop_output[i] = w83627ehf_read_value(data, + W83627EHF_REG_FAN_STOP_OUTPUT[i]); data->fan_stop_time[i] = w83627ehf_read_value(data, W83627EHF_REG_FAN_STOP_TIME[i]); data->target_temp[i] = @@ -976,7 +989,7 @@ store_pwm_enable(struct device *dev, struct device_attribute *attr, u32 val = simple_strtoul(buf, NULL, 10); u16 reg; - if (!val || (val > 2)) /* only modes 1 and 2 are supported */ + if (!val || (val > 4)) return -EINVAL; mutex_lock(&data->update_lock); reg = w83627ehf_read_value(data, W83627EHF_REG_PWM_ENABLE[nr]); @@ -1118,7 +1131,10 @@ store_##reg(struct device *dev, struct device_attribute *attr, \ return count; \ } -fan_functions(fan_min_output, FAN_MIN_OUTPUT) +fan_functions(fan_start_output, FAN_START_OUTPUT) +fan_functions(fan_stop_output, FAN_STOP_OUTPUT) +fan_functions(fan_max_output, FAN_MAX_OUTPUT) +fan_functions(fan_step_output, FAN_STEP_OUTPUT) #define fan_time_functions(reg, REG) \ static ssize_t show_##reg(struct device *dev, struct device_attribute *attr, \ @@ -1161,8 +1177,14 @@ static DEVICE_ATTR(name, S_IRUGO, show_name, NULL); static struct sensor_device_attribute sda_sf3_arrays_fan4[] = { SENSOR_ATTR(pwm4_stop_time, S_IWUSR | S_IRUGO, show_fan_stop_time, store_fan_stop_time, 3), - SENSOR_ATTR(pwm4_min_output, S_IWUSR | S_IRUGO, show_fan_min_output, - store_fan_min_output, 3), + SENSOR_ATTR(pwm4_start_output, S_IWUSR | S_IRUGO, show_fan_start_output, + store_fan_start_output, 3), + SENSOR_ATTR(pwm4_stop_output, S_IWUSR | S_IRUGO, show_fan_stop_output, + store_fan_stop_output, 3), + SENSOR_ATTR(pwm4_max_output, S_IWUSR | S_IRUGO, show_fan_max_output, + store_fan_max_output, 3), + SENSOR_ATTR(pwm4_step_output, S_IWUSR | S_IRUGO, show_fan_step_output, + store_fan_step_output, 3), }; static struct sensor_device_attribute sda_sf3_arrays[] = { @@ -1172,12 +1194,24 @@ static struct sensor_device_attribute sda_sf3_arrays[] = { store_fan_stop_time, 1), SENSOR_ATTR(pwm3_stop_time, S_IWUSR | S_IRUGO, show_fan_stop_time, store_fan_stop_time, 2), - SENSOR_ATTR(pwm1_min_output, S_IWUSR | S_IRUGO, show_fan_min_output, - store_fan_min_output, 0), - SENSOR_ATTR(pwm2_min_output, S_IWUSR | S_IRUGO, show_fan_min_output, - store_fan_min_output, 1), - SENSOR_ATTR(pwm3_min_output, S_IWUSR | S_IRUGO, show_fan_min_output, - store_fan_min_output, 2), + SENSOR_ATTR(pwm1_start_output, S_IWUSR | S_IRUGO, show_fan_start_output, + store_fan_start_output, 0), + SENSOR_ATTR(pwm2_start_output, S_IWUSR | S_IRUGO, show_fan_start_output, + store_fan_start_output, 1), + SENSOR_ATTR(pwm3_start_output, S_IWUSR | S_IRUGO, show_fan_start_output, + store_fan_start_output, 2), + SENSOR_ATTR(pwm1_stop_output, S_IWUSR | S_IRUGO, show_fan_stop_output, + store_fan_stop_output, 0), + SENSOR_ATTR(pwm2_stop_output, S_IWUSR | S_IRUGO, show_fan_stop_output, + store_fan_stop_output, 1), + SENSOR_ATTR(pwm3_stop_output, S_IWUSR | S_IRUGO, show_fan_stop_output, + store_fan_stop_output, 2), + + /* pwm1 and pwm3 don't support max and step settings */ + SENSOR_ATTR(pwm2_max_output, S_IWUSR | S_IRUGO, show_fan_max_output, + store_fan_max_output, 1), + SENSOR_ATTR(pwm2_step_output, S_IWUSR | S_IRUGO, show_fan_step_output, + store_fan_step_output, 1), }; static ssize_t -- cgit v1.2.3 From bc62c1471773fc32adcfc05100abd16fa2b6e126 Mon Sep 17 00:00:00 2001 From: Éric Piel Date: Mon, 14 Dec 2009 18:01:39 -0800 Subject: lis3: update documentation and comments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Most of the documentation and comments were written when the driver was only supporting one type of chip, only via ACPI/HP. Update the info to the much clearer understanding that we have now. Signed-off-by: Éric Piel Signed-off-by: Samu Onkalo Cc: Pavel Machek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/hwmon/lis3lv02d | 24 ++++++++++++++---------- drivers/hwmon/Kconfig | 22 ++++++++++++---------- drivers/hwmon/lis3lv02d.c | 20 ++++++++++---------- drivers/hwmon/lis3lv02d.h | 28 ++++++++++++++++------------ 4 files changed, 52 insertions(+), 42 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/lis3lv02d b/Documentation/hwmon/lis3lv02d index effe949a7282..21f090252209 100644 --- a/Documentation/hwmon/lis3lv02d +++ b/Documentation/hwmon/lis3lv02d @@ -3,7 +3,8 @@ Kernel driver lis3lv02d Supported chips: - * STMicroelectronics LIS3LV02DL and LIS3LV02DQ + * STMicroelectronics LIS3LV02DL, LIS3LV02DQ (12 bits precision) + * STMicroelectronics LIS302DL, LIS3L02DQ, LIS331DL (8 bits) Authors: Yan Burman @@ -13,13 +14,12 @@ Authors: Description ----------- -This driver provides support for the accelerometer found in various HP -laptops sporting the feature officially called "HP Mobile Data -Protection System 3D" or "HP 3D DriveGuard". It detects automatically -laptops with this sensor. Known models (for now the HP 2133, nc6420, -nc2510, nc8510, nc84x0, nw9440 and nx9420) will have their axis -automatically oriented on standard way (eg: you can directly play -neverball). The accelerometer data is readable via +This driver provides support for the accelerometer found in various HP laptops +sporting the feature officially called "HP Mobile Data Protection System 3D" or +"HP 3D DriveGuard". It detects automatically laptops with this sensor. Known +models (full list can be found in drivers/hwmon/hp_accel.c) will have their +axis automatically oriented on standard way (eg: you can directly play +neverball). The accelerometer data is readable via /sys/devices/platform/lis3lv02d. Sysfs attributes under /sys/devices/platform/lis3lv02d/: @@ -33,12 +33,16 @@ rate - reports the sampling rate of the accelerometer device in HZ This driver also provides an absolute input class device, allowing the laptop to act as a pinball machine-esque joystick. +On HP laptops, if the led infrastructure is activated, support for a led +indicating disk protection will be provided as /sys/class/leds/hp::hddprotect. + Another feature of the driver is misc device called "freefall" that acts similar to /dev/rtc and reacts on free-fall interrupts received from the device. It supports blocking operations, poll/select and fasync operation modes. You must read 1 bytes from the device. The result is number of free-fall interrupts since the last successful -read (or 255 if number of interrupts would not fit). +read (or 255 if number of interrupts would not fit). See the hpfall.c +file for an example on using the device. Axes orientation @@ -55,7 +59,7 @@ the accelerometer are converted into a "standard" organisation of the axes * If the laptop is put upside-down, Z becomes negative If your laptop model is not recognized (cf "dmesg"), you can send an -email to the authors to add it to the database. When reporting a new +email to the maintainer to add it to the database. When reporting a new laptop, please include the output of "dmidecode" plus the value of /sys/devices/platform/lis3lv02d/position in these four cases. diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index 9e640c62ebd9..95ccbe377f9c 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1046,25 +1046,27 @@ config SENSORS_ATK0110 will be called asus_atk0110. config SENSORS_LIS3LV02D - tristate "STMicroeletronics LIS3LV02Dx three-axis digital accelerometer" + tristate "STMicroeletronics LIS3* three-axis digital accelerometer" depends on INPUT select INPUT_POLLDEV select NEW_LEDS select LEDS_CLASS default n help - This driver provides support for the LIS3LV02Dx accelerometer. In - particular, it can be found in a number of HP laptops, which have the - "Mobile Data Protection System 3D" or "3D DriveGuard" feature. On such - systems the driver should load automatically (via ACPI). The - accelerometer might also be found in other systems, connected via SPI - or I2C. The accelerometer data is readable via - /sys/devices/platform/lis3lv02d. + This driver provides support for the LIS3* accelerometers, such as the + LIS3LV02DL or the LIS331DL. In particular, it can be found in a number + of HP laptops, which have the "Mobile Data Protection System 3D" or + "3D DriveGuard" feature. On such systems the driver should load + automatically (via ACPI alias). The accelerometer might also be found + in other systems, connected via SPI or I2C. The accelerometer data is + readable via /sys/devices/platform/lis3lv02d. This driver also provides an absolute input class device, allowing - the laptop to act as a pinball machine-esque joystick. On HP laptops, + a laptop to act as a pinball machine-esque joystick. It provides also + a misc device which can be used to detect free-fall. On HP laptops, if the led infrastructure is activated, support for a led indicating - disk protection will be provided as hp:red:hddprotection. + disk protection will be provided as hp::hddprotect. For more + information on the feature, refer to Documentation/hwmon/lis3lv02d. This driver can also be built as modules. If so, the core module will be called lis3lv02d and a specific module for HP laptops will be diff --git a/drivers/hwmon/lis3lv02d.c b/drivers/hwmon/lis3lv02d.c index dbd0b055d4b9..1c8f10817e69 100644 --- a/drivers/hwmon/lis3lv02d.c +++ b/drivers/hwmon/lis3lv02d.c @@ -43,7 +43,7 @@ #define MDPS_POLL_INTERVAL 50 /* * The sensor can also generate interrupts (DRDY) but it's pretty pointless - * because their are generated even if the data do not change. So it's better + * because they are generated even if the data do not change. So it's better * to keep the interrupt for the free-fall event. The values are updated at * 40Hz (at the lowest frequency), but as it can be pretty time consuming on * some low processor, we poll the sensor only at 20Hz... enough for the @@ -65,7 +65,7 @@ static s16 lis3lv02d_read_8(struct lis3lv02d *lis3, int reg) return lo; } -static s16 lis3lv02d_read_16(struct lis3lv02d *lis3, int reg) +static s16 lis3lv02d_read_12(struct lis3lv02d *lis3, int reg) { u8 lo, hi; @@ -411,20 +411,20 @@ EXPORT_SYMBOL_GPL(lis3lv02d_remove_fs); /* * Initialise the accelerometer and the various subsystems. - * Should be rather independant of the bus system. + * Should be rather independent of the bus system. */ int lis3lv02d_init_device(struct lis3lv02d *dev) { dev->whoami = lis3lv02d_read_8(dev, WHO_AM_I); switch (dev->whoami) { - case LIS_DOUBLE_ID: - printk(KERN_INFO DRIVER_NAME ": 2-byte sensor found\n"); - dev->read_data = lis3lv02d_read_16; + case WAI_12B: + printk(KERN_INFO DRIVER_NAME ": 12 bits sensor found\n"); + dev->read_data = lis3lv02d_read_12; dev->mdps_max_val = 2048; break; - case LIS_SINGLE_ID: - printk(KERN_INFO DRIVER_NAME ": 1-byte sensor found\n"); + case WAI_8B: + printk(KERN_INFO DRIVER_NAME ": 8 bits sensor found\n"); dev->read_data = lis3lv02d_read_8; dev->mdps_max_val = 128; break; @@ -445,7 +445,7 @@ int lis3lv02d_init_device(struct lis3lv02d *dev) if (dev->pdata) { struct lis3lv02d_platform_data *p = dev->pdata; - if (p->click_flags && (dev->whoami == LIS_SINGLE_ID)) { + if (p->click_flags && (dev->whoami == WAI_8B)) { dev->write(dev, CLICK_CFG, p->click_flags); dev->write(dev, CLICK_TIMELIMIT, p->click_time_limit); dev->write(dev, CLICK_LATENCY, p->click_latency); @@ -456,7 +456,7 @@ int lis3lv02d_init_device(struct lis3lv02d *dev) (p->click_thresh_y << 4)); } - if (p->wakeup_flags && (dev->whoami == LIS_SINGLE_ID)) { + if (p->wakeup_flags && (dev->whoami == WAI_8B)) { dev->write(dev, FF_WU_CFG_1, p->wakeup_flags); dev->write(dev, FF_WU_THS_1, p->wakeup_thresh & 0x7f); /* default to 2.5ms for now */ diff --git a/drivers/hwmon/lis3lv02d.h b/drivers/hwmon/lis3lv02d.h index 3e1ff46f72d3..2431c5199534 100644 --- a/drivers/hwmon/lis3lv02d.h +++ b/drivers/hwmon/lis3lv02d.h @@ -2,7 +2,7 @@ * lis3lv02d.h - ST LIS3LV02DL accelerometer driver * * Copyright (C) 2007-2008 Yan Burman - * Copyright (C) 2008 Eric Piel + * Copyright (C) 2008-2009 Eric Piel * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -22,20 +22,18 @@ #include /* - * The actual chip is STMicroelectronics LIS3LV02DL or LIS3LV02DQ that seems to - * be connected via SPI. There exists also several similar chips (such as LIS302DL or - * LIS3L02DQ) and they have slightly different registers, but we can provide a - * common interface for all of them. - * They can also be connected via I²C. + * This driver tries to support the "digital" accelerometer chips from + * STMicroelectronics such as LIS3LV02DL, LIS302DL, LIS3L02DQ, LIS331DL, + * LIS35DE, or LIS202DL. They are very similar in terms of programming, with + * almost the same registers. In addition to differing on physical properties, + * they differ on the number of axes (2/3), precision (8/12 bits), and special + * features (freefall detection, click...). Unfortunately, not all the + * differences can be probed via a register. + * They can be connected either via I²C or SPI. */ #include -/* 2-byte registers */ -#define LIS_DOUBLE_ID 0x3A /* LIS3LV02D[LQ] */ -/* 1-byte registers */ -#define LIS_SINGLE_ID 0x3B /* LIS[32]02DL and others */ - enum lis3_reg { WHO_AM_I = 0x0F, OFFSET_X = 0x16, @@ -94,6 +92,12 @@ enum lis3lv02d_reg { DD_THSE_H = 0x3F, }; +enum lis3_who_am_i { + WAI_12B = 0x3A, /* 12 bits: LIS3LV02D[LQ]... */ + WAI_8B = 0x3B, /* 8 bits: LIS[23]02D[LQ]... */ + WAI_6B = 0x52, /* 6 bits: LIS331DLF - not supported */ +}; + enum lis3lv02d_ctrl1 { CTRL1_Xen = 0x01, CTRL1_Yen = 0x02, @@ -194,7 +198,7 @@ struct lis3lv02d { int (*write) (struct lis3lv02d *lis3, int reg, u8 val); int (*read) (struct lis3lv02d *lis3, int reg, u8 *ret); - u8 whoami; /* 3Ah: 2-byte registries, 3Bh: 1-byte registries */ + u8 whoami; /* indicates measurement precision */ s16 (*read_data) (struct lis3lv02d *lis3, int reg); int mdps_max_val; -- cgit v1.2.3 From e956e6b77054f91580deb81ba6389cc8c8ce67ef Mon Sep 17 00:00:00 2001 From: Samu Onkalo Date: Mon, 14 Dec 2009 18:01:46 -0800 Subject: lis3: update documentation to match latest changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [akpm@linux-foundation.org: s/q/g/ (Randy)] Signed-off-by: Samu Onkalo Acked-by: Éric Piel Cc: Pavel Machek Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/hwmon/lis3lv02d | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/lis3lv02d b/Documentation/hwmon/lis3lv02d index 21f090252209..06534f25e643 100644 --- a/Documentation/hwmon/lis3lv02d +++ b/Documentation/hwmon/lis3lv02d @@ -20,18 +20,35 @@ sporting the feature officially called "HP Mobile Data Protection System 3D" or models (full list can be found in drivers/hwmon/hp_accel.c) will have their axis automatically oriented on standard way (eg: you can directly play neverball). The accelerometer data is readable via -/sys/devices/platform/lis3lv02d. +/sys/devices/platform/lis3lv02d. Reported values are scaled +to mg values (1/1000th of earth gravity). Sysfs attributes under /sys/devices/platform/lis3lv02d/: position - 3D position that the accelerometer reports. Format: "(x,y,z)" -calibrate - read: values (x, y, z) that are used as the base for input - class device operation. - write: forces the base to be recalibrated with the current - position. -rate - reports the sampling rate of the accelerometer device in HZ +rate - read reports the sampling rate of the accelerometer device in HZ. + write changes sampling rate of the accelerometer device. + Only values which are supported by HW are accepted. +selftest - performs selftest for the chip as specified by chip manufacturer. This driver also provides an absolute input class device, allowing -the laptop to act as a pinball machine-esque joystick. +the laptop to act as a pinball machine-esque joystick. Joystick device can be +calibrated. Joystick device can be in two different modes. +By default output values are scaled between -32768 .. 32767. In joystick raw +mode, joystick and sysfs position entry have the same scale. There can be +small difference due to input system fuzziness feature. +Events are also available as input event device. + +Selftest is meant only for hardware diagnostic purposes. It is not meant to be +used during normal operations. Position data is not corrupted during selftest +but interrupt behaviour is not guaranteed to work reliably. In test mode, the +sensing element is internally moved little bit. Selftest measures difference +between normal mode and test mode. Chip specifications tell the acceptance +limit for each type of the chip. Limits are provided via platform data +to allow adjustment of the limits without a change to the actual driver. +Seltest returns either "OK x y z" or "FAIL x y z" where x, y and z are +measured difference between modes. Axes are not remapped in selftest mode. +Measurement values are provided to help HW diagnostic applications to make +final decision. On HP laptops, if the led infrastructure is activated, support for a led indicating disk protection will be provided as /sys/class/leds/hp::hddprotect. -- cgit v1.2.3 From eac8ea536aded07004bde917f05a2329902c64b0 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Fri, 27 Nov 2009 13:56:50 -0300 Subject: V4L/DVB (13549): v4l: Add video_device_node_name function Many drivers access the device number (video_device::v4l2_devnode::num) in order to print the video device node name. Add and use a helper function to retrieve the video_device node name. Signed-off-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/v4l2-framework.txt | 16 ++++++++++++++-- drivers/media/video/v4l2-dev.c | 4 ++-- include/media/v4l2-dev.h | 5 +++++ 3 files changed, 21 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/video4linux/v4l2-framework.txt b/Documentation/video4linux/v4l2-framework.txt index b806edaf3e75..74d677c8b036 100644 --- a/Documentation/video4linux/v4l2-framework.txt +++ b/Documentation/video4linux/v4l2-framework.txt @@ -561,6 +561,8 @@ video_device helper functions There are a few useful helper functions: +- file/video_device private data + You can set/get driver private data in the video_device struct using: void *video_get_drvdata(struct video_device *vdev); @@ -575,8 +577,7 @@ struct video_device *video_devdata(struct file *file); returns the video_device belonging to the file struct. -The final helper function combines video_get_drvdata with -video_devdata: +The video_drvdata function combines video_get_drvdata with video_devdata: void *video_drvdata(struct file *file); @@ -584,6 +585,17 @@ You can go from a video_device struct to the v4l2_device struct using: struct v4l2_device *v4l2_dev = vdev->v4l2_dev; +- Device node name + +The video_device node kernel name can be retrieved using + +const char *video_device_node_name(struct video_device *vdev); + +The name is used as a hint by userspace tools such as udev. The function +should be used where possible instead of accessing the video_device::num and +video_device::minor fields. + + video buffer helper functions ----------------------------- diff --git a/drivers/media/video/v4l2-dev.c b/drivers/media/video/v4l2-dev.c index 500cbe9891ac..aa3e0f9aa3bf 100644 --- a/drivers/media/video/v4l2-dev.c +++ b/drivers/media/video/v4l2-dev.c @@ -551,8 +551,8 @@ static int __video_register_device(struct video_device *vdev, int type, int nr, vdev->dev.release = v4l2_device_release; if (nr != -1 && nr != vdev->num && warn_if_nr_in_use) - printk(KERN_WARNING "%s: requested %s%d, got %s%d\n", - __func__, name_base, nr, name_base, vdev->num); + printk(KERN_WARNING "%s: requested %s%d, got %s\n", __func__, + name_base, nr, video_device_node_name(vdev)); /* Part 5: Activate this minor. The char device can now be used. */ mutex_lock(&videodev_lock); diff --git a/include/media/v4l2-dev.h b/include/media/v4l2-dev.h index 73c9867d744c..9135f57351f6 100644 --- a/include/media/v4l2-dev.h +++ b/include/media/v4l2-dev.h @@ -141,6 +141,11 @@ static inline void *video_drvdata(struct file *file) return video_get_drvdata(video_devdata(file)); } +static inline const char *video_device_node_name(struct video_device *vdev) +{ + return dev_name(&vdev->dev); +} + static inline int video_is_unregistered(struct video_device *vdev) { return test_bit(V4L2_FL_UNREGISTERED, &vdev->flags); -- cgit v1.2.3 From 116dc5c50aff62064bfcceaa15abf44e9277578a Mon Sep 17 00:00:00 2001 From: Jean-Francois Moine Date: Tue, 1 Dec 2009 07:20:34 -0300 Subject: V4L/DVB (13562): gspca - doc: Update webcam list. Signed-off-by: Jean-Francois Moine Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/gspca.txt | 34 ++++++++++++++++++++++++---------- 1 file changed, 24 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt index 319d9838e87e..1800a62cf135 100644 --- a/Documentation/video4linux/gspca.txt +++ b/Documentation/video4linux/gspca.txt @@ -12,6 +12,7 @@ m5602 0402:5602 ALi Video Camera Controller spca501 040a:0002 Kodak DVC-325 spca500 040a:0300 Kodak EZ200 zc3xx 041e:041e Creative WebCam Live! +ov519 041e:4003 Video Blaster WebCam Go Plus spca500 041e:400a Creative PC-CAM 300 sunplus 041e:400b Creative PC-CAM 600 sunplus 041e:4012 PC-Cam350 @@ -168,10 +169,14 @@ sunplus 055f:c650 Mustek MDC5500Z zc3xx 055f:d003 Mustek WCam300A zc3xx 055f:d004 Mustek WCam300 AN conex 0572:0041 Creative Notebook cx11646 +ov519 05a9:0511 Video Blaster WebCam 3/WebCam Plus, D-Link USB Digital Video Camera +ov519 05a9:0518 Creative WebCam ov519 05a9:0519 OV519 Microphone ov519 05a9:0530 OmniVision +ov519 05a9:2800 OmniVision SuperCAM ov519 05a9:4519 Webcam Classic ov519 05a9:8519 OmniVision +ov519 05a9:a511 D-Link USB Digital Video Camera ov519 05a9:a518 D-Link DSB-C310 Webcam sunplus 05da:1018 Digital Dream Enigma 1.3 stk014 05e1:0893 Syntek DV4000 @@ -187,7 +192,7 @@ ov534 06f8:3002 Hercules Blog Webcam ov534 06f8:3003 Hercules Dualpix HD Weblog sonixj 06f8:3004 Hercules Classic Silver sonixj 06f8:3008 Hercules Deluxe Optical Glass -pac7311 06f8:3009 Hercules Classic Link +pac7302 06f8:3009 Hercules Classic Link spca508 0733:0110 ViewQuest VQ110 spca501 0733:0401 Intel Create and Share spca501 0733:0402 ViewQuest M318B @@ -199,6 +204,7 @@ sunplus 0733:2221 Mercury Digital Pro 3.1p sunplus 0733:3261 Concord 3045 spca536a sunplus 0733:3281 Cyberpix S550V spca506 0734:043b 3DeMon USB Capture aka +ov519 0813:0002 Dual Mode USB Camera Plus spca500 084d:0003 D-Link DSC-350 spca500 08ca:0103 Aiptek PocketDV sunplus 08ca:0104 Aiptek PocketDVII 1.3 @@ -236,15 +242,15 @@ pac7311 093a:2603 Philips SPC 500 NC pac7311 093a:2608 Trust WB-3300p pac7311 093a:260e Gigaware VGA PC Camera, Trust WB-3350p, SIGMA cam 2350 pac7311 093a:260f SnakeCam -pac7311 093a:2620 Apollo AC-905 -pac7311 093a:2621 PAC731x -pac7311 093a:2622 Genius Eye 312 -pac7311 093a:2624 PAC7302 -pac7311 093a:2626 Labtec 2200 -pac7311 093a:2628 Genius iLook 300 -pac7311 093a:2629 Genious iSlim 300 -pac7311 093a:262a Webcam 300k -pac7311 093a:262c Philips SPC 230 NC +pac7302 093a:2620 Apollo AC-905 +pac7302 093a:2621 PAC731x +pac7302 093a:2622 Genius Eye 312 +pac7302 093a:2624 PAC7302 +pac7302 093a:2626 Labtec 2200 +pac7302 093a:2628 Genius iLook 300 +pac7302 093a:2629 Genious iSlim 300 +pac7302 093a:262a Webcam 300k +pac7302 093a:262c Philips SPC 230 NC jeilinj 0979:0280 Sakar 57379 zc3xx 0ac8:0302 Z-star Vimicro zc0302 vc032x 0ac8:0321 Vimicro generic vc0321 @@ -259,6 +265,7 @@ vc032x 0ac8:c002 Sony embedded vimicro vc032x 0ac8:c301 Samsung Q1 Ultra Premium spca508 0af9:0010 Hama USB Sightcam 100 spca508 0af9:0011 Hama USB Sightcam 100 +ov519 0b62:0059 iBOT2 Webcam sonixb 0c45:6001 Genius VideoCAM NB sonixb 0c45:6005 Microdia Sweex Mini Webcam sonixb 0c45:6007 Sonix sn9c101 + Tas5110D @@ -318,8 +325,10 @@ sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655) sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660) sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R) sunplus 0d64:0303 Sunplus FashionCam DXG +ov519 0e96:c001 TRUST 380 USB2 SPACEC@M etoms 102c:6151 Qcam Sangha CIF etoms 102c:6251 Qcam xxxxxx VGA +ov519 1046:9967 W9967CF/W9968CF WebCam IC, Video Blaster WebCam Go zc3xx 10fd:0128 Typhoon Webshot II USB 300k 0x0128 spca561 10fd:7e50 FlyCam Usb 100 zc3xx 10fd:8050 Typhoon Webshot II USB 300k @@ -332,7 +341,12 @@ spca501 1776:501c Arowana 300K CMOS Camera t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC pac207 2001:f115 D-Link DSB-C120 +sq905c 2770:9050 sq905c +sq905c 2770:905c DualCamera +sq905 2770:9120 Argus Digital Camera DC1512 +sq905c 2770:913d sq905c spca500 2899:012c Toptro Industrial +ov519 8020:ef04 ov519 spca508 8086:0110 Intel Easy PC Camera spca500 8086:0630 Intel Pocket PC Camera spca506 99fa:8988 Grandtec V.cap -- cgit v1.2.3 From 007701e2c4de94109864c8e85e39b6d8fc474170 Mon Sep 17 00:00:00 2001 From: Muralidharan Karicheri Date: Thu, 3 Dec 2009 01:13:17 -0300 Subject: V4L/DVB (13572): v4l2-spec: Digital Video Timings API documentation This patch updates the v4l2-dvb documentation for the new video timings API added. Also updated the document based on comments from Hans Verkuil. Signed-off-by: Muralidharan Karicheri Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/media-entities.tmpl | 18 ++ Documentation/DocBook/media-indices.tmpl | 4 + Documentation/DocBook/v4l/common.xml | 35 +++ Documentation/DocBook/v4l/v4l2.xml | 4 + Documentation/DocBook/v4l/videodev2.h.xml | 116 +++++++++- .../DocBook/v4l/vidioc-enum-dv-presets.xml | 238 +++++++++++++++++++++ Documentation/DocBook/v4l/vidioc-enuminput.xml | 36 +++- Documentation/DocBook/v4l/vidioc-enumoutput.xml | 36 +++- Documentation/DocBook/v4l/vidioc-g-dv-preset.xml | 111 ++++++++++ Documentation/DocBook/v4l/vidioc-g-dv-timings.xml | 224 +++++++++++++++++++ .../DocBook/v4l/vidioc-query-dv-preset.xml | 85 ++++++++ 11 files changed, 903 insertions(+), 4 deletions(-) create mode 100644 Documentation/DocBook/v4l/vidioc-enum-dv-presets.xml create mode 100644 Documentation/DocBook/v4l/vidioc-g-dv-preset.xml create mode 100644 Documentation/DocBook/v4l/vidioc-g-dv-timings.xml create mode 100644 Documentation/DocBook/v4l/vidioc-query-dv-preset.xml (limited to 'Documentation') diff --git a/Documentation/DocBook/media-entities.tmpl b/Documentation/DocBook/media-entities.tmpl index bb5ab741220e..c725cb852c54 100644 --- a/Documentation/DocBook/media-entities.tmpl +++ b/Documentation/DocBook/media-entities.tmpl @@ -23,6 +23,7 @@ VIDIOC_ENUMINPUT"> VIDIOC_ENUMOUTPUT"> VIDIOC_ENUMSTD"> +VIDIOC_ENUM_DV_PRESETS"> VIDIOC_ENUM_FMT"> VIDIOC_ENUM_FRAMEINTERVALS"> VIDIOC_ENUM_FRAMESIZES"> @@ -30,6 +31,8 @@ VIDIOC_G_AUDOUT"> VIDIOC_G_CROP"> VIDIOC_G_CTRL"> +VIDIOC_G_DV_PRESET"> +VIDIOC_G_DV_TIMINGS"> VIDIOC_G_ENC_INDEX"> VIDIOC_G_EXT_CTRLS"> VIDIOC_G_FBUF"> @@ -53,6 +56,7 @@ VIDIOC_QUERYCTRL"> VIDIOC_QUERYMENU"> VIDIOC_QUERYSTD"> +VIDIOC_QUERY_DV_PRESET"> VIDIOC_REQBUFS"> VIDIOC_STREAMOFF"> VIDIOC_STREAMON"> @@ -60,6 +64,8 @@ VIDIOC_S_AUDOUT"> VIDIOC_S_CROP"> VIDIOC_S_CTRL"> +VIDIOC_S_DV_PRESET"> +VIDIOC_S_DV_TIMINGS"> VIDIOC_S_EXT_CTRLS"> VIDIOC_S_FBUF"> VIDIOC_S_FMT"> @@ -118,6 +124,7 @@ v4l2_audio"> v4l2_audioout"> +v4l2_bt_timings"> v4l2_buffer"> v4l2_capability"> v4l2_captureparm"> @@ -128,6 +135,9 @@ v4l2_dbg_chip_ident"> v4l2_dbg_match"> v4l2_dbg_register"> +v4l2_dv_enum_preset"> +v4l2_dv_preset"> +v4l2_dv_timings"> v4l2_enc_idx"> v4l2_enc_idx_entry"> v4l2_encoder_cmd"> @@ -243,6 +253,10 @@ + + + + @@ -333,6 +347,10 @@ + + + + diff --git a/Documentation/DocBook/media-indices.tmpl b/Documentation/DocBook/media-indices.tmpl index 9e30a236d74f..78d6031de001 100644 --- a/Documentation/DocBook/media-indices.tmpl +++ b/Documentation/DocBook/media-indices.tmpl @@ -36,6 +36,7 @@ enum v4l2_preemphasis struct v4l2_audio struct v4l2_audioout +struct v4l2_bt_timings struct v4l2_buffer struct v4l2_capability struct v4l2_captureparm @@ -46,6 +47,9 @@ struct v4l2_dbg_chip_ident struct v4l2_dbg_match struct v4l2_dbg_register +struct v4l2_dv_enum_preset +struct v4l2_dv_preset +struct v4l2_dv_timings struct v4l2_enc_idx struct v4l2_enc_idx_entry struct v4l2_encoder_cmd diff --git a/Documentation/DocBook/v4l/common.xml b/Documentation/DocBook/v4l/common.xml index b1a81d246d58..c65f0ac9b6ee 100644 --- a/Documentation/DocBook/v4l/common.xml +++ b/Documentation/DocBook/v4l/common.xml @@ -716,6 +716,41 @@ if (-1 == ioctl (fd, &VIDIOC-S-STD;, &std_id)) { } +
+ Digital Video (DV) Timings + + The video standards discussed so far has been dealing with Analog TV and the +corresponding video timings. Today there are many more different hardware interfaces +such as High Definition TV interfaces (HDMI), VGA, DVI connectors etc., that carry +video signals and there is a need to extend the API to select the video timings +for these interfaces. Since it is not possible to extend the &v4l2-std-id; due to +the limited bits available, a new set of IOCTLs is added to set/get video timings at +the input and output: + + DV Presets: Digital Video (DV) presets. These are IDs representing a +video timing at the input/output. Presets are pre-defined timings implemented +by the hardware according to video standards. A __u32 data type is used to represent +a preset unlike the bit mask that is used in &v4l2-std-id; allowing future extensions +to support as many different presets as needed. + + + Custom DV Timings: This will allow applications to define more detailed +custom video timings for the interface. This includes parameters such as width, height, +polarities, frontporch, backporch etc. + + + + To enumerate and query the attributes of DV presets supported by a device, +applications use the &VIDIOC-ENUM-DV-PRESETS; ioctl. To get the current DV preset, +applications use the &VIDIOC-G-DV-PRESET; ioctl and to set a preset they use the +&VIDIOC-S-DV-PRESET; ioctl. + To set custom DV timings for the device, applications use the +&VIDIOC-S-DV-TIMINGS; ioctl and to get current custom DV timings they use the +&VIDIOC-G-DV-TIMINGS; ioctl. + Applications can make use of the and + flags to decide what ioctls are available to set the +video timings for the device. +
&sub-controls; diff --git a/Documentation/DocBook/v4l/v4l2.xml b/Documentation/DocBook/v4l/v4l2.xml index 937b4157a5d0..77b7dd8a3b5f 100644 --- a/Documentation/DocBook/v4l/v4l2.xml +++ b/Documentation/DocBook/v4l/v4l2.xml @@ -411,6 +411,7 @@ and discussions on the V4L mailing list. &sub-encoder-cmd; &sub-enumaudio; &sub-enumaudioout; + &sub-enum-dv-presets; &sub-enum-fmt; &sub-enum-framesizes; &sub-enum-frameintervals; @@ -421,6 +422,8 @@ and discussions on the V4L mailing list. &sub-g-audioout; &sub-g-crop; &sub-g-ctrl; + &sub-g-dv-preset; + &sub-g-dv-timings; &sub-g-enc-index; &sub-g-ext-ctrls; &sub-g-fbuf; @@ -441,6 +444,7 @@ and discussions on the V4L mailing list. &sub-querybuf; &sub-querycap; &sub-queryctrl; + &sub-query-dv-preset; &sub-querystd; &sub-reqbufs; &sub-s-hw-freq-seek; diff --git a/Documentation/DocBook/v4l/videodev2.h.xml b/Documentation/DocBook/v4l/videodev2.h.xml index 3e282ed9f593..068325940658 100644 --- a/Documentation/DocBook/v4l/videodev2.h.xml +++ b/Documentation/DocBook/v4l/videodev2.h.xml @@ -733,6 +733,99 @@ struct v4l2_standard { __u32 reserved[4]; }; +/* + * V I D E O T I M I N G S D V P R E S E T + */ +struct v4l2_dv_preset { + __u32 preset; + __u32 reserved[4]; +}; + +/* + * D V P R E S E T S E N U M E R A T I O N + */ +struct v4l2_dv_enum_preset { + __u32 index; + __u32 preset; + __u8 name[32]; /* Name of the preset timing */ + __u32 width; + __u32 height; + __u32 reserved[4]; +}; + +/* + * D V P R E S E T V A L U E S + */ +#define V4L2_DV_INVALID 0 +#define V4L2_DV_480P59_94 1 /* BT.1362 */ +#define V4L2_DV_576P50 2 /* BT.1362 */ +#define V4L2_DV_720P24 3 /* SMPTE 296M */ +#define V4L2_DV_720P25 4 /* SMPTE 296M */ +#define V4L2_DV_720P30 5 /* SMPTE 296M */ +#define V4L2_DV_720P50 6 /* SMPTE 296M */ +#define V4L2_DV_720P59_94 7 /* SMPTE 274M */ +#define V4L2_DV_720P60 8 /* SMPTE 274M/296M */ +#define V4L2_DV_1080I29_97 9 /* BT.1120/ SMPTE 274M */ +#define V4L2_DV_1080I30 10 /* BT.1120/ SMPTE 274M */ +#define V4L2_DV_1080I25 11 /* BT.1120 */ +#define V4L2_DV_1080I50 12 /* SMPTE 296M */ +#define V4L2_DV_1080I60 13 /* SMPTE 296M */ +#define V4L2_DV_1080P24 14 /* SMPTE 296M */ +#define V4L2_DV_1080P25 15 /* SMPTE 296M */ +#define V4L2_DV_1080P30 16 /* SMPTE 296M */ +#define V4L2_DV_1080P50 17 /* BT.1120 */ +#define V4L2_DV_1080P60 18 /* BT.1120 */ + +/* + * D V B T T I M I N G S + */ + +/* BT.656/BT.1120 timing data */ +struct v4l2_bt_timings { + __u32 width; /* width in pixels */ + __u32 height; /* height in lines */ + __u32 interlaced; /* Interlaced or progressive */ + __u32 polarities; /* Positive or negative polarity */ + __u64 pixelclock; /* Pixel clock in HZ. Ex. 74.25MHz->74250000 */ + __u32 hfrontporch; /* Horizpontal front porch in pixels */ + __u32 hsync; /* Horizontal Sync length in pixels */ + __u32 hbackporch; /* Horizontal back porch in pixels */ + __u32 vfrontporch; /* Vertical front porch in pixels */ + __u32 vsync; /* Vertical Sync length in lines */ + __u32 vbackporch; /* Vertical back porch in lines */ + __u32 il_vfrontporch; /* Vertical front porch for bottom field of + * interlaced field formats + */ + __u32 il_vsync; /* Vertical sync length for bottom field of + * interlaced field formats + */ + __u32 il_vbackporch; /* Vertical back porch for bottom field of + * interlaced field formats + */ + __u32 reserved[16]; +} __attribute__ ((packed)); + +/* Interlaced or progressive format */ +#define V4L2_DV_PROGRESSIVE 0 +#define V4L2_DV_INTERLACED 1 + +/* Polarities. If bit is not set, it is assumed to be negative polarity */ +#define V4L2_DV_VSYNC_POS_POL 0x00000001 +#define V4L2_DV_HSYNC_POS_POL 0x00000002 + + +/* DV timings */ +struct v4l2_dv_timings { + __u32 type; + union { + struct v4l2_bt_timings bt; + __u32 reserved[32]; + }; +} __attribute__ ((packed)); + +/* Values for the type field */ +#define V4L2_DV_BT_656_1120 0 /* BT.656/1120 timing type */ + /* * V I D E O I N P U T S */ @@ -744,7 +837,8 @@ struct v4l2_input { __u32 tuner; /* Associated tuner */ v4l2_std_id std; __u32 status; - __u32 reserved[4]; + __u32 capabilities; + __u32 reserved[3]; }; /* Values for the 'type' field */ @@ -775,6 +869,11 @@ struct v4l2_input { #define V4L2_IN_ST_NO_ACCESS 0x02000000 /* Conditional access denied */ #define V4L2_IN_ST_VTR 0x04000000 /* VTR time constant */ +/* capabilities flags */ +#define V4L2_IN_CAP_PRESETS 0x00000001 /* Supports S_DV_PRESET */ +#define V4L2_IN_CAP_CUSTOM_TIMINGS 0x00000002 /* Supports S_DV_TIMINGS */ +#define V4L2_IN_CAP_STD 0x00000004 /* Supports S_STD */ + /* * V I D E O O U T P U T S */ @@ -785,13 +884,19 @@ struct v4l2_output { __u32 audioset; /* Associated audios (bitfield) */ __u32 modulator; /* Associated modulator */ v4l2_std_id std; - __u32 reserved[4]; + __u32 capabilities; + __u32 reserved[3]; }; /* Values for the 'type' field */ #define V4L2_OUTPUT_TYPE_MODULATOR 1 #define V4L2_OUTPUT_TYPE_ANALOG 2 #define V4L2_OUTPUT_TYPE_ANALOGVGAOVERLAY 3 +/* capabilities flags */ +#define V4L2_OUT_CAP_PRESETS 0x00000001 /* Supports S_DV_PRESET */ +#define V4L2_OUT_CAP_CUSTOM_TIMINGS 0x00000002 /* Supports S_DV_TIMINGS */ +#define V4L2_OUT_CAP_STD 0x00000004 /* Supports S_STD */ + /* * C O N T R O L S */ @@ -1626,6 +1731,13 @@ struct v4l2_dbg_chip_ident { #endif #define VIDIOC_S_HW_FREQ_SEEK _IOW('V', 82, struct v4l2_hw_freq_seek) +#define VIDIOC_ENUM_DV_PRESETS _IOWR('V', 83, struct v4l2_dv_enum_preset) +#define VIDIOC_S_DV_PRESET _IOWR('V', 84, struct v4l2_dv_preset) +#define VIDIOC_G_DV_PRESET _IOWR('V', 85, struct v4l2_dv_preset) +#define VIDIOC_QUERY_DV_PRESET _IOR('V', 86, struct v4l2_dv_preset) +#define VIDIOC_S_DV_TIMINGS _IOWR('V', 87, struct v4l2_dv_timings) +#define VIDIOC_G_DV_TIMINGS _IOWR('V', 88, struct v4l2_dv_timings) + /* Reminder: when adding new ioctls please add support for them to drivers/media/video/v4l2-compat-ioctl32.c as well! */ diff --git a/Documentation/DocBook/v4l/vidioc-enum-dv-presets.xml b/Documentation/DocBook/v4l/vidioc-enum-dv-presets.xml new file mode 100644 index 000000000000..1d31427edd1b --- /dev/null +++ b/Documentation/DocBook/v4l/vidioc-enum-dv-presets.xml @@ -0,0 +1,238 @@ + + + ioctl VIDIOC_ENUM_DV_PRESETS + &manvol; + + + + VIDIOC_ENUM_DV_PRESETS + Enumerate supported Digital Video presets + + + + + + int ioctl + int fd + int request + struct v4l2_dv_enum_preset *argp + + + + + + Arguments + + + + fd + + &fd; + + + + request + + VIDIOC_ENUM_DV_PRESETS + + + + argp + + + + + + + + + Description + + To query the attributes of a DV preset, applications initialize the +index field and zero the reserved array of &v4l2-dv-enum-preset; +and call the VIDIOC_ENUM_DV_PRESETS ioctl with a pointer to this +structure. Drivers fill the rest of the structure or return an +&EINVAL; when the index is out of bounds. To enumerate all DV Presets supported, +applications shall begin at index zero, incrementing by one until the +driver returns EINVAL. Drivers may enumerate a +different set of DV presets after switching the video input or +output. + + + struct <structname>v4l2_dv_enum_presets</structname> + + &cs-str; + + + __u32 + index + Number of the DV preset, set by the +application. + + + __u32 + preset + This field identifies one of the DV preset values listed in . + + + __u8 + name[24] + Name of the preset, a NUL-terminated ASCII string, for example: "720P-60", "1080I-60". This information is +intended for the user. + + + __u32 + width + Width of the active video in pixels for the DV preset. + + + __u32 + height + Height of the active video in lines for the DV preset. + + + __u32 + reserved[4] + Reserved for future extensions. Drivers must set the array to zero. + + + +
+ + + struct <structname>DV Presets</structname> + + &cs-str; + + + Preset + Preset value + Description + + + + + + + + V4L2_DV_INVALID + 0 + Invalid preset value. + + + V4L2_DV_480P59_94 + 1 + 720x480 progressive video at 59.94 fps as per BT.1362. + + + V4L2_DV_576P50 + 2 + 720x576 progressive video at 50 fps as per BT.1362. + + + V4L2_DV_720P24 + 3 + 1280x720 progressive video at 24 fps as per SMPTE 296M. + + + V4L2_DV_720P25 + 4 + 1280x720 progressive video at 25 fps as per SMPTE 296M. + + + V4L2_DV_720P30 + 5 + 1280x720 progressive video at 30 fps as per SMPTE 296M. + + + V4L2_DV_720P50 + 6 + 1280x720 progressive video at 50 fps as per SMPTE 296M. + + + V4L2_DV_720P59_94 + 7 + 1280x720 progressive video at 59.94 fps as per SMPTE 274M. + + + V4L2_DV_720P60 + 8 + 1280x720 progressive video at 60 fps as per SMPTE 274M/296M. + + + V4L2_DV_1080I29_97 + 9 + 1920x1080 interlaced video at 29.97 fps as per BT.1120/SMPTE 274M. + + + V4L2_DV_1080I30 + 10 + 1920x1080 interlaced video at 30 fps as per BT.1120/SMPTE 274M. + + + V4L2_DV_1080I25 + 11 + 1920x1080 interlaced video at 25 fps as per BT.1120. + + + V4L2_DV_1080I50 + 12 + 1920x1080 interlaced video at 50 fps as per SMPTE 296M. + + + V4L2_DV_1080I60 + 13 + 1920x1080 interlaced video at 60 fps as per SMPTE 296M. + + + V4L2_DV_1080P24 + 14 + 1920x1080 progressive video at 24 fps as per SMPTE 296M. + + + V4L2_DV_1080P25 + 15 + 1920x1080 progressive video at 25 fps as per SMPTE 296M. + + + V4L2_DV_1080P30 + 16 + 1920x1080 progressive video at 30 fps as per SMPTE 296M. + + + V4L2_DV_1080P50 + 17 + 1920x1080 progressive video at 50 fps as per BT.1120. + + + V4L2_DV_1080P60 + 18 + 1920x1080 progressive video at 60 fps as per BT.1120. + + + +
+
+ + + &return-value; + + + + EINVAL + + The &v4l2-dv-enum-preset; index +is out of bounds. + + + + +
+ + diff --git a/Documentation/DocBook/v4l/vidioc-enuminput.xml b/Documentation/DocBook/v4l/vidioc-enuminput.xml index 414856b82473..71b868e2fb8f 100644 --- a/Documentation/DocBook/v4l/vidioc-enuminput.xml +++ b/Documentation/DocBook/v4l/vidioc-enuminput.xml @@ -124,7 +124,13 @@ current input. __u32 - reserved[4] + capabilities + This field provides capabilities for the +input. See for flags. + + + __u32 + reserved[3] Reserved for future extensions. Drivers must set the array to zero. @@ -261,6 +267,34 @@ flag is set Macrovision has been detected. + + + + Input capabilities + + &cs-def; + + + V4L2_IN_CAP_PRESETS + 0x00000001 + This input supports setting DV presets by using VIDIOC_S_DV_PRESET. + + + V4L2_OUT_CAP_CUSTOM_TIMINGS + 0x00000002 + This input supports setting custom video timings by using VIDIOC_S_DV_TIMINGS. + + + V4L2_IN_CAP_STD + 0x00000004 + This input supports setting the TV standard by using VIDIOC_S_STD. + + + +
diff --git a/Documentation/DocBook/v4l/vidioc-enumoutput.xml b/Documentation/DocBook/v4l/vidioc-enumoutput.xml index e8d16dcd50cf..a281d26a195f 100644 --- a/Documentation/DocBook/v4l/vidioc-enumoutput.xml +++ b/Documentation/DocBook/v4l/vidioc-enumoutput.xml @@ -114,7 +114,13 @@ details on video standards and how to switch see __u32 - reserved[4] + capabilities + This field provides capabilities for the +output. See for flags. + + + __u32 + reserved[3] Reserved for future extensions. Drivers must set the array to zero. @@ -147,6 +153,34 @@ CVBS, S-Video, RGB. + + + Output capabilities + + &cs-def; + + + V4L2_OUT_CAP_PRESETS + 0x00000001 + This output supports setting DV presets by using VIDIOC_S_DV_PRESET. + + + V4L2_OUT_CAP_CUSTOM_TIMINGS + 0x00000002 + This output supports setting custom video timings by using VIDIOC_S_DV_TIMINGS. + + + V4L2_OUT_CAP_STD + 0x00000004 + This output supports setting the TV standard by using VIDIOC_S_STD. + + + +
+
&return-value; diff --git a/Documentation/DocBook/v4l/vidioc-g-dv-preset.xml b/Documentation/DocBook/v4l/vidioc-g-dv-preset.xml new file mode 100644 index 000000000000..3c6784e132f3 --- /dev/null +++ b/Documentation/DocBook/v4l/vidioc-g-dv-preset.xml @@ -0,0 +1,111 @@ + + + ioctl VIDIOC_G_DV_PRESET, VIDIOC_S_DV_PRESET + &manvol; + + + + VIDIOC_G_DV_PRESET + VIDIOC_S_DV_PRESET + Query or select the DV preset of the current input or output + + + + + + int ioctl + int fd + int request + &v4l2-dv-preset; +*argp + + + + + + Arguments + + + + fd + + &fd; + + + + request + + VIDIOC_G_DV_PRESET, VIDIOC_S_DV_PRESET + + + + argp + + + + + + + + + Description + To query and select the current DV preset, applications +use the VIDIOC_G_DV_PRESET and VIDIOC_S_DV_PRESET +ioctls which take a pointer to a &v4l2-dv-preset; type as argument. +Applications must zero the reserved array in &v4l2-dv-preset;. +VIDIOC_G_DV_PRESET returns a dv preset in the field +preset of &v4l2-dv-preset;. + + VIDIOC_S_DV_PRESET accepts a pointer to a &v4l2-dv-preset; +that has the preset value to be set. Applications must zero the reserved array in &v4l2-dv-preset;. +If the preset is not supported, it returns an &EINVAL; + + + + &return-value; + + + + EINVAL + + This ioctl is not supported, or the +VIDIOC_S_DV_PRESET,VIDIOC_S_DV_PRESET parameter was unsuitable. + + + + EBUSY + + The device is busy and therefore can not change the preset. + + + + + + struct <structname>v4l2_dv_preset</structname> + + &cs-str; + + + __u32 + preset + Preset value to represent the digital video timings + + + __u32 + reserved[4] + Reserved fields for future use + + + +
+ +
+
+ + diff --git a/Documentation/DocBook/v4l/vidioc-g-dv-timings.xml b/Documentation/DocBook/v4l/vidioc-g-dv-timings.xml new file mode 100644 index 000000000000..ecc19576bb8f --- /dev/null +++ b/Documentation/DocBook/v4l/vidioc-g-dv-timings.xml @@ -0,0 +1,224 @@ + + + ioctl VIDIOC_G_DV_TIMINGS, VIDIOC_S_DV_TIMINGS + &manvol; + + + + VIDIOC_G_DV_TIMINGS + VIDIOC_S_DV_TIMINGS + Get or set custom DV timings for input or output + + + + + + int ioctl + int fd + int request + &v4l2-dv-timings; +*argp + + + + + + Arguments + + + + fd + + &fd; + + + + request + + VIDIOC_G_DV_TIMINGS, VIDIOC_S_DV_TIMINGS + + + + argp + + + + + + + + + Description + To set custom DV timings for the input or output, applications use the +VIDIOC_S_DV_TIMINGS ioctl and to get the current custom timings, +applications use the VIDIOC_G_DV_TIMINGS ioctl. The detailed timing +information is filled in using the structure &v4l2-dv-timings;. These ioctls take +a pointer to the &v4l2-dv-timings; structure as argument. If the ioctl is not supported +or the timing values are not correct, the driver returns &EINVAL;. + + + + &return-value; + + + + EINVAL + + This ioctl is not supported, or the +VIDIOC_S_DV_TIMINGS parameter was unsuitable. + + + + EBUSY + + The device is busy and therefore can not change the timings. + + + + + + struct <structname>v4l2_bt_timings</structname> + + &cs-str; + + + __u32 + width + Width of the active video in pixels + + + __u32 + height + Height of the active video in lines + + + __u32 + interlaced + Progressive (0) or interlaced (1) + + + __u32 + polarities + This is a bit mask that defines polarities of sync signals. +bit 0 (V4L2_DV_VSYNC_POS_POL) is for vertical sync polarity and bit 1 (V4L2_DV_HSYNC_POS_POL) is for horizontal sync polarity. If the bit is set +(1) it is positive polarity and if is cleared (0), it is negative polarity. + + + __u64 + pixelclock + Pixel clock in Hz. Ex. 74.25MHz->74250000 + + + __u32 + hfrontporch + Horizontal front porch in pixels + + + __u32 + hsync + Horizontal sync length in pixels + + + __u32 + hbackporch + Horizontal back porch in pixels + + + __u32 + vfrontporch + Vertical front porch in lines + + + __u32 + vsync + Vertical sync length in lines + + + __u32 + vbackporch + Vertical back porch in lines + + + __u32 + il_vfrontporch + Vertical front porch in lines for bottom field of interlaced field formats + + + __u32 + il_vsync + Vertical sync length in lines for bottom field of interlaced field formats + + + __u32 + il_vbackporch + Vertical back porch in lines for bottom field of interlaced field formats + + + +
+ + + struct <structname>v4l2_dv_timings</structname> + + &cs-str; + + + __u32 + type + + Type of DV timings as listed in . + + + union + + + + + + &v4l2-bt-timings; + bt + Timings defined by BT.656/1120 specifications + + + + __u32 + reserved[32] + + + + +
+ + + DV Timing types + + &cs-str; + + + Timing type + value + Description + + + + + + + + V4L2_DV_BT_656_1120 + 0 + BT.656/1120 timings + + + +
+
+
+ + diff --git a/Documentation/DocBook/v4l/vidioc-query-dv-preset.xml b/Documentation/DocBook/v4l/vidioc-query-dv-preset.xml new file mode 100644 index 000000000000..87e4f0f6151c --- /dev/null +++ b/Documentation/DocBook/v4l/vidioc-query-dv-preset.xml @@ -0,0 +1,85 @@ + + + ioctl VIDIOC_QUERY_DV_PRESET + &manvol; + + + + VIDIOC_QUERY_DV_PRESET + Sense the DV preset received by the current +input + + + + + + int ioctl + int fd + int request + &v4l2-dv-preset; *argp + + + + + + Arguments + + + + fd + + &fd; + + + + request + + VIDIOC_QUERY_DV_PRESET + + + + argp + + + + + + + + + Description + + The hardware may be able to detect the current DV preset +automatically, similar to sensing the video standard. To do so, applications +call VIDIOC_QUERY_DV_PRESET with a pointer to a +&v4l2-dv-preset; type. Once the hardware detects a preset, that preset is +returned in the preset field of &v4l2-dv-preset;. When detection is not +possible or fails, the value V4L2_DV_INVALID is returned. + + + + &return-value; + + + EINVAL + + This ioctl is not supported. + + + + EBUSY + + The device is busy and therefore can not sense the preset + + + + + + + -- cgit v1.2.3 From b33f5f8af3fabe6a080c9d06f8f3184a4de3f3c2 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Thu, 3 Dec 2009 01:32:12 -0300 Subject: V4L/DVB (13573): v4l2-spec: updated revision history, updated version to 2.6.33. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/v4l/compat.xml | 16 ++++++++++++---- Documentation/DocBook/v4l/v4l2.xml | 22 ++++++++++++++++++++-- 2 files changed, 32 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/v4l/compat.xml b/Documentation/DocBook/v4l/compat.xml index 4d1902a54d61..b9dbdf9e6d29 100644 --- a/Documentation/DocBook/v4l/compat.xml +++ b/Documentation/DocBook/v4l/compat.xml @@ -2291,8 +2291,8 @@ was renamed to v4l2_chip_ident_old New control V4L2_CID_COLORFX was added. - - + +
V4L2 in Linux 2.6.32 @@ -2322,8 +2322,16 @@ more information. Added Remote Controller chapter, describing the default Remote Controller mapping for media devices. - -
+ + +
+ V4L2 in Linux 2.6.33 + + + Added support for Digital Video timings in order to support HDTV receivers and transmitters. + + +
diff --git a/Documentation/DocBook/v4l/v4l2.xml b/Documentation/DocBook/v4l/v4l2.xml index 77b7dd8a3b5f..060105af49e5 100644 --- a/Documentation/DocBook/v4l/v4l2.xml +++ b/Documentation/DocBook/v4l/v4l2.xml @@ -74,6 +74,17 @@ Remote Controller chapter. + + + Muralidharan + Karicheri + Documented the Digital Video timings API. + +
+ m-karicheri2@ti.com +
+
+
@@ -89,7 +100,7 @@ Remote Controller chapter. 2008 2009 Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin -Rubli, Andy Walls, Mauro Carvalho Chehab +Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab Except when explicitly stated as GPL, programming examples within @@ -102,6 +113,13 @@ structs, ioctls) must be noted in more detail in the history chapter (compat.sgml), along with the possible impact on existing drivers and applications. --> + + 2.6.33 + 2009-12-03 + mk + Added documentation for the Digital Video timings API. + + 2.6.32 2009-08-31 @@ -355,7 +373,7 @@ and discussions on the V4L mailing list. Video for Linux Two API Specification - Revision 2.6.32 + Revision 2.6.33 &sub-common; -- cgit v1.2.3 From 422eaac5d5367bab71fa2cc33393b2ea894498fe Mon Sep 17 00:00:00 2001 From: Muralidharan Karicheri Date: Thu, 10 Dec 2009 10:31:44 -0300 Subject: V4L/DVB (13619): v4l2-spec: Adds EBUSY error code for S_STD and QUERYSTD ioctls During review of Video Timing API documentation, Hans Verkuil had a comment on adding EBUSY error code for VIDIOC_S_STD and VIDIOC_QUERYSTD ioctls. This patch updates the document for this. Signed-off-by: Muralidharan Karicheri Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/v4l/vidioc-g-std.xml | 6 ++++++ Documentation/DocBook/v4l/vidioc-querystd.xml | 6 ++++++ 2 files changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/DocBook/v4l/vidioc-g-std.xml b/Documentation/DocBook/v4l/vidioc-g-std.xml index b6f5d267e856..912f8513e5da 100644 --- a/Documentation/DocBook/v4l/vidioc-g-std.xml +++ b/Documentation/DocBook/v4l/vidioc-g-std.xml @@ -86,6 +86,12 @@ standards. VIDIOC_S_STD parameter was unsuitable. + + EBUSY + + The device is busy and therefore can not change the standard + + diff --git a/Documentation/DocBook/v4l/vidioc-querystd.xml b/Documentation/DocBook/v4l/vidioc-querystd.xml index b5a7ff934486..1a9e60393091 100644 --- a/Documentation/DocBook/v4l/vidioc-querystd.xml +++ b/Documentation/DocBook/v4l/vidioc-querystd.xml @@ -70,6 +70,12 @@ current video input or output. This ioctl is not supported. + + EBUSY + + The device is busy and therefore can not detect the standard + + -- cgit v1.2.3 From 329e4e18dfdc552f36b0642a3de5ebfa96063666 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Tue, 15 Dec 2009 21:51:08 -0200 Subject: thinkpad-acpi: volume subdriver rewrite I don't trust the coupled EC writes and SMI calls the current volume control code does very much, although it is exactly what the IBM DSDTs seem to do (they never do more than a single step though). Change the driver to stop issuing SMIs, and just drive the EC directly to the desired level (DSDTs seem to confirm this will work even on very old models like the 570 and 600e/x). We checkpoint directly to NVRAM (this can be turned off) at suspend/shutdown/driver unload, which from what I can see in tbp, should also work on every ThinkPad. Signed-off-by: Henrique de Moraes Holschuh Cc: Lorne Applebaum Cc: Matthew Garrett Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 22 ++- drivers/platform/x86/thinkpad_acpi.c | 340 ++++++++++++++++++++++++++------ 2 files changed, 299 insertions(+), 63 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index f5056c7fb5be..96687d0106a9 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1092,22 +1092,33 @@ WARNING: its level up and down at every change. -Volume control -- /proc/acpi/ibm/volume ---------------------------------------- +Volume control +-------------- + +procfs: /proc/acpi/ibm/volume -This feature allows volume control on ThinkPad models which don't have -a hardware volume knob. The available commands are: +This feature allows volume control on ThinkPad models with a digital +volume knob, as well as mute/unmute control. The available commands are: echo up >/proc/acpi/ibm/volume echo down >/proc/acpi/ibm/volume echo mute >/proc/acpi/ibm/volume echo 'level ' >/proc/acpi/ibm/volume -The number range is 0 to 15 although not all of them may be +The number range is 0 to 14 although not all of them may be distinct. The unmute the volume after the mute command, use either the up or down command (the level command will not unmute the volume). The current volume level and mute state is shown in the file. +There are two strategies for volume control. To select which one +should be used, use the volume_mode module parameter: volume_mode=1 +selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing +(so that volume/mute changes are remembered across shutdown/reboot). + +The driver will operate in volume_mode=3 by default. If that does not +work well on your ThinkPad model, please report this to +ibm-acpi-devel@lists.sourceforge.net. + The ALSA mixer interface to this feature is still missing, but patches to add it exist. That problem should be addressed in the not so distant future. @@ -1376,6 +1387,7 @@ to enable more than one output class, just add their values. 0x0008 HKEY event interface, hotkeys 0x0010 Fan control 0x0020 Backlight brightness + 0x0040 Audio mixer/volume control There is also a kernel build option to enable more debugging information, which may be necessary to debug driver problems. diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 05714abf5a87..a2f5312c6a4e 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -231,6 +231,7 @@ enum tpacpi_hkey_event_t { #define TPACPI_DBG_HKEY 0x0008 #define TPACPI_DBG_FAN 0x0010 #define TPACPI_DBG_BRGHT 0x0020 +#define TPACPI_DBG_MIXER 0x0040 #define onoff(status, bit) ((status) & (1 << (bit)) ? "on" : "off") #define enabled(status, bit) ((status) & (1 << (bit)) ? "enabled" : "disabled") @@ -6375,21 +6376,260 @@ static struct ibm_struct brightness_driver_data = { * Volume subdriver */ -static int volume_offset = 0x30; +/* + * IBM ThinkPads have a simple volume controller with MUTE gating. + * Very early Lenovo ThinkPads follow the IBM ThinkPad spec. + * + * Since the *61 series (and probably also the later *60 series), Lenovo + * ThinkPads only implement the MUTE gate. + * + * EC register 0x30 + * Bit 6: MUTE (1 mutes sound) + * Bit 3-0: Volume + * Other bits should be zero as far as we know. + * + * This is also stored in CMOS NVRAM, byte 0x60, bit 6 (MUTE), and + * bits 3-0 (volume). Other bits in NVRAM may have other functions, + * such as bit 7 which is used to detect repeated presses of MUTE, + * and we leave them unchanged. + */ + +enum { + TP_EC_AUDIO = 0x30, + + /* TP_EC_AUDIO bits */ + TP_EC_AUDIO_MUTESW = 6, + + /* TP_EC_AUDIO bitmasks */ + TP_EC_AUDIO_LVL_MSK = 0x0F, + TP_EC_AUDIO_MUTESW_MSK = (1 << TP_EC_AUDIO_MUTESW), + + /* Maximum volume */ + TP_EC_VOLUME_MAX = 14, +}; + +enum tpacpi_volume_access_mode { + TPACPI_VOL_MODE_AUTO = 0, /* Not implemented yet */ + TPACPI_VOL_MODE_EC, /* Pure EC control */ + TPACPI_VOL_MODE_UCMS_STEP, /* UCMS step-based control: N/A */ + TPACPI_VOL_MODE_ECNVRAM, /* EC control w/ NVRAM store */ + TPACPI_VOL_MODE_MAX +}; + +static enum tpacpi_volume_access_mode volume_mode = + TPACPI_VOL_MODE_MAX; + + +/* + * Used to syncronize writers to TP_EC_AUDIO and + * TP_NVRAM_ADDR_MIXER, as we need to do read-modify-write + */ +static struct mutex volume_mutex; + +static void tpacpi_volume_checkpoint_nvram(void) +{ + u8 lec = 0; + u8 b_nvram; + const u8 ec_mask = TP_EC_AUDIO_LVL_MSK | TP_EC_AUDIO_MUTESW_MSK; + + if (volume_mode != TPACPI_VOL_MODE_ECNVRAM) + return; + + vdbg_printk(TPACPI_DBG_MIXER, + "trying to checkpoint mixer state to NVRAM...\n"); + + if (mutex_lock_killable(&volume_mutex) < 0) + return; + + if (unlikely(!acpi_ec_read(TP_EC_AUDIO, &lec))) + goto unlock; + lec &= ec_mask; + b_nvram = nvram_read_byte(TP_NVRAM_ADDR_MIXER); + + if (lec != (b_nvram & ec_mask)) { + /* NVRAM needs update */ + b_nvram &= ~ec_mask; + b_nvram |= lec; + nvram_write_byte(b_nvram, TP_NVRAM_ADDR_MIXER); + dbg_printk(TPACPI_DBG_MIXER, + "updated NVRAM mixer status to 0x%02x (0x%02x)\n", + (unsigned int) lec, (unsigned int) b_nvram); + } else { + vdbg_printk(TPACPI_DBG_MIXER, + "NVRAM mixer status already is 0x%02x (0x%02x)\n", + (unsigned int) lec, (unsigned int) b_nvram); + } + +unlock: + mutex_unlock(&volume_mutex); +} + +static int volume_get_status_ec(u8 *status) +{ + u8 s; + + if (!acpi_ec_read(TP_EC_AUDIO, &s)) + return -EIO; + + *status = s; + + dbg_printk(TPACPI_DBG_MIXER, "status 0x%02x\n", s); + + return 0; +} + +static int volume_get_status(u8 *status) +{ + return volume_get_status_ec(status); +} + +static int volume_set_status_ec(const u8 status) +{ + if (!acpi_ec_write(TP_EC_AUDIO, status)) + return -EIO; + + dbg_printk(TPACPI_DBG_MIXER, "set EC mixer to 0x%02x\n", status); + + return 0; +} + +static int volume_set_status(const u8 status) +{ + return volume_set_status_ec(status); +} + +static int volume_set_mute_ec(const bool mute) +{ + int rc; + u8 s, n; + + if (mutex_lock_killable(&volume_mutex) < 0) + return -EINTR; + + rc = volume_get_status_ec(&s); + if (rc) + goto unlock; + + n = (mute) ? s | TP_EC_AUDIO_MUTESW_MSK : + s & ~TP_EC_AUDIO_MUTESW_MSK; + + if (n != s) + rc = volume_set_status_ec(n); + +unlock: + mutex_unlock(&volume_mutex); + return rc; +} + +static int volume_set_mute(const bool mute) +{ + dbg_printk(TPACPI_DBG_MIXER, "trying to %smute\n", + (mute) ? "" : "un"); + return volume_set_mute_ec(mute); +} + +static int volume_set_volume_ec(const u8 vol) +{ + int rc; + u8 s, n; + + if (vol > TP_EC_VOLUME_MAX) + return -EINVAL; + + if (mutex_lock_killable(&volume_mutex) < 0) + return -EINTR; + + rc = volume_get_status_ec(&s); + if (rc) + goto unlock; + + n = (s & ~TP_EC_AUDIO_LVL_MSK) | vol; + + if (n != s) + rc = volume_set_status_ec(n); + +unlock: + mutex_unlock(&volume_mutex); + return rc; +} + +static int volume_set_volume(const u8 vol) +{ + dbg_printk(TPACPI_DBG_MIXER, + "trying to set volume level to %hu\n", vol); + return volume_set_volume_ec(vol); +} + +static void volume_suspend(pm_message_t state) +{ + tpacpi_volume_checkpoint_nvram(); +} + +static void volume_shutdown(void) +{ + tpacpi_volume_checkpoint_nvram(); +} + +static void volume_exit(void) +{ + tpacpi_volume_checkpoint_nvram(); +} + +static int __init volume_init(struct ibm_init_struct *iibm) +{ + vdbg_printk(TPACPI_DBG_INIT, "initializing volume subdriver\n"); + + mutex_init(&volume_mutex); + + /* + * Check for module parameter bogosity, note that we + * init volume_mode to TPACPI_VOL_MODE_MAX in order to be + * able to detect "unspecified" + */ + if (volume_mode > TPACPI_VOL_MODE_MAX) + return -EINVAL; + + if (volume_mode == TPACPI_VOL_MODE_UCMS_STEP) { + printk(TPACPI_ERR + "UCMS step volume mode not implemented, " + "please contact %s\n", TPACPI_MAIL); + return 1; + } + + if (volume_mode == TPACPI_VOL_MODE_AUTO || + volume_mode == TPACPI_VOL_MODE_MAX) { + volume_mode = TPACPI_VOL_MODE_ECNVRAM; + + dbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "driver auto-selected volume_mode=%d\n", + volume_mode); + } else { + dbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "using user-supplied volume_mode=%d\n", + volume_mode); + } + + vdbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "volume is supported\n"); + + return 0; +} static int volume_read(char *p) { int len = 0; - u8 level; + u8 status; - if (!acpi_ec_read(volume_offset, &level)) { + if (volume_get_status(&status) < 0) { len += sprintf(p + len, "level:\t\tunreadable\n"); } else { - len += sprintf(p + len, "level:\t\t%d\n", level & 0xf); - len += sprintf(p + len, "mute:\t\t%s\n", onoff(level, 6)); + len += sprintf(p + len, "level:\t\t%d\n", + status & TP_EC_AUDIO_LVL_MSK); + len += sprintf(p + len, "mute:\t\t%s\n", + onoff(status, TP_EC_AUDIO_MUTESW)); len += sprintf(p + len, "commands:\tup, down, mute\n"); len += sprintf(p + len, "commands:\tlevel " - " ( is 0-15)\n"); + " ( is 0-%d)\n", TP_EC_VOLUME_MAX); } return len; @@ -6397,77 +6637,55 @@ static int volume_read(char *p) static int volume_write(char *buf) { - int cmos_cmd, inc, i; - u8 level, mute; - int new_level, new_mute; + u8 s; + u8 new_level, new_mute; + int l; char *cmd; + int rc; - while ((cmd = next_cmd(&buf))) { - if (!acpi_ec_read(volume_offset, &level)) - return -EIO; - new_mute = mute = level & 0x40; - new_level = level = level & 0xf; + rc = volume_get_status(&s); + if (rc < 0) + return rc; + new_level = s & TP_EC_AUDIO_LVL_MSK; + new_mute = s & TP_EC_AUDIO_MUTESW_MSK; + + while ((cmd = next_cmd(&buf))) { if (strlencmp(cmd, "up") == 0) { - if (mute) + if (new_mute) new_mute = 0; - else - new_level = level == 15 ? 15 : level + 1; + else if (new_level < TP_EC_VOLUME_MAX) + new_level++; } else if (strlencmp(cmd, "down") == 0) { - if (mute) + if (new_mute) new_mute = 0; - else - new_level = level == 0 ? 0 : level - 1; - } else if (sscanf(cmd, "level %d", &new_level) == 1 && - new_level >= 0 && new_level <= 15) { - /* new_level set */ + else if (new_level > 0) + new_level--; + } else if (sscanf(cmd, "level %u", &l) == 1 && + l >= 0 && l <= TP_EC_VOLUME_MAX) { + new_level = l; } else if (strlencmp(cmd, "mute") == 0) { - new_mute = 0x40; + new_mute = TP_EC_AUDIO_MUTESW_MSK; } else return -EINVAL; + } - if (new_level != level) { - /* mute doesn't change */ - - cmos_cmd = (new_level > level) ? - TP_CMOS_VOLUME_UP : TP_CMOS_VOLUME_DOWN; - inc = new_level > level ? 1 : -1; - - if (mute && (issue_thinkpad_cmos_command(cmos_cmd) || - !acpi_ec_write(volume_offset, level))) - return -EIO; - - for (i = level; i != new_level; i += inc) - if (issue_thinkpad_cmos_command(cmos_cmd) || - !acpi_ec_write(volume_offset, i + inc)) - return -EIO; - - if (mute && - (issue_thinkpad_cmos_command(TP_CMOS_VOLUME_MUTE) || - !acpi_ec_write(volume_offset, new_level + mute))) { - return -EIO; - } - } - - if (new_mute != mute) { - /* level doesn't change */ + tpacpi_disclose_usertask("procfs volume", + "%smute and set level to %d\n", + new_mute ? "" : "un", new_level); - cmos_cmd = (new_mute) ? - TP_CMOS_VOLUME_MUTE : TP_CMOS_VOLUME_UP; + rc = volume_set_status(new_mute | new_level); - if (issue_thinkpad_cmos_command(cmos_cmd) || - !acpi_ec_write(volume_offset, level + new_mute)) - return -EIO; - } - } - - return 0; + return (rc == -EINTR) ? -ERESTARTSYS : rc; } static struct ibm_struct volume_driver_data = { .name = "volume", .read = volume_read, .write = volume_write, + .exit = volume_exit, + .suspend = volume_suspend, + .shutdown = volume_shutdown, }; /************************************************************************* @@ -8121,6 +8339,7 @@ static struct ibm_init_struct ibms_init[] __initdata = { .data = &brightness_driver_data, }, { + .init = volume_init, .data = &volume_driver_data, }, { @@ -8186,6 +8405,11 @@ MODULE_PARM_DESC(hotkey_report_mode, "used for backwards compatibility with userspace, " "see documentation"); +module_param_named(volume_mode, volume_mode, uint, 0444); +MODULE_PARM_DESC(volume_mode, + "Selects volume control strategy: " + "0=auto, 1=EC, 2=N/A, 3=EC+NVRAM"); + #define TPACPI_PARAM(feature) \ module_param_call(feature, set_ibm_param, NULL, NULL, 0); \ MODULE_PARM_DESC(feature, "Simulates thinkpad-acpi procfs command " \ -- cgit v1.2.3 From a112ceee673629afc204bf6b4a4828a6143a083f Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Tue, 15 Dec 2009 21:51:09 -0200 Subject: thinkpad-acpi: support MUTE-only ThinkPads Lenovo removed the extra mixer since the T61 and thereabouts. Newer Lenovo models only have the mute gate function, and leave the volume control to the HDA mixer. Until a way to automatically query the firmware about its audio control capabilities is discovered (there might not be any), use a white/black list. We will likely need to ask T60 (old and new model) and Z60/Z61 users whether they have volume control to populate the black/white list. Meanwhile, provide a volume_capabilities parameter that can be used to override the defaults. Signed-off-by: Henrique de Moraes Holschuh Cc: Lorne Applebaum Cc: Matthew Garrett Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 19 +++- drivers/platform/x86/thinkpad_acpi.c | 159 ++++++++++++++++++++++++++------ 2 files changed, 149 insertions(+), 29 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 96687d0106a9..bd87682103c1 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1098,18 +1098,31 @@ Volume control procfs: /proc/acpi/ibm/volume This feature allows volume control on ThinkPad models with a digital -volume knob, as well as mute/unmute control. The available commands are: +volume knob (when available, not all models have it), as well as +mute/unmute control. The available commands are: echo up >/proc/acpi/ibm/volume echo down >/proc/acpi/ibm/volume echo mute >/proc/acpi/ibm/volume + echo unmute >/proc/acpi/ibm/volume echo 'level ' >/proc/acpi/ibm/volume The number range is 0 to 14 although not all of them may be distinct. The unmute the volume after the mute command, use either the -up or down command (the level command will not unmute the volume). +up or down command (the level command will not unmute the volume), or +the unmute command. + The current volume level and mute state is shown in the file. +You can use the volume_capabilities parameter to tell the driver +whether your thinkpad has volume control or mute-only control: +volume_capabilities=1 for mixers with mute and volume control, +volume_capabilities=2 for mixers with only mute control. + +If the driver misdetects the capabilities for your ThinkPad model, +please report this to ibm-acpi-devel@lists.sourceforge.net, so that we +can update the driver. + There are two strategies for volume control. To select which one should be used, use the volume_mode module parameter: volume_mode=1 selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing @@ -1450,3 +1463,5 @@ Sysfs interface changelog: is deprecated and marked for removal. 0x020600: Marker for backlight change event support. + +0x020700: Support for mute-only mixers. diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index a2f5312c6a4e..4d909d5a0340 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -22,7 +22,7 @@ */ #define TPACPI_VERSION "0.23" -#define TPACPI_SYSFS_VERSION 0x020600 +#define TPACPI_SYSFS_VERSION 0x020700 /* * Changelog: @@ -299,6 +299,7 @@ static struct { u32 fan_ctrl_status_undef:1; u32 second_fan:1; u32 beep_needs_two_args:1; + u32 mixer_no_level_control:1; u32 input_device_registered:1; u32 platform_drv_registered:1; u32 platform_drv_attrs_registered:1; @@ -426,6 +427,12 @@ static void tpacpi_log_usertask(const char * const what) .ec = TPACPI_MATCH_ANY, \ .quirks = (__quirk) } +#define TPACPI_QEC_LNV(__id1, __id2, __quirk) \ + { .vendor = PCI_VENDOR_ID_LENOVO, \ + .bios = TPACPI_MATCH_ANY, \ + .ec = TPID(__id1, __id2), \ + .quirks = (__quirk) } + struct tpacpi_quirk { unsigned int vendor; u16 bios; @@ -6416,9 +6423,17 @@ enum tpacpi_volume_access_mode { TPACPI_VOL_MODE_MAX }; +enum tpacpi_volume_capabilities { + TPACPI_VOL_CAP_AUTO = 0, /* Use white/blacklist */ + TPACPI_VOL_CAP_VOLMUTE, /* Output vol and mute */ + TPACPI_VOL_CAP_MUTEONLY, /* Output mute only */ + TPACPI_VOL_CAP_MAX +}; + static enum tpacpi_volume_access_mode volume_mode = TPACPI_VOL_MODE_MAX; +static enum tpacpi_volume_capabilities volume_capabilities; /* * Used to syncronize writers to TP_EC_AUDIO and @@ -6430,7 +6445,7 @@ static void tpacpi_volume_checkpoint_nvram(void) { u8 lec = 0; u8 b_nvram; - const u8 ec_mask = TP_EC_AUDIO_LVL_MSK | TP_EC_AUDIO_MUTESW_MSK; + u8 ec_mask; if (volume_mode != TPACPI_VOL_MODE_ECNVRAM) return; @@ -6438,6 +6453,11 @@ static void tpacpi_volume_checkpoint_nvram(void) vdbg_printk(TPACPI_DBG_MIXER, "trying to checkpoint mixer state to NVRAM...\n"); + if (tp_features.mixer_no_level_control) + ec_mask = TP_EC_AUDIO_MUTESW_MSK; + else + ec_mask = TP_EC_AUDIO_MUTESW_MSK | TP_EC_AUDIO_LVL_MSK; + if (mutex_lock_killable(&volume_mutex) < 0) return; @@ -6575,8 +6595,36 @@ static void volume_exit(void) tpacpi_volume_checkpoint_nvram(); } +#define TPACPI_VOL_Q_MUTEONLY 0x0001 /* Mute-only control available */ +#define TPACPI_VOL_Q_LEVEL 0x0002 /* Volume control available */ + +static const struct tpacpi_quirk volume_quirk_table[] __initconst = { + /* Whitelist volume level on all IBM by default */ + { .vendor = PCI_VENDOR_ID_IBM, + .bios = TPACPI_MATCH_ANY, + .ec = TPACPI_MATCH_ANY, + .quirks = TPACPI_VOL_Q_LEVEL }, + + /* Lenovo models with volume control (needs confirmation) */ + TPACPI_QEC_LNV('7', 'C', TPACPI_VOL_Q_LEVEL), /* R60/i */ + TPACPI_QEC_LNV('7', 'E', TPACPI_VOL_Q_LEVEL), /* R60e/i */ + TPACPI_QEC_LNV('7', '9', TPACPI_VOL_Q_LEVEL), /* T60/p */ + TPACPI_QEC_LNV('7', 'B', TPACPI_VOL_Q_LEVEL), /* X60/s */ + TPACPI_QEC_LNV('7', 'J', TPACPI_VOL_Q_LEVEL), /* X60t */ + TPACPI_QEC_LNV('7', '7', TPACPI_VOL_Q_LEVEL), /* Z60 */ + TPACPI_QEC_LNV('7', 'F', TPACPI_VOL_Q_LEVEL), /* Z61 */ + + /* Whitelist mute-only on all Lenovo by default */ + { .vendor = PCI_VENDOR_ID_LENOVO, + .bios = TPACPI_MATCH_ANY, + .ec = TPACPI_MATCH_ANY, + .quirks = TPACPI_VOL_Q_MUTEONLY } +}; + static int __init volume_init(struct ibm_init_struct *iibm) { + unsigned long quirks; + vdbg_printk(TPACPI_DBG_INIT, "initializing volume subdriver\n"); mutex_init(&volume_mutex); @@ -6596,6 +6644,36 @@ static int __init volume_init(struct ibm_init_struct *iibm) return 1; } + if (volume_capabilities >= TPACPI_VOL_CAP_MAX) + return -EINVAL; + + quirks = tpacpi_check_quirks(volume_quirk_table, + ARRAY_SIZE(volume_quirk_table)); + + switch (volume_capabilities) { + case TPACPI_VOL_CAP_AUTO: + if (quirks & TPACPI_VOL_Q_MUTEONLY) + tp_features.mixer_no_level_control = 1; + else if (quirks & TPACPI_VOL_Q_LEVEL) + tp_features.mixer_no_level_control = 0; + else + return 1; /* no mixer */ + break; + case TPACPI_VOL_CAP_VOLMUTE: + tp_features.mixer_no_level_control = 0; + break; + case TPACPI_VOL_CAP_MUTEONLY: + tp_features.mixer_no_level_control = 1; + break; + default: + return 1; + } + + if (volume_capabilities != TPACPI_VOL_CAP_AUTO) + dbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "using user-supplied volume_capabilities=%d\n", + volume_capabilities); + if (volume_mode == TPACPI_VOL_MODE_AUTO || volume_mode == TPACPI_VOL_MODE_MAX) { volume_mode = TPACPI_VOL_MODE_ECNVRAM; @@ -6610,7 +6688,8 @@ static int __init volume_init(struct ibm_init_struct *iibm) } vdbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, - "volume is supported\n"); + "mute is supported, volume control is %s\n", + str_supported(!tp_features.mixer_no_level_control)); return 0; } @@ -6623,13 +6702,21 @@ static int volume_read(char *p) if (volume_get_status(&status) < 0) { len += sprintf(p + len, "level:\t\tunreadable\n"); } else { - len += sprintf(p + len, "level:\t\t%d\n", - status & TP_EC_AUDIO_LVL_MSK); + if (tp_features.mixer_no_level_control) + len += sprintf(p + len, "level:\t\tunsupported\n"); + else + len += sprintf(p + len, "level:\t\t%d\n", + status & TP_EC_AUDIO_LVL_MSK); + len += sprintf(p + len, "mute:\t\t%s\n", onoff(status, TP_EC_AUDIO_MUTESW)); - len += sprintf(p + len, "commands:\tup, down, mute\n"); - len += sprintf(p + len, "commands:\tlevel " + + len += sprintf(p + len, "commands:\tunmute, mute\n"); + if (!tp_features.mixer_no_level_control) { + len += sprintf(p + len, "commands:\tup, down\n"); + len += sprintf(p + len, "commands:\tlevel " " ( is 0-%d)\n", TP_EC_VOLUME_MAX); + } } return len; @@ -6651,30 +6738,43 @@ static int volume_write(char *buf) new_mute = s & TP_EC_AUDIO_MUTESW_MSK; while ((cmd = next_cmd(&buf))) { - if (strlencmp(cmd, "up") == 0) { - if (new_mute) - new_mute = 0; - else if (new_level < TP_EC_VOLUME_MAX) - new_level++; - } else if (strlencmp(cmd, "down") == 0) { - if (new_mute) - new_mute = 0; - else if (new_level > 0) - new_level--; - } else if (sscanf(cmd, "level %u", &l) == 1 && - l >= 0 && l <= TP_EC_VOLUME_MAX) { - new_level = l; - } else if (strlencmp(cmd, "mute") == 0) { + if (!tp_features.mixer_no_level_control) { + if (strlencmp(cmd, "up") == 0) { + if (new_mute) + new_mute = 0; + else if (new_level < TP_EC_VOLUME_MAX) + new_level++; + continue; + } else if (strlencmp(cmd, "down") == 0) { + if (new_mute) + new_mute = 0; + else if (new_level > 0) + new_level--; + continue; + } else if (sscanf(cmd, "level %u", &l) == 1 && + l >= 0 && l <= TP_EC_VOLUME_MAX) { + new_level = l; + continue; + } + } + if (strlencmp(cmd, "mute") == 0) new_mute = TP_EC_AUDIO_MUTESW_MSK; - } else + else if (strlencmp(cmd, "unmute") == 0) + new_mute = 0; + else return -EINVAL; } - tpacpi_disclose_usertask("procfs volume", - "%smute and set level to %d\n", - new_mute ? "" : "un", new_level); - - rc = volume_set_status(new_mute | new_level); + if (tp_features.mixer_no_level_control) { + tpacpi_disclose_usertask("procfs volume", "%smute\n", + new_mute ? "" : "un"); + rc = volume_set_mute(!!new_mute); + } else { + tpacpi_disclose_usertask("procfs volume", + "%smute and set level to %d\n", + new_mute ? "" : "un", new_level); + rc = volume_set_status(new_mute | new_level); + } return (rc == -EINTR) ? -ERESTARTSYS : rc; } @@ -8410,6 +8510,11 @@ MODULE_PARM_DESC(volume_mode, "Selects volume control strategy: " "0=auto, 1=EC, 2=N/A, 3=EC+NVRAM"); +module_param_named(volume_capabilities, volume_capabilities, uint, 0444); +MODULE_PARM_DESC(volume_capabilities, + "Selects the mixer capabilites: " + "0=auto, 1=volume and mute, 2=mute only"); + #define TPACPI_PARAM(feature) \ module_param_call(feature, set_ibm_param, NULL, NULL, 0); \ MODULE_PARM_DESC(feature, "Simulates thinkpad-acpi procfs command " \ -- cgit v1.2.3 From c7ac6291ea7ebc568a1fce16fed87d102898f264 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Tue, 15 Dec 2009 21:51:10 -0200 Subject: thinkpad-acpi: disable volume control Disable volume control by default. It can be enabled at module load time by a module parameter (volume_control=1). The audio control mixer that thinkpad-acpi interacts with is fully functional without any drivers, and operated by hotkeys. The idea behind the console audio control is that the human operator is the only one that can interact with it. The ThinkVantage suite in Windows does not allow any software-based overrides, and only does OSD (on-screen-display) functions. The Linux driver will, with the addition of the ALSA interface, try to follow and enforce the ThinkVantage UI design: The user is supposed to use the keyboard hotkeys to interact with the console audio control. The kernel and the desktop environment is supposed to cooperate to provide proper user feedback through on-screen-display functions. Distros are urged to not to enable volume control by default. Enabling this must be a local admin's decision. This is the reason why there is no Kconfig option. Keep in mind that all ThinkPads have a normal, main mixer (AC97 or HDA) for regular software-based audio control. We are not talking about that mixer here. Advanced users are, of course, free to enable volume control and do as they please. Signed-off-by: Henrique de Moraes Holschuh Cc: Lorne Applebaum Cc: Matthew Garrett Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 13 +++++++++ drivers/platform/x86/thinkpad_acpi.c | 47 +++++++++++++++++++++++++++++---- 2 files changed, 55 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index bd87682103c1..6a5814330e51 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1097,6 +1097,18 @@ Volume control procfs: /proc/acpi/ibm/volume +NOTE: by default, the volume control interface operates in read-only +mode, as it is supposed to be used for on-screen-display purposes. +The read/write mode can be enabled through the use of the +"volume_control=1" module parameter. + +NOTE: distros are urged to not enable volume_control by default, this +should be done by the local admin only. The ThinkPad UI is for the +console audio control to be done through the volume keys only, and for +the desktop environment to just provide on-screen-display feedback. +Software volume control should be done only in the main AC97/HDA +mixer. + This feature allows volume control on ThinkPad models with a digital volume knob (when available, not all models have it), as well as mute/unmute control. The available commands are: @@ -1465,3 +1477,4 @@ Sysfs interface changelog: 0x020600: Marker for backlight change event support. 0x020700: Support for mute-only mixers. + Volume control in read-only mode by default. diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 4d909d5a0340..2d74926913d2 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -311,6 +311,7 @@ static struct { static struct { u16 hotkey_mask_ff:1; + u16 volume_ctrl_forbidden:1; } tp_warned; struct thinkpad_id_data { @@ -6434,6 +6435,7 @@ static enum tpacpi_volume_access_mode volume_mode = TPACPI_VOL_MODE_MAX; static enum tpacpi_volume_capabilities volume_capabilities; +static int volume_control_allowed; /* * Used to syncronize writers to TP_EC_AUDIO and @@ -6449,6 +6451,8 @@ static void tpacpi_volume_checkpoint_nvram(void) if (volume_mode != TPACPI_VOL_MODE_ECNVRAM) return; + if (!volume_control_allowed) + return; vdbg_printk(TPACPI_DBG_MIXER, "trying to checkpoint mixer state to NVRAM...\n"); @@ -6691,6 +6695,12 @@ static int __init volume_init(struct ibm_init_struct *iibm) "mute is supported, volume control is %s\n", str_supported(!tp_features.mixer_no_level_control)); + printk(TPACPI_INFO + "Console audio control enabled, mode: %s\n", + (volume_control_allowed) ? + "override (read/write)" : + "monitor (read only)"); + return 0; } @@ -6711,11 +6721,16 @@ static int volume_read(char *p) len += sprintf(p + len, "mute:\t\t%s\n", onoff(status, TP_EC_AUDIO_MUTESW)); - len += sprintf(p + len, "commands:\tunmute, mute\n"); - if (!tp_features.mixer_no_level_control) { - len += sprintf(p + len, "commands:\tup, down\n"); - len += sprintf(p + len, "commands:\tlevel " - " ( is 0-%d)\n", TP_EC_VOLUME_MAX); + if (volume_control_allowed) { + len += sprintf(p + len, "commands:\tunmute, mute\n"); + if (!tp_features.mixer_no_level_control) { + len += sprintf(p + len, + "commands:\tup, down\n"); + len += sprintf(p + len, + "commands:\tlevel " + " ( is 0-%d)\n", + TP_EC_VOLUME_MAX); + } } } @@ -6730,6 +6745,23 @@ static int volume_write(char *buf) char *cmd; int rc; + /* + * We do allow volume control at driver startup, so that the + * user can set initial state through the volume=... parameter hack. + */ + if (!volume_control_allowed && tpacpi_lifecycle != TPACPI_LIFE_INIT) { + if (unlikely(!tp_warned.volume_ctrl_forbidden)) { + tp_warned.volume_ctrl_forbidden = 1; + printk(TPACPI_NOTICE + "Console audio control in monitor mode, " + "changes are not allowed.\n"); + printk(TPACPI_NOTICE + "Use the volume_control=1 module parameter " + "to enable volume control\n"); + } + return -EPERM; + } + rc = volume_get_status(&s); if (rc < 0) return rc; @@ -8515,6 +8547,11 @@ MODULE_PARM_DESC(volume_capabilities, "Selects the mixer capabilites: " "0=auto, 1=volume and mute, 2=mute only"); +module_param_named(volume_control, volume_control_allowed, bool, 0444); +MODULE_PARM_DESC(volume_control, + "Enables software override for the console audio " + "control when true"); + #define TPACPI_PARAM(feature) \ module_param_call(feature, set_ibm_param, NULL, NULL, 0); \ MODULE_PARM_DESC(feature, "Simulates thinkpad-acpi procfs command " \ -- cgit v1.2.3 From 0d204c34e85d1d63e5fdd3e3192747daf0ee7ec1 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Tue, 15 Dec 2009 21:51:11 -0200 Subject: thinkpad-acpi: basic ALSA mixer support (v2) Add the basic ALSA mixer functionality. The mixer is event-driven, and will work fine on IBM ThinkPads. I expect Lenovo ThinkPads will cause some trouble with the event interface. Heavily based on work by Lorne Applebaum and ideas from Matthew Garrett . Signed-off-by: Henrique de Moraes Holschuh Cc: Lorne Applebaum Cc: Matthew Garrett Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 7 +- drivers/platform/x86/thinkpad_acpi.c | 237 +++++++++++++++++++++++++++++++- 2 files changed, 239 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 6a5814330e51..b4ed30cb1375 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1096,6 +1096,7 @@ Volume control -------------- procfs: /proc/acpi/ibm/volume +ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC" NOTE: by default, the volume control interface operates in read-only mode, as it is supposed to be used for on-screen-display purposes. @@ -1144,9 +1145,8 @@ The driver will operate in volume_mode=3 by default. If that does not work well on your ThinkPad model, please report this to ibm-acpi-devel@lists.sourceforge.net. -The ALSA mixer interface to this feature is still missing, but patches -to add it exist. That problem should be addressed in the not so -distant future. +The driver supports the standard ALSA module parameters. If the ALSA +mixer is disabled, the driver will disable all volume functionality. Fan control and monitoring: fan speed, fan enable/disable @@ -1478,3 +1478,4 @@ Sysfs interface changelog: 0x020700: Support for mute-only mixers. Volume control in read-only mode by default. + Marker for ALSA mixer support. diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 2d74926913d2..e0fbe73b8dff 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -76,6 +76,10 @@ #include #include +#include +#include +#include + #include #include @@ -6402,6 +6406,22 @@ static struct ibm_struct brightness_driver_data = { * and we leave them unchanged. */ +#define TPACPI_ALSA_DRVNAME "ThinkPad EC" +#define TPACPI_ALSA_SHRTNAME "ThinkPad Console Audio Control" +#define TPACPI_ALSA_MIXERNAME TPACPI_ALSA_SHRTNAME + +static int alsa_index = SNDRV_DEFAULT_IDX1; +static char *alsa_id = "ThinkPadEC"; +static int alsa_enable = SNDRV_DEFAULT_ENABLE1; + +struct tpacpi_alsa_data { + struct snd_card *card; + struct snd_ctl_elem_id *ctl_mute_id; + struct snd_ctl_elem_id *ctl_vol_id; +}; + +static struct snd_card *alsa_card; + enum { TP_EC_AUDIO = 0x30, @@ -6584,11 +6604,104 @@ static int volume_set_volume(const u8 vol) return volume_set_volume_ec(vol); } +static void volume_alsa_notify_change(void) +{ + struct tpacpi_alsa_data *d; + + if (alsa_card && alsa_card->private_data) { + d = alsa_card->private_data; + if (d->ctl_mute_id) + snd_ctl_notify(alsa_card, + SNDRV_CTL_EVENT_MASK_VALUE, + d->ctl_mute_id); + if (d->ctl_vol_id) + snd_ctl_notify(alsa_card, + SNDRV_CTL_EVENT_MASK_VALUE, + d->ctl_vol_id); + } +} + +static int volume_alsa_vol_info(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_info *uinfo) +{ + uinfo->type = SNDRV_CTL_ELEM_TYPE_INTEGER; + uinfo->count = 1; + uinfo->value.integer.min = 0; + uinfo->value.integer.max = TP_EC_VOLUME_MAX; + return 0; +} + +static int volume_alsa_vol_get(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + u8 s; + int rc; + + rc = volume_get_status(&s); + if (rc < 0) + return rc; + + ucontrol->value.integer.value[0] = s & TP_EC_AUDIO_LVL_MSK; + return 0; +} + +static int volume_alsa_vol_put(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + return volume_set_volume(ucontrol->value.integer.value[0]); +} + +#define volume_alsa_mute_info snd_ctl_boolean_mono_info + +static int volume_alsa_mute_get(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + u8 s; + int rc; + + rc = volume_get_status(&s); + if (rc < 0) + return rc; + + ucontrol->value.integer.value[0] = + (s & TP_EC_AUDIO_MUTESW_MSK) ? 0 : 1; + return 0; +} + +static int volume_alsa_mute_put(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + return volume_set_mute(!ucontrol->value.integer.value[0]); +} + +static struct snd_kcontrol_new volume_alsa_control_vol __devinitdata = { + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, + .name = "Console Playback Volume", + .index = 0, + .access = SNDRV_CTL_ELEM_ACCESS_READ, + .info = volume_alsa_vol_info, + .get = volume_alsa_vol_get, +}; + +static struct snd_kcontrol_new volume_alsa_control_mute __devinitdata = { + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, + .name = "Console Playback Switch", + .index = 0, + .access = SNDRV_CTL_ELEM_ACCESS_READ, + .info = volume_alsa_mute_info, + .get = volume_alsa_mute_get, +}; + static void volume_suspend(pm_message_t state) { tpacpi_volume_checkpoint_nvram(); } +static void volume_resume(void) +{ + volume_alsa_notify_change(); +} + static void volume_shutdown(void) { tpacpi_volume_checkpoint_nvram(); @@ -6596,9 +6709,87 @@ static void volume_shutdown(void) static void volume_exit(void) { + if (alsa_card) { + snd_card_free(alsa_card); + alsa_card = NULL; + } + tpacpi_volume_checkpoint_nvram(); } +static int __init volume_create_alsa_mixer(void) +{ + struct snd_card *card; + struct tpacpi_alsa_data *data; + struct snd_kcontrol *ctl_vol; + struct snd_kcontrol *ctl_mute; + int rc; + + rc = snd_card_create(alsa_index, alsa_id, THIS_MODULE, + sizeof(struct tpacpi_alsa_data), &card); + if (rc < 0) + return rc; + if (!card) + return -ENOMEM; + + BUG_ON(!card->private_data); + data = card->private_data; + data->card = card; + + strlcpy(card->driver, TPACPI_ALSA_DRVNAME, + sizeof(card->driver)); + strlcpy(card->shortname, TPACPI_ALSA_SHRTNAME, + sizeof(card->shortname)); + snprintf(card->mixername, sizeof(card->mixername), "ThinkPad EC %s", + (thinkpad_id.ec_version_str) ? + thinkpad_id.ec_version_str : "(unknown)"); + snprintf(card->longname, sizeof(card->longname), + "%s at EC reg 0x%02x, fw %s", card->shortname, TP_EC_AUDIO, + (thinkpad_id.ec_version_str) ? + thinkpad_id.ec_version_str : "unknown"); + + if (volume_control_allowed) { + volume_alsa_control_vol.put = volume_alsa_vol_put; + volume_alsa_control_vol.access = + SNDRV_CTL_ELEM_ACCESS_READWRITE; + + volume_alsa_control_mute.put = volume_alsa_mute_put; + volume_alsa_control_mute.access = + SNDRV_CTL_ELEM_ACCESS_READWRITE; + } + + if (!tp_features.mixer_no_level_control) { + ctl_vol = snd_ctl_new1(&volume_alsa_control_vol, NULL); + rc = snd_ctl_add(card, ctl_vol); + if (rc < 0) { + printk(TPACPI_ERR + "Failed to create ALSA volume control\n"); + goto err_out; + } + data->ctl_vol_id = &ctl_vol->id; + } + + ctl_mute = snd_ctl_new1(&volume_alsa_control_mute, NULL); + rc = snd_ctl_add(card, ctl_mute); + if (rc < 0) { + printk(TPACPI_ERR "Failed to create ALSA mute control\n"); + goto err_out; + } + data->ctl_mute_id = &ctl_mute->id; + + snd_card_set_dev(card, &tpacpi_pdev->dev); + rc = snd_card_register(card); + +err_out: + if (rc < 0) { + snd_card_free(card); + card = NULL; + } + + alsa_card = card; + return rc; +} + #define TPACPI_VOL_Q_MUTEONLY 0x0001 /* Mute-only control available */ #define TPACPI_VOL_Q_LEVEL 0x0002 /* Volume control available */ @@ -6628,6 +6819,7 @@ static const struct tpacpi_quirk volume_quirk_table[] __initconst = { static int __init volume_init(struct ibm_init_struct *iibm) { unsigned long quirks; + int rc; vdbg_printk(TPACPI_DBG_INIT, "initializing volume subdriver\n"); @@ -6651,6 +6843,17 @@ static int __init volume_init(struct ibm_init_struct *iibm) if (volume_capabilities >= TPACPI_VOL_CAP_MAX) return -EINVAL; + /* + * The ALSA mixer is our primary interface. + * When disabled, don't install the subdriver at all + */ + if (!alsa_enable) { + vdbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "ALSA mixer disabled by parameter, " + "not loading volume subdriver...\n"); + return 1; + } + quirks = tpacpi_check_quirks(volume_quirk_table, ARRAY_SIZE(volume_quirk_table)); @@ -6695,12 +6898,26 @@ static int __init volume_init(struct ibm_init_struct *iibm) "mute is supported, volume control is %s\n", str_supported(!tp_features.mixer_no_level_control)); + rc = volume_create_alsa_mixer(); + if (rc) { + printk(TPACPI_ERR + "Could not create the ALSA mixer interface\n"); + return rc; + } + printk(TPACPI_INFO "Console audio control enabled, mode: %s\n", (volume_control_allowed) ? "override (read/write)" : "monitor (read only)"); + vdbg_printk(TPACPI_DBG_INIT | TPACPI_DBG_MIXER, + "registering volume hotkeys as change notification\n"); + tpacpi_hotkey_driver_mask_set(hotkey_driver_mask + | TP_ACPI_HKEY_VOLUP_MASK + | TP_ACPI_HKEY_VOLDWN_MASK + | TP_ACPI_HKEY_MUTE_MASK); + return 0; } @@ -6807,6 +7024,7 @@ static int volume_write(char *buf) new_mute ? "" : "un", new_level); rc = volume_set_status(new_mute | new_level); } + volume_alsa_notify_change(); return (rc == -EINTR) ? -ERESTARTSYS : rc; } @@ -6817,6 +7035,7 @@ static struct ibm_struct volume_driver_data = { .write = volume_write, .exit = volume_exit, .suspend = volume_suspend, + .resume = volume_resume, .shutdown = volume_shutdown, }; @@ -8115,10 +8334,16 @@ static void tpacpi_driver_event(const unsigned int hkey_event) tpacpi_brightness_notify_change(); } } + if (alsa_card) { + switch (hkey_event) { + case TP_HKEY_EV_VOL_UP: + case TP_HKEY_EV_VOL_DOWN: + case TP_HKEY_EV_VOL_MUTE: + volume_alsa_notify_change(); + } + } } - - static void hotkey_driver_event(const unsigned int scancode) { tpacpi_driver_event(TP_HKEY_EV_HOTKEY_BASE + scancode); @@ -8552,6 +8777,14 @@ MODULE_PARM_DESC(volume_control, "Enables software override for the console audio " "control when true"); +/* ALSA module API parameters */ +module_param_named(index, alsa_index, int, 0444); +MODULE_PARM_DESC(index, "ALSA index for the ACPI EC Mixer"); +module_param_named(id, alsa_id, charp, 0444); +MODULE_PARM_DESC(id, "ALSA id for the ACPI EC Mixer"); +module_param_named(enable, alsa_enable, bool, 0444); +MODULE_PARM_DESC(enable, "Enable the ALSA interface for the ACPI EC Mixer"); + #define TPACPI_PARAM(feature) \ module_param_call(feature, set_ibm_param, NULL, NULL, 0); \ MODULE_PARM_DESC(feature, "Simulates thinkpad-acpi procfs command " \ -- cgit v1.2.3 From 5d2eb14d36723eba0b31ae208bc346835751e944 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Tue, 15 Dec 2009 21:51:13 -0200 Subject: thinkpad-acpi: bump version to 0.24 Signed-off-by: Henrique de Moraes Holschuh Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 4 ++-- drivers/platform/x86/thinkpad_acpi.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index b4ed30cb1375..169091f75e6d 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -1,7 +1,7 @@ ThinkPad ACPI Extras Driver - Version 0.23 - April 10th, 2009 + Version 0.24 + December 11th, 2009 Borislav Deianov Henrique de Moraes Holschuh diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index c21c35e2dcc6..af2352517c99 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -21,7 +21,7 @@ * 02110-1301, USA. */ -#define TPACPI_VERSION "0.23" +#define TPACPI_VERSION "0.24" #define TPACPI_SYSFS_VERSION 0x020700 /* -- cgit v1.2.3 From 0e9052eb98a9986ec0669d030604f7a68f6df638 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 16 Dec 2009 12:19:57 +0100 Subject: page-types: add standard GPL license header Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/page-types.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 7a7d9bab32ef..66e9358e2144 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -1,11 +1,22 @@ /* * page-types: Tool for querying page flags * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; version 2. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should find a copy of v2 of the GNU General Public License somewhere on + * your Linux system; if not, write to the Free Software Foundation, Inc., 59 + * Temple Place, Suite 330, Boston, MA 02111-1307 USA. + * * Copyright (C) 2009 Intel corporation * * Authors: Wu Fengguang - * - * Released under the General Public License (GPL). */ #define _LARGEFILE64_SOURCE -- cgit v1.2.3 From 847ce401df392b0704369fd3f75df614ac1414b4 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 16 Dec 2009 12:19:58 +0100 Subject: HWPOISON: Add unpoisoning support The unpoisoning interface is useful for stress testing tools to reclaim poisoned pages (to prevent OOM) There is no hardware level unpoisioning, so this cannot be used for real memory errors, only for software injected errors. Note that it may leak pages silently - those who have been removed from LRU cache, but not isolated from page cache/swap cache at hwpoison time. Especially the stress test of dirty swap cache pages shall reboot system before exhausting memory. AK: Fix comments, add documentation, add printks, rename symbol Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 16 ++++++++-- include/linux/mm.h | 1 + include/linux/page-flags.h | 2 +- mm/hwpoison-inject.c | 36 +++++++++++++++++++---- mm/memory-failure.c | 68 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 114 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 3ffadf8da61f..f047e75acb23 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -98,10 +98,22 @@ madvise(MADV_POISON, ....) hwpoison-inject module through debugfs - /sys/debug/hwpoison/corrupt-pfn -Inject hwpoison fault at PFN echoed into this file +/sys/debug/hwpoison/ +corrupt-pfn + +Inject hwpoison fault at PFN echoed into this file. + +unpoison-pfn + +Software-unpoison page at PFN echoed into this file. This +way a page can be reused again. +This only works for Linux injected failures, not for real +memory failures. + +Note these injection interfaces are not stable and might change between +kernel versions Architecture specific MCE injector diff --git a/include/linux/mm.h b/include/linux/mm.h index 135e19198cd3..8cdb941fc7b5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1336,6 +1336,7 @@ enum mf_flags { }; extern void memory_failure(unsigned long pfn, int trapno); extern int __memory_failure(unsigned long pfn, int trapno, int flags); +extern int unpoison_memory(unsigned long pfn); extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; extern void shake_page(struct page *p); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 49e907bd067f..f9df6308af95 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -275,7 +275,7 @@ PAGEFLAG_FALSE(Uncached) #ifdef CONFIG_MEMORY_FAILURE PAGEFLAG(HWPoison, hwpoison) -TESTSETFLAG(HWPoison, hwpoison) +TESTSCFLAG(HWPoison, hwpoison) #define __PG_HWPOISON (1UL << PG_hwpoison) #else PAGEFLAG_FALSE(HWPoison) diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index e1d85137f086..6e35e563bf50 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -4,7 +4,7 @@ #include #include -static struct dentry *hwpoison_dir, *corrupt_pfn; +static struct dentry *hwpoison_dir; static int hwpoison_inject(void *data, u64 val) { @@ -14,7 +14,16 @@ static int hwpoison_inject(void *data, u64 val) return __memory_failure(val, 18, 0); } +static int hwpoison_unpoison(void *data, u64 val) +{ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + return unpoison_memory(val); +} + DEFINE_SIMPLE_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n"); +DEFINE_SIMPLE_ATTRIBUTE(unpoison_fops, NULL, hwpoison_unpoison, "%lli\n"); static void pfn_inject_exit(void) { @@ -24,16 +33,31 @@ static void pfn_inject_exit(void) static int pfn_inject_init(void) { + struct dentry *dentry; + hwpoison_dir = debugfs_create_dir("hwpoison", NULL); if (hwpoison_dir == NULL) return -ENOMEM; - corrupt_pfn = debugfs_create_file("corrupt-pfn", 0600, hwpoison_dir, + + /* + * Note that the below poison/unpoison interfaces do not involve + * hardware status change, hence do not require hardware support. + * They are mainly for testing hwpoison in software level. + */ + dentry = debugfs_create_file("corrupt-pfn", 0600, hwpoison_dir, NULL, &hwpoison_fops); - if (corrupt_pfn == NULL) { - pfn_inject_exit(); - return -ENOMEM; - } + if (!dentry) + goto fail; + + dentry = debugfs_create_file("unpoison-pfn", 0600, hwpoison_dir, + NULL, &unpoison_fops); + if (!dentry) + goto fail; + return 0; +fail: + pfn_inject_exit(); + return -ENOMEM; } module_init(pfn_inject_init); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5055b940df5f..ed6e91c87a54 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -838,6 +838,16 @@ int __memory_failure(unsigned long pfn, int trapno, int flags) * and in many cases impossible, so we just avoid it here. */ lock_page_nosync(p); + + /* + * unpoison always clear PG_hwpoison inside page lock + */ + if (!PageHWPoison(p)) { + action_result(pfn, "unpoisoned", IGNORED); + res = 0; + goto out; + } + wait_on_page_writeback(p); /* @@ -893,3 +903,61 @@ void memory_failure(unsigned long pfn, int trapno) { __memory_failure(pfn, trapno, 0); } + +/** + * unpoison_memory - Unpoison a previously poisoned page + * @pfn: Page number of the to be unpoisoned page + * + * Software-unpoison a page that has been poisoned by + * memory_failure() earlier. + * + * This is only done on the software-level, so it only works + * for linux injected failures, not real hardware failures + * + * Returns 0 for success, otherwise -errno. + */ +int unpoison_memory(unsigned long pfn) +{ + struct page *page; + struct page *p; + int freeit = 0; + + if (!pfn_valid(pfn)) + return -ENXIO; + + p = pfn_to_page(pfn); + page = compound_head(p); + + if (!PageHWPoison(p)) { + pr_debug("MCE: Page was already unpoisoned %#lx\n", pfn); + return 0; + } + + if (!get_page_unless_zero(page)) { + if (TestClearPageHWPoison(p)) + atomic_long_dec(&mce_bad_pages); + pr_debug("MCE: Software-unpoisoned free page %#lx\n", pfn); + return 0; + } + + lock_page_nosync(page); + /* + * This test is racy because PG_hwpoison is set outside of page lock. + * That's acceptable because that won't trigger kernel panic. Instead, + * the PG_hwpoison page will be caught and isolated on the entrance to + * the free buddy page pool. + */ + if (TestClearPageHWPoison(p)) { + pr_debug("MCE: Software-unpoisoned page %#lx\n", pfn); + atomic_long_dec(&mce_bad_pages); + freeit = 1; + } + unlock_page(page); + + put_page(page); + if (freeit) + put_page(page); + + return 0; +} +EXPORT_SYMBOL(unpoison_memory); -- cgit v1.2.3 From 7c116f2b0dbac4a1dd051c7a5e8cef37701cafd4 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 16 Dec 2009 12:19:59 +0100 Subject: HWPOISON: add fs/device filters Filesystem data/metadata present the most tricky-to-isolate pages. It requires careful code review and stress testing to get them right. The fs/device filter helps to target the stress tests to some specific filesystem pages. The filter condition is block device's major/minor numbers: - corrupt-filter-dev-major - corrupt-filter-dev-minor When specified (non -1), only page cache pages that belong to that device will be poisoned. The filters are checked reliably on the locked and refcounted page. Haicheng: clear PG_hwpoison and drop bad page count if filter not OK AK: Add documentation CC: Haicheng Li CC: Nick Piggin Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 7 ++++++ mm/hwpoison-inject.c | 11 ++++++++++ mm/internal.h | 3 +++ mm/memory-failure.c | 51 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 72 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index f047e75acb23..fdf580464324 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -115,6 +115,13 @@ memory failures. Note these injection interfaces are not stable and might change between kernel versions +corrupt-filter-dev-major +corrupt-filter-dev-minor + +Only handle memory failures to pages associated with the file system defined +by block device major/minor. -1U is the wildcard value. +This should be only used for testing with artificial injection. + Architecture specific MCE injector x86 has mce-inject, mce-test diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index 6e35e563bf50..ac692a9b766c 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -3,6 +3,7 @@ #include #include #include +#include "internal.h" static struct dentry *hwpoison_dir; @@ -54,6 +55,16 @@ static int pfn_inject_init(void) if (!dentry) goto fail; + dentry = debugfs_create_u32("corrupt-filter-dev-major", 0600, + hwpoison_dir, &hwpoison_filter_dev_major); + if (!dentry) + goto fail; + + dentry = debugfs_create_u32("corrupt-filter-dev-minor", 0600, + hwpoison_dir, &hwpoison_filter_dev_minor); + if (!dentry) + goto fail; + return 0; fail: pfn_inject_exit(); diff --git a/mm/internal.h b/mm/internal.h index 49b2ff776b78..814da335f050 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -250,3 +250,6 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, #define ZONE_RECLAIM_SOME 0 #define ZONE_RECLAIM_SUCCESS 1 #endif + +extern u32 hwpoison_filter_dev_major; +extern u32 hwpoison_filter_dev_minor; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index edeaf2319e74..82ac73436d0e 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -48,6 +48,50 @@ int sysctl_memory_failure_recovery __read_mostly = 1; atomic_long_t mce_bad_pages __read_mostly = ATOMIC_LONG_INIT(0); +u32 hwpoison_filter_dev_major = ~0U; +u32 hwpoison_filter_dev_minor = ~0U; +EXPORT_SYMBOL_GPL(hwpoison_filter_dev_major); +EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor); + +static int hwpoison_filter_dev(struct page *p) +{ + struct address_space *mapping; + dev_t dev; + + if (hwpoison_filter_dev_major == ~0U && + hwpoison_filter_dev_minor == ~0U) + return 0; + + /* + * page_mapping() does not accept slab page + */ + if (PageSlab(p)) + return -EINVAL; + + mapping = page_mapping(p); + if (mapping == NULL || mapping->host == NULL) + return -EINVAL; + + dev = mapping->host->i_sb->s_dev; + if (hwpoison_filter_dev_major != ~0U && + hwpoison_filter_dev_major != MAJOR(dev)) + return -EINVAL; + if (hwpoison_filter_dev_minor != ~0U && + hwpoison_filter_dev_minor != MINOR(dev)) + return -EINVAL; + + return 0; +} + +int hwpoison_filter(struct page *p) +{ + if (hwpoison_filter_dev(p)) + return -EINVAL; + + return 0; +} +EXPORT_SYMBOL_GPL(hwpoison_filter); + /* * Send all the processes who have the page mapped an ``action optional'' * signal. @@ -843,6 +887,13 @@ int __memory_failure(unsigned long pfn, int trapno, int flags) res = 0; goto out; } + if (hwpoison_filter(p)) { + if (TestClearPageHWPoison(p)) + atomic_long_dec(&mce_bad_pages); + unlock_page(p); + put_page(p); + return 0; + } wait_on_page_writeback(p); -- cgit v1.2.3 From 31d3d3484f9bd263925ecaa341500ac2df3a5d9b Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 16 Dec 2009 12:19:59 +0100 Subject: HWPOISON: limit hwpoison injector to known page types __memory_failure()'s workflow is set PG_hwpoison //... unset PG_hwpoison if didn't pass hwpoison filter That could kill unrelated process if it happens to page fault on the page with the (temporary) PG_hwpoison. The race should be big enough to appear in stress tests. Fix it by grabbing the page and checking filter at inject time. This also avoids the very noisy "Injecting memory failure..." messages. - we don't touch madvise() based injection, because the filters are generally not necessary for it. - if we want to apply the filters to h/w aided injection, we'd better to rearrange the logic in __memory_failure() instead of this patch. AK: fix documentation, use drain all, cleanups CC: Haicheng Li Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 3 ++- mm/hwpoison-inject.c | 41 +++++++++++++++++++++++++++++++++++++++-- mm/internal.h | 2 ++ 3 files changed, 43 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index fdf580464324..4ef7bb30d15c 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -103,7 +103,8 @@ hwpoison-inject module through debugfs corrupt-pfn -Inject hwpoison fault at PFN echoed into this file. +Inject hwpoison fault at PFN echoed into this file. This does +some early filtering to avoid corrupted unintended pages in test suites. unpoison-pfn diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index ac692a9b766c..2b6b3200fa65 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -3,16 +3,53 @@ #include #include #include +#include +#include #include "internal.h" static struct dentry *hwpoison_dir; static int hwpoison_inject(void *data, u64 val) { + unsigned long pfn = val; + struct page *p; + int err; + if (!capable(CAP_SYS_ADMIN)) return -EPERM; - printk(KERN_INFO "Injecting memory failure at pfn %Lx\n", val); - return __memory_failure(val, 18, 0); + + if (!pfn_valid(pfn)) + return -ENXIO; + + p = pfn_to_page(pfn); + /* + * This implies unable to support free buddy pages. + */ + if (!get_page_unless_zero(p)) + return 0; + + if (!PageLRU(p)) + shake_page(p); + /* + * This implies unable to support non-LRU pages. + */ + if (!PageLRU(p)) + return 0; + + /* + * do a racy check with elevated page count, to make sure PG_hwpoison + * will only be set for the targeted owner (or on a free page). + * We temporarily take page lock for try_get_mem_cgroup_from_page(). + * __memory_failure() will redo the check reliably inside page lock. + */ + lock_page(p); + err = hwpoison_filter(p); + unlock_page(p); + if (err) + return 0; + + printk(KERN_INFO "Injecting memory failure at pfn %lx\n", pfn); + return __memory_failure(pfn, 18, MF_COUNT_INCREASED); } static int hwpoison_unpoison(void *data, u64 val) diff --git a/mm/internal.h b/mm/internal.h index 814da335f050..04bbce8b8ba6 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -251,5 +251,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, #define ZONE_RECLAIM_SUCCESS 1 #endif +extern int hwpoison_filter(struct page *p); + extern u32 hwpoison_filter_dev_major; extern u32 hwpoison_filter_dev_minor; -- cgit v1.2.3 From 478c5ffc0b50527bd2390f2daa46cc16276b8413 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 16 Dec 2009 12:19:59 +0100 Subject: HWPOISON: add page flags filter When specified, only poison pages if ((page_flags & mask) == value). - corrupt-filter-flags-mask - corrupt-filter-flags-value This allows stress testing of many kinds of pages. Strictly speaking, the buddy pages requires taking zone lock, to avoid setting PG_hwpoison on a "was buddy but now allocated to someone" page. However we can just do nothing because we set PG_locked in the beginning, this prevents the page allocator from allocating it to someone. (It will BUG() on the unexpected PG_locked, which is fine for hwpoison testing.) [AK: Add select PROC_PAGE_MONITOR to satisfy dependency] CC: Nick Piggin Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 10 ++++++++++ mm/Kconfig | 1 + mm/hwpoison-inject.c | 10 ++++++++++ mm/internal.h | 2 ++ mm/memory-failure.c | 20 ++++++++++++++++++++ 5 files changed, 43 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 4ef7bb30d15c..f454d3cd4d60 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -123,6 +123,16 @@ Only handle memory failures to pages associated with the file system defined by block device major/minor. -1U is the wildcard value. This should be only used for testing with artificial injection. + +corrupt-filter-flags-mask +corrupt-filter-flags-value + +When specified, only poison pages if ((page_flags & mask) == value). +This allows stress testing of many kinds of pages. The page_flags +are the same as in /proc/kpageflags. The flag bits are defined in +include/linux/kernel-page-flags.h and documented in +Documentation/vm/pagemap.txt + Architecture specific MCE injector x86 has mce-inject, mce-test diff --git a/mm/Kconfig b/mm/Kconfig index 2310984591ed..8cea7fde06e1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -253,6 +253,7 @@ config MEMORY_FAILURE config HWPOISON_INJECT tristate "Poison pages injector" depends on MEMORY_FAILURE && DEBUG_KERNEL + select PROC_PAGE_MONITOR config NOMMU_INITIAL_TRIM_EXCESS int "Turn on mmap() excess space trimming before booting" diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index 2b6b3200fa65..c4dfd89f654a 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -102,6 +102,16 @@ static int pfn_inject_init(void) if (!dentry) goto fail; + dentry = debugfs_create_u64("corrupt-filter-flags-mask", 0600, + hwpoison_dir, &hwpoison_filter_flags_mask); + if (!dentry) + goto fail; + + dentry = debugfs_create_u64("corrupt-filter-flags-value", 0600, + hwpoison_dir, &hwpoison_filter_flags_value); + if (!dentry) + goto fail; + return 0; fail: pfn_inject_exit(); diff --git a/mm/internal.h b/mm/internal.h index 04bbce8b8ba6..b2027c73119b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -255,3 +255,5 @@ extern int hwpoison_filter(struct page *p); extern u32 hwpoison_filter_dev_major; extern u32 hwpoison_filter_dev_minor; +extern u64 hwpoison_filter_flags_mask; +extern u64 hwpoison_filter_flags_value; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 82ac73436d0e..22d2b2028e54 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -50,8 +51,12 @@ atomic_long_t mce_bad_pages __read_mostly = ATOMIC_LONG_INIT(0); u32 hwpoison_filter_dev_major = ~0U; u32 hwpoison_filter_dev_minor = ~0U; +u64 hwpoison_filter_flags_mask; +u64 hwpoison_filter_flags_value; EXPORT_SYMBOL_GPL(hwpoison_filter_dev_major); EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor); +EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask); +EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value); static int hwpoison_filter_dev(struct page *p) { @@ -83,11 +88,26 @@ static int hwpoison_filter_dev(struct page *p) return 0; } +static int hwpoison_filter_flags(struct page *p) +{ + if (!hwpoison_filter_flags_mask) + return 0; + + if ((stable_page_flags(p) & hwpoison_filter_flags_mask) == + hwpoison_filter_flags_value) + return 0; + else + return -EINVAL; +} + int hwpoison_filter(struct page *p) { if (hwpoison_filter_dev(p)) return -EINVAL; + if (hwpoison_filter_flags(p)) + return -EINVAL; + return 0; } EXPORT_SYMBOL_GPL(hwpoison_filter); -- cgit v1.2.3 From 4fd466eb46a6a917c317a87fb94bfc7252a0f7ed Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 16 Dec 2009 12:19:59 +0100 Subject: HWPOISON: add memory cgroup filter The hwpoison test suite need to inject hwpoison to a collection of selected task pages, and must not touch pages not owned by them and thus kill important system processes such as init. (But it's OK to mis-hwpoison free/unowned pages as well as shared clean pages. Mis-hwpoison of shared dirty pages will kill all tasks, so the test suite will target all or non of such tasks in the first place.) The memory cgroup serves this purpose well. We can put the target processes under the control of a memory cgroup, and tell the hwpoison injection code to only kill pages associated with some active memory cgroup. The prerequisite for doing hwpoison stress tests with mem_cgroup is, the mem_cgroup code tracks task pages _accurately_ (unless page is locked). Which we believe is/should be true. The benefits are simplification of hwpoison injector code. Also the mem_cgroup code will automatically be tested by hwpoison test cases. The alternative interfaces pin-pfn/unpin-pfn can also delegate the (process and page flags) filtering functions reliably to user space. However prototype implementation shows that this scheme adds more complexity than we wanted. Example test case: mkdir /cgroup/hwpoison usemem -m 100 -s 1000 & echo `jobs -p` > /cgroup/hwpoison/tasks memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ') echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg page-types -p `pidof init` --hwpoison # shall do nothing page-types -p `pidof usemem` --hwpoison # poison its pages [AK: Fix documentation] [Add fix for problem noticed by Li Zefan ; dentry in the css could be NULL] CC: KOSAKI Motohiro CC: Hugh Dickins CC: Daisuke Nishimura CC: Balbir Singh CC: KAMEZAWA Hiroyuki CC: Li Zefan CC: Paul Menage CC: Nick Piggin CC: Andi Kleen Signed-off-by: Wu Fengguang Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 16 +++++++++++++++ mm/hwpoison-inject.c | 7 +++++++ mm/internal.h | 1 + mm/memory-failure.c | 46 +++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 70 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index f454d3cd4d60..989e5afe740f 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -123,6 +123,22 @@ Only handle memory failures to pages associated with the file system defined by block device major/minor. -1U is the wildcard value. This should be only used for testing with artificial injection. +corrupt-filter-memcg + +Limit injection to pages owned by memgroup. Specified by inode number +of the memcg. + +Example: + mkdir /cgroup/hwpoison + + usemem -m 100 -s 1000 & + echo `jobs -p` > /cgroup/hwpoison/tasks + + memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ') + echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg + + page-types -p `pidof init` --hwpoison # shall do nothing + page-types -p `pidof usemem` --hwpoison # poison its pages corrupt-filter-flags-mask corrupt-filter-flags-value diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index c4dfd89f654a..c838735ac31d 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -112,6 +112,13 @@ static int pfn_inject_init(void) if (!dentry) goto fail; +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP + dentry = debugfs_create_u64("corrupt-filter-memcg", 0600, + hwpoison_dir, &hwpoison_filter_memcg); + if (!dentry) + goto fail; +#endif + return 0; fail: pfn_inject_exit(); diff --git a/mm/internal.h b/mm/internal.h index b2027c73119b..5a6761bea6a6 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -257,3 +257,4 @@ extern u32 hwpoison_filter_dev_major; extern u32 hwpoison_filter_dev_minor; extern u64 hwpoison_filter_flags_mask; extern u64 hwpoison_filter_flags_value; +extern u64 hwpoison_filter_memcg; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 22d2b2028e54..117ef1598469 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -100,6 +100,49 @@ static int hwpoison_filter_flags(struct page *p) return -EINVAL; } +/* + * This allows stress tests to limit test scope to a collection of tasks + * by putting them under some memcg. This prevents killing unrelated/important + * processes such as /sbin/init. Note that the target task may share clean + * pages with init (eg. libc text), which is harmless. If the target task + * share _dirty_ pages with another task B, the test scheme must make sure B + * is also included in the memcg. At last, due to race conditions this filter + * can only guarantee that the page either belongs to the memcg tasks, or is + * a freed page. + */ +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP +u64 hwpoison_filter_memcg; +EXPORT_SYMBOL_GPL(hwpoison_filter_memcg); +static int hwpoison_filter_task(struct page *p) +{ + struct mem_cgroup *mem; + struct cgroup_subsys_state *css; + unsigned long ino; + + if (!hwpoison_filter_memcg) + return 0; + + mem = try_get_mem_cgroup_from_page(p); + if (!mem) + return -EINVAL; + + css = mem_cgroup_css(mem); + /* root_mem_cgroup has NULL dentries */ + if (!css->cgroup->dentry) + return -EINVAL; + + ino = css->cgroup->dentry->d_inode->i_ino; + css_put(css); + + if (ino != hwpoison_filter_memcg) + return -EINVAL; + + return 0; +} +#else +static int hwpoison_filter_task(struct page *p) { return 0; } +#endif + int hwpoison_filter(struct page *p) { if (hwpoison_filter_dev(p)) @@ -108,6 +151,9 @@ int hwpoison_filter(struct page *p) if (hwpoison_filter_flags(p)) return -EINVAL; + if (hwpoison_filter_task(p)) + return -EINVAL; + return 0; } EXPORT_SYMBOL_GPL(hwpoison_filter); -- cgit v1.2.3 From fe194d3e100dea323d7b2de96d3b44d0c067ba7a Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 16 Dec 2009 12:20:00 +0100 Subject: HWPOISON: Use correct name for MADV_HWPOISON in documentation Signed-off-by: Andi Kleen --- Documentation/vm/hwpoison.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 989e5afe740f..12f9ba20ccb7 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -92,7 +92,7 @@ PR_MCE_KILL_GET Testing: -madvise(MADV_POISON, ....) +madvise(MADV_HWPOISON, ....) (as root) Poison a page in the process for testing -- cgit v1.2.3 From facb6011f3993947283fa15d039dacb4ad140230 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 16 Dec 2009 12:20:00 +0100 Subject: HWPOISON: Add soft page offline support This is a simpler, gentler variant of memory_failure() for soft page offlining controlled from user space. It doesn't kill anything, just tries to invalidate and if that doesn't work migrate the page away. This is useful for predictive failure analysis, where a page has a high rate of corrected errors, but hasn't gone bad yet. Instead it can be offlined early and avoided. The offlining is controlled from sysfs, including a new generic entry point for hard page offlining for symmetry too. We use the page isolate facility to prevent re-allocation race. Normally this is only used by memory hotplug. To avoid races with memory allocation I am using lock_system_sleep(). This avoids the situation where memory hotplug is about to isolate a page range and then hwpoison undoes that work. This is a big hammer currently, but the simplest solution currently. When the page is not free or LRU we try to free pages from slab and other caches. The slab freeing is currently quite dumb and does not try to focus on the specific slab cache which might own the page. This could be potentially improved later. Thanks to Fengguang Wu and Haicheng Li for some fixes. [Added fix from Andrew Morton to adapt to new migrate_pages prototype] Signed-off-by: Andi Kleen --- .../ABI/testing/sysfs-memory-page-offline | 44 +++++ drivers/base/memory.c | 61 +++++++ include/linux/mm.h | 3 +- mm/hwpoison-inject.c | 2 +- mm/memory-failure.c | 194 ++++++++++++++++++++- 5 files changed, 297 insertions(+), 7 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-memory-page-offline (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline new file mode 100644 index 000000000000..e14703f12fdf --- /dev/null +++ b/Documentation/ABI/testing/sysfs-memory-page-offline @@ -0,0 +1,44 @@ +What: /sys/devices/system/memory/soft_offline_page +Date: Sep 2009 +KernelVersion: 2.6.33 +Contact: andi@firstfloor.org +Description: + Soft-offline the memory page containing the physical address + written into this file. Input is a hex number specifying the + physical address of the page. The kernel will then attempt + to soft-offline it, by moving the contents elsewhere or + dropping it if possible. The kernel will then be placed + on the bad page list and never be reused. + + The offlining is done in kernel specific granuality. + Normally it's the base page size of the kernel, but + this might change. + + The page must be still accessible, not poisoned. The + kernel will never kill anything for this, but rather + fail the offline. Return value is the size of the + number, or a error when the offlining failed. Reading + the file is not allowed. + +What: /sys/devices/system/memory/hard_offline_page +Date: Sep 2009 +KernelVersion: 2.6.33 +Contact: andi@firstfloor.org +Description: + Hard-offline the memory page containing the physical + address written into this file. Input is a hex number + specifying the physical address of the page. The + kernel will then attempt to hard-offline the page, by + trying to drop the page or killing any owner or + triggering IO errors if needed. Note this may kill + any processes owning the page. The kernel will avoid + to access this page assuming it's poisoned by the + hardware. + + The offlining is done in kernel specific granuality. + Normally it's the base page size of the kernel, but + this might change. + + Return value is the size of the number, or a error when + the offlining failed. + Reading the file is not allowed. diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 989429cfed88..c4c8f2e1dd15 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -341,6 +341,64 @@ static inline int memory_probe_init(void) } #endif +#ifdef CONFIG_MEMORY_FAILURE +/* + * Support for offlining pages of memory + */ + +/* Soft offline a page */ +static ssize_t +store_soft_offline_page(struct class *class, const char *buf, size_t count) +{ + int ret; + u64 pfn; + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + if (strict_strtoull(buf, 0, &pfn) < 0) + return -EINVAL; + pfn >>= PAGE_SHIFT; + if (!pfn_valid(pfn)) + return -ENXIO; + ret = soft_offline_page(pfn_to_page(pfn), 0); + return ret == 0 ? count : ret; +} + +/* Forcibly offline a page, including killing processes. */ +static ssize_t +store_hard_offline_page(struct class *class, const char *buf, size_t count) +{ + int ret; + u64 pfn; + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + if (strict_strtoull(buf, 0, &pfn) < 0) + return -EINVAL; + pfn >>= PAGE_SHIFT; + ret = __memory_failure(pfn, 0, 0); + return ret ? ret : count; +} + +static CLASS_ATTR(soft_offline_page, 0644, NULL, store_soft_offline_page); +static CLASS_ATTR(hard_offline_page, 0644, NULL, store_hard_offline_page); + +static __init int memory_fail_init(void) +{ + int err; + + err = sysfs_create_file(&memory_sysdev_class.kset.kobj, + &class_attr_soft_offline_page.attr); + if (!err) + err = sysfs_create_file(&memory_sysdev_class.kset.kobj, + &class_attr_hard_offline_page.attr); + return err; +} +#else +static inline int memory_fail_init(void) +{ + return 0; +} +#endif + /* * Note that phys_device is optional. It is here to allow for * differentiation between which *physical* devices each @@ -471,6 +529,9 @@ int __init memory_dev_init(void) } err = memory_probe_init(); + if (!ret) + ret = err; + err = memory_fail_init(); if (!ret) ret = err; err = block_size_init(); diff --git a/include/linux/mm.h b/include/linux/mm.h index 8cdb941fc7b5..849b4a61bd8f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1339,8 +1339,9 @@ extern int __memory_failure(unsigned long pfn, int trapno, int flags); extern int unpoison_memory(unsigned long pfn); extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; -extern void shake_page(struct page *p); +extern void shake_page(struct page *p, int access); extern atomic_long_t mce_bad_pages; +extern int soft_offline_page(struct page *page, int flags); #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c index c597f46ac18a..a77fe3f9e211 100644 --- a/mm/hwpoison-inject.c +++ b/mm/hwpoison-inject.c @@ -29,7 +29,7 @@ static int hwpoison_inject(void *data, u64 val) return 0; if (!PageLRU(p)) - shake_page(p); + shake_page(p, 0); /* * This implies unable to support non-LRU pages. */ diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b5c3b6bd511f..bcce28755832 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -41,6 +41,9 @@ #include #include #include +#include +#include +#include #include "internal.h" int sysctl_memory_failure_early_kill __read_mostly = 0; @@ -201,7 +204,7 @@ static int kill_proc_ao(struct task_struct *t, unsigned long addr, int trapno, * When a unknown page type is encountered drain as many buffers as possible * in the hope to turn the page into a LRU or free page, which we can handle. */ -void shake_page(struct page *p) +void shake_page(struct page *p, int access) { if (!PageSlab(p)) { lru_add_drain_all(); @@ -211,11 +214,19 @@ void shake_page(struct page *p) if (PageLRU(p) || is_free_buddy_page(p)) return; } + /* - * Could call shrink_slab here (which would also - * shrink other caches). Unfortunately that might - * also access the corrupted page, which could be fatal. + * Only all shrink_slab here (which would also + * shrink other caches) if access is not potentially fatal. */ + if (access) { + int nr; + do { + nr = shrink_slab(1000, GFP_KERNEL, 1000); + if (page_count(p) == 0) + break; + } while (nr > 10); + } } EXPORT_SYMBOL_GPL(shake_page); @@ -949,7 +960,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags) * walked by the page reclaim code, however that's not a big loss. */ if (!PageLRU(p)) - shake_page(p); + shake_page(p, 0); if (!PageLRU(p)) { /* * shake_page could have turned it free. @@ -1099,3 +1110,176 @@ int unpoison_memory(unsigned long pfn) return 0; } EXPORT_SYMBOL(unpoison_memory); + +static struct page *new_page(struct page *p, unsigned long private, int **x) +{ + return alloc_pages(GFP_HIGHUSER_MOVABLE, 0); +} + +/* + * Safely get reference count of an arbitrary page. + * Returns 0 for a free page, -EIO for a zero refcount page + * that is not free, and 1 for any other page type. + * For 1 the page is returned with increased page count, otherwise not. + */ +static int get_any_page(struct page *p, unsigned long pfn, int flags) +{ + int ret; + + if (flags & MF_COUNT_INCREASED) + return 1; + + /* + * The lock_system_sleep prevents a race with memory hotplug, + * because the isolation assumes there's only a single user. + * This is a big hammer, a better would be nicer. + */ + lock_system_sleep(); + + /* + * Isolate the page, so that it doesn't get reallocated if it + * was free. + */ + set_migratetype_isolate(p); + if (!get_page_unless_zero(compound_head(p))) { + if (is_free_buddy_page(p)) { + pr_debug("get_any_page: %#lx free buddy page\n", pfn); + /* Set hwpoison bit while page is still isolated */ + SetPageHWPoison(p); + ret = 0; + } else { + pr_debug("get_any_page: %#lx: unknown zero refcount page type %lx\n", + pfn, p->flags); + ret = -EIO; + } + } else { + /* Not a free page */ + ret = 1; + } + unset_migratetype_isolate(p); + unlock_system_sleep(); + return ret; +} + +/** + * soft_offline_page - Soft offline a page. + * @page: page to offline + * @flags: flags. Same as memory_failure(). + * + * Returns 0 on success, otherwise negated errno. + * + * Soft offline a page, by migration or invalidation, + * without killing anything. This is for the case when + * a page is not corrupted yet (so it's still valid to access), + * but has had a number of corrected errors and is better taken + * out. + * + * The actual policy on when to do that is maintained by + * user space. + * + * This should never impact any application or cause data loss, + * however it might take some time. + * + * This is not a 100% solution for all memory, but tries to be + * ``good enough'' for the majority of memory. + */ +int soft_offline_page(struct page *page, int flags) +{ + int ret; + unsigned long pfn = page_to_pfn(page); + + ret = get_any_page(page, pfn, flags); + if (ret < 0) + return ret; + if (ret == 0) + goto done; + + /* + * Page cache page we can handle? + */ + if (!PageLRU(page)) { + /* + * Try to free it. + */ + put_page(page); + shake_page(page, 1); + + /* + * Did it turn free? + */ + ret = get_any_page(page, pfn, 0); + if (ret < 0) + return ret; + if (ret == 0) + goto done; + } + if (!PageLRU(page)) { + pr_debug("soft_offline: %#lx: unknown non LRU page type %lx\n", + pfn, page->flags); + return -EIO; + } + + lock_page(page); + wait_on_page_writeback(page); + + /* + * Synchronized using the page lock with memory_failure() + */ + if (PageHWPoison(page)) { + unlock_page(page); + put_page(page); + pr_debug("soft offline: %#lx page already poisoned\n", pfn); + return -EBUSY; + } + + /* + * Try to invalidate first. This should work for + * non dirty unmapped page cache pages. + */ + ret = invalidate_inode_page(page); + unlock_page(page); + + /* + * Drop count because page migration doesn't like raised + * counts. The page could get re-allocated, but if it becomes + * LRU the isolation will just fail. + * RED-PEN would be better to keep it isolated here, but we + * would need to fix isolation locking first. + */ + put_page(page); + if (ret == 1) { + ret = 0; + pr_debug("soft_offline: %#lx: invalidated\n", pfn); + goto done; + } + + /* + * Simple invalidation didn't work. + * Try to migrate to a new page instead. migrate.c + * handles a large number of cases for us. + */ + ret = isolate_lru_page(page); + if (!ret) { + LIST_HEAD(pagelist); + + list_add(&page->lru, &pagelist); + ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0); + if (ret) { + pr_debug("soft offline: %#lx: migration failed %d, type %lx\n", + pfn, ret, page->flags); + if (ret > 0) + ret = -EIO; + } + } else { + pr_debug("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n", + pfn, ret, page_count(page), page->flags); + } + if (ret) + return ret; + +done: + atomic_long_add(1, &mce_bad_pages); + SetPageHWPoison(page); + /* keep elevated page count for bad page */ + return ret; +} -- cgit v1.2.3 From 35b23b522696118b5ec5b84ba6846ee96aa2743e Mon Sep 17 00:00:00 2001 From: Guennadi Liakhovetski Date: Fri, 11 Dec 2009 11:34:20 -0300 Subject: V4L/DVB (13651): sh_mobile_ceu_camera: document the scaling and cropping algorithm The sh_mobile_ceu_camera driver implements an advanced algorithm, combining scaling and cropping on the client and on the host. Due to its complexity the algorithm deserves separate documentation. create mode 100644 Documentation/video4linux/sh_mobile_ceu_camera.txt Signed-off-by: Guennadi Liakhovetski Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/sh_mobile_ceu_camera.txt | 157 +++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 Documentation/video4linux/sh_mobile_ceu_camera.txt (limited to 'Documentation') diff --git a/Documentation/video4linux/sh_mobile_ceu_camera.txt b/Documentation/video4linux/sh_mobile_ceu_camera.txt new file mode 100644 index 000000000000..2ae16349a78d --- /dev/null +++ b/Documentation/video4linux/sh_mobile_ceu_camera.txt @@ -0,0 +1,157 @@ + Cropping and Scaling algorithm, used in the sh_mobile_ceu_camera driver + ======================================================================= + +Terminology +----------- + +sensor scales: horizontal and vertical scales, configured by the sensor driver +host scales: -"- host driver +combined scales: sensor_scale * host_scale + + +Generic scaling / cropping scheme +--------------------------------- + +-1-- +| +-2-- -\ +| --\ +| --\ ++-5-- -\ -- -3-- +| ---\ +| --- -4-- -\ +| -\ +| - -6-- +| +| - -6'- +| -/ +| --- -4'- -/ +| ---/ ++-5'- -/ +| -- -3'- +| --/ +| --/ +-2'- -/ +| +| +-1'- + +Produced by user requests: + +S_CROP(left / top = (5) - (1), width / height = (5') - (5)) +S_FMT(width / height = (6') - (6)) + +Here: + +(1) to (1') - whole max width or height +(1) to (2) - sensor cropped left or top +(2) to (2') - sensor cropped width or height +(3) to (3') - sensor scale +(3) to (4) - CEU cropped left or top +(4) to (4') - CEU cropped width or height +(5) to (5') - reverse sensor scale applied to CEU cropped width or height +(2) to (5) - reverse sensor scale applied to CEU cropped left or top +(6) to (6') - CEU scale - user window + + +S_FMT +----- + +Do not touch input rectangle - it is already optimal. + +1. Calculate current sensor scales: + + scale_s = ((3') - (3)) / ((2') - (2)) + +2. Calculate "effective" input crop (sensor subwindow) - CEU crop scaled back at +current sensor scales onto input window - this is user S_CROP: + + width_u = (5') - (5) = ((4') - (4)) * scale_s + +3. Calculate new combined scales from "effective" input window to requested user +window: + + scale_comb = width_u / ((6') - (6)) + +4. Calculate sensor output window by applying combined scales to real input +window: + + width_s_out = ((2') - (2)) / scale_comb + +5. Apply iterative sensor S_FMT for sensor output window. + + subdev->video_ops->s_fmt(.width = width_s_out) + +6. Retrieve sensor output window (g_fmt) + +7. Calculate new sensor scales: + + scale_s_new = ((3')_new - (3)_new) / ((2') - (2)) + +8. Calculate new CEU crop - apply sensor scales to previously calculated +"effective" crop: + + width_ceu = (4')_new - (4)_new = width_u / scale_s_new + left_ceu = (4)_new - (3)_new = ((5) - (2)) / scale_s_new + +9. Use CEU cropping to crop to the new window: + + ceu_crop(.width = width_ceu, .left = left_ceu) + +10. Use CEU scaling to scale to the requested user window: + + scale_ceu = width_ceu / width + + +S_CROP +------ + +If old scale applied to new crop is invalid produce nearest new scale possible + +1. Calculate current combined scales. + + scale_comb = (((4') - (4)) / ((6') - (6))) * (((2') - (2)) / ((3') - (3))) + +2. Apply iterative sensor S_CROP for new input window. + +3. If old combined scales applied to new crop produce an impossible user window, +adjust scales to produce nearest possible window. + + width_u_out = ((5') - (5)) / scale_comb + + if (width_u_out > max) + scale_comb = ((5') - (5)) / max; + else if (width_u_out < min) + scale_comb = ((5') - (5)) / min; + +4. Issue G_CROP to retrieve actual input window. + +5. Using actual input window and calculated combined scales calculate sensor +target output window. + + width_s_out = ((3') - (3)) = ((2') - (2)) / scale_comb + +6. Apply iterative S_FMT for new sensor target output window. + +7. Issue G_FMT to retrieve the actual sensor output window. + +8. Calculate sensor scales. + + scale_s = ((3') - (3)) / ((2') - (2)) + +9. Calculate sensor output subwindow to be cropped on CEU by applying sensor +scales to the requested window. + + width_ceu = ((5') - (5)) / scale_s + +10. Use CEU cropping for above calculated window. + +11. Calculate CEU scales from sensor scales from results of (10) and user window +from (3) + + scale_ceu = calc_scale(((5') - (5)), &width_u_out) + +12. Apply CEU scales. + +-- +Author: Guennadi Liakhovetski -- cgit v1.2.3 From 49b14650ba5bf80234cb1984fd8396aff03430ce Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Thu, 3 Dec 2009 19:50:35 -0300 Subject: V4L/DVB (13680a): DocBook/media: copy images after building HTML The rule for %.html removes the output directory, so there is no point in copying images before building HTML. Documentation/DocBook/Makefile | 10 +++++----- Signed-off-by: Ben Hutchings Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/Makefile | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index ab8300f67182..22bbf7e39018 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -32,7 +32,7 @@ PS_METHOD = $(prefer-db2x) ### # The targets that may be used. -PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs cleandocs media +PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs cleandocs BOOKS := $(addprefix $(obj)/,$(DOCBOOKS)) xmldocs: $(BOOKS) @@ -45,15 +45,15 @@ PDF := $(patsubst %.xml, %.pdf, $(BOOKS)) pdfdocs: $(PDF) HTML := $(sort $(patsubst %.xml, %.html, $(BOOKS))) -htmldocs: media $(HTML) +htmldocs: $(HTML) $(call build_main_index) + $(call build_images) MAN := $(patsubst %.xml, %.9, $(BOOKS)) mandocs: $(MAN) -media: - mkdir -p $(srctree)/Documentation/DocBook/media/ - cp $(srctree)/Documentation/DocBook/dvb/*.png $(srctree)/Documentation/DocBook/v4l/*.gif $(srctree)/Documentation/DocBook/media/ +build_images = mkdir -p $(objtree)/Documentation/DocBook/media/ && \ + cp $(srctree)/Documentation/DocBook/dvb/*.png $(srctree)/Documentation/DocBook/v4l/*.gif $(objtree)/Documentation/DocBook/media/ installmandocs: mandocs mkdir -p /usr/local/man/man9/ -- cgit v1.2.3 From 5bf583473813530c1bf82051a35fac8d5045f4f7 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Thu, 3 Dec 2009 19:51:09 -0300 Subject: V4L/DVB (13680b): DocBook/media: create links for included sources If docs are being built in a separate directory, xmlto and xsltproc can't find included sources. Make links back to the source directory. I would much prefer to have xmlto and xsltproc look in the source directory for included entities but couldn't see how to do that. This needs to be solved in some way for 2.6.32, even if this patch isn't the right way to do it. Signed-off-by: Ben Hutchings Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/Makefile | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 22bbf7e39018..50075df94962 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -32,10 +32,10 @@ PS_METHOD = $(prefer-db2x) ### # The targets that may be used. -PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs cleandocs +PHONY += xmldocs sgmldocs psdocs pdfdocs htmldocs mandocs installmandocs cleandocs xmldoclinks BOOKS := $(addprefix $(obj)/,$(DOCBOOKS)) -xmldocs: $(BOOKS) +xmldocs: $(BOOKS) xmldoclinks sgmldocs: xmldocs PS := $(patsubst %.xml, %.ps, $(BOOKS)) @@ -55,6 +55,15 @@ mandocs: $(MAN) build_images = mkdir -p $(objtree)/Documentation/DocBook/media/ && \ cp $(srctree)/Documentation/DocBook/dvb/*.png $(srctree)/Documentation/DocBook/v4l/*.gif $(objtree)/Documentation/DocBook/media/ +xmldoclinks: +ifneq ($(objtree),$(srctree)) + for dep in dvb media-entities.tmpl media-indices.tmpl v4l; do \ + rm -f $(objtree)/Documentation/DocBook/$$dep \ + && ln -s $(srctree)/Documentation/DocBook/$$dep $(objtree)/Documentation/DocBook/ \ + || exit; \ + done +endif + installmandocs: mandocs mkdir -p /usr/local/man/man9/ install Documentation/DocBook/man/*.9.gz /usr/local/man/man9/ -- cgit v1.2.3 From 9ea9a886b0e8630e12cff515955e7f0f5be32cb1 Mon Sep 17 00:00:00 2001 From: Clemens Ladisch Date: Tue, 15 Dec 2009 16:45:39 -0800 Subject: vt: make the default cursor shape configurable For embedded systems, the blinking cursor at startup time can be annoying and unintended. Add a new kernel parameter to change the default cursor shape. Signed-off-by: Clemens Ladisch Cc: Daniel Mack Acked-by: Pavel Machek Cc: David Newall Cc: Alan Cox Cc: Greg Kroah-Hartman Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-parameters.txt | 5 +++++ drivers/char/vt.c | 7 +++++-- 2 files changed, 10 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index ab95d3ada5c7..c309515ae959 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2729,6 +2729,11 @@ and is between 256 and 4096 characters. It is defined in the file vmpoff= [KNL,S390] Perform z/VM CP command after power off. Format: + vt.cur_default= [VT] Default cursor shape. + Format: 0xCCBBAA, where AA, BB, and CC are the same as + the parameters of the [?A;B;Cc escape sequence; + see VGA-softcursor.txt. Default: 2 = underline. + vt.default_blu= [VT] Format: ,,,..., Change the default blue palette of the console. diff --git a/drivers/char/vt.c b/drivers/char/vt.c index e43fbc66aef0..50faa1fb0f06 100644 --- a/drivers/char/vt.c +++ b/drivers/char/vt.c @@ -164,6 +164,9 @@ module_param(default_utf8, int, S_IRUGO | S_IWUSR); int global_cursor_default = -1; module_param(global_cursor_default, int, S_IRUGO | S_IWUSR); +static int cur_default = CUR_DEFAULT; +module_param(cur_default, int, S_IRUGO | S_IWUSR); + /* * ignore_poke: don't unblank the screen when things are typed. This is * mainly for the privacy of braille terminal users. @@ -1636,7 +1639,7 @@ static void reset_terminal(struct vc_data *vc, int do_clear) /* do not do set_leds here because this causes an endless tasklet loop when the keyboard hasn't been initialized yet */ - vc->vc_cursor_type = CUR_DEFAULT; + vc->vc_cursor_type = cur_default; vc->vc_complement_mask = vc->vc_s_complement_mask; default_attr(vc); @@ -1838,7 +1841,7 @@ static void do_con_trol(struct tty_struct *tty, struct vc_data *vc, int c) if (vc->vc_par[0]) vc->vc_cursor_type = vc->vc_par[0] | (vc->vc_par[1] << 8) | (vc->vc_par[2] << 16); else - vc->vc_cursor_type = CUR_DEFAULT; + vc->vc_cursor_type = cur_default; return; } break; -- cgit v1.2.3 From 0769746183caff9d4334be48c7b0e7d2ec8716c4 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Tue, 15 Dec 2009 16:46:20 -0800 Subject: gpiolib: add support for changing value polarity in sysfs Drivers may use gpiolib sysfs as part of their public user space interface. The GPIO number and polarity might change from board to board. The gpio_export_link() call can be used to hide the GPIO number from user space. Add support for also hiding the GPIO line polarity changes from user space. Signed-off-by: Jani Nikula Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/gpio.txt | 15 +++++ drivers/gpio/gpiolib.c | 161 +++++++++++++++++++++++++++++++++++++++++---- include/asm-generic/gpio.h | 6 ++ include/linux/gpio.h | 6 ++ 4 files changed, 176 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index e4e7daed2ba8..1866c27eec69 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -531,6 +531,13 @@ and have the following read/write attributes: This file exists only if the pin can be configured as an interrupt generating input pin. + "active_low" ... reads as either 0 (false) or 1 (true). Write + any nonzero value to invert the value attribute both + for reading and writing. Existing and subsequent + poll(2) support configuration via the edge attribute + for "rising" and "falling" edges will follow this + setting. + GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the controller implementing GPIOs starting at #42) and have the following read-only attributes: @@ -566,6 +573,8 @@ requested using gpio_request(): int gpio_export_link(struct device *dev, const char *name, unsigned gpio) + /* change the polarity of a GPIO node in sysfs */ + int gpio_sysfs_set_active_low(unsigned gpio, int value); After a kernel driver requests a GPIO, it may only be made available in the sysfs interface by gpio_export(). The driver can control whether the @@ -580,3 +589,9 @@ After the GPIO has been exported, gpio_export_link() allows creating symlinks from elsewhere in sysfs to the GPIO sysfs node. Drivers can use this to provide the interface under their own device in sysfs with a descriptive name. + +Drivers can use gpio_sysfs_set_active_low() to hide GPIO line polarity +differences between boards from user space. This only affects the +sysfs interface. Polarity change can be done both before and after +gpio_export(), and previously enabled poll(2) support for either +rising or falling edge will be reconfigured to follow this setting. diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 50de0f5750d8..a25ad284a272 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -53,6 +53,7 @@ struct gpio_desc { #define FLAG_SYSFS 4 /* exported via /sys/class/gpio/control */ #define FLAG_TRIG_FALL 5 /* trigger on falling edge */ #define FLAG_TRIG_RISE 6 /* trigger on rising edge */ +#define FLAG_ACTIVE_LOW 7 /* sysfs value has active low */ #define PDESC_ID_SHIFT 16 /* add new flags before this one */ @@ -210,6 +211,11 @@ static DEFINE_MUTEX(sysfs_lock); * * configures behavior of poll(2) on /value * * available only if pin can generate IRQs on input * * is read/write as "none", "falling", "rising", or "both" + * /active_low + * * configures polarity of /value + * * is read/write as zero/nonzero + * * also affects existing and subsequent "falling" and "rising" + * /edge configuration */ static ssize_t gpio_direction_show(struct device *dev, @@ -255,7 +261,7 @@ static ssize_t gpio_direction_store(struct device *dev, return status ? : size; } -static const DEVICE_ATTR(direction, 0644, +static /* const */ DEVICE_ATTR(direction, 0644, gpio_direction_show, gpio_direction_store); static ssize_t gpio_value_show(struct device *dev, @@ -267,10 +273,17 @@ static ssize_t gpio_value_show(struct device *dev, mutex_lock(&sysfs_lock); - if (!test_bit(FLAG_EXPORT, &desc->flags)) + if (!test_bit(FLAG_EXPORT, &desc->flags)) { status = -EIO; - else - status = sprintf(buf, "%d\n", !!gpio_get_value_cansleep(gpio)); + } else { + int value; + + value = !!gpio_get_value_cansleep(gpio); + if (test_bit(FLAG_ACTIVE_LOW, &desc->flags)) + value = !value; + + status = sprintf(buf, "%d\n", value); + } mutex_unlock(&sysfs_lock); return status; @@ -294,6 +307,8 @@ static ssize_t gpio_value_store(struct device *dev, status = strict_strtol(buf, 0, &value); if (status == 0) { + if (test_bit(FLAG_ACTIVE_LOW, &desc->flags)) + value = !value; gpio_set_value_cansleep(gpio, value != 0); status = size; } @@ -303,7 +318,7 @@ static ssize_t gpio_value_store(struct device *dev, return status; } -static /*const*/ DEVICE_ATTR(value, 0644, +static const DEVICE_ATTR(value, 0644, gpio_value_show, gpio_value_store); static irqreturn_t gpio_sysfs_irq(int irq, void *priv) @@ -352,9 +367,11 @@ static int gpio_setup_irq(struct gpio_desc *desc, struct device *dev, irq_flags = IRQF_SHARED; if (test_bit(FLAG_TRIG_FALL, &gpio_flags)) - irq_flags |= IRQF_TRIGGER_FALLING; + irq_flags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ? + IRQF_TRIGGER_RISING : IRQF_TRIGGER_FALLING; if (test_bit(FLAG_TRIG_RISE, &gpio_flags)) - irq_flags |= IRQF_TRIGGER_RISING; + irq_flags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ? + IRQF_TRIGGER_FALLING : IRQF_TRIGGER_RISING; if (!pdesc) { pdesc = kmalloc(sizeof(*pdesc), GFP_KERNEL); @@ -475,9 +492,79 @@ found: static DEVICE_ATTR(edge, 0644, gpio_edge_show, gpio_edge_store); +static int sysfs_set_active_low(struct gpio_desc *desc, struct device *dev, + int value) +{ + int status = 0; + + if (!!test_bit(FLAG_ACTIVE_LOW, &desc->flags) == !!value) + return 0; + + if (value) + set_bit(FLAG_ACTIVE_LOW, &desc->flags); + else + clear_bit(FLAG_ACTIVE_LOW, &desc->flags); + + /* reconfigure poll(2) support if enabled on one edge only */ + if (dev != NULL && (!!test_bit(FLAG_TRIG_RISE, &desc->flags) ^ + !!test_bit(FLAG_TRIG_FALL, &desc->flags))) { + unsigned long trigger_flags = desc->flags & GPIO_TRIGGER_MASK; + + gpio_setup_irq(desc, dev, 0); + status = gpio_setup_irq(desc, dev, trigger_flags); + } + + return status; +} + +static ssize_t gpio_active_low_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + const struct gpio_desc *desc = dev_get_drvdata(dev); + ssize_t status; + + mutex_lock(&sysfs_lock); + + if (!test_bit(FLAG_EXPORT, &desc->flags)) + status = -EIO; + else + status = sprintf(buf, "%d\n", + !!test_bit(FLAG_ACTIVE_LOW, &desc->flags)); + + mutex_unlock(&sysfs_lock); + + return status; +} + +static ssize_t gpio_active_low_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t size) +{ + struct gpio_desc *desc = dev_get_drvdata(dev); + ssize_t status; + + mutex_lock(&sysfs_lock); + + if (!test_bit(FLAG_EXPORT, &desc->flags)) { + status = -EIO; + } else { + long value; + + status = strict_strtol(buf, 0, &value); + if (status == 0) + status = sysfs_set_active_low(desc, dev, value != 0); + } + + mutex_unlock(&sysfs_lock); + + return status ? : size; +} + +static const DEVICE_ATTR(active_low, 0644, + gpio_active_low_show, gpio_active_low_store); + static const struct attribute *gpio_attrs[] = { - &dev_attr_direction.attr, &dev_attr_value.attr, + &dev_attr_active_low.attr, NULL, }; @@ -662,12 +749,12 @@ int gpio_export(unsigned gpio, bool direction_may_change) dev = device_create(&gpio_class, desc->chip->dev, MKDEV(0, 0), desc, ioname ? ioname : "gpio%d", gpio); if (!IS_ERR(dev)) { - if (direction_may_change) - status = sysfs_create_group(&dev->kobj, + status = sysfs_create_group(&dev->kobj, &gpio_attr_group); - else + + if (!status && direction_may_change) status = device_create_file(dev, - &dev_attr_value); + &dev_attr_direction); if (!status && gpio_to_irq(gpio) >= 0 && (direction_may_change @@ -744,6 +831,55 @@ done: } EXPORT_SYMBOL_GPL(gpio_export_link); + +/** + * gpio_sysfs_set_active_low - set the polarity of gpio sysfs value + * @gpio: gpio to change + * @value: non-zero to use active low, i.e. inverted values + * + * Set the polarity of /sys/class/gpio/gpioN/value sysfs attribute. + * The GPIO does not have to be exported yet. If poll(2) support has + * been enabled for either rising or falling edge, it will be + * reconfigured to follow the new polarity. + * + * Returns zero on success, else an error. + */ +int gpio_sysfs_set_active_low(unsigned gpio, int value) +{ + struct gpio_desc *desc; + struct device *dev = NULL; + int status = -EINVAL; + + if (!gpio_is_valid(gpio)) + goto done; + + mutex_lock(&sysfs_lock); + + desc = &gpio_desc[gpio]; + + if (test_bit(FLAG_EXPORT, &desc->flags)) { + struct device *dev; + + dev = class_find_device(&gpio_class, NULL, desc, match_export); + if (dev == NULL) { + status = -ENODEV; + goto unlock; + } + } + + status = sysfs_set_active_low(desc, dev, value); + +unlock: + mutex_unlock(&sysfs_lock); + +done: + if (status) + pr_debug("%s: gpio%d status %d\n", __func__, gpio, status); + + return status; +} +EXPORT_SYMBOL_GPL(gpio_sysfs_set_active_low); + /** * gpio_unexport - reverse effect of gpio_export() * @gpio: gpio to make unavailable @@ -1094,6 +1230,7 @@ void gpio_free(unsigned gpio) } desc_set_label(desc, NULL); module_put(desc->chip->owner); + clear_bit(FLAG_ACTIVE_LOW, &desc->flags); clear_bit(FLAG_REQUESTED, &desc->flags); } else WARN_ON(extra_checks); diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h index 204bed37e82d..485eeb6c4ef3 100644 --- a/include/asm-generic/gpio.h +++ b/include/asm-generic/gpio.h @@ -145,6 +145,7 @@ extern int __gpio_to_irq(unsigned gpio); extern int gpio_export(unsigned gpio, bool direction_may_change); extern int gpio_export_link(struct device *dev, const char *name, unsigned gpio); +extern int gpio_sysfs_set_active_low(unsigned gpio, int value); extern void gpio_unexport(unsigned gpio); #endif /* CONFIG_GPIO_SYSFS */ @@ -197,6 +198,11 @@ static inline int gpio_export_link(struct device *dev, const char *name, return -ENOSYS; } +static inline int gpio_sysfs_set_active_low(unsigned gpio, int value) +{ + return -ENOSYS; +} + static inline void gpio_unexport(unsigned gpio) { } diff --git a/include/linux/gpio.h b/include/linux/gpio.h index 059bd189d35d..4e949a5b5b85 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -99,6 +99,12 @@ static inline int gpio_export_link(struct device *dev, const char *name, return -EINVAL; } +static inline int gpio_sysfs_set_active_low(unsigned gpio, int value) +{ + /* GPIO can never have been requested */ + WARN_ON(1); + return -EINVAL; +} static inline void gpio_unexport(unsigned gpio) { -- cgit v1.2.3 From 4562aea791e97aa0f2e342849daa18b588c46df1 Mon Sep 17 00:00:00 2001 From: Harald Welte Date: Tue, 15 Dec 2009 16:46:42 -0800 Subject: viafb: documentation update We now support the VX855, and the VX800 is no longer unaccellerated. viafb_video_dev was removed as it was useless. Signed-off-by: Harald Welte Signed-off-by: Florian Tobias Schandinat Cc: Joseph Chan Cc: Scott Fang Cc: Krzysztof Helt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/fb/viafb.txt | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/fb/viafb.txt b/Documentation/fb/viafb.txt index 67dbf442b0b6..f3e046a6a987 100644 --- a/Documentation/fb/viafb.txt +++ b/Documentation/fb/viafb.txt @@ -7,7 +7,7 @@ VIA UniChrome Family(CLE266, PM800 / CN400 / CN300, P4M800CE / P4M800Pro / CN700 / VN800, CX700 / VX700, K8M890, P4M890, - CN896 / P4M900, VX800) + CN896 / P4M900, VX800, VX855) [Driver features] ------------------------ @@ -154,13 +154,6 @@ 0 : No Dual Edge Panel (default) 1 : Dual Edge Panel - viafb_video_dev: - This option is used to specify video output devices(CRT, DVI, LCD) for - duoview case. - For example: - To output video on DVI, we should use: - modprobe viafb viafb_video_dev=DVI... - viafb_lcd_port: This option is used to specify LCD output port, available values are "DVP0" "DVP1" "DFP_HIGHLOW" "DFP_HIGH" "DFP_LOW". @@ -181,9 +174,6 @@ Notes: and bpp, need to call VIAFB specified ioctl interface VIAFB_SET_DEVICE instead of calling common ioctl function FBIOPUT_VSCREENINFO since viafb doesn't support multi-head well, or it will cause screen crush. - 4. VX800 2D accelerator hasn't been supported in this driver yet. When - using driver on VX800, the driver will disable the acceleration - function as default. [Configure viafb with "fbset" tool] -- cgit v1.2.3 From 7de3369c14b67fe77d8b5f65171bb3a3b4f371ba Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 15 Dec 2009 16:46:59 -0800 Subject: doc: SubmitChecklist, add ioctls, remove OSDL reference If a patch adds ioctls, then Documentation/ioctl/ioctl-number.txt should also be updated. Remove reference to the OSDL PLM build farm. Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/SubmitChecklist | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 78a9168ff377..1053a56be3b1 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist @@ -15,7 +15,7 @@ kernel patches. 2: Passes allnoconfig, allmodconfig 3: Builds on multiple CPU architectures by using local cross-compile tools - or something like PLM at OSDL. + or some other build farm. 4: ppc64 is a good architecture for cross-compilation checking because it tends to use `unsigned long' for 64-bit quantities. @@ -88,3 +88,6 @@ kernel patches. 24: All memory barriers {e.g., barrier(), rmb(), wmb()} need a comment in the source code that explains the logic of what they are doing and why. + +25: If any ioctl's are added by the patch, then also update + Documentation/ioctl/ioctl-number.txt. -- cgit v1.2.3 From 82c1e49ccb28534b4e8b77d5f0ff553f19912d4d Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Tue, 15 Dec 2009 16:46:59 -0800 Subject: proc: remove docbook and example Example is outdated, it still uses old ->read_proc interfaces and "fb" example is plain racy. There are better examples all over the tree. Docbook itself says almost nothing about /proc and contain quite a number of simply wrong facts, e.g. device nodes support. What it does is describing at great length interface which are going to be removed. There are Documentation/filesystems/seq_file.txt in exchange. Signed-off-by: Alexey Dobriyan Acked-by: Erik Mouw Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/DocBook/Makefile | 17 +- Documentation/DocBook/procfs-guide.tmpl | 626 -------------------------------- Documentation/DocBook/procfs_example.c | 201 ---------- 3 files changed, 3 insertions(+), 841 deletions(-) delete mode 100644 Documentation/DocBook/procfs-guide.tmpl delete mode 100644 Documentation/DocBook/procfs_example.c (limited to 'Documentation') diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index ab8300f67182..ee34ceb9ad5f 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -8,7 +8,7 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \ kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ - procfs-guide.xml writing_usb_driver.xml networking.xml \ + writing_usb_driver.xml networking.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ @@ -65,7 +65,7 @@ KERNELDOC = $(srctree)/scripts/kernel-doc DOCPROC = $(objtree)/scripts/basic/docproc XMLTOFLAGS = -m $(srctree)/Documentation/DocBook/stylesheet.xsl -#XMLTOFLAGS += --skip-validation +XMLTOFLAGS += --skip-validation ### # DOCPROC is used for two purposes: @@ -101,17 +101,6 @@ endif # Changes in kernel-doc force a rebuild of all documentation $(BOOKS): $(KERNELDOC) -### -# procfs guide uses a .c file as example code. -# This requires an explicit dependency -C-procfs-example = procfs_example.xml -C-procfs-example2 = $(addprefix $(obj)/,$(C-procfs-example)) -$(obj)/procfs-guide.xml: $(C-procfs-example2) - -# List of programs to build -##oops, this is a kernel module::hostprogs-y := procfs_example -obj-m += procfs_example.o - # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -238,7 +227,7 @@ clean-files := $(DOCBOOKS) \ $(patsubst %.xml, %.pdf, $(DOCBOOKS)) \ $(patsubst %.xml, %.html, $(DOCBOOKS)) \ $(patsubst %.xml, %.9, $(DOCBOOKS)) \ - $(C-procfs-example) $(index) + $(index) clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man diff --git a/Documentation/DocBook/procfs-guide.tmpl b/Documentation/DocBook/procfs-guide.tmpl deleted file mode 100644 index 9eba4b7af73d..000000000000 --- a/Documentation/DocBook/procfs-guide.tmpl +++ /dev/null @@ -1,626 +0,0 @@ - - -]> - - - - Linux Kernel Procfs Guide - - - - Erik - (J.A.K.) - Mouw - -
- mouw@nl.linux.org -
-
-
- - - This software and documentation were written while working on the - LART computing board - (http://www.lartmaker.nl/), - which was sponsored by the Delt University of Technology projects - Mobile Multi-media Communications and Ubiquitous Communications. - - -
- - - - 1.0 - May 30, 2001 - Initial revision posted to linux-kernel - - - 1.1 - June 3, 2001 - Revised after comments from linux-kernel - - - - - 2001 - Erik Mouw - - - - - - This documentation is free software; you can redistribute it - and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either - version 2 of the License, or (at your option) any later - version. - - - - This documentation is distributed in the hope that it will be - useful, but WITHOUT ANY WARRANTY; without even the implied - warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR - PURPOSE. See the GNU General Public License for more details. - - - - You should have received a copy of the GNU General Public - License along with this program; if not, write to the Free - Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, - MA 02111-1307 USA - - - - For more details see the file COPYING in the source - distribution of Linux. - - -
- - - - - - - - - - - - Preface - - - This guide describes the use of the procfs file system from - within the Linux kernel. The idea to write this guide came up on - the #kernelnewbies IRC channel (see http://www.kernelnewbies.org/), - when Jeff Garzik explained the use of procfs and forwarded me a - message Alexander Viro wrote to the linux-kernel mailing list. I - agreed to write it up nicely, so here it is. - - - - I'd like to thank Jeff Garzik - jgarzik@pobox.com and Alexander Viro - viro@parcelfarce.linux.theplanet.co.uk for their input, - Tim Waugh twaugh@redhat.com for his Selfdocbook, - and Marc Joosen marcj@historia.et.tudelft.nl for - proofreading. - - - - Erik - - - - - - - - Introduction - - - The /proc file system - (procfs) is a special file system in the linux kernel. It's a - virtual file system: it is not associated with a block device - but exists only in memory. The files in the procfs are there to - allow userland programs access to certain information from the - kernel (like process information in /proc/[0-9]+/), but also for debug - purposes (like /proc/ksyms). - - - - This guide describes the use of the procfs file system from - within the Linux kernel. It starts by introducing all relevant - functions to manage the files within the file system. After that - it shows how to communicate with userland, and some tips and - tricks will be pointed out. Finally a complete example will be - shown. - - - - Note that the files in /proc/sys are sysctl files: they - don't belong to procfs and are governed by a completely - different API described in the Kernel API book. - - - - - - - - Managing procfs entries - - - This chapter describes the functions that various kernel - components use to populate the procfs with files, symlinks, - device nodes, and directories. - - - - A minor note before we start: if you want to use any of the - procfs functions, be sure to include the correct header file! - This should be one of the first lines in your code: - - - -#include <linux/proc_fs.h> - - - - - - - Creating a regular file - - - - struct proc_dir_entry* create_proc_entry - const char* name - mode_t mode - struct proc_dir_entry* parent - - - - - This function creates a regular file with the name - name, file mode - mode in the directory - parent. To create a file in the root of - the procfs, use NULL as - parent parameter. When successful, the - function will return a pointer to the freshly created - struct proc_dir_entry; otherwise it - will return NULL. describes how to do something useful with - regular files. - - - - Note that it is specifically supported that you can pass a - path that spans multiple directories. For example - create_proc_entry("drivers/via0/info") - will create the via0 - directory if necessary, with standard - 0755 permissions. - - - - If you only want to be able to read the file, the function - create_proc_read_entry described in may be used to create and initialise - the procfs entry in one single call. - - - - - - - - Creating a symlink - - - - struct proc_dir_entry* - proc_symlink const - char* name - struct proc_dir_entry* - parent const - char* dest - - - - - This creates a symlink in the procfs directory - parent that points from - name to - dest. This translates in userland to - ln -s dest - name. - - - - - Creating a directory - - - - struct proc_dir_entry* proc_mkdir - const char* name - struct proc_dir_entry* parent - - - - - Create a directory name in the procfs - directory parent. - - - - - - - - Removing an entry - - - - void remove_proc_entry - const char* name - struct proc_dir_entry* parent - - - - - Removes the entry name in the directory - parent from the procfs. Entries are - removed by their name, not by the - struct proc_dir_entry returned by the - various create functions. Note that this function doesn't - recursively remove entries. - - - - Be sure to free the data entry from - the struct proc_dir_entry before - remove_proc_entry is called (that is: if - there was some data allocated, of - course). See for more information - on using the data entry. - - - - - - - - - Communicating with userland - - - Instead of reading (or writing) information directly from - kernel memory, procfs works with call back - functions for files: functions that are called when - a specific file is being read or written. Such functions have - to be initialised after the procfs file is created by setting - the read_proc and/or - write_proc fields in the - struct proc_dir_entry* that the - function create_proc_entry returned: - - - -struct proc_dir_entry* entry; - -entry->read_proc = read_proc_foo; -entry->write_proc = write_proc_foo; - - - - If you only want to use a the - read_proc, the function - create_proc_read_entry described in may be used to create and initialise the - procfs entry in one single call. - - - - - - Reading data - - - The read function is a call back function that allows userland - processes to read data from the kernel. The read function - should have the following format: - - - - - int read_func - char* buffer - char** start - off_t off - int count - int* peof - void* data - - - - - The read function should write its information into the - buffer, which will be exactly - PAGE_SIZE bytes long. - - - - The parameter - peof should be used to signal that the - end of the file has been reached by writing - 1 to the memory location - peof points to. - - - - The data - parameter can be used to create a single call back function for - several files, see . - - - - The rest of the parameters and the return value are described - by a comment in fs/proc/generic.c as follows: - - -
- - You have three ways to return data: - - - - - Leave *start = NULL. (This is the default.) - Put the data of the requested offset at that - offset within the buffer. Return the number (n) - of bytes there are from the beginning of the - buffer up to the last byte of data. If the - number of supplied bytes (= n - offset) is - greater than zero and you didn't signal eof - and the reader is prepared to take more data - you will be called again with the requested - offset advanced by the number of bytes - absorbed. This interface is useful for files - no larger than the buffer. - - - - - Set *start to an unsigned long value less than - the buffer address but greater than zero. - Put the data of the requested offset at the - beginning of the buffer. Return the number of - bytes of data placed there. If this number is - greater than zero and you didn't signal eof - and the reader is prepared to take more data - you will be called again with the requested - offset advanced by *start. This interface is - useful when you have a large file consisting - of a series of blocks which you want to count - and return as wholes. - (Hack by Paul.Russell@rustcorp.com.au) - - - - - Set *start to an address within the buffer. - Put the data of the requested offset at *start. - Return the number of bytes of data placed there. - If this number is greater than zero and you - didn't signal eof and the reader is prepared to - take more data you will be called again with the - requested offset advanced by the number of bytes - absorbed. - - - -
- - - shows how to use a read call back - function. - -
- - - - - - Writing data - - - The write call back function allows a userland process to write - data to the kernel, so it has some kind of control over the - kernel. The write function should have the following format: - - - - - int write_func - struct file* file - const char* buffer - unsigned long count - void* data - - - - - The write function should read count - bytes at maximum from the buffer. Note - that the buffer doesn't live in the - kernel's memory space, so it should first be copied to kernel - space with copy_from_user. The - file parameter is usually - ignored. shows how to use the - data parameter. - - - - Again, shows how to use this call back - function. - - - - - - - - A single call back for many files - - - When a large number of almost identical files is used, it's - quite inconvenient to use a separate call back function for - each file. A better approach is to have a single call back - function that distinguishes between the files by using the - data field in struct - proc_dir_entry. First of all, the - data field has to be initialised: - - - -struct proc_dir_entry* entry; -struct my_file_data *file_data; - -file_data = kmalloc(sizeof(struct my_file_data), GFP_KERNEL); -entry->data = file_data; - - - - The data field is a void - *, so it can be initialised with anything. - - - - Now that the data field is set, the - read_proc and - write_proc can use it to distinguish - between files because they get it passed into their - data parameter: - - - -int foo_read_func(char *page, char **start, off_t off, - int count, int *eof, void *data) -{ - int len; - - if(data == file_data) { - /* special case for this file */ - } else { - /* normal processing */ - } - - return len; -} - - - - Be sure to free the data data field - when removing the procfs entry. - - -
- - - - - - Tips and tricks - - - - - - Convenience functions - - - - struct proc_dir_entry* create_proc_read_entry - const char* name - mode_t mode - struct proc_dir_entry* parent - read_proc_t* read_proc - void* data - - - - - This function creates a regular file in exactly the same way - as create_proc_entry from does, but also allows to set the read - function read_proc in one call. This - function can set the data as well, like - explained in . - - - - - - - Modules - - - If procfs is being used from within a module, be sure to set - the owner field in the - struct proc_dir_entry to - THIS_MODULE. - - - -struct proc_dir_entry* entry; - -entry->owner = THIS_MODULE; - - - - - - - - Mode and ownership - - - Sometimes it is useful to change the mode and/or ownership of - a procfs entry. Here is an example that shows how to achieve - that: - - - -struct proc_dir_entry* entry; - -entry->mode = S_IWUSR |S_IRUSR | S_IRGRP | S_IROTH; -entry->uid = 0; -entry->gid = 100; - - - - - - - - - - Example - - - -&procfsexample; - - -
diff --git a/Documentation/DocBook/procfs_example.c b/Documentation/DocBook/procfs_example.c deleted file mode 100644 index a5b11793b1e0..000000000000 --- a/Documentation/DocBook/procfs_example.c +++ /dev/null @@ -1,201 +0,0 @@ -/* - * procfs_example.c: an example proc interface - * - * Copyright (C) 2001, Erik Mouw (mouw@nl.linux.org) - * - * This file accompanies the procfs-guide in the Linux kernel - * source. Its main use is to demonstrate the concepts and - * functions described in the guide. - * - * This software has been developed while working on the LART - * computing board (http://www.lartmaker.nl), which was sponsored - * by the Delt University of Technology projects Mobile Multi-media - * Communications and Ubiquitous Communications. - * - * This program is free software; you can redistribute - * it and/or modify it under the terms of the GNU General - * Public License as published by the Free Software - * Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be - * useful, but WITHOUT ANY WARRANTY; without even the implied - * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR - * PURPOSE. See the GNU General Public License for more - * details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place, - * Suite 330, Boston, MA 02111-1307 USA - * - */ - -#include -#include -#include -#include -#include -#include - - -#define MODULE_VERS "1.0" -#define MODULE_NAME "procfs_example" - -#define FOOBAR_LEN 8 - -struct fb_data_t { - char name[FOOBAR_LEN + 1]; - char value[FOOBAR_LEN + 1]; -}; - - -static struct proc_dir_entry *example_dir, *foo_file, - *bar_file, *jiffies_file, *symlink; - - -struct fb_data_t foo_data, bar_data; - - -static int proc_read_jiffies(char *page, char **start, - off_t off, int count, - int *eof, void *data) -{ - int len; - - len = sprintf(page, "jiffies = %ld\n", - jiffies); - - return len; -} - - -static int proc_read_foobar(char *page, char **start, - off_t off, int count, - int *eof, void *data) -{ - int len; - struct fb_data_t *fb_data = (struct fb_data_t *)data; - - /* DON'T DO THAT - buffer overruns are bad */ - len = sprintf(page, "%s = '%s'\n", - fb_data->name, fb_data->value); - - return len; -} - - -static int proc_write_foobar(struct file *file, - const char *buffer, - unsigned long count, - void *data) -{ - int len; - struct fb_data_t *fb_data = (struct fb_data_t *)data; - - if(count > FOOBAR_LEN) - len = FOOBAR_LEN; - else - len = count; - - if(copy_from_user(fb_data->value, buffer, len)) - return -EFAULT; - - fb_data->value[len] = '\0'; - - return len; -} - - -static int __init init_procfs_example(void) -{ - int rv = 0; - - /* create directory */ - example_dir = proc_mkdir(MODULE_NAME, NULL); - if(example_dir == NULL) { - rv = -ENOMEM; - goto out; - } - /* create jiffies using convenience function */ - jiffies_file = create_proc_read_entry("jiffies", - 0444, example_dir, - proc_read_jiffies, - NULL); - if(jiffies_file == NULL) { - rv = -ENOMEM; - goto no_jiffies; - } - - /* create foo and bar files using same callback - * functions - */ - foo_file = create_proc_entry("foo", 0644, example_dir); - if(foo_file == NULL) { - rv = -ENOMEM; - goto no_foo; - } - - strcpy(foo_data.name, "foo"); - strcpy(foo_data.value, "foo"); - foo_file->data = &foo_data; - foo_file->read_proc = proc_read_foobar; - foo_file->write_proc = proc_write_foobar; - - bar_file = create_proc_entry("bar", 0644, example_dir); - if(bar_file == NULL) { - rv = -ENOMEM; - goto no_bar; - } - - strcpy(bar_data.name, "bar"); - strcpy(bar_data.value, "bar"); - bar_file->data = &bar_data; - bar_file->read_proc = proc_read_foobar; - bar_file->write_proc = proc_write_foobar; - - /* create symlink */ - symlink = proc_symlink("jiffies_too", example_dir, - "jiffies"); - if(symlink == NULL) { - rv = -ENOMEM; - goto no_symlink; - } - - /* everything OK */ - printk(KERN_INFO "%s %s initialised\n", - MODULE_NAME, MODULE_VERS); - return 0; - -no_symlink: - remove_proc_entry("bar", example_dir); -no_bar: - remove_proc_entry("foo", example_dir); -no_foo: - remove_proc_entry("jiffies", example_dir); -no_jiffies: - remove_proc_entry(MODULE_NAME, NULL); -out: - return rv; -} - - -static void __exit cleanup_procfs_example(void) -{ - remove_proc_entry("jiffies_too", example_dir); - remove_proc_entry("bar", example_dir); - remove_proc_entry("foo", example_dir); - remove_proc_entry("jiffies", example_dir); - remove_proc_entry(MODULE_NAME, NULL); - - printk(KERN_INFO "%s %s removed\n", - MODULE_NAME, MODULE_VERS); -} - - -module_init(init_procfs_example); -module_exit(cleanup_procfs_example); - -MODULE_AUTHOR("Erik Mouw"); -MODULE_DESCRIPTION("procfs examples"); -MODULE_LICENSE("GPL"); -- cgit v1.2.3 From 6be4b78993498c253e99b12c4d0f7684a36955e2 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Tue, 15 Dec 2009 16:47:00 -0800 Subject: seq_file: use proc_create() in documentation Using create_proc_entry() + ->proc_fops assignment is racy because ->proc_fops will be NULL for some time, use proc_create() to avoid race. Signed-off-by: Alexey Dobriyan Cc: Randy Dunlap Cc: Jonathan Corbet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/seq_file.txt | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index 0d15ebccf5b0..a1e2e0dda907 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt @@ -248,9 +248,7 @@ code, that is done in the initialization code in the usual way: { struct proc_dir_entry *entry; - entry = create_proc_entry("sequence", 0, NULL); - if (entry) - entry->proc_fops = &ct_file_ops; + proc_create("sequence", 0, NULL, &ct_file_ops); return 0; } -- cgit v1.2.3