The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
sys_rt_sigsuspend() instead of duplicating it for each architecture. This
provides such an implementation and makes arch/powerpc use it.
It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.
We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).
The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:
real 0m31.921s
user 0m0.688s
sys 0m31.234s
With syscall support the results are much better:
real 0m20.699s
user 0m0.536s
sys 0m20.149s
The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One of the things that's confusing about nfsd4_lock is that the lk_stateowner
field could be set to either of two different lockowners: the open owner or
the lock owner. Rename to lk_replay_owner and add a comment to make it clear
that it's used for whichever stateowner has its sequence id bumped for replay
detection.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The server code currently keeps track of the destination address on every
request so that it can reply using the same address. However we forget to do
that in the case of a deferred request. Remedy this oversight. >From folks
at PolyServe.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
up through the stack. This involves a number of rearrangements of error
paths, and care to distinguish between Linux -errno numbers and NFSERR
numbers.
In the 'create' routines, we continue with the 'setattr' even if a previous
sync_dir failed.
This patch is quite different from Takashi's in a few ways, but there is still
a strong lineage.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All standard system calls should be declared in include/linux/syscalls.h.
Add some of the new additions that were previously missed.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
imbalance in memory allocation during bootup.
The slab allocator in 2.6.13 is not numa aware and simply calls
alloc_pages(). This means that memory policies may control the behavior of
alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE
resulting in the spreading out of allocations during bootup over all
available nodes. The slab allocator in 2.6.13 has only a single list of
slab pages. As a result the per cpu slab cache and the spinlock controlled
page lists may contain slab entries from off node memory. The slab
allocator in 2.6.13 makes no effort to discern the locality of an entry on
its lists.
The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
explicitly by calling alloc_pages_node(). The NUMA slab allocator manages
slab entries by having lists of available slab pages for each node. The
per cpu slab cache can only contain slab entries associated with the node
local to the processor. This guarantees that the default allocation mode
of the slab allocator always assigns local memory if available.
Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
anymore. In 2.6.14 all node unspecific slab allocations are performed on
the boot processor. This means that most of key data structures are
allocated on one node. Most processors will have to refer to these
structures making the boot node a potential bottleneck. This may reduce
performance and cause unnecessary memory pressure on the boot node.
This patch implements NUMA policies in the slab layer. There is the need
of explicit application of NUMA memory policies by the slab allcator itself
since the NUMA slab allocator does no longer let the page_allocator control
locality.
The check for policies is made directly at the beginning of __cache_alloc
using current->mempolicy. The memory policy is already frequently checked
by the page allocator (alloc_page_vma() and alloc_page_current()). So it
is highly likely that the cacheline is present. For MPOL_INTERLEAVE
kmalloc() will spread out each request to one node after another so that an
equal distribution of allocations can be obtained during bootup.
It is not possible to push the policy check to lower layers of the NUMA
slab allocator since the per cpu caches are now only containing slab
entries from the current node. If the policy says that the local node is
not to be preferred or forbidden then there is no point in checking the
slab cache or local list of slab pages. The allocation better be directed
immediately to the lists containing slab entries for the allowed set of
nodes.
This way of applying policy also fixes another strange behavior in 2.6.13.
alloc_pages() is controlled by the memory allocation policy of the current
process. It could therefore be that one process is running with
MPOL_INTERLEAVE and would f.e. obtain a new page following that policy
since no slab entries are in the lists anymore. A page can typically be
used for multiple slab entries but lets say that the current process is
only using one. The other entries are then added to the slab lists. These
are now non local entries in the slab lists despite of the possible
availability of local pages that would provide faster access and increase
the performance of the application.
Another process without MPOL_INTERLEAVE may now run and expect a local slab
entry from kmalloc(). However, there are still these free slab entries
from the off node page obtained from the other process via MPOL_INTERLEAVE
in the cache. The process will then get an off node slab entry although
other slab entries may be available that are local to that process. This
means that the policy if one process may contaminate the locality of the
slab caches for other processes.
This patch in effect insures that a per process policy is followed for the
allocation of slab entries and that there cannot be a memory policy
influence from one process to another. A process with default policy will
always get a local slab entry if one is available. And the process using
memory policies will get its memory arranged as requested. Off-node slab
allocation will require the use of spinlocks and will make the use of per
cpu caches not possible. A process using memory policies to redirect
allocations offnode will have to cope with additional lock overhead in
addition to the latency added by the need to access a remote slab entry.
Changes V1->V2
- Remove #ifdef CONFIG_NUMA by moving forward declaration into
prior #ifdef CONFIG_NUMA section.
- Give the function determining the node number to use a saner
name.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
proc support for zone reclaim
This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
used to override the automatic determination of the zone reclaim made on
bootup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some bits for zone reclaim exists in 2.6.15 but they are not usable. This
patch fixes them up, removes unused code and makes zone reclaim usable.
Zone reclaim allows the reclaiming of pages from a zone if the number of
free pages falls below the watermarks even if other zones still have enough
pages available. Zone reclaim is of particular importance for NUMA
machines. It can be more beneficial to reclaim a page than taking the
performance penalties that come with allocating a page on a remote zone.
Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same
component (enclosure or motherboard) on IA64. The meaning of the NUMA
distance information seems to vary by arch.
If zone reclaim is not successful then no further reclaim attempts will
occur for a certain time period (ZONE_RECLAIM_INTERVAL).
This patch was discussed before. See
http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Migration code currently does not take a reference to target page
properly, so between unlocking the pte and trying to take a new
reference to the page with isolate_lru_page, anything could happen to
it.
Fix this by holding the pte lock until we get a chance to elevate the
refcount.
Other small cleanups while we're here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This one fell through the automation at first because it initializes the
semaphore to locked, but that's easily remedied
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>
drivers/cpufreq/cpufreq.c | 37 +++++++++++++++++++------------------
include/linux/cpufreq.h | 3 ++-
2 files changed, 21 insertions(+), 19 deletions(-)
There is a new device which is look like:
Serial controller: Decision Computer International Co. PCCOM2 (rev 02) (prog-if 02 [16550])
0700: 6666:0004 (rev 02) (prog-if 02)
Flags: medium devsel, IRQ 177
Memory at fe000000 (32-bit, non-prefetchable) [size=128]
I/O ports at e880 [size=128]
I/O ports at e400 [size=256]
It has two 16550A, and is not listed in kernel, although the
manufacturer clams that it is supported...
I've created the following patch, it only add the new PCI id and the
card to the repository, it seems to work.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Added macros for min/default/max link priority in tipc_config.h.
Also renamed TIPC_NUM_LINK_PRI to TIPC_MEDIA_LINK_PRI since that
is a more accurate description of what it is used for.
Signed-off-by: Per Liden <per.liden@ericsson.com>
This patch contains the following possible cleanups:
- make needlessly global code static
- arcnet.c: remove the unneeded EXPORT_SYMBOL(arc_proto_null)
- arcnet.c: remove the unneeded EXPORT_SYMBOL(arcnet_dump_packet)
To make Jeff happy, arcnet.c still prints
arcnet: v3.93 BETA 2000/04/29 - by Avery Pennarun et al.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Some subsystems, such as PPP, can send negative values
here. It just happened to work correctly on 32-bit with
an unsigned value, but on 64-bit this explodes.
Figured out by Paul Mackerras based upon several PPP crash
reports.
Signed-off-by: David S. Miller <davem@davemloft.net>
These definitions ware used for only internal use in kernel <= 2.6.13,
which had not introduced the unified parser of IPv6 extension header yet.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Procfs always output IPV6 addresses without the colon
characters, and we cannot change that.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __meminit to the __init lineup to ensure functions default
to __init when memory hotplug is not enabled. Replace __devinit
with __meminit on functions that were changed when the memory
hotplug code was introduced.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds kexec() support for SH.
Signed-off-by: kogiidena <kogiidena@eggplant.ddo.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: <fastboot@lists.osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
If optimizing for size (CONFIG_CC_OPTIMIZE_FOR_SIZE), allow gcc4 compilers
to decide what to inline and what not - instead of the kernel forcing gcc
to inline all the time. This requires several places that require to be
inlined to be marked as such, previous patches in this series do that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven <arjan@infradead.org>
Mark a number of functions as 'must inline'. The functions affected by this
patch need to be inlined because they use knowledge that their arguments are
constant so that most of the function optimizes away. At this point this
patch does not change behavior, it's for documentation only (and for future
patches in the inline series)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is the first in a series that tries to optimize the kernel in terms
of size (and thus cache behavior, both cpu and pagecache).
This first patch changes __always_inline to be a forced inline instead of the
"regular" inline it was on everything except alpha. This forced inline
matches the intention of the define better as a matter of documentation.
There is no change in behavior by this patch, since "inline" currently is
mapped to a forced inline anyway.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need for a file argument. If we'd really need it it's in vma->vm_file
already. gbefb and sgivwfb used to set vma->vm_file to the file argument, but
the kernel alrady did that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ioctl and file arguments to ->fb_mmap are totally unused and there's not
reason a driver should need them.
Also update the ->fb_compat_ioctl prototype to be the same as ->fb_mmap.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the remaining kmalloc() wrapper bits from fs/smbfs/.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The problem, reported in:
http://bugzilla.kernel.org/show_bug.cgi?id=5859
and by various other email messages and lkml posts is that the cpuset hook
in the oom (out of memory) code can try to take a cpuset semaphore while
holding the tasklist_lock (a spinlock).
One must not sleep while holding a spinlock.
The fix seems easy enough - move the cpuset semaphore region outside the
tasklist_lock region.
This required a few lines of mechanism to implement. The oom code where
the locking needs to be changed does not have access to the cpuset locks,
which are internal to kernel/cpuset.c only. So I provided a couple more
cpuset interface routines, available to the rest of the kernel, which
simple take and drop the lock needed here (cpusets callback_sem).
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
finish_arch_switch needs to update the user cpu time as well, not just the
system cpu time. Otherwise the partial user cpu time of a process that is
stored in the lowcore will be (mis-)accounted to the next process.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anything that writes into a tmpfs filesystem is liable to disproportionately
decrease the available memory on a particular node. Since there's no telling
what sort of application (e.g. dd/cp/cat) might be dropping large files
there, this lets the admin choose the appropriate default behavior for their
site's situation.
Introduce a tmpfs mount option which allows specifying a memory policy and
a second option to specify the nodelist for that policy. With the default
policy, tmpfs will behave as it does today. This patch adds support for
preferred, bind, and interleave policies.
The default policy will cause pages to be added to tmpfs files on the node
which is doing the writing. Some jobs expect a single process to create
and manage the tmpfs files. This results in a node which has a
significantly reduced number of free pages.
With this patch, the administrator can specify the policy and nodes for
that policy where they would prefer allocations.
This patch was originally written by Brent Casavant and Hugh Dickins. I
added support for the bind and preferred policies and the mpol_nodelist
mount option.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some people apparently run CONFIG_NUMA without CONFIG_SWAP. The migration
code currently depends on swap. This patch provides a set of inline
fallback functions so that the kernel properly compiles. However, calls to
migration functions will fail.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new SCHED_BATCH (3) scheduling policy: such tasks are presumed
CPU-intensive, and will acquire a constant +5 priority level penalty. Such
policy is nice for workloads that are non-interactive, but which do not
want to give up their nice levels. The policy is also useful for workloads
that want a deterministic scheduling policy without interactivity causing
extra preemptions (between that workload's tasks).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add driver support for a 2 port PCI IOC3-based serial card on Altix boxes:
This is a re-submission. On the original submission I was asked to
organize the code so that the MIPS ioc3 ethernet and serial parts could be
used with this driver. Stanislaw Skowronek was kind enough to provide the
shim layer for this - thanks Stanislaw. This patch includes the shim layer
and the Altix PCI ioc3 serial driver. The MIPS merged ioc3 ethernet and
serial support is forthcoming.
Signed-off-by: Patrick Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A Christoph suggested that the /proc/devices file be converted to use the
seq_file interface. This patch does that.
I've obxerved one or two installation that had sufficiently large sans that
they overran the 4k limit on /proc/devices.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add software recognition for the new LSI Logic Fibre Channel controller.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
gcc4 generates warnings when a non-FASTCALL function pointer is assigned to a
FASTCALL one. Perhaps it has taste.
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes the SPI core and its users access transfers in the SPI message
structure as linked list not as an array, as discussed on LKML.
From: David Brownell <dbrownell@users.sourceforge.net>
Updates including doc, bugfixes to the list code, add
spi_message_add_tail(). Plus, initialize things _before_ grabbing the
locks in some cases (in case it grows more expensive). This also merges
some bitbang updates of mine that didn't yet make it into the mm tree.
Signed-off-by: Vitaly Wool <vwool@ru.mvista.com>
Signed-off-by: Dmitry Pervushin <dpervushin@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This was originally a driver for the ST M25P80 SPI flash. It's been
updated slightly to handle other M25P series chips.
For many of these chips, the specific type could be probed, but for now
this just requires static setup with flash_platform_data that lists the
chip type (size, format) and any default partitioning to use.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Mike Lavender <mike@steroidmicros.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a bitbanging spi master, hooking up to board/adapter-specific glue
code which knows how to set and read the signals (gpios etc).
This code kicks in after the glue code creates a platform_device with the
right platform_data. That data includes I/O loops, which will usually
come from expanding an inline function (provided in the header). One goal
is that the I/O loops should be easily optimized down to a few GPIO register
accesses, in common cases, for speed and minimized overhead.
This understands all the currently defined protocol tweaking options in the
SPI framework, and might eventually serve as as reference implementation.
- different word sizes (1..32 bits)
- differing clock rates
- SPI modes differing by CPOL (affecting chip select and I/O loops)
- SPI modes differing by CPHA (affecting I/O loops)
- delays (usecs) after transfers
- temporarily deselecting chips in mid-transfer
A lot of hardware could work with this framework, though common types of
controller can't reach peak performance without switching to a driver
structure that supports pipelining of transfers (e.g. DMA queues) and maybe
controllers (e.g. IRQ driven).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This updates the ads7864 driver to use the new "spi_driver" struct, and
includes some minor unrelated cleanup.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This includes various updates to the SPI core:
- Fixes a driver model refcount bug in spi_unregister_master() paths.
- The spi_master structures now have wrappers which help keep drivers
from needing class-level get/put for device data or for refcounts.
- Check for a few setup errors that would cause oopsing later.
- Docs say more about memory management. Highlights the use of DMA-safe
i/o buffers, and zero-initializing spi_message and such metadata.
- Provide a simple alloc/free for spi_message and its spi_transfer;
this is only one of the possible memory management policies.
Nothing to break code that already works.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a refresh of the "Simple SPI Framework" found in 2.6.15-rc3-mm1
which makes the following changes:
* There's now a "struct spi_driver". This increase the footprint
of the core a bit, since it now includes code to do what the driver
core was previously handling directly. Documentation and comments
were updated to match.
* spi_alloc_master() now does class_device_initialize(), so it can
at least be refcounted before spi_register_master(). To match,
spi_register_master() switched over to class_device_add().
* States explicitly that after transfer errors, spi_devices will be
deselected. We want fault recovery procedures to work the same
for all controller drivers.
* Minor tweaks: controller_data no longer points to readonly data;
prevent some potential cast-from-null bugs with container_of calls;
clarifies some existing kerneldoc,
And a few small cleanups.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a conversion of the AT91rm9200 DataFlash MTD driver to use the
lightweight SPI framework, and no longer be AT91-specific. It compiles
down to less than 3KBytes on ARM.
The driver allows board-specific init code to provide platform_data with
the relevant MTD partitioning information, and hotplugs.
This version has been lightly tested. Its parent at91_dataflash driver has
been pretty well banged on, although kernel.org JFFS2 dataflash support was
acting broken the last time I tried it.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a driver for the ADS7846 touchscreen sensor, derived from
the corgi_ts and omap_ts drivers. Key differences from those two:
- Uses the new SPI framework (minimalist version)
- <linux/spi/ads7846.h> abstracts board-specific touchscreen info
- Sysfs attributes for the temperature and voltage sensors
- Uses fewer ARM-specific IRQ primitives
The temperature and voltage sensors show up in sysfs like this:
$ pwd
/sys/devices/platform/omap-uwire/spi2.0
$ ls
bus@ input:event0@ power/ temp1 vbatt
driver@ modalias temp0 vaux
$ cat modalias
ads7846
$ cat temp0
991
$ cat temp1
1177
$
So far only basic testing has been done. There's a fair amount of hardware
that uses this sensor, and which also runs Linux, which should eventually
be able to use this driver.
One portability note may be of special interest. It turns out that not all
SPI controllers are happy issuing requests that do things like "write 8 bit
command, read 12 bit response". Most of them seem happy to handle various
word sizes, so the issue isn't "12 bit response" but rather "different rx
and tx write sizes", despite that being a common MicroWire convention. So
this version of the driver no longer reads 12 bit native-endian words; it
reads 16-bit big-endian responses, then byteswaps them and shifts the
results to discard the noise.
Signed-off-by: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the core of a small SPI framework, implementing the model of a
queue of messages which complete asynchronously (with thin synchronous
wrappers on top).
- It's still less than 2KB of ".text" (ARM). If there's got to be a
mid-layer for something so simple, that's the right size budget. :)
- The guts use board-specific SPI device tables to build the driver
model tree. (Hardware probing is rarely an option.)
- This version of Kconfig includes no drivers. At this writing there
are two known master controller drivers (PXA/SSP, OMAP MicroWire)
and three protocol drivers (CS8415a, ADS7846, DataFlash) with LKML
mentions of other drivers in development.
- No userspace API. There are several implementations to compare.
Implement them like any other driver, and bind them with sysfs.
The changes from last version posted to LKML (on 11-Nov-2005) are minor,
and include:
- One bugfix (removes a FIXME), with the visible effect of making device
names be "spiB.C" where B is the bus number and C is the chipselect.
- The "caller provides DMA mappings" mechanism now has kerneldoc, for
DMA drivers that want to be fancy.
- Hey, the framework init can be subsys_init. Even though board init
logic fires earlier, at arch_init ... since the framework init is
for driver support, and the board init support uses static init.
- Various additional spec/doc clarifications based on discussions
with other folk. It adds a brief "thank you" at the end, for folk
who've helped nudge this framework into existence.
As I've said before, I think that "protocol tweaking" is the main support
that this driver framework will need to evolve.
From: Mark Underwood <basicmark@yahoo.com>
Update the SPI framework to remove a potential priority inversion case by
reverting to kmalloc if the pre-allocated DMA-safe buffer isn't available.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are errors and inconsistency in the display of NIP6 strings.
ie: net/ipv6/ip6_flowlabel.c
There are errors and inconsistency in the display of NIPQUAD strings too.
ie: net/netfilter/nf_conntrack_ftp.c
This patch:
adds NIP6_FMT to kernel.h
changes all code to use NIP6_FMT
fixes net/ipv6/ip6_flowlabel.c
adds NIPQUAD_FMT to kernel.h
fixes net/netfilter/nf_conntrack_ftp.c
changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build and boot tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add bus_type probe, remove and shutdown methods to replace the
corresponding methods in struct device_driver. This matches
the way we handle the suspend/resume methods.
Since the bus methods override the device_driver methods, warn
if a device driver is registered whose methods will not be
called.
The long-term idea is to remove the device_driver methods entirely.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On PowerPC, we want to be able to provide an AT_PLATFORM aux table
entry to userspace, so that glibc can choose optimized libraries for
the processor we're running on. Unfortunately that would be the 21st
aux table entry on powerpc, meaning that the aux table including the
terminating null entry would overflow the mm->saved_auxv[] array,
leading to userland programs segfaulting.
This increases the size of the mm->saved_auxv array to be large enough
to accommodate an AT_PLATFORM entry on powerpc.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support to the proc_device_tree file for removing
and updating properties. Remove just removes the
proc file, update changes the data pointer within
the proc file. The remainder of the device-tree
changes occur elsewhere.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables. In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.
o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
are now implemented as xt_FOOBAR.c files and provide module aliases
to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
around the xt_FOOBAR.h headers
Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The license header in each file now more clearly state that this
code is licensed under a dual BSD/GPL. Before this was only
evident if you looked at the MODULE_LICENSE line in core.c.
Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
Restored the old tipc_config.h to get a cleaner division between the
interfaces used by normal TIPC users and TIPC administration utilities.
Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
TIPC (Transparent Inter Process Communication) is a protocol designed for
intra cluster communication. For more information see
http://tipc.sourceforge.net
Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
Make the driver produce the string used by phy_connect and have board specific
code pass the integer mii bus id and phy device id for the specific controller
instance.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Add the PHY_ID_FMT macro to ensure that the format of the id string used by a
driver to match to its specific phy is consistent between the mdio_bus and the
driver.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
We can now have the gianfar mii platform device have a proper resource for the
IO memory region for its registers. Previously we passed this information
that the platform_data structure because we couldn't handle overlapping memory
regions for platform devices.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
There's a problem with the REQ_BLOCK_PC handling as well (bad ->data_len
handling) where it could actually complete a request ahead of time. I
suggest we just back this out for now, I will resubmit it later when I'm
fully confident in it.
This reverts commit 8672d57138
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adding defines for RAID10 and RAID50 levels, in preparation
of adding RAID Transport support in the mpt fusion drivers.
(BTW: IME is RAID10, and IM is RAID1).
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Patchset annotates arch/* uses of ->thread_info. Ones that really are about
access of thread_info of given process are simply switched to
task_thread_info(task); ones that deal with access to objects on stack are
switched to new helper - task_stack_page(). A _lot_ of the latter are
actually open-coded instances of "find where pt_regs are"; those are
consolidated into task_pt_regs(task) (many architectures actually have such
helper already).
Note that these annotations are not mandatory - any code not converted to
these helpers still works. However, they clean up a lot of places and have
actually caught a number of bugs, so converting out of tree ports would be a
good idea...
As an example of breakage caught by that stuff, see i386 pt_regs mess - we
used to have it open-coded in a bunch of places and when back in April Stas
had fixed a bug in copy_thread(), the rest had been left out of sync. That
required two followup patches (the latest - just before 2.6.15) _and_ still
had left /proc/*/stat eip field broken. Try ps -eo eip on i386 and watch the
junk...
This patch:
new helper - task_stack_page(task). Returns pointer to the memory object
containing task stack; usually thread_info of task sits in the beginning
of that object.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Nick Piggin <nickpiggin@yahoo.com.au>
Track the last waker CPU, and only consider wakeup-balancing if there's a
match between current waker CPU and the previous waker CPU. This ensures
that there is some correlation between two subsequent wakeup events before
we move the task. Should help random-wakeup workloads on large SMP
systems, by reducing the migration attempts by a factor of nr_cpus.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Ingo Molnar <mingo@elte.hu>
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more assymetric systems.
[ People hacking systems that have assymetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution - but lets first
see the problem systems, if they exist at all. Lets not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect hat if e.g. two
CPUs share a L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that dont share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore i've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel pointed out that the missing lower limit of intervals
leads to an accounting error in the overrun count. Enforce the lower
limit of intervals to resolution in the timer forwarding code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change the storage format of the per base resolution to ktime_t to
make it easier accessible in the hrtimers code.
Change the resolution from (NSEC_PER_SEC/HZ) to TICK_NSEC as Roman
pointed out. TICK_NSEC is closer to the real resolution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The list_head in the hrtimer structure was introduced for easy access
to the first timer with the further extensions of real high resolution
timers in mind, but it turned out in the course of development that
it is not necessary for the standard use case. Remove the list head
and access the first expiry timer by a datafield in the timer base.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce vSMP arch to the kernel.
This patch:
1. Adds CONFIG_X86_VSMP
2. Adds machine specific macros for local_irq_disabled, local_irq_enabled
and irqs_disabled
3. Writes to the vSMP CTL device to indicate kernel compiled with CONFIG_VSMP
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following kmalloc_node.
Needed for another patch to return -1 for unknown nodes in x86-64.
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: kiran@scalex86.org
Signed-off-by: Andi Kleen <ak@suse.de>
[ Changed 0 to numa_node_id() on suggestion by Christoph Lameter ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some people need it now on 64bit so reuse the i386 code for
x86-64. This will be also useful for future bug workarounds.
It is a bit simplified there because there is no need
to do it very early on x86-64. This means it doesn't need
early ioremap et.al. We run it as a core initcall right now.
I hope it's not needed for early setup.
I added a general CONFIG_DMI symbol in case IA64 or someone
else wants to reuse the code later too.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move capable() from sched.h to capability.h;
- Use <linux/capability.h> where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Uninline capable(). Saves 2K of kernel text on a generic .config, and 1K on a
tiny config. In addition it makes the use of capable more consistent between
CONFIG_SECURITY and !CONFIG_SECURITY
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a kprobes modules is written in such a way that probes are inserted on
itself, then unload of that moudle was not possible due to reference
couning on the same module.
The below patch makes a check and incrementes the module refcount only if
it is not a self probed module.
We need to allow modules to probe themself for kprobes performance
measurements
This patch has been tested on several x86_64, ppc64 and IA64 architectures.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clarify in comments that GFP_ATOMIC means both "don't sleep" and "use
emergency pools", hence both ALLOC_HARDER and ALLOC_HIGH.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as
arguments instead, since all its callers were just calculating the 'to'
address for themselves anyway... (and sometimes doing so badly).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Flag a whole bunch of things as __read_mostly on parisc. Also flag a few
branches as unlikely() and cleanup a bit of code.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
->print and ->print_range are not used (and apparently never were).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't wrap entire file in #ifdef CONFIG_NETFILTER, remove a few
unneccessary includes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from Andrew Victor
This patch adds support to the 2.6 kernel series for the Atmel
AT91RM9200 processor.
This patch is the Serial driver.
This version uses the newly re-written GPL'ed hardware headers.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch moves rcu_state into the rcu_ctrlblk. I think there
are no reasons why we should have 2 different variables to control
rcu state. Every user of rcu_state has also "rcu_ctrlblk *rcp" in
the parameter list.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the process of optimising our per cpu data code, I found a ppc64
compiler bug that has been around forever. Basically the current
RELOC_HIDE can end up trashing r30. Details of the bug can be found at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25572
This bug is present in all compilers before 4.1. It is masked by the
fact that our current per cpu data code is inefficient and causes
other loads that end up marking r30 as used.
A workaround identified by Alan Modra is to use the =r asm constraint
instead of =g.
Signed-off-by: Anton Blanchard <anton@samba.org>
[ Verified that this makes no real difference on x86[-64] */
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no need to guard the normalize_rt_tasks() prototype with an #ifdef
CONFIG_MAGIC_SYSRQ.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Wrap all the code to 80 chars on a line.
`}\nelse' changed to `} else'.
Clean whitespaces in header file.
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some code from one place to another. Get rid of ugly ifdefs in code in
next p[patches, so here create functions and macros to enable it. Rename some
functions and align some code to 80 chars.
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The API and code have been through various bits of initial review by
serial driver people but they definitely need to live somewhere for a
while so the unconverted drivers can get knocked into shape, existing
drivers that have been updated can be better tuned and bugs whacked out.
This replaces the tty flip buffers with kmalloc objects in rings. In the
normal situation for an IRQ driven serial port at typical speeds the
behaviour is pretty much the same, two buffers end up allocated and the
kernel cycles between them as before.
When there are delays or at high speed we now behave far better as the
buffer pool can grow a bit rather than lose characters. This also means
that we can operate at higher speeds reliably.
For drivers that receive characters in blocks (DMA based, USB and
especially virtualisation) the layer allows a lot of driver specific
code that works around the tty layer with private secondary queues to be
removed. The IBM folks need this sort of layer, the smart serial port
people do, the virtualisers do (because a virtualised tty typically
operates at infinite speed rather than emulating 9600 baud).
Finally many drivers had invalid and unsafe attempts to avoid buffer
overflows by directly invoking tty methods extracted out of the innards
of work queue structs. These are no longer needed and all go away. That
fixes various random hangs with serial ports on overflow.
The other change in here is to optimise the receive_room path that is
used by some callers. It turns out that only one ldisc uses receive room
except asa constant and it updates it far far less than the value is
read. We thus make it a variable not a function call.
I expect the code to contain bugs due to the size alone but I'll be
watching and squashing them and feeding out new patches as it goes.
Because the buffers now dynamically expand you should only run out of
buffering when the kernel runs out of memory for real. That means a lot of
the horrible hacks high performance drivers used to do just aren't needed any
more.
Description:
tty_insert_flip_char is an old API and continues to work as before, as does
tty_flip_buffer_push() [this is why many drivers dont need modification]. It
does now also return the number of chars inserted
There are also
tty_buffer_request_room(tty, len)
which asks for a buffer block of the length requested and returns the space
found. This improves efficiency with hardware that knows how much to
transfer.
and tty_insert_flip_string_flags(tty, str, flags, len)
to insert a string of characters and flags
For a smart interface the usual code is
len = tty_request_buffer_room(tty, amount_hardware_says);
tty_insert_flip_string(tty, buffer_from_card, len);
More description!
At the moment tty buffers are attached directly to the tty. This is causing a
lot of the problems related to tty layer locking, also problems at high speed
and also with bursty data (such as occurs in virtualised environments)
I'm working on ripping out the flip buffers and replacing them with a pool of
dynamically allocated buffers. This allows both for old style "byte I/O"
devices and also helps virtualisation and smart devices where large blocks of
data suddenely materialise and need storing.
So far so good. Lots of drivers reference tty->flip.*. Several of them also
call directly and unsafely into function pointers it provides. This will all
break. Most drivers can use tty_insert_flip_char which can be kept as an API
but others need more.
At the moment I've added the following interfaces, if people think more will
be needed now is a good time to say
int tty_buffer_request_room(tty, size)
Try and ensure at least size bytes are available, returns actual room (may be
zero). At the moment it just uses the flipbuf space but that will change.
Repeated calls without characters being added are not cumulative. (ie if you
call it with 1, 1, 1, and then 4 you'll have four characters of space. The
other functions will also try and grow buffers in future but this will be a
more efficient way when you know block sizes.
int tty_insert_flip_char(tty, ch, flag)
As before insert a character if there is room. Now returns 1 for success, 0
for failure.
int tty_insert_flip_string(tty, str, len)
Insert a block of non error characters. Returns the number inserted.
int tty_prepare_flip_string(tty, strptr, len)
Adjust the buffer to allow len characters to be added. Returns a buffer
pointer in strptr and the length available. This allows for hardware that
needs to use functions like insl or mencpy_fromio.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix typos in comments to remove kernel-doc warnings.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chipsets with PCI device ids & 0xf0 == 0x00f0 has their actual chipset type in
offset 0x1800 of the mmio space. Add support for this.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Bugzilla Bug 5351
"After resuming from S3 (suspended while in X), the LCD panel stays black .
However, the laptop is up again, and I can SSH into it from another
machine.
I can get the panel working again, when I first direct video output to the
CRT output of the laptop, and then back to LCD (done by repeatedly hitting
Fn+F5 buttons on the Toshiba, which directs output to either LCD, CRT or
TV) None of this ever happened with older kernels."
This bug is due to the recently added vesafb_blank() method in vesafb. It
works with CRT displays, but has a high incidence of problems in laptop
users. Since CRT users don't really get that much benefit from hardware
blanking, drop support for this.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch (against 2.6.15-rc5-mm3) fixes a kprobes build break
due to changes introduced in the kprobe locking in 2.6.15-rc5-mm3. In
addition, the patch reverts back the open-coding of kprobe_mutex.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc. All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.
This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s) do { } while(0)
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since Kprobes runtime exception handlers is now lock free as this code path is
now using RCU to walk through the list, there is no need for the
register/unregister{_kprobe} to use spin_{lock/unlock}_isr{save/restore}. The
serialization during registration/unregistration is now possible using just a
mutex.
In the above process, this patch also fixes a minor memory leak for x86_64 and
powerpc.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch clock_nanosleep to use the new nanosleep functions in hrtimer.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
introduce the hrtimer_nanosleep() and hrtimer_nanosleep_real() APIs. Not yet
used by any code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hrtimer subsystem core. It is initialized at bootup and expired by the timer
interrupt, but is otherwise not utilized by any other subsystem yet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- introduce ktime_t: nanosecond-resolution time format.
- eliminate the plain s64 scalar type, and always use the union.
This simplifies the arithmetics. Idea from Roman Zippel.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add timespec_valid(ts) [returns false if the timespec is denorm]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add const arguments to the posix-timers.h API functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
style and whitespace cleanup of the rest of time.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
clean up the CLOCK_ portions of time.h
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add 'const' to mktime arguments, and clean it up a bit
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mktime() and set_normalized_timespec() are large inline functions used in many
places: deinline them.
From: George Anzinger, off-by-1 bugfix
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move div_long_long_rem() from jiffies.h into a new calc64.h include file, as
it is a general math function useful for other things than the jiffy code.
Convert it to an inline function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Turn noatime and nodiratime into per-mount instead of per-sb flags.
After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.
While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures. Also remove some
superflous includes in fs/compat_ioctl.c
Tested on ppc64.
[1] parisc still had it's PPP handler left, which is not fully correct
for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
kick in for all netdevice users. We can introduce a proper handler
in one of the next patch series by adding a compat_ioctl method to
struct net_device but for now let's just kill it - parisc doesn't
compile in mainline anyway and I don't want this to block this
patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch __deprecated_for_modules the lookup_hash() prototype.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All callers use touch_atime now which takes a vfsmount and allows us to
implement per-mount noatime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To allow various options to work per-mount instead of per-sb we need a
struct vfsmount when updating ctime and mtime. This preparation patch
replaces the inode_update_time routine with a file_update_atime routine so
we can easily get at the vfsmount. (and the file makes more sense in this
context anyway). Also get rid of the unused second argument - we always
want to update the ctime when calling this routine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Christoph Hellwig <hch@lst.de>
The xattr code has rather complex permission checks because the rules are very
different for different attribute namespaces. This patch moves as much as we
can into the generic code. Currently all the major disk based filesystems
duplicate these checks, while many minor filesystems or network filesystems
lack some or all of them.
To do this we need defines for the extended attribute names in common code, I
moved them up from JFS which had the nicest defintions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add vfs_getxattr, vfs_setxattr and vfs_removexattr helpers for common checks
around invocation of the xattr methods. NFSD already was missing some of the
checks and there will be more soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: James Morris <jmorris@namei.org>
(James, I haven't touched selinux yet because it's doing various odd things
and I'm not sure how it would interact with the security attribute fallbacks
you added. Could you investigate whether it could use vfs_getxattr or if not
add a __vfs_getxattr helper to share the bits it is fine with?)
For NFSv4: instead of just converting it add an nfsd_getxattr helper for the
code shared by NFSv2/3 and NFSv4 ACLs. In fact that code isn't even
NFS-specific, but I'll wait for more users to pop up first before moving it to
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Vivek Goyal <vgoyal@in.ibm.com>
- In some cases, the number of segments, on a kexec load, exceeds the
existing cap of 8. This patch increases the KEXEC_SEGMENT_MAX limit from 8
to 16.
Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- In case of system crash, current state of cpu registers is saved in memory
in elf note format. So far memory for storing elf notes was being allocated
statically for NR_CPUS.
- This patch introduces dynamic allocation of memory for storing elf notes.
It uses alloc_percpu() interface. This should lead to better memory usage.
- Introduced based on Andi Kleen's and Eric W. Biederman's suggestions.
- This patch also moves memory allocation for elf notes from architecture
dependent portion to architecture independent portion. Now crash_notes is
architecture independent. The whole idea is that size of memory to be
allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and
allocation of this memory can be architecture independent.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Adrian Bunk <bunk@stusta.de>
- create one common dump_thread() prototype in kernel.h
- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
removed on all architectures where CONFIG_BINFMT_AOUT is not
available
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add list_for_each_entry_safe_reverse() to linux/list.h
This is needed by unmerged cachefs and be an as-yet-unreviewed
device_shutdown() fix.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A simple driver for the CS5535 and CS5536 that allows a user-space program
to manipulate GPIO pins. The CS5535/CS5536 chips are Geode processor
companion devices.
Signed-off-by: Ben Gardner <bgardner@wabtec.com>
Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The patch changes semaphores that are initialized as
locked to complete().
Source: MontaVista Software, Inc.
Modified-by: Steven Rostedt <rostedt@goodmis.org>
The following patch is from Montavista. I modified it slightly.
Semaphores are currently being used where it makes more sense for
completions. This patch corrects that.
Signed-off-by: Aleksey Makarov <amakarov@ru.mvista.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts the superblock-lock semaphore to a mutex, affecting
lock_super()/unlock_super(). Tested on ext3 and XFS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
more mutex debugging: check for held locks during memory freeing,
task exit, enable sysrq printouts, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
mutex implementation, core files: just the basic subsystem, no users of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
add typecheck_fn(type, function) to do type-checking of function
pointers.
Modified-by: Ingo Molnar <mingo@elte.hu>
(made it typeof() based, instead of typedef based.)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some controllers actually check the first byte of the response (most
don't). This byte contains the command opcode for R1/R1b and all 1:s
for other types. The difference must be indicated to the controller
so it knows which reply to expect.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
This is the first step on the road towards asynchronous support in
the Crypto API. It adds support for having multiple crypto_alg objects
for the same algorithm registered in the system.
For example, each device driver would register a crypto_alg object
for each algorithm that it supports. While at the same time the
user may load software implementations of those same algorithms.
Users of the Crypto API may then select a specific implementation
by name, or choose any implementation for a given algorithm with
the highest priority.
The priority field is a 32-bit signed integer. In future it will be
possible to modify it from user-space.
This also provides a solution to the problem of selecting amongst
various AES implementations, that is, aes vs. aes-i586 vs. aes-padlock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some hosts need to know that a transfer will be multi-block.
Add a data flag to indicate multiple data block transfers.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Various PCI bus errors can be signaled by newer PCI controllers.
Recovering from those errors requires an infrastructure to notify
affected device drivers of the error, and a way of walking through a
reset sequence. This patch adds a set of callbacks to be used by error
recovery routines to notify device drivers of the various stages of
recovery.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The powerpc PCI code sets up the PCI tree without doing config space
accesses in most cases, from the firmware tree. However, it still wants
to call pci_cfg_space_size() under some conditions, thus it needs to
be made non-static (though I don't see a point to export it to modules).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Store the value of the INTERRUPT_PIN in the pci_dev structure
so that it can be retrieved later.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
__rcu_pending() is rather fat and called twice from rcu_pending().
rcu_pending() has multiple callers, and not that small too.
This patch uninlines both of them.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Now, all internal ioctls are at v4l2-common.h
- removed unused ioctl at saa6752hs.h
- all debug ioctl code moved to v4l2-common.c
- removed duplicated stuff from other cards
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
- Included advanced debug option to tvp5150.c
- Now, advanced debug info is the first item at V4L menu.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
- Created a new ioctl to control I2S speed. Old calls to an
inadequate V4L2 API replaced.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
- Implement frontend-specific tuning and the ability to disable zigzag
Signed-off-by: Andrew de Quincey <adq_dvb@lidskialf.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
This moves the 32 bit ioctl compatibility handlers for
Video4Linux into a new file and adds explicit calls to them
to each v4l device driver.
Unfortunately, there does not seem to be any code handling
the v4l2 ioctls, so quite often the code goes through two
separate conversions, first from 32 bit v4l to 64 bit v4l,
and from there to 64 bit v4l2. My patch does not change
that, so there is still much room for improvement.
Also, some drivers have additional ioctl numbers, for
which the conversion should be handled internally to
that driver.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
This patch makes IDE use the new blk_complete_request() interface.
There's still room for improvement, as __ide_end_request() really
could drop the lock after getting HWGROUP->rq (why does it need to
hold it in the first place? If ->rq access isn't serialized, we are
screwed anyways).
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch moves the SCSI softirq handling to the block layer version.
There should be no functional changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
Request completion can be a quite heavy process, since it needs to
iterate through the entire request and complete the bio's it holds.
This patch adds blk_complete_request() which moves this processing
into a dedicated block softirq.
Signed-off-by: Jens Axboe <axboe@suse.de>
It's a broken interface, it's done way too late. And apparently it triggers
slab problems in recent kernels as well (most likely after the generic dispatch
code was merged). So kill it, ide-cd is the only user of it.
Signed-off-by: Jens Axboe <axboe@suse.de>
This is the first part of a rework of the PowerMac i2c code. It
completely reworks the "low_i2c" layer. It is now more flexible,
supports KeyWest, SMU and PMU i2c busses, and provides functions to
match device nodes to i2c busses and adapters.
This patch also extends & fix some bugs in the SMU driver related to i2c
support and removes the clock spreading hacks from the pmac feature code
rather than adapting them to the new API since they'll be replaced by
the platform function code completely in patch 3/5
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since version 4.1 the gcc is warning about ignored attributes. This patch is
using the equivalent attribute on the struct instead of on each of the
structure or union members.
GCC Manual:
"Specifying Attributes of Types
packed
This attribute, attached to struct or union type definition, specifies
that
each member of the structure or union is placed to minimize the memory
required. When attached to an enum definition, it indicates that the
smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to
specifying the packed attribute on each of the structure or union
members."
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier fix removed unused phase, but that changed the values for other
phases. Since these are exposed to userspace through ppdev, it is safer
not to change them. Restore the unused phase value.
Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes it possible for boot code to use screen_info without dragging in
all of tty.h.
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
PTRACE_SYSEMU{,_SINGLESTEP} is actually arch specific, for now, and the
current allocated number clashes with a ptrace code of frv, i.e.
PTRACE_GETFDPIC. I should have submitted this much earlier, anyway we get no
breakage for this.
CC: Daniel Jacobowitz <dan@debian.org>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reduce the size of the pageframe for NR_CPUS>4, CONFIG_PREEMPT back to the
minimal size by unionising both ->private and ->mapping with the pagetable
lock.
It uses an anonymous struct and hence requires gcc-3.x.
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reorder members of the kiocb structure to make sync kiocb setup faster. By
setting the elements sequentially, the write combining buffers on the CPU
are able to combine the writes into a single burst, which results in fewer
cache cycles being consumed, freeing them up for other code. This results
in a 10-20KB/s[*] increase on the bw_unix part of LMbench on my test
system.
* The improvement varies based on what other patches are in the system,
as there are a number of bottlenecks, so this number is not absolutely
accurate.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.
From: Adrian Bunk <bunk@stusta.de>
Some documentation updates and removes some code paths for gcc < 3.2.
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's one scsi driver which doesn't compile due to weird __VA_ARGS__ tricks
and the rather useful scsi/sd.c is currently getting an ICE. None of the new
SAS code compiles, due to extensive use of anonymous unions. The V4L guys are
very good at exploiting the gcc-2.95.x macro expansion bug (_why_ does each
driver need to implement its own debug macros?) and various people keep on
sneaking in anonymous unions, which are rather nice.
Plus anonymous unions are rather useful.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I found that blkp field was not used in kernel tree.
As most of the times NR_CPUS is a power of two and kmalloc() memory blocks
too, this extra field basically doubles the memory space allocated in
__alloc_percpu() to store the 'struct percpu_data'
(for example, if NR_CPUS=8 on i386, kmalloc(4*8+4) returns a 64 bytes block
instead of a 32 bytes block after this patch)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch inlines the single user of struct super_block field
s_old_blocksize and removes the field.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some long time ago, dentry struct was carefully tuned so that on 32 bits
UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple
of memory cache lines.
Then RCU was added and dentry struct enlarged by two pointers, with nice
results for SMP, but not so good on UP, because breaking the above tuning
(128 + 8 = 136 bytes)
This patch reverts this unwanted side effect, by using an union (d_u),
where d_rcu and d_child are placed so that these two fields can share their
memory needs.
At the time d_free() is called (and d_rcu is really used), d_child is known
to be empty and not touched by the dentry freeing.
Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so
the previous content of d_child is not needed if said dentry was unhashed
but still accessed by a CPU because of RCU constraints)
As dentry cache easily contains millions of entries, a size reduction is
worth the extra complexity of the ugly C union.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Paul Jackson <pj@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
HDIO_GETGEO is implemented in most block drivers, and all of them have to
duplicate the code to copy the structure to userspace, as well as getting
the start sector. This patch moves that to common code [1] and adds a
->getgeo method to fill out the raw kernel hd_geometry structure. For many
drivers this means ->ioctl can go away now.
[1] the s390 block drivers are odd in this respect. xpram sets ->start
to 4 always which seems more than odd, and the dasd driver shifts
the start offset around, probably because of it's non-standard
sector size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@suse.de>
Cc: <mike.miller@hp.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo Giarrusso <blaisorblade@yahoo.it>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While rooting aroung in the signal code trying to understand how to fix the
SIG_IGN ploy (set sig handler to SIG_IGN and flood system with high speed
repeating timers) I came across what, I think, is a problem in sigaction()
in that when processing a SIG_IGN request it flushes signals from 1 to
SIGRTMIN and leaves the rest. Attempt to fix this.
Signed-off-by: George Anzinger <george@mvista.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it possible for a running process (such as gssapid) to be able to
instantiate a key, as was requested by Trond Myklebust for NFS4.
The patch makes the following changes:
(1) A new, optional key type method has been added. This permits a key type
to intercept requests at the point /sbin/request-key is about to be
spawned and do something else with them - passing them over the
rpc_pipefs files or netlink sockets for instance.
The uninstantiated key, the authorisation key and the intended operation
name are passed to the method.
(2) The callout_info is no longer passed as an argument to /sbin/request-key
to prevent unauthorised viewing of this data using ps or by looking in
/proc/pid/cmdline.
This means that the old /sbin/request-key program will not work with the
patched kernel as it will expect to see an extra argument that is no
longer there.
A revised keyutils package will be made available tomorrow.
(3) The callout_info is now attached to the authorisation key. Reading this
key will retrieve the information.
(4) A new field has been added to the task_struct. This holds the
authorisation key currently active for a thread. Searches now look here
for the caller's set of keys rather than looking for an auth key in the
lowest level of the session keyring.
This permits a thread to be servicing multiple requests at once and to
switch between them. Note that this is per-thread, not per-process, and
so is usable in multithreaded programs.
The setting of this field is inherited across fork and exec.
(5) A new keyctl function (KEYCTL_ASSUME_AUTHORITY) has been added that
permits a thread to assume the authority to deal with an uninstantiated
key. Assumption is only permitted if the authorisation key associated
with the uninstantiated key is somewhere in the thread's keyrings.
This function can also clear the assumption.
(6) A new magic key specifier has been added to refer to the currently
assumed authorisation key (KEY_SPEC_REQKEY_AUTH_KEY).
(7) Instantiation will only proceed if the appropriate authorisation key is
assumed first. The assumed authorisation key is discarded if
instantiation is successful.
(8) key_validate() is moved from the file of request_key functions to the
file of permissions functions.
(9) The documentation is updated.
From: <Valdis.Kletnieks@vt.edu>
Build fix.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new keyctl function that allows the expiry time to be set on a key or
removed from a key, provided the caller has attribute modification access.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SUS requires that when truncating a file to the size that it currently
is:
truncate and ftruncate should NOT modify ctime or mtime
O_TRUNC SHOULD modify ctime and mtime.
Currently mtime and ctime are always modified on most local
filesystems (side effect of ->truncate) or never modified (on NFS).
With this patch:
ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
an update of these times is required whether size changes or not
(via a new argument to do_truncate). This allows NFS to do
the right thing for O_TRUNC.
inode_setattr nolonger forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
sets the size to it's current value. This allows local filesystems
to do the right thing for f?truncate.
Also, the logic in inode_setattr is changed a bit so there are two return
points. One returns the error from vmtruncate if it failed, the other
returns 0 (there can be no other failure).
Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
requested, we now fall-through and mark_inode_dirty. If a filesystem did
not have a ->truncate function, then vmtruncate will have changed i_size,
without marking the inode as 'dirty', and I think this is wrong.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it possible to include linux/pagevec.h multiple times without
incurring errors due to duplicate definitions.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.
Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface. We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns. It's a bit more code in the callers, but we have two sane routines
that do one thing well now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames relayfs_file_operations to relay_file_operations, and the
file operations themselves from relayfs_XXX to relay_file_XXX, to make it more
clear that they refer to relay files.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the optional is_global outparam to the create_buf_file()
callback. This can be used by clients to create a single global relayfs
buffer instead of the default per-cpu buffers. This was suggested as being
useful for certain debugging applications where it's more convenient to be
able to get all the data from a single channel without having to go to the
bother of dealing with per-cpu files.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a couple of callback functions that allow a client to hook
into relay_open()/close() and supply the files that will be used to represent
the channel buffers; the default implementation if no callbacks are defined is
to create the files in relayfs. This is to support the creation and use of
relay files in other filesystems such as debugfs, as implied by the fact that
relayfs_file_operations are exported.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since we're no longer using relayfs_inode_info, remove relayfs_alloc_inode()
and relayfs_destroy_inode() along with the relayfs inode cache.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds and exports relayfs_remove_file(), for API symmetry (with
relayfs_create_file()).
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a mandatory fileops param to relayfs_create_file() and exports
that function so that clients can use it to create files defined by their own
set of file operations, in relayfs. The purpose is to allow relayfs
applications to create their own set of 'control' files alongside their relay
files in relayfs rather than having to create them in /proc or debugfs for
instance. relayfs_create_file() is also used by relay_open_buf() to create
the relay files for a channel. In this case, a pointer to
relayfs_file_operations is passed in, along with a pointer to the buffer
associated with the file.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Needed for the Novell kernel debugger and perhaps some per-cpu data on x86_64
in the future.
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use atomic_inc_not_zero for rcu files instead of special case rcuref.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the rtc_interrupt() prototype to rtc.h and removes the
prototypes from C files.
It also renames static rtc_interrupt() functions in
arch/arm/mach-integrator/time.c and arch/sh64/kernel/time.c to avoid compile
problems.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Paul Gortmaker <p_gortmaker@yahoo.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes generic_cont_expand(), in order to share the code
with fatfs.
- Use vmtruncate() if ->prepare_write() returns a error.
Even if ->prepare_write() returns an error, it may already have added some
blocks. So, this truncates blocks outside of ->i_size by vmtruncate().
- Add generic_cont_expand_simple().
The generic_cont_expand_simple() assumes that ->prepare_write() can handle
the block boundary. With this, we don't need to care the extra byte.
And for expanding a file size by truncate(), fatfs uses the
added generic_cont_expand_simple().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This exports/changes the sync_page_range/_nolock(). The fatfs needs
sync_page_range/_nolock() for expanding truncate, and changes "size_t count"
to "loff_t count".
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch add to support of ->direct_IO() for mostly read.
The user of this seems to want to use for streaming read. So, current direct
I/O has limitation, it can only overwrite. (For write operation, mainly we
need to handle the hole etc..)
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some ARM platforms have the ability to program the interrupt controller to
detect various interrupt edges and/or levels. For some platforms, this is
critical to setup correctly, particularly those which the setting is dependent
on the device.
Currently, ARM drivers do (eg) the following:
err = request_irq(irq, ...);
set_irq_type(irq, IRQT_RISING);
However, if the interrupt has previously been programmed to be level sensitive
(for whatever reason) then this will cause an interrupt storm.
Hence, if we combine set_irq_type() with request_irq(), we can then safely set
the type prior to unmasking the interrupt. The unfortunate problem is that in
order to support this, these flags need to be visible outside of the ARM
architecture - drivers such as smc91x need these flags and they're
cross-architecture.
Finally, the SA_TRIGGER_* flag passed to request_irq() should reflect the
property that the device would like. The IRQ controller code should do its
best to select the most appropriate supported mode.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New character device driver for the SyncLink GT and SyncLink AC families of
synchronous and asynchronous serial adapters
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Include fixes for 2.6.14-git11. Should allow to remove sched.h from
module.h on i386, x86_64, arm, ia64, ppc, ppc64, and s390. Probably more
to come since I haven't yet checked the other archs.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove a couple of more lines of code from the cpuset hooks in the page
allocation code path.
There was a check for a NULL cpuset pointer in the routine
cpuset_update_task_memory_state() that was only needed during system boot,
after the memory subsystem was initialized, before the cpuset subsystem was
initialized, to catch a NULL task->cpuset pointer.
Add a cpuset_init_early() routine, just before the mem_init() call in
init/main.c, that sets up just enough of the init tasks cpuset structure to
render cpuset_update_task_memory_state() calls harmless.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix more of longstanding bug in cpuset/mempolicy interaction.
NUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset
to just the Memory Nodes allowed by that cpuset. The kernel maintains
internal state for each mempolicy, tracking what nodes are used for the
MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.
When a tasks cpuset memory placement changes, whether because the cpuset
changed, or because the task was attached to a different cpuset, then the
tasks mempolicies have to be rebound to the new cpuset placement, so as to
preserve the cpuset-relative numbering of the nodes in that policy.
An earlier fix handled such mempolicy rebinding for mempolicies attached to a
task.
This fix rebinds mempolicies attached to vma's (address ranges in a tasks
address space.) Due to the need to hold the task->mm->mmap_sem semaphore while
updating vma's, the rebinding of vma mempolicies has to be done when the
cpuset memory placement is changed, at which time mmap_sem can be safely
acquired. The tasks mempolicy is rebound later, when the task next attempts
to allocate memory and notices that its task->cpuset_mems_generation is
out-of-date with its cpusets mems_generation.
Because walking the tasklist to find all tasks attached to a changing cpuset
requires holding tasklist_lock, a spinlock, one cannot update the vma's of the
affected tasks while doing the tasklist scan. In general, one cannot acquire
a semaphore (which can sleep) while already holding a spinlock (such as
tasklist_lock). So a list of mm references has to be built up during the
tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem
acquired, and the vma's in that mm rebound.
Once the tasklist lock is dropped, affected tasks may fork new tasks, before
their mm's are rebound. A kernel global 'cpuset_being_rebound' is set to
point to the cpuset being rebound (there can only be one; cpuset modifications
are done under a global 'manage_sem' semaphore), and the mpol_copy code that
is used to copy a tasks mempolicies during fork catches such forking tasks,
and ensures their children are also rebound.
When a task is moved to a different cpuset, it is easier, as there is only one
task involved. It's mm->vma's are scanned, using the same
mpol_rebind_policy() as used above.
It may happen that both the mpol_copy hook and the update done via the
tasklist scan update the same mm twice. This is ok, as the mempolicies of
each vma in an mm keep track of what mems_allowed they are relative to, and
safely no-op a second request to rebind to the same nodes.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Easy little optimization hack to avoid actually having to call
cpuset_zone_allowed() and check mems_allowed, in the main page allocation
routine, __alloc_pages(). This saves several CPU cycles per page allocation
on systems not using cpusets.
A counter is updated each time a cpuset is created or removed, and whenever
there is only one cpuset in the system, it must be the root cpuset, which
contains all CPUs and all Memory Nodes. In that case, when the counter is
one, all allocations are allowed.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cleanup, reorganize and make more robust the mempolicy.c code to rebind
mempolicies relative to the containing cpuset after a tasks memory placement
changes.
The real motivator for this cleanup patch is to lay more groundwork for the
upcoming patch to correctly rebind NUMA mempolicies that are attached to vma's
after the containing cpuset memory placement changes.
NUMA mempolicies are constrained by the cpuset their task is a member of.
When either (1) a task is moved to a different cpuset, or (2) the 'mems'
mems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded
node numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to
be recalculated, relative to their new cpuset placement.
The old code used an unreliable method of determining what was the old
mems_allowed constraining the mempolicy. It just looked at the tasks
mems_allowed value. This sort of worked with the present code, that just
rebinds the -task- mempolicy, and leaves any -vma- mempolicies broken,
referring to the old nodes. But in an upcoming patch, the vma mempolicies
will be rebound as well. Then the order in which the various task and vma
mempolicies are updated will no longer be deterministic, and one can no longer
count on the task->mems_allowed holding the old value for as long as needed.
It's not even clear if the current code was guaranteed to work reliably for
task mempolicies.
So I added a mems_allowed field to each mempolicy, stating exactly what
mems_allowed the policy is relative to, and updated synchronously and reliably
anytime that the mempolicy is rebound.
Also removed a useless wrapper routine, numa_policy_rebind(), and had its
caller, cpuset_update_task_memory_state(), call directly to the rewritten
policy_rebind() routine, and made that rebind routine extern instead of
static, and added a "mpol_" prefix to its name, making it
mpol_rebind_policy().
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a cpuset_mems_allowed() method, which the sys_migrate_pages() code
needed, to obtain the mems_allowed vector of a cpuset, and replaced the
workaround in sys_migrate_pages() to call this new method.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The important code paths through alloc_pages_current() and alloc_page_vma(),
by which most kernel page allocations go, both called
cpuset_update_current_mems_allowed(), which in turn called refresh_mems().
-Both- of these latter two routines did a tasklock, got the tasks cpuset
pointer, and checked for out of date cpuset->mems_generation.
That was a silly duplication of code and waste of CPU cycles on an important
code path.
Consolidated those two routines into a single routine, called
cpuset_update_task_memory_state(), since it updates more than just
mems_allowed.
Changed all callers of either routine to call the new consolidated routine.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.
This enables batch managers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure that job is causing.
This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned them, and with
tightly coupled, long running, massively parallel scientific computing jobs
that will dramatically fail to meet required performance goals if they
start to use more memory than allowed to them.
This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure. It's up to the batch
manager or other user code to decide what to do about it and take action.
==> Unless this feature is enabled by writing "1" to the special file
/dev/cpuset/memory_pressure_enabled, the hook in the rebalance
code of __alloc_pages() for this metric reduces to simply noticing
that the cpuset_memory_pressure_enabled flag is zero. So only
systems that enable this feature will compute the metric.
Why a per-cpuset, running average:
Because this meter is per-cpuset, rather than per-task or mm, the
system load imposed by a batch scheduler monitoring this metric is
sharply reduced on large systems, because a scan of the tasklist can be
avoided on each set of queries.
Because this meter is a running average, instead of an accumulating
counter, a batch scheduler can detect memory pressure with a single
read, instead of having to read and accumulate results for a period of
time.
Because this meter is per-cpuset rather than per-task or mm, the
batch scheduler can obtain the key information, memory pressure in a
cpuset, with a single read, rather than having to query and accumulate
results over all the (dynamically changing) set of tasks in the cpuset.
A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.
A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Finish converting mm/mempolicy.c from bitmaps to nodemasks. The previous
conversion had left one routine using bitmaps, since it involved a
corresponding change to kernel/cpuset.c
Fix that interface by replacing with a simple macro that calls nodes_subset(),
or if !CONFIG_CPUSET, returns (1).
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <christoph@lameter.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The latest set of signal-RCU patches does not use get_task_struct_rcu().
Attached is a patch that removes it.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked. This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
____cacheline_maxaligned_in_smp is currently used to align critical structures
and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find
L1_CACHE_SHIFT_MAX useless.
However, we have been using ____cacheline_maxaligned_in_smp to align
structures on the internode cacheline size. As per Andi's suggestion,
following patch kills ____cacheline_maxaligned_in_smp and introduces
INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
Arches needing L3/Internode cacheline alignment can define
INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces
____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp
With this patch, L1_CACHE_SHIFT_MAX can be killed
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the numa_maps functionality is now in mempolicy.c we no longer need to
export get_vma_policy().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a boolean "memory_migrate" to each cpuset, represented by a file
containing "0" or "1" in each directory below /dev/cpuset.
It defaults to false (file contains "0"). It can be set true by writing
"1" to the file.
If true, then anytime that a task is attached to the cpuset so marked, the
pages of that task will be moved to that cpuset, preserving, to the extent
practical, the cpuset-relative placement of the pages.
Also anytime that a cpuset so marked has its memory placement changed (by
writing to its "mems" file), the tasks in that cpuset will have their pages
moved to the cpusets new nodes, preserving, to the extent practical, the
cpuset-relative placement of the moved pages.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the parameters of migrate_pages() to allow the caller control over the
fate of successfully migrated or impossible to migrate pages.
Swap migration and direct migration will have the same interface after this
patch so that patches can be independently applied to the policy layer and the
core migration code.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add gfp_mask to add_to_swap
add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
swapping. During migration we may have use add_to_swap extensively which may
lead to out of memory errors.
This patch makes add_to_swap take a parameter that specifies the gfp mask.
The page migration code can then make add_to_swap use GFP_KERNEL.
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move move_to_lru, putback_lru_pages and isolate_lru in section surrounded by
CONFIG_MIGRATION saving some codesize for single processor kernels.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_migrate_pages implementation using swap based page migration
This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.
The intent of sys_migrate is to migrate memory of a process. A process may
have migrated to another node. Memory was allocated optimally for the prior
context. sys_migrate_pages allows to shift the memory to the new node.
sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory. Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed. However, a user may decide
to manually control the migration.
This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends. The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory. sys_migrate_pages does not modify policies in contrast to Ray's
implementation.
The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset). When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets. The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.
Patch supports ia64, i386 and x86_64.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add page migration support via swap to the NUMA policy layer
This patch adds page migration support to the NUMA policy layer. An
additional flag MPOL_MF_MOVE is introduced for mbind. If MPOL_MF_MOVE is
specified then pages that do not conform to the memory policy will be evicted
from memory. When they get pages back in new pages will be allocated
following the numa policy.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Include page migration if the system is NUMA or having a memory model that
allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).
And:
- Only include lru_add_drain_per_cpu if building for an SMP system.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the basic page migration function with a minimal implementation that
only allows the eviction of pages to swap space.
Page eviction and migration may be useful to migrate pages, to suspend
programs or for remapping single pages (useful for faulty pages or pages with
soft ECC failures)
The process is as follows:
The function wanting to migrate pages must first build a list of pages to be
migrated or evicted and take them off the lru lists via isolate_lru_page().
isolate_lru_page determines that a page is freeable based on the LRU bit set.
Then the actual migration or swapout can happen by calling migrate_pages().
migrate_pages does its best to migrate or swapout the pages and does multiple
passes over the list. Some pages may only be swappable if they are not dirty.
migrate_pages may start writing out dirty pages in the initial passes over
the pages. However, migrate_pages may not be able to migrate or evict all
pages for a variety of reasons.
The remaining pages may be returned to the LRU lists using putback_lru_pages().
Changelog V4->V5:
- Use the lru caches to return pages to the LRU
Changelog V3->V4:
- Restructure code so that applying patches to support full migration does
require minimal changes. Rename swapout_pages() to migrate_pages().
Changelog V2->V3:
- Extract common code from shrink_list() and swapout_pages()
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add PF_SWAPWRITE to control a processes permission to write to swap.
- Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
and pdflush
- Set PF_SWAPWRITE flag for kswapd and pdflush
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the start of the `swap migration' patch series.
Swap migration allows the moving of the physical location of pages between
nodes in a numa system while the process is running. This means that the
virtual addresses that the process sees do not change. However, the system
rearranges the physical location of those pages.
The main intent of page migration patches here is to reduce the latency of
memory access by moving pages near to the processor where the process
accessing that memory is running.
The patchset allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while
setting a new memory policy.
The pages of process can also be relocated from another process using the
sys_migrate_pages() function call. Requires CAP_SYS_ADMIN. The migrate_pages
function call takes two sets of nodes and moves pages of a process that are
located on the from nodes to the destination nodes.
Manual migration is very useful if for example the scheduler has relocated a
process to a processor on a distant node. A batch scheduler or an
administrator can detect the situation and move the pages of the process
nearer to the new processor.
sys_migrate_pages() could be used on non-numa machines as well, to force all
of a particualr process's pages out to swap, if someone thinks that's useful.
Larger installations usually partition the system using cpusets into sections
of nodes. Paul has equipped cpusets with the ability to move pages when a
task is moved to another cpuset. This allows automatic control over locality
of a process. If a task is moved to a new cpuset then also all its pages are
moved with it so that the performance of the process does not sink
dramatically (as is the case today).
Swap migration works by simply evicting the page. The pages must be faulted
back in. The pages are then typically reallocated by the system near the node
where the process is executing.
For swap migration the destination of the move is controlled by the allocation
policy. Cpusets set the allocation policy before calling sys_migrate_pages()
in order to move the pages as intended.
No allocation policy changes are performed for sys_migrate_pages(). This
means that the pages may not faulted in to the specified nodes if no
allocation policy was set by other means. The pages will just end up near the
node where the fault occurred.
There's another patch series in the pipeline which implements "direct
migration".
The direct migration patchset extends the migration functionality to avoid
going through swap. The destination node of the relation is controllable
during the actual moving of pages. The crutch of using the allocation policy
to relocate is not necessary and the pages are moved directly to the target.
Its also faster since swap is not used.
And sys_migrate_pages() can then move pages directly to the specified node.
Implement functions to isolate pages from the LRU and put them back later.
This patch:
An earlier implementation was provided by Hirokazu Takahashi
<taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
memory hotplug project.
From: Magnus
This breaks out isolate_lru_page() and putpack_lru_page(). Needed for swap
migration.
Signed-off-by: Magnus Damm <magnus.damm@gmail.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
swap migration's isolate_lru_page() currently uses an IPI to notify other
processors that the lru caches need to be drained if the page cannot be
found on the LRU. The IPI interrupt may interrupt a processor that is just
processing lru requests and cause a race condition.
This patch introduces a new function run_on_each_cpu() that uses the
keventd() to run the LRU draining on each processor. Processors disable
preemption when dealing the LRU caches (these are per processor) and thus
executing LRU draining from another process is safe.
Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As recently there has been lot of traffic on the right values for batch and
high water marks for per_cpu_pagelists. This patch makes these two
variables configurable through /proc interface.
A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry
controls the fraction of pages at most in each zone that are allocated for
each per cpu page list. The min value for this is 8. It means that we
don't allow more than 1/8th of pages in each zone to be allocated in any
single per_cpu_pagelist.
The batch value of each per cpu pagelist is also updated as a result. It
is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8)
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
It won't drop dirty data, so the user should run `sync' first.
Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__alloc_percpu and alloc_percpu both take an 'align' argument which is
completely ignored. snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use
it, but it will be ignored. Therefore, remove the 'align' argument and fixup
the lone caller.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix compilation with CONFIG_MEMORY_HOTPLUG=y and gcc41.
Also remove unneeded declations, add a public function.
drivers/base/memory.c:53: error: static declaration of 'register_memory_notifier' follows non-static declaration
include/linux/memory.h:85: error: previous declaration of 'register_memory_notifier' was here
drivers/base/memory.c:58: error: static declaration of 'unregister_memory_notifier' follows non-static declaration
include/linux/memory.h:86: error: previous declaration of 'unregister_memory_notifier' was here
drivers/base/memory.c:68: error: static declaration of 'register_memory' follows non-static declaration
include/linux/memory.h:73: error: previous declaration of 'register_memory' was here
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds some very basic support for the new machines, including the
Quad G5 (tested), and other new dual core based machines and iMac G5
iSight (untested). This is still experimental ! There is no thermal
control yet, there is no proper handing of MSIs, etc.. but it
boots, I have all 4 cores up on my machine. Compared to the previous
version of this patch, this one adds DART IOMMU support for the U4
chipset and thus should work fine on setups with more than 2Gb of RAM.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is the current version of the spu file system, used
for driving SPEs on the Cell Broadband Engine.
This release is almost identical to the version for the
2.6.14 kernel posted earlier, which is available as part
of the Cell BE Linux distribution from
http://www.bsc.es/projects/deepcomputing/linuxoncell/.
The first patch provides all the interfaces for running
spu application, but does not have any support for
debugging SPU tasks or for scheduling. Both these
functionalities are added in the subsequent patches.
See Documentation/filesystems/spufs.txt on how to use
spufs.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi
of the original packet from the conntrack information for IPsec policy
checks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
uses ip_route_input for non-local addresses which doesn't do a xfrm
lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.
Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.
Makeing sure the lookup only happens once needs a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
increased to 48b. Apparently the IPv6 mobile extensions need some
more room anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip already
parsed headers. As a nice side effect this gets rid of the manual
hopopts skipping in ip6_input_finish.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call netfilter hooks before IPsec transforms. Packets visit the
FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation
and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode
transform.
Patch from Herbert Xu <herbert@gondor.apana.org.au>:
Move the loop from dst_output into xfrm4_output/xfrm6_output since they're
the only ones who need to it. xfrm{4,6}_output_one() processes the first SA
all subsequent transport mode SAs and is called in a loop that calls the
netfilter hooks between each two calls.
In order to avoid the tail call issue, I've added the inline function
nf_hook which is nf_hook_slow plus the empty list check.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the ARM AMBA bus is used on MIPS as well as ARM, we need
to make the bus available for other architectures to use. Move
the AMBA include files from include/asm-arm/hardware/ to
include/linux/amba/
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It should return an unsigned value, and fix sk_filter() as well.
Signed-off-by: Kris Katterjohn <kjak@ispwest.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when kbuild passes KBUILD_MODNAME with "" do not __stringify it when
used. Remove __stringnify for all users.
This also fixes the output of:
$ ls -l /sys/module/
drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia
drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia_core
drwxr-xr-x 3 root root 0 2006-01-05 14:24 "processor"
drwxr-xr-x 3 root root 0 2006-01-05 14:24 "psmouse"
The quoting of the module names will be gone again.
Thanks to GregKH + Kay Sievers for reproting this.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Also update the tokenlen calculations to accomodate g_token_size().
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If someone changes the uid/gid mapping in userland, then we do eventually
want those changes to be propagated to the kernel. Currently the kernel
assumes that it may cache entries forever.
Add an expiration time + garbage collector for idmap entries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server decides to close the RPC socket, we currently don't actually
respond until either another RPC call is scheduled, or until xprt_autoclose()
gets called by the socket expiry timer (which may be up to 5 minutes
later).
This patch ensures that xprt_autoclose() is called much sooner if the
server closes the socket.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
client, sets cl_chatty. There's no reason why NLM shouldn't set it, so
just get rid of cl_chatty and always be verbose.
Test-plan:
Compile with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
At some point, transport endpoint addresses will no longer be IPv4. To hide
the structure of the rpc_xprt's address field from ULPs and port mappers,
add an API for setting the port number during an RPC bind operation.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
Start by creating an API to force RPC rebind, replacing logic that simply
sets cl_port to zero.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add RPC client transport switch support for replacing buffer management
on a per-transport basis.
In the current IPv4 socket transport implementation, RPC buffers are
allocated as needed for each RPC message that is sent. Some transport
implementations may choose to use pre-allocated buffers for encoding,
sending, receiving, and unmarshalling RPC messages, however. For
transports capable of direct data placement, the buffers can be carved
out of a pre-registered area of memory rather than from a slab cache.
Test-plan:
Millions of fsx operations. Performance characterization with "sio" and
"iozone". Use oprofile and other tools to look for significant regression
in CPU utilization.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server receives an NLM cancel call and finds no waiting lock to
cancel, then chances are the lock has already been applied, and the client
just hadn't yet processed the NLM granted callback before it sent the
cancel.
The Open Group text, for example, perimts a server to return either success
(LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error
seems more helpful; the client may be able to use it to recognize that a
race has occurred and to recover from the race.
So, modify the relevant functions to return an error in this case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch removes ths unused function xdr_decode_string().
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Charles Lever <Charles.Lever@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Upon return of a write delegation, the server will almost always bump the
change attribute. Ensure that we pick up that change so that we don't
invalidate our data cache unnecessarily.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The SuS states that a call to write() will cause mtime to be updated on
the file. In order to satisfy that requirement, we need to flush out
any cached writes in nfs_getattr().
Speed things up slightly by not committing the writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Most NFS server implementations allow up to 64KB reads and writes on the
wire. The Solaris NFS server allows up to a megabyte, for instance.
Now the Linux NFS client supports transfer sizes up to 1MB, too. This will
help reduce protocol and context switch overhead on read/write intensive NFS
workloads, and support larger atomic read and write operations on servers
that support them.
Test-plan:
Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
Tests with NFS over UDP to verify the maximum RPC payload size cap.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Minor cleanup: inlined bit ops in nfs_page.h can be simpler.
Test plan:
Write-intensive workload against a server that requires COMMITs.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFSv4 model requires us to complete all RPC calls that might
establish state on the server whether or not the user wants to
interrupt it. We may also need to schedule new work (including
new RPC calls) in order to cancel the new state.
The asynchronous RPC model will allow us to ensure that RPC calls
always complete, but in order to allow for "synchronous" RPC, we
want to add the ability to wait for completion.
The waits are, of course, interruptible.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Shrink the RPC task structure. Instead of storing separate pointers
for task->tk_exit and task->tk_release, put them in a structure.
Also pass the user data pointer as a parameter instead of passing it via
task->tk_calldata. This enables us to nest callbacks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS needs to be able to distinguish between single-page ->writepage() calls and
multipage ->writepages() calls.
For the single-page writepage calls NFS can kick off the I/O within the
context of ->writepage().
For multipage ->writepages calls, nfs_writepage() will leave the I/O pending
and nfs_writepages() will kick off the I/O when it all has been queued up
within NFS.
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds suspend patch to libata, and ata_piix in particular. For
most low level drivers, they should just need to add the 4 hooks to
work. As I can only test ata_piix, I didn't enable it for more
though.
Suspend support is the single most important feature on a notebook, and
most new notebooks have sata drives. It's quite embarrassing that we
_still_ do not support this. Right now, it's perfectly possible to
suspend the drive in mid-transfer.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also export current (average) speed and status in sysfs.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Store this total in superblock (As appropriate), and make it available to
userspace via sysfs.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. because they aren't used outside md.c
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
md sometimes call put_page on NULL pointers (treating it like kfree). This is
not safe, so define and use a 'safe_put_page' which checks for NULL.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
md supports multiple different RAID level, each being implemented by a
'personality' (which is often in a separate module).
These personalities have fairly artificial 'numbers'. The numbers
are use to:
1- provide an index into an array where the various personalities
are recorded
2- identify the module (via an alias) which implements are particular
personality.
Neither of these uses really justify the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely). Module identification can be done using an alias
based on level rather than 'personality' number.
The current 'raid5' modules support two level (4 and 5) but only one
personality. This slight awkwardness (which was handled in the mapping from
level to personality) can be better handled by allowing raid5 to register 2
personalities.
With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities can be
added independently.
This patch also moves the check for chunksize being non-zero into the ->run
routines for the personalities that need it, rather than having it in core-md.
This has a side effect of allowing 'faulty' and 'linear' not to have a
chunk-size set.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- replace open-coded hash chain with hlist macros
- Fix hash-table size at one page - it is already quite generous, so there
will never be a need to use multiple pages, so no need for __get_free_pages
No functional change.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add in correct read-error handling for resync and read-only situations.
When read-only, we don't over-write, so we need to mark the failed drive in
the r10_bio so we don't re-try it. During resync, we always read all blocks,
so if there is a read error, we simply over-write it with the good block that
we found (assuming we found one).
Note that the recovery case still isn't handled in an interesting way. There
is nothing useful to do for the 2-copies case. If there are 3 or more copies,
then we could try reading from one of the non-missing copies, but this is a
bit complicated and very rarely would be used, so I'm leaving it for now.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Largely just a cross-port from raid1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is this "FIXME" comment with a typo in it!! that been annoying me for
days, so I just had to remove it.
conf->disks[i].rdev should only be accessed if
- we know we hold a reference or
- the mddev->reconfig_sem is down or
- we have a rcu_readlock
handle_stripe was referencing rdev in three places without any of these. For
the first two, get an rcu_readlock. For the last, the same access
(md_sync_acct call) is made a little later after the rdev has been claimed
under and rcu_readlock, if R5_Syncio is set. So just use that access...
However R5_Syncio isn't really needed as the 'syncing' variable contains the
same information. So use that instead.
Issues, comment, and fix are identical in raid5 and raid6.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On a read-error we suspend the array, then synchronously read the block from
other arrays until we find one where we can read it. Then we try writing the
good data back everywhere and make sure it works. If any write or subsequent
read fails, only then do we fail the device out of the array.
To be able to suspend the array, we need to also keep track of how many
requests are queued for handling by raid1d.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid6 currently does not check the P/Q syndromes when doing a resync, it just
calculates the correct value and writes it. Doing the check can reduce writes
(often to 0) for a resync, and it is needed to properly implement the
echo check > sync_action
operation.
This patch implements the appropriate checks and tidies up some related code.
It also allows raid6 user-requested resync to bypass the intent bitmap.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
See patch to md.txt for more details
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid10 needs to put up a barrier to new requests while it does resync or other
background recovery. The code for this is currently open-coded, slighty
obscure by its use of two waitqueues, and not documented.
This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid1 needs to put up a barrier to new requests while it does resync or other
background recovery. The code for this is currently open-coded, slighty
obscure by its use of two waitqueues, and not documented.
This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add ioctl DM_SKIP_LOCKFS_FLAG for userspace to request that lock_fs is
bypassed when suspending a device.
There's no change to the behaviour of existing code that doesn't know about
the new flag.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring. This as noticed when exporting directories using fuse.
This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.
There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"extern inline" doesn't make much sense.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add missing "struct" keyword preventing compilation with DEBUG_PARPORT
defined. Also add some "const".
Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Did not move the parport interface properly into IEEE1284_PH_REV_IDLE phase at
end of data due to comparing bytes with nibbles. Internal phase
IEEE1284_PH_HBUSY_DNA became unused, so remove it.
Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the maximum size of write data configurable by the filesystem. The
previous fixed 4096 limit only worked on architectures where the page size is
less or equal to this. This change make writing work on other architectures
too, and also lets the filesystem receive bigger write requests in direct_io
mode.
Normal writes which go through the page cache are still limited to a page
sized chunk per request.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the way a too large request is handled. Until now in this case the
device read returned -EINVAL and the operation returned -EIO.
Make it more flexibible by not returning -EINVAL from the read, but restarting
it instead.
Also remove the fixed limit on setxattr data and let the filesystem provide as
large a read buffer as it needs to handle the extended attribute data.
The symbolic link length is already checked by VFS to be less than PATH_MAX,
so the extra check against FUSE_SYMLINK_MAX is not needed.
The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the
dentry has already been looked up, and hence the name already checked.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add 'frsize' member to the statfs reply.
I'm not sure if sending f_fsid will ever be needed, but just in case leave
some space at the end of the structure, so less compatibility mess would be
required.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change interface version to 7.4.
Following changes will need backward compatibility support, so store the minor
version returned by userspace.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Removed some kmalloc's with __GFP_ZERO and replace it with memset()
because it didn't work properly.
- Fixed returned message frame in i2o_cfg_passthru() which caused raidutils
to display wrong error message in case a disk was missing.
- Fixed size of printk() in i2o_scsi.c.
- Fixed get_device() and put_device() in probing of the I2O controller.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Removed wrong I2O device class, which was only needed to add sysfs attributes.
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changed the I2O API to create I2O messages first in kernel memory and then
transfer it at once over the PCI bus instead of sending each quad-word over
the PCI bus.
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
S390, 64BIT and COMPAT.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the function get_swap_page_of_type() allowing us to specify an index
in swap_info[] and select a swap_info_struct structure to be used for
allocating a swap page.
This function (or another one of similar functionality) will be necessary for
implementing the image-writing part of swsusp in the user space. It can also
be used for simplifying the current in-kernel implementation of the
image-writing part of swsusp.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes swsusp free only as much memory as needed to complete the
suspend and not as much as possible. In the most of cases this should speed
up the suspend and make the system much more responsive after resume,
especially if a GUI (eg. X Windows) is used.
If needed, the old behavior (ie to free as much memory as possible during
suspend) can be restored by unsetting FAST_FREE in power.h
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the swap map structure that can be used by swsusp for
keeping tracks of data pages written to the swap. The structure itself is
described in a comment within the patch.
The overall idea is to reduce the amount of metadata written to the swap and
to write and read the image pages sequentially, in a file-alike way. This
makes the swap-handling part of swsusp fairly independent of its
snapshot-handling part and will hopefully allow us to completely separate
these two parts in the future.
This patch is needed to remove the suspend image size limit imposed by the
limited size of the swsusp_info structure, which is essential for x86-64
systems with more than 512 MB of RAM.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thanks to Christoph for doing most of the work.
This allows automatic SMP IRQ affinity assignment other than default "all
interrupts on all CPUs" which is rather expensive. This might be useful if
the hardware can be programmed to distribute interrupts among different
CPUs, like Alpha does.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide basic support for the AMD Geode GX and LX processors.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the futex code compilable and usable on NOMMU by making the attempt to
handle page faults conditional on CONFIG_MMU. If this is not enabled, then
we can assume that EFAULT returned from futex_atomic_op_inuser() is not
recoverable, and that the address lies outside of valid memory.
handle_mm_fault() is made to BUG if called on NOMMU without attempting to
invoke the actual handler (__handle_mm_fault).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes the SYSV IPC shared memory facilities use the new
ramfs facilities on a no-MMU kernel.
The following changes are made:
(1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
allow the IPC SHM facilities to commune with the tiny-shmem and shmem
code.
(2) ramfs files now need resizing using do_truncate() rather than by modifying
the inode size directly (see shmem_file_setup()). This causes ramfs to
attempt to bind a block of pages of sufficient size to the inode.
(3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes ramfs support shared-writable mmaps by:
(1) Attempting to perform a contiguous block allocation to the requested size
when truncate attempts to increase the file from zero size, such as
happens when:
fd = shm_open("/file/on/ramfs", ...):
ftruncate(fd, size_requested);
addr = mmap(NULL, subsize, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED,
fd, offset);
(2) Permitting any shared-writable mapping over any contiguous set of extant
pages. get_unmapped_area() will return the address into the actual ramfs
pages. The mapping may start anywhere and be of any size, but may not go
over the end of file. Multiple mappings may overlap in any way.
(3) Not permitting a file to be shrunk if it would truncate any shared
mappings (private mappings are copied).
Thus this patch provides support for POSIX shared memory on NOMMU kernels,
with certain limitations such as there being a large enough block of pages
available to support the allocation and it only working on directly mappable
filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the key duplication stuff since there's nothing that uses it, no way
to get at it and it's awkward to deal with for LSM purposes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Comment the new locking rules for page_state statistics.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimise page_state manipulations by introducing interrupt unsafe accessors
to page_state fields. Callers must provide their own locking (either
disable interrupts or not update from interrupt context).
Switch over the hot callsites that can easily be moved under interrupts off
sections.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several counters already have the need to use 64 atomic variables on 64 bit
platforms (see mm_counter_t in sched.h). We have to do ugly ifdefs to fall
back to 32 bit atomic on 32 bit platforms.
The VM statistics patch that I am working on will also make more extensive
use of atomic64.
This patch introduces a new type atomic_long_t by providing definitions in
asm-generic/atomic.h that works similar to the c "long" type. Its 32 bits
on 32 bit platforms and 64 bits on 64 bit platforms.
Also cleans up the determination of the mm_counter_t in sched.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the function to build a zonelist for a BIND policy has the side
effect to set the policy_zone. This seems to be a bit strange. policy
zone seems to not be initialized elsewhere and therefore 0. Do we police
ZONE_DMA if no bind policy has been used yet?
This patch moves the determination of the zone to apply policies to into
the page allocator. We determine the zone while building the zonelist for
nodes.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are numerous places we check whether a zone is populated or not.
Provide a helper function to check for populated zones and convert all
checks for zone->present_pages.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimise rmap functions by minimising atomic operations when we know there
will be no concurrent modifications.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add dma32 to zone statistics. Also attempt to arrange struct page_state a
bit better (visually).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the last bits of Martin's ill-fated sys_set_zone_reclaim().
Cc: Martin Hicks <mort@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch cleans up the alloc_bootmem fix for swiotlb. Patch removes
alloc_bootmem_*_limit api and fixes alloc_boot_*low api to do the right
thing -- allocate from low32 memory.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before SPARSEMEM is initialised we cannot provide an efficient pfn_to_nid()
implmentation; before initialisation is complete we use early_pfn_to_nid()
to provide location information. Until recently there was no non-init user
of this functionality. Provide a post init pfn_to_nid() implementation.
Note that this implmentation assumes that the pfn passed has been validated
with pfn_valid(). The current single user of this function already has
this check.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are three places we define pfn_to_nid(). Two in linux/mmzone.h and one
in asm/mmzone.h. These in essence represent the three memory models. The
definition in linux/mmzone.h under !NEED_MULTIPLE_NODES is both the FLATMEM
definition and the optimisation for single NUMA nodes; the one under SPARSEMEM
is the NUMA sparsemem one; the one in asm/mmzone.h under DISCONTIGMEM is the
discontigmem one. This is not in the least bit obvious, particularly the
connection between the non-NUMA optimisations and the memory models.
Two patches:
flatmem-split-out-memory-model: simplifies the selection of pfn_to_nid()
implementations. The selection is based primarily off the memory model
selected. Optimisations for non-NUMA are applied where needed.
sparse-provide-pfn_to_nid: implement pfn_to_nid() for SPARSEMEM
This patch:
pfn_to_nid is memory model specific
The pfn_to_nid() call is memory model specific. It represents the locality
identifier for the memory passed. Classically this would be a NUMA node,
but not a chunk of memory under DISCONTIGMEM.
The SPARSEMEM and FLATMEM memory model non-NUMA versions of pfn_to_nid()
are folded together under NEED_MULTIPLE_NODES, while DISCONTIGMEM has its
own optimisation. This is all very confusing.
This patch splits out each implementation of pfn_to_nid() so that we can
see them and the optimisations to each.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix two warnings in ipc/shm.c
ipc/shm.c:122: warning: statement with no effect
ipc/shm.c:560: warning: statement with no effect
by converting the macros to empty inline functions. For safety, let's do
all three. This also has the advantage that typechecking gets performed
even without CONFIG_SHMEM enabled.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NODES_SPAN_OTHER_NODES config option was created so that DISCONTIGMEM
could handle pSeries numa layouts. However, support for DISCONTIGMEM has
been replaced by SPARSEMEM on powerpc. As a result, this config option and
supporting code is no longer needed.
I have already sent a patch to Paul that removes the option from powerpc
specific code. This removes the arch independent piece. Doesn't really
matter which is applied first.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pfn_to_pgdat() isn't used in common code. Remove definition.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kvaddr_to_nid() isn't used in common code nor in i386 code. Remove these
definitions.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mempolicy.c contains provisional interface for huge page allocation based on
node numbers. This is in use in SLES9 but was never used (AFAIK) in upstream
versions of Linux.
Huge page allocations now use zonelists to figure out where to allocate pages.
The use of zonelists allows us to find the closest hugepage which was the
consideration of the NUMA distance for huge page allocations.
Remove the obsolete functions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The huge_zonelist() function in the memory policy layer provides an list of
zones ordered by NUMA distance. The hugetlb layer will walk that list looking
for a zone that has available huge pages but is also in the nodeset of the
current cpuset.
This patch does not contain the folding of find_or_alloc_huge_page() that was
controversial in the earlier discussion.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store. Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.
"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released. However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.
This feature is also useful for supporting hot-plug memory on UML.
Concerns raised by Andrew Morton:
- "We have no plan for holepunching! If we _do_ have such a plan (or
might in the future) then what would the API look like? I think
sys_holepunch(fd, start, len), so we should start out with that."
- Using madvise is very weird, because people will ask "why do I need to
mmap my file before I can stick a hole in it?"
- None of the other madvise operations call into the filesystem in this
manner. A broad question is: is this capability an MM operation or a
filesytem operation? truncate, for example, is a filesystem operation
which sometimes has MM side-effects. madvise is an mm operation and with
this patch, it gains FS side-effects, only they're really, really
significant ones."
Comments:
- Andrea suggested the fs operation too but then it's more efficient to
have it as a mm operation with fs side effects, because they don't
immediatly know fd and physical offset of the range. It's possible to
fixup in userland and to use the fs operation but it's more expensive,
the vmas are already in the kernel and we can use them.
Short term plan & Future Direction:
- We seem to need this interface only for shmfs/tmpfs files in the short
term. We have to add hooks into the filesystem for correctness and
completeness. This is what this patch does.
- In the future, plan is to support both fs and mmap apis also. This
also involves (other) filesystem specific functions to be implemented.
- Current patch doesn't support VM_NONLINEAR - which can be addressed in
the future.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes truncate_inode_pages_range from truncate_inode_pages.
truncate_inode_pages became a one-liner call to truncate_inode_pages_range.
Reiser4 needs truncate_inode_pages_ranges because it tries to keep
correspondence between existences of metadata pointing to data pages and pages
to which those metadata point to. So, when metadata of certain part of file
is removed from filesystem tree, only pages of corresponding range are to be
truncated.
(Needed by the madvise(MADV_REMOVE) patch)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Janos Haar of First NetCenter Bt. reported numerous crashes involving the
NBD driver. With his help, this was tracked down to bogus bio vectors
which in turn was the result of a race condition between the
receive/transmit routines in the NBD driver.
The bug manifests itself like this:
CPU0 CPU1
do_nbd_request
add req to queuelist
nbd_send_request
send req head
for each bio
kmap
send
nbd_read_stat
nbd_find_request
nbd_end_request
kunmap
When CPU1 finishes nbd_end_request, the request and all its associated
bio's are freed. So when CPU0 calls kunmap whose argument is derived from
the last bio, it may crash.
Under normal circumstances, the race occurs only on the last bio. However,
if an error is encountered on the remote NBD server (such as an incorrect
magic number in the request), or if there were a bug in the server, it is
possible for the nbd_end_request to occur any time after the request's
addition to the queuelist.
The following patch fixes this problem by making sure that requests are not
added to the queuelist until after they have been completed transmission.
In order for the receiving side to be ready for responses involving
requests still being transmitted, the patch introduces the concept of the
active request.
When a response matches the current active request, its processing is
delayed until after the tranmission has come to a stop.
This has been tested by Janos and it has been successful in curing this
race condition.
From: Herbert Xu <herbert@gondor.apana.org.au>
Here is an updated patch which removes the active_req wait in
nbd_clear_queue and the associated memory barrier.
I've also clarified this in the comment.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: <djani22@dynamicweb.hu>
Cc: Paul Clements <Paul.Clements@SteelEye.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reimplement handling of barrier requests.
* Flexible handling to deal with various capabilities of
target devices.
* Retry support for falling back.
* Tagged queues which don't support ordered tag can do ordered.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
add @uptodate argument to end_that_request_last() and @error
to rq_end_io_fn(). there's no generic way to pass error code
to request completion function, making generic error handling
of non-fs request difficult (rq->errors is driver-specific and
each driver uses it differently). this patch adds @uptodate
to end_that_request_last() and @error to rq_end_io_fn().
for fs requests, this doesn't really matter, so just using the
same uptodate argument used in the last call to
end_that_request_first() should suffice. imho, this can also
help the generic command-carrying request jens is working on.
Signed-off-by: tejun heo <htejun@gmail.com>
Signed-Off-By: Jens Axboe <axboe@suse.de>
One more supported PCI ID for the i2c-nforce2 driver.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cleanups to i2c driver ID list:
* Remove mostly bogus comments about driver ID ranges.
* Drop experimental driver IDs, as the concept is pretty broken.
* Drop now unused IDs of non-I2C (ISA) drivers.
* Drop a few more IDs which are no more used.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the VIA CENTAUR CPUs to detection table.
Table was updated to treat future Intel x86 CPUs as VRD10.
Stepping field was added, because some VIA CPUs have
different VRM specs across stepping. I changed the vrm type
to u8 because all drivers use u8 anyway.
Signed-off-by: Rudolf Marek <r.marek@sh.cvut.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This prevents i2c drivers from messing up and forgetting to set the
module owner of their driver. It also reduces the size of their drivers
by one line :)
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Jean Delvare <khali@linux-fr.org>
We should use the i2c_driver.driver's .name and .owner fields
instead of the i2c_driver's ones.
This patch updates the core of the i2c drivers: it removes .name and
.owner fields from the struct i2c_device and modify various
functions to use struct device fields instead.
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The i2c_get_client function doesn't exist anymore, so we shouldn't
have a definition for it in i2c.h.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Do not limit the usage count of i2c clients to 1. In other words,
change the client usage count behavior from the old I2C_CLIENT_ALLOW_USE
to the old I2C_CLIENT_ALLOW_MULTIPLE_USE. The rationale is that no
driver actually needs the limiting behavior, and the unlimiting
behavior is slightly easier to implement.
Update the documentation to reflect this change.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make I2C_CLIENT_ALLOW_USE the default for all i2c clients. It doesn't
hurt if the usage count is actually never used for any given driver,
and allows for nice code simplifications in i2c-core.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No i2c client uses the I2C_CLIENT_ALLOW_MULTIPLE_USE flag, drop it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The flags member of the i2c_driver structure is no more used. Drop it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Just about every i2c chip driver sets the I2C_DF_NOTIFY flag, so we
can simply make it the default and drop the flag. If any driver really
doesn't want to be notified when i2c adapters are added, that driver
can simply omit to set .attach_adapter. This approach is also more
robust as it prevents accidental NULL pointer dereferences.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The I2C_DF_DUMMY flag is gone since 2.5.70, it's about time to
drop all ifdef'd out references thereto.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original ipv6_find_hdr() finds the specified header in IPv6 packets.
This makes it possible to get transport header so that we can kill similar
loop in ip6_match_packet().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly dump the helper name instead of internal kernel data.
Based on patch by Marcus Sundberg <marcus@ingate.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build when scripts/mod/file2alias.c includes linux/input.h, which
tries to include /usr/include/linux/mod_devicetable.h:
In file included from scripts/mod/file2alias.c:40:
include/linux/input.h:21:35: linux/mod_devicetable.h: No such file or directory
make[2]: *** [scripts/mod/file2alias.o] Error 1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Here's the patch for modalias support for input classes. It uses
comma-separated numbers, and doesn't describe all the potential keys (no
module currently cares, and that would make the strings huge). The
changes to input.h are to move the definitions needed by file2alias
outside __KERNEL__. I chose not to move those definitions to
mod_devicetable.h, because there are so many that it might break compile
of something else in the kernel.
The rest is fairly straightforward.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
lib/lib.a(kobject_uevent.o)(.text+0x25f): In function `kobject_uevent':
: undefined reference to `__alloc_skb'
lib/lib.a(kobject_uevent.o)(.text+0x2a1): In function `kobject_uevent':
: undefined reference to `skb_over_panic'
lib/lib.a(kobject_uevent.o)(.text+0x31d): In function `kobject_uevent':
: undefined reference to `skb_over_panic'
lib/lib.a(kobject_uevent.o)(.text+0x356): In function `kobject_uevent':
: undefined reference to `netlink_broadcast'
lib/lib.a(kobject_uevent.o)(.init.text+0x9): In function `kobject_uevent_init':
: undefined reference to `netlink_kernel_create'
make: *** [.tmp_vmlinux1] Error 1
Netlink is unconditionally enabled if CONFIG_NET, so that's OK.
kobject_uevent.o is compiled even if !CONFIG_HOTPLUG, which is lazy.
Let's compound the sin.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The distinction between hotplug and uevent does not make sense these
days, netlink events are the default.
udev depends entirely on netlink uevents. Only during early boot and
in initramfs, /sbin/hotplug is needed. So merge the two functions and
provide only one interface without all the options.
The netlink layer got a nice generic interface with named slots
recently, which is probably a better facility to plug events for
subsystem specific events.
Also the new poll() interface to /proc/mounts is a nicer way to
notify about changes than sending events through the core.
The uevents should only be used for driver core related requests to
userspace now.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The names of these events have been confusing from the beginning
on, as they have been more like claim/release events. We needed these
events for noticing HAL if storage devices have been mounted.
Thanks to Al, we have the proper solution now and can poll()
/proc/mounts instead to get notfied about mount tree changes.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It makes zero sense to have hotplug, but not the netlink
events enabled today. Remove this option and merge the
kobject_uevent.h header into the kobject.h header file.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds another usb-storage subdriver, which supports two fairly
old dual-XD/SmartMedia reader-writers (USB1.1 devices).
This driver was written by Daniel Drake <dsd@gentoo.org> -- he notes
that he wrote this driver without specs, however a vendor-supplied GPL
driver for the previous generation of products ("sma03") did prove to be
quite useful, as did the sddr09 driver which also has to deal with
low-level physical block layout on SmartMedia.
The original patch has been reformed by me, as it clashed with the
libusual patches.
We really need to consolidate some of this common SmartMedia code, and
get together with the MTD guys to share it with them as well.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as610) adds a field to struct usb_device to store the device's
port number. This allows us to remove several loops in the hub driver
(searching for a particular device among all the entries in the parent's
array of children).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as609) changes the way we keep track of power budgeting for
USB hubs and devices, and it updates the choose_configuration routine to
take this information into account. (This is something we should have
been doing all along.) A new field in struct usb_device holds the amount
of bus current available from the upstream port, and the usb_hub structure
keeps track of the current available for each downstream port.
Two new rules for configuration selection are added:
Don't select a self-powered configuration when only bus power
is available.
Don't select a configuration requiring more bus power than is
available.
However the first rule is #if-ed out, because I found that the internal
hub in my HP USB keyboard claims that its only configuration is
self-powered. The rule would prevent the configuration from being chosen,
leaving the hub & keyboard unconfigured. Since similar descriptor errors
may turn out to be fairly common, it seemed wise not to include a rule
that would break automatic configuration unnecessarily for such devices.
The second rule may also trigger unnecessarily, although this should be
less common. More likely it will annoy people by sometimes failing to
accept configurations that should never have been chosen in the first
place.
The patch also changes usbcore's reaction when no configuration is
suitable. Instead of raising an error and rejecting the device, now
the core will simply leave the device unconfigured. People can always
work around such problems by installing configurations manually through
sysfs.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as605) removes the private udev->serialize semaphore,
relying instead on the locking provided by the embedded struct device's
semaphore. The changes are confined to the core, except that the
usb_trylock_device routine now uses the return convention of
down_trylock rather than down_read_trylock (they return opposite values
for no good reason).
A couple of other associated changes are included as well:
Now that we aren't concerned about HCDs that avoid using the
hcd glue layer, usb_disconnect no longer needs to acquire the
usb_bus_lock -- that can be done by usb_remove_hcd where it
belongs.
Devices aren't locked over the same scope of code in
usb_new_device and hub_port_connect_change as they used to be.
This shouldn't cause any trouble.
Along with the preceding driver core patch, this needs a lot of testing.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes the driver that forgot to set the module owner up. Now we
can remove the unneeded pointer from the usb driver structure. The idea
for how to do this was from Al Viro, who did this for the PCI drivers.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This lets drivers, like the usb-serial ones, disable the ability to add
ids from sysfs.
The usb-serial drivers are "odd" in that they are really usb-serial bus
drivers, not usb bus drivers, so the dynamic id logic will have to go
into the usb-serial bus core for those drivers to get that ability.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Echo the usb vendor and product id to the "new_id" file in the driver's
sysfs directory, and then that driver will be able to bind to a device
with those ids if it is present.
Example:
echo 0557 2008 > /sys/bus/usb/drivers/foo_driver/new_id
adds the hex values 0557 and 2008 to the device id table for the foo_driver.
Note, usb-serial drivers do not currently work with this capability yet.
usb-storage also might have some oddities.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a shim driver libusual, which routes devices between
usb-storage and ub according to the common table, based on unusual_devs.h.
The help and example syntax is in Kconfig.
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I've also fixed the sort-ordering comments on this naming convention.
Signed-off-by: Stuart MacDonald <stuartm@connecttech.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The check for multicast shouldn't exclude broadcast type addresses.
This reverts the incorrect change done in 2.6.13.
The broadcast address is a multicast address and should be excluded
from being a valid_ether_address for use in bridging or device address.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Select a block size for IO based on the read and write block size
combinations, and whether the card supports partial block reads
and/or partial block writes.
If we are able to satisfy block reads but not block writes, mark
the device read only.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
From: Benjamin LaHaise <bcrl@kvack.org>
In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the
shared info structure results in gcc being forced to do a reload of the
pointer since it has no information on possible aliasing. Fix this by
using a pointer to refer to skb_shared_info.
By initializing skb_shared_info sequentially, the write combining buffers
can reduce the number of memory transactions to a single write. Reorder
the initialization in __alloc_skb() to match the structure definition.
There is also an alignment issue on 64 bit systems with skb_shared_info
by converting nr_frags to a short everything packs up nicely.
Also, pass the slab cache pointer according to the fclone flag instead
of using two almost identical function calls.
This raises bw_unix performance up to a peak of 707KB/s when combined
with the spinlock patch. It should help other networking protocols, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
And actually, with this, the whole pppox layer can basically
be removed and subsumed into pppoe.c, no other pppox sub-protocol
implementation exists and we've had this thing for at least 4
years.
Signed-off-by: David S. Miller <davem@davemloft.net>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)
This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.
This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)
I made a global stubstitute to change all existing occurences to make
them const.
This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_init can be done as a core_initcall instead of calling
it directly in init/main.c
Also I removed an out of date #ifdef.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As DCCP needs to be called in the same spots.
Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming patches will make, for instance, ip_sockglue.c need just this enum
and not all of tcp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.
Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.
tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using sk->sk_protocol instead of IPPROTO_TCP.
Will be used by DCCPv6 in the next changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue. This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.
Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6. This patch moves them into a one place and simplifies
the code somewhat.
This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.
(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)
This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.
This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.
Patch purpose:
The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.
Patch design approach:
The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.
A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.
Patch implementation details:
On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.
On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.
The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.
Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.
Testing:
The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.
The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.
WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Also use the PnP functions to start/stop the devices during the suspend so
that drivers will not have to duplicate this code.
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add support for the CS5535 Audio device. I've fixed up some errors as per
Takashi's advice from the thread:
http://lkml.org/lkml/2005/9/15/119
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
cs5535 is a 32bit x86 only device using weird CPU features
Signed-off-by: Jaya Kumar <jayakumar.alsa@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This patch fixes a problem when we use well known kernel symbols as module
names.
For example, if module source name is current.c, idle_stack.c or etc.,
we have a bad KBUILD_MODNAME value.
For example, KBUILD_MODNAME will be "get_current()" instead of "current", or
"(init_thread_union.stack)" instead of "idle_task".
The trick is to define a stringify macro on the commandline - named
KBUILD_STR for namespace reasons - and then to stringify the module
name.
There are a few uses of KBUILD_MODNAME throughout the tree but the usage
is for debug and will not be harmed by this change so left untouched for now.
While at it KBUILD_BASENAME was changed too. Any spinlock usage in the
unix module would have created wrong section names without it.
Usage in spinlock.h fixed so it no longer stringify KBUILD_BASENAME.
Original patch from Ustyogov Roman - all bugs introduced by me.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Fix n_r3964 timeouts (hardcoded for 100Hz)
Also the include of <asm/termios.h> in 'n_r3964.h' is unnecessary and
prevents using the header file in any application that has to include
<termios.h> due to duplicate definition of 'struct termio'.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently a simple
void foo(void) { preempt_enable(); }
produces the following code on ARM:
foo:
bic r3, sp, #8128
bic r3, r3, #63
ldr r2, [r3, #4]
ldr r1, [r3, #0]
sub r2, r2, #1
tst r1, #4
str r2, [r3, #4]
blne preempt_schedule
mov pc, lr
The problem is that the TIF_NEED_RESCHED flag is loaded _before_ the
preemption count is stored back, hence any interrupt coming within that
3 instruction window causing TIF_NEED_RESCHED to be set won't be
seen and scheduling won't happen as it should.
Nothing currently prevents gcc from performing that reordering. There
is already a barrier() before the decrement of the preemption count, but
another one is needed between this and the TIF_NEED_RESCHED flag test
for proper code ordering.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan's crosscompile page [1] shows, that one regression in 2.6.15-rc is
that the v850 defconfig does no longer compile.
The compile error is:
<-- snip -->
...
CC arch/v850/kernel/setup.o
In file included from /usr/src/ctest/rc/kernel/arch/v850/kernel/setup.c:17:
/usr/src/ctest/rc/kernel/include/linux/irq.h:13:43: asm/smp.h: No such file or directory
make[2]: *** [arch/v850/kernel/setup.o] Error 1
<-- snip -->
The #include <asm/smp.h> in irq.h was intruduced in 2.6.15-rc.
Since include/linux/irq.h needs code from asm/smp.h only in the
CONFIG_SMP=y case and linux/smp.h #include's asm/smp.h only in the
CONFIG_SMP=y case, I'm suggesting this patch to #include <linux/smp.h>
in irq.h.
I've tested the compilation with both CONFIG_SMP=y and CONFIG_SMP=n
on i386.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's currently a diagnostic printk in relay_switch_subbuf() meant as
a warning if you accidentally try to log an event larger than the
sub-buffer size.
The problem is if this happens while logging from somewhere it's not
safe to be doing printks, such as in the scheduler, you can end up with
a deadlock. This patch removes the warning from relay_switch_subbuf()
and instead prints some diagnostic info when the channel is closed.
Thanks to Mathieu Desnoyers for pointing out the problem and
suggesting a fix.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ensure we call unmap_mapping_range() and sync dirty pages to disk before
doing an NFS direct write.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I reported a problem and gave hints to the solution, but nobody seemed
to react. So I prepared a patch against 2.6.14.4.
Tested on 2.6.14.4 with "ip monitor addr" and with the program
attached, while adding and removing IPv6 address. Both programs didn't
receive any messages. Tested 2.6.14.4 + this patch, and both programs
received add and remove messages.
Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com>
Acked-by: Jamal Hadi salim <hadi@cyberus.ca>
ACKed-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This (and the three subsequent patches) is working well on OMAP H4 with
2.6.15-rc4 kernel and passes the LTP fs test.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparc64, i386 and x86_64 have support for a special data section dedicated
to rarely updated data that is frequently read. The section was created to
avoid false sharing of those rarely read data with frequently written kernel
data.
This patch creates such a data section for ia64 and will group rarely written
data into this section.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The logic that decides that a fork() might be able to avoid copying a VM
area when it can be re-created by page faults didn't know about the new
vm_insert_page() case.
Also make some things a bit more anal wrt VM_PFNMAP.
Pointed out by Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This merge is pretty extensive. The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.
Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
http://bugzilla.kernel.org/show_bug.cgi?id=4320
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
For tape we need to control the retries. This patch adds a retries
counter on the request for REQ_BLOCK_PC commands originating from
scsi_execute* to use. REQ_BLOCK_PC commands comming from the block
layer SG_IO path continue to use the retires set in the ULD init_command.
(scsi_execute* does not set the gendisk so we do not execute
the init_command in that path).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
To send async requests we need these two functions exported.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Some motherboards (such as the Asus P5V800-MX) ship a
PCI_DEVICE_ID_VIA_82C586_1 IDE controller alongside a VT8251 southbridge.
This southbridge is currently unrecognised in the via82cxxx IDE driver,
preventing those users from getting DMA access to disks.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Some hardware does not support the PACKET command at all.
Other hardware supports ATAPI, but the driver does something nasty such
as calling BUG() when an ATAPI command is issued.
For these such cases, we mark them with a new flag, ATA_FLAG_NO_ATAPI.
Initial version contributed by Ben Collins.
There is no user of qc->waiting left after ata_exec_internal()
changes. Kill the field.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch implements ata_exec_internal() function which performs
libata internal command execution. Previously, this was done by each
user by manually initializing a qc, issueing it, waiting for its
completion and handling errors. In addition to obvious code
factoring, using ata_exec_internal() fixes the following bugs.
* qc not freed on issue failure
* ap->qactive clearing could race with the next internal command
* race between timeout handling and irq
* ignoring error condition not represented in tf->status
Also, qc & hardware are not accessed anymore once it's completed,
making internal commands more conformant with general semantics.
ata_exec_internal() also makes it easy to issue internal commands from
multiple threads if that becomes necessary.
This patch only implements ata_exec_internal(). A following patch
will convert all users.
Signed-off-by: Tejun Heo <htejun@gmail.com>
--
Jeff, all patches have been regenerated against upstream branch as of
today. (575ab52a21)
Also, I took out a debug printk from ata_exec_internal (don't know how
that one got left there). Other than that, all patches are identical
to the previous posting.
Thanks. :-)
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The drawing function cfbfillrect does not work correctly when access is not
unsigned-long aligned. It manifests as extra lines of pixels that are not
complete drawn. Reversing the shift operator solves the problem, so I would
presume that this bug would manifest only on little endian machines. The
function cfbcopyarea may also have this bug.
Aligned access should present no problems.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every framebuffer driver relies on the assumption that the set_par()
function of the driver is called before drawing functions and other
functions dependent on the hardware state are executed.
Whenever you switch from X to a framebuffer console for the very first
time, there is a chance that a broken X system has _not_ set the mode to
KD_GRAPHICS, thus the vt and framebuffer code executes a screen redraw and
several other functions before a set_par() is executed. This is believed
to be not a bug of linux but a bug of X/xdm. At least some X releases used
by SuSE and Debian show this behaviour.
There was a 2nd case, but that has been fixed by Antonino Daplas on
10-dec-2005.
This patch allows drivers to set a flag to inform fbcon_switch() that they
prefer a set_par() call on every console switch, working around the
problems caused by the broken X releases.
The flag will be used by the next release of cyblafb and might help other
drivers that assume a hardware state different to the one used by X.
As the default behaviour does not change, this patch should be acceptable
to everybody.
Signed-off-by: Knut Petersen <Knut_Petersen@t-online.de>
Acked-by: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add hooks to save and restore the graphics state. These hooks are called in
fbcon_blank() when entering/leaving KD_GRAPHICS mode. This is needed by
savagefb at least so it can cooperate with savage_dri and by cyblafb.
State save/restoration can be full or partial.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Spotted by a Fedora user. Compiling with DEBUG_PARPORT set fails due to
the broken cast.
Just remove it.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.
The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For Kprobes critical path is the path from debug break exception handler
till the control reaches kprobes exception code. No probes can be
supported in this path as we will end up in recursion.
This patch prevents this by moving the below function to safe __kprobes
section onto which no probes can be inserted.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The below patch lets userspace have more control over the inodes that
inotify will watch. It introduces two new flags.
IN_ONLYDIR -- only watch the inode if it is a directory.
This is needed to avoid the race that can occur when we want to be
sure that we are watching a directory.
IN_DONT_FOLLOW -- don't follow a symlink. In combination
with IN_ONLYDIR we can make sure that we don't watch the target of
symlinks.
The issues the flags fix came up when writing the gnome-vfs inotify
backend. Default behaviour is unchanged.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add list_replace_rcu: replace old entry by new one.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a timestamp field to the events sent via the process event
connector. The timestamp allows listeners to accurately account the
duration(s) between a process' events and offers strong means with which
to determine the order of events with respect to a given task while also
avoiding the addition of per-task data.
This alters the size and layout of the event structure and hence would
break compatibility if process events connector as it stands in 2.6.15-rc2
were released as a mainline kernel.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are several functions that might seem appropriate for a timestamp:
get_cycles()
current_kernel_time()
do_gettimeofday()
<read jiffies/jiffies_64>
Each has problems with combinations of SMP-safety, low resolution, and
monotonicity. This patch adds a new function that returns a monotonic SMP-safe
timestamp with nanosecond resolution where available.
Changes:
Split timestamp into separate patch
Moved to kernel/time.c
Renamed to getnstimestamp
Fixed unintended-pointer-arithmetic bug
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a new interface - rcu_barrier() which waits until all
the RCUs queued until this call have been completed.
Reiser4 needs this, because we do more than just freeing memory object
in our RCU callback: we also remove it from the list hanging off
super-block. This means, that before freeing reiser4-specific portion
of super-block (during umount) we have to wait until all pending RCU
callbacks are executed.
The only change of reiser4 made to the original patch, is exporting of
rcu_barrier().
Cc: Hans Reiser <reiser@namesys.com>
Cc: Vladimir V. Saveliev <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the previous commit, we can handle arbitrary shared re-mappings
even without this complexity, and since the only known private mappings
are for strange users of /dev/mem (which never create an incomplete one),
there seems to be no reason to support it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Completed a major overhaul of the Resource Manager code -
specifically, optimizations in the area of the AML/internal
resource conversion code. The code has been optimized to
simplify and eliminate duplicated code, CPU stack use has
been decreased by optimizing function parameters and local
variables, and naming conventions across the manager have
been standardized for clarity and ease of maintenance (this
includes function, parameter, variable, and struct/typedef
names.)
All Resource Manager dispatch and information tables have
been moved to a single location for clarity and ease of
maintenance. One new file was created, named "rsinfo.c".
The ACPI return macros (return_ACPI_STATUS, etc.) have
been modified to guarantee that the argument is
not evaluated twice, making them less prone to macro
side-effects. However, since there exists the possibility
of additional stack use if a particular compiler cannot
optimize them (such as in the debug generation case),
the original macros are optionally available. Note that
some invocations of the return_VALUE macro may now cause
size mismatch warnings; the return_UINT8 and return_UINT32
macros are provided to eliminate these. (From Randy Dunlap)
Implemented a new mechanism to enable debug tracing for
individual control methods. A new external interface,
acpi_debug_trace(), is provided to enable this mechanism. The
intent is to allow the host OS to easily enable and disable
tracing for problematic control methods. This interface
can be easily exposed to a user or debugger interface if
desired. See the file psxface.c for details.
acpi_ut_callocate() will now return a valid pointer if a
length of zero is specified - a length of one is used
and a warning is issued. This matches the behavior of
acpi_ut_allocate().
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?
Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.
On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.
On ia64:
It is always boot time frequency of a particular CPU that gets displayed.
The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.
Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
- remove err_mask from the parameter list of the complete functions
- move err_mask to ata_queued_cmd
- initialize qc->err_mask when needed
- for each function call to ata_qc_complete(), replace the err_mask parameter with qc->err_mask.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
===============
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The patch (originally from Steve) simply adds memory buffer settings to
DECnet similar to those in TCP.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Some funcions are now declared as static
- Added a I2C code for InfraRed.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I didn't like the name netif_rx_schedule_test(), in earlier patches
and changed to __netif_rx_schedule_prep to be more consistent.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This is what a lot of drivers will actually want to use to insert
individual pages into a user VMA. It doesn't have the old PageReserved
restrictions of remap_pfn_range(), and it doesn't complain about partial
remappings.
The page you insert needs to be a nice clean kernel allocation, so you
can't insert arbitrary page mappings with this, but that's not what
people want.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The uid_t and gid_t fields appear to present a 32/64-bit userspace/kernel
problem for some archs.
This patch addresses the problem by fixing the size to the largest size for
uid_t/gid_t used in the kernel. This preserves the total size of the event
structure while ensuring that the layouts of the ID change event match in
32 and 64-bit kernels and applications.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
atm_dev_deregister() removes device from atm_dev list immediately to
prevent operations on a phantom device. Decision to free device based
only on ->refcnt now. Remove shutdown_atm_dev() use atm_dev_deregister()
instead. atm_dev_deregister() also asynchronously releases all vccs
related to device.
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
These get created by some drivers that don't generally even want a pfn
remapping at all, but would really mostly prefer to just map pages
they've allocated individually instead.
For now, create a helper function that turns such an incomplete PFN
remapping call into a loop that does that explicit mapping. In the long
run we almost certainly want to export a totally different interface for
that, though.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent models of Intel/Sharp and Spansion CFI flash now have significant
bits in the upper byte of device ID codes, read via what Spansion calls
"autoselect" and Intel calls "read device identifier". Currently these
values are truncated to the low 8 bits in the mtd data structures, as
all CFI read query info has previously been read one byte at a time.
Add a new method for reading 16-bit info, currently just manufacturer
and device codes; datasheets hint at future uses for upper bytes in
other fields.
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So don't define it as extern in the header file.
drivers/base/memory.c:28: error: static declaration of 'memory_sysdev_class' follows non-static declaration
include/linux/memory.h:88: error: previous declaration of 'memory_sysdev_class' was here
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.
Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A review against MMC/SD specifications found some errors in the current
implementation.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Sascha Hauer
This patch adds PORT_NETX for supporting the Hilscher netx embedded
UARTs.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit af2b4079ab
Changing the #define to an inline function breaks on non-SMP builds,
since wuite a few places in the kernel do not implement the ipi handler
when compiling for UP.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ext3 compat-ioctl translation wants to translate data structures
that <linux/jbd.h> only declared when CONFIG_JBD was enabled.
So make <linux/jbd.h> play nicely even when we don't actually end up
using it.
Acked-by: Andrew Morton <akpm@osdl.org>
Acked-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Acked-by: Zan Lynx <zlynx@acm.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was some confusion about the different zone usage, this should fix
up the resulting mess in the GFP zonemask handling.
The different zone usage is still confusing (it's very easy to mix up
the individual zone numbers with the GFP zone _list_ numbers), so we
might want to clean up some of this in the future, but in the meantime
this should fix the actual problems.
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not really a network problem, more a !SMP issue.
net/core/flow.c:295: warning: statement with no effect
flow.c:295: smp_call_function(flow_cache_flush_per_cpu, &info, 1, 0);
Fix this by converting the macro to an inline function, which
also increases the typechecking for !SMP builds.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
drivers set VM_RESERVED on areas which are then populated by nopage. The
PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
zap_pte_range, without changing those drivers not to set it: so their pages
just leak away.
Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
to flag the special areas where the ptes may have no struct page, or if they
have then it's not to be touched. Replace most instances of VM_RESERVED in
core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
sparc64 io_remap_pfn_range.
Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
needed anywhere? It still governs the mm->reserved_vm statistic, and special
vmas not to be merged, and areas not to be core dumped; but could probably be
eliminated later (the drivers are probably specifying it because in 2.4 it
kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
don't get on).
Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
purpose whatsoever, and should be removed from drivers when we clean up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It looks like snd_xxx is not the only nopage to be using PageReserved as a way
of holding a high-order page together: which no longer works, but is masked by
our failure to free from VM_RESERVED areas. We cannot fix that bug without
first substituting another way to hold the high-order page together, while
farming out the 0-order pages from within it.
That's just what PageCompound is designed for, but it's been kept under
CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out- of-line
put_page), doesn't slow down what most needs to be fast (already using
hugetlb), and unifies the way we handle high-order pages.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
flagged_taskfile() is called from execute_drive_cmd()
(the only user) only if args->tf_out_flags.all != 0.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Remove duplicate documentation for ide_do_drive_cmd() from
<linux/ide.h>, this function is already documented in ide-io.c.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Fix compile warning caused by conflicting types of expand_upwards. IA64
requires it to not be static inline, as it's used outside mm/mmap.c
Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
The structure ide_driver_t have a .owner field which is a duplicate
of .gendriver.owner field (.gen_driver is a struct device_driver).
This patch removes ide_driver_t's owner field.
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This patch fixes a bug that breaks hpacucli, a command line interface
for the HP Array Config Utility. Without this fix the utility will
not detect any controllers in the system. I thought I had already fixed
this, but I guess not.
Thanks to all who reported the issue. Please consider this this inclusion.
Signed-off-by: Mike Miller <mikem@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <axboe@suse.de>
Based upon a patch by Guido Guenther <agx@sigxcpu.org>.
Some of these ioctls had embedded time_t objects
or pointers, so needed translation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sysctl.h (again) useable from userspace
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fields obtained through cpuid vector 0x1(ebx[16:23]) and
vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not
always be the same as what is available and what OS sees. So make sure
"siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen
by OS instead of what cpuid instruction says. This will also fix the buggy BIOS
cases (for example where cpuid on a single core cpu says there are "2" siblings,
even when HT is disabled in the BIOS.
http://bugzilla.kernel.org/show_bug.cgi?id=4359)
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not go from the CPU number to an mapping array.
Mode number is often used now in fast paths.
This also adds a generic numa_node_id to all the topology includes
Suggested by Eric Dumazet
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Has been introduced for x86-64 at some point to save memory
in struct page, but has been obsolete for some time. Just
remove it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.
As a bit of historical background: when the x86-64 port
was originally designed we had some discussion if we should
use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
both. Both was ruled out at this point because it was in early
2.4 when VM is still quite shakey and had bad troubles even
dealing with one DMA zone. We settled on the 16MB DMA zone mainly
because we worried about older soundcards and the floppy.
But this has always caused problems since then because
device drivers had trouble getting enough DMA able memory. These days
the VM works much better and the wide use of NUMA has proven
it can deal with many zones successfully.
So this patch adds both zones.
This helps drivers who need a lot of memory below 4GB because
their hardware is not accessing more (graphic drivers - proprietary
and free ones, video frame buffer drivers, sound drivers etc.).
Previously they could only use IOMMU+16MB GFP_DMA, which
was not enough memory.
Another common problem is that hardware who has full memory
addressing for >4GB misses it for some control structures in memory
(like transmit rings or other metadata). They tended to allocate memory
in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
(even on AMD systems the IOMMU tends to be quite small) especially if you have
many devices. With the new zone pci_alloc_consistent can just put
this stuff into memory below 4GB which works better.
One argument was still if the zone should be 4GB or 2GB. The main
motivation for 2GB would be an unnamed not so unpopular hardware
raid controller (mostly found in older machines from a particular four letter
company) who has a strange 2GB restriction in firmware. But
that one works ok with swiotlb/IOMMU anyways, so it doesn't really
need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
it seems to be the most common restriction.
The new zone is so far added only for x86-64.
For other architectures who don't set up this
new zone nothing changes. Architectures can set a compatibility
define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
as GFP_DMA. Otherwise it's a nop because on 32bit architectures
it's normally not needed because GFP_NORMAL (=0) is DMA able
enough.
One problem is still that GFP_DMA means different things on different
architectures. e.g. some drivers used to have #ifdef ia64 use GFP_DMA
(trusting it to be 4GB) #elif __x86_64__ (use other hacks like
the swiotlb because 16MB is not enough) ... . This was quite
ugly and is now obsolete.
These should be now converted to use GFP_DMA32 unconditionally. I haven't done
this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
which will use GFP_DMA32 transparently.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch unconditionally requires CAP_NET_ADMIN for all nfnetlink
messages. It also removes the per-message cap_required field, since all
existing subsystems use CAP_NET_ADMIN for all their messages anyway.
Patrick McHardy owes me a beer if we ever need to re-introduce this.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip sizecheck if the size of the attribute wasn't specified, ie. zero.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- in ata_dev_identify(), don't assume that all devices are either
ATA or ATAPI. In the future, this code will see port multipliers
and other devices.
- make a debugging printk less verbose
- add new helper ata_qc_reinit()
- add new helper BPRINTK() and port flag ATA_FLAG_DEBUGMSG, for
fine-grained debugging use.
Many structures contain both an internal part and one which is part of the API
to other modules. With this patch it is possible to only include these public
members in the kernel documentation.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Added SECAM L' video standard
- SECAM L' is a Secam variant that requires special config.
This patch adds support on V4L core. Requires aditional patches
on tuners to support.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It would appear that the timespec normalize code has an off by one error.
Found in three places. Thanks to Ben for spotting.
Signed-off-by: George Anzinger<george@mvista.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
allnoconfig:
In file included from fs/super.c:28:
include/linux/acct.h:173: warning: `TICK_NSEC' is not defined
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
put_ioctx's refcount debugging was doing an atomic_read after dropping its
reference when it wasn't the last ref, leaving a tiny race for another freeing
thread to sneak into. This shifts the debugging before the ops, uses BUG_ON,
and reformats the defines a little. Sadly, moving to inlines increased the
code size but this change decreases the code size by a whole 9 bytes :)
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sync iocbs have a life cycle that don't need a kioctx. Their retrying, if
any, is done in the context of their owner who has allocated them on the
stack.
The sole user of a sync iocb's ctx reference was aio_complete() checking for
an elevated iocb ref count that could never happen. No path which grabs an
iocb ref has access to sync iocbs.
If we were to implement sync iocb cancelation it would be done by the owner of
the iocb using its on-stack reference.
Removing this chunk from aio_complete allows us to remove the entire kioctx
instance from mm_struct, reducing its size by a third. On a i386 testing box
the slab size went from 768 to 504 bytes and from 5 to 8 per page.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently per_cpu_ptr() doesn't really do anything with 'cpu' in the UP
case. This is problematic in the cases where this is the only place the
variable is referenced:
CC kernel/workqueue.o
kernel/workqueue.c: In function `current_is_keventd':
kernel/workqueue.c:460: warning: unused variable `cpu'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove task_work structure, use the standard thread flags functions and use
shifts in entry.S to test the thread flags. Add a few local labels to entry.S
to allow gas to generate short jumps.
Finally it changes a number of inline functions in thread_info.h to macros to
delay the current_thread_info() usage, which requires on m68k a structure
(task_struct) not yet defined at this point.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a) in smp_lock.h #include of sched.h and spinlock.h moved under #ifdef
CONFIG_LOCK_KERNEL.
b) interrupt.h now explicitly pulls sched.h (not via smp_lock.h from
hardirq.h as it used to)
c) in three more places we need changes to compensate for (a) - one place
in arch/sparc needs string.h now, hardirq.h needs forward declaration of
task_struct and preempt.h needs direct include of thread_info.h.
d) thread_info-related helpers in sched.h and thread_info.h put under
ifndef __HAVE_THREAD_FUNCTIONS. Obviously safe.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
encapsulates the rest of arch-dependent operations with thread_info access.
Two new helpers - setup_thread_stack() and end_of_stack(). For normal case
the former consists of copying thread_info of parent to new thread_info and
the latter returns pointer immediately past the end of thread_info.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
new helper - task_thread_info(task). On platforms that have thread_info
allocated separately (i.e. in default case) it simply returns
task->thread_info. m68k wants (and for good reasons) to embed its thread_info
into task_struct. So it will (in later patch) have task_thread_info() of its
own. For now we just add a macro for generic case and convert existing
instances of its body in core kernel to uses of new macro. Obviously safe -
all normal architectures get the same preprocessor output they used to get.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enablement patch for the new PowerBooks (late 2005 edition).
This enables the ATA controller, Gigabit ethernet and basic AGP setup.
Bluetooth works out-of-the box after running hid2hci.
Still remaining is to get the touchpad to work, the simple change of just
adding the new USB ids isn't enough.
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove last remnant of the defunct early reclaim page logic, the no longer
used __GFP_NORECLAIM flag bit.
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Martin Hicks <mort@bork.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up of __alloc_pages.
Restoration of previous behaviour, plus further cleanups by introducing an
'alloc_flags', removing the last of should_reclaim_zone.
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The address based work estimate for unmapping (for lockbreak) is and always
was horribly inefficient for sparse mappings. The problem is most simply
explained with an example:
If we find a pgd is clear, we still have to call into unmap_page_range
PGDIR_SIZE / ZAP_BLOCK_SIZE times, each time checking the clear pgd, in
order to progress the working address to the next pgd.
The fundamental way to solve the problem is to keep track of the end
address we've processed and pass it back to the higher layers.
From: Nick Piggin <npiggin@suse.de>
Modification to completely get away from address based work estimate
and instead use an abstract count, with a very small cost for empty
entries as opposed to present pages.
On 2.6.14-git2, ppc64, and CONFIG_PREEMPT=y, mapping and unmapping 1TB
of virtual address space takes 1.69s; with the following patch applied,
this operation can be done 1000 times in less than 0.01s
From: Andrew Morton <akpm@osdl.org>
With CONFIG_HUTETLB_PAGE=n:
mm/memory.c: In function `unmap_vmas':
mm/memory.c:779: warning: division by zero
Due to
zap_work -= (end - start) /
(HPAGE_SIZE / PAGE_SIZE);
So make the dummy HPAGE_SIZE non-zero
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changed jobs and the Freescale address is no longer valid.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since few people need the support anymore, this moves the legacy
pm_xxx functions to CONFIG_PM_LEGACY, and include/linux/pm_legacy.h.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The file_lock spinlock sits close to mostly read fields of 'struct
files_struct'
In SMP (and NUMA) environments, each time a thread wants to open or close
a file, it has to acquire the spinlock, thus invalidating the cache line
containing this spinlock on other CPUS. So other threads doing
read()/write()/... calls that use RCU to access the file table are going
to ask further memory (possibly NUMA) transactions to read again this
memory line.
Move the spinlock to another cache line, so that concurrent threads can
share the cache line containing 'count' and 'fdt' fields.
It's worth up to 9% on a microbenchmark using a 4-thread 2-package x86
machine. See
http://marc.theaimsgroup.com/?l=linux-kernel&m=112680448713342&w=2
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dev_valid_name() is a useful function. Make it public.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Patch from Richard Purdie
Add a driver for the extra GPIOs found on the Sharp SL-C1000 (Akita).
These GPIOs are found on a Maxim MAX7310 I2C i/o expander chip. A
generic GPIO driver for the MAX7310 was attempted but this mini
driver is a much simpler and much more effective solution avoiding
several issues and complexity the generic driver had (as discussed
on LKML).
The platform device is required so the device parent can be set
correctly which ensures the device is one of the last to suspend
and first to resume. Whilst the i2c suspend/resume calls can be
influenced, nothing guarantees this is easlier/later than the
subsystems the gpios are used on which are all independent of i2c
(sound, irda, video/backlight etc.).
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On Fri, Nov 11, 2005 at 12:58:40PM -0800, David S. Miller wrote:
>
> This change:
>
> diff-tree 8ca2bdc7a9 (from feee207e44d3643d19e648aAuthor: Christoph Hellwig <hch@lst.de>
> Date: Wed Nov 9 12:07:18 2005 -0800
>
> [SPARC] sbus rtc: implement ->compat_ioctl
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> results in the console now getting spewed on sparc64 systems
> with messages like:
>
> [ 11.968298] ioctl32(hwclock:464): Unknown cmd fd(3) cmd(401c7014){00} arg(efc
> What's happening is hwclock tries first the SBUS rtc device ioctls
> then the normal rtc driver ones.
>
> So things actually worked better when we had the SBUS rtc compat ioctl
> directly handled via the generic compat ioctl code.
>
> There are _so_ many rtc drivers in the kernel implementing the
> generic rtc ioctls that I don't think putting a ->compat_ioctl
> into all of them to fix this problem is feasible. Unless we
> write a single rtc_compat_ioctl(), export it to modules, and hook
> it into all of those somehow.
>
> But even that doesn't appear to have any pretty implementation.
>
> Any better ideas?
We had similar problems with other ioctls where userspace did things
like that. What we did there was to put the compat handler to generic
code. The patch below does that, adding a big comment about what's
going on and removing the COMPAT_IOCTL entires for these on powerpc
that not only weren't ever useful but are duplicated now aswell.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds request_queue->nr_sorted which keeps the number of
requests in the iosched and implement elv_drain_elevator which
performs forced dispatching. elv_drain_elevator checks whether
iosched actually dispatches all requests it has and prints error
message if it doesn't. As buggy forced dispatching can result in
wrong barrier operations, I think this extra check is worthwhile.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Also introduces a sysctl option to configure the receive buffer
accounting policy to be either at socket or association level.
Default is all the associations on the same socket share the
receive buffer.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will let me chop the code size of several drivers right down. In
many cases the actual private data is very useful and constant for a
given host controller so being able to just pass it at probe time would
be very useful indeed (eg with the via driver would could pass the udma
clocking and reduce the code size, or with the AMD one the UDMA
multiplier and the offset)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
On Alpha:
include/linux/libata.h: In function `ata_pad_alloc':
include/linux/libata.h:785: warning: implicit declaration of function `dma_alloc_coherent'
include/linux/libata.h:786: warning: assignment makes pointer from integer without a cast
include/linux/libata.h: In function `ata_pad_free':
include/linux/libata.h:792: warning: implicit declaration of function `dma_free_coherent'
(I have a decouple-some-header-files cleanup in -mm, so it's causing some
fallout of this nature)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Use "hints" to speed up the SACK processing. Various forms
of this have been used by TCP developers (Web100, STCP, BIC)
to avoid the 2x linear search of outstanding segments.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.
The orignal ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A nice feature of sysfs is that it can create the symlink from the
driver to the module that is contained in it.
It requires that the device_driver.owner is set, what is not the
case for many PCI drivers.
This patch allows pci_register_driver to set automatically the
device_driver.owner for any PCI driver.
Credits to Al Viro who suggested the method.
Signed-off-by: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
--
drivers/ide/setup-pci.c | 12 +++++++-----
drivers/pci/pci-driver.c | 9 +++++----
include/linux/ide.h | 3 ++-
include/linux/pci.h | 10 ++++++++--
4 files changed, 22 insertions(+), 12 deletions(-)
This patch tweaks the way pciehp requests control of the hotplug
hardware from BIOS. It now tries to invoke the ACPI _OSC method
for a specific hotplug controller only, rather than walking the
entire acpi namespace invoking all possible _OSC methods under
all host bridges. This allows us to gain control of each hotplug
controller individually, even if BIOS fails to give us control of
some other hotplug controller in the system.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some devices have more than one capability of the same type. For
example, the PCI header for the PathScale InfiniPath looks like:
04:01.0 InfiniBand: Unknown device 1fc1:000d (rev 02)
Subsystem: Unknown device 1fc1:000d
Flags: bus master, fast devsel, latency 0, IRQ 193
Memory at fea00000 (64-bit, non-prefetchable) [size=2M]
Capabilities: [c0] HyperTransport: Slave or Primary Interface
Capabilities: [f8] HyperTransport: Interrupt Discovery and Configuration
There are _two_ HyperTransport capabilities, and the PathScale driver
wants to look at both of them.
The current pci_find_capability() API doesn't work for this, since it
only allows us to get to the first capability of a given type. The
patch below introduces a new pci_find_next_capability(), which can be
used in a loop like
for (pos = pci_find_capability(pdev, <ID>);
pos;
pos = pci_find_next_capability(pdev, pos, <ID>)) {
/* ... */
}
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The protocol field in ethernet headers is big-endian and should be
annotated as such. This patch allows detection of missing ntohs() calls
on the ethernet protocol field when sparse is run with __CHECK_ENDIAN__
defined.
This is a revised version that includes <linux/types.h> so that the
userspace programs are not confused by __be16. Thanks to David S.
Miller.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the patch that introduces the generic skb_checksum_complete
which also checks for hardware RX checksum faults. If that happens,
it'll call netdev_rx_csum_fault which currently prints out a stack
trace with the device name. In future it can turn off RX checksum.
I've converted every spot under net/ that does RX checksum checks to
use skb_checksum_complete or __skb_checksum_complete with the
exceptions of:
* Those places where checksums are done bit by bit. These will call
netdev_rx_csum_fault directly.
* The following have not been completely checked/converted:
ipmr
ip_vs
netfilter
dccp
This patch is based on patches and suggestions from Stephen Hemminger
and David S. Miller.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic netlink family builds on top of netlink and provides
simplifies access for the less demanding netlink users. It solves
the problem of protocol numbers running out by introducing a so
called controller taking care of id management and name resolving.
Generic netlink modules register themself after filling out their
id card (struct genl_family), after successful registration the
modules are able to register callbacks to command numbers by
filling out a struct genl_ops and calling genl_register_op(). The
registered callbacks are invoked with attributes parsed making
life of simple modules a lot easier.
Although generic netlink modules can request static identifiers,
it is recommended to use GENL_ID_GENERATE and to let the controller
assign a unique identifier to the module. Userspace applications
will then ask the controller and lookup the idenfier by the module
name.
Due to the current multicast implementation of netlink, the number
of generic netlink modules is restricted to 1024 to avoid wasting
memory for the per socket multiacst subscription bitmask.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces a new type-safe interface for netlink message and
attributes handling. The interface is fully binary compatible
with the old interface towards userspace. Besides type safety,
this interface features attribute validation capabilities,
simplified message contstruction, and documentation.
The resulting netlink code should be smaller, less error prone
and easier to understand.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing connection tracking subsystem in netfilter can only
handle ipv4. There were basically two choices present to add
connection tracking support for ipv6. We could either duplicate all
of the ipv4 connection tracking code into an ipv6 counterpart, or (the
choice taken by these patches) we could design a generic layer that
could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
(TCP, UDP, etc.) connection tracking helper module to be written.
In fact nf_conntrack is capable of working with any layer 3
protocol.
The existing ipv4 specific conntrack code could also not deal
with the pecularities of doing connection tracking on ipv6,
which is also cured here. For example, these issues include:
1) ICMPv6 handling, which is used for neighbour discovery in
ipv6 thus some messages such as these should not participate
in connection tracking since effectively they are like ARP
messages
2) fragmentation must be handled differently in ipv6, because
the simplistic "defrag, connection track and NAT, refrag"
(which the existing ipv4 connection tracking does) approach simply
isn't feasible in ipv6
3) ipv6 extension header parsing must occur at the correct spots
before and after connection tracking decisions, and there were
no provisions for this in the existing connection tracking
design
4) ipv6 has no need for stateful NAT
The ipv4 specific conntrack layer is kept around, until all of
the ipv4 specific conntrack helpers are ported over to nf_conntrack
and it is feature complete. Once that occurs, the old conntrack
stuff will get placed into the feature-removal-schedule and we will
fully kill it off 6 months later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
From: "Jordan Crouse" <jordan.crouse@amd.com>
The core IDE engine on the CS5536 is the same as the other AMD southbridges,
so unlike the CS5535, we can simply add the appropriate PCI headers to
the existing amd74xx code.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
CONFIG_IDE_MAX_HWIFS is a generic thing, no need to have it duplicated
by every arch that uses it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Devices driven by ide-cs will appear under /sys/devices instead of the
appropriate PCMCIA device. To fix this I had to extend the hw_regs_t
structure with a 'struct device' field, which allows us to set the
parent link for the appropriate hwif.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@suse.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
nfattr_parse (and thus nfattr_parse_nested) always returns success. So we
can make them 'void' and remove all the checking at the caller side.
Based on original patch by Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce struct platform_driver. This allows the platform device
driver methods to be passed a platform_device structure instead of
instead of a plain device structure, and therefore requiring casting
in every platform driver.
We introduce this in such a way that any existing platform drivers
registered directly via driver_register continue to work as before,
thereby allowing a gradual conversion to the new platform_driver
methods.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
There are a few loose ends following the conversion of md to use kthreads:
- Some fields in mdk_thread_t that aren't needed (kthreads does it's own
completion and manages it's own name).
- thread->run is now never NULL, so no need to check
- Some tests for signal_pending that aren't needed (As we don't use signals
to stop threads any more)
- Some flush_signals are not needed
- Some waits are interruptible and don't need to be.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....
So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.
We initially assumes barriers are OK.
When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers. This will usually clear the flags fairly quickly.
If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.
When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aresn't supported need to be retried. So raid1d is enhanced to do this,
and when any bio write completes (i.e. no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.
We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
1/ the device used to support BARRIER, but now doesn't. Few devices
change like this, though raid1 can!
or
2/ the array has no persistent superblock, so there was no opportunity to
pre-test for barriers when writing the superblock.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current bitmaps use set_bit et.al and so are host-endian, which means
not-portable. Oops.
Define a new version number (4) for which bitmaps are little-endian.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This has the advantage of removing the confusion caused by 'rdev_t' and
'mddev_t' both having 'in_sync' fields.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
no success, fail the drive. This will make sure a dead
drive will be eventually kicked even when we aren't trying
to rewrite (which would normally kick a dead drive more quickly.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There isn't really a need for raid5 attributes to be an a subdirectory,
so this patch moves them from
/sys/block/mdX/md/raid5/attribute
to
/sys/block/mdX/md/attribute
This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With this, raid5 can be asked to check parity without repairing it. It also
keeps a count of the number of incorrect parity blocks found (mismatches) and
reports them through sysfs.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You can trigger a 'check' with
echo check > /sys/block/mdX/md/scan_mode
or a check-and-repair errors with
echo repair > /sys/block/mdX/md/scan_mode
and read the current state from the same file.
Note: personalities need to know the different between 'check' and 'repair',
but don't yet. Until they do, 'check' will be the same as 'repair' and will
just do a normal resync pass.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/sys/block/mdX/md/raid5/
contains raid5-related attributes.
Currently
stripe_cache_size
is number of entries in stripe cache, and is settable.
stripe_cache_active
is number of active entries, and in only readable.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Each device in an md array how has a corresponding
/sys/block/mdX/md/devNN/
directory which can contain attributes. Currently there is only 'state' which
summarises the state, nd 'super' which has a copy of the superblock, and
'block' which is a symlink to the block device.
Also, /sys/block/mdX/md/rdNN represents slot 'NN' in the array, and is a
symlink to the relevant 'devNN'. Obviously spare devices do not have a slot
in the array, and so don't have such a symlink.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Start using kobjects in mddevs, and provide a couple of simple attributes
(level and disks). Attributes live in
/sys/block/mdX/md/attr-name
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch changes the behaviour of raid5 when it gets a read error.
Instead of just failing the device, it tried to find out what should have
been there, and writes it over the bad block. For some media-errors, this
has a reasonable chance of fixing the error. If the write succeeds, and a
subsequent read succeeds as well, raid5 decided the address is OK and
conitnues.
Instead of failing a drive on read-error, we attempt to re-write the block,
and then re-read. If that all works, we allow the device to remain in the
array.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The frame buffer layer already had some code dealing with compat ioctls, this
patch moves over the remaining code from fs/compat_ioctl.c
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add ability to set rotation via sysfs. The attributes are located in
/sys/class/graphics/fb[n] and accepts 0 - unrotated; 1 - clockwise; 2 - upside
down; 3 - counterclockwise.
The attributes are:
con_rotate (r/w) - set rotation of the active console
con_rotate_all (w) - set rotation of all consoles
rotate (r/w) - set rotation of the framebuffer, if supported.
Currently, none of the drivers support this.
This is probably temporary, since con_rotate and con_rotate_all are
console-specific and has no business being under the fb device. However,
until the console layer acquires it's own sysfs class, these attributes will
temporarily reside here.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for rotating and positioning of the logo. Rotation and position
depends on 'int rotate' parameter added to fb_prepare_logo() and
fb_show_logo().
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch series implements generic code to rotate the console at 90, 180,
and 270 degrees. The implementation is completely done in the framebuffer
console level, thus no changes to the framebuffer layer or to the drivers
are needed.
Console rotation is required by some Sharp-based devices where the natural
orientation of the display is not at 0 degrees. Also, users that have
displays that can pivot will benefit by having a console in portrait mode
if they so desire.
The choice to implement the code in the console layer rather than in the
framebuffer layer is due to the following reasons:
- it's fast
- it does not require driver changes
- it can coexist with devices that can rotate the display at the hardware level
- it complements graphics applications that can do display rotation
The changes to core fbcon are minimal-- recognition of the console
rotation angle so it can swap directions, origins and axes (xres vs yres,
xpanstep vs ypanstep, xoffset vs yoffset, etc) and storage of the rotation
angle per display. The bulk of the code that does the actual drawing to the
screen are placed in separate files. Each angle of rotation has separate
methods (bmove, clear, putcs, cursor, update_start which is derived from
update_var, and clear_margins). To mimimize processing time, the fontdata
are pre-rotated at each console switch (only if the font or the angle has
changed).
The option can be compiled out (CONFIG_FRAMEBUFFER_CONSOLE_ROTATION = n) if
rotation is not needed.
Choosing the rotation angle can be done in several ways:
1. boot option fbcon=rotate:n, where
n = 0 - normal
n = 1 - 90 degrees (clockwise)
n = 2 - 180 degrees (upside down)
n = 3 - 270 degrees (counterclockwise)
2. echo n > /sys/class/graphics/fb[num]/con_rotate
where n is the same as described above. It sets the angle of rotation
of the current console
3 echo n > /sys/class/graphics/fb[num]/con_rotate_all
where n is the same as described above. Globally sets the angle of
rotation.
GOTCHAS:
The option, especially at angles of 90 and 270 degrees, will exercise
the least used code of drivers. Namely, at these angles, panning is done
in the x-axis, so it can reveal bugs in the driver if xpanstep is set
incorrectly. A workaround is to set xpanstep = 0.
Secondly, at these angles, the framebuffer memory access can be
unaligned if (fontheight * bpp) % 32 ~= 0 which can reveal bugs in the drivers
imageblit, fillrect and copyarea functions. (I think cfbfillrect may have
this buglet). A workaround is to use a standard 8x16 font.
Speed:
The scrolling speed difference between 0 and 180 degrees is minimal,
somewhere areound 1-2%. At 90 or 270 degress, speed drops down to a vicinity
of 30-40%. This is understandable because the blit direction is across the
framebuffer "direction." Scrolling will be helped at these angles if xpanstep
is not equal to zero, use of 8x16 fonts, and setting xres_virtual >= xres * 2.
Note: The code is tested on little-endian only, so I don't know if it will
work in big-endian. Please let me know, it will take only less than a minute
of your time.
This patch prepares fbcon for console rotation and contains the following
changes:
- add rotate field in struct fbcon_ops to keep fbcon's current rotation
angle
- add con_rotate field in struct display to store per-display rotation angle
- create a private copy of the current var to fbcon. This will prevent
fbcon from directly manipulating info->var, especially the fields xoffset,
yoffset and vmode.
- add ability to swap pertinent axes (xres, yres; xpanstep, ypanstep; etc)
depending on the rotation angle
- change global update_var() (function that sets the screen start address)
as an fbcon method update_start. This is required because the axes, start
offset, and/or direction can be reversed depending on the rotation angle.
- add fbcon method rotate_font() which will rotate each character bitmap to
the correct angle of rotation.
- add fbcon boot option 'rotate' to select the angle of rotation at bootime.
Currently does nothing until all patches are applied.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Em28xx cleanups and fixes.
- Some cleanups and audio amux adjust.
- em28xx will allways try, by default, the biggest size alt.
- Fixes audio mux code.
- Fixes some logs.
- Adds support for digital output for WinTV USB2 board.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Moved some user defines to be out of __KERNEL__ define.
Signed-off-by: Michael Schimek <mschimek@gmx.at>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Some changes to allow compiling cx88 and saa7134 without V4L1 support.
- This patch will help obsoleting V4L1 API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- This patch adds the VIDIOC_LOG_STATUS to videodev2.h and adds
LOG_STATUS support to tda9887.c and bttv-driver.c.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
->permission and ->lookup have a struct nameidata * argument these days to
pass down lookup intents. Unfortunately some callers of lookup_hash don't
actually pass this one down. For lookup_one_len() we don't have a struct
nameidata to pass down, but as this function is a library function only
used by filesystem code this is an acceptable limitation. All other
callers should pass down the nameidata, so this patch changes the
lookup_hash interface to only take a struct nameidata argument and derives
the other two arguments to __lookup_hash from it. All callers already have
the nameidata argument available so this is not a problem.
At the same time I'd like to deprecate the lookup_hash interface as there
are better exported interfaces for filesystem usage. Before it can
actually be removed I need to fix up rpc_pipefs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A few more callers of permission() just want to check for a different access
pattern on an already open file. This patch adds a wrapper for permission()
that takes a file in preparation of per-mount read-only support and to clean
up the callers a little. The helper is not intended for new code, everything
without the interface set in stone should use vfs_permission()
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most permission() calls have a struct nameidata * available. This helper
takes that as an argument and thus makes sure we pass it down for lookup
intents and prepares for per-mount read-only support where we need a struct
vfsmount for checking whether a file is writeable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes almost all inclusions of linux/version.h. The 3
#defines are unused in most of the touched files.
A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.
There are also lots of #ifdef for long obsolete kernels, this was not
touched. In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.
quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When calling target drivers to set frequency, we take cpucontrol lock.
When we modified the code to accomodate CPU hotplug, there was an attempt
to take a double lock of cpucontrol leading to a deadlock. Since the
current thread context is already holding the cpucontrol lock, we dont need
to make another attempt to acquire it.
Now we leave a trace in current->flags indicating current thread already is
under cpucontrol lock held, so we dont attempt to do this another time.
Thanks to Andrew Morton for the beating:-)
From: Brice Goglin <Brice.Goglin@ens-lyon.org>
Build fix
(akpm: this patch is still unpleasant. Ashok continues to look for a cleaner
solution, doesn't he? ;))
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You could open the /proc/sys/net/ipv4/conf/<if>/<whatever> file, then
wait for interface to go away, try to grab as much memory as possible in
hope to hit the (kfreed) ctl_table. Then fill it with pointers to your
function. Then do read from file you've opened and if you are lucky,
you'll get it called as ->proc_handler() in kernel mode.
So this is at least an Oops and possibly more. It does depend on an
interface going away though, so less of a security risk than it would
otherwise be.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Matt Domsch <Matt_Domsch@dell.com>
The patch below implements the Microsoft Point-to-Point Encryption method
as a PPP compressor/decompressor. This is necessary for Linux clients and
servers to interoperate with Microsoft Point-to-Point Tunneling Protocol
(PPTP) servers (either Microsoft PPTP servers or the poptop project) which
use MPPE to encrypt data when creating a VPN.
This patch differs from the kernel_ppp_mppe DKMS pacakge at
pptpclient.sourceforge.net by utilizing the kernel crypto routines rather
than providing its own SHA1 and arcfour implementations.
Minor changes to ppp_generic.c try to prevent a link from disabling
compression (in our case, the encryption) after it has started using
compression (encryption).
Feedback to <pptpclient-devel@lists.sourceforge.net> please.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: James Cameron <james.cameron@hp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch encloses the idr.h header file in
#ifndef __IDR_H__ macro.
Signed-off-by: Luben Tuikov <luben_tuikov@adaptec.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
This patch kills include/linux/eeprom.h .
Rationale:
- it was only used by one single driver
- even this driver didn't do anything useful with it
- most of this file are non-inline and non-static functions (sic)
This removes include/linux/eeprom.h and cleans drivers/net/ns83820.c up.
If you think eeprom.h should be used more extensively, please consider:
- the code has to be moved from the header file to a .c file
- the currently empty write function has to be implemented
- ns83820.c or any other driver should actually use it
Noone did any of these during the more than 3 years eeprom.h already
exists...
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
An unbindable mount does not forward or receive propagation. Also
unbindable mount disallows bind mounts. The semantics is as follows.
Bind semantics:
It is invalid to bind mount an unbindable mount.
Move semantics:
It is invalid to move an unbindable mount under shared mount.
Clone-namespace semantics:
If a mount is unbindable in the parent namespace, the corresponding
cloned mount in the child namespace becomes unbindable too. Note:
there is subtle difference, unbindable mounts cannot be bind mounted
but can be cloned during clone-namespace.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A slave mount always has a master mount from which it receives
mount/umount events. Unlike shared mount the event propagation does not
flow from the slave mount to the master.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An unmount of a mount creates a umount event on the parent. If the
parent is a shared mount, it gets propagated to all mounts in the peer
group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement handling of MS_BIND in presense of shared mounts (see
Documentation/sharedsubtree.txt in the end of patch series for detailed
description).
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This creates shared mounts. A shared mount when bind-mounted to some
mountpoint, propagates mount/umount events to each other. All the
shared mounts that propagate events to each other belong to the same
peer-group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A private mount does not forward or receive propagation. This patch
provides user the ability to convert any mount to private.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the per-namespace semaphore in favor of a global semaphore.
This can have an effect on namespace scalability.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and
a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody doesn anything funny just as we turn quota off
Moreover, LSM provides hooks for doing the same sort of broken logics.
The proper way to deal with that is to introduce the second kind of
reference to vfsmount. Semantics:
- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).
The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...
The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.
quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the ability to the SMU driver to recover missing
calibration partitions from the SMU chip itself. It also adds some
dynamic mecanism to /proc/device-tree so that new properties are visible
to userland.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global function:
- core.c: pnp_remove_device
- #if 0 the following unneeded EXPORT_SYMBOL's:
- card.c: pnp_add_card
- card.c: pnp_remove_card
- card.c: pnp_add_card_device
- card.c: pnp_remove_card_device
- card.c: pnp_add_card_id
- core.c: pnp_register_protocol
- core.c: pnp_unregister_protocol
- core.c: pnp_add_device
- core.c: pnp_remove_device
- pnpacpi/core.c: pnpacpi_protocol
- driver.c: pnp_add_id
- isapnp/core.c: isapnp_read_byte
- manager.c: pnp_auto_config_dev
- resource.c: pnp_register_dependent_option
- resource.c: pnp_register_independent_option
- resource.c: pnp_register_irq_resource
- resource.c: pnp_register_dma_resource
- resource.c: pnp_register_port_resource
- resource.c: pnp_register_mem_resource
Note that this patch #if 0's exactly one functions and removes no
functions. Most it does is the #if 0 of EXPORT_SYMBOL's, so if any modular
code will use any of them, re-adding will be trivial.
Modular ISAPnP might be interesting in some cases, but this is more legacy
code. If someone would work on it to sort all the issues out (starting
with the point that most users of __ISAPNP__ will have to be fixed)
re-enabling the required EXPORT_SYMBOL's won't be hard for him.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This looks like something which out-of-tree code could possibly be using.
Give panic_timeout the twelve-month treatment.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This looks like something which out-of-tree code could possibly be using.
Give insert_resource the twelve-month treatment.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert to proper kernel-doc format.
Some have extra blank lines (not allowed immed. after the function name)
or need blank lines (after all parameters). Function summary must be only
one line.
Colon (":") in a function description does weird things (causes kernel-doc
to think that it's a new section head sadly).
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various warnings in kernel-doc:
Warning(linux-2614-rc4//include/linux/net.h:89): Enum value 'SOCK_DCCP' not described in enum 'sock_type'
usercopy.c: should use !E instead of !I for exported symbols:
Warning(linux-2614-rc4//arch/i386/lib/usercopy.c): no structured comments found
fs.h does not need to use !E since it has no exported symbols:
Warning(linux-2614-rc4//include/linux/fs.h:1182): No description found for parameter 'find_exported_dentry'
Warning(linux-2614-rc4//include/linux/fs.h): no structured comments found
irq/manage.c should use !E for its exported symbols:
Warning(linux-2614-rc4//kernel/irq/manage.c): no structured comments found
macmodes.c should use !E for its exported symbols:
Warning(linux-2614-rc4//drivers/video/macmodes.c): no structured comments found
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add structure fields kernel-doc for 2 fields in struct journal_s.
Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbuf'
Warning(/var/linsrc/linux-2614-rc4//include/linux/jbd.h:808): No description found for parameter 'j_wbufsize'
Convert fs/jbd/recovery.c non-static functions to kernel-doc format.
fs/jbd/recovery.c doesn't export any symbols, so it should use
!I instead of !E to eliminate this warning message:
Warning(/var/linsrc/linux-2614-rc4//fs/jbd/recovery.c): no structured comments found
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add new entries for Mystique AGP with the PCI ID 0x051e.
I don't actually have such boards but according to google they do exist.
Curiosly X.Org doesn't recognize that PCI ID. And what's even more
interesting is that Matrox's own Windows drivers don't recognize it either.
After going through about a dozen different versions I did find one older
driver that does list this particular ID. It is also listed in the pci.ids
file.
I'm not sure if non-220 AGP chips exist. I left the chip revision check
intact for AGP chips nonetheless.
Signed-off-by: Ville Syrjl <syrjala@sci.fi>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add new helper, fb_find_best_display(), which will search the modelist for the
best mode for the attached display. This requires an EDID block that is
converted to struct fb_monspecs and a private modelist. The search will be
done in this manner:
- if 1st detailed timing is preferred, use that
- else if dimensions of the display are known, use that to estimate xres and
- else if modelist has detailed timings, use the first detailed timing
- else, use the very first entry from the modelist
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I converted the "rl" console font from the kbd utility to be a built-in font
for the framebuffer console, and I was wondering if you would be OK with
including it. I've generated a font_rl.c file and related minor
modifications. I find it's the most visually appealing of the kbd fonts which
is why I use it and selected it for conversion. I believe the font is GPL'd.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the fb_find_nearest_mode() function finds a mode with screen
resolution closest to that described by the 'var' argument and with some
arbitrary refresh rate (eg. in the following sequence of refresh rates: 70 60
53 85 75, 53 is selected).
This patch fixes the function so that it looks for the closest mode as far as
both resolution and refresh rate are concerned. The function's first argument
is changed to fb_videomode so that the refresh rate can be specified by the
caller, as fb_var_screeninfo doesn't have any fields that could directly hold
this data.
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
According to Jon Smirl, filling in the field fb_cursor with soft_cursor for
drivers that do not support hardware cursors is redundant. The soft_cursor
function is usable by all drivers because it is just a wrapper around
fb_imageblit. And because soft_cursor is an fbcon-specific hook, the file is
moved to the console directory.
Thus, drivers that do not support hardware cursors can leave the fb_cursor
field blank. For drivers that do, they can fill up this field with their own
version.
The end result is a smaller code size. And if the framebuffer console is not
loaded, module/kernel size is also reduced because the soft_cursor module will
also not be loaded.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are a couple of tests which could possibly be confused by extremely
large numbers appearing in 'xdr' packets. I think the closest to an exploit
you could get would be writing random data from a free page into a file - i.e.
leak data out of kernel space.
I'm fairly sure they cannot be used for remote compromise.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a file in the NFSD filesystem that allows setting and querying of
which version of NFS are being exported. Changes are only allowed while no
server is running.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updates the RIO messaging interface to pass a device instance into the
event registeration and callbacks.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Addresses issues raised with the 2.6.12-rc6-mm1 RIO support. Fix dma_mask
init, shrink some code, general cleanup.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add RapidIO core include files.
The core code implements enumeration/discovery, management of
devices/resources, and interfaces for RIO drivers.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reorganize the preempt_disable/enable calls to eliminate the extra preempt
depth. Changes based on Paul McKenney's review suggestions for the kprobes
RCU changeset.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes to the base kprobes infrastructure to use RCU for synchronization
during kprobe registration and unregistration. These changes coupled with the
arch kprobe changes (next in series):
a. serialize registration and unregistration of kprobes.
b. enable lockless execution of handlers. Handlers can now run in parallel.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes to the base kprobe infrastructure to track kprobe execution on a
per-cpu basis.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is rather large, but it really can't be done in smaller chunks
easily and I believe it is an important change. This has been out and tested
for a while in the latest IPMI driver release. There are no functional
changes, just changes as necessary to convert the locking over (and a few
minor style updates).
The IPMI driver uses read/write locks to ensure that things exist while they
are in use. This is bad from a number of points of view. This patch removes
the rwlocks and uses refcounts and RCU lists to manage what the locks did.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch passes the file handle supplied in iattr to userspace, in case the
->setattr() was invoked from sys_ftruncate(). This solves the permission
checking (or lack thereof) in ftruncate() for the class of filesystems served
by an unprivileged userspace process.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds an atomic create+open operation. This does not yet work if
the file type changes between lookup and create+open, but solves the
permission checking problems for the separte create and open methods.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new access call, which will only be called if ->permission is invoked
from sys_access(). In all other cases permission checking is delayed until
the actual filesystem operation.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Though the following changes are all backward compatible (from the kernel's as
well as the library's POV) change the minor version, so interested
applications can detect new features.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch extends the iattr structure with a file pointer memeber, and adds
an ATTR_FILE validity flag for this member.
This is set if do_truncate() is invoked from ftruncate() or from
do_coredump().
The change is source and binary compatible.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The sys_ptrace boilerplate code (everything outside the big switch
statement for the arch-specific requests) is shared by most architectures.
This patch moves it to kernel/ptrace.c and leaves the arch-specific code as
arch_ptrace.
Some architectures have a too different ptrace so we have to exclude them.
They continue to keep their implementations. For sh64 I had to add a
sh64_ptrace wrapper because it does some initialization on the first call.
For um I removed an ifdefed SUBARCH_PTRACE_SPECIAL block, but
SUBARCH_PTRACE_SPECIAL isn't defined anywhere in the tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-By: David Howells <dhowells@redhat.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- "extern inline" -> "static inline"
- every file should #include the headers containing the prototypes for
it's global functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AIO was adding a new context's max requests to the global total before
testing if that resulting total was over the global limit. This let
innocent tasks get their new limit tested along with a racing guilty task
that was crossing the limit. This serializes the _nr accounting with a
spinlock It also switches to using unsigned long for the global totals.
Individual contexts are still limited to an unsigned int's worth of
requests by the syscall interface.
The problem and fix were verified with a simple program that spun creating
and destroying a context while holding on to another long lived context.
Before the patch a task creating a tiny context could get a spurious EAGAIN
if it raced with a task creating a very large context that overran the
limit.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reiser4 uses radix trees to solve a trouble reiser4_readdir has serving nfs
requests.
Unfortunately, radix tree api lacks an operation suitable for modifying
existing entry. This patch adds radix_tree_lookup_slot which returns pointer
to found item within the tree. That location can be then updated.
Both Nick and Christoph Lameter have patches which need this as well.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a few comments surrounding the generic readahead API.
Also convert some ulongs into pgoff_t: the identifier for PAGE_CACHE_SIZE
offsets into pagecache.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add SHM_NORESERVE functionality similar to MAP_NORESERVE for shared memory
segments.
This is mainly to avoid abuse of OVERCOMMIT_ALWAYS and this flag is ignored
for OVERCOMMIT_NEVER.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the hlist_for_each_rcu() API, which is used only in one place, and
is trivially converted to hlist_for_each_entry_rcu(), making the code
shorter and more readable. Any out-of-tree uses may be similarly
converted.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a connector that reports fork, exec, id change, and exit
events for all processes to userspace. It replaces the fork_advisor patch
that ELSA is currently using. Applications that may find these events
useful include accounting/auditing (e.g. ELSA), system activity monitoring
(e.g. top), security, and resource management (e.g. CKRM).
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This extends the API somewhat to allow for platform-specific VCR reading and
writing. Some platforms (like SH4-202) implement the VCR in a split VCRL and
VCRH, but end up being in reverse order or have other quirks that need to be
dealt with, so we add a set of superhyway_ops per-bus to accomodate this.
We also have to extend the per-device resources somewhat, as some devices now
conveniently split control and data blocks. So we allow a platform to
register its set of SuperHyway devices via superhyway_add_devices() with the
control block always ordered as the first resource (as this is the one that
userspace cares about).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames struct kmem_cache_s to kmem_cache so we can start using
it instead of kmem_cache_t typedef.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If they get inlined into non __xipram functions we're screwed.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add density mask for better support of DDP chips.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simple bad block table source and header files
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simplify the debugging code further.
Update the TODO list
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The changes introduced allow to suspend/resume NAND flash.
A new state (FL_PM_SUSPENDED) is introduced, as well as
routines for mtd->suspend and mtd->resume to put the flash in
suspended state from software pov.
Signed-off-by: Vitaly Wool <vwool@ru.mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The goal of summary is to speed up the mount time. Erase block summary (EBS)
stores summary information at the end of every (closed) erase block. It is
no longer necessary to scan all nodes separetly (and read all pages of them)
just read this "small" summary, where every information is stored which is
needed at mount time.
This summary information is stored in a JFFS2_FEATURE_RWCOMPAT_DELETE. During
the mount process if there is no summary info the orignal scan process will
be executed. EBS works with NAND and NOR flashes, too.
There is a user space tool called sumtool to generate this summary
information for a JFFS2 image.
Signed-off-by: Ferenc Havasi <havasi@inf.u-szeged.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- Update OMAP OneNAND mapping file using device driver model
- Remove board specific macro and values.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add OneNAND Sync. Burst Read support
Tested with OMAP platform
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
OneNAND is a new flash technology from Samsung with integrated SRAM
buffers and logic interface.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This updates the Primary Vendor-Specific Extended Query parsing to
version 1.4 in order to get the information about the Configurable
Programming Mode regions implemented in the Sibley flash, as well as
selecting the appropriate write command code.
This flash does not behave like traditional NOR flash when writing data.
While mtdblock should just work, further changes are needed for JFFS2 use.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Adds some new PCI IDs for new IPR adapters.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- Update raid class to use nested classes for raid components (this will
allow us to move to a component control model now)
- Make the raid level an enumeration rather than and int.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Rename functions to a name matching the functionality.
Remove stall debug code
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
JFFS2 uses f->dents to store the pointer to the symlink target string (in case
the inode is symlink). This is somewhat ugly to use the same field for
different reasons. Introduce distinct field f->target for this purpose.
Note, f->fragtree, f->dents, f->target may probably be put in a union.
Signed-off-by: Artem B. Bityutskiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Include autoconf.h into every kernel compilation via the gcc command line
using -imacros. This ensures that we have the kernel configuration
included from the start, rather than relying on each file having #include
<linux/config.h> as appropriate. History has shown that this is something
which is difficult to get right.
Since we now include the kernel configuration automatically, make
configcheck becomes meaningless, so remove it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
The offsets of the registers are in a different place, and
some parts cannot handle a full set of modem control signals.
Signed-off-by: Pantelis Antoniou <pantelis@embeddedalley.ocm>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add 5708 copper and serdes basic support, including 2.5 Gbps support
on 5708 serdes. SPEED_2500 is also added to ethtool.h
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change netem to support packets getting reordered because of variations in
delay. Introduce a special case version of FIFO that queues packets in order
based on the netem delay.
Since netem is classful, those users that don't want jitter based reordering
can just insert a pfifo instead of the default.
This required changes to generic skbuff code to allow finer grain manipulation
of sk_buff_head. Insertion into the middle and reverse walk.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Re-jig the simple platform device support to allow private data
to be attached to a platform device, as well as allowing the
parent device to be set.
Example usage:
pdev = platform_device_alloc("mydev", id);
if (pdev) {
err = platform_device_add_resources(pdev, &resources,
ARRAY_SIZE(resources));
if (err == 0)
err = platform_device_add_data(pdev, &platform_data,
sizeof(platform_data));
if (err == 0)
err = platform_device_add(pdev);
} else {
err = -ENOMEM;
}
if (err)
platform_device_put(pdev);
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.
This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.
The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Adds a new u8 flags in a unused padding area of the netlink
message. Adds ECN marking support to be used instead of dropping
packets immediately.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Removes unnecessary includes, initializers, and simplifies
the code a bit.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Adds a phy_mask field to struct mii_bus and uses it. This field
indicates each phy address to be ignored when probing the mdio bus.
This support is needed for the fs_enet and ibm_emac drivers to be
converted to the generic phy layer among other drivers. Many systems
lock up on probing certain phy addresses or probing doesn't return
0xffff when nothing is found at the address. A new driver I'm
working on also makes use of this mask.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Use ata_pad_{alloc,free} in two drivers, to factor out common code.
Add ata_pad_{alloc,free} to two other drivers, which needed the padding
but had not been updated.
This adds support for the Nvidia Geforce 7800 series of cards to the
nvidiafb framebuffer driver. All it does is add the PCI device id for
the 7800, 7800 GTX, 7800 GO, and 7800 GTX GO cards to the module device
table for the nvidiafb.ko driver, so that nvidiafb.ko will actually work
on these cards.
I also added the relevant PCI device ids to linux/pci_ids.h
I tested it on my 7800 GTX here and it works like a charm. I now can
get framebuffer support on this card! Woo hoo!! Nothing like 200x75 text
mode to make your eyes BLEED. ;)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny
bits specified must be exactly equal to the union of the share_access and
share_deny bits specified for some subset of the OPENs in effect for
current openowner on the current file.
Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that
it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to
OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file
with O_WRONLY access mode.
Fix the problem by replacing nfs4_find_state() with a modified version of
nfs_find_open_context().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do not register statically allocated input devices to prevent
OOPS when attaching input interfaces since it requires class
device to be properly initialized.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This is now used to issue a delayed allocation flush before reporting
quota, which allows the used space quota report to match reality.
Signed-off-by: Nathan Scott <nathans@sgi.com>
Fix up etherdevice docbook comments and make them (and other networking stuff)
get dragged into the kernel-api. Delete the old 8390 stuff, it really isn't
interesting anymore.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Optimize the match for broadcast address by using bit operations instead
of comparison. This saves a number of conditional branches, and generates
smaller code.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).
Signed-off-by: Jens Axboe <axboe@suse.de>
Like ip_tables already has it for some time, this adds support for
having multiple revisions for each match/target. We steal one byte from
the name in order to accomodate a 8 bit version number.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch simplifies some checks for magic siginfo values. It should not
change the behaviour in any way.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplify the UP (1 CPU) implementatin of set_cpus_allowed.
The one CPU is hardcoded to be cpu 0 - so just test for that bit, and avoid
having to pick up the cpu_online_map.
Also, unexport cpu_online_map: it was only needed for set_cpus_allowed().
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is a rewrite of the one submitted on October 1st, using modules
(http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).
This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
intense torture test of the RCU infratructure. This is needed due to the
continued changes to the RCU infrastructure to accommodate dynamic ticks,
CPU hotplug, realtime, and so on. Most of the code is in a separate file
that is compiled only if the CONFIG variable is set. Documentation on how
to run the test and interpret the output is also included.
This code has been tested on i386 and ppc64, and an earlier version of the
code has received extensive testing on a number of architectures as part of
the PREEMPT_RT patchset.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparse complains about every MODULE_PARM used in a module: warning: symbol
'__parm_foo' was not declared. Should it be static?
The fix is to split declaration and initialization. While MODULE_PARM is
obsolete, it's not something sparse should report.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Setting ctime is implicit in all setattr cases, so the FATTR_CTIME
definition is unnecessary.
It is used by neither the kernel nor by userspace.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch adds LSM hooks for key management facilities. The notable
changes are:
(1) The key struct now supports a security pointer for the use of security
modules. This will permit key labelling and restrictions on which
programs may access a key.
(2) Security modules get a chance to note (or abort) the allocation of a key.
(3) The key permission checking can now be enhanced by the security modules;
the permissions check consults LSM if all other checks bear out.
(4) The key permissions checking functions now return an error code rather
than a boolean value.
(5) An extra permission has been added to govern the modification of
attributes (UID, GID, permissions).
Note that there isn't an LSM hook specifically for each keyctl() operation,
but rather the permissions hook allows control of individual operations based
on the permission request bits.
Key management access control through LSM is enabled by automatically if both
CONFIG_KEYS and CONFIG_SECURITY are enabled.
This should be applied on top of the patch ensubjected:
[PATCH] Keys: Possessor permissions should be additive
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch automatically updates a tasks NUMA mempolicy when its cpuset
memory placement changes. It does so within the context of the task,
without any need to support low level external mempolicy manipulation.
If a system is not using cpusets, or if running on a system with just the
root (all-encompassing) cpuset, then this remap is a no-op. Only when a
task is moved between cpusets, or a cpusets memory placement is changed
does the following apply. Otherwise, the main routine below,
rebind_policy() is not even called.
When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
essential role of cpusets is to place jobs (several related tasks) on a set
of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
manage a jobs processor placement within its allowed cpuset, and the
essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs
memory placement within its allowed cpuset.
However, CPU affinity and NUMA memory placement are managed within the
kernel using absolute system wide numbering, not cpuset relative numbering.
This is ok until a job is migrated to a different cpuset, or what's the
same, a jobs cpuset is moved to different CPUs and Memory Nodes.
Then the CPU affinity and NUMA memory placement of the tasks in the job
need to be updated, to preserve their cpuset-relative position. This can
be done for CPU affinity using sched_setaffinity() from user code, as one
task can modify anothers CPU affinity. This cannot be done from an
external task for NUMA memory placement, as that can only be modified in
the context of the task using it.
However, it easy enough to remap a tasks NUMA mempolicy automatically when
a task is migrated, using the existing cpuset mechanism to trigger a
refresh of a tasks memory placement after its cpuset has changed. All that
is needed is the old and new nodemask, and notice to the task that it needs
to rebind its mempolicy. The tasks mems_allowed has the old mask, the
tasks cpuset has the new mask, and the existing
cpuset_update_current_mems_allowed() mechanism provides the notice. The
bitmap/cpumask/nodemask remap operators provide the cpuset relative
calculations.
This patch leaves open a couple of issues:
1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
These mempolicies may reference nodes outside of those allowed to
the current task by its cpuset. Tasks are migrated as part of jobs,
which reside on what might be several cpusets in a subtree. When such
a job is migrated, all NUMA memory policy references to nodes within
that cpuset subtree should be translated, and references to any nodes
outside that subtree should be left untouched. A future patch will
provide the cpuset mechanism needed to mark such subtrees. With that
patch, we will be able to correctly migrate these other memory policies
across a job migration.
2) Updating cpuset, affinity and memory policies in user space:
This is harder. Any placement state stored in user space using
system-wide numbering will be invalidated across a migration. More
work will be required to provide user code with a migration-safe means
to manage its cpuset relative placement, while preserving the current
API's that pass system wide numbers, not cpuset relative numbers across
the kernel-user boundary.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the forthcoming task migration support, a key calculation will be
mapping cpu and node numbers from the old set to the new set while
preserving cpuset-relative offset.
For example, if a task and its pages on nodes 8-11 are being migrated to
nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be
moved to node 25 (the 2nd node in the new set.)
As with other bitmap operations, the proper way to code this is to provide
the underlying calculation in lib/bitmap.c, and then to provide the usual
cpumask and nodemask wrappers.
This patch provides that. These operations are termed 'remap' operations.
Both remapping a single bit and a set of bits is supported.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overhaul cpuset locking. Replace single semaphore with two semaphores.
The suggestion to use two locks was made by Roman Zippel.
Both locks are global. Code that wants to modify cpusets must first
acquire the exclusive manage_sem, which allows them read-only access to
cpusets, and holds off other would-be modifiers. Before making actual
changes, the second semaphore, callback_sem must be acquired as well. Code
that needs only to query cpusets must acquire callback_sem, which is also a
global exclusive lock.
The earlier problems with double tripping are avoided, because it is
allowed for holders of manage_sem to nest the second callback_sem lock, and
only callback_sem is needed by code called from within __alloc_pages(),
where the double tripping had been possible.
This is not quite the same as a normal read/write semaphore, because
obtaining read-only access with intent to change must hold off other such
attempts, while allowing read-only access w/o such intention. Changing
cpusets involves several related checks and changes, which must be done
while allowing read-only queries (to avoid the double trip), but while
ensuring nothing changes (holding off other would be modifiers.)
This overhaul of cpuset locking also makes careful use of task_lock() to
guard access to the task->cpuset pointer, closing a couple of race
conditions noticed while reading this code (thanks, Roman). I've never
seen these races fail in any use or test.
See further the comments in the code.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the recent timer rework we lost the check for an add_timer() of an
already-pending timer. That check was useful for networking, so put it back.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure we always return, as all syscalls should. Also move the common
prototype to <linux/syscalls.h>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lock is used in sigqueue_free(), but it is always equal to
current->sighand->siglock, so we don't need to keep it in the struct
sigqueue.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
file_free_rcu) time, to reduce the size of this critical kernel object.
The trick I used is to move f_rcuhead and f_list in an union called f_u
The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
becomes f_u.f_list
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove timer_list.magic and associated debugging code.
I originally added this when a spinlock was added to timer_list - this meant
that an all-zeroes timer became illegal and init_timer() was required.
That spinlock isn't even there any more, although timer.base must now be
initialised.
I'll keep this debugging code in -mm.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create a macro shift_right() that avoids the numerous ugly conditionals in the
NTP code that look like:
if(a < 0)
b = -(-a >> shift);
else
b = a >> shift;
Replacing it with:
b = shift_right(a, shift);
This should have zero effect on the logic, however it should probably have
a bit of testing just to be sure.
Also replace open-coded min/max with the macros.
Signed-off-by : John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enhance the kthread API by adding kthread_stop_sem, for use in stopping
threads that spend their idle time waiting on a semaphore.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every user of init_timer() also needs to initialize ->function and ->data
fields. This patch adds a simple setup_timer() helper for that.
The schedule_timeout() is patched as an example of usage.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the problem (BUG 4964) with unmapped buffers in transaction's
t_sync_data list. The problem is we need to call filesystem's own
invalidatepage() from block_write_full_page().
block_write_full_page() must call filesystem's invalidatepage(). Otherwise
following nasty race can happen:
proc 1 proc 2
------ ------
- write some new data to 'offset'
=> bh gets to the transactions data list
- starts truncate
=> i_size set to new size
- mpage_writepages()
- ext3_ordered_writepage() to 'offset'
- block_write_full_page()
- page->index > end_index+1
- block_invalidatepage()
- discard_buffer()
- clear_buffer_mapped()
- commit triggers and finds unmapped buffer - BOOM!
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add pm_ops.valid callback, so only the available pm states show in
/sys/power/state. And this also makes an earlier states error report at
enter_state before we do actual suspend/resume.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Acked-by: Pavel Machek<pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch makes swsusp use the PG_nosave and PG_nosave_free flags to
mark pages that should be freed in case of an error during resume.
This allows us to simplify the code and to use swsusp_free() in all of the
swsusp's resume error paths, which makes them actually work.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch moves the functionality of swsusp related to creating and
handling the snapshot of memory to a separate file, snapshot.c
This should enable us to untangle the code in the future and eventually to
implement some parts of swsusp.c in the user space.
The patch does not change the code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some modules creating sysfs entries under /sys/devices/system/cpu/cpuX/
need to know the parent sysfs entry to make devices under them. This will
just return the sysfs entry for a given cpu.
sysfs entries showing under each cpu sysfs can be easily created if such
entries can be created by registering a sysfs driver for cpuclass. The
issue is when the entry is created the CPU may not be online, hence we
would need to defer the creation until the online notification comes.
Current users: cache entries for Intel CPU's and cpufreq subsystem.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows SELinux to canonicalize the value returned from
getxattr() via the security_inode_getsecurity() hook, which is called after
the fs level getxattr() function.
The purpose of this is to allow the in-core security context for an inode
to override the on-disk value. This could happen in cases such as
upgrading a system to a different labeling form (e.g. standard SELinux to
MLS) without needing to do a full relabel of the filesystem.
In such cases, we want getxattr() to return the canonical security context
that the kernel is using rather than what is stored on disk.
The implementation hooks into the inode_getsecurity(), adding another
parameter to indicate the result of the preceding fs-level getxattr() call,
so that SELinux knows whether to compare a value obtained from disk with
the kernel value.
We also now allow getxattr() to work for mountpoint labeled filesystems
(i.e. mount with option context=foo_t), as we are able to return the
kernel value to the user.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add CONFIG_X86_32 for i386. This allows selecting options that only apply
to 32-bit systems.
(X86 && !X86_64) becomes X86_32
(X86 || X86_64) becomes X86
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The second argument to ata_qc_complete() was being used for two
purposes: communicate the ATA Status register to the completion
function, and indicate an error. On legacy PCI IDE hardware, the latter
is often implicit in the former. On more modern hardware, the driver
often completely emulated a Status register value, passing ATA_ERR as an
indication that something went wrong.
Now that previous code changes have eliminated the need to use drv_stat
arg to communicate the ATA Status register value, we can convert it to a
mask of possible error classes.
This will lead to more flexible error handling in the future.
This adds generic memory add/remove and supporting functions for memory
hotplug into a new file as well as a memory hotplug kernel config option.
Individual architecture patches will follow.
For now, disable memory hotplug when swsusp is enabled. There's a lot of
churn there right now. We'll fix it up properly once it calms down.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
See the "fixup bad_range()" patch for more information, but this actually
creates a the lock to protect things making assumptions about a zone's size
staying constant at runtime.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pgdat->node_size_lock is basically only neeeded in one place in the normal
code: show_mem(), which is the arch-specific sysrq-m printing function.
Strictly speaking, the architectures not doing memory hotplug do no need this
locking in show_mem(). However, they are all included for completeness. This
should also make any future consolidation of all of the implementations a
little more straightforward.
This lock is also held in the sparsemem code during a memory removal, as
sections are invalidated. This is the place there pfn_valid() is made false
for a memory area that's being removed. The lock is only required when doing
pfn_valid() operations on memory which the user does not already have a
reference on the page, such as in show_mem().
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A little helper that we use in the hotplug code.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updated several references to page_table_lock in common code comments.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of oddities were guarded by page_table_lock, no longer properly
guarded when that is split.
The mm_counters of file_rss and anon_rss: make those an atomic_t, or an
atomic64_t if the architecture supports it, in such a case. Definitions by
courtesy of Christoph Lameter: who spent considerable effort on more scalable
ways of counting, but found insufficient benefit in practice.
And adding an mm with swap to the mmlist for swapoff: the list is well-
guarded by its own lock, but the list_empty check now has to be repeated
inside it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Final step in pushing down common core's page_table_lock. follow_page no
longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself;
and so no page_table_lock is taken in get_user_pages itself.
But get_user_pages (and get_futex_key) do then need follow_page to pin the
page for them: take Daniel's suggestion of bitflags to follow_page.
Need one for WRITE, another for TOUCH (it was the accessed flag before:
vanished along with check_user_page_readable, but surely get_numa_maps is
wrong to mark every page it finds as accessed), another for GET.
And another, ANON to dispose of untouched_anonymous_page: it seems silly for
that to descend a second time, let follow_page observe if there was no page
table and return ZERO_PAGE if so. Fix minor bug in that: check VM_LOCKED -
make_pages_present ought to make readonly anonymous present.
Give get_numa_maps a cond_resched while we're there.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
check_user_page_readable is a problematic variant of follow_page. It's used
only by oprofile's i386 and arm backtrace code, at interrupt time, to
establish whether a userspace stackframe is currently readable.
This is problematic, because we want to push the page_table_lock down inside
follow_page, and later split it; whereas oprofile is doing a spin_trylock on
it (in the i386 case, forgotten in the arm case), and needs that to pin
perhaps two pages spanned by the stackframe (which might be covered by
different locks when we split).
I think oprofile is going about this in the wrong way: it doesn't need to know
the area is readable (neither i386 nor arm uses read protection of user
pages), it doesn't need to pin the memory, it should simply
__copy_from_user_inatomic, and see if that succeeds or not. Sorry, but I've
not got around to devising the sparse __user annotations for this.
Then we can eliminate check_user_page_readable, and return to a single
follow_page without the __follow_page variants.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
rmap's page_check_address descend without page_table_lock. First just
pte_offset_map in case there's no pte present worth locking for, then take
page_table_lock for the full check, and pass ptl back to caller in the same
style as pte_offset_map_lock. __xip_unmap, page_referenced_one and
try_to_unmap_one use pte_unmap_unlock. try_to_unmap_cluster also.
page_check_address reformatted to avoid progressive indentation. No use is
made of its one error code, return NULL when it fails.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the page_table_lock from around the calls to unmap_vmas, and replace
the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
now safe to descend without page_table_lock.
Don't attempt fancy locking for hugepages, just take page_table_lock in
unmap_hugepage_range. Which makes zap_hugepage_range, and the hugetlb test in
zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway. Nor
does unmap_vmas have much use for its mm arg now.
The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
page_table_lock: if they're implemented at all, they typically come down to
flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
(which we already audited for the mprotect case).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Second step in pushing down the page_table_lock. Remove the temporary
bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
to hold page_table_lock, whether it's on init_mm or a user mm; take
page_table_lock internally to check if a racing task already allocated.
Convert their callers from common code. But avoid coming back to change them
again later: instead of moving the spin_lock(&mm->page_table_lock) down,
switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
encapsulate the mapping+locking and unlocking+unmapping together, and in the
end may use alternatives to the mm page_table_lock itself.
These callers all hold mmap_sem (some exclusively, some not), so at no level
can a page table be whipped away from beneath them; and pte_alloc uses the
"atomic" pmd_present to test whether it needs to allocate. It appears that on
all arches we can safely descend without page_table_lock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems odd to me that, whereas pud_alloc and pmd_alloc test inline, only
calling out-of-line __pud_alloc __pmd_alloc if allocation needed,
pte_alloc_map and pte_alloc_kernel are entirely out-of-line. Though it does
add a little to kernel size, change them to macros testing inline, calling
__pte_alloc or __pte_alloc_kernel to allocate out-of-line. Mark none of them
as fastcalls, leave that to CONFIG_REGPARM or not.
It also seems more natural for the out-of-line functions to leave the offset
calculation and map to the inline, which has to do it anyway for the common
case. At least mremap move wants __pte_alloc without _map.
Macros rather than inline functions, certainly to avoid the header file issues
which arise from CONFIG_HIGHPTE needing kmap_types.h, but also in case any
architectures I haven't built would have other such problems.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
First step in pushing down the page_table_lock. init_mm.page_table_lock has
been used throughout the architectures (usually for ioremap): not to serialize
kernel address space allocation (that's usually vmlist_lock), but because
pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
and drop it when allocating a new one, to check lest a racing task already
did. Similarly no page_table_lock in vmalloc's map_vm_area.
Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
user mms, which are converted only by a later patch, for now they have to lock
differently according to whether or not it's init_mm.
If sources get muddled, there's a danger that an arch source taking
init_mm.page_table_lock will be mixed with common source also taking it (or
neither take it). So break the rules and make another change, which should
break the build for such a mismatch: remove the redundant mm arg from
pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
took page_table_lock for no good reason.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia64 has expand_backing_store function for growing its Register Backing Store
vma upwards. But more complete code for this purpose is found in the
CONFIG_STACK_GROWSUP part of mm/mmap.c. Uglify its #ifdefs further to provide
expand_upwards for ia64 as well as expand_stack for parisc.
The Register Backing Store vma should be marked VM_ACCOUNT. Implement the
intention of growing it only a page at a time, instead of passing an address
outside of the vma to handle_mm_fault, with unknown consequences.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Slight and timid rearrangement of mm_struct: hiwater_rss and hiwater_vm were
tacked on the end, but it seems better to keep them near _file_rss, _anon_rss
and total_vm, in the same cacheline on those arches verified.
There are likely to be more profitable rearrangements, but less obvious (is it
good or bad that saved_auxv[AT_VECTOR_SIZE] isolates cpu_vm_mask and context
from many others?), needing serious instrumentation.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability. Originally it was called whenever rss or
total_vm got raised. Then many of those callsites were replaced by a timer
tick call from account_system_time. Now Frank van Maarseveen reports that to
be found inadequate. How about this? Works for Frank.
Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths. Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit. Handle
mm->hiwater_vm in the same way, though it's much less of an issue. Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.
And there has been no collector of these hiwater statistics in the tree. The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).
There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high. A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.
What locking? None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact. But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PageReserved() calls from core code by tightening VM_RESERVED
handling in mm/ to cover PageReserved functionality.
PageReserved special casing is removed from get_page and put_page.
All setting and clearing of PageReserved is retained, and it is now flagged
in the page_alloc checks to help ensure we don't introduce any refcount
based freeing of Reserved pages.
MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
deprecated. We never completely handled it correctly anyway, and is be
reintroduced in future if required (Hugh has a proof of concept).
Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
be trivially removed.
Last real user of PageReserved is swsusp, which uses PageReserved to
determine whether a struct page points to valid memory or not. This still
needs to be addressed (a generic page_is_ram() should work).
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
thus mapcounted and count towards shared rss). These writes to the struct
page could cause excessive cacheline bouncing on big systems. There are a
number of ways this could be addressed if it is an issue.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Refcount bug fix for filemap_xip.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I was lazy when we added anon_rss, and chose to change as few places as
possible. So currently each anonymous page has to be counted twice, in rss
and in anon_rss. Which won't be so good if those are atomic counts in some
configurations.
Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics. And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Divide remove_vm_struct into two parts: first anon_vma_unlink plus
unlink_file_vma, to unlink the vma from the list and tree by which rmap or
vmtruncate might find it; then remove_vma to close, fput and free.
The intention here is to do the anon_vma_unlink and unlink_file_vma earlier,
in free_pgtables before freeing any page tables: so we can be sure that any
page tables traversed by rmap and vmtruncate are stable (and other, ordinary
cases are stabilized by holding mmap_sem).
This will be crucial to traversing pgd,pud,pmd without page_table_lock. But
testing the split-out patch showed that lifting the page_table_lock is
symbiotically necessary to make this change - the lock ordering is wrong to
move those unlinks into free_pgtables while it's under ptlock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The original vm_stat_account has fallen into disuse, with only one user, and
only one user of vm_stat_unaccount. It's easier to keep track if we convert
them all to __vm_stat_account, then free it from its __shackles.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NUMA policy code predated nodemask_t so it used open coded bitmaps.
Convert everything to nodemask_t. Big patch, but shouldn't have any actual
behaviour changes (except I removed one unnecessary check against
node_online_map and one unnecessary BUG_ON)
Signed-off-by: "Andi Kleen" <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add sem_is_read/write_locked functions to the read/write semaphores, along the
same lines of the *_is_locked spinlock functions. The swap token tuning patch
uses sem_is_read_locked; sem_is_write_locked is added for completeness.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds
vmalloc_node(size, node) -> Allocate necessary memory on the specified node
and
get_vm_area_node(size, flags, node)
and the other functions that it depends on.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sg_init_one is a nice tool for the block layer. However, users
of struct scatterlist in other subsystems don't usually need the
DMA attributes. For them it's a waste of time and space to
initialise the whole struct scatterlist structure.
Therefore this patch adds a new function sg_set_buf to initialise
a scatterlist without zeroing the DMA attributes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- converted to platform bus
- removed pci dependencies
- removed virt_to_phys/phys_to_virt calls
System now can root off of a disk.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README
new file mode 100644
Convert everyone who uses platform_bus_type to include
linux/platform_device.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
zutil.h does not need errno.h
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch updates the 85xx platform code to support the new PHY Layer.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Kumar Gala <Kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Expose faster ether compare for use by protocols and other
driver. And change name to be more consistent with other ether
address manipulation routines in same file
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
With CONFIG_PM=n:
drivers/built-in.o(.text+0x1098c): In function `hub_thread':
drivers/usb/core/hub.c:2673: undefined reference to `.dpm_runtime_resume'
drivers/built-in.o(.text+0x10998):drivers/usb/core/hub.c:2674: undefined reference to `.dpm_runtime_resume'
Please, never ever ever put extern decls into .c files. Use the darn header
files :(
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as592) makes a few small improvements to the way device
strings are handled, and it fixes some bugs in a couple of other sysfs
attribute routines. (Look at show_configuration_string() to see what I
mean.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I can't stand text lines that wrap-around in my 80-column windows. This
patch (as589) makes cosmetic changes to a couple of source files.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This revised patch (as587b) improves the implementation of USB endpoint
sysfs files. Instead of storing a whole bunch of attributes for every
single endpoint, each endpoint now gets its own kobject and they can
share a static list of attributes. The number of extra fields added to
struct usb_host_endpoint has been reduced from 4 to 1.
The bEndpointAddress field is retained even though it is redundant (it
repeats the same information as the attributes' directory name). The
code avoids calling kobject_register, to prevent generating unwanted
hotplug events.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This revised patch (as586b) makes usb-handoff permanently true and no
longer a kernel boot parameter. It also removes the piix3_usb quirk code;
that was nothing more than an early version of the USB handoff code
(written at a time when Intel's PIIX3 was about the only motherboard with
USB support). And it adds identifiers for the three PCI USB controller
classes to pci_ids.h.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This should let us get rid of all of the different hooks in the USB core for
when something has changed.
Also, some other parts of the kernel have wanted to know this kind of
information at times.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a USB device is put into suspend mode, the current drawn from VBUS
has to be less than 500 uA. Some transceivers need to be put into a
special power-saving mode to accomplish this, and won't have a separate
OTG driver handling that.
This adds a suspend method to the "otg_transceiver" struct -- misnamed,
it's not only for OTG -- and calls it from the OMAP UDC driver.
Signed-off-by: Juha Yrj?l? <juha.yrjola@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The way we're looking at USB suspend lately doesn't expect drivers to
call usb_suspend_device() or usb_resume_device() directly; that'll
be implicit when no interfaces are in use.
This patch removes those APIs from visibility outside usbcore.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
drivers/usb/core/hub.c | 12 ++++--------
drivers/usb/core/usb.h | 4 ++++
include/linux/usb.h | 5 -----
3 files changed, 8 insertions(+), 13 deletions(-)
This saves a word from "struct device" ... there's a refcounting mechanism
stub that's rather ineffective (the values are never even tested!), which
can safely be deleted. With this patch it uses normal device refcounting,
so any potential users of the pm_parent mechanism will be more correct.
(That mechanism is actually unusable for now though; it does nothing.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/base/power/main.c | 26 +++-----------------------
include/linux/pm.h | 1 -
2 files changed, 3 insertions(+), 24 deletions(-)
This patch removes the extra usb_suspend_device() parameter. The original
reason to pass that parameter was so that this routine could suspend any
active children. A previous patch removed that functionality ... leaving
no reason to pass the parameter. A close analogy is pci_set_power_state,
which doesn't need a pm_message_t either.
On the internal code path that comes through the driver model, the parameter
is now used to distinguish cases where USB devices need to "freeze" but not
suspend. It also checks for an error case that's accessible through sysfs:
attempting to suspend a device before its interfaces (or for hubs, ports).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/usb/core/hub.c | 34 +++++++++++++++++++++-------------
drivers/usb/core/usb.c | 23 +++++++++++++++++++++--
drivers/usb/host/ehci-hcd.c | 2 +-
drivers/usb/host/isp116x-hcd.c | 2 +-
drivers/usb/host/ohci-pci.c | 2 +-
include/linux/usb.h | 2 +-
6 files changed, 46 insertions(+), 19 deletions(-)
This patch adds endpoint information for both devices and interfaces to
sysfs. Previously it was only possible to get the endpoint information
from usbfs, and never possible to get any information on endpoint 0.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/usb/core/sysfs.c | 195 ++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/usb.h | 4
2 files changed, 197 insertions(+), 2 deletions(-)
I told you that the pci_ids.h cleanup was a bad idea ;)
Cc: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
pci_ids.h cleanup: remove duplicated entries and change some defines to
explicit value rather than in terms of another constant, preparation for
removing unused symbols
Signed-off-by: Grant Coady <gcoady@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
include/linux/pci_ids.h | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
Some PCI adapters (eg. ipr scsi adapters) have an exposure today in that they
issue BIST to the adapter to reset the card. If, during the time it takes to
complete BIST, userspace attempts to access PCI config space, the host bus
bridge will master abort the access since the ipr adapter does not respond on
the PCI bus for a brief period of time when running BIST. On PPC64 hardware,
this master abort results in the host PCI bridge isolating that PCI device
from the rest of the system, making the device unusable until Linux is
rebooted. This patch is an attempt to close that exposure by introducing some
blocking code in the PCI code. When blocked, writes will be humored and reads
will return the cached value. Ben Herrenschmidt has also mentioned that he
plans to use this in PPC power management.
Signed-off-by: Brian King <brking@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/pci/access.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci-sysfs.c | 20 +++++-----
drivers/pci/pci.h | 7 +++
drivers/pci/proc.c | 28 +++++++--------
drivers/pci/syscall.c | 14 +++----
include/linux/pci.h | 7 +++
6 files changed, 134 insertions(+), 31 deletions(-)
The new SMBus PEC implementation doesn't support PEC emulation on
non-PEC non-I2C SMBus masters, so we can drop all related code.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is my rewrite of the SMBus PEC support. The original
implementation was known to have bugs (credits go to Hideki Iwamoto
for reporting many of them recently), and was incomplete due to a
conceptual limitation.
The rewrite affects only software PEC. Hardware PEC needs very little
code and is mostly untouched.
Technically, both implementations differ in that the original one
was emulating PEC in software by modifying the contents of an
i2c_smbus_data union (changing the transaction to a different type),
while the new one works one level lower, on i2c_msg structures (working
on message contents). Due to the definition of the i2c_smbus_data union,
not all SMBus transactions could be handled (at least not without
changing the definition of this union, which would break user-space
compatibility), and those which could had to be implemented
individually. At the opposite, adding PEC to an i2c_msg structure
can be done on any SMBus transaction with common code.
Advantages of the new implementation:
* It's about twice as small (from ~136 lines before to ~70 now, only
counting i2c-core, including blank and comment lines). The memory
used by i2c-core is down by ~640 bytes (~3.5%).
* Easier to validate, less tricky code. The code being common to all
transactions by design, the risk that a bug can stay uncovered is
lower.
* All SMBus transactions have PEC support in I2C emulation mode
(providing the non-PEC transaction is also implemented). Transactions
which have no emulation code right now will get PEC support for free
when they finally get implemented.
* Allows for code simplifications in header files and bus drivers
(patch follows).
Drawbacks (I guess there had to be at least one):
* PEC emulation for non-PEC capable non-I2C SMBus masters was dropped.
It was based on SMBus tricks and doesn't quite fit in the new design.
I don't think it's really a problem, as the benefit was certainly
not worth the additional complexity, but it's only fair that I at
least mention it.
Lastly, let's note that the new implementation does slightly affect
compatibility (both in kernel and user-space), but doesn't actually
break it. Some defines will be dropped, but the code can always be
changed in a way that will work with both the old and the new
implementations. It shouldn't be a problem as there doesn't seem to be
many users of SMBus PEC to date anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Discard I2C_FUNC_SMBUS_*_PEC defines. i2c clients are not supposed to
check for PEC support of i2c bus drivers on individual SMBus
transactions, and i2c bus drivers are not supposed to advertise them.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Drop unused i2c-over-parallel-port i2c IDs:
* I2C_HW_B_LPC was never actually used as far as I could search.
* I2C_HW_B_ELV and I2C_HW_B_VELLE are no more used since the
introduction of the unified i2c-parport driver in Linux 2.6.2.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
New driver for the Xicor X1205 RTC chip.
Signed-off-by: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Drop I2C_SMBUS_I2C_BLOCK_MAX, use I2C_SMBUS_BLOCK_MAX instead.
I2C_SMBUS_I2C_BLOCK_MAX has always been defined to the same value as
I2C_SMBUS_BLOCK_MAX, and this will never change: setting it to a lower
value would make no sense, setting it to a higher value would break
i2c_smbus_data compatibility. There is no point in changing
i2c_smbus_data to support larger block transactions in SMBus mode, as
no SMBus hardware supports more than 32 byte blocks. Thus, for larger
transactions, direct I2C transfers are the way to go.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are no more per-i2c-algorithm adapter max. Last time there were
was in July 1999.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Delete 2 out-of-date, colliding ioctl defines. I2C_UDELAY and
I2C_MDELAY are supposed to be used by i2c-algo-bit, but actually
aren't (and I suspect never were). Moreover, their values are the same
as I2C_FUNCS and I2C_SLAVE_FORCE, respectively, which *are* widely
used.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a misplaced comment in i2c.h. Spotted by Hideki Iwamoto.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CVS revision IDs are totally useless and irrelevant by now.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The i2c_smbus_data union block member has a comment stating that an
extra byte is required for SMBus Block Process Call transactions. This
has been true for three weeks around June 2002, but no more since, so
it is about time that we drop this comment and fix the definition.
From: Hideki Iwamoto <h-iwamoto@kit.hi-ho.ne.jp>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
include/linux/i2c.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Add complete support for 5714/5715. These chips are very similar to
5780 so the changes are very trivial. A TG3_FLG2_5780_CLASS flag is
added to identify these chips.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Attached is kernel patch for UDP Fragmentation Offload (UFO) feature.
1. This patch incorporate the review comments by Jeff Garzik.
2. Renamed USO as UFO (UDP Fragmentation Offload)
3. udp sendfile support with UFO
This patches uses scatter-gather feature of skb to generate large UDP
datagram. Below is a "how-to" on changes required in network device
driver to use the UFO interface.
UDP Fragmentation Offload (UFO) Interface:
-------------------------------------------
UFO is a feature wherein the Linux kernel network stack will offload the
IP fragmentation functionality of large UDP datagram to hardware. This
will reduce the overhead of stack in fragmenting the large UDP datagram to
MTU sized packets
1) Drivers indicate their capability of UFO using
dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG
NETIF_F_HW_CSUM is required for UFO over ipv6.
2) UFO packet will be submitted for transmission using driver xmit routine.
UFO packet will have a non-zero value for
"skb_shinfo(skb)->ufo_size"
skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
fragment going out of the adapter after IP fragmentation by hardware.
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
For IPV6 the UFO provides the fragment identification-id in
skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
IPv6 fragments.
Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
In PM v1, all devices were called at SUSPEND_DISABLE level. Then
all devices were called at SUSPEND_SAVE_STATE level, and finally
SUSPEND_POWER_DOWN level. However, with PM v2, to maintain
compatibility for platform devices, I arranged for the PM v2
suspend/resume callbacks to call the old PM v1 suspend/resume
callbacks three times with each level in order so that existing
drivers continued to work.
Since this is obsolete infrastructure which is no longer necessary,
we can remove it. Here's an (untested) patch to do exactly that.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch allows struct class_device to be nested, so that another
struct class_device can be the parent of a new one, instead of only
having the struct class be the parent. This will allow us to
(hopefully) fix up the input and video class subsystem mess.
But please people, don't go crazy and start making huge trees of class
devices, you should only need 2 levels deep to get everything to work
(remember to use a class_interface to get notification of a new class
device being added to the system.)
Oh, this also allows us to have the possibility of potentially, someday,
moving /sys/block into /sys/class. The main hindrance is that pesky
/dev numberspace issue...
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A "coldplug + udevstart" can be simple like this:
for i in /sys/block/*/*/uevent; do echo 1 > $i; done
for i in /sys/class/*/*/uevent; do echo 1 > $i; done
for i in /sys/bus/*/devices/*/uevent; do echo 1 > $i; done
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver core: pass interface to class intreface methods
Pass interface as argument to add() and remove() class interface
methods. This way a subsystem can implement generic add/remove
handlers and then call interface-specific ones.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I2O: cleanup - remove i2o_device_class
I2O devices reside on their own bus so there should be no reason
to also have i2c_device class that mirros i2o bus.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a refresh of an earlier patch to add "wakeup" support to the
PM core model. This provides per-device bus-neutral control of the
use of wakeup events.
* "struct device_pm_info" has two bits that are initialized as
part of setting up the enclosing struct device:
- "can_wakeup", reflecting hardware capabilities
- "may_wakeup", the policy setting (when CONFIG_PM)
* There's a writeable sysfs "wakeup" file, with one of two values:
- "enabled", when the policy is to allow wakeup
- "disabled", when the policy is not to allow it
- "" if the device can't currently issue wakeups
By default, wakeup is enabled on all devices that support it. If its
driver doesn't support it ... treat it as a bug. :)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch from Erik Hovland
I noticed that the same typo (i before c in associated) showed up twice
in the file kernel/include/linux/mmc/mmc.h.
This patch fixes both of the instances I found with this mistake. The
typos are in comments and should have no affect on working code.
E
Signed-off-by: Erik Hovland <erik@hovland.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...
One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Beginning of gfp_t annotations:
- -Wbitwise added to CHECKFLAGS
- old __bitwise renamed to __bitwise__
- __bitwise defined to either __bitwise__ or nothing, depending on
__CHECK_ENDIAN__ being defined
- gfp_t switched from __nocast to __bitwise__
- force cast to gfp_t added to __GFP_... constants
- new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
the result to int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which doesn't have REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have its REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
its elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS. New implementation is much simpler and main code
paths are less cluttered, IMHO.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch kills max_back_kb handling from elv_dispatch_sort() and
kills max_back_kb field from struct request_queue.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge. This
and the following patches move last_merge handling into
generic elevator code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Optimise attribute revalidation when hardlinking. Add post-op attributes
for the directory and the original inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
"Optional" means that the close call will not fail if the getattr
at the end of the compound fails.
If it does succeed, try to refresh inode attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since the directory attributes change every time we CREATE a file,
we might as well pick up the new directory attributes in the same
compound.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since we almost always call nfs_end_data_update() after we called
nfs_refresh_inode(), we now end up marking the inode metadata
as needing revalidation immediately after having updated it.
This patch rearranges things so that we mark the inode as needing
revalidation _before_ we call nfs_refresh_inode() on those operations
that need it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow nfs_refresh_inode() also to update attributes on the inode if the
RPC call was sent after the last call to nfs_update_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add kernel-doc to skbuff.h, skbuff.c to eliminate kernel-doc warnings.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Add the new ID 0x132a and configure the new PCI Diva console port. This
device supports only 1 single console UART.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested by Wolfgang Denk with this device:
00:0f.0 Network controller: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 01)
Subsystem: Exsys EX-4055 4S(16C550) RS-232
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 10
Region 0: Memory at 80100000 (32-bit, non-prefetchable) [size=128]
Region 1: I/O ports at 7080 [size=128]
Region 2: I/O ports at 7400 [size=32]
00:0f.0 Class 0280: 10b5:9050 (rev 01)
Subsystem: d84d:4055
Results with this patch:
Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
PCI: Found IRQ 10 for device 0000:00:0f.0
ttyS4 at I/O 0x7400 (irq = 10) is a 16550A
ttyS5 at I/O 0x7408 (irq = 10) is a 16550A
ttyS6 at I/O 0x7410 (irq = 10) is a 16550A
ttyS7 at I/O 0x7418 (irq = 10) is a 16550A
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>
IDR trees include a cache of idr_layer objects. There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.
Add and use idr_destroy() for this. v9fs and infiniband also need to use
idr_destroy() to avoid leaks.
Or, we make the cache global, like radix_tree_preload(). Which is probably
better. Later.
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update drivers to new input layer changes.
Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>
Reorder code in gscps2_interrupt() and only enable ports when opened.
This fixes issues with hangs booting an SMP kernel on my C360.
Previously serio_interrupt() could be called before the lock in
struct serio was initialised.
Signed-off-by: Richard Hirst <rhirst@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
This is needed for full AMD and VIA drivers and possibly more. Functions
to turn actual clocking and cycle timings into register values. Also to
merge shared timings to compute an optimal timing set.
Built from the drivers/ide version by Vojtech Pavlik
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This reverts commit 3359b54c8c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.
We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.
The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator. But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a
cleanup.
With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hugetlb pages are currently pre-faulted. At the time of mmap of
hugepages, we populate the new PTEs. It is possible that HW has already
cached some of the unused PTEs internally. These stale entries never
get a chance to be purged in existing control flow.
This patch extends the check in page fault code for hugepages. Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
[ This is apparently arguably an ia64 port bug. But the code won't
hurt, and for now it fixes a real problem on some ia64 machines ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Not only are the qop parameters that are passed around throughout the gssapi
unused by any currently implemented mechanism, but there appears to be some
doubt as to whether they will ever be used. Let's just kill them off for now.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add support for privacy to the krb5 rpcsec_gss mechanism.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The code this was originally derived from processed wrap and mic tokens using
the same functions. This required some contortions, and more would be required
with the addition of xdr_buf's, so it's better to separate out the two code
paths.
In preparation for adding privacy support, remove the last vestiges of the
old wrap token code.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Various xdr encode routines use au_rslack to guess where the reply argument
will end up, so we can set up the xdr_buf to recieve data into the right place
for zero copy.
Currently we calculate the au_rslack estimate when we check the verifier.
Normally this only depends on the verifier size. In the integrity case we add
a few bytes to allow for a length and sequence number.
It's a bit simpler to calculate only the verifier size when we check the
verifier, and delay the full calculation till we unwrap.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For privacy, we need to allocate pages to store the encrypted data (passed
in pages can't be used without the risk of corrupting data in the page cache).
So we need a way to free that memory after the request has been transmitted.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add support for privacy to generic gss-api code. This is dead code until we
have both a mechanism that supports privacy and code in the client or server
that uses it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make NFSv4 return the fully initialized file pointer with the
stateid that it created in the lookup w/intent.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is needed by NFSv4 for atomicity reasons: our open command is in
fact a lookup+open, so we need to be able to propagate open context
information from lookup() into the resulting struct file's
private_data field.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Once the state_owner and lock_owner semaphores get removed, it will be
possible for other OPEN requests to reopen the same file if they have
lower sequence ids than our CLOSE call.
This patch ensures that we recheck the file state once
nfs_wait_on_sequence() has completed waiting.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 file state-changing functions such as OPEN, CLOSE, LOCK,... are all
labelled with "sequence identifiers" in order to prevent the server from
reordering RPC requests, as this could cause its file state to
become out of sync with the client.
Currently the NFS client code enforces this ordering locally using
semaphores to restrict access to structures until the RPC call is done.
This, of course, only works with synchronous RPC calls, since the
user process must first grab the semaphore.
By dropping semaphores, and instead teaching the RPC engine to hold
the RPC calls until they are ready to be sent, we can extend this
process to work nicely with asynchronous RPC calls too.
This patch adds a new list called "rpc_sequence" that defines the order
of the RPC calls to be sent. We add one such list for each state_owner.
When an RPC call is ready to be sent, it checks if it is top of the
rpc_sequence list. If so, it proceeds. If not, it goes back to sleep,
and loops until it hits top of the list.
Once the RPC call has completed, it can then bump the sequence id counter,
and remove itself from the rpc_sequence list, and then wake up the next
sleeper.
Note that the state_owner sequence ids and lock_owner sequence ids are
all indexed to the same rpc_sequence list, so OPEN, LOCK,... requests
are all ordered w.r.t. each other.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, call_encode will cause the entire RPC call to abort if it returns
an error. This is unnecessarily rigid, and gets in the way of attempts
to allow the NFSv4 layer to order RPC calls that carry sequence ids.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- add lba_28_ok() and lba_48_ok() to ata.h.
- check ending block number instead of staring block number.
- use lba_28_ok() for CHS range check
- LBA28/LBA48 optimization
Suggested by Mark Lord and Alan Cox.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
=====
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
- merge ata_prot_to_cmd() and ata_dev_set_protocol() as
ata_rwcmd_protocol()
- pave road for read/write multiple support
- remove usage of pre-cached command and protocol values and call
ata_rwcmd_protocol() instead
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
==============
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
lock_kiocb() was introduced to serialize retrying and cancellation. In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb. Cancel has other problems and
has no significant in-tree users that have been complaining about it. So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes call_rcu() keep track of how many events there are on the RCU
list, and cause a reschedule event when the list gets too long.
This helps keep RCU event lists down.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems that all the list_*_rcu primitives are missing a memory barrier
on the very first dereference. For example,
#define list_for_each_rcu(pos, head) \
for (pos = (head)->next; prefetch(pos->next), pos != (head); \
pos = rcu_dereference(pos->next))
It will go something like:
pos = (head)->next
prefetch(pos->next)
pos != (head)
do stuff
We're missing a barrier here.
pos = rcu_dereference(pos->next)
fetch pos->next
barrier given by rcu_dereference(pos->next)
store pos
Without the missing barrier, the pos->next value may turn out to be stale.
In fact, if "do stuff" were also dereferencing pos and relying on
list_for_each_rcu to provide the barrier then it may also break.
So here is a patch to make sure that we have a barrier for the first
element in the list.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... otherwise, things like alpha and sparc64 break and break
badly. They define cpu_possible_map to something else in smp.h
*AFTER* having included cpumask.h.
If that puppy is a macro, expansion will happen at the actual
caller, when we'd already seen #define cpu_possible_map ... and we will
get the right thing used.
As an inline helper it will be tokenized before we get to that
define and that's it; no matter what we define later, it won't affect
anything. We get modules with dependency on cpu_possible_map instead
of the right symbol (phys_cpu_present_map in case of sparc64), or outright
link errors if they are built-in.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix copy and paste error in jiffies_to_AHZ conversion which leads to wrong
BSD accounting information on alpha and ia64 when
CONFIG_BSD_PROCESS_ACCT_V3 is turned on.
Also update comment to match reorganised header files.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Original patch by Harald Welte, with feedback from Herbert Xu
and testing by Sbastien Bernard.
EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly. That is not necessarily true.
This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.
Signed-off-by: David S. Miller <davem@davemloft.net>
When netpoll is not being used, the macro that
defines the removed routing netpoll_poll_lock
defines the return as zero, but the real
routine returns a `void *`
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ability of changing the state a TCP connection. I know
that this must be used with care but it's required to provide a complete
conntrack creation via conntrack_netlink. So I'll document this aspect on
the upcoming docs.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initially we used 64bit counters for conntrack-based accounting, since we
had no event mechanism to tell userspace that our counters are about to
overflow. With nfnetlink_conntrack, we now have such a event mechanism and
thus can save 16bytes per connection.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To keep consistency, the TCP private protocol information is nested
attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to
access the TCP state information looks like here below:
CTA_PROTOINFO
CTA_PROTOINFO_TCP
CTA_PROTOINFO_TCP_STATE
instead of:
CTA_PROTOINFO
CTA_PROTOINFO_TCP_STATE
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this #include, __be16 is not defined and userspace programs
will break.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of
struct ipt_nat_info was obsoleted, but the pointer not removed from the
struct.
This patch removes the pointer, thereby yet again shrinking struct
ip_conntrack.
Discovered-by: Rusty Russell <rusty@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e.
host byte order tags, net byte order values) are useless, unless a parser
can determine whether an attribute is nested or not.
This patch steals the highest bit of nfattr.nfa_type to indicate whether
the data payload contains a nested nfattr (1) or not (0).
This will break userspace compatibility, but luckily no kernel with
nfnetlink was released so far.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a process issues an URB from userspace and (starts to) terminate
before the URB comes back, we run into the issue described above. This
is because the urb saves a pointer to "current" when it is posted to the
device, but there's no guarantee that this pointer is still valid
afterwards.
In fact, there are three separate issues:
1) the pointer to "current" can become invalid, since the task could be
completely gone when the URB completion comes back from the device.
2) Even if the saved task pointer is still pointing to a valid task_struct,
task_struct->sighand could have gone meanwhile.
3) Even if the process is perfectly fine, permissions may have changed,
and we can no longer send it a signal.
So what we do instead, is to save the PID and uid's of the process, and
introduce a new kill_proc_info_as_uid() function.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
[ Fixed up types and added symbol exports ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch makes swsusp avoid the possible temporary corruption
of page translation tables during resume on x86-64. This is achieved by
creating a copy of the relevant page tables that will not be modified by
swsusp and can be safely used by it on resume.
The problem is that during resume on x86-64 swsusp may temporarily
corrupt the page tables used for the direct mapping of RAM. If that
happens, a page fault occurs and cannot be handled properly, which leads
to the solid hang of the affected system. This leads to the loss of the
system's state from before suspend and may result in the loss of data or
the corruption of filesystems, so it is a serious issue. Also, it
appears to happen quite often (for me, as often as 50% of the time).
The problem is related to the fact that (at least) one of the PMD
entries used in the direct memory mapping (starting at PAGE_OFFSET)
points to a page table the physical address of which is much greater
than the physical address of the PMD entry itself. Moreover,
unfortunately, the physical address of the page table before suspend
(i.e. the one stored in the suspend image) happens to be different to
the physical address of the corresponding page table used during resume
(i.e. the one that is valid right before swsusp_arch_resume() in
arch/x86_64/kernel/suspend_asm.S is executed). Thus while the image is
restored, the "offending" PMD entry gets overwritten, so it does not
point to the right physical address any more (i.e. there's no page
table at the address pointed to by it, because it points to the address
the page table has been at during suspend). Consequently, if the PMD
entry is used later on, and it _is_ used in the process of copying the
image pages, a page fault occurs, but it cannot be handled in the normal
way and the system hangs.
In principle we can call create_resume_mapping() from
swsusp_arch_resume() (ie. from suspend_asm.S), but then the memory
allocations in create_resume_mapping(), resume_pud_mapping(), and
resume_pmd_mapping() must be made carefully so that we use _only_
NosaveFree pages in them (the other pages are overwritten by the loop in
swsusp_arch_resume()). Additionally, we are in atomic context at that
time, so we cannot use GFP_KERNEL. Moreover, if one of the allocations
fails, we should free all of the allocated pages, so we need to trace
them somehow.
All of this is done in the appended patch, except that the functions
populating the page tables are located in arch/x86_64/kernel/suspend.c
rather than in init.c. It may be done in a more elegan way in the
future, with the help of some swsusp patches that are in the works now.
[AK: move some externs into headers, renamed a function]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch splits key permissions checking out of key-ui.h and
moves it into a .c file. It's quite large and called quite a lot, and
it's about to get bigger with the addition of LSM support for keys...
key_any_permission() is also discarded as it's no longer used.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix nocast sparse warnings:
include/linux/textsearch.h:165:57: warning: implicit cast to nocast type
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix implicit nocast warnings in connector code:
drivers/connector/connector.c:102:24: warning: implicit cast to nocast type
drivers/connector/connector.c:114:45: warning: implicit cast to nocast type
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix implicit nocast warnings in atm code:
net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type
drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type
Also use kzalloc() instead of kmalloc().
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This redoes the n_ports logic I proposed before as a bitmask.
ata_pci_init_native_mode is now used with a mask allowing for mixed mode
stuff later on. ata_pci_init_legacy_port is called with port number and
does one port now not two. Instead it is called twice by the ata init
logic which cleans both of them up.
There are stil limits in the original code left over
- IRQ/port mapping for legacy mode should be arch specific values
- You can have one legacy mode IDE adapter per PCI root bridge on some systems
- Doesn't handle mixed mode devices yet (but is now a lot closer to it)
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.
1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().
There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition. I've marked it as such so that we remember to fix it.
This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)
(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)
1) First some performance data :
--------------------------------
tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
The most time critical code is :
sk_for_each(sk, node, &head->chain) {
if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
goto hit; /* You sunk my battleship! */
}
The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.
As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.
This can be problematic if some chains are very long.
2) The goal
-----------
The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.
3) Description of the patch
---------------------------
Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.
struct sock_common {
unsigned short skc_family;
volatile unsigned char skc_state;
unsigned char skc_reuse;
int skc_bound_dev_if;
struct hlist_node skc_node;
struct hlist_node skc_bind_node;
atomic_t skc_refcnt;
+ unsigned int skc_hash;
struct proto *skc_prot;
};
Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.
Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)
File include/net/inet_hashtables.h
64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
(((__sk)->sk_hash == (__hash))
((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \
((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \
(!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
(((__sk)->sk_hash == (__hash)) && \
(inet_sk(__sk)->daddr == (__saddr)) && \
(inet_sk(__sk)->rcv_saddr == (__daddr)) && \
(!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
- Adds a prefetch(head->chain.first) in
__inet_lookup_established()/__tcp_v4_check_established() and
__inet6_lookup_established()/__tcp_v6_check_established() and
__dccp_v4_check_established() to bring into cache the first element of the
list, before the {read|write}_lock(&head->lock);
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I've found the problem in general. It affects any 64-bit
architecture. The problem occurs when you change the system time.
Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base. You then wind the clock back
by a day. From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.
In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.
When we do overflow, we'll need a better solution of course.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use '#ifdef' consistently on __KERNEL__. This was reported as bug #5340
(isn't easier to send a fix than report the bug?!)
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only one of the run or kick path is supposed to put an iocb on the run
list. If both of them do it than one of them can end up referencing a
freed iocb. The kick path could delete the task_list item from the wait
queue before getting the ctx_lock and putting the iocb on the run list.
The run path was testing the task_list item outside the lock so that it
could catch ki_retry methods that return -EIOCBRETRY *without* putting the
iocb on a wait queue and promising to call kick_iocb. This unlocked check
could then race with the kick path to cause both to try and put the iocb on
the run list.
The patch stops the run path from testing task_list by requring that any
ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
called in the future. aio_p{read,write}, the only in-tree -EIOCBRETRY
users, are updated.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland points out that the flags end up having non-obvious dependencies
elsewhere, so revert aa55a08687 and add
some comments about why things are as they are.
We'll just have to fix up the broken comparisons. Roland has a patch.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
do_signal_stop:
for_each_thread(t) {
if (t->state < TASK_STOPPED)
++sig->group_stop_count;
}
However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.
See also wait_task_stopped(), which checks ->state > TASK_STOPPED.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
[ We really probably should always use the appropriate bitmasks to test
task states, not do it like this. Using something like
#define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \
TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE)
and then doing "if (task->state & TASK_RUNNABLE)" or similar. But the
ordering of the task states is historical, and keeping the ordering
does make sense regardless. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch adds extra permission grants to keys for the possessor of a
key in addition to the owner, group and other permissions bits. This makes
SUID binaries easier to support without going as far as labelling keys and key
targets using the LSM facilities.
This patch adds a second "pointer type" to key structures (struct key_ref *)
that can have the bottom bit of the address set to indicate the possession of
a key. This is propagated through searches from the keyring to the discovered
key. It has been made a separate type so that the compiler can spot attempts
to dereference a potentially incorrect pointer.
The "possession" attribute can't be attached to a key structure directly as
it's not an intrinsic property of a key.
Pointers to keys have been replaced with struct key_ref *'s wherever
possession information needs to be passed through.
This does assume that the bottom bit of the pointer will always be zero on
return from kmem_cache_alloc().
The key reference type has been made into a typedef so that at least it can be
located in the sources, even though it's basically a pointer to an undefined
type. I've also renamed the accessor functions to be more useful, and all
reference variables should now end in "_ref".
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
s/PIO_ST_/HSM_ST_/ and s/pio_task_state/hsm_task_state/.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>