Convert parport_serial to use the new 8250_pci interface, converting
the table to a pciserial_board table. This also unuses the SPCI_*
definitions in serialP.h, which can now be removed.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Re-jig the setup/removal/suspend/resume of 8250 pci ports so that they
know slightly less about how they're attached to a PCI device. Expose
this as the new interface for registering PCI serial ports, as well as
the pciserial_board structure and associated flag definitions.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When the kernel is working well and we want to restart cleanly
kernel_restart is the function to use. But in many instances
the kernel wants to reboot when thing are expected to be working
very badly such as from panic or a software watchdog handler.
This patch adds the function emergency_restart() so that
callers can be clear what semantics they expect when calling
restart. emergency_restart() is expected to be callable
from interrupt context and possibly reliable in even more
trying circumstances.
This is an initial generic implementation for all architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because the factors of sys_reboot don't exist people calling
into the reboot path duplicate the code badly, leading to
inconsistent expectations of code in the reboot path.
This patch should is just code motion.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add 5780 PCI IDs, chip IDs, and other basic support.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.
The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet. If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.
This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use. Reference counting
is needed for ctnetlink anyway, so introduce it now.
To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf states:
> I used to mark such ids as obsolete in the header but since
> skb is on diet anyway and there has been no official
> iproute2 release with the ematch bits included it might be
> a better idea to remove the ids from the header completely.
> Those that have picked up my patch on netdev shouldn't care
> about a ABI breakage, actually I doubt that someone is using
> it already.
So here's the patch to remove it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for SIIG Quartet Serial card. This card has Oxford
Semiconducor 16954 quad UART which is clocked by 10x faster
(18.432 MHz) quartz.
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
I think it's about time to make the build a little more vocal about the
expiry of these functions. Due to recent discussions with problems in
the console initialisation vs power manglement, I'd like to move the
date forward to September.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We need to be careful differentiating between a resync of a complete array,
in which we can clear the bitmap, and a resync of a degraded array, in
which we cannot.
This patch cleans all that up.
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add pci ids for new ISP types.
Move old definitions in local qla_def.h file to pci_ids.h as
well.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add special case for the POSIX_FADV_DONTNEED and POSIX_FADV_NOREUSE hint
values for s390-64. The user space values in the s390-64 glibc headers for
these two defines have always been 6 and 7 instead of 4 and 5. All 64 bit
applications therefore use the "wrong" values. To get these applications
working without recompiling the kernel needs to accept the "wrong" values.
Since the values for s390-31 are 4 and 5 the compat wrapper for fadvise64
and fadvise64_64 need to rewrite the values for 31 bit system calls.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Something has changed in the core kernel such that we now get concurrent
inode write outs, one e.g via pdflush and one via sys_sync or whatever.
This causes a nasty deadlock in ntfs. The only clean solution
unfortunately requires a minor vfs api extension.
First the deadlock analysis:
Prerequisive knowledge: NTFS has a file $MFT (inode 0) loaded at mount
time. The NTFS driver uses the page cache for storing the file contents as
usual. More interestingly this file contains the table of on-disk inodes
as a sequence of MFT_RECORDs. Thus NTFS driver accesses the on-disk inodes
by accessing the MFT_RECORDs in the page cache pages of the loaded inode
$MFT.
The situation: VFS inode X on a mounted ntfs volume is dirty. For same
inode X, the ntfs_inode is dirty and thus corresponding on-disk inode,
which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the
table of inodes ($MFT, inode 0).
What happens:
Process 1: sys_sync()/umount()/whatever... calls __sync_single_inode() for
$MFT -> do_writepages() -> write_page for the dirty page containing the
on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
clears PageUptodate() on the page to prevent anyone else getting hold of it
whilst it does the write out (this is necessary as the on-disk inode needs
"fixups" applied before the write to disk which are removed again after the
write and PageUptodate is then set again). It then analyses the page
looking for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this on-disk
inode. This then calls ilookup5() to check if the corresponding VFS inode
is in icache(). This in turn calls ifind() which waits on the inode lock
via wait_on_inode whilst holding the global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the same
VFS inode X on the ntfs volume. This locks the inode (I_LOCK) then calls
write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
the page (in page cache of table of inodes $MFT, inode 0) containing the
on-disk inode. This page has PageUptodate() clear because of Process 1
(see above) so read_cache_page() blocks when tries to take the page lock
for the page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the on-disk
inode X and it is waiting on the inode X to be unlocked in ifind() so it
can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for the
page to be unlocked so it can call ntfs_readpage() or discover that
Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The only sensible solution: NTFS does not care whether the VFS inode is
locked or not when it calls ilookup5() (it doesn't use the VFS inode at
all, it just uses it to find the corresponding ntfs_inode which is of
course attached to the VFS inode (both are one single struct); and it uses
the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
hence we want a modified ilookup5_nowait() which is the same as ilookup5()
but it does not wait on the inode lock.
Without such functionality I would have to keep my own ntfs_inode cache in
the NTFS driver just so I can find ntfs_inodes independent of their VFS
inodes which would be slow, memory and cpu cycle wasting, and incredibly
stupid given the icache already exists in the VFS.
Below is a patch that does the ilookup5_nowait() implementation in
fs/inode.c and exports it.
ilookup5_nowait.diff:
Introduce ilookup5_nowait() which is basically the same as ilookup5() but
it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
done in ifind()).
This is needed to avoid a nasty deadlock in NTFS.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rearranges the event ordering for "open" to be consistent with the
ordering of the other events.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs". Also some related cleanup.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was a pure indentation change, using:
scripts/Lindent fs/reiserfs/*.c include/linux/reiserfs_*.h
to make reiserfs match the regular Linux indentation style. As Jeff
Mahoney <jeffm@suse.com> writes:
The ReiserFS code is a mix of a number of different coding styles, sometimes
different even from line-to-line. Since the code has been relatively stable
for quite some time and there are few outstanding patches to be applied, it
is time to reformat the code to conform to the Linux style standard outlined
in Documentation/CodingStyle.
This patch contains the result of running scripts/Lindent against
fs/reiserfs/*.c and include/linux/reiserfs_*.h. There are places where the
code can be made to look better, but I'd rather keep those patches separate
so that there isn't a subtle by-hand hand accident in the middle of a huge
patch. To be clear: This patch is reformatting *only*.
A number of patches may follow that continue to make the code more consistent
with the Linux coding style.
Hans wasn't particularly enthusiastic about these patches, but said he
wouldn't really oppose them either.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_pages_and_swap_cache() and free_page_and_swap_cache() use release_pages()
and page_cache_release() respectively, so make sure that we have the
declarations in scope.
Cc: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a problem with ext3 mount option parsing. When remount of a filesystem
fails, old options are now restored.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
tr_type_trans(), hippi_type_trans() left as-is.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds another CDC descriptor type to <linux/usb_cdc.h>; the main claim
to fame for this is that some Motorola phones include it. It's not currently
needed by any driver code; included for completeness.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg,
This patch fixes the kmalloc() flags argument type in USB
subsystem; hopefully all of its occurences. The patch was
made against patch-2.6.12-git2 from Jun 20.
Cleanup of flags for kmalloc() in USB subsystem.
Signed-off-by: Olav Kongas <ok@artecdesign.ee>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
http://bugme.osdl.org/show_bug.cgi?id=4016
Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI 3.0 added a Correctable Platform Error Interrupt (CPEI)
Processor Overide flag to MADT.Platform_Interrupt_Source.
Record the processor that was provided as hint from ACPI.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Implement the framework for binding physical devices
with ACPI devices. A physical bus like PCI bus
should create a 'acpi_bus_type', with:
.find_device:
For device which has parent such as normal PCI devices.
.find_bridge:
It's for special devices, such as PCI root bridge
or IDE controller. Such devices generally haven't a
parent or ->bus. We use the special method
to get an ACPI handle.
Uses new field in struct device: firmware_data
http://bugzilla.kernel.org/show_bug.cgi?id=4277
Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Register an "acpi" system device to be notified of shutdown preparation.
This depends on CONFIG_PM
http://bugzilla.kernel.org/show_bug.cgi?id=4041
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Here's a small patch against 2.6.13-rc2 that adds securityfs, a virtual
fs that all LSMs can use instead of creating their own. The fs should
be mounted at /sys/kernel/security, and the fs creates that mount point.
This will make the LSB people happy that we aren't creating a new
/my_lsm_fs directory in the root for every different LSM.
It has changed a bit since the last version, thanks to comments from
Mike Waychison.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@osdl.org>
This patch corrects a few problems with the IP_ADD_MEMBERSHIP
socket option:
1) The existing code makes an attempt at reference counting joins when
using the ip_mreqn/imr_ifindex interface. Joining the same group
on the same socket is an error, whatever the API. This leads to
unexpected results when mixing ip_mreqn by index with ip_mreqn by
address, ip_mreq, or other API's. For example, ip_mreq followed by
ip_mreqn of the same group will "work" while the same two reversed
will not.
Fixed to always return EADDRINUSE on a duplicate join and
removed the (now unused) reference count in ip_mc_socklist.
2) The group-search list in ip_mc_join_group() is comparing a full
ip_mreqn structure and all of it must match for it to find the
group. This doesn't correctly match a group that was joined with
ip_mreq or ip_mreqn with an address (with or without an index). It
also doesn't match groups that are joined by different addresses on
the same interface. All of these are the same multicast group,
which is identified by group address and interface index.
Fixed the check to correctly match groups so we don't get
duplicate group entries on the ip_mc_socklist.
3) The old code allocates a multicast address before searching for
duplicates requiring it to free in various error cases. This
patch moves the allocate until after the search and
igmp_max_memberships check, so never a need to allocate, then free
an entry.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't be allowing, e.g., write locks on files not open for read. To
enforce this, we add a pointer from the lock stateid back to the open stateid
it came from, so that the check will continue to be correct even after the
open is upgraded or downgraded.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
from RFC 3530:
"Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED."
(Note that share_denied is really only a legal error for OPEN.)
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some comments on the use of so_seqid, in an attempt to avoid some of the
confusion outlined in the previous patch....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to fsync the recovery directory after writing to it, but we weren't
doing this correctly. (For example, we weren't taking the i_sem when calling
->fsync().)
Just reuse the existing nfsd fsync code instead.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames _mntput() to something a little more descriptive:
mntput_no_expire().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames vfsmount->mnt_fslink to something a little more
descriptive: vfsmount->mnt_expire.
Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a race found by Ram in mark_mounts_for_expiry() in
fs/namespace.c.
The bug can only be triggered with simultaneous exiting of a process having
a private namespace, and expiry of a mount from within that namespace.
It's practically impossible to trigger, and I haven't even tried. But
still, a bug is a bug.
The race happens when put_namespace() is called by another task, while
mark_mounts_for_expiry() is between atomic_read() and get_namespace(). In
that case get_namespace() will be called on an already dead namespace with
unforeseeable results.
The solution was suggested by Al Viro, with his own words:
Instead of screwing with atomic_read() in there, why don't we
simply do the following:
a) atomic_dec_and_lock() in put_namespace()
b) __put_namespace() called without dropping lock
c) the first thing done by __put_namespace would be
struct vfsmount *root = namespace->root;
namespace->root = NULL;
spin_unlock(...);
....
umount_tree(root);
...
d) check in mark_... would be simply namespace && namespace->root.
And we are all set; no screwing around with atomic_read(), no magic
at all. Dying namespace gets NULL ->root.
All changes of ->root happen under spinlock.
If under a spinlock we see non-NULL ->mnt_namespace, it won't be
freed until we drop the lock (we will set ->mnt_namespace to NULL
under that lock before we get to freeing namespace).
If under a spinlock we see non-NULL ->mnt_namespace and
->mnt_namespace->root, we can grab a reference to namespace and be
sure that it won't go away.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.
If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated. In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.
The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a bit spin lock in the first buffer of the page to synchronise asynch
IO buffer completions, instead of the global page_uptodate_lock, which is
showing some scalabilty problems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix u32 vs pm_message_t confusion in cpufreq.
Signed-off-by: Bernard Blackham <bernard@blackham.com.au>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Make ioprio syscalls return long, like set/getpriority syscalls.
- Move function prototypes into syscalls.h so we can pick them up in the
32/64bit compat code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OCFS2 wants to mark an inode which has been orphaned by another node so
that during final iput it takes the correct path through the VFS and can
pass through the OCFS2 delete_inode callback. Since i_nlink can get out of
date with other nodes, the best way I see to accomplish this is by clearing
i_nlink on those inodes at drop_inode time. Other than this small amount
of work, nothing different needs to happen, so I think it would be cleanest
to be able to just call generic_drop_inode at the end of the OCFS2
drop_inode callback.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch ensures that cit_iv is aligned according to cra_alignmask
by allocating it as part of the tfm structure. As a side effect the
crypto layer will also guarantee that the tfm ctx area has enough space
to be aligned by cra_alignmask. This allows us to remove the extra
space reservation from the Padlock driver.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VIA Padlock device requires the input and output buffers to
be aligned on 16-byte boundaries. This patch adds the alignmask
attribute for low-level cipher implementations to indicate their
alignment requirements.
The mid-level crypt() function will copy the input/output buffers
if they are not aligned correctly before they are passed to the
low-level implementation.
Strictly speaking, some of the software implementations require
the buffers to be aligned on 4-byte boundaries as they do 32-bit
loads. However, it is not clear whether it is better to copy
the buffers or pay the penalty for unaligned loads/stores.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds hooks for cipher algorithms to implement multi-block
ECB/CBC operations directly. This is expected to provide significant
performance boots to the VIA Padlock.
It could also be used for improving software implementations such as
AES where operating on multiple blocks at a time may enable certain
optimisations.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the usage of struct of_match to struct of_device_id,
similar to pci_device_id. This allows a device table to be generated,
which can be parsed by depmod(8) to generate a map file for module
loading.
In order for hotplug to work with macio devices, patches to
module-init-tools and hotplug must be applied. Those patches are
available at:
ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.
Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make TSO segment transmit size decisions at send time not earlier.
The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.
This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.
Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.
A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce local_df to a bit field and ip_summed to a 2 bits
field thus saving 13 bits. Move bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both, 32bit and
64bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the code to load packet data into a register:
k = fentry->k;
if (k < 0) {
...
} else {
u32 _tmp, *p;
p = skb_header_pointer(skb, k, 4, &_tmp);
if (p != NULL) {
A = ntohl(*p);
continue;
}
}
skb_header_pointer checks if the requested data is within the
linear area:
int hlen = skb_headlen(skb);
if (offset + len <= hlen)
return skb->data + offset;
When offset is within [INT_MAX-len+1..INT_MAX] the addition will
result in a negative number which is <= hlen.
I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.
This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.
Thanks to Thomas Vgtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch adds some ioctls to include/linux/compat_ioctl.h
to allow using ppdev from the 32 bit user space on sparc64.
This patch also adds the PPDEV option in the sparc64 menu, near Parallel
printer support in the 'General machine setup' submenu.
All those ioctls seem to be compatible, since (correct me if I'm wrong)
they dont use the 'long' type. See include/linux/ppdev.h.
The application I used to test the new ioctls only used the following:
PPEXCL
PPCLAIM
PPNEGOT
PPGETMODES
PPRCONTROL
PPWCONTROL
PPDATADIR
PPWDATA
PPRDATA
But I beleive that the other ioctls will work fine.
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Rob Punkunus <rpunkunus@nvidia.com>
Rob Punkunus recently submitted a patch to enable support for MCP51/MCP55 in
the amd74xx driver. This patch was whitespace-corrupted and didn't apply to
2.6.12 since MCP51 support was merged in the 2.6.12-rc series.
Gentoo would like to support this hardware for our upcoming release media, so
I fixed the patch, and here it is :)
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
audit_log() also takes an extra argument, although it's a vararg
function so the compiler didn't really notice.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
audit_log_start() seems to take 3 arguments, but its defined to take
only 2 when AUDIT is turned off.
security/selinux/avc.c:553:75: macro "audit_log_start" passed 3 arguments, but takes just 2
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The dynamic pci id logic has been bothering me for a while, and now that
I started to look into how to move some of this to the driver core, I
thought it was time to clean it all up.
It ends up making the code smaller, and easier to follow, and fixes a
few bugs at the same time (dynamic ids were not being matched
everywhere, and so could be missed on some call paths for new devices,
semaphore not needed to be grabbed when adding a new id and calling the
driver core, etc.)
I also renamed the function pci_match_device() to pci_match_id() as
that's what it really does.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch increases the number of resource pointers in the
pci_bus structure. This is needed to store >4 resource ranges
for host bridges and transparent PCI bridges. With this change,
all PCI buses will have more resource pointers, but most PCI
buses will only use the first 3 or 4, the remaining being NULL.
The PCI core already deals with this correctly.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No one was looking at the return value of bus_rescan_devices, and it
really wasn't anything that anyone in the kernel would ever care about.
So change it which enabled some counting code to be removed also.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add bus_find_device() and driver_find_device() which allow searching for a
device in the bus's resp. the driver's klist and obtain a reference on it.
Signed-off-by: Cornelia Huck <cohuck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code was wrong in several aspects. The locking order was
inconsistent, the device aquire code did not reset a variable
after a wakeup and the wakeup handling was not working for
applications where multiple chips are sharing a single
hardware controller.
When a hardware controller is available the locking is now
reduced to the hardware controller lock and the waitqueue is
moved to the hardware controller structure in order to avoid
a wake_up_all().
The problem was pointed out by Ben Dooks, who also found the
missing variable reset as main cause for his deadlock problem.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Anyone reporting a stuck IRQ should try these options. Its effectiveness
varies we've found in the Fedora case. Quite a few systems with misdescribed
IRQ routing just work when you use irqpoll. It also fixes up the VIA systems
although thats now fixed with the VIA quirk (which we could just make default
as its what Redmond OS does but Linus didn't like it historically).
A small number of systems have jammed IRQ sources or misdescribes that cause
an IRQ that we have no handler registered anywhere for. In those cases it
doesn't help.
Signed-off-by: Alan Cox <number6@the-village.bc.nu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_io_context needlessly turned off interrupts and checked for racing io
context creations. Both of which aren't needed, because the io context can
only be created while in process context of the current process.
Also, split the function in 2. A light version, current_io_context does not
elevate the reference count specifically, but can be used when in process
context, because the process holds a reference itself.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch for usb_ch9.h includes linux/types.h instead of asm/types.h so that
__le16 and so on is explicitly defined. It also cleans up non standard //
comment.
Signed-off-by: GOTO Masanori <gotom@debian.or.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch lets i2c-dev.h include linux/compiler.h so that __user is defined.
Signed-off-by: GOTO Masanori <gotom@debian.or.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like it sneaked back with the NFS ACL merge..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In file included from drivers/media/dvb/ttpci/av7110_hw.c:38:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_v4l.c:36:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_av.c:37:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
drivers/isdn/icn/icn.c:719:4: warning: #warning TODO test headroom or use skb->nb to flag ACK
In file included from drivers/media/dvb/ttpci/av7110_ca.c:39:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110.c:41:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
Does declaring a function to return a const value actually mean something to
gcc?
Dunno. Kill it and replace sone `__inline__'s with `inline' too.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
linux/etherdevice.h can't be included standalone at the moment, which
is required in order to sort the header files in the recommended
alphabetic order. This patch fixes that and is needed to build spider_net.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove two more unused IPV6_AUTHHDR option things,
which I failed to remove them last time,
plus, mark IPV6_AUTHHDR obsolete.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plug holes with padding fields and initialized them to zero.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the first step in properly handling the MCFG PCI table.
It defines the structures properly, and saves off the table so that the
pci mmconfig code can access it. It moves the parsing of the table a
little later in the boot process, but still before the information is
needed.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With CONFIG_PCI=n:
In file included from include/linux/pci.h:917,
from lib/iomap.c:6:
include/asm/pci.h:104: warning: `enum pci_dma_burst_strategy' declared inside parameter list
include/asm/pci.h:104: warning: its scope is only this definition or declaration, which is probably not what you want.
include/asm/pci.h: In function `pci_dma_burst_advice':
include/asm/pci.h:106: dereferencing pointer to incomplete type
include/asm/pci.h:106: `PCI_DMA_BURST_INFINITY' undeclared (first use in this function)
include/asm/pci.h:106: (Each undeclared identifier is reported only once
include/asm/pci.h:106: for each function it appears in.)
make[1]: *** [lib/iomap.o] Error 1
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After seeing, at best, "guesses" as to the following kind
of information in several drivers, I decided that we really
need a way for platforms to specifically give advice in this
area for what works best with their PCI controller implementation.
Basically, this new interface gives DMA bursting advice on
PCI. There are three forms of the advice:
1) Burst as much as possible, it is not necessary to end bursts
on some particular boundary for best performance.
2) Burst on some byte count multiple. A DMA burst to some multiple of
number of bytes may be done, but it is important to end the burst
on an exact multiple for best performance.
The best example of this I am aware of are the PPC64 PCI
controllers, where if you end a burst mid-cacheline then
chip has to refetch the data and the IOMMU translations
which hurts performance a lot.
3) Burst on a single byte count multiple. Bursts shall end
exactly on the next multiple boundary for best performance.
Sparc64 and Alpha's PCI controllers operate this way. They
disconnect any device which tries to burst across a cacheline
boundary.
Actually, newer sparc64 PCI controllers do not have this behavior.
That is why the "pdev" is passed into the interface, so I can
add code later to check which PCI controller the system is using
and give advice accordingly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is an updated version of Ben's fix-pci-mmap-on-ppc-and-ppc64.patch
which is in 2.6.12-rc4-mm1.
It fixes the patch to work on PPC iSeries, removes some debug printks
at Ben's request, and incorporates your
fix-pci-mmap-on-ppc-and-ppc64-fix.patch also.
Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch was discussed at length on linux-pci and so far, the last
iteration of it didn't raise any comment. It's effect is a nop on
architecture that don't define the new pci_resource_to_user() callback
anyway. It allows architecture like ppc who put weird things inside of
PCI resource structures to convert to some different value for user
visible ones. It also fixes mmap'ing of IO space on those archs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds PCI based I/O xAPIC hot-add support to ACPIPHP
driver. When PCI root bridge is hot-added, all PCI based I/O xAPICs
under the root bridge are hot-added by this patch. Hot-remove support
is TBD.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the following new interfaces for I/O xAPIC
hotplug. The implementation of these interfaces depends on each
architecture.
o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
u32 gsi_base);
This new interface is to add a new I/O xAPIC specified by
phys_addr and gsi_base pair. phys_addr is the physical address
to which the I/O xAPIC is mapped and gsi_base is global system
interrupt base of the I/O xAPIC. acpi_register_ioapic returns
0 on success, or negative value on error.
o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);
This new interface is to remove a I/O xAPIC specified by
gsi_base. acpi_unregister_ioapic returns 0 on success, or
negative value on error.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When you hot-plug a (root) bridge hierarchy, it may have p2p bridges and
devices attached to it that have not been configured by firmware. In this
case, we need to configure the devices before starting them. This patch
separates device start from device scan so that we can introduce the
configuration step in the middle.
I kept the existing semantics for pci_scan_bus() since there are a huge number
of callers to that function.
Also, I have no way of testing the changes I made to the parisc files, so this
needs review by those folks. Sorry for the massive cross-post, this touches
files in many different places.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The size of pointers may differ between (userspace) modpost and (kernelspace)
modules -- so fix mod_devicetable.h to reflect this possibility.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a card doesn't provide _any_ information about itself, assume it is a
so-called "anonymous" card. pcmciamtd will bind to it if it is configured to
do so.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add another match flag for devices needing a CIS override. The driver will
only probe/attach if the CIS has been replaced before.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The actual matching of pcmcia drivers and pcmcia devices. The original
version of this was written by David Woodhouse.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets you throw out the iteraid stuff that has ended up back in due
to stupid goings on in the IDE world. Its the same heavily tested code
shipped in Fedora/Red Hat products but without the other dependancies on
the Bartlomiej IDE layer.
Pre-requisite: the ide-disk patch I sent to handle pure LBA devices.
Obviously you lose things like hot unplug with the Bartlomiej IDE layer
at the moment but that won't matter to most users.
The patch does the following
- Add IT8211/12 to pci_ids.h
- Add Makefile/Kconfig entry
- Add it8212 driver
No core IDE code is touched by this diff
Embedded system testing and the ability to force raid mode off by David
Howells
Made possible by the ite reference code, documentation and also several
clarifications and pieces of assistance provided by ITE themselves
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following is the second version of the function return probe patches
I sent out earlier this week. Changes since my last submission include:
* Fix in ppc64 code removing an unneeded call to re-enable preemption
* Fix a build problem in ia64 when kprobes was turned off
* Added another BUG_ON check to each of the architecture trampoline
handlers
My initial patch description ==>
From my experiences with adding return probes to x86_64 and ia64, and the
feedback on LKML to those patches, I think we can simplify the design
for return probes.
The following patch tweaks the original design such that:
* Instead of storing the stack address in the return probe instance, the
task pointer is stored. This gives us all we need in order to:
- find the correct return probe instance when we enter the trampoline
(even if we are recursing)
- find all left-over return probe instances when the task is going away
This has the side effect of simplifying the implementation since more
work can be done in kernel/kprobes.c since architecture specific knowledge
of the stack layout is no longer required. Specifically, we no longer have:
- arch_get_kprobe_task()
- arch_kprobe_flush_task()
- get_rp_inst_tsk()
- get_rp_inst()
- trampoline_post_handler() <see next bullet>
* Instead of splitting the return probe handling and cleanup logic across
the pre and post trampoline handlers, all the work is pushed into the
pre function (trampoline_probe_handler), and then we skip single stepping
the original function. In this case the original instruction to be single
stepped was just a NOP, and we can do without the extra interruption.
The new flow of events to having a return probe handler execute when a target
function exits is:
* At system initialization time, a kprobe is inserted at the beginning of
kretprobe_trampoline. kernel/kprobes.c use to handle this on it's own,
but ia64 needed to do this a little differently (i.e. a function pointer
is really a pointer to a structure containing the instruction pointer and
a global pointer), so I added the notion of arch_init(), so that
kernel/kprobes.c:init_kprobes() now allows architecture specific
initialization by calling arch_init() before exiting. Each architecture
now registers a kprobe on it's own trampoline function.
* register_kretprobe() will insert a kprobe at the beginning of the targeted
function with the kprobe pre_handler set to arch_prepare_kretprobe
(still no change)
* When the target function is entered, the kprobe is fired, calling
arch_prepare_kretprobe (still no change)
* In arch_prepare_kretprobe() we try to get a free instance and if one is
available then we fill out the instance with a pointer to the return probe,
the original return address, and a pointer to the task structure (instead
of the stack address.) Just like before we change the return address
to the trampoline function and mark the instance as used.
If multiple return probes are registered for a given target function,
then arch_prepare_kretprobe() will get called multiple times for the same
task (since our kprobe implementation is able to handle multiple kprobes
at the same address.) Past the first call to arch_prepare_kretprobe,
we end up with the original address stored in the return probe instance
pointing to our trampoline function. (This is a significant difference
from the original arch_prepare_kretprobe design.)
* Target function executes like normal and then returns to kretprobe_trampoline.
* kprobe inserted on the first instruction of kretprobe_trampoline is fired
and calls trampoline_probe_handler() (no change here)
* trampoline_probe_handler() consumes each of the instances associated with
the current task by calling the registered handler function and marking
the instance as unused until an instance is found that has a return address
different then the trampoline function.
(change similar to my previous ia64 RFC)
* If the task is killed with some left-over return probe instances (meaning
that a target function was entered, but never returned), then we just
free any instances associated with the task. (Not much different other
then we can handle this without calling architecture specific functions.)
There is a known problem that this patch does not yet solve where
registering a return probe flush_old_exec or flush_thread will put us
in a bad state. Most likely the best way to handle this is to not allow
registering return probes on these two functions.
(Significant change)
This patch series applies to the 2.6.12-rc6-mm1 kernel, and provides:
* kernel/kprobes.c changes
* i386 patch of existing return probes implementation
* x86_64 patch of existing return probe implementation
* ia64 implementation
* ppc64 implementation (provided by Ananth)
This patch implements the architecture independant changes for a reworking
of the kprobes based function return probes design. Changes include:
* Removing functions for querying a return probe instance off a stack address
* Removing the stack_addr field from the kretprobe_instance definition,
and adding a task pointer
* Adding architecture specific initialization via arch_init()
* Removing extern definitions for the architecture trampoline functions
(this isn't needed anymore since the architecture handles the
initialization of the kprobe in the return probe trampoline function.)
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that PPC64 has no-execute support, here is a second try to fix the
single step out of line during kprobe execution. Kprobes on x86_64 already
solved this problem by allocating an executable page and using it as the
scratch area for stepping out of line. Reuse that.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is pass 2 of my patch to add pci domain info to an existing ioctl. This
time I insert the domain between dev_fn and board_id as Willy suggested and
change the var to unsigned short to ease Christoph's concerns. Although I
thought unsigned int was the correct var type for this. I also thought it
didn't matter where I inserted it in the structure.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a PCI ID I got wrong before. It also adds support for
another new SAS controller due out this summer. I didn't have a marketing
name prior to my last submission. Also modifies the copyright date range.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe at least for seccomp it's worth to turn off the tsc, not just for
HT but for the L2 cache too. So it's up to you, either you turn it off
completely (which isn't very nice IMHO) or I recommend to apply this below
patch.
This has been tested successfully on x86-64 against current cogito
repository (i686 compiles so I didn't bother testing ;). People selling
the cpu through cpushare may appreciate this bit for a peace of mind.
There's no way to get any timing info anymore with this applied
(gettimeofday is forbidden of course). The seccomp environment is
completely deterministic so it can't be allowed to get timing info, it has
to be deterministic so in the future I can enable a computing mode that
does a parallel computing for each task with server side transparent
checkpointing and verification that the output is the same from all the 2/3
seller computers for each task, without the buyer even noticing (for now
the verification is left to the buyer client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits but it'll be easy to extend once the basic mode is finished).
Eliminating a cold-cache read of the cr4 global variable will save one
cacheline during the tlb flush while making the code per-cpu-safe at the
same time. Thanks to Mikael Pettersson for noticing the tlb flush wasn't
per-cpu-safe.
The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
it'll be transparent to the switch_to code since the IPI won't make any
change to the cr4 contents from the point of view of the interrupted code
and since it's now all per-cpu stuff, it will not race. So no need to
disable irqs in switch_to slow path.
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes CONFIG_PMAC_PBOOK (PowerBook support). This is now
split into CONFIG_PMAC_MEDIABAY for the actual hotswap bay that some
powerbooks have, CONFIG_PM for power management related code, and just left
out of any CONFIG_* option for some generally useful stuff that can be used
on non-laptops as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This provides declarations for new requests, descriptors, and bitfields as
defined in the Wireless USB 1.0 spec. Device support will involve a new
"Wire Adapter" device class, connecting a USB Host to a cluster of wireless
USB devices. There will be two adapter types:
* Host Wireless Adapter (HWA): the downstream link is wireless, which
connects a wireless USB host to wireless USB devices (not unlike like
a hub) including to the second type of adapter.
* Device Wireless Adapter (DWA): the upstream link is wireless, for
connecting existing USB devices through wired links into the cluser.
All wireless USB devices will need persistent (and secure!) key storage, and
it's probable that Linux -- or device firmware -- will need to be involved
with that to bootstrap the initial secure key exchange.
Some user interface is required in that initial key exchange, and since the
most "hands-off" one is a wired USB link, I suspect wireless operation will
usually not be the only mode for wireless USB devices. (Plus, devices can
recharge batteries using wired USB...) All other key exchange protocols need
error prone user interactions, like copying and/or verifying keys.
It'll likely be a while before we have commercial Wireless USB hardware,
much less Linux implementations that know how to use it.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This updates most of the gadget framework to expect SETUP packets use
USB byteorder (matching the annotation in <linux/usb_ch9.h> and usage
in the host side stack):
- definition in <linux/usb_gadget.h>
- gadget drivers: Ethernet/RNDIS, serial/ACM, file_storage, gadgetfs.
- dummy_hcd
It also includes some other similar changes as suggested by "sparse",
which was used to detect byteorder bugs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch provides an "isp116x-hcd" driver for Philips'
ISP1160/ISP1161 USB host controllers.
The driver:
- is relatively small, meant for use on embedded platforms.
- runs usbtests 1-14 without problems for days.
- has been in use by 6-7 different people on ARM and PPC platforms,
running a range of devices including USB hubs.
- supports suspend/resume of both the platform device and the root hub;
supports remote wakeup of the root hub (but NOT the platform device)
by USB devices.
- does NOT support ISO transfers (nobody has asked for them).
- is PIO-only.
Signed-off-by: Olav Kongas <ok@artecdesign.ee>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Adjust slice values
- Instead of one async queue, one is defined per priority level. This
prevents kernel threads (such as reiserfs/x and others) that run at
higher io priority from conflicting with others. Previously, it was a
coin toss what io prio the async queue got, it was defined by who
first set up the queue.
- Let a time slice only begin, when the previous slice is completely
done. Previously we could be somewhat unfair to a new sync slice, if
the previous slice was async and had several ios queued. This might
need a little tweaking if throughput suffers a little due to this,
allowing perhaps an overlap of a single request or so.
- Optimize the calling of kblockd_schedule_work() by doing it only when
it is strictly necessary (no requests in driver and work left to do).
- Correct sync vs async logic. A 'normal' process can be purely async as
well, and a flusher can be purely sync as well. Sync or async is now a
property of the class defined and requests pending. Previously writers
could be considered sync, when they were really async.
- Get rid of the bit fields in cfqq and crq, use flags instead.
- Various other cleanups and fixes
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add separate files for the different 8250 ISA-based serial boards.
Looking across all the various architectures, it seems reasonable that
we can key the availability of the configuration options for these
beasts to the bus-related symbols (iow, CONFIG_ISA). We also standardise
the base baud/uart clock rate for these boards - I'm sure that isn't
architecture specific, but is solely dependent on the crystal fitted
on the board (which should be the same no matter what type of machine
its fitted into.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We're using __be16 in userland visible types, so we
have to include asm/byteorder.h so that works.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for alternate slave selection algorithms to bonding
balance-xor and 802.3ad modes. Default mode (what we have now: xor of
MAC addresses) is "layer2", new choice is "layer3+4", using IP and port
information for hashing to select peer.
Originally submitted by Jason Gabler for balance-xor mode;
modified by Jay Vosburgh to additionally support 802.3ad mode. Jason's
original comment is as follows:
The attached patch to the Linux Etherchannel Bonding driver modifies the
driver's "balance-xor" mode as follows:
- alternate hashing policy support for mode 2
* Added kernel parameter "xmit_policy" to allow the specification
of different hashing policies for mode 2. The original mode 2
policy is the default, now found in xmit_hash_policy_layer2().
* Added xmit_hash_policy_layer34()
This patch was inspired by hashing policies implemented by Cisco,
Foundry and IBM, which are explained in
Foundry documentation found at:
http://www.foundrynet.com/services/documentation/sribcg/Trunking.html#112750
Signed-off-by: Jason Gabler <jygabler@lbl.gov>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global code static
- remove the following unused global functions:
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
- remove the following unused EXPORT_SYMBOL's:
- blk_phys_contig_segment
- blk_hw_contig_segment
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make the needlessly global function __nvram_set_checksum static
- #if 0 the unused global function nvram_set_checksum
- remove the EXPORT_SYMBOL's for both functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes use of ALIGN() to remove duplicate round-up code.
Signed-off-by: Nick Wilson <njw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Following patch provides purely cosmetic changes and corrects CodingStyle
guide lines related certain issues like below in kexec related files
o braces for one line "if" statements, "for" loops,
o more than 80 column wide lines,
o No space after "while", "for" and "switch" key words
o Changes:
o take-2: Removed the extra tab before "case" key words.
o take-3: Put operator at the end of line and space before "*/"
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Makes kexec_crashdump() take a pt_regs * as an argument. This allows to
get exact register state at the point of the crash. If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.
This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.
die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Vivek Goyal" <vgoyal@in.ibm.com>
o Support for /proc/vmcore interface. This interface exports elf core image
either in ELF32 or ELF64 format, depending on the format in which elf headers
have been stored by crashed kernel.
o Added support for CONFIG_VMCORE config option.
o Removed the dependency on /proc/kcore.
From: "Eric W. Biederman" <ebiederm@xmission.com>
This patch has been refactored to more closely match the prevailing style in
the affected files. And to clearly indicate the dependency between
/proc/kcore and proc/vmcore.c
From: Hariprasad Nellitheertha <hari@in.ibm.com>
This patch contains the code that provides an ELF format interface to the
previous kernel's memory post kexec reboot.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for retrieving the address of elf core header if one
is passed in command line.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch retrieves the max_pfn being used by previous kernel and stores it
in a safe location (saved_max_pfn) before it is overwritten due to user
defined memory map. This pfn is used to make sure that user does not try to
read the physical memory beyond saved_max_pfn.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add kexec support for s390 architecture.
From: Milton Miller <miltonm@bga.com>
- Fix passing of first argument to relocate_kernel assembly.
- Fix Kconfig description.
- Remove wrong comment and comments that describe obvious things.
- Allow only KEXEC_TYPE_DEFAULT as image type -> dump not supported.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the architecture independent implementation the
sys_kexec_load, the compat_sys_kexec_load system calls.
Kexec on panic support has been integrated into the core patch and is
relatively clean.
In addition the hopefully architecture independent option
crashkernel=size@location has been docuemented. It's purpose is to reserve
space for the panic kernel to live, and where no DMA transfer will ever be
setup to access.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The
3 models can be selected from a new menu:
(X) No Forced Preemption (Server)
( ) Voluntary Kernel Preemption (Desktop)
( ) Preemptible Kernel (Low-Latency Desktop)
we still default to the stock (Server) preemption model.
Voluntary preemption works by adding a cond_resched()
(reschedule-if-needed) call to every might_sleep() check. It is lighter
than CONFIG_PREEMPT - at the cost of not having as tight latencies. It
represents a different latency/complexity/overhead tradeoff.
It has no runtime impact at all if disabled. Here are size stats that show
how the various preemption models impact the kernel's size:
text data bss dec hex filename
3618774 547184 179896 4345854 424ffe vmlinux.stock
3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2%
3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5%
voluntary-preempt is +0.2% of .text, preempt is +3.5%.
This feature has been tested for many months by lots of people (and it's
also included in the RHEL4 distribution and earlier variants were in Fedora
as well), and it's intended for users and distributions who dont want to
use full-blown CONFIG_PREEMPT for one reason or another.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches add dynamic sched domains functionality that was
extensively discussed on lkml and lse-tech. I would like to see this added to
-mm
o The main advantage with this feature is that it ensures that the scheduler
load balacing code only balances against the cpus that are in the sched
domain as defined by an exclusive cpuset and not all of the cpus in the
system. This removes any overhead due to load balancing code trying to
pull tasks outside of the cpu exclusive cpuset only to be prevented by
the tasks' cpus_allowed mask.
o cpu exclusive cpusets are useful for servers running orthogonal
workloads such as RT applications requiring low latency and HPC
applications that are throughput sensitive
o It provides a new API partition_sched_domains in sched.c
that makes dynamic sched domains possible.
o cpu_exclusive cpusets sets are now associated with a sched domain.
Which means that the users can dynamically modify the sched domains
through the cpuset file system interface
o ia64 sched domain code has been updated to support this feature as well
o Currently, this does not support hotplug. (However some of my tests
indicate hotplug+preempt is currently broken)
o I have tested it extensively on x86.
o This should have very minimal impact on performance as none of
the fast paths are affected
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate balance-on-exec with balance-on-fork. This is made easy by the
sched-domains RCU patches.
As well as the general goodness of code reduction, this allows the runqueues
to be unlocked during balance-on-fork.
schedstats is a problem. Maybe just have balance-on-event instead of
distinguishing fork and exec?
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of requiring architecture code to interact with the scheduler's
locking implementation, provide a couple of defines that can be used by the
architecture to request runqueue unlocked context switches, and ask for
interrupts to be enabled over the context switch.
Also replaces the "switch_lock" used by these architectures with an oncpu
flag (note, not a potentially slow bitflag). This eliminates one bus
locked memory operation when context switching, and simplifies the
task_running function.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do some basic initial tuning.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reimplement the balance on exec balancing to be sched-domains aware. Use this
to also do balance on fork balancing. Make x86_64 do balance on fork over the
NUMA domain.
The problem that the non sched domains aware blancing became apparent on dual
core, multi socket opterons. What we want is for the new tasks to be sent to
a different socket, but more often than not, we would first load up our
sibling core, or fill two cores of a single remote socket before selecting a
new one.
This gives large improvements to STREAM on such systems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
going against the direction we are trying to go. Hopefully we can regain
performance through other methods.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do CPU load averaging over a number of different intervals. Allow each
interval to be chosen by sending a parameter to source_load and target_load.
0 is instantaneous, idx > 0 returns a decaying average with the most recent
sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased).
So generally a higher number will result in more conservative balancing.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not
all) in comments. This patch changes these synchronize_kernel() calls (and
comments) to synchronize_rcu() or synchronize_sched() as follows:
- arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to
handle races with machine-check exceptions (synchronize_rcu() would not cut
it given RCU implementations intended for hardcore realtime use.
- drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to
handle races with i8042_interrupt() interrupt handler. Again,
synchronize_rcu() would not cut it given RCU implementations intended for
hardcore realtime use.
- include/*/kdebug.h comments: change to synchronize_sched() to handle races
with NMIs. As before, synchronize_rcu() would not cut it...
- include/linux/list.h comment: change to synchronize_rcu(), since this
comment is for list_del_rcu().
- security/keys/key.c unregister_key_type(): change to synchronize_rcu(),
since this is interacting with RCU read side.
- security/keys/process_keys.c install_session_keyring(): change to
synchronize_rcu(), since this is interacting with RCU read side.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4
SMP patch.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds __cpuinit and __cpuinitdata sections that need to exist past
boot to support cpu hotplug.
Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
Andi Kleen. Much of the generic hotplug code in kernel, and none of the other
archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont
mark sections with __cpuinit, but only mark them as __devinit, and
__devinitdata.
If someone is motivated to change generic code, we need to make sure all
existing hotplug code does not break, on other arch's that dont use __cpuinit,
and __cpudevinit.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I really wish smp_prepare_cpu() would disappear eventually. In the interim
this is ideally a weak function, so we dont end up changing several places
to define this dummy in headers.
Today since the dummy declaration is done only in drivers/base/cpu.c but
the function is called in kernel/power/smp.c i get undefined reference in
my cpu hotplug code for x86_64 under development.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I8K: Change to use stock dmi infrastructure instead of homegrown
parsing code. The driver now requires box's DMI data to match
list of supported models so driver can be safely compiled-in
by default without fear of it poking into random SMM BIOS
code. DMI checks can be ignored with i8k.ignore_dmi option.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes sparse warnings in the qnx4fs (and might even make
qnx4fs work on big-endian boxes)
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't look up the task by its pid and then use the syscall filtering
helper. Just implement our own filter helper which operates solely on
the information in the netlink_skb_parms.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reworks filemap_xip.c with the goal to reduce code duplication
from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of
implementing the aio functions, this one implements the synchronous
read/write functions only. For readv and writev, the generic fallback is
used. For aio, we rely on the application doing the fallback. Since our
"synchronous" function does memcpy immediately anyway, there is no
performance difference between using the fallbacks or implementing each
operation.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the ext2 related parts. Ext2 now uses the xip_* file operations
along with the get_xip_page aop when mounted with -o xip.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- generic_file* file operations do no longer have a xip/non-xip split
- filemap_xip.c implements a new set of fops that require get_xip_page
aop to work proper. all new fops are exported GPL-only (don't like to
see whatever code use those except GPL modules)
- __xip_unmap now uses page_check_address, which is no longer static
in rmap.c, and defined in linux/rmap.h
- mm/filemap.h is now much more clean, plainly having just Linus'
inline funcs moved here from filemap.c
- fix includes in filemap_xip to make it build cleanly on i386
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the block device related part. The block device operation
direct_access now has a struct block_device as first parameter.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, which determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer verson than that
provided by the kernel package installed), and to automatically compile
and install a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir.
It may be changed any time, but is used only on startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the code to create and remove client subdirectories from the
recovery directory, as described in the previous patch comment.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFSv4 clients are required to know what state they have on the server so that
they can reclaim it on server reboot. However, it is possible for
pathalogical combinations of server reboots and network partitions to leave a
client in a state where it cannot know whether it has lost its state on the
server.
For this reason, rfc3530 requires that we store some information about clients
to stable storage.
So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for
each client with active state. We leave open the possibility of including
files underneath each such subdirectory with information about the client, but
for now the subdirectories are empty.
We create a client subdirectory whenever a client makes its first non-reclaim
open_confirm.
We remove a client subdirectory whenever either
a) its lease expires, or
b) the grace period ends without it reclaiming anything.
When handling reclaims, we allow the reclaim if and only if the client doing
the reclaim has a subdirectory.
This patch adds just the code to scan the recovery directory on nfsd startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cb_parsed field is only used by probe_callback, to determine whether the
callback information has been filled in by setclientid. But there is no way
that probe_callback() can be called without that having already happened, so
that check is superfluous, as is cb_parsed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Already done for struct nfs4_file; go ahead and do it for the other nfsd4
state structures.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make needlessly global code static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For the purposes of reboot recovery we keep a directory with subdirectories
each having a name that is the ascii hex representation of the md5 sum of a
client identifier for an active client.
This adds the code to calculate that name. We also use it for the purposes of
comparing clients, so if someone ever manages to find two client names that
are md5 collisions, then we'll return clid_inuse to the second.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adopt standard kernel style by defining a no-op function instead of putting
ifdef's in the code where the function is called.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Separate out stuff that needs initialization on startup from stuff that only
needs initialization on module init from static data.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somewhat gratuitous rename to simplify following patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow recovery of delegations after reboot.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a struct kref to each nfs4_file and take a reference to it from each
stateid and delegation that refers to it. The atomicity guarantees are
overkill given that all this stuff is done under the single nfsd4 state lock,
but a) we'd like finer-grained locking some day, and b) this simplifies the
cleanup of the structures a bit, something that has previously been a bit
complicated and bug-prone.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Go ahead and do this for struct nfs4_file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're returning NFS4_FH_NOEXPIRE_WITH_OPEN | NFS4_FH_VOL_RENAME for the
fh_expire_type attribute. This is incorrect:
1. The spec actually only allows NOEXPIRE_WITH_OPEN when
VOLATILE_ANY is also set.
2. Filehandles for open files can expire, if the file is removed
and there is a reboot.
3. Filehandles are only volatile on rename in the nosubtree check
case.
Unfortunately, there's no way to indicate that we only expire on remove. So
our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set
FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually
cause problems in practice and it seems possibly useful to give clients some
way to distinguish that case.
Fix a mispelled #define while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lindent run and replaced printk() through the corresponding osm_*() function
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Added header "core.h" for i2o_core.ko internal definitions
- More sparse fixes
- Changed display of TID's in sysfs attributes from XXX to 0xXXX
- Use the right functions for accessing I/O and normal memory
- Removed error handling of SCSI device errors and let the SCSI layer
take care of it
- Added new device / removed device handling to SCSI-OSM
- Make status access volatile
- Cleaned up activation of I2O controller
- Removed unnecessary wmb() and rmb() calls
- Use own struct i2o_io for I/O memory instead of struct i2o_dma
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Provide SG_IO access to BLOCK and EXECUTIVE class on Adaptec
controllers
- Use PRIVATE messages in SCSI-OSM because on some controllers normal
SCSI class commands like READ or READ CAPACITY cause errors
- Use new DMA and SG list creation function
- Added workaround to limit sectors per request for Adaptec 2400A
controllers
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Added Bus-OSM which could be used by user space programs to reset a
channel on the controller
- Make ioctl's in Config-OSM obsolete in prefer for sysfs attributes and
move those to its own file
- Added sysfs attribute for firmware read and write access for I2O
controllers
- Added special handling of firmware read and write access for Adaptec
controllers
- Added vendor id and product id as sysfs-attribute to Executive classes
- Added automatic notification of LCT change handling to Exec-OSM
- Added flushing function to Block-OSM for later barrier implementation
- Use PRIVATE messages for Block access on Adaptec controllers, which are
faster then BLOCK class access
- Cleaned up support for Promise controller
- New messages are now detected using the IRQ status register as
suggested by the I2O spec
- Added i2o_dma_high() and i2o_dma_low() functions
- Added facility for SG tablesize calculation when using 32-bit and
64-bit DMA addresses
- Added i2o_dma_map_single() and i2o_dma_map_sg() which could build the
SG list for 32-bit as well as 64-bit DMA addresses
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Removed unnecessary checking of NULL before calling kfree()
- Make some functions static
- Changed pr_debug() into osm_debug()
- Use i2o_msg_in_to_virt() for getting a pointer to the message frame
- Cleaned up some comments
- Changed some le32_to_cpu() into readl() where necessary
- Make error messages of OSM's look the same
- Cleaned up error handling in i2o_block_end_request()
- Removed unused error handling of failed messages in Block-OSM, which
are not allowed by the I2O spec
- Corrected the blocksize detection in i2o_block
- Added hrt and lct sysfs-attribute to controller
- Call done() function in SCSI-OSM after freeing DMA buffers
- Removed unneeded variable for message size calculation in
i2o_scsi_queuecommand()
- Make some changes to remove sparse warnings
- Reordered some functions
- Cleaned up controller initialization
- Replaced some magic numbers by defines
- Removed unnecessary dma_sync_single_for_cpu() call on coherent DMA
- Removed some unused fields in i2o_controller and removed some unused
functions
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Fixed sysfs bug where user and parent links where added to the I2O
device itself
- Fixed bug when calculating TID for the event handler and cleaned up the
workflow of i2o_driver_dispatch()
- Fixed oops when no I2O device could be found for an event delivered to
Exec-OSM
- Fixed initialization of spinlock in Exec-OSM
- Fixed memory leak in i2o_cfg_passthru() and i2o_cfg_passthru()
- Removed MTRR support
- Added PCI ID of Promise SX6000 with firmware >= 1.20.x.x
- Turn of caching for ioremapped memory of in_queue
- Added initialization sequence for Promise controllers
- Moved definition of u8 / u16 / u32 for raidutils before first use
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for TPMs on additional LPC buses.
Signed-off-by: Kylene Hall <kjhall@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch to adds "power cycle" functionality to the IPMI power off module
ipmi_poweroff. It also contains changes to support procfs control of the
feature.
The power cycle action is considered an optional chassis control in the IPMI
specification. However, it is definitely useful when the hardware supports
it. A power cycle is usually required in order to reset a firmware in a bad
state. This action is critical to allow remote management of servers.
The implementation adds power cycle as optional to the ipmi_poweroff module.
It can be modified dynamically through the proc entry mentioned above. During
a power down and enabled, the power cycle command is sent to the BMC firmware.
If it fails either due to non-support or some error, it will retry to send
the command as power off.
Signed-off-by: Christopher A. Poblete <Chris_Poblete@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve space
for a quota operation in a transaction only if filesystem was mounted with
some quota option.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve a space
for a quota operation in a transaction only if filesystem was mounted with
some quota options.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve estimates on the number of needed credits for quota transaction.
Now we distinguish blocks that might need to be allocated and blocks that
only need to be rewritten. Also we distinguish deleting of a quota
structure and creating of a new one.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
XFS will have to look at iocb->private to fix aio+dio. No other filesystem
is using the blockdev_direct_IO* end_io callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes the following changes:
(1) There's a new special key type called ".request_key_auth".
This is an authorisation key for when one process requests a key and
another process is started to construct it. This type of key cannot be
created by the user; nor can it be requested by kernel services.
Authorisation keys hold two references:
(a) Each refers to a key being constructed. When the key being
constructed is instantiated the authorisation key is revoked,
rendering it of no further use.
(b) The "authorising process". This is either:
(i) the process that called request_key(), or:
(ii) if the process that called request_key() itself had an
authorisation key in its session keyring, then the authorising
process referred to by that authorisation key will also be
referred to by the new authorisation key.
This means that the process that initiated a chain of key requests
will authorise the lot of them, and will, by default, wind up with
the keys obtained from them in its keyrings.
(2) request_key() creates an authorisation key which is then passed to
/sbin/request-key in as part of a new session keyring.
(3) When request_key() is searching for a key to hand back to the caller, if
it comes across an authorisation key in the session keyring of the
calling process, it will also search the keyrings of the process
specified therein and it will use the specified process's credentials
(fsuid, fsgid, groups) to do that rather than the calling process's
credentials.
This allows a process started by /sbin/request-key to find keys belonging
to the authorising process.
(4) A key can be read, even if the process executing KEYCTL_READ doesn't have
direct read or search permission if that key is contained within the
keyrings of a process specified by an authorisation key found within the
calling process's session keyring, and is searchable using the
credentials of the authorising process.
This allows a process started by /sbin/request-key to read keys belonging
to the authorising process.
(5) The magic KEY_SPEC_*_KEYRING key IDs when passed to KEYCTL_INSTANTIATE or
KEYCTL_NEGATE will specify a keyring of the authorising process, rather
than the process doing the instantiation.
(6) One of the process keyrings can be nominated as the default to which
request_key() should attach new keys if not otherwise specified. This is
done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_*
constants. The current setting can also be read using this call.
(7) request_key() is partially interruptible. If it is waiting for another
process to finish constructing a key, it can be interrupted. This permits
a request-key cycle to be broken without recourse to rebooting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes it possible to pass a session keyring through to the
process spawned by call_usermodehelper(). This allows patch 3/3 to pass an
authorisation key through to /sbin/request-key, thus permitting better access
controls when doing just-in-time key creation.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch changes the key implementation in a number of ways:
(1) It removes the spinlock from the key structure.
(2) The key flags are now accessed using atomic bitops instead of
write-locking the key spinlock and using C bitwise operators.
The three instantiation flags are dealt with with the construction
semaphore held during the request_key/instantiate/negate sequence, thus
rendering the spinlock superfluous.
The key flags are also now bit numbers not bit masks.
(3) The key payload is now accessed using RCU. This permits the recursive
keyring search algorithm to be simplified greatly since no locks need be
taken other than the usual RCU preemption disablement. Searching now does
not require any locks or semaphores to be held; merely that the starting
keyring be pinned.
(4) The keyring payload now includes an RCU head so that it can be disposed
of by call_rcu(). This requires that the payload be copied on unlink to
prevent introducing races in copy-down vs search-up.
(5) The user key payload is now a structure with the data following it. It
includes an RCU head like the keyring payload and for the same reason. It
also contains a data length because the data length in the key may be
changed on another CPU whilst an RCU protected read is in progress on the
payload. This would then see the supposed RCU payload and the on-key data
length getting out of sync.
I'm tempted to drop the key's datalen entirely, except that it's used in
conjunction with quota management and so is a little tricky to get rid
of.
(6) Update the keys documentation.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Finds a pattern in the skb data according to the specified
textsearch configuration. Use textsearch_next() to retrieve
subsequent occurrences of the pattern. Returns the offset
to the first occurrence or UINT_MAX if no match was found.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements sequential reading for both linear and non-linear
skb data at zerocopy cost. The data is returned in chunks of
arbitary length, therefore random access is not possible.
Usage:
from := 0
to := 128
state := undef
data := undef
len := undef
consumed := 0
skb_prepare_seq_read(skb, from, to, &state)
while (len = skb_seq_read(consumed, &data, &state)) != 0 do
/* do something with 'data' of length 'len' */
if abort then
/* abort read if we don't wait for
* skb_seq_read() to return 0 */
skb_abort_seq_read(&state)
return
endif
/* not necessary to consume all of 'len' */
consumed += len
done
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
A finite state machine consists of n states (struct ts_fsm_token)
representing the pattern as a finite automation. The data is read
sequentially on a octet basis. Every state token specifies the number
of recurrences and the type of value accepted which can be either a
specific character or ctype based set of characters. The available
type of recurrences include 1, (0|1), [0 n], and [1 n].
The algorithm differs between strict/non-strict mode specyfing
whether the pattern has to start at the first octect. Strict mode
is enabled by default and can be disabled by inserting
TS_FSM_HEAD_IGNORE as the first token in the chain.
The runtime performance of the algorithm should be around O(n),
however while in strict mode the average runtime can be better.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The textsearch infrastructure provides text searching
facitilies for both linear and non-linear data.
Individual search algorithms are implemented in modules
and chosen by the user.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow using setsockopt to set TCP congestion control to use on a per
socket basis.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate out the two uses of netdev_max_backlog. One controls the
upper bound on packets processed per softirq, the new name for this is
netdev_budget; the other controls the limit on packets queued via
netif_rx.
Increase the max_backlog default to account for faster processors.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the throttling behaviour when the netif receive queue fills
because it behaves badly when using high speed networks under load.
The throttling cause multiple packet drops that cause TCP to go into
slow start mode. The same effective patch has been part of BIC TCP and
H-TCP as well as part of Web100.
The existing code drops 100's of packets when the queue fills;
this changes it to individual packet drop-tail.
Signed-off-by: Stephen Hemmminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the congestion sensing mechanism from netif_rx, and always
return either full or empty. Almost no driver checks the return value
from netif_rx, and those that do only use it for debug messages.
The original design of netif_rx was to do flow control based on the
receive queue, but NAPI has supplanted this and no driver uses the
feedback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove last vestiages of fastroute code that is no longer used.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhancement to the tcp_diag interface used by the iproute2 ss command
to report the tcp congestion control being used by a socket.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules. The legacy "new RENO" algorithm is used as a starting
point and fallback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's a bit strange to see tty_register_ldisc call in modules' exit
functions.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the upcoming aio_down patch, it is useful to store a private data
pointer in the kiocb's wait_queue. Since we provide our own wake up
function and do not require the task_struct pointer, it makes sense to
convert the task pointer into a generic private pointer.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file duplicates <linux/posix_acl_xattr.h>, using slightly different
names.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch removes the f_error field and all checks of f_error.
Trond said:
f_error was introduced for NFS, and made sense when we were guaranteed
always to have a file pointer around when write errors occurred. Since
then, we have (for various reasons) had to introduce the nfs_open_context in
order to track the file read/write state, and it made sense to move our
f_error tracking there too.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows block device drivers to convert their ioctl functions to
unlocked_ioctl() like character devices and other subsystems. All
functions that were called with the BKL held before are still used that
way, but I would not be surprised if it could be removed from the ioctl
functions in drivers/block/ioctl.c themselves.
As a side note, I found that compat_blkdev_ioctl() acquires the BKL as
well, which looks like a bug. I have checked that every user of
disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so
it could easily be removed from compat_blkdev_ioctl().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch improves write performance for the CD/DVD packet writing driver.
The logic for switching between reading and writing has been changed so
that streaming writes are no longer interrupted by read requests.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This
is problematic for execution of 32 bit processes, which are not largefile
aware, either by SW emulation or by HW execution.
For such processes, the problem is two-fold:
1) When trying to open a file that is larger than 4G
the operation should fail, but it's not
2) Writing to offset larger than 4G should fail, but
it's not
The proposed patch takes advantage of the way 32 bit processes are
identified in ia64 systems. Such processes have PER_LINUX32 for their
personality. With the patch, the ia64 kernel will not enforce the
O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior
for all other architectures remains unchanged.
Signed-off-by: Yoav Zach <yoav.zach@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In situations where a kprobes handler calls a routine which has a probe on it,
then kprobes_handler() disarms the new probe forever. This patch removes the
above limitation by temporarily disarming the new probe. When the another
probe hits while handling the old probe, the kprobes_handler() saves previous
kprobes state and handles the new probe without calling the new kprobes
registered handlers. kprobe_post_handler() restores back the previous kprobes
state and the normal execution continues.
However on x86_64 architecture, re-rentrancy is provided only through
pre_handler(). If a routine having probe is referenced through
post_handler(), then the probes on that routine are disarmed forever, since
the exception stack is gets changed after the processor single steps the
instruction of the new probe.
This patch includes generic changes to support temporary disarming on
reentrancy of probes.
Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().
Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time. The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address. This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:
*p->addr = BREAKPOINT_INSTRUCTION;
The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:
* void arch_arm_kprobe(struct kprobe *p)
* void arch_disarm_kprobe(struct kprobe *p)
and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).
I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So... I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.
So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds function-return probes to kprobes for the i386
architecture. This enables you to establish a handler to be run when a
function returns.
1. API
Two new functions are added to kprobes:
int register_kretprobe(struct kretprobe *rp);
void unregister_kretprobe(struct kretprobe *rp);
2. Registration and unregistration
2.1 Register
To register a function-return probe, the user populates the following
fields in a kretprobe object and calls register_kretprobe() with the
kretprobe address as an argument:
kp.addr - the function's address
handler - this function is run after the ret instruction executes, but
before control returns to the return address in the caller.
maxactive - The maximum number of instances of the probed function that
can be active concurrently. For example, if the function is non-
recursive and is called with a spinlock or mutex held, maxactive = 1
should be enough. If the function is non-recursive and can never
relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
be enough. maxactive is used to determine how many kretprobe_instance
objects to allocate for this particular probed function. If maxactive <=
0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
NR_CPUS) else maxactive=NR_CPUS)
For example:
struct kretprobe rp;
rp.kp.addr = /* entrypoint address */
rp.handler = /*return probe handler */
rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
register_kretprobe(&rp);
The following field may also be of interest:
nmissed - Initialized to zero when the function-return probe is
registered, and incremented every time the probed function is entered but
there is no kretprobe_instance object available for establishing the
function-return probe (i.e., because maxactive was set too low).
2.2 Unregister
To unregiter a function-return probe, the user calls
unregister_kretprobe() with the same kretprobe object as registered
previously. If a probed function is running when the return probe is
unregistered, the function will return as expected, but the handler won't
be run.
3. Limitations
3.1 This patch supports only the i386 architecture, but patches for
x86_64 and ppc64 are anticipated soon.
3.2 Return probes operates by replacing the return address in the stack
(or in a known register, such as the lr register for ppc). This may
cause __builtin_return_address(0), when invoked from the return-probed
function, to return the address of the return-probes trampoline.
3.3 This implementation uses the "Multiprobes at an address" feature in
2.6.12-rc3-mm3.
3.4 Due to a limitation in multi-probes, you cannot currently establish
a return probe and a jprobe on the same function. A patch to remove
this limitation is being tested.
This feature is required by SystemTap (http://sourceware.org/systemtap),
and reflects ideas contributed by several SystemTap developers, including
Will Cohen and Ananth Mavinakayanahalli.
Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some code duplicated in both callers into vfs_quota_on_mount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads
of /proc/devices from spilling over the provided page if more than 4096
bytes of string data are generated from all the registered character and
block devices in a system
Signed-off-by: Neil Horman <nhorman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like locking can be optimised quite a lot. Increase lock widths
slightly so lo_lock is taken fewer times per request. Also it was quite
trivial to cover lo_pending with that lock, and remove the atomic
requirement. This also makes memory ordering explicitly correct, which is
nice (not that I particularly saw any mem ordering bugs).
Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
block size, 4K page size). System is 2 socket Xeon with HT (4 thread).
intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh
Before:
0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k
After:
0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch creates a new kstrdup library function and changes the "local"
implementations in several places to use this function.
Most of the changes come from the sound and net subsystems. The sound part
had already been acknowledged by Takashi Iwai and the net part by David S.
Miller.
I left UML alone for now because I would need more time to read the code
carefully before making changes there.
Signed-off-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
There is a race condition that can occur if an inode is allocated and then
released (using iput) during the ->fill_super functions. The race
condition is between kswapd and mount.
For most filesystems this can only happen in an error path when kswapd is
running concurrently. For isofs, however, the error can occur in a more
common code path (which is how the bug was found).
The logic here is "we want final iput() to free inode *now* instead of
letting it sit in cache if fs is going down or had not quite come up". The
problem is with kswapd seeing such inodes in the middle of being killed and
happily taking over.
The clean solution would be to tell kswapd to leave those inodes alone and
let our final iput deal with them. I.e. add a new flag
(I_FORCED_FREEING), set it before write_inode_now() there and make
prune_icache() leave those alone.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch splits del_timer_sync() into 2 functions. The new one,
try_to_del_timer_sync(), returns -1 when it hits executing timer.
It can be used in interrupt context, or when the caller hold locks which
can prevent completion of the timer's handler.
NOTE. Currently it can't be used in interrupt context in UP case, because
->running_timer is used only with CONFIG_SMP.
Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
set_running_timer(), it is cheap.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch tries to solve following problems:
1. del_timer_sync() is racy. The timer can be fired again after
del_timer_sync have checked all cpus and before it will recheck
timer_pending().
2. It has scalability problems. All cpus are scanned to determine
if the timer is running on that cpu.
With this patch del_timer_sync is O(1) and no slower than plain
del_timer(pending_timer), unless it has to actually wait for
completion of the currently running timer.
The only restriction is that the recurring timer should not use
add_timer_on().
3. The timers are not serialized wrt to itself.
If CPU_0 does mod_timer(jiffies+1) while the timer is currently
running on CPU 1, it is quite possible that local interrupt on
CPU_0 will start that timer before it finished on CPU_1.
4. The timers locking is suboptimal. __mod_timer() takes 3 locks
at once and still requires wmb() in del_timer/run_timers.
The new implementation takes 2 locks sequentially and does not
need memory barriers.
Currently ->base != NULL means that the timer is pending. In that case
->base.lock is used to lock the timer. __mod_timer also takes timer->lock
because ->base can be == NULL.
This patch uses timer->entry.next != NULL as indication that the timer is
pending. So it does __list_del(), entry->next = NULL instead of list_del()
when the timer is deleted.
The ->base field is used for hashed locking only, it is initialized
in init_timer() which sets ->base = per_cpu(tvec_bases). When the
tvec_bases.lock is locked, it means that all timers which are tied
to this base via timer->base are locked, and the base itself is locked
too.
So __run_timers/migrate_timers can safely modify all timers which could
be found on ->tvX lists (pending timers).
When the timer's base is locked, and the timer removed from ->entry list
(which means that _run_timers/migrate_timers can't see this timer), it is
possible to set timer->base = NULL and drop the lock: the timer remains
locked.
This patch adds lock_timer_base() helper, which waits for ->base != NULL,
locks the ->base, and checks it is still the same.
__mod_timer() schedules the timer on the local CPU and changes it's base.
However, it does not lock both old and new bases at once. It locks the
timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
and adds this timer. This simplifies the code, because AB-BA deadlock is not
possible. __mod_timer() also ensures that the timer's base is not changed
while the timer's handler is running on the old base.
__run_timers(), del_timer() do not change ->base anymore, they only clear
pending flag.
So del_timer_sync() can test timer->base->running_timer == timer to detect
whether it is running or not.
We don't need timer_list->lock anymore, this patch kills it.
We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
before clearing timer's pending flag. It was needed because __mod_timer()
did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
could race with del_timer()->list_del(). With this patch these functions are
serialized through base->lock.
One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
adds global
struct timer_base_s {
spinlock_t lock;
struct timer_list *running_timer;
} __init_timer_base;
which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
struct are replaced by struct timer_base_s t_base.
It is indeed ugly. But this can't have scalability problems. The global
__init_timer_base.lock is used only when __mod_timer() is called for the first
time AND the timer was compile time initialized. After that the timer migrates
to the local CPU.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
blk_queue_tag->real_max_depth was used to optimize out unnecessary
allocations/frees on tag resize. However, the whole thing was very broken -
tag_map was never allocated to real_max_depth resulting in access beyond the
end of the map, bits in [max_depth..real_max_depth] were set when initializing
a map and copied when resizing resulting in pre-occupied tags.
As the gain of the optimization is very small, well, almost nill, remove the
whole thing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to allocate the control structures for for ide devices on the node of
the device itself (for NUMA systems). The patch depends on the Slab API
change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
posted today.
Does some realignment too.
Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org>
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Pravin Shelar <pravin@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sparse's initalization be accessible at runtime. This allows sparse
mappings to be created after boot in a hotplug situation.
This patch is separated from the previous one just to give an indication how
much of the sparse infrastructure is *just* for hotplug memory.
The section_mem_map doesn't really store a pointer. It stores something that
is convenient to do some math against to get a pointer. It isn't valid to
just do *section_mem_map, so I don't think it should be stored as a pointer.
There are a couple of things I'd like to store about a section. First of all,
the fact that it is !NULL does not mean that it is present. There could be
such a combination where section_mem_map *is* NULL, but the math gets you
properly to a real mem_map. So, I don't think that check is safe.
Since we're storing 32-bit-aligned structures, we have a few bits in the
bottom of the pointer to play with. Use one bit to encode whether there's
really a mem_map there, and the other one to tell whether there's a valid
section there. We need to distinguish between the two because sometimes
there's a gap between when a section is discovered to be present and when we
can get the mem_map for it.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem. It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn. It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.
Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts. The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:
Node 0: +++++ +++++
Node 1: +++++
Previous behavior before Mike's patch was to assign nodes like this:
Node 0: 00000 XXXXX
Node 1: 11111
Where the 'X' areas were simply thrown away. The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's: Node 0: 000000000000000 Node 1: 11111
This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory. memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages. However, only
calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
allocated for node0->node_mem_map because:
struct page *start = pfn_to_page(start_pfn);
// effectively start = &node->node_mem_map[0]
for (page = start; page < (start + size); page++) {
init_page_here();...
page++;
}
Slow, and wasteful, but generally harmless.
But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):
for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
page = pfn_to_page(pfn);
}
And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0. This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a default implementation for early_pfn_to_nid returning node 0. Allow
architectures to override this with their own implementation out of
asm/mmzone.h.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is some confusion that arose when working on SPARSEMEM patch between
what is needed for DISCONTIG vs. NUMA.
Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
All of the current NUMA implementations require an implementation of
DISCONTIG. Because of this, quite a lot of code which is really needed for
NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some
of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
case.
Introducing this new NEED_MULTIPLE_NODES config option allows code that is
needed for both NUMA or DISCONTIG to be separated out from code that is
specific to DISCONTIG.
One great advantage of this approach is that it doesn't require every
architecture to be converted over. All of the current implementations
should "just work", only the ones implementing SPARSEMEM will have to be
fixed up.
The change to free_area_init() makes it work inside, or out of the new
config option.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Generify the value fields in the page_flags. The aim is to allow the location
and size of these fields to be varied. Additionally we want to move away from
fixed allocations per field whilst still enforcing the overall bit utilisation
limits. We rely on the compiler to spot and optimise the accessor functions.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a simple allocator for the NUMA remap space. This space is very
scarce, used for structures which are best allocated node local.
This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.
Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
to allow us to allocate the pgdat's using this mechanism; which we do not yet
do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code. On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.
There was also a node_mem_map(nid) macro on many architectures. Its use
along with the use of ->node_mem_map itself was not consistent. It has
been removed in favor of two new, more explicit, arch-independent macros:
pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)
I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I
believe the newer names are much clearer.
These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead. We could
make this the only behavior if people want, but I don't want to change too
much at once. One thing at a time.
This patch removes more code than it adds.
Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/
Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Use extern prefix for functions required.
- Removed a lot of wrappers, including t1_read/write_reg_4.
- Removed various macros, using native kernel calls now.
- Enumerated various #defines.
- Removed a lot of shared code which is not currently used in "NIC only" mode.
- Removed dead code.
Documentation/networking/cxgb.txt:
- Updated release notes for version 2.1.1
drivers/net/chelsio/ch_ethtool.h
- removed file, no longer using ETHTOOL namespace.
drivers/net/chelsio/common.h
- moved code from osdep.h to common.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/cphy.h
- removed dead code.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/cxgb2.c
- use DMA_{32,64}BIT_MASK in include/linux/dma-mapping.h.
- removed unused code.
- use printk message for link info resembling drivers/net/mii.c.
- no longer using the MODULE_xxx namespace.
- no longer using "pci_" namespace.
- no longer using ETHTOOL namespace.
drivers/net/chelsio/cxgb2.h
- removed file, merged into common.h
drivers/net/chelsio/elmer0.h
- removed dead code.
- added various enums.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/espi.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
drivers/net/chelsio/espi.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/gmac.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/mv88x201x.c
- changes to sync with Chelsio TOT.
drivers/net/chelsio/osdep.h
- removed file, consolidation. osdep was used to translate wrapper functions
since our code supports multiple OSs. removed wrappers.
drivers/net/chelsio/pm3393.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
- removed unused code.
drivers/net/chelsio/regs.h
- added a few register entries for future and current feature support.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/sge.c
- rewrote large portion of scatter-gather engine to stabilize
performance.
- using u8/u16/u32 kernel types instead of __u8/__u16/__u32 compiler
types.
drivers/net/chelsio/sge.h
- rewrote large portion of scatter-gather engine to stabilize
performance.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/subr.c
- merged tp.c into subr.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
- removed unused code.
drivers/net/chelsio/suni1x10gexp_regs.h
- modified copyright and authorship of file.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/tp.c
- removed file, merged into subr.c.
drivers/net/chelsio/tp.h
- removed file.
include/linux/pci_ids.h
- patched to include PCI_VENDOR_ID_CHELSIO 0x1425, removed define from
our code.
This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data". It allows use of the Fast-Select-Acceptance
optional user facility for X.25.
This patch just implements fast select with no restriction on response
(NRR). What this means (according to ITU-T Recomendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data. This patch allows such a response.
The called DTE can also respond with a clear-request packet that contains
call-user-data. However, this feature is currently not implemented by the
patch.
How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance,
After a listen socket in created and bound as follows
socket(AF_X25, SOCK_SEQPACKET, 0);
bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));
but before a listen system call is made, the following ioctl should be used.
ioctl(call_soc,SIOCX25CALLACCPTAPPRV);
Now the listen system call can be made
listen(call_soc, 4);
After this, an incoming-call packet will be accepted, but no call-accepted
packet will be sent back until the following system call is made on the socket
that accepts the call
ioctl(vc_soc,SIOCX25SENDCALLACCPT);
The network (or cisco xot router used for testing here) will allow the
application server's call-user-data in the call-accepted packet,
provided the call-request was made with Fast-select NRR.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Shaun Pereira <spereira@tusc.com.au>
This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router). Details
are as follows.
Current state of module:
A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.
If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own. This is required when multiple servers listen at the same x25 address
and device interface. The kernel currently matches ALL call user data, if
present.
Current Changes:
This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match. By
default no call user data is matched, even if call user data is present.
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched.
Otherwise the kernel behavior is exactly as the original implementation.
This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller
Future Changes on next patch:
There are cases however when call user data may be present in the call
accepted packet. According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used. My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used. I
am currently testing this, again with xot on linux and cisco.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides support for registering multiple netpoll clients to the
same network device. Only one of these clients may register an rx_hook,
however. In practice, this restriction has not been problematic. It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.
The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in. Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.
A lock is introduced to protect the setting and clearing of the np_rx
pointer. The pointer will only be cleared upon netpoll client module
removal, and the lock should be uncontested.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll. The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll; and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.
The struct netpoll is now pointed to by the netpoll_info structure. As
such, the previous behaviour of the code is preserved.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This trivial patch moves the assignment of poll_owner to -1 inside of
the lock. This fixes a potential SMP race in the code.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Basically copies the VFS's method for tracking writebacks and applies
it to the struct nfs_page.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Unless we're doing O_APPEND writes, we really don't care about revalidating
the file length. Just make sure that we catch any page cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.
Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 currently returns the unsigned 64-bit cookie directly to
userspace. The following patch causes the kernel to generate
loff_t offsets for the benefit of userland.
The current server-generated READDIR cookie is cached in the
nfs_open_context instead of in filp->f_pos, so we still end up work
correctly under directory insertions/deletion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Attach acls to inodes in the icache to avoid unnecessary GETACL RPC
round-trips. As long as the client doesn't retrieve any acls itself, only the
default acls of exiting directories and the default and access acls of new
directories will end up in the cache, which preserves some memory compared to
always caching the access and default acl of all files.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 has no concept of a umask on the server side: The client applies
the umask locally, and sends the effective permissions to the server.
This behavior is wrong when files are created in a directory that has a
default ACL. In this case, the umask is supposed to be ignored, and
only the default ACL determines the file's effective permissions.
Usually its the server's task to conditionally apply the umask. But
since the server knows nothing about the umask, we have to do it on the
client side. This patch tries to fetch the parent directory's default
ACL before creating a new file, computes the appropriate create mode to
send to the server, and finally sets the new file's access and default
acl appropriately.
Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial
version of this patch, as well as for arguing why we need this change.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds acl support fo nfs clients via the NFSACL protocol extension, by
implementing the getxattr, listxattr, setxattr, and removexattr iops for the
system.posix_acl_access and system.posix_acl_default attributes. This patch
implements a dumb version that uses no caching (and thus adds some overhead).
(Another patch in this patchset adds caching as well.)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS and NFSACL programs run on the same RPC transport. This patch adds
support for this by converting svc_program into a chained list of programs
(server-side).
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache
acls of size up to a page. Also prepare for up to a page of acl data even
when the user doesn't pass in a buffer, as when they want to get the acl
length to decide what size buffer to allocate.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
writing acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
reading acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ACL support will require supporting additional inode operations in v4
(getxattr, setxattr, listxattr). This patch allows different protocol versions
to support different inode operations by adding a file_inode_ops to the
nfs_rpc_ops (to match the existing dir_inode_ops).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we don't create an RPC client without checking that the server
does indeed support the RPC program + version that we are trying to set up.
This enables us to immediately return an error to "mount" if it turns out
that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The patch just changes the order of structure members.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a gfp_mask to audit_log_start() and audit_log(), to reduce the
amount of GFP_ATOMIC allocation -- most of it doesn't need to be
GFP_ATOMIC. Also if the mask includes __GFP_WAIT, then wait up to
60 seconds for the auditd backlog to clear instead of immediately
abandoning the message.
The timeout should probably be made configurable, but for now it'll
suffice that it only happens if auditd is actually running.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Add support for Maxim/Dallas DS1374 Real-Time Clock Chip
This change adds support for the Maxim/Dallas DS1374 RTC chip. This chip
is an I2C-based RTC that maintains a simple 32-bit binary seconds count
with battery backup support.
Signed-off-by: Randy Vinson <rvinson@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch renames the new linux/i2c-sysfs.h header file to
linux/hwmon-sysfs.h. This names seems to be more appropriate since this
file defines macros and structures not related to i2c but to hardware
monitoring drivers. The patch also updates the five hardware monitoring
driver which include that header file already.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds conversion from VID (mV) to register value. Used by the atxp1 I2C module.
Removed uneeded switch case.
Signed-off-by: Sebastian Witt <se.witt@gmx.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some months ago, you killed the address ranges mechanism from all
sensors i2c chip drivers (both the module parameters and the in-code
address lists). I think it was a very good move, as the ranges can
easily be replaced by individual addresses, and this allowed for
significant cleanups in the i2c core (let alone the impressive size
shrink for all these drivers).
Unfortunately you did not do the same for non-sensors i2c chip drivers.
These need the address ranges even less, so we could get rid of the
ranges here as well for another significant i2c core cleanup. Here comes
a patch which does just that. Since the process is exactly the same as
what you did for the other drivers set already, I did not split this one
in parts.
A documentation update is included.
The change saves 308 bytes in the i2c core, and an average 1382 bytes
for chip drivers which use I2C_CLIENT_INSMOD, 126 bytes for those which
do not.
This change is required if we want to merge the sensors and non-sensors
i2c code (and we want to do this).
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Index: gregkh-2.6/Documentation/i2c/writing-clients
===================================================================
1/ Must typecast int to (sector_t) before inverting or we
might not invert enough bits.
2/ When "bitmap_offset" was added to mdp_superblock_1, we didn't increase
the count of words-used (96 to 100).
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
currently, md updates all superblocks (one on each device) in series. It
waits for one write to complete before starting the next. This isn't a big
problem as superblock updates don't happen that often.
However it is neater to do it in parallel, and if the drives in the array have
gone to "sleep" after a period of idleness, then waking them is parallel is
faster (and someone else should be worrying about power drain).
Futher, we will need parallel superblock updates for a future patch which
keeps the intent-logging bitmap near the superblock.
Also remove the silly code that retired superblock updates 100 times. This
simply never made sense.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This provides an alternate to storing the bitmap in a separate file. The
bitmap can be stored at a given offset from the superblock. Obviously the
creator of the array must make sure this doesn't intersect with data....
After is good for version-0.90 superblocks.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before completing a 'write' the md superblock might need to be updated.
This is best done by the md_thread.
The current code schedules this up and queues the write request for later
handling by the md_thread.
However some personalities (Raid5/raid6) will deadlock if the md_thread
tries to submit requests to its own array.
So this patch changes things so the processes submitting the request waits
for the superblock to be written and then submits the request itself.
This fixes a recently-created deadlock in raid5/raid6
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an array is degraded, bit in the intent-bitmap are never cleared. So if
a recently failed drive is re-added, we only need to reconstruct the block
that are still reflected in the bitmap.
This patch adds support for this re-adding.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we don't wait for updates to the bitmap to be flushed to disk
properly. The infrastructure all there, but it isn't being used....
A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each
page as we cannot get callbacks when a page write completes.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With this patch, the intent to write to some block in the array can be logged
to a bitmap file. Each bit represents some number of sectors and is set
before any update happens, and only cleared when all writes relating to all
sectors are complete.
After an unclean shutdown, information in this bitmap can be used to optimise
resync - only sectors which could be out-of-sync need to be updated.
Also if a drive is removed and then added back into an array, the recovery can
make use of the bitmap to optimise reconstruction. This is not implemented in
this patch.
Currently the bitmap is stored in a file which must (obviously) be stored on a
separate device.
The patch only provided infrastructure. It does not update any personalities
to bitmap intent logging.
Md arrays can still be used with no bitmap file. This patch has minimal
impact on such arrays.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1/ change the return value (which is number-of-sectors synced)
from 'int' to 'sector_t'.
The number of sectors is usually easily small enough to fit
in an int, but if resync needs to abort, it may want to return
the total number of remaining sectors, which could be large.
Also errors cannot be returned as negative numbers now, so use
0 instead
2/ Add a 'skipped' return parameter to allow the array to report
that it skipped the sectors. This allows md to take this into account
in the speed calculations.
Currently there is no important skipping, but the bitmap-based-resync
that is coming will use this.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When md marks the superblock dirty before a write, it calls
generic_make_request (to write the superblock) from within
generic_make_request (to write the first dirty block), which could cause
problems later.
With this patch, the superblock write is always done by the helper thread, and
write request are delayed until that write completes.
Also, the locking around marking the array dirty and writing the superblock is
improved to avoid possible races.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Shrink the stack when calling the drawing alignment functions.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@hotpop.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve the fonts for use with the framebuffer.
I've added all the characters marked 'FIXME' in the sun12x22 font and
created a 10x18 font (based on the sun12x22 font) and a 7x14 font (based
on the vga8x16 font).
This patch is non-intrusive, no options are enabled by default so most
users won't notice a thing.
I am placing my changes under the GPL, however, I've not seen any copyright
notices on the sun12x22 font and the vga8x16 font which I derived my new
fonts from so I don't know what the copyright status is.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch adds pci ID for CT 69000 chipset.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the Arc monochrome LCD board.
The board uses KS108 controllers to drive individual 64x64 LCD matrices.
The board can be paneled in a variety of setups such as 2x1=128x64,
4x4=256x256 and so on. The board/host interface is through GPIO.
Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: <linux-fbdev-devel@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since no one is using the inbuf, outbuf of struct fb_pixmap I removed their
use in the framebuffer console. The idea is instead move the pixmap
functionality below the accelerated functions intead of on top as the way
it is now. If there is no objection please apply. This is against Linus
latestr GIT tree. Thank you.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Chris Wedgwood <cw@f00f.org>
As suggested by Chris, we can make the "just added" method ->release
conditional to UML only (better: to archs requesting it, i.e. only UML
currently), so that other archs don't get this unneeded crud, and if UML
won't need it any more we can kill this.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Chris Wedgwood <cw@f00f.org>
Currently UML must explicitly call the UML-specific
free_irq_by_irq_and_dev() for each free_irq call it's done.
This is needed because ->shutdown and/or ->disable are only called when the
last "action" for that irq is removed.
Instead, for UML shared IRQs (UML IRQs are very often, if not always,
shared), for each dev_id some setup is done, which must be cleared on the
release of that fd. For instance, for each open console a new instance
(i.e. new dev_id) of the same IRQ is requested().
Exactly, a fd is stored in an array (pollfds), which is after read by a
host thread and passed to poll(). Each event registered by poll() triggers
an interrupt. So, for each free_irq() we must remove the corresponding
host fd from the table, which we do via this -release() method.
In this patch we add an appropriate hook for this, and remove all uses of
it by pointing the hook to the said procedure; this is safe to do since the
said procedure.
Also some cosmetic improvements are included.
This is heavily based on some work by Chris Wedgwood, which however didn't
get the patch merged for something I'd call a "misunderstanding" (the need
for this patch wasn't cleanly explained, thus adding the generic hook was
felt as undesirable).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several hardware features of SGI's IOC4 I/O controller chip require
timing-related driver calculations dependent upon the PCI bus speed. This
patch enables the core IOC4 driver code to detect the actual bus speed and
store a value that can later be used by the IOC4 subdrivers as needed.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This series of patches reworks the configuration and internal structure
of the SGI IOC4 I/O controller device drivers.
These changes are motivated by several factors:
- The IOC4 chip PCI resources are of mixed use between functions (i.e.
multiple functions are handled in the same address range, sometimes
within the same register), muddling resource ownership and initialization
issues. Centralizing this ownership in a core driver is desirable.
- The IOC4 chip implements multiple functions (serial, IDE, others not
yet implemented in the mainline kernel) but is not a multifunction
PCI device. In order to properly handle device addition and removal
as well as module insertion and deletion, an intermediary IOC4-specific
driver layer is needed to handle these operations cleanly.
- All IOC4 drivers are currently enabled by a single CONFIG value. As
not all systems need all IOC4 functions, it is desireable to enable
these drivers independently.
- The current IOC4 core driver will trigger loading of all function-level
drivers, as it makes direct calls to them. This situation should be
reversed (i.e. function-level drivers cause loading of core driver)
in order to maintain a clear and least-surprise driver loading model.
- IOC4 hardware design necessitates some driver-level dependency on
the PCI bus clock speed. Current code assumes a 66MHz bus, but the
speed should be autodetected and appropriate compensation taken.
This patch series effects the above changes by a newly and better designed
IOC4 core driver with which the function-level drivers can register and
deregister themselves upon module insertion/removal. By tracking these
modules, device addition/removal is also handled properly. PCI resource
management and ownership issues are centralized in this core driver, and
IOC4-wide configuration actions such as bus speed detection are also
handled in this core driver.
This patch:
The SGI IOC4 I/O controller chip implements multiple functions, though it is
not a multi-function PCI device. Additionally, various PCI resources of the
IOC4 are shared by multiple hardware functions, and thus resource ownership by
driver is not clearly delineated. Due to the current driver design, all core
and subordinate drivers must be loaded, or none, which is undesirable if not
all IOC4 hardware features are being used.
This patch reorganizes the IOC4 drivers so that the core driver provides a
subdriver registration service. Through appropriate callbacks the subdrivers
can now handle device addition and removal, as well as module insertion and
deletion (though the IOC4 IDE driver requires further work before module
deletion will work). The core driver now takes care of allocating PCI
resources and data which must be shared between subdrivers, to clearly
delineate module ownership of these items.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com
Acked-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Added descriptions of the new MPC8548 family processors, e500 core and
peripherals.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc). The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.
The generic allocator can be used by device driver to manage special memory
etc. The generic allocator is based on the allocator from the sym53c8xx_2
driver.
Various users on ia64 needs uncached memory. The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster. The specific user for this is the XPC code. Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose. Performance
of doing this through uncached vs cached memory is pretty substantial. This
is handled by the mspec driver which I will push out in a seperate patch.
Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please. For instance to handle onboard
device memory. It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).
On ia64 memory has various properties within a granule, ie. it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode. The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc. The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool. Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pageset array can potentially acquire a huge amount of memory on large
NUMA systems. F.e. on a system with 512 processors and 256 nodes there
will be 256*512 pagesets. If each pageset only holds 5 pages then we are
talking about 655360 pages.With a 16K page size on IA64 this results in
potentially 10 Gigabytes of memory being trapped in pagesets. The typical
cases are much less for smaller systems but there is still the potential of
memory being trapped in off node pagesets. Off node memory may be rarely
used if local memory is available and so we may potentially have memory in
seldom used pagesets without this patch.
The slab allocator flushes its per cpu caches every 2 seconds. The
following patch flushes the off node pageset caches in the same way by
tying into the slab flush.
The patch also changes /proc/zoneinfo to include the number of pages
currently in each pageset.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By making the offset argument of __read_page_state an unsigned long instead of
unsigned, we can avoid forcing the compiler to sign extend a usually constant
argument. This saves 1 instruction on x86-64.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By making the offset argument of __mod_page_state an unsigned long instead
of unsigned, we can avoid forcing the compiler to sign extend a usually
constant argument. This saves 1 instruction on x86-64.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0. The following patch removes the argument and updates all the
calls to try_to_free_pages.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll
generate a little more code, but we don't use PageHigheMem() in many places.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.
The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.
The problem is twofold:
1) the free_area_cache is used to continue a search for memory where
the last search ended. Before the change new areas were always
searched from the base address on.
So now new small areas are cluttering holes of all sizes
throughout the whole mmap-able region whereas before small holes
tended to close holes near the base leaving holes far from the base
large and available for larger requests.
2) the free_area_cache also is set to the location of the last
munmap-ed area so in scenarios where we allocate e.g. five regions of
1K each, then free regions 4 2 3 in this order the next request for 1K
will be placed in the position of the old region 3, whereas before we
appended it to the still active region 1, placing it at the location
of the old region 2. Before we had 1 free region of 2K, now we only
get two free regions of 1K -> fragmentation.
The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache. If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.
The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.
Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.
Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.
Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch modifies the way pagesets in struct zone are managed.
Each zone has a per-cpu array of pagesets. So any particular CPU has some
memory in each zone structure which belongs to itself. Even if that CPU is
not local to that zone.
So the patch relocates the pagesets for each cpu to the node that is nearest
to the cpu instead of allocating the pagesets in the (possibly remote) target
zone. This means that the operations to manage pages on remote zone can be
done with information available locally.
We play a macro trick so that non-NUMA pmachines avoid the additional
pointer chase on the page allocator fastpath.
AIM7 benchmark on a 32 CPU SGI Altix
w/o patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005
100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005
200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005
300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005
400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005
500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005
600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005
700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005
800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005
900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005
1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005
with slab API changes and pageset patch:
Tasks jobs/min jti jobs/min/task real cpu
1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005
100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005
200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005
300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005
400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005
500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005
600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005
700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005
800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005
900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005
1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c. There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code. It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.
Tested, at least a little, on ppc64, i386 and x86_64.
Notes:
- this patch changes the meaning of set_huge_pte() to be more
analagous to set_pte()
- does SH4 need s special huge_ptep_get_and_clear()??
Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When early zone reclaim is turned on the LRU is scanned more frequently when a
zone is low on memory. This limits when the zone reclaim can be called by
skipping the scan if another thread (either via kswapd or sync reclaim) is
already reclaiming from the zone.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When using the early zone reclaim, it was noticed that allocating new pages
that should be spread across the whole system caused eviction of local pages.
This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts. The example that is implemented here is for page
cache pages. We want page cache pages to be spread across the whole system,
and we don't want page cache pages to evict other pages to get local memory.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the core of the (much simplified) early reclaim. The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.
One of the major uses of this is NUMA machines. With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.
This adds a zone tuneable to enable early zone reclaim. It is selected on a
per-zone basis and is turned on/off via syscall.
Adding some extra throttling on the reclaim was also required (patch
4/4). Without the machine would grind to a crawl when doing a "make -j"
kernel build. Even with this patch the System Time is higher on
average, but it seems tolerable. Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
wall user sys %cpu ctx sw. sleeps
---- ---- --- ---- ------ ------
No patch 1009 1384 847 258 298170 504402
w/patch, no reclaim 880 1376 667 288 254064 396745
w/patch & reclaim 1079 1385 926 252 291625 548873
These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot. Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.
I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.
Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).
The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.
The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.
Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.
In the new code, there are two externally visible symbols:
- smp_processor_id(): debug variant.
- raw_smp_processor_id(): nondebug variant. Replaces all existing
uses of _smp_processor_id() and __smp_processor_id(). Defined
by every SMP architecture in include/asm-*/smp.h.
There is one new internal symbol, dependent on DEBUG_PREEMPT:
- debug_smp_processor_id(): internal debug variant, mapped to
smp_processor_id().
Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file. All related comments got updated and/or
clarified.
I have build/boot tested the following 8 .config combinations on x86:
{SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}
I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
architectures are untested, but should work just fine.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we have enough rules to fill the netlink buffer space, it'll
deadlock because auditctl isn't ever actually going to read from the
socket until we return, and we aren't going to return until it
reads... so we spawn a kernel thread to spew out the list and then
exit.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
o This adds ->i_op->setattr VFS method for sysfs inodes. The changed
attribues are saved in the persistent sysfs_dirent structure as a pointer
to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
for which default attributes are getting changed. Thanks to Jon Smirl for
this suggestion.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch creates a new header with a potential standard i2c sensor
attribute type (which simply includes an int representing the sensor
number/index) and the associated macros, SENSOR_DEVICE_ATTR to define
a static attribute and to_sensor_dev_attr to get a
sensor_device_attribute reference from an embedded device_attribute
reference.
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
This patch adds the device_attribute paramerter to the
device_attribute store and show sysfs callback functions, and passes a
reference to the attribute when the callbacks are called.
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.
Simple attribute files behave similarly to sequential files from
a kernel programmers perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file specific get() and set() functions.
These operations are defined as
void foo_set(void *data, u64 val); and
u64 foo_get(void *data);
where data is the inode->u.generic_ip pointer of the file and the
operations just need to make send of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.
A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.
This patch already contains the changes for debugfs to use attributes
for its internal file operations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a generic function 'unregister_node()'.
It is used to remove objects of a node going away
for hotplug. All the devices on the node must be
unregistered before calling this function.
Signed-off-by: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -puN drivers/base/node.c~numa_hp_base drivers/base/node.c
There's no check to see if the device is already bound to a driver, which
could do bad things. The first thing to go wrong is that it will try to match
a driver with a device already bound to one. In some cases (it appears with
USB with drivers/usb/core/usb.c::usb_match_id()), some drivers will match a
device based on the class type, so it would be common (especially for HID
devices) to match a device that is already bound.
The fun comes when ->probe() is called, it fails, then
driver_probe_device() does this:
dev->driver = NULL;
Later on, that pointer could be be dereferenced without checking and cause
hell to break loose.
This problem could be nasty. It's very hardware dependent, since some
devices could have a different set of matching qualifiers than others.
Now, I don't quite see exactly where/how you were getting that crash.
You're dereferencing bad memory, but I'm not sure which pointer was bad
and where it came from, but it could have come from a couple of different
places.
The patch below will hopefully fix it all up for you. It's against
2.6.12-rc2-mm1, and does the following:
- Move logic to driver_probe_device() and comments uncommon returns:
1 - If device is bound
0 - If device not bound, and no error
error - If there was an error.
- Move locking to caller of that function, since we want to lock a
device for the entire time we're trying to bind it to a driver (to
prevent against a driver being loaded at the same time).
- Update __device_attach() and __driver_attach() to do that locking.
- Check if device is already bound in __driver_attach()
- Update the converse device_release_driver() so it locks the device
around all of the operations.
- Mark driver_probe_device() as static and remove export. It's an
internal function, it should stay that way, and there are no other
callers. If there is ever a need to export it, we can audit it as
necessary.
Signed-off-by: Andrew Morton <akpm@osdl.org>
- Use klist iterator in device_for_each_child(), making it safe to use for
removing devices.
- Remove unused list_to_dev() function.
- Kills all usage of devices_subsys.rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it in driver_for_each_device() instead of the regular list_head and stop using
the bus's rwsem for protection.
- Use driver_for_each_device() in driver_detach() so we don't deadlock on the
bus's rwsem.
- Remove ->devices.
- Move klist access and sysfs link access out from under device's semaphore, since
they're synchronized through other means.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it in bus_for_each_drv().
- Use the klist spinlock instead of the bus rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it for bus_for_each_dev().
- Use the klist spinlock instead of the bus rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.
The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.
It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.
The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.
It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.
There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.
There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -Nru a/include/linux/klist.h b/include/linux/klist.h
Now there's an iterator for accessing each device bound to a driver.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Index: linux-2.6.12-rc2/drivers/base/driver.c
===================================================================
This adds a per-device semaphore that is taken before every call from the core to a
driver method. This prevents e.g. simultaneous calls to the ->suspend() or ->resume()
and ->probe() or ->release(), potentially saving a whole lot of headaches.
It also moves us a step closer to removing the bus rwsem, since it protects the fields
in struct device that are modified by the core.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
One step on improving the class api so that it can not be used incorrectly.
This also fixes the module owner issue with the dev files that happened when
the devt logic moved to the class core.
Based on a patch originally written by Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver core:
change driver's, bus's, class's and platform device's names
to be const char * so one can use
const char *drv_name = "asdfg";
when initializing structures.
Also kill couple of whitespaces.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kobject: make kobject's name const char * since users should not
attempt to change it (except by calling kobject_rename).
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Below is a more generic patch to do fib_lookup via netlink. For others
we should say that we discussed this as a way to verify route selection.
It's also possible there are others uses for this.
In short the fist half of struct fib_result_nl is filled in by caller
and netlink call fills in the other half and returns it.
In case anyone is interested there is a corresponding user app to compare
the full routing table this was used to test implementation of the LC-trie.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is
similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect
on IPv4 tunnel mode states. For these states, it will ensure that the
DF flag is always cleared.
This is primarily useful to work around ICMP blackholes.
In future this flag could also allow a larger MTU to be set within the
tunnel just like IPIP/GRE tunnels. This could be useful for short haul
tunnels where temporary fragmentation outside the tunnel is desired over
smaller fragments inside the tunnel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When LOOKUP_PARENT is used, the inode which results is not the inode
found at the pathname. Report the flags so that this doesn't generate
misleading audit records.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Original From: Mike Christie <michaelc@cs.wisc.edu>
Modified to split out block changes (this patch) and SCSI pieces.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change the blk_rq_map_user() and blk_rq_map_kern() interface to require
a previously allocated request to be passed in. This is both more efficient
for multiple iterations of mapping data to the same request, and it is also
a much nicer API.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add blk_rq_map_kern which takes a kernel buffer and maps it into
a request and bio. This can be used by the dm hw_handlers, old
sg_scsi_ioctl, and one day scsi special requests so all requests
comming into scsi will have bios. All requests having bios
should allow scsi to use scatter lists for all IO and allow it
to use block layer functions.
Signed-off-by: Jens Axboe <axboe@suse.de>
Turn the field from a bitmask to an enumeration and add a list to allow
filtering of messages generated by userspace. We also define a list for
file system watches in anticipation of that feature.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch changes the format of the XFRM_MSG_DELSA and
XFRM_MSG_DELPOLICY notification so that the main message
sent is of the same format as that received by the kernel
if the original message was via netlink. This also means
that we won't lose the byid information carried in km_event.
Since this user interface is introduced by Jamal's patch
we can still afford to change it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
a flags argument. NLMSG_PUT stays there for compatibility but now
calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
NLMSG_NEW_ANSWER which now also takes a flags argument.
Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
and fixes the two direct users of __nlmsg_put to either provide
the flags or use NLMSG_NEW(_ANSWER).
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.
Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.
To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTA_GET_U(32|64)(tlv)
Assumes TLV is a u32/u64 field and returns its value.
RTA_GET_[M]SECS(tlv)
Assumes TLV is a u64 and transports jiffies converted
to seconds or milliseconds and returns its value.
RTA_PUT_U(32|64)(skb, type, value)
Appends %value as fixed u32/u64 to %skb as TLV %type.
RTA_PUT_[M]SECS(skb, type, jiffies)
Converts %jiffies to secs/msecs and appends it as u64
to %skb as TLV %type.
RTA_PUT_STRING(skb, type, string)
Appends %NUL terminated %string to %skb as TLV %type.
RTA_NEST(skb, type)
Starts a nested TLV %type and returns the nesting handle.
RTA_NEST_END(skb, nesting_handle)
Finishes the nested TLV %nesting_handle, must be called
symmetric to RTA_NEST(). Returns skb->len
RTA_NEST_CANCEL(skb, nesting_handle)
Cancel the nested TLV %nesting_handle and trim nested TLV
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
NLMSG_PUT_ANSWER(skb, nlcb, type, length)
Start a new netlink message as answer to a request,
returns the message header.
NLMSG_END(skb, nlh)
End a netlink message, fixes total message length,
returns skb->len.
NLMSG_CANCEL(skb, nlh)
Cancel the building process and trim whole message
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.
Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ok, this one just renames some stuff to have a better namespace and to
dissassociate it from TCP:
struct open_request -> struct request_sock
tcp_openreq_alloc -> reqsk_alloc
tcp_openreq_free -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free
With this most of the infrastructure closely resembles a struct
sock methods subset.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kept this first changeset minimal, without changing existing names to
ease peer review.
Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:
->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
a specific protocol
The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.
I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.
Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)
Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is for use with slab users that pass a dynamically allocated slab name in
kmem_cache_create, so that before destroying the slab one can retrieve the name
and free its memory.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heres the final patch.
What this patch provides
- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a fixed-up version of the broken "upstream-2.6.13" branch, where
I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
the history is all good.
warning when building with gcc -W :
include/linux/efi.h: In function `efi_range_is_wc':
include/linux/efi.h:320: warning: comparison between signed and unsigned
It looks to me like a significantly large 'len' passed in could cause the
loop to never end. Isn't it safer to make 'i' an unsigned long as well?
Like this little patch below (which of course also kills the warning) :
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
<linux/if_tr.h> uses __be16, but does not directly include
<asm/byteorder.h>. Add this in, so that dhcp/net-tools token ring code
can compile again.
Signed-off-by: Tom Rini <trini@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now m68k no longer sets HAVE_ARCH_GET_SIGNAL_TO_DELIVER, can it be removed
completely? Or may ARM26 still need it? Note that its usage was removed from
kernel/signal.c about 2 months ago.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 5700/5701 DMA write corruption on Apple G4 by detecting the Apple
UniNorth PCI 1.5 chipset and adjusting the DMA write boundary to 16. DMA
test fails to detect the problem with this chipset.
Thanks to Manuel Perez Ayala for reporting the problem and helping to
debug it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attached is a small patch for i945G support against 2.6.11.11.
From: Alan Hourihane <alanh@fairlite.demon.co.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
I'm not sure why this issue is suddenly showing, but without this
patchlet, the zx1 config won't compile anymore (e.g., to see the
compilation-error, look for "***" in [1]).
[1] http://www.gelato.unsw.edu.au/kerncomp/results//2005-06-06-17-00/zx1_defconfig-log.html
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Wed, May 04, 2005 at 01:37:30PM -0700, David Brownell wrote:
> On Wednesday 04 May 2005 12:19 pm, Roman Kagan wrote:
> > struct urb {
> > /* private, usb core and host controller only fields in the urb */
> > ...
> > struct list_head urb_list; /* list pointer to all active urbs */
> > ...
> > };
> >
> > Is it safe to use it for driver's purposes when the driver owns the urb,
> > that is, starting from the completion routine until the urb is submitted
> > with usb_submit_urb()?
>
> Right now, it should be.
Great! FWIW I've briefly tested a modified version of usbatm using
the list head in struct urb instead of creating a wrapper struct, and I
haven't seen any failures yet. So I tend to believe that your "should
be" actually means "is" :)
> > If it is, can it be guaranteed in future, e.g.
> > by moving the list head into the public section of struct urb?
>
> In fact I'm not sure why it ever got called "private" to usbcore/hcds.
> I thought the idea was that it should be like urb->status, reserved for
> whoever controls the URB.
OK then how about the following (essentially documentation) patch?
Signed-off-by: Roman Kagan <rkagan@mail.ru>
Acked-by: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the hardware header size is a multiple of HH_DATA_MOD, HH_DATA_OFF()
incorrectly returns HH_DATA_MOD (instead of 0). This affects ieee80211 layer
as 802.11 header is 32 bytes long.
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
o use a semaphore instead of an opencoded and racy lock
o move locking out of shaper_kick and into the callers - most just
released the lock before calling shaper_kick
o remove in_interrupt() tests. from ->close we can always block, from
->hard_start_xmit and timer context never
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
given number of bytes from device. Change ps2_command to
allow using 0 as command ID and actually pass it to the
device instead of working as a drain.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Fix up comment in cpufreq.h stating transition latency should be passed
in microseconds -- it was decided long ago to switch to nanoseconds.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch adds is_multicast_ether_addr() to go along with
is_valid_ether_addr() and friends. It then changes
is_valid_ether_addr() to use the new macro, and fixes up the comment
on that function to move implementation details out of the API doco.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The features field in netdevice is really a bitmask, and bitmask's should
be unsigned.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resend of earlier patch (no changes) from Catalin used to provide
device feature change notification.
Signed-off-by: Catalin BOIE <catab at umbrella.ro>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
serialize open and close calls and ensure that device's
open and close methods are only called when first user
opens it or last user closes it.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
I've tested it with a Logitech WingMan Rumblepad on an x86-64
machine, and on an ia32 machine to make sure I didn't break
anything.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
After porting this fixlet to UML:
http://linux.bkbits.net:8080/linux-2.5/cset@41791ab52lfMuF2i3V-eTIGRBbDYKQ
, I've also added a warning which should refuse compilation with insane values
for PREEMPT_ACTIVE... maybe we should simply move PREEMPT_ACTIVE out of
architectures using GENERIC_IRQS.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds dummy gameport_register_port, gameport_unregister_port
and gameport_set_phys functions to gameport.h for the case when a driver
can't use gameport.
This fixes the compilation of some OSS drivers with GAMEPORT=n without
the need to #if inside every single driver.
This patch also removes the non-working and now obsolete SOUND_GAMEPORT.
This patch is also an alternative solution for ALSA drivers with similar
problems (but #if's inside the drivers might have the advantage of
saving some more bytes of gameport is not available).
The only user-visible change is that for GAMEPORT=m the affected OSS
drivers are now allowed to be built statically (but they won't have
gameport support).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Delete quirk_via_bridge(), restore quirk_via_irqpic() -- but now
improved to be invoked upon device ENABLE, and now only for VIA devices
-- not all devices behind VIA bridges.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe pointed out that the iounmap() call in libata was occurring
too early, and some drivers (ahci, probably others) were using ioremap'd
memory after it had been unmapped.
The patch should address that problem by way of improving the libata
driver API:
* move ->host_stop() call after all ->port_stop() calls have occurred.
* create default helper function ata_host_stop(), and move iounmap()
call there.
* add ->host_stop_prewalk() hook, use it in sata_qstor.c (hi Mark).
sata_qstor appears to require the host-stop-before-port-stop ordering
that existed prior to applying the attached patch.
A new driver bnx2 for Broadcom bcm5706 is available.
The patch also includes new 1000BASE-X advertisement bit definitions in
mii.h
Thanks to David Miller and Jeff Garzik for reviewing and their valuable
feedback.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a fixed up version of the reorder feature of netem.
It is the same as the earlier patch plus with the bugfix from Julio merged in.
Has expected backwards compatibility behaviour.
Go ahead and merge this one, the TCP strangeness I was seeing was due
to the reordering bug, and previous version of TSO patch.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* add ide_bus_match() and export ide_bus_type
* split ide_remove_driver_from_hwgroup() out of ide_unregister()
* move device cleanup from ide_unregister() to drive_release_dev()
* convert ide_driver_t->name to driver->name
* convert ide_driver_t->{attach,cleanup} to driver->{probe,remove}
* remove ide_driver_t->busy as ide_bus_type->subsys.rwsem
protects against concurrent ->{probe,remove} calls
* make ide_{un}register_driver() void as it cannot fail now
* use driver_{un}register() directly, remove ide_{un}register_driver()
* use device_register() instead of ata_attach(), remove ata_attach()
* add proc_print_driver() and ide_drivers_show(), remove ide_drivers_op
* fix ide_replace_subdriver() and move it to ide-proc.c
* remove ide_driver_t->drives, ide_drives and drives_lock
* remove ide_driver_t->drivers, drivers and drivers_lock
* remove ide_drive_t->driver and DRIVER() macro
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Use LIST_HEAD_INIT rather than doing it by hand in DEFINE_WAIT.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
map_word_ff() was setting the mapword to ~0UL regardless of the
buswidth of the mapped flash chip. The read_map functions are
buswidth aware and therefor the map_word_equal function failed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Everybody does
struct packet_type foo_packet_type = {
.type = __constant_htons(ETH_P_FOO);
};
5 introduced warnings will be properly fixed later.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 0x1601 as 5752M, it's a 5752 but for mobile PCs.
Stolen from Broadcom bcm5700-8.1.55 driver.
Someone forgot to add it to tg3 ;-)
Signed-off-by: David S. Miller <davem@davemloft.net>
to make sure the flash is in array mode whenever we're about to
reboot. This is especially useful to allow "soft" reboot to work
which consists of branching back into the bootloader.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix fairly sad NOR-specific bug - during FS building ic->scan_dents
isn't zero, but jffs2_mark_node_obsolete() migt be called it tries to
finde the ic corresponding to ref - this requires ic->scan_dents = 0.
Signed-off-by: Artem B. Bityuckiy <dedekind@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch replaces the current CONFIG_JFFS2_FS_NAND, CONFIG_JFFS2_FS_NOR_ECC
and CONFIG_JFFS2_FS_DATAFLASH with a single configuration option -
CONFIG_JFFS2_FS_WRITEBUFFER.
The only functional change of this patch is that the slower div/mod
calculations for SECTOR_ADDR(), PAGE_DIV() and PAGE_MOD() are now always
used when CONFIG_JFFS2_FS_WRITEBUFFER is enabled.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For Dataflash, can_mark_obsolete = false and the NAND write buffering
code (wbuf.c) is used.
Since the DataFlash chip will automatically erase pages when writing,
the cleanmarkers are not needed - so cleanmarker_oob = false and
cleanmarker_size = 0
DataFlash page-sizes are not a power of two (they're multiples of 528
bytes). The SECTOR_ADDR macro (added in the previous core patch) is
replaced with a (slower) div/mod version if CONFIG_JFFS2_FS_DATAFLASH is
selected.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This enables support for reading, writing and locking so called
"Protection Registers" present on some flash chips.
A subset of them are pre-programmed at the factory with a
unique set of values. The rest is user-programmable.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add optional hardware specific callback routine to perform extra error
status checks on erase and write failures for devices with hardware ECC.
Signed-off-by: David A. Marlin <dmarlin@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Driver for generic RAM blocks which are exported by an platform_device
from the device driver system.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Added extended commands for AG-AND device and added
option for BBT_AUTO_REFRESH.
Signed-off-by: David A. Marlin <dmarlin@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move audit_serial() into audit.c and use it to generate serial numbers
on messages even when there is no audit context from syscall auditing.
This allows us to disambiguate audit records when more than one is
generated in the same millisecond.
Based on a patch by Steve Grubb after he observed the problem.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In _spin_unlock_bh(lock):
do { \
_raw_spin_unlock(lock); \
preempt_enable(); \
local_bh_enable(); \
__release(lock); \
} while (0)
there is no reason for using preempt_enable() instead of a simple
preempt_enable_no_resched()
Since we know bottom halves are disabled, preempt_schedule() will always
return at once (preempt_count!=0), and hence preempt_check_resched() is
useless here...
This fixes it by using "preempt_enable_no_resched()" instead of the
"preempt_enable()", and thus avoids the useless preempt_check_resched()
just before re-enabling bottom halves.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Defines for the different command classes as defined in the MMC and SD
specifications.
Removes the check for high command classes and instead checks that the
command classes needed are present.
Previous solution killed forward compatibility at no apparent gain.
Signed-of-by: Pierre Ossman
This patch changes the SELinux AVC to defer logging of paths to the audit
framework upon syscall exit, by saving a reference to the (dentry,vfsmount)
pair in an auxiliary audit item on the current audit context for processing
by audit_log_exit.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Caused oopses again. Also fix potential mismatch in checking if
change_page_attr was needed.
To do it without races I needed to change mm/vmalloc.c to export a
__remove_vm_area that does not take vmlist lock.
Noticed by Terence Ripperda and based on a patch of his.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a device driver for scsi media changer devices.
Signed-off-by: Gerd Knorr <kraxel@bytesex.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
blk_insert_request() has a unobivous feature of requeuing a
request setting REQ_SPECIAL|REQ_SOFTBARRIER. SCSI midlayer
was the only user and as previous patches removed the usage,
remove the feature from blk_insert_request(). Only special
requests should be queued with blk_insert_request(). All
requeueing should go through blk_requeue_request().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.
It's silly to have to add explicit entries for new userspace messages
as we invent them. Just treat all messages in the user range the same.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The driver model has a "detach_state" mechanism that:
- Has never been used by any in-kernel drive;
- Is superfluous, since driver remove() methods can do the same thing;
- Became buggy when the suspend() parameter changed semantics and type;
- Could self-deadlock when called from certain suspend contexts;
- Is effectively wasted documentation, object code, and headspace.
This removes that "detach_state" mechanism; net code shrink, as well
as a per-device saving in the driver model and sysfs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The attached patch updates generic HDLC to version 1.18.
FR Cisco LMI production-tested. Please apply to Linux 2.6. Thanks.
Changes:
- doc updates
- added Cisco LMI support to Frame-Relay code
- cleaned hdlc_fr.c a bit, removed some orphaned #defines etc.
- fixed a problem with non-functional LMI in FR DCE mode.
- changed diagnostic messages to better conform to FR standards
- all protocols: information about carrier changes (DCD line) is now
printed to kernel logs.
Signed-Off-By: Krzysztof Halasa <khc@pm.waw.pl>
Updated patch to fix erroneous flush of COMRESET set and missing flush
of COMRESET clear. Created a new routine scr_write_flush() to try to
prevent this in the future. Also, this patch is based on libata-2.6
instead of the previous libata-dev-2.6 based patch.
Signed-off-by: Brett Russ <russb@emc.com>
Index: libata-2.6/drivers/scsi/libata-core.c
===================================================================
This patch adds support for the davicom dm9000 network driver. The dm9000
is found on some embedded arm boards such as the pimx1 or the scb9328.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
diff -puN /dev/null drivers/net/dm9000.c
I'm going through the kernel code and have a patch that corrects
several spelling errors in comments.
From: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch adds more messages types to the audit subsystem so that audit
analysis is quicker, intuitive, and more useful.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
---
I forgot one type in the big patch. I need to add one for user space
originating SE Linux avc messages. This is used by dbus and nscd.
-Steve
---
Updated to 2.6.12-rc4-mm1.
-dwmw2
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is version 18 of the Wireless Extensions. The main change
is that it adds all the necessary APIs for WPA and WPA2 support. This
work was entirely done by Jouni Malinen, so let's thank him for both
his hard work and deep expertise on the subject ;-)
This APIs obviously doesn't do much by itself and works in
concert with driver support (Jouni already sent you the HostAP
changes) and userspace (Jouni is updating wpa_supplicant). This is
also orthogonal with the ongoing work on in-kernel IEEE support (but
potentially useful).
The patch is attached, tested with 2.6.11. Normally, I would
ask you to push that directly in the kernel (99% of the patch has been
on my web page for ages and it does not affect non-WPA stuff), but
Jouni convinced me that it should bake a few weeks in wireless-2.6
first, so that other driver maintainers can get up to speed with it.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch works around an issue with WD drives (and possibly others)
over SiL PATA->SATA Bridges on SATA controllers locking up with
transfers > 200 sectors.
Signed-off-by: Brad Campbell <brad@wasp.net.au>
Add audit_log_type to allow callers to specify type and pid when logging.
Convert audit_log to wrapper around audit_log_type. Could have
converted all audit_log callers directly, but common case is default
of type AUDIT_KERNEL and pid 0. Update audit_log_start to take type
and pid values when creating a new audit_buffer. Move sequences that
did audit_log_start, audit_log_format, audit_set_type, audit_log_end,
to simply call audit_log_type directly. This obsoletes audit_set_type
and audit_set_pid, so remove them.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Remove code conditionally dependent on CONFIG_AUDITSYSCALL from audit.c.
Move these dependencies to audit.h with the rest.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Add uart_insert_char(), which handles inserting characters into the
flip buffer. This helper function handles the correct semantics
for handling overrun in addition to inserting normal characters.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
shutdown credential information. It creates a new message type
AUDIT_TERM_INFO, which is used by the audit daemon to query who issued the
shutdown.
It requires the placement of a hook function that gathers the information. The
hook is after the DAC & MAC checks and before the function returns. Racing
threads could overwrite the uid & pid - but they would have to be root and
have policy that allows signalling the audit daemon. That should be a
manageable risk.
The userspace component will be released later in audit 0.7.2. When it
receives the TERM signal, it queries the kernel for shutdown information.
When it receives it, it writes the message and exits. The message looks
like this:
type=DAEMON msg=auditd(1114551182.000) auditd normal halt, sending pid=2650
uid=525, auditd pid=1685
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Ross moved. Remove the bad email address so people will find the correct
one in ./CREDITS.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just a few small cleanups to make this coherent english.
Signed-Off-By: Martin Hicks <mort@wildopensource.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
compile warning cleanup - suggested by Adrian Bunk; remove unmaintained rcs
char strings from source and handle the occurrences of their use, make sure
kernel-userspace issues taken care of; break out into separate patch
Signed-off-by: Stephen Biggs <yrgrknmxpzlk@gawab.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some comments about task->comm, to explain what it is near its definition
and provide some important pointers to its uses.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes some needlessly global identifiers static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjanv@infradead.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This had a fatal lock ranking bug: we do journal_start outside
mpage_writepages()'s lock_page().
Revert the whole thing, think again.
Credit-to: Jan Kara <jack@suse.cz>
For identifying the bug.
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only caller that ever sets it can call fsync_bdev itself easily. Also
update some comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for a new class of DAC960 controllers. It's based
on the GPLed idac320 driver from IBM for Linux 2.4.18. That driver is a
fork of the 2.4.18 version of DAC960 that adds support for this new type of
controllers (internally called "GEM Series"), that differ from other DAC960
V2 firmware controllers only in the register offsets and removes support
for all others.
This patch instead integrates support for these controllers into the DAC960
driver.
Thanks to Anders Norrbring for pointing me to the idac320 driver and
testing this patch.
No Signed-Off: line because all code is either copy & pasted from IBM's
idac320 driver or support for other controllers in the 2.6 DAC960 driver.
Note: the really odd formating matches the rest of the DAC960 driver.
Cc: Dave Olien <dmo@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow registration of multiple kprobes at an address in an architecture
agnostic way. Corresponding handlers will be invoked in a sequence. But,
a kprobe and a jprobe can't (yet) co-exist at the same address.
Signed-off-by: Ananth N Mavinakayanahalli <amavin@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes for big-endian systems in soundcard.h and awe_voice.h
This patch fixes the AFMT_S16_NE (include/linux/soundcard.h) and AWE_PATCH
(awe_voice.h) macros on big-endian systems.
It also moves _PATCHKEY into a new file, patchkey.h, in order to remove a
duplicate definition of it from awe_voice.h.
Signed-off-by: Stuart Brady <sdbrady@ntlworld.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
this matches the API used by other link layer like ethernet or token
ring.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now pci drivers can know when the system is going down without having to
add a reboot notifier event.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Converts remaining rtnetlink_link tables to use c99 designated
initializers to make greping a little bit easier.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts rtm_min and rtm_max arrays to use c99 designated
initializers for easier insertion of new message families.
RTM_GETMULTICAST and RTM_GETANYCAST did not have the minimal
message size specified which means that the netlink message
was parsed for routing attributes starting from the header.
Adds the proper minimal message sizes for these messages
(netlink header + common rtnetlink header) to fix this issue.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTM_MAX is currently set to the maximum reserverd message type plus one
thus being the cause of two bugs for new types being assigned a) given the
new family registers only the NEW command in its reserved block the array
size for per family entries is calculated one entry short and b) given the
new family registers all commands RTM_MAX would point to the first entry
of the block following this one and the rtnetlink receive path would accept
a message type for a nonexisting family.
This patch changes RTM_MAX to point to the maximum valid message type
by aligning it to the start of the next block and subtracting one.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts xfrm_msg_min and xfrm_dispatch to use c99 designated
initializers to make greping a little bit easier. Also replaces
two hardcoded message type with meaningful names.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes the type > XFRM_MSG_MAX check behave correctly to
protect access to xfrm_dispatch.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another large rollup of various patches from Adrian which make things static
where they were needlessly exported.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some KernelDoc descriptions are updated to match the current code.
No code changes.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again. The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels. I
have tried to proceed with that task. I have done that more times from 2.6.0
time and it gets boring to do same changes again and again. Linux kernel
compiles after changes for i386 and ARM targets. I have added references to
some more files into kernel-api book, I have added some section names as well.
So please, check that changes do not break something and that categories are
not too much skewed.
I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention. Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors. Changed <pid> to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.
You can see result of the modified documentation build at
http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz
Some more sources are ready to be included into kernel-doc generated
documentation. Sources has been added into kernel-api for now. Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds to the fbdev interface a set_cmap callback that allow the
driver to "batch" palette changes. This is useful for drivers like
radeonfb which might require lenghtly workarounds on palette accesses, thus
allowing to factor out those workarounds efficiently.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since we only access reiserfs_key ->u.k_offset_v2 guts in four helper
functions, we are free to sanitize those, as long as
- layout of the structure is unchanged (it's on-disk object)
- behaviour of these helpers is same as before.
Patch kills the mess with endianness-dependent bitfields and replaces them
with a single __le64. Helpers are switched to straightforward shift/and/or.
Benefits:
- exact same definitions for little- and big-endian architectures; no ifdefs
in sight.
- generate the same code on little-endian and improved on big-endian.
- doesn't rely on lousy bitfields handling in gcc codegenerator.
- happens to be standard C (unsigned long long is not a valid type for a
bitfield; it's a gccism and not well-implemented one, at that).
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
comp_short_keys() massaged into sane form, which kills the last place where
pointer to in_core_key (or any object containing such) would be cast to or
from something else. At that point we are free to change layout of
in_core_key - nothing depends on it anymore.
So we drop the mess with union in there and simply use (unconditional) __u64
k_offset and __u8 k_type instead; places using in_core_key switched to those.
That gives _far_ better code than current mess - on all platforms.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fixes for a couple of bugs exposed by the above: le32_to_cpu() used on 16bit
value and missing conversion in comparison of host- and little-endian values.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
little-endian objects annotated as such; again, obviously no changes of
resulting code, we only replace __u16 with __le16, etc. in relevant places.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
struct reiserfs_key cloned; (currently) identical struct in_core_key added.
Places that expect host-endian data in reiserfs_key switched to in_core_key.
Basically, we get annotation of reiserfs_key users and keep the resulting tree
obviously equivalent to original.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bump autofs4 version so we know what's going on.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new function valid_signal() that tests if its argument is
a valid signal number.
The reasons for adding this new function are:
- some code currently testing _NSIG directly has off-by-one errors.
Using this function instead avoids such errors.
- some code currently tests unsigned signal numbers for <0 which is
pointless and generates warnings when building with gcc -W. Using this
function instead avoids such warnings.
I considered various places to add this function but eventually settled on
include/linux/signal.h as the most logical place for it. If there's some
reason this is a bad choice then please let me know (hints as to a better
location are then welcome of course).
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The synchronize_kernel() primitive is used for quite a few different purposes:
waiting for RCU readers, waiting for NMIs, waiting for interrupts, and so on.
This makes RCU code harder to read, since synchronize_kernel() might or might
not have matching rcu_read_lock()s. This patch creates a new
synchronize_rcu() that is to be used for RCU readers and a new
synchronize_sched() that is used for the rest. These two new primitives
currently have the same implementation, but this is might well change with
additional real-time support. Both new primitives are GPL-only, the old
primitive is deprecated.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a deprecated_for_modules macro that allows symbols to be deprecated only
when used by modules, as suggested by Andrew Morton some months back.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch moves the IRQ-related SA_xxx flags (namely, SA_PROBE,
SA_SAMPLE_RANDOM and SA_SHIRQ) from all the arch-specific headers to
linux/signal.h. This looks like a left-over after the irq-handling code
was consolidated. The code was moved to kernel/irq/*, but the flags are
still left per-arch.
Right now, adding a new IRQ flag to the arch-specific header, like this
patch does:
http://cvs.sourceforge.net/viewcvs.py/*checkout*/alsa/alsa-driver/utils/patches/pcsp-kernel-2.6.10-03.diff?rev=1.1
no longer works, it breaks the compilation for all other arches, unless you
add that flag to all the other arch-specific headers too. So I think such
a clean-up makes sense.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arrange for all kernel printks to be no-ops. Only available if
CONFIG_EMBEDDED.
This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a pair of rlimits for allowing non-root tasks to raise nice and rt
priorities. Defaults to traditional behavior. Originally written by
Chris Wright.
The patch implements a simple rlimit ceiling for the RT (and nice) priorities
a task can set. The rlimit defaults to 0, meaning no change in behavior by
default. A value of 50 means RT priority levels 1-50 are allowed. A value of
100 means all 99 privilege levels from 1 to 99 are allowed. CAP_SYS_NICE is
blanket permission.
(akpm: see http://www.uwsg.iu.edu/hypermail/linux/kernel/0503.1/1921.html for
tips on integrating this with PAM).
Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC 2.95 uses __va_copy instead of va_copy. Handle it inside compiler.h
instead of in a casual file, and avoid the risk that this breaks with a newer
compiler (which it could do).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The specifications that talk about E820 map doesn't have an upper limit on
the number of e820 entries. But, today's kernel has a hard limit of 32.
With increase in memory size, we are seeing the number of E820 entries
reaching close to 32. Patch below bumps the number upto 128.
The patch changes the location of EDDBUF in zero-page (as it comes after E820).
As, EDDBUF is not used by boot loaders, this patch should not have any effect
on bootloader-setup code interface.
Patch covers both i386 and x86-64.
Tested on:
* grub booting bzImage
* lilo booting bzImage with EDID info enabled
* pxeboot of bzImage
Side-effect:
bss increases by ~ 2K and init.data increases by ~7.5K
on all systems, due to increase in size of static arrays.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the Intel ICH7DH and ICH7-M DH DID's to the irq.c and
pci_ids.h files.
Signed-off-by: Jason Gaston <Jason.d.gaston@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch by Jaya Kumar introduces a generic infrastructure to deal with
x86 chipsets with nonstandard reset sequences, and adds support for the
Geode gx1/cs5530a chipset.
Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for the special adb buttons of the aluminium
PowerBook G4.
Signed-off-by: Andreas Jaggi <andreas.jaggi@waterwave.ch>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a patch for counting the number of pages for bounce buffers. It's
shown in /proc/vmstat.
Currently, the number of bounce pages are not counted anywhere. So, if
there are many bounce pages, it seems that there are leaked pages. And
it's difficult for a user to imagine the usage of bounce pages. So, it's
meaningful to show # of bouce pages.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mempools have 2 problems.
The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.
The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.
Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address bug #4508: there's potential for wraparound in the various places
where we perform RLIMIT_AS checking.
(I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is
always equal to current->mm? If not, then we're comparing some other
process's total_vm with the calling process's rlimits).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached is a new patch that solves the issue of getting valid credentials
into the LOGIN message. The current code was assuming that the audit context
had already been copied. This is not always the case for LOGIN messages.
To solve the problem, the patch passes the task struct to the function that
emits the message where it can get valid credentials.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Most audit control messages are sent over netlink.In order to properly
log the identity of the sender of audit control messages, we would like
to add the loginuid to the netlink_creds structure, as per the attached
patch.
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Attached is a patch that corrects a signed/unsigned warning. I also noticed
that we needlessly init serial to 0. That only needs to occur if the kernel
was compiled without the audit system.
-Steve Grubb
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We were calling ptrace_notify() after auditing the syscall and arguments,
but the debugger could have _changed_ them before the syscall was actually
invoked. Reorder the calls to fix that.
While we're touching ever call to audit_syscall_entry(), we also make it
take an extra argument: the architecture of the syscall which was made,
because some architectures allow more than one type of syscall.
Also add an explicit success/failure flag to audit_syscall_exit(), for
the benefit of architectures which return that in a condition register
rather than only returning a single register.
Change type of syscall return value to 'long' not 'int'.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We log strings from userspace, such as arguments to open(). These could
be formatted to contain \n followed by fake audit log entries. Provide
a function for logging such strings, which gives a hex dump when the
string contains anything but basic printable ASCII characters. Use it
for logging filenames.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In order to properly fix some issues with cpufreq vs. sleep on
PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver.
I must force a switch to full speed before sleep and I switch back to
previous speed on resume.
I also added a driver flag to disable the warnings in suspend/resume
since it is expected in this case to have different speed (and I want it
to fixup the jiffies properly).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Include chunk and skb sizes in sendbuffer accounting.
- 2 policies are supported. 0: per socket accouting, 1: per association
accounting
DaveM: I've made the default per-socket.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fs/isofs includes trimmed down to something resembling sanity.
Kernel-only parts of linux/iso_fs.h and entire linux/iso_fs_{sb,i}.h
moved to fs/isofs/isofs.h.
A lot of useless #include in fs/isofs/*.c killed.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
And provide an example simply action in order to
demonstrate usage.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NAT changes in 2.6.11 changed the position where helpers
are called and perform packet mangling. Before 2.6.11, a NAT
helper was called before the packet was NATed and had its
sequence number adjusted. Since 2.6.11, the helpers get packets
with already adjusted sequence numbers.
This breaks sequence number adjustment, adjust_tcp_sequence()
needs the original sequence number to determine whether
a packet was a retransmission and to store it for further
corrections. It can't be reconstructed without more information
than available, so this patch restores the old order by
calling helpers from a new conntrack hook two priorities
below ip_conntrack_confirm() and adjusting the sequence number
from a new NAT hook one priority below ip_conntrack_confirm().
Tracked down by Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add proper entry for bcm5752 PCI ID to pci_ids.h, and use it in tg3.
I did this separately in case patches like this (i.e. new PCI IDs)
need to come from more "official" sources.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a revised alternative that uses BUG_ON/WARN_ON
(as suggested by Herbert Xu) to eliminate NET_CALLER.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So here is a patch that introduces skb_store_bits -- the opposite of
skb_copy_bits, and uses them to read/write the csum field in rawv6.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
called, and it wasn't obvious what to do about them.
The ppc64 case turns out to be easy: the associated tables are noted elsewhere
and freed later, safe to either skip its hugetlb areas or go through the
motions of freeing nothing. Since ia64 does need a special case, restore to
ppc64 the special case of skipping them.
The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
probably appeared to work okay if you just had one such area; in fact it's
been broken much longer if you consider a long munmap spanning from another
region into the hugetlb region.
In the ia64 hugetlb region, more virtual address bits are available than in
the other regions, yet the page tables are structured the same way: the page
at the bottom is larger. Here we need to scale down each addr before passing
it to the standard free_pgd_range. Was about to write a hugely_scaled_down
macro, but found htlbpage_to_page already exists for just this purpose. Fixed
off-by-one in ia64 is_hugepage_only_range.
Uninline free_pgd_range to make it available to ia64. Make sure the
vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
other (safe to join huges? probably but don't bother).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
because mm doesn't contain the (32-bit emulation?) info needed. But it too is
only needed because we ignore the end from the vma list.
We could make flush_pgtables return that end, or unmap_vmas. Choose the
latter, since it's a natural fit with unmap_mapping_range_vma needing to know
its restart addr. This does make more than minimal change, but if unmap_vmas
had returned the end before, this is how we'd have done it, rather than
storing the break_addr in zap_details.
unmap_vmas used to return count of vmas scanned, but that's just debug which
hasn't been useful in a while; and if we want the map_count 0 on exit check
back, it can easily come from the final remove_vm_struct loop.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
clear_page_range regression since 2.6.10's clear_page_tables; and its
long-standing well-known inefficiency in searching throughout the higher-level
page tables for those few entries to clear and free: all can be blamed on
ignoring the list of vmas when we free page tables.
Replace exit_mmap's clear_page_range of the total user address space by
free_pgtables operating on the mm's vma list; unmap_region use it in the same
way, giving floor and ceiling beyond which it may not free tables. This
brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
in which case latency fixes spoil unmap_vmas throughput).
Beware: the do_mmap_pgoff driver failure case must now use unmap_region
instead of zap_page_range, since a page table might have been allocated, and
can only be freed while it is touched by some vma.
Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
from the clear_page_range levels. (Most of free_pgtables' old code was
actually for a non-existent case, prev not properly set up, dating from before
hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
might want to add latency lockdrops later; but no attempt to do so yet, going
by vma should itself reduce latency.
But what if is_hugepage_only_range? Those ia64 and ppc64 cases need careful
examination: put that off until a later patch of the series.
What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?
And the range to sparc64's flush_tlb_pgtables? It's less clear to me now that
we need to do more than is done here - every PMD_SIZE ever occupied will be
flushed, do we really have to flush every PGDIR_SIZE ever partially occupied?
A shame to complicate it unnecessarily.
Special thanks to David Miller for time spent repairing my ceilings.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fix prototypes for debugfs functions (in configurations where
debugfs is disabled).
Signed-off-by: Michal Ostrowski <mostrows@speakeasy.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current <linux/debugfs.h> include file is a little fragile in that
it is not self-contained and hence may cause compile warnings or
errors depending on the files included before it, the kernel config
and the architecture. This patch makes things a little more robust by:
- including <linux/types.h> to get definitions of u32, mode_t, and so on.
- forward declaring struct file_operations.
- including <linux/err.h> when CONFIG_DEBUG_FS is not set
The last change is particularly useful, as a kernel developer is
likely to build with debugfs always enabled and never see the build
breakage cased if debugfs is disabled.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs: allow changing the permissions for already created attributes
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the first of a few installments of PM API updates to match the
recent switch to "pm_message_t". This installment primarily affects
USB device drivers (for USB interfaces), and it changes the handful of
drivers which currently implement suspend methods:
- <linux/usb.h> and usbcore, signature change
- Some drivers only changed the signature, net effect this just
shuts up "sparse -Wbitwise":
* hid-core
* stir4200
- Two network drivers did that, and also grew slightly more
featureful suspend code ... they now properly shut down
their activities. (As should stir4200...)
* pegasus
* usbnet
Note that the Wake-On-Lan (WOL) support in pegasus doesn't yet work; looks
to me like it's missing a request to turn it on, vs just configuring it.
The ASIX code in usbnet also has WOL hooks that are ready to use; untested.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Index: gregkh-2.6/drivers/net/irda/stir4200.c
===================================================================
With older gcc's:
In file included from drivers/usb/class/cdc-acm.c:63:
include/linux/usb_cdc.h:117: field `bDetailData' has incomplete type
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -puN include/linux/usb_cdc.h~usb_cdc-build-fix include/linux/usb_cdc.h
The current problem seen is that the queue lock is actually in the
SCSI device structure, so when that structure is freed on device
release, we go boom if the queue tries to access the lock again.
The fix here is to move the lock from the scsi_device to the queue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch hides reparent_to_init(). reparent_to_init() should only be
called by daemonize().
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
gcc-4 warns with
include/linux/cpuset.h:21: warning: type qualifiers ignored on function
return type
cpuset_cpus_allowed is declared with const
extern const cpumask_t cpuset_cpus_allowed(const struct task_struct *p);
First const should be __attribute__((const)), but the gcc manual
explains that:
"Note that a function that has pointer arguments and examines the data
pointed to must not be declared const. Likewise, a function that calls a
non-const function usually must not be const. It does not make sense for
a const function to return void."
The following patch remove const from the function declaration.
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hlist_for_each_entry_rcu() comment block refers to a nonexistent
hlist_add_rcu() API, needs to change to hlist_add_head_rcu().
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These have been deprecated since ->compat_ioctl when in, thus only a short
deprecation period. There's four users left: i2o_config, s390/z90crypy,
s390/dasd and s390/zfcp and for the first two patches are about to be
submitted to get rid of it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes u32 vs. pm_message_t confusion in remaining places. Fortunately
there's few of them.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I thought I'm done with fixing u32 vs. pm_message_t ... unfortunately
that turned out not to be the case as Russel King pointed out. Here are
fixes for Documentation and common code (mainly system devices).
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the Intel ESB2 DID's to the irq.c and pci_ids.h files.
Signed-off-by: Jason Gaston <Jason.d.gaston@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This recently got changed to include a lot of kernel internal stuff in the
non-__KERNEL__ area of the header, which isn't so kosher and breaks libc
builds.
The fix is pretty simple.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
iscsi/lvm2/multipath needs guaranteed protection from the oom-killer, so
make the magical value of -17 in /proc/<pid>/oom_adj defeat the oom-killer
altogether.
(akpm: we still need to document oom_adj and friends in
Documentation/filesystems/proc.txt!)
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was unexported by Arjan because we have no current users.
However, during a conversion from tasklets to workqueues of the parisc led
functions, we ran across a case where this was needed. In particular, the
open coded equivalent of cancel_rearming_delayed_workqueue was implemented
incorrectly, which is, I think, all the evidence necessary that this is a
useful API.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!