This command is missing.
Fixes: 5c79de6e79 ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: 97a64b4577 ("[XFRM]: Introduce XFRM_MSG_REPORT.")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These commands are missing.
Fixes: 28d8909bc7 ("[XFRM]: Export SAD info.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This command is missing.
Fixes: ecfd6b1837 ("[XFRM]: Export SPD info")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new command is missing.
Fixes: 880a6fab8f ("xfrm: configure policy hash table thresholds by netlink")
Reported-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new command is missing.
Fixes: 9a9634545c ("netns: notify netns id events")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new commands are missing.
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Makefile automatically generates the tomoyo policy files, which are
not removed by make clean (because they could have been provided by the
user). Instead of generating the missing files, use /dev/null if a
given file is not provided. Store the default exception_policy in
exception_policy.conf.default.
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Combine the generation of builtin-policy.h into a single command and use
if_changed, so that the file is regenerated each time the command
changes. The next patch will make use of this.
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Simplify the Makefile by using a readily available tool instead of a
custom sed script. The downside is that builtin-policy.h becomes
unreadable for humans, but it is only a generated file.
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Now that we can safely increase the avtab max buckets without
triggering high order allocations and have a hash function that
will make better use of the larger number of buckets, increase
the max buckets to 2^16.
Original:
101421 entries and 2048/2048 buckets used, longest chain length 374
With new hash function:
101421 entries and 2048/2048 buckets used, longest chain length 81
With increased max buckets:
101421 entries and 31078/32768 buckets used, longest chain length 12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This function, based on murmurhash3, has much better distribution than
the original. Using the current default of 2048 buckets, there are many
fewer collisions:
Before:
101421 entries and 2048/2048 buckets used, longest chain length 374
After:
101421 entries and 2048/2048 buckets used, longest chain length 81
The difference becomes much more significant when buckets are increased.
A naive attempt to expand the current function to larger outputs doesn't
yield any significant improvement; so this function is a prerequisite
for increasing the bucket size.
sds: Adapted from the original patches for libsepol to the kernel.
Signed-off-by: John Brooks <john.brooks@jolla.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously we shrank the avtab max hash buckets to avoid
high order memory allocations, but this causes avtab lookups to
degenerate to very long linear searches for the Fedora policy. Convert to
using a flex_array instead so that we can increase the buckets
without such limitations.
This change does not alter the max hash buckets; that is left to a
separate follow-on change.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Move the NetLabel secattr MLS category import logic into
mls_import_netlbl_cat() where it belongs, and use the
mls_import_netlbl_cat() function in security_netlbl_secattr_to_sid().
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Commit f01e1af445 ("selinux: don't pass in NULL avd to avc_has_perm_noaudit")
made this pointer reassignment unnecessary. Avd should continue to reference
the stack-based copy.
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Conflicts:
drivers/net/usb/asix_common.c
drivers/net/usb/sr9800.c
drivers/net/usb/usbnet.c
include/linux/usb/usbnet.h
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c
The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.
With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.
Signed-off-by: David S. Miller <davem@davemloft.net>
Return a negative error value like the rest of the entries in this function.
Cc: <stable@vger.kernel.org>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
In commit 00f84f3f2e ("Smack: Make the
syslog control configurable") this mutex was added, but the rest of
the final commit never actually made use of it, resulting in:
In file included from include/linux/mutex.h:29:0,
from include/linux/notifier.h:13,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:821,
from include/linux/gfp.h:5,
from include/linux/slab.h:14,
from include/linux/security.h:27,
from security/smack/smackfs.c:21:
security/smack/smackfs.c:63:21: warning: ‘smack_syslog_lock’ defined but not used [-Wunused-variable]
static DEFINE_MUTEX(smack_syslog_lock);
^
A git grep shows no other instances/references to smack_syslog_lock.
Delete it, assuming that the mutex addition was just a leftover from
an earlier work in progress version of the change.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
I have vehemently opposed adding a "permissive" mode to Smack
for the simple reasons that it would be subject to massive abuse
and that developers refuse to turn it off come product release.
I still believe that this is true, and still refuse to add a
general "permissive mode". So don't ask again.
Bumjin Im suggested an approach that addresses most of the concerns,
and I have implemented it here. I still believe that we'd be better
off without this sort of thing, but it looks like this minimizes the
abuse potential.
Firstly, you have to configure Smack Bringup Mode. That allows
for "release" software to be ammune from abuse. Second, only one
label gets to be "permissive" at a time. You can use it for
debugging, but that's about it.
A label written to smackfs/unconfined is treated specially.
If either the subject or object label of an access check
matches the "unconfined" label, and the access would not
have been allowed otherwise an audit record and a console
message are generated. The audit record "request" string is
marked with either "(US)" or "(UO)", to indicate that the
request was granted because of an unconfined label. The
fact that an inode was accessed by an unconfined label is
remembered, and subsequent accesses to that "impure"
object are noted in the log. The impurity is not stored in
the filesystem, so a file mislabled as a side effect of
using an unconfined label may still cause concern after
a reboot.
So, it's there, it's dangerous, but so many application
developers seem incapable of living without it I have
given in. I've tried to make it as safe as I can, but
in the end it's still a chain saw.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
With this commit, the LSM Smack implements the LSM
side part of the system call keyctl with the action
code KEYCTL_GET_SECURITY.
It is now possible to get the context of, for example,
the user session key using the command "keyctl security @s".
The original patch has been modified for merge.
Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
This change fixes the bug associated with sockets owned by kernel threads. These
sockets, created usually by network devices' drivers tasks, received smk_in
label from the task that created them - the "floor" label in the most cases. The
result was that they were not able to receive data packets because of missing
smack rules. The main reason of the access deny is that the socket smk_in label
is placed as the object during smk check, kernel thread's capabilities are
omitted.
Signed-off-by: Marcin Lis <m.lis@samsung.com>
This reverts commit ca10b9e9a8.
No longer needed after commit eb8895debe
("tcp: tcp_make_synack() should use sock_wmalloc")
When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc
Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the sysctl table is constified, we won't be able to directly modify
it. Instead, use a table copy that carries any needed changes.
Suggested-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull more vfs updates from Al Viro:
"Assorted stuff from this cycle. The big ones here are multilayer
overlayfs from Miklos and beginning of sorting ->d_inode accesses out
from David"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
procfs: fix race between symlink removals and traversals
debugfs: leave freeing a symlink body until inode eviction
Documentation/filesystems/Locking: ->get_sb() is long gone
trylock_super(): replacement for grab_super_passive()
fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
SELinux: Use d_is_positive() rather than testing dentry->d_inode
Smack: Use d_is_positive() rather than testing dentry->d_inode
TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
VFS: Split DCACHE_FILE_TYPE into regular and special types
VFS: Add a fallthrough flag for marking virtual dentries
VFS: Add a whiteout dentry type
VFS: Introduce inode-getting helpers for layered/unioned fs environments
Infiniband: Fix potential NULL d_inode dereference
posix_acl: fix reference leaks in posix_acl_create
autofs4: Wrong format for printing dentry
...
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use d_is_positive() rather than testing dentry->d_inode in SELinux to get rid
of direct references to d_inode outside of the VFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use d_is_positive() rather than testing dentry->d_inode in Smack to get rid of
direct references to d_inode outside of the VFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use d_is_dir() rather than d_inode and S_ISDIR(). Note that this will include
fake directories such as automount triggers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use d_is_positive(dentry) or d_is_negative(dentry) rather than testing
dentry->d_inode as the dentry may cover another layer that has an inode when
the top layer doesn't or may hold a 0,0 chardev that's actually a whiteout.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mediated_filesystem() should use dentry->d_sb not dentry->d_inode->i_sb and
should avoid file_inode() also since it is really dealing with the path.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull kconfig updates from Michal Marek:
"Yann E Morin was supposed to take over kconfig maintainership, but
this hasn't happened. So I'm sending a few kconfig patches that I
collected:
- Fix for missing va_end in kconfig
- merge_config.sh displays used if given too few arguments
- s/boolean/bool/ in Kconfig files for consistency, with the plan to
only support bool in the future"
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig: use va_end to match corresponding va_start
merge_config.sh: Display usage if given too few arguments
kconfig: use bool instead of boolean for type definition attributes
Pull misc VFS updates from Al Viro:
"This cycle a lot of stuff sits on topical branches, so I'll be sending
more or less one pull request per branch.
This is the first pile; more to follow in a few. In this one are
several misc commits from early in the cycle (before I went for
separate branches), plus the rework of mntput/dput ordering on umount,
switching to use of fs_pin instead of convoluted games in
namespace_unlock()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch the IO-triggering parts of umount to fs_pin
new fs_pin killing logics
allow attaching fs_pin to a group not associated with some superblock
get rid of the second argument of acct_kill()
take count and rcu_head out of fs_pin
dcache: let the dentry count go down to zero without taking d_lock
pull bumping refcount into ->kill()
kill pin_put()
mode_t whack-a-mole: chelsio
file->f_path.dentry is pinned down for as long as the file is open...
get rid of lustre_dump_dentry()
gut proc_register() a bit
kill d_validate()
ncpfs: get rid of d_validate() nonsense
selinuxfs: don't open-code d_genocide()
If a request_key() call to allocate and fill out a key attempts to insert the
key structure into a revoked keyring, the key will leak, using memory and part
of the user's key quota until the system reboots. This is from a failure of
construct_alloc_key() to decrement the key's reference count after the attempt
to insert into the requested keyring is rejected.
key_put() needs to be called in the link_prealloc_failed callpath to ensure
the unused key is released.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog. Nothing
major or unusual, except maybe the binder selinux stuff, which was all
acked by the proper selinux people and they thought it best to come
through this tree.
All of this has been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb
SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf
=lfmM
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc patches from Greg KH:
"Here's the big char/misc driver update for 3.20-rc1.
Lots of little things in here, all described in the changelog.
Nothing major or unusual, except maybe the binder selinux stuff, which
was all acked by the proper selinux people and they thought it best to
come through this tree.
All of this has been in linux-next with no reported issues for a while"
* tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
coresight: fix function etm_writel_cp14() parameter order
coresight-etm: remove check for unknown Kconfig macro
coresight: fixing CPU hwid lookup in device tree
coresight: remove the unnecessary function coresight_is_bit_set()
coresight: fix the debug AMBA bus name
coresight: remove the extra spaces
coresight: fix the link between orphan connection and newly added device
coresight: remove the unnecessary replicator property
coresight: fix the replicator subtype value
pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl'
mcb: Fix error path of mcb_pci_probe
virtio/console: verify device has config space
ti-st: clean up data types (fix harmless memory corruption)
mei: me: release hw from reset only during the reset flow
mei: mask interrupt set bit on clean reset bit
extcon: max77693: Constify struct regmap_config
extcon: adc-jack: Release IIO channel on driver remove
extcon: Remove duplicated include from extcon-class.c
Drivers: hv: vmbus: hv_process_timer_expiration() can be static
Drivers: hv: vmbus: serialize Offer and Rescind offer
...
Pull backing device changes from Jens Axboe:
"This contains a cleanup of how the backing device is handled, in
preparation for a rework of the life time rules. In this part, the
most important change is to split the unrelated nommu mmap flags from
it, but also removing a backing_dev_info pointer from the
address_space (and inode), and a cleanup of other various minor bits.
Christoph did all the work here, I just fixed an oops with pages that
have a swap backing. Arnd fixed a missing export, and Oleg killed the
lustre backing_dev_info from staging. Last patch was from Al,
unexporting parts that are now no longer needed outside"
* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
Make super_blocks and sb_lock static
mtd: export new mtd_mmap_capabilities
fs: make inode_to_bdi() handle NULL inode
staging/lustre/llite: get rid of backing_dev_info
fs: remove default_backing_dev_info
fs: don't reassign dirty inodes to default_backing_dev_info
nfs: don't call bdi_unregister
ceph: remove call to bdi_unregister
fs: remove mapping->backing_dev_info
fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
nilfs2: set up s_bdi like the generic mount_bdev code
block_dev: get bdev inode bdi directly from the block device
block_dev: only write bdev inode on close
fs: introduce f_op->mmap_capabilities for nommu mmap support
fs: kill BDI_CAP_SWAP_BACKED
fs: deduplicate noop_backing_dev_info
Pull security layer updates from James Morris:
"Highlights:
- Smack adds secmark support for Netfilter
- /proc/keys is now mandatory if CONFIG_KEYS=y
- TPM gets its own device class
- Added TPM 2.0 support
- Smack file hook rework (all Smack users should review this!)"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
cipso: don't use IPCB() to locate the CIPSO IP option
SELinux: fix error code in policydb_init()
selinux: add security in-core xattr support for pstore and debugfs
selinux: quiet the filesystem labeling behavior message
selinux: Remove unused function avc_sidcmp()
ima: /proc/keys is now mandatory
Smack: Repair netfilter dependency
X.509: silence asn1 compiler debug output
X.509: shut up about included cert for silent build
KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
MAINTAINERS: email update
tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
smack: fix possible use after frees in task_security() callers
smack: Add missing logging in bidirectional UDS connect check
Smack: secmark support for netfilter
Smack: Rework file hooks
tpm: fix format string error in tpm-chip.c
char/tpm/tpm_crb: fix build error
smack: Fix a bidirectional UDS connect check typo
smack: introduce a special case for tmpfs in smack_d_instantiate()
...
If hashtab_create() returns a NULL pointer then we should return -ENOMEM
but instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
- add "pstore" and "debugfs" to list of in-core exceptions
- change fstype checks to boolean equation
- change from strncmp to strcmp for checking
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked the subject line prefix to "selinux"]
Signed-off-by: Paul Moore <pmoore@redhat.com>
While the filesystem labeling method is only printed at the KERN_DEBUG
level, this still appears in dmesg and on modern Linux distributions
that create a lot of tmpfs mounts for session handling, the dmesg can
easily be filled with a lot of "SELinux: initialized (dev X ..."
messages. This patch removes this notification for the normal case
but leaves the error message intact (displayed when mounting a
filesystem with an unknown labeling behavior).
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Remove the function avc_sidcmp() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[PM: rewrite the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
/proc/keys is now mandatory and its config option no longer exists, so it
doesn't need selecting.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add security hooks to the binder and implement the hooks for SELinux.
The security hooks enable security modules such as SELinux to implement
controls over binder IPC. The security hooks include support for
controlling what process can become the binder context manager
(binder_set_context_mgr), controlling the ability of a process
to invoke a binder transaction/IPC to another process (binder_transaction),
controlling the ability of a process to transfer a binder reference to
another process (binder_transfer_binder), and controlling the ability
of a process to transfer an open file to another process (binder_transfer_file).
These hooks have been included in the Android kernel trees since Android 4.3.
(Updated to reflect upstream relocation and changes to the binder driver,
changes to the LSM audit data structures, coding style cleanups, and
to add inline documentation for the hooks).
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Nick Kralevich <nnk@google.com>
Acked-by: Jeffrey Vander Stoep <jeffv@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On 1/23/2015 8:20 AM, Jim Davis wrote:
> Building with the attached random configuration file,
>
> security/smack/smack_netfilter.c: In function ‘smack_ipv4_output’:
> security/smack/smack_netfilter.c:55:6: error: ‘struct sk_buff’ has no
> member named ‘secmark’
> skb->secmark = skp->smk_secid;
> ^
> make[2]: *** [security/smack/smack_netfilter.o] Error 1
The existing Makefile used the wrong configuration option to
determine if smack_netfilter should be built. This sets it right.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Now that /proc/keys is used by libkeyutils to look up a key by type and
description, we should make it unconditional and remove
CONFIG_DEBUG_PROC_KEYS.
Reported-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
We hit use after free on dereferncing pointer to task_smack struct in
smk_of_task() called from smack_task_to_inode().
task_security() macro uses task_cred_xxx() to get pointer to the task_smack.
task_cred_xxx() could be used only for non-pointer members of task's
credentials. It cannot be used for pointer members since what they point
to may disapper after dropping RCU read lock.
Mainly task_security() used this way:
smk_of_task(task_security(p))
Intead of this introduce function smk_of_task_struct() which
takes task_struct as argument and returns pointer to smk_known struct
and do this under RCU read lock.
Bogus task_security() macro is not used anymore, so remove it.
KASan's report for this:
AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
=============================================================================
BUG kmalloc-64 (Tainted: PO): kasan error
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
kmem_cache_alloc_trace+0x88/0x1bc
new_task_smack+0x44/0xd8
smack_cred_prepare+0x48/0x21c
security_prepare_creds+0x44/0x4c
prepare_creds+0xdc/0x110
smack_setprocattr+0x104/0x150
security_setprocattr+0x4c/0x54
proc_pid_attr_write+0x12c/0x194
vfs_write+0x1b0/0x370
SyS_write+0x5c/0x94
ret_fast_syscall+0x0/0x48
INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
kfree+0x270/0x290
smack_cred_free+0xc4/0xd0
security_cred_free+0x34/0x3c
put_cred_rcu+0x58/0xcc
rcu_process_callbacks+0x738/0x998
__do_softirq+0x264/0x4cc
do_softirq+0x94/0xf4
irq_exit+0xbc/0x120
handle_IRQ+0x104/0x134
gic_handle_irq+0x70/0xac
__irq_svc+0x44/0x78
_raw_spin_unlock+0x18/0x48
sync_inodes_sb+0x17c/0x1d8
sync_filesystem+0xac/0xfc
vdfs_file_fsync+0x90/0xc0
vfs_fsync_range+0x74/0x7c
INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
INFO: Object 0xc4635600 @offset=5632 fp=0x (null)
Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone c4635640: bb bb bb bb ....
Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 #1
Backtrace:
[<c00233a4>] (dump_backtrace+0x0/0x158) from [<c0023dec>] (show_stack+0x20/0x24)
r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
[<c0023dcc>] (show_stack+0x0/0x24) from [<c06d6d7c>] (dump_stack+0x20/0x28)
[<c06d6d5c>] (dump_stack+0x0/0x28) from [<c01c1d50>] (print_trailer+0x124/0x144)
[<c01c1c2c>] (print_trailer+0x0/0x144) from [<c01c1e88>] (object_err+0x3c/0x44)
r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
[<c01c1e4c>] (object_err+0x0/0x44) from [<c01cac18>] (kasan_report_error+0x2b8/0x538)
r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
[<c01ca960>] (kasan_report_error+0x0/0x538) from [<c01c9430>] (__asan_load4+0xd4/0xf8)
[<c01c935c>] (__asan_load4+0x0/0xf8) from [<c031e168>] (smack_task_to_inode+0x50/0x70)
r5:c4635600 r4:ca9da000
[<c031e118>] (smack_task_to_inode+0x0/0x70) from [<c031af64>] (security_task_to_inode+0x3c/0x44)
r5:cca25e80 r4:c0ba9780
[<c031af28>] (security_task_to_inode+0x0/0x44) from [<c023d614>] (pid_revalidate+0x124/0x178)
r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
[<c023d4f0>] (pid_revalidate+0x0/0x178) from [<c01db98c>] (lookup_fast+0x35c/0x43y4)
r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
[<c01db630>] (lookup_fast+0x0/0x434) from [<c01deec8>] (do_last.isra.24+0x1c0/0x1108)
[<c01ded08>] (do_last.isra.24+0x0/0x1108) from [<c01dff04>] (path_openat.isra.25+0xf4/0x648)
[<c01dfe10>] (path_openat.isra.25+0x0/0x648) from [<c01e1458>] (do_filp_open+0x3c/0x88)
[<c01e141c>] (do_filp_open+0x0/0x88) from [<c01ccb28>] (do_sys_open+0xf0/0x198)
r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
[<c01cca38>] (do_sys_open+0x0/0x198) from [<c01ccc00>] (SyS_open+0x30/0x34)
[<c01ccbd0>] (SyS_open+0x0/0x34) from [<c001db80>] (ret_fast_syscall+0x0/0x48)
Read of size 4 by thread T834:
Memory state around the buggy address:
c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: <stable@vger.kernel.org>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
During UDS connection check, both sides are checked for write access to
the other side. But only the first check is performed with audit support.
The second one didn't produce any audit logs. This simple patch fixes that.
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Smack uses CIPSO to label internet packets and thus provide
for access control on delivery of packets. The netfilter facility
was not used to allow for Smack to work properly without netfilter
configuration. Smack does not need netfilter, however there are
cases where it would be handy.
As a side effect, the labeling of local IPv4 packets can be optimized
and the handling of local IPv6 packets is just all out better.
The best part is that the netfilter tools use "contexts" that
are just strings, and they work just as well for Smack as they
do for SELinux.
All of the conditional compilation for IPv6 was implemented
by Rafal Krypa <r.krypa@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
This is one of those cases where you look at code you did
years ago and wonder what you might have been thinking.
There are a number of LSM hooks that work off of file pointers,
and most of them really want the security data from the inode.
Some, however, really want the security context that the process
had when the file was opened. The difference went undetected in
Smack until it started getting used in a real system with real
testing. At that point it was clear that something was amiss.
This patch corrects the misuse of the f_security value in several
of the hooks. The behavior will not usually be any different, as
the process had to be able to open the file in the first place, and
the old check almost always succeeded, as will the new, but for
different reasons.
Thanks to the Samsung Tizen development team that identified this.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The 54e70ec5eb commit introduced a
bidirectional check that should have checked for mutual WRITE access
between two labels. Due to a typo subject's OUT label is checked with
object's OUT. Should be OUT to IN.
Signed-off-by: Zbigniew Jasinski <z.jasinski@samsung.com>
Files created with __shmem_file_stup() appear to have somewhat fake
dentries which make them look like root directories and not get
the label the current process or ("*") star meant for tmpfs files.
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
In principle if this function was called with "value" == NULL and "len"
not NULL it could return different results for the "len" compared to a
case where "name" was not NULL. This is a hypothetical case that does
not exist in the kernel, but it's a logic bug nonetheless.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
Support for keyword 'boolean' will be dropped later on.
No functional change.
Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com
Signed-off-by: Christoph Jaeger <cj@linux.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.
This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).
This would cause either a panic, or corrupt memory.
Fixes CVE-2014-9529.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
We already checked if "desc" was NULL at the beginning of the function
and we've dereferenced it so this causes a static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull security layer updates from James Morris:
"In terms of changes, there's general maintenance to the Smack,
SELinux, and integrity code.
The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
which allows IMA appraisal to require signatures. Support for reading
keys from rootfs before init is call is also added"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: Remove security_ops extern
security: smack: fix out-of-bounds access in smk_parse_smack()
VFS: refactor vfs_read()
ima: require signature based appraisal
integrity: provide a hook to load keys when rootfs is ready
ima: load x509 certificate from the kernel
integrity: provide a function to load x509 certificate from the kernel
integrity: define a new function integrity_read_file()
Security: smack: replace kzalloc with kmem_cache for inode_smack
Smack: Lock mode for the floor and hat labels
ima: added support for new kernel cmdline parameter ima_template_fmt
ima: allocate field pointers array on demand in template_desc_init_fields()
ima: don't allocate a copy of template_fmt in template_desc_init_fields()
ima: display template format in meas. list if template name length is zero
ima: added error messages to template-related functions
ima: use atomic bit operations to protect policy update interface
ima: ignore empty and with whitespaces policy lines
ima: no need to allocate entry for comment
ima: report policy load status
ima: use path names cache
...
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
On powerpc we can end up with IMA=y and PPC_PSERIES=n which leads to:
warning: (IMA) selects TCG_IBMVTPM which has unmet direct dependencies (TCG_TPM && PPC_PSERIES)
tpm_ibmvtpm.c:(.text+0x14f3e8): undefined reference to `.plpar_hcall_norets'
I'm not sure why IMA needs to select those user-visible symbols, but if
it must then the simplest fix is to just express the proper dependencies
on the select.
Tested-by: Hon Ching (Vicky) Lo <lo1@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
When loading encrypted-keys module, if the last check of
aes_get_sizes() in init_encrypted() fails, the driver just returns an
error without unregistering its key type. This results in the stale
entry in the list. In addition to memory leaks, this leads to a kernel
crash when registering a new key type later.
This patch fixes the problem by swapping the calls of aes_get_sizes()
and register_key_type(), and releasing resources properly at the error
paths.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Since the keyring facility can be viewed as a cache (at least in some
applications), the local expiration time on the key should probably be viewed
as a 'needs updating after this time' property rather than an absolute 'anyone
now wanting to use this object is out of luck' property.
Since request_key() is the main interface for the usage of keys, this should
update or replace an expired key rather than issuing EKEYEXPIRED if the local
expiration has been reached (ie. it should refresh the cache).
For absolute conditions where refreshing the cache probably doesn't help, the
key can be negatively instantiated using KEYCTL_REJECT_KEY with EKEYEXPIRED
given as the error to issue. This will still cause request_key() to return
EKEYEXPIRED as that was explicitly set.
In the future, if the key type has an update op available, we might want to
upcall with the expired key and allow the upcall to update it. We would pass
a different operation name (the first column in /etc/request-key.conf) to the
request-key program.
request_key() returning EKEYEXPIRED is causing an NFS problem which Chuck
Lever describes thusly:
After about 10 minutes, my NFSv4 functional tests fail because the
ownership of the test files goes to "-2". Looking at /proc/keys
shows that the id_resolv keys that map to my test user ID have
expired. The ownership problem persists until the expired keys are
purged from the keyring, and fresh keys are obtained.
I bisected the problem to 3.13 commit b2a4df200d ("KEYS: Expand
the capacity of a keyring"). This commit inadvertantly changes the
API contract of the internal function keyring_search_aux().
The root cause appears to be that b2a4df200d made "no state check"
the default behavior. "No state check" means the keyring search
iterator function skips checking the key's expiry timeout, and
returns expired keys. request_key_and_link() depends on getting
an -EAGAIN result code to know when to perform an upcall to refresh
an expired key.
This patch can be tested directly by:
keyctl request2 user debug:fred a @s
keyctl timeout %user:debug:fred 3
sleep 4
keyctl request2 user debug:fred a @s
Without the patch, the last command gives error EKEYEXPIRED, but with the
command it gives a new key.
Reported-by: Carl Hetherington <cth@carlh.net>
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Simplify KEYRING_SEARCH_{NO,DO}_STATE_CHECK flags to be two variations of the
same flag. They are effectively mutually exclusive and one or the other
should be provided, but not both.
Keyring cycle detection and key possession determination are the only things
that set NO_STATE_CHECK, except that neither flag really does anything there
because neither purpose makes use of the keyring_search_iterator() function,
but rather provides their own.
For cycle detection we definitely want to check inside of expired keyrings,
just so that we don't create a cycle we can't get rid of. Revoked keyrings
are cleared at revocation time and can't then be reused, so shouldn't be a
problem either way.
For possession determination, we *might* want to validate each keyring before
searching it: do you possess a key that's hidden behind an expired or just
plain inaccessible keyring? Currently, the answer is yes. Note that you
cannot, however, possess a key behind a revoked keyring because they are
cleared on revocation.
keyring_search() sets DO_STATE_CHECK, which is correct.
request_key_and_link() currently doesn't specify whether to check the key
state or not - but it should set DO_STATE_CHECK.
key_get_instantiation_authkey() also currently doesn't specify whether to
check the key state or not - but it probably should also set DO_STATE_CHECK.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
When a key description argument is imported into the kernel from userspace, as
happens in add_key(), request_key(), KEYCTL_JOIN_SESSION_KEYRING,
KEYCTL_SEARCH, the description is copied into a buffer up to PAGE_SIZE in size.
PAGE_SIZE, however, is a variable quantity, depending on the arch. Fix this at
4096 instead (ie. 4095 plus a NUL termination) and define a constant
(KEY_MAX_DESC_SIZE) to this end.
When reading the description back with KEYCTL_DESCRIBE, a PAGE_SIZE internal
buffer is allocated into which the information and description will be
rendered. This means that the description will get truncated if an extremely
long description it has to be crammed into the buffer with the stringified
information. There is no particular need to copy the description into the
buffer, so just copy it directly to userspace in a separate operation.
Reported-by: Christian Kastner <debian@kvr.at>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Christian Kastner <debian@kvr.at>
integrity_kernel_read() duplicates the file read operations code
in vfs_read(). This patch refactors vfs_read() code creating a
helper function __vfs_read(). It is used by both vfs_read() and
integrity_kernel_read().
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch provides CONFIG_IMA_APPRAISE_SIGNED_INIT kernel configuration
option to force IMA appraisal using signatures. This is useful, when EVM
key is not initialized yet and we want securely initialize integrity or
any other functionality.
It forces embedded policy to require signature. Signed initialization
script can initialize EVM key, update the IMA policy and change further
requirement of everything to be signed.
Changes in v3:
* kernel parameter fixed to configuration option in the patch description
Changes in v2:
* policy change of this patch separated from the key loading patch
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Keys can only be loaded once the rootfs is mounted. Initcalls
are not suitable for that. This patch defines a special hook
to load the x509 public keys onto the IMA keyring, before
attempting to access any file. The keys are required for
verifying the file's signature. The hook is called after the
root filesystem is mounted and before the kernel calls 'init'.
Changes in v3:
* added more explanation to the patch description (Mimi)
Changes in v2:
* Hook renamed as 'integrity_load_keys()' to handle both IMA and EVM
keys by integrity subsystem.
* Hook patch moved after defining loading functions
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Define configuration option to load X509 certificate into the
IMA trusted kernel keyring. It implements ima_load_x509() hook
to load X509 certificate into the .ima trusted kernel keyring
from the root filesystem.
Changes in v3:
* use ima_policy_flag in ima_get_action()
ima_load_x509 temporarily clears ima_policy_flag to disable
appraisal to load key. Use it to skip appraisal rules.
* Key directory path changed to /etc/keys (Mimi)
* Expand IMA_LOAD_X509 Kconfig help
Changes in v2:
* added '__init'
* use ima_policy_flag to disable appraisal to load keys
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Provide the function to load x509 certificates from the kernel into the
integrity kernel keyring.
Changes in v2:
* configuration option removed
* function declared as '__init'
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch defines a new function called integrity_read_file()
to read file from the kernel into a buffer. Subsequent patches
will read a file containing the public keys and load them onto
the IMA keyring.
This patch moves and renames ima_kernel_read(), the non-security
checking version of kernel_read(), to integrity_kernel_read().
Changes in v3:
* Patch descriptions improved (Mimi)
* Add missing cast (kbuild test robot)
Changes in v2:
* configuration option removed
* function declared as '__init'
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Convert WARN_ONCE() to printk() in selinux_nlmsg_perm().
After conversion from audit_log() in commit e173fb26, WARN_ONCE() was
deemed too alarmist, so switch it to printk().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: Changed to printk(WARNING) so we catch all of the different
invalid netlink messages. In Richard's defense, he brought this
point up earlier, but I didn't understand his point at the time.]
Signed-off-by: Paul Moore <pmoore@redhat.com>
The patch use kmem_cache to allocate/free inode_smack since they are
alloced in high volumes making it a perfect case for kmem_cache.
As per analysis, 24 bytes of memory is wasted per allocation due
to internal fragmentation. With kmem_cache, this can be avoided.
Accounting of memory allocation is below :
total slack net count-alloc/free caller
Before (with kzalloc)
1919872 719952 1919872 29998/0 new_inode_smack+0x14
After (with kmem_cache)
1201680 0 1201680 30042/0 new_inode_smack+0x18
>From above data, we found that 719952 bytes(~700 KB) of memory is
saved on allocation of 29998 smack inodes.
Signed-off-by: Rohit <rohit.kr@samsung.com>
The lock access mode allows setting a read lock on a file
for with the process has only read access. The floor label is
defined to make it easy to have the basic system installed such
that everyone can read it. Once there's a desire to read lock
(rationally or otherwise) a floor file a rule needs to get set.
This happens all the time, so make the floor label a little bit
more special and allow everyone lock access, too. By implication,
give processes with the hat label (hat can read everything)
lock access as well. This reduces clutter in the Smack rule set.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount(). This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux: Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element of the inode security structure for call_rcu()
upon an inode_free_security(). But the underlying issue
was already present before that commit as a possible use-after-free
of isec.
Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element. However,
this would merely hide the issue and not truly fix the code.
This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially. Then,
if the inode is dropped subsequently, there will be no further
references to the isec.
Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
compliant equivalent. This patch allocates the appropriate amount of memory
using a char array using the SHASH_DESC_ON_STACK macro.
The new code can be compiled with both gcc and clang.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Cc: tglx@linutronix.de
This patch allows users to provide a custom template format through the
new kernel command line parameter 'ima_template_fmt'. If the supplied
format is not valid, IMA uses the default template descriptor.
Changelog:
- v3:
- added check for 'fields' and 'num_fields' in
template_desc_init_fields() (suggested by Mimi Zohar)
- v2:
- using template_desc_init_fields() to validate a format string
(Roberto Sassu)
- updated documentation by stating that only the chosen template
descriptor is initialized (Roberto Sassu)
- v1:
- simplified code of ima_template_fmt_setup()
(Roberto Sassu, suggested by Mimi Zohar)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The allocation of a field pointers array is moved at the end of
template_desc_init_fields() and done only if the value of the 'fields'
and 'num_fields' parameters is not NULL. For just validating a template
format string, retrieved template field pointers are placed in a temporary
array.
Changelog:
- v3:
- do not check in this patch if 'fields' and 'num_fields' are NULL
(suggested by Mimi Zohar)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch removes the allocation of a copy of 'template_fmt', needed for
iterating over all fields in the passed template format string. The removal
was possible by replacing strcspn(), which modifies the passed string,
with strchrnul(). The currently processed template field is copied in
a temporary variable.
The purpose of this change is use template_desc_init_fields() in two ways:
for just validating a template format string (the function should work
if called by a setup function, when memory cannot be allocated), and for
actually initializing a template descriptor. The implementation of this
feature will be complete with the next patch.
Changelog:
- v3:
- added 'goto out' in template_desc_init_fields() to free allocated
memory if a template field length is not valid (suggested by
Mimi Zohar)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
With the introduction of the 'ima_template_fmt' kernel cmdline parameter,
a user can define a new template descriptor with custom format. However,
in this case, userspace tools will be unable to parse the measurements
list because the new template is unknown. For this reason, this patch
modifies the current IMA behavior to display in the list the template
format instead of the name (only if the length of the latter is zero)
so that a tool can extract needed information if it can handle listed
fields.
This patch also correctly displays the error log message in
ima_init_template() if the selected template cannot be initialized.
Changelog:
- v3:
- check the first byte of 'e->template_desc->name' instead of using
strlen() in ima_fs.c (suggested by Mimi Zohar)
- v2:
- print the template format in ima_init_template(), if the selected
template is custom (Roberto Sassu)
- v1:
- fixed patch description (Roberto Sassu, suggested by Mimi Zohar)
- set 'template_name' variable in ima_fs.c only once
(Roberto Sassu, suggested by Mimi Zohar)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch adds some error messages to inform users about the following
events: template descriptor not found, invalid template descriptor,
template field not found and template initialization failed.
Changelog:
- v2:
- display an error message if the format string contains too many
fields (Roberto Sassu)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Pull security subsystem updates from James Morris.
Mostly ima, selinux, smack and key handling updates.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
integrity: do zero padding of the key id
KEYS: output last portion of fingerprint in /proc/keys
KEYS: strip 'id:' from ca_keyid
KEYS: use swapped SKID for performing partial matching
KEYS: Restore partial ID matching functionality for asymmetric keys
X.509: If available, use the raw subjKeyId to form the key description
KEYS: handle error code encoded in pointer
selinux: normalize audit log formatting
selinux: cleanup error reporting in selinux_nlmsg_perm()
KEYS: Check hex2bin()'s return when generating an asymmetric key ID
ima: detect violations for mmaped files
ima: fix race condition on ima_rdwr_violation_check and process_measurement
ima: added ima_policy_flag variable
ima: return an error code from ima_add_boot_aggregate()
ima: provide 'ima_appraise=log' kernel option
ima: move keyring initialization to ima_init()
PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
PKCS#7: Better handling of unsupported crypto
KEYS: Overhaul key identification when searching for asymmetric keys
KEYS: Implement binary asymmetric key ID handling
...
The current implementation uses an atomic counter to provide exclusive
access to the sysfs 'policy' entry to update the IMA policy. While it is
highly unlikely, the usage of a counter might potentially allow another
process to overflow the counter, open the interface and insert additional
rules into the policy being loaded.
This patch replaces using an atomic counter with atomic bit operations
which is more reliable and a widely used method to provide exclusive access.
As bit operation keep the interface locked after successful update, it makes
it unnecessary to verify if the default policy was set or not during parsing
and interface closing. This patch also removes that code.
Changes in v3:
* move audit log message to ima_relead_policy() to report successful and
unsuccessful result
* unnecessary comment removed
Changes in v2:
* keep interface locked after successful policy load as in original design
* remove sysfs entry as in original design
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Empty policy lines cause parsing failures which is, especially
for new users, hard to spot. This patch prevents it.
Changes in v2:
* strip leading blanks and tabs in rules to prevent parsing failures
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
If a rule is a comment, there is no need to allocate an entry.
Move the checking for comments before allocating the entry.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Audit messages are rate limited, often causing the policy update
info to not be visible. Report policy loading status also using
pr_info.
Changes in v2:
* reporting moved to ima_release_policy to notice parsing errors
* reporting both completed and failed status
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUNZK4AAoJEAAOaEEZVoIVI08P/iM7eaIVRnqaqtWw/JBzxiba
EMDlJYUBSlv6lYk9s8RJT4bMmcmGAKSYzVAHSoPahzNcqTDdFLeDTLGxJ8uKBbjf
d1qRRdH1yZHGUzCvJq3mEendjfXn435Y3YburUxjLfmzrzW7EbMvndiQsS5dhAm9
PEZ+wrKF/zFL7LuXa1YznYrbqOD/GRsJAXGEWc3kNwfS9avephVG/RI3GtpI2PJj
RY1mf8P7+WOlrShYoEuUo5aqs01MnU70LbqGHzY8/QKH+Cb0SOkCHZPZyClpiA+G
MMJ+o2XWcif3BZYz+dobwz/FpNZ0Bar102xvm2E8fqByr/T20JFjzooTKsQ+PtCk
DetQptrU2gtyZDKtInJUQSDPrs4cvA13TW+OEB1tT8rKBnmyEbY3/TxBpBTB9E6j
eb/V3iuWnywR3iE+yyvx24Qe7Pov6deM31s46+Vj+GQDuWmAUJXemhfzPtZiYpMT
exMXTyDS3j+W+kKqHblfU5f+Bh1eYGpG2m43wJVMLXKV7NwDf8nVV+Wea962ga+w
BAM3ia4JRVgRWJBPsnre3lvGT5kKPyfTZsoG+kOfRxiorus2OABoK+SIZBZ+c65V
Xh8VH5p3qyCUBOynXlHJWFqYWe2wH0LfbPrwe9dQwTwON51WF082EMG5zxTG0Ymf
J2z9Shz68zu0ok8cuSlo
=Hhee
-----END PGP SIGNATURE-----
Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:
- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.
There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"
* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...
Cheers,
Rusty.
PS. My virtio-next tree is empty: DaveM took the patches I had. There might
be a virtio-rng starvation fix, but so far it's a bit voodoo so I will
get to that in the next two days or it will wait.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUGFrvAAoJENkgDmzRrbjxOJYQALaZbTumrtX3Mo/FAtzn8d5N
8gxcqk1Mhz4lR1vPWy/YN/H2f23qb/saqLxPar8Wgou3h7N8EqSdwDqJSuvEqhG0
iEXUsNLC7BOsDkLYhdjTfZoW/lsVU/EH4bkZMSxAZI9V64phXhDYfPb5SQgJTECr
Ue6IK4ijW6zdWLstGfg/ixrIeGDUSnyiThF9O2mYVaB1D0QkLDIAZxbjZJgfFfut
PwO33/sEV4pceTpkmxFKl/OiS+obi/VbDixjSCcO+jaBd1pVxH9fhhKREStOhN4z
88z5ADR71RH6so9TQTwIIcgb2Hon5d+3RVMB6CxuvKs9NmHSXDiQyZvG9J/jiSdm
KrPKSiVwGGwJSwxXTm8CDaz6Oj0ibDXBIzv/vYI22sR7u8PmRQFvL3O1VrW+KDnE
yoG75S9DHzSQ1183xFFFTt4FBRm/4XKyVs+F6YqYkchLigrUfQMCGb1cmZyE5y7K
bgNyonu0m/ItoQmekoDgYqvSjwdguaJ35XCW55GrKJ84JDHBaw3SpPdEfjAS8FsH
aT5o2oernvwRG6gsX9858RvB/uo1UKwHv1waDfV4cqNjMm5Ko+Yr6OIdQvBQiq07
cFkVmkrMtEyX19QyIGW3QSbFL1lr3X5cC5glzEeKY941yZbTluSsNuMlMPT1+IMx
NOUbh0aG8B8ZaMZPFNLi
=QzCn
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"Nothing major: support for compressing modules, and auto-tainting
params.
PS. My virtio-next tree is empty: DaveM took the patches I had. There
might be a virtio-rng starvation fix, but so far it's a bit voodoo
so I will get to that in the next two days or it will wait"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
moduleparam: Resolve missing-field-initializer warning
kbuild: handle module compression while running 'make modules_install'.
modinst: wrap long lines in order to enhance cmd_modules_install
modsign: lookup lines ending in .ko in .mod files
modpost: simplify file name generation of *.mod.c files
modpost: reduce visibility of symbols and constify r/o arrays
param: check for tainting before calling set op.
drm/i915: taint the kernel if unsafe module parameters are set
module: add module_param_unsafe and module_param_named_unsafe
module: make it possible to have unsafe, tainting module params
module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
__getname() uses slab allocation which is faster than kmalloc.
Make use of it.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
If filesystem is mounted read-only or file is immutable, updating
xattr will fail. This is a usual case during early boot until
filesystem is remount read-write. This patch verifies conditions
to skip unnecessary attempt to calculate HMAC and set xattr.
Changes in v2:
* indention changed according to Lindent (requested by Mimi)
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
integrity_init_keyring() is used only from kernel '__init'
functions. Add it there as well.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch completes the switching to the 'ima_policy_flag' variable
in the checks at the beginning of IMA functions, starting with the
commit a756024e.
Checking 'iint_initialized' is completely unnecessary, because
S_IMA flag is unset if iint was not allocated. At the same time
the integrity cache is allocated with SLAB_PANIC and the kernel will
panic if the allocation fails during kernel initialization. So on
a running system iint_initialized is always true and can be removed.
Changes in v3:
* not limiting test to IMA_APPRAISE (spotted by Roberto Sassu)
Changes in v2:
* 'iint_initialized' removal patch merged to this patch (requested
by Mimi)
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Acked-by: Roberto Sassu <roberto.sassu@polito.it>
Restructure to keyword=value pairs without spaces. Drop superfluous words in
text. Make invalid_context a keyword. Change result= keyword to seresult=.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Minor rewrite to the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Convert audit_log() call to WARN_ONCE().
Rename "type=" to nlmsg_type=" to avoid confusion with the audit record
type.
Added "protocol=" to help track down which protocol (NETLINK_AUDIT?) was used
within the netlink protocol family.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Rewrote the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch fixes the detection of the 'open_writers' violation for mmaped
files.
before) an 'open_writers' violation is detected if the policy contains
a rule with the criteria: func=FILE_CHECK mask=MAY_READ
after) an 'open_writers' violation is detected if the current event
matches one of the policy rules.
With the old behaviour, the 'open_writers' violation is not detected
in the following case:
policy:
measure func=FILE_MMAP mask=MAY_EXEC
steps:
1) open a shared library for writing
2) execute a binary that links that shared library
3) during the binary execution, modify the shared library and save
the change
result:
the 'open_writers' violation measurement is not present in the IMA list.
Only binaries executed are protected from writes. For libraries mapped
in memory there is the flag MAP_DENYWRITE for this purpose, but according
to the output of 'man mmap', the mmap flag is ignored.
Since ima_rdwr_violation_check() is now called by process_measurement()
the information about if the inode must be measured is already provided
by ima_get_action(). Thus the unnecessary function ima_must_measure()
has been removed.
Changes in v3 (Dmitry Kasatkin):
- Violation for MMAP_CHECK function are verified since this patch
- Changed patch description a bit
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch fixes a race condition between two functions that try to access
the same inode. Since the i_mutex lock is held and released separately
in the two functions, there may be the possibility that a violation is
not correctly detected.
Suppose there are two processes, A (reader) and B (writer), if the
following sequence happens:
A: ima_rdwr_violation_check()
B: ima_rdwr_violation_check()
B: process_measurement()
B: starts writing the inode
A: process_measurement()
the ToMToU violation (a reader may be accessing a content different from
that measured, due to a concurrent modification by a writer) will not be
detected. To avoid this issue, the violation check and the measurement
must be done atomically.
This patch fixes the problem by moving the violation check inside
process_measurement() when the i_mutex lock is held. Differently from
the old code, the violation check is executed also for the MMAP_CHECK
hook (other than for FILE_CHECK). This allows to detect ToMToU violations
that are possible because shared libraries can be opened for writing
while they are in use (according to the output of 'man mmap', the mmap()
flag MAP_DENYWRITE is ignored).
Changes in v5 (Roberto Sassu):
* get iint if action is not zero
* exit process_measurement() after the violation check if action is zero
* reverse order process_measurement() exit cleanup (Mimi)
Changes in v4 (Dmitry Kasatkin):
* iint allocation is done before calling ima_rdrw_violation_check()
(Suggested-by Mimi)
* do not check for violations if the policy does not contain 'measure'
rules (done by Roberto Sassu)
Changes in v3 (Dmitry Kasatkin):
* no violation checking for MMAP_CHECK function in this patch
* remove use of filename from violation
* removes checking if ima is enabled from ima_rdrw_violation_check
* slight style change
Suggested-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch introduces the new variable 'ima_policy_flag', whose bits
are set depending on the action of the current policy rules. Only the
flags IMA_MEASURE, IMA_APPRAISE and IMA_AUDIT are set.
The new variable will be used to improve performance by skipping the
unnecessary execution of IMA code if the policy does not contain rules
with the above actions.
Changes in v6 (Roberto Sassu)
* do not check 'ima_initialized' before calling ima_update_policy_flag()
in ima_update_policy() (suggested by Dmitry)
* calling ima_update_policy_flag() moved to init_ima to co-locate with
ima_initialized (Dmitry)
* add/revise comments (Mimi)
Changes in v5 (Roberto Sassu)
* reset IMA_APPRAISE flag in 'ima_policy_flag' if 'ima_appraise' is set
to zero (reported by Dmitry)
* update 'ima_policy_flag' only if IMA initialization is successful
(suggested by Mimi and Dmitry)
* check 'ima_policy_flag' instead of 'ima_initialized'
(suggested by Mimi and Dmitry)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch modifies ima_add_boot_aggregate() to return an error code.
This way we can determine if all the initialization procedures have
been executed successfully.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The kernel boot parameter "ima_appraise" currently defines 'off',
'enforce' and 'fix' modes. When designing a policy and labeling
the system, access to files are either blocked in the default
'enforce' mode or automatically fixed in the 'fix' mode. It is
beneficial to be able to run the system in a logging only mode,
without fixing it, in order to properly analyze the system. This
patch adds a 'log' mode to run the system in a permissive mode and
log the appraisal results.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
ima_init() is used as a single place for all initializations.
Experimental keyring patches used the 'late_initcall' which was
co-located with the late_initcall(init_ima). When the late_initcall
for the keyring initialization was abandoned, initialization moved
to init_ima, though it would be more logical to move it to ima_init,
where the rest of the initialization is done. This patch moves the
keyring initialization to ima_init() as a preparatory step for
loading the keys which will be added to ima_init() in following
patches.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Make the key matching functions pointed to by key_match_data::cmp return bool
rather than int.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
A previous patch added a ->match_preparse() method to the key type. This is
allowed to override the function called by the iteration algorithm.
Therefore, we can just set a default that simply checks for an exact match of
the key description with the original criterion data and allow match_preparse
to override it as needed.
The key_type::match op is then redundant and can be removed, as can the
user_match() function.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Remove key_type::def_lookup_type as it's no longer used. The information now
defaults to KEYRING_SEARCH_LOOKUP_DIRECT but may be overridden by
type->match_preparse().
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Preparse the match data. This provides several advantages:
(1) The preparser can reject invalid criteria up front.
(2) The preparser can convert the criteria to binary data if necessary (the
asymmetric key type really wants to do binary comparison of the key IDs).
(3) The preparser can set the type of search to be performed. This means
that it's not then a one-off setting in the key type.
(4) The preparser can set an appropriate comparator function.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVBhlXROxKuMESys7AQIQghAAmxHRD9kD7AJrlcL187gTXl5dRyxQNhX0
myT9A1/01jHJh6mPPKS5jt5ooenwfcbvBkdUARLNLq6j7urIzFk+UtSmfcN1OMXw
3y15nuJKbSCZRzPMR94Z0Ik23YED9dmc4ubxW7E+psoWWIZUvFt4GaGw7ei77O39
8Pbt/n7nIx2+s8aiXte2Om/zqzyMEmcLd3Mzlqxe4GX1jo/ThNQZ348EL+ECxJEl
3OKe7i7oRfa48+SybmhW5Bx/2f/aQMcIQX04+akOFMIC505rUg6CTzphcrTyV5/s
6FdQexRquf8/Ei/6DMAYPhnumRfWJ5x9txiNSEY4i11AIjo6Bt65vaLPuNniYRNI
b6Wn8SSE8Ucrq5RrmNlSmoJCs7r1NE+JdOaEPO0MDVkOouaja8daISmveV2AfZfF
bITOQgEw3QRpyL2FYdwa39/NXCONBILfL5HvNyXEfPEHBhI8igTgEyXYRwNHV9jT
dsVFTc9ZIrjksLt3CDh4Z8xdyZbyojYdRCfH/wna9aAkZpwwGfSYCcR3dE6SK26y
IjkqEVJoCFHEnvLUQkAQrc/2qWX2D1qHrcjVLwzwbM5G66YIPeLcJlZK2FbiY0Ay
Yc/kUJY0hU5W+TfFb1hhjO5G2DTTw8Ou6MGcxSTE3HwzqICjDhE7BwFZdVikHRP0
xMtjfJnMwuM=
=v+dv
-----END PGP SIGNATURE-----
Merge tag 'keys-fixes-20140916' into keys-next
Merge in keyrings fixes, at least some of which later patches depend on:
(1) Reinstate the production of EPERM for key types beginning with '.' in
requests from userspace.
(2) Tidy up the cleanup of PKCS#7 message signed information blocks and fix a
bug this made more obvious.
Signed-off-by: David Howells <dhowells@redhat.coM>
Reinstate the generation of EPERM for a key type name beginning with a '.' in
a userspace call. Types whose name begins with a '.' are internal only.
The test was removed by:
commit a4e3b8d79a
Author: Mimi Zohar <zohar@linux.vnet.ibm.com>
Date: Thu May 22 14:02:23 2014 -0400
Subject: KEYS: special dot prefixed keyring name bug fix
I think we want to keep the restriction on type name so that userspace can't
add keys of a special internal type.
Note that removal of the test causes several of the tests in the keyutils
testsuite to fail.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
While SELinux largely ignores namespaces, for good reason, there are
some places where it needs to at least be aware of namespaces in order
to function correctly. Network namespaces are one example. Basic
awareness of network namespaces are necessary in order to match a
network interface's index number to an actual network device.
This patch corrects a problem with network interfaces added to a
non-init namespace, and can be reproduced with the following commands:
[NOTE: the NetLabel configuration is here only to active the dynamic
networking controls ]
# netlabelctl unlbl add default address:0.0.0.0/0 \
label:system_u:object_r:unlabeled_t:s0
# netlabelctl unlbl add default address:::/0 \
label:system_u:object_r:unlabeled_t:s0
# netlabelctl cipsov4 add pass doi:100 tags:1
# netlabelctl map add domain:lspp_test_netlabel_t \
protocol:cipsov4,100
# ip link add type veth
# ip netns add myns
# ip link set veth1 netns myns
# ip a add dev veth0 10.250.13.100/24
# ip netns exec myns ip a add dev veth1 10.250.13.101/24
# ip l set veth0 up
# ip netns exec myns ip l set veth1 up
# ping -c 1 10.250.13.101
# ip netns exec myns ping -c 1 10.250.13.100
Reported-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
security_file_set_fowner always returns 0, so make it f_setown and
__f_setown void return functions and fix up the error handling in the
callers.
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The kernel print macros use the KBUILD_MODNAME, which is initialized
to the module name. The current integrity/Makefile makes every file
as its own module, so pr_xxx messages are prefixed with the file name
instead of the module. Similar to the evm/Makefile and ima/Makefile,
this patch fixes the integrity/Makefile to use the single name
'integrity'.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The integrity subsystem has lots of options and takes more than
half of the security menu. This patch consolidates the options
under "integrity", which are hidden if not enabled. This change
does not affect existing configurations. Re-configuration is not
needed.
Changes v4:
- no need to change "integrity subsystem" to menuconfig as
options are hidden, when not enabled. (Mimi)
- add INTEGRITY Kconfig help description
Changes v3:
- dependency to INTEGRITY removed when behind 'if INTEGRITY'
Changes v2:
- previous patch moved integrity out of the 'security' menu.
This version keeps integrity as a security option (Mimi).
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
For better visual appearance it is better to co-locate
asymmetric key options together with signature support.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
IMA uses only one template. This patch initializes only required
template to avoid unnecessary memory allocations.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Reviewed-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
In all cases except ima_bprm_check() the filename was not defined
and ima_d_path() was used to find the full path. Unfortunately,
the bprm filename is a relative pathname (eg. ./<dir>/filename).
ima_bprm_check() selects between bprm->interp and bprm->filename.
The following dump demonstrates the differences between using
filename and interp.
bprm->filename
filename: ./foo.sh, pathname: /root/bin/foo.sh
filename: ./foo.sh, pathname: /bin/dash
bprm->interp
filename: ./foo.sh, pathname: /root/bin/foo.sh
filename: /bin/sh, pathname: /bin/dash
In both cases the pathnames are currently the same. This patch
removes usage of filename and interp in favor of d_absolute_path.
Changes v3:
- 11 extra bytes for "deleted" not needed (Mimi)
- purpose "replace relative bprm filename with full pathname" (Mimi)
Changes v2:
- use d_absolute_path() instead of d_path to work in chroot environments.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
ima_get_action() sets the "action" flags based on policy.
Before collecting, measuring, appraising, or auditing the
file, the "action" flag is updated based on the cached
iint->flags.
This patch removes the subsequent unnecessary appraisal
test in ima_appraise_measurement().
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Add missing keywords to the function definition to cleanup
to discard initialization code.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Reviewed-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
'function' variable value can be changed instead of
allocating extra '_func' variable.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Precede bit testing before string comparison makes code
faster. Also refactor statement as a single line pointer
assignment. Logic is following: we set 'xattr_ptr' to read
xattr value when we will do appraisal or in any case when
measurement template is other than 'ima'.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Commit f381c27 "integrity: move ima inode integrity data management"
(re)moved few functions but left their declarations in header files.
This patch removes them and also removes duplicated declaration of
integrity_iint_find().
Commit c7de7ad "ima: remove unused cleanup functions". This patch
removes these definitions as well.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
If file has IMA signature, IMA in enforce mode, but key is missing
then file access is blocked and single error message is printed.
If IMA appraisal is enabled in fix mode, then system runs as usual
but might produce tons of 'Request for unknown key' messages.
This patch switches 'pr_warn' to 'pr_err_ratelimited'.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Empty files and missing xattrs do not guarantee that a file was
just created. This patch passes FILE_CREATED flag to IMA to
reliably identify new files.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> 3.14+
Unless an LSM labels a file during d_instantiate(), newly created
files are not labeled with an initial security.evm xattr, until
the file closes. EVM, before allowing a protected, security xattr
to be written, verifies the existing 'security.evm' value is good.
For newly created files without a security.evm label, this
verification prevents writing any protected, security xattrs,
until the file closes.
Following is the example when this happens:
fd = open("foo", O_CREAT | O_WRONLY, 0644);
setxattr("foo", "security.SMACK64", value, sizeof(value), 0);
close(fd);
While INTEGRITY_NOXATTRS status is handled in other places, such
as evm_inode_setattr(), it does not handle it in all cases in
evm_protect_xattr(). By limiting the use of INTEGRITY_NOXATTRS to
newly created files, we can now allow setting "protected" xattrs.
Changelog:
- limit the use of INTEGRITY_NOXATTRS to IMA identified new files
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> 3.14+
This patch fix spelling typo found in DocBook/kernel-api.xml.
It is because the file is generated from the source comments,
I have to fix the comments in source codes.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Push ipv4 and ipv6 nf hooks into single array and register/unregister
them via single call.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Paul Moore <pmoore@redhat.com>
On ima_file_free(), newly created empty files are not labeled with
an initial security.ima value, because the iversion did not change.
Commit dff6efc "fs: fix iversion handling" introduced a change in
iversion behavior. To verify this change use the shell command:
$ (exec >foo)
$ getfattr -h -e hex -d -m security foo
This patch defines the IMA_NEW_FILE flag. The flag is initially
set, when IMA detects that a new file is created, and subsequently
checked on the ima_file_free() hook to set the initial security.ima
value.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> 3.14+
This patch fixes a bug, where evm_verify_hmac() returns INTEGRITY_PASS
if inode->i_op->getxattr() returns an error in evm_find_protected_xattrs.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
A previous commit c0828e5048 ("selinux:
process labeled IPsec TCP SYN-ACK packets properly in
selinux_ip_postroute()") mistakenly left out a 'break' from a switch
statement which caused problems with IPv6 traffic.
Thanks to Florian Westphal for reporting and debugging the issue.
Reported-by: Florian Westphal <fwestpha@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Now that NFS client uses the kernel key ring facility to store the NFSv4
id/gid mappings, the defaults for root_maxkeys and root_maxbytes need to be
substantially increased.
These values have been soak tested:
https://bugzilla.redhat.com/show_bug.cgi?id=1033708#c73
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch fixes checkpatch 'return' warnings introduced with commit
9819cf2 "checkpatch: warn on unnecessary void function return statements".
Use scripts/checkpatch.pl --file security/integrity/evm/evm_main.c
to produce the warnings.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
3.16 commit aad4f8bb42
'switch simple generic_file_aio_read() users to ->read_iter()'
replaced ->aio_read with ->read_iter in most of the file systems
and introduced new_sync_read() as a replacement for do_sync_read().
Most of file systems set '->read' and ima_kernel_read is not affected.
When ->read is not set, this patch adopts fallback call changes from the
vfs_read.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> 3.16+
This patch fixes the case where the file's signature/hash xattr contains
an invalid hash algorithm. Although we can not verify the xattr, we still
need to measure the file. Use the default IMA hash algorithm.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The patch 3bcced39ea7d: "ima: use ahash API for file hash
calculation" from Feb 26, 2014, leads to the following static checker
warning:
security/integrity/ima/ima_crypto.c:204 ima_alloc_atfm()
error: buffer overflow 'hash_algo_name' 17 <= 17
Unlike shash tfm memory, which is allocated on initialization, the
ahash tfm memory allocation is deferred until needed.
This patch fixes the case where ima_ahash_tfm has not yet been
allocated and the file's signature/hash xattr contains an invalid hash
algorithm. Although we can not verify the xattr, we still need to
measure the file. Use the default IMA hash algorithm.
Changelog:
- set valid algo before testing tfm - based on Dmitry's comment
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Smack used to use a mix of smack_known struct and char* throughout its
APIs and implementation. This patch unifies the behaviour and makes it
store and operate exclusively on smack_known struct pointers when managing
labels.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
Conflicts:
security/smack/smack_access.c
security/smack/smack_lsm.c
The 54e70ec5eb commit introduced a
bidirectional check that should have checked for mutual WRITE access
between two labels. Due to a typo the second check was incorrect.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@samsung.com>
People keep asking me for permissive mode, and I keep saying "no".
Permissive mode is wrong for more reasons than I can enumerate,
but the compelling one is that it's once on, never off.
Nonetheless, there is an argument to be made for running a
process with lots of permissions, logging which are required,
and then locking the process down. There wasn't a way to do
that with Smack, but this provides it.
The notion is that you start out by giving the process an
appropriate Smack label, such as "ATBirds". You create rules
with a wide range of access and the "b" mode. On Tizen it
might be:
ATBirds System rwxalb
ATBirds User rwxalb
ATBirds _ rwxalb
User ATBirds wb
System ATBirds wb
Accesses that fail will generate audit records. Accesses
that succeed because of rules marked with a "b" generate
log messages identifying the rule, the program and as much
object information as is convenient.
When the system is properly configured and the programs
brought in line with the labeling scheme the "b" mode can
be removed from the rules. When the system is ready for
production the facility can be configured out.
This provides the developer the convenience of permissive
mode without creating a system that looks like it is
enforcing a policy while it is not.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
If the callee SID is bounded by the caller SID, then allowing
the transition to occur poses no risk of privilege escalation and we can
therefore safely allow the transition to occur. Add this exemption
for both the case where a transition was explicitly requested by the
application and the case where an automatic transition is defined in
policy.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Make it clear this is about kernel_param_ops, not kernel_param (which
will soon have a flags field of its own). No functional changes.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Jon Mason <jon.mason@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 7177a9c4b5 ("fs: call rename2 if exists") changed
"struct inode_operations"->rename == NULL if
"struct inode_operations"->rename2 != NULL .
TOMOYO needs to check for both ->rename and ->rename2 , or
a system on (e.g.) ext4 filesystem won't boot.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
While opening with CAP_MAC_OVERRIDE file label is not set.
Other calls may access it after CAP_MAC_OVERRIDE is dropped from process.
Signed-off-by: Marcin Niesluchowski <m.niesluchow@samsung.com>
Pull SElinux fixes from Paul Moore:
"Two small patches to fix a couple of build warnings in SELinux and
NetLabel. The patches are obvious enough that I don't think any
additional explanation is necessary, but it basically boils down to
the usual: I was stupid, and these patches fix some of the stupid.
Both patches were posted earlier this week to the SELinux list, and
that is where they sat as I didn't think there were noteworthy enough
to go upstream at this point in time, but DaveM would rather see them
upstream now so who am I to argue. As the patches are both very
small"
* 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux:
selinux: remove unused variabled in the netport, netnode, and netif caches
netlabel: fix the netlbl_catmap_setlong() dummy function
Values of extended attributes are stored as binary blobs. NULL-termination
of them isn't required. It just wastes disk space and confuses command-line
tools like getfattr because they have to print that zero byte at the end.
This patch removes terminating zero byte from initial security label in
smack_inode_init_security and cuts it out in function smack_inode_getsecurity
which is used by syscall getxattr. This change seems completely safe, because
function smk_parse_smack ignores everything after first zero byte.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Zero-length security labels are invalid but kernel should handle them.
This patch fixes kernel panic after setting zero-length security labels:
# attr -S -s "SMACK64" -V "" file
And after writing zero-length string into smackfs files syslog and onlycp:
# python -c 'import os; os.write(1, "")' > /smack/syslog
The problem is caused by brain-damaged logic in function smk_parse_smack()
which takes pointer to buffer and its length but if length below or equal zero
it thinks that the buffer is zero-terminated. Unfortunately callers of this
function are widely used and proper fix requires serious refactoring.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Security operation ->inode_listsecurity is used for generating list of
available extended attributes for syscall listxattr. Currently it's used
only in nfs4 or if filesystem doesn't provide i_op->listxattr.
The list is the set of NULL-terminated names, one after the other.
This method must include zero byte at the and into result.
Also this function must return length even if string does not fit into
output buffer or it is NULL, see similar method in selinux and man listxattr.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
This patch removes the unused return code variable in the netport,
netnode, and netif initialization functions.
Reported-by: fengguang.wu@intel.com
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull security subsystem updates from James Morris:
"In this release:
- PKCS#7 parser for the key management subsystem from David Howells
- appoint Kees Cook as seccomp maintainer
- bugfixes and general maintenance across the subsystem"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
X.509: Need to export x509_request_asymmetric_key()
netlabel: shorter names for the NetLabel catmap funcs/structs
netlabel: fix the catmap walking functions
netlabel: fix the horribly broken catmap functions
netlabel: fix a problem when setting bits below the previously lowest bit
PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
tpm: simplify code by using %*phN specifier
tpm: Provide a generic means to override the chip returned timeouts
tpm: missing tpm_chip_put in tpm_get_random()
tpm: Properly clean sysfs entries in error path
tpm: Add missing tpm_do_selftest to ST33 I2C driver
PKCS#7: Use x509_request_asymmetric_key()
Revert "selinux: fix the default socket labeling in sock_graft()"
X.509: x509_request_asymmetric_keys() doesn't need string length arguments
PKCS#7: fix sparse non static symbol warning
KEYS: revert encrypted key change
ima: add support for measuring and appraising firmware
firmware_class: perform new LSM checks
security: introduce kernel_fw_from_file hook
PKCS#7: Missing inclusion of linux/err.h
...
Pull timer and time updates from Thomas Gleixner:
"A rather large update of timers, timekeeping & co
- Core timekeeping code is year-2038 safe now for 32bit machines.
Now we just need to fix all in kernel users and the gazillion of
user space interfaces which rely on timespec/timeval :)
- Better cache layout for the timekeeping internal data structures.
- Proper nanosecond based interfaces for in kernel users.
- Tree wide cleanup of code which wants nanoseconds but does hoops
and loops to convert back and forth from timespecs. Some of it
definitely belongs into the ugly code museum.
- Consolidation of the timekeeping interface zoo.
- A fast NMI safe accessor to clock monotonic for tracing. This is a
long standing request to support correlated user/kernel space
traces. With proper NTP frequency correction it's also suitable
for correlation of traces accross separate machines.
- Checkpoint/restart support for timerfd.
- A few NOHZ[_FULL] improvements in the [hr]timer code.
- Code move from kernel to kernel/time of all time* related code.
- New clocksource/event drivers from the ARM universe. I'm really
impressed that despite an architected timer in the newer chips SoC
manufacturers insist on inventing new and differently broken SoC
specific timers.
[ Ed. "Impressed"? I don't think that word means what you think it means ]
- Another round of code move from arch to drivers. Looks like most
of the legacy mess in ARM regarding timers is sorted out except for
a few obnoxious strongholds.
- The usual updates and fixlets all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
timekeeping: Fixup typo in update_vsyscall_old definition
clocksource: document some basic timekeeping concepts
timekeeping: Use cached ntp_tick_length when accumulating error
timekeeping: Rework frequency adjustments to work better w/ nohz
timekeeping: Minor fixup for timespec64->timespec assignment
ftrace: Provide trace clocks monotonic
timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
seqcount: Add raw_write_seqcount_latch()
seqcount: Provide raw_read_seqcount()
timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
timekeeping: Create struct tk_read_base and use it in struct timekeeper
timekeeping: Restructure the timekeeper some more
clocksource: Get rid of cycle_last
clocksource: Move cycle_last validation to core code
clocksource: Make delta calculation a function
wireless: ath9k: Get rid of timespec conversions
drm: vmwgfx: Use nsec based interfaces
drm: i915: Use nsec based interfaces
timekeeping: Provide ktime_get_raw()
hangcheck-timer: Use ktime_get_ns()
...
Pull scheduler updates from Ingo Molnar:
- Move the nohz kick code out of the scheduler tick to a dedicated IPI,
from Frederic Weisbecker.
This necessiated quite some background infrastructure rework,
including:
* Clean up some irq-work internals
* Implement remote irq-work
* Implement nohz kick on top of remote irq-work
* Move full dynticks timer enqueue notification to new kick
* Move multi-task notification to new kick
* Remove unecessary barriers on multi-task notification
- Remove proliferation of wait_on_bit() action functions and allow
wait_on_bit_action() functions to support a timeout. (Neil Brown)
- Another round of sched/numa improvements, cleanups and fixes. (Rik
van Riel)
- Implement fast idling of CPUs when the system is partially loaded,
for better scalability. (Tim Chen)
- Restructure and fix the CPU hotplug handling code that may leave
cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
cpu. (Kirill Tkhai)
- Robustify the sched topology setup code. (Peterz Zijlstra)
- Improve sched_feat() handling wrt. static_keys (Jason Baron)
- Misc fixes.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
sched/fair: Fix 'make xmldocs' warning caused by missing description
sched: Use macro for magic number of -1 for setparam
sched: Robustify topology setup
sched: Fix sched_setparam() policy == -1 logic
sched: Allow wait_on_bit_action() functions to support a timeout
sched: Remove proliferation of wait_on_bit() action functions
sched/numa: Revert "Use effective_load() to balance NUMA loads"
sched: Fix static_key race with sched_feat()
sched: Remove extra static_key*() function indirection
sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
sched: Transform resched_task() into resched_curr()
sched/deadline: Kill task_struct->pi_top_task
sched: Rework check_for_tasks()
sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
sched/fair: Disable runtime_enabled on dying rq
sched/numa: Change scan period code to match intent
sched/numa: Rework best node setting in task_numa_migrate()
sched/numa: Examine a task move when examining a task swap
sched/numa: Simplify task_numa_compare()
sched/numa: Use effective_load() to balance NUMA loads
...
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel. This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.
There are no substantial code or logic changes in this patch.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow. At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly). This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.
Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.
One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap. NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel category (catmap) functions have a problem in that they
assume categories will be set in an increasing manner, e.g. the next
category set will always be larger than the last. Unfortunately, this
is not a valid assumption and could result in problems when attempting
to set categories less than the startbit in the lowest catmap node.
In some cases kernel panics and other nasties can result.
This patch corrects the problem by checking for this and allocating a
new catmap node instance and placing it at the front of the list.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
This reverts commit 4da6daf4d3.
Unfortunately, the commit in question caused problems with Bluetooth
devices, specifically it caused them to get caught in the newly
created BUG_ON() check. The AF_ALG problem still exists, but will be
addressed in a future patch.
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Commit fc7c70e "KEYS: struct key_preparsed_payload should have two
payload pointers" erroneously modified encrypted-keys. This patch
reverts the change to that file.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
The "security: introduce kernel_fw_from_file hook" patch defined a
new security hook to evaluate any loaded firmware that wasn't built
into the kernel.
This patch defines ima_fw_from_file(), which is called from the new
security hook, to measure and/or appraise the loaded firmware's
integrity.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
In order to validate the contents of firmware being loaded, there must be
a hook to evaluate any loaded firmware that wasn't built into the kernel
itself. Without this, there is a risk that a root user could load malicious
firmware designed to mount an attack against kernel memory (e.g. via DMA).
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
This is effectively a revert of 7b9a7ec565
plus fixing it a different way...
We found, when trying to run an application from an application which
had dropped privs that the kernel does security checks on undefined
capability bits. This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.
Consider a root application which drops all capabilities from ALL 4
capability sets. We assume, since the application is going to set
eff/perm/inh from an array that it will clear not only the defined caps
less than CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.
The BSET gets cleared differently. Instead it is cleared one bit at a
time. The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read. So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.
So the 'parent' will look something like:
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffc000000000
All of this 'should' be fine. Given that these are undefined bits that
aren't supposed to have anything to do with permissions. But they do...
So lets now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel). We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in that above situation if you try to clear all of
you capapabilities from all 4 sets. If that root task calls execve()
the child task will pick up all caps not blocked by the bset. The bset
however does not block bits higher than CAP_LAST_CAP. So now the child
task has bits in eff which are not in the parent. These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.
The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits! So now we set durring commit creds that
the child is not dumpable. Given it is 'more priv' than its parent. It
also means the parent cannot ptrace the child and other stupidity.
The solution here:
1) stop hiding capability bits in status
This makes debugging easier!
2) stop giving any task undefined capability bits. it's simple, it you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
This fixes the cap_issubset() tests and resulting fallout (which
made the init task in a docker container untraceable among other
things)
3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.
4) mask out undefined bit when we read a file capability off of disk as
again likely all bits are set in the xattr for forward/backward
compatibility.
This lets 'setcap all+pe /bin/bash; /bin/bash' run
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Dan Walsh <dwalsh@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
In function cap_task_prctl(), we would allocate a credential
unconditionally and then check if we support the requested function.
If not we would release this credential with abort_creds() by using
RCU method. But on some archs such as powerpc, the sys_prctl is heavily
used to get/set the floating point exception mode. So the unnecessary
allocating/releasing of credential not only introduce runtime overhead
but also do cause OOM due to the RCU implementation.
This patch removes abort_creds() from cap_task_prctl() by calling
prepare_creds() only when we need to modify it.
Reported-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Provide key preparsing for the request_key_auth key type so that we can make
preparsing mandatory. This does nothing as this type can only be set up
internally to the kernel.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Provide key preparsing in the keyring so that we can make preparsing
mandatory. For keyrings, however, only an empty payload is permitted.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Make use of key preparsing in the big key type so that quota size determination
can take place prior to keyring locking when a key is being added.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Make use of key preparsing in user-defined and logon keys so that quota size
determination can take place prior to keyring locking when a key is being
added.
Also the idmapper key types need to change to match as they use the
user-defined key type routines.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Call the ->free_preparse() key type op even after ->preparse() returns an
error as it does cleaning up type stuff.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Allow a key type's preparsing routine to set the expiry time for a key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>
struct key_preparsed_payload should have two payload pointers to correspond
with those in struct key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Provide a generic instantiation function for key types that use the preparse
hook. This makes it easier to prereserve key quota before keyrings get locked
to retain the new key.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Special kernel keys, such as those used to hold DNS results for AFS, CIFS and
NFS and those used to hold idmapper results for NFS, used to be
'invalidateable' with key_revoke(). However, since the default permissions for
keys were reduced:
Commit: 96b5c8fea6
KEYS: Reduce initial permissions on keys
it has become impossible to do this.
Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be
invalidated by root. This should not be used for system keyrings as the
garbage collector will try and remove any invalidate key. For system keyrings,
KEY_FLAG_ROOT_CAN_CLEAR can be used instead.
After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be
used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and
idmapper keys. Invalidated keys are immediately garbage collected and will be
immediately rerequested if needed again.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Steve Dickson <steved@redhat.com>
Require all keys added to the IMA keyring be signed by an
existing trusted key on the system trusted keyring.
Changelog v6:
- remove ifdef CONFIG_IMA_TRUSTED_KEYRING in C code - Dmitry
- update Kconfig dependency and help
- select KEYS_DEBUG_PROC_KEYS - Dmitry
Changelog v5:
- Move integrity_init_keyring() to init_ima() - Dmitry
- reset keyring[id] on failure - Dmitry
Changelog v1:
- don't link IMA trusted keyring to user keyring
Changelog:
- define stub integrity_init_keyring() function (reported-by Fengguang Wu)
- differentiate between regular and trusted keyring names.
- replace printk with pr_info (D. Kasatkin)
- only make the IMA keyring a trusted keyring (reported-by D. Kastatkin)
- define stub integrity_init_keyring() definition based on
CONFIG_INTEGRITY_SIGNATURE, not CONFIG_INTEGRITY_ASYMMETRIC_KEYS.
(reported-by Jim Davis)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Acked-by: David Howells <dhowells@redhat.com>
Dot prefixed keyring names are supposed to be reserved for the
kernel, but add_key() calls key_get_type_from_user(), which
incorrectly verifies the 'type' field, not the 'description' field.
This patch verifies the 'description' field isn't dot prefixed,
when creating a new keyring, and removes the dot prefix test in
key_get_type_from_user().
Changelog v6:
- whitespace and other cleanup
Changelog v5:
- Only prevent userspace from creating a dot prefixed keyring, not
regular keys - Dmitry
Reported-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
The asynchronous hash API allows initiating a hash calculation and
then performing other tasks, while waiting for the hash calculation
to complete.
This patch introduces usage of double buffering for simultaneous
hashing and reading of the next chunk of data from storage.
Changes in v3:
- better comments
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Use of multiple-page collect buffers reduces:
1) the number of block IO requests
2) the number of asynchronous hash update requests
Second is important for HW accelerated hashing, because significant
amount of time is spent for preparation of hash update operation,
which includes configuring acceleration HW, DMA engine, etc...
Thus, HW accelerators are more efficient when working on large
chunks of data.
This patch introduces usage of multi-page collect buffers. Buffer size
can be specified using 'ahash_bufsize' module parameter. Default buffer
size is 4096 bytes.
Changes in v3:
- kernel parameter replaced with module parameter
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Async hash API allows the use of HW acceleration for hash calculation.
It may give significant performance gain and/or reduce power consumption,
which might be very beneficial for battery powered devices.
This patch introduces hash calculation using ahash API. ahash performance
depends on the data size and the particular HW. Depending on the specific
system, shash performance may be better.
This patch defines 'ahash_minsize' module parameter, which is used to
define the minimal file size to use with ahash. If this minimum file size
is not set or the file is smaller than defined by the parameter, shash will
be used.
Changes in v3:
- kernel parameter replaced with module parameter
- pr_crit replaced with pr_crit_ratelimited
- more comment changes - Mimi
Changes in v2:
- ima_ahash_size became as ima_ahash
- ahash pre-allocation moved out from __init code to be able to use
ahash crypto modules. Ahash allocated once on the first use.
- hash calculation falls back to shash if ahash allocation/calculation fails
- complex initialization separated from variable declaration
- improved comments
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Replace spaces in op keyword labels in log output since userspace audit tools
can't parse orphaned keywords.
Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
process_measurement() always calls ima_template_desc_current(),
including when an IMA policy has not been defined.
This patch delays template descriptor lookup until action is
determined.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Before 2.6.39 inode->i_readcount was maintained by IMA. It was not atomic
and protected using spinlock. For 2.6.39, i_readcount was converted to
atomic and maintaining was moved VFS layer. Spinlock for some unclear
reason was replaced by i_mutex.
After analyzing the code, we came to conclusion that i_mutex locking is
unnecessary, especially when an IMA policy has not been defined.
This patch removes i_mutex locking from ima_rdwr_violation_check().
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTwvRvAAoJEHm+PkMAQRiG8CoIAJucWkj+MJFFoDXjR9hfI8U7
/WeQLJP0GpWGMXd2KznX9epCuw5rsuaPAxCy1HFDNOa7OtNYacWrsIhByxOIDLwL
YjDB9+fpMMPFWsr+LPJa8Ombh/TveCS77w6Pt5VMZFwvIKujiNK/C3MdxjReH5Gr
iTGm8x7nEs2D6L2+5sQVlhXot/97phxIlBSP6wPXEiaztNZ9/JZi905Xpgq+WU16
ZOA8MlJj1TQD4xcWyUcsQ5REwIOdQ6xxPF00wv/12RFela+Puy4JLAilnV6Mc12U
fwYOZKbUNBS8rjfDDdyX3sljV1L5iFFqKkW3WFdnv/z8ZaZSo5NupWuavDnifKw=
=6Q8o
-----END PGP SIGNATURE-----
Merge tag 'v3.16-rc5' into timers/core
Reason: Bring in upstream modifications, so the pending changes which
depend on them can be queued.
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them. This is quite hairy and error-prone. Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.
cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type. This will allow organizing
cftypes in a lot clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.
In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes. This patch is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The sock_graft() hook has special handling for AF_INET, AF_INET, and
AF_UNIX sockets as those address families have special hooks which
label the sock before it is attached its associated socket.
Unfortunately, the sock_graft() hook was missing a default approach
to labeling sockets which meant that any other address family which
made use of connections or the accept() syscall would find the
returned socket to be in an "unlabeled" state. This was recently
demonstrated by the kcrypto/AF_ALG subsystem and the newly released
cryptsetup package (cryptsetup v1.6.5 and later).
This patch preserves the special handling in selinux_sock_graft(),
but adds a default behavior - setting the sock's label equal to the
associated socket - which resolves the problem with AF_ALG and
presumably any other address family which makes use of accept().
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
When flushing the AVC, such as during a policy load, the various
network caches are also flushed, with each making a call to
synchronize_net() which has shown to be expensive in some cases.
This patch consolidates the network cache flushes into a single AVC
callback which only calls synchronize_net() once for each AVC cache
flush.
Reported-by: Jaejyn Shin <flagon22bass@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
With the introduction of fair queued rwlock, recursive read_lock()
may hang the offending process if there is a write_lock() somewhere
in between.
With recursive read_lock checking enabled, the following error was
reported:
=============================================
[ INFO: possible recursive locking detected ]
3.16.0-rc1 #2 Tainted: G E
---------------------------------------------
load_policy/708 is trying to acquire lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b32a>]
security_genfs_sid+0x3a/0x170
but task is already holding lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b48c>]
security_fs_use+0x2c/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(policy_rwlock);
lock(policy_rwlock);
This patch fixes the occurrence of recursive read_lock() of
policy_rwlock by adding a helper function __security_genfs_sid()
which requires caller to take the lock before calling it. The
security_fs_use() was then modified to call the new helper function.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The cond_read_node() should free the given node on error path as it's
not linked to p->cond_list yet. This is done via cond_node_destroy()
but it's not called when next_entry() fails before the expr loop.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The node->cur_state and len can be read in a single call of next_entry().
And setting len before reading is a dead write so can be eliminated.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(Minor tweak to the length parameter in the call to next_entry())
Signed-off-by: Paul Moore <pmoore@redhat.com>
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __packed
for __attribute__((packed)).
This patch is part of a large task I've taken to clean the gcc
specific attributes and use the the macros instead.
Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
There're some code duplication for reading a string value during
policydb_read(). Add str_read() helper to fix it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
ARRAY_SIZE is more concise to use when the size of an array is divided
by the size of its type or the size of its first element.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull more security layer updates from Serge Hallyn:
"A few more commits had previously failed to make it through
security-next into linux-next but this week made it into linux-next.
At least commit "ima: introduce ima_kernel_read()" was deemed critical
by Mimi to make this merge window.
This is a temporary tree just for this request. Mimi has pointed me
to some previous threads about keeping maintainer trees at the
previous release, which I'll certainly do for anything long-term,
after talking with James"
* 'serge-next-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security:
ima: introduce ima_kernel_read()
evm: prohibit userspace writing 'security.evm' HMAC value
ima: check inode integrity cache in violation check
ima: prevent unnecessary policy checking
evm: provide option to protect additional SMACK xattrs
evm: replace HMAC version with attribute mask
ima: prevent new digsig xattr from being replaced
Calculating the 'security.evm' HMAC value requires access to the
EVM encrypted key. Only the kernel should have access to it. This
patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
from setting/modifying the 'security.evm' HMAC value directly.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
When IMA did not support ima-appraisal, existance of the S_IMA flag
clearly indicated that the file was measured. With IMA appraisal S_IMA
flag indicates that file was measured and/or appraised. Because of
this, when measurement is not enabled by the policy, violations are
still reported.
To differentiate between measurement and appraisal policies this
patch checks the inode integrity cache flags. The IMA_MEASURED
flag indicates whether the file was actually measured, while the
IMA_MEASURE flag indicates whether the file should be measured.
Unfortunately, the IMA_MEASURED flag is reset to indicate the file
needs to be re-measured. Thus, this patch checks the IMA_MEASURE
flag.
This patch limits the false positive violation reports, but does
not fix it entirely. The IMA_MEASURE/IMA_MEASURED flags are
indications that, at some point in time, the file opened for read
was in policy, but might not be in policy now (eg. different uid).
Other changes would be needed to further limit false positive
violation reports.
Changelog:
- expanded patch description based on conversation with Roberto (Mimi)
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
ima_rdwr_violation_check is called for every file openning.
The function checks the policy even when violation condition
is not met. It causes unnecessary policy checking.
This patch does policy checking only if violation condition is met.
Changelog:
- check writecount is greater than zero (Mimi)
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Newer versions of SMACK introduced following security xattrs:
SMACK64EXEC, SMACK64TRANSMUTE and SMACK64MMAP.
To protect these xattrs, this patch includes them in the HMAC
calculation. However, for backwards compatibility with existing
labeled filesystems, including these xattrs needs to be
configurable.
Changelog:
- Add SMACK dependency on new option (Mimi)
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Using HMAC version limits the posibility to arbitrarily add new
attributes such as SMACK64EXEC to the hmac calculation.
This patch replaces hmac version with attribute mask.
Desired attributes can be enabled with configuration parameter.
It allows to build kernels which works with previously labeled
filesystems.
Currently supported attribute is 'fsuuid' which is equivalent of
the former version 2.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Even though a new xattr will only be appraised on the next access,
set the DIGSIG flag to prevent a signature from being replaced with
a hash on file close.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
There is no point in calling gettimeofday if only the seconds part of
the timespec is used. Use get_seconds() instead. It's not only the
proper interface it's also faster.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: linux-security-module@vger.kernel.org
Link: http://lkml.kernel.org/r/20140611234607.775273584@linutronix.de
Pull security layer updates from Serge Hallyn:
"This is a merge of James Morris' security-next tree from 3.14 to
yesterday's master, plus four patches from Paul Moore which are in
linux-next, plus one patch from Mimi"
* 'serge-next-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security:
ima: audit log files opened with O_DIRECT flag
selinux: conditionally reschedule in hashtab_insert while loading selinux policy
selinux: conditionally reschedule in mls_convert_context while loading selinux policy
selinux: reject setexeccon() on MNT_NOSUID applications with -EACCES
selinux: Report permissive mode in avc: denied messages.
Warning in scanf string typing
Smack: Label cgroup files for systemd
Smack: Verify read access on file open - v3
security: Convert use of typedef ctl_table to struct ctl_table
Smack: bidirectional UDS connect check
Smack: Correctly remove SMACK64TRANSMUTE attribute
SMACK: Fix handling value==NULL in post setxattr
bugfix patch for SMACK
Smack: adds smackfs/ptrace interface
Smack: unify all ptrace accesses in the smack
Smack: fix the subject/object order in smack_ptrace_traceme()
Minor improvement of 'smack_sb_kern_mount'
smack: fix key permission verification
KEYS: Move the flags representing required permission to linux/key.h
Pull cgroup updates from Tejun Heo:
"A lot of activities on cgroup side. Heavy restructuring including
locking simplification took place to improve the code base and enable
implementation of the unified hierarchy, which currently exists behind
a __DEVEL__ mount option. The core support is mostly complete but
individual controllers need further work. To explain the design and
rationales of the the unified hierarchy
Documentation/cgroups/unified-hierarchy.txt
is added.
Another notable change is css (cgroup_subsys_state - what each
controller uses to identify and interact with a cgroup) iteration
update. This is part of continuing updates on css object lifetime and
visibility. cgroup started with reference count draining on removal
way back and is now reaching a point where csses behave and are
iterated like normal refcnted objects albeit with some complexities to
allow distinguishing the state where they're being deleted. The css
iteration update isn't taken advantage of yet but is planned to be
used to simplify memcg significantly"
* 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits)
cgroup: disallow disabled controllers on the default hierarchy
cgroup: don't destroy the default root
cgroup: disallow debug controller on the default hierarchy
cgroup: clean up MAINTAINERS entries
cgroup: implement css_tryget()
device_cgroup: use css_has_online_children() instead of has_children()
cgroup: convert cgroup_has_live_children() into css_has_online_children()
cgroup: use CSS_ONLINE instead of CGRP_DEAD
cgroup: iterate cgroup_subsys_states directly
cgroup: introduce CSS_RELEASED and reduce css iteration fallback window
cgroup: move cgroup->serial_nr into cgroup_subsys_state
cgroup: link all cgroup_subsys_states in their sibling lists
cgroup: move cgroup->sibling and ->children into cgroup_subsys_state
cgroup: remove cgroup->parent
device_cgroup: remove direct access to cgroup->children
memcg: update memcg_has_children() to use css_next_child()
memcg: remove tasks/children test from mem_cgroup_force_empty()
cgroup: remove css_parent()
cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css
cgroup: use cgroup->self.refcnt for cgroup refcnting
...
Files are measured or appraised based on the IMA policy. When a
file, in policy, is opened with the O_DIRECT flag, a deadlock
occurs.
The first attempt at resolving this lockdep temporarily removed the
O_DIRECT flag and restored it, after calculating the hash. The
second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
time. The third attempt, by Dmitry Kasatkin, resolves the i_mutex
locking issue, by re-introducing the IMA mutex, but uncovered
another problem. Reading a file with O_DIRECT flag set, writes
directly to userspace pages. A second patch allocates a user-space
like memory. This works for all IMA hooks, except ima_file_free(),
which is called on __fput() to recalculate the file hash.
Until this last issue is addressed, do not 'collect' the
measurement for measuring, appraising, or auditing files opened
with the O_DIRECT flag set. Based on policy, permit or deny file
access. This patch defines a new IMA policy rule option named
'permit_directio'. Policy rules could be defined, based on LSM
or other criteria, to permit specific applications to open files
with the O_DIRECT flag set.
Changelog v1:
- permit or deny file access based IMA policy rules
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Cc: <stable@vger.kernel.org>
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
We presently prevent processes from using setexecon() to set the
security label of exec()'d processes when NO_NEW_PRIVS is enabled by
returning an error; however, we silently ignore setexeccon() when
exec()'ing from a nosuid mounted filesystem. This patch makes things
a bit more consistent by returning an error in the setexeccon()/nosuid
case.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
We cannot presently tell from an avc: denied message whether access was in
fact denied or was allowed due to global or per-domain permissive mode.
Add a permissive= field to the avc message to reflect this information.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
devcgroup_update_access() wants to know whether there are child
cgroups which are online and visible to userland and has_children()
may return false positive. Replace it with css_has_online_children().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Currently, devcg::has_children() directly tests cgroup->children for
list emptiness. The field is not a published field and scheduled to
go away. In addition, the test isn't strictly correct as devcg should
only care about children which are visible to userland.
This patch converts has_children() to use css_next_child() instead.
The subtle incorrectness is noted and will be dealt with later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
cgroup in general is moving towards using cgroup_subsys_state as the
fundamental structural component and css_parent() was introduced to
convert from using cgroup->parent to css->parent. It was quite some
time ago and we're moving forward with making css more prominent.
This patch drops the trivial wrapper css_parent() and let the users
dereference css->parent. While at it, explicitly mark fields of css
which are public and immutable.
v2: New usage from device_cgroup.c converted.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Johannes Weiner <hannes@cmpxchg.org>
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
We presently prevent processes from using setexecon() to set the
security label of exec()'d processes when NO_NEW_PRIVS is enabled by
returning an error; however, we silently ignore setexeccon() when
exec()'ing from a nosuid mounted filesystem. This patch makes things
a bit more consistent by returning an error in the setexeccon()/nosuid
case.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Convert all cftype->write_string() users to the new cftype->write()
which maps directly to kernfs write operation and has full access to
kernfs and cgroup contexts. The conversions are mostly mechanical.
* @css and @cft are accessed using of_css() and of_cft() accessors
respectively instead of being specified as arguments.
* Should return @nbytes on success instead of 0.
* @buf is not trimmed automatically. Trim if necessary. Note that
blkcg and netprio don't need this as the parsers already handle
whitespaces.
cftype->write_string() has no user left after the conversions and
removed.
While at it, remove unnecessary local variable @p in
cgroup_subtree_control_write() and stale comment about
CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c.
This patch doesn't introduce any visible behavior changes.
v2: netprio was missing from conversion. Converted.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Pull cgroup fixes from Tejun Heo:
"During recent restructuring, device_cgroup unified config input check
and enforcement logic; unfortunately, it turned out to share too much.
Aristeu's patches fix the breakage and marked for -stable backport.
The other two patches are fallouts from kernfs conversion. The blkcg
change is temporary and will go away once kernfs internal locking gets
simplified (patches pending)"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats()
device_cgroup: check if exception removal is allowed
device_cgroup: fix the comment format for recently added functions
device_cgroup: rework device access check and exception checking
cgroup: fix the retry path of cgroup_mount()
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs fixes from Al Viro:
"dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl
bugfix from hch"
The dcache fixes are for a subtle LRU list corruption bug reported by
Miklos Szeredi, where people inside IBM saw list corruptions with the
LTP/host01 test.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nick kvfree() from apparmor
posix_acl: handle NULL ACL in posix_acl_equiv_mode
dcache: don't need rcu in shrink_dentry_list()
more graceful recovery in umount_collect()
don't remove from shrink list in select_collect()
dentry_kill(): don't try to remove from shrink list
expand the call of dentry_lru_del() in dentry_kill()
new helper: dentry_free()
fold try_prune_one_dentry()
fold d_kill() and d_free()
fix races between __d_instantiate() and checks of dentry flags
[PATCH v3 1/2] device_cgroup: check if exception removal is allowed
When the device cgroup hierarchy was introduced in
bd2953ebbb - devcg: propagate local changes down the hierarchy
a specific case was overlooked. Consider the hierarchy bellow:
A default policy: ALLOW, exceptions will deny access
\
B default policy: ALLOW, exceptions will deny access
There's no need to verify when an new exception is added to B because
in this case exceptions will deny access to further devices, which is
always fine. Hierarchy in device cgroup only makes sure B won't have
more access than A.
But when an exception is removed (by writing devices.allow), it isn't
checked if the user is in fact removing an inherited exception from A,
thus giving more access to B.
Example:
# echo 'a' >A/devices.allow
# echo 'c 1:3 rw' >A/devices.deny
# echo $$ >A/B/tasks
# echo >/dev/null
-bash: /dev/null: Operation not permitted
# echo 'c 1:3 w' >A/B/devices.allow
# echo >/dev/null
#
This shouldn't be allowed and this patch fixes it by making sure to never allow
exceptions in this case to be removed if the exception is partially or fully
present on the parent.
v3: missing '*' in function description
v2: improved log message and formatting fixes
Cc: cgroups@vger.kernel.org
Cc: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Moving more extensive explanations to the end of the comment.
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We cannot presently tell from an avc: denied message whether access was in
fact denied or was allowed due to global or per-domain permissive mode.
Add a permissive= field to the avc message to reflect this information.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The cgroup filesystem isn't ready for an LSM to
properly use extented attributes. This patch makes
files created in the cgroup filesystem usable by
a system running Smack and systemd.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Smack believes that many of the operatons that can
be performed on an open file descriptor are read operations.
The fstat and lseek system calls are examples.
An implication of this is that files shouldn't be open
if the task doesn't have read access even if it has
write access and the file is being opened write only.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Register a netlink per-protocol bind fuction for audit to check userspace
process capabilities before allowing a multicast group connection.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.
...and I can't even disagree. The names and command macros do suck.
We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".
The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.
This patch makes the following changes that I think are necessary before
v3.15 ships:
1) rename the command macros to their new names. These end up in the uapi
headers and so are part of the external-facing API. It turns out that
glibc doesn't actually use the fcntl.h uapi header, but it's hard to
be sure that something else won't. Changing it now is safest.
2) make the the /proc/locks output display these as type "OFDLCK"
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Whenever a device file is opened and checked against current device
cgroup rules, it uses the same function (may_access()) as when a new
exception rule is added by writing devices.{allow,deny}. And in both
cases, the algorithm is the same, doesn't matter the behavior.
First problem is having device access to be considered the same as rule
checking. Consider the following structure:
A (default behavior: allow, exceptions disallow access)
\
B (default behavior: allow, exceptions disallow access)
A new exception is added to B by writing devices.deny:
c 12:34 rw
When checking if that exception is allowed in may_access():
if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
if (behavior == DEVCG_DEFAULT_ALLOW) {
/* the exception will deny access to certain devices */
return true;
Which is ok, since B is not getting more privileges than A, it doesn't
matter and the rule is accepted
Now, consider it's a device file open check and the process belongs to
cgroup B. The access will be generated as:
behavior: allow
exception: c 12:34 rw
The very same chunk of code will allow it, even if there's an explicit
exception telling to do otherwise.
A simple test case:
# mkdir new_group
# cd new_group
# echo $$ >tasks
# echo "c 1:3 w" >devices.deny
# echo >/dev/null
# echo $?
0
This is a serious bug and was introduced on
c39a2a3018 devcg: prepare may_access() for hierarchy support
To solve this problem, the device file open function was split from the
new exception check.
Second problem is how exceptions are processed by may_access(). The
first part of the said function tries to match fully with an existing
exception:
list_for_each_entry_rcu(ex, &dev_cgroup->exceptions, list) {
if ((refex->type & DEV_BLOCK) && !(ex->type & DEV_BLOCK))
continue;
if ((refex->type & DEV_CHAR) && !(ex->type & DEV_CHAR))
continue;
if (ex->major != ~0 && ex->major != refex->major)
continue;
if (ex->minor != ~0 && ex->minor != refex->minor)
continue;
if (refex->access & (~ex->access))
continue;
match = true;
break;
}
That means the new exception should be contained into an existing one to
be considered a match:
New exception Existing match? notes
b 12:34 rwm b 12:34 rwm yes
b 12:34 r b *:34 rw yes
b 12:34 rw b 12:34 w no extra "r"
b *:34 rw b 12:34 rw no too broad "*"
b *:34 rw b *:34 rwm yes
Which is fine in some cases. Consider:
A (default behavior: deny, exceptions allow access)
\
B (default behavior: deny, exceptions allow access)
In this case the full match makes sense, the new exception cannot add
more access than the parent allows
But this doesn't always work, consider:
A (default behavior: allow, exceptions disallow access)
\
B (default behavior: deny, exceptions allow access)
In this case, a new exception in B shouldn't match any of the exceptions
in A, after all you can't allow something that was forbidden by A. But
consider this scenario:
New exception Existing in A match? outcome
b 12:34 rw b 12:34 r no exception is accepted
Because the new exception has "w" as extra, it doesn't match, so it'll
be added to B's exception list.
The same problem can happen during a file access check. Consider a
cgroup with allow as default behavior:
Access Exception match?
b 12:34 rw b 12:34 r no
In this case, the access didn't match any of the exceptions in the
cgroup, which is required since exceptions will disallow access.
To solve this problem, two new functions were created to match an
exception either fully or partially. In the example above, a partial
check will be performed and it'll produce a match since at least
"b 12:34 r" from "b 12:34 rw" access matches.
Cc: cgroups@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
Smack IPC policy requires that the sender have write access
to the receiver. UDS streams don't do per-packet checks. The
only check is done at connect time. The existing code checks
if the connecting process can write to the other, but not the
other way around. This change adds a check that the other end
can write to the connecting process.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schuafler <casey@schaufler-ca.com>
Sam Henderson points out that removing the SMACK64TRANSMUTE
attribute from a directory does not result in the directory
transmuting. This is because the inode flag indicating that
the directory is transmuting isn't cleared. The fix is a tad
less than trivial because smk_task and smk_mmap should have
been broken out, too.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The function `smack_inode_post_setxattr` is called each
time that a setxattr is done, for any value of name.
The kernel allow to put value==NULL when size==0
to set an empty attribute value. The systematic
call to smk_import_entry was causing the dereference
of a NULL pointer hence a KERNEL PANIC!
The problem can be produced easily by issuing the
command `setfattr -n user.data file` under bash prompt
when SMACK is active.
Moving the call to smk_import_entry as proposed by this
patch is correcting the behaviour because the function
smack_inode_post_setxattr is called for the SMACK's
attributes only if the function smack_inode_setxattr validated
the value and its size (what will not be the case when size==0).
It also has a benefical effect to not fill the smack hash
with garbage values coming from any extended attribute
write.
Change-Id: Iaf0039c2be9bccb6cee11c24a3b44d209101fe47
Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>
1. In order to remove any SMACK extended attribute from a file, a user
should have CAP_MAC_ADMIN capability. But user without having this
capability is able to remove SMACK64MMAP security attribute.
2. While validating size and value of smack extended attribute in
smack_inode_setsecurity hook, wrong error code is returned.
Signed-off-by: Pankaj Kumar <pamkaj.k2@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
This allows to limit ptrace beyond the regular smack access rules.
It adds a smackfs/ptrace interface that allows smack to be configured
to require equal smack labels for PTRACE_MODE_ATTACH access.
See the changes in Documentation/security/Smack.txt below for details.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
The decision whether we can trace a process is made in the following
functions:
smack_ptrace_traceme()
smack_ptrace_access_check()
smack_bprm_set_creds() (in case the proces is traced)
This patch unifies all those decisions by introducing one function that
checks whether ptrace is allowed: smk_ptrace_rule_check().
This makes possible to actually trace with TRACEME where first the
TRACEME itself must be allowed and then exec() on a traced process.
Additional bugs fixed:
- The decision is made according to the mode parameter that is now correctly
translated from PTRACE_MODE_* to MAY_* instead of being treated 1:1.
PTRACE_MODE_READ requires MAY_READ.
PTRACE_MODE_ATTACH requires MAY_READWRITE.
- Add a smack audit log in case of exec() refused by bprm_set_creds().
- Honor the PTRACE_MODE_NOAUDIT flag and don't put smack audit info
in case this flag is set.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
The order of subject/object is currently reversed in
smack_ptrace_traceme(). It is currently checked if the tracee has a
capability to trace tracer and according to this rule a decision is made
whether the tracer will be allowed to trace tracee.
Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com>
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Pull file locking updates from Jeff Layton:
"Highlights:
- maintainership change for fs/locks.c. Willy's not interested in
maintaining it these days, and is OK with Bruce and I taking it.
- fix for open vs setlease race that Al ID'ed
- cleanup and consolidation of file locking code
- eliminate unneeded BUG() call
- merge of file-private lock implementation"
* 'locks-3.15' of git://git.samba.org/jlayton/linux:
locks: make locks_mandatory_area check for file-private locks
locks: fix locks_mandatory_locked to respect file-private locks
locks: require that flock->l_pid be set to 0 for file-private locks
locks: add new fcntl cmd values for handling file private locks
locks: skip deadlock detection on FL_FILE_PVT locks
locks: pass the cmd value to fcntl_getlk/getlk64
locks: report l_pid as -1 for FL_FILE_PVT locks
locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
locks: rename locks_remove_flock to locks_remove_file
locks: consolidate checks for compatible filp->f_mode values in setlk handlers
locks: fix posix lock range overflow handling
locks: eliminate BUG() call when there's an unexpected lock on file close
locks: add __acquires and __releases annotations to locks_start and locks_stop
locks: remove "inline" qualifier from fl_link manipulation functions
locks: clean up comment typo
locks: close potential race between setlease and open
MAINTAINERS: update entry for fs/locks.c
Pull renameat2 system call from Miklos Szeredi:
"This adds a new syscall, renameat2(), which is the same as renameat()
but with a flags argument.
The purpose of extending rename is to add cross-rename, a symmetric
variant of rename, which exchanges the two files. This allows
interesting things, which were not possible before, for example
atomically replacing a directory tree with a symlink, etc... This
also allows overlayfs and friends to operate on whiteouts atomically.
Andy Lutomirski also suggested a "noreplace" flag, which disables the
overwriting behavior of rename.
These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
implemented for ext4 as an example and for testing"
* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ext4: add cross rename support
ext4: rename: split out helper functions
ext4: rename: move EMLINK check up
ext4: rename: create ext4_renament structure for local vars
vfs: add cross-rename
vfs: lock_two_nondirectories: allow directory args
security: add flags to rename hooks
vfs: add RENAME_NOREPLACE flag
vfs: add renameat2 syscall
vfs: rename: use common code for dir and non-dir
vfs: rename: move d_move() up
vfs: add d_is_dir()
Pull cgroup updates from Tejun Heo:
"A lot updates for cgroup:
- The biggest one is cgroup's conversion to kernfs. cgroup took
after the long abandoned vfs-entangled sysfs implementation and
made it even more convoluted over time. cgroup's internal objects
were fused with vfs objects which also brought in vfs locking and
object lifetime rules. Naturally, there are places where vfs rules
don't fit and nasty hacks, such as credential switching or lock
dance interleaving inode mutex and cgroup_mutex with object serial
number comparison thrown in to decide whether the operation is
actually necessary, needed to be employed.
After conversion to kernfs, internal object lifetime and locking
rules are mostly isolated from vfs interactions allowing shedding
of several nasty hacks and overall simplification. This will also
allow implmentation of operations which may affect multiple cgroups
which weren't possible before as it would have required nesting
i_mutexes.
- Various simplifications including dropping of module support,
easier cgroup name/path handling, simplified cgroup file type
handling and task_cg_lists optimization.
- Prepatory changes for the planned unified hierarchy, which is still
a patchset away from being actually operational. The dummy
hierarchy is updated to serve as the default unified hierarchy.
Controllers which aren't claimed by other hierarchies are
associated with it, which BTW was what the dummy hierarchy was for
anyway.
- Various fixes from Li and others. This pull request includes some
patches to add missing slab.h to various subsystems. This was
triggered xattr.h include removal from cgroup.h. cgroup.h
indirectly got included a lot of files which brought in xattr.h
which brought in slab.h.
There are several merge commits - one to pull in kernfs updates
necessary for converting cgroup (already in upstream through
driver-core), others for interfering changes in the fixes branch"
* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
cgroup: remove useless argument from cgroup_exit()
cgroup: fix spurious lockdep warning in cgroup_exit()
cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
cgroup: break kernfs active_ref protection in cgroup directory operations
cgroup: fix cgroup_taskset walking order
cgroup: implement CFTYPE_ONLY_ON_DFL
cgroup: make cgrp_dfl_root mountable
cgroup: drop const from @buffer of cftype->write_string()
cgroup: rename cgroup_dummy_root and related names
cgroup: move ->subsys_mask from cgroupfs_root to cgroup
cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
cgroup: reorganize cgroup bootstrapping
cgroup: relocate setting of CGRP_DEAD
cpuset: use rcu_read_lock() to protect task_cs()
cgroup_freezer: document freezer_fork() subtleties
cgroup: update cgroup_transfer_tasks() to either succeed or fail
cgroup: drop task_lock() protection around task->cgroups
cgroup: update how a newly forked task gets associated with css_set
...
Pull security subsystem updates from James Morris:
"Apart from reordering the SELinux mmap code to ensure DAC is called
before MAC, these are minor maintenance updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: correctly label /proc inodes in use before the policy is loaded
selinux: put the mmap() DAC controls before the MAC controls
selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
evm: enable key retention service automatically
ima: skip memory allocation for empty files
evm: EVM does not use MD5
ima: return d_name.name if d_path fails
integrity: fix checkpatch errors
ima: fix erroneous removal of security.ima xattr
security: integrity: Use a more current logging style
MAINTAINERS: email updates and other misc. changes
ima: reduce memory usage when a template containing the n field is used
ima: restore the original behavior for sending data with ima template
Integrity: Pass commname via get_task_comm()
fs: move i_readcount
ima: use static const char array definitions
security: have cap_dentry_init_security return error
ima: new helper: file_inode(file)
kernel: Mark function as static in kernel/seccomp.c
capability: Use current logging styles
...
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
If flags contain RENAME_EXCHANGE then exchange source and destination files.
There's no restriction on the type of the files; e.g. a directory can be
exchanged with a symlink.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Add flags to security_path_rename() and security_inode_rename() hooks.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Pull s390 compat wrapper rework from Heiko Carstens:
"S390 compat system call wrapper simplification work.
The intention of this work is to get rid of all hand written assembly
compat system call wrappers on s390, which perform proper sign or zero
extension, or pointer conversion of compat system call parameters.
Instead all of this should be done with C code eg by using Al's
COMPAT_SYSCALL_DEFINEx() macro.
Therefore all common code and s390 specific compat system calls have
been converted to the COMPAT_SYSCALL_DEFINEx() macro.
In order to generate correct code all compat system calls may only
have eg compat_ulong_t parameters, but no unsigned long parameters.
Those patches which change parameter types from unsigned long to
compat_ulong_t parameters are separate in this series, but shouldn't
cause any harm.
The only compat system calls which intentionally have 64 bit
parameters (preadv64 and pwritev64) in support of the x86/32 ABI
haven't been changed, but are now only available if an architecture
defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.
System calls which do not have a compat variant but still need proper
zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
get a proper wrapper function with the new s390 specific
COMPAT_SYSCALL_WRAPx() macro:
COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);
which generates the following code (simplified):
asmlinkage long sys_brk(unsigned long brk);
asmlinkage long compat_sys_brk(long brk)
{
return sys_brk((u32)brk);
}
Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
includes both linux/syscall.h and linux/compat.h, it will generate
build errors, if the declaration of sys_brk() doesn't match, or if
there exists a non-matching compat_sys_brk() declaration.
In addition this will intentionally result in a link error if
somewhere else a compat_sys_brk() function exists, which probably
should have been used instead. Two more BUILD_BUG_ONs make sure the
size and type of each compat syscall parameter can be handled
correctly with the s390 specific macros.
I converted the compat system calls step by step to verify the
generated code is correct and matches the previous code. In fact it
did not always match, however that was always a bug in the hand
written asm code.
In result we get less code, less bugs, and much more sanity checking"
* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
s390/compat: add copyright statement
compat: include linux/unistd.h within linux/compat.h
s390/compat: get rid of compat wrapper assembly code
s390/compat: build error for large compat syscall args
mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
ipc/compat: convert to COMPAT_SYSCALL_DEFINE
fs/compat: convert to COMPAT_SYSCALL_DEFINE
security/compat: convert to COMPAT_SYSCALL_DEFINE
mm/compat: convert to COMPAT_SYSCALL_DEFINE
net/compat: convert to COMPAT_SYSCALL_DEFINE
kernel/compat: convert to COMPAT_SYSCALL_DEFINE
fs/compat: optional preadv64/pwrite64 compat system calls
ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
s390/compat: partial parameter conversion within syscall wrappers
s390/compat: automatic zero, sign and pointer conversion of syscalls
s390/compat: add sync_file_range and fallocate compat syscalls
...
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.
This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.
Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.
This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.
This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.
These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.
The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.
Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Conflicts:
Documentation/devicetree/bindings/net/micrel-ks8851.txt
net/core/netpoll.c
The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.
In micrel-ks8851.txt we simply have overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Store and log all PIDs with reference to the initial PID namespace and
use the access functions task_pid_nr() and task_tgid_nr() for task->pid
and task->tgid.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
(informed by ebiederman's c776b5d2)
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
This patch is based on an earlier patch by Eric Paris, he describes
the problem below:
"If an inode is accessed before policy load it will get placed on a
list of inodes to be initialized after policy load. After policy
load we call inode_doinit() which calls inode_doinit_with_dentry()
on all inodes accessed before policy load. In the case of inodes
in procfs that means we'll end up at the bottom where it does:
/* Default to the fs superblock SID. */
isec->sid = sbsec->sid;
if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
if (opt_dentry) {
isec->sclass = inode_mode_to_security_class(...)
rc = selinux_proc_get_sid(opt_dentry,
isec->sclass,
&sid);
if (rc)
goto out_unlock;
isec->sid = sid;
}
}
Since opt_dentry is null, we'll never call selinux_proc_get_sid()
and will leave the inode labeled with the label on the superblock.
I believe a fix would be to mimic the behavior of xattrs. Look
for an alias of the inode. If it can't be found, just leave the
inode uninitialized (and pick it up later) if it can be found, we
should be able to call selinux_proc_get_sid() ..."
On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
However, with this patch in place we see the expected result:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
It turns out that doing the SELinux MAC checks for mmap() before the
DAC checks was causing users and the SELinux policy folks headaches
as users were seeing a lot of SELinux AVC denials for the
memprotect:mmap_zero permission that would have also been denied by
the normal DAC capability checks (CAP_SYS_RAWIO).
Example:
# cat mmap_test.c
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
int rc;
void *mem;
mem = mmap(0x0, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (mem == MAP_FAILED)
return errno;
printf("mem = %p\n", mem);
munmap(mem, 4096);
return 0;
}
# gcc -g -O0 -o mmap_test mmap_test.c
# ./mmap_test
mem = (nil)
# ausearch -m AVC | grep mmap_zero
type=AVC msg=audit(...): avc: denied { mmap_zero }
for pid=1025 comm="mmap_test"
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tclass=memprotect
This patch corrects things so that when the above example is run by a
user without CAP_SYS_RAWIO the SELinux AVC is no longer generated as
the DAC capability check fails before the SELinux permission check.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add const restriction on the buffer. The
only thing const achieves is unnecessarily complicating parsing of the
buffer. Drop const from @buffer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Steffen Klassert says:
====================
1) Fix a sleep in atomic when pfkey_sadb2xfrm_user_sec_ctx()
is called from pfkey_compile_policy().
Fix from Nikolay Aleksandrov.
2) security_xfrm_policy_alloc() can be called in process and atomic
context. Add an argument to let the callers choose the appropriate
way. Fix from Nikolay Aleksandrov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.c
Both the r8152 and netback conflicts were simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
For any keyring access type SMACK always used MAY_READWRITE access check.
It prevents reading the key with label "_", which should be allowed for anyone.
This patch changes default access check to MAY_READ and use MAY_READWRITE in only
appropriate cases.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Move the flags representing required permission to linux/key.h as the perm
parameter of security_key_permission() is in terms of them - and not the
permissions mask flags used in key->perm.
Whilst we're at it:
(1) Rename them to be KEY_NEED_xxx rather than KEY_xxx to avoid collisions
with symbols in uapi/linux/input.h.
(2) Don't use key_perm_t for a mask of required permissions, but rather limit
it to the permissions mask attached to the key and arguments related
directly to that.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
security_xfrm_policy_alloc can be called in atomic context so the
allocation should be done with GFP_ATOMIC. Add an argument to let the
callers choose the appropriate way. In order to do so a gfp argument
needs to be added to the method xfrm_policy_alloc_security in struct
security_operations and to the internal function
selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
callers and leave GFP_KERNEL as before for the rest.
The path that needed the gfp argument addition is:
security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)
Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
add it to security_context_to_sid which is used inside and prior to this
patch did only GFP_KERNEL allocation. So add gfp argument to
security_context_to_sid and adjust all of its callers as well.
CC: Paul Moore <paul@paul-moore.com>
CC: Dave Jones <davej@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Fan Du <fan.du@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: LSM list <linux-security-module@vger.kernel.org>
CC: SELinux list <selinux@tycho.nsa.gov>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This fixes CVE-2014-0102.
The following command sequence produces an oops:
keyctl new_session
i=`keyctl newring _ses @s`
keyctl link @s $i
The problem is that search_nested_keyrings() sees two keyrings that have
matching type and description, so keyring_compare_object() returns true.
s_n_k() then passes the key to the iterator function -
keyring_detect_cycle_iterator() - which *should* check to see whether this is
the keyring of interest, not just one with the same name.
Because assoc_array_find() will return one and only one match, I assumed that
the iterator function would only see an exact match or never be called - but
the iterator isn't only called from assoc_array_find()...
The oops looks something like this:
kernel BUG at /data/fs/linux-2.6-fscache/security/keys/keyring.c:1003!
invalid opcode: 0000 [#1] SMP
...
RIP: keyring_detect_cycle_iterator+0xe/0x1f
...
Call Trace:
search_nested_keyrings+0x76/0x2aa
__key_link_check_live_key+0x50/0x5f
key_link+0x4e/0x85
keyctl_keyring_link+0x60/0x81
SyS_keyctl+0x65/0xe4
tracesys+0xdd/0xe2
The fix is to make keyring_detect_cycle_iterator() check that the key it
has is the key it was actually looking for rather than calling BUG_ON().
A testcase has been included in the keyutils testsuite for this:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=891f3365d07f1996778ade0e3428f01878a1790b
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If keys are not enabled, EVM is not visible in the configuration menu.
It may be difficult to figure out what to do unless you really know.
Other subsystems as NFS, CIFS select keys automatically. This patch does
the same.
This patch also removes '(TRUSTED_KEYS=y || TRUSTED_KEYS=n)' dependency,
which is unnecessary. EVM does not depend on trusted keys, but on
encrypted keys. evm.h provides compile time dependency.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Memory allocation is unnecessary for empty files.
This patch calculates the hash without memory allocation.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
EVM does not use MD5 HMAC. Selection of CRYPTO_MD5 can be safely removed.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This is a small refactoring so ima_d_path() returns dentry name
if path reconstruction fails. It simplifies callers actions
and removes code duplication.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Between checkpatch changes (eg. sizeof) and inconsistencies between
Lindent and checkpatch, unfixed checkpatch errors make it difficult
to see new errors. This patch fixes them. Some lines with over 80 chars
remained unchanged to improve code readability.
The "extern" keyword is removed from internal evm.h to make it consistent
with internal ima.h.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
ima_inode_post_setattr() calls ima_must_appraise() to check if the
file needs to be appraised. If it does not then it removes security.ima
xattr. With original policy matching code it might happen that even
file needs to be appraised with FILE_CHECK hook, it might not be
for POST_SETATTR hook. 'security.ima' might be erronously removed.
This patch treats POST_SETATTR as special wildcard function and will
cause ima_must_appraise() to be true if any of the hooks rules matches.
security.ima will not be removed if any of the hooks would require
appraisal.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS3IyXAAoJEHm+PkMAQRiGplAH/ilCikBrCHyZ2938NHNLm+j1
yhfYnEJHLNg7T69KEj3p0cNagO3v9RPWM6UYFBQ6uFIYNN1MBKO7U+mCZuMWzeO8
+tGMV3mn5wx+oYn1RnWCCweQx5AESEl6rYn8udPDKh7LfW5fCLV60jguUjVSQ9IQ
cvtKlWknbiHyM7t1GoYgzN7jlPrRQvcNQZ+Aogzz7uSnJAgwINglBAHS7WP2tiEM
HAU2FoE4b3MbfGaid1vypaYQPBbFebx7Bw2WxAuZhkBbRiUBKlgF0/SYhOTvH38a
Sjpj1EHKfjcuCBt9tht6KP6H56R25vNloGR2+FB+fuQBdujd/SPa9xDflEMcdG4=
=iXnG
-----END PGP SIGNATURE-----
Merge tag 'v3.13' into for-3.15
Linux 3.13
Conflicts:
include/net/xfrm.h
Simple merge where v3.13 removed 'extern' from definitions and the audit
tree did s/u32/unsigned int/ to the same definitions.
Before this change, to correctly calculate the template digest for the
'ima' template, the event name field (id: 'n') length was set to the fixed
size of 256 bytes.
This patch reduces the length of the event name field to the string
length incremented of one (to make room for the termination character '\0')
and handles the specific case of the digest calculation for the 'ima'
template directly in ima_calc_field_array_hash_tfm().
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
With the new template mechanism introduced in IMA since kernel 3.13,
the format of data sent through the binary_runtime_measurements interface
is slightly changed. Now, for a generic measurement, the format of
template data (after the template name) is:
template_len | field1_len | field1 | ... | fieldN_len | fieldN
In addition, fields containing a string now include the '\0' termination
character.
Instead, the format for the 'ima' template should be:
SHA1 digest | event name length | event name
It must be noted that while in the IMA 3.13 code 'event name length' is
'IMA_EVENT_NAME_LEN_MAX + 1' (256 bytes), so that the template digest
is calculated correctly, and 'event name' contains '\0', in the pre 3.13
code 'event name length' is exactly the string length and 'event name'
does not contain the termination character.
The patch restores the behavior of the IMA code pre 3.13 for the 'ima'
template so that legacy userspace tools obtain a consistent behavior
when receiving data from the binary_runtime_measurements interface
regardless of which kernel version is used.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: <stable@vger.kernel.org> # 3.3.13: 3ce1217 ima: define template fields library
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
When we pass task->comm to audit_log_untrustedstring(), we need to pass it
via get_task_comm() because task->comm can be changed to contain untrusted
string by other threads after audit_log_untrustedstring() confirmed that
task->comm does not contain untrusted string.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
A const char pointer allocates memory for a pointer as well as for
a string, This patch replaces a number of the const char pointers
throughout IMA, with a static const char array.
Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Currently, cap_dentry_init_security returns 0 without actually
initializing the security label. This confuses its only caller
(nfs4_label_init_security) which expects an error in that situation, and
causes it to end up sending out junk onto the wire instead of simply
suppressing the label in the attributes sent.
When CONFIG_SECURITY is disabled, security_dentry_init_security returns
-EOPNOTSUPP. Have cap_dentry_init_security do the same.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Convert all compat system call functions where all parameter types
have a size of four or less than four bytes, or are pointer types
to COMPAT_SYSCALL_DEFINE.
The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
zero and sign extension to 64 bit of all parameters if needed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is based on an earlier patch by Eric Paris, he describes
the problem below:
"If an inode is accessed before policy load it will get placed on a
list of inodes to be initialized after policy load. After policy
load we call inode_doinit() which calls inode_doinit_with_dentry()
on all inodes accessed before policy load. In the case of inodes
in procfs that means we'll end up at the bottom where it does:
/* Default to the fs superblock SID. */
isec->sid = sbsec->sid;
if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
if (opt_dentry) {
isec->sclass = inode_mode_to_security_class(...)
rc = selinux_proc_get_sid(opt_dentry,
isec->sclass,
&sid);
if (rc)
goto out_unlock;
isec->sid = sid;
}
}
Since opt_dentry is null, we'll never call selinux_proc_get_sid()
and will leave the inode labeled with the label on the superblock.
I believe a fix would be to mimic the behavior of xattrs. Look
for an alias of the inode. If it can't be found, just leave the
inode uninitialized (and pick it up later) if it can be found, we
should be able to call selinux_proc_get_sid() ..."
On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
However, with this patch in place we see the expected result:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
It turns out that doing the SELinux MAC checks for mmap() before the
DAC checks was causing users and the SELinux policy folks headaches
as users were seeing a lot of SELinux AVC denials for the
memprotect:mmap_zero permission that would have also been denied by
the normal DAC capability checks (CAP_SYS_RAWIO).
Example:
# cat mmap_test.c
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
int rc;
void *mem;
mem = mmap(0x0, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (mem == MAP_FAILED)
return errno;
printf("mem = %p\n", mem);
munmap(mem, 4096);
return 0;
}
# gcc -g -O0 -o mmap_test mmap_test.c
# ./mmap_test
mem = (nil)
# ausearch -m AVC | grep mmap_zero
type=AVC msg=audit(...): avc: denied { mmap_zero }
for pid=1025 comm="mmap_test"
scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
tclass=memprotect
This patch corrects things so that when the above example is run by a
user without CAP_SYS_RAWIO the SELinux AVC is no longer generated as
the DAC capability check fails before the SELinux permission check.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian. On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu. So the values are all screwed up. Write the values in le
format like it should have been to start.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
The Makefiles in security/ uses a non-standard way to
specify sub-directories for building.
Fix it up so the normal (and documented) approach is used.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Inserting a entry into flowcache, or flushing flowcache should be based
on per net scope. The reason to do so is flushing operation from fat
netns crammed with flow entries will also making the slim netns with only
a few flow cache entries go away in original implementation.
Since flowcache is tightly coupled with IPsec, so it would be easier to
put flow cache global parameters into xfrm namespace part. And one last
thing needs to do is bumping flow cache genid, and flush flow cache should
also be made in per net style.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
cgroup_subsys is a bit messier than it needs to be.
* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.
* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. cgroup.h is widely
included throughout various subsystems, it doesn't and shouldn't
have claim on such generic names which don't have any qualifier
indicating that they belong to cgroup.
* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.
This patch cleans up cgroup_subsys names and initialization by doing
the followings.
* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.
* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.
cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event
* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot.
* Redundant cgroup_subsys declarations removed.
* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through serial console and the trap
handler doesn't even link stack frame properly for back-tracing.
This patch doesn't introduce any behavior changes.
v2: Rebased on top of fe1217c4f3 ("net: net_cls: move cgroupfs
classid handling into core").
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
The usage of strict_strto*() is not preferred, because
strict_strto*() is obsolete. Thus, kstrto*() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy. In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.
Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo
Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled. Any subsequent access to foo
after doing the above will also trigger the BUG.
BUG output from Matthew Thode:
[ 473.893141] ------------[ cut here ]------------
[ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
[ 473.995314] invalid opcode: 0000 [#6] SMP
[ 474.027196] Modules linked in:
[ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
3.13.0-grsec #1
[ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
[ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[ 474.556058] Stack:
[ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[ 474.690461] Call Trace:
[ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[ 475.255884] RIP [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 475.296120] RSP <ffff8805c0ac3c38>
[ 475.328734] ---[ end trace f076482e9d754adc ]---
Reported-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS3IyXAAoJEHm+PkMAQRiGplAH/ilCikBrCHyZ2938NHNLm+j1
yhfYnEJHLNg7T69KEj3p0cNagO3v9RPWM6UYFBQ6uFIYNN1MBKO7U+mCZuMWzeO8
+tGMV3mn5wx+oYn1RnWCCweQx5AESEl6rYn8udPDKh7LfW5fCLV60jguUjVSQ9IQ
cvtKlWknbiHyM7t1GoYgzN7jlPrRQvcNQZ+Aogzz7uSnJAgwINglBAHS7WP2tiEM
HAU2FoE4b3MbfGaid1vypaYQPBbFebx7Bw2WxAuZhkBbRiUBKlgF0/SYhOTvH38a
Sjpj1EHKfjcuCBt9tht6KP6H56R25vNloGR2+FB+fuQBdujd/SPa9xDflEMcdG4=
=iXnG
-----END PGP SIGNATURE-----
Merge tag 'v3.13' into stable-3.14
Linux 3.13
Conflicts:
security/selinux/hooks.c
Trivial merge issue in selinux_inet_conn_request() likely due to me
including patches that I sent to the stable folks in my next tree
resulting in the patch hitting twice (I think). Thankfully it was an
easy fix this time, but regardless, lesson learned, I will not do that
again.
Binaries compiled for arm may run on arm64 if CONFIG_COMPAT is
selected. Set LSM_MMAP_MIN_ADDR to 32768 if ARM64 && COMPAT to
prevent selinux failures launching 32-bit static executables that
are mapped at 0x8000.
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull audit update from Eric Paris:
"Again we stayed pretty well contained inside the audit system.
Venturing out was fixing a couple of function prototypes which were
inconsistent (didn't hurt anything, but we used the same value as an
int, uint, u32, and I think even a long in a couple of places).
We also made a couple of minor changes to when a couple of LSMs called
the audit system. We hoped to add aarch64 audit support this go
round, but it wasn't ready.
I'm disappearing on vacation on Thursday. I should have internet
access, but it'll be spotty. If anything goes wrong please be sure to
cc rgb@redhat.com. He'll make fixing things his top priority"
* git://git.infradead.org/users/eparis/audit: (50 commits)
audit: whitespace fix in kernel-parameters.txt
audit: fix location of __net_initdata for audit_net_ops
audit: remove pr_info for every network namespace
audit: Modify a set of system calls in audit class definitions
audit: Convert int limit uses to u32
audit: Use more current logging style
audit: Use hex_byte_pack_upper
audit: correct a type mismatch in audit_syscall_exit()
audit: reorder AUDIT_TTY_SET arguments
audit: rework AUDIT_TTY_SET to only grab spin_lock once
audit: remove needless switch in AUDIT_SET
audit: use define's for audit version
audit: documentation of audit= kernel parameter
audit: wait_for_auditd rework for readability
audit: update MAINTAINERS
audit: log task info on feature change
audit: fix incorrect set of audit_sock
audit: print error message when fail to create audit socket
audit: fix dangling keywords in audit_log_set_loginuid() output
audit: log on errors from filter user rules
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJS3IyXAAoJEHm+PkMAQRiGplAH/ilCikBrCHyZ2938NHNLm+j1
yhfYnEJHLNg7T69KEj3p0cNagO3v9RPWM6UYFBQ6uFIYNN1MBKO7U+mCZuMWzeO8
+tGMV3mn5wx+oYn1RnWCCweQx5AESEl6rYn8udPDKh7LfW5fCLV60jguUjVSQ9IQ
cvtKlWknbiHyM7t1GoYgzN7jlPrRQvcNQZ+Aogzz7uSnJAgwINglBAHS7WP2tiEM
HAU2FoE4b3MbfGaid1vypaYQPBbFebx7Bw2WxAuZhkBbRiUBKlgF0/SYhOTvH38a
Sjpj1EHKfjcuCBt9tht6KP6H56R25vNloGR2+FB+fuQBdujd/SPa9xDflEMcdG4=
=iXnG
-----END PGP SIGNATURE-----
Merge tag 'v3.13' into next
Linux 3.13
Minor fixup needed in selinux_inet_conn_request()
Conflicts:
security/selinux/hooks.c
Pull cgroup updates from Tejun Heo:
"The bulk of changes are cleanups and preparations for the upcoming
kernfs conversion.
- cgroup_event mechanism which is and will be used only by memcg is
moved to memcg.
- pidlist handling is updated so that it can be served by seq_file.
Also, the list is not sorted if sane_behavior. cgroup
documentation explicitly states that the file is not sorted but it
has been for quite some time.
- All cgroup file handling now happens on top of seq_file. This is
to prepare for kernfs conversion. In addition, all operations are
restructured so that they map 1-1 to kernfs operations.
- Other cleanups and low-pri fixes"
* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
cgroup: trivial style updates
cgroup: remove stray references to css_id
doc: cgroups: Fix typo in doc/cgroups
cgroup: fix fail path in cgroup_load_subsys()
cgroup: fix missing unlock on error in cgroup_load_subsys()
cgroup: remove for_each_root_subsys()
cgroup: implement for_each_css()
cgroup: factor out cgroup_subsys_state creation into create_css()
cgroup: combine css handling loops in cgroup_create()
cgroup: reorder operations in cgroup_create()
cgroup: make for_each_subsys() useable under cgroup_root_mutex
cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
cgroup: unify pidlist and other file handling
cgroup: replace cftype->read_seq_string() with cftype->seq_show()
cgroup: attach cgroup_open_file to all cgroup files
cgroup: generalize cgroup_pidlist_open_file
cgroup: unify read path so that seq_file is always used
cgroup: unify cgroup_write_X64() and cgroup_write_string()
cgroup: remove cftype->read(), ->read_map() and ->write()
hugetlb_cgroup: convert away from cftype->read()
...
Pull security layer updates from James Morris:
"Changes for this kernel include maintenance updates for Smack, SELinux
(and several networking fixes), IMA and TPM"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
SELinux: Fix memory leak upon loading policy
tpm/tpm-sysfs: active_show() can be static
tpm: tpm_tis: Fix compile problems with CONFIG_PM_SLEEP/CONFIG_PNP
tpm: Make tpm-dev allocate a per-file structure
tpm: Use the ops structure instead of a copy in tpm_vendor_specific
tpm: Create a tpm_class_ops structure and use it in the drivers
tpm: Pull all driver sysfs code into tpm-sysfs.c
tpm: Move sysfs functions from tpm-interface to tpm-sysfs
tpm: Pull everything related to /dev/tpmX into tpm-dev.c
char: tpm: nuvoton: remove unused variable
tpm: MAINTAINERS: Cleanup TPM Maintainers file
tpm/tpm_i2c_atmel: fix coccinelle warnings
tpm/tpm_ibmvtpm: fix unreachable code warning (smatch warning)
tpm/tpm_i2c_stm_st33: Check return code of get_burstcount
tpm/tpm_ppi: Check return value of acpi_get_name
tpm/tpm_ppi: Do not compare strcmp(a,b) == -1
ima: remove unneeded size_limit argument from ima_eventdigest_init_common()
ima: update IMA-templates.txt documentation
ima: pass HASH_ALGO__LAST as hash algo in ima_eventdigest_init()
ima: change the default hash algorithm to SHA1 in ima_eventdigest_ng_init()
...
Remove the call to audit_log() (which call audit_log_start()) and deal with
the errors in the caller, logging only once if the condition is met. Calling
audit_log_start() in this location makes buffer allocation and locking more
complicated in the calling tree (audit_filter_user()).
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Two of the conditions in selinux_audit_rule_match() should never happen and
the third indicates a race that should be retried. Remove the calls to
audit_log() (which call audit_log_start()) and deal with the errors in the
caller, logging only once if the condition is met. Calling audit_log_start()
in this location makes buffer allocation and locking more complicated in the
calling tree (audit_filter_user()).
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running stress tests on adding and deleting ftrace instances I hit
this bug:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_inode_permission+0x85/0x160
PGD 63681067 PUD 7ddbe067 PMD 0
Oops: 0000 [#1] PREEMPT
CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
Call Trace:
security_inode_permission+0x1c/0x30
__inode_permission+0x41/0xa0
inode_permission+0x18/0x50
link_path_walk+0x66/0x920
path_openat+0xa6/0x6c0
do_filp_open+0x43/0xa0
do_sys_open+0x146/0x240
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
RIP selinux_inode_permission+0x85/0x160
CR2: 0000000000000020
Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.
in selinux_inode_permission():
isec = inode->i_security;
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: [PATCH] SELinux: Fix memory leak upon loading policy
Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<ffffffff812b345c>] security_load_policy+0x6c/0x500
[<ffffffff812a623c>] sel_write_load+0xac/0x750
[<ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch removes the 'size_limit' argument from
ima_eventdigest_init_common(). Since the 'd' field will never include
the hash algorithm as prefix and the 'd-ng' will always have it, we can
use the hash algorithm to differentiate the two cases in the modified
function (it is equal to HASH_ALGO__LAST in the first case, the opposite
in the second).
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Replace the '-1' value with HASH_ALGO__LAST in ima_eventdigest_init()
as the called function ima_eventdigest_init_common() expects an unsigned
char.
Fix commit:
4d7aeee ima: define new template ima-ng and template fields d-ng and n-ng
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Replace HASH_ALGO__LAST with HASH_ALGO_SHA1 as the initial value of
the hash algorithm so that the prefix 'sha1:' is added to violation
digests.
Fix commit:
4d7aeee ima: define new template ima-ng and template fields d-ng and n-ng
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: <stable@vger.kernel.org> # 3.13.x
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Eric Paris politely points out:
Inside smack_file_receive() it seems like you are initting the audit
field with LSM_AUDIT_DATA_TASK. And then use
smk_ad_setfield_u_fs_path().
Seems like LSM_AUDIT_DATA_PATH would make more sense. (and depending
on how it's used fix a crash...)
He is correct. This puts things in order.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The mount restrictions imposed by Smack rely heavily on the
use of the filesystem "floor", which is the label that all
processes writing to the filesystem must have access to. It
turns out that while the "floor" notion is sound, it has yet
to be fully implemented and has never been used.
The sb_mount and sb_umount hooks only make sense if the
filesystem floor is used actively, and it isn't. They can
be reintroduced if a rational restriction comes up. Until
then, they get removed.
The sb_kern_mount hook is required for the option processing.
It is too permissive in the case of unprivileged mounts,
effectively bypassing the CAP_MAC_ADMIN restrictions if
any of the smack options are specified. Unprivileged mounts
are no longer allowed to set Smack filesystem options.
Additionally, the root and default values are set to the
label of the caller, in keeping with the policy that objects
get the label of their creator.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
smk_write_change_rule() is calling capable rather than
the more correct smack_privileged(). This allows for setting
rules in violation of the onlycap facility. This is the
simple repair.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The syslog control requires that the calling proccess
have the floor ("_") Smack label. Tizen does not run any
processes except for kernel helpers with the floor label.
This changes allows the admin to configure a specific
label for syslog. The default value is the star ("*")
label, effectively removing the restriction. The value
can be set using smackfs/syslog for anyone who wants
a more restrictive behavior.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Smack prohibits processes from using the star ("*") and web ("@") labels
because we don't want files with those labels getting created implicitly.
All setting of those labels should be done explicitly. The trouble is that
there is no check for these labels in the processing of SMACK64EXEC. That
is repaired.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
selinux: look for IPsec labels on both inbound and outbound packets
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()
selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()
selinux: fix possible memory leak
This reverts commit 102aefdda4.
Tom London reports that it causes sync() to hang on Fedora rawhide:
https://bugzilla.redhat.com/show_bug.cgi?id=1033965
and Josh Boyer bisected it down to this commit. Reverting the commit in
the rawhide kernel fixes the problem.
Eric Paris root-caused it to incorrect subtype matching in that commit
breaking fuse, and has a tentative patch, but by now we're better off
retrying this in 3.14 rather than playing with it any more.
Reported-by: Tom London <selinux@gmail.com>
Bisected-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Anand Avati <avati@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert "selinux: consider filesystem subtype in policies"
This reverts commit 102aefdda4.
Explanation from Eric Paris:
SELinux policy can specify if it should use a filesystem's
xattrs or not. In current policy we have a specification that
fuse should not use xattrs but fuse.glusterfs should use
xattrs. This patch has a bug in which non-glusterfs
filesystems would match the rule saying fuse.glusterfs should
use xattrs. If both fuse and the particular filesystem in
question are not written to handle xattr calls during the mount
command, they will deadlock.
I have fixed the bug to do proper matching, however I believe a
revert is still the correct solution. The reason I believe
that is because the code still does not work. The s_subtype is
not set until after the SELinux hook which attempts to match on
the ".gluster" portion of the rule. So we cannot match on the
rule in question. The code is useless.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIVAwUAUqdgihOxKuMESys7AQIishAAjGG3LnEp12fd7oay5//4SEg31ybPqGj7
Xotk9mblBW7cWPbLqk7xyIhxpIj2zM14XatEV2TfBNZV0Lcrzr1M5s87ZRUX0ui+
FHbtfn/M3Or8AvX7W1HZgG2Se7M0Ba6Y2RRXNsEdgqk6KMQuHhPEZ2FZ5vCx6BAD
vXzxWIKVNeqMXR7z6xkyqmpztnVQs+ZgzE9+c96QKyhBdtBy4spDmfgJS90m0pjN
HhgONpdfgosknj8yu43rWIQvd3UUO5BVntCeic94Fbgh77ZAEi0dD7ifLz/ebcja
pfJfPXxYzkCfIrSdAzQF8iYnQ+5rRomvGsMcvtq6mBooah/YmEkmKpKJDTp47wyN
IEUeJaxM2Qp2jcUSjEd7lY9o1AK4sj+90cKeVRUd5kzZP1iolkQHR+dGiAwqbn6w
MJBn9fCamvhpsZDhl2G/DICprFDvErd9zAlNubivggJmITnPXNWx+K1RDL4fO4qc
zLHTGHkxYyACM/7oJfwbH/NyZ1yu53OlE3R2h6TA6ISc7nAed7qzGAZyrSV92Quc
pItGaQ/zNa0sbe+nufCx9FOWu8sA3x7qnazLxhVtlPde9nlxcpGo5cagqcyc4hyp
/IGK2JoGkgHvehECY8miJlsu7UqmThIhmy6o4T4X6ErX1M9ifR92WGeXkAi/2mh/
WciVpPvTrqM=
=/AdA
-----END PGP SIGNATURE-----
Merge tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull misc keyrings fixes from David Howells:
"These break down into five sets:
- A patch to error handling in the big_key type for huge payloads.
If the payload is larger than the "low limit" and the backing store
allocation fails, then big_key_instantiate() doesn't clear the
payload pointers in the key, assuming them to have been previously
cleared - but only one of them is.
Unfortunately, the garbage collector still calls big_key_destroy()
when sees one of the pointers with a weird value in it (and not
NULL) which it then tries to clean up.
- Three patches to fix the keyring type:
* A patch to fix the hash function to correctly divide keyrings off
from keys in the topology of the tree inside the associative
array. This is only a problem if searching through nested
keyrings - and only if the hash function incorrectly puts the a
keyring outside of the 0 branch of the root node.
* A patch to fix keyrings' use of the associative array. The
__key_link_begin() function initially passes a NULL key pointer
to assoc_array_insert() on the basis that it's holding a place in
the tree whilst it does more allocation and stuff.
This is only a problem when a node contains 16 keys that match at
that level and we want to add an also matching 17th. This should
easily be manufactured with a keyring full of keyrings (without
chucking any other sort of key into the mix) - except for (a)
above which makes it on average adding the 65th keyring.
* A patch to fix searching down through nested keyrings, where any
keyring in the set has more than 16 keyrings and none of the
first keyrings we look through has a match (before the tree
iteration needs to step to a more distal node).
Test in keyutils test suite:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1
- A patch to fix the big_key type's use of a shmem file as its
backing store causing audit messages and LSM check failures. This
is done by setting S_PRIVATE on the file to avoid LSM checks on the
file (access to the shmem file goes through the keyctl() interface
and so is gated by the LSM that way).
This isn't normally a problem if a key is used by the context that
generated it - and it's currently only used by libkrb5.
Test in keyutils test suite:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e
- A patch to add a generated file to .gitignore.
- A patch to fix the alignment of the system certificate data such
that it it works on s390. As I understand it, on the S390 arch,
symbols must be 2-byte aligned because loading the address discards
the least-significant bit"
* tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
KEYS: correct alignment of system_certificate_list content in assembly file
Ignore generated file kernel/x509_certificate_list
security: shmem: implement kernel private shmem inodes
KEYS: Fix searching of nested keyrings
KEYS: Fix multiple key add into associative array
KEYS: Fix the keyring hash function
KEYS: Pre-clear struct key on allocation
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This is a regression caused by f7112e6c. When either subject or
object is not found the answer for access should be no. This
patch fixes the situation. '0' is written back instead of failing
with -EINVAL.
v2: cosmetic style fixes
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously selinux_skb_peerlbl_sid() would only check for labeled
IPsec security labels on inbound packets, this patch enables it to
check both inbound and outbound traffic for labeled IPsec security
labels.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs. This patch
replaces cftype->read_seq_string() with cftype->seq_show() which is
not limited to single_open() operation and will map directcly to
kernfs seq_file interface.
The conversions are mechanical. As ->seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively. In several occassions, e.f. if
it has seq_string in its name, the function name is updated to fit the
new method better.
This patch does not introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
We don't need to inspect the packet to determine if the packet is an
IPv4 packet arriving on an IPv6 socket when we can query the
request_sock directly.
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_netlbl_skbuff_setsid() we leverage a cached NetLabel
secattr whenever possible. However, we never check to ensure that
the desired SID matches the cached NetLabel secattr. This patch
checks the SID against the secattr before use and only uses the
cached secattr when the SID values match.
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
The new templates management mechanism records information associated
to an event into an array of 'ima_field_data' structures and makes it
available through the 'template_data' field of the 'ima_template_entry'
structure (the element of the measurements list created by IMA).
Since 'ima_field_data' contains dynamically allocated data (which length
varies depending on the data associated to a selected template field),
it is not enough to just free the memory reserved for a
'ima_template_entry' structure if something goes wrong.
This patch creates the new function ima_free_template_entry() which
walks the array of 'ima_field_data' structures, frees the memory
referenced by the 'data' pointer and finally the space reserved for
the 'ima_template_entry' structure. Further, it replaces existing kfree()
that have a pointer to an 'ima_template_entry' structure as argument
with calls to the new function.
Fixes: a71dc65: ima: switch to new template management mechanism
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
7bc5f447ce (ima: define new function ima_alloc_init_template() to
API) moved the initialization of 'entry' in ima_add_boot_aggregate() a
bit more below, after the if (ima_used_chip).
So, 'entry' is not initialized while being inside this if-block. So, we
should not attempt to free it.
Found by Coverity (CID: 1131971)
Fixes: 7bc5f447ce (ima: define new function ima_alloc_init_template() to API)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
We have a problem where the big_key key storage implementation uses a
shmem backed inode to hold the key contents. Because of this detail of
implementation LSM checks are being done between processes trying to
read the keys and the tmpfs backed inode. The LSM checks are already
being handled on the key interface level and should not be enforced at
the inode level (since the inode is an implementation detail, not a
part of the security model)
This patch implements a new function shmem_kernel_file_setup() which
returns the equivalent to shmem_file_setup() only the underlying inode
has S_PRIVATE set. This means that all LSM checks for the inode in
question are skipped. It should only be used for kernel internal
operations where the inode is not exposed to userspace without proper
LSM checking. It is possible that some other users of
shmem_file_setup() should use the new interface, but this has not been
explored.
Reproducing this bug is a little bit difficult. The steps I used on
Fedora are:
(1) Turn off selinux enforcing:
setenforce 0
(2) Create a huge key
k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`
(3) Access the key in another context:
runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null
(4) Examine the audit logs:
ausearch -m AVC -i --subject httpd_t | audit2allow
If the last command's output includes a line that looks like:
allow httpd_t user_tmpfs_t:file { open read };
There was an inode check between httpd and the tmpfs filesystem. With
this patch no such denial will be seen. (NOTE! you should clear your
audit log if you have tested for this previously)
(Please return you box to enforcing)
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hugh Dickins <hughd@google.com>
cc: linux-mm@kvack.org
If a keyring contains more than 16 keyrings (the capacity of a single node in
the associative array) then those keyrings are split over multiple nodes
arranged as a tree.
If search_nested_keyrings() is called to search the keyring then it will
attempt to manually walk over just the 0 branch of the associative array tree
where all the keyring links are stored. This works provided the key is found
before the algorithm steps from one node containing keyrings to a child node
or if there are sufficiently few keyring links that the keyrings are all in
one node.
However, if the algorithm does need to step from a node to a child node, it
doesn't change the node pointer unless a shortcut also gets transited. This
means that the algorithm will keep scanning the same node over and over again
without terminating and without returning.
To fix this, move the internal-pointer-to-node translation from inside the
shortcut transit handler so that it applies it to node arrival as well.
This can be tested by:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done
for ((i=17; i<=20; i++)); do keyctl search $r user a$i; done
The searches should all complete successfully (or with an error for 17-20),
but instead one or more of them will hang.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
If sufficient keys (or keyrings) are added into a keyring such that a node in
the associative array's tree overflows (each node has a capacity N, currently
16) and such that all N+1 keys have the same index key segment for that level
of the tree (the level'th nibble of the index key), then assoc_array_insert()
calls ops->diff_objects() to indicate at which bit position the two index keys
vary.
However, __key_link_begin() passes a NULL object to assoc_array_insert() with
the intention of supplying the correct pointer later before we commit the
change. This means that keyring_diff_objects() is given a NULL pointer as one
of its arguments which it does not expect. This results in an oops like the
attached.
With the previous patch to fix the keyring hash function, this can be forced
much more easily by creating a keyring and only adding keyrings to it. Add any
other sort of key and a different insertion path is taken - all 16+1 objects
must want to cluster in the same node slot.
This can be tested by:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
This should work fine, but oopses when the 17th keyring is added.
Since ops->diff_objects() is always called with the first pointer pointing to
the object to be inserted (ie. the NULL pointer), we can fix the problem by
changing the to-be-inserted object pointer to point to the index key passed
into assoc_array_insert() instead.
Whilst we're at it, we also switch the arguments so that they are the same as
for ->compare_object().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
Call Trace:
[<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
[<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
[<ffffffff811929a7>] __key_link_begin+0x78/0xe5
[<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
[<ffffffff81192e0a>] SyS_add_key+0x123/0x183
[<ffffffff81400ddb>] tracesys+0xdd/0xe2
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
The keyring hash function (used by the associative array) is supposed to clear
the bottommost nibble of the index key (where the hash value resides) for
keyrings and make sure it is non-zero for non-keyrings. This is done to make
keyrings cluster together on one branch of the tree separately to other keys.
Unfortunately, the wrong mask is used, so only the bottom two bits are
examined and cleared and not the whole bottom nibble. This means that keys
and keyrings can still be successfully searched for under most circumstances
as the hash is consistent in its miscalculation, but if a keyring's
associative array bottom node gets filled up then approx 75% of the keyrings
will not be put into the 0 branch.
The consequence of this is that a key in a keyring linked to by another
keyring, ie.
keyring A -> keyring B -> key
may not be found if the search starts at keyring A and then descends into
keyring B because search_nested_keyrings() only searches up the 0 branch (as it
"knows" all keyrings must be there and not elsewhere in the tree).
The fix is to use the right mask.
This can be tested with:
r=`keyctl newring sandbox @s`
for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
for ((i=0; i<=16; i++)); do keyctl add user a$i a %:ring$i; done
for ((i=0; i<=16; i++)); do keyctl search $r user a$i; done
This creates a sandbox keyring, then creates 17 keyrings therein (labelled
ring0..ring16). This causes the root node of the sandbox's associative array
to overflow and for the tree to have extra nodes inserted.
Each keyring then is given a user key (labelled aN for ringN) for us to search
for.
We then search for the user keys we added, starting from the sandbox. If
working correctly, it should return the same ordered list of key IDs as
for...keyctl add... did. Without this patch, it reports ENOKEY "Required key
not available" for some of the keys. Just which keys get this depends as the
kernel pointer to the key type forms part of the hash function.
Reported-by: Nalin Dahyabhai <nalin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
The second word of key->payload does not get initialised in key_alloc(), but
the big_key type is relying on it having been cleared. The problem comes when
big_key fails to instantiate a large key and doesn't then set the payload. The
big_key_destroy() op is called from the garbage collector and this assumes that
the dentry pointer stored in the second word will be NULL if instantiation did
not complete.
Therefore just pre-clear the entire struct key on allocation rather than trying
to be clever and only initialising to 0 only those bits that aren't otherwise
initialised.
The lack of initialisation can lead to a bug report like the following if
big_key failed to initialise its file:
general protection fault: 0000 [#1] SMP
Modules linked in: ...
CPU: 0 PID: 51 Comm: kworker/0:1 Not tainted 3.10.0-53.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge 1955/0HC513, BIOS 1.4.4 12/09/2008
Workqueue: events key_garbage_collector
task: ffff8801294f5680 ti: ffff8801296e2000 task.ti: ffff8801296e2000
RIP: 0010:[<ffffffff811b4a51>] dput+0x21/0x2d0
...
Call Trace:
[<ffffffff811a7b06>] path_put+0x16/0x30
[<ffffffff81235604>] big_key_destroy+0x44/0x60
[<ffffffff8122dc4b>] key_gc_unused_keys.constprop.2+0x5b/0xe0
[<ffffffff8122df2f>] key_garbage_collector+0x1df/0x3c0
[<ffffffff8107759b>] process_one_work+0x17b/0x460
[<ffffffff8107834b>] worker_thread+0x11b/0x400
[<ffffffff81078230>] ? rescuer_thread+0x3e0/0x3e0
[<ffffffff8107eb00>] kthread+0xc0/0xd0
[<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110
[<ffffffff815c4bec>] ret_from_fork+0x7c/0xb0
[<ffffffff8107ea40>] ? kthread_create_on_node+0x110/0x110
Reported-by: Patrik Kis <pkis@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stephen Gallagher <sgallagh@redhat.com>
This patch stores the address of the 'template_fmt_copy' variable in a new
variable, called 'template_fmt_ptr', so that the latter is passed as an
argument of strsep() instead of the former. This modification is needed
in order to correctly free the memory area referenced by
'template_fmt_copy' (strsep() modifies the pointer of the passed string).
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Free 'ctx_str' when necessary.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch defines a new value for the 'ima_show_type' enumerator
(IMA_SHOW_BINARY_NO_FIELD_LEN) to prevent that the field length
is transmitted through the 'binary_runtime_measurements' interface
for the digest field of the 'ima' template.
Fixes commit: 3ce1217 ima: define template fields library and new helpers
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
To maintain compatibility with userspace tools, the field length must not
be included in the template digest calculation for the 'ima' template.
Fixes commit: a71dc65 ima: switch to new template management mechanism
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This reverts commit 217091dd7a, which
caused the following build error:
security/integrity/digsig.c:70:5: error: redefinition of ‘integrity_init_keyring’
security/integrity/integrity.h:149:12: note: previous definition of ‘integrity_init_keyring’ w
security/integrity/integrity.h:149:12: warning: ‘integrity_init_keyring’ defined but not used
reported by Krzysztof Kolasa. Mimi says:
"I made the classic mistake of requesting this patch to be upstreamed
at the last second, rather than waiting until the next open window.
At this point, the best course would probably be to revert the two
commits and fix them for the next open window"
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"In this patchset, we finally get an SELinux update, with Paul Moore
taking over as maintainer of that code.
Also a significant update for the Keys subsystem, as well as
maintenance updates to Smack, IMA, TPM, and Apparmor"
and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:
"Okay. There are a number of separate bits. I'll go over the big bits
and the odd important other bit, most of the smaller bits are just
fixes and cleanups. If you want the small bits accounting for, I can
do that too.
(1) Keyring capacity expansion.
KEYS: Consolidate the concept of an 'index key' for key access
KEYS: Introduce a search context structure
KEYS: Search for auth-key by name rather than target key ID
Add a generic associative array implementation.
KEYS: Expand the capacity of a keyring
Several of the patches are providing an expansion of the capacity of a
keyring. Currently, the maximum size of a keyring payload is one page.
Subtract a small header and then divide up into pointers, that only gives
you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
a keyring to store ID mapping data, that has proven to be insufficient to
the cause.
Whatever data structure I use to handle the keyring payload, it can only
store pointers to keys, not the keys themselves because several keyrings
may point to a single key. This precludes inserting, say, and rb_node
struct into the key struct for this purpose.
I could make an rbtree of records such that each record has an rb_node
and a key pointer, but that would use four words of space per key stored
in the keyring. It would, however, be able to use much existing code.
I selected instead a non-rebalancing radix-tree type approach as that
could have a better space-used/key-pointer ratio. I could have used the
radix tree implementation that we already have and insert keys into it by
their serial numbers, but that means any sort of search must iterate over
the whole radix tree. Further, its nodes are a bit on the capacious side
for what I want - especially given that key serial numbers are randomly
allocated, thus leaving a lot of empty space in the tree.
So what I have is an associative array that internally is a radix-tree
with 16 pointers per node where the index key is constructed from the key
type pointer and the key description. This means that an exact lookup by
type+description is very fast as this tells us how to navigate directly to
the target key.
I made the data structure general in lib/assoc_array.c as far as it is
concerned, its index key is just a sequence of bits that leads to a
pointer. It's possible that someone else will be able to make use of it
also. FS-Cache might, for example.
(2) Mark keys as 'trusted' and keyrings as 'trusted only'.
KEYS: verify a certificate is signed by a 'trusted' key
KEYS: Make the system 'trusted' keyring viewable by userspace
KEYS: Add a 'trusted' flag and a 'trusted only' flag
KEYS: Separate the kernel signature checking keyring from module signing
These patches allow keys carrying asymmetric public keys to be marked as
being 'trusted' and allow keyrings to be marked as only permitting the
addition or linkage of trusted keys.
Keys loaded from hardware during kernel boot or compiled into the kernel
during build are marked as being trusted automatically. New keys can be
loaded at runtime with add_key(). They are checked against the system
keyring contents and if their signatures can be validated with keys that
are already marked trusted, then they are marked trusted also and can
thus be added into the master keyring.
Patches from Mimi Zohar make this usable with the IMA keyrings also.
(3) Remove the date checks on the key used to validate a module signature.
X.509: Remove certificate date checks
It's not reasonable to reject a signature just because the key that it was
generated with is no longer valid datewise - especially if the kernel
hasn't yet managed to set the system clock when the first module is
loaded - so just remove those checks.
(4) Make it simpler to deal with additional X.509 being loaded into the kernel.
KEYS: Load *.x509 files into kernel keyring
KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate
The builder of the kernel now just places files with the extension ".x509"
into the kernel source or build trees and they're concatenated by the
kernel build and stuffed into the appropriate section.
(5) Add support for userspace kerberos to use keyrings.
KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
KEYS: Implement a big key type that can save to tmpfs
Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
We looked at storing it in keyrings instead as that confers certain
advantages such as tickets being automatically deleted after a certain
amount of time and the ability for the kernel to get at these tokens more
easily.
To make this work, two things were needed:
(a) A way for the tickets to persist beyond the lifetime of all a user's
sessions so that cron-driven processes can still use them.
The problem is that a user's session keyrings are deleted when the
session that spawned them logs out and the user's user keyring is
deleted when the UID is deleted (typically when the last log out
happens), so neither of these places is suitable.
I've added a system keyring into which a 'persistent' keyring is
created for each UID on request. Each time a user requests their
persistent keyring, the expiry time on it is set anew. If the user
doesn't ask for it for, say, three days, the keyring is automatically
expired and garbage collected using the existing gc. All the kerberos
tokens it held are then also gc'd.
(b) A key type that can hold really big tickets (up to 1MB in size).
The problem is that Active Directory can return huge tickets with lots
of auxiliary data attached. We don't, however, want to eat up huge
tracts of unswappable kernel space for this, so if the ticket is
greater than a certain size, we create a swappable shmem file and dump
the contents in there and just live with the fact we then have an
inode and a dentry overhead. If the ticket is smaller than that, we
slap it in a kmalloc()'d buffer"
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
KEYS: Fix keyring content gc scanner
KEYS: Fix error handling in big_key instantiation
KEYS: Fix UID check in keyctl_get_persistent()
KEYS: The RSA public key algorithm needs to select MPILIB
ima: define '_ima' as a builtin 'trusted' keyring
ima: extend the measurement list to include the file signature
kernel/system_certificate.S: use real contents instead of macro GLOBAL()
KEYS: fix error return code in big_key_instantiate()
KEYS: Fix keyring quota misaccounting on key replacement and unlink
KEYS: Fix a race between negating a key and reading the error set
KEYS: Make BIG_KEYS boolean
apparmor: remove the "task" arg from may_change_ptraced_domain()
apparmor: remove parent task info from audit logging
apparmor: remove tsk field from the apparmor_audit_struct
apparmor: fix capability to not use the current task, during reporting
Smack: Ptrace access check mode
ima: provide hash algo info in the xattr
ima: enable support for larger default filedata hash algorithms
ima: define kernel parameter 'ima_template=' to change configured default
ima: add Kconfig default measurement list template
...
Pull audit updates from Eric Paris:
"Nothing amazing. Formatting, small bug fixes, couple of fixes where
we didn't get records due to some old VFS changes, and a change to how
we collect execve info..."
Fixed conflict in fs/exec.c as per Eric and linux-next.
* git://git.infradead.org/users/eparis/audit: (28 commits)
audit: fix type of sessionid in audit_set_loginuid()
audit: call audit_bprm() only once to add AUDIT_EXECVE information
audit: move audit_aux_data_execve contents into audit_context union
audit: remove unused envc member of audit_aux_data_execve
audit: Kill the unused struct audit_aux_data_capset
audit: do not reject all AUDIT_INODE filter types
audit: suppress stock memalloc failure warnings since already managed
audit: log the audit_names record type
audit: add child record before the create to handle case where create fails
audit: use given values in tty_audit enable api
audit: use nlmsg_len() to get message payload length
audit: use memset instead of trying to initialize field by field
audit: fix info leak in AUDIT_GET requests
audit: update AUDIT_INODE filter rule to comparator function
audit: audit feature to set loginuid immutable
audit: audit feature to only allow unsetting the loginuid
audit: allow unsetting the loginuid (with priv)
audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
audit: loginuid functions coding style
selinux: apply selinux checks on new audit message types
...
Dynamically allocate a couple of the larger stack variables in order to
reduce the stack footprint below 1024. gcc-4.8
security/selinux/ss/services.c: In function 'security_load_policy':
security/selinux/ss/services.c:1964:1: warning: the frame size of 1104 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
Also silence a couple of checkpatch warnings at the same time.
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(oldpolicydb, &policydb, sizeof policydb);
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(&policydb, newpolicydb, sizeof policydb);
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Update the policy version (POLICYDB_VERSION_CONSTRAINT_NAMES) to allow
holding of policy source info for constraints.
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Key pointers stored in the keyring are marked in bit 1 to indicate if they
point to a keyring. We need to strip off this bit before using the pointer
when iterating over the keyring for the purpose of looking for links to garbage
collect.
This means that expirable keyrings aren't correctly expiring because the
checker is seeing their key pointer with 2 added to it.
Since the fix for this involves knowing about the internals of the keyring,
key_gc_keyring() is moved to keyring.c and merged into keyring_gc().
This can be tested by:
echo 2 >/proc/sys/kernel/keys/gc_delay
keyctl timeout `keyctl add keyring qwerty "" @s` 2
cat /proc/keys
sleep 5; cat /proc/keys
which should see a keyring called "qwerty" appear in the session keyring and
then disappear after it expires, and:
echo 2 >/proc/sys/kernel/keys/gc_delay
a=`keyctl get_persistent @s`
b=`keyctl add keyring 0 "" $a`
keyctl add user a a $b
keyctl timeout $b 2
cat /proc/keys
sleep 5; cat /proc/keys
which should see a keyring called "0" with a key called "a" in it appear in the
user's persistent keyring (which will be attached to the session keyring) and
then both the "0" keyring and the "a" key should disappear when the "0" keyring
expires.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Simo Sorce <simo@redhat.com>
In the big_key_instantiate() function we return 0 if kernel_write() returns us
an error rather than returning an error. This can potentially lead to
dentry_open() giving a BUG when called from big_key_read() with an unset
tmpfile path.
------------[ cut here ]------------
kernel BUG at fs/open.c:798!
...
RIP: 0010:[<ffffffff8119bbd1>] dentry_open+0xd1/0xe0
...
Call Trace:
[<ffffffff812350c5>] big_key_read+0x55/0x100
[<ffffffff81231084>] keyctl_read_key+0xb4/0xe0
[<ffffffff81231e58>] SyS_keyctl+0xf8/0x1d0
[<ffffffff815bb799>] system_call_fastpath+0x16/0x1b
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Stephen Gallagher <sgallagh@redhat.com>
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
Pull cgroup changes from Tejun Heo:
"Not too much activity this time around. css_id is finally killed and
a minor update to device_cgroup"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: remove can_attach
cgroup: kill css_id
memcg: stop using css id
memcg: fail to create cgroup if the cgroup id is too big
memcg: convert to use cgroup id
memcg: convert to use cgroup_is_descendant()
If the UID is specified by userspace when calling the KEYCTL_GET_PERSISTENT
function and the process does not have the CAP_SETUID capability, then the
function will return -EPERM if the current process's uid, suid, euid and fsuid
all match the requested UID. This is incorrect.
Fix it such that when a non-privileged caller requests a persistent keyring by
a specific UID they can only request their own (ie. the specified UID matches
either then process's UID or the process's EUID).
This can be tested by logging in as the user and doing:
keyctl get_persistent @p
keyctl get_persistent @p `id -u`
keyctl get_persistent @p 0
The first two should successfully print the same key ID. The third should do
the same if called by UID 0 or indicate Operation Not Permitted otherwise.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Gallagher <sgallagh@redhat.com>
Supress the stock memory allocation failure warnings for audit buffers
since audit alreay takes care of memory allocation failure warnings, including
rate-limiting, in audit_log_start().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
We use the read check to get the feature set (like AUDIT_GET) and the
write check to set the features (like AUDIT_SET).
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Require all keys added to the IMA keyring be signed by an
existing trusted key on the system trusted keyring.
Changelog:
- define stub integrity_init_keyring() function (reported-by Fengguang Wu)
- differentiate between regular and trusted keyring names.
- replace printk with pr_info (D. Kasatkin)
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
This patch defines a new template called 'ima-sig', which includes
the file signature in the template data, in addition to the file's
digest and pathname.
A template is composed of a set of fields. Associated with each
field is an initialization and display function. This patch defines
a new template field called 'sig', the initialization function
ima_eventsig_init(), and the display function ima_show_template_sig().
This patch modifies the .field_init() function definition to include
the 'security.ima' extended attribute and length.
Changelog:
- remove unused code (Dmitry Kasatkin)
- avoid calling ima_write_template_field_data() unnecesarily (Roberto Sassu)
- rename DATA_FMT_SIG to DATA_FMT_HEX
- cleanup ima_eventsig_init() based on Roberto's comments
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
If a key is displaced from a keyring by a matching one, then four more bytes
of quota are allocated to the keyring - despite the fact that the keyring does
not change in size.
Further, when a key is unlinked from a keyring, the four bytes of quota
allocated the link isn't recovered and returned to the user's pool.
The first can be tested by repeating:
keyctl add big_key a fred @s
cat /proc/key-users
(Don't put it in a shell loop otherwise the garbage collector won't have time
to clear the displaced keys, thus affecting the result).
This was causing the kerberos keyring to run out of room fairly quickly.
The second can be tested by:
cat /proc/key-users
a=`keyctl add user a a @s`
cat /proc/key-users
keyctl unlink $a
sleep 1 # Give RCU a chance to delete the key
cat /proc/key-users
assuming no system activity that otherwise adds/removes keys, the amount of
key data allocated should go up (say 40/20000 -> 47/20000) and then return to
the original value at the end.
Reported-by: Stephen Gallagher <sgallagh@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Having the big_keys functionality as a module is very marginally useful.
The userspace code that would use this functionality will get odd error
messages from the keys layer if the module isn't loaded. The code itself
is fairly small, so just have this as a boolean option and not a tristate.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Unless task == current ptrace_parent(task) is not safe even under
rcu_read_lock() and most of the current users are not right.
So may_change_ptraced_domain(task) looks wrong as well. However it
is always called with task == current so the code is actually fine.
Remove this argument to make this fact clear.
Note: perhaps we should simply kill ptrace_parent(), it buys almost
nothing. And it is obviously racy, perhaps this should be fixed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The reporting of the parent task info is a vestage from old versions of
apparmor. The need for this information was removed by unique null-
profiles before apparmor was upstreamed so remove this info from logging.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Now that aa_capabile no longer sets the task field it can be removed
and the lsm_audit version of the field can be used.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Mediation is based off of the cred but auditing includes the current
task which may not be related to the actual request.
Signed-off-by: John Johansen <john.johansen@canonical.com>
When the ptrace security hooks were split the addition of
a mode parameter was not taken advantage of in the Smack
ptrace access check. This changes the access check from
always looking for read and write access to using the
passed mode. This will make use of /proc much happier.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
All files labeled with 'security.ima' hashes, are hashed using the
same hash algorithm. Changing from one hash algorithm to another,
requires relabeling the filesystem. This patch defines a new xattr
type, which includes the hash algorithm, permitting different files
to be hashed with different algorithms.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The IMA measurement list contains two hashes - a template data hash
and a filedata hash. The template data hash is committed to the TPM,
which is limited, by the TPM v1.2 specification, to 20 bytes. The
filedata hash is defined as 20 bytes as well.
Now that support for variable length measurement list templates was
added, the filedata hash is not limited to 20 bytes. This patch adds
Kconfig support for defining larger default filedata hash algorithms
and replacing the builtin default with one specified on the kernel
command line.
<uapi/linux/hash_info.h> contains a list of hash algorithms. The
Kconfig default hash algorithm is a subset of this list, but any hash
algorithm included in the list can be specified at boot, using the
'ima_hash=' kernel command line option.
Changelog v2:
- update Kconfig
Changelog:
- support hashes that are configured
- use generic HASH_ALGO_ definitions
- add Kconfig support
- hash_setup must be called only once (Dmitry)
- removed trailing whitespaces (Roberto Sassu)
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
This patch allows users to specify from the kernel command line the
template descriptor, among those defined, that will be used to generate
and display measurement entries. If an user specifies a wrong template,
IMA reverts to the template descriptor set in the kernel configuration.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch adds a Kconfig option to select the default IMA
measurement list template. The 'ima' template limited the
filedata hash to 20 bytes and the pathname to 255 charaters.
The 'ima-ng' measurement list template permits larger hash
digests and longer pathnames.
Changelog:
- keep 'select CRYPTO_HASH_INFO' in 'config IMA' section (Kconfig)
(Roberto Sassu);
- removed trailing whitespaces (Roberto Sassu).
- Lindent fixes
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
The same hash algorithm should be used for calculating the file
data hash for the IMA measurement list, as for appraising the file
data integrity. (The appraise hash algorithm is stored in the
'security.ima' extended attribute.) The exception is when the
reference file data hash digest, stored in the extended attribute,
is larger than the one supported by the template. In this case,
the file data hash needs to be calculated twice, once for the
measurement list and, again, for appraisal.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Different files might be signed based on different hash algorithms.
This patch prefixes the audit log measurement hash with the hash
algorithm.
Changelog:
- use generic HASH_ALGO defintions
- use ':' as delimiter between the hash algorithm and the digest
(Roberto Sassu)
- always include the hash algorithm used when audit-logging a measurement
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Peter Moody <pmoody@google.com>
This patch performs the switch to the new template mechanism by modifying
the functions ima_alloc_init_template(), ima_measurements_show() and
ima_ascii_measurements_show(). The old function ima_template_show() was
removed as it is no longer needed. Also, if the template descriptor used
to generate a measurement entry is not 'ima', the whole length of field
data stored for an entry is provided before the data itself through the
binary_runtime_measurement interface.
Changelog:
- unnecessary to use strncmp() (Mimi Zohar)
- create new variable 'field' in ima_alloc_init_template() (Roberto Sassu)
- use GFP_NOFS flag in ima_alloc_init_template() (Roberto Sassu)
- new variable 'num_fields' in ima_store_template() (Roberto Sassu,
proposed by Mimi Zohar)
- rename ima_calc_buffer_hash/template_hash() to ima_calc_field_array_hash(),
something more generic (Mimi, requested by Dmitry)
- sparse error fix - Fengguang Wu
- fix lindent warnings
- always include the field length in the template data length
- include the template field length variable size in the template data length
- include both the template field data and field length in the template digest
calculation. Simplifies verifying the template digest. (Mimi)
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch adds support for the new template 'ima-ng', whose format
is defined as 'd-ng|n-ng'. These new field definitions remove the
size limitations of the original 'ima' template. Further, the 'd-ng'
field prefixes the inode digest with the hash algorithim, when
displaying the new larger digest sizes.
Change log:
- scripts/Lindent fixes - Mimi
- "always true comparison" - reported by Fengguang Wu, resolved Dmitry
- initialize hash_algo variable to HASH_ALGO__LAST
- always prefix digest with hash algorithm - Mimi
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch defines a library containing two initial template fields,
inode digest (d) and file name (n), the 'ima' template descriptor,
whose format is 'd|n', and two helper functions,
ima_write_template_field_data() and ima_show_template_field_data().
Changelog:
- replace ima_eventname_init() parameter NULL checking with BUG_ON.
(suggested by Mimi)
- include "new template fields for inode digest (d) and file name (n)"
definitions to fix a compiler warning. - Mimi
- unnecessary to prefix static function names with 'ima_'. remove
prefix to resolve Lindent formatting changes. - Mimi
- abbreviated/removed inline comments - Mimi
- always send the template field length - Mimi
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The original 'ima' template is fixed length, containing the filedata hash
and pathname. The filedata hash is limited to 20 bytes (md5/sha1). The
pathname is a null terminated string, limited to 255 characters. To
overcome these limitations and to add additional file metadata, it is
necessary to extend the current version of IMA by defining additional
templates.
The main reason to introduce this feature is that, each time a new
template is defined, the functions that generate and display the
measurement list would include the code for handling a new format and,
thus, would significantly grow over time.
This patch set solves this problem by separating the template management
from the remaining IMA code. The core of this solution is the definition
of two new data structures: a template descriptor, to determine which
information should be included in the measurement list, and a template
field, to generate and display data of a given type.
To define a new template field, developers define the field identifier
and implement two functions, init() and show(), respectively to generate
and display measurement entries. Initially, this patch set defines the
following template fields (support for additional data types will be
added later):
- 'd': the digest of the event (i.e. the digest of a measured file),
calculated with the SHA1 or MD5 hash algorithm;
- 'n': the name of the event (i.e. the file name), with size up to
255 bytes;
- 'd-ng': the digest of the event, calculated with an arbitrary hash
algorithm (field format: [<hash algo>:]digest, where the digest
prefix is shown only if the hash algorithm is not SHA1 or MD5);
- 'n-ng': the name of the event, without size limitations.
Defining a new template descriptor requires specifying the template format,
a string of field identifiers separated by the '|' character. This patch
set defines the following template descriptors:
- "ima": its format is 'd|n';
- "ima-ng" (default): its format is 'd-ng|n-ng'
Further details about the new template architecture can be found in
Documentation/security/IMA-templates.txt.
Changelog:
- don't defer calling ima_init_template() - Mimi
- don't define ima_lookup_template_desc() until used - Mimi
- squashed with documentation patch - Mimi
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Instead of allocating and initializing the template entry from multiple
places (eg. boot aggregate, violation, and regular measurements), this
patch defines a new function called ima_alloc_init_template(). The new
function allocates and initializes the measurement entry with the inode
digest and the filename.
In respect to the current behavior, it truncates the file name passed
in the 'filename' argument if the latter's size is greater than 255 bytes
and the passed file descriptor is NULL.
Changelog:
- initialize 'hash' variable for non TPM case - Mimi
- conform to expectation for 'iint' to be defined as a pointer. - Mimi
- add missing 'file' dependency for recalculating file hash. - Mimi
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Pass the filename argument to ima_add_template_entry() in order to
eliminate a dependency on template specific data (third argument of
integrity_audit_msg).
This change is required because, with the new template management
mechanism, the generation of a new measurement entry will be performed
by new specific functions (introduced in next patches) and the current IMA
code will not be aware anymore of how data is stored in the entry payload.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Pass the file descriptor instead of the inode to ima_add_violation(),
to make the latter consistent with ima_store_measurement() in
preparation for the new template architecture.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
With multiple hash algorithms, ima_hash_tfm is no longer guaranteed to be sha1.
Need to force to use sha1.
Changelog:
- pass ima_digest_data to ima_calc_boot_aggregate() instead of char *
(Roberto Sassu);
- create an ima_digest_data structure in ima_add_boot_aggregate()
(Roberto Sassu);
- pass hash->algo to ima_alloc_tfm() (Roberto Sassu, reported by Dmitry).
- "move hash definition in ima_add_boot_aggregate()" commit hunk to here.
- sparse warning fix - Fengguang Wu
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
ima_calc_buffer_hash will be used with different hash algorithms.
This patch provides support for arbitrary hash algorithms in
ima_calc_buffer_hash.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch provides dedicated hash algo allocation and
deallocation function which can be used by different clients.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The TPM v1.2 limits the template hash size to 20 bytes. This
patch differentiates between the template hash size, as defined
in the ima_template_entry, and the file data hash size, as
defined in the ima_template_data. Subsequent patches add support
for different file data hash algorithms.
Change log:
- hash digest definition in ima_store_template() should be TPM_DIGEST_SIZE
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
For each inode in the IMA policy, an iint is allocated. To support
larger hash digests, the iint digest size changed from 20 bytes to
the maximum supported hash digest size. Instead of allocating the
maximum size, which most likely is not needed, this patch dynamically
allocates the needed hash storage.
Changelog:
- fix krealloc bug
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
For possibility to use xattr type for new signature formats,
pass full xattr to the signature verification function.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
All files on the filesystem, currently, are hashed using the same hash
algorithm. In preparation for files from different packages being
signed using different hash algorithms, this patch adds support for
reading the signature hash algorithm from the 'security.ima' extended
attribute and calculates the appropriate file data hash based on it.
Changelog:
- fix scripts Lindent and checkpatch msgs - Mimi
- fix md5 support for older version, which occupied 20 bytes in the
xattr, not the expected 16 bytes. Fix the comparison to compare
only the first 16 bytes.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
In preparation of supporting more hash algorithms with larger hash sizes
needed for signature verification, this patch replaces the 20 byte sized
digest, with a more flexible structure. The new structure includes the
hash algorithm, digest size, and digest.
Changelog:
- recalculate filedata hash for the measurement list, if the signature
hash digest size is greater than 20 bytes.
- use generic HASH_ALGO_
- make ima_calc_file_hash static
- scripts lindent and checkpatch fixes
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This reverts commit 4c2c392763.
Everything in the initramfs should be measured and appraised,
but until the initramfs has extended attribute support, at
least measured.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
It is really only wanting to duplicate a check which is already done by the
cgroup subsystem.
With this patch, user jdoe still cannot move pid 1 into a devices cgroup
he owns, but now he can move his own other tasks into devices cgroups.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Conflicts:
drivers/net/usb/qmi_wwan.c
include/net/dst.h
Trivial merge conflicts, both were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux file locking does not follow the same rules
as other mechanisms. Even though it is a write operation
a process can set a read lock on files which it has open
only for read access. Two programs with read access to
a file can use read locks to communicate.
This is not acceptable in a Mandatory Access Control
environment. Smack treats setting a read lock as the
write operation that it is. Unfortunately, many programs
assume that setting a read lock is a read operation.
These programs are unhappy in the Smack environment.
This patch introduces a new access mode (lock) to address
this problem. A process with lock access to a file can
set a read lock. A process with write access to a file can
set a read lock or a write lock. This prevents a situation
where processes are granted write access just so they can
set read locks.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
BugLink: http://bugs.launchpad.net/bugs/1235977
The profile introspection seq file has a locking bug when policy is viewed
from a virtual root (task in a policy namespace), introspection from the
real root is not affected.
The test for root
while (parent) {
is correct for the real root, but incorrect for tasks in a policy namespace.
This allows the task to walk backup the policy tree past its virtual root
causing it to be unlocked before the virtual root should be in the p_stop
fn.
This results in the following lockdep back trace:
[ 78.479744] [ BUG: bad unlock balance detected! ]
[ 78.479792] 3.11.0-11-generic #17 Not tainted
[ 78.479838] -------------------------------------
[ 78.479885] grep/2223 is trying to release lock (&ns->lock) at:
[ 78.479952] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[ 78.480002] but there are no more locks to release!
[ 78.480037]
[ 78.480037] other info that might help us debug this:
[ 78.480037] 1 lock held by grep/2223:
[ 78.480037] #0: (&p->lock){+.+.+.}, at: [<ffffffff812111bd>] seq_read+0x3d/0x3d0
[ 78.480037]
[ 78.480037] stack backtrace:
[ 78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17
[ 78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 78.480037] ffffffff817bf3be ffff880007763d60 ffffffff817b97ef ffff8800189d2190
[ 78.480037] ffff880007763d88 ffffffff810e1c6e ffff88001f044730 ffff8800189d2190
[ 78.480037] ffffffff817bf3be ffff880007763e00 ffffffff810e5bd6 0000000724fe56b7
[ 78.480037] Call Trace:
[ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[ 78.480037] [<ffffffff817b97ef>] dump_stack+0x54/0x74
[ 78.480037] [<ffffffff810e1c6e>] print_unlock_imbalance_bug+0xee/0x100
[ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[ 78.480037] [<ffffffff810e5bd6>] lock_release_non_nested+0x226/0x300
[ 78.480037] [<ffffffff817bf2fe>] ? __mutex_unlock_slowpath+0xce/0x180
[ 78.480037] [<ffffffff817bf3be>] ? mutex_unlock+0xe/0x10
[ 78.480037] [<ffffffff810e5d5c>] lock_release+0xac/0x310
[ 78.480037] [<ffffffff817bf2b3>] __mutex_unlock_slowpath+0x83/0x180
[ 78.480037] [<ffffffff817bf3be>] mutex_unlock+0xe/0x10
[ 78.480037] [<ffffffff81376c91>] p_stop+0x51/0x90
[ 78.480037] [<ffffffff81211408>] seq_read+0x288/0x3d0
[ 78.480037] [<ffffffff811e9d9e>] vfs_read+0x9e/0x170
[ 78.480037] [<ffffffff811ea8cc>] SyS_read+0x4c/0xa0
[ 78.480037] [<ffffffff817ccc9d>] system_call_fastpath+0x1a/0x1f
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pass the hook ops to the hookfn to allow for generic hook
functions. This change is required by nf_tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONFIG_IPV6=n is still a valid choice ;)
It appears we can remove dead code.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.. so get rid of it. The only indirect users were all the
avc_has_perm() callers which just expanded to have a zero flags
argument.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every single user passes in '0'. I think we had non-zero users back in
some stone age when selinux_inode_permission() was implemented in terms
of inode_has_perm(), but that complicated case got split up into a
totally separate code-path so that we could optimize the much simpler
special cases.
See commit 2e33405785 ("SELinux: delay initialization of audit data in
selinux_inode_permission") for example.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/usb/qmi_wwan.c
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
include/net/netfilter/nf_conntrack_synproxy.h
include/net/secure_seq.h
The conflicts are of two varieties:
1) Conflicts with Joe Perches's 'extern' removal from header file
function declarations. Usually it's an argument signature change
or a function being added/removed. The resolutions are trivial.
2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
a new value, another changes an existing value. That sort of
thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next
v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the shash interface, rather than the hash interface, when hashing
AppArmor profiles. The shash interface does not use scatterlists and it
is a better fit for what AppArmor needs.
This fixes a kernel paging BUG when aa_calc_profile_hash() is passed a
buffer from vmalloc(). The hash interface requires callers to handle
vmalloc() buffers differently than what AppArmor was doing. Due to
vmalloc() memory not being physically contiguous, each individual page
behind the buffer must be assigned to a scatterlist with sg_set_page()
and then the scatterlist passed to crypto_hash_update().
The shash interface does not have that limitation and allows vmalloc()
and kmalloc() buffers to be handled in the same manner.
BugLink: https://launchpad.net/bugs/1216294/
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=62261
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
In order to create the integrity keyrings (eg. _evm, _ima), root's
uid and session keyrings need to be initialized early.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Add KEY_FLAG_TRUSTED to indicate that a key either comes from a trusted source
or had a cryptographic signature chain that led back to a trusted key the
kernel already possessed.
Add KEY_FLAGS_TRUSTED_ONLY to indicate that a keyring will only accept links to
keys marked with KEY_FLAGS_TRUSTED.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Add support for per-user_namespace registers of persistent per-UID kerberos
caches held within the kernel.
This allows the kerberos cache to be retained beyond the life of all a user's
processes so that the user's cron jobs can work.
The kerberos cache is envisioned as a keyring/key tree looking something like:
struct user_namespace
\___ .krb_cache keyring - The register
\___ _krb.0 keyring - Root's Kerberos cache
\___ _krb.5000 keyring - User 5000's Kerberos cache
\___ _krb.5001 keyring - User 5001's Kerberos cache
\___ tkt785 big_key - A ccache blob
\___ tkt12345 big_key - Another ccache blob
Or possibly:
struct user_namespace
\___ .krb_cache keyring - The register
\___ _krb.0 keyring - Root's Kerberos cache
\___ _krb.5000 keyring - User 5000's Kerberos cache
\___ _krb.5001 keyring - User 5001's Kerberos cache
\___ tkt785 keyring - A ccache
\___ krbtgt/REDHAT.COM@REDHAT.COM big_key
\___ http/REDHAT.COM@REDHAT.COM user
\___ afs/REDHAT.COM@REDHAT.COM user
\___ nfs/REDHAT.COM@REDHAT.COM user
\___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key
\___ http/KERNEL.ORG@KERNEL.ORG big_key
What goes into a particular Kerberos cache is entirely up to userspace. Kernel
support is limited to giving you the Kerberos cache keyring that you want.
The user asks for their Kerberos cache by:
krb_cache = keyctl_get_krbcache(uid, dest_keyring);
The uid is -1 or the user's own UID for the user's own cache or the uid of some
other user's cache (requires CAP_SETUID). This permits rpc.gssd or whatever to
mess with the cache.
The cache returned is a keyring named "_krb.<uid>" that the possessor can read,
search, clear, invalidate, unlink from and add links to. Active LSMs get a
chance to rule on whether the caller is permitted to make a link.
Each uid's cache keyring is created when it first accessed and is given a
timeout that is extended each time this function is called so that the keyring
goes away after a while. The timeout is configurable by sysctl but defaults to
three days.
Each user_namespace struct gets a lazily-created keyring that serves as the
register. The cache keyrings are added to it. This means that standard key
search and garbage collection facilities are available.
The user_namespace struct's register goes away when it does and anything left
in it is then automatically gc'd.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Simo Sorce <simo@redhat.com>
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Implement a big key type that can save its contents to tmpfs and thus
swapspace when memory is tight. This is useful for Kerberos ticket caches.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Simo Sorce <simo@redhat.com>
Expand the capacity of a keyring to be able to hold a lot more keys by using
the previously added associative array implementation. Currently the maximum
capacity is:
(PAGE_SIZE - sizeof(header)) / sizeof(struct key *)
which, on a 64-bit system, is a little more 500. However, since this is being
used for the NFS uid mapper, we need more than that. The new implementation
gives us effectively unlimited capacity.
With some alterations, the keyutils testsuite runs successfully to completion
after this patch is applied. The alterations are because (a) keyrings that
are simply added to no longer appear ordered and (b) some of the errors have
changed a bit.
Signed-off-by: David Howells <dhowells@redhat.com>
Drop the permissions argument from __keyring_search_one() as the only caller
passes 0 here - which causes all checks to be skipped.
Signed-off-by: David Howells <dhowells@redhat.com>
Define a __key_get() wrapper to use rather than atomic_inc() on the key usage
count as this makes it easier to hook in refcount error debugging.
Signed-off-by: David Howells <dhowells@redhat.com>
Search for auth-key by name rather than by target key ID as, in a future
patch, we'll by searching directly by index key in preference to iteration
over all keys.
Signed-off-by: David Howells <dhowells@redhat.com>
Search functions pass around a bunch of arguments, each of which gets copied
with each call. Introduce a search context structure to hold these.
Whilst we're at it, create a search flag that indicates whether the search
should be directly to the description or whether it should iterate through all
keys looking for a non-description match.
This will be useful when keyrings use a generic data struct with generic
routines to manage their content as the search terms can just be passed
through to the iterator callback function.
Also, for future use, the data to be supplied to the match function is
separated from the description pointer in the search context. This makes it
clear which is being supplied.
Signed-off-by: David Howells <dhowells@redhat.com>
Consolidate the concept of an 'index key' for accessing keys. The index key
is the search term needed to find a key directly - basically the key type and
the key description. We can add to that the description length.
This will be useful when turning a keyring into an associative array rather
than just a pointer block.
Signed-off-by: David Howells <dhowells@redhat.com>
Skip key state checks (invalidation, revocation and expiration) when checking
for possession. Without this, keys that have been marked invalid, revoked
keys and expired keys are not given a possession attribute - which means the
possessor is not granted any possession permits and cannot do anything with
them unless they also have one a user, group or other permit.
This causes failures in the keyutils test suite's revocation and expiration
tests now that commit 96b5c8fea6 reduced the
initial permissions granted to a key.
The failures are due to accesses to revoked and expired keys being given
EACCES instead of EKEYREVOKED or EKEYEXPIRED.
Signed-off-by: David Howells <dhowells@redhat.com>
Back when we had half ass LSM stacking we had to link capabilities.o
after bigger LSMs so that on initialization the bigger LSM would
register first and the capabilities module would be the one stacked as
the 'seconday'. Somewhere around 6f0f0fd496 (back in 2008) we
finally removed the last of the kinda module stacking code but this
comment in the makefile still lives today.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Conflicts:
security/selinux/hooks.c
Pull Eric's existing SELinux tree as there are a number of patches in
there that are not yet upstream. There was some minor fixup needed to
resolve a conflict in security/selinux/hooks.c:selinux_set_mnt_opts()
between the labeled NFS patches and Eric's security_fs_use()
simplification patch.
Pull namespace changes from Eric Biederman:
"This is an assorted mishmash of small cleanups, enhancements and bug
fixes.
The major theme is user namespace mount restrictions. nsown_capable
is killed as it encourages not thinking about details that need to be
considered. A very hard to hit pid namespace exiting bug was finally
tracked and fixed. A couple of cleanups to the basic namespace
infrastructure.
Finally there is an enhancement that makes per user namespace
capabilities usable as capabilities, and an enhancement that allows
the per userns root to nice other processes in the user namespace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Kill nsown_capable it makes the wrong thing easy
capabilities: allow nice if we are privileged
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Allow PR_CAPBSET_DROP in a user namespace.
namespaces: Simplify copy_namespaces so it is clear what is going on.
pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
sysfs: Restrict mounting sysfs
userns: Better restrictions on when proc and sysfs can be mounted
vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
kernel/nsproxy.c: Improving a snippet of code.
proc: Restrict mounting the proc filesystem
vfs: Lock in place mounts from more privileged users
Pull security subsystem updates from James Morris:
"Nothing major for this kernel, just maintenance updates"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
apparmor: add the ability to report a sha1 hash of loaded policy
apparmor: export set of capabilities supported by the apparmor module
apparmor: add the profile introspection file to interface
apparmor: add an optional profile attachment string for profiles
apparmor: add interface files for profiles and namespaces
apparmor: allow setting any profile into the unconfined state
apparmor: make free_profile available outside of policy.c
apparmor: rework namespace free path
apparmor: update how unconfined is handled
apparmor: change how profile replacement update is done
apparmor: convert profile lists to RCU based locking
apparmor: provide base for multiple profiles to be replaced at once
apparmor: add a features/policy dir to interface
apparmor: enable users to query whether apparmor is enabled
apparmor: remove minimum size check for vmalloc()
Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
Smack: network label match fix
security: smack: add a hash table to quicken smk_find_entry()
security: smack: fix memleak in smk_write_rules_list()
xattr: Constify ->name member of "struct xattr".
...
Pull networking changes from David Miller:
"Noteworthy changes this time around:
1) Multicast rejoin support for team driver, from Jiri Pirko.
2) Centralize and simplify TCP RTT measurement handling in order to
reduce the impact of bad RTO seeding from SYN/ACKs. Also, when
both timestamps and local RTT measurements are available prefer
the later because there are broken middleware devices which
scramble the timestamp.
From Yuchung Cheng.
3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel
memory consumed to queue up unsend user data. From Eric Dumazet.
4) Add a "physical port ID" abstraction for network devices, from
Jiri Pirko.
5) Add a "suppress" operation to influence fib_rules lookups, from
Stefan Tomanek.
6) Add a networking development FAQ, from Paul Gortmaker.
7) Extend the information provided by tcp_probe and add ipv6 support,
from Daniel Borkmann.
8) Use RCU locking more extensively in openvswitch data paths, from
Pravin B Shelar.
9) Add SCTP support to openvswitch, from Joe Stringer.
10) Add EF10 chip support to SFC driver, from Ben Hutchings.
11) Add new SYNPROXY netfilter target, from Patrick McHardy.
12) Compute a rate approximation for sending in TCP sockets, and use
this to more intelligently coalesce TSO frames. Furthermore, add
a new packet scheduler which takes advantage of this estimate when
available. From Eric Dumazet.
13) Allow AF_PACKET fanouts with random selection, from Daniel
Borkmann.
14) Add ipv6 support to vxlan driver, from Cong Wang"
Resolved conflicts as per discussion.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits)
openvswitch: Fix alignment of struct sw_flow_key.
netfilter: Fix build errors with xt_socket.c
tcp: Add missing braces to do_tcp_setsockopt
caif: Add missing braces to multiline if in cfctrl_linkup_request
bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize
vxlan: Fix kernel panic on device delete.
net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls
net: mvneta: properly disable HW PHY polling and ensure adjust_link() works
icplus: Use netif_running to determine device state
ethernet/arc/arc_emac: Fix huge delays in large file copies
tuntap: orphan frags before trying to set tx timestamp
tuntap: purge socket error queue on detach
qlcnic: use standard NAPI weights
ipv6:introduce function to find route for redirect
bnx2x: VF RSS support - VF side
bnx2x: VF RSS support - PF side
vxlan: Notify drivers for listening UDP port changes
net: usbnet: update addr_assign_type if appropriate
driver/net: enic: update enic maintainers and driver
driver/net: enic: Exposing symbols for Cisco's low latency driver
...
CONFIG_DEBUG_KOBJECT_RELEASE which may be theoretical.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSJsL7AAoJENkgDmzRrbjx/N0P+wU81L55uYdJnDVPsW/fzhyJ
BEZtmhFWtqxHeuFMQpwm5N3/ORwMht4czF4Hf957ePtI2PelHu+kapnFFQ+/KHZs
T5sjH0mAYf9+CPa8wj4OKVzvd8lH4udbV+E7INVfciDySRX5HKXynJ6pZHvfeu7y
/2q2PH6kGzuGoTEkDwOOwJ5yyZqs+RW9ZkSMStgCOUE8GmoDXEsH1KwIYE/9buCh
XTngMo7AhimQaQ9QKypJLjlcnI9X/9ljXqFRKqSFOeMA1Ba+h+7eUqd4FJI6jDJu
tecMOxX9PezPK6Wdg8V7AFBSzOhDPqoKQBOcaqeLd1wVICi8oQirVzwQNlsoiVNu
JC+8rDqaeuG3dazROhaAnez7nhHfTjnMsYLVMRUmYtqXetd0qWYSmlmcJRKSJwi4
okl/Lv5BroQdB9bB8+sc7l34nE3HXZGV3tJcNXf91NNEbDt97xFZ2YYbdtsQcOqj
igUHcjsZq1gLsdIhnkHjTGLkLPxMTdCi8mtUc9+uzXHvSPJqEUMcn+fEA7RdliUw
/WvpUX2tj2Al44LfBy6L0D6AVyS2/zIQ9PzH6FHsgVDqrNHomkF20w3btnM3yyPA
hakV6vPr+kOpfSJYlTSU7yhEJh+LGkfPXeaX4X3tqubKhsZjq8rS7vrfbcqmgwvT
DbzDKRx5R3URiiFSbb2v
=15lp
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Minor fixes mainly, including a potential use-after-free on remove
found by CONFIG_DEBUG_KOBJECT_RELEASE which may be theoretical"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: Fix mod->mkobj.kobj potentially freed too early
kernel/params.c: use scnprintf() instead of sprintf()
kernel/module.c: use scnprintf() instead of sprintf()
module/lsm: Have apparmor module parameters work with no args
module: Add NOARG flag for ops with param_set_bool_enable_only() set function
module: Add flag to allow mod params to have no arguments
modules: add support for soft module dependencies
scripts/mod/modpost.c: permit '.cranges' secton for sh64 architecture.
module: fix sprintf format specifier in param_get_byte()
Pull cgroup updates from Tejun Heo:
"A lot of activities on the cgroup front. Most changes aren't visible
to userland at all at this point and are laying foundation for the
planned unified hierarchy.
- The biggest change is decoupling the lifetime management of css
(cgroup_subsys_state) from that of cgroup's. Because controllers
(cpu, memory, block and so on) will need to be dynamically enabled
and disabled, css which is the association point between a cgroup
and a controller may come and go dynamically across the lifetime of
a cgroup. Till now, css's were created when the associated cgroup
was created and stayed till the cgroup got destroyed.
Assumptions around this tight coupling permeated through cgroup
core and controllers. These assumptions are gradually removed,
which consists bulk of patches, and css destruction path is
completely decoupled from cgroup destruction path. Note that
decoupling of creation path is relatively easy on top of these
changes and the patchset is pending for the next window.
- cgroup has its own event mechanism cgroup.event_control, which is
only used by memcg. It is overly complex trying to achieve high
flexibility whose benefits seem dubious at best. Going forward,
new events will simply generate file modified event and the
existing mechanism is being made specific to memcg. This pull
request contains prepatory patches for such change.
- Various fixes and cleanups"
Fixed up conflict in kernel/cgroup.c as per Tejun.
* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
cgroup: fix cgroup_css() invocation in css_from_id()
cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
cgroup: implement CFTYPE_NO_PREFIX
cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
cgroup: fix cgroup_write_event_control()
cgroup: fix subsystem file accesses on the root cgroup
cgroup: change cgroup_from_id() to css_from_id()
cgroup: use css_get() in cgroup_create() to check CSS_ROOT
cpuset: remove an unncessary forward declaration
cgroup: RCU protect each cgroup_subsys_state release
cgroup: move subsys file removal to kill_css()
cgroup: factor out kill_css()
cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
cgroup: replace cgroup->css_kill_cnt with ->nr_css
cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
cgroup: move cgroup->subsys[] assignment to online_css()
cgroup: reorganize css init / exit paths
cgroup: add __rcu modifier to cgroup->subsys[]
...
We allow task A to change B's nice level if it has a supserset of
B's privileges, or of it has CAP_SYS_NICE. Also allow it if A has
CAP_SYS_NICE with respect to B - meaning it is root in the same
namespace, or it created B's namespace.
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
As the capabilites and capability bounding set are per user namespace
properties it is safe to allow changing them with just CAP_SETPCAP
permission in the user namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This reverts commit 308ab70c46.
It breaks my FC6 test box. /dev/pts is not mounted. dmesg says
SELinux: mount invalid. Same superblock, different security settings
for (dev devpts, type devpts)
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Not considering sub filesystem has the following limitation. Support
for SELinux in FUSE is dependent on the particular userspace
filesystem, which is identified by the subtype. For e.g, GlusterFS,
a FUSE based filesystem supports SELinux (by mounting and processing
FUSE requests in different threads, avoiding the mount time
deadlock), whereas other FUSE based filesystems (identified by a
different subtype) have the mount time deadlock.
By considering the subtype of the filesytem in the SELinux policies,
allows us to specify a filesystem subtype, in the following way:
fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);
This way not all FUSE filesystems are put in the same bucket and
subjected to the limitations of the other subtypes.
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The apparmor module parameters for param_ops_aabool and
param_ops_aalockpolicy are both based off of the param_ops_bool,
and can handle a NULL value passed in as val. Have it enable the
new KERNEL_PARAM_FL_NOARGS flag to allow the parameters to be set
without having to state "=y" or "=1".
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Provide userspace the ability to introspect a sha1 hash value for each
profile currently loaded.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Add the dynamic namespace relative profiles file to the interace, to allow
introspection of loaded profiles and their modes.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Add the ability to take in and report a human readable profile attachment
string for profiles so that attachment specifications can be easily
inspected.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
Add basic interface files to access namespace and profile information.
The interface files are created when a profile is loaded and removed
when the profile or namespace is removed.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Allow emulating the default profile behavior from boot, by allowing
loading of a profile in the unconfined state into a new NS.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
namespaces now completely use the unconfined profile to track the
refcount and rcu freeing cycle. So rework the code to simplify (track
everything through the profile path right up to the end), and move the
rcu_head from policy base to profile as the namespace no longer needs
it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
ns->unconfined is being used read side without locking, nor rcu but is
being updated when a namespace is removed. This works for the root ns
which is never removed but has a race window and can cause failures when
children namespaces are removed.
Also ns and ns->unconfined have a circular refcounting dependency that
is problematic and must be broken. Currently this is done incorrectly
when the namespace is destroyed.
Fix this by forward referencing unconfined via the replacedby infrastructure
instead of directly updating the ns->unconfined pointer.
Remove the circular refcount dependency by making the ns and its unconfined
profile share the same refcount.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
remove the use of replaced by chaining and move to profile invalidation
and lookup to handle task replacement.
Replacement chaining can result in large chains of profiles being pinned
in memory when one profile in the chain is use. With implicit labeling
this will be even more of a problem, so move to a direct lookup method.
Signed-off-by: John Johansen <john.johansen@canonical.com>
previously profiles had to be loaded one at a time, which could result
in cases where a replacement of a set would partially succeed, and then fail
resulting in inconsistent policy.
Allow multiple profiles to replaced "atomically" so that the replacement
either succeeds or fails for the entire set of profiles.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Add a policy directory to features to contain features that can affect
policy compilation but do not affect mediation. Eg of such features would
be types of dfa compression supported, etc.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
This is a follow-up to commit b5b3ee6c "apparmor: no need to delay vfree()".
Since vmalloc() will do "size = PAGE_ALIGN(size);",
we don't need to check for "size >= sizeof(struct work_struct)".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Smack interface for loading rules has always parsed only single rule from
data written to it. This requires user program to call one write() per
each rule it wants to load.
This change makes it possible to write multiple rules, separated by new
line character. Smack will load at most PAGE_SIZE-1 characters and properly
return number of processed bytes. In case when user buffer is larger, it
will be additionally truncated. All characters after last \n will not get
parsed to avoid partial rule near input buffer boundary.
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration. The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.
While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.
The conversions are mostly straight-forward. If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration. If not, if (pos == origin) continue; is added. Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest. While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API. For hierarchy iterators, this is beneficial because
* In most cases, css is the only thing subsystems care about anyway.
* On the planned unified hierarchy, iterations for different
subsystems will need to skip over different subtrees of the
hierarchy depending on which subsystems are enabled on each cgroup.
Passing around css makes it unnecessary to explicitly specify the
subsystem in question as css is intersection between cgroup and
subsystem
* For the planned unified hierarchy, css's would need to be created
and destroyed dynamically independent from cgroup hierarchy. Having
cgroup core manage css iteration makes enforcing deref rules a lot
easier.
Most subsystem conversions are straight-forward. Noteworthy changes
are
* blkio: cgroup_to_blkcg() is no longer used. Removed.
* freezer: cgroup_freezer() is no longer used. Removed.
* devices: cgroup_to_devcgroup() is no longer used. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.
This patch converts all cftype file operations to take @css instead of
@cgroup. cftypes for the cgroup core files don't have their subsytem
pointer set. These will automatically use the dummy_css added by the
previous patch and can be converted the same way.
Most subsystem conversions are straight forwards but there are some
interesting ones.
* freezer: update_if_frozen() is also converted to take @css instead
of @cgroup for consistency. This will make the code look simpler
too once iterators are converted to use css.
* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
vmpressure while mem_cgroup_from_cont() can be made static.
Updated accordingly.
* cpu: cgroup_tg() doesn't have any user left. Removed.
* cpuacct: cgroup_ca() doesn't have any user left. Removed.
* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
Removed.
* net_cls: cgrp_cls_state() doesn't have any user left. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.
* With unified hierarchy, subsystems will be dynamically bound and
unbound from cgroups and thus css's (cgroup_subsys_state) may be
created and destroyed dynamically over the lifetime of a cgroup,
which is different from the current state where all css's are
allocated and destroyed together with the associated cgroup. This
in turn means that cgroup_css() should be synchronized and may
return NULL, making it more cumbersome to use.
* Differing levels of per-subsystem granularity in the unified
hierarchy means that the task and descendant iterators should behave
differently depending on the specific subsystem the iteration is
being performed for.
* In majority of the cases, subsystems only care about its part in the
cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods
often obtain the matching css pointer from the cgroup and don't
bother with the cgroup pointer itself. Passing around css fits
much better.
This patch converts all cgroup_subsys methods to take @css instead of
@cgroup. The conversions are mostly straight-forward. A few
noteworthy changes are
* ->css_alloc() now takes css of the parent cgroup rather than the
pointer to the new cgroup as the css for the new cgroup doesn't
exist yet. Knowing the parent css is enough for all the existing
subsystems.
* In kernel/cgroup.c::offline_css(), unnecessary open coded css
dereference is replaced with local variable access.
This patch shouldn't cause any behavior differences.
v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
with local variable @css as suggested by Li Zefan.
Rebased on top of new for-3.12 which includes for-3.11-fixes so
that ->css_free() invocation added by da0a12caff ("cgroup: fix a
leak when percpu_ref_init() fails") is converted too. Suggested
by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css. cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.
This patch implements css_parent() which, given a css, returns its
parent. The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.
freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.
* __parent_ca() is dropped from cpuacct and its usage is replaced with
parent_ca(). The only difference between the two was NULL test on
cgroup->parent which is now embedded in css_parent() making the
distinction moot. Note that eventually a css->parent field will be
added to css and the NULL check in css_parent() will go away.
This patch shouldn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure. Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast. As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.
All accessors explicitly handle NULL input and return NULL in those
cases. While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.
* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
accessor. Added.
* memory, hugetlb and devices already had one but didn't explicitly
handle NULL input. Updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.
We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward. Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.
Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css(). This patch is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
The original implementation of the Smack IPv6 port based
local controls works most of the time using a sockaddr as
a temporary variable, but not always as it overflows in
some circumstances. The correct data is a sockaddr_in6.
A struct sockaddr isn't as large as a struct sockaddr_in6.
There would need to be casting one way or the other. This
patch gets it the right way.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The Smack code that matches incoming CIPSO tags with Smack labels
reaches through the NetLabel interfaces and compares the network
data with the CIPSO header associated with a Smack label. This was
done in a ill advised attempt to optimize performance. It works
so long as the categories fit in a single capset, but this isn't
always the case.
This patch changes the Smack code to use the appropriate NetLabel
interfaces to compare the incoming CIPSO header with the CIPSO
header associated with a label. It will always match the CIPSO
headers correctly.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Accepted for the smack-next tree after changing the number of
slots from 128 to 16.
This patch adds a hash table to quicken searching of a smack label by its name.
Basically, the patch improves performance of SMACK initialization. Parsing of
rules involves translation from a string to a smack_known (aka label) entity
which is done in smk_find_entry().
The current implementation of the function iterates over a global list of
smack_known resulting in O(N) complexity for smk_find_entry(). The total
complexity of SMACK initialization becomes O(rules * labels). Therefore it
scales quadratically with a complexity of a system.
Applying the patch reduced the complexity of smk_find_entry() to O(1) as long
as number of label is in hundreds. If the number of labels is increased please
update SMACK_HASH_SLOTS constant defined in security/smack/smack.h. Introducing
the configuration of this constant with Kconfig or cmdline might be a good
idea.
The size of the hash table was adjusted experimentally. The rule set used by
TIZEN contains circa 17K rules for 500 labels. The table above contains
results of SMACK initialization using 'time smackctl apply' bash command.
The 'Ref' is a kernel without this patch applied. The consecutive values
refers to value of SMACK_HASH_SLOTS. Every measurement was repeated three
times to reduce noise.
| Ref | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512
--------------------------------------------------------------------------------------------
Run1 | 1.156 | 1.096 | 0.883 | 0.764 | 0.692 | 0.667 | 0.649 | 0.633 | 0.634 | 0.629 | 0.620
Run2 | 1.156 | 1.111 | 0.885 | 0.764 | 0.694 | 0.661 | 0.649 | 0.651 | 0.634 | 0.638 | 0.623
Run3 | 1.160 | 1.107 | 0.886 | 0.764 | 0.694 | 0.671 | 0.661 | 0.638 | 0.631 | 0.624 | 0.638
AVG | 1.157 | 1.105 | 0.885 | 0.764 | 0.693 | 0.666 | 0.653 | 0.641 | 0.633 | 0.630 | 0.627
Surprisingly, a single hlist is slightly faster than a double-linked list.
The speed-up saturates near 64 slots. Therefore I chose value 128 to provide
some margin if more labels were used.
It looks that IO becomes a new bottleneck.
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
The smack_parsed_rule structure is allocated. If a rule is successfully
installed then the last reference to the object is lost. This patch fixes this
leak. Moreover smack_parsed_rule is allocated on stack because it no longer
needed ofter smk_write_rules_list() is finished.
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:
- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
entries even when the policy is only applied for one address family.
Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the packet class in SELinux is not checked if there are no
SECMARK rules in the security or mangle netfilter tables. Some systems
prefer that packets are always checked, for example, to protect the system
should the netfilter rules fail to load or if the nefilter rules
were maliciously flushed.
Add the always_check_network policy capability which, when enabled, treats
SECMARK as enabled, even if there are no netfilter SECMARK rules and
treats peer labeling as enabled, even if there is no Netlabel or
labeled IPSEC configuration.
Includes definition of "redhat1" SELinux policy capability, which
exists in the SELinux userpace library, to keep ordering correct.
The SELinux userpace portion of this was merged last year, but this kernel
change fell on the floor.
Signed-off-by: Chris PeBenito <cpebenito@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
When the BUG() macro is disabled at compile time it can cause some
problems in the SELinux netnode code: invalid return codes and
uninitialized variables. This patch fixes this by making sure we take
some corrective action after the BUG() macro.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Use a helper to determine if a superblock should have the seclabel flag
rather than doing it in the function. I'm going to use this in the
security server as well.
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than passing pointers to memory locations, strings, and other
stuff just give up on the separation and give security_fs_use the
superblock. It just makes the code easier to read (even if not easier to
reuse on some other OS)
Signed-off-by: Eric Paris <eparis@redhat.com>
Instead of having special code around the 'non-mount' seclabel mount option
just handle it like the mount options.
Signed-off-by: Eric Paris <eparis@redhat.com>
We only have 6 options, so char is good enough, but use a short as that
packs nicely. This shrinks the superblock_security_struct just a little
bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
Just to make it clear that we have mount time options and flags,
separate them. Since I decided to move the non-mount options above
above 0x10, we need a short instead of a char. (x86 padding says
this takes up no additional space as we have a 3byte whole in the
structure)
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently we set the initialize and seclabel flag in one place. Do some
unrelated printk then we unset the seclabel flag. Eww. Instead do the flag
twiddling in one place in the code not seperated by unrelated printk. Also
don't set and unset the seclabel flag. Only set it if we need to.
Signed-off-by: Eric Paris <eparis@redhat.com>
We had this random hard coded value of '8' in the code (I put it there)
for the number of bits to check for mount options. This is stupid. Instead
use the #define we already have which tells us the number of mount
options.
Signed-off-by: Eric Paris <eparis@redhat.com>
We check if the fsname is proc and if so set the proc superblock security
struct flag. We then check if the flag is set and use the string 'proc'
for the fsname instead of just using the fsname. What's the point? It's
always proc... Get rid of the useless conditional.
Signed-off-by: Eric Paris <eparis@redhat.com>
The /sys/fs/selinux/policy file is not valid on big endian systems like
ppc64 or s390. Let's see why:
static int hashtab_cnt(void *key, void *data, void *ptr)
{
int *cnt = ptr;
*cnt = *cnt + 1;
return 0;
}
static int range_write(struct policydb *p, void *fp)
{
size_t nel;
[...]
/* count the number of entries in the hashtab */
nel = 0;
rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
if (rc)
return rc;
buf[0] = cpu_to_le32(nel);
rc = put_entry(buf, sizeof(u32), 1, fp);
So size_t is 64 bits. But then we pass a pointer to it as we do to
hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with
the first 4 bytes. On x86_64 which is little endian, those first 4
bytes and the least significant, so this works out fine. On ppc64/s390
those first 4 bytes of memory are the high order bits. So at the end of
the call to hashtab_map nel has a HUGE number. But the least
significant 32 bits are all 0's.
We then pass that 64 bit number to cpu_to_le32() which happily truncates
it to a 32 bit number and does endian swapping. But the low 32 bits are
all 0's. So no matter how many entries are in the hashtab, big endian
systems always say there are 0 entries because I screwed up the
counting.
The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
rootfs (ramfs) can support setting of security contexts
by userspace due to the vfs fallback behavior of calling
the security module to set the in-core inode state
for security.* attributes when the filesystem does not
provide an xattr handler. No xattr handler required
as the inodes are pinned in memory and have no backing
store.
This is useful in allowing early userspace to label individual
files within a rootfs while still providing a policy-defined
default via genfs.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.
On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.
This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (< 2) as more bits are available in each node.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:
5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit
1.96% ls [kernel.kallsyms] [k] mls_level_isvalid
1.95% ls [kernel.kallsyms] [k] find_next_bit
The ebitmap_get_bit() was the hottest function in the perf-report
output. Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.
Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.
The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.
0.07% ls [kernel.kallsyms] [k] ebitmap_contains
0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit
0.01% ls [kernel.kallsyms] [k] mls_level_isvalid
0.01% ls [kernel.kallsyms] [k] find_next_bit
The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +0.9% | +2.6% |
+--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Remove the BUG_ON() from selinux_skb_xfrm_sid() and propogate the
error code up to the caller. Also check the return values in the
only caller function, selinux_skb_peerlbl_sid().
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Remove the unused get_sock_isec() function and do some formatting
fixes.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The SELinux labeled IPsec code state management functions have been
long neglected and could use some cleanup and consolidation.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The xfrm_state_alloc_security() LSM hook implementation is really a
multiplexed hook with two different behaviors depending on the
arguments passed to it by the caller. This patch splits the LSM hook
implementation into two new hook implementations, which match the
LSM hooks in the rest of the kernel:
* xfrm_state_alloc
* xfrm_state_alloc_acquire
Also included in this patch are the necessary changes to the SELinux
code; no other LSMs are affected.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Since everybody sets kstrdup()ed constant string to "struct xattr"->name but
nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure
checking by constifying ->name member of "struct xattr".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Joel Becker <jlbec@evilplan.org> [ocfs2]
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull nfsd changes from Bruce Fields:
"Changes this time include:
- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.
Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().
Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a9 ("Merge branch 'll_poll'")
From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.
3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.
4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.
5) Allow controlling network device VF link state using netlink, from
Rony Efraim.
6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.
8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.
9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.
10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.
11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.
13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.
14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.
15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.
16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.
17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.
18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.
19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.
20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.
21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.
23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.
24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.
25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.
26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.
28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.
29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.
30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...
Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
add support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
+4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
R2mLTRhKu1DKguTzO13f
=wGjn
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."
* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...
Pull security subsystem updates from James Morris:
"In this update, Smack learns to love IPv6 and to mount a filesystem
with a transmutable hierarchy (i.e. security labels are inherited
from parent directory upon creation rather than creating process).
The rest of the changes are maintenance"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (37 commits)
tpm/tpm_i2c_infineon: Remove unused header file
tpm: tpm_i2c_infinion: Don't modify i2c_client->driver
evm: audit integrity metadata failures
integrity: move integrity_audit_msg()
evm: calculate HMAC after initializing posix acl on tmpfs
maintainers: add Dmitry Kasatkin
Smack: Fix the bug smackcipso can't set CIPSO correctly
Smack: Fix possible NULL pointer dereference at smk_netlbl_mls()
Smack: Add smkfstransmute mount option
Smack: Improve access check performance
Smack: Local IPv6 port based controls
tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
maintainers: Remove Kent from maintainers
tpm: move TPM_DIGEST_SIZE defintion
tpm_tis: missing platform_driver_unregister() on error in init_tis()
security: clarify cap_inode_getsecctx description
apparmor: no need to delay vfree()
apparmor: fix fully qualified name parsing
apparmor: fix setprocattr arg processing for onexec
apparmor: localize getting the security context to a few macros
...
Pull second set of VFS changes from Al Viro:
"Assorted f_pos race fixes, making do_splice_direct() safe to call with
i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
->d_hash/->d_compare calling conventions changes from Linus, misc
stuff all over the place."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
Document ->tmpfile()
ext4: ->tmpfile() support
vfs: export lseek_execute() to modules
lseek_execute() doesn't need an inode passed to it
block_dev: switch to fixed_size_llseek()
cpqphp_sysfs: switch to fixed_size_llseek()
tile-srom: switch to fixed_size_llseek()
proc_powerpc: switch to fixed_size_llseek()
ubi/cdev: switch to fixed_size_llseek()
pci/proc: switch to fixed_size_llseek()
isapnp: switch to fixed_size_llseek()
lpfc: switch to fixed_size_llseek()
locks: give the blocked_hash its own spinlock
locks: add a new "lm_owner_key" lock operation
locks: turn the blocked_list into a hashtable
locks: convert fl_link to a hlist_node
locks: avoid taking global lock if possible when waking up blocked waiters
locks: protect most of the file_lock handling with i_lock
locks: encapsulate the fl_link list handling
locks: make "added" in __posix_lock_file a bool
...
Pull cgroup changes from Tejun Heo:
"This pull request contains the following changes.
- cgroup_subsys_state (css) reference counting has been converted to
percpu-ref. css is what each resource controller embeds into its
own control structure and perform reference count against. It may
be used in hot paths of various subsystems and is similar to module
refcnt in that aspect. For example, block-cgroup's css refcnting
was showing up a lot in Mikulaus's device-mapper scalability work
and this should alleviate it.
- cgroup subtree iterator has been updated so that RCU read lock can
be released after grabbing reference. This allows simplifying its
users which requires blocking which used to build iteration list
under RCU read lock and then traverse it outside. This pull
request contains simplification of cgroup core and device-cgroup.
A separate pull request will update cpuset.
- Fixes for various bugs including corner race conditions and RCU
usage bugs.
- A lot of cleanups and some prepartory work for the planned unified
hierarchy support."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (48 commits)
cgroup: CGRP_ROOT_SUBSYS_BOUND should also be ignored when mounting an existing hierarchy
cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options
cgroup: fix deadlock on cgroup_mutex via drop_parsed_module_refcounts()
cgroup: always use RCU accessors for protected accesses
cgroup: fix RCU accesses around task->cgroups
cgroup: fix RCU accesses to task->cgroups
cgroup: grab cgroup_mutex in drop_parsed_module_refcounts()
cgroup: fix cgroupfs_root early destruction path
cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy
cgroup: implement for_each_[builtin_]subsys()
cgroup: move init_css_set initialization inside cgroup_mutex
cgroup: s/for_each_subsys()/for_each_root_subsys()/
cgroup: clean up find_css_set() and friends
cgroup: remove cgroup->actual_subsys_mask
cgroup: prefix global variables with "cgroup_"
cgroup: convert CFTYPE_* flags to enums
cgroup: rename cont to cgrp
cgroup: clean up cgroup_serial_nr_cursor
cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre()
cgroup: make serial_nr_cursor available throughout cgroup.c
...
Create a file_path_has_perm() function that is like path_has_perm() but
instead takes a file struct that is the source of both the path and the
inode (rather than getting the inode from the dentry in the path). This
is then used where appropriate.
This will be useful for situations like unionmount where it will be
possible to have an apparently-negative dentry (eg. a fallthrough) that is
open with the file struct pointing to an inode on the lower fs.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace a bunch of file->dentry->d_inode refs with file_inode().
In __fput(), use file->f_inode instead so as not to be affected by any tricks
that file_inode() might grow.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Before modifying an EVM protected extended attribute or any other
metadata included in the HMAC calculation, the existing 'security.evm'
is verified. This patch adds calls to integrity_audit_msg() to audit
integrity metadata failures.
Reported-by: Sven Vermeulen <sven.vermeulen@siphos.be>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch moves the integrity_audit_msg() function and defintion to
security/integrity/, the parent directory, renames the 'ima_audit'
boot command line option to 'integrity_audit', and fixes the Kconfig
help text to reflect the actual code.
Changelog:
- Fixed ifdef inclusion of integrity_audit_msg() (Fengguang Wu)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The existing NFSv4 xattr handlers do not accept xattr calls to the security
namespace. This patch extends these handlers to accept xattrs from the security
namespace in addition to the default NFSv4 ACL namespace.
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch implements the client transport and handling support for labeled
NFS. The patch adds two functions to encode and decode the security label
recommended attribute which makes use of the LSM hooks added earlier. It also
adds code to grab the label from the file attribute structures and encode the
label to be sent back to the server.
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There currently doesn't exist a labeling type that is adequate for use with
labeled NFS. Since NFS doesn't really support xattrs we can't use the use xattr
labeling behavior. For this we developed a new labeling type. The native
labeling type is used solely by NFS to ensure NFS inodes are labeled at runtime
by the NFS code instead of relying on the SELinux security server on the client
end.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no way to differentiate if a text mount option is passed from user
space or the kernel. A flags field is being added to the
security_sb_set_mnt_opts hook to allow for in kernel security flags to be sent
to the LSM for processing in addition to the text options received from mount.
This patch also updated existing code to fix compilation errors.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The interface to request security labels from user space is the xattr
interface. When requesting the security label from an NFS server it is
important to make sure the requested xattr actually is a MAC label. This allows
us to make sure that we get the desired semantics from the attribute instead of
something else such as capabilities or a time based LSM.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is a time where we need to calculate a context without the
inode having been created yet. To do this we take the negative dentry and
calculate a context based on the process and the parent directory contexts.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.
This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge. Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug report: https://tizendev.org/bugs/browse/TDIS-3891
The reason is userspace libsmack only use "smackfs/cipso2" long-label interface,
but the code's logical is still for orginal fixed length label. Now update
smack_cipso_apply() to support flexible label (<=256 including tailing '\0')
There is also a bug in kernel/security/smack/smackfs.c:
When smk_set_cipso() parsing the CIPSO setting from userspace, the offset of
CIPSO level should be "strlen(label)+1" instead of "strlen(label)"
Signed-off-by: Passion,Zhao <passion.zhao@intel.com>
The SELinux labeled IPsec code was improperly handling its reference
counting, dropping a reference on a delete operation instead of on a
free/release operation.
Reported-by: Ondrej Moris <omoris@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppliment the smkfsroot mount option with another, smkfstransmute,
that does the same thing but also marks the root inode as
transmutting. This allows a freshly created filesystem to
be mounted with a transmutting heirarchy.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Each Smack label that the kernel has seen is added to a
list of labels. The list of access rules for a given subject
label hangs off of the label list entry for the label.
This patch changes the structures that contain subject
labels to point at the label list entry rather that the
label itself. Doing so removes a label list lookup in
smk_access() that was accounting for the largest single
chunk of Smack overhead.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Smack does not provide access controls on IPv6 communications.
This patch introduces a mechanism for maintaining Smack lables
for local IPv6 communications. It is based on labeling local ports.
The behavior should be compatible with any future "real" IPv6
support as it provides no interfaces for users to manipulate
the labeling. Remote IPv6 connections use the ambient label
the same way that unlabeled IPv4 packets are treated.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
During a config change, propagate_exception() needs to traverse the
subtree to update config on the subtree. Because such config updates
need to allocate memory, it couldn't directly use
cgroup_for_each_descendant_pre() which required the whole iteration to
be contained in a single RCU read critical section. To work around
the limitation, propagate_exception() built a linked list of
descendant cgroups while read-locking RCU and then walked the list
afterwards, which is safe as the whole iteration is protected by
devcgroup_mutex. This works but is cumbersome.
With the recent updates, cgroup iterators now allow dropping RCU read
lock while iteration is in progress making this workaround no longer
necessary. This patch replaces dev_cgroup->propagate_pending list and
get_online_devcg() with direct cgroup_for_each_descendant_pre() walk.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
We shouldn't be returning success from this function without also
filling in the return values ctx and ctxlen.
Note currently this doesn't appear to cause bugs since the only
inode_getsecctx caller I can find is fs/sysfs/inode.c, which only calls
this if security_inode_setsecurity succeeds. Assuming
security_inode_setsecurity is set to cap_inode_setsecurity whenever
inode_getsecctx is set to cap_inode_getsecctx, this function can never
actually called.
So I noticed this only because the server labeled NFS patches add a real
caller.
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
vfree() can be called from interrupt contexts now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):
1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.
2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.
3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.
4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.
5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.
6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.
Use it to support wake-on-lan in mv643xx_eth.
From Michael Stapelberg.
7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.
8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.
9) Add support for T5 cxgb4 chips, from Santosh Rastapur.
10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configurating various aspects of the endpoints.
From David Stevens.
11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
from Dmitry Kravkov.
12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
Neira Ayuso.
13) Start adding networking selftests.
14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.
15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.
16) Convert several drivers over to mdoule_platform_driver(), from
Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.
18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.
19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.
20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.
21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.
22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.
23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.
24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.
25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.
26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.
27) Convert IPVS schedulers to RCU, also from Julian Anastasov.
28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.
29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.
30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.
31) Support several new r8169 chips, from Hayes Wang.
32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.
33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.
34) Add 802.1ad vlan offload support, from Patrick McHardy.
35) Support mmap() based netlink communication, also from Patrick
McHardy.
36) Support HW timestamping in mlx4 driver, from Amir Vadai.
37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.
38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.
39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...
Merge third batch of fixes from Andrew Morton:
"Most of the rest. I still have two large patchsets against AIO and
IPC, but they're a bit stuck behind other trees and I'm about to
vanish for six days.
- random fixlets
- inotify
- more of the MM queue
- show_stack() cleanups
- DMI update
- kthread/workqueue things
- compat cleanups
- epoll udpates
- binfmt updates
- nilfs2
- hfs
- hfsplus
- ptrace
- kmod
- coredump
- kexec
- rbtree
- pids
- pidns
- pps
- semaphore tweaks
- some w1 patches
- relay updates
- core Kconfig changes
- sysrq tweaks"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
Documentation/sysrq: fix inconstistent help message of sysrq key
ethernet/emac/sysrq: fix inconstistent help message of sysrq key
sparc/sysrq: fix inconstistent help message of sysrq key
powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
ARM/etm/sysrq: fix inconstistent help message of sysrq key
power/sysrq: fix inconstistent help message of sysrq key
kgdb/sysrq: fix inconstistent help message of sysrq key
lib/decompress.c: fix initconst
notifier-error-inject: fix module names in Kconfig
kernel/sys.c: make prctl(PR_SET_MM) generally available
UAPI: remove empty Kbuild files
menuconfig: print more info for symbol without prompts
init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
kconfig menu: move Virtualization drivers near other virtualization options
Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
relay: use macro PAGE_ALIGN instead of FIX_SIZE
kernel/relay.c: move FIX_SIZE macro into relay.c
kernel/relay.c: remove unused function argument actor
drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
...
Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). In case there's an OOM in this last
function the cleanup function may not be called - in this case we would
miss a call to key_put().
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem update from James Morris:
"Just some minor updates across the subsystem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: eliminate passing d_name.name to process_measurement()
TPM: Retry SaveState command in suspend path
tpm/tpm_i2c_infineon: Add small comment about return value of __i2c_transfer
tpm/tpm_i2c_infineon.c: Add OF attributes type and name to the of_device_id table entries
tpm_i2c_stm_st33: Remove duplicate inclusion of header files
tpm: Add support for new Infineon I2C TPM (SLB 9645 TT 1.2 I2C)
char/tpm: Convert struct i2c_msg initialization to C99 format
drivers/char/tpm/tpm_ppi: use strlcpy instead of strncpy
tpm/tpm_i2c_stm_st33: formatting and white space changes
Smack: include magic.h in smackfs.c
selinux: make security_sb_clone_mnt_opts return an error on context mismatch
seccomp: allow BPF_XOR based ALU instructions.
Fix NULL pointer dereference in smack_inode_unlink() and smack_inode_rmdir()
Smack: add support for modification of existing rules
smack: SMACK_MAGIC to include/uapi/linux/magic.h
Smack: add missing support for transmute bit in smack_str_from_perm()
Smack: prevent revoke-subject from failing when unseen label is written to it
tomoyo: use DEFINE_SRCU() to define tomoyo_ss
tomoyo: use DEFINE_SRCU() to define tomoyo_ss
Pull cgroup updates from Tejun Heo:
- Fixes and a lot of cleanups. Locking cleanup is finally complete.
cgroup_mutex is no longer exposed to individual controlelrs which
used to cause nasty deadlock issues. Li fixed and cleaned up quite a
bit including long standing ones like racy cgroup_path().
- device cgroup now supports proper hierarchy thanks to Aristeu.
- perf_event cgroup now supports proper hierarchy.
- A new mount option "__DEVEL__sane_behavior" is added. As indicated
by the name, this option is to be used for development only at this
point and generates a warning message when used. Unfortunately,
cgroup interface currently has too many brekages and inconsistencies
to implement a consistent and unified hierarchy on top. The new flag
is used to collect the behavior changes which are necessary to
implement consistent unified hierarchy. It's likely that this flag
won't be used verbatim when it becomes ready but will be enabled
implicitly along with unified hierarchy.
The option currently disables some of broken behaviors in cgroup core
and also .use_hierarchy switch in memcg (will be routed through -mm),
which can be used to make very unusual hierarchy where nesting is
partially honored. It will also be used to implement hierarchy
support for blk-throttle which would be impossible otherwise without
introducing a full separate set of control knobs.
This is essentially versioning of interface which isn't very nice but
at this point I can't see any other options which would allow keeping
the interface the same while moving towards hierarchy behavior which
is at least somewhat sane. The planned unified hierarchy is likely
to require some level of adaptation from userland anyway, so I think
it'd be best to take the chance and update the interface such that
it's supportable in the long term.
Maintaining the existing interface does complicate cgroup core but
shouldn't put too much strain on individual controllers and I think
it'd be manageable for the foreseeable future. Maybe we'll be able
to drop it in a decade.
Fix up conflicts (including a semantic one adding a new #include to ppc
that was uncovered by header the file changes) as per Tejun.
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
cpuset: fix compile warning when CONFIG_SMP=n
cpuset: fix cpu hotplug vs rebuild_sched_domains() race
cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
cgroup: restore the call to eventfd->poll()
cgroup: fix use-after-free when umounting cgroupfs
cgroup: fix broken file xattrs
devcg: remove parent_cgroup.
memcg: force use_hierarchy if sane_behavior
cgroup: remove cgrp->top_cgroup
cgroup: introduce sane_behavior mount option
move cgroupfs_root to include/linux/cgroup.h
cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
cgroup: make cgroup_path() not print double slashes
Revert "cgroup: remove bind() method from cgroup_subsys."
perf: make perf_event cgroup hierarchical
cgroup: implement cgroup_is_descendant()
cgroup: make sure parent won't be destroyed before its children
cgroup: remove bind() method from cgroup_subsys.
devcg: remove broken_hierarchy tag
cgroup: remove cgroup_lock_is_held()
...
currently apparmor name parsing is only correctly handling
:<NS>:<profile>
but
:<NS>://<profile>
is also a valid form and what is exported to userspace.
Signed-off-by: John Johansen <john.johansen@canonical.com>
the exec file isn't processing its command arg. It should only set be
responding to a command of exec.
Also cleanup setprocattr some more while we are at it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
smatch reports
error: potential NULL dereference 'ns'.
this can not actually occur because it relies on aa_split_fqname setting
both ns_name and name as null but ns_name will actually always have a
value in this case.
so remove the unnecessary if (ns_name) conditional that is resulting
in the false positive further down.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The audit type table is missing a comma so that KILLED comes out as
KILLEDAUTO.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Steve Beattie <sbeattie@ubuntu.com>
The top 8 bits of the base field have never been used, in fact can't
be used, by the current 'dfa16' format. However they will be used in the
future as flags, so mask them off when using base as an index value.
Note: the use of the top 8 bits, without masking is trapped by the verify
checks that base entries are within the size bounds.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Move the free_profile fn ahead of aa_alloc_profile so it can be used
in aa_alloc_profile without a forward declaration.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
The sid is not going to be a direct property of a profile anymore, instead
it will be directly related to the label, and the profile will pickup
a label back reference.
For null-profiles replace the use of sid with a per namespace unique
id.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Instead of limiting the setting of the processes limits to current,
relax this to tasks confined by the same profile, as the apparmor
controls for rlimits are at a profile level granularity.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Steve Beattie <sbeattie@ubuntu.com>
The "permipc" command is unused and unfinished, remove it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
-ESTALE used to be incorrectly used to indicate a disconnected path, when
name lookup failed. This was fixed in commit e1b0e444 to correctly return
-EACCESS, but the error to failure message mapping was not correctly updated
to reflect this change.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Steve Beattie <sbeattie@ubuntu.com>
When policy specifies a transition to a profile that is not currently
loaded, it result in exec being denied. However the failure is not being
audited correctly because the audit code is treating this as an allowed
permission and thus not reporting it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c
The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.
The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.
An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.
Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.
Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.
Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
In devcgroup_css_alloc(), there is no longer need for parent_cgroup.
bd2953ebbb("devcg: propagate local changes down the hierarchy") made
the variable parent_cgroup redundant. This patch removes parent_cgroup
from devcgroup_css_alloc().
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Passing a pointer to the dentry name, as a parameter to
process_measurement(), causes a race condition with rename() and
is unnecessary, as the dentry name is already accessible via the
file parameter.
In the normal case, we use the full pathname as provided by
brpm->filename, bprm->interp, or ima_d_path(). Only on ima_d_path()
failure, do we fallback to using the d_name.name, which points
either to external memory or d_iname.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Commit 90ba9b1986 (tcp: tcp_make_synack() can use alloc_skb())
broke certain SELinux/NetLabel configurations by no longer correctly
assigning the sock to the outgoing SYNACK packet.
Cost of atomic operations on the LISTEN socket is quite big,
and we would like it to happen only if really needed.
This patch introduces a new security_ops->skb_owned_by() method,
that is a void operation unless selinux is active.
Reported-by: Miroslav Vadkerti <mvadkert@redhat.com>
Diagnosed-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-security-module@vger.kernel.org
Acked-by: James Morris <james.l.morris@oracle.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Acked-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported for linux-next: Tree for Apr 2 (smack)
Add the required include for smackfs.c
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
I had the following problem reported a while back. If you mount the
same filesystem twice using NFSv4 with different contexts, then the
second context= option is ignored. For instance:
# mount server:/export /mnt/test1
# mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
# ls -dZ /mnt/test1
drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test1
# ls -dZ /mnt/test2
drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test2
When we call into SELinux to set the context of a "cloned" superblock,
it will currently just bail out when it notices that we're reusing an
existing superblock. Since the existing superblock is already set up and
presumably in use, we can't go overwriting its context with the one from
the "original" sb. Because of this, the second context= option in this
case cannot take effect.
This patch fixes this by turning security_sb_clone_mnt_opts into an int
return operation. When it finds that the "new" superblock that it has
been handed is already set up, it checks to see whether the contexts on
the old superblock match it. If it does, then it will just return
success, otherwise it'll return -EBUSY and emit a printk to tell the
admin why the second mount failed.
Note that this patch may cause casualties. The NFSv4 code relies on
being able to walk down to an export from the pseudoroot. If you mount
filesystems that are nested within one another with different contexts,
then this patch will make those mounts fail in new and "exciting" ways.
For instance, suppose that /export is a separate filesystem on the
server:
# mount server:/ /mnt/test1
# mount salusa:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
mount.nfs: an incorrect mount option was specified
...with the printk in the ring buffer. Because we *might* eventually
walk down to /mnt/test1/export, the mount is denied due to this patch.
The second mount needs the pseudoroot superblock, but that's already
present with the wrong context.
OTOH, if we mount these in the reverse order, then both mounts work,
because the pseudoroot superblock created when mounting /export is
discarded once that mount is done. If we then however try to walk into
that directory, the automount fails for the similar reasons:
# cd /mnt/test1/scratch/
-bash: cd: /mnt/test1/scratch: Device or resource busy
The story I've gotten from the SELinux folks that I've talked to is that
this is desirable behavior. In SELinux-land, mounting the same data
under different contexts is wrong -- there can be only one.
Cc: Steve Dickson <steved@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Conflicts:
net/mac80211/sta_info.c
net/wireless/core.h
Two minor conflicts in wireless. Overlapping additions of extern
declarations in net/wireless/core.h and a bug fix overlapping with
the addition of a boolean parameter to __ieee80211_key_free().
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull userns fixes from Eric W Biederman:
"The bulk of the changes are fixing the worst consequences of the user
namespace design oversight in not considering what happens when one
namespace starts off as a clone of another namespace, as happens with
the mount namespace.
The rest of the changes are just plain bug fixes.
Many thanks to Andy Lutomirski for pointing out many of these issues."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Restrict when proc and sysfs can be mounted
ipc: Restrict mounting the mqueue filesystem
vfs: Carefully propogate mounts across user namespaces
vfs: Add a mount flag to lock read only bind mounts
userns: Don't allow creation if the user is chrooted
yama: Better permission check for ptraceme
pid: Handle the exit of a multi-threaded init.
scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
Change the permission check for yama_ptrace_ptracee to the standard
ptrace permission check, testing if the traceer has CAP_SYS_PTRACE
in the tracees user namespace.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This patch makes exception changes to propagate down in hierarchy respecting
when possible local exceptions.
New exceptions allowing additional access to devices won't be propagated, but
it'll be possible to add an exception to access all of part of the newly
allowed device(s).
New exceptions disallowing access to devices will be propagated down and the
local group's exceptions will be revalidated for the new situation.
Example:
A
/ \
B
group behavior exceptions
A allow "b 8:* rwm", "c 116:1 rw"
B deny "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
If a new exception is added to group A:
# echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's local exceptions, the exception
"c 116:2 rwm" will be removed.
In case parent's exceptions change and local exceptions are not allowed anymore,
they'll be deleted.
v7:
- do not allow behavior change when the cgroup has children
- update documentation
v6: fixed issues pointed by Serge Hallyn
- only copy parent's exceptions while propagating behavior if the local
behavior is different
- while propagating exceptions, do not clear and copy parent's: it'd be against
the premise we don't propagate access to more devices
v5: fixed issues pointed by Serge Hallyn
- updated documentation
- not propagating when an exception is written to devices.allow
- when propagating a new behavior, clean the local exceptions list if they're
for a different behavior
v4: fixed issues pointed by Tejun Heo
- separated function to walk the tree and collect valid propagation targets
v3: fixed issues pointed by Tejun Heo
- update documentation
- move css_online/css_offline changes to a new patch
- use cgroup_for_each_descendant_pre() instead of own descendant walk
- move exception_copy rework to a separared patch
- move exception_clean rework to a separated patch
v2: fixed issues pointed by Tejun Heo
- instead of keeping the local settings that won't apply anymore, remove them
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Allocate resources and change behavior only when online. This is needed in
order to determine if a node is suitable for hierarchy propagation or if it's
being removed.
Locking:
Both functions take devcgroup_mutex to make changes to device_cgroup structure.
Hierarchy propagation will also take devcgroup_mutex before walking the
tree while walking the tree itself is protected by rcu lock.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently may_access() is only able to verify if an exception is valid for the
current cgroup, which has the same behavior. With hierarchy, it'll be also used
to verify if a cgroup local exception is valid towards its cgroup parent, which
might have different behavior.
v2:
- updated patch description
- rebased on top of a new patch to expand the may_access() logic to make it
more clear
- fixed argument description order in may_access()
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch fixes kernel Oops because of wrong common_audit_data type
in smack_inode_unlink() and smack_inode_rmdir().
When SMACK security module is enabled and SMACK logging is on (/smack/logging
is not zero) and you try to delete the file which
1) you cannot delete due to SMACK rules and logging of failures is on
or
2) you can delete and logging of success is on,
you will see following:
Unable to handle kernel NULL pointer dereference at virtual address 000002d7
[<...>] (strlen+0x0/0x28)
[<...>] (audit_log_untrustedstring+0x14/0x28)
[<...>] (common_lsm_audit+0x108/0x6ac)
[<...>] (smack_log+0xc4/0xe4)
[<...>] (smk_curacc+0x80/0x10c)
[<...>] (smack_inode_unlink+0x74/0x80)
[<...>] (security_inode_unlink+0x2c/0x30)
[<...>] (vfs_unlink+0x7c/0x100)
[<...>] (do_unlinkat+0x144/0x16c)
The function smack_inode_unlink() (and smack_inode_rmdir()) need
to log two structures of different types. First of all it does:
smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_DENTRY);
smk_ad_setfield_u_fs_path_dentry(&ad, dentry);
This will set common audit data type to LSM_AUDIT_DATA_DENTRY
and store dentry for auditing (by function smk_curacc(), which in turn calls
dump_common_audit_data(), which is actually uses provided data and logs it).
/*
* You need write access to the thing you're unlinking
*/
rc = smk_curacc(smk_of_inode(ip), MAY_WRITE, &ad);
if (rc == 0) {
/*
* You also need write access to the containing directory
*/
Then this function wants to log anoter data:
smk_ad_setfield_u_fs_path_dentry(&ad, NULL);
smk_ad_setfield_u_fs_inode(&ad, dir);
The function sets inode field, but don't change common_audit_data type.
rc = smk_curacc(smk_of_inode(dir), MAY_WRITE, &ad);
}
So the dump_common_audit() function incorrectly interprets inode structure
as dentry, and Oops will happen.
This patch reinitializes common_audit_data structures with correct type.
Also I removed unneeded
smk_ad_setfield_u_fs_path_dentry(&ad, NULL);
initialization, because both dentry and inode pointers are stored
in the same union.
Signed-off-by: Igor Zhbanov <i.zhbanov@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Rule modifications are enabled via /smack/change-rule. Format is as follows:
"Subject Object rwaxt rwaxt"
First two strings are subject and object labels up to 255 characters.
Third string contains permissions to enable.
Fourth string contains permissions to disable.
All unmentioned permissions will be left unchanged.
If no rule previously existed, it will be created.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
This fixes audit logs for granting or denial of permissions to show
information about transmute bit.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Special file /smack/revoke-subject will silently accept labels that are not
present on the subject label list. Nothing has to be done for such labels,
as there are no rules for them to revoke.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
The call tree here is:
sk_clone_lock() <- takes bh_lock_sock(newsk);
xfrm_sk_clone_policy()
__xfrm_sk_clone_policy()
clone_policy() <- uses GFP_ATOMIC for allocations
security_xfrm_policy_clone()
security_ops->xfrm_policy_clone_security()
selinux_xfrm_policy_clone()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
DEFINE_STATIC_SRCU() defines srcu struct and do init at build time.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().
This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two others sites that use this function in the Linux kernel,
and they both seem to get it wrong:
Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().
I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.
While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.
And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes CVE-2013-1792.
There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created. It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.
Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.
THREAD A THREAD B
=============================== ===============================
==>call install_user_keyrings();
if (!cred->user->session_keyring)
==>call install_user_keyrings()
...
user->uid_keyring = uid_keyring;
if (user->uid_keyring)
return 0;
<==
key = cred->user->session_keyring [== NULL]
user->session_keyring = session_keyring;
atomic_inc(&key->usage); [oops]
At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.
The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before doing setting session_keyring.
This couldn't be reproduced on a stock kernel. However, after placing
systemtap probe on 'user->session_keyring = session_keyring;' that
introduced some delay, the kernel could be crashed reliably.
Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull more VFS bits from Al Viro:
"Unfortunately, it looks like xattr series will have to wait until the
next cycle ;-/
This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
more file_inode() work"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify path_get/path_put and fs_struct.c stuff
fix nommu breakage in shmem.c
cache the value of file_inode() in struct file
9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
9p: make sure ->lookup() adds fid to the right dentry
9p: untangle ->lookup() a bit
9p: double iput() in ->lookup() if d_materialise_unique() fails
9p: v9fs_fid_add() can't fail now
v9fs: get rid of v9fs_dentry
9p: turn fid->dlist into hlist
9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
more file_inode() open-coded instances
selinux: opened file can't have NULL or negative ->f_path.dentry
(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Commit "85865c1 ima: add policy support for file system uuid"
introduced a CONFIG_BLOCK dependency. This patch defines a
wrapper called blk_part_pack_uuid(), which returns -EINVAL,
when CONFIG_BLOCK is not defined.
security/integrity/ima/ima_policy.c:538:4: error: implicit declaration
of function 'part_pack_uuid' [-Werror=implicit-function-declaration]
Changelog v2:
- Reference commit number in patch description
Changelog v1:
- rename ima_part_pack_uuid() to blk_part_pack_uuid()
- resolve scripts/checkpatch.pl warnings
Changelog v0:
- fix UUID scripts/Lindent msgs
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Commit "750943a ima: remove enforce checking duplication" combined
the 'in IMA policy' and 'enforcing file integrity' checks. For
the non-file, kernel module verification, a specific check for
'enforcing file integrity' was not added. This patch adds the
check.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Commit 103a197c0c ("security/device_cgroup: lock assert fails in
dev_exception_clean()") grabs devcgroup_mutex to fix assert failure, but
a mutex can't be grabbed in rcu callback. Since there shouldn't be any
other references when css_free is called, mutex isn't needed for list
cleanup in devcgroup_css_free().
Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"This is basically a maintenance update for the TPM driver and EVM/IMA"
Fix up conflicts in lib/digsig.c and security/integrity/ima/ima_main.c
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (45 commits)
tpm/ibmvtpm: build only when IBM pseries is configured
ima: digital signature verification using asymmetric keys
ima: rename hash calculation functions
ima: use new crypto_shash API instead of old crypto_hash
ima: add policy support for file system uuid
evm: add file system uuid to EVM hmac
tpm_tis: check pnp_acpi_device return code
char/tpm/tpm_i2c_stm_st33: drop temporary variable for return value
char/tpm/tpm_i2c_stm_st33: remove dead assignment in tpm_st33_i2c_probe
char/tpm/tpm_i2c_stm_st33: Remove __devexit attribute
char/tpm/tpm_i2c_stm_st33: Don't use memcpy for one byte assignment
tpm_i2c_stm_st33: removed unused variables/code
TPM: Wait for TPM_ACCESS tpmRegValidSts to go high at startup
tpm: Fix cancellation of TPM commands (interrupt mode)
tpm: Fix cancellation of TPM commands (polling mode)
tpm: Store TPM vendor ID
TPM: Work around buggy TPMs that block during continue self test
tpm_i2c_stm_st33: fix oops when i2c client is unavailable
char/tpm: Use struct dev_pm_ops for power management
TPM: STMicroelectronics ST33 I2C BUILD STUFF
...
A patch to fix some unreachable code in search_my_process_keyrings() got
applied twice by two different routes upstream as commits e67eab39be
and b010520ab3 (both "fix unreachable code").
Unfortunately, the second application removed something it shouldn't
have and this wasn't detected by GIT. This is due to the patch not
having sufficient lines of context to distinguish the two places of
application.
The effect of this is relatively minor: inside the kernel, the keyring
search routines may search multiple keyrings and then prioritise the
errors if no keys or negative keys are found in any of them. With the
extra deletion, the presence of a negative key in the thread keyring
(causing ENOKEY) is incorrectly overridden by an error searching the
process keyring.
So revert the second application of the patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Asymmetric keys were introduced in linux-3.7 to verify the signature on
signed kernel modules. The asymmetric keys infrastructure abstracts the
signature verification from the crypto details. This patch adds IMA/EVM
signature verification using asymmetric keys. Support for additional
signature verification methods can now be delegated to the asymmetric
key infrastructure.
Although the module signature header and the IMA/EVM signature header
could use the same format, to minimize the signature length and save
space in the extended attribute, this patch defines a new IMA/EVM
header format. The main difference is that the key identifier is a
sha1[12 - 19] hash of the key modulus and exponent, similar to the
current implementation. The only purpose of the key identifier is to
identify the corresponding key in the kernel keyring. ima-evm-utils
was updated to support the new signature format.
While asymmetric signature verification functionality supports many
different hash algorithms, the hash used in this patch is calculated
during the IMA collection phase, based on the configured algorithm.
The default algorithm is sha1, but for backwards compatibility md5
is supported. Due to this current limitation, signatures should be
generated using a sha1 hash algorithm.
Changes in this patch:
- Functionality has been moved to separate source file in order to get rid of
in source #ifdefs.
- keyid is derived according to the RFC 3280. It does not require to assign
IMA/EVM specific "description" when loading X509 certificate. Kernel
asymmetric key subsystem automatically generate the description. Also
loading a certificate does not require using of ima-evm-utils and can be
done using keyctl only.
- keyid size is reduced to 32 bits to save xattr space. Key search is done
using partial match functionality of asymmetric_key_match().
- Kconfig option title was changed
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Rename hash calculation functions to reflect meaning
and change argument order in conventional way.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Old crypto hash API internally uses shash API.
Using shash API directly is more efficient.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The IMA policy permits specifying rules to enable or disable
measurement/appraisal/audit based on the file system magic number.
If, for example, the policy contains an ext4 measurement rule,
the rule is enabled for all ext4 partitions.
Sometimes it might be necessary to enable measurement/appraisal/audit
only for one partition and disable it for another partition of the
same type. With the existing IMA policy syntax, this can not be done.
This patch provides support for IMA policy rules to specify the file
system by its UUID (eg. fsuuid=397449cd-687d-4145-8698-7fed4a3e0363).
For partitions not being appraised, it might be a good idea to mount
file systems with the 'noexec' option to prevent executing non-verified
binaries.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
EVM uses the same key for all file systems to calculate the HMAC,
making it possible to paste inodes from one file system on to another
one, without EVM being able to detect it. To prevent such an attack,
it is necessary to make the EVM HMAC file system specific.
This patch uses the file system UUID, a file system unique identifier,
to bind the EVM HMAC to the file system. The value inode->i_sb->s_uuid
is used for the HMAC hash calculation, instead of using it for deriving
the file system specific key. Initializing the key for every inode HMAC
calculation is a bit more expensive operation than adding the uuid to
the HMAC hash.
Changing the HMAC calculation method or adding additional info to the
calculation, requires existing EVM labeled file systems to be relabeled.
This patch adds a Kconfig HMAC version option for backwards compatability.
Changelog v1:
- squash "hmac version setting"
Changelog v0:
- add missing Kconfig depends (Mimi)
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Pull networking updates from David Miller:
"Much more accumulated than I would have liked due to an unexpected
bout with a nasty flu:
1) AH and ESP input don't set ECN field correctly because the
transport head of the SKB isn't set correctly, fix from Li
RongQing.
2) If netfilter conntrack zones are disabled, we can return an
uninitialized variable instead of the proper error code. Fix from
Borislav Petkov.
3) Fix double SKB free in ath9k driver beacon handling, from Felix
Feitkau.
4) Remove bogus assumption about netns cleanup ordering in
nf_conntrack, from Pablo Neira Ayuso.
5) Remove a bogus BUG_ON in the new TCP fastopen code, from Eric
Dumazet. It uses spin_is_locked() in it's test and is therefore
unsuitable for UP.
6) Fix SELINUX labelling regressions added by the tuntap multiqueue
changes, from Paul Moore.
7) Fix CRC errors with jumbo frame receive in tg3 driver, from Nithin
Nayak Sujir.
8) CXGB4 driver sets interrupt coalescing parameters only on first
queue, rather than all of them. Fix from Thadeu Lima de Souza
Cascardo.
9) Fix regression in the dispatch of read/write registers in dm9601
driver, from Tushar Behera.
10) ipv6_append_data miscalculates header length, from Romain KUNTZ.
11) Fix PMTU handling regressions on ipv4 routes, from Steffen
Klassert, Timo Teräs, and Julian Anastasov.
12) In 3c574_cs driver, add necessary parenthesis to "x << y & z"
expression. From Nickolai Zeldovich.
13) macvlan_get_size() causes underallocation netlink message space,
fix from Eric Dumazet.
14) Avoid division by zero in xfrm_replay_advance_bmp(), from Nickolai
Zeldovich. Amusingly the zero check was already there, we were
just performing it after the modulus :-)
15) Some more splice bug fixes from Eric Dumazet, which fix things
mostly eminating from how we now more aggressively use high-order
pages in SKBs.
16) Fix size calculation bug when freeing hash tables in the IPSEC
xfrm code, from Michal Kubecek.
17) Fix PMTU event propagation into socket cached routes, from Steffen
Klassert.
18) Fix off by one in TX buffer release in netxen driver, from Eric
Dumazet.
19) Fix rediculous memory allocation requirements introduced by the
tuntap multiqueue changes, from Jason Wang.
20) Remove bogus AMD platform workaround in r8169 driver that causes
major problems in normal operation, from Timo Teräs.
21) virtio-net set affinity and select queue don't handle
discontiguous cpu numbers properly, fix from Wanlong Gao.
22) Fix a route refcounting issue in loopback driver, from Eric
Dumazet. There's a similar fix coming that we might add to the
macvlan driver as well.
23) Fix SKB leaks in batman-adv's distributed arp table code, from
Matthias Schiffer.
24) r8169 driver gives descriptor ownership back the hardware before
we're done reading the VLAN tag out of it, fix from Francois
Romieu.
25) Checksums not calculated properly in GRE tunnel driver fix from
Pravin B Shelar.
26) Fix SCTP memory leak on namespace exit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
dm9601: support dm9620 variant
SCTP: Free the per-net sysctl table on net exit. v2
net: phy: icplus: fix broken INTR pin settings
net: phy: icplus: Use the RGMII interface mode to configure clock delays
IP_GRE: Fix kernel panic in IP_GRE with GRE csum.
sctp: set association state to established in dupcook_a handler
ip6mr: limit IPv6 MRT_TABLE identifiers
r8169: fix vlan tag read ordering.
net: cdc_ncm: use IAD provided by the USB core
batman-adv: filter ARP packets with invalid MAC addresses in DAT
batman-adv: check for more types of invalid IP addresses in DAT
batman-adv: fix skb leak in batadv_dat_snoop_incoming_arp_reply()
net: loopback: fix a dst refcounting issue
virtio-net: reset virtqueue affinity when doing cpu hotplug
virtio-net: split out clean affinity function
virtio-net: fix the set affinity bug when CPU IDs are not consecutive
can: pch_can: fix invalid error codes
can: ti_hecc: fix invalid error codes
can: c_can: fix invalid error codes
r8169: remove the obsolete and incorrect AMD workaround
...
Different hooks can require different methods for appraising a
file's integrity. As a result, an integrity appraisal status is
cached on a per hook basis.
Only a hook specific rule, requires the inode to be re-appraised.
This patch eliminates unnecessary appraisals.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
With the new IMA policy 'appraise_type=' option, different hooks
can require different methods for appraising a file's integrity.
For example, the existing 'ima_appraise_tcb' policy defines a
generic rule, requiring all root files to be appraised, without
specfying the appraisal method. A more specific rule could require
all kernel modules, for example, to be signed.
appraise fowner=0 func=MODULE_CHECK appraise_type=imasig
appraise fowner=0
As a result, the integrity appraisal results for the same inode, but
for different hooks, could differ. This patch caches the integrity
appraisal results on a per hook basis.
Changelog v2:
- Rename ima_cache_status() to ima_set_cache_status()
- Rename and move get_appraise_status() to ima_get_cache_status()
Changelog v0:
- include IMA_APPRAISE/APPRAISED_SUBMASK in IMA_DO/DONE_MASK (Dmitry)
- Support independent MODULE_CHECK appraise status.
- fixed IMA_XXXX_APPRAISE/APPRAISED flags
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
In preparation for hook specific appraise status results, increase
the iint flags size.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
The 'security.ima' extended attribute may contain either the file data's
hash or a digital signature. This patch adds support for requiring a
specific extended attribute type. It extends the IMA policy with a new
keyword 'appraise_type=imasig'. (Default is hash.)
Changelog v2:
- Fixed Documentation/ABI/testing/ima_policy option syntax
Changelog v1:
- Differentiate between 'required' vs. 'actual' extended attribute
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch forbids write access to files with digital signatures, as they
are considered immutable.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Define a new function ima_d_path(), which returns the full pathname.
This function will be used further, for example, by the directory
verification code.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch reduces size of the iint structure by 8 bytes.
It saves about 15% of iint cache memory.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Hexdump is not really helping. Audit messages prints error messages.
Remove it.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Based on the IMA appraisal policy, files are appraised. For those
files appraised, the IMA hooks return the integrity appraisal result,
assuming IMA-appraisal is in enforcing mode. This patch combines
both of these criteria (in policy and enforcing file integrity),
removing the checking duplication.
Changelog v1:
- Update hook comments
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
When a file system is mounted read-only, setting the xattr value in
fix mode fails with an error code -EROFS. The xattr should be fixed
after the file system is remounted read-write. This patch verifies
that the set xattr succeeds, before setting the appraise status value
to INTEGRITY_PASS.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
EVM cannot be built as a kernel module. Remove the unncessary __exit
functions.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Although the IMA policy does not change, the LSM policy can be
reloaded, leaving the IMA LSM based rules referring to the old,
stale LSM policy. This patch updates the IMA LSM based rules
to reflect the reloaded LSM policy.
Reported-by: Sven Vermeulen <sven.vermeulen@siphos.be>
tested-by: Sven Vermeulen <sven.vermeulen@siphos.be>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
This patch corrects some problems with LSM/SELinux that were introduced
with the multiqueue patchset. The problem stems from the fact that the
multiqueue work changed the relationship between the tun device and its
associated socket; before the socket persisted for the life of the
device, however after the multiqueue changes the socket only persisted
for the life of the userspace connection (fd open). For non-persistent
devices this is not an issue, but for persistent devices this can cause
the tun device to lose its SELinux label.
We correct this problem by adding an opaque LSM security blob to the
tun device struct which allows us to have the LSM security state, e.g.
SELinux labeling information, persist for the lifetime of the tun
device. In the process we tweak the LSM hooks to work with this new
approach to TUN device/socket labeling and introduce a new LSM hook,
security_tun_dev_attach_queue(), to approve requests to attach to a
TUN queue via TUNSETQUEUE.
The SELinux code has been adjusted to match the new LSM hooks, the
other LSMs do not make use of the LSM TUN controls. This patch makes
use of the recently added "tun_socket:attach_queue" permission to
restrict access to the TUNSETQUEUE operation. On older SELinux
policies which do not define the "tun_socket:attach_queue" permission
the access control decision for TUNSETQUEUE will be handled according
to the SELinux policy's unknown permission setting.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@parisplace.org>
Tested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new permission to align with the new TUN multiqueue support,
"tun_socket:attach_queue".
The corresponding SELinux reference policy patch is show below:
diff --git a/policy/flask/access_vectors b/policy/flask/access_vectors
index 28802c5..a0664a1 100644
--- a/policy/flask/access_vectors
+++ b/policy/flask/access_vectors
@@ -827,6 +827,9 @@ class kernel_service
class tun_socket
inherits socket
+{
+ attach_queue
+}
class x_pointer
inherits x_device
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@parisplace.org>
Tested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new kernel module syscall appraises kernel modules based
on policy. If the IMA policy requires kernel module checking,
fallback to module signature enforcing for the existing syscall.
Without CONFIG_MODULE_SIG_FORCE enabled, the kernel module's
integrity is unknown, return -EACCES.
Changelog v1:
- Fix ima_module_check() return result (Tetsuo Handa)
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
We set ret to NULL then test it. Remove the bogus test
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Really fix tuntap SKB use after free bug, from Eric Dumazet.
2) Adjust SKB data pointer to point past the transport header before
calling icmpv6_notify() so that the headers are in the state which
that function expects. From Duan Jiong.
3) Fix ambiguities in the new tuntap multi-queue APIs. From Jason
Wang.
4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.
5) Don't destroy mutex after freeing up device private in mac802154,
fix also from Konstantin Khlebnikov.
6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.
7) SCTP HMAC kconfig rework, from Neil Horman.
8) Fix SCTP jprobes function signature, otherwise things explode, from
Daniel Borkmann.
9) Fix typo in ipv6-offload Makefile variable reference, from Simon
Arlott.
10) Don't fail USBNET open just because remote wakeup isn't supported,
from Oliver Neukum.
11) be2net driver bug fixes from Sathya Perla.
12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
Woodhouse.
13) Fix MTU changing regression in 8139cp driver, from John Greene.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
solos-pci: ensure all TX packets are aligned to 4 bytes
solos-pci: add firmware upgrade support for new models
solos-pci: remove superfluous debug output
solos-pci: add GPIO support for newer versions on Geos board
8139cp: Prevent dev_close/cp_interrupt race on MTU change
net: qmi_wwan: add ZTE MF880
drivers/net: Use of_match_ptr() macro in smsc911x.c
drivers/net: Use of_match_ptr() macro in smc91x.c
ipv6: addrconf.c: remove unnecessary "if"
bridge: Correctly encode addresses when dumping mdb entries
bridge: Do not unregister all PF_BRIDGE rtnl operations
use generic usbnet_manage_power()
usbnet: generic manage_power()
usbnet: handle PM failure gracefully
ksz884x: fix receive polling race condition
qlcnic: update driver version
qlcnic: fix unused variable warnings
net: fec: forbid FEC_PTP on SoCs that do not support
be2net: fix wrong frag_idx reported by RX CQ
be2net: fix be_close() to ensure all events are ack'ed
...
to verify the source of the module (ChromeOS) and/or use standard IMA on it
or other security hooks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQ0VKlAAoJENkgDmzRrbjxjuEQALVHpD1cSmryOzVwkNn7rVGP
PV3KVbUs+qzUCm2c3AafIIlSBm2LOUl+cR3uNC7di8aHarRF3VHkK2OQ4Fx97ECd
KKBqAyY3R0q1mAKujb/MWwiK0YgosEDIOzGGn2yQhNFsxKqnMB02P4j82IO7+g+w
Cc3XuDyWHoH2I+ySgz0Q8NHAqufD/DMZUKud7jw2Lsv6PuICJ1Oqgl/Gd/muxort
4a5tV3tjhRGywHS/8b2fbDUXkybC5NKK0FN+gyoaROmJ/THeHEQDGXZT9bc2vmVx
HvRy/5k8dzQ6LAJ2mLnPvy0pmv0u7NYMvjxTxxUlUkFMkYuVticikQfwSYDbDPt4
mbsLxchpgi8z4x8HltEERffCX5tldo/5hz1uemqhqIsMRIrRFnlHkSIgkGjVHf2u
LXQBLT8uTm6C0VyNQPrI/hUZzIax7WtKbPSoK9lmExNbKqloEFh/mVXvfQxei2kp
wnUZcnmPIqSvw7b4CWu7HibMYu2VvGBgm3YIfJRi4AQme1mzFYLpZoxF5Pj+Ykbt
T//Hb1EsNQTTFCg7MZhnJSAw/EVUvNDUoullORClyqw6+xxjVKqWpPJgYDRfWOlJ
Xa+s7DNrL+Oo1WWR8l5ruoQszbR8szIyeyPKKxRUcQj2zsqghoWuzKAx2saSEw3W
pNkoJU+dGC7kG/yVAS8N
=uoJj
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"Nothing all that exciting; a new module-from-fd syscall for those who
want to verify the source of the module (ChromeOS) and/or use standard
IMA on it or other security hooks."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Fix kbuild output when using default extra_certificates
MODSIGN: Avoid using .incbin in C source
modules: don't hand 0 to vmalloc.
module: Remove a extra null character at the top of module->strtab.
ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
ASN.1: Define indefinite length marker constant
moduleparam: use __UNIQUE_ID()
__UNIQUE_ID()
MODSIGN: Add modules_sign make target
powerpc: add finit_module syscall.
ima: support new kernel module syscall
add finit_module syscall to asm-generic
ARM: add finit_module syscall to ARM
security: introduce kernel_module_from_file hook
module: add flags arg to sys_finit_module()
module: add syscall to load module from fd
Pull (again) user namespace infrastructure changes from Eric Biederman:
"Those bugs, those darn embarrasing bugs just want don't want to get
fixed.
Linus I just updated my mirror of your kernel.org tree and it appears
you successfully pulled everything except the last 4 commits that fix
those embarrasing bugs.
When you get a chance can you please repull my branch"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Fix typo in description of the limitation of userns_install
userns: Add a more complete capability subset test to commit_creds
userns: Require CAP_SYS_ADMIN for most uses of setns.
Fix cap_capable to only allow owners in the parent user namespace to have caps.
Pull user namespace changes from Eric Biederman:
"While small this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.
This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.
This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.
This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.
The files under /proc/<pid>/ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_mayaccess permission
checks are always applied.
The files under /proc/<pid>/ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.
Through the David Miller's net tree are changes to relax many of the
permission checks in the networking stack to allowing the user
namespace root to usefully use the networking stack. Similar changes
for the mount namespace and the pid namespace are coming through my
tree.
Two small changes to add user namespace support were commited here adn
in David Miller's -net tree so that I could complete the work on the
/proc/<pid>/ns/ files in this tree.
Work remains to make it safe to build user namespaces and 9p, afs,
ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
Kconfig guard remains in place preventing that user namespaces from
being built when any of those filesystems are enabled.
Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...
Pull security subsystem updates from James Morris:
"A quiet cycle for the security subsystem with just a few maintenance
updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Smack: create a sysfs mount point for smackfs
Smack: use select not depends in Kconfig
Yama: remove locking from delete path
Yama: add RCU to drop read locking
drivers/char/tpm: remove tasklet and cleanup
KEYS: Use keyring_alloc() to create special keyrings
KEYS: Reduce initial permissions on keys
KEYS: Make the session and process keyrings per-thread
seccomp: Make syscall skipping and nr changes more consistent
key: Fix resource leak
keys: Fix unreachable code
KEYS: Add payload preparsing opportunity prior to key instantiate or update
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Lutomirski pointed out that the current behavior of allowing the
owner of a user namespace to have all caps when that owner is not in a
parent user namespace is wrong. Add a test to ensure the owner of a user
namespace is in the parent of the user namespace to fix this bug.
Thankfully this bug did not apply to the initial user namespace, keeping
the mischief that can be caused by this bug quite small.
This is bug was introduced in v3.5 by commit 783291e690
"Simplify the user_namespace by making userns->creator a kuid."
But did not matter until the permisions required to create
a user namespace were relaxed allowing a user namespace to be created
inside of a user namespace.
The bug made it possible for the owner of a user namespace to be
present in a child user namespace. Since the owner of a user nameapce
is granted all capabilities it became possible for users in a
grandchild user namespace to have all privilges over their parent user
namspace.
Reorder the checks in cap_capable. This should make the common case
faster and make it clear that nothing magic happens in the initial
user namespace. The reordering is safe because cred->user_ns
can only be in targ_ns or targ_ns->parent but not both.
Add a comment a the top of the loop to make the logic of
the code clear.
Add a distinct variable ns that changes as we walk up
the user namespace hierarchy to make it clear which variable
is changing.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
There are a number of "conventions" for where to put LSM filesystems.
Smack adheres to none of them. Create a mount point at /sys/fs/smackfs
for mounting smackfs so that Smack can be conventional.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The components NETLABEL and SECURITY_NETWORK are required by
Smack. Using "depends" in Kconfig hides the Smack option
if the user hasn't figured out that they need to be enabled
while using make menuconfig. Using select is a better choice.
Because select is not recursive depends on NET and SECURITY
are added. The reflects similar usage in TOMOYO and AppArmor.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
With the addition of the new kernel module syscall, which defines two
arguments - a file descriptor to the kernel module and a pointer to a NULL
terminated string of module arguments - it is now possible to measure and
appraise kernel modules like any other file on the file system.
This patch adds support to measure and appraise kernel modules in an
extensible and consistent manner.
To support filesystems without extended attribute support, additional
patches could pass the signature as the first parameter.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that kernel module origins can be reasoned about, provide a hook to
the LSMs to make policy decisions about the module file. This will let
Chrome OS enforce that loadable kernel modules can only come from its
read-only hash-verified root filesystem. Other LSMs can, for example,
read extended attributes for signatures, etc.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
Pull cgroup changes from Tejun Heo:
"A lot of activities on cgroup side. The big changes are focused on
making cgroup hierarchy handling saner.
- cgroup_rmdir() had peculiar semantics - it allowed cgroup
destruction to be vetoed by individual controllers and tried to
drain refcnt synchronously. The vetoing never worked properly and
caused good deal of contortions in cgroup. memcg was the last
reamining user. Michal Hocko removed the usage and cgroup_rmdir()
path has been simplified significantly. This was done in a
separate branch so that the memcg people can base further memcg
changes on top.
- The above allowed cleaning up cgroup lifecycle management and
implementation of generic cgroup iterators which are used to
improve hierarchy support.
- cgroup_freezer updated to allow migration in and out of a frozen
cgroup and handle hierarchy. If a cgroup is frozen, all descendant
cgroups are frozen.
- netcls_cgroup and netprio_cgroup updated to handle hierarchy
properly.
- Various fixes and cleanups.
- Two merge commits. One to pull in memcg and rmdir cleanups (needed
to build iterators). The other pulled in cgroup/for-3.7-fixes for
device_cgroup fixes so that further device_cgroup patches can be
stacked on top."
Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)
* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
cgroup: update Documentation/cgroups/00-INDEX
cgroup_rm_file: don't delete the uncreated files
cgroup: remove subsystem files when remounting cgroup
cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
cgroup: warn about broken hierarchies only after css_online
cgroup: list_del_init() on removed events
cgroup: fix lockdep warning for event_control
cgroup: move list add after list head initilization
netprio_cgroup: allow nesting and inherit config on cgroup creation
netprio_cgroup: implement netprio[_set]_prio() helpers
netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
netprio_cgroup: reimplement priomap expansion
netprio_cgroup: shorten variable names in extend_netdev_table()
netprio_cgroup: simplify write_priomap()
netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
cgroup: remove obsolete guarantee from cgroup_task_migrate.
cgroup: add cgroup->id
cgroup, cpuset: remove cgroup_subsys->post_clone()
cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
...
Rebased on the latest net-next tree.
RTM_NEWNETCONF and RTM_GETNETCONF are missing in this table.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V5: fix two bugs pointed out by Thomas
remove seq check for now, mark it as TODO
V4: remove some useless #include
some coding style fix
V3: drop debugging printk's
update selinux perm table as well
V2: drop patch 1/2, export ifindex directly
Redesign netlink attributes
Improve netlink seq check
Handle IPv6 addr as well
This patch exports bridge multicast database via netlink
message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
(Thanks to Thomas for patient reviews)
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
===============================
[ INFO: suspicious RCU usage. ]
3.5.0-rc1+ #63 Not tainted
-------------------------------
security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by trinity-child1/8750:
#0: (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0
stack backtrace:
Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63
Call Trace:
[<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130
[<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0
[<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0
[<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0
[<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0
[<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff812c9536>] security_socket_bind+0x16/0x20
[<ffffffff815550ca>] sys_bind+0x7a/0x100
[<ffffffff816c03d5>] ? sysret_check+0x22/0x5d
[<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
[<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b
This patch below does what Paul McKenney suggested in the previous thread.
Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Instead of locking the list during a delete, mark entries as invalid
and trigger a workqueue to clean them up. This lets us easily handle
task_free from interrupt context.
Signed-off-by: Kees Cook <keescook@chromium.org>
Stop using spinlocks in the read path. Add RCU list to handle the readers.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: John Johansen <john.johansen@canonical.com>
The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task. struct cred may go away as
soon as the rcu lock is released. This leads to a race where we
can dereference a stale user namespace pointer.
To make it obvious a struct cred is involved kill task_user_ns.
To kill the race modify the users of task_user_ns to only
reference the user namespace while the rcu lock is held.
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
device_cgroup uses RCU safe ->exceptions list which is write-protected
by devcgroup_mutex and has had some issues using locking correctly.
Add lockdep asserts to utility functions so that future errors can be
easily detected.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
dev_cgroup->exceptions is protected with devcgroup_mutex for writes
and RCU for reads; however, RCU usage isn't correct.
* dev_exception_clean() doesn't use RCU variant of list_del() and
kfree(). The function can race with may_access() and may_access()
may end up dereferencing already freed memory. Use list_del_rcu()
and kfree_rcu() instead.
* may_access() may be called only with RCU read locked but doesn't use
RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
In 4cef7299b4 ("device_cgroup: add proper checking when changing
default behavior") the cgroup parent usage is unchecked. root will not
have a parent and trying to use device.{allow,deny} will cause problems.
For some reason my stressing scripts didn't test the root directory so I
didn't catch it on my regular tests.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Before changing a group's default behavior to ALLOW, we must check if
its parent's behavior is also ALLOW.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert the code to use kstrtou32() instead of simple_strtoul() which is
deprecated. The real size of the variables are u32, so use kstrtou32
instead of kstrtoul
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done in a v2 patch but v1 ended up being committed. The
variable name is less confusing and stores the default behavior when no
matching exception exists.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
BugLink: http://bugs.launchpad.net/bugs/1056078
Profile replacement can cause long chains of profiles to build up when
the profile being replaced is pinned. When the pinned profile is finally
freed, it puts the reference to its replacement, which may in turn nest
another call to free_profile on the stack. Because this may happen for
each profile in the replacedby chain this can result in a recusion that
causes the stack to overflow.
Break this nesting by directly walking the chain of replacedby profiles
(ie. use iteration instead of recursion to free the list). This results
in at most 2 levels of free_profile being called, while freeing a
replacedby chain.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The capability defines have moved causing the auto generated names
of capabilities that apparmor uses in logging to be incorrect.
Fix the autogenerated table source to uapi/linux/capability.h
Reported-by: YanHong <clouds.yan@gmail.com>
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Analyzed-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
replace_fd() began with "eats a reference, tries to insert into
descriptor table" semantics; at some point I'd switched it to
much saner current behaviour ("try to insert into descriptor
table, grabbing a new reference if inserted; caller should do
fput() in any case"), but forgot to update the callers.
Mea culpa...
[Spotted by Pavel Roskin, who has really weird system with pipe-fed
coredumps as part of what he considers a normal boot ;-)]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull module signing support from Rusty Russell:
"module signing is the highlight, but it's an all-over David Howells frenzy..."
Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.
* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
X.509: Fix indefinite length element skip error handling
X.509: Convert some printk calls to pr_devel
asymmetric keys: fix printk format warning
MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
MODSIGN: Make mrproper should remove generated files.
MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
MODSIGN: Use the same digest for the autogen key sig as for the module sig
MODSIGN: Sign modules during the build process
MODSIGN: Provide a script for generating a key ID from an X.509 cert
MODSIGN: Implement module signature checking
MODSIGN: Provide module signing public keys to the kernel
MODSIGN: Automatically generate module signing keys if missing
MODSIGN: Provide Kconfig options
MODSIGN: Provide gitignore and make clean rules for extra files
MODSIGN: Add FIPS policy
module: signature checking hook
X.509: Add a crypto key parser for binary (DER) X.509 certificates
MPILIB: Provide a function to read raw data into an MPI
X.509: Add an ASN.1 decoder
X.509: Add simple ASN.1 grammar compiler
...
Merge patches from Andrew Morton:
"A few misc things and very nearly all of the MM tree. A tremendous
amount of stuff (again), including a significant rbtree library
rework."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits)
sparc64: Support transparent huge pages.
mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().
mm: Add and use update_mmu_cache_pmd() in transparent huge page code.
sparc64: Document PGD and PMD layout.
sparc64: Eliminate PTE table memory wastage.
sparc64: Halve the size of PTE tables
sparc64: Only support 4MB huge pages and 8KB base pages.
memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
mm: memcg: clean up mm_match_cgroup() signature
mm: document PageHuge somewhat
mm: use %pK for /proc/vmallocinfo
mm, thp: fix mlock statistics
mm, thp: fix mapped pages avoiding unevictable list on mlock
memory-hotplug: update memory block's state and notify userspace
memory-hotplug: preparation to notify memory block's state at memory hot remove
mm: avoid section mismatch warning for memblock_type_name
make GFP_NOTRACK definition unconditional
cma: decrease cc.nr_migratepages after reclaiming pagelist
CMA: migrate mlocked pages
kpageflags: fix wrong KPF_THP on non-huge compound pages
...
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving
a task's executable file. After this patch they will use mm->exe_file
directly. mm->exe_file is protected with mm->mmap_sem, so locking stays
the same.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [arch/tile]
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [tomoyo]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
other branch as normal asm-generic changes do. One is a fix for a
build warning, the other two are more interesting:
* A patch from Mark Brown to allow using the common clock infrastructure
on all architectures, so we can use the clock API in architecture
independent device drivers.
* The UAPI split patches from David Howells for the asm-generic files.
There are other architecture specific series that are going through
the arch maintainer tree and that depend on this one.
There may be a few small merge conflicts between Mark's patch and
the following arch header file split patches. In each case the solution
will be to keep the new "generic-y += clkdev.h" line, even if it
ends up being the only line in the Kbuild file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIVAwUAUHLuO2CrR//JCVInAQLsKxAAoa+oSP3KGuQbLHq2wvUxAdXWDFcZgKo+
qMRejSJPI0sreJ9GJHpUjHtJ7W2gujeo9upmUIJzoWY9vrmjkhCDkaWliaQI8SmY
CKB9zI2xCB9iFzHtWxocfnJzU7NvzjJm+jnIYrqkaO9HGMxL99tsv9TsBYXK/08j
QmlGP5fHdGU3zZxVt5r1GL8/nfX4zn3/YEll9nJ7vqXZltIBbaksxmgPoa0QkkH8
LMeMAlgRR2DHWt58gXHyGB7Afx3QEnZBDaQpYxA446P+2gtvIhFYOnpuX14pZb7t
m4IM0vOO6WzARQR6DJlRHfYJevojgGHu4Y8wkEzuWE+Hr2BqmiVct7UKqGJdqTY5
7+I7wwaJmdd3zE61LxRS9UOjJDwMh1gmsNU4+42RArQ5eLcikNR5zfYzDRLCTmnk
qKZvbiaxgme2YvWazxbBT6EqmIVU6lfHHIoMLr8U0j40Cl0GCmN7EBbe7/r2Jhjs
6VnCOJ6vb4RCOJGGAcLRMQu7xEtqcCe0Zht839wl13QXewxS3QRgwg6Bjy/fwA9r
jij5gf+R25J/fQW7yZv4LwcMowRE1xvpu0ebwkK3LLR8jcon71scd6f3PW/bUUpj
j4tgFuJbXzOxQ4LFgBzvdVgx3wDzsQhqb/6p2l6ROdcw7xXFDdFZ4zq3h0A25wXZ
J6WDO387tpg=
=Aaki
-----END PGP SIGNATURE-----
Merge tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"This has three changes for asm-generic that did not really fit into
any other branch as normal asm-generic changes do. One is a fix for a
build warning, the other two are more interesting:
* A patch from Mark Brown to allow using the common clock
infrastructure on all architectures, so we can use the clock API in
architecture independent device drivers.
* The UAPI split patches from David Howells for the asm-generic
files. There are other architecture specific series that are going
through the arch maintainer tree and that depend on this one.
There may be a few small merge conflicts between Mark's patch and the
following arch header file split patches. In each case the solution
will be to keep the new "generic-y += clkdev.h" line, even if it ends
up being the only line in the Kbuild file."
* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
UAPI: (Scripted) Disintegrate include/asm-generic
asm-generic: Add default clkdev.h
asm-generic: xor: mark static functions as __maybe_unused
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called. This is done with the
provision of two new key type operations:
int (*preparse)(struct key_preparsed_payload *prep);
void (*free_preparse)(struct key_preparsed_payload *prep);
If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases). The second operation is called to clean up if the first
was called.
preparse() is given the opportunity to fill in the following structure:
struct key_preparsed_payload {
char *description;
void *type_data[2];
void *payload;
const void *data;
size_t datalen;
size_t quotalen;
};
Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.
The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.
The preparser may also propose a description for the key by attaching it as a
string to the description field. This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function. This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.
This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.
The instantiate() and update() operations are then modified to look like this:
int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
int (*update)(struct key *key, struct key_preparsed_payload *prep);
and the new payload data is passed in *prep, whether or not it was preparsed.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull IMA bugfix (security subsystem) from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: fix bug in argument order
This patch replaces the "whitelist" usage in the code and comments and replace
them by exception list related information.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original model of device_cgroup is having a whitelist where all the
allowed devices are listed. The problem with this approach is that is
impossible to have the case of allowing everything but few devices.
The reason for that lies in the way the whitelist is handled internally:
since there's only a whitelist, the "all devices" entry would have to be
removed and replaced by the entire list of possible devices but the ones
that are being denied. Since dev_t is 32 bits long, representing the allowed
devices as a bitfield is not memory efficient.
This patch replaces the "whitelist" by a "exceptions" list and the default
policy is kept as "deny_all" variable in dev_cgroup structure.
The current interface determines that whenever "a" is written to devices.allow
or devices.deny, the entry masking all devices will be added or removed,
respectively. This behavior is kept and it's what will determine the default
policy:
# cat devices.list
a *:* rwm
# echo a >devices.deny
# cat devices.list
# echo a >devices.allow
# cat devices.list
a *:* rwm
The interface is also preserved. For example, if one wants to block only access
to /dev/null:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null
# echo a >devices.allow
# echo "c 1:3 rwm" >devices.deny
# cat /dev/null
cat: /dev/null: Operation not permitted
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 r" >devices.allow
# cat /dev/null
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rw" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rwm" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
#
Note that I didn't rename the functions/variables in this patch, but in the
next one to make reviewing easier.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function cleans all the items in a whitelist and will be used by the next
patches.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
deny_all will determine if the default policy is to deny all device access
unless for the ones in the exception list.
This variable will be used in the next patches to convert device_cgroup
internally into a default policy + rules.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: James Morris <jmorris@namei.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Pull security subsystem updates from James Morris:
"Highlights:
- Integrity: add local fs integrity verification to detect offline
attacks
- Integrity: add digital signature verification
- Simple stacking of Yama with other LSMs (per LSS discussions)
- IBM vTPM support on ppc64
- Add new driver for Infineon I2C TIS TPM
- Smack: add rule revocation for subject labels"
Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
Documentation: Update git repository URL for Smack userland tools
ima: change flags container data type
Smack: setprocattr memory leak fix
Smack: implement revoking all rules for a subject label
Smack: remove task_wait() hook.
ima: audit log hashes
ima: generic IMA action flag handling
ima: rename ima_must_appraise_or_measure
audit: export audit_log_task_info
tpm: fix tpm_acpi sparse warning on different address spaces
samples/seccomp: fix 31 bit build on s390
ima: digital signature verification support
ima: add support for different security.ima data types
ima: add ima_inode_setxattr/removexattr function and calls
ima: add inode_post_setattr call
ima: replace iint spinblock with rwlock/read_lock
ima: allocating iint improvements
ima: add appraise action keywords and default rules
ima: integrity appraisal extension
vfs: move ima_file_free before releasing the file
...
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Use keyring_alloc() to create special keyrings now that it has a permissions
parameter rather than using key_alloc() + key_instantiate_and_link().
Also document and export keyring_alloc() so that modules can use it too.
Signed-off-by: David Howells <dhowells@redhat.com>
Reduce the initial permissions on new keys to grant the possessor everything,
view permission only to the user (so the keys can be seen in /proc/keys) and
nothing else.
This gives the creator a chance to adjust the permissions mask before other
processes can access the new key or create a link to it.
To aid with this, keyring_alloc() now takes a permission argument rather than
setting the permissions itself.
The following permissions are now set:
(1) The user and user-session keyrings grant the user that owns them full
permissions and grant a possessor everything bar SETATTR.
(2) The process and thread keyrings grant the possessor full permissions but
only grant the user VIEW. This permits the user to see them in
/proc/keys, but not to do anything with them.
(3) Anonymous session keyrings grant the possessor full permissions, but only
grant the user VIEW and READ. This means that the user can see them in
/proc/keys and can list them, but nothing else. Possibly READ shouldn't
be provided either.
(4) Named session keyrings grant everything an anonymous session keyring does,
plus they grant the user LINK permission. The whole point of named
session keyrings is that others can also subscribe to them. Possibly this
should be a separate permission to LINK.
(5) The temporary session keyring created by call_sbin_request_key() gets the
same permissions as an anonymous session keyring.
(6) Keys created by add_key() get VIEW, SEARCH, LINK and SETATTR for the
possessor, plus READ and/or WRITE if the key type supports them. The used
only gets VIEW now.
(7) Keys created by request_key() now get the same as those created by
add_key().
Reported-by: Lennart Poettering <lennart@poettering.net>
Reported-by: Stef Walter <stefw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Make the session keyring per-thread rather than per-process, but still
inherited from the parent thread to solve a problem with PAM and gdm.
The problem is that join_session_keyring() will reject attempts to change the
session keyring of a multithreaded program but gdm is now multithreaded before
it gets to the point of starting PAM and running pam_keyinit to create the
session keyring. See:
https://bugs.freedesktop.org/show_bug.cgi?id=49211
The reason that join_session_keyring() will only change the session keyring
under a single-threaded environment is that it's hard to alter the other
thread's credentials to effect the change in a multi-threaded program. The
problems are such as:
(1) How to prevent two threads both running join_session_keyring() from
racing.
(2) Another thread's credentials may not be modified directly by this process.
(3) The number of threads is uncertain whilst we're not holding the
appropriate spinlock, making preallocation slightly tricky.
(4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
another thread to replace its keyring, but that means preallocating for
each thread.
A reasonable way around this is to make the session keyring per-thread rather
than per-process and just document that if you want a common session keyring,
you must get it before you spawn any threads - which is the current situation
anyway.
Whilst we're at it, we can the process keyring behave in the same way. This
means we can clean up some of the ickyness in the creds code.
Basically, after this patch, the session, process and thread keyrings are about
inheritance rules only and not about sharing changes of keyring.
Reported-by: Mantas M. <grawity@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ray Strode <rstrode@redhat.com>
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
Pull cgroup hierarchy update from Tejun Heo:
"Currently, different cgroup subsystems handle nested cgroups
completely differently. There's no consistency among subsystems and
the behaviors often are outright broken.
People at least seem to agree that the broken hierarhcy behaviors need
to be weeded out if any progress is gonna be made on this front and
that the fallouts from deprecating the broken behaviors should be
acceptable especially given that the current behaviors don't make much
sense when nested.
This patch makes cgroup emit warning messages if cgroups for
subsystems with broken hierarchy behavior are nested to prepare for
fixing them in the future. This was put in a separate branch because
more related changes were expected (didn't make it this round) and the
memory cgroup wanted to pull in this and make changes on top."
* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
Pull workqueue changes from Tejun Heo:
"This is workqueue updates for v3.7-rc1. A lot of activities this
round including considerable API and behavior cleanups.
* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.
* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.
These two delayed_work changes make delayed_work provide interface
and behave like timer which is executed with process context.
* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.
All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]_work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.
* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.
* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.
There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."
Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.
Tejun pointed out a few of them, I fixed a couple more.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...
Pull core kernel fixes from Ingo Molnar:
"This is a complex task_work series from Oleg that fixes the bug that
this VFS commit tried to fix:
d35abdb288 hold task_lock around checks in keyctl
but solves the problem without the lockup regression that d35abdb288
introduced in v3.6.
This series came late in v3.6 and I did not feel confident about it so
late in the cycle. Might be worth backporting to -stable if it proves
itself upstream."
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
task_work: Simplify the usage in ptrace_notify() and get_signal_to_deliver()
task_work: Revert "hold task_lock around checks in keyctl"
task_work: task_work_add() should not succeed after exit_task_work()
task_work: Make task_work_add() lockless
Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
On an error iov may still have been reallocated and need freeing
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
We set ret to NULL then test it. Remove the bogus test
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s
LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI
u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT
xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU
i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF
GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk=
=0v2l
-----END PGP SIGNATURE-----
Merge tag 'v3.6-rc7' into next
Linux 3.6-rc7
Requested by David Howells so he can merge his key susbsystem work into
my tree with requisite -linus changesets.
iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.
tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't make the security modules deal with raw user space uid and
gids instead pass in a kuid_t and a kgid_t so that security modules
only have to deal with internal kernel uids and gids.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <james.l.morris@oracle.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Use kuid's in the IMA rules.
When reporting the current uid in audit logs use from_kuid
to get a usable value.
Cc: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
IMA audit hashes patches introduced new IMA flags and required
space went beyond 8 bits. Currently the only flag is IMA_DIGSIG.
This patch use 16 bit short instead of 8 bit char.
Without this fix IMA signature will be replaced with hash, which
should not happen.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
When a policy is inserted or deleted, all dst should be recalculated.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The data structure allocations being done in prepare_creds
are duplicated in smack_setprocattr. This results in the
structure allocated in prepare_creds being orphaned and
never freed. The duplicate code is removed from
smack_setprocattr.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Add /smack/revoke-subject special file. Writing a SMACK label to this file will
set the access to '-' for all access rules with that subject label.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
On 12/20/2011 11:20 PM, Jarkko Sakkinen wrote:
> Allow SIGCHLD to be passed to child process without
> explicit policy. This will help to keep the access
> control policy simple and easily maintainable with
> complex applications that require use of multiple
> security contexts. It will also help to keep them
> as isolated as possible.
>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
I have a slightly different version that applies to the
current smack-next tree.
Allow SIGCHLD to be passed to child process without
explicit policy. This will help to keep the access
control policy simple and easily maintainable with
complex applications that require use of multiple
security contexts. It will also help to keep them
as isolated as possible.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
security/smack/smack_lsm.c | 37 ++++++++-----------------------------
1 files changed, 8 insertions(+), 29 deletions(-)
Currently, cgroup hierarchy support is a mess. cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children. blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup. Others show yet different behaviors.
These differing interpretations of cgroup hierarchy make using cgroup
confusing and it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.
Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy. Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is deterimental to making any progress on this front.
This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support. The goal of
this patch is two-fold.
* Move users away from using hierarchy on currently non-hierarchical
subsystems, so that implementing proper hierarchy support on those
doesn't surprise them.
* Keep track of which controllers are broken how and nudge the
subsystems to implement proper hierarchy support.
For now, start with a single warning message. We can whine louder
later on.
v2: Fixed a typo spotted by Michal. Warning message updated.
v3: Updated memcg part so that it doesn't generate warning in the
cases where .use_hierarchy=false doesn't make the behavior
different from root.use_hierarchy=true. Fixed a typo spotted by
Glauber.
v4: Check ->broken_hierarchy after cgroup creation is complete so that
->create() can affect the result per Michal. Dropped unnecessary
memcg root handling per Michal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
- Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
- Use from_kuid to generate key descriptions
- Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
- Avoid potential problems with file descriptor passing by displaying
keys in the user namespace of the opener of key status proc files.
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@linux-nfs.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This adds an 'audit' policy action which audit logs file measurements.
Changelog v6:
- use new action flag handling (Dmitry Kasatkin).
- removed whitespace (Mimi)
Changelog v5:
- use audit_log_untrustedstring.
Changelog v4:
- cleanup digest -> hash conversion.
- use filename rather than d_path in ima_audit_measurement.
Changelog v3:
- Use newly exported audit_log_task_info for logging pid/ppid/uid/etc.
- Update the ima_policy ABI documentation.
Changelog v2:
- Use 'audit' action rather than 'measure_and_audit' to permit
auditing in the absence of measuring..
Changelog v1:
- Initial posting.
Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Make the IMA action flag handling generic in order to support
additional new actions, without requiring changes to the base
implementation. New actions, like audit logging, will only
need to modify the define statements.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This reverts commit d35abdb288.
task_lock() was added to ensure exit_mm() and thus exit_task_work() is
not possible before task_work_add().
This is wrong, task_lock() must not be nested with write_lock(tasklist).
And this is no longer needed, task_work_add() now fails if it is called
after exit_task_work().
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120826191214.GA4231@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called. This is done with the
provision of two new key type operations:
int (*preparse)(struct key_preparsed_payload *prep);
void (*free_preparse)(struct key_preparsed_payload *prep);
If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases). The second operation is called to clean up if the first
was called.
preparse() is given the opportunity to fill in the following structure:
struct key_preparsed_payload {
char *description;
void *type_data[2];
void *payload;
const void *data;
size_t datalen;
size_t quotalen;
};
Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.
The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.
The preparser may also propose a description for the key by attaching it as a
string to the description field. This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function. This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.
This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.
The instantiate() and update() operations are then modified to look like this:
int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
int (*update)(struct key *key, struct key_preparsed_payload *prep);
and the new payload data is passed in *prep, whether or not it was preparsed.
Signed-off-by: David Howells <dhowells@redhat.com>
When AUDIT action support is added to the IMA,
ima_must_appraise_or_measure() does not reflect the real meaning anymore.
Rename it to ima_get_action().
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).
Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.
This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).
Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for digital signature based integrity appraisal.
With this patch, 'security.ima' contains either the file data hash or
a digital signature of the file data hash. The file data hash provides
the security attribute of file integrity. In addition to file integrity,
a digital signature provides the security attribute of authenticity.
Unlike EVM, when the file metadata changes, the digital signature is
replaced with an HMAC, modification of the file data does not cause the
'security.ima' digital signature to be replaced with a hash. As a
result, after any modification, subsequent file integrity appraisals
would fail.
Although digitally signed files can be modified, but by not updating
'security.ima' to reflect these modifications, in essence digitally
signed files could be considered 'immutable'.
IMA uses a different keyring than EVM. While the EVM keyring should not
be updated after initialization and locked, the IMA keyring should allow
updating or adding new keys when upgrading or installing packages.
Changelog v4:
- Change IMA_DIGSIG to hex equivalent
Changelog v3:
- Permit files without any 'security.ima' xattr to be labeled properly.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
IMA-appraisal currently verifies the integrity of a file based on a
known 'good' measurement value. This patch reserves the first byte
of 'security.ima' as a place holder for the type of method used for
verifying file data integrity.
Changelog v1:
- Use the newly defined 'struct evm_ima_xattr_data'
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@nokia.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Based on xattr_permission comments, the restriction to modify 'security'
xattr is left up to the underlying fs or lsm. Ensure that not just anyone
can modify or remove 'security.ima'.
Changelog v1:
- Unless IMA-APPRAISE is configured, use stub ima_inode_removexattr()/setxattr()
functions. (Moved ima_inode_removexattr()/setxattr() to ima_appraise.c)
Changelog:
- take i_mutex to fix locking (Dmitry Kasatkin)
- ima_reset_appraise_flags should only be called when modifying or
removing the 'security.ima' xattr. Requires CAP_SYS_ADMIN privilege.
(Incorporated fix from Roberto Sassu)
- Even if allowed to update security.ima, reset the appraisal flags,
forcing re-appraisal.
- Replace CAP_MAC_ADMIN with CAP_SYS_ADMIN
- static inline ima_inode_setxattr()/ima_inode_removexattr() stubs
- ima_protect_xattr should be static
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
For performance, replace the iint spinlock with rwlock/read_lock.
Eric Paris questioned this change, from spinlocks to rwlocks, saying
"rwlocks have been shown to actually be slower on multi processor
systems in a number of cases due to the cache line bouncing required."
Based on performance measurements compiling the kernel on a cold
boot with multiple jobs with/without this patch, Dmitry Kasatkin
and I found that rwlocks performed better than spinlocks, but very
insignificantly. For example with total compilation time around 6
minutes, with rwlocks time was 1 - 3 seconds shorter... but always
like that.
Changelog v2:
- new patch taken from the 'allocating iint improvements' patch
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
With IMA-appraisal's removal of the iint mutex and taking the i_mutex
instead, allocating the iint becomes a lot simplier, as we don't need
to be concerned with two processes racing to allocate the iint. This
patch cleans up and improves performance for allocating the iint.
- removed redundant double i_mutex locking
- combined iint allocation with tree search
Changelog v2:
- removed the rwlock/read_lock changes from this patch
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Unlike the IMA measurement policy, the appraise policy can not be dependent
on runtime process information, such as the task uid, as the 'security.ima'
xattr is written on file close and must be updated each time the file changes,
regardless of the current task uid.
This patch extends the policy language with 'fowner', defines an appraise
policy, which appraises all files owned by root, and defines 'ima_appraise_tcb',
a new boot command line option, to enable the appraise policy.
Changelog v3:
- separate the measure from the appraise rules in order to support measuring
without appraising and appraising without measuring.
- change appraisal default for filesystems without xattr support to fail
- update default appraise policy for cgroups
Changelog v1:
- don't appraise RAMFS (Dmitry Kasatkin)
- merged rest of "ima: ima_must_appraise_or_measure API change" commit
(Dmtiry Kasatkin)
ima_must_appraise_or_measure() called ima_match_policy twice, which
searched the policy for a matching rule. Once for a matching measurement
rule and subsequently for an appraisal rule. Searching the policy twice
is unnecessary overhead, which could be noticeable with a large policy.
The new version of ima_must_appraise_or_measure() does everything in a
single iteration using a new version of ima_match_policy(). It returns
IMA_MEASURE, IMA_APPRAISE mask.
With the use of action mask only one efficient matching function
is enough. Removed other specific versions of matching functions.
Changelog:
- change 'owner' to 'fowner' to conform to the new LSM conditions posted by
Roberto Sassu.
- fix calls to ima_log_string()
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
IMA currently maintains an integrity measurement list used to assert the
integrity of the running system to a third party. The IMA-appraisal
extension adds local integrity validation and enforcement of the
measurement against a "good" value stored as an extended attribute
'security.ima'. The initial methods for validating 'security.ima' are
hashed based, which provides file data integrity, and digital signature
based, which in addition to providing file data integrity, provides
authenticity.
This patch creates and maintains the 'security.ima' xattr, containing
the file data hash measurement. Protection of the xattr is provided by
EVM, if enabled and configured.
Based on policy, IMA calls evm_verifyxattr() to verify a file's metadata
integrity and, assuming success, compares the file's current hash value
with the one stored as an extended attribute in 'security.ima'.
Changelov v4:
- changed iint cache flags to hex values
Changelog v3:
- change appraisal default for filesystems without xattr support to fail
Changelog v2:
- fix audit msg 'res' value
- removed unused 'ima_appraise=' values
Changelog v1:
- removed unused iint mutex (Dmitry Kasatkin)
- setattr hook must not reset appraised (Dmitry Kasatkin)
- evm_verifyxattr() now differentiates between no 'security.evm' xattr
(INTEGRITY_NOLABEL) and no EVM 'protected' xattrs included in the
'security.evm' (INTEGRITY_NOXATTRS).
- replace hash_status with ima_status (Dmitry Kasatkin)
- re-initialize slab element ima_status on free (Dmitry Kasatkin)
- include 'security.ima' in EVM if CONFIG_IMA_APPRAISE, not CONFIG_IMA
- merged half "ima: ima_must_appraise_or_measure API change" (Dmitry Kasatkin)
- removed unnecessary error variable in process_measurement() (Dmitry Kasatkin)
- use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured
(moved ima_inode_post_setattr() to ima_appraise.c)
- make sure ima_collect_measurement() can read file
Changelog:
- add 'iint' to evm_verifyxattr() call (Dimitry Kasatkin)
- fix the race condition between chmod, which takes the i_mutex and then
iint->mutex, and ima_file_free() and process_measurement(), which take
the locks in the reverse order, by eliminating iint->mutex. (Dmitry Kasatkin)
- cleanup of ima_appraise_measurement() (Dmitry Kasatkin)
- changes as a result of the iint not allocated for all regular files, but
only for those measured/appraised.
- don't try to appraise new/empty files
- expanded ima_appraisal description in ima/Kconfig
- IMA appraise definitions required even if IMA_APPRAISE not enabled
- add return value to ima_must_appraise() stub
- unconditionally set status = INTEGRITY_PASS *after* testing status,
not before. (Found by Joe Perches)
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
When running a 64-bit kernel and receiving prctls from a 32-bit
userspace, the "-1" used as an unsigned long will end up being
misdetected. The kernel is looking for 0xffffffffffffffff instead of
0xffffffff. Since prctl lacks a distinct compat interface, Yama needs
to handle this translation itself. As such, support either value as
meaning PR_SET_PTRACER_ANY, to avoid breaking the ABI for 64-bit.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Unconditionally call Yama when CONFIG_SECURITY_YAMA_STACKED is selected,
no matter what LSM module is primary.
Ubuntu and Chrome OS already carry patches to do this, and Fedora
has voiced interest in doing this as well. Instead of having multiple
distributions (or LSM authors) carrying these patches, just allow Yama
to be called unconditionally when selected by the new CONFIG.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Commit 4fdef2183e ("AppArmor: Cleanup make
file to remove cruft and make it easier to read") removed all traces of
af_names.h from the tree. Remove its entry in AppArmor's .gitignore file
too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Move the tpm_get_random api from the trusted keys code into the TPM
device driver itself so that other callers can make use of it. Also,
change the api slightly so that the number of bytes read is returned in
the call, since the TPM command can potentially return fewer bytes than
requested.
Acked-by: David Safford <safford@linux.vnet.ibm.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
convert all users to system[_freezable]_wq.
If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
Please use system[_freezable]_wq instead.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
The core ptrace access checking routine holds a task lock, and when
reporting a failure, Yama takes a separate task lock. To avoid a
potential deadlock with two ptracers taking the opposite locks, do not
use get_task_comm() and just use ->comm directly since accuracy is not
important for the report.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The higher ptrace restriction levels should be blocking even
PTRACE_TRACEME requests. The comments in the LSM documentation are
misleading about when the checks happen (the parent does not go through
security_ptrace_access_check() on a PTRACE_TRACEME call).
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # 3.5.x and later
Signed-off-by: James Morris <james.l.morris@oracle.com>
Failing to allocate a cache entry will only harm performance not
correctness. Do not consume valuable reserve pages for something like
that.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge Andrew's first set of patches:
"Non-MM patches:
- lots of misc bits
- tree-wide have_clk() cleanups
- quite a lot of printk tweaks. I draw your attention to "printk:
convert the format for KERN_<LEVEL> to a 2 byte pattern" which
looks a bit scary. But afaict it's solid.
- backlight updates
- lib/ feature work (notably the addition and use of memweight())
- checkpatch updates
- rtc updates
- nilfs updates
- fatfs updates (partial, still waiting for acks)
- kdump, proc, fork, IPC, sysctl, taskstats, pps, etc
- new fault-injection feature work"
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
drivers/misc/lkdtm.c: fix missing allocation failure check
lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
fault-injection: add tool to run command with failslab or fail_page_alloc
fault-injection: add selftests for cpu and memory hotplug
powerpc: pSeries reconfig notifier error injection module
memory: memory notifier error injection module
PM: PM notifier error injection module
cpu: rewrite cpu-notifier-error-inject module
fault-injection: notifier error injection
c/r: fcntl: add F_GETOWNER_UIDS option
resource: make sure requested range is included in the root range
include/linux/aio.h: cpp->C conversions
fs: cachefiles: add support for large files in filesystem caching
pps: return PTR_ERR on error in device_create
taskstats: check nla_reserve() return
sysctl: suppress kmemleak messages
ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
ipc: compat: use signed size_t types for msgsnd and msgrcv
ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
ipc: add COMPAT_SHMLBA support
...
When we restore file descriptors we would like them to look exactly as
they were at dumping time.
With help of fcntl it's almost possible, the missing snippet is file
owners UIDs.
To be able to read their values the F_GETOWNER_UIDS is introduced.
This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise
returning -EINVAL.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OK, what we have so far is e.g.
setxattr(path, name, whatever, 0, XATTR_REPLACE)
with name being good enough to get through xattr_permission().
Then we reach security_inode_setxattr() with the desired value and size.
Aha. name should begin with "security.selinux", or we won't get that
far in selinux_inode_setxattr(). Suppose we got there and have enough
permissions to relabel that sucker. We call security_context_to_sid()
with value == NULL, size == 0. OK, we want ss_initialized to be non-zero.
I.e. after everything had been set up and running. No problem...
We do 1-byte kmalloc(), zero-length memcpy() (which doesn't oops, even
thought the source is NULL) and put a NUL there. I.e. form an empty
string. string_to_context_struct() is called and looks for the first
':' in there. Not found, -EINVAL we get. OK, security_context_to_sid_core()
has rc == -EINVAL, force == 0, so it silently returns -EINVAL.
All it takes now is not having CAP_MAC_ADMIN and we are fucked.
All right, it might be a different bug (modulo strange code quoted in the
report), but it's real. Easily fixed, AFAICS:
Deal with size == 0, value == NULL case in selinux_inode_setxattr()
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Dave Jones <davej@redhat.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Consider the input case of a rule that consists entirely of non space
symbols followed by a \0. Say 64 + \0
In this case strlen(data) = 64
kzalloc of subject and object are 64 byte objects
sscanfdata, "%s %s %s", subject, ...)
will put 65 bytes into subject.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
<linux/types.h> after including <sys/select.h>. A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.
It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely. The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines. Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.
Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.
Reported-by: Jeff Law <law@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking changes from David S Miller:
1) Remove the ipv4 routing cache. Now lookups go directly into the FIB
trie and use prebuilt routes cached there.
No more garbage collection, no more rDOS attacks on the routing
cache. Instead we now get predictable and consistent performance,
no matter what the pattern of traffic we service.
This has been almost 2 years in the making. Special thanks to
Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
have helped along the way.
I'm sure that with a change of this magnitude there will be some
kind of fallout, but such things ought the be simple to fix at this
point. Luckily I'm not European so I'll be around all of August to
fix things :-)
The major stages of this work here are each fronted by a forced
merge commit whose commit message contains a top-level description
of the motivations and implementation issues.
2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
input.
3) TCP SYN/ACK performance tweaks from Eric Dumazet.
4) Add namespace support for netfilter L4 conntrack helpers, from Gao
Feng.
5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
Yuval Mintz.
6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.
7) Support for connection tracker helpers in userspace, from Pablo
Neira Ayuso.
8) Allow userspace driven TX load balancing functions in TEAM driver,
from Jiri Pirko.
9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
embedded gotos.
10) TCP Small Queues, essentially minimize the amount of TCP data queued
up in the packet scheduler layer. Whereas the existing BQL (Byte
Queue Limits) limits the pkt_sched --> netdevice queuing levels,
this controls the TCP --> pkt_sched queueing levels.
From Eric Dumazet.
11) Reduce the number of get_page/put_page ops done on SKB fragments,
from Alexander Duyck.
12) Implement protection against blind resets in TCP (RFC 5961), from
Eric Dumazet.
13) Support the client side of TCP Fast Open, basically the ability to
send data in the SYN exchange, from Yuchung Cheng.
Basically, the sender queues up data with a sendmsg() call using
MSG_FASTOPEN, then they do the connect() which emits the queued up
fastopen data.
14) Avoid all the problems we get into in TCP when timers or PMTU events
hit a locked socket. The TCP Small Queues changes added a
tcp_release_cb() that allows us to queue work up to the
release_sock() caller, and that's what we use here too. From Eric
Dumazet.
15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
r8169: revert "add byte queue limit support".
ipv4: Change rt->rt_iif encoding.
net: Make skb->skb_iif always track skb->dev
ipv4: Prepare for change of rt->rt_iif encoding.
ipv4: Remove all RTCF_DIRECTSRC handliing.
ipv4: Really ignore ICMP address requests/replies.
decnet: Don't set RTCF_DIRECTSRC.
net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
ipv4: Remove redundant assignment
rds: set correct msg_namelen
openvswitch: potential NULL deref in sample()
tcp: dont drop MTU reduction indications
bnx2x: Add new 57840 device IDs
tcp: avoid oops in tcp_metrics and reset tcpm_stamp
niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
niu: Fix to check for dma mapping errors.
net: Fix references to out-of-scope variables in put_cmsg_compat()
net: ethernet: davinci_emac: add pm_runtime support
net: ethernet: davinci_emac: Remove unnecessary #include
...
Pull security subsystem updates from James Morris:
"Nothing groundbreaking for this kernel, just cleanups and fixes, and a
couple of Smack enhancements."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
Smack: Maintainer Record
Smack: don't show empty rules when /smack/load or /smack/load2 is read
Smack: user access check bounds
Smack: onlycap limits on CAP_MAC_ADMIN
Smack: fix smack_new_inode bogosities
ima: audit is compiled only when enabled
ima: ima_initialized is set only if successful
ima: add policy for pseudo fs
ima: remove unused cleanup functions
ima: free securityfs violations file
ima: use full pathnames in measurement list
security: Fix nommu build.
samples: seccomp: add .gitignore for untracked executables
tpm: check the chip reference before using it
TPM: fix memleak when register hardware fails
TPM: chip disabled state erronously being reported as error
MAINTAINERS: TPM maintainers' contacts update
Merge branches 'next-queue' and 'next' into next
Remove unused code from MPI library
Revert "crypto: GnuPG based MPI lib - additional sources (part 4)"
...
Pull the big VFS changes from Al Viro:
"This one is *big* and changes quite a few things around VFS. What's in there:
- the first of two really major architecture changes - death to open
intents.
The former is finally there; it was very long in making, but with
Miklos getting through really hard and messy final push in
fs/namei.c, we finally have it. Unlike his variant, this one
doesn't introduce struct opendata; what we have instead is
->atomic_open() taking preallocated struct file * and passing
everything via its fields.
Instead of returning struct file *, it returns -E... on error, 0
on success and 1 in "deal with it yourself" case (e.g. symlink
found on server, etc.).
See comments before fs/namei.c:atomic_open(). That made a lot of
goodies finally possible and quite a few are in that pile:
->lookup(), ->d_revalidate() and ->create() do not get struct
nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
flags instead, ->create() gets "do we want it exclusive" flag.
With the introduction of new helper (kern_path_locked()) we are rid
of all struct nameidata instances outside of fs/namei.c; it's still
visible in namei.h, but not for long. Come the next cycle,
declaration will move either to fs/internal.h or to fs/namei.c
itself. [me, miklos, hch]
- The second major change: behaviour of final fput(). Now we have
__fput() done without any locks held by caller *and* not from deep
in call stack.
That obviously lifts a lot of constraints on the locking in there.
Moreover, it's legal now to call fput() from atomic contexts (which
has immediately simplified life for aio.c). We also don't need
anti-recursion logics in __scm_destroy() anymore.
There is a price, though - the damn thing has become partially
asynchronous. For fput() from normal process we are guaranteed
that pending __fput() will be done before the caller returns to
userland, exits or gets stopped for ptrace.
For kernel threads and atomic contexts it's done via
schedule_work(), so theoretically we might need a way to make sure
it's finished; so far only one such place had been found, but there
might be more.
There's flush_delayed_fput() (do all pending __fput()) and there's
__fput_sync() (fput() analog doing __fput() immediately). I hope
we won't need them often; see warnings in fs/file_table.c for
details. [me, based on task_work series from Oleg merged last
cycle]
- sync series from Jan
- large part of "death to sync_supers()" work from Artem; the only
bits missing here are exofs and ext4 ones. As far as I understand,
those are going via the exofs and ext4 trees resp.; once they are
in, we can put ->write_super() to the rest, along with the thread
calling it.
- preparatory bits from unionmount series (from dhowells).
- assorted cleanups and fixes all over the place, as usual.
This is not the last pile for this cycle; there's at least jlayton's
ESTALE work and fsfreeze series (the latter - in dire need of fixes,
so I'm not sure it'll make the cut this cycle). I'll probably throw
symlink/hardlink restrictions stuff from Kees into the next pile, too.
Plus there's a lot of misc patches I hadn't thrown into that one -
it's large enough as it is..."
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
switch dentry_open() to struct path, make it grab references itself
spufs: shift dget/mntget towards dentry_open()
zoran: don't bother with struct file * in zoran_map
ecryptfs: don't reinvent the wheels, please - use struct completion
don't expose I_NEW inodes via dentry->d_inode
tidy up namei.c a bit
unobfuscate follow_up() a bit
ext3: pass custom EOF to generic_file_llseek_size()
ext4: use core vfs llseek code for dir seeks
vfs: allow custom EOF in generic_file_llseek code
vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
vfs: Remove unnecessary flushing of block devices
vfs: Make sys_sync writeout also block device inodes
vfs: Create function for iterating over block devices
vfs: Reorder operations during sys_sync
quota: Move quota syncing to ->sync_fs method
quota: Split dquot_quota_sync() to writeback and cache flushing part
vfs: Move noop_backing_dev_info check from sync into writeback
...
task_work and rcu_head are identical now; merge them (calling the result
struct callback_head, rcu_head #define'd to it), kill separate allocation
in security/keys since we can just use cred->rcu now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
get rid of the only user of ->data; this is _not_ the final variant - in the
end we'll have task_work and rcu_head identical and just use cred->rcu,
at which point the separate allocation will be gone completely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull SELinux regression fixes from James Morris.
Andrew Morton has a box that hit that open perms problem.
I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit d9914cf661
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
SELinux: do not check open perms if they are not known to policy
SELinux: include definition of new capabilities
When I introduced open perms policy didn't understand them and I
implemented them as a policycap. When I added the checking of open perm
to truncate I forgot to conditionalize it on the userspace defined
policy capability. Running an old policy with a new kernel will not
check open on open(2) but will check it on truncate. Conditionalize the
truncate check the same as the open check.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: James Morris <james.l.morris@oracle.com>
The kernel has added CAP_WAKE_ALARM and CAP_EPOLLWAKEUP. We need to
define these in SELinux so they can be mediated by policy.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch removes empty rules (i.e. with access set to '-') from the
rule list presented to user space.
Smack by design never removes labels nor rules from its lists. Access
for a rule may be set to '-' to effectively disable it. Such rules would
show up in the listing generated when /smack/load or /smack/load2 is
read. This may cause clutter if many rules were disabled.
As a rule with access set to '-' is equivalent to no rule at all, they
may be safely hidden from the listing.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Some of the bounds checking used on the /smack/access
interface was lost when support for long labels was
added. No kernel access checks are affected, however
this is a case where /smack/access could be used
incorrectly and fail to detect the error. This patch
reintroduces the original checks.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Smack is integrated with the POSIX capabilities scheme,
using the capabilities CAP_MAC_OVERRIDE and CAP_MAC_ADMIN to
determine if a process is allowed to ignore Smack checks or
change Smack related data respectively. Smack provides an
additional restriction that if an onlycap value is set
by writing to /smack/onlycap only tasks with that Smack
label are allowed to use CAP_MAC_OVERRIDE.
This change adds CAP_MAC_ADMIN as a capability that is affected
by the onlycap mechanism.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
In January of 2012 Al Viro pointed out three bits of code that
he titled "new_inode_smack bogosities". This patch repairs these
errors.
1. smack_sb_kern_mount() included a NULL check that is impossible.
The check and NULL case are removed.
2. smack_kb_kern_mount() included pointless locking. The locking is
removed. Since this is the only place that lock was used the lock
is removed from the superblock_smack structure.
3. smk_fill_super() incorrectly and unnecessarily set the Smack label
for the smackfs root inode. The assignment has been removed.
Targeted for git://gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
IMA auditing code was compiled even when CONFIG_AUDIT was not enabled.
This patch compiles auditing code only when possible and enabled.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Set ima_initialized only if initialization was successful.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:
security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in
include backing-dev.h directly to fix it up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
IMA cannot be used as module and does not need __exit functions.
Removed them.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The IMA measurement list contains filename hints, which can be
ambigious without the full pathname. This patch replaces the
filename hint with the full pathname, simplifying for userspace
the correlating of file hash measurements with files.
Change log v1:
- Revert to short filenames, when full pathname is longer than IMA
measurement buffer size. (Based on Dmitry's review)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:
security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in
include backing-dev.h directly to fix it up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This patch adds the following structure:
struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};
That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.
I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.
That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.
This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...
Pull second pile of signal handling patches from Al Viro:
"This one is just task_work_add() series + remaining prereqs for it.
There probably will be another pull request from that tree this
cycle - at least for helpers, to get them out of the way for per-arch
fixes remaining in the tree."
Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
had brought in commit 97fd75b7b8 ("kernel/irq/manage.c: use the
pr_foo() infrastructure to prefix printks") which changed one of the
pr_err() calls that this merge moves around.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
keys: kill task_struct->replacement_session_keyring
keys: kill the dummy key_replace_session_keyring()
keys: change keyctl_session_to_parent() to use task_work_add()
genirq: reimplement exit_irq_thread() hook via task_work_add()
task_work_add: generic process-context callbacks
avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
parisc: need to check NOTIFY_RESUME when exiting from syscall
move key_repace_session_keyring() into tracehook_notify_resume()
TIF_NOTIFY_RESUME is defined on all targets now
A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.
Rather than having an additional check_access parameter to these
functions, the first paramater type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to. This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allocation may be large. The code is probing to see if it will
succeed and if not, it falls back to vmalloc(). We should suppress any
page-allocation failure messages when the fallback happens.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a) %d does _not_ produce a page worth of output
b) snprintf() doesn't return negatives - it used to in old glibc, but
that's the kernel...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix some sparse warnings in the keyrings code:
(1) compat_keyctl_instantiate_key_iov() should be static.
(2) There were a couple of places where a pointer was being compared against
integer 0 rather than NULL.
(3) keyctl_instantiate_key_common() should not take a __user-labelled iovec
pointer as the caller must have copied the iovec to kernel space.
(4) __key_link_begin() takes and __key_link_end() releases
keyring_serialise_link_sem under some circumstances and so this should be
declared.
Note that adding __acquires() and __releases() for this doesn't help cure
the warnings messages - something only commenting out both helps.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Change keyctl_session_to_parent() to use task_work_add() and move
key_replace_session_keyring() logic into task_work->func().
Note that we do task_work_cancel() before task_work_add() to ensure that
only one work can be pending at any time. This is important, we must not
allow user-space to abuse the parent's ->task_works list.
The callback, replace_session_keyring(), checks PF_EXITING. I guess this
is not really needed but looks better.
As a side effect, this fixes the (unlikely) race. The callers of
key_replace_session_keyring() and keyctl_session_to_parent() lack the
necessary barriers, the parent can miss the request.
Now we can remove task_struct->replacement_session_keyring and related
code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
Pull cgroup updates from Tejun Heo:
"cgroup file type addition / removal is updated so that file types are
added and removed instead of individual files so that dynamic file
type addition / removal can be implemented by cgroup and used by
controllers. blkio controller changes which will come through block
tree are dependent on this. Other changes include res_counter cleanup
and disallowing kthread / PF_THREAD_BOUND threads to be attached to
non-root cgroups.
There's a reported bug with the file type addition / removal handling
which can lead to oops on cgroup umount. The issue is being looked
into. It shouldn't cause problems for most setups and isn't a
security concern."
Fix up trivial conflict in Documentation/feature-removal-schedule.txt
* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
res_counter: Account max_usage when calling res_counter_charge_nofail()
res_counter: Merge res_counter_charge and res_counter_charge_nofail
cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
cgroup: remove cgroup_subsys->populate()
cgroup: get rid of populate for memcg
cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
cgroup: make css->refcnt clearing on cgroup removal optional
cgroup: use negative bias on css->refcnt to block css_tryget()
cgroup: implement cgroup_rm_cftypes()
cgroup: introduce struct cfent
cgroup: relocate __d_cgrp() and __d_cft()
cgroup: remove cgroup_add_file[s]()
cgroup: convert memcg controller to the new cftype interface
memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
cgroup: convert all non-memcg controllers to the new cftype interface
cgroup: relocate cftype and cgroup_subsys definitions in controllers
cgroup: merge cft_release_agent cftype array into the base files array
cgroup: implement cgroup_add_cftypes() and friends
cgroup: build list of all cgroups under a given cgroupfs_root
cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
...
Pull security subsystem updates from James Morris:
"New notable features:
- The seccomp work from Will Drewry
- PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
- Longer security labels for Smack from Casey Schaufler
- Additional ptrace restriction modes for Yama by Kees Cook"
Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
apparmor: fix long path failure due to disconnected path
apparmor: fix profile lookup for unconfined
ima: fix filename hint to reflect script interpreter name
KEYS: Don't check for NULL key pointer in key_validate()
Smack: allow for significantly longer Smack labels v4
gfp flags for security_inode_alloc()?
Smack: recursive tramsmute
Yama: replace capable() with ns_capable()
TOMOYO: Accept manager programs which do not start with / .
KEYS: Add invalidation support
KEYS: Do LRU discard in full keyrings
KEYS: Permit in-place link replacement in keyring list
KEYS: Perform RCU synchronisation on keys prior to key destruction
KEYS: Announce key type (un)registration
KEYS: Reorganise keys Makefile
KEYS: Move the key config into security/keys/Kconfig
KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
Yama: remove an unused variable
samples/seccomp: fix dependencies on arch macros
Yama: add additional ptrace scopes
...
BugLink: http://bugs.launchpad.net/bugs/955892
All failures from __d_path where being treated as disconnected paths,
however __d_path can also fail when the generated pathname is too long.
The initial ENAMETOOLONG error was being lost, and ENAMETOOLONG was only
returned if the subsequent dentry_path call resulted in that error. Other
wise if the path was split across a mount point such that the dentry_path
fit within the buffer when the __d_path did not the failure was treated
as a disconnected path.
Signed-off-by: John Johansen <john.johansen@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/978038
also affects apparmor portion of
BugLink: http://bugs.launchpad.net/bugs/987371
The unconfined profile is not stored in the regular profile list, but
change_profile and exec transitions may want access to it when setting
up specialized transitions like switch to the unconfined profile of a
new policy namespace.
Signed-off-by: John Johansen <john.johansen@canonical.com>
When IMA was first upstreamed, the bprm filename and interp were
always the same. Currently, the bprm->filename and bprm->interp
are the same, except for when only bprm->interp contains the
interpreter name. So instead of using the bprm->filename as
the IMA filename hint in the measurement list, we could replace
it with bprm->interp, but this feels too fragil.
The following patch is not much better, but at least there is some
indication that sometimes we're passing the filename and other times
the interpreter name.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Don't bother checking for NULL key pointer in key_validate() as all of the
places that call it will crash anyway if the relevant key pointer is NULL by
the time they call key_validate(). Therefore, the checking must be done prior
to calling here.
Whilst we're at it, simplify the key_validate() function a bit and mark its
argument const.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
V4 updated to current linux-security#next
Targeted for git://gitorious.org/smack-next/kernel.git
Modern application runtime environments like to use
naming schemes that are structured and generated without
human intervention. Even though the Smack limit of 23
characters for a label name is perfectly rational for
human use there have been complaints that the limit is
a problem in environments where names are composed from
a set or sources, including vendor, author, distribution
channel and application name. Names like
softwarehouse-pgwodehouse-coolappstore-mellowmuskrats
are becoming harder to avoid. This patch introduces long
label support in Smack. Labels are now limited to 255
characters instead of the old 23.
The primary reason for limiting the labels to 23 characters
was so they could be directly contained in CIPSO category sets.
This is still done were possible, but for labels that are too
large a mapping is required. This is perfectly safe for communication
that stays "on the box" and doesn't require much coordination
between boxes beyond what would have been required to keep label
names consistent.
The bulk of this patch is in smackfs, adding and updating
administrative interfaces. Because existing APIs can't be
changed new ones that do much the same things as old ones
have been introduced.
The Smack specific CIPSO data representation has been removed
and replaced with the data format used by netlabel. The CIPSO
header is now computed when a label is imported rather than
on use. This results in improved IP performance. The smack
label is now allocated separately from the containing structure,
allowing for larger strings.
Four new /smack interfaces have been introduced as four
of the old interfaces strictly required labels be specified
in fixed length arrays.
The access interface is supplemented with the check interface:
access "Subject Object rwxat"
access2 "Subject Object rwaxt"
The load interface is supplemented with the rules interface:
load "Subject Object rwxat"
load2 "Subject Object rwaxt"
The load-self interface is supplemented with the self-rules interface:
load-self "Subject Object rwxat"
load-self2 "Subject Object rwaxt"
The cipso interface is supplemented with the wire interface:
cipso "Subject lvl cnt c1 c2 ..."
cipso2 "Subject lvl cnt c1 c2 ..."
The old interfaces are maintained for compatibility.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Dave Chinner wrote:
> Yes, because you have no idea what the calling context is except
> for the fact that is from somewhere inside filesystem code and the
> filesystem could be holding locks. Therefore, GFP_NOFS is really the
> only really safe way to allocate memory here.
I see. Thank you.
I'm not sure, but can call trace happen where somewhere inside network
filesystem or stackable filesystem code with locks held invokes operations that
involves GFP_KENREL memory allocation outside that filesystem?
----------
[PATCH] SMACK: Fix incorrect GFP_KERNEL usage.
new_inode_smack() which can be called from smack_inode_alloc_security() needs
to use GFP_NOFS like SELinux's inode_alloc_security() does, for
security_inode_alloc() is called from inode_init_always() and
inode_init_always() is called from xfs_inode_alloc() which is using GFP_NOFS.
smack_inode_init_security() needs to use GFP_NOFS like
selinux_inode_init_security() does, for initxattrs() callback function (e.g.
btrfs_initxattrs()) which is called from security_inode_init_security() is
using GFP_NOFS.
smack_audit_rule_match() needs to use GFP_ATOMIC, for
security_audit_rule_match() can be called from audit_filter_user_rules() and
audit_filter_user_rules() is called from audit_filter_user() with RCU read lock
held.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <cschaufler@cschaufler-intel.(none)>
The transmuting directory feature of Smack requires that
the transmuting attribute be explicitly set in all cases.
It seems the users of this facility would expect that the
transmuting attribute be inherited by subdirectories that
are created in a transmuting directory. This does not seem
to add any additional complexity to the understanding of
how the system works.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
When checking capabilities, the question we want to be asking is "does
current() have the capability in the child's namespace?"
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The pathname of /usr/sbin/tomoyo-editpolicy seen from Ubuntu 12.04 Live CD is
squashfs:/usr/sbin/tomoyo-editpolicy rather than /usr/sbin/tomoyo-editpolicy .
Therefore, we need to accept manager programs which do not start with / .
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.
It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.
To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict. It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.
The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.
Signed-off-by: David Howells <dhowells@redhat.com>
Do an LRU discard in keyrings that are full rather than returning ENFILE. To
perform this, a time_t is added to the key struct and updated by the creation
of a link to a key and by a key being found as the result of a search. At the
completion of a successful search, the keyrings in the path between the root of
the search and the first found link to it also have their last-used times
updated.
Note that discarding a link to a key from a keyring does not necessarily
destroy the key as there may be references held by other places.
An alternate discard method that might suffice is to perform FIFO discard from
the keyring, using the spare 2-byte hole in the keylist header as the index of
the next link to be discarded.
This is useful when using a keyring as a cache for DNS results or foreign
filesystem IDs.
This can be tested by the following. As root do:
echo 1000 >/proc/sys/kernel/keys/root_maxkeys
kr=`keyctl newring foo @s`
for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done
Without this patch ENFILE should be reported when the keyring fills up. With
this patch, the keyring discards keys in an LRU fashion. Note that the stored
LRU time has a granularity of 1s.
After doing this, /proc/key-users can be observed and should show that most of
the 2000 keys have been discarded:
[root@andromeda ~]# cat /proc/key-users
0: 517 516/516 513/1000 5249/20000
The "513/1000" here is the number of quota-accounted keys present for this user
out of the maximum permitted.
In /proc/keys, the keyring shows the number of keys it has and the number of
slots it has allocated:
[root@andromeda ~]# grep foo /proc/keys
200c64c4 I--Q-- 1 perm 3b3f0000 0 0 keyring foo: 509/509
The maximum is (PAGE_SIZE - header) / key pointer size. That's typically 509
on a 64-bit system and 1020 on a 32-bit system.
Signed-off-by: David Howells <dhowells@redhat.com>
Make use of the previous patch that makes the garbage collector perform RCU
synchronisation before destroying defunct keys. Key pointers can now be
replaced in-place without creating a new keyring payload and replacing the
whole thing as the discarded keys will not be destroyed until all currently
held RCU read locks are released.
If the keyring payload space needs to be expanded or contracted, then a
replacement will still need allocating, and the original will still have to be
freed by RCU.
Signed-off-by: David Howells <dhowells@redhat.com>
Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count. This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.
This is useful in keyring search vs key link. Consider a keyring containing a
link to a key. That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.
This permits __key_link() to replace a key without having to reallocate the key
payload. A key gets replaced if a new key being linked into a keyring has the
same type and description.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Announce the (un)registration of a key type in the core key code rather than
in the callers.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Reorganise the keys directory Makefile to put all the core bits together and
the type-specific bits after.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Move the key config into security/keys/Kconfig as there are going to be a lot
of key-related options.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.
This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.
Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPnb50AAoJEHm+PkMAQRiGAE0H/A4zFZIUGmF3miKPDYmejmrZ
oVDYxVAu6JHjHWhu8E3VsinvyVscowjV8dr15eSaQzmDmRkSHAnUQ+dB7Di7jLC2
MNopxsWjwyZ8zvvr3rFR76kjbWKk/1GYytnf7GPZLbJQzd51om2V/TY/6qkwiDSX
U8Tt7ihSgHAezefqEmWp2X/1pxDCEt+VFyn9vWpkhgdfM1iuzF39MbxSZAgqDQ/9
JJrBHFXhArqJguhENwL7OdDzkYqkdzlGtS0xgeY7qio2CzSXxZXK4svT6FFGA8Za
xlAaIvzslDniv3vR2ZKd6wzUwFHuynX222hNim3QMaYdXm012M+Nn1ufKYGFxI0=
=4d4w
-----END PGP SIGNATURE-----
Merge tag 'v3.4-rc5' into next
Linux 3.4-rc5
Merge to pull in prerequisite change for Smack:
86812bb0de
Requested by Casey.
- Use uid_eq when comparing kuids
Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
cred.h and a few trivial users of struct cred are changed. The rest of the users
of struct cred are left for other patches as there are too many changes to make
in one go and leave the change reviewable. If the user namespace is disabled and
CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile
and behave correctly.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values. Unless user namespaces are used this change should
have no effect.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
- Transform userns->creator from a user_struct reference to a simple
kuid_t, kgid_t pair.
In cap_capable this allows the check to see if we are the creator of
a namespace to become the classic suser style euid permission check.
This allows us to remove the need for a struct cred in the mapping
functions and still be able to dispaly the user namespace creators
uid and gid as 0.
- Remove the now unnecessary delayed_work in free_user_ns.
All that is left for free_user_ns to do is to call kmem_cache_free
and put_user_ns. Those functions can be called in any context
so call them directly from free_user_ns removing the need for delayed work.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
GCC complains that we don't use "one" any more after 389da25f93 "Yama:
add additional ptrace scopes".
security/yama/yama_lsm.c:322:12: warning: ?one? defined but not used
[-Wunused-variable]
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This expands the available Yama ptrace restrictions to include two more
modes. Mode 2 requires CAP_SYS_PTRACE for PTRACE_ATTACH, and mode 3
completely disables PTRACE_ATTACH (and locks the sysctl).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
A kernel with Smack enabled will fail if tmpfs has xattr support.
Move the initialization of predefined Smack label
list entries to the LSM initialization from the
smackfs setup. This became an issue when tmpfs
acquired xattr support, but was never correct.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Add support for AppArmor to explicitly fail requested domain transitions
if NO_NEW_PRIVS is set and the task is not unconfined.
Transitions from unconfined are still allowed because this always results
in a reduction of privileges.
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
v18: new acked-by, new description
Signed-off-by: James Morris <james.l.morris@oracle.com>
With this change, calling
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time. For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set. The same is true for file capabilities.
Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.
To determine if the NO_NEW_PRIVS bit is set, a task may call
prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)
This functionality is desired for the proposed seccomp filter patch
series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.
Another potential use is making certain privileged operations
unprivileged. For example, chroot may be considered "safe" if it cannot
affect privileged tasks.
Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use. It is fixed in a subsequent patch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
This fixes builds where CONFIG_AUDIT is not defined and
CONFIG_SECURITY_SMACK=y.
This got introduced by the stack-usage reducation commit 48c62af68a
("LSM: shrink the common_audit_data data union").
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
avc_add_callback now only called from initcalls, so replace the
weak GFP_ATOMIC to GFP_KERNEL, and mark this function __init
to make a warning when not been called from initcalls.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
We no longer need the distinction. We only need data after we decide to do an
audit. So turn the "late" audit data into just "data" and remove what we
currently have as "data".
Signed-off-by: Eric Paris <eparis@redhat.com>
It isn't needed. If you don't set the type of the data associated with
that type it is a pretty obvious programming bug. So why waste the cycles?
Signed-off-by: Eric Paris <eparis@redhat.com>
We did a lot of work to shrink the common_audit_data. Add a BUILD_BUG_ON
so future programers (let's be honest, probably me) won't do something
foolish like make it large again!
Signed-off-by: Eric Paris <eparis@redhat.com>
There are no legitimate users. Always use current and get back some stack
space for the common_audit_data.
Signed-off-by: Eric Paris <eparis@redhat.com>
apparmor is the only LSM that uses the common_audit_data tsk field.
Instead of making all LSMs pay for the stack space move the aa usage into
the apparmor_audit_data.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_inode_has_perm is a hot path. Instead of declaring the
common_audit_data on the stack move it to a noinline function only used in
the rare case we need to send an audit message.
Signed-off-by: Eric Paris <eparis@redhat.com>