mirror-linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	848b81415c	Merge branch 'akpm' (Andrew's patch-bomb) Merge misc patches from Andrew Morton: "Incoming: - lots of misc stuff - backlight tree updates - lib/ updates - Oleg's percpu-rwsem changes - checkpatch - rtc - aoe - more checkpoint/restart support I still have a pile of MM stuff pending - Pekka should be merging later today after which that is good to go. A number of other things are twiddling thumbs awaiting maintainer merges." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits) scatterlist: don't BUG when we can trivially return a proper error. docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output fs, fanotify: add @mflags field to fanotify output docs: add documentation about /proc/<pid>/fdinfo/<fd> output fs, notify: add procfs fdinfo helper fs, exportfs: add exportfs_encode_inode_fh() helper fs, exportfs: escape nil dereference if no s_export_op present fs, epoll: add procfs fdinfo helper fs, eventfd: add procfs fdinfo helper procfs: add ability to plug in auxiliary fdinfo providers tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test breakpoint selftests: print failure status instead of cause make error kcmp selftests: print fail status instead of cause make error kcmp selftests: make run_tests fix mem-hotplug selftests: print failure status instead of cause make error cpu-hotplug selftests: print failure status instead of cause make error mqueue selftests: print failure status instead of cause make error vm selftests: print failure status instead of cause make error ubifs: use prandom_bytes mtd: nandsim: use prandom_bytes ...	2012-12-17 20:58:12 -08:00
Roger Pau Monne	d62f691858	xen-blkfront: handle bvecs with partial data Currently blkfront fails to handle cases in blkif_completion like the following: 1st loop in rq_for_each_segment * bv_offset: 3584 * bv_len: 512 * offset += bv_len * i: 0 2nd loop: * bv_offset: 0 * bv_len: 512 * i: 0 In the second loop i should be 1, since we assume we only wanted to read a part of the previous page. This patches fixes this cases where only a part of the shared page is read, and blkif_completion assumes that if the bv_offset of a bvec is less than the previous bv_offset plus the bv_size we have to switch to the next shared page. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: linux-kernel@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-12-17 21:56:03 -05:00
Roger Pau Monne	ebb351cf78	llist/xen-blkfront: implement safe version of llist_for_each_entry Implement a safe version of llist_for_each_entry, and use it in blkif_free. Previously grants where freed while iterating the list, which lead to dereferences when trying to fetch the next item. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> [v2: Move the llist_for_each_entry_safe in llist.h] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-12-17 21:55:56 -05:00
Dan Carpenter	31279b1457	aoe: fix use after free in aoedev_by_aoeaddr() We should return NULL on failure instead of returning a freed pointer. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:26 -08:00
Ed Cashin	2b37c7d865	aoe: update internal version number to 81 This version number is printed to the console on module initialization and is available in sysfs, which is where the userland aoe-version tool looks for it. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:26 -08:00
Ed Cashin	bf29754ae8	aoe: identify source of runt AoE packets This change only affects experimental AoE storage networks. It modifies the console message about runt packets detected so that the AoE major and minor addresses of the AoE target that generated the runt are mentioned. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:26 -08:00
Ed Cashin	4a6c9ee93c	aoe: allow comma separator in aoe_iflist value By default, the aoe driver uses any ethernet interface for AoE, but the aoe_iflist module parameter provides a convenient way to limit AoE traffic to a specific list of local network interfaces. This change allows a list to be specified using the comma character as a separator. For example, modprobe aoe aoe_iflist=eth2,eth3 Before, it was inconvenient to get the quoting right in shell scripts when setting aoe_iflist to have more than one network interface. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:26 -08:00
Ed Cashin	c450ba0fc1	aoe: allow user to disable target failure timeout With this change, the aoe driver treats the value zero as special for the aoe_deadsecs module parameter. Normally, this value specifies the number of seconds during which the driver will continue to attempt retransmits to an unresponsive AoE target. After aoe_deadsecs has elapsed, the aoe driver marks the aoe device as "down" and fails all I/O. The new meaning of an aoe_deadsecs of zero is for the driver to retransmit commands indefinitely. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	71114ec45f	aoe: use dynamic number of remote ports for AoE storage target Many AoE targets have four or fewer network ports, but some existing storage devices have many, and the AoE protocol sets no limit. This patch allows the use of more than eight remote MAC addresses per AoE target, while reducing the amount of memory used by the aoe driver in cases where there are many AoE targets with fewer than eight MAC addresses each. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	e52a293264	aoe: avoid races between device destruction and discovery This change avoids a race that could result in a NULL pointer derference following a WARNing from kobject_add_internal, "don't try to register things with the same name in the same directory." The problem was found with a test that forgets and discovers an aoe device in a loop: while test ! -r /tmp/stop; do aoe-flush -a aoe-discover done The race was between aoedev_flush taking aoedevs out of the devlist, allowing a new discovery of the same AoE target to take place before the driver gets around to calling sysfs_remove_group. Fixing that one revealed another race between do_open and add_disk, and this patch avoids that, too. The fix required some care, because for flushing (forgetting) an aoedev, some of the steps must be performed under lock and some must be able to sleep. Also, for discovering a new aoedev, some steps might sleep. The check for a bad aoedev pointer remains from a time when about half of this patch was done, and it was possible for the bdev->bd_disk->private_data to become corrupted. The check should be removed eventually, but it is not expected to add significant overhead, occurring in the aoeblk_open routine. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	bbb44e30d0	aoe: improve handling of misbehaving network paths An AoE target can have multiple network ports used for AoE, and in the aoe driver, those are tracked by the aoetgt struct. These changes allow the aoe driver to handle network paths, or aoetgts, that are not working well, compared to the others. Paths that do not get responses despite the retransmission of AoE commands are marked as "tainted", and non-tainted paths are preferred. Meanwhile, the aoe driver attempts to "probe" the tainted path in the background by issuing reads of LBA 0 that are padded out to full (possibly jumbo-frame) size. If the probes get responses, then the path is "redeemed", and its taint is removed. This mechanism has been shown to be helpful in transparently handling and recovering from real-world network "brown outs" in ways that the earlier "shoot the help-needing target in the head" mechanism could not. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	b91316f2b7	aoe: return real minor number for static minors The value returned by the static minor device number number allocator is the real minor number, so it must be multiplied by the supported number of partitions per aoedev. Without this fix the support for systems without udev is incomplete, and the few users of aoe on such systems will have surprising results when device nodes names do not match the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	10935d052e	aoe: initialize sysminor to avoid compiler warning Because the minor_get and related functions use the return values for errors, the compiler doesn't know that sysminor will always either 1) be initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be unused as the "goto out" is executed. This patch avoids the compiler warning. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	e0b2bbab0b	aoe: make error messages more specific in static minor allocation For some special-purpose systems where udev isn't present, static allocation of minor numbers is desirable. This update distinguishes different failure scenarios, to help the user understand what went wrong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	60116cf773	aoe: remove call to request handler from I/O completion There is no need to call the request handler function in the I/O completion routine. The user impact of not doing it is a more "nice" aoe driver that is less susceptible to causing soft lockups. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	72837600ee	aoe: cleanup: correct comment for aoetgt nout A misplaced comment was attached to the nout member of the aoetgt. This change corrects the comment. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	7b6ccc5f97	aoe: increase default cap on outstanding AoE commands in the network The aoe driver will never be waiting for more than aoe_maxout AoE commands from a given remote network port on an AoE target. Increasing the cap increases performance. Users can tighten the setting to reduce the amount of memory used for handling AoE traffic or the network bandwidth used for AoE. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	0a41409c51	aoe: remove vestigial request queue allocation Before the aoe driver was an I/O request handler, it was a make_request-style block driver. Even so, there was a problem where sysfs expected a request queue to exist, so one was provided in commit `7135a71b19` ("aoe: allocate unused request_queue for sysfs"). During the transition to the request-handler style, a patch was merged that was based on a driver without the noop queue, and the noop queue remained in place after the patch was merged, even though a new functional queue was introduced by the patch, allocated through blk_init_queue. The user impact is a memory leak proportional to the number of AoE targets discovered. This patch removes the memory leak and cleans up vestiges of the old do-nothing queue from the aoeblk_gdalloc function. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	fe7252bf51	aoe: copy fallback timing information on destination failover Commit f3b8e07af774 ("aoe: commands in retransmit queue use new destination on failure") omits the copying of the coarse-grained time when an AoE command was sent during the failover from one destination MAC address on the AoE target to another. The coarse-grained timing is only used when the system time changes or an unlikely length of time has passed since the sending of the AoE command. Users will not be impacted unless their system clock is very inaccurate or something unusual (e.g., 10 GbE link reset) happens during the period when the aoe driver is handling the failure of a port on the AoE target. Being effected will mean that an AoE target could be considered "down" too eagerly. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	519b77b032	aoe: update driver-internal version to 64+ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	3fc9b03248	aoe: commands in retransmit queue use new destination on failure When one remote MAC address isn't working as a destination for AoE commands, the frames used to track information associated with the AoE commands are moved to a new aoetgt (defined by the tuple of {AoE major, AoE minor, target MAC address}). This patch makes sure that the frames on the queue for retransmits that need to be done are updated to use the new destination, so that retransmits will be sent through a working network path. Without this change, packets on the retransmit queue will be needlessly retransmitted to the unresponsive destination MAC, possibly causing premature target failure before there's time for the retransmit timer to run again, decide to retransmit again, and finally update the destination to a working MAC address on the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	5f0c9c48e7	aoe: use high-resolution RTTs with fallback to low-res These changes improve the accuracy of the decision about whether it's time to retransmit an AoE command by using the microsecond-resolution gettimeofday instead of jiffies. Because the system time can jump suddenly, the decision reverts to using jiffies if the high-resolution time difference is relatively large. Otherwise the AoE targets could be considered failed inappropriately. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	0d555ecfa4	aoe: manipulate aoedev network stats under lock With this bugfix in place the calculation of the criterion for "lateness" is performed under lock. Without the lock, there is a chance that one of the non-atomic operations performed on the round trip time statistics could be incomplete, such that an incorrect lateness criterion would be calculated. Without this change, the effect of the bug would be rare unecessary but benign retransmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	2292a7e109	aoe: err device: include MAC addresses for unexpected responses The /dev/etherd/err character device provides low-level information about normal but sometimes interesting AoE command retransmits and "unexpected responses", i.e., responses for packets that have already been retransmitted. This change adds MAC addresses to the messages about unexpected responses, so that when they occur, it's more easy to determine the network paths to which they belong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	3a0c40d2d2	aoe: improve network congestion handling The aoe driver already had some congestion handling, but it was limited in its ability to cope with the kind of congestion that can arise on more complex networks such as those involving paths through multiple ethernet switches. Some of the lessons from TCP's history of development can be applied to improving the congestion control and avoidance on AoE storage networks. These changes use familar concepts from Van Jacobson's "Congestion Avoidance and Control" paper from '88, without adding significant overhead. This patch depends on an upcoming patch that covers the failover case when AoE commands being retransmitted are transferred from one retransmit queue to another. Another upcoming patch increases the timing accuracy. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	667be1e757	aoe: provide ATA identify device content to user on request Make the aoe driver follow expected behavior when the user uses ioctl to get the ATA device identify information, allowing access to model, serial number, etc. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	cd220bf51f	aoe: update driver-internal version number to 60 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	a04b41cd2c	aoe: whitespace cleanup Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	d437962504	aoe: cleanup: remove unused ata_scnt function Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	90a2508d01	aoe: "payload" sysfs file exports per-AoE-command data transfer size The userland aoetools package includes an "aoe-stat" command that can display a "payload size" column when the aoe driver exports this information. Users can quickly see what amount of user data is transferred inside each AoE command on the network, network headers excluded. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	aa304fdefa	aoe: support larger I/O requests via aoe_maxsectors module param The GPFS filesystem is an example of an aoe user that requires the aoe driver to support I/O request sizes larger than the default. Most users will not need large I/O request sizes, because they would need to be split up into multiple AoE commands anyway. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4ba9aa7f98	aoe: support the forgetting (flushing) of a user-specified AoE target Users sometimes want to cause the aoe driver to forget a particular previously discovered device when it is no longer online. The aoetools provide an "aoe-flush" command that users run to perform this administrative task. The changes below provide the support needed in the driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	1b8a1636ce	aoe: update cap on outstanding commands based on config query response The ATA over Ethernet config query response contains a "buffer count" field reflecting the AoE target's capacity to buffer incoming AoE commands. By taking the current value of this field into accound, we increase performance throughput or avoid network congestion, when the value has increased or decreased, respectively. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4e78dd144b	aoe: print warning regarding a common reason for dropped transmits Dropped transmits are not common, but when they do occur, increasing the transmit queue length often helps. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	662a889608	aoe: describe the behavior of the "err" character device Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Linus Torvalds	9228ff9038	Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...	2012-12-17 13:39:11 -08:00
Alex Elder	b8f5c6edca	rbd: don't use ENOTSUPP ENOTSUPP is not a standard errno (it shows up as "Unknown error 524" in an error message). This is what was getting produced when the the local rbd code does not implement features required by a discovered rbd image. Change the error code returned in this case to ENXIO. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 12:07:32 -06:00
Alex Elder	2fd82b9e92	rbd: get rid of RBD_MAX_SEG_NAME_LEN RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object name (i.e., one of the objects providing storage backing an rbd image). Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to define the maximum length of any object name in an osd request. Right now they disagree, with RBD_MAX_SEG_NAME_LEN being too big. There's no real benefit at this point to defining the rbd object name length limit separate from any other object name, so just get rid of RBD_MAX_SEG_NAME_LEN and use MAX_OBJ_NAME_SIZE in its place. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 08:37:29 -06:00
Alex Elder	42382b709b	rbd: do not allow remove of mounted-on image There is no check in rbd_remove() to see if anybody holds open the image being removed. That's not cool. Add a simple open count that goes up and down with opens and closes (releases) of the device, and don't allow an rbd image to be removed if the count is non-zero. Protect the updates of the open count value with ctl_mutex to ensure the underlying rbd device doesn't get removed while concurrently being opened. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 08:36:59 -06:00
Roger Pau Monne	7dc341175a	xen-blkback: implement safe iterator for the list of persistent grants Change foreach_grant iterator to a safe version, that allows freeing the element while iterating. Also move the free code in free_persistent_gnts to prevent freeing the element before the rb_next call. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-12-07 15:13:09 -05:00
Lars Ellenberg	d2ec180c23	drbd: update Kconfig to match current dependencies We no longer need the connector. But we need libcrc32c. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:08:29 +01:00
Philipp Reisner	ef86b77957	drbd: Fix drbdsetup wait-connect, wait-sync etc... commands This was introduces when moving the code over from the 8.3 codebase with commit `328e0f125b` Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:04:34 +01:00
Philipp Reisner	13c76aba78	drbd: close race between drbd_set_role and drbd_connect drbd_set_role(, R_PRIMARY, ) does the state change to Primary, some more housekeeping, and possibly generates a new UUID set. All of this holding the "state_mutex". The connection handshake involves sending of various state information, including the current data generation UUID set, and two connection state changes from C_WF_CONNECTION to C_WF_REPORT_PARAMS further to a number of different outcomes, resync being one of them. If the connection handshake happens between the state change to Primary and the generation of the new UUIDs, the resync decision based on the old UUID set may be confused, depending on circumstances. Make sure that, before we do the handshake, any promotion to Primary role will either be complete (including the housekeeping stuff), or can see, and serialize with, the ongoing handshake, based on the "STATE_SENT" bit, which is set when we start the handshake, and cleared only when we leave C_WF_REPORT_PARAMS again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:00:33 +01:00
Lars Ellenberg	691631c065	drbd: respect no-md-barriers setting also when changed online via disk-options We need to propagate the configuration into the flag bits, or it won't be effective. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:00:04 +01:00
Philipp Reisner	298307ed1d	drbd: Remove obsolete check Smatch complained about it this redundanct check. The check was introduced in 2006-09-13. On 2007-07-24 the body of the function was enclosed by get_ldev()/put_ldev() reference counting. Since then the check is useless and miss leading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 12:09:55 +01:00
Jens Axboe	84ad6845fb	Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.8/drivers	2012-12-01 09:42:41 +01:00
Jens Axboe	2cecb73098	drbd: fixup after wait_even_lock_irq() addition to generic code Compiling drbd yields: drivers/block/drbd/drbd_state.c: In function ‘_conn_request_state’: drivers/block/drbd/drbd_state.c:1804:5: error: macro "wait_event_lock_irq" passed 4 arguments, but takes just 3 drivers/block/drbd/drbd_state.c:1801:3: error: ‘wait_event_lock_irq’ undeclared (first use in this function) drivers/block/drbd/drbd_state.c:1801:3: note: each undeclared identifier is reported only once for each function it appears in drivers/block/drbd/drbd_state.c: At top level: drivers/block/drbd/drbd_state.c:1734:1: warning: ‘_conn_rq_cond’ defined but not used [-Wunused-function] Due to drbd having copied the MD definition for wait_event_lock_irq() as well. Kill them. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-30 21:20:15 +01:00
Lukas Czerner	7b5a35225b	loop: Limit the number of requests in the bio list Currently there is not limitation of number of requests in the loop bio list. This can lead into some nasty situations when the caller spawns tons of bio requests taking huge amount of memory. This is even more obvious with discard where blkdev_issue_discard() will submit all bios for the range and wait for them to finish afterwards. On really big loop devices and slow backing file system this can lead to OOM situation as reported by Dave Chinner. With this patch we will wait in loop_make_request() if the number of bios in the loop bio list would exceed 'nr_congestion_on'. We'll wake up the process as we process the bios form the list. Some threshold hysteresis is in place to avoid high frequency oscillation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-30 11:48:05 +01:00
Roger Pau Monne	07c540a0b5	xen-blkfront: free allocated page Free the page allocated for the persistent grant. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-26 14:58:11 -05:00
Roger Pau Monne	4d4f270f18	xen-blkback: move free persistent grants code Move the code that frees persistent grants from the red-black tree to a function. This will make it easier for other consumers to move this to a common place. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-26 14:58:11 -05:00
Selvan Mani	836413e8c7	mtip32xx: Fix padding issue Hi Jens, Another tiny patch. Removed __packed before the struct smart_attr and added __packed at end of the structure to fix padding issue. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Ed Cashin	11cfb6ff73	aoe: avoid running request handler on plugged queue Calling the request handler directly on a plugged queue defeats the performance improvements provided by the plugging mechanism. Use the __blk_run_queue function instead of calling the request handler directly, so that we don't interfere with the block layer's ability to plug the queue. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Wei Yongjun	298d80152c	mtip32xx: fix potential NULL pointer dereference in mtip_timeout_function() The dereference to port should be moved below the NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Jens Axboe	7c5d62388e	mtip32xx: fix shift larger than type warning If we're building a 32-bit kernel and CONFIG_LBADF isn't set, sector_t is 32-bits wide. The shifts by 32 and 40 are thus larger than we support. Cast the sector offset to a u64 to avoid these warnings. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	4b9e884523	mtip32xx: Fix incorrect mask used for erase mode Previous commit use value 3 for erasemode mask. Changing the mask to correct value to 2 Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	eda4531492	mtip32xx: Fix to make lba address correct in big-endian systems Earlier lba address was assigned directly to lba_low and lba_low_ex, which would result in a different number (bytes reversed) in big-endian systems. Now assigning lba address byte-by-byte to fis. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	3208795e61	mtip32xx: fix potential crash on SEC_ERASE_UNIT The mtip driver lifted this code from elsewhere and then added a special handling check for SEC_ERASE_UNIT. If the caller tries to do a security erase but passes no output data for the command then outbuf is not allocated and the driver duly explodes. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:54 +01:00
Jiri Kosina	eac7cc52c6	floppy: destroy floppy workqueue before cleaning up the queue We need to first destroy the floppy_wq workqueue before cleaning up the queue. Otherwise we might race with still pending work with the workqueue, but all the block queue already gone. This might lead to various oopses, such as CPU 0 Pid: 6, comm: kworker/u:0 Not tainted 3.7.0-rc4 #1 Bochs Bochs RIP: 0010:[<ffffffff8134eef5>] [<ffffffff8134eef5>] blk_peek_request+0xd5/0x1c0 RSP: 0000:ffff88000dc7dd88 EFLAGS: 00010092 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88000f602688 RSI: ffffffff81fd95d8 RDI: 6b6b6b6b6b6b6b6b RBP: ffff88000dc7dd98 R08: ffffffff81fd95c8 R09: 0000000000000000 R10: ffffffff81fd9480 R11: 0000000000000001 R12: 6b6b6b6b6b6b6b6b R13: ffff88000dc7dfd8 R14: ffff88000dc7dfd8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 6, threadinfo ffff88000dc7c000, task ffff88000dc5ecc0) Stack: 0000000000000000 0000000000000000 ffff88000dc7ddb8 ffffffff8134efee ffff88000dc7ddb8 0000000000000000 ffff88000dc7dde8 ffffffff814aef3c ffffffff81e75d80 ffff88000dc0c640 ffff88000fbfb000 ffffffff814aed90 Call Trace: [<ffffffff8134efee>] blk_fetch_request+0xe/0x30 [<ffffffff814aef3c>] redo_fd_request+0x1ac/0x400 [<ffffffff814aed90>] ? start_motor+0x130/0x130 [<ffffffff8106b526>] process_one_work+0x136/0x450 [<ffffffff8106af65>] ? manage_workers+0x205/0x2e0 [<ffffffff8106bb6d>] worker_thread+0x14d/0x420 [<ffffffff8106ba20>] ? rescuer_thread+0x1a0/0x1a0 [<ffffffff8107075a>] kthread+0xba/0xc0 [<ffffffff810706a0>] ? __kthread_parkme+0x80/0x80 [<ffffffff818b553a>] ret_from_fork+0x7a/0xb0 [<ffffffff810706a0>] ? __kthread_parkme+0x80/0x80 Code: 0f 84 c0 00 00 00 83 f8 01 0f 85 e2 00 00 00 81 4b 40 00 00 80 00 48 89 df e8 58 f8 ff ff be fb ff ff ff fe ff ff <49> 8b 1c 24 49 39 dc 0f 85 2e ff ff ff 41 0f b6 84 24 28 04 00 RIP [<ffffffff8134eef5>] blk_peek_request+0xd5/0x1c0 RSP <ffff88000dc7dd88> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:54 +01:00
Akinobu Mita	d48c152a41	cciss: use check_signature() Use check_signature() to find a signature in the mmio address. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:28:34 +01:00
Akinobu Mita	1f118bc479	cciss: cleanup bitops usage - Remove unnecessary correction of bit and address - Use BITS_TO_LONGS macro to calculate bitmap size - Use bitmap_zero() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:27:22 +01:00
Keith Busch	2b19603415	NVMe: Initialize iod nents to 0 For commands that do not map a scatter list, we need to initilaize the iod's number of sg entries (nents) to 0 and not unmap in this case. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:50 -05:00
Keith Busch	6ecec74520	NVMe: Define SMART log This data structure is defined in the NVMe specification. It's not used by the kernel, but is available for use by userspace software. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:50 -05:00
Keith Busch	08df1e0565	NVMe: Add result to nvme_get_features nvme_get_features() was not returning the result. Add a parameter to return the result in (similar to nvme_set_features()) and change all callers. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	f4f117f64b	NVMe: Set result from user admin command The ioctl data structure includes space for the 'result' of the admin command to be returned; it just wasn't filled in. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	3295874b60	NVMe: End queued bio requests when freeing queue If the queue has bios queued on it when it is freed, bio_endio() must be called for them first. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	859361a228	NVMe: Free cmdid on nvme_submit_bio error nvme_map_bio() is called after the cmdid is allocated, so we have to free the cmdid before returning from nvme_submit_bio() if nvme_map_bio() returned an error. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Jens Axboe	8d0ff3924b	Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.8/drivers	2012-11-12 09:18:47 -07:00
Akinobu Mita	f1d6a328bb	drbd: use copy_highpage Use copy_highpage() to copy from one page to another. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:26 +01:00
Lars Ellenberg	ed635cb067	drbd: if the replication link breaks during handshake, keep retrying The 8.3.12 commit drbd: Bugfix for the connection behavior fixes a "wasted established connection", if a former connection attempt failed during its early stages. However it opened a window for a regression, if a connection attempt fails during its last stages. The result was a terminated receiver thread, that left behind the supposedly transient "C_UNCONNECTED" state. Any later requests to change the connection state fail, as they wait for the connection state to "stabilize". Fix: short circuit and keep retrying to restablish a new connection, if we don't reach C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:19 +01:00
Jing Wang	063eacf88c	drbd: check return of kmalloc in receive_uuids Signed-off-by: Jing Wang <windsdaemon@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:10 +01:00
Philipp Reisner	986836503e	Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6	2012-11-09 14:20:23 +01:00
Philipp Reisner	328e0f125b	drbd: Broadcast sync progress no more often than once per second Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:43 +01:00
Philipp Reisner	518a4d53b2	drbd: don't try to clear bits once the disk has failed If the disk has failed already, there is no point trying to change the bitmap. drbd_set_out_of_sync() already had this safeguard, time to add it to drbd_set_in_sync() as well. This also prevents some warning messages, like FIXME asender in bm_change_bits_to, bitmap locked for 'detach' by worker if our disk fails during resync, while there are some resync acks queued up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Philipp Reisner	fd0017c124	drbd: fix regression: potential NULL pointer dereference recent commit drbd: always write bitmap on detach introduced a bitmap writeout during detach, which obviously needs some meta data device to write to. Unfortunately, that same error path may be taken if we fail to attach, e.g. due to UUID mismatch, after we changed state to D_ATTACHING, but before the lower level device pointer is even assigned. We need to test for presence of mdev->ldev. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Philipp Reisner	4035e4c2eb	drbd: Fix clearing of MDF_AL_DISABLED Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Lars Ellenberg	42839f6536	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:41 +01:00
Lars Ellenberg	edc9f5eb7a	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:41 +01:00
Lars Ellenberg	e34b677d09	drbd: wait for meta data IO completion even with failed disk, unless force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:40 +01:00
Lars Ellenberg	8747d30af9	drbd: a few more GFP_KERNEL -> GFP_NOIO This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:40 +01:00
Lars Ellenberg	bc891c9ae3	drbd: fix potential deadlock during bitmap (re-)allocation The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Lars Ellenberg	a506c13a4d	drbd: use list_move_tail instead of list_del/list_add_tail Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Philipp Reisner	1b6dd252e6	drbd: panic on delayed completion of aborted requests "aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed error completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Philipp Reisner	a3025a2737	drbd: Fix comparison of is_valid_transition()'s return code is_valid_transition() might return SS_NOTHING_TO_DO. The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which caused the wait_event to abort too early. Therefore drbd_req_state() did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER causing serve disruption of the state machine logic... Detaching from a single volue was one way to trigger this bug. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:38 +01:00
Philipp Reisner	1393b59f8c	drbd: Remove duplicate code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:38 +01:00
Lars Ellenberg	70f17b6bd1	drbd: differentiate early and later "postponing" of requests We use the RQ_POSTPONED flag to mark a request for several reasons. It may be a conflicting request in a dual-primary setup, where conflict detection and resolution on the peer decided that this request needs to be re-submitted, it needs to re-enter drbd_make_request() to fix the data divergence caused by these conflicting, partially overlapping, quasi-simultaneous requests. In this case we need to mark the corresponding area as out-of-sync, before we call drbd_al_complete_io(). We also use the RQ_POSTPONED flag to just "push back" a request, before even processing it, if IO is suspended for some reason. In this case, as this request was neither submitted nor sent yet, we must not touch the bitmap. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:37 +01:00
Philipp Reisner	76590cd1fc	drbd: Fix postponed requests A postponed request might has RQ_IN_ACT_LOG already set, but is POSTPONED before it gets something in the RQ_LOCAL_MASK set. Up to now this caused a left-over active extent. Fix that by only testing for the RQ_IN_ACT_LOG bit in drbd_req_destroy() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:37 +01:00
Philipp Reisner	19fffd7b03	drbd: Call drbd_md_sync() explicitly after a state change on the connection Without this, the meta-data gets updates after 5 seconds by the md_sync_timer. Better to do it immeditaly after a state change. If the asender detects a network failure, it may take a bit until the worker processes the according after-conn-state-change work item. The worker might be blocked in sending something, i.e. it takes until it gets into its timeout. That is 6 seconds by default which is longer than the 5 seconds of the md_sync_timer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:08 +01:00
Philipp Reisner	d76440181d	drbd: Fix postponed requests * Postponed requests should not set or clear out-of-sync marks * When a request gets postponed we need to drop its reference mdev->local_cnt (put_ldev()). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:24 +01:00
Philipp Reisner	4ae98b4db3	drbd: Imporve the error reporting of failed conn state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:24 +01:00
Philipp Reisner	7970201177	drbd: Fix the way the STATE_SENT bit is cleared With merging the commit 'drbd: Delay/reject other state changes while establishing a connection' the condition check for clearing the flag was wrong. Move the bit clearing to the __drbd_set_state() function in order to have it already cleared for the other parts of the function. I.e. clearing the susp_fen in the after_state_ch() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:23 +01:00
Philipp Reisner	07fc96197a	drbd: Do not check aspects that are not subject to change in _conn_requests_state() When _conn_requests_state() is used to change other parts of the state than the connection, do not check for a valid connection transition. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:23 +01:00
Philipp Reisner	892fdd1aee	drbd: Improve readability of IO resuming after freeze due to no data access The previous way of doing the state change was also okay since the state change on the susp flag gets propagated from the mdev to the tconn. Fortunately all this goes away in drbd-9.0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:22 +01:00
Philipp Reisner	88f79ec4ae	drbd: Fix IO resuming after connection was established while executing the fence handler Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:22 +01:00
Lars Ellenberg	b792b655cd	drbd: fix potential list_add corruption If the md_sync_timer triggers a second time, while the work queued during the first time is still pending, this could result in list_add() of an already added item, and corrupt the work item list. This likely only triggered because of the erroneous batch-dequeueing of work items fixed with drbd: dequeue single work items in wait_for_work() Still, skip queueing if md_sync_work is already queued. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Lars Ellenberg	bc317a9ecd	drbd: dequeue single work items in wait_for_work() As long as we still use drbd_queue_work_front(), we must only dequeue the single first item during normal operation. The comment in drbd_worker() even says so, but bc8a5a1 drbd: remove struct drbd_tl_epoch objects (barrier works) introduced the batch dequeueing again via list_splice_init() in wait_for_work(). Change back to list_move() of the first item, if any. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Lars Ellenberg	c02abda2b2	drbd: mutex_unlock "... must no be used in interrupt context" Documentation of mutex_unlock says we must not use it in interrupt context. So do not call it while holding the spin_lock_irq, but give up the spinlock temporarily. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Philipp Reisner	c1fd29a11f	drbd: Fix a race condition that can lead to a BUG() If the preconditions for a state change change after the wait_event() we might hit the BUG() statement in conn_set_state(). With holding the spin_lock while evaluating the condition AND until the actual state change we ensure the the preconditions can not change anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:20 +01:00
Lars Ellenberg	0ee98e2eb0	drbd: temporarily suspend io in drbd_adm_disk_opts drbd_adm_disk_opts() does wait_event(mdev->al_wait, lc_try_lock(mdev->act_log)); drbd_al_shrink(mdev); If the device is very busy, this can take a very long time to succeed. Fix this by temporarily suspending IO, then quickly change the settings, and resume. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:20 +01:00
Lars Ellenberg	4eb9b3cba0	drbd: don't send out P_BARRIER with stale information We must only send P_BARRIER for epochs we actually sent P_DATA in. If we (re-)establish a connection, we reinitialized the send.current_epoch_nr, but forgot to reset send.current_epoch_writes. This could result in a spurious P_BARRIER with stale epoch information, and a disconnect/reconnect cycle once the then "unexpected" P_BARRIER_ACK is received: BAD! BarrierAck #28823 received, expected #28829! Introduce re_init_if_first_write() and maybe_send_barrier() helpers, and call them appropriately for read/write/set-out-of-sync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Lars Ellenberg	08332d7325	drbd: properly call drbd_rs_cancel_all() in drbd_disconnected() drbd_disconnected() is supposed to clear the resync lru cache, by calling drbd_rs_cancel_all(). We must do so before we call drbd_flush_workqueue(), as at least the callback w_restart_disk_io() may wait for resync progres, and would otherwise deadlock. drbd_finish_peer_reqs() may again populate that cache, which will then potentially be stale after the next resync handshake and bitmap exchange, we have to do it again after that. A stale resync lru cache causes no harm but ugly messages like this: BAD! sector=196608s enr=6 rs_left=-256 rs_failed=0 count=256 cstate=SyncTarget Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Philipp Reisner	155522df5b	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Philipp Reisner	b66623e33e	drbd: Avoid NetworkFailure state during disconnect Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:18 +01:00
Philipp Reisner	39a1aa7f49	drbd: Protect accesses to the uuid set with a spinlock There is at least the worker context, the receiver context, the context of receiving netlink packts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:04 +01:00
Philipp Reisner	fef45d297e	drbd: Write all pages of the bitmap after an online resize We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:51 +01:00
Philipp Reisner	5af2e8ce2b	drbd: Fix completion of requests while the device is suspended In various places (E.g. CONNECTION_LOST_WHILE_PENDING) the RQ_COMPLETION_SUSP mask is passed in the clear set to mod_rq_state(). The issue was that it tried to clear the RQ_COMPLETION_SUSP bit out of the state mask first, and eventuelly set it afterwards, in the drbd_req_put_completion_ref() function. Fixed that by moving the reference getting out of drbd_req_put_completion_ref() into the mod_rq_state(), before the place where the extra reference might be put. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Andreas Gruenbacher	715306f69d	drbd: Don't unregister socket state_change callback from within the callback Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Lars Ellenberg	eb12010e9a	drbd: disambiguation, s/ERR_DISCARD/ERR_DISCARD_IMPOSSIBLE/ If for some reason (typically "split-brained" cluster manager) drbd replica data has diverged, we can chose a victim, and reconnect using "--discard-my-data", causing the victim to become sync-target, fetching all changed blocks from the peer. If we are Primary, we are potentially in use, and we refuse to "roll back" changes to the data below the page cache and other users. Rename the error symbol for this to ERR_DISCARD_IMPOSSIBLE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Lars Ellenberg	427c0434fc	drbd: disambiguation, s/DISCARD_CONCURRENT/RESOLVE_CONFLICTS/ We don't discard anything here, really. We resolve conflicting, concurrent writes to overlapping data blocks. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:49 +01:00
Lars Ellenberg	d4dabbe22d	drbd: disambiguation, s/P_DISCARD_WRITE/P_SUPERSEDED/ To avoid confusion with REQ_DISCARD aka TRIM, rename our "discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED. At the same time, rename the drbd request event DISCARD_WRITE to CONFLICT_RESOLVED. It already triggers both successful completion or restart of the request, depending on our RQ_POSTPONED flag. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:49 +01:00
Lars Ellenberg	232fd3f4a0	drbd: cleanup, drop unused struct Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:48 +01:00
Lars Ellenberg	46e21bbadb	drbd: NEG_ACK does not imply a barrier-ack Don't drop a request from the transfer log just because it was NEG_ACKED. We need it around to be able to verify P_BARRIER_ACKs against the transver log. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:48 +01:00
Lars Ellenberg	99b4d8fe6d	drbd: only start a new epoch, if the current epoch contains writes Almost all code paths calling start_new_tl_epoch() guarded it with if (... current_tle_writes > 0 ... ). Just move that inside start_new_tl_epoch(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Philipp Reisner	8a0bab2a6d	drbd: Finish requests that completed while IO was frozen Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Lars Ellenberg	e959d08d3e	drbd: Fix a potential issue with the DISCARD_CONCURRENT flag The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Lars Ellenberg	519b6d3eac	drbd: fix drbd wire compatibility for empty flushes DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:46 +01:00
Philipp Reisner	80c6eed49d	drbd: More random to the connect logic Since the listening socket is open all the time, it was possible to get into stable "initial packet S crossed" loops. * when both sides realize in the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with repeated "packet S crossed" messages. * when both sides do not realize with the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with alternating "packet S crossed" "packet M crossed" messages. In order to break out these stable loops randomize the behaviour if such a crossing of P_INITIAL_DATA or P_INITIAL_META packets is detected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:46 +01:00
Philipp Reisner	92f14951c0	drbd: Try to connec to peer only once per cycle Since now our listening socket is open all the time we will get connection tries of the peer always in. No need to try it three times. This is valid when connecting to older peers as well, it simply increases the probability that the new version DRBD will accept a connection instead that it will establish one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Philipp Reisner	b666dbf819	drbd: Remove redundant and wrong test for NULL simplification in conn_connect() Since the drbd_socket_okay() function itself tests if the the socket is NULL, the explicit test "if (sock.socket && &msock.socket)" was redundent. Apart from that the address opperator ('&') before msock.socket rendered the test pointless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Philipp Marek	3174f8c504	drbd: pass some more information to userspace. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Lars Ellenberg	81a3537a97	drbd: announce FLUSH/FUA capability to upper layers In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:44 +01:00
Lars Ellenberg	58ffa580a7	drbd: introduce stop-sector to online verify We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Had to bump the protocol version differently, we are now 101. Added verify_can_do_stop_sector() { protocol >= 97 && protocol != 100; } Also, the return value convention for worker callbacks has changed, we returned "true/false" for "keep the connection up" in 8.3, we return 0 for success and <= for failure in 8.4. Affected: receive_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:32 +01:00
Lars Ellenberg	970fbde1f1	drbd: flush drbd work queue before invalidate/invalidate remote If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:41 +01:00
Lars Ellenberg	6f1a656325	drbd: call local-io-error handler early In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:40 +01:00
Lars Ellenberg	a324896b17	drbd: do not reset rs_pending_cnt too early Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:40 +01:00
Lars Ellenberg	8a94317071	drbd: reset congestion information before reporting it in /proc/drbd We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	6f3465ed82	drbd: report congestion if we are waiting for some userland callback If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	0c84966601	drbd: differentiate between normal and forced detach Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	bf709c8552	drbd: cleanup, remove two unused global flags The two unused "global flags" in 8.3 are "per volume" flags in 8.4. Still, they are unused, so lose them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:38 +01:00
Lars Ellenberg	3b9ef85e05	drbd: fix null pointer dereference with on-congestion policy when diskless We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:38 +01:00
Lars Ellenberg	27012382bc	drbd: take error path in drbd_adm_down if interrupted by signal drbd_adm_down() does adm_detach(), which can fail with various error codes, or be interrupted by a signal. The interrupted by signal case was not properly handled, leading to block drbd0: ASSERT( mdev->state.disk == D_DISKLESS && mdev->state.conn == C_STANDALONE ) in drbd/drbd_worker.c and further to destroying objects while still in use, and resulting crashes. Detect the interruption, and take the error path out. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	9a278a7906	drbd: allow read requests to be retried after force-detach Sometimes, a lower level block device turns into a tar-pit, not completing requests at all, not even doing error completion. We can force-detach from such a tar-pit block device, either by disk-timeout, or by drbdadm detach --force. Queueing for retry only from the request destruction path (kref hit 0) makes it impossible to retry affected read requests from the peer, until the local IO completion happened, as the locally submitted bio holds a reference on the drbd request object. If we can only complete READs when the local completion finally happens, we would not need to force-detach in the first place. Instead, queue for retry where we otherwise had done the error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	934722a2db	drbd: __req_mod: make DISCARD_WRITE and independend case cherry-picked and adapted from drbd 9 devel branch This looks cleaner to me, and also gets rid of the other ugly if-inside-case-fall-through. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	a0d856dfae	drbd: base completion and destruction of requests on ref counts cherry-picked and adapted from drbd 9 devel branch The logic for when to get or put a reference is in mod_rq_state(). To not get confused in the freeze/thaw respectively resend/restart paths, or when cleaning up requests waiting for P_BARRIER_ACK, this also introduces additional state flags: RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:36 +01:00
Lars Ellenberg	b406777e64	drbd: introduce completion_ref and kref to struct drbd_request cherry-picked and adapted from drbd 9 devel branch completion_ref will count pending events necessary for completion. kref is for destruction. This only introduces these new members of struct drbd_request, a followup patch will make actual use of them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:36 +01:00
Lars Ellenberg	5df69ece6e	drbd: __drbd_make_request() is now void The previous commit causes __drbd_make_request() to always return 0. Change it to void. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	5da9c83644	drbd: better separate WRITE and READ code paths in drbd_make_request cherry-picked and adapted from drbd 9 devel branch READs will be interesting to at most one connection, WRITEs should be interesting for all established connections. Introduce some helper functions to hopefully make this easier to follow. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	b6dd1a8976	drbd: remove struct drbd_tl_epoch objects (barrier works) cherry-picked and adapted from drbd 9 devel branch DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, any writes have been processed in it, a P_BARRIER for this last processed epoch is send out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	d5b27b01f1	drbd: move the drbd_work_queue from drbd_socket to drbd_connection cherry-picked and adapted from drbd 9 devel branch In 8.4, we don't distinguish between "resource work" and "connection work" yet, we have one worker for both, as we still have only one connection. We only ever used the "data.work", no need to keep the "meta.work" around. Move tconn->data.work to tconn->sender_work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:34 +01:00
Lars Ellenberg	8c0785a5c9	drbd: allow to dequeue batches of work at a time cherry-picked and adapted from drbd 9 devel branch In 8.4, we still use drbd_queue_work_front(), so in normal operation, we can not dequeue batches, but only single items. Still, followup commits will wake the worker without explicitly queueing a work item, so up() is replaced by a simple wake_up(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:34 +01:00
Lars Ellenberg	b379c41ed7	drbd: transfer log epoch numbers are now per resource cherry-picked from drbd 9 devel branch. In preparation of multiple connections, the "barrier number" or "epoch number" needs to be tracked per-resource, not per connection. The sequence number space will not be reset anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	9d05e7c4e7	drbd: rename drbd_restart_write to drbd_restart_request Meanwhile, this is used to restart failed READ requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	629663c942	drbd: fix wrong assert in completion/retry path of failed local reads Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	ab53b90e89	drbd: fix local read error hung forever The commit drbd: simplify retry path of failed READ requests simplified it too much: it just did not do anything for local read errors. Add the missing req_may_be_completed_not_susp() to the READ_COMPLETED_WITH_ERROR case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:32 +01:00
Lars Ellenberg	1b6f19740d	drbd: fix access of unallocated pages and kernel panic BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:32 +01:00
Philipp Reisner	7a426fd8d5	drbd: Keep the listening socket open while trying to connect to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Philipp Reisner	1f3e509b76	drbd: pull prepare_listen_socket() out of drbd_wait_for_connect() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Philipp Reisner	9a51ab1c1b	drbd: New disk option al-updates By disabling al-updates one might increase performace. The price for that is that in case a crashed primary (that had al-updates disabled) is reintegraded, it will receive a full-resync instead of a bitmap based resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Andreas Gruenbacher	26ec92871b	drbd: Stop using NLA_PUT*(). These macros no longer exist in kernel version v3.5-rc1. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:30 +01:00
Philipp Reisner	7e0f096b8d	drbd: Remove drbd_accept() and use kernel_accept() instead Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:30 +01:00
Philipp Reisner	2820fd3969	drbd: Move the call to listen() out of drbd_accept() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Philipp Reisner	c5b005ab70	drbd: use bitmap_parse instead of __bitmap_parse The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Lars Ellenberg	1882e22df7	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Lars Ellenberg	f66ee69746	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:28 +01:00
Lars Ellenberg	a220d29180	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:28 +01:00
Lars Ellenberg	5016b82a49	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Lars Ellenberg	07be15b12c	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Philipp Reisner	3ea35df83f	drbd: fix spelling, remove boring development log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Philipp Reisner	e4bad1bcac	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:26 +01:00
Philipp Reisner	a1096a6e9d	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:26 +01:00
Philipp Reisner	27eb13e99b	drbd: Fixed processing of disk-barrier, disk-flushes and disk-drain Since drbd_bump_write_ordering() is called in the attaching process while the disk state is D_ATTACHING, it was not considering these three flags during attach. A call to this function was missing form drbd_adm_disk_opts(). Fixed both issues. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	9ed57dcbda	drbd: ignore volume number for drbd barrier packet exchange Transfer log epochs, and therefore P_BARRIER packets, are per resource, not per volume. We must not associate them with "some random volume". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	648e46b531	drbd: complete_conflicting_writes() should not care about connections complete_conflicting_writes() should not cause -EIO. It should not timeout either, or care for connection states. Connection timeout is detected elsewhere, and it's cleanup path is supposed to remove any pending requests or peer_requests from the write_requests tree. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	4439c400ab	drbd: simplify retry path of failed READ requests If a local or remote READ request fails, just push it back to the retry workqueue. It will re-enter __drbd_make_request, and be re-assigned to a suitable local or remote path, or failed, if we do not have access to good data anymore. This obsoletes w_read_retry_remote(), and eliminates two goto...retry blocks in __req_mod() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:24 +01:00
Lars Ellenberg	2415308eb9	drbd: move put_ldev from __req_mod() to the endio callback Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:24 +01:00
Lars Ellenberg	6870ca6d46	drbd: factor out master_bio completion and drbd_request destruction paths In preparation for multiple connections and reference counting, separate the code paths for completion of the master bio and destruction of the request object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	8d6cdd7848	drbd: conflicting writes: make wake_up of waiting peer_requests explicit Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	0afd569a40	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	ea9d6729bd	drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:22 +01:00
Lars Ellenberg	27a434fe40	drbd: make OOS_HANDED_TO_NETWORK its own case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:22 +01:00
Lars Ellenberg	2312f0b3c5	drbd: fix potential deadlock during "restart" of conflicting writes w_restart_write(), run from worker context, calls __drbd_make_request() and further drbd_al_begin_io(, delegate=true), which then potentially deadlocks. The previous patch moved a BUG_ON to expose such call paths, which would now be triggered. Also, if we call __drbd_make_request() from resource worker context, like w_restart_write() did, and that should block for whatever reason (!drbd_state_is_stable(), resource suspended, ...), we potentially deadlock the whole resource, as the worker is needed for state changes and other things. Create a dedicated retry workqueue for this instead. Also make sure that inc_ap_bio()/dec_ap_bio() are properly paired, even if do_retry() needs to retry itself, in case __drbd_make_request() returns != 0. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	f9916d61a4	drbd: don't pretend that barrier_nr == 0 was special Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	0642d5f8e0	drbd: remove unused static helper function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	e44d71f36c	drbd: remove some very outdated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:20 +01:00
Lars Ellenberg	9dab3842b5	drbd: fix memleak in error path in bm_rw and drbd_bm_write_range Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:20 +01:00
Lars Ellenberg	a6a7d4f0c1	drbd: missing wakeup after drbd_rs_del_all Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	5cdb0bf322	drbd: remove now unused seq_num member from struct drbd_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	4b8514ee28	drbd: fix potential data corruption and protocol error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	e8744f5aca	drbd: Fixed detach Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:18 +01:00
Lars Ellenberg	d93f63028f	drbd: Fix a potential write ordering issue on SyncTarget nodes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:18 +01:00
Lars Ellenberg	81f448629a	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); stills sees C_WF_BITMAP_S, and send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	38a05c16b8	drbd: Consider that bio->bi_bdev might be modified below DRBD Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	72585d2428	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	dd9b360475	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	93f5afe956	drbd: If disk timeout expires fail only the affected volume ...and not all volumes of the resource Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	32db80f6f6	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	25b0d6c8c1	drbd: Reinstate disabling AL updates with invalidate-remote Commit d0ef827e (drbd: switch configuration interface from connector to genetlink) introduced a regression by removing the ability to set all bits in the out of sync bitmap and to suspend updates to the activity log of a disconnected device via the invalidate-remote management call. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Lars Ellenberg	b17f33cb0a	drbd: explicitly clear unused dp_flags in drbd_send_block We send left-over garbage from the previous packet in P_DATA_REPLY and P_RS_DATA_REPLY packets. That's bad behaviour. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	4d0fc3fdc3	drbd: Fixed compat issue with disconnecting 8.4 from a primary 8.3 For compatibility reasons 8.4 has to send P_STATE_CHG_REQ (instead of P_CONN_ST_CHG_REQ) when disconnecting. In the receiving code path we missed to convert the old answer (P_STATE_CHG_REPLY) back to 8.4 logic. Therefore the CL_ST_CHG_SUCCESS or CL_ST_CHG_FAIL bit in the flags word of mdev got set, while the state code was waiting for the CONN_WD_ST_CHG_OKAY or CONN_WD_ST_CHG_FAIL bits in tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Andreas Gruenbacher	1a3cde4406	drbd: drbd_bm_ALe_set_all(): Remove unused function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Philipp Reisner	69b6a3b159	drbd: restart loop in drbd_make_request() [prepare for Linux-3.2] With Linux-3.2 generic_make_request() will no longer loop over the request function until it finally returns 0. Move this loop into our drbd_make_request() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	7da358625c	drbd: Restore late assigning of tconn->data.sock and meta.sock With commit from Mon Mar 28 16:33:12 2011 +0200 "drbd: drbd_connect(): Initialize struct drbd_socket before sending anything" tconn->data.sock and tconn->meta.sock get assigned early, in conn_connect. The early assigning can trigger an OOPS, because it may released the socket without acquiring the mutex protecting the socket. An other thread (worker) might use setsockopt() on the socket while it gets free()ed. Restored the (proven) 8.3 behavior of assigning these sockets after the two connections are established. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	a01842ebee	drbd: Log failures of connection state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	e8cdc34335	drbd: Consider that read requests could be NEG_ACKEDed ap_in_flight only counts writes. NEG_ACKED is an action on a request that might be called for reads and writes. This bug was there forever, but it becomes much more relevant with the read balincing code. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	6ab9b1b60b	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	b8853dbd8c	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Note by Philipp Reisner: The 3rd chunk described as "relax the safe-guard..." is not there in 8.4 as it is relaxed to the maximum in 8.4 already Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	57bcb6cf1d	drbd: Do not call generic_make_request() while holding req_lock Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	d60de03a66	drbd: Load balancing method: striping Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	380207d08e	drbd: Load balancing of read requests New config option for the disk secition "read-balancing", with the values: prefer-local, prefer-remote, round-robin, when-congested-remote. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Philipp Reisner	d10b4ea32b	drbd: Get rid of "ASSERTION FAILED: tconn->current_epoch->list not empty" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Lars Ellenberg	615e087fbd	drbd: add missing rcu locks around recently introduced idr_for_each Recent commit drbd: Move write_ordering from mdev to tconn introduced a new idr_for_each loop over all volumes, but did not take necessary rcu locks or krefs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Andreas Gruenbacher	03d63e1d1e	drbd: Remove leftover prototype Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	975b297947	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() want's to aquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	77fede5137	drbd: Fix the WO=drain implementation for multiple volumes Wait until IO is drained in all volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1e9dd2912e	drbd: Switch drbd_may_finish_epoch() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	12038a3a71	drbd: Move list of epochs from mdev to tconn This is necessary since the transfer_log on the sending is also per tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1d2783d532	drbd: Prepare epochs per connection An epoch object needs a pointer to the mdev it was received for. This is necessary to be able to send the barrier ack packet for the same volume as the original barrier packet was assigned to. This prepares the next step, in which the (receiver side) epoch list is moved from the device (mdev) to the connection (tconn) object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	4b0007c0e8	drbd: Move write_ordering from mdev to tconn This is necessary in order to prepare the move of the (receiver side) epoch list from the device (mdev) to the connection (tconn) objects. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	6936fcb49a	drbd: Move the CREATE_BARRIER flag from connection to device That is necessary since the whole transfer log is per connection(tconn) and not per device(mdev). This bug caused list corruption on the worker list. When a barrier is queued for sending in the context of one device, another device did not see the CREATE_BARRIER bit, and queued the same object again -> list corruption. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	36baf6117b	drbd: Fixed an obvious copy-n-paste mistake Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	43de7c852b	drbd: Fixes from the drbd-8.3 branch * drbd-8.3: drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6). drbd: send intermediate state change results to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	0cfac5dd90	drbd: Fixes from the drbd-8.3 branch * drbd-8.3: drbd: fix spurious meta data IO "error" drbd: Fixed a race condition between detach and start of resync drbd: fix harmless race to not trigger an ASSERT drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:05 +01:00
Philipp Reisner	376694a054	drbd: Silenced compiler warnings Since version 4.6.1 gcc warns about variables that get a value assigned, but which are never read later on. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:05 +01:00
Philipp Reisner	9bcd252182	drbd: fix "stalled" empty resync With sync-after dependencies, given "lucky" timing of pause/unpause events, and the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Lars Ellenberg	22d81140ae	drbd: fix bitmap writeout after aborted resync Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Philipp Reisner	08b165ba11	drbd: Consider the discard-my-data flag for all volumes [bugz 359] ...not only for the first volume Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Andreas Gruenbacher	935be260c1	drbd: Improve error reporting in drbd_md_sync_page_io() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:03 +01:00
Lars Ellenberg	25e409321a	drbd: fix connect failure with all default net-options If no net-options are configured (all on their default), no DRBD_NLA_NET_CONF will be passed to the kernel. The kernel must not require its presence, there is no required option in there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:03 +01:00
Andreas Gruenbacher	a209b4aec3	drbd: Update some outdated comments to match the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	c4e7afdc01	drbd: Remove unused code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	4276dea70c	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	85d735138a	drbd: Cleanup all epoch objects upon connection loss Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:01 +01:00
Philipp Reisner	f132f554ce	drbd: Do not display bogus log lines for pdsk in case pdsk < D_UNKNOWN This was a regression recently introduced with commit 7848ddb752c09b6dfd1ddfabb06b69b08aa8f6b9 "drbd: Correctly handle resources without volumes" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Lars Ellenberg	97ddb68790	drbd: detach must not try to abort non-local requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Andreas Gruenbacher	f497609e4c	drbd: Get rid of MR_{READ,WRITE}_SHIFT Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Philipp Reisner	823bd832a6	drbd: Bugfix for the connection behavior If we get into the C_BROKEN_PIPE cstate once, the state engine set the thi->t_state of the receiver thread to restarting. But with the while loop in drbdd_init() a new connection gets established. After the call into drbdd() returns immediately since the thi->t_state is not RUNNING. The restart of drbd_init() then resets thi->t_state to RUNNING. I.e. after entering C_BROKEN_PIPE once, the next successful established connection gets wasted. The two parts of the fix: * Do not cause the thread to restart if we detect the issue with the sockets while we are in C_WF_CONNECTION. * Make sure that all actions that would have set us to C_BROKEN_PIPE happen before the state change to C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:59 +01:00
Andreas Gruenbacher	7d4c782cbd	drbd: Fix the data-integrity-alg setting The last data-integrity-alg fix made data integrity checking work when the algorithm was changed for an established connection, but the common case of configuring the algorithm before connecting was still broken. Fix that. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:59 +01:00
Andreas Gruenbacher	71fc7eedb3	drbd: Turn tl_apply() into tl_abort_disk_io() There is no need to overly generalize this function; it only makes the code harder to understand. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	1b7ab15b11	drbd: Fixed w_restart_disk_io() to handle non active AL-extents Since we now apply the AL in user space onto the bitmap, the AL is not active for the requests we want to reply. For that a al_write_transaction() that might be called from worker context became necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	9b743da96c	drbd: Missing assignment of mdev before drbd_queue_work() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	3b03ad5929	drbd: Do not mod_timer() with a past time In case we can not find out why the request takes too long (happens e.g. when IO got suspended on DRBD level). rearm the timer with a reasonable value. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:57 +01:00
Philipp Reisner	3fb4746d8d	drbd: Consider that the no-data-condition could be in connected state ...when the peer has inconsistent data. In that case we failed to clear the susp_nod flag. When the local disk was attached again Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:57 +01:00
Philipp Reisner	97af09d51e	drbd: Dropped wrong clause to generate new current UUIDs Looks like a remainder from long ago. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	accdbcc5f9	drbd: receive_protocol(): We cannot change our own data-integrity-alg setting here Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	d505d9bef2	drbd: Be consistent in reporting incompatibilities in P_PROTOCOL settings Refer to the settings by the names which drbdsetup and drbd.conf are using. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	fbc12f4514	drbd: receive_protocol(): Make the program flow less confusing Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	b792c35cfb	drbd: receive_protocol(): Give variables more easily searchable names Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	5af172ed9e	drbd: Print memory address in hex instead of decimal in error message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	f2257a56ee	drbd: Allow to create devices with a minor number > minor_count The minor_count module/kernel parameter serves to scale the size of drbd's internal memory pool, but it is no longer a limit for the number of minors or the minor number. (Minor numbers can be arbitrarily high within the allowed limit of 2^20.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:54 +01:00
Lars Ellenberg	367d675da8	drbd: report net config even for resources without a single volume Currently it is legal (though unusual) to create and connect a resource, before adding in all necessary volumes. We should include the network configuration details, even if we don't have a single volume (yet). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:53 +01:00
Philipp Reisner	e0e1665381	drbd: Correctly handle resources without volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:52 +01:00
Philipp Reisner	a67e1d9e8c	drbd: Eliminated the "notified peer" messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:52 +01:00
Philipp Reisner	369bea6371	drbd: Fixed removal of volumes/devices from connected resources When removing a volume/device we need to switch the connection status of the peer back into WFReportParams. Before this fix it was left in Connected state. That means that the peer device continued to inform us about state changes, etc... But we deleted that minor -> protocol error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:51 +01:00
Lars Ellenberg	d5d7ebd422	drbd: on attach, enforce clean meta data Detection of unclean shutdown has moved into user space. The kernel code will, whenever it updates the meta data, mark it as "unclean", and will refuse to attach to such unclean meta data. "drbdadm up" now schedules "drbdmeta apply-al", which will apply the activity log to the bitmap, and/or reinitialize it, if necessary, as well as set a "clean" indicator flag. This moves a bit code out of kernel space. As a side effect, it also prevents some 8.3 module from accidentally ignoring the 8.4 style activity log, if someone should downgrade, whether on purpose, or accidentally because he changed kernel versions without providing an 8.4 for the new kernel, and the new kernel comes with in-tree 8.3. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:51 +01:00
Philipp Reisner	cdfda633d2	drbd: detach from frozen backing device * drbd-8.3: documentation: Documented detach's --force and disk's --disk-timeout drbd: Implemented the disk-timeout option drbd: Force flag for the detach operation drbd: Allow new IOs while the local disk in in FAILED state drbd: Bitmap IO functions can not return prematurely if the disk breaks drbd: Added a kref to bm_aio_ctx drbd: Hold a reference to ldev while doing meta-data IO drbd: Keep a reference to the bio until the completion handler finished drbd: Implemented wait_until_done_or_disk_failure() drbd: Replaced md_io_mutex by an atomic: md_io_in_use drbd: moved md_io into mdev drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Andreas Gruenbacher	2fcb8f307f	drbd: Improve the "unexpected packet" error messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Philipp Reisner	9510b2411d	drbd: Fixed state transitions in case reading meta data failes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Philipp Reisner	2ffca4f3ee	drbd: Improve compatibility with drbd's older than 8.3.7 Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd, that we support more than 32KiB in a single data request (packet). Never believe an older drbd, that is supports more than 32KiB in a single data request (packet) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:49 +01:00
Philipp Reisner	d942ae4453	drbd: Fixes from the 8.3 development branch * commit 'ae57a0a': drbd: Only print sanitize state's warnings, if the state change happens drbd: we should write meta data updates with FLUSH FUA drbd: fix limit define, we support 1 PiByte now drbd: fix log message argument order drbd: Typo in user-visible message. drbd: Make "(rcv\|snd)buf-size" and "ping-timeout" available for the proxy, too. drbd: Allow keywords to be used in multiple config sections. drbd: fix typos in comments. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:49 +01:00
Lars Ellenberg	4dbdae3ec9	drbd: downgraded error printk to info Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:48 +01:00
David Howells	3b7cd457d0	DRBD: Fix comparison always false warning due to long/long long compare Fix warnings of the following nature in the drbd header: In file included from drivers/block/drbd/drbd_bitmap.c:32: drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress': drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which is always false on a 32-bit machine. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:48 +01:00
Andreas Gruenbacher	6dff290220	drbd: Rename --dry-run to --tentative drbdadm already has a --dry-run option, so this option cannot directly be passed through to drbdsetup. Rename the drbdsetup option to resolve this conflict. For backward compatibility, make --dry-run an alias of --tentative. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:47 +01:00
Andreas Gruenbacher	d0fa7fd680	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:47 +01:00
Andreas Gruenbacher	afbbfa88bc	drbd: Allow to pass resource options to the new-resource command This is equivalent to how the attach and connect commands work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	089c075d88	drbd: Convert the generic netlink interface to accept connection endpoints Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	44e52cfaa2	drbd: Rename DRBD_ADM_NEED_{CONN -> RESOURCE} Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	01b39b50d3	drbd: Split off netlink mandatory attribute handling into separate file Duplicate this file in the kernel module and in user space; both sides need it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:45 +01:00
Andreas Gruenbacher	7c3063cc6f	drbd: Also need to check for DRBD_GENLA_F_MANDATORY flags before nla_find_nested() This is done by introducing drbd_nla_find_nested() which handles the flag before calling nla_find_nested(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:45 +01:00
Andreas Gruenbacher	789c1b626c	drbd: Use the terminology suggested by the command names in the source code and messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:44 +01:00
Lars Ellenberg	67b58bf723	drbd: spelling fix: too small It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:44 +01:00
Lars Ellenberg	fc251d5c24	drbd: cosmetic: fix accidental division instead of modulo when pretty printing For large resync rates, seq_printf_with_thousands_grouping() accidentally only produced Y,000,00Y, instead of the real numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Andreas Gruenbacher	46530e859c	drbd: Use DRBD_MINOR_COUNT_DEF in one more place Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Philipp Reisner	a67b813cfa	drbd: Lower log priority for an event that is definitely not an error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Andreas Gruenbacher	c75b9b10e7	drbd: Don't use empty nested netlink attributes Before mainline commit `ea5693cc` (v2.6.29-rc1), empty nested netlink attributes were not allowed. Fix that by leaving out nested attributes if they are empty and by allowing the top-level attributes to be missing. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:40 +01:00
Andreas Gruenbacher	1e2a2551ee	drbd: drbd_adm_prepare(): Pass through error codes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:56 +01:00
Philipp Reisner	d659f2aaea	drbd: Send PROTOCOL_UPDATE packets when appropriate Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:54 +01:00
Philipp Reisner	036b17eaab	drbd: Receiving part for the PROTOCOL_UPDATE packet Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	7aca6c7549	drbd: Allocation of int_dig_in and int_dig_vv was missing Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	f179d76d76	drbd: Made cmp_after_sb() more generic into convert_after_sb() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	46e1ce4177	drbd: protect updates to integrits_tfm by tconn->data->mutex Since we need to hold that mutex anyways to make sure the peer gets that change in the right position in the data stream, it makes a lot of sense to use the same mutex to ensure existence of the tfm. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:52 +01:00
Philipp Reisner	dcb20d1a8e	drbd: Refuse to change network options online when... * the peer does not speak protocol_version 100 and the user wants to change one of: - wire_protocol - two_primaries - integrity_alg * the user wants to remove the allow_two_primaries flag when there are two primaries Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:52 +01:00
Andreas Gruenbacher	95f8efd08b	drbd: Fix the upper limit of resync-after The 32-bit resync_after netlink field takes a device minor number as parameter, which is no longer limited to 255. We cannot statically verify which device numbers are valid, so set the ummer limit to the highest possible signed 32-bit integer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:51 +01:00
Andreas Gruenbacher	69ef82dea4	drbd: Refer to connect-int consistently throughout the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:50 +01:00
Andreas Gruenbacher	6394b9358e	drbd: Refer to resync-rate consistently throughout the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:50 +01:00
Lars Ellenberg	7dc1d67f7c	drbd: skip spurious wait_event in drbd_al_begin_io Activity log transaction writes are serialized on a bit lock. If several CPUs race to write an AL transaction, those that did not get the lock the first time may continue as soon as there are no more pending transactions. The do not need to all grab the lock in turn, just to realize that the AL is clean already, and they have nothing to do. This also closes a potential deadlock with drbd_adm_disk_opts. Once it got the AL bit lock, it knows there are no pending transactions, the AL is clean, and it should be safe to wait for all element references to drop to zero. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:49 +01:00
Andreas Gruenbacher	6139f60dc1	drbd: Rename the want_lose field/flag to discard_my_data This is what it is called in config files and on the command line as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:49 +01:00
Andreas Gruenbacher	6f9b5f84f5	drbd: Make broadcast events return NO_ERROR Instead of returning a ret_code outside of the range of enum drbd_ret_code, use NO_ERROR to indicate success. This way, ret_code has the same meaning in all packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:48 +01:00
Philipp Reisner	c141ebda03	drbd: Removing drbd_cfg_rwsem * Updates to all configuration items is done under genl_lock(). Including removal of mdevs or tconns. * All read non sleeping read sides are protected by rcu * All sleeping read sides keep reference counts to keep the objects alive Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:48 +01:00
Philipp Reisner	ec0bddbc55	drbd: Use RCU for the drbd_tconns list Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:47 +01:00
Philipp Reisner	81fa2e675c	drbd: Refcounting for mdev objects Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:47 +01:00
Andreas Gruenbacher	bb77d34ecc	drbd: Turn no-tcp-cork into tcp-cork={yes\|no} Change the --no-tcp-cork drbdsetup command line option as well as the no_cork netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	e544046ab8	drbd: Turn no-md-flushes into md-flushes={yes\|no} Change the --no-md-flushes drbdsetup command line option as well as the no_md_flush netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	d0c980e236	drbd: Turn no-disk-drain into disk-drain={yes\|no} Change the --no-disk-drain drbdsetup command line option as well as the no_disk_drain netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	66b2f6b9c5	drbd: Turn no-disk-flushes into disk-flushes={yes\|no} Change the --no-disk-flushes drbdsetup command line option as well as the no_disk_flush netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:45 +01:00
Philipp Reisner	813472ced7	drbd: RCU for rs_plan_s This removes the issue with using peer_seq_lock out of different contexts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:44 +01:00
Philipp Reisner	d589a21e5d	drbd: Enforce limits of disk_conf members; centralized these checks Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:44 +01:00
Philipp Reisner	9958c857c7	drbd: Made the fifo object a self contained object (preparing for RCU) * Moved rs_planed into it, named total * When having a pointer to the object the values can be embedded into the fifo object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Philipp Reisner	daeda1cca9	drbd: RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Lars Ellenberg	563e4cf25e	drbd: Introduce __s32_field in the genetlink macro magic ...and drop explicit typecasts (int)meta_dev_idx < 0. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Philipp Reisner	2ec91e0e29	drbd: Renamed (old\|new)_conf into (old\|new)_net_conf in receive_SyncParam Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:42 +01:00
Philipp Reisner	dc97b70801	drbd: Split drbd_alter_sa() into drbd_sync_after_valid() and drbd_sync_after_changed() Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:42 +01:00
Philipp Reisner	ef5e44a672	drbd: drbd_dew_dev_size() gets the user requests disk_size as argument Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:41 +01:00
Philipp Reisner	a0095508ca	drbd: Renamed the net_conf_update mutex to conf_update Preparing to use the same mutex for disk_conf updates Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:41 +01:00
Philipp Reisner	934e6138b5	drbd: Removed dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:40 +01:00
Andreas Gruenbacher	b966b5dd8e	drbd: Generate the drbd_set_*_defaults() functions from drbd_genl.h Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:38 +01:00
Lars Ellenberg	009ba89db5	drbd: fix schedule in atomic An administrative detach used to request a state change directly to D_DISKLESS, first suspending IO to avoid the last put_ldev() occuring from an endio handler, potentially in irq context. This is not enough on the receiving side (typically secondary), we may miss some peer_req on the way to local disk, which then may do the last put_ldev() from their drbd_peer_request_endio(). This patch makes the detach always go through the intermediate D_FAILED state. We may consider to rename it D_DETACHING. Alternative approach would be to create yet an other work item to be scheduled on the worker, do the destructor work from there, and get the timing right. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:01 +01:00
Lars Ellenberg	992d6e91d3	drbd: fix thread stop deadlock There are races where the receiver may be exiting, but still need the worker to process some stuff. Do not wait for the receiver to die from an exiting worker. The receiver must already be dead in case the worker decides to exit. If the receiver was still alive, it may still want to queue work, and do drbd_flush_workqueue() from it's disconnect cleanup code, which would no longer be processed by an exiting worker. This also would deadlock, if the worker was to synchornously wait for the receiver to die. Do not implicitly stop the worker. The worker will only be stopped from configuration context, from conn_reconfig_done(), drbd_adm_down() or drbd_adm_delete_connection(), after making sure the receiver is already stopped. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:00 +01:00
Lars Ellenberg	f3dfa40a67	drbd: fix race when forcefully disconnecting If a forced disconnect hits a restarting receiver right after it passed its final "if (C_DISCONNECTING)" test in drbdd_init(), but before it was actually restarted by drbd_thread_setup, we could be left with a connection stuck in C_DISCONNECTING, never reaching C_STANDALONE, which would be necessary to take it down or reconfigure it. Move the last cleanup into w_after_conn_state_ch(), and do an additional state change request in conn_try_disconnect(), just in case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:00 +01:00
Andreas Gruenbacher	88104ca458	drbd: Allow to change data-integrity-alg on the fly The main purpose of this is to allow to turn data integrity checking on and off on demand without causing interruptions. Implemented by allocating tconn->peer_integrity_tfm only when receiving a P_PROTOCOL message. l accesses to tconn->peer_integrity_tf happen in worker context, and no further synchronization is necessary. On the sender side, tconn->integrity_tfm is modified under tconn->data.mutex, and a P_PROTOCOL message is sent whenever. All accesses to tconn->integrity_tfm already happen under this mutex. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:59 +01:00
Andreas Gruenbacher	a7eb7bdf58	drbd: Introduce a "lockless" variant of drbd_send_protocoll() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:59 +01:00
Andreas Gruenbacher	4b6ad6d457	drbd: Remove obsolete drbd_crypto_is_hash() We allocate hash transformations with crypto_alloc_hash() which will only return hash algorithms. It is not necessary to reconfirm that we actually got a hash algorithm. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	5b614abe30	drbd: Rename integrity_r_tfm -> peer_integrity_tfm Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	8d412fc6d5	drbd: Rename integrity_w_tfm -> integrity_tfm Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	86db06180a	drbd: Wrong use of RCU in receive_protocol() It is not enough to grab net_conf->integrity_alg under rcu_read_lock() and access it outside of it; the entire net_conf object may be gone by then. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:57 +01:00
Lars Ellenberg	acb104c396	drbd: fix copy/paste error in comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:57 +01:00
Lars Ellenberg	b57a1e27ee	drbd: rename variable sc to res_opts sc was short for syncer conf, which does not exist anymore anyways. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:56 +01:00
Lars Ellenberg	5ecc72c3b9	drbd: rename variable ndc to new_disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:54 +01:00
Lars Ellenberg	5979e36155	drbd: on reconfiguration requests, mind the SET_DEFAULTS flag The DRBD_GENL_F_SET_DEFAULTS flag was ignored for drbd_adm_disk_opts() and drbd_adm_net_opts(). Factor out drbd_set_*_defaults() helper functions, and call them appropriately. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:50:38 +01:00
Philipp Reisner	0fd0ea064c	drbd: Consider all crypto options in connect and in net-options So for this was simply not considered after the options have been re-arranged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:08 +01:00
Lars Ellenberg	d9cc6e2318	drbd: fix various disconnecting races If an admin requests disconnect at a time when the state handling already disconnects/reconnects, there have been some races. Make sure to always really stop the network threads before returning success for disconnect. Do not pretend successfull forced disconnect, if the state handling returned an error. Return success from drbd_adm_down() only after all threads are finished. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:08 +01:00
Lars Ellenberg	5ee743e92d	drbd: remove useless kobject_uevent from drbd_adm_connect Calling kobject_uevent, which may sleep, from within rcu_read_lock() protected regions is not possible. This particular kobject_uevent also is also wrong. It was supposed to trigger a udev run, just in case something relevant to udev symlink magic has changed, when adjusting runtime re-configurable settings while we still had the "syncer conf". It was improperly placed in connect when we dropped the "syncer conf". The right thing to do is probably to call "udevadm trigger" directly in those cases where drbdadm thinks there was a need to trigger extra udev runs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:07 +01:00
Philipp Reisner	a18e9d1eb0	drbd: Removed the OBJECT_DYING and the CONFIG_PENDING bits superseded by refcounting Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:07 +01:00
Philipp Reisner	0ace9dfabe	drbd: Take a reference on tconn when finding a tconn by name Rule #3 of kref.txt Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:06 +01:00
Philipp Reisner	9dc9fbb357	drbd: Basic refcounting for drbd_tconn References hold by: * Each (running) drbd thread has a reference on tconn * Each mdev has a referenc on tconn * Beeing in the all_tconn list counts for one reference * Each after_conn_state_chg_work has a reference to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:06 +01:00
Philipp Reisner	1d04122599	drbd: Eliminated drbd_free_resoruces() it is superseeded by conn_free_crypto() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	f5e2b8b3b6	drbd: move comment about stopping the receiver thread to where it belongs When the last volume of a replication group is unconfigured, the worker thread exits. To not interfere with cleanup of other threads, before the the last cleanups run, we need to make sure the receiver has already exited. The commend explaining that clearly belongs above drbd_thread_stop(&tconn->receiver), not in the cleanup loop below. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	ae25b336e0	drbd: cmdname() enum to string convertion was missing a few constants Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	ed439848ca	drbd: fix setsockopt for user mode linux We use our own copy of kernel_setsockopt, and did not mess around with get_fs/set_fs, since we thought we knew we would always be KERNEL_DS anyways. Apparently not so for at least user mode linux, so put the set_fs(KERNEL_DS) in there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:04 +01:00
Lars Ellenberg	71932efc1c	drbd: allow status dump request all volumes of a specific resource We had drbd_adm_get_status (one single volume), and drbd_adm_get_status_all (dump of all volumes of all resources). This enhances the latter to be able to dump all volumes of just one specific resource. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:04 +01:00
Philipp Reisner	302bdeae49	drbd: Considering that the two_primaries config flag can change Now since it is possible to change the two_primaries config flag while the connection is up, make sure we treat a peer_req in a consistent way if the config flag changes while the peer_req is under IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:03 +01:00
Philipp Reisner	91fd4dad64	drbd: Proper locking for updates to net_conf under RCU Removing the get_net_conf()/put_net_conf() functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:03 +01:00
Philipp Reisner	44ed167da7	drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf Removing the get_net_conf()/put_net_conf() calls Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:48:59 +01:00
Philipp Reisner	b032b6fa35	drbd: Allow online change of replication protocol only with agreed_pv >= 100 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	cd64397c0b	drbd: Check consistency of net options when the get changed online Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	303d1448a0	drbd: Runtime changeable wire protocol The wire protocol is no longer a property that is negotiated between the two peers. It is now expressed with two bits (DP_SEND_WRITE_ACK and DP_SEND_RECEIVE_ACK) in each data packet. Therefore the primary node is free to change the wire protocol at any time without disconnect/reconnect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	d3fcb4908d	drbd: protect all idr accesses that might sleep with drbd_cfg_rwsem With this commit the locking for all accesses to IDRs is complete: * Non sleeping read accesses are protected by RCU * sleeping read accesses are protocted by a read lock on drbd_cfg_rwsem * accesses that add anything are protected by a write lock * accesses that remove an object are protoected by a write lock and a call to synchronize_rcu() after it is removed from the IDR and before the object is actually free()ed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:17 +01:00
Philipp Reisner	ef35626284	drbd: Converted drbd_cfg_mutex into drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:17 +01:00
Philipp Reisner	695d08fa94	drbd: rcu_read_[un]lock() for all idr accesses that do not sleep Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:16 +01:00
Philipp Reisner	cd1d9950f6	drbd: Inlined drbd_free_mdev(); it got called only from one place Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:16 +01:00
Philipp Reisner	ff370e5a9e	drbd: drbd_delete_device() takes a struct drbd_conf * now Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	5cc287e0ae	drbd: Rename drbd_pp_free() to drbd_free_pages() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	c37c8ecfee	drbd: Rename drbd_pp_alloc() to drbd_alloc_pages() and make it non-static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	18c2d52249	drbd: Rename drbd_pp_first_pages_or_try_alloc() to __drbd_alloc_pages() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:14 +01:00
Andreas Gruenbacher	d4da15374b	drbd: Make drbd_wait_ee_list_empty() and _drbd_wait_ee_list_empty() static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:14 +01:00
Andreas Gruenbacher	045417f75c	drbd: Rename drbd_{ ee -> peer_req }_has_active_page Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	a990be4682	drbd: Rename reclaim_net_ee(), drbd_process_done_ee(), drbd_process_done_ee(), tconn_process_done_ee() to *_peer_reqs Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	7721f5675e	drbd: Rename drbd_release_ee() to drbd_free_peer_reqs() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	3967deb192	drbd: Rename drbd_free_ee() and variants to *_peer_req() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:12 +01:00
Andreas Gruenbacher	0db55363cb	drbd: Rename drbd_alloc_ee() to drbd_alloc_peer_req() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:12 +01:00
Andreas Gruenbacher	e0ab6ad4bc	drbd: drbd_init_ee() no longer exists Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	2735a59467	drbd: Make all asynchronous command handlers return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	859976758d	drbd: validate_req_change_req_state(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	b55d84ba17	drbd: Removed outdated comments and code that envisioned VNRs in header 95 Since have now header 100, that has space for 16 bit volume numbers, the high byte of the length in header 95 is no longer reserved for 8 bit volume numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:10 +01:00
Andreas Gruenbacher	0c8e36d9b8	drbd: Introduce protocol version 100 headers The 8 byte header finally becomes too small. With the protocol 100 header we have 16 bit for the volume number, proper 32 bit for the data length, and 32 bit for further extensions in the future. Previous versions of drbd are using version 80 headers for all packets short enough for protocol 80. They support both header versions in worker context, but only version 80 headers in asynchronous context. For backwards compatibility, continue to use version 80 headers for short packets before protocol version 100. From protocol version 100 on, use the same header version for all packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:10 +01:00
Andreas Gruenbacher	e658983af6	drbd: Remove headers from on-the-wire data structures (struct p_*) Prepare the introduction of the protocol 100 headers. The actual protocol header is removed for the packet declarations. I.e. allow us to use the packets with different headers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	50d0b1ad78	drbd: Remove some fixed header size assumptions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	da39fec492	drbd: Remove now-unused int_dig_out buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	9f5bdc339e	drbd: Replace and remove old primitives Centralize sock->mutex locking and unlocking in [drbd\|conn]_prepare_command() and [drbd\|conn]_send_comman(). Therefore all _send_ functions are touched to use these primitives instead of drbd_get_data_sock()/drbd_put_data_sock() and former helper functions. That change makes the _send_ functions more standardized. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:08 +01:00
Andreas Gruenbacher	52b061a440	drbd: Introduce drbd_header_size() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:08 +01:00
Andreas Gruenbacher	dba5858750	drbd: Introduce new primitives for sending commands Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Andreas Gruenbacher	a17647aae4	drbd: drbd_send_ping(), drbd_send_ping(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Philipp Reisner	8b924f1d63	drbd: Use tconn in request_timer_fn() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Philipp Reisner	706cb24c23	drbd: Improved logging of state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:06 +01:00
Philipp Reisner	a6d00c8ec3	drbd: Implemented IO thawing for multiple volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:06 +01:00
Philipp Reisner	4669265a7b	drbd: Implemented conn_lowest_disk() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	19f83c7661	drbd: Implemented conn_lowest_conn() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	8c7e16c39f	drbd: Calculate and provide ns_min to the w_after_conn_state_ch() work Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	5f082f98f5	drbd: Renamed nms to ns_max Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:04 +01:00
Philipp Reisner	da9fbc276e	drbd: Introduced a new type union drbd_dev_state Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:04 +01:00
Philipp Reisner	8e0af25fa8	drbd: Moved susp, susp_nod and susp_fen to the connection object Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Philipp Reisner	2aebfabb17	drbd: Renamed id_susp(union drbd_state s) to drbd_suspended(struct drbd_conf *) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Philipp Reisner	78bae59b1b	drbd: Introduced drbd_read_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Lars Ellenberg	e15766e9c9	drbd: improvements to activate/deactivate multiple activity log extents Recent commit drbd: get rid of bio_split, allow bios of "arbitrary" size had a reference count leak: it only deactivated the first of several activity log extents for intervals crossing extent boundaries. This commit generalizes on bios spanning multiple activity log extents in drbd_al_begin_io, and adds the necessary loop around lc_put in drbd_al_complete_io as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:02 +01:00
Lars Ellenberg	23361cf32b	drbd: get rid of bio_split, allow bios of "arbitrary" size Where "arbitrary" size is currently 1 MiB, which is the BIO_MAX_SIZE for architectures with 4k PAGE_CACHE_SIZE (most). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:02 +01:00
Lars Ellenberg	7726547e67	drbd: prepare to activate two activity log extents at once Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Lars Ellenberg	181286ad22	drbd: preparation commit, pass drbd_interval to drbd_al_begin/complete_io We want to avoid bio_split for bios crossing activity log boundaries. So we may need to activate two activity log extents "atomically". drbd_al_begin_io() needs to know more than just the start sector. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Lars Ellenberg	85f103d88c	drbd: introduce the "initialized" activity log transaction type So we can initialize a clean on disk activity log area, without the module complaining with loud assert messages because of checksum or magic value mismatches. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Andreas Gruenbacher	6038178ebe	drbd: Change how the "handshake" packets are called Packets of type P_HAND_SHAKE define which protocol versions and features a node supports. For clarity, call those packets P_CONNECTION_FEATURES instead. (This does not determine the features that a specific drbd device supports, such as drbd protocol A, B, C.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:00 +01:00
Andreas Gruenbacher	e5d6f33abe	drbd: Change how the initial packets are called The first packets exchanged when a connection is established are referred to as P_HAND_SHAKE_S and P_HAND_SHAKE_M in the code, followed by P_HAND_SHAKE packets. To avoid confusion between these two unrelated things, call the initial packets P_INITIAL_DATA and P_INITIAL_META. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:00 +01:00
Andreas Gruenbacher	7c96715aa8	drbd: _conn_send_cmd(), _drbd_send_cmd(): Pass a struct drbd_socket instead of a plain socket Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	2bf896213d	drbd: drbd_connect(): Initialize struct drbd_socket before sending anything Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	1952e9166a	drbd: Map from (connection, volume number) to device in the asender handlers Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	e05e1e59db	drbd: Pass struct packet_info down to the asender receive functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	438c8374ae	drbd: Do not segfault if a sync dependency reaches a diskless device Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	778bcf2e29	drbd: Allow to disconnect if one volume is diskless Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	435693e89b	drbd: Print common state changes of all volumes as connection state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:57 +01:00
Philipp Reisner	88ef594ed7	drbd: Fixed logging of old connection state During a disconnect the oc variable in _conn_request_state() could become outdated. Determin the common old state after sleeping. While at it, I implemented that for all parts of the state Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:57 +01:00
Philipp Reisner	bd0c824a9d	drbd: Use the idr_for_each_entry() iterator instead of idr_for_each() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	4a76b1612f	drbd: Map from (connection, volume number) to device in the receive handlers The receive handlers do not all handle unknown volume numbers the same way. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	e28572167c	drbd: Pass struct packet_info down to the receive functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	49ba9b1bb3	drbd: Remove useless error messages These messages can only trigger in case there is a pretty obvious internal programming error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:55 +01:00
Andreas Gruenbacher	deebe19579	drbd: A small cleanup in drbdd() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:55 +01:00
Andreas Gruenbacher	79ed9bd053	drbd: _drbd_send_bitmap(): Use the pre-allocated send buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	5a87d920f3	drbd: Preallocate one page per drbd_socket as a send buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	fc56815c81	drbd: receive_bitmap(): Use the pre-allocated receive buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	e6ef8a5cb3	drbd: Preallocate one page per drbd_socket as a receive buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:53 +01:00
Philipp Reisner	cb703454a2	drbd: Converted drbd_try_outdate_peer() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:53 +01:00
Andreas Gruenbacher	a02d124091	drbd: Rename the DCBP_* functions to dcbp_* and move them to where they are used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	058820cdd7	drbd: Make _drbd_send_bitmap() static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	e307f352b4	drbd: Move drbd_send_ping() and drbd_send_ping_ack() to drbd_main.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	0916e0e308	drbd: Always use the same protocol version for the same peer There is no need to send protocol 80 headers to peers that understand protocol 95 headers. Make sure that we don't send protocol 95 headers until we have agreed upon a protocol version with our peer, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:51 +01:00
Andreas Gruenbacher	0829f5edf3	drbd: drbd_connected(): Return an error code upon failure. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:51 +01:00
Andreas Gruenbacher	a5c3190435	drbd: Introduce and use drbd_recv_all_warn() The pattern of receiving a fixed number of bytes and warning if a short packet is received and the receiver has not actively been interruped is repeated many times; clean that up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	309a834896	drbd: Get rid of typedef drbd_work_cb This type is not used anywhere else. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	8f7bed7774	drbd: Rename various functions from _oos_ to _out_of_sync_ for clarity Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	0da34df0d0	drbd: drbd_may_do_local_read(): Use bool/true/false Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	1097e9a80c	drbd: Remove unnecessary assertion This is also checked further below in the same function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	69f5ec728c	drbd: Remove duplicate initialization Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	3fbf4d21ae	drbd: drbd_md_sync_page_io(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:48 +01:00
Andreas Gruenbacher	ac29f4039a	drbd: _drbd_md_sync_page_io(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:48 +01:00
Andreas Gruenbacher	22ab6a30b8	drbd: drbd_bm_read() never returns a positive value through drbd_bitmap_io() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	82bc01940a	drbd: Make all command handlers return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	c696774691	drbd: Add drbd_recv_all(): Receive an entire buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	a982dd579c	drbd: send_bitmap_rle_or_plain(): Error handling cleanup Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:46 +01:00
Andreas Gruenbacher	e1c1b0fc8f	drbd: recv_resync_read(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:46 +01:00
Andreas Gruenbacher	28284ceff0	drbd: recv_dless_read(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	fc5be8397f	drbd: drbd_drain_block(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	69bc7bc351	drbd: drbd_recv_header(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	8172f3e9bf	drbd: decode_header(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:44 +01:00
Andreas Gruenbacher	e2b3032b90	drbd: drbd_process_done_ee(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:44 +01:00
Andreas Gruenbacher	99920dc5c5	drbd: Make all worker callbacks return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	b2f0ab62ec	drbd: Temporarily change the return type of all worker callbacks This helps to ensure that we don't miss one of them when changing their return value semantics. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	a896527c06	drbd: drbd_send_short_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	6bdb9b0e23	drbd: drbd_send_dblock(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	7fae55da38	drbd: _drbd_send_bio(), _drbd_send_zc_bio(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	7b57b89d62	drbd: drbd_send_block(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	9f69230cd6	drbd: _drbd_send_zc_ee(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:41 +01:00
Andreas Gruenbacher	88b390ff63	drbd: _drbd_send_page(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:41 +01:00
Andreas Gruenbacher	b987427b53	drbd: _drbd_no_send_page(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	73218a3c4c	drbd: drbd_send_oos(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	db1b0b724e	drbd: drbd_send_drequest_csum(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	6c1005e74d	drbd: drbd_send_drequest(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:39 +01:00
Andreas Gruenbacher	5b9f499c66	drbd: drbd_send_ov_request(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:39 +01:00
Andreas Gruenbacher	fa79abd893	drbd: drbd_send_ack_ex(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	a9a9994dc7	drbd: drbd_send_ack_{dp,rp}(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	dd5161218b	drbd: drbd_send_ack(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	a8c32aa846	drbd: _drbd_send_ack(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:37 +01:00
Andreas Gruenbacher	d4e67d7c4f	drbd: drbd_send_b_ack(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:37 +01:00
Andreas Gruenbacher	2f4e7abe51	drbd: drbd_send_sr_reply(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	d24ae219e9	drbd: drbd_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	caee1c3a92	drbd: conn_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	758970c832	drbd: _conn_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:35 +01:00
Andreas Gruenbacher	f02d4d0a9c	drbd: drbd_send_sizes(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:35 +01:00
Andreas Gruenbacher	9c1b7f7282	drbd: drbd_gen_and_send_sync_uuid(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	2ae5f95b1a	drbd: drbd_send_uuids() and its variants: Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	387eb30817	drbd: drbd_send_protocol(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	e8d17b015e	drbd: drbd_send_handshake(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	927036f908	drbd: drbd_send_state(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	103ea27528	drbd: drbd_send_sync_param(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	f725446353	drbd: drbd_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:32 +01:00
Andreas Gruenbacher	7d168ed30f	drbd: Get rid of USE_DATA_SOCKET and USE_META_SOCKET Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:32 +01:00
Andreas Gruenbacher	596a37f9ef	drbd: conn_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	04dfa13788	drbd: _drbd_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	ecf2363cb5	drbd: _conn_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	ce9879cb1f	drbd: conn_send_cmd2(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:30 +01:00
Andreas Gruenbacher	fb708e408f	drbd: Add drbd_send_all(): Send an entire buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:30 +01:00
Andreas Gruenbacher	11b0be28e5	drbd: drbd_get_data_sock(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:29 +01:00
Andreas Gruenbacher	c0d42c8e57	drbd: drbd_send(): Return a "real" error code if we have no socket Q: Can this case even trigger? Is failing this way any better than one that causes a NULL pointer access? Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:29 +01:00
Philipp Reisner	e90285e0ba	drbd: Fixed conn_lowest_minor It actually returned the lowest volume number. While doing that renamed a few wrongly named variables. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:28 +01:00
Lars Ellenberg	f399002e68	drbd: distribute former syncer_conf settings to disk, connection, and resource level This commit breaks the API again. Move per-volume former syncer options into disk_conf. Move per-connection former syncer options into net_conf. Renamed the remainign sync_conf to res_opts Syncer settings have been changeable at runtime, so we need to prepare for these settings to be runtime-changeable in their new home as well. Introduce new configuration operations, and share the netlink attribute between "attach" (create new disk) and "disk-opts" (change options). Same for "connect" and "net-opts". Some fields cannot be changed at runtime, however. Introduce a new flag GENLA_F_INVARIANT to be able to trigger on that in the generated validation and assignment functions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:20 +01:00
Roger Pau Monne	cb5bd4d19b	xen/blkback: persistent-grants fixes This patch contains fixes for persistent grants implementation v2: * handle == 0 is a valid handle, so initialize grants in blkback setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported by Konrad Rzeszutek Wilk. * new_map is a boolean, use "true" or "false" instead of 1 and 0. Reported by Konrad Rzeszutek Wilk. * blkfront announces the persistent-grants feature as feature-persistent-grants, use feature-persistent instead which is consistent with blkback and the public Xen headers. * Add a consistency check in blkfront to make sure we don't try to access segments that have not been set. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> [v1: The new_map int->bool had already been changed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-04 10:35:40 -05:00
Philipp Reisner	6b75dced00	drbd: conn_khelper() for user mode callbacks for connections Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:32 +01:00
Philipp Reisner	047e95e259	drbd: Allow volumes to become primary only on one side Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Lars Ellenberg	40cbf085f5	drbd: fix conn_reconfig_start without conn_reconfig_done in drbd_adm_attach If drbd_adm_attach failed early, it left the CONFIG_PENDING bit on, blocking any further conn_reconfig_start on that connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Philipp Reisner	e4f78edee1	drbd: Separate connection state changes from minor dev state changes #2 New function got_conn_RqSReply() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	f19e4f8ba7	drbd: Converted got_Ping() and got_PingAck() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	a4fbda8eca	drbd: Allow packet handler functions that take a connection (meta connection) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:29 +01:00
Philipp Reisner	dfafcc8a7b	drbd: Separate connection state changes from minor dev state changes #1 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	7204624c5e	drbd: Converted receive_protocol() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	d9ae84e790	drbd: Allow packet handler functions that take a connection That is necessary in case a connection does not have a volume 0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	8169e41b3e	drbd: Moved CONN_DRY_RUN to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	38fa9988fa	drbd: Do not modify the connection state with something else that conn_request_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:26 +01:00
Philipp Reisner	34f646bd57	drbd: Allow two diskless minors to be connected In the context of drbd-8.4 it no longer makes sense to dissalow that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	2325eb661f	drbd: New minors have to intherit the connection state form their connection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	082a3439a2	drbd: process_done_ee() has to handle unconfigured devices now Took the chance and converted tconn_process_done_ee() to use idr_for_each_entry() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:24 +01:00
Philipp Reisner	2de876efa6	drbd: Ignore packets for non existing volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	85f75dd763	drbd: introduce in-kernel "down" command This greatly simplifies deconfiguration of whole resources. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	3c5e5f6afd	drbd: add forgotten spin_unlock somehow a "goto abort" was introduced with commit drbd: Extracted is_valid_transition() out of sanitize_state() which left drbd_req_state still holding the spin lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:22 +01:00
Lars Ellenberg	527f4b24e5	drbd: bail out if a config requrest is over-determined, and not matching We have resources resp. connections, volumes, and minor numbers. A config request may specifies all three of them. If it turns out that the minor belongs to a different connection, or a different volume number in the same connection, that configuration request is invalid. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	38f19616d2	drbd: new-connection and new-minor succeed, if the object already exists Follow O_CREAT semantics when creating connection or minor device/volume objects. If we need O_CREAT\|O_EXCL semantics some time down the road, we can add NLM_F_EXCL to the netlink message flags. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	cffec5b2fe	drbd: Allow a Diskless Secondary volume to be removed Even if the connection is still established. We should be able to reduce a volume from a replication group, without taking the whole group offline. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:20 +01:00
Lars Ellenberg	d0456c72df	drbd: simplify conn_all_vols_unconf, make it bool Get rid of a temporary variable and, funny bitand assignment. Just short circuit, returning false, once we encounter the first still configured volume. FIXME verify call sites for need of rcu_read_lock or stronger. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	543cc10b4c	drbd: drbd_adm_get_status needs to show some more detail We want to see existing connection objects, even if they do not currently have volumes attached. Change the .dumpit variant of drbd_adm_get_status to iterate not over minor devices, but over connections + volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	8432b31457	drbd: allow holes in minor and volume id allocation s/idr_get_new/idr_get_new_above/ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Lars Ellenberg	3b98c0c209	drbd: switch configuration interface from connector to genetlink Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Alex Elder	9e15b77d9a	rbd: get additional info in parent spec When a layered rbd image has a parent, that parent is identified only by its pool id, image id, and snapshot id. Images that have been mapped also record names for those three id's. Add code to look up these names for parent images so they match mapped images more closely. Skip doing this for an image if it already has its pool name defined (this will be the case for images mapped by the user). It is possible that an the name of a parent image can't be determined, even if the image id is valid. If this occurs it does not preclude correct operation, so don't treat this as an error. On the other hand, defined pools will always have both an id and a name. And any snapshot of an image identified as a parent for a clone image will exist, and will have a name (if not it indicates some other internal error). So treat failure to get these bits of information as errors. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	86b00e0da6	rbd: get parent spec for version 2 images Add support for getting the the information identifying the parent image for rbd images that have them. The child image holds a reference to its parent image specification structure. Create a new entry "parent" in /sys/bus/rbd/image/N/ to report the identifying information for the parent image, if any. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	a92ffdf8a9	rbd: allow null image name Format 2 parent images are partially identified by their image id, but it may not be possible to determine their image name. The name is not strictly needed for correct operation, so we won't be treating it as an error if we don't know it. Handle this case gracefully in rbd_name_show(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	2c0d0a10ea	rbd: allow null image name We will know the image id for format 2 parent images, but won't initially know its image name. Avoid making the query for an image id in rbd_dev_image_id() if it's already known. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	83a0626362	rbd: encapsulate last part of probe Group the activities that now take place after an rbd_dev_probe() call into a single function, and move the call to that function into rbd_dev_probe() itself. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	c53d589337	rbd: define rbd_dev_{create,destroy}() helpers Encapsulate the creation/initialization and destruction of rbd device structures. The rbd_client and the rbd_spec structures provided on creation hold references whose ownership is transferred to the new rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	bd4ba6554d	rbd: consolidate rbd_dev init in rbd_add() Group the allocation and initialization of fields of the rbd device structure created in rbd_add(). Move the grouped code down later in the function, just prior to the call to rbd_dev_probe(). This is for the most part simple code movement. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	9d3997fdf4	rbd: don't pass rbd_dev to rbd_get_client() The only reason rbd_dev is passed to rbd_get_client() is so its rbd_client field can get assigned. Instead, just return the rbd_client pointer as a result and have the caller do the assignment. Change rbd_put_client() so it takes an rbd_client structure, so follows the more typical symmetry with rbd_get_client(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	859c31df9c	rbd: fill rbd_spec in rbd_add_parse_args() Pass the address of an rbd_spec structure to rbd_add_parse_args(). Use it to hold the information defining the rbd image to be mapped in an rbd_add() call. Use the result in the caller to initialize the rbd_dev->id field. This means rbd_dev is no longer needed in rbd_add_parse_args(), so get rid of it. Now that this transformation of rbd_add_parse_args() is complete, correct and expand on the its header documentation to reflect the new reality. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	8b8fb99c5c	rbd: add reference counting to rbd_spec With layered images we'll share rbd_spec structures, so add a reference count to it. It neatens up some code also. A silly get/put pair is added to the alloc routine just to avoid "defined but not used" warnings. It will go away soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:41 -05:00
Roger Pau Monne	0a8704a51f	xen/blkback: Persistent grant maps for xen blk drivers This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O. Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in Dom0. An experiment showed that when Dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines. By 10 guests, only 400,000 IOPS are being performed. This patch improves performance by only unmapping when the connection between blkfront and back is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback will do the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since it is compatible with the previous protocol, and simplifies the code complexity in blkfront. To perform a read, in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page. If so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpy's from the shared memory, into the bvec supplied. A record that the {gref, page} tuple is mapped, and not inflight is kept. Writes are similar, except that the memcpy is peformed from the supplied bvecs, into the shared pages, before the request is put onto the ring. Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black tree. As the grefs are not known apriori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page, for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront grants are stored using a single linked list. The maximum number of grants that blkback will persistenly map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicios guest from attempting a DoS, by supplying fresh grefs, causing the Dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistenly the newly passed grefs will be mapped and unmaped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although setting it will lead to unpredictable performance. In writing this patch, the question arrises as to if the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweigh the gains of not performing TLB shootdowns. The answer to that question is `no'. There appears to be very little, if any additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting, and ending foreign access. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up the misuse of bool as int]	2012-10-30 09:50:04 -04:00
Alex Elder	0d7dbfce9d	rbd: define image specification structure Group the fields that uniquely specify an rbd image into a new reference-counted rbd_spec structure. This structure will be used to describe the desired image when mapping an image, and when probing parent images in layered rbd devices. Replace the set of fields in the rbd device structure with a pointer to a dynamically allocated rbd_spec. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	dc79b113d6	rbd: have rbd_add_parse_args() return error Change the interface to rbd_add_parse_args() so it returns an error code rather than a pointer. Return the ceph_options result via a pointer whose address is passed as an argument. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	4e9afeba7a	rbd: pass and populate rbd_options structure Have the caller pass the address of an rbd_options structure to rbd_add_parse_args(), to be initialized with the information gleaned as a result of the parse. I know, this is another near-reversal of a recent change... Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	819d52bf72	rbd: remove snap_name arg from rbd_add_parse_args() The snapshot name returned by rbd_add_parse_args() just gets saved in the rbd_dev eventually. So just do that inside that function and do away with the snap_name argument, both in rbd_add_parse_args() and rbd_dev_set_mapping(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	f28e565a1b	rbd: remove options args from rbd_add_parse_args() They "options" argument to rbd_add_parse_args() (and it's partner options_size) is now only needed within the function, so there's no need to have the caller allocate and pass the options buffer. Just allocate the options buffer within the function using dup_token(). Also distinguish between failures due to failed memory allocation and failing because a required argument was missing. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	e5c3553404	rbd: get rid of snap_name_len The value returned in the "snap_name_len" argument to rbd_add_parse_args() is never actually used, so get rid of it. The snap_name_len recorded in rbd_dev_v2_snap_name() is not useful either, so get rid of that too. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	0ddebc0c6c	rbd: do all argument parsing in one place This patch makes rbd_add_parse_args() be the single place all argument parsing occurs for an image map request: - Move the ceph_parse_options() call into that function - Use local variables rather than parameters to hold the list of monitor addresses supplied - Rather than returning it, pass the snapshot name (and its length) back via parameters - Have the function return a ceph_options structure pointer Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	78cea76e05	rbd: move ceph_parse_options() call up Move option parsing out of rbd_get_client() and into its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	daba5fdb4c	rbd: rename snap_exists field A Boolean field "snap_exists" in an rbd mapping is used to indicate whether a mapped snapshot has been removed from an image's snapshot context, to stop sending requests for that snapshot as soon as we know it's gone. Generalize the interpretation of this field so it applies to non-snapshot (i.e. "head") mappings. That is, define its value to be false until the mapping has been set, and then define it to be true for both snapshot mappings or head mappings. Rename the field "exists" to reflect the broader interpretation. The rbd_mapping structure is on its way out, so move the field back into the rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	971f839a76	rbd: move snap info out of rbd_mapping struct Moving the snap_id and snap_name fields into the separate rbd_mapping structure was misguided. (And in time, perhaps we'll do away with that structure altogether...) Move these fields back into struct rbd_device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	86992098e7	rbd: make pool_id a 64 bit value If a format 2 image has a parent, its pool id will be specified using a 64-bit value. Change the pool id we save for an image to match that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	41f38c2b2f	rbd: remove snapshots on error in rbd_add() If rbd_dev_snaps_update() has ever been called for an rbd device structure there could be snapshot structures on its snaps list. In rbd_add(), this function is called but a subsequent error path neglected to clean up any of these snapshots. Add a call to rbd_remove_all_snaps() in the appropriate spot to remedy this. Change a couple of error labels to be a little clearer while there. Drop the leading underscores from the function name; there's nothing special about that function that they might signify. As suggested in review, the leading underscores in __rbd_remove_snap_dev() have been removed as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Alex Elder	f7760dad28	rbd: simplify rbd_rq_fn() When processing a request, rbd_rq_fn() makes clones of the bio's in the request's bio chain and submits the results to osd's to be satisfied. If a request bio straddles the boundary between objects backing the rbd image, it must be represented by two cloned bio's, one for the first part (at the end of one object) and one for the second (at the beginning of the next object). This has been handled by a function bio_chain_clone(), which includes an interface only a mother could love, and which has been found to have other problems. This patch defines two new fairly generic bio functions (one which replaces bio_chain_clone()) to help out the situation, and then revises rbd_rq_fn() to make use of them. First, bio_clone_range() clones a portion of a single bio, starting at a given offset within the bio and including only as many bytes as requested. As a convenience, a request to clone the entire bio is passed directly to bio_clone(). Second, bio_chain_clone_range() performs a similar function, producing a chain of cloned bio's covering a sub-range of the source chain. No bio_pair structures are used, and if successful the result will represent exactly the specified range. Using bio_chain_clone_range() makes bio_rq_fn() a little easier to understand, because it avoids the need to pass very much state information between consecutive calls. By avoiding the need to track a bio_pair structure, it also eliminates the problem described here: http://tracker.newdream.net/issues/2933 Note that a block request (and therefore the complete length of a bio chain processed in rbd_rq_fn()) is an unsigned int, while the result of rbd_segment_length() is u64. This change makes this range trunctation explicit, and trips a bug if the the segment boundary is too far off. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Lars Ellenberg	ccae7868b0	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	a2a3c74f24	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	06f10adbdb	drbd: prepare for more than 32 bit flags - struct drbd_conf { ... unsigned long flags; ... } + struct drbd_conf { ... unsigned long drbd_flags[N]; ... } And introduce wrapper functions for test/set/clear bit operations on this member. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	44edfb0d78	drbd: wait for meta data IO completion even with failed disk, unless force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	8b45a5c8a1	drbd: a few more GFP_KERNEL -> GFP_NOIO This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	0b143d4382	drbd: fix potential deadlock during bitmap (re-)allocation The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	7fb907c15f	drbd: panic on delayed completion of aborted requests "aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed error completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	dbd0820c6f	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	599377acb7	drbd: Avoid NetworkFailure state during disconnect Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	c12a3d8c84	drbd: Fix a potential issue with the DISCARD_CONCURRENT flag The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Lars Ellenberg	02b91b5526	drbd: introduce stop-sector to online verify We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Philipp Reisner	9f2247bb9b	drbd: Protect accesses to the uuid set with a spinlock There is at least the worker context, the receiver context, the context of receiving netlink packts and processes reading a sysfs attribute that access the uuids. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Dave Chinner	a1ecac3b06	loop: Make explicit loop device destruction lazy xfstests has always had random failures of tests due to loop devices failing to be torn down and hence leaving filesytems that cannot be unmounted. This causes test runs to immediately stop. Over the past 6 or 7 years we've added hacks like explicit unmount -d commands for loop mounts, losetup -d after unmount -d fails, etc, but still the problems persist. Recently, the frequency of loop related failures increased again to the point that xfstests 259 will reliably fail with a stray loop device that was not torn down. That is despite the fact the test is above as simple as it gets - loop 5 or 6 times running mkfs.xfs with different paramters: lofile=$(losetup -f) losetup $lofile "$testfile" "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null \|\| echo "mkfs failed!" sync losetup -d $lofile And losteup -d $lofile is failing with EBUSY on 1-3 of these loops every time the test is run. Turns out that blkid is running simultaneously with losetup -d, and so it sees an elevated reference count and returns EBUSY. But why is blkid running? It's obvious, isn't it? udev has decided to try and find out what is on the block device as a result of a creation notification. And it is racing with mkfs, so might still be scanning the device when mkfs finishes and we try to tear it down. So, make losetup -d force autoremove behaviour. That is, when the last reference goes away, tear down the device. xfstests wants it gone, not causing random teardown failures when we know that all the operations the tests have specifically run on the device have completed and are no longer referencing the loop device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:31 +01:00
Selvan Mani	4453bc88f0	mtip32xx:Added appropriate timeout value for secure erase Added appropriate timeout value for secure erase based on identify device data Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:27 +01:00
Oliver Chick	1f999572f2	xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool Changing the type of bdev parameters to be unsigned int :1, rather than bool. This is more consistent with the types of other features in the block drivers. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:20 +01:00
Akinobu Mita	b7010ede43	cciss: select CONFIG_CHECK_SIGNATURE The patch cciss-use-check_signature.patch in -mm tree introduced a build error: drivers/built-in.o: In function `CISS_signature_present': drivers/block/cciss.c:4270: undefined reference to `check_signature' Add missing CONFIG_CHECK_SIGNATURE to fix this issue. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Fengguang Wu <wfg@linux.intel.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:00 +01:00
Wei Yongjun	2541aa799f	cciss: remove unneeded memset() The memory return by kzalloc() or kmem_cache_zalloc() has already be set to zero, so remove useless memset(0). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:58 +01:00
Wei Yongjun	654dbef214	xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:36:27 +01:00
Herton Ronaldo Krzesinski	1a4ae43e4f	floppy: remove dr, reuse drive on do_floppy_init This is a small cleanup, that also may turn error handling of unitialized disks more readable. We don't need a separate variable to track allocated disks, remove dr and reuse drive variable instead. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:07 +01:00
Herton Ronaldo Krzesinski	8d3ab4ebfd	floppy: use common function to check if floppies can be registered The same checks to see if a drive can be or is registered are repeated through the code, factor out the checks in a common function and replace the repeated checks with it. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	d60e7ec18c	floppy: properly handle failure on add_disk loop On floppy initialization, if something failed inside the loop we call add_disk, there was no cleanup of previous iterations in the error handling. Cc: stable@vger.kernel.org Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	238ab78469	floppy: do put_disk on current dr if blk_init_queue fails If blk_init_queue fails, we do not call put_disk on the current dr (dr is decremented first in the error handling loop). Cc: stable@vger.kernel.org Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	b54e1f8889	floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop Since commit `070ad7e` ("floppy: convert to delayed work and single-thread wq"), we end up calling alloc_ordered_workqueue multiple times inside the loop, which shouldn't be intended. Besides the leak, other side effect in the current code is if blk_init_queue fails, we would end up calling unregister_blkdev even if we didn't call yet register_blkdev. Just moved the allocation of floppy_wq before the loop, and adjusted the code accordingly. Cc: stable@vger.kernel.org # 3.5+ Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:24 +01:00
Konrad Rzeszutek Wilk	2911758f14	xen/blkback: Fix compile warning drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static? drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static? Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:32:43 +01:00
Alex Elder	069a4b5690	rbd: kill rbd_device->rbd_opts The rbd_device structure has an embedded rbd_options structure. Such a structure is needed to work with the generic ceph argument parsing code, but there's no need to keep it around once argument parsing is done. Use a local variable to hold the rbd options used in parsing in rbd_get_client(), and just transfer its content (it's just a read_only flag) into the field in the rbd_mapping sub-structure that requires that information. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	e5cfeed281	rbd: simplify rbd_merge_bvec() The aim of this patch is to make what's going on rbd_merge_bvec() a bit more obvious than it was before. This was an issue when a recent btrfs bug led us to question whether the merge function was working correctly. Use "obj" rather than "chunk" to indicate the units whose boundaries we care about we call (rados) "objects". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	d4b125e9eb	rbd: increase maximum snapshot name length Change RBD_MAX_SNAP_NAME_LEN to be based on NAME_MAX. That is a practical limit for the length of a snapshot name (based on the presence of a directory using the name under /sys/bus/rbd to represent the snapshot). The /sys entry is created by prefixing it with "snap_"; define that prefix symbolically, and take its length into account in defining the snapshot name length limit. Enforce the limit in rbd_add_parse_args(). Also delete a dout() call in that function that was not meant to be committed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	db2388b6ee	rbd: verify rbd image order value This adds a verification that an rbd image's object order is within the upper and lower bounds supported by this implementation. It must be at least 9 (SECTOR_SHIFT), because the Linux bio system assumes that minimum granularity. It also must be less than 32 (at the moment anyway) because there exist spots in the code that store the size of a "segment" (object backing an rbd image) in a signed int variable, which can be 32 bits including the sign. We should be able to relax this limit once we've verified the code uses 64-bit types where needed. Note that the CLI tool already limits the order to the range 12-25. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	4634246db8	rbd: consolidate rbd_do_op() calls The two calls to rbd_do_op() from rbd_rq_fn() differ only in the value passed for the snapshot id and the snapshot context. For reads the snapshot always comes from the mapping, and for writes the snapshot id is always CEPH_NOSNAP. The snapshot context is always null for reads. For writes, the snapshot context always comes from the rbd header, but it is acquired under protection of header semaphore and could change thereafter, so we can't simply use what's available inside rbd_do_op(). Eliminate the snapid parameter from rbd_do_op(), and set it based on the I/O direction inside that function instead. Always pass the snapshot context acquired in the caller, but reset it to a null pointer inside rbd_do_op() if the operation is a read. As a result, there is no difference in the read and write calls to rbd_do_op() made in rbd_rq_fn(), so just call it unconditionally. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	ff2e4bb5b3	rbd: drop rbd_do_op() opcode and flags The only callers of rbd_do_op() are in rbd_rq_fn(), where call one is used for writes and the other used for reads. The request passed to rbd_do_op() already encodes the I/O direction, and that information can be used inside the function to set the opcode and flags value (rather than passing them in as arguments). So get rid of the opcode and flags arguments to rbd_do_op(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	13f4042c05	rbd: kill rbd_req_{read,write}() Both rbd_req_read() and rbd_req_write() are simple wrapper routines for rbd_do_op(), and each is only called once. Replace each wrapper call with a direct call to rbd_do_op(), and get rid of the wrapper functions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	be466c1cc3	rbd: fix read-only option name The name of the "read-only" mapping option was inadvertently changed in this commit: `f84344f3` rbd: separate mapping info in rbd_dev Revert that hunk to return it to what it should be. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	a0ea3a40fd	rbd: zero return code in rbd_dev_image_id() When rbd_dev_probe() calls rbd_dev_image_id() it expects to get a 0 return code if successful, but it is getting a positive value. The reason is that rbd_dev_image_id() returns the value it gets from rbd_req_sync_exec(), which returns the number of bytes read in as a result of the request. (This ultimately comes from ceph_copy_from_page_vector() in rbd_req_sync_op()). Force the return value to 0 when successful in rbd_dev_image_id(). Do the same in rbd_dev_v2_object_prefix(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	b213e0b1a6	rbd: fix bug in rbd_dev_id_put() In rbd_dev_id_put(), there's a loop that's intended to determine the maximum device id in use. But it isn't doing that at all, the effect of how it's written is to simply use the just-put id number, which ignores whole purpose of this function. Fix the bug. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Kees Cook	b8977285ec	drivers/block: remove CONFIG_EXPERIMENTAL This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Asai Thambi S P <asamymuthupa@micron.com> CC: Pete Zaitcev <zaitcev@redhat.com> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-23 22:30:38 +02:00
Linus Torvalds	ce40be7a82	Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block Pull block IO update from Jens Axboe: "Core block IO bits for 3.7. Not a huge round this time, it contains: - First series from Kent cleaning up and generalizing bio allocation and freeing. - WRITE_SAME support from Martin. - Mikulas patches to prevent O_DIRECT crashes when someone changes the block size of a device. - Make bio_split() work on data-less bio's (like trim/discards). - A few other minor fixups." Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew Morton. It is due to the VM no longer using a prio-tree (see commit 6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree"). So make set_blocksize() use mapping_mapped() instead of open-coding the internal VM knowledge that has changed. * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits) block: makes bio_split support bio without data scatterlist: refactor the sg_nents scatterlist: add sg_nents fs: fix include/percpu-rwsem.h export error percpu-rw-semaphore: fix documentation typos fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared blockdev: turn a rw semaphore into a percpu rw semaphore Fix a crash when block device is read and block size is changed at the same time block: fix request_queue->flags initialization block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() block: ioctl to zero block ranges block: Make blkdev_issue_zeroout use WRITE SAME block: Implement support for WRITE SAME block: Consolidate command flag and queue limit checks for merges block: Clean up special command handling logic block/blk-tag.c: Remove useless kfree block: remove the duplicated setting for congestion_threshold block: reject invalid queue attribute values block: Add bio_clone_bioset(), bio_clone_kmalloc() block: Consolidate bio_alloc_bioset(), bio_kmalloc() ...	2012-10-11 09:04:23 +09:00
Alex Elder	35152979e6	rbd: activate v2 image support Now that v2 images support is fully implemented, have rbd_dev_v2_probe() return 0 to indicate a successful probe. (Note that an image that implements layering will fail the probe early because of the feature chekc.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:44:01 -07:00
Alex Elder	d889140c4a	rbd: implement feature checks Version 2 images have two sets of feature bit fields. The first indicates features possibly used by the image. The second indicates features that the client must support in order to use the image. When an image (or snapshot) is first examined, we need to make sure that the local implementation supports the image's required features. If not, fail the probe for the image. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:51 -07:00
Alex Elder	117973fb4c	rbd: define rbd_dev_v2_refresh() Define a new function rbd_dev_v2_refresh() to update/refresh the snapshot context for a format version 2 rbd image. This function will update anything that is not fixed for the life of an rbd image--at the moment this is mainly the snapshot context and (for a base mapping) the size. Update rbd_refresh_header() so it selects which function to use based on the image format. Rename __rbd_refresh_header() to be rbd_dev_v1_refresh() to be consistent with the naming of its version 2 counterpart. Similarly rename rbd_refresh_header() to be rbd_dev_refresh(). Unrelated--we use rbd_image_format_valid() here. Delete the other use of it, which was primarily put in place to ensure that function was referenced at the time it was defined. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:39 -07:00
Alex Elder	9478554ae5	rbd: define rbd_update_mapping_size() Encapsulate the code that handles updating the size of a mapping after an rbd image has been refreshed. This is done in anticipation of the next patch, which will make this common code for format 1 and 2 images. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:28 -07:00
Linus Torvalds	7035cdf36d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The bulk of this pull is a series from Alex that refactors and cleans up the RBD code to lay the groundwork for supporting the new image format and evolving feature set. There are also some cleanups in libceph, and for ceph there's fixed validation of file striping layouts and a bugfix in the code handling a shrinking MDS cluster." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits) ceph: avoid 32-bit page index overflow ceph: return EIO on invalid layout on GET_DATALOC ioctl rbd: BUG on invalid layout ceph: propagate layout error on osd request creation libceph: check for invalid mapping ceph: convert to use le32_add_cpu() ceph: Fix oops when handling mdsmap that decreases max_mds rbd: update remaining header fields for v2 rbd: get snapshot name for a v2 image rbd: get the snapshot context for a v2 image rbd: get image features for a v2 image rbd: get the object prefix for a v2 rbd image rbd: add code to get the size of a v2 rbd image rbd: lay out header probe infrastructure rbd: encapsulate code that gets snapshot info rbd: add an rbd features field rbd: don't use index in __rbd_add_snap_dev() rbd: kill create_snap sysfs entry rbd: define rbd_dev_image_id() rbd: define some new format constants ...	2012-10-08 06:38:18 +09:00
Linus Torvalds	dc92b1f9ab	Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio changes from Rusty Russell: "New workflow: same git trees pulled by linux-next get sent straight to Linus. Git is awkward at shuffling patches compared with quilt or mq, but that doesn't happen often once things get into my -next branch." * 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits) lguest: fix occasional crash in example launcher. virtio-blk: Disable callback in virtblk_done() virtio_mmio: Don't attempt to create empty virtqueues virtio_mmio: fix off by one error allocating queue drivers/virtio/virtio_pci.c: fix error return code virtio: don't crash when device is buggy virtio: remove CONFIG_VIRTIO_RING virtio: add help to CONFIG_VIRTIO option. virtio: support reserved vqs virtio: introduce an API to set affinity for a virtqueue virtio-ring: move queue_index to vring_virtqueue virtio_balloon: not EXPERIMENTAL any more. virtio-balloon: dependency fix virtio-blk: fix NULL checking in virtblk_alloc_req() virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path virtio-blk: Add bio-based IO path for virtio-blk virtio: console: fix error handling in init() function tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace tools: Add guest trace agent as a user tool virtio/console: Allocate scatterlist according to the current pipe size ...	2012-10-07 21:04:56 +09:00
Linus Torvalds	f1c6872e49	Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI= =gxJD -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...	2012-10-07 07:13:01 +09:00
Ed Cashin	322c9ec009	aoe: update aoe-internal version number to 50 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	1ac9e60262	aoe: remove unused code Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	08b6062351	aoe: make dynamic block minor numbers the default Because udev use is so widespread, making the old static mapping the default is too conservative, given the severe limitations it places on usable AoE addresses. Storage virtualization and larger shelves have made the old limitations too confining. These changes make the dynamic block device minor numbers the default, removing the limitations on usable AoE addresses. The static arrangement is still available with aoe_dyndevs=0, and the aoe-stat tool from the userland aoetools package, the user space counterpart to the aoe driver, recognizes the case where there is a mismatch between the minor number in sysfs and the minor number in a special device file. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	7159e969d1	aoe: update and specify AoE address guards and error messages In general, specific is better when it comes to messages about AoE usage problems. Also, explicit checks for the AoE broadcast addresses are added. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	4bcce1a355	aoe: retain static block device numbers for backwards compatibility The old mapping between AoE target shelf and slot addresses and the block device minor number is retained as a backwards-compatible feature, with a new "aoe_dyndevs" module parameter available for enabling dynamic block device minor numbers. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	0c96621458	aoe: support more AoE addresses with dynamic block device minor numbers The ATA over Ethernet protocol uses a major (shelf) and minor (slot) address to identify a particular storage target. These changes remove an artificial limitation the aoe driver imposes on the use of AoE addresses. For example, without these changes, the slot address has a maximum of 15, but users commonly use slot numbers much greater than that. The AoE shelf and slot address space is often used sparsely. Instead of using a static mapping between AoE addresses and the block device minor number, the block device minor numbers are now allocated on demand. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	fea05a26c3	aoe: update copyright year in touched files Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	7392fbe5ad	aoe: update internal version number to 49 The internal version number of the aoe driver appears in a console message when the driver loads and is usually obtained by the user with the userland aoe-version tool, part of the aoetools.[1] Although this patchset includes bugfixes backported from higher-numbered versions published on the coraid.com website, it is a form of version 49. 1. http://aoetools.sourceforge.net/ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	b21faa25c6	aoe: remove unused code and add cosmetic improvements This change removes some unused code and attempts to increase code consistency. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	1b86fda9ad	aoe: increase net_device reference count while using it This change eliminates the danger that the user could rmmod the driver for a network interface that is being used for AoE by the aoe driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	64a80f5ac7	aoe: associate frames with the AoE storage target In the driver code, "target" and aoetgt refer to a particular remote interface on the AoE storage target. The latter is identified by its AoE major and minor addresses. Commands that are being sent to an AoE storage target {major, minor} can be sent or retransmitted to any of the remote MAC addresses associated with the AoE storage target. That is, frames are naturally associated with not an aoetgt (AoE major, AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor). Making the code reflect that reality simplifies the driver, especially when the path to a remote MAC address becomes unusable. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	6583303c5e	aoe: disallow unsupported AoE minor addresses A guard is inserted to prevent AoE minor addresses (slot addresses) higher than 15 to be used, as they are not yet supported by the driver. There is a change coming that will allow the aoe driver to overcome this limit by using system device minor numbers dynamically, but until then, this guard prevents unexpected targets from being used by the driver when AoE targets with high minor numbers are on the AoE network. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	25f4d75ea4	aoe: do revalidation steps in order The discovery process begins with an optional AoE config query command and an AoE config query response. Normally when an aoe device is already open, the config query response does not trigger an ATA identify device command to be sent out, since the response contains storage capacity information that, if changed, could surprise the user of the device. The userland "aoe-revalidate" tool uses a character device to trigger an AoE config query for a particular AoE storage target and an ATA device identify command, even when the device is open. This change causes the config query to go out first, reflecting the normal discovery sequence. The responses could come back in any order, so this change is fairly cosmetic. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	d54d35ac66	aoe: failover remote interface based on aoe_deadsecs parameter The aoe_deadsecs module parameter allows the user to specify a hard limit on the number of seconds an AoE command can be retransmitted before the AoE block device is considered to have failed. Using aoe_deadsecs to determine the time we try using a different remote interface helps to ensure that the hard limit is not reached before we've tried to recover by sending to a different remote port. As a data storage target, the AoE target is unambiguously identified by its {major, minor} AoE address tuple, and an AoE target can have multiple MAC addresses. However, note that "target" in the driver code and comments means a {major, minor, MAC address} tuple, as in "somewhere to send packets". Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	3f0f013374	aoe: use packets that work with the smallest-MTU local interface Users with several network interfaces dedicated to AoE generally do not configure them to support different-sized AoE data payloads on purpose. For a given AoE target, there will be a set of local network interfaces that can reach it. Using only the payload that will fit in the smallest-sized MTU of all those local interfaces greatly simplifies the driver, especially in failure scenarios. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	eb086ec596	aoe: use a kernel thread for transmissions The dev_queue_xmit function needs to have interrupts enabled, so the most simple way to get the locking right but still fulfill that requirement is to use a process that can call dev_queue_xmit serially over queued transmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	69cf2d85de	aoe: become I/O request queue handler for increased user control To allow users to choose an elevator algorithm for their particular workloads, change from a make_request-style driver to an I/O-request-queue-handler-style driver. We have to do a couple of things that might be surprising. We manipulate the page _count directly on the assumption that we still have no guarantee that users of the block layer are prohibited from submitting bios containing pages with zero reference counts.[1] If such a prohibition now exists, I can get rid of the _count manipulation. Just as before this patch, we still keep track of the sk_buffs that the network layer still hasn't finished yet and cap the resources we use with a "pool" of skbs.[2] Now that the block layer maintains the disk stats, the aoe driver's diskstats function can go away. 1. https://lkml.org/lkml/2007/3/1/374 2. https://lkml.org/lkml/2007/7/6/241 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	896831f590	aoe: kernel thread handles I/O completions for simple locking Make the frames the aoe driver uses to track the relationship between bios and packets more flexible and detached, so that they can be passed to an "aoe_ktio" thread for completion of I/O. The frames are handled much like skbs, with a capped amount of preallocation so that real-world use cases are likely to run smoothly and degenerate gracefully even under memory pressure. Decoupling I/O completion from the receive path and serializing it in a process makes it easier to think about the correctness of the locking in the driver, especially in the case of a remote MAC address becoming unusable. [dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	3d5b06051c	aoe: for performance support larger packet payloads tAdd adds the ability to work with large packets composed of a number of segments, using the scatter gather feature of the block layer (biovecs) and the network layer (skb frag array). The motivation is the performance gained by using a packet data payload greater than a page size and by using the network card's scatter gather feature. Users of the out-of-tree aoe driver already had these changes, but since early 2011, they have complained of increased memory utilization and higher CPU utilization during heavy writes.[1] The commit below appears related, as it disables scatter gather on non-IP protocols inside the harmonize_features function, even when the NIC supports sg. commit `f01a5236bd` Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). With that regression in place, transmits always linearize sg AoE packets, but in-kernel users did not have this patch. Before 2.6.38, though, these changes were working to allow sg to increase performance. 1. http://www.spinics.net/lists/linux-mm/msg15184.html Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	a336d29870	nbd: handle discard requests Add discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	2f01250888	nbd: add set flags ioctl Add a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:23 +09:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Sage Weil	6cae3717cd	rbd: BUG on invalid layout This shouldn't actually be possible because the layout struct is constructed from the RBD header and validated then. [elder@inktank.com: converted BUG() call to equivalent rbd_assert()] Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-10-01 17:20:00 -05:00
Linus Torvalds	d9a807461f	USB merge for 3.7-rc1 Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlBp3+AACgkQMUfUDdst+ym5vwCfe93FyJyXn/RDkGz7iBemvWFd vrwAoIxjaOa4/yWZWcgrWc5bP4aO3ssc =jYDr -----END PGP SIGNATURE----- Merge tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB changes from Greg Kroah-Hartman: "Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fix up several annoying - but fairly mindless - conflicts due to the termios structure having moved into the tty device, and often clashing with dbg -> dev_dbg conversion. * tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (339 commits) USB: ezusb: move ezusb.c from drivers/usb/serial to drivers/usb/misc USB: uas: fix gcc warning USB: uas: fix locking USB: Fix race condition when removing host controllers USB: uas: add locking USB: uas: fix abort USB: uas: remove aborted field, replace with status bit. USB: uas: fix task management USB: uas: keep track of command urbs xhci: Intel Panther Point BEI quirk. powerpc/usb: remove checking PHY_CLK_VALID for UTMI PHY USB: ftdi_sio: add TIAO USB Multi-Protocol Adapter (TUMPA) support Revert "usb : Add sysfs files to control port power." USB: serial: remove vizzini driver usb: host: xhci: Fix Null pointer dereferencing with `71c731a` for non-x86 systems Increase XHCI suspend timeout to 16ms USB: ohci-at91: fix null pointer in ohci_hcd_at91_overcurrent_irq USB: sierra_ms: don't keep unused variable fsl/usb: Add support for USB controller version 2.4 USB: qcaux: add Pantech vendor class match ...	2012-10-01 13:23:01 -07:00
Alex Elder	6e14b1a6c3	rbd: update remaining header fields for v2 There are three fields that are not yet updated for format 2 rbd image headers: the version of the header object; the encryption type; and the compression type. There is no interface defined for fetching the latter two, so just initialize them explicitly to 0 for now. Change rbd_dev_v2_snap_context() so the caller can be supplied the version for the header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	b8b1e2db52	rbd: get snapshot name for a v2 image Define rbd_dev_v2_snap_name() to fetch the name for a particular snapshot in a format 2 rbd image. Define rbd_dev_v2_snap_info() to to be a wrapper for getting the name, size, and features for a particular snapshot, using an interface that matches the equivalent function for version 1 images. Define rbd_dev_snap_info() wrapper function and use it to call the appropriate function for getting the snapshot name, size, and features, dependent on the rbd image format. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	35d489f946	rbd: get the snapshot context for a v2 image Fetch the snapshot context for an rbd format 2 image by calling the "get_snapcontext" method on its header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	b1b5402aa9	rbd: get image features for a v2 image The features values for an rbd format 2 image are fetched from the server using a "get_features" method. The same method is used for getting the features for a snapshot, so structure this addition with a generic helper routine that can get this information for either. The server will provide two 64-bit feature masks, one representing the features potentially in use for this image (or its snapshot), and one representing features that must be supported by the client in order to work with the image. For the time being, neither of these is really used so we keep things simple and just record the first feature vector. Once we start using these feature masks, what we record and what we expose to the user will most likely change. Signed-off-by: Alex Elder <elder@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	1e1301998e	rbd: get the object prefix for a v2 rbd image The object prefix of an rbd format 2 image is fetched from the server using a "get_object_prefix" method. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	9d475de5d1	rbd: add code to get the size of a v2 rbd image The size of an rbd format 2 image is fetched from the server using a "get_size" method. The same method is used for getting the size of a snapshot, so structure this addition with a generic helper routine that we can get this information for either. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	a30b71b999	rbd: lay out header probe infrastructure This defines a new function rbd_dev_probe() as a top-level function for populating detailed information about an rbd device. It first checks for the existence of a format 2 rbd image id object. If it exists, the image is assumed to be a format 2 rbd image, and another function rbd_dev_v2() is called to finish populating header data for that image. If it does not exist, it is assumed to be an old (format 1) rbd image, and calls a similar function rbd_dev_v1() to populate its header information. A new field, rbd_dev->format, is defined to record which version of the rbd image format the device represents. For a valid mapped rbd device it will have one of two values, 1 or 2. So far, the format 2 images are not really supported; this is laying out the infrastructure for fleshing out that support. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	cd892126c6	rbd: encapsulate code that gets snapshot info Create a function that encapsulates looking up the name, size and features related to a given snapshot, which is indicated by its index in an rbd device's snapshot context array of snapshot ids. This interface will be used to hide differences between the format 1 and format 2 images. At the moment this (looking up the name anyway) is slightly less efficient than what's done currently, but we may be able to optimize this a bit later on by cacheing the last lookup if it proves to be a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	34b131849f	rbd: add an rbd features field Record the features values for each rbd image and each of its snapshots. This is really something that only becomes meaningful for version 2 images, so this is just putting in place code that will form common infrastructure. It may be useful to expand the sysfs entries--and therefore the information we maintain--for the image and for each snapshot. But I'm going to hold off doing that until we start making active use of the feature bits. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	c8d184250d	rbd: don't use index in __rbd_add_snap_dev() Pass the snapshot id and snapshot size rather than an index to __rbd_add_snap_dev() to specify values for a new snapshot. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	02cdb02cea	rbd: kill create_snap sysfs entry Josh proposed the following change, and I don't think I could explain it any better than he did: From: Josh Durgin <josh.durgin@inktank.com> Date: Tue, 24 Jul 2012 14:22:11 -0700 To: ceph-devel <ceph-devel@vger.kernel.org> Message-ID: <500F1203.9050605@inktank.com> Right now the kernel still has one piece of rbd management duplicated from the rbd command line tool: snapshot creation. There's nothing special about snapshot creation that makes it advantageous to do from the kernel, so I'd like to remove the create_snap sysfs interface. That is, /sys/bus/rbd/devices/<id>/create_snap would be removed. Does anyone rely on the sysfs interface for creating rbd snapshots? If so, how hard would it be to replace with: rbd snap create pool/image@snap Is there any benefit to the sysfs interface that I'm missing? Josh This patch implements this proposal, removing the code that implements the "snap_create" sysfs interface for rbd images. As a result, quite a lot of other supporting code goes away. Suggested-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	589d30e0b3	rbd: define rbd_dev_image_id() New format 2 rbd images are permanently identified by a unique image id. Each rbd image also has a name, but the name can be changed. A format 2 rbd image will have an object--whose name is based on the image name--which maps an image's name to its image id. Create a new function rbd_dev_image_id() that checks for the existence of the image id object, and if it's found, records the image id in the rbd_device structure. Create a new rbd device attribute (/sys/bus/rbd/<num>/image_id) that makes this information available. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	3bb59ad515	rbd: define some new format constants Define constant symbols related to the rbd format 2 object names. This begins to bring this version of the "rbd_types.h" header more in line with the current user-space version of that file. Complete reconciliation of differences will be done at some point later, as a separate task. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	f8d4de6e1c	rbd: support data returned from OSD methods An OSD object method call can be made using rbd_req_sync_exec(). Until now this has only been used for creating a new RBD snapshot, and that has only required sending data out, not receiving anything back from the OSD. We will now need to get data back from an OSD on a method call, so add parameters to rbd_req_sync_exec() that allow a buffer into which returned data should be placed to be specified, along with its size. Previously, rbd_req_sync_exec() passed a null pointer and zero size to rbd_req_sync_op(); change this so the new inbound buffer information is provided instead. Rename the "buf" and "len" parameters in rbd_req_sync_op() to make it more obvious they are describing inbound data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	3cb4a687c7	rbd: pass flags to rbd_req_sync_exec() In order to allow both read requests and write requests to be initiated using rbd_req_sync_exec(), add an OSD flags value which can be passed down to rbd_req_sync_op(). Rename the "data" and "len" parameters to be more clear that they represent data that is outbound. At this point, this function is still only used (and only works) for write requests. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	3ee4001e0c	rbd: set up watch before announcing disk We're ready to handle header object (refresh) events at the point we call rbd_bus_add_dev(). Set up the watch request on the rbd image header just after that, and after we've registered the devices for the snapshots for the initial snapshot context. Do this before announce the disk as available for use. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	12f029448c	rbd: set initial capacity in rbd_init_disk() Move the setting of the initial capacity for an rbd image mapping into rb_init_disk(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	86ff77bb68	rbd: drop dev registration check for new snap By the time rbd_dev_snaps_register() gets called during rbd device initialization, the main device will have already been registered. Similarly, a header refresh will only occur for an rbd device whose Linux device is registered. There is therefore no need to verify the main device is registered when registering a snapshot device. For the time being, turn the check into a WARN_ON(), but it can eventually just go away. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	0f308a3188	rbd: call rbd_init_disk() sooner Call rbd_init_disk() from rbd_add() as soon as we have the major device number for the mapping. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	85ae892675	rbd: defer setting device id Hold off setting the device id and formatting the device name in rbd_add() until just before it's needed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	05fd6f6f8c	rbd: read the header before registering device Read the rbd header information and call rbd_dev_set_mapping() earlier--before registering the block device or setting up the sysfs entries for the image. The sysfs entries provide users access to some information that's only available after doing the rbd header initialization, so this will make sure it's valid right away. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	5ed1617731	rbd: call set_snap() before snap_devs_update() rbd_header_set_snap() is a simple initialization routine for an rbd device's mapping. It has to be called after the snapshot context for the rbd_dev has been updated, but can be done before snapshot devices have been registered. Change the name to rbd_dev_set_mapping() to better reflect its purpose, and call it a little sooner, before registering snapshot devices. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	304f68086f	rbd: defer registering snapshot devices When a new snapshot is found in an rbd device's updated snapshot context, __rbd_add_snap_dev() is called to create and insert an entry in the rbd devices list of snapshots. In addition, a Linux device is registered to represent the snapshot. For version 2 rbd images, it will be undesirable to initialize the device right away. So in anticipation of that, this patch separates the insertion of a snapshot entry in the snaps list from the creation of devices for those snapshots. To do this, create a new function rbd_dev_snaps_register() which traverses the list of snapshots and calls rbd_register_snap_dev() on any that have not yet been registered. Rename rbd_dev_snap_devs_update() to be rbd_dev_snaps_update() to better reflect that only the entry in the snaps list and not the snapshot's device is affected by the function. For now, call rbd_dev_snaps_register() immediately after each call to rbd_dev_snaps_update(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	3fcf2581c2	rbd: assign header name later Move the assignment of the header name for an rbd image a bit later, outside rbd_add_parse_args() and into its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	e86924a809	rbd: use snaps list in rbd_snap_by_name() An rbd_dev structure maintains a list of current snapshots that have already been fully initialized. The entries on the list have type struct rbd_snap, and each entry contains a copy of information that's found in the rbd_dev's snapshot context and header. The only caller of snap_by_name() is rbd_header_set_snap(). In that call site any positive return value (the index in the snapshot array) is ignored, so there's no need to return the index in the snapshot context's id array when it's found. rbd_header_set_snap() also has only one caller--rbd_add()--and that call is made after a call to rbd_dev_snap_devs_update(). Because the rbd_snap structures are initialized in that function, the current snapshot list can be used instead of the snapshot context to look up a snapshot's information by name. Change snap_by_name() so it uses the snapshot list rather than the rbd_dev's snapshot context in looking up snapshot information. Return 0 if it's found rather than the snapshot id. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	cd789ab9ca	rbd: don't register snapshots in bus_add_dev() When rbd_bus_add_dev() is called (one spot--in rbd_add()), the rbd image header has not even been read yet. This means that the list of snapshots will be empty at the time of the call. As a result, there is no need for the code that calls rbd_register_snap_dev() for each entry in that list--so get rid of it. Once the header has been read (just after returning), a call will be made to rbd_dev_snap_devs_update(), which will then find every snapshot in the context to be new and will therefore call rbd_register_snap_dev() via __rbd_add_snap_dev() accomplishing the same thing. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:52 -05:00
Alex Elder	4bb1f1ed00	rbd: move locking out of rbd_header_set_snap() Move the calls to get the header semaphore out of rbd_header_set_snap() and into its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	1fcdb8aa1f	rbd: simplify rbd_init_disk() a bit This just simplifies a few things in rbd_init_disk(), now that the previous patch has moved a bunch of initialization code out if it. Done separately to facilitate review. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	2ac4e75d89	rbd: do some header initialization earlier Move some of the code that initializes an rbd header out of rbd_init_disk() and into its caller. Move the code at the end of rbd_init_disk() that sets the device capacity and activates the Linux device out of that function and into the caller, ensuring we still have the disk size available where we need it. Update rbd_free_disk() so it still aligns well as an inverse of rbd_init_disk(), moving the rbd_header_free() call out to its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	8836b995fd	rbd: simplify snap_by_name() interface There is only one caller of snap_by_name(), and it passes two values to be assigned, both of which are found within an rbd device structure. Change the interface so it just passes the address of the rbd_dev, and make the assignments to its fields directly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	4e1105a299	rbd: set mapping name with the rest With the exception of the snapshot name, all of the mapping-specific fields in an rbd device structure are set in rbd_header_set_snap(). Pass the snapshot name to be assigned into rbd_header_set_snap() to keep all of the mapping assignments together. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	3feeb89467	rbd: return snap name from rbd_add_parse_args() This is the first of two patches aimed at isolating the code that sets the mapping information into a single spot. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	99c1f08f64	rbd: record mapped size Add the size of the mapped image to the set of mapping-specific fields in an rbd_device, and use it when setting the capacity of the disk. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	f84344f334	rbd: separate mapping info in rbd_dev Several fields in a struct rbd_dev are related to what is mapped, as opposed to the actual base rbd image. If the base image is mapped these are almost unneeded, but if a snapshot is mapped they describe information about that snapshot. In some contexts this can be a little bit confusing. So group these mapping-related field into a structure to make it clear what they are describing. This also includes a minor change that rearranges the fields in the in-core image header structure so that invariant fields are at the top, followed by those that change. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	c9aadfe786	rbd: kill rbd_image_header->total_snaps The "total_snaps" field in an rbd header structure is never any different from the value of "num_snaps" stored within a snapshot context. Avoid any confusion by just using the value held within the snapshot context, and get rid of the "total_snaps" field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	98cec111c0	rbd: kill rbd_dev->q A copy of rbd_dev->disk->queue is held in rbd_dev->q, but it's never actually used. So get just get rid of the field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:51 -05:00
Alex Elder	9fcbb80024	rbd: rename __rbd_init_snaps_header() The name __rbd_init_snaps_header() doesn't really convey what that function does very well. Its purpose is to scan a new snapshot context and either create or destroy snapshot device entries so that local host's view is consistent with the reality maintained on the OSDs. This patch just changes the name of this function, to be rbd_dev_snap_devs_update(). Still not perfect, but I think better. Also add some dynamic debug statements to this function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	e28393082d	rbd: rename rbd_id_get() This should have been done as part of this commit: commit `de71a2970d` Author: Alex Elder <elder@inktank.com> Date: Tue Jul 3 16:01:19 2012 -0500 rbd: rename rbd_device->id rbd_id_get() is assigning the rbd_dev->dev_id field. Change the name of that function as well as rbd_id_put() and rbd_id_max to reflect what they are affecting. Add some dynamic debug statements related to rbd device id activity. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	aafb230ebc	rbd: define rbd_assert() Define rbd_assert() and use it in place of various BUG_ON() calls now present in the code. By default assertion checking is enabled; we want to do this differently at some point. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	65ccfe21dd	rbd: split up rbd_get_segment() There are two places where rbd_get_segment() is called. One, in rbd_rq_fn(), only needs to know the length within a segment that an I/O request should be. The other, in rbd_do_op(), also needs the name of the object and the offset within it for the I/O request. Split out rbd_segment_name() into three dedicated functions: - rbd_segment_name() allocates and formats the name of the object for a segment containing a given rbd image offset - rbd_segment_offset() computes the offset within a segment for a given rbd image offset - rbd_segment_length() computes the length to use for I/O within a segment for a request, not to exceed the end of a segment object. In the new functions be a bit more careful, checking for possible error conditions: - watch for errors or overflows returned by snprintf() - catch (using BUG_ON()) potential overflow conditions when computing segment length Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	df111be631	rbd: check for overflow in rbd_get_num_segments() It is possible in rbd_get_num_segments() for an overflow to occur when adding the offset and length. This is easily avoided. Since the function returns an int and the one caller is already prepared to handle errors, have it return -ERANGE if overflow would occur. The overflow check would not work if a zero-length request was being tested, so short-circuit that case, returning 0 for the number of segments required. (This condition might be avoided elsewhere already, I don't know.) Have the caller end the request if either an error or 0 is returned. The returned value is passed to __blk_end_request_all(), meaning a 0 length request is not treated an error. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	38f5f65e9d	rbd: drop needless test in rbd_rq_fn() There's a test for null rq pointer inside the while loop in rbd_rq_fn() that's not needed. That same test already occurred in the immediatly preceding loop condition test. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	542582fce1	rbd: bio_chain_clone() cleanups In bio_chain_clone(), at the end of the function the bi_next field of the tail of the new bio chain is nulled. This isn't necessary, because if "tail" is non-null, its value will be the last bio structure allocated at the top of the while loop in that function. And before that structure is added to the end of the new chain, its bi_next pointer is always made null. While touching that function, clean a few other things: - define each local variable on its own line - move the definition of "tmp" to an inner scope - move the modification of gfpmask closer to where it's used - rearrange the logic that sets the chain's tail pointer Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	84d34dcc11	rbd: kill notify_timeout option The "notify_timeout" rbd device option is never used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	cc0538b62c	rbd: add read_only rbd map option Add the ability to map an rbd image read-only, by specifying either "read_only" or "ro" as an option on the rbd "command line." Also allow the inverse to be explicitly specified using "read_write" or "rw". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	f8c3892911	rbd: move rbd_opts to struct rbd_device The rbd options don't really apply to the ceph client. So don't store a pointer to it in the ceph_client structure, and put them (a struct, not a pointer) into the rbd_dev structure proper. Pass the rbd device structure to rbd_client_create() so it can assign rbd_dev->rbdc if successful, and have it return an error code instead of the rbd client pointer. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	621901d652	rbd: more cleanup in rbd_header_from_disk() This just rearranges things a bit more in rbd_header_from_disk() so that the snapshot sizes are initialized right after the buffer to hold them is allocated and doing a little further consolidation that follows from that. Also adds a few simple comments. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:50 -05:00
Alex Elder	f785cc1dbe	rbd: kill incore snap_names_len The only thing the on-disk snap_names_len field is needed is to size the buffer allocated to hold a copy of the snapshot names for an rbd image. So don't bother saving it in the in-core rbd_image_header structure. Just use a local variable to hold the required buffer size while it's needed. Move the code that actually copies the snapshot names up closer to where the required length is saved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	58c17b0e1b	rbd: don't over-allocate space for object prefix In rbd_header_from_disk() the object prefix buffer is sized based on the maximum size it's block_name equivalent on disk could be. Instead, only allocate enough to hold null-terminated string from the on-disk header--or the maximum size of no NUL is found. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	1f7ba33115	rbd: handle locking inside __rbd_client_find() There is only caller of __rbd_client_find(), and it somewhat clumsily gets the appropriate lock and gets a reference to the existing ceph_client structure if it's found. Instead, have that function handle its own locking, and acquire the reference if found while it holds the lock. Drop the underscores from the name because there's no need to signify anything special about this function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	523f32582f	rbd: add new snapshots at the tail This fixes a bug that went in with this commit: commit f6e0c99092cca7be00fca4080cfc7081739ca544 Author: Alex Elder <elder@inktank.com> Date: Thu Aug 2 11:29:46 2012 -0500 rbd: simplify __rbd_init_snaps_header() The problem is that a new rbd snapshot needs to go either after an existing snapshot entry, or at the end of an rbd device's snapshot list. As originally coded, it is placed at the beginning. This was based on the assumption the list would be empty (so it wouldn't matter), but in fact if multiple new snapshots are added to an empty list in one shot the list will be non-empty after the first one is added. This addresses http://tracker.newdream.net/issues/3063 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	843a0d0879	rbd: rename block_name -> object_prefix In the on-disk image header structure there is a field "block_name" which represents what we now call the "object prefix" for an rbd image. Rename this field "object_prefix" to be consistent with modern usage. This appears to be the only remaining vestige of the use of "block" in symbols that represent objects in the rbd code. This addresses http://tracker.newdream.net/issues/1761 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	4156d99840	rbd: separate reading header from decoding it Right now rbd_read_header() both reads the header object for an rbd image and decodes its contents. It does this repeatedly if needed, in order to ensure a complete and intact header is obtained. Separate this process into two steps--reading of the raw header data (in new function, rbd_dev_v1_header_read()) and separately decoding its contents (in rbd_header_from_disk()). As a result, the latter function no longer requires its allocated_snaps argument. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:49 -05:00
Alex Elder	103a150f0c	rbd: expand rbd_dev_ondisk_valid() checks Add checks on the validity of the snap_count and snap_names_len field values in rbd_dev_ondisk_valid(). This eliminates the need to do them in rbd_header_from_disk(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	28cb775de1	rbd: return earlier in rbd_header_from_disk() The only caller of rbd_header_from_disk() is rbd_read_header(). It passes as allocated_snaps the number of snapshots it will have received from the server for the snapshot context that rbd_header_from_disk() is to interpret. The first time through it provides 0--mainly to extract the number of snapshots from the snapshot context header--so that it can allocate an appropriately-sized buffer to receive the entire snapshot context from the server in a second request. rbd_header_from_disk() will not fill in the array of snapshot ids unless the number in the snapshot matches the number the caller had allocated. This patch adjusts that logic a little further to be more efficient. rbd_read_header() doesn't even examine the snapshot context unless the snapshot count (stored in header->total_snaps) matches the number of snapshots allocated. So rbd_header_from_disk() doesn't need to allocate or fill in the snapshot context field at all in that case. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	6a52325f61	rbd: rearrange rbd_header_from_disk() This just moves code around for the most part. It was pulled out as a separate patch to avoid cluttering up some upcoming patches which are more substantive. The point is basically to group everything related to initializing the snapshot context together. The only functional change is that rbd_header_from_disk() now ensures the (in-core) header it is passed is zero-filled. This allows a simpler error handling path in rbd_header_from_disk(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	d2bb24e506	rbd: use sizeof (object) instead of sizeof (type) Fix a few spots in rbd_header_from_disk() to use sizeof (object) rather than sizeof (type). Use a local variable to record sizes to shorten some lines and improve readability. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	d78fd7ae03	rbd: ensure invalid pointers are made null Fix a number of spots where a pointer value that is known to have become invalid but was not reset to null. Also, toss in a change so we use sizeof (object) rather than sizeof (type). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	0f1d3f9385	rbd: make snap_names_len a u64 The snap_names_len field of an rbd_image_header structure is defined with type size_t. That field is used as both the source and target of 64-bit byte-order swapping operations though, so it's best to define it with type u64 instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Alex Elder	3593815022	rbd: simplify __rbd_init_snaps_header() The purpose of __rbd_init_snaps_header() is to compare a new snapshot context with an rbd device's list of existing snapshots. It updates the list by adding any new snapshots or removing any that are not present in the new snapshot context. The code as written is a little confusing, because it traverses both the existing snapshot list and the set of snapshots in the snapshot context in reverse. This was done based on an assumption about snapshots that is not true--namely that a duplicate snapshot name could cause an error in intepreting things if they were not processed in ascending order. These precautions are not necessary, because: - all snapshots are uniquely identified by their snapshot id - a new snapshot cannot be created if the rbd device has another snapshot with the same name (It is furthermore not currently possible to rename a snapshot.) This patch re-implements __rbd_init_snaps_header() so it passes through both the existing snapshot list and the entries in the snapshot context in forward order. It still does the same thing as before, but I find the logic considerably easier to understand. By going forward through the names in the snapshot context, there is no longer a need for the rbd_prev_snap_name() helper function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:48 -05:00
Linus Torvalds	fdb2f9c2eb	PCI changes for the 3.7 merge window: Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/ RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB 5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b 2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW 4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z SxkILwY/MFpuCFteKE1t =haBu -----END PGP SIGNATURE----- Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)" * tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits) PCI: acpiphp: Handle PCIe ports without native hotplug capability PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots PCI/ACPI: Protect acpi_pci_roots list with mutex PCI/ACPI: Use acpi_pci_root info rather than looking it up again PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface PCI/ACPI: Protect acpi_pci_drivers list with mutex PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges PCI/ACPI: Use normal list for struct acpi_pci_driver PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots PCI: Fix default vga ref_count ia64/PCI: Clear host bridge aperture struct resource x86/PCI: Clear host bridge aperture struct resource PCI: Stop all children first, before removing all children Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()" PCI: Provide a default pcibios_update_irq() PCI: Discard __init annotations for pci_fixup_irqs() and related functions PCI: Use correct type when freeing bus resource list PCI: Check P2P bridge for invalid secondary/subordinate range PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot() ...	2012-10-01 12:05:36 -07:00
Linus Torvalds	21e98932dc	Merge git://git.infradead.org/users/willy/linux-nvme Pull NVMe driver fixes from Matthew Wilcox: "Now that actual hardware has been released (don't have any yet myself), people are starting to want some of these fixes merged." Willy doesn't have hardware? Guys... * git://git.infradead.org/users/willy/linux-nvme: NVMe: Cancel outstanding IOs on queue deletion NVMe: Free admin queue memory on initialisation failure NVMe: Use ida for nvme device instance NVMe: Fix whitespace damage in nvme_init NVMe: handle allocation failure in nvme_map_user_pages() NVMe: Fix uninitialized iod compiler warning NVMe: Do not set IO queue depth beyond device max NVMe: Set block queue max sectors NVMe: use namespace id for nvme_get_features NVMe: replace nvme_ns with nvme_dev for user admin NVMe: Fix nvme module init when nvme_major is set NVMe: Set request queue logical block size	2012-09-29 10:31:52 -07:00
Asias He	bb8111086c	virtio-blk: Disable callback in virtblk_done() This reduces unnecessary interrupts that host could send to guest while guest is in the progress of irq handling. If one vcpu is handling the irq, while another interrupt comes, in handle_edge_irq(), the guest will mask the interrupt via mask_msi_irq() which is a very heavy operation that goes all the way down to host. Here are some performance numbers on qemu: Before: ------------------------------------- seq-read : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01 clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60 clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33 clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35 cpu : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54 cpu : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1 cpu : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1 cpu : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0 vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506, in_queue=34206098, util=99.68% 43: interrupt in total: 887320 fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16 --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0 --filename=/dev/vdb --name=seq-read --stonewall --rw=read --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall --rw=randread --name=rnd-write --stonewall --rw=randwrite After: ------------------------------------- seq-read : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94 clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04 clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91 clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19 cpu : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54 cpu : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2 cpu : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8 cpu : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4 vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010, in_queue=30078377, util=99.59% 43: interrupt in total: 1259639 fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16 --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0 --filename=/dev/vdb --name=seq-read --stonewall --rw=read --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall --rw=randread --name=rnd-write --stonewall --rw=randwrite Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-09-28 15:05:16 +09:30
Dan Carpenter	f22cf8eb48	virtio-blk: fix NULL checking in virtblk_alloc_req() Smatch complains about the inconsistent NULL checking here. Fix it to return NULL on failure. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)	2012-09-28 15:05:14 +09:30
Asias He	c85a1f91b3	virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path We need to support both REQ_FLUSH and REQ_FUA for bio based path since it does not get the sequencing of REQ_FUA into REQ_FLUSH that request based drivers can request. REQ_FLUSH is emulated by: A) If the bio has no data to write: 1. Send VIRTIO_BLK_T_FLUSH to device, 2. In the flush I/O completion handler, finish the bio B) If the bio has data to write: 1. Send VIRTIO_BLK_T_FLUSH to device 2. In the flush I/O completion handler, send the actual write data to device 3. In the write I/O completion handler, finish the bio REQ_FUA is emulated by: 1. Send the actual write data to device 2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device 3. In the flush I/O completion handler, finish the bio Changes in v7: - Using vbr->flags to trace request type - Dropped unnecessary struct virtio_blk *vblk parameter - Reuse struct virtblk_req in bio done function Cahnges in v6: - Reworked REQ_FLUSH and REQ_FUA emulatation order Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-09-28 15:05:14 +09:30
Asias He	a98755c559	virtio-blk: Add bio-based IO path for virtio-blk This patch introduces bio-based IO path for virtio-blk. Compared to request-based IO path, bio-based IO path uses driver provided ->make_request_fn() method to bypasses the IO scheduler. It handles the bio to device directly without allocating a request in block layer. This reduces the IO path in guest kernel to achieve high IOPS and lower latency. The downside is that guest can not use the IO scheduler to merge and sort requests. However, this is not a big problem if the backend disk in host side uses faster disk device. When the bio-based IO path is not enabled, virtio-blk still uses the original request-based IO path, no performance difference is observed. Using a slow device e.g. normal SATA disk, the bio-based IO path for sequential read and write are slower than req-based IO path due to lack of merge in guest kernel. So we make the bio-based path optional. Performance evaluation: ----------------------------- 1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 28%, 24%, 21%, 16% Latency improvement: 32%, 17%, 21%, 16% Long version: With bio-based IO path: seq-read : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30 clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35 clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39 clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45 cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954 cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137 cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648 cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375 With request-based IO path: seq-read : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49 clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15 clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27 clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68 cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622 cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959 cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082 cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985 2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 11%, 11%, 13%, 10% Latency improvement: 10%, 10%, 12%, 10% Long Version: With bio-based IO path: read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29 clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80 clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07 clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25 cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517 cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604 cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612 cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105 With request-based IO path: read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84 clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64 clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55 clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95 cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641 cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689 cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223 cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409 3) Fio test is performed in a 8 vcpu guest with normal SATA based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : -10%, -10%, 4.4%, 0.5% Latency improvement: -12%, -15%, 2.5%, 0.8% Long Version: With bio-based IO path: read : io=124812KB, bw=36537KB/s, iops=9060 , runt= 3416msec write: io=169180KB, bw=24406KB/s, iops=6065 , runt= 6932msec read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54 clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70 clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28 clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71 cpu : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7 cpu : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0 cpu : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16 cpu : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0 With request-based IO path: read : io=150120KB, bw=40420KB/s, iops=10037 , runt= 3714msec write: io=194932KB, bw=27029KB/s, iops=6722 , runt= 7212msec read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45 clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71 clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97 clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61 cpu : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23 cpu : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0 cpu : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16 cpu : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0 How to use: ----------------------------- Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk use_bio=1' to enable ->make_request_fn() based I/O path. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Asias He <asias@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-09-28 15:05:13 +09:30
Linus Torvalds	bee2d97b2c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull two ceph fixes from Sage Weil: "The first fixes a leak in the rbd setup error path, and the second fixes a more serious problem with mismatched kmap/kunmap that surfaced after the recent refactoring work." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: only kunmap kmapped pages rbd: drop dev reference on error in rbd_open()	2012-09-24 16:13:49 -07:00
Alex Elder	340c7a2b2c	rbd: drop dev reference on error in rbd_open() If a read-only rbd device is opened for writing in rbd_open(), it returns without dropping the just-acquired device reference. Fix this by moving the read-only check before getting the reference. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-09-21 20:48:54 -07:00
Linus Torvalds	abef3bd710	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking updates from David Miller: "More bug fixes, nothing gets past these guys" 1) More kernel info leaks found by Mathias Krause, this time in the IPSEC configuration layers. 2) When IPSEC policies change, we do not properly make sure that cached routes (which could now be stale) throughout the system will be revalidated. Fix this by generalizing the generation count invalidation scheme used by ipv4. From Nicolas Dichtel. 3) When repairing TCP sockets, we need to allow to restore not just the send window scale, but the receive one too. Extend the existing interface to achieve this in a backwards compatible way. From Andrey Vagin. 4) A fix for FCOE scatter gather feature validation erroneously caused scatter gather to be disabled for things like AOE too. From Ed L Cashin. 5) Several cases of mishandling of error pointers, from Mathias Krause, Wei Yongjun, and Devendra Naga. 6) Fix gianfar build, from Richard Cochran. 7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao Hongjiang. 8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder. 9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde. 10) The removal of the conditional compilation of the clk support code in the stmmac driver broke things. This is because the interfaces used are the ones that don't also perform the enable/disable of the clk. Fix from Stefan Roese. 11) The QFQ packet scheduler can record out of range virtual start times, resulting later in misbehavior and even crashes. Fix from Paolo Valente. 12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the receiver when the advertised receive window goes to zero. Detect this case and force the processing of the IOAT DMA queue when it happens to avoid getting stuck. Fix from Michal Kubecek. 13) batman-adv assumes that test_bit() returns only 0 or 1, but this is not true for x86 (which returns -1 or 0, via the 'sbb' instruction). Fix from Linus Lussing. 14) Fix small packet corruption in e1000, from Tushar Dave. 15) make_blackhole() in the IPSEC policy code can do one read unlock too many, fix from Li RongQing. 16) The new tcp_try_coalesce() code introduced a bug in TCP URG handling, fix from Eric Dumazet. 17) Fix memory leak in __netif_receive_skb() when doing zerocopy and when hit an OOM condition. From Michael S Tsirkin. 18) netxen blindly deferences pdev->bus->self, which is not guarenteed to be non-NULL. Fix from Nikolay Aleksandrov. 19) Fix a performance regression caused by mistakes in ipv6 checksum validation in the bnx2x driver, fix from Michal Schmidt. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits) net/stmmac: Use clk_prepare_enable and clk_disable_unprepare net: change return values from -EACCES to -EPERM net/irda: sh_sir: fix return value check in sh_sir_set_baudrate() stmmac: fix return value check in stmmac_open_ext_timer() gianfar: fix phc index build failure ipv6: fix return value check in fib6_add() bnx2x: remove false warning regarding interrupt number can: ti_hecc: fix oops during rmmod can: janz-ican3: fix support for older hardware revisions net: do not disable sg for packets requiring no checksum aoe: assert AoE packets marked as requiring no checksum at91ether: return PTR_ERR if call to clk_get fails xfrm_user: don't copy esn replay window twice for new states xfrm_user: ensure user supplied esn replay window is valid xfrm_user: fix info leak in copy_to_user_tmpl() xfrm_user: fix info leak in copy_to_user_policy() xfrm_user: fix info leak in copy_to_user_state() xfrm_user: fix info leak in copy_to_user_auth() net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200 tcp: restore rcv_wscale in a repair mode (v2) ...	2012-09-21 14:32:55 -07:00
Linus Torvalds	8ca7de9164	Bug-fixes: * Fix M2P batching re-using the incorrect structure field. * Disable BIOS SMP MP table search. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQXGfdAAoJEFjIrFwIi8fJbWcH/0FI2d/VyB+ZU0ng3R0Oa7mt iR/x+Z+mfFdp2dXS6gs6DgJIZVA7i2K9pX4rOXjpDGGGyUeo1xoqjlQfsFWQGjZ/ p49RrDrM93c2GdRXk3iMSWfboQI7BXBs5rnyYZQL7kMxUSR75MxbeONvhPrMSO9I 3EBidWH08qjrn2HVF44F6xh5ONjpclo5AvGIzJ0eU4X0D0eqMnhvlAw8/UYJU2HV heRvuxWF9l2jNpLhKhZy1730D1X/vKA5qKAcBW8rCOpEijyPpmtKbqapeUJg/9pH NVquuwGutP5ozrSi7a/23+L+ezvQBmCPm5ZRG44PccBoZ/HVs8haT8UypSWSDzo= =TwvM -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - Fix M2P batching re-using the incorrect structure field. In v3.5 we added batching for M2P override (Machine Frame Number -> Physical Frame Number), but the original MFN was saved in an incorrect structure - and we would oops/restore when restoring with the old MFN. - Disable BIOS SMP MP table search. A bootup issue that we had ignored until we found that on DL380 G6 it was needed. * tag 'stable/for-linus-3.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/boot: Disable BIOS SMP MP table search. xen/m2p: do not reuse kmap_op->dev_bus_addr	2012-09-21 12:06:54 -07:00
Eric W. Biederman	e4849737f7	userns: Convert loop to use kuid_t instead of uid_t Cc: Jens Axboe <jaxboe@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-09-21 03:13:20 -07:00
Ed Cashin	8babe8cc65	aoe: assert AoE packets marked as requiring no checksum In order for the network layer to see that AoE requires no checksumming in a generic way, the packets must be marked as requiring no checksum, so we make this requirement explicit with the assertion. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-20 22:23:40 -04:00
Linus Torvalds	c46de2263f	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A small collection of driver fixes/updates and a core fix for 3.6. It contains: - Bug fixes for mtip32xx, and support for new hardware (just addition of IDs). They have been queued up for 3.7 for a few weeks as well. - rate-limit a failing command error message in block core. - A fix for an old cciss bug from Stephen. - Prevent overflow of partition count from Alan." * 'for-linus' of git://git.kernel.dk/linux-block: cciss: fix handling of protocol error blk: add an upper sanity check on partition adding mtip32xx: fix user_buffer check in exec_drive_command mtip32xx: Remove dead code mtip32xx: Change printk to pr_xxxx mtip32xx: Proper reporting of write protect status on big-endian mtip32xx: Increase timeout for standby command mtip32xx: Handle NCQ commands during the security locked state mtip32xx: Add support for new devices block: rate-limit the error message from failing commands	2012-09-19 11:04:34 -07:00
Stephen M. Cameron	2453f5f992	cciss: fix handling of protocol error If a command completes with a status of CMD_PROTOCOL_ERR, this information should be conveyed to the SCSI mid layer, not dropped on the floor. Unlike a similar bug in the hpsa driver, this bug only affects tape drives and CD and DVD ROM drives in the cciss driver, and to induce it, you have to disconnect (or damage) a cable, so it is not a very likely scenario (which would explain why the bug has gone undetected for the last 10 years.) Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-18 11:57:08 +02:00
Paul Clements	fded4e090c	nbd: clear waiting_queue on shutdown Fix a serious but uncommon bug in nbd which occurs when there is heavy I/O going to the nbd device while, at the same time, a failure (server, network) or manual disconnect of the nbd connection occurs. There is a small window between the time that the nbd_thread is stopped and the socket is shutdown where requests can continue to be queued to nbd's internal waiting_queue. When this happens, those requests are never completed or freed. The fix is to clear the waiting_queue on shutdown of the nbd device, in the same way that the nbd request queue (queue_head) is already being cleared. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-09-17 15:00:37 -07:00
Greg Kroah-Hartman	2bcb132c69	Merge 3.6-rc6 into usb-next This resolves the merge problems with: drivers/usb/dwc3/gadget.c drivers/usb/musb/tusb6010.c that had been seen in linux-next. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-09-16 20:42:46 -07:00
Bjorn Helgaas	78890b5989	Merge commit 'v3.6-rc5' into next * commit 'v3.6-rc5': (1098 commits) Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec\|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync powerpc: Update DSCR on all CPUs when writing sysfs dscr_default powerpc/powernv: Always go into nap mode when CPU is offline powerpc: Give hypervisor decrementer interrupts their own handler powerpc/vphn: Fix arch_update_cpu_topology() return value ARM: gemini: fix the gemini build ... Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/rapidio/devices/tsi721.c	2012-09-13 08:41:01 -06:00
David Milburn	97651ea687	mtip32xx: fix user_buffer check in exec_drive_command Current user_buffer check is incorrect and causes hdparm to fail # hdparm -I /dev/rssda HDIO_DRIVE_CMD(identify) failed: Input/output error /dev/rssda: Patching linux-3.6-rc5 hdparm works as expected # hdparm -I /dev/rssda /dev/rssda: ATA device, with non-removable media Model Number: DELL_P320h-MTFDGAL350SAH Serial Number: 00000000121302025F01 Firmware Revision: B1442808 <snip> Reported-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: David Milburn <dmilburn@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:21:13 +02:00
Asai Thambi S P	ac64e6572d	mtip32xx: Remove dead code Removed the dead code in mtip_hw_read_registers() and mtip_hw_read_flags(). Reported-by: Coverity Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:20:22 +02:00
Asai Thambi S P	45422e7431	mtip32xx: Change printk to pr_xxxx Changed printk to be compliant with latest style changes Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:20:11 +02:00
Asai Thambi S P	b62868e50e	mtip32xx: Proper reporting of write protect status on big-endian Proper reporting of write protect status on big-endian Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:19:59 +02:00
Asai Thambi S P	d7c8b94548	mtip32xx: Increase timeout for standby command Increased timeout for standby command to work with larger capacity drives Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:19:49 +02:00
Asai Thambi S P	12a166c919	mtip32xx: Handle NCQ commands during the security locked state Return error for NCQ commands when the drive is in security locked state Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:19:39 +02:00
Asai Thambi S P	1a131458dd	mtip32xx: Add support for new devices Added supported device IDs in pci table Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-12 22:19:29 +02:00
Stefano Stabellini	2fc136eecd	xen/m2p: do not reuse kmap_op->dev_bus_addr If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-09-12 11:21:40 -04:00
Kent Overstreet	bf800ef181	block: Add bio_clone_bioset(), bio_clone_kmalloc() Previously, there was bio_clone() but it only allocated from the fs bio set; as a result various users were open coding it and using __bio_clone(). This changes bio_clone() to become bio_clone_bioset(), and then we add bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of the functionality the last patch adedd. This will also help in a later patch changing how bio cloning works. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: Boaz Harrosh <bharrosh@panasas.com> CC: Jeff Garzik <jeff@garzik.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-09 10:35:39 +02:00
Kent Overstreet	ccc5c9ca6a	pktcdvd: Switch to bio_kmalloc() This is prep work for killing bi_destructor - previously, pktcdvd had its own pkt_bio_alloc which was basically duplication bio_kmalloc(), necessitating its own bi_destructor implementation. v5: Un-reorder some functions, to make the patch easier to review Signed-off-by: Kent Overstreet <koverstreet@google.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-09 10:35:39 +02:00
Kent Overstreet	395c72a707	block: Generalized bio pool freeing With the old code, when you allocate a bio from a bio pool you have to implement your own destructor that knows how to find the bio pool the bio was originally allocated from. This adds a new field to struct bio (bi_pool) and changes bio_alloc_bioset() to use it. This makes various bio destructors unnecessary, so they're then deleted. v6: Explain the temporary if statement in bio_put Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: Nicholas Bellinger <nab@linux-iscsi.org> CC: Lars Ellenberg <lars.ellenberg@linbit.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-09 10:35:38 +02:00
Stephen Hemminger	1d3520357d	make drivers with pci error handlers const Covers the rest of the uses of pci error handler. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-09-07 16:35:00 -06:00
Cong Wang	68a5059ecf	block: remove the deprecated ub driver It was scheduled to be removed in 3.6. Acked-by: Pete Zaitcev <zaitcev@redhat.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-09-05 17:18:53 -07:00
Linus Torvalds	a7e546f175	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block-related fixes from Jens Axboe: - Improvements to the buffered and direct write IO plugging from Fengguang. - Abstract out the mapping of a bio in a request, and use that to provide a blk_bio_map_sg() helper. Useful for mapping just a bio instead of a full request. - Regression fix from Hugh, fixing up a patch that went into the previous release cycle (and marked stable, too) attempting to prevent a loop in __getblk_slow(). - Updates to discard requests, fixing up the sizing and how we align them. Also a change to disallow merging of discard requests, since that doesn't really work properly yet. - A few drbd fixes. - Documentation updates. * 'for-linus' of git://git.kernel.dk/linux-block: block: replace __getblk_slow misfix by grow_dev_page fix drbd: Write all pages of the bitmap after an online resize drbd: Finish requests that completed while IO was frozen drbd: fix drbd wire compatibility for empty flushes Documentation: update tunable options in block/cfq-iosched.txt Documentation: update tunable options in block/cfq-iosched.txt Documentation: update missing index files in block/00-INDEX block: move down direct IO plugging block: remove plugging at buffered write time block: disable discard request merge temporarily bio: Fix potential memory leak in bio_find_or_create_slab() block: Don't use static to define "void *p" in show_partition_start() block: Add blk_bio_map_sg() helper block: Introduce __blk_segment_map_sg() helper fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices block: split discard into aligned requests block: reorganize rounding of max_discard_sectors	2012-08-25 11:36:43 -07:00
Stephen M. Cameron	b0cf0b118c	cciss: fix incorrect scsi status reporting Delete code which sets SCSI status incorrectly as it's already been set correctly above this incorrect code. The bug was introduced in 2009 by commit `b0e15f6db1` ("cciss: fix typo that causes scsi status to be lost.") Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Reported-by: Roel van Meer <roel.vanmeer@bokxing.nl> Tested-by: Roel van Meer <roel.vanmeer@bokxing.nl> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:02 -07:00
Tejun Heo	136b5721d7	workqueue: deprecate __cancel_delayed_work() Now that cancel_delayed_work() can be safely called from IRQ handlers, there's no reason to use __cancel_delayed_work(). Use cancel_delayed_work() instead of __cancel_delayed_work() and mark the latter deprecated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Roland Dreier <roland@kernel.org> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>	2012-08-21 13:18:24 -07:00
Tejun Heo	e7c2f96744	workqueue: use mod_delayed_work() instead of __cancel + queue Now that mod_delayed_work() is safe to call from IRQ handlers, __cancel_delayed_work() followed by queue_delayed_work() can be replaced with mod_delayed_work(). Most conversions are straight-forward except for the following. * net/core/link_watch.c: linkwatch_schedule_work() was doing a quite elaborate dancing around its delayed_work. Collapse it such that linkwatch_work is queued for immediate execution if LW_URGENT and existing timer is kept otherwise. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>	2012-08-21 13:18:24 -07:00
Tejun Heo	43829731dd	workqueue: deprecate flush[_delayed]_work_sync() flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>	2012-08-20 14:51:24 -07:00
Jens Axboe	79df9b40b3	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus	2012-08-16 19:27:18 +02:00
Philipp Reisner	d1aa4d04da	drbd: Write all pages of the bitmap after an online resize We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:17:35 +02:00
Philipp Reisner	509fc019e5	drbd: Finish requests that completed while IO was frozen Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:14:45 +02:00
Lars Ellenberg	227f052f47	drbd: fix drbd wire compatibility for empty flushes DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:12:56 +02:00
Stefano Stabellini	e79affc3f2	xen/arm: compile blkfront and blkback Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-08-08 17:21:14 +00:00
Matthew Wilcox	a09115b23e	NVMe: Cancel outstanding IOs on queue deletion If the device is hot-unplugged while there are active commands, we should time out the I/Os so that upper layers don't just see the I/Os disappear. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-08-07 15:56:23 -04:00
Artem Bityutskiy	d97482ede2	drbd: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from drbd comments. Cc: drbd-dev@lists.linbit.com Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:39 +04:00
Matthew Wilcox	9e866774aa	NVMe: Free admin queue memory on initialisation failure If the adapter fails initialisation, the memory allocated for the admin queue may not be freed. Split the memory freeing part of nvme_free_queue() into nvme_free_queue_mem() and call it in the case of initialisation failure. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com>	2012-08-03 13:55:56 -04:00
Linus Torvalds	eff0d13f38	Merge branch 'for-3.6/drivers' of git://git.kernel.dk/linux-block Pull block driver changes from Jens Axboe: - Making the plugging support for drivers a bit more sane from Neil. This supersedes the plugging change from Shaohua as well. - The usual round of drbd updates. - Using a tail add instead of a head add in the request completion for ndb, making us find the most completed request more quickly. - A few floppy changes, getting rid of a duplicated flag and also running the floppy init async (since it takes forever in boot terms) from Andi. * 'for-3.6/drivers' of git://git.kernel.dk/linux-block: floppy: remove duplicated flag FD_RAW_NEED_DISK blk: pass from_schedule to non-request unplug functions. block: stack unplug blk: centralize non-request unplug handling. md: remove plug_cnt feature of plugging. block/nbd: micro-optimization in nbd request completion drbd: announce FLUSH/FUA capability to upper layers drbd: fix max_bio_size to be unsigned drbd: flush drbd work queue before invalidate/invalidate remote drbd: fix potential access after free drbd: call local-io-error handler early drbd: do not reset rs_pending_cnt too early drbd: reset congestion information before reporting it in /proc/drbd drbd: report congestion if we are waiting for some userland callback drbd: differentiate between normal and forced detach drbd: cleanup, remove two unused global flags floppy: Run floppy initialization asynchronous	2012-08-01 09:06:47 -07:00
Linus Torvalds	ac694dbdbc	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...	2012-07-31 19:25:39 -07:00
Linus Torvalds	3e9a97082f	This patch series contains a major revamp of how we collect entropy from interrupts for /dev/random and /dev/urandom. The goal is to addresses weaknesses discussed in the paper "Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices", by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will be published in the Proceedings of the 21st Usenix Security Symposium, August 2012. (See https://factorable.net for more information and an extended version of the paper.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8 lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+ LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T 6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw 03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi 3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67 AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0 NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0 TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC bi5FC1OolqhvzVIdsqgt =u7KM -----END PGP SIGNATURE----- Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random subsystem patches from Ted Ts'o: "This patch series contains a major revamp of how we collect entropy from interrupts for /dev/random and /dev/urandom. The goal is to addresses weaknesses discussed in the paper "Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices", by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will be published in the Proceedings of the 21st Usenix Security Symposium, August 2012. (See https://factorable.net for more information and an extended version of the paper.)" Fix up trivial conflicts due to nearby changes in drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c} * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits) random: mix in architectural randomness in extract_buf() dmi: Feed DMI table to /dev/random driver random: Add comment to random_initialize() random: final removal of IRQF_SAMPLE_RANDOM um: remove IRQF_SAMPLE_RANDOM which is now a no-op sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op ...	2012-07-31 19:07:42 -07:00
Mel Gorman	7f338fe454	nbd: set SOCK_MEMALLOC for access to PFMEMALLOC reserves Set SOCK_MEMALLOC on the NBD socket to allow access to PFMEMALLOC reserves so pages backed by NBD, particularly if swap related, can be cleaned to prevent the machine being deadlocked. It is still possible that the PFMEMALLOC reserves get depleted resulting in deadlock but this can be resolved by the administrator by increasing min_free_kbytes. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Linus Torvalds	cc8362b1f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...	2012-07-31 14:35:28 -07:00
Quoc-Son Anh	cd58ad7d18	NVMe: Use ida for nvme device instance Signed-off-by: Quoc-Son Anh <quoc-sonx.anh@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-31 13:31:34 -04:00
Matthew Wilcox	0ac13140d7	NVMe: Fix whitespace damage in nvme_init Commit `5c42ea1643` used spaces instead of tabs. Also remove the unnecessary initialisation of the 'result' variable. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-31 13:31:16 -04:00
Dan Carpenter	22fff826e7	NVMe: handle allocation failure in nvme_map_user_pages() We should return here and avoid a NULL dereference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-31 13:15:43 -04:00
Jens Axboe	10af8138eb	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/floppy into for-3.6/drivers	2012-07-31 11:47:36 +02:00
Fengguang Wu	2fb2ca6f5b	floppy: remove duplicated flag FD_RAW_NEED_DISK Fix coccinelle warning (without behavior change): drivers/block/floppy.c:2518:32-48: duplicated argument to & or \| Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-31 11:38:39 +02:00
NeilBrown	74018dc306	blk: pass from_schedule to non-request unplug functions. This will allow md/raid to know why the unplug was called, and will be able to act according - if !from_schedule it is safe to perform tasks which could themselves schedule. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-07-31 09:08:15 +02:00
NeilBrown	9cbb175088	blk: centralize non-request unplug handling. Both md and umem has similar code for getting notified on an blk_finish_plug event. Centralize this code in block/ and allow each driver to provide its distinctive difference. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-07-31 09:08:14 +02:00
Chetan Loke	01ff5dbc09	block/nbd: micro-optimization in nbd request completion Add in-flight cmds to the tail. That way while searching (during request completion),we will always get a hit on the first element. Signed-off-by: Chetan Loke <loke.chetan@gmail.com> Acked-by: Paul.Clements@steeleye.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-07-31 08:47:13 +02:00
Alex Elder	1fe5e99321	rbd: create rbd_refresh_helper() Create a simple helper that handles the common case of calling __rbd_refresh_header() while holding the ctl_mutex. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	b813623ab9	rbd: return obj version in __rbd_refresh_header() Add a new parameter to __rbd_refresh_header() through which the version of the header object is passed back to the caller. In most cases this isn't needed. The main motivation is to normalize (almost) all calls to __rbd_refresh_header() so they are all wrapped immediately by mutex_lock()/mutex_unlock(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	ccece235d3	rbd: fixes in rbd_header_from_disk() This fixes a few issues in rbd_header_from_disk(): - There is a check intended to catch overflow, but it's wrong in two ways. - First, the type we don't want to overflow is size_t, not unsigned int, and there is now a SIZE_MAX we can use for use with that type. - Second, we're allocating the snapshot ids and snapshot image sizes separately (each has type u64; on disk they grouped together as a rbd_image_header_ondisk structure). So we can use the size of u64 in this overflow check. - If there are no snapshots, then there should be no snapshot names. Enforce this, and issue a warning if we encounter a header with no snapshots but a non-zero snap_names_len. - When saving the snapshot names into the header, be more direct in defining the offset in the on-disk structure from which they're being copied by using "snap_count" rather than "i" in the array index. - If an error occurs, the "snapc" and "snap_names" fields are freed at the end of the function. Make those fields be null pointers after they're freed, to be explicit that they are no longer valid. - Finally, move the definition of the local variable "i" to the innermost scope in which it's needed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	913d2fdcf6	rbd: always pass ops array to rbd_req_sync_op() All of the callers of rbd_req_sync_op() except one pass a non-null "ops" pointer. The only one that does not is rbd_req_sync_read(), which passes CEPH_OSD_OP_READ as its "opcode" and, CEPH_OSD_FLAG_READ for "flags". By allocating the ops array in rbd_req_sync_read() and moving the special case code for the null ops pointer into it, it becomes clear that much of that code is not even necessary. In addition, the "opcode" argument to rbd_req_sync_op() is never actually used, so get rid of that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	d67d4be56a	rbd: pass null version pointer in add_snap() rbd_header_add_snap() passes the address of a version variable to rbd_req_sync_exec(), but it ignores the result. Just pass a null pointer instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	57cfc1060f	rbd: make rbd_create_rw_ops() return a pointer Either rbd_create_rw_ops() will succeed, or it will fail because a memory allocation failed. Have it just return a valid pointer or null rather than stuffing a pointer into a provided address and returning an errno. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:46 -07:00
Alex Elder	4e891e0af0	rbd: have __rbd_add_snap_dev() return a pointer It's not obvious whether the snapshot pointer whose address is provided to __rbd_add_snap_dev() will be assigned by that function. Change it to return the snapshot, or a pointer-coded errno in the event of a failure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:21:45 -07:00
Alex Elder	070c633f60	rbd: drop "object_name" from rbd_req_sync_unwatch() rbd_req_sync_unwatch() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:54 -07:00
Alex Elder	7f0a24d855	rbd: drop "object_name" from rbd_req_sync_notify_ack() rbd_req_sync_notify_ack() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:53 -07:00
Alex Elder	4cb162508a	rbd: drop "object_name" from rbd_req_sync_notify() rbd_req_sync_notify() only ever uses rbd_dev->header_name as the value of its "object_name" parameter, and that value is available within the function already. So get rid of the parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:53 -07:00
Alex Elder	0e6f322d55	rbd: drop "object_name" from rbd_req_sync_watch() rbd_req_sync_watch() is only called in one place, and in that place it passes rbd_dev->header_name as the value of the "object_name" parameter. This value is available within the function already. Having the extra parameter leaves the impression the object name could take on different values, but it does not. So get rid of the parameter. We can always add it back again if we find we want to watch some other object in the future. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:52 -07:00
Alex Elder	14e7085d84	rbd: drop rbd_dev parameter in snap functions Both rbd_register_snap_dev() and __rbd_remove_snap_dev() have rbd_dev parameters that are unused. Remove them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:51 -07:00
Alex Elder	ed63f4fd9a	rbd: drop rbd_header_from_disk() gfp_flags parameter The function rbd_header_from_disk() is only called in one spot, and it passes GFP_KERNEL as its value for the gfp_flags parameter. Just drop that parameter and substitute GFP_KERNEL everywhere within that function it had been used. (If we find we need the parameter again in the future it's easy enough to add back again.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:50 -07:00
Alex Elder	9a5d690b08	rbd: snapc is unused in rbd_req_sync_read() The "snapc" parameter to in rbd_req_sync_read() is not used, so get rid of it. Reported-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:49 -07:00
Alex Elder	de71a2970d	rbd: rename rbd_device->id The "id" field of an rbd device structure represents the unique client-local device id mapped to the underlying rbd image. Each rbd image will have another id--the image id--and each snapshot has its own id as well. The simple name "id" no longer conveys the information one might like to have. Rename the device "id" field in struct rbd_dev to be "dev_id" to make it a little more obvious what we're dealing with without having to think more about context. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:49 -07:00
Alex Elder	8e94af8e2b	rbd: encapsulate header validity test If an rbd image header is read and it doesn't begin with the expected magic information, a warning is displayed. This is a fairly simple test, but it could be extended at some point. Fix the comparison so it actually looks at the "text" field rather than the front of the structure. In any case, encapsulate the validity test in its own function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:48 -07:00
Alex Elder	bd919d45aa	rbd: clean up a few dout() calls There was a dout() call in rbd_do_request() that was reporting the reporting the offset as the length and vice versa. While fixing that I did a quick scan of other dout() calls and fixed a couple of other minor things. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:46 -07:00
Alex Elder	a059329056	rbd: simplify __rbd_remove_all_snaps() This just replaces a while loop with list_for_each_entry_safe() in __rbd_remove_all_snaps(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	a66f8c97a3	rbd: drop extra header_rwsem init In commit `c666601a` there was inadvertently added an extra initialization of rbd_dev->header_rwsem. This gets rid of the duplicate. Reported-by: Guangliang Zhao <gzhao@suse.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	9e15dc735a	rbd: kill rbd_image_header->snap_seq The snap_seq field in an rbd_image_header structure held the value from the rbd image header when it was last refreshed. We now maintain this value in the snapc->seq field. So get rid of the other one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:44 -07:00
Alex Elder	505cbb9bed	rbd: set snapc->seq only when refreshing header In rbd_header_add_snap() there is code to set snapc->seq to the just-added snapshot id. This is the only remnant left of the use of that field for recording which snapshot an rbd_dev was associated with. That functionality is no longer supported, so get rid of that final bit of code. Doing so means we never actually set snapc->seq any more. On the server, the snapshot context's sequence value represents the highest snapshot id ever issued for a particular rbd image. So we'll make it have that meaning here as well. To do so, set this value whenever the rbd header is (re-)read. That way it will always be consistent with the rest of the snapshot context we maintain. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:43 -07:00
Alex Elder	78dc447d3c	rbd: preserve snapc->seq in rbd_header_set_snap() In rbd_header_set_snap(), there is logic to make the snap context's seq field get set to a particular snapshot id, or 0 if there is no snapshot for the rbd image. This seems to be an artifact of how the current snapshot id for an rbd_dev was recorded before the rbd_dev->snap_id field began to be used for that purpose. There's no need to update the value of snapc->seq here any more, so stop doing it. Tidy up a few local variables in that function while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:42 -07:00
Alex Elder	75fe9e1981	rbd: don't use snapc->seq that way In what appears to be an artifact of a different way of encoding whether an rbd image maps a snapshot, __rbd_refresh_header() has code that arranges to update the seq value in an rbd image's snapshot context to point to the first entry in its snapshot array if that's where it was pointing initially. We now use rbd_dev->snap_id to record the snapshot id--using the special value CEPH_NOSNAP to indicate the rbd_dev is not mapping a snapshot at all. There is therefore no need to check for this case, nor to update the seq value, in __rbd_refresh_header(). Just preserve the seq value that rbd_read_header() provides (which, at the moment, is nothing). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:41 -07:00
Josh Durgin	a71b891bc7	rbd: send header version when notifying Previously the original header version was sent. Now, we update it when the header changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	d1d2564654	rbd: use reference counting for the snap context This prevents a race between requests with a given snap context and header updates that free it. The osd client was already expecting the snap context to be reference counted, since it get()s it in ceph_osdc_build_request and put()s it when the request completes. Also remove the second down_read()/up_read() on header_rwsem in rbd_do_request, which wasn't actually preventing this race or protecting any other data. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	93a24e084d	rbd: set image size when header is updated The image may have been resized. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:39 -07:00
Josh Durgin	a51aa0c042	rbd: expose the correct size of the device in sysfs If an image was mapped to a snapshot, the size of the head version would be shown. Protect capacity with header_rwsem, since it may change. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:38 -07:00
Josh Durgin	474ef7ce83	rbd: only reset capacity when pointing to head Snapshots cannot be resized, and the new capacity of head should not be reflected by the snapshot. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:37 -07:00
Josh Durgin	e88a36ec96	rbd: return errors for mapped but deleted snapshot When a snapshot is deleted, the OSD will return ENOENT when reading from it. This is normally interpreted as a hole by rbd, which will return zeroes. To minimize the time in which this can happen, stop requests early when we are notified that our snapshot no longer exists. [elder@inktank.com: updated __rbd_init_snaps_header() logic] Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:36 -07:00
Alex Elder	d1f57ea663	rbd: kill num_reply parameters Several functions include a num_reply parameter, but it is never used. Just get rid of it everywhere--it seems to be something that never got fully implemented. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:09 -07:00
Alex Elder	43ae470112	rbd: option symbol renames Use the name "ceph_opts" consistently (rather than just "opt") for pointers to a ceph_options structure. Change the few spots that don't use "rbd_opts" for a rbd_options pointer to match the rest. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:08 -07:00
Alex Elder	aded07ea9f	rbd: more symbol renames Rename variables named "obj" which represent object names so they're consistently named "object_name". Rename the "cls" and "method" parameters in rbd_req_sync_exec() to be "class_name" and "method_name", and make similar changes to the names of local variables in that function representing the lengths of those names. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:07 -07:00
Alex Elder	0bed54dc9a	rbd: rename some fields in struct rbd_dev An rbd image is not a single object, but a logical construct made up of an aggregation of objects. Rename some fields in struct rbd_dev, in hopes of reinforcing this. obj --> image_name obj_len --> image_name_len obj_md_name --> header_name Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:06 -07:00
Alex Elder	0ce1a79413	rbd: use rbd_dev consistently Most variables that represent a struct rbd_device are named "rbd_dev", but in some cases "dev" is used instead. Change all the "dev" references so they use "rbd_dev" consistently, to make it clear from the name that we're working with an RBD device (as opposed to, for example, a struct device). Similarly, change the name of the "dev" field in struct rbd_notify_info to be "rbd_dev". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:05 -07:00
Alex Elder	820a5f3e94	rbd: dynamically allocate snapshot name There is no need to impose a small limit the length of the snapshot name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the snapshot name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	bf3e5ae112	rbd: dynamically allocate image name There is no need to impose a small limit the length of the rbd image name recorded in a struct rbd_dev. Remove the limitation by allocating space for the image name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	cb8627c76d	rbd: dynamically allocate image header name There is no need to impose a small limit the length of the header name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the header name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:03 -07:00
Alex Elder	849b4260d4	rbd: dynamically allocate object prefix There is no need to impose a small limit the length of the object prefix recorded for an rbd image in a struct rbd_image_header. Remove the limitation by allocating space for the object prefix dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:02 -07:00
Alex Elder	d22f76e703	rbd: dynamically allocate pool name There is no need to impose a small limit the length of the pool name recorded for an rbd image in a struct rbd_device. Remove the limitation by allocating space for the pool name ynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:01 -07:00
Alex Elder	9bb2f334b9	rbd: create pool_id device attribute Add an entry under /sys/bus/rbd/devices/<N>/ named "pool_id" that provides the id for the pool the rbd image is assocatied with. This is in addition to the pool name already provided. Rename the "poolid" field in struct rbd_device to be "pool_id". Update the documentation to reflect the addition of this new entry. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:00 -07:00
Alex Elder	ca1e49a6af	rbd: rename rbd_dev->block_name Each rbd image has a name that forms the basis of all data objects backing the device. Old (format 1) images refer to this name as the "block name," while new (format 2) images use the term "object prefix" for this. Change the field name in the in-core rbd image header structure to reflect the more modern usage. We intentionally keep the the name "block_name" in the on-disk definition for format 1 image headers. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:59 -07:00
Alex Elder	ea3352f4aa	rbd: define dup_token() Define a new function dup_token(), to be used during argument parsing for making dynamically-allocated copies of tokens being parsed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:58 -07:00
Alex Elder	ad4f232f28	rbd: drop a useless local variable In rbd_req_sync_notify_ack(), a local variable was needlessly being used to hold a null pointer. Just pass NULL instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:56 -07:00
Jens Axboe	72ea1f74fc	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.6/drivers	2012-07-30 09:03:10 +02:00
Paolo Bonzini	cd5d503862	virtio-blk: allow toggling host cache between writeback and writethrough This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, which exposes the cache mode in the configuration space and lets the driver modify it. The cache mode is exposed via sysfs. Even if the host does not support the new feature, the cache mode is visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:52 +09:30
Asias He	2c95a32909	virtio-blk: Use block layer provided spinlock Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue(). The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path. if (q->queue_lock != &q->__queue_lock) q->queue_lock = &q->__queue_lock; However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0-rc7+ #238 Not tainted ------------------------------------- fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/3587: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250 Other drivers use block layer provided spinlock as well, e.g. SCSI. Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch. Changes in v2: Improve commit log as Michael suggested. Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Asias He	483001c765	virtio-blk: Reset device after blk_cleanup_queue() blk_cleanup_queue() will call blk_drian_queue() to drain all the requests before queue DEAD marking. If we reset the device before blk_cleanup_queue() the drain would fail. 1) if the queue is stopped in do_virtblk_request() because device is full, the q->request_fn() will not be called. blk_drain_queue() { while(true) { ... if (!list_empty(&q->queue_head)) __blk_run_queue(q) { if (queue is not stoped) q->request_fn() } ... } } Do no reset the device before blk_cleanup_queue() gives the chance to start the queue in interrupt handler blk_done(). 2) In commit `b79d866c8b`, We abort requests dispatched to driver before blk_cleanup_queue(). There is a race if requests are dispatched to driver after the abort and before the queue DEAD mark. To fix this, instead of aborting the requests explicitly, we can just reset the device after after blk_cleanup_queue so that the device can complete all the requests before queue DEAD marking in the drain process. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Asias He	02e2b12494	virtio-blk: Call del_gendisk() before disable guest kick del_gendisk() might not return due to failing to remove the /sys/block/vda/serial sysfs entry when another thread (udev) is trying to read it. virtblk_remove() vdev->config->reset() : guest will not kick us through interrupt del_gendisk() device_del() kobject_del(): got stuck, sysfs entry ref count non zero sysfs_open_file(): user space process read /sys/block/vda/serial sysfs_get_active() : got sysfs entry ref count dev_attr_show() virtblk_serial_show() blk_execute_rq() : got stuck, interrupt is disabled request cannot be finished This patch fixes it by calling del_gendisk() before we disable guest's interrupt so that the request sent in virtblk_serial_show() will be finished and del_gendisk() will success. This fixes another race in hot-unplug process. It is save to call del_gendisk(vblk->disk) before flush_work(&vblk->config_work) which might access vblk->disk, because vblk->disk is not freed until put_disk(vblk->disk). Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-07-30 13:30:51 +09:30
Keith Busch	c7d36ab8fa	NVMe: Fix uninitialized iod compiler warning Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-27 14:25:22 -04:00
Keith Busch	a0cadb85b8	NVMe: Do not set IO queue depth beyond device max Set the depth for IO queues to the device's maximum supported queue entries if the requested depth exceeds the device's capabilities. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-27 13:57:23 -04:00
Keith Busch	8fc23e032d	NVMe: Set block queue max sectors Set the max hw sectors in a namespace's request queue if the nvme device has a max data transfer size. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 13:43:20 -04:00
Keith Busch	a42ceccef0	NVMe: use namespace id for nvme_get_features The specification does not provide a use for command dword11 in the NVMe Get Features command, but does use the NSID for some features. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:46:36 -04:00
Keith Busch	50af8baec4	NVMe: replace nvme_ns with nvme_dev for user admin The function nvme_user_admin_command does not require a namespace to proceed. Replace with the nvme_dev structure so that it can be called from contexts that do not have a namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:24:36 -04:00
Keith Busch	5c42ea1643	NVMe: Fix nvme module init when nvme_major is set register_blkdev returns 0 when given a valid major number. Reported-by:Ross Zwisler <ross.zwisler@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-26 12:23:38 -04:00
Keith Busch	e9ef46369f	NVMe: Set request queue logical block size Sets the request queue logical block size with the block size of the namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-07-25 14:52:49 -04:00
Lars Ellenberg	a73ff3231d	drbd: announce FLUSH/FUA capability to upper layers Unconditionally announce FLUSH/FUA to upper layers. If the lower layers on either node do not actually support this, generic_make_request() will deal with it. If this causes performance regressions on your setup, make sure there are no volatile caches involved, and mount -o nobarrier or equivalent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:28 +02:00
Lars Ellenberg	db141b2f42	drbd: fix max_bio_size to be unsigned We capped our max_bio_size respectively max_hw_sectors with min_t(int, lower level limit, our limit); unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their limits to "-1U", and that is of course a smaller "int" value than our limit. Impact: we started to request 16 MB resync requests, which lead to protocol error and a reconnect loop. Fix all relevant constants and parameters to be unsigned int. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:00 +02:00
Lars Ellenberg	7ee1fb93f3	drbd: flush drbd work queue before invalidate/invalidate remote If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:58 +02:00
Lars Ellenberg	c12e9c8964	drbd: fix potential access after free Occasionally, if we disconnect, we triggered this assert: block drbd7: ASSERT FAILED tl_hash[27] == c30b0f04, expected NULL hlist_del() happens only on master bio completion. We used to wait for pending IO to complete before freeing tl_hash on disconnect. We no longer do so, since we learned to "freeze" IO on disconnect. If the local disk is too slow, we may reach C_STANDALONE early, and there are still some requests pending locally when we call drbd_free_tl_hash(). If we now free the tl_hash, and later the local IO completion completes the master bio, which then does hlist_del() and clobbers freed memory. Do hlist_del_init() and hlist_add_fake() before kfree(tl_hash), so the hlist_del() on master bio completion is harmless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:16 +02:00
Lars Ellenberg	63a6d0bb3d	drbd: call local-io-error handler early In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:10:41 +02:00
Lars Ellenberg	0029d62434	drbd: do not reset rs_pending_cnt too early Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:09:53 +02:00
Lars Ellenberg	88437879fb	drbd: reset congestion information before reporting it in /proc/drbd We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:48 +02:00
Lars Ellenberg	c2ba686f35	drbd: report congestion if we are waiting for some userland callback If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:18 +02:00
Lars Ellenberg	383606e0de	drbd: differentiate between normal and forced detach Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:06:18 +02:00
Lars Ellenberg	d264580145	drbd: cleanup, remove two unused global flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:02:41 +02:00
Jens Axboe	b1af9be5ef	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/floppy into for-3.6/drivers	2012-07-24 13:15:08 +02:00
Linus Torvalds	7100e505b7	Power management updates for 3.6 * ACPI conversion to PM handling based on struct dev_pm_ops. * Conversion of a number of platform drivers to PM handling based on struct dev_pm_ops and removal of empty legacy PM callbacks from a couple of PCI drivers. * Suspend-to-both for in-kernel hibernation from Bojan Smojver. * cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti U Murthy. * cpufreq bug fixes from Jonghwa Lee and Stephen Boyd. * Suspend and hibernate fixes from Srivatsa S. Bhat and Colin Cross. * Generic PM domains framework updates. * RTC CMOS wakeup signaling update from Paul Fox. * sparse warnings fixes from Sachin Kamat. * Build warnings fixes for the generic PM domains framework and PM sysfs code. * sysfs switch for printing device suspend times from Sameer Nanda. * Documentation fix from Oskar Schirmer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQDF5eAAoJEKhOf7ml8uNsEaAP/2wg4faoOGob5A0/7tLqG3Cw xnTmGsfL7wG07Q8ykCL1BSlBb1VeJz8L6LTmUpaABI4M//oIBlcYQKyCE0Tat1AO 9bJXFzK7qcHMhkTz6d6LDqtVzR3NGM3ypjZqj8aEXBov07LMR1AXvgNwXXhv25zM 0unwrh1XNinBN3n+oaktpWk1YHUjsa5IMU+2tQJrocuHXcgK30vGXZVrZ4g9w1c2 eS+ED1oKUqOYtFzIUX+aCtaDDheGaPlugk/GOtIB7Sae0s0vMlxH/T5ncB4SxRC+ v3s4OykqQc5Dc8+0bNlBH7ykSVNB0PoQiyKDY67CxtH+q1xQSc9/f3XJqnGMaVDE 17eZUZsL4qSyzRuCbPCGAgwBHmx3qNCMu1i1BcmnSxU+ikPUeCR7mYOP0mRThwPH OSfs+c/vZ+Ow6CwVE4UFrbm9Jve7ADnCrlZzT2m6XjhHGyjKP7SJlzP9TPsZ0LRk oxgQDYHmxbo50t9tBCz5L4ZTMKkDp28e78x84/CteP85srcW3GqDxrPyp2uzJu5O tvIEBvVlc4ucq8sG83RkugQwrG/2cQwG2HO9ERAwq01HHA1BYsuU3A961Jqf5CZo nFRSnByvVj/imPf47OWpDPAbVEs7jxufJuLEbPwGj1MkttTGDBIRu3zldXt2S6kP Q4qYU6fDaQQHFc90pqxQ =vC4/ -----END PGP SIGNATURE----- Merge tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: - ACPI conversion to PM handling based on struct dev_pm_ops. - Conversion of a number of platform drivers to PM handling based on struct dev_pm_ops and removal of empty legacy PM callbacks from a couple of PCI drivers. - Suspend-to-both for in-kernel hibernation from Bojan Smojver. - cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti Murthy. - cpufreq bug fixes from Jonghwa Lee and Stephen Boyd. - Suspend and hibernate fixes from Srivatsa Bhat and Colin Cross. - Generic PM domains framework updates. - RTC CMOS wakeup signaling update from Paul Fox. - sparse warnings fixes from Sachin Kamat. - Build warnings fixes for the generic PM domains framework and PM sysfs code. - sysfs switch for printing device suspend times from Sameer Nanda. - Documentation fix from Oskar Schirmer. * tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (70 commits) cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch EXYNOS: bugfix on retrieving old_index from freqs.old PM / Sleep: call early resume handlers when suspend_noirq fails PM / QoS: Use NULL pointer instead of plain integer in qos.c PM / QoS: Use NULL pointer instead of plain integer in pm_qos.h PM / Sleep: Require CAP_BLOCK_SUSPEND to use wake_lock/wake_unlock PM / Sleep: Add missing static storage class specifiers in main.c cpuilde / ACPI: remove time from acpi_processor_cx structure cpuidle / ACPI: remove usage from acpi_processor_cx structure cpuidle / ACPI : remove latency_ticks from acpi_processor_cx structure rtc-cmos: report wakeups from interrupt handler PM / Sleep: Fix build warning in sysfs.c for CONFIG_PM_SLEEP unset PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset olpc-xo15-sci: Use struct dev_pm_ops for power management PM / Domains: Replace plain integer with NULL pointer in domain.c file PM / Domains: Add missing static storage class specifier in domain.c file PM / crypto / ux500: Use struct dev_pm_ops for power management PM / IPMI: Remove empty legacy PCI PM callbacks tpm_nsc: Use struct dev_pm_ops for power management tpm_tis: Use struct dev_pm_ops for power management ...	2012-07-22 13:36:52 -07:00
Theodore Ts'o	89c30f161c	xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op With the changes in the random tree, IRQF_SAMPLE_RANDOM is now a no-op; interrupt randomness is now collected unconditionally in a very low-overhead fashion; see commit `775f4b297b`. The IRQF_SAMPLE_RANDOM flag was scheduled to be removed in 2009 on the feature-removal-schedule, so this patch is preparation for the final removal of this flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>	2012-07-19 10:39:25 -04:00
Rafael J. Wysocki	bfaa07bc32	Merge branch 'pm-drivers' * pm-drivers: rtc-cmos: report wakeups from interrupt handler PM / crypto / ux500: Use struct dev_pm_ops for power management PM / IPMI: Remove empty legacy PCI PM callbacks tpm_nsc: Use struct dev_pm_ops for power management tpm_tis: Use struct dev_pm_ops for power management tpm_atmel: Use struct dev_pm_ops for power management PM / TPM: Drop unused pm_message_t argument from tpm_pm_suspend() omap-rng: Use struct dev_pm_ops for power management mg_disk: Use struct dev_pm_ops for power management msi-laptop: Use struct dev_pm_ops for power management hdaps: Use struct dev_pm_ops for power management sonypi: Use struct dev_pm_ops for power management intel_mid_thermal: Use struct dev_pm_ops for power management acer-wmi: Use struct dev_pm_ops for power management intel_ips: Remove empty legacy PM callbacks thinkpad_acpi: Use struct dev_pm_ops instead of legacy PM routines thinkpad_acpi: Drop pm_message_t arguments from suspend routines	2012-07-19 00:03:42 +02:00
Dan Carpenter	6a3ca4f188	rbd: endian bug in rbd_req_cb() Sparse complains about this because: drivers/block/rbd.c:996:20: warning: cast to restricted __le32 drivers/block/rbd.c:996:20: warning: cast from restricted __le16 These are set in osd_req_encode_op() and they are le16. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@inktank.com> (cherry picked from commit `895cfcc810`)	2012-07-17 21:30:31 -07:00
Yan, Zheng	236df3755d	rbd: Fix ceph_snap_context size calculation ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@inktank.com> (cherry picked from commit `f9f9a19044`)	2012-07-17 21:30:19 -07:00
Silva Paulo	68d740d79c	blk: fix wrong idr_pre_get() error check in loop.c The idr_pre_get() function never returns a value < 0. It returns 0 (no memory) or 1 (OK). Reported-by: Silva Paulo <psdasilva@yahoo.com> [ Rewrote Silva's patch, but attributing it to Silva anyway - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-14 15:39:58 -07:00
Rafael J. Wysocki	156ffcb42a	mg_disk: Use struct dev_pm_ops for power management Make the mg_disk driver define its PM callbacks through a struct dev_pm_ops object rather than by using legacy PM hooks in struct platform_driver. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-06 19:07:00 +02:00
Andi Kleen	0cc15d03bc	floppy: Run floppy initialization asynchronous floppy_init is quite slow, 3s on my test system to determine that there is no floppy. Run it asynchronous to the other init calls to improve boot time. [jkosina@suse.cz: fix modular build] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-04 14:01:40 +02:00
Linus Torvalds	dab058fd5f	floppy: cancel any pending fd_timeouts before adding a new one In commit `070ad7e793` ("floppy: convert to delayed work and single-thread wq") the 'fd_timeout' timer was converted to a delayed work. However, the "del_timer(&fd_timeout)" was lost in the process, and any previous pending timeouts would stay active when we then re-queued the timeout. This resulted in the floppy probe sequence having a (stale) 20s timeout rather than the intended 3s timeout, and thus made booting with the floppy driver (but no actual floppy controller) take much longer than it should. Of course, there's little reason for most people to compile the floppy driver into the kernel at all, which is why most people never noticed. Canceling the delayed work where we used to do the del_timer() fixes the issue, and makes the floppy probing use the proper new timeout instead. The three second timeout is still very wasteful, but better than the 20s one. Reported-and-tested-by: Andi Kleen <ak@linux.intel.com> Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-03 15:51:22 -07:00
Sage Weil	9a64e8e0ac	Linux 3.5-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPyr4LAAoJEHm+PkMAQRiGhvMH/1uXaDmJiiyAtMhC9kQbLclK 5RpUOV+ukRrPXBJhwWGEZvC9G/DiWAfZ/19Ee6qTGZbA46yxkgZklqO+bw7fuOLH dPf4MNXdhgOgbs0KkVAk6aXIYzIU836pcYg+LcapG8E8SZp3SWbJzrVbUPFwPM+m Sv11ZcpJfM2HH9wFRdKErUOiZHsMY+LZHcw0nx+BObytjgzBbzHNkpF57F714TUO QplYpIToO3XtGhIM1yRDxww+2zFlVNsCZ8IC57EDbLb8BMZWuyZoFgWZqLAnrU0u vy7CHLledMSvs855juJ9JxGo/EDnfwJpCnjmcp8BY+h4b5T/k5mGK6d9aeXYRf4= =CcWn -----END PGP SIGNATURE----- Merge tag 'v3.5-rc1' Linux 3.5-rc1 Conflicts: net/ceph/messenger.c	2012-06-15 12:32:04 -07:00
Jens Axboe	6d407cfaf5	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus	2012-06-13 21:19:42 +02:00
Jens Axboe	987751719b	Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2012-06-13 21:19:06 +02:00
Tao Guo	32587371ad	umem: fix up unplugging Fix a regression introduced by `7eaceaccab` ("block: remove per-queue plugging"). In that patch, Jens removed the whole mm_unplug_device() function, which used to be the trigger to make umem start to work. We need to implement unplugging to make umem start to work, or I/O will never be triggered. Signed-off-by: Tao Guo <Tao.Guo@emc.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Shaohua Li <shli@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-13 21:17:21 +02:00
Lars Ellenberg	0d5934e3c2	drbd: fix null pointer dereference with on-congestion policy when diskless We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:35:19 +02:00
Lars Ellenberg	1ed25b269e	drbd: fix list corruption by failing but already aborted reads If a read is aborted due to force-detach of a supposedly unresponsive local backing device, and retried on the peer, it can happen that the local request later still completes (hopefully with an error). As it may already have been completed to upper layers meanwhile, it must not be retried again now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:34:51 +02:00
Lars Ellenberg	4eccc57979	drbd: fix access of unallocated pages and kernel panic BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:32:48 +02:00
Konrad Rzeszutek Wilk	6878c32e5c	xen/blkfront: Add WARN to deal with misbehaving backends. Part of the ring structure is the 'id' field which is under control of the frontend. The frontend stamps it with "some" value (this some in this implementation being a value less than BLK_RING_SIZE), and when it gets a response expects said value to be in the response structure. We have a check for the id field when spolling new requests but not when de-spolling responses. We also add an extra check in add_id_to_freelist to make sure that the 'struct request' was not NULL - as we cannot pass a NULL to __blk_end_request_all, otherwise that crashes (and all the operations that the response is dealing with end up with __blk_end_request_all). Lastly we also print the name of the operation that failed. [v1: s/BUG/WARN/ suggested by Stefano] [v2: Add extra check in add_id_to_freelist] [v3: Redid op_name per Jan's suggestion] [v4: add const * and add WARN on failure returns] Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-12 08:29:04 -04:00
Dan Carpenter	895cfcc810	rbd: endian bug in rbd_req_cb() Sparse complains about this because: drivers/block/rbd.c:996:20: warning: cast to restricted __le32 drivers/block/rbd.c:996:20: warning: cast from restricted __le16 These are set in osd_req_encode_op() and they are le16. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:54 -05:00
Yan, Zheng	f9f9a19044	rbd: Fix ceph_snap_context size calculation ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:53 -05:00
Asai Thambi S P	7b421d24ea	mtip32xx: Create debugfs entries for troubleshooting On module load, creates a debugfs parent 'rssd' in debugfs root. Then for each device, create a new node with corresponding disk name. Under the new node, two entries 'registers' and 'flags' are created. NOTE: These entries were removed from sysfs in the previous patch Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-05 09:13:49 +02:00
Asai Thambi S P	7412ff139d	mtip32xx: Remove 'registers' and 'flags' from sysfs This patch removes entries 'registers' and 'flags' from sysfs. Updated ABI file to reflect this change. Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-05 09:13:48 +02:00
Sachin Kamat	87c9ea76a2	mtip32xx: Remove version.h header file inclusion version.h header file inclusion is no longer required. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>	2012-06-04 10:00:32 +02:00
Asai Thambi S P	b77874c969	mtip32xx: Changes to sysfs entries * Formatted the output of 'registers' entry * Added "Commands in Q' to output of 'registers' entry * Added a new entry 'flags' Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	8ce800935d	mtip32xx: Convert macro definitions for flag bits to enum Convert macro definitions for flags bits to enum Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	377b8fc6d7	mtip32xx: minor performance tweak When checking for command completions if the register value is zero, proceed to next register. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	e602878fd8	mtip32xx: Fix to support more than one sector in exec_drive_command() Fix to support more than one sector in exec_drive_command(). Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	0a07ab224a	mtip32xx: Use plain spinlock for 'cmd_issue_lock' 'cmd_issue_lock' is for only acquiring a free slot, and it is not used in interrupt context. So replaced irq version with non-irq version of spinlock. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	6c8ab69818	mtip32xx: Set block queue boundary variables Set the following block queue boundary variables * max_hw_sectors * max_segment_size Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Removed setting of q->nr_requests. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:46:50 +02:00
Asai Thambi S P	d02e1f0ad0	mtip32xx: Fix to handle TFE for PIO(IOCTL/internal) commands If a PIO (IOCTL/internal) command resulted in TFE, signal the wait event or break out of polling. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	971890f258	mtip32xx: Change HDIO_GET_IDENTITY to return stored data For the ioctl command HDIO_GET_IDENTITY, return the stored copy of IDENTIFY DATA instead of sending the command to the device - similar to libata. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	2df7aa96e7	mtip32xx: Set custom timeouts for PIO commands This change sets custom timeouts depending on PIO command. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	6bb688c048	mtip32xx: fix clearing an incorrect register in mtip_init_port Fix clearing an incorrect register in mtip_init_port Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Konrad Rzeszutek Wilk	8c9ce606a6	xen/blkback: Copy id field when doing BLKIF_DISCARD. We weren't copying the id field so when we sent the response back to the frontend (especially with a 64-bit host and 32-bit guest), we ended up using a random value. This lead to the frontend crashing as it would try to pass to __blk_end_request_all a NULL 'struct request' (b/c it would use the 'id' to find the proper 'struct request' in its shadow array) and end up crashing: BUG: unable to handle kernel NULL pointer dereference at 000000e4 IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40 .. snip.. EIP is at __blk_end_request_all+0xc/0x40 .. snip.. [<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront] This fixes the bug by passing in the proper id for the response. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=824641 CC: stable@kernel.org Tested-by: William Dauchy <wdauchy@gmail.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-30 17:20:04 -04:00
Linus Torvalds	af56e0aa35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...	2012-05-30 11:17:19 -07:00
Linus Torvalds	a70f35af4e	Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver related changes for 3.5. It contains: - The floppy changes from Jiri. Jiri is now also marked as the maintainer of floppy.c, I shall be publically branding his forehead with red hot iron at the next opportune moment. - A batch of drbd updates and fixes from the linbit crew, as well as fixes from others. - Two small fixes for xen-blkfront courtesy of Jan." * 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits) floppy: take over maintainership floppy: remove floppy-specific O_EXCL handling floppy: convert to delayed work and single-thread wq xen-blkfront: module exit handling adjustments xen-blkfront: properly name all devices drbd: grammar fix in log message drbd: check MODULE for THIS_MODULE drbd: Restore the request restart logic drbd: introduce a bio_set to allocate housekeeping bios from drbd: remove unused define drbd: bm_page_async_io: properly initialize page->private drbd: use the newly introduced page pool for bitmap IO drbd: add page pool to be used for meta data IO drbd: allow bitmap to change during writeout from resync_finished drbd: fix race between drbdadm invalidate/verify and finishing resync drbd: fix resend/resubmit of frozen IO drbd: Ensure that data_size is not 0 before using data_size-1 as index drbd: Delay/reject other state changes while establishing a connection drbd: move put_ldev from __req_mod() to the endio callback drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE ...	2012-05-30 09:05:47 -07:00
Linus Torvalds	99262a3daf	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuv35AAoJENkgDmzRrbjxUx4P/0uc+0oNnZv11vYQsqHuhURa zMlsVdlXGVkvPqQiLY0QkrK5LcO6KiSnSk8vEnOYFIPjL4wNqL/4RRRLnTAJwmE+ lsrL9DblI8Ira/EZRv7d2L12QrP+F2ZGKOZr67uVxSaxH71fUqtiJ0jqA/I8AYH7 /V7+DgdIB1DD28Ya/JEFEUi41F08A6MU10hpaQWy9kXv09gCc9apgvH7/S3s9DaQ G640YWkoKZAx/OFBb8XFvpu9LqZcVl02Nl8goMZOKnMctC4iU3km7HeVjfwCgLjO AdA5spLMhDkS/xrpI0mSQ/wT0k0+sSYW5vEdW9N4XLZza0NgH9GfU4RtEuK85Slj 7bPviZOcpjtt0sGi4wXCaVjZyHROX6tyRvTMUAIj3D0oJglb5T9D3MCvQnadILb0 I0+7gk3d9rHqkO6CmjNaZG9IwR9NpFkbuolcFQuEaZoUMoKd2pYNQyxpbFGl+jCl 7ViFHAy+fydNqDoETKincld4A43KWxOV7jyEJd7hloKcCixsqI7ZdPS7X8amec72 a0hfNgMJzarZkTgo61Hair/d+vKGRJPgEdF1Yq76SDhYKD1TeWeDjmboctsiMjqe f5M4C6IdNJj9cDIlCxMk+3bX250oy7KG77v7Ux0/7nvtSWVa3yEMowD57hnn1But 0gNC8bjXDHRsho90rDRN =Kj9v -----END PGP SIGNATURE----- Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus Pull virtio updates from Rusty Russell. * tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix typo in comment virtio-mmio: Devices parameter parsing virtio_blk: Drop unused request tracking list virtio-blk: Fix hot-unplug race in remove method virtio: Use ida to allocate virtio index virtio: balloon: separate out common code between remove and freeze functions virtio: balloon: drop restore_common() 9p: disconnect channel when PCI device is removed virtio: update documentation to v0.9.5 of spec	2012-05-21 20:20:23 -07:00
Asias He	f65ca1dc6a	virtio_blk: Drop unused request tracking list Benchmark shows small performance improvement on fusion io device. Before: seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec After: seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:14 +09:30
Asias He	b79d866c8b	virtio-blk: Fix hot-unplug race in remove method If we reset the virtio-blk device before the requests already dispatched to the virtio-blk driver from the block layer are finised, we will stuck in blk_cleanup_queue() and the remove will fail. blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued before DEAD marking. However it will never success if the device is already stopped. We'll have q->in_flight[] > 0, so the drain will not finish. How to reproduce the race: 1. hot-plug a virtio-blk device 2. keep reading/writing the device in guest 3. hot-unplug while the device is busy serving I/O Test: ~1000 rounds of hot-plug/hot-unplug test passed with this patch. Changes in v3: - Drop blk_abort_queue and blk_abort_request - Use __blk_end_request_all to complete request dispatched to driver Changes in v2: - Drop req_in_flight - Use virtqueue_detach_unused_buf to get request dispatched to driver Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:13 +09:30
David S. Miller	17eea0df5f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-20 21:53:04 -04:00
Linus Torvalds	14e931a264	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overlow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()	2012-05-19 10:12:17 -07:00
Jens Axboe	4fd1ffaa12	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.5/drivers Philipp writes: This are the updates we have in the drbd-8.3 tree. They are intended for your "for-3.5/drivers" drivers branch. These changes include one new feature: * Allow detach from frozen backing devices with the new --force option; configurable timeout for backing devices by the new disk-timeout option And huge number of bug fixes: * Fixed a write ordering problem on SyncTarget nodes for a write to a block that gets resynced at the same time. The bug can only be triggered with a device that has a firmware that actually reorders writes to the same block * Fixed a race between disconnect and receive_state, that could cause a IO lockup * Fixed resend/resubmit for requests with disk or network timeout * Make sure that hard state changed do not disturb the connection establishing process (I.e. detach due to an IO error). When the bug was triggered it caused a retry in the connect process * Postpone soft state changes to no disturb the connection establishing process (I.e. becoming primary). When the bug was triggered it could cause both nodes going into SyncSource state * Fixed a refcount leak that could cause failures when trying to unload a protocol family modules, that was used by DRBD * Dedicated page pool for meta data IOs * Deny normal detach (as opposed to --forced) if the user tries to detach from the last UpToDate disk in the resource * Fixed a possible protocol error that could be caused by "unusual" BIOs. * Enforce the disk-timeout option also on meta-data IO operations * Implemented stable bitmap pages when we do a full write out of the bitmap * Fixed a rare compatibility issue with DRBD's older than 8.3.7 when negotiating the bio_size * Fixed a rare race condition where an empty resync could stall with if pause/unpause events happen in parallel * Made the re-establishing of connections quicker, if it got a broken pipe once. Previously there was a bug in the code caused it to waste the first successful established connection after a broken pipe event. PS: I am postponing the drbd-8.4 for mainline for one or two kernel development cycles more (the ~400 patchets set).	2012-05-18 16:20:06 +02:00
Jens Axboe	13828dec45	Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.5/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.5 in your for-3.5/drivers branch. The changes in it are rather simple - cleaning up some code and adding proper mechanism to unload without leaking memory.	2012-05-18 16:17:41 +02:00
Jiri Kosina	bfa10b8c98	floppy: remove floppy-specific O_EXCL handling Block layer now handles O_EXCL in a generic way for block devices. The semantics is however different for floppy and all other block devices, as floppy driver contains its own O_EXCL handling. The semantics for all-but-floppy bdevs is "there can be at most one O_EXCL open of this file", while for floppy bdev the semantics is "if someone has the bdev open with O_EXCL, noone else can open it". There is actual userspace-observable change in behavior because of this since commit `e525fd89d3` ("block: make blkdev_get/put() handle exclusive access") -- on kernels containing this commit, mount of /dev/fd0 causes the fd0 block device be claimed with _EXCL, preventing subsequent open(/dev/fd0). Bring things back into shape, i.e. make it possible, analogically to other block devices, to mount the floppy and open() it afterwards -- remove the floppy-specific handling and let the generic bdev code O_EXCL handling take over. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:11 +02:00
Jiri Kosina	070ad7e793	floppy: convert to delayed work and single-thread wq There are several races in floppy driver between bottom half (scheduled_work) and timers (fd_timeout, fd_timer). Due to slowness of the actual floppy devices, those races are never (at least to my knowledge) triggered on a bare floppy metal. However on virtualized (emulated) floppy drives, which are of course magnitudes faster than the real ones, these races trigger reliably. They usually exhibit themselves as NULL pointer dereferences during DMA setup, such as BUG: unable to handle kernel NULL pointer dereference at 0000000a [ ... snip ... ] EIP: 0060:[<c02053d5>] EFLAGS: 00010293 CPU: 0 EAX: ffffe000 EBX: 0000000a ECX: 00000000 EDX: 0000000a ESI: c05d2718 EDI: 00000000 EBP: 00000000 ESP: f540fe44 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=f540e000 task=c082d5a0 task.ti=c0826000) Stack: ffffe000 00001ffc 00000000 00000000 00000000 c05d2718 c0708b40 f540fe80 c020470f c05d2718 c0708b40 00000000 f540fe80 0000000a f540fee4 00000000 c0708b40 f540fee4 00000000 00000000 c020526b 00000000 c05d2718 c0708b40 Call Trace: [<c020470f>] dump_trace+0xaf/0x110 [<c020526b>] show_trace_log_lvl+0x4b/0x60 [<c0205298>] show_trace+0x18/0x20 [<c05c5811>] dump_stack+0x6d/0x72 [<c0248527>] warn_slowpath_common+0x77/0xb0 [<c02485f3>] warn_slowpath_fmt+0x33/0x40 [<f7ec593c>] setup_DMA+0x14c/0x210 [floppy] [<f7ecaa95>] setup_rw_floppy+0x105/0x190 [floppy] [<c0256d08>] run_timer_softirq+0x168/0x2a0 [<c024e762>] __do_softirq+0xc2/0x1c0 [<c02042ed>] do_softirq+0x7d/0xb0 [<f54d8a00>] 0xf54d89ff but other instances can be easily seen as well. This can be observed at least under VMWare, VirtualBox and KVM. This patch converts all the timers and bottom halfs to be processed in a single workqueue. This aproach has been already discussed back in 2010 if I remember correctly, and Acked by Linus [1], but it then never made it to the tree. This all is based on original idea and code of Stephen Hemminger. I have ported original Stepen's code to the current state of the floppy driver, and performed quite some testing (on real hardware), which didn't reveal any issues (this includes not only writing and reading data, but also formatting (unfortunately I didn't find any Double-Density disks any more)). Ability to handle errors properly (supplying known bad floppies) has also been verified. [1] http://kerneltrap.org/mailarchive/linux-kernel/2010/6/11/4582092 Based-on-patch-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:10 +02:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Josh Durgin	263c6ca007	rbd: rename __rbd_update_snaps to __rbd_refresh_header This function rereads the entire header and handles any changes in it, not just changes in snapshots. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:09 -05:00
Josh Durgin	3591538fb2	rbd: fix snapshot size type Snapshot sizes should be the same type as regular image sizes. This only affects their displayed size in sysfs, not the reported size of an actual block device sizes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:03 -05:00
Josh Durgin	b06e6a6be7	rbd: remove conditional snapid parameters The snapid parameters passed to rbd_do_op() and rbd_req_sync_op() are now always either a valid snapid or an explicit CEPH_NOSNAP. [elder@dreamhost.com: Rephrased the description] Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:58 -05:00
Josh Durgin	77dfe99fe3	rbd: store snapshot id instead of index When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:52 -05:00
Josh Durgin	403f24d3d5	rbd: protect read of snapshot sequence number This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:46 -05:00
Xi Wang	50f7c4c967	rbd: fix integer overflow in rbd_header_from_disk() ondisk->snap_count is read from disk via rbd_req_sync_read() and thus needs validation. Otherwise, a bogus `snap_count' could overflow the kmalloc() size, leading to memory corruption. Also use `u32' consistently for `snap_count'. [elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:41 -05:00
Dan Carpenter	f8ad495a8a	rbd: use gfp_flags parameter in rbd_header_from_disk() We should use the gfp_flags that the caller specified instead of GFP_KERNEL here. There is only one caller and it uses GFP_KERNEL, so this change is just a cleanup and doesn't change how the code works. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:35 -05:00
Jan Beulich	8605067fb9	xen-blkfront: module exit handling adjustments The blkdev major must be released upon exit, or else the module can't attach to devices using the same majors upon being loaded again. Also avoid leaking the minor tracking bitmap. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:54 -04:00
Jan Beulich	e77c78c022	xen-blkfront: properly name all devices - devices beyond xvdzz didn't get proper names assigned at all - extended devices with minors not representable within the kernel's major/minor bit split spilled into foreign majors Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:52 -04:00
Asai Thambi S P	a09ba13eef	mtip32xx: release the semaphore on an error path Release the semaphore in an error path in mtip_hw_get_scatterlist(). This fixes the smatch warning inconsistent returns. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Jesper Juhl	d88a440edd	dac960: Remove unused variables from DAC960_CreateProcEntries() The variables 'StatusProcEntry' and 'UserCommandProcEntry' are assigned to once and then never used. This patch gets rid of the variables. While I was there I also fixed the indentation of the function to use tabs rather than spaces for the lines that did not already do so. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Eric W. Biederman	38bf195398	connector/userns: replace netlink uses of cap_raised() with capable() In 2009 Philip Reiser notied that a few users of netlink connector interface needed a capability check and added the idiom cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise that netlink was asynchronous. In 2011 Patrick McHardy noticed we were being silly because netlink is synchronous and removed eff_cap from the netlink_skb_params and changed the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN). Looking at those spots with a fresh eye we should be calling capable(CAP_SYS_ADMIN). The only reason I can see for not calling capable is that it once appeared we were not in the same task as the caller which would have made calling capable() impossible. In the initial user_namespace the only difference between between cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets PF_SUPERPRIV if we use the capability. Since we are going to be using root privilege setting PF_SUPERPRIV seems the right thing to do. The motivation for this that patch is that in a child user namespace cap_raised(current_cap(),...) tests your capabilities with respect to that child user namespace not capabilities in the initial user namespace and thus will allow processes that should be unprivielged to use the kernel services that are only protected with cap_raised(current_cap(),..). To fix possible user_namespace issues and to just clean up the code replace cap_raised(current_cap(), CAP_SYS_ADMIN) with capable(CAP_SYS_ADMIN). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Philipp Reisner <philipp.reisner@linbit.com> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:21:39 -04:00
Lars Ellenberg	92b4ca291f	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:56 +02:00
Cong Wang	bc4854bc91	drbd: check MODULE for THIS_MODULE THIS_MODULE is NULL only when drbd is compiled as built-in, so the #ifdef CONFIG_MODULES should be #ifdef MODULE instead. This fixes the warning: drivers/block/drbd/drbd_main.c: In function ‘drbd_buildtag’: drivers/block/drbd/drbd_main.c:4187:24: warning: the comparison will always evaluate as ‘true’ for the address of ‘__this_module’ will never be NULL [-Waddress] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:54 +02:00
Philipp Reisner	f6d0a8dbfd	drbd: Restore the request restart logic It got lost with the commit `5a7bbad27a` "block: remove support for bio remapping from ->make_request" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 17:20:59 +02:00
Lars Ellenberg	9476f39d66	drbd: introduce a bio_set to allocate housekeeping bios from Don't rely on availability of bios from the global fs_bio_set, we should use our own bio_set for meta data IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:07 +02:00
Lars Ellenberg	3c2f7a856f	drbd: remove unused define Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:06 +02:00
Arne Redlich	0c7db27920	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 15:17:04 +02:00
Lars Ellenberg	4d95a10f97	drbd: use the newly introduced page pool for bitmap IO Conflicts: drbd/drbd_bitmap.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:03 +02:00
Lars Ellenberg	4281808fb3	drbd: add page pool to be used for meta data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:02 +02:00
Lars Ellenberg	0e8488ade2	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:00 +02:00
Lars Ellenberg	a574daf5d7	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:59 +02:00
Lars Ellenberg	ba280c092e	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:58 +02:00
Philipp Reisner	5de738272e	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:56 +02:00
Philipp Reisner	197296ffed	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:55 +02:00
Lars Ellenberg	46385c84ac	drbd: move put_ldev from __req_mod() to the endio callback One invocation in the endio handler is good enough, we don't need mention it for each of the different ways it calls __req_mod(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:51 +02:00
Lars Ellenberg	d64957c9a9	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Just because this request happened during a resync does not mean it may pretend to have been barrier-acked. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:50 +02:00
Lars Ellenberg	41c4a0035b	drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended READ_RETRY_REMOTE_CANCELED needs to be grouped with the other _CANCELED cases, not with CONNECTION_LOST_WHILE_PENDING, as that would complete (fail) the bio even if the device became suspended. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:48 +02:00
Lars Ellenberg	6d49e101fd	drbd: make OOS_HANDED_TO_NETWORK its own case OOS_HANDED_TO_NETWORK should not be grouped with the various _CANCELED/_FAILED cases. Also, not only clear the RQ_NET_QUEUED flag, but also mark it RQ_NET_DONE, so it can be distinguished from a local-only request even after that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:47 +02:00
Lars Ellenberg	c088b2d904	drbd: don't pretend that barrier_nr == 0 was special We used to have a barrier implementation where barrier_nr 0 was reserved. That is long gone. Just use the full sequence space. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:46 +02:00
Lars Ellenberg	7ffcaa7194	drbd: remove unused static helper function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:44 +02:00
Lars Ellenberg	a5d214f621	drbd: remove some very outdated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:43 +02:00
Lars Ellenberg	1abc2af205	drbd: missing wakeup after drbd_rs_del_all Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:42 +02:00
Lars Ellenberg	671a74e749	drbd: remove now unused seq_num member from struct drbd_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:40 +02:00
Lars Ellenberg	001a88687a	drbd: fix potential data corruption and protocol error We assumed only bios with bi_idx == 0 would end up in drbd_make_request(). That is wrong. At least device mapper, in __clone_and_map(), may submit clones only covering a partial bio, but sharing the original bvec, by adjusting bi_idx and relevant other bio members of the clone. We used __bio_for_each_segment() in various places, even though that is documented as * drivers should not use the __ version unless they _really_ want to * run through the entire bio and not just pending pieces Impact: we would send the full bio bvec, even for the clone with bi_idx > 0, which will cause data corruption on the peer (because we submit wrong data at the clone offset), and will cause a DRBD protocol error, disconnect/reconnect and resync (thus fixing the corruption), because the next package header would be expected right in the middle of the sent data, causing DRBD magic mismatch. Fix: drop the assert, and use bio_for_each_segment() instead of the __ version. Conflicts: drbd/drbd_tracing.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:39 +02:00
Philipp Reisner	b6a370ba07	drbd: Fix a potential write ordering issue on SyncTarget nodes If a SyncTarget node gets a P_RS_DATA_REPLY before a P_DATA packet for the same sector, it simply submits these two IO requests. This is be possible because on the SyncSource node, the data of the P_RS_DATA_REPLY packet was read from disk. Immediately after that a write request from upper layers came in. The disk scheduler or even the "hardware" queues on the disk drive might reorder these writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:38 +02:00
Philipp Reisner	fc28845bc0	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); stills sees C_WF_BITMAP_S, and send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:34 +02:00
Lars Ellenberg	031a7c173f	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:33 +02:00
Lars Ellenberg	47a4f1c1bb	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:32 +02:00
Philipp Reisner	7caacb69ac	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:30 +02:00
Philipp Reisner	4afc433cf8	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:29 +02:00
Lars Ellenberg	545752d5d8	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:01 +02:00
Lars Ellenberg	763eb63625	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() want's to aquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:58 +02:00
Philipp Reisner	e89868a092	drbd: Fixed an obvious copy-n-paste mistake This bug might have caused troubles if disk-barriers and the ahead-behind more are enabled at the same time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:57 +02:00
Lars Ellenberg	f479ea0661	drbd: send intermediate state change results to the peer DRBD state changes schedule after_state_ch() actions to a worker thread, which decides on the old and new states of that change, whether to send an informational state update packet (P_STATE) to the peer. If it decides to drbd_send_state(), it would however always send the _curent_ state, which, if a second state change happens before the after_state_ch() of the first ran, may "fast-forward" the peer's view about this node. In most cases that is harmless, but sometimes this can confuse DRBD, for example into not actually starting a necessary resync if you do a very tight detach/attach loop on a Connected Secondary. Fix this by always sending the "new" state of the respective state transition which scheduled this after_state_ch() work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:56 +02:00
Lars Ellenberg	a2e9138197	drbd: fix spurious meta data IO "error" When detaching, even cleanly detaching due to administrator request, we always go through D_FAILED before we become D_DISKLESS. Don't let that state change race with an in-flight meta data IO, or that one might think it actually experienced an IO error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:54 +02:00
Philipp Reisner	aaae506d54	drbd: Fixed a race condition between detach and start of resync drbd_state_lock() is only there to serialize cluster wide state changes. Testing the local disk state needs to happen while holding the global_state_lock. Otherwise you might see something like this (Oct 6 on kugel) 14:20:24 drbd0: conn( WFSyncUUID -> Connected ) disk( Inconsistent -> Failed ) 14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) 14:20:24 drbd0: conn( Connected -> SyncTarget ) disk( Failed -> Inconsistent ) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:53 +02:00
Lars Ellenberg	6a9a92f4ef	drbd: fix harmless race to not trigger an ASSERT We have one pre-allocated page to do certain synchronous meta data IO with, using it is serialized like so: drbd_md_get_buffer(); drbd_md_sync_page_io(); drbd_md_sync_page_io(); ... drbd_md_put_buffer(); In drbd_md_sync_page_io() there is an ASSERT(atomic_read(&mdev->md_io_in_use) == 1); We want to be able to timeout on unresponsive lower level devices, so we can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra reference, to not have a dangling pointer in case a delayed IO eventually does still complete, even after we "detached" already. We need to put the extra reference before we signal completion from the completion handler, or the second drbd_md_sync_page_io() above may trigger the assert (reference count still 2). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:52 +02:00
Philipp Reisner	5ba3dac521	drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:50 +02:00
Andreas Gruenbacher	7b4e4d3126	drbd: drbd_nl_resize(): Fix missing put_ldev() on error path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:49 +02:00
Lars Ellenberg	40424e4a24	drbd: fix "stalled" empty resync With sync-after dependencies, given "lucky" timing of pause/unpause events, and the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:47 +02:00
Philipp Reisner	1e86ac48af	drbd: Bugfix for the connection behavior If we get into the C_BROKEN_PIPE cstate once, the state engine set the thi->t_state of the receiver thread to restarting. But with the while loop in drbdd_init() a new connection gets established. After the call into drbdd() returns immediately since the thi->t_state is not RUNNING. The restart of drbd_init() then resets thi->t_state to RUNNING. I.e. after entering C_BROKEN_PIPE once, the next successful established connection gets wasted. The two parts of the fix: * Do not cause the thread to restart if we detect the issue with the sockets while we are in C_WF_CONNECTION. * Make sure that all actions that would have set us to C_BROKEN_PIPE happen before the state change to C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:46 +02:00
Philipp Reisner	80f9fd55a6	drbd: Cleanup all epoch objects upon connection loss Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:44 +02:00
Philipp Reisner	fd2491f4a4	drbd: detach must not try to abort non-local requests from drbd-8.4 Cherry picked form 8.4 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:43 +02:00
Philipp Reisner	79f16f5dbc	drbd: Consider that the no-data-condition could be in connected state ...when the peer has inconsistent data. In that case we failed to clear the susp_nod flag. When the local disk was attached again Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:42 +02:00
Philipp Reisner	bca482e90b	drbd: Fixed current UUID generation Now, the new edition of the clause only fires if a diskless peer gets promoted. This is a fixup for "drbd: Delayed creation of current-UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:50 +02:00
Lars Ellenberg	22f46ce2ef	drbd: change some GFP_KERNEL to GFP_NOIO Bitmap IO may happend in the context of an application write, in the generic block IO path. We need to use GFP_NOIO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:47 +02:00
Philipp Reisner	dfa8bedbfe	drbd: Implemented the disk-timeout option When the disk-timeout is active, and it expires for a single request, we consider the local disk as D_FAILED. Note: With this change, I made both timeout based state transitions HARD state transitions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:45 +02:00
Philipp Reisner	02ee8f95fa	drbd: Force flag for the detach operation Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:38 +02:00
Philipp Reisner	5ca1de0384	drbd: Allow new IOs while the local disk in in FAILED state The last bunch of commits prepared the 'detach from tar pit' feature. With that we can be for long time in disk state FAILED. We need to accept new IO requests during that time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:34 +02:00
Philipp Reisner	9e58c4dad7	drbd: Bitmap IO functions can now return prematurely if the disk breaks Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:33 +02:00
Philipp Reisner	d1f3779bbe	drbd: Added a kref to bm_aio_ctx Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:37:19 +02:00
Philipp Reisner	b2057629ea	drbd: Hold a reference to ldev while doing meta-data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:31:11 +02:00
Philipp Reisner	4a2fe568b5	drbd: Keep a reference to the bio until the completion handler finished Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:28:51 +02:00
Philipp Reisner	0c46442515	drbd: Implemented wait_until_done_or_disk_failure() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:26:51 +02:00
Philipp Reisner	e17117310b	drbd: Replaced md_io_mutex by an atomic: md_io_in_use The new function drbd_md_get_buffer() aborts waiting for the buffer in case the disk failes in the meantime. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:22:31 +02:00
Philipp Reisner	cc94c65015	drbd: moved md_io into mdev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:17:24 +02:00
Philipp Reisner	2b4dd36fba	drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:16:04 +02:00
Philipp Reisner	6d7e32f568	drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:15:28 +02:00
Philipp Reisner	6809384c71	drbd: Improve compatibility with drbd's older than 8.3.7 Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd, that we support more than 32KiB in a single data request (packet). Never believe an older drbd, that is supports more than 32KiB in a single data request (packet) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:57 +02:00
Philipp Reisner	77e8fdfc18	drbd: Only print sanitize state's warnings, if the state change happens The reason for this change is that, with when doing 'drbdadm invalidate' on a disconnected resource caused an "implicitly set pdsk from UpToDate to DUnknown" message, which was missleading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:22 +02:00
Lars Ellenberg	07667347c8	drbd: downgraded error printk to info Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:05:25 +02:00
David Howells	5f138ce01a	DRBD: Fix comparison always false warning due to long/long long compare Fix warnings of the following nature in the drbd header: In file included from drivers/block/drbd/drbd_bitmap.c:32: drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress': drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which is always false on a 32-bit machine. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 10:03:19 +02:00
Lars Ellenberg	7948bcdc38	drbd: spelling fix: too small It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:02:22 +02:00
Lars Ellenberg	1381e9a496	drbd: cosmetic: fix accidental division instead of modulo when pretty printing For large resync rates, seq_printf_with_thousands_grouping() accidentally only produced Y,000,00Y, instead of the real numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:01:39 +02:00
Philipp Reisner	ebd2b0cde5	drbd: Lower log priority for an event that is definitely not an error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 09:59:29 +02:00
Sage Weil	3469ac1aa3	ceph: drop support for preferred_osd pgs This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-05-07 15:33:36 -07:00
David S. Miller	f24001941c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Fix merge between commit `3adadc08cc` ("net ax25: Reorder ax25_exit to remove races") and commit `0ca7a4c87d` ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the later simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:15:17 -04:00
Pavel Emelyanov	4a17fd5229	sock: Introduce named constants for sk_reuse Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Linus Torvalds	c1acb0ba33	Fixes in various components: * mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). * two tiny compile warning fixes. * proper error handling in gnttab_resume * Not using VM_PFNMAP anymore to allow backends in the same domain. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPkYeqAAoJEFjIrFwIi8fJEXIIAI+PYLNMcHTc4bxa6pErpKaS rq5eCXL9+EaZOwTUqHRJjfrjnlAc+BWO8lN0H41oRQWFYh14hgfUVJ+ziEujb1kw N1eTMVHnH/XRJV6rIFX+TiBasnyoMmNfWEAb45UL1nEUTMPL1Jv7AiRY/GxUlHyg M+uFG52KP3ytXxcIiGW6pYEqJd6UgWrqnclaeg5TR5zvDlWfJbUIBEMQ/PyV0WSS 4e7biiwi4XPWT2f1qewOmI+3r68CltU3GAs1XxjcSX+bYYuh00UtY39AsBWo2N8I 1VORuq0QPs+GB22r3e47IqBcjXkBGRIf6w1e/5a6WLiq7TqVq4bYgCGUmWT8V7o= =olCU -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: - mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). - two tiny compile warning fixes. - proper error handling in gnttab_resume - Not using VM_PFNMAP anymore to allow backends in the same domain. * tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe" xen/resume: Fix compile warnings. xen/xenbus: Add quirk to deal with misconfigured backends. xen/blkback: Fix warning error. xen/p2m: m2p_find_override: use list_for_each_entry_safe xen/gntdev: do not set VM_PFNMAP xen/grant-table: add error-handling code on failure of gnttab_resume	2012-04-20 11:31:00 -07:00
Konrad Rzeszutek Wilk	a71e23d992	xen/blkback: Fix warning error. drivers/block/xen-blkback/xenbus.c: In function 'xen_blkbk_discard': drivers/block/xen-blkback/xenbus.c:419:4: warning: passing argument 1 of 'dev_warn' makes pointer from integer without a cast +[enabled by default] include/linux/device.h:894:5: note: expected 'const struct device *' but argument is of type 'long int' It is unclear how that mistake made it in. It surely is wrong. Acked-by: Jens Axboe <axboe@kernel.dk> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-18 15:54:08 -04:00
Linus Torvalds	cdd5983063	virtio: fixes on top of 3.4-rc2 Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no concensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPio1GAAoJECgfDbjSjVRpGDAH/3C/bXm9mriuNauRHwktHgJe gmh2BfUgnxly6vheuz0Fv61lTe6V8kekHVolbUYwAUgXeWEKK1C59xehrMGRIPDG 1XUiti50U3P+skhIfrbkS5nZ7L+5Hk0ToQ6dd9v0BM2GxDOvgwidlY1bZe+wJEZf Lvl6w/djBCr1e3k4qfRnpTcdJJ4FnOjGbikLQhSTGfUXeNo6uWS1hljYWnAhzFkd 1xU8h5PP0TDR0nYb80CeB+9Lxw0w4qyNPJIBhNN6ucB/1U6R+55HpEpmrLUkn910 sEFEFsc0cRVWr8FiOTlmzxLHnwTc8AY/Bsp9TMSmnTRu3ZQcoQMTQQCczRj04xI= =VmpJ -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael S. Tsirkin: "Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no concensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers." * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix handling of PAGE_SIZE != 4k virtio_balloon: Fix endian bug virtio_blk: helper function to format disk names tools/virtio: fix up vhost/test module build	2012-04-16 18:34:12 -07:00
Linus Torvalds	c104f1fa1e	Merge branch 'for-3.4/drivers' of git://git.kernel.dk/linux-block Pull block driver bits from Jens Axboe: - A series of fixes for mtip32xx. Most from Asai at Micron, but also one from Greg, getting rid of the dependency on PCIE_HOTPLUG. - A few bug fixes for xen-blkfront, and blkback. - A virtio-blk fix for Vivek, making resize actually work. - Two fixes from Stephen, making larger transfers possible on cciss. This is needed for tape drive support. * 'for-3.4/drivers' of git://git.kernel.dk/linux-block: block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy mtip32xx: dump tagmap on failure mtip32xx: fix handling of commands in various scenarios mtip32xx: Shorten macro names mtip32xx: misc changes mtip32xx: Add new sysfs entry 'status' mtip32xx: make setting comp_time as common mtip32xx: Add new bitwise flag 'dd_flag' mtip32xx: fix error handling in mtip_init() virtio-blk: Call revalidate_disk() upon online disk resize xen/blkback: Make optional features be really optional. xen/blkback: Squash the discard support for 'file' and 'phy' type. mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() cciss: Fix scsi tape io with more than 255 scatter gather elements cciss: Initialize scsi host max_sectors for tape drive support xen-blkfront: make blkif_io_lock spinlock per-device xen/blkfront: don't put bdev right after getting it xen-blkfront: use bitmap_set() and bitmap_clear() xen/blkback: Enable blkback on HVM guests xen/blkback: use grant-table.c hypercall wrappers	2012-04-13 18:45:13 -07:00
Ren Mingxin	c0aa3e0916	virtio_blk: helper function to format disk names The current virtio block's naming algorithm just supports 18278 (26^3 + 26^2 + 26) disks. If there are more virtio blocks, there will be disks with the same name. Based on commit `3e1a7ff8a0`, add a function "virtblk_name_format()" for virtio block to support mass of disks naming. Notes: - Our naming scheme is ugly. We are stuck with it for virtio but don't use it for any new driver: new drivers should name their devices PREFIX%d where the sequence number can be allocated by ida - sd_format_disk_name has exactly the same logic. Moving it to a central place was deferred over worries that this will make people keep using the legacy naming in new drivers. We kept code idential in case someone wants to deduplicate later. Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com> Acked-by: Asias He <asias@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-04-12 10:37:05 +03:00
Greg Kroah-Hartman	6363480651	block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy This removes the HOTPLUG_PCI_PCIE dependency on the driver and makes it depend on PCI. Cc: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-12 08:47:05 +02:00
Asai Thambi S P	95fea2f1d9	mtip32xx: dump tagmap on failure Dump tagmap on failure, instead of individual tags. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	c74b0f586f	mtip32xx: fix handling of commands in various scenarios * If a ncq command time out and a non-ncq command is active, skip restart port * Queue(pause) ncq commands during operations spanning more than one non-ncq commands - secure erase, download microcode * When a non-ncq command is active, allow incoming non-ncq commands to wait instead of failing back * Changed timeout for download microcode and smart commands * If the device in write protect mode, fail all writes (do not send to device) * Set maximum retries to 2 Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	8a857a880b	mtip32xx: Shorten macro names Shortened macros used to represent mtip_port->flags and dd->dd_flag Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	8182b49528	mtip32xx: misc changes * Handle the interrupt completion of polled internal commands * Do not check remove pending flag for standby command * On rebuild failure, - set corresponding bit dd_flag - do not send standby command * Free ida index in remove path Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	f65872177d	mtip32xx: Add new sysfs entry 'status' * Add support for detecting the following device status - write protect - over temp (thermal shutdown) * Add new sysfs entry 'status', possible values - online, write_protect, thermal_shutdown * Add new file 'sysfs-block-rssd' to document ABI (Reported-by: Greg Kroah-Hartman) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	dad40f16ff	mtip32xx: make setting comp_time as common Moved setting completion time into mtip_issue_ncq_command() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	45038367c2	mtip32xx: Add new bitwise flag 'dd_flag' * Merged the following flags into one variable 'dd_flag': * drv_cleanup_done * resumeflag * Added the following flags into 'dd_flag' * remove pending * init done * Removed 'ftlrebuildflag' (similar flag is already part of mti_port->flags) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Linus Torvalds	9479f0f801	Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPfyVUAAoJEFjIrFwIi8fJUjUH/jbY5JavRqSlNELZW2A4Ta76 8p00LqLHw/C56iHZcWKke8mqtWNb+ZfcQt7ZYcxDIYa4QWBL28x0OLAO2tOBIt37 ZjYESWSdFJaJvmpADluWtFyGyZ9TYJllDTBm/jWj1ZtKSZvR1YkhuMXCS0f4AmGQ xFzSWJZUDdiOAqpN+VQD8wP00gfR8knQLg16XE2fvFdQo4XwpCtqLfHV/5pMMGdy Cs/ep6rq/7cdv/nshKOcBnw7RW8l3Xoi/28ht8k3DvAQ2VtFq1Tugv2G9pcCHwQG DIBkB3SOU6/v6P5at5+egKS5xR1fJetCWlkMd8kkbcdz2NPI4UDMkvOW6Q8yQls= =6Ve+ -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code." * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success xen/pciback: fix XEN_PCI_OP_enable_msix result xen/smp: Remove unnecessary call to smp_processor_id() xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' xen: only check xen_platform_pci_unplug if hvm	2012-04-06 17:54:53 -07:00
Igor Mammedov	e95ae5a493	xen: only check xen_platform_pci_unplug if hvm commit b9136d207f08 xen: initialize platform-pci even if xen_emul_unplug=never breaks blkfront/netfront by not loading them because of xen_platform_pci_unplug=0 and it is never set for PV guest. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-06 12:12:52 -04:00
Alex Elder	cd9d9f5df6	rbd: don't hold spinlock during messenger flush A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-04-05 15:43:58 -05:00
Ryosuke Saito	6d27f09a63	mtip32xx: fix error handling in mtip_init() Ensure that block device is properly unregistered, if pci_register_driver() fails. Signed-off-by: Ryosuke Saito <raitosyo@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-05 08:09:34 -06:00
Len Brown	f6365201d8	x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility The X86_32-only disable_hlt/enable_hlt mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT. This workaround was commented in the code as: "disable hlt during certain critical i/o operations" "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove." H. Peter Anvin additionally adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much hhigher noise levels you got when Linux used HLT caused some of these systems to fail. They were by far in the minority even back then." Alan Cox further says: "Also for the Cyrix 5510 which tended to go castors up if a HLT occurred during a DMA cycle and on a few other boxes HLT during DMA tended to go astray. Do we care ? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use." So, let's finally drop this. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephen Hemminger <shemminger@vyatta.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org [ If anyone cares then alternative instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-03-30 08:50:27 +02:00
Vivek Goyal	e9986f303d	virtio-blk: Call revalidate_disk() upon online disk resize If a virtio disk is open in guest and a disk resize operation is done, (virsh blockresize), new size is not visible to tools like "fdisk -l". This seems to be happening as we update only part->nr_sects and not bdev->bd_inode size. Call revalidate_disk() which should take care of it. I tested growing disk size of already open disk and it works for me. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-29 10:09:44 +02:00
Linus Torvalds	532bfc851a	Merge branch 'akpm' (Andrew's patch-bomb) Merge third batch of patches from Andrew Morton: - Some MM stragglers - core SMP library cleanups (on_each_cpu_mask) - Some IPI optimisations - kexec - kdump - IPMI - the radix-tree iterator work - various other misc bits. "That'll do for -rc1. I still have ~10 patches for 3.4, will send those along when they've baked a little more." * emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits) backlight: fix typo in tosa_lcd.c crc32: add help text for the algorithm select option mm: move hugepage test examples to tools/testing/selftests/vm mm: move slabinfo.c to tools/vm mm: move page-types.c from Documentation to tools/vm selftests/Makefile: make `run_tests' depend on `all' selftests: launch individual selftests from the main Makefile radix-tree: use iterators in find_get_pages* functions radix-tree: rewrite gang lookup using iterator radix-tree: introduce bit-optimized iterator fs/proc/namespaces.c: prevent crash when ns_entries[] is empty nbd: rename the nbd_device variable from lo to nbd pidns: add reboot_pid_ns() to handle the reboot syscall sysctl: use bitmap library functions ipmi: use locks on watchdog timeout set on reboot ipmi: simplify locking ipmi: fix message handling during panics ipmi: use a tasklet for handling received messages ipmi: increase KCS timeouts ipmi: decrease the IPMI message transaction time in interrupt mode ...	2012-03-28 17:19:28 -07:00
Wanlong Gao	f4507164e7	nbd: rename the nbd_device variable from lo to nbd rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name copied from loop.c. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-28 17:14:37 -07:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
Linus Torvalds	47b816ff7d	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull a few more things for powerpc by Benjamin Herrenschmidt: - Anton's did some recent improvements to EPOW event reporting on pSeries (power supply failures and such). The patches are self contained enough and replace really nasty code so I felt it should still go in - I did the vio driver registration change Greg requested, I don't see the point of leaving that til the next merge window - The remaining EEH changes I said were still pending to get rid of the EEH references from the generic struct device_node - A few more iSeries removal bits - A perf bug fix on 970 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/perf: Fix instruction address sampling on 970 and Power4 powerpc+sparc/vio: Modernize driver registration powerpc: Random little legacy iSeries removal tidy ups powerpc: Remove NO_IRQ_IGNORE powerpc/pseries: Cut down on enthusiastic use of defines in RAS code powerpc/pseries: Clean up ras_error_interrupt code powerpc/pseries: Remove RTAS_POWERMGM_EVENTS powerpc/pseries: Use rtas_get_sensor in RAS code powerpc/pseries: Parse and handle EPOW interrupts powerpc: Make function that parses RTAS error logs global powerpc/eeh: Retrieve PHB from global list powerpc/eeh: Remove eeh information from pci_dn powerpc/eeh: Remove eeh device from OF node	2012-03-28 14:41:36 -07:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
Linus Torvalds	56b59b429b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...	2012-03-28 10:01:29 -07:00
Benjamin Herrenschmidt	cb52d8970e	powerpc+sparc/vio: Modernize driver registration This makes vio_register_driver() get the module owner & name at compile time like PCI drivers do, and adds a name pointer directly in struct vio_driver to avoid having to explicitly initialize the embedded struct device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David S. Miller <davem@davemloft.net>	2012-03-28 11:33:24 +11:00
Jens Axboe	6674fb79ca	Merge branch 'stable/for-jens-3.4-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.4/drivers Konrad writes: I've two small fixes for the xen-blkback - and I think one more will show up eventually (a partial revert), but not sure when. So in the spirit of keeping the patches flowing, please git pull the following branch.	2012-03-26 09:13:14 +02:00
Linus Torvalds	e22057c859	One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAABAgAGBQJPbc3DAAoJEFjIrFwIi8fJTQkIAMnH2fPhcHAb4mNaz+3gdmsZ Flo6V1gMBcO8xKZlUkFgKKPYoOm7lLmvoceXLVSH5oOKSnSJo1zSinzKmcdJQo/D kPo4/EguNwtzcAcQh2dmT6/IM9O3ihMKUli7Oajif9PLCFFFqTaG3Y3YNBo/rxTY D3HAnJrIfmIyG0NpLnaFCWhCzUvcB4M7ysutECqcF8l5gnbHxRVeCKD0blM+n9GH Wyum00dQCwo6h6wTduhPOAxHAM4rncyR3heOB2vDxq9YJHSUhhcva5QCgQ+tdUVt 6U2TQT1L2Px8iXXzr2w9YBpepOVajZReoKhajLjJ5VbkpBZFz5dVNfJ8LpF8RV8= =z8IB -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull more xen updates from Konrad Rzeszutek Wilk: "One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code." * tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/acpi: Fix Kconfig dependency on CPU_FREQ xen: initialize platform-pci even if xen_emul_unplug=never xen/smp: Fix bringup bug in AP code. xen/acpi: Remove the WARN's as they just create noise. xen/tmem: cleanup xen: support pirq_eoi_map xen/acpi-processor: Do not depend on CPU frequency scaling drivers. xen/cpufreq: Disable the cpu frequency scaling drivers from loading. provide disable_cpufreq() function to disable the API.	2012-03-24 12:20:25 -07:00
Konrad Rzeszutek Wilk	3389bb8bf7	xen/blkback: Make optional features be really optional. They were using the xenbus_dev_fatal() function which would change the state of the connection immediately. Which is not what we want when we advertise optional features. So make 'feature-discard','feature-barrier','feature-flush-cache' optional. Suggested-by: Jan Beulich <JBeulich@suse.com> [v1: Made the discard function void and static] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:36 -04:00
Konrad Rzeszutek Wilk	4dae76705f	xen/blkback: Squash the discard support for 'file' and 'phy' type. The only reason for the distinction was for the special case of 'file' (which is assumed to be loopback device), was to reach inside the loopback device, find the underlaying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case and we now based the discard support based on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'. CC: Li Dongyang <lidongyang@novell.com> Acked-by: Jan Beulich <JBeulich@suse.com> [v1: Dropping pointless initializer and keeping blank line] [v2: Remove the kfree as it is not used anymore] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:35 -04:00
Oleg Nesterov	70834d3070	usermodehelper: use UMH_WAIT_PROC consistently A few call_usermodehelper() callers use the hardcoded constant instead of the proper UMH_WAIT_PROC, fix them. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Januszewski <spock@gentoo.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-23 16:58:41 -07:00
Asai Thambi S P	22be2e6e13	mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() This patch includes two changes: * fix incorrect value set for drv_cleanup_done * re-initialize and start port in mtip_restart_port() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-23 12:33:03 +01:00
Stephen M. Cameron	bc67f63650	cciss: Fix scsi tape io with more than 255 scatter gather elements The total number of scatter gather elements in the CISS command used by the scsi tape code was being cast to a u8, which can hold at most 255 scatter gather elements. It should have been cast to a u16. Without this patch the command gets rejected by the controller since the total scatter gather count did not add up to the right value resulting in an i/o error. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:09 +01:00
Stephen M. Cameron	395d287526	cciss: Initialize scsi host max_sectors for tape drive support The default is too small (1024 blocks), use h->cciss_max_sectors (8192 blocks) Without this change, if you try to set the block size of a tape drive above 512*1024, via "mt -f /dev/st0 setblk nnn" where nnn is greater than 524288, it won't work right. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:08 +01:00
Josh Durgin	c666601a93	rbd: move snap_rwsem to the device, rename to header_rwsem A new temporary header is allocated each time the header changes, but only the changed properties are copied over. We don't need a new semaphore for each header update. This addresses http://tracker.newdream.net/issues/2174 Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:52 -05:00
Alex Elder	32eec68d2f	rbd: don't drop the rbd_id too early Currently an rbd device's id is released when it is removed, but it is done before the code is run to clean up sysfs-related files (such as /sys/bus/rbd/devices/1). It's possible that an rbd is still in use after the rbd_remove() call has been made. It's essentially the same as an active inode that stays around after it has been removed--until its final close operation. This means that the id shows up as free for reuse at a time it should not be. The effect of this was seen by Jens Rehpoehler, who: - had a filesystem mounted on an rbd device - unmapped that filesystem (without unmounting) - found that the mount still worked properly - but hit a panic when he attempted to re-map a new rbd device This re-map attempt found the previously-unmapped id available. The subsequent attempt to reuse it was met with a panic while attempting to (re-)install the sysfs entry for the new mapped device. Fix this by holding off "putting" the rbd id, until the rbd_device release function is called--when the last reference is finally dropped. Note: This fixes: http://tracker.newdream.net/issues/1907 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	593a9e7b34	rbd: small changes Here is another set of small code tidy-ups: - Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic names throughout. Tell the blk_queue system our physical block size, in the (unlikely) event we want to use something other than the default. - Delete the definition of struct rbd_info, which is never used. - Move the definition of dev_to_rbd() down in its source file, just above where it gets first used, and change its name to dev_to_rbd_dev(). - Replace an open-coded operation in rbd_dev_release() to use dev_to_rbd_dev() instead. - Calculate the segment size for a given rbd_device just once in rbd_init_disk(). - Use the '%zd' conversion specifier in rbd_snap_size_show(), since the value formatted is a size_t. - Switch to the '%llu' conversion specifier in rbd_snap_id_show(). since the value formatted is unsigned. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	00f1f36ffa	rbd: do some refactoring A few blocks of code are rearranged a bit here: - In rbd_header_from_disk(): - Don't bother computing snap_count until we're sure the on-disk header starts with a good signature. - Move a few independent lines of code so they are after a check for a failed memory allocation. - Get rid of unnecessary local variable "ret". - Make a few other changes in rbd_read_header(), similar to the above--just moving things around a bit while preserving the functionality. - In rbd_rq_fn(), just assign rq in the while loop's controlling expression rather than duplicating it before and at the end of the loop body. This allows the use of "continue" rather than "goto next" in a number of spots. - Rearrange the logic in snap_by_name(). End result is the same. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	fed4c143ba	rbd: fix module sysfs setup/teardown code Once rbd_bus_type is registered, it allows an "add" operation via the /sys/bus/rbd/add bus attribute, and adding a new rbd device that way establishes a connection between the device and rbd_root_dev. But rbd_root_dev is not registered until after the rbd_bus_type registration is complete. This could (in principle anyway) result in an invalid state. Since rbd_root_dev has no tie to rbd_bus_type we can reorder these two initializations and never be faced with this scenario. In addition, unregister the device in the event the bus registration fails at module init time. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	7ef3214af2	rbd: don't allocate mon_addrs buffer in rbd_add() The mon_addrs buffer in rbd_add is used to hold a copy of the monitor IP addresses supplied via /sys/bus/rbd/add. That is passed to rbd_get_client(), which never modifies it (nor do any of the functions it gets passed to thereafter)--the mon_addr parameter to rbd_get_client() is a pointer to constant data, so it can't be modifed. Furthermore, rbd_get_client() has the length of the mon_addrs buffer and that is used to ensure nothing goes beyond its end. Based on all this, there is no reason that a buffer needs to be used to hold a copy of the mon_addrs provided via /sys/bus/rbd/add. Instead, the location within that passed-in buffer can be provided, along with the length of the "token" therein which represents the monitor IP's. A small change to rbd_add_parse_args() allows the address within the buffer to be passed back, and the length is already returned. This now means that, at least from the perspective of this interface, there is no such thing as a list of monitor addresses that is too long. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	5214ecc45c	rbd: have rbd_parse_args() report found mon_addrs size The argument parsing routine already computes the size of the mon_addrs buffer it extracts from the "command." Pass it to the caller so it can use it to provide the length to rbd_get_client(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	81a8979378	rbd: do a few checks at build time This is a bit gratuitous, but there are a few things that can be verified at build time rather than run time, so do that. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	e28fff268e	rbd: don't use sscanf() in rbd_add_parse_args() Make use of a few simple helper routines to parse the arguments rather than sscanf(). This will treat both missing and too-long arguments as invalid input (rather than silently truncating the input in the too-long case). In time this can also be used by rbd_add() to use the passed-in buffer in place, rather than copying its contents into new buffers. It appears to me that the sscanf() previously used would not correctly handle a supplied snapshot--the two final "%s" conversion specifications were not separated by a space, and I'm not sure how sscanf() handles that situation. It may not be well-defined. So that may be a bug this change fixes (but I didn't verify that). The sizes of the mon_addrs and options buffers are now passed to rbd_add_parse_args(), so they can be supplied to copy_token(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	a725f65e52	rbd: encapsulate argument parsing for rbd_add() Move the code that parses the arguments provided to rbd_add() (which are supplied via /sys/bus/rbd/add) into a separate function. Also rename the "mon_dev_name" variable in rbd_add() to be "mon_addrs". The variable represents a list of one or more comma-separated monitor IP addresses, each with an optional port number. I think "mon_addrs" captures that notion a little better. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:48 -05:00
Alex Elder	27cc25943f	rbd: simplify error handling in rbd_add() If a couple pointers are initialized to NULL then a single "out_nomem" label can be used for all of the memory allocation failure cases in rbd_add(). Also, get rid of the "irc" local variable there. There is no real need for "rc" to be type ssize_t, and it can be used in the spot "irc" was. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	60571c7d55	rbd: reduce memory used for rbd_dev fields The length of the string containing the monitor address specification(s) will never exceed the length of the string passed in to rbd_add(). The same holds true for the ceph + rbd options string. So reduce the amount of memory allocated for these to that length rather than the maximum (1024 bytes). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	d720bcb0a8	rbd: have rbd_get_client() return a rbd_client Since rbd_get_client() currently returns an error code. It assigns the rbd_client field of the rbd_device structure it is passed if successful. Instead, have it return the created rbd_client structure and return a pointer-coded error if there is an error. This makes the assignment of the client pointer more obvious at the call site. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	f0f8cef5a3	rbd: a few simple changes Here are a few very simple cleanups: - Add a "RBD_" prefix to the two driver name string definitions. - Move the definition of struct rbd_request below struct rbd_req_coll to avoid the need for an empty declaration of the latter. - Move and group the definitions of rbd_root_dev_release() and rbd_root_dev, as well as rbd_bus_type and rbd_bus_attrs[], close to the top of the file. Arrange the latter so rbd_bus_type.bus_attrs can be initialized statically. - Get rid of an unnecessary local variable in rbd_open(). - Rework some hokey logic in rbd_bus_add_dev(), so the value of "ret" at the end is either 0 or -ENOENT to avoid the need for the code duplication that was there. - Rename a goto target in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	432b858749	rbd: rename "node_lock" The spinlock used to protect rbd_client_list is named "node_lock". Rename it to "rbd_client_list_lock" to make it more obvious what it's for. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	bc534d86be	rbd: move ctl_mutex lock inside rbd_client_create() Since rbd_client_create() is only called in one place, move the acquisition of the mutex around that call inside that function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d97081b0c7	rbd: move ctl_mutex lock inside rbd_get_client() Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for the next patch.) Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e6994d3dde	rbd: release client list lock sooner In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d184f6bfde	rbd: restore previous rbd id sequence behavior It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id is used. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd id's though, and this change makes ids get allocated as they were before--each new id is one more than the maximum currently in use. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	499afd5b8e	rbd: tie rbd_dev_list changes to rbd_id operations The only time entries are added to or removed from the global rbd_dev_list is exactly when a "put" or "get" operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e124a82f3c	rbd: protect the rbd_dev_list with a spinlock The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	1ddbe94eda	rbd: rework calculation of new rbd id's In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things, I just think this commit just makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	b7f23c361b	rbd: encapsulate new rbd id selection Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Josh Durgin	cc9d734c3d	rbd: use a single value of snap_name to mean no snap There's already a constant for this anyway. Since rbd_header_set_snap() is only used to set the rbd device snap_name field, just do that within that function rather than having it take the snap_name as an argument. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net> v2: Changed interface rbd_header_set_snap() so it explicitly updates the snap_name in the rbd_device. Also added a BUILD_BUG_ON() to verify the size of the snap_name field is sufficient for SNAP_HEAD_NAME.	2012-03-22 10:47:47 -05:00
Alex Elder	1dbb439913	rbd: do not duplicate ceph_client pointer in rbd_device The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy only. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	ee57741c52	rbd: make ceph_parse_options() return a pointer ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	2107978668	rbd: a few small cleanups Some minor cleanups in "drivers/block/rbd.c: - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place if "96" in the definition of RBD_MAX_MD_NAME_LEN. - Use DEFINE_SPINLOCK() to define and initialize node_lock. - Drop a needless (char *) cast in parse_rbd_opts_token(). - Make a few minor formatting changes. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:46 -05:00
Igor Mammedov	b9136d207f	xen: initialize platform-pci even if xen_emul_unplug=never When xen_emul_unplug=never is specified on kernel command line reading files from /sys/hypervisor is broken (returns -EBUSY). It is caused by xen_bus dependency on platform-pci and platform-pci isn't initialized when xen_emul_unplug=never is specified. Fix it by allowing platform-pci to ignore xen_emul_unplug=never, and do not intialize xen_[blk\|net]front instead. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-22 11:37:11 -04:00
Linus Torvalds	5375871d43	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc merge from Benjamin Herrenschmidt: "Here's the powerpc batch for this merge window. It is going to be a bit more nasty than usual as in touching things outside of arch/powerpc mostly due to the big iSeriesectomy :-) We finally got rid of the bugger (legacy iSeries support) which was a PITA to maintain and that nobody really used anymore. Here are some of the highlights: - Legacy iSeries is gone. Thanks Stephen ! There's still some bits and pieces remaining if you do a grep -ir series arch/powerpc but they are harmless and will be removed in the next few weeks hopefully. - The 'fadump' functionality (Firmware Assisted Dump) replaces the previous (equivalent) "pHyp assisted dump"... it's a rewrite of a mechanism to get the hypervisor to do crash dumps on pSeries, the new implementation hopefully being much more reliable. Thanks Mahesh Salgaonkar. - The "EEH" code (pSeries PCI error handling & recovery) got a big spring cleaning, motivated by the need to be able to implement a new backend for it on top of some new different type of firwmare. The work isn't complete yet, but a good chunk of the cleanups is there. Note that this adds a field to struct device_node which is not very nice and which Grant objects to. I will have a patch soon that moves that to a powerpc private data structure (hopefully before rc1) and we'll improve things further later on (hopefully getting rid of the need for that pointer completely). Thanks Gavin Shan. - I dug into our exception & interrupt handling code to improve the way we do lazy interrupt handling (and make it work properly with "edge" triggered interrupt sources), and while at it found & fixed a wagon of issues in those areas, including adding support for page fault retry & fatal signals on page faults. - Your usual random batch of small fixes & updates, including a bunch of new embedded boards, both Freescale and APM based ones, etc..." I fixed up some conflicts with the generalized irq-domain changes from Grant Likely, hopefully correctly. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits) powerpc/ps3: Do not adjust the wrapper load address powerpc: Remove the rest of the legacy iSeries include files powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces init: Remove CONFIG_PPC_ISERIES powerpc: Remove FW_FEATURE ISERIES from arch code tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable powerpc/spufs: Fix double unlocks powerpc/5200: convert mpc5200 to use of_platform_populate() powerpc/mpc5200: add options to mpc5200_defconfig powerpc/mpc52xx: add a4m072 board support powerpc/mpc5200: update mpc5200_defconfig to fit for charon board Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board MAINTAINERS: Update PowerPC 4xx tree powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board powerpc: document the FSL MPIC message register binding powerpc: add support for MPIC message register API powerpc/fsl: Added aliased MSIIR register address to MSI node in dts powerpc/85xx: mpc8548cds - add 36-bit dts ...	2012-03-21 18:55:10 -07:00
Linus Torvalds	9f3938346a	Merge branch 'kmap_atomic' of git://github.com/congwang/linux Pull kmap_atomic cleanup from Cong Wang. It's been in -next for a long time, and it gets rid of the (no longer used) second argument to k[un]map_atomic(). Fix up a few trivial conflicts in various drivers, and do an "evil merge" to catch some new uses that have come in since Cong's tree. * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits) feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] drbd: remove the second argument of k[un]map_atomic() zcache: remove the second argument of k[un]map_atomic() gma500: remove the second argument of k[un]map_atomic() dm: remove the second argument of k[un]map_atomic() tomoyo: remove the second argument of k[un]map_atomic() sunrpc: remove the second argument of k[un]map_atomic() rds: remove the second argument of k[un]map_atomic() net: remove the second argument of k[un]map_atomic() mm: remove the second argument of k[un]map_atomic() lib: remove the second argument of k[un]map_atomic() power: remove the second argument of k[un]map_atomic() kdb: remove the second argument of k[un]map_atomic() udf: remove the second argument of k[un]map_atomic() ubifs: remove the second argument of k[un]map_atomic() squashfs: remove the second argument of k[un]map_atomic() reiserfs: remove the second argument of k[un]map_atomic() ocfs2: remove the second argument of k[un]map_atomic() ntfs: remove the second argument of k[un]map_atomic() ...	2012-03-21 09:40:26 -07:00
Linus Torvalds	69a7aebcf0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "It's indeed trivial -- mostly documentation updates and a bunch of typo fixes from Masanari. There are also several linux/version.h include removals from Jesper." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits) kcore: fix spelling in read_kcore() comment constify struct pci_dev * in obvious cases Revert "char: Fix typo in viotape.c" init: fix wording error in mm_init comment usb: gadget: Kconfig: fix typo for 'different' Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c" writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header writeback: fix typo in the writeback_control comment Documentation: Fix multiple typo in Documentation tpm_tis: fix tis_lock with respect to RCU Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c" Doc: Update numastat.txt qla4xxx: Add missing spaces to error messages compiler.h: Fix typo security: struct security_operations kerneldoc fix Documentation: broken URL in libata.tmpl Documentation: broken URL in filesystems.tmpl mtd: simplify return logic in do_map_probe() mm: fix comment typo of truncate_inode_pages_range power: bq27x00: Fix typos in comment ...	2012-03-20 21:12:50 -07:00
Linus Torvalds	ed378a52da	USB merge for 3.4-rc1 Here's the big USB merge for the 3.4-rc1 merge window. Lots of gadget driver reworks here, driver updates, xhci changes, some new drivers added, usb-serial core reworking to fix some bugs, and other various minor things. There are some patches touching arch code, but they have all been acked by the various arch maintainers. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAk9njL8ACgkQMUfUDdst+ylQ9wCfbBOnIT01lGOorkaE9pom0hhk HfMAoKq1xzCR2B+OS3UMyUQffk+Ri9Ri =KIQ2 -----END PGP SIGNATURE----- Merge tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB merge for 3.4-rc1 from Greg KH: "Here's the big USB merge for the 3.4-rc1 merge window. Lots of gadget driver reworks here, driver updates, xhci changes, some new drivers added, usb-serial core reworking to fix some bugs, and other various minor things. There are some patches touching arch code, but they have all been acked by the various arch maintainers." * tag 'usb-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (302 commits) net: qmi_wwan: add support for ZTE MF820D USB: option: add ZTE MF820D usb: gadget: f_fs: Remove lock is held before freeing checks USB: option: make interface blacklist work again usb/ub: deprecate & schedule for removal the "Low Performance USB Block" driver USB: ohci-pxa27x: add clk_prepare/clk_unprepare calls USB: use generic platform driver on ath79 USB: EHCI: Add a generic platform device driver USB: OHCI: Add a generic platform device driver USB: ftdi_sio: new PID: LUMEL PD12 USB: ftdi_sio: add support for FT-X series devices USB: serial: mos7840: Fixed MCS7820 device attach problem usb: Don't make USB_ARCH_HAS_{XHCI,OHCI,EHCI} depend on USB_SUPPORT. usb gadget: fix a section mismatch when compiling g_ffs with CONFIG_USB_FUNCTIONFS_ETH USB: ohci-nxp: Remove i2c_write(), use smbus USB: ohci-nxp: Support for LPC32xx USB: ohci-nxp: Rename symbols from pnx4008 to nxp USB: OHCI-HCD: Rename ohci-pnx4008 to ohci-nxp usb: gadget: Kconfig: fix typo for 'different' usb: dwc3: pci: fix another failure path in dwc3_pci_probe() ...	2012-03-20 11:26:30 -07:00
Cong Wang	589973a704	drbd: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:29 +08:00
Cong Wang	cfd8005c99	block: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:16 +08:00
Steven Noonan	3467811e26	xen-blkfront: make blkif_io_lock spinlock per-device This patch moves the global blkif_io_lock to the per-device structure. The spinlock seems to exists for two reasons: to disable IRQs when in the interrupt handlers for blkfront, and to protect the blkfront VBDs when a detachment is requested. Having a global blkif_io_lock doesn't make sense given the use case, and it drastically hinders performance due to contention. All VBDs with pending IOs have to take the lock in order to get work done, which serializes everything pretty badly. Signed-off-by: Steven Noonan <snoonan@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Andrew Jones	dad5cf659b	xen/blkfront: don't put bdev right after getting it We should hang onto bdev until we're done with it. Signed-off-by: Andrew Jones <drjones@redhat.com> [v1: Fixed up git commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Akinobu Mita	34ae2e47d9	xen-blkfront: use bitmap_set() and bitmap_clear() Use bitmap_set and bitmap_clear rather than modifying individual bits in a memory region. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Daniel De Graaf	b2167ba6dd	xen/blkback: Enable blkback on HVM guests Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Daniel De Graaf	4f14faaab4	xen/blkback: use grant-table.c hypercall wrappers Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-20 12:52:41 +01:00
Sebastian Andrzej Siewior	7396bd9fa1	usb/ub: deprecate & schedule for removal the "Low Performance USB Block" driver Deprecate this driver. All devices which can be handled by this driver can also be handled by the usb-storage driver. Acked-By: Pete Zaitcev <zaitcev@redhat.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-03-16 13:30:10 -07:00
Stephen Rothwell	ba7a4822b4	powerpc: Remove some of the legacy iSeries specific device drivers These drivers are specific to the PowerPC legacy iSeries platform and their Kconfig is specified in arch/powerpc. Legacy iSeries is being removed, so these drivers can no longer be selected. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-16 09:28:05 +11:00
Linus Torvalds	f1cbd03f5e	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Been sitting on this for a while, but lets get this out the door. This fixes various important bugs for 3.3 final, along with a few more trivial ones. Please pull!" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix ioc leak in put_io_context block, sx8: fix pointer math issue getting fw version Block: use a freezable workqueue for disk-event polling drivers/block/DAC960: fix -Wuninitialized warning drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning block: fix __blkdev_get and add_disk race condition block: Fix setting bio flags in drivers (sd_dif/floppy) block: Fix NULL pointer dereference in sd_revalidate_disk block: exit_io_context() should call elevator_exit_icq_fn() block: simplify ioc_release_fn() block: replace icq->changed with icq->flags	2012-03-14 17:16:45 -07:00
Greg Kroah-Hartman	f7a0d426f3	Merge 3.3-rc7 into usb-next This resolves the conflict with drivers/usb/host/ehci-fsl.h that happened with changes in Linus's and this branch at the same time. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-03-12 09:13:31 -07:00
Muthu Kumar	9354f1b8e6	floppy/scsi: fix setting of BIO flags Fix setting bio flags in drivers (sd_dif/floppy). Signed-off-by: Muthukumar R <muthur@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-05 15:49:43 -08:00
Dan Carpenter	ea5f4db8ec	block, sx8: fix pointer math issue getting fw version "mem" is type u8. We need parenthesis here or it screws up the pointer math probably leading to an oops. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-03 19:44:39 +01:00
Danny Kukawka	cecd353a02	drivers/block/DAC960: fix -Wuninitialized warning Set CommandMailbox with memset before use it. Fix for: drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’: arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’ may be used uninitialized in this function [-Wuninitialized] drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’ was declared here Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-02 10:48:35 +01:00
Danny Kukawka	bca505f109	drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning Fixed compiler warning: comparison between ‘DAC960_V2_IOCTL_Opcode_T’ and ‘enum <anonymous>’ Renamed enum, added a new enum for SCSI_10.CommandOpcode in DAC960_V2_ProcessCompletedCommand(). Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-02 10:48:32 +01:00
Muthukumar R	12ebffd146	block: Fix setting bio flags in drivers (sd_dif/floppy) Fix setting bio flags in drivers (sd_dif/floppy). Signed-off-by: Muthukumar R <muthur@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-02 10:40:58 +01:00
Sebastian Andrzej Siewior	7ac4704c09	usb/storage: a couple defines from drivers/usb/storage/transport.h to include/linux/usb/storage.h This moves the BOT data structures for CBW and CSW from drivers internal header file to global include able file in include/. The storage gadget is using the same name for CSW but a different for CBW so I fix it up properly. The same goes for the ub driver and keucr driver in staging. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-02-28 11:05:18 -08:00
Hitoshi Mitake	797a796a13	asm-generic: architecture independent readq/writeq for 32bit environment This provides unified readq()/writeq() helper functions for 32-bit drivers. For some cases, readq/writeq without atomicity is harmful, and order of io access has to be specified explicitly. So in this patch, new two header files which contain non-atomic readq/writeq are added. - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/ writeq with the order of lower address -> higher address - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/ writeq with reversed order This allows us to remove some readq()s that were added drivers when the default non-atomic ones were removed in commit `dbee8a0aff` ("x86: remove 32-bit versions of readq()/writeq()") The drivers which need readq/writeq but can do with the non-atomic ones must add the line: #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */ But this will be nop in 64-bit environments, and no other #ifdefs are required. So I believe that this patch can solve the problem of 1. driver-specific readq/writeq 2. atomicity and order of io access This patch is tested with building allyesconfig and allmodconfig as ARCH=x86 and ARCH=i386 on top of tip/master. Cc: Kashyap Desai <Kashyap.Desai@lsi.com> Cc: Len Brown <lenb@kernel.org> Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Roland Dreier <roland@purestorage.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-21 16:47:28 -08:00
Jesper Juhl	d0156f4d62	NVM Express: Remove unneeded include of linux/version.h from nvme.c There's no need for drivers/block/nvme.c to include linux/version.h, so remove the include. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-02-21 11:48:54 +01:00
Linus Torvalds	3ec1e88b33	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Says Jens: "Time to push off some of the pending items. I really wanted to wait until we had the regression nailed, but alas it's not quite there yet. But I'm very confident that it's "just" a missing expire on exit, so fix from Tejun should be fairly trivial. I'm headed out for a week on the slopes. - Killing the barrier part of mtip32xx. It doesn't really support barriers, and it doesn't need them (writes are fully ordered). - A few fixes from Dan Carpenter, preventing overflows of integer multiplication. - A fixup for loop, fixing a previous commit that didn't quite solve the partial read problem from Dave Young. - A bio integer overflow fix from Kent Overstreet. - Improvement/fix of the door "keep locked" part of the cdrom shared code from Paolo Benzini. - A few cfq fixes from Shaohua Li. - A fix for bsg sysfs warning when removing a file it did not create from Stanislaw Gruszka. - Two fixes for floppy from Vivek, preventing a crash. - A few block core fixes from Tejun. One killing the over-optimized ioc exit path, cleaning that up nicely. Two others fixing an oops on elevator switch, due to calling into the scheduler merge check code without holding the queue lock." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix lockdep warning on io_context release put_io_context() relay: prevent integer overflow in relay_open() loop: zero fill bio instead of return -EIO for partial read bio: don't overflow in bio_get_nr_vecs() floppy: Fix a crash during rmmod floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called cdrom: move shared static to cdrom_device_info bsg: fix sysfs link remove warning block: don't call elevator callbacks for plug merges block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data block: strip out locking optimization in put_io_context() cdrom: use copy_to_user() without the underscores block: fix ioc locking warning block: fix NULL icq_cache reference block,cfq: change code order	2012-02-11 10:07:11 -08:00
Dave Young	306df0716a	loop: zero fill bio instead of return -EIO for partial read commit `8268f5a741` ("deny partial write for loop dev fd") tried to fix the loop device partial read information leak problem. But it changed the semantics of read behavior. When we read beyond the end of the device we should get 0 bytes, which is normal behavior, we should not just return -EIO Instead of returning -EIO, zero out the bio to avoid information leak in case of partail read. Signed-off-by: Dave Young <dyoung@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: Dmitry Monakhov <dmonakhov@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-08 22:07:19 +01:00
Vivek Goyal	4609dff6b5	floppy: Fix a crash during rmmod floppy driver does not call add_disk() on all the drives hence we don't take gendisk reference on request queue for these drives. Don't call put_disk() with disk->queue set, otherwise we try to put the reference we never took. Reported-and-tested-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de> Signed-off-by: Vivek Goyal<vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-08 20:03:39 +01:00
Vivek Goyal	3f9a5aabd0	floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called add_disk() takes gendisk reference on request queue. If driver failed during initialization and never called add_disk() then that extra reference is not taken. That reference is put in put_disk(). floppy driver allocates the disk, allocates queue, sets disk->queue and then relizes that floppy controller is not present. It tries to tear down everything and tries to put a reference down in put_disk() which was never taken. In such error cases cleanup disk->queue before calling put_disk() so that we never try to put down a reference which was never taken in first place. Reported-and-tested-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-08 20:03:38 +01:00
Asai Thambi S P	4e8670e261	mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data Removed the following: * irrelevant argument 'barrier' of mtip_hw_submit_io() * unused member 'eh_active' of struct driver_data Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-07 07:54:31 +01:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Alex Elder	d23a4b3fd6	rbd: fix safety of rbd_put_client() The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe. The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up. Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race: Thread on CPU A \| Thread on CPU B --------------- \| --------------- rbd_put_client(CLIENT) \| rbd_get_client() kref_put() \| (acquires node_lock) kref->refcount becomes 0 \| __rbd_client_find() returns CLIENT calls rbd_client_release() \| kref_get(&CLIENT->kref); \| (releases node_lock) (acquires node_lock) \| deletes CLIENT from list \| ...and starts using CLIENT... (releases node_lock) \| and frees CLIENT \| <-- but CLIENT gets freed here Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:56:59 -08:00
Alex Elder	97bb59a03d	rbd: fix a memory leak in rbd_get_client() If an existing rbd client is found to be suitable for use in rbd_get_client(), the rbd_options structure is not being freed as it should. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:27 -08:00
Linus Torvalds	93c3d65b28	nvme: fix merge error due to change of 'make_request_fn' fn type The type of 'make_request_fn' changed in `5a7bbad27a` ("block: remove support for bio remapping from ->make_request"), but the merge of the nvme driver didn't take that into account, and as a result the driver would compile with a warning: drivers/block/nvme.c: In function 'nvme_alloc_ns': drivers/block/nvme.c:1336:2: warning: passing argument 2 of 'blk_queue_make_request' from incompatible pointer type [enabled by default] include/linux/blkdev.h:830:13: note: expected 'void ()(struct request_queue , struct bio )' but argument is of type 'int ()(struct request_queue , struct bio )' It's benign, but the warning is annoying. Reported-by: Stephen Rothwell <sfr@canb.auug.org> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-18 15:41:27 -08:00
Linus Torvalds	92b5abbb44	Merge git://git.infradead.org/users/willy/linux-nvme * git://git.infradead.org/users/willy/linux-nvme: (105 commits) NVMe: Set number of queues correctly NVMe: Version 0.8 NVMe: Set queue flags correctly NVMe: Simplify nvme_unmap_user_pages NVMe: Mark the end of the sg list NVMe: Fix DMA mapping for admin commands NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT NVMe: Merge the nvme_bio and nvme_prp data structures NVMe: Change nvme_completion_fn to take a dev NVMe: Change get_nvmeq to take a dev instead of a namespace NVMe: Simplify completion handling NVMe: Update Identify Controller data structure NVMe: Implement doorbell stride capability NVMe: Version 0.7 NVMe: Don't probe namespace 0 Fix calculation of number of pages in a PRP List NVMe: Create nvme_identify and nvme_get_features functions NVMe: Fix memory leak in nvme_dev_add() NVMe: Fix calls to dma_unmap_sg NVMe: Correct sg list setup in nvme_map_user_pages ...	2012-01-18 12:34:09 -08:00
Linus Torvalds	16008d6416	Merge branch 'for-3.3/drivers' of git://git.kernel.dk/linux-block * 'for-3.3/drivers' of git://git.kernel.dk/linux-block: mtip32xx: do rebuild monitoring asynchronously xen-blkfront: Use kcalloc instead of kzalloc to allocate array mtip32xx: uninitialized variable in mtip_quiesce_io() mtip32xx: updates based on feedback xen-blkback: convert hole punching to discard request on loop devices xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io xen/blk[front\|back]: Enhance discard support with secure erasing support. xen/blk[front\|back]: Squash blkif_request_rw and blkif_request_discard together mtip32xx: update to new ->make_request() API mtip32xx: add module.h include to avoid conflict with moduleh tree mtip32xx: mark a few more items static mtip32xx: ensure that all local functions are static mtip32xx: cleanup compat ioctl handling mtip32xx: fix warnings/errors on 32-bit compiles block: Add driver for Micron RealSSD pcie flash cards	2012-01-15 12:48:41 -08:00
Linus Torvalds	b3c9dd182e	Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block * 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits) Revert "block: recursive merge requests" block: Stop using macro stubs for the bio data integrity calls blockdev: convert some macros to static inlines fs: remove unneeded plug in mpage_readpages() block: Add BLKROTATIONAL ioctl block: Introduce blk_set_stacking_limits function block: remove WARN_ON_ONCE() in exit_io_context() block: an exiting task should be allowed to create io_context block: ioc_cgroup_changed() needs to be exported block: recursive merge requests block, cfq: fix empty queue crash caused by request merge block, cfq: move icq creation and rq->elv.icq association to block core block, cfq: restructure io_cq creation path for io_context interface cleanup block, cfq: move io_cq exit/release to blk-ioc.c block, cfq: move icq cache management to block core block, cfq: move io_cq lookup to blk-ioc.c block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq block, cfq: reorganize cfq_io_context into generic and cfq specific parts block: remove elevator_queue->ops block: reorder elevator switch sequence ... Fix up conflicts in: - block/blk-cgroup.c Switch from can_attach_task to can_attach - block/cfq-iosched.c conflict with now removed cic index changes (we now use q->id instead)	2012-01-15 12:24:45 -08:00
Jens Axboe	85a0f7b220	Merge branch 'for-3.3/mtip32xx' into for-3.3/drivers	2012-01-15 10:39:35 +01:00
Paolo Bonzini	577ebb374c	block: add and use scsi_blk_cmd_ioctl Introduce a wrapper around scsi_cmd_ioctl that takes a block device. The function will then be enhanced to detect partition block devices and, in that case, subject the ioctls to whitelisting. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-14 15:07:24 -08:00
Linus Torvalds	0a80939b3e	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N 8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO 7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d 2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+ f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv 1pvuGcJDZwYiwfiXr3FL =HHrh -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://github.com/rustyrussell/linux Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 * tag 'for-linus' of git://github.com/rustyrussell/linux: module_param: check that bool parameters really are bool. intelfbdrv.c: bailearly is an int module_param paride/pcd: fix bool verbose module parameter. module_param: make bool parameters really bool (drivers & misc) module_param: make bool parameters really bool (arch) module_param: make bool parameters really bool (core code) kernel/async: remove redundant declaration. printk: fix unnecessary module_param_name. lirc_parallel: fix module parameter description. module_param: avoid bool abuse, add bint for special cases. module_param: check type correctness for module_param_array modpost: use linker section to generate table. modpost: use a table rather than a giant if/else statement. modules: sysfs - export: taint, coresize, initsize kernel/params: replace DEBUGP with pr_debug module: replace DEBUGP with pr_debug module: struct module_ref should contains long fields module: Fix performance regression on modules with large symbol tables module: Add comments describing how the "strmap" logic works Fix up conflicts in scripts/mod/file2alias.c due to the new linker- generated table approach to adding __mod_*_device_table entries. The ARM sa11x0 mcp bus needed to be converted to that too.	2012-01-14 12:32:16 -08:00
Linus Torvalds	1a52bb0b68	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: ensure prealloc_blob is in place when removing xattr rbd: initialize snap_rwsem in rbd_add() ceph: enable/disable dentry complete flags via mount option vfs: export symbol d_find_any_alias() ceph: always initialize the dentry in open_root_dentry() libceph: remove useless return value for osd_client __send_request() ceph: avoid iput() while holding spinlock in ceph_dir_fsync ceph: avoid useless dget/dput in encode_fh ceph: dereference pointer after checking for NULL crush: fix force for non-root TAKE ceph: remove unnecessary d_fsdata conditional checks ceph: Use kmemdup rather than duplicating its implementation Fix up conflicts in fs/ceph/super.c (d_alloc_root() failure handling vs always initialize the dentry in open_root_dentry)	2012-01-13 10:29:21 -08:00
Rusty Russell	1b9fbafb3a	paride/pcd: fix bool verbose module parameter. Dan Carpenter points out that it's an int, not a bool: pcd.c:427: if (verbose > 1) pcd.c:433: if (verbose > 1) pcd.c:437: if (verbose < 2) pcd.c:506:#define DBMSG(msg) ((verbose>1)?(msg):NULL) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dan Carpenter <dan.carpenter@oracle.com>	2012-01-13 09:32:26 +10:30
Rusty Russell	90ab5ee941	module_param: make bool parameters really bool (drivers & misc) module_param(bool) used to counter-intuitively take an int. In `fddd5201` (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-13 09:32:20 +10:30
Alex Elder	0e805a1d85	rbd: initialize snap_rwsem in rbd_add() New rbd device structures get initialized in rbd_add(). Many of the fields rely on being initially zero-filled. However we lockdep was noticing that the rw_semaphore embedded in the header field was not getting properly initialized. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-12 11:00:50 -08:00
Amit Shah	f8fb5bc23a	virtio: blk: Add freeze, restore handlers to support S4 Delete the vq and flush any pending requests from the block queue on the freeze callback to prepare for hibernation. Re-create the vq in the restore callback to resume normal function. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-12 15:44:45 +10:30
Amit Shah	6abd6e5a44	virtio: blk: Move vq initialization to separate function The probe and PM restore functions will share this code. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-12 15:44:45 +10:30
Michael S. Tsirkin	4678d6f970	virtio_blk: fix config handler race Fix a theoretical race related to config work handler: a config interrupt might happen after we flush config work but before we reset the device. It will then cause the config work to run during or after reset. Two problems with this: - if this runs after device is gone we will get use after free - access of config while reset is in progress is racy (as layout is changing). As a solution 1. flush after reset when we know there will be no more interrupts 2. add a flag to disable config access before reset Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-12 15:44:44 +10:30
Rusty Russell	f96fde41f7	virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf Remove wrapper functions. This makes the allocation type explicit in all callers; I used GPF_KERNEL where it seemed obvious, left it at GFP_ATOMIC otherwise. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Christoph Hellwig <hch@lst.de>	2012-01-12 15:44:42 +10:30
Matthew Wilcox	df34813990	NVMe: Set number of queues correctly The number of submission & completion queues should be set by calling Set Features, not Get Features. Reported-by: Kwok Kong <Kwok.Kong@idt.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-11 09:22:24 -05:00
Linus Torvalds	4690dfa8cd	Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze * 'next' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Wire-up new system calls microblaze: Remove NO_IRQ from architecture input: xilinx_ps2: Don't use NO_IRQ block: xsysace: Don't use NO_IRQ microblaze: Trivial asm fix microblaze: Fix debug message in module microblaze: Remove eprintk macro microblaze: Send CR before LF for early console microblaze: Change NO_IRQ to 0 microblaze: Use irq_of_parse_and_map for timer microblaze: intc: Change variable name microblaze: Use of_find_compatible_node for timer and intc microblaze: Add __cmpdi2 microblaze: Synchronize __pa __va macros	2012-01-10 17:37:49 -08:00
Matthew Wilcox	366e8217e5	NVMe: Version 0.8 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 16:30:15 -05:00
Matthew Wilcox	4eeb9215a0	NVMe: Set queue flags correctly QUEUE_FLAG_* are flags (other than QUEUE_FLAG_DEFAULT), so they cannot be ORed together. Set the queue flags using queue_flag_set_unlocked(). Reported-by: Donald Wood <donald.e.wood@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 16:29:23 -05:00
Matthew Wilcox	1c2ad9faaf	NVMe: Simplify nvme_unmap_user_pages By using the iod->nents field (the same way other I/O paths do), we can avoid recalculating the number of sg entries at unmap time, and make nvme_unmap_user_pages() easier to call. Also, use the 'write' parameter instead of assuming DMA_FROM_DEVICE. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:22 -05:00
Matthew Wilcox	fe304c43c6	NVMe: Mark the end of the sg list For user I/O and admin commands, we were forgetting to mark the end of the SG list. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:14 -05:00
Matthew Wilcox	497421880a	NVMe: Fix DMA mapping for admin commands We were always mapping as DMA_FROM_DEVICE then unmapping with DMA_TO_DEVICE which was clearly not correct. Follow the same pattern as nvme_submit_io() and key off the bottom bit of the opcode to determine whether this is a read or a write. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:54:05 -05:00
Matthew Wilcox	ff976d724a	NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT IO_TIMEOUT is a little too generic and might be used by other parts of the kernel in the future. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:53:54 -05:00
Matthew Wilcox	eca18b2394	NVMe: Merge the nvme_bio and nvme_prp data structures The new merged data structure is called nvme_iod. This improves performance for mid-sized I/Os (in the 16k range) since we save a memory allocation. It is also a slightly simpler interface to use. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:51:20 -05:00
Matthew Wilcox	5c1281a3bf	NVMe: Change nvme_completion_fn to take a dev The queue is only needed for some rare occasions, and it's more consistent to pass the device around. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:51:00 -05:00
Matthew Wilcox	040a93b52a	NVMe: Change get_nvmeq to take a dev instead of a namespace Upcoming patches require calling get_nvmeq when we don't have a namespace. Some callers already have the device in a local variable anyway. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:49:18 -05:00
Matthew Wilcox	c2f5b65020	NVMe: Simplify completion handling Instead of encoding the handler type in the bottom two bits of the per-completion context pointer, store the handler function as well as the context pointer. This gives us more flexibility and the code is clearer. It comes at the cost of an extra 8k of memory per queue, but this feels like a reasonable price to pay. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-01-10 14:47:46 -05:00
Linus Torvalds	90160371b3	Merge branch 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (37 commits) xen/pciback: Expand the warning message to include domain id. xen/pciback: Fix "device has been assigned to X domain!" warning xen/pciback: Move the PCI_DEV_FLAGS_ASSIGNED ops to the "[un\|]bind" xen/xenbus: don't reimplement kvasprintf via a fixed size buffer xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX. Xen: consolidate and simplify struct xenbus_driver instantiation xen-gntalloc: introduce missing kfree xen/xenbus: Fix compile error - missing header for xen_initial_domain() xen/netback: Enable netback on HVM guests xen/grant-table: Support mappings required by blkback xenbus: Use grant-table wrapper functions xenbus: Support HVM backends xen/xenbus-frontend: Fix compile error with randconfig xen/xenbus-frontend: Make error message more clear xen/privcmd: Remove unused support for arch specific privcmp mmap xen: Add xenbus_backend device xen: Add xenbus device driver xen: Add privcmd device driver xen/gntalloc: fix reference counts on multi-page mappings ...	2012-01-10 10:09:59 -08:00
Linus Torvalds	98793265b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)	2012-01-08 13:21:22 -08:00
Linus Torvalds	972b2c7199	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc//mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount ...	2012-01-08 12:19:57 -08:00
Al Viro	ece2ccb668	Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z	2012-01-06 23:15:54 -05:00
Linus Torvalds	356b95424c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (21 commits) m68k/mac: Make CONFIG_HEARTBEAT unavailable on Mac m68k/serial: Remove references to obsolete serial config options m68k/net: Remove obsolete IRQ_FLG_* users m68k: Don't comment out syscalls used by glibc m68k/atari: Move declaration of atari_SCC_reset_done to header file m68k/serial: Remove references to obsolete CONFIG_SERIAL167 m68k/hp300: Export hp300_ledstate m68k: Initconst section fixes m68k/mac: cleanup macro case mac_scsi: fix mac_scsi on some powerbooks m68k/mac: fix powerbook 150 adb_type m68k/mac: fix baboon irq disable and shutdown m68k/mac: oss irq fixes m68k/mac: fix nubus slot irq disable and shutdown m68k/mac: enable via_alt_mapping on performa 580 m68k/mac: cleanup forward declarations m68k/mac: cleanup mac_irq_pending m68k/mac: cleanup mac_clear_irq m68k/mac: early console m68k/mvme16x: Add support for EARLY_PRINTK ... Fix up trivial conflict in arch/m68k/Kconfig.debug due to new EARLY_PRINTK config option addition clashing with movement of the BOOTPARAM options.	2012-01-06 18:28:12 -08:00
Michal Simek	ba2d5affde	block: xsysace: Don't use NO_IRQ Drivers shouldn't use NO_IRQ. Microblaze and PPC define NO_IRQ as 0 and this reference will be removed in near future. Signed-off-by: Michal Simek <monstr@monstr.eu> Reviewed-by: Ryan Mallon <rmallon@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> CC: Rob Herring <rob.herring@calxeda.com>	2012-01-05 08:34:29 +01:00
Jan Beulich	73db144b58	Xen: consolidate and simplify struct xenbus_driver instantiation The 'name', 'owner', and 'mod_name' members are redundant with the identically named fields in the 'driver' sub-structure. Rather than switching each instance to specify these fields explicitly, introduce a macro to simplify this. Eliminate further redundancy by allowing the drvname argument to DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from the ID table will be used for .driver.name). Also eliminate the questionable xenbus_register_{back,front}end() wrappers - their sole remaining purpose was the checking of the 'owner' field, proper setting of which shouldn't be an issue anymore when the macro gets used. v2: Restore DRV_NAME for the driver name in xen-pciback. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-01-04 17:01:17 -05:00
Asai Thambi S P	62ee8c13e2	mtip32xx: do rebuild monitoring asynchronously Earlier, rebuild monitoring was done in the context of probe. Now the service thread takes the responsibility of rebuild monitoring, and probe returns good status. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-01-04 22:01:32 +01:00
Al Viro	2c9ede55ec	switch device_get_devnode() and ->devnode() to umode_t * both callers of device_get_devnode() are only interested in lower 16bits and nobody tries to return anything wider than 16bit anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
Al Viro	ff01bb4832	fs: move code out of buffer.c Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:07 -05:00
Jens Axboe	f748040bb8	Merge branch 'stable/for-jens-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.3/drivers	2011-12-25 16:46:46 +01:00
Linus Torvalds	b0d78ee89c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block * 'for-linus' of git://git.kernel.dk/linux-block: block: don't kick empty queue in blk_drain_queue() block/swim3: Locking fixes loop: Fix discard_alignment default setting cfq-iosched: fix cfq_cic_link() race confition cfq-iosched: free cic_index if blkio_alloc_blkg_stats fails cciss: fix flush cache transfer length cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler loop: fix loop block driver discard and encryption comment block: initialize request_queue's numa node during	2011-12-16 10:05:14 -08:00
Thomas Meyer	f094148a17	xen-blkfront: Use kcalloc instead of kzalloc to allocate array The advantage of kcalloc is, that will prevent integer overflows which could result from the multiplication of number of elements and size and it is also a bit nicer to read. The semantic patch that makes this change is available in https://lkml.org/lkml/2011/11/25/107 Signed-off-by: Thomas Meyer <thomas@m3y3r.de> [v1: Seperated the drivers/block/cciss_scsi.c out of this patch] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-12-16 12:36:52 -05:00
Tejun Heo	1ba64edef6	block, sx8: kill blk_insert_request() The only user left for blk_insert_request() is sx8 and it can be trivially switched to use blk_execute_rq_nowait() - special requests aren't included in io stat and sx8 doesn't use block layer tagging. Switch sx8 and kill blk_insert_requeset(). This patch doesn't introduce any functional difference. Only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-12-14 00:33:37 +01:00
Linus Torvalds	653f42f6b6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options	2011-12-13 14:59:42 -08:00
Benjamin Herrenschmidt	b302545744	block/swim3: Locking fixes The old PowerMac swim3 driver has some "interesting" locking issues, using a private lock and failing to lock the queue before completing requests, which triggered WARN_ONs among others. This rips out the private lock, makes everything operate under the block queue lock, and generally makes things simpler. We used to also share a queue between the two possible instances which was problematic since we might pick the wrong controller in some cases, so make the queue and the current request per-instance and use queuedata to point to our private data which is a lot cleaner. We still share the queue lock but then, it's nearly impossible to actually use 2 swim3's simultaneously: one would need to have a Wallstreet PowerBook, the only machine afaik with two of these on the motherboard, and populate both hotswap bays with a floppy drive (the machine ships only with one), so nobody cares... While at it, add a little fix to clear up stale interrupts when loading the driver or plugging a floppy drive in a bay. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-12-12 12:42:12 +01:00
Finn Thain	ed04c97d51	m68k/mac: cleanup forward declarations Move some forward declarations into header files and adjust includes. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2011-12-10 19:52:46 +01:00
Josh Durgin	51703306b3	rbd: remove buggy rollback functionality This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:19 -08:00
Josh Durgin	81e759fbf7	rbd: return an error when an invalid header is read This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:10 -08:00
Justin P. Mattock	42b2aa86c6	treewide: Fix typos in various parts of the kernel, and fix some comments. The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-12-02 14:57:31 +01:00
Lukas Czerner	dfaf3c036c	loop: Fix discard_alignment default setting discard_alignment is not relevant to the loop driver since it is supposed to be set as a workaround for the old sector 63 alignments. So set it to zero rather than block size. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-12-02 14:47:03 +01:00
Stephen M. Cameron	59bd71a81b	cciss: fix flush cache transfer length We weren't filling in the transfer length of the flush cache command (it transfers 4 bytes of zeroes). Firmware didn't seem to be bothered by this, but it should be fixed. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-28 20:12:05 +01:00
Stephen M. Cameron	6225da4815	cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler IRQF_SHARED is required for older controllers that don't support MSI(X) and which may end up sharing an interrupt. Also remove deprecated IRQF_DISABLED. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-28 20:12:05 +01:00
Dave Young	ae95757a90	loop: fix loop block driver discard and encryption comment The loop driver does not support discard if encryption is enabled, fix the comment. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-25 09:41:25 +01:00
Dan Carpenter	3e54a3d1b8	mtip32xx: uninitialized variable in mtip_quiesce_io() We recently introduce new continue in the loop which make gcc complain. In theory if MTIP_FLAG_SVC_THD_ACTIVE_BIT is set, we could hit continue over and over until eventually we time out of the loop. In that case "active" should be set as true, but right now it's uninitialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-24 12:59:00 +01:00
Asai Thambi S P	60ec0eecfa	mtip32xx: updates based on feedback * queue ncq commands when a non-ncq is in progress or error handling is active * merge variables 'internal_cmd_in_progress' and 'eh_active' into new variable 'flags' * get rid of read/write semaphore 'internal_sem' * new service thread to issue queued commands * use macros from ata.h for command codes * return ENOTTY for BLKFLSBUF ioctl * style changes Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-23 08:29:24 +01:00
Li Dongyang	ae18be11b5	xen-blkback: convert hole punching to discard request on loop devices As of `dfaa2ef68e`, loop devices support discard request now. We could just issue a discard request, and the loop driver will punch the hole for us, so we don't need to touch the internals of loop device and punch the hole ourselves, Thanks. V0->V1: rebased on devel/for-jens-3.3 Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:05 -05:00
Konrad Rzeszutek Wilk	421463526f	xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io .. and move it to its own function that will deal with the discard operation. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:03 -05:00
Konrad Rzeszutek Wilk	5ea4298669	xen/blk[front\|back]: Enhance discard support with secure erasing support. Part of the blkdev_issue_discard(xx) operation is that it can also issue a secure discard operation that will permanantly remove the sectors in question. We advertise that we can support that via the 'discard-secure' attribute and on the request, if the 'secure' bit is set, we will attempt to pass in REQ_DISCARD \| REQ_SECURE. CC: Li Dongyang <lidongyang@novell.com> [v1: Used 'flag' instead of 'secure:1' bit] [v2: Use 'reserved' uint8_t instead of adding a new value] [v3: Check for nseg when mapping instead of operation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:28:01 -05:00
Konrad Rzeszutek Wilk	97e36834f5	xen/blk[front\|back]: Squash blkif_request_rw and blkif_request_discard together In a union type structure to deal with the overlapping attributes in a easier manner. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-11-18 13:27:59 -05:00
Dan Carpenter	a2c2a0e668	paride: fix potential information leak in pg_read() Smatch has a new check for Rosenberg type information leaks where structs are copied to the user with uninitialized stack data in them. i In this case, the pg_write_hdr struct has a hole in it. struct pg_write_hdr { char magic; /* 0 1 / char func; / 1 1 / / XXX 2 bytes hole, try to pack / int dlen; / 4 4 */ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tim Waugh <tim@cyberelk.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:50 +01:00
Stephen M. Cameron	0007a4c90a	cciss: auto engage SCSI mid layer at driver load time A long time ago, probably in 2002, one of the distros, or maybe more than one, loaded block drivers prior to loading the SCSI mid layer. This meant that the cciss driver, being a block driver, could not engage the SCSI mid layer at init time without panicking, and relied on being poked by a userland program after the system was up (and the SCSI mid layer was therefore present) to engage the SCSI mid layer. This is no longer the case, and cciss can safely rely on the SCSI mid layer being present at init time and engage the SCSI mid layer straight away. This means that users will see their tape drives and medium changers at driver load time without need for a script in /etc/rc.d that does this: for x in /proc/driver/cciss/cciss* do echo "engage scsi" > $x done However, if no tape drives or medium changers are detected, the SCSI mid layer will not be engaged. If a tape drive or medium change is later hot-added to the system it will then be necessary to use the above script or similar for the device(s) to be acceesible. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:49 +01:00
Dmitry Monakhov	7035b5df3c	loop: cleanup set_status interface 1) Anyone who has read access to loopdev has permission to call set_status and may change important parameters such as lo_offset, lo_sizelimit and so on, which contradicts to read access pattern and definitely equals to write access pattern. 2) Add lo_offset over i_size check to prevent blkdev_size overflow. ##Testcase_bagin #dd if=/dev/zero of=./file bs=1k count=1 #losetup /dev/loop0 ./file /* userspace_application / struct loop_info64 loinf; fd = open("/dev/loop0", O_RDONLY); ioctl(fd, LOOP_GET_STATUS64, &loinf); / Set offset to any value which is bigger than i_size, and sizelimit * to nonzero value/ loinf.lo_offset = 40961024; loinf.lo_sizelimit = 1024; ioctl(fd, LOOP_SET_STATUS64, &loinf); /* After this loop device will have size similar to 0x7fffffffffxxxx */ #blockdev --getsz /dev/loop0 ##OUTPUT: 36028797018955968 ##Testcase_end [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:49 +01:00
Dmitry Monakhov	3bb9068278	loop: prevent information leak after failed read If read was not fully successful we have to fail whole bio to prevent information leak of old pages ##Testcase_begin dd if=/dev/zero of=./file bs=1M count=1 losetup /dev/loop0 ./file -o 4096 truncate -s 0 ./file # OOps loop offset is now beyond i_size, so read will silently fail. # So bio's pages would not be cleared, may which result in information leak. hexdump -C /dev/loop0 ##testcase_end Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-16 09:21:48 +01:00
Matthew Garrett	1937335856	The Windows driver .inf disables ASPM on all cciss devices. Do the same. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: iss_storagedev@hp.com Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-11 22:05:54 +01:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	06d381484f	Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: net: xen-netback: use API provided by xenbus module to map rings block: xen-blkback: use API provided by xenbus module to map rings xen: use generic functions instead of xen_{alloc, free}_vm_area()	2011-11-06 18:31:36 -08:00
Jens Axboe	a71f483d79	mtip32xx: update to new ->make_request() API Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:36:21 +01:00
Jens Axboe	0e838c624e	mtip32xx: add module.h include to avoid conflict with moduleh tree Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	3ff147d3a8	mtip32xx: mark a few more items static Missed two items: mtip_major, and mtip_pci_driver. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	6316668fbc	mtip32xx: ensure that all local functions are static Kill the declarations in the header file and mark them as static. Reshuffle a few functions to ensure that everything is properly declared before being used. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	ef0f158734	mtip32xx: cleanup compat ioctl handling Do the conversion/copy up front instead of passing in a compat flag to the ioctl handler and subsequently to the exec_drive_taskfile() function. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Jens Axboe	16d02c040b	mtip32xx: fix warnings/errors on 32-bit compiles We need to clean up the compat ioctl handling, but this makes it work for now at least. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-11-05 08:35:10 +01:00
Sam Bradshaw	88523a6155	block: Add driver for Micron RealSSD pcie flash cards This adds mtip32xx, a driver supporting Microns line of pci-express flash storage cards. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-11-05 08:35:10 +01:00
Linus Torvalds	3d0a8d10cf	Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block * 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits) virtio-blk: use ida to allocate disk index hpsa: add small delay when using PCI Power Management to reset for kump cciss: add small delay when using PCI Power Management to reset for kump xen/blkback: Fix two races in the handling of barrier requests. xen/blkback: Check for proper operation. xen/blkback: Fix the inhibition to map pages when discarding sector ranges. xen/blkback: Report VBD_WSECT (wr_sect) properly. xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. xen-blkfront: plug device number leak in xlblk_init() error path xen-blkfront: If no barrier or flush is supported, use invalid operation. xen-blkback: use kzalloc() in favor of kmalloc()+memset() xen-blkback: fixed indentation and comments xen-blkfront: fix a deadlock while handling discard response xen-blkfront: Handle discard requests. xen-blkback: Implement discard requests ('feature-discard') xen-blkfront: add BLKIF_OP_DISCARD and discard request struct drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() drivers/block/loop.c: emit uevent on auto release drivers/block/cpqarray.c: use pci_dev->revision loop: always allow userspace partitions and optionally support automatic scanning ... Fic up trivial header file includsion conflict in drivers/block/loop.c	2011-11-04 17:22:14 -07:00
Linus Torvalds	b4fdcb02f1	Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c	2011-11-04 17:06:58 -07:00
Matthew Wilcox	f1938f6e1e	NVMe: Implement doorbell stride capability The doorbell stride allows devices to spread out their doorbells instead of packing them tightly. This feature was added as part of ECN 003. This patch also enables support for more than 512 queues :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	ce38c14957	NVMe: Version 0.7 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:05 -04:00
Matthew Wilcox	2b2c189687	NVMe: Don't probe namespace 0 ECN 001 documented that namespace 0 is not valid. Sending an Identify with CNS of 0 and Namespace of 0 is an undefined command. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	0d1bc91258	Fix calculation of number of pages in a PRP List The existing calculation underestimated the number of pages required as it did not take into account the pointer at the end of each page. The replacement calculation may overestimate the number of pages required if the last page in the PRP List is entirely full. By using ->npages as a counter as we fill in the pages, we ensure that we don't try to free a page that was never allocated. Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	bc5fc7e4b2	NVMe: Create nvme_identify and nvme_get_features functions Instead of open-coding calls to nvme_submit_admin_cmd, these small wrappers are simpler to use (the patch removes 14 lines from nvme_dev_add() for example). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	684f5c2025	NVMe: Fix memory leak in nvme_dev_add() The driver was allocating 8k of memory, then freeing 4k of it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Nisheeth Bhat	d1a490e026	NVMe: Fix calls to dma_unmap_sg dma_unmap_sg() must be called with the same 'nents' passed to dma_map_sg(), not the number returned from dma_map_sg(). Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	d0ba1e497b	NVMe: Correct sg list setup in nvme_map_user_pages Our SG list was constructed to always fill the entire first page, even if that was more than the length of the I/O. This is probably harmless, but some IOMMUs might do something bad. Correcting the first call to sg_set_page() made it look a lot closer to the sg_set_page() in the loop, so fold the first call to sg_set_page() into the loop. Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6413214c5d	Fix bug in NVME_IOCTL_SUBMIT_IO Missing 'break' in the switch statement meant that we'd fall through to the 'return -EINVAL' case.	2011-11-04 15:53:04 -04:00
Matthew Wilcox	6bbf1acdde	NVMe: Rework ioctls Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD ioctl that can submit any admin command. Add a new ID ioctl that returns the namespace ID of the queried device. It corresponds to the SCSI Idlun ioctl. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	eac623ba7a	NVMe: Add the nvme thread to the wait queue before waking it up If the I/O was not completed by a single NVMe command, we add the bio to the congestion list and wake up the kthread to resubmit it. But the kthread calls remove_wait_queue() unconditionally, which will oops if it's not on the wait queue. So add the kthread to the wait queue before waking it up. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	6f0f54499f	NVMe: Return real error from nvme_create_queue nvme_setup_io_queues() was assuming that a NULL return from nvme_create_queue() was an out-of-memory error. That's not necessarily true; the adapter might return -EIO, for example. Change the calling convention to return an ERR_PTR on failure instead of NULL. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	be5e094840	NVMe: Version 0.6 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	184d2944cb	NVMe: Add a few calling convention notes For the benefit of reviewers, add comments to a few functions describing their calling context Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	b77954cbdd	NVMe: Handle failures from memory allocations in nvme_setup_prps If any of the memory allocations in nvme_setup_prps fail, handle it by modifying the passed-in data length to reflect the number of bytes we are actually able to send. Also allow the caller to specify the GFP flags they need; for user-initiated commands, we can use GFP_KERNEL allocations. The various callers are updated to handle this possibility; the main I/O path is already prepared for this possibility (as it may happen due to nvme_map_bio being unable to map all the segments of the I/O). The other callers return -ENOMEM instead of doing partial I/Os. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	5aff9382dd	NVMe: Use an IDA to allocate minor numbers The current approach of using the namespace ID as the minor number doesn't work when there are multiple adapters in the machine. Rather than statically partitioning the number of namespaces between adapters, dynamically allocate minor numbers to namespaces as they are detected. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:03 -04:00
Matthew Wilcox	fd63e9ceee	NVMe: Add include of delay.h for msleep Previously it was being implicitly included through some other header file Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	8de055350f	NVMe: Add support for timing out I/Os In the kthread, walk the list of outstanding I/Os and check they've not hit the timeout. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	21075bdee0	NVMe: Rename cancel_cmdid_data to cancel_cmdid The trailing '_data' on the end was annoying and inconsistent. Also, make it actually return the data since this is needed for timing out commands. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	09a58f5364	NVMe: Fix bug in error handling When an I/O completed with an error, we would call bio_endio twice (once with -EIO and once with 0). Found by inspection. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	22605f9681	NVMe: Time out initialisation after a few seconds THe device reports (in its capability register) how long it will take to initialise. If that time elapses before the ready bit becomes set, conclude the device is broken and refuse to initialise it. Log a nice error message so the user knows why we did nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	aba2080f3f	NVMe: Fix warning in free_irq We need to clear the affinity mask before calling free_irq() Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:02 -04:00
Matthew Wilcox	7f53f9d242	NVMe: Correct the Controller Configuration settings The arbitration field was extended by one bit, shifting the shutdown notification bits by one. Also, the SQ/CQ entry size was made configurable for future extensions. Reported-by: Paul Luse <paul.e.luse@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	8ef700678f	NVMe: Version 0.5 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	6c7d49455c	NVMe: Change the definition of nvme_user_io The read and write commands don't define a 'result', so there's no need to copy it back to userspace. Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed througha different ioctl in the future. That removes the need for both the block_shift and nsid arguments. Check that the opcode is one of 'read' or 'write'. Future opcodes may be added in the future, but we will need a different structure definition for them. The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks. Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	4948168280	NVMe: Add compat_ioctl Make ioctls work for 32-bit applications on 64-bit kernels. The structures are defined to be the same for both 32- and 64-bit applications, so we can use the same handler for both. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	9ecdc94621	NVMe: Simplify queue lookup Fill in all the num_possible_cpus() entries with duplicate pointers. This reduces the complexity of the frequently-called get_nvmeq(), as well as avoiding a bug in it when there are fewer queues than CPUs. Reported-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:01 -04:00
Matthew Wilcox	3cb967c039	NVMe: Remove the kthread from the wait queue Once there are no more bios on the congestion list, we can stop waking up the nvme kthread every time a completion happens. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	7523d834dd	NVMe: Fix off-by-one when filling in PRP lists If the last element in the PRP list fits on the end of the page, there's no need to allocate an extra page to put that single element in. It can fit on the end of the page. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	ac88c36a38	NVMe: Fix interpretation of 'Number of Namespaces' field The spec says this is a 0s based value. We don't need to handle the maximal value because it's reserved to mean "every namespace". Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	19e899b2f9	NVMe: Remove outdated comments The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	fa92282149	NVMe: Fix comment formatting Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	714a7a2288	NVMe: Convert comments to kernel-doc notation Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:53:00 -04:00
Matthew Wilcox	b57ab0fada	NVMe: Version 0.4 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	e6d15f79f9	NVMe: Reduce maximum queue depth by 1 The spec says we're not allowed to completely fill the submission queue. Solve this by reducing the number of allocatable cmdids by 1. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	d8ee9d69f2	NVMe: Fix discontiguous accesses When we submit subsequent portions of the I/O, we need to access the updated block, not start reading again from the original position. This was showing up as miscompares in the XFS randholes testcase. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	1ad2f8932a	NVMe: Handle bios that contain non-virtually contiguous addresses NVMe scatterlists must be virtually contiguous, like almost all I/Os. However, when the filesystem lays out files with a hole, it can be that adjacent LBAs map to non-adjacent virtual addresses. Handle this by submitting one NVMe command at a time for each virtually discontiguous range. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	00df5cb4eb	NVMe: Implement Flush Linux implements Flush as a bit in the bio. That means there may also be data associated with the flush; if so the flush should be sent before the data. To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate the completion routine should do nothing. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	c42705592b	NVMe: Mark CMD_CTX_CANCELLED as being unlikely Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:59 -04:00
Matthew Wilcox	7547881d09	NVMe: Correct SQ doorbell semantics The value written to the doorbell needs to be the first free index in the queue, not the most recently used index in the queue. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	740216fc59	NVMe: Let the kthread take care of devices earlier If interrupts are misconfigured, the kthread will be needed to process admin queue completions. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	b348b7d543	NVMe: Rename nr_queues to nr_io_queues I got confused about whether this included the admin queue or not, and had to resort to reading the spec. It doesn't include the admin queue, so make that clear in the name. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	ca1615424c	NVMe: Remove setting of 'flags' in rw command This was the data transfer bit until spec rev 0.92 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	ad8a5df97c	NVMe: Release 0.3 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	1fa6aeadf1	NVMe: Add a kthread to handle the congestion list Instead of trying to resubmit I/Os in the I/O completion path (in interrupt context), wake up a kthread which will resubmit I/O from user context. This allows mke2fs to run to completion. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	eeee322647	NVMe: Handle failures differently in nvme_submit_bio_queue() Return -EBUSY if the queue is full or -ENOMEM if we failed to allocate memory (or map a scatterlist). Also use GFP_ATOMIC to allocate the nvme_bio and move the locking to the callers of nvme_submit_bio_queue(). In nvme_make_request(), don't permit an I/O to jump the queue -- if the congestion list already has an entry, just add to the tail, rather than trying to submit. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:58 -04:00
Matthew Wilcox	768308400f	NVMe: Handle physical merging of bvec entries In order to not overrun the sg array, we have to merge physically contiguous pages into a single sg entry. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	1974b1ae88	NVMe: Check for DMA mapping failure If dma_map_sg returns 0 (failure), we need to fail the I/O. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	d567760c40	NVMe: Pass the nvme_dev to nvme_free_prps and nvme_setup_prps We were passing the nvme_queue to access the q_dmadev for the dma_alloc_coherent calls, but since we moved to the dma pool API, we really only need the nvme_dev. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	99802a7aee	NVMe: Optimise memory usage for I/Os between 4k and 128k Add a second memory pool for smaller I/Os. We can pack 16 of these on a single page instead of using an entire page for each one. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	091b609258	NVMe: Switch to use DMA Pool API Calling dma_free_coherent from interrupt context causes warnings. Using the DMA pools delays freeing until pool destruction, so avoids the problem. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:57 -04:00
Matthew Wilcox	d534df3c73	NVMe: Rename nvme_req_info to nvme_bio There are too many things called 'info' in this driver. This data structure is auxiliary information for a struct bio, so call it nvme_bio, or nbio when used as a variable. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Shane Michael Matthews	e025344c56	NVMe: Initial PRP List support Add a pointer to the nvme_req_info to hold a new data structure (nvme_prps) which contains a list of the pages allocated to this particular request for holding PRP list entries. nvme_setup_prps() now returns this pointer. To allocate and free the memory used for PRP lists, we need a struct device, so we need to pass the nvme_queue pointer to many functions which didn't use to need it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	51882d00f0	NVMe: Advance the sg pointer when filling in an sg list For multipage BIOs, we were always using sg[0] instead of advancing through the list. Oops :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	d2d8703481	NVMe: Renumber the special context values If POISON_POINTER_DELTA isn't defined, ensure they're in page 0 which should never be mapped. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	9294bbed78	NVMe: Handle the congestion list a little better In the bio completion handler, check for bios on the congestion list for this NVM queue. Also, lock the congestion list in the make_request function as the queue may end up being shared between multiple CPUs. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	e85248e516	NVMe: Record the timeout for each command In addition to recording the completion data for each command, record the anticipated completion time. Choose a timeout of 5 seconds for normal I/Os and 60 seconds for admin I/Os. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	ec6ce618d6	NVMe: Need to lock queue during interrupt handling If we're sharing a queue between multiple CPUs and we cancel a sync I/O, we must have the queue locked to avoid corrupting the stack of the thread that submitted the I/O. It turns out this is the same locking that's needed for the threaded irq handler, so share that code. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:56 -04:00
Matthew Wilcox	48e3d39816	NVMe: Detect command IDs completing that are out of range If the adapter completes a command ID that is outside the bounds of the array, return CMD_CTX_INVALID instead of random data, and print a message in the sync_completion handler (which is rapidly becoming the misc completion handler :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	b36235df01	NVMe: Detect commands that are completed twice Set the context value to CMD_CTX_COMPLETED, and print a message in the sync_completion handler if we see it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	be7b62754e	NVMe: Use a symbolic name to represent cancelled commands instead of 0 I have plans for other special values in sync_completion. Plus, this is more self-documenting, and lets us detect bogus usages. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	58ffacb545	NVMe: Add a module parameter to use a threaded interrupt We're currently calling bio_endio from hard interrupt context. This is not a good idea for preemptible kernels as it will cause longer latencies. Using a threaded interrupt will run the entire queue processing mechanism (including bio_endio) in a thread, which can be preempted. Unfortuantely, it also adds about 7us of latency to the single-I/O case, so make it a module parameter for the moment. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	b1ad37efca	NVMe: Call put_nvmeq() before calling nvme_submit_sync_cmd() We can't have preemption disabled when we call schedule(). Accept the possibility that we'll get preempted, and it'll cost us some cacheline bounces. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	3c0cf138d7	NVMe: Allow fatal signals to interrupt I/O If the user sends a fatal signal, sleeping in the TASK_KILLABLE state permits the task to be aborted. The only wrinkle is making sure that if/when the command completes later that it doesn't upset anything. Handle this by setting the data pointer to 0, and checking the value isn't NULL in the sync completion path. Eventually, bios can be cancelled through this path too. Note that the cmdid isn't freed to prevent reuse. We should also abort the command in the future, but this is a good start. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:55 -04:00
Matthew Wilcox	db5d0c198d	NVMe: Release 0.2 Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	6ee44cdced	NVMe: Add download / activate firmware ioctls Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	388f037f4e	NVMe: Move sysfs entries to the right place Because I wasn't setting driverfs_dev, the devices were showing up under /sys/devices/virtual/block. Now they appear underneath the PCI device which they belong to. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Shane Michael Matthews	5911f20039	NVMe: Disable the device before we write the admin queues In case the card has been left in a partially-configured state, write 0 to the Enable bit. Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	574e8b95bc	NVMe: Request I/O regions Calling pci_request_selected_regions() reserves these regions for our use. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:54 -04:00
Matthew Wilcox	2930353f9f	NVMe: Allow queues to be allocated above 4GB Need to call dma_set_coherent_mask() to allow queues to be allocated above 4GB. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	f64d3365a3	NVMe: Enable device DMA Need to call pci_set_master() to enable device DMA Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Shane Michael Matthews	0ee5a7d7cb	NVMe: Enable and disable the PCI device Call pci_enable_device_mem() at initialisation and pci_disable_device at exit. Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	3f85d50b60	NVMe: Check returns from nvme_alloc_queue() It can return NULL, so handle that. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	8e9f0e7115	NVMe: Remove 'node' from nvme_dev We don't keep a list of nvme_dev any more Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	51814232ec	NVMe: Read the model, serial & firmware rev from the controller Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	a53295b699	NVMe: Add NVME_IOCTL_SUBMIT_IO Allow userspace to submit synchronous I/O like the SCSI sg interface does. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:53 -04:00
Matthew Wilcox	7fc3cdabba	NVMe: Create nvme_map_user_pages() and nvme_unmap_user_pages() These are generalisations of the code that was in nvme_submit_user_admin_command(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:52 -04:00
Matthew Wilcox	bd38c5557c	NVMe: Change NVME_IOCTL_GET_RANGE_TYPE to return all the ranges Factor out most of nvme_identify() into a new nvme_submit_user_admin_command() function. Change nvme_get_range_type() to call it and change nvme_ioctl to realise that it's getting back all 64 ranges. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:52 -04:00
Matthew Wilcox	b8deb62cf2	NVMe: Zero the command before we send it Make sure there's no left-over bits set from previous commands that used this slot. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:52 -04:00
Matthew Wilcox	ff22b54fda	NVMe: Add nvme_setup_prps() Generalise the code from nvme_identify() that sets PRP1 & PRP2 so that it's usable for commands sent by nvme_submit_bio_queue(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:52 -04:00
Matthew Wilcox	36c14ed9ca	NVMe: Use PRP2 for the nvme_identify ioctl DMA the result straight to userspace instead of bounce-buffering in the kernel. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:52 -04:00
Matthew Wilcox	53c9577e9c	NVMe: Fix admin IRQ claim on real hardware The admin IRQ is supposed to use the pin-based (or single message MSI) interrupt. Accomplish this by filling in entry[0]'s vector with the INTx irq number. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Matthew Wilcox	821234603b	NVMe: Rename 'cycle' to 'phase' It's called the phase bit in the current draft Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Matthew Wilcox	1b23484bd0	NVMe: Implement per-CPU queues Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Matthew Wilcox	b3b06812e1	NVMe: Reduce set_queue_count arguments by one sq_count and cq_count are always the same, so just call it 'count'. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Matthew Wilcox	3001082cac	NVMe: Factor out queue_request_irq() Two callers with an almost identical long string of arguments, and introducing a third soon. Time to factor out the commonalities. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Matthew Wilcox	b60503ba43	NVMe: New driver This driver is for devices that follow the NVM Express standard Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2011-11-04 15:52:51 -04:00
Michael S. Tsirkin	5087a50e66	virtio-blk: use ida to allocate disk index Based on a patch by Mark Wu <dwu@redhat.com> Current index allocation in virtio-blk is based on a monotonically increasing variable "index". This means we'll run out of numbers after a while. It also could cause confusion about the disk name in the case of hot-plugging disks. Change virtio-blk to use ida to allocate index, instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-11-02 11:41:02 +10:30
Paul Gortmaker	0c8d44f239	block: Fix files that are modules and hence need module.h We want to remove the implicit everywhere presence of module.h so fix up the people relying on that implicit presence in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:13 -04:00
Paul Gortmaker	d5decd3b95	block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros These files were getting <linux/module.h> via an implicit include path, but we want to crush those out of existence since they cost time during compiles of processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:12 -04:00
Michael S. Tsirkin	a0eda62552	virtio-blk: use ida to allocate disk index Based on a patch by Mark Wu <dwu@redhat.com> Current index allocation in virtio-blk is based on a monotonically increasing variable "index". This means we'll run out of numbers after a while. It also could cause confusion about the disk name in the case of hot-plugging disks. Change virtio-blk to use ida to allocate index, instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-31 08:05:36 +01:00
Linus Torvalds	97d2eb13a0	Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix double-free of page vector ceph: fix 32-bit ino numbers libceph: force resend of osd requests if we skip an osdmap ceph: use kernel DNS resolver ceph: fix ceph_monc_init memory leak ceph: let the set_layout ioctl set single traits Revert "ceph: don't truncate dirty pages in invalidate work thread" ceph: replace leading spaces with tabs libceph: warn on msg allocation failures libceph: don't complain on msgpool alloc failures libceph: always preallocate mon connection libceph: create messenger with client ceph: document ioctls ceph: implement (optional) max read size ceph: rename rsize -> rasize ceph: make readpages fully async	2011-10-28 16:42:18 -07:00
David Vrabel	2d073846b8	block: xen-blkback: use API provided by xenbus module to map rings The xenbus module provides xenbus_map_ring_valloc() and xenbus_map_ring_vfree(). Use these to map the ring pages granted by the frontend. Acked-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-26 10:02:35 -04:00
Sage Weil	6ab00d465a	libceph: create messenger with client This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Linus Torvalds	59e5253417	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits) MAINTAINERS: linux-m32r is moderated for non-subscribers linux@lists.openrisc.net is moderated for non-subscribers Drop default from "DM365 codec select" choice parisc: Kconfig: cleanup Kernel page size default Kconfig: remove redundant CONFIG_ prefix on two symbols cris: remove arch/cris/arch-v32/lib/nand_init.S microblaze: add missing CONFIG_ prefixes h8300: drop puzzling Kconfig dependencies MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers tty: drop superfluous dependency in Kconfig ARM: mxc: fix Kconfig typo 'i.MX51' Fix file references in Kconfig files aic7xxx: fix Kconfig references to READMEs Fix file references in drivers/ide/ thinkpad_acpi: Fix printk typo 'bluestooth' bcmring: drop commented out line in Kconfig btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888' doc: raw1394: Trivial typo fix CIFS: Don't free volume_info->UNC until we are entirely done with it. treewide: Correct spelling of successfully in comments ...	2011-10-25 12:11:02 +02:00
Linus Torvalds	31018acd4c	Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m/debugfs: Make type_name more obvious. xen/p2m/debugfs: Fix potential pointer exception. xen/enlighten: Fix compile warnings and set cx to known value. xen/xenbus: Remove the unnecessary check. xen/irq: If we fail during msi_capability_init return proper error code. xen/events: Don't check the info for NULL as it is already done. xen/events: BUG() when we can't allocate our event->irq array. * 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Fix selfballooning and ensure it doesn't go too far xen/gntdev: Fix sleep-inside-spinlock xen: modify kernel mappings corresponding to granted pages xen: add an "highmem" parameter to alloc_xenballooned_pages xen/p2m: Use SetPagePrivate and its friends for M2P overrides. xen/p2m: Make debug/xen/mmu/p2m visible again. Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."	2011-10-25 09:17:47 +02:00
Jens Axboe	83157223de	Merge branch 'for-linus' into for-3.2/core	2011-10-24 16:24:38 +02:00
Mike Miller	ab5dbebe33	cciss: add small delay when using PCI Power Management to reset for kump The P600 requires a small delay when changing states. Otherwise we may think the board did not reset and we bail. This for kdump only and is particular to the P600. Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-20 22:21:52 +02:00
Jens Axboe	b8d8bdfe31	Merge branch 'stable/for-jens-3.2' of git://oss.oracle.com/git/kwilk/xen into for-3.2/drivers	2011-10-20 15:10:59 +02:00
Jens Axboe	5c04b426f2	Merge branch 'v3.1-rc10' into for-3.2/core Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-19 14:30:42 +02:00
Konrad Rzeszutek Wilk	6927d92091	xen/blkback: Fix two races in the handling of barrier requests. There are two windows of opportunity to cause a race when processing a barrier request. This patch fixes this. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-17 14:28:57 -04:00
Christoph Hellwig	456be1484f	loop: remove the incorrect write_begin/write_end shortcut Currently the loop device tries to call directly into write_begin/write_end instead of going through ->write if it can. This is a fairly nasty shortcut as write_begin and write_end are only callbacks for the generic write code and expect to be called with filesystem specific locks held. This code currently causes various issues for clustered filesystems as it doesn't take the required cluster locks, and it also causes issues for XFS as it doesn't properly lock against the swapext ioctl as called by the defragmentation tools. This in case causes data corruption if defragmentation hits a busy loop device in the wrong time window, as reported by RH QA. The reason why we have this shortcut is that it saves a data copy when doing a transformation on the loop device, which is the technical term for using cryptoloop (or an XOR transformation). Given that cryptoloop has been deprecated in favour of dm-crypt my opinion is that we should simply drop this shortcut instead of finding complicated ways to to introduce a formal interface for this shortcut. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-10-17 12:57:20 +02:00
Konrad Rzeszutek Wilk	dda1852802	xen/blkback: Check for proper operation. The patch titled: "xen/blkback: Fix the inhibition to map pages when discarding sector ranges." had the right idea except that it used the wrong comparison operator. It had == instead of !=. This fixes the bug where all (except discard) operations would have been ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-14 12:29:55 -04:00
Lars Ellenberg	3cb7a2a90f	drbd: get rid of drbd_bcast_ee, it is of no use anymore This function was used to broadcast the (leading part of the) bio payload in case we see a data integrity error. It could be received from userland with the drbdsetup events subcommand, to have a peek into the payload that caused the checksum mismatch, and guess from there what may have caused the mismatch, mainly to guess wether it was modification of in-flight data, or data corruption by broken hardware or software bugs. Meanwhile we support bios that are larger than the maximum payload a netlink datagram can carry. And we have means to reliably detect modification of in-flight data by calculating, and comparing, the checksum before and after sendmsg. There is no need to carry this around anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:08 +02:00
Lars Ellenberg	569083c08d	drbd: fix drbd_delete_device: remove vnr from volumes; idr_remove(); synchronize_rcu(); before cleanup Still missing: rcu_readlock() on the various call sites that access/iterate over those idrs. We don't need a specific write lock, as we only modify from configuration context, which is already strictly serialized. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:07 +02:00
Lars Ellenberg	da4a75d2ef	drbd: introduce a bio_set to allocate housekeeping bios from Don't rely on availability of bios from the global fs_bio_set, we should use our own bio_set for meta data IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:06 +02:00
Lars Ellenberg	9db4e77f8c	drbd: use the newly introduced page pool for bitmap IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:05 +02:00
Lars Ellenberg	35abf59424	drbd: add page pool to be used for meta data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:04 +02:00
Lars Ellenberg	3c13b680ce	drbd: only wakeup if something changed in update_peer_seq This commit got it wrong: drbd: Make the peer_seq updating code more obvious Make it more clear that update_peer_seq() is supposed to wake up the seq_wait queue whenever the sequence number changes. We don't need to wake up everytime we receive a sequence number that is _different_ from our currently stored "newest" sequence number, but only if we receive a sequence number _newer_ than what we already have, when we actually change mdev->peer_seq. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:04 +02:00
Lars Ellenberg	2c4a48d097	drbd: remove unused define Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:02 +02:00
Philipp Reisner	81a5d60ecf	drbd: Replaced the minor_table array by an idr Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:01 +02:00
Philipp Reisner	774b305518	drbd: Implemented new commands to create/delete connections/minors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:00 +02:00
Philipp Reisner	80883197da	drbd: Converted drbd_nl_(net_conf\|disconnect)() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:48:00 +02:00
Philipp Reisner	1aba4d7fcf	drbd: Preparing the connector interface to operator on connections Up to now it only operated on minor numbers. Now it can work also on named connections. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:59 +02:00
Philipp Reisner	2f5cdd0b2c	drbd: Converted the transfer log from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:58 +02:00
Philipp Reisner	49559d87fd	drbd: Improved the dec_() macros Now those can be used with a struct drbd_conf that has an other name than 'mdev'. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:57 +02:00
Philipp Reisner	3f9cbe937e	drbd: Removed the mdev parameter from the ..to_tags() and ...from_tags() functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:56 +02:00
Philipp Reisner	0e29d163f7	drbd: Reworked the unconfiguring and thread stopping code * Moved CONFIG_PENDING and DEVICE_DYING from mdev to tconn. * Renamed drbd_reconfig_start() and drbd_reconfig_done() to conn_reconfig_start() and conn_reconfig_done(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:55 +02:00
Andreas Gruenbacher	c66342d949	drbd: Remove left-over function prototypes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:55 +02:00
Andreas Gruenbacher	7201b972de	drbd: Replace get_asender_cmd() with its implementation Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:54 +02:00
Andreas Gruenbacher	6e849ce88c	drbd: Get rid of P_MAX_CMD Instead of artificially enlarging the command decoding arrays to P_MAX_CMD entries, check if an index is within the valid range using the ARRAY_SIZE() macro. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:53 +02:00
Andreas Gruenbacher	1b3bb47d52	drbd: Remove redundant check Opening a device only succeeds on a primary node, or when explicitly setting the allow_oos module parameter to allow opening the device read-only on a secondary node. There is no other way that a request can get into drbd_make_request(), so this code cannot trigger. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:52 +02:00
Andreas Gruenbacher	7be8da0798	drbd: Improve how conflicting writes are handled The previous algorithm for dealing with overlapping concurrent writes was generating unnecessary warnings for scenarios which could be legitimate, and did not always handle partially overlapping requests correctly. Improve it algorithm as follows: * While local or remote write requests are in progress, conflicting new local write requests will be delayed (commit 82172f7). * When a conflict between a local and remote write request is detected, the node with the discard flag decides how to resolve the conflict: It will ask its peer to discard conflicting requests which are fully contained in the local request and retry requests which overlap only partially. This involves a protocol change. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:51 +02:00
Andreas Gruenbacher	71b1c1eb9c	drbd: Use ping-timeout when waiting for missing ack packets When the node with the discard flag resolves write conflicts in dual-primary mode, it may determine that its peer has sent ack packets on the metadata socket which did not arrive, yet. Wait for the next ack with ping-timeout instead of a hard-coded 30 seconds. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:51 +02:00
Andreas Gruenbacher	8ccf218e9f	drbd: Replace atomic_add_return with atomic_inc_return Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:50 +02:00
Andreas Gruenbacher	206d358941	drbd: Concurrent write detection fix Commit 9b1e63e changed the concurrent write detection algorithm to only insert peer requests into write_requests tree after determining that there is no conflict. With this change, new conflicting local requests could be added while the algorithm runs, but this case was not handled correctly. Instead of making the algorithm deal with this case, switch back to adding peer requests to the write_requests tree immediately: this improves fairness. When a peer request is discarded, remove that request from the write_requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:49 +02:00
Andreas Gruenbacher	8050e6d005	drbd: Use container_of() instead of casting Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:48 +02:00
Lars Ellenberg	9676c76097	drbd: fix a wrong likely(), updated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:47 +02:00
Lars Ellenberg	c9d963a46d	drbd: silence some log messages on bitmap IO Summary log messages meant for global bitmap IO should not be printed for bitmap IO caused by activity log transactions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:47 +02:00
Lars Ellenberg	7ad651b522	drbd: new on-disk activity log transaction format Use a new on-disk transaction format for the activity log, which allows for multiple changes to the active set per transaction. Using 4k transaction blocks, we can now get rid of the work-around code to deal with devices not supporting 512 byte logical block size. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:46 +02:00
Lars Ellenberg	46a15bc3ec	lru_cache: allow multiple changes per transaction Allow multiple changes to the active set of elements in lru_cache. The only current user of lru_cache, drbd, is driving this generalisation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:45 +02:00
Lars Ellenberg	45dfffebd0	drbd: allow to select specific bitmap pages for writeout We are about to allow several changes to the active set in one activity log transaction. We have to write out the corresponding bitmap pages as well, if changed. Introduce drbd_bm_mark_for_writeout(), then re-use the existing bitmap writeout path to submit all marked pages in one go. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:44 +02:00
Lars Ellenberg	4738fa1690	drbd: use clear_bit_unlock() where appropriate Some open-coded clear_bit(); smp_mb__after_clear_bit(); should in fact have been smp_mb__before_clear_bit(); clear_bit(); Instead, use clear_bit_unlock() to annotate the intention, and have it do the right thing. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:42 +02:00
Lars Ellenberg	61610420f7	drbd: in drbd_suspend_al, set AL_SUSPENDED before unlocking the activity log As using an empty activity log is the whole point of the excercise, make sure it is still empty when setting AL_SUSPENDED. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:41 +02:00
Lars Ellenberg	867f57483b	drbd: fix typo in comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:40 +02:00
Lars Ellenberg	8c387def58	drbd: simplify condition in drbd_may_do_local_read() fold if (x >= (N+1)) return 0; if (x < N) return 0; into if (x != N) return 0; Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:39 +02:00
Andreas Gruenbacher	c670a39867	drbd: Use the IS_ALIGNED() macro in some more places Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:39 +02:00
Andreas Gruenbacher	8ca9844f10	drbd: Remove obsolete comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:38 +02:00
Andreas Gruenbacher	d0e22a260c	drbd: Iterate over all overlapping intervals in a tree Add a macro and helper function for doing that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:37 +02:00
Andreas Gruenbacher	fcefa62e4c	drbd: Rename drbd_endio_{pri,sec} -> drbd_{,peer_}request_endio Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:36 +02:00
Andreas Gruenbacher	fbe29dec98	drbd: Rename drbd_submit_ee -> drbd_submit_peer_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:35 +02:00
Philipp Reisner	df24aa45f4	drbd: Implemented connection wide state changes That is used for graceful disconnect only Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:47:32 +02:00
Philipp Reisner	047cd4a682	drbd: implemented receiving of P_CONN_ST_CHG_REQ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:05 +02:00
Philipp Reisner	fc3b10a45f	drbd: Implemented receiving of P_CONN_ST_CHG_REPLY Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:04 +02:00
Philipp Reisner	5aabf467e3	drbd: Global_state_lock not necessary here... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:03 +02:00
Philipp Reisner	cf29c9d8c8	drbd: Implemented conn_send_state_req() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:02 +02:00
Philipp Reisner	8410da8f0e	drbd: Introduced tconn->cstate_mutex In compatibility mode with old DRBDs, use that as the state_mutex as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:01 +02:00
Philipp Reisner	dad2055481	drbd: Removed drbd_state_lock() and drbd_state_unlock() The lock they constructed is only taken when the state_mutex was already taken. It is superficial. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:45:01 +02:00
Philipp Reisner	bbeb641c3e	drbd: Killed volume0; last step of multi-volume-enablement Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-10-14 16:44:58 +02:00
Konrad Rzeszutek Wilk	64391b2536	xen/blkback: Fix the inhibition to map pages when discarding sector ranges. The 'operation' parameters are the ones provided to the bio layer while the req->operation are the ones passed in between the backend and frontend. We used the wrong 'operation' value to squash the call to map pages when processing the discard operation resulting in an hypercall that did nothing. Lets guard against going in the mapping function by checking for the proper operation type. CC: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:38 -04:00
Konrad Rzeszutek Wilk	5c62cb4860	xen/blkback: Report VBD_WSECT (wr_sect) properly. We did not increment the amount of sectors written to disk b/c we tested for the == WRITE which is incorrect - as the operations are more of WRITE_FLUSH, WRITE_ODIRECT. This patch fixes it by doing a & WRITE check. CC: stable@kernel.org Reported-by: Andy Burns <xen.lists@burns.me.uk> Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:37 -04:00
Konrad Rzeszutek Wilk	29bde09378	xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. We emulate the barrier requests by draining the outstanding bio's and then sending the WRITE_FLUSH command. To drain the I/Os we use the refcnt that is used during disconnect to wait for all the I/Os before disconnecting from the frontend. We latch on its value and if it reaches either the threshold for disconnect or when there are no more outstanding I/Os, then we have drained all I/Os. Suggested-by: Christopher Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:36 -04:00
Laszlo Ersek	469738e675	xen-blkfront: plug device number leak in xlblk_init() error path ... though after a failed xenbus_register_frontend() all may be lost. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:35 -04:00
Konrad Rzeszutek Wilk	d11e615830	xen-blkfront: If no barrier or flush is supported, use invalid operation. Guard against issuing BLKIF_OP_WRITE_BARRIER or BLKIF_OP_FLUSH_CACHE by checking whether we successfully negotiated with the backend. The negotiation with the backend also sets the q->flush_flags which fortunately for us is also used when submitting an bio to us. If we don't support barriers or flushes it would be set to zero so we should never end up having to deal with REQ_FLUSH \| REQ_FUA. However, other third party implementations of __make_request that might be stacked on top of us might not be so smart, so lets fix this up. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:35 -04:00
Jan Beulich	8e6dc6fe51	xen-blkback: use kzalloc() in favor of kmalloc()+memset() This fixes the problem of three of those four memset()-s having improper size arguments passed: Sizeof a pointer-typed expression returns the size of the pointer, not that of the pointed to data. It also reverts using kmalloc() instead of kzalloc() for the allocation of the pending grant handles array, as that array gets fully initialized in a subsequent loop. Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:34 -04:00
Joe Jin	c555aab97d	xen-blkback: fixed indentation and comments This patch fixes belows: 1. Fix code style issue. 2. Fix incorrect functions name in comments. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:33 -04:00
Li Dongyang	69ef68cef9	xen-blkfront: fix a deadlock while handling discard response When we get -EOPNOTSUPP response for a discard request, we will clear the discard flag on the request queue so we won't attempt to send discard requests to backend again, and this should be protected under rq->queue_lock. However, when we setup the request queue, we pass blkif_io_lock to blk_init_queue so rq->queue_lock is blkif_io_lock indeed, and this lock is already taken when we are in blkif_interrpt, so remove the spin_lock/spin_unlock when we clear the discard flag or we will end up with deadlock here Signed-off-by: Li Dongyang <lidongyang@novell.com> [v1: Updated description a bit and removed comment from source] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:32 -04:00
Li Dongyang	ed30bf317c	xen-blkfront: Handle discard requests. If the backend advertises 'feature-discard', then interrogate the backend for alignment and granularity. Setup the request queue with the appropiate values and send the discard operation as required. Signed-off-by: Li Dongyang <lidongyang@novell.com> [v1: Amended commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:31 -04:00
Li Dongyang	b3cb0d6adc	xen-blkback: Implement discard requests ('feature-discard') ..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend and used as appropiately by the backend. We also advertise certain granulity parameters to the frontend so it can plug them in. If the backend is a realy device - we just end up using 'blkdev_issue_discard' while for loopback devices - we just punch a hole in the image file. Signed-off-by: Li Dongyang <lidongyang@novell.com> [v1: Fixed up pr_debug and commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-10-13 09:48:30 -04:00
Stefano Stabellini	0930bba674	xen: modify kernel mappings corresponding to granted pages If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-09-29 10:32:58 -04:00
Philipp Reisner	56707f9e87	drbd: Code de-duplication; new function apply_mask_val() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:22 +02:00
Philipp Reisner	4308a0a390	drbd: Removed the os parameter form sanitize_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:21 +02:00
Philipp Reisner	fda74117dc	drbd: Extracted is_valid_conn_transition() out of is_valid_transition() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:20 +02:00
Philipp Reisner	3509502dc8	drbd: Extracted is_valid_transition() out of sanitize_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:19 +02:00
Philipp Reisner	a75f34ad0c	drbd: Renamed is_valid_state_transition() to is_valid_soft_transition() And removed the unused mdev parameter, and made the order of the state parameters: os, ns Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:18 +02:00
Philipp Reisner	d50eee21c4	drbd: Extracted after_conn_state_ch() out of after_state_ch() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:17 +02:00
Philipp Reisner	2a67d8b93b	drbd: Converted drbd_send_ping() and related functions from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:16 +02:00
Philipp Reisner	00d56944ff	drbd: Generalized the work callbacks No longer work callbacks must operate on a mdev. From now on they can also operate on a tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:15 +02:00
Philipp Reisner	6699b65533	drbd: Moved some initializing code into drbd_new_tconn() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:14 +02:00
Philipp Reisner	392c880192	drbd: drbd_thread has now a pointer to a tconn instead of to a mdev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:13 +02:00
Philipp Reisner	19393e105f	drbd: Converted drbd_worker() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:12 +02:00
Philipp Reisner	32862ec705	drbd: Converted drbd_asender() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:11 +02:00
Philipp Reisner	4d641dd7b0	drbd: Converted drbdd_init() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:10 +02:00
Philipp Reisner	f1b3a6ec7d	drbd: Consolidated the setup of the thread name into the framework Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:09 +02:00
Philipp Reisner	a21e929827	drbd: Moved the mdev member into drbd_work (from drbd_request and drbd_peer_request) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:08 +02:00
Philipp Reisner	360cc74052	drbd: Converted drbd_free_sock() and drbd_disconnect() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:06 +02:00
Philipp Reisner	eefc2f7de2	drbd: Converted drbdd() from mdev to tconn The drbd_md_sync(mdev) happens in the after state change anyways... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:05 +02:00
Philipp Reisner	808222845d	drbd: Converted drbd_calc_cpu_mask() and drbd_thread_current_set_cpu() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:04 +02:00
Philipp Reisner	907599e044	drbd: Converted drbd_connect() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:03 +02:00
Philipp Reisner	062e879c8b	drbd: Use and idr data structure to map volume numbers to mdev pointers Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:02 +02:00
Philipp Reisner	dc8228d107	drbd: Converted drbd_send_protocol() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:01 +02:00
Philipp Reisner	13e6037dc9	drbd: Converted drbd_do_auth() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:33:00 +02:00
Philipp Reisner	611208706f	drbd: Converted drbd_(get\|put)_data_sock() and drbd_send_cmd2() to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:32:59 +02:00
Philipp Reisner	65d11ed6f2	drbd: Converted drbd_do_handshake() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:32:58 +02:00
Philipp Reisner	9ba7aa00ae	drbd: Converted drbd_recv_header() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:32:57 +02:00
Philipp Reisner	ce24385342	drbd: Converted decode_header() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:32:56 +02:00
Philipp Reisner	77351055b5	drbd: struct packet_info to hold information of decoded packets Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:32:50 +02:00
Philipp Reisner	de0ff338d6	drbd: Converted drbd_recv() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:29:52 +02:00
Philipp Reisner	8a22cccc20	drbd: Converted drbd_send_handshake() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:29:50 +02:00
Philipp Reisner	a25b63f1e7	drbd: Converted drbd_recv_fp() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:29:49 +02:00
Philipp Reisner	dbd9eea094	drbd: Removed unused mdev argument from drbd_recv_short() and drbd_socket_okay() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:29:48 +02:00
Philipp Reisner	d38e787ecc	drbd: Converted drbd_send_fp() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:29:45 +02:00
Philipp Reisner	bedbd2a53a	drbd: Converted drbd_send() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:27:00 +02:00
Philipp Reisner	1a7ba646e9	drbd: Converted helper functions for drbd_send() to tconn * drbd_update_congested() * we_should_drop_the_connection() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:59 +02:00
Philipp Reisner	0625ac190d	drbd: Converted wake_asender() and request_ping() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:58 +02:00
Philipp Reisner	808e37b803	drbd: Moved SIGNAL_ASENDER to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:57 +02:00
Philipp Reisner	e43ef195f8	drbd: Moved SEND_PING to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:56 +02:00
Philipp Reisner	25703f8320	drbd: Moved DISCARD_CONCURRENT to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:55 +02:00
Philipp Reisner	01a311a589	drbd: Started to separated connection flags (tconn) from block device flags (mdev) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:54 +02:00
Philipp Reisner	7653620de3	drbd: Converted drbd_wait_for_connect() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:53 +02:00
Philipp Reisner	eac3e990e4	drbd: Converted drbd_try_connect() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:52 +02:00
Philipp Reisner	60ae496626	drbd: conn_printk() a dev_printk() alike for drbd's connections Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:50 +02:00
Philipp Reisner	b53339fce2	drbd: Moving state related macros to drbd_state.h Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:49 +02:00
Philipp Reisner	8ea62f5464	drbd: Revert "Make sure we dont send state if a cluster wide state change is in progress" This reverts commit 6e9fdc92b77915d5c7ab8fea751f48378f8b0080. 1) This did not fixed the issue 2) Long sleeping work items can cause IO requests to take as long as the longest work item Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:48 +02:00
Philipp Reisner	e64a329459	drbd: Do no sleep long in drbd_start_resync Work items that sleep too long can cause requests to take as long as the longest sleeping work item. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:47 +02:00
Philipp Reisner	1f04af33fe	drbd: Moved code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:46 +02:00
Philipp Reisner	bc31fe3352	drbd: Eliminated the user of drbd_task_to_thread() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:45 +02:00
Philipp Reisner	bed879ae90	drbd: Moved the thread name into the data structure Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:44 +02:00
Philipp Reisner	b890733953	drbd: Moved the state functions into its own source file Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:43 +02:00
Andreas Gruenbacher	db830c464b	drbd: Local variable renames: e -> peer_req Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:42 +02:00
Andreas Gruenbacher	6c852beca1	drbd: Update some comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:41 +02:00
Andreas Gruenbacher	18b75d756b	drbd: Clean up some left-overs Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:40 +02:00
Andreas Gruenbacher	f6ffca9f42	drbd: Rename struct drbd_epoch_entry to struct drbd_peer_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:39 +02:00
Andreas Gruenbacher	c6f7df42c9	drbd: Remove unused variable in struct drbd_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:37 +02:00
Andreas Gruenbacher	70b1987663	drbd: Improve the drbd_find_overlap() documentation Describe how to reach any further overlapping intervals from the first overlap found. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:36 +02:00
Andreas Gruenbacher	43ae077d0a	drbd: Make the peer_seq updating code more obvious Make it more clear that update_peer_seq() is supposed to wake up the seq_wait queue whenever the sequence number changes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:35 +02:00
Andreas Gruenbacher	6024fece73	drbd: Defer new writes when detecting conflicting writes Before submitting a new local write request, wait for any conflicting local or remote requests to complete. We could assume that the new request occurred first and that the conflicting requests overwrote it (and therefore discard the new reques), but we know for sure that the new request occurred after the conflicting requests and so this behavior would we weird. We would also end up with the wrong result if the new request is not fully contained within the conflicting requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:34 +02:00
Andreas Gruenbacher	ddd8877d31	drbd: Remove unnecessary reference counting left-over Nothing in this function accesses mdev->tconn->net_conf, so there is no need for get_net_conf() / put_net_conf() anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:33 +02:00
Andreas Gruenbacher	5e4722645a	drbd: _req_conflicts(): Get rid of the epoch_entries tree Instead of keeping a separate tree for local and remote write requests for finding requests and for conflict detection, use the same tree for both purposes. Introduce a flag to allow distinguishing the two possible types of entries in this tree. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:32 +02:00
Andreas Gruenbacher	53840641bb	drbd: Allow to wait for the completion of an epoch entry as well Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:31 +02:00
Andreas Gruenbacher	3e05146f0a	drbd: Remove redundant check from drbd_contains_interval() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:30 +02:00
Andreas Gruenbacher	a500c2efbb	drbd: struct drbd_request: Introduce a new collision flag This flag is set when a processes puts itself to sleep to wait for a conflicting request to complete. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:29 +02:00
Andreas Gruenbacher	9e204cddaf	drbd: Move some functions to where they are used Move drbd_update_congested() to drbd_main.c, and drbd_req_new() and drbd_req_free() to drbd_req.c: those functions are not used anywhere else. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:28 +02:00
Andreas Gruenbacher	3e394da184	drbd: Move sequence number logic into drbd_receiver.c and simplify it These things are only used there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:27 +02:00
Andreas Gruenbacher	cc378270e4	drbd: Initialize the sequence number sent over the network even when not used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:26 +02:00
Andreas Gruenbacher	bdc7adb006	drbd: Remove redundant initialization packet_seq is initialized by both sides of a connection in drbd_connect(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:25 +02:00
Andreas Gruenbacher	d876302306	drbd: Rename "enum drbd_packets" to "enum drbd_packet" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:24 +02:00
Andreas Gruenbacher	f2ad906379	drbd: Move cmdname() out of drbd_int.h There is no good reason for cmdname() to be an inline function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:23 +02:00
Philipp Reisner	b42a70ad32	drbd: Do not access tconn after it was freed Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:22 +02:00
Philipp Reisner	257d0af689	drbd: Implemented receiving of new style packets on meta socket Now drbd communication with protocol 100 actually works. Replaced the remaining p_header80 with p_header where we no longer know which header it is. In the places where p_header80 is still in use, it is on purpose, because we know that it is an old style header there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:26:18 +02:00
Philipp Reisner	fd340c12c9	drbd: Use new header layout The new header layout will only be used if the peer supports it of course. For the first packet and the handshake packet the old (h80) layout is used for compatibility reasons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-09-28 10:23:03 +02:00
Carsten Emde	6c4867f646	floppy: use del_timer_sync() in init cleanup When no floppy is found the module code can be released while a timer function is pending or about to be executed. CPU0 CPU1 floppy_init() timer_softirq() spin_lock_irq(&base->lock); detach_timer(); spin_unlock_irq(&base->lock); -> Interrupt del_timer(); return -ENODEV; module_cleanup(); <- EOI call_timer_fn(); OOPS Use del_timer_sync() to prevent this. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-09-21 10:22:11 +02:00
Ayan George	4c823cc3d5	drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() If the loop device is associated (lo->lo_state == Lo_bound), it will have a valid bdev pointed to by lo->lo_device. There is no reason to ever pass an additional block_device pointer. Signed-off-by: Ayan George <ayan.george@canonical.com> Cc: Phillip Susi <psusi@cfl.rr.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-09-21 10:02:13 +02:00
Phillip Susi	8a9c594422	drivers/block/loop.c: emit uevent on auto release The loopback driver failed to emit the change uevent when auto releasing the device. Fixed lo_release() to pass the bdev to loop_clr_fd() so it can emit the event. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ayan George <ayan@ayan.net> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-09-21 10:02:13 +02:00
Sergei Shtylyov	5a3a76e6c3	drivers/block/cpqarray.c: use pci_dev->revision This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it wasn't converted by commit `44c10138fd` ("PCI: Change all drivers to use pci_device->revision"). Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Mike Miller <mike.miller@hp.com> Cc: Chirag Kantharia <chirag.kantharia@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2011-09-21 10:02:13 +02:00
Jiri Kosina	e060c38434	Merge branch 'master' into for-next Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.	2011-09-15 15:08:18 +02:00
Jesper Juhl	e5de063016	Remove unneeded version.h includes from drivers/block/ It was pointed out by 'make versioncheck' that some includes of linux/version.h are not needed in drivers/block/. This patch removes them. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:57:06 +02:00
Justin P. Mattock	699324871f	treewide: remove extra semicolons from various parts of the kernel This is a resend from the original, changing the title from PATCH to RFC(since this is a review for commit, and I should have put that the first go around). and also removing some of the commit's with ia64 and bash since it is significant. let me know if I might have missed anything etc.. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:50:49 +02:00
Joe Perches	1d273b929c	drbd: Use angle brackets for system includes Use the normal include style. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:02:57 +02:00
Joe Perches	57f3224c3f	drbd: Convert vmalloc/memset to vzalloc Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 13:55:02 +02:00
Christoph Hellwig	5a7bbad27a	block: remove support for bio remapping from ->make_request There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-09-12 12:12:01 +02:00
Philipp Reisner	c012949a40	drbd: Replaced all p_header80 with a generic p_header Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:30:26 +02:00
Philipp Reisner	c6d25cfe52	drbd: Preparing to use p_header96 for all packets recv_bm_rle_bits() should not make any assumptions abou the layout of the packet header Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:30:25 +02:00
Philipp Reisner	191d3cc8d9	drbd: Made drbd_flush_workqueue() to take a tconn instead of an mdev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:30:24 +02:00
Philipp Reisner	a0638456c6	drbd: moved crypto transformations and friends from mdev to tconn sed -i \ -e 's/mdev->cram_hmac_tfm/mdev->tconn->cram_hmac_tfm/g' \ -e 's/mdev->integrity_w_tfm/mdev->tconn->integrity_w_tfm/g' \ -e 's/mdev->integrity_r_tfm/mdev->tconn->integrity_r_tfm/g' \ -e 's/mdev->int_dig_out/mdev->tconn->int_dig_out/g' \ -e 's/mdev->int_dig_in/mdev->tconn->int_dig_in/g' \ -e 's/mdev->int_dig_vv/mdev->tconn->int_dig_vv/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:30:23 +02:00
Philipp Reisner	87eeee41f8	drbd: moved req_lock and transfer log from mdev to tconn sed -i \ -e 's/mdev->req_lock/mdev->tconn->req_lock/g' \ -e 's/mdev->unused_spare_tle/mdev->tconn->unused_spare_tle/g' \ -e 's/mdev->newest_tle/mdev->tconn->newest_tle/g' \ -e 's/mdev->oldest_tle/mdev->tconn->oldest_tle/g' \ -e 's/mdev->out_of_sequence_requests/mdev->tconn->out_of_sequence_requests/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:30:15 +02:00
Philipp Reisner	31890f4ab2	drbd: moved agreed_pro_version, last_received and ko_count to tconn sed -i \ -e 's/mdev->agreed_pro_version/mdev->tconn->agreed_pro_version/g' \ -e 's/mdev->last_received/mdev->tconn->last_received/g' \ -e 's/mdev->ko_count/mdev->tconn->ko_count/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:07 +02:00
Philipp Reisner	e6b3ea83bc	drbd: moved receiver, worker and asender from mdev to tconn Patch mostly: sed -i -e 's/mdev->receiver/mdev->tconn->receiver/g' \ -e 's/mdev->worker/mdev->tconn->worker/g' \ -e 's/mdev->asender/mdev->tconn->asender/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:06 +02:00
Philipp Reisner	e42325a576	drbd: moved data and meta from mdev to tconn Patch mostly: sed -i -e 's/mdev->data/mdev->tconn->data/g' \ -e 's/mdev->meta/mdev->tconn->meta/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:05 +02:00
Philipp Reisner	b2fb6dbe52	drbd: moved net_cont and net_cnt_wait from mdev to tconn Patch partly generated by: sed -i -e 's/get_net_conf(mdev)/get_net_conf(mdev->tconn)/g' \ -e 's/put_net_conf(mdev)/put_net_conf(mdev->tconn)/g' \ -e 's/get_net_conf(odev)/get_net_conf(odev->tconn)/g' \ -e 's/put_net_conf(odev)/put_net_conf(odev->tconn)/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:04 +02:00
Philipp Reisner	89e58e755e	drbd: moved net_conf from mdev to tconn Besides moving the struct member, everything else is generated by: sed -i -e 's/mdev->net_conf/mdev->tconn->net_conf/g' \ -e 's/odev->net_conf/odev->tconn->net_conf/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:03 +02:00
Philipp Reisner	2111438b30	drbd: Minimal struct drbd_tconn Starting to dissolve the network connection from the actual block devices. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:02 +02:00
Andreas Gruenbacher	6618bf1638	drbd: Interval tree bugfix Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:00 +02:00
Andreas Gruenbacher	e3cfa7b26a	drbd: Inline function overlaps() is now unused Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:59 +02:00
Andreas Gruenbacher	70dc65e1b3	drbd: Remove some useless paranoia code The open_cnt check is an open-coded D_ASSERT() check. In case the data.work queue is not empty, it does not really help to know which drbd_work elements remained on that list: they will be freed immediately afterwards, anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:58 +02:00
Andreas Gruenbacher	841ce241fa	drbd: Replace the ERR_IF macro with an assert-like macro Remove the file name and line number from the syslog messages generated: we have no duplicate function names, and no function contains the same assertion more than once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:57 +02:00
Andreas Gruenbacher	e77a0a5cc1	drbd: Convert all constants in enum drbd_thread_state to upper case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:56 +02:00
Andreas Gruenbacher	8554df1c6d	drbd: Convert all constants in enum drbd_req_event to upper case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:55 +02:00
Andreas Gruenbacher	bb3bfe9614	drbd: Remove the unused hash tables Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:54 +02:00
Andreas Gruenbacher	8b946255f8	drbd: Use interval tree for overlapping epoch entry detection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:53 +02:00
Andreas Gruenbacher	010f6e678f	drbd: Put sector and size in struct drbd_epoch_entry into struct drbd_interval Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:52 +02:00
Andreas Gruenbacher	bc9c5c4118	drbd: Use the read and write request trees for request lookups Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:50 +02:00
Andreas Gruenbacher	dac1389ccc	drbd: Add read_requests tree We do not do collision detection for read requests, but we still need to look up the request objects when we receive a package over the network. Using the same data structure for read and write requests results in simpler code once the tl_hash and app_reads_hash tables are removed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:31 +02:00
Andreas Gruenbacher	de696716e8	drbd: Use interval tree for overlapping write request detection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:06 +02:00
Andreas Gruenbacher	ace652acf2	drbd: Put sector and size in struct drbd_request into struct drbd_interval Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:05 +02:00
Andreas Gruenbacher	0939b0e5cd	drbd: Add interval tree data structure Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:04 +02:00
Andreas Gruenbacher	c3afd8f568	drbd: Request lookup code cleanup (4) Factor out duplicate code in got_NegAck(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:03 +02:00
Andreas Gruenbacher	ae3388daae	drbd: Request lookup code cleanup (3) Get rid of the ar_id_to_req() and ack_id_to_req() wrappers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:02 +02:00
Andreas Gruenbacher	668eebc6a1	drbd: Request lookup code cleanup (2) Unify the ar_id_to_req() and ack_id_to_req() functions: make both fail if the consistency check fails. Move the request lookup code now duplicated in both functions into its own function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:01 +02:00
Andreas Gruenbacher	5162458564	drbd: Request lookup code cleanup (1) Move _ar_id_to_req() to drbd_receiver.c and mark it non-inline. Remove the leading underscores from _ar_id_to_req() and _ack_id_to_req(). Mark ar_hash_slot() inline. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:00 +02:00
Andreas Gruenbacher	9c50842a35	drbd: Update outdated comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:59 +02:00
Andreas Gruenbacher	d628769b3c	drbd: Move drbd_free_tl_hash() to drbd_main() This is the only place where this function is used. Make it static. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:58 +02:00
Andreas Gruenbacher	579b57ed73	drbd: Magic reserved block_id value cleanup The ID_VACANT definition has become entirely irrelevant by now. The is_syncer_block_id() macro does not improve the code, so eliminated it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:57 +02:00
Andreas Gruenbacher	e7fad8af75	drbd: Endianness convert the constants instead of the variables Converting the constants happens at compile time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:57 +02:00
Andreas Gruenbacher	ca9bc12b90	drbd: Get rid of BE_DRBD_MAGIC and BE_DRBD_MAGIC_BIG Converting the constants happens at compile time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:56 +02:00
Andreas Gruenbacher	9a8e77530f	drbd: Consistently use block_id == ID_SYNCER for checksum based resync and online verify DRBD_MAGIC has nothing to do with block ids and the funny values computed were not actually used, anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:55 +02:00
Andreas Gruenbacher	3980485361	drbd: Remove superfluous declaration Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:43 +02:00
Andreas Gruenbacher	28c455ceb2	drbd: Get rid of req_validator_fn typedef Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:31 +02:00
Kay Sievers	e03c8dd149	loop: always allow userspace partitions and optionally support automatic scanning Automatic partition scanning can be requested individually per loop device during its setup by setting LO_FLAGS_PARTSCAN. By default, no partition tables are scanned. Userspace can now always add and remove partitions from all loop devices, regardless if the in-kernel partition scanner is enabled or not. The needed partition minor numbers are allocated from the extended minors space, the main loop device numbers will continue to match the loop minors, regardless of the number of partitions used. # grep . /sys/class/block/loop1/loop/* /sys/block/loop1/loop/autoclear:0 /sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img /sys/block/loop1/loop/offset:0 /sys/block/loop1/loop/partscan:1 /sys/block/loop1/loop/sizelimit:0 # ls -l /dev/loop* brw-rw---- 1 root disk 7, 0 Aug 14 20:22 /dev/loop0 brw-rw---- 1 root disk 7, 1 Aug 14 20:23 /dev/loop1 brw-rw---- 1 root disk 259, 0 Aug 14 20:23 /dev/loop1p1 brw-rw---- 1 root disk 259, 1 Aug 14 20:23 /dev/loop1p2 brw-rw---- 1 root disk 7, 99 Aug 14 20:23 /dev/loop99 brw-rw---- 1 root disk 259, 2 Aug 14 20:23 /dev/loop99p1 brw-rw---- 1 root disk 259, 3 Aug 14 20:23 /dev/loop99p2 crw------T 1 root root 10, 237 Aug 14 20:22 /dev/loop-control Cc: Karel Zak <kzak@redhat.com> Cc: Davidlohr Bueso <dave@gnu.org> Acked-By: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-23 20:12:04 +02:00
Jens Axboe	89c63a8ef3	Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2011-08-23 15:09:13 +02:00
Joe Jin	1bc05b0ae6	xen-blkback: fixed indentation and comments This patch fixes belows: 1. Fix code style issue. 2. Fix incorrect functions name in comments. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-08-22 11:35:36 -04:00
Joe Jin	6f5986bce5	xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed. When do block-attach/block-detach test with below steps, umount hangs in the guest. Furthermore shutdown ends up being stuck when umounting file-systems. 1. start guest. 2. attach new block device by xm block-attach in Dom0. 3. mount new disk in guest. 4. execute xm block-detach to detach the block device in dom0 until timeout 5. Any request to the disk will hung. Root cause: This issue is caused when setting backend device's state to 'XenbusStateClosing', which sends to the frontend the XenbusStateClosing notification. When frontend receives the notification it tries to release the disk in blkfront_closing(), but at that moment the disk is still in use by guest, so frontend refuses to close. Specifically it sets the disk state to XenbusStateClosing and sends the notification to backend - when backend receives the event, it disconnects the vbd from real device, and sets the vbd device state to XenbusStateClosing. The backend disconnects the real device/file, and any IO requests to the disk in guest will end up in ether, leaving disk DEAD and set to XenbusStateClosing. When the guest wants to disconnect the disk, umount will hang on blkif_release()->xlvbd_release_gendisk() as it is unable to send any IO to the disk, which prevents clean system shutdown. Solution: Don't disconnect backend until frontend state switched to XenbusStateClosed. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Daniel Stodden <daniel.stodden@citrix.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Annie Li <annie.li@oracle.com> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> [v1: Modified description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-08-22 11:35:35 -04:00
Lukas Czerner	dfaa2ef68e	loop: add discard support for loop devices This commit adds discard support for loop devices. Discard is usually supported by SSD and thinly provisioned devices as a method for reclaiming unused space. This is no different than trying to reclaim back space which is not used by the file system on the image, but it still occupies space on the host file system. We can do the reclamation on file system which does support hole punching. So when discard request gets to the loop driver we can translate that to punch a hole to the underlying file, hence reclaim the free space. This is very useful for trimming down the size of the image to only what is really used by the file system on that image. Fstrim may be used for that purpose. It has been tested on ext4, xfs and btrfs with the image file systems ext4, ext3, xfs and btrfs. ext4, or ext6 image on ext4 file system has some problems but it seems that ext4 punch hole implementation is somewhat flawed and it is unrelated to this commit. Also this is a very good method of validating file systems punch hole implementation. Note that when encryption is used, discard support is disabled, because using it might leak some information useful for possible attacker. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:50:46 +02:00
Andrew Morton	548ef6cc26	nbd-replace-some-printk-with-dev_warn-and-dev_info-checkpatch-fixes ERROR: code indent should use tabs where possible #30: FILE: drivers/block/nbd.c:578: +^I dev_info(disk_to_dev(lo->disk), "NBD_DISCONNECT\n");$ total: 1 errors, 0 warnings, 35 lines checked NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or scripts/cleanfile ./patches/nbd-replace-some-printk-with-dev_warn-and-dev_info.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: WANG Cong <amwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:28 +02:00
WANG Cong	5eedf5415c	nbd: replace some printk with dev_warn() and dev_info() Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:28 +02:00
WANG Cong	7742ce4ab4	nbd: lower the loglevel of an error message This is only an error, no need to use KERN_CRIT log level. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:28 +02:00
WANG Cong	7f1b90f99a	nbd: replace printk KERN_ERR with dev_err() Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:22 +02:00
WANG Cong	1695b87f7d	nbd: replace sysfs_create_file() with device_create_file() Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:21 +02:00
WANG Cong	25ac0c2b97	nbd: use task_pid_nr() to get current pid Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-19 14:48:17 +02:00
Jens Axboe	40bb96ade4	Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2011-08-09 20:43:26 +02:00
Konrad Rzeszutek Wilk	ea5e116162	xen/blkback: Make description more obvious. With the frontend having Xen but the backend not, it just looks odd: <> Xen virtual block device support <> Block-device backend driver Fix it to have the 'Xen' in front of it. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-08-09 11:12:14 -04:00
Joe Handzik	f963d270cb	cciss: add transport mode attribute to sys Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com> Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-08 11:40:17 +02:00
Joseph Handzik	1304953700	cciss: Adds simple mode functionality Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com> Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-08 11:40:15 +02:00
Axel Lin	f41c53a569	block: swim3: fix unterminated of_device_id table of_device_id structures need a NULL terminating entry, add it. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-03 15:02:55 +02:00
H Hartley Sweeten	ddad9ef582	drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-02 12:43:49 +02:00
Kay Sievers	05eb0f252b	loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD waits for show() to finish to remove the sysfs file. cat /sys/class/block/loop0/loop/backing_file mutex_lock_nested+0x176/0x350 ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] loop_attr_do_show_backing_file+0x2f/0xd0 [loop] dev_attr_show+0x1b/0x60 ? sysfs_read_file+0x86/0x1a0 ? __get_free_pages+0x12/0x50 sysfs_read_file+0xaf/0x1a0 ioctl(LOOP_CLR_FD): wait_for_common+0x12c/0x180 ? try_to_wake_up+0x2a0/0x2a0 wait_for_completion+0x18/0x20 sysfs_deactivate+0x178/0x180 ? sysfs_addrm_finish+0x43/0x70 ? sysfs_addrm_start+0x1d/0x20 sysfs_addrm_finish+0x43/0x70 sysfs_hash_and_remove+0x85/0xa0 sysfs_remove_group+0x59/0x100 loop_clr_fd+0x1dc/0x3f0 [loop] lo_ioctl+0x223/0x7a0 [loop] Instead of taking the lo_ctl_mutex from sysfs code, take the inner lo->lo_lock, to protect the access to the backing_file data. Thanks to Tejun for help debugging and finding a solution. Cc: Milan Broz <mbroz@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-31 22:21:35 +02:00
Kay Sievers	d134b00b9a	loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices Instead of unconditionally creating a fixed number of dead loop devices which need to be investigated by storage handling services, even when they are never used, we allow distros start with 0 loop devices and have losetup(8) and similar switch to the dynamic /dev/loop-control interface instead of searching /dev/loop%i for free devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-31 22:08:04 +02:00
Kay Sievers	770fe30a46	loop: add management interface for on-demand device allocation Loop devices today have a fixed pre-allocated number of usually 8. The number can only be changed at module init time. To find a free device to use, /dev/loop%i needs to be scanned, and all devices need to be opened until a free one is possibly found. This adds a new /dev/loop-control device node, that allows to dynamically find or allocate a free device, and to add and remove loop devices from the running system: LOOP_CTL_ADD adds a specific device. Arg is the number of the device. It returns the device i or a negative error code. LOOP_CTL_REMOVE removes a specific device, Arg is the number the device. It returns the device i or a negative error code. LOOP_CTL_GET_FREE finds the next unbound device or allocates a new one. No arg is given. It returns the device i or a negative error code. The loop kernel module gets automatically loaded when /dev/loop-control is accessed the first time. The alias specified in the module, instructs udev to create this 'dead' device node, even when the module is not loaded. Example: cfd = open("/dev/loop-control", O_RDWR); # add a new specific loop device err = ioctl(cfd, LOOP_CTL_ADD, devnr); # remove a specific loop device err = ioctl(cfd, LOOP_CTL_REMOVE, devnr); # find or allocate a free loop device to use devnr = ioctl(cfd, LOOP_CTL_GET_FREE); sprintf(loopname, "/dev/loop%i", devnr); ffd = open("backing-file", O_RDWR); lfd = open(loopname, O_RDWR); err = ioctl(lfd, LOOP_SET_FD, ffd); Cc: Tejun Heo <tj@kernel.org> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-31 22:08:04 +02:00
Kay Sievers	34dd82afd2	loop: replace linked list of allocated devices with an idr index Replace the linked list, that keeps track of allocated devices, with an idr index to allow a more efficient lookup of devices. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-07-31 22:08:04 +02:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Linus Torvalds	ba5b56cb3e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: document unlocked d_parent accesses ceph: explicitly reference rename old_dentry parent dir in request ceph: document locking for ceph_set_dentry_offset ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug ceph: protect d_parent access in ceph_d_revalidate ceph: protect access to d_parent ceph: handle racing calls to ceph_init_dentry ceph: set dir complete frag after adding capability rbd: set blk_queue request sizes to object size ceph: set up readahead size when rsize is not passed rbd: cancel watch request when releasing the device ceph: ignore lease mask ceph: fix ceph_lookup_open intent usage ceph: only link open operations to directory unsafe list if O_CREAT\|O_TRUNC ceph: fix bad parent_inode calc in ceph_lookup_open ceph: avoid carrying Fw cap during write into page cache libceph: don't time out osd requests that haven't been received ceph: report f_bfree based on kb_avail rather than diffing. ceph: only queue capsnap if caps are dirty ceph: fix snap writeback when racing with writes ...	2011-07-26 13:38:50 -07:00
Josh Durgin	029bcbd8b0	rbd: set blk_queue request sizes to object size This improves performance since more requests can be merged. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-07-26 11:29:35 -07:00
Yehuda Sadeh	79e3057c4c	rbd: cancel watch request when releasing the device We were missing this cleanup, so when a device was released the osd didn't clean up its watchers list, so following notifications could be slow as osd needed to timeout on the client. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-07-26 11:29:04 -07:00
Linus Torvalds	8ded371f81	Merge branch 'for-3.1/drivers' of git://git.kernel.dk/linux-block * 'for-3.1/drivers' of git://git.kernel.dk/linux-block: cciss: do not attempt to read from a write-only register xen/blkback: Add module alias for autoloading xen/blkback: Don't let in-flight requests defer pending ones. bsg: fix address space warning from sparse bsg: remove unnecessary conditional expressions bsg: fix bsg_poll() to return POLLOUT properly	2011-07-25 10:38:18 -07:00
Linus Torvalds	bbd9d6f7fb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits) vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp isofs: Remove global fs lock jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory fix IN_DELETE_SELF on overwriting rename() on ramfs et.al. mm/truncate.c: fix build for CONFIG_BLOCK not enabled fs:update the NOTE of the file_operations structure Remove dead code in dget_parent() AFS: Fix silly characters in a comment switch d_add_ci() to d_splice_alias() in "found negative" case as well simplify gfs2_lookup() jfs_lookup(): don't bother with . or .. get rid of useless dget_parent() in btrfs rename() and link() get rid of useless dget_parent() in fs/btrfs/ioctl.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers drivers: fix up various ->llseek() implementations fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek Ext4: handle SEEK_HOLE/SEEK_DATA generically Btrfs: implement our own ->llseek fs: add SEEK_HOLE and SEEK_DATA flags reiserfs: make reiserfs default to barrier=flush ... Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new shrinker callout for the inode cache, that clashed with the xfs code to start the periodic workers later.	2011-07-22 19:02:39 -07:00
Linus Torvalds	a99a7d1436	Merge branch 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: mips: Fix i8253 clockevent fallout i8253: Cleanup outb/inb magic arm: Footbridge: Use common i8253 clockevent mips: Use common i8253 clockevent x86: Use common i8253 clockevent i8253: Create common clockevent implementation i8253: Export i8253_lock unconditionally pcpskr: MIPS: Make config dependencies finer grained pcspkr: Cleanup Kconfig dependencies i8253: Move remaining content and delete asm/i8253.h i8253: Consolidate definitions of PIT_LATCH x86: i8253: Consolidate definitions of global_clock_event i8253: Alpha, PowerPC: Remove unused asm/8253pit.h alpha: i8253: Cleanup remaining users of i8253pit.h i8253: Remove I8253_LOCK config i8253: Make pcsp sound driver use the shared i8253_lock i8253: Make pcspkr input driver use the shared i8253_lock i8253: Consolidate all kernel definitions of i8253_lock i8253: Unify all kernel declarations of i8253_lock i8253: Create linux/i8253.h and use it in all 8253 related files	2011-07-22 16:51:56 -07:00
Linus Torvalds	8181780c16	Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6 * 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6: dt: include linux/errno.h in linux/of_address.h of/address: Add of_find_matching_node_by_address helper dt: remove extra xsysace platform_driver registration tty/serial: Add devicetree support for nVidia Tegra serial ports dt: add empty of_property_read_u32[_array] for non-dt dt: bindings: move SEC node under new crypto/ dt: add helper function to read u32 arrays tty/serial: change of_serial to use new of_property_read_u32() api dt: add 'const' for of_property_read_string parameter **out_string dt: add helper functions to read u32 and string property values tty: of_serial: support for 32 bit accesses dt: document the of_serial bindings dt/platform: allow device name to be overridden drivers/amba: create devices from device tree dt: add of_platform_populate() for creating device from the device tree dt: Add default match table for bus ids	2011-07-22 14:53:38 -07:00
Linus Torvalds	111ad119d1	Merge branch 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI xen/pciback: Remove the DEBUG option. xen/pciback: Drop two backends, squash and cleanup some code. xen/pciback: Print out the MSI/MSI-X (PIRQ) values xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices. xen: rename pciback module to xen-pciback. xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases. xen/pciback: Allocate IRQ handler for device that is shared with guest. xen/pciback: Disable MSI/MSI-X when reseting a device xen/pciback: guest SR-IOV support for PV guest xen/pciback: Register the owner (domain) of the PCI device. xen/pciback: Cleanup the driver based on checkpatch warnings and errors. xen/pciback: xen pci backend driver. xen: tmem: self-ballooning and frontswap-selfshrinking xen: Add module alias to autoload backend drivers xen: Populate xenbus device attributes xen: Add __attribute__((format(printf... where appropriate xen: prepare tmem shim to handle frontswap xen: allow enable use of VGA console on dom0	2011-07-22 13:45:15 -07:00
Al Viro	e7f5909707	kill useless checks for sb->s_op == NULL never is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:21 -04:00
Grant Likely	8c11642a50	Merge commit 'v3.0-rc7' into devicetree/next	2011-07-15 20:11:34 -06:00
Stefan Bader	89153b5cae	xen-blkfront: Fix one off warning about name clash Avoid telling users to use xvde and onwards when using xvde. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-07-14 14:19:51 -04:00
Stefan Bader	196cfe2ae8	xen-blkfront: Drop name and minor adjustments for emulated scsi devices These were intended to avoid the namespace clash when representing emulated IDE and SCSI devices. However that seems to confuse users more than expected (a disk defined as sda becomes xvde). So for now go back to the scheme which does no adjustments. This will break when mixing IDE and SCSI names in the configuration of guests but should be by now expected. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-07-14 14:19:33 -04:00

... 25 26 27 28 29 ...

4663 Commits (24d4e7f642882a8a13da170b4ba86eec8fa91bf2)