mirror-linux

Commit Graph

Author	SHA1	Message	Date
Thomas Weißschuh	3141e0e536	blk-mq: make blk_mq_hw_ctx_sysfs_entry instances const The blk_mq_hw_ctx_sysfs_entry structures are never modified, mark them as const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260316-b4-sysfs-const-attr-block-v1-4-a35d73b986b0@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:29:16 -06:00
Thomas Weißschuh	f00d826f1b	blk-crypto: make blk_crypto_attr instances const The blk_crypto_attrs structures are never modified, mark them as const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260316-b4-sysfs-const-attr-block-v1-3-a35d73b986b0@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:29:16 -06:00
Thomas Weißschuh	3c91226309	block: ia-ranges: make blk_ia_range_sysfs_entry instances const The blk_ia_range_sysfs_entry structures are never modified, mark them as const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260316-b4-sysfs-const-attr-block-v1-2-a35d73b986b0@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:29:16 -06:00
Thomas Weißschuh	223983874d	block: make queue_sysfs_entry instances const The queue_sysfs_entry structures are never modified, mark them as const. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260316-b4-sysfs-const-attr-block-v1-1-a35d73b986b0@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:29:16 -06:00
Christoph Hellwig	e80fd7a089	block: remove bvec_free bvec_free is only called by bio_free, so inline it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://patch.msgid.link/20260316161144.1607877-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:27:14 -06:00
Christoph Hellwig	b520c4eef8	block: split bio_alloc_bioset more clearly into a fast and slowpath bio_alloc_bioset tries non-waiting slab allocations first for the bio and bvec array, but does so in a somewhat convoluted way. Restructure the function so that it first open codes these slab allocations, and then falls back to the mempools with the original gfp mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://patch.msgid.link/20260316161144.1607877-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:27:14 -06:00
Christoph Hellwig	fed406f3c1	block: mark bvec_{alloc,free} static Only used in bio.c these days. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://patch.msgid.link/20260316161144.1607877-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-17 19:27:14 -06:00
Caleb Sander Mateos	5d54016205	ublk: report BLK_SPLIT_INTERVAL_CAPABLE The ublk driver doesn't access request integrity buffers directly, it only copies them to/from the ublk server in ublk_copy_user_integrity(). ublk_copy_user_integrity() uses bio_for_each_integrity_vec() to walk all the integrity segments. ublk devices are therefore capable of handling requests with integrity intervals split across segments. Set BLK_SPLIT_INTERVAL_CAPABLE in the struct blk_integrity flags for ublk devices to opt out of the integrity-interval dma_alignment limit. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://patch.msgid.link/20260313144701.1221652-3-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-14 07:44:30 -06:00
Keith Busch	203247c5cb	blk-integrity: support arbitrary buffer alignment A bio segment may have partial interval block data with the rest continuing into the next segments because direct-io data payloads only need to align in memory to the device's DMA limits. At the same time, the protection information may also be split in multiple segments. The most likely way that may happen is if two requests merge, or if we're directly using the io_uring user metadata. The generate/verify, however, only ever accessed the first bip_vec. Further, it may be possible to unalign the protection fields from the user space buffer, or if there are odd additional opaque bytes in front or in back of the protection information metadata region. Change up the iteration to allow spanning multiple segments. This patch is mostly a re-write of the protection information handling to allow any arbitrary alignments, so it's probably easier to review the end result rather than the diff. Many controllers are not able to handle interval data composed of multiple segments when PI is used, so this patch introduces a new integrity limit that a low level driver can set to notify that it is capable, default to false. The nvme driver is the first one to enable it in this patch. Everyone else will force DMA alignment to the logical block size as before to ensure interval data is always aligned within a single segment. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://patch.msgid.link/20260313144701.1221652-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-14 07:44:30 -06:00
Ming Lei	3dbaacf6ab	blk-cgroup: wait for blkcg cleanup before initializing new disk When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the previous disk's blkcg state is cleaned up asynchronously via disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk() runs before that cleanup finishes, we may overwrite q->root_blkg while the old one is still alive, and radix_tree_insert() in blkg_create() fails with -EEXIST because the old blkg entries still occupy the same queue id slot in blkcg->blkg_tree. This causes the sd probe to fail with -ENOMEM. Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL, which indicates the previous disk's blkcg cleanup has completed. Fixes: `1059699f87` ("block: move blkcg initialization/destroy into disk allocation/release handler") Cc: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260311032837.2368714-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-11 08:30:30 -06:00
Chaitanya Kulkarni	daa6c79858	block: clear BIO_QOS flags in blk_steal_bios() When a bio goes through the rq_qos infrastructure on a path's request queue, it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set. These flags indicate that rq_qos_done_bio() should be called on completion to update rq_qos accounting. During path failover in nvme_failover_req(), the bio's bi_bdev is redirected from the failed path's disk to the multipath head's disk via bio_set_dev(). However, the BIO_QOS flags are not cleared. When the bio eventually completes (either successfully via a new path or with an error via bio_io_error()), rq_qos_done_bio() checks for these flags and calls __rq_qos_done_bio(q->rq_qos, bio) where q is obtained from the bio's current bi_bdev - which is now the multipath head's queue, not the original path's queue. The multipath head's queue does not have rq_qos enabled (q->rq_qos is NULL), but the code assumes that if BIO_QOS_* flags are set, q->rq_qos must be valid. This breaks when a bio is moved between queues during NVMe multipath failover, leading to a NULL pointer dereference. 
Execution Context timeline :-

* =====> dd process context
[USER] dd process
  [SYSCALL] write() - dd process context
    submit_bio()
      nvme_ns_head_submit_bio() - path selection
        blk_mq_submit_bio() #### QOS FLAGS SET HERE
  [USER] dd waits or returns
  ==== I/O in flight on NVMe hardware =====
===== End of submission path ====
------------------------------------------------------
* dd ====> Interrupt context;
[IRQ] NVMe completion interrupt
  nvme_irq()
    nvme_complete_rq()
      nvme_failover_req() ### BIO MOVED TO HEAD
        spin_lock_irqsave (atomic section)
        bio_set_dev() changes bi_bdev
          ### BUG: QOS flags NOT cleared
        kblockd_schedule_work()
* Interrupt context =====> kblockd workqueue
[WQ] kblockd workqueue - kworker process
  nvme_requeue_work()
    submit_bio_noacct()
      nvme_ns_head_submit_bio()
        nvme_find_path() returns NULL
        bio_io_error()
          bio_endio()
            rq_qos_done_bio() ### CRASH ###
KERNEL PANIC / OOPS

Crash from blktests nvme/058 (rapid namespace remapping):

[ 1339.636033] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1339.641025] nvme nvme4: rescanning namespaces.
[ 1339.642064] #PF: supervisor read access in kernel mode
[ 1339.642067] #PF: error_code(0x0000) - not-present page
[ 1339.642070] PGD 0 P4D 0
[ 1339.642073] Oops: Oops: 0000 [#1] SMP NOPTI
[ 1339.642078] CPU: 35 UID: 0 PID: 4579 Comm: kworker/35:2H Tainted: G O N 6.17.0-rc3nvme+ #5 PREEMPT(voluntary)
[ 1339.642084] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 1339.673446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1339.682359] Workqueue: kblockd nvme_requeue_work [nvme_core]
[ 1339.686613] RIP: 0010:__rq_qos_done_bio+0xd/0x40
[ 1339.690161] Code: 75 dd 5b 5d 41 5c c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 f5 53 48 89 fb <48> 8b 03 48 8b 40 30 48 85 c0 74 0b 48 89 ee 48 89 df ff d0 0f 1f
[ 1339.703691] RSP: 0018:ffffc900066f3c90 EFLAGS: 00010202
[ 1339.706844] RAX: ffff888148b9ef00 RBX: 0000000000000000 RCX: 0000000000000000
[ 1339.711136] RDX: 00000000000001c0 RSI: ffff8882aaab8a80 RDI: 0000000000000000
[ 1339.715691] RBP: ffff8882aaab8a80 R08: 0000000000000000 R09: 0000000000000000
[ 1339.720472] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8882aa3b6010
[ 1339.724650] R13: 0000000000000000 R14: ffff8882338bcef0 R15: ffff8882aa3b6020
[ 1339.729029] FS: 0000000000000000(0000) GS:ffff88985c0cf000(0000) knlGS:0000000000000000
[ 1339.734525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1339.738563] CR2: 0000000000000000 CR3: 0000000111045000 CR4: 0000000000350ef0
[ 1339.742750] DR0: ffffffff845ccbec DR1: ffffffff845ccbed DR2: ffffffff845ccbee
[ 1339.745630] DR3: ffffffff845ccbef DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 1339.748488] Call Trace:
[ 1339.749512] <TASK>
[ 1339.750449] bio_endio+0x71/0x2e0
[ 1339.751833] nvme_ns_head_submit_bio+0x290/0x320 [nvme_core]
[ 1339.754073] __submit_bio+0x222/0x5e0
[ 1339.755623] ? rcu_is_watching+0xd/0x40
[ 1339.757201] ? submit_bio_noacct_nocheck+0x131/0x370
[ 1339.759210] submit_bio_noacct_nocheck+0x131/0x370
[ 1339.761189] ? submit_bio_noacct+0x20/0x620
[ 1339.762849] nvme_requeue_work+0x4b/0x60 [nvme_core]
[ 1339.764828] process_one_work+0x20e/0x630
[ 1339.766528] worker_thread+0x184/0x330
[ 1339.768129] ? __pfx_worker_thread+0x10/0x10
[ 1339.769942] kthread+0x10a/0x250
[ 1339.771263] ? __pfx_kthread+0x10/0x10
[ 1339.772776] ? __pfx_kthread+0x10/0x10
[ 1339.774381] ret_from_fork+0x273/0x2e0
[ 1339.775948] ? __pfx_kthread+0x10/0x10
[ 1339.777504] ret_from_fork_asm+0x1a/0x30
[ 1339.779163] </TASK>

Fix this by clearing both BIO_QOS_THROTTLED and BIO_QOS_MERGED flags when bios are redirected to the multipath head in nvme_failover_req(). This is consistent with the existing code that clears REQ_POLLED and REQ_NOWAIT flags when the bio changes queues. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260226031243.87200-3-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-10 07:11:09 -06:00
Chaitanya Kulkarni	b2c45ced59	block: move bio queue-transition flag fixups into blk_steal_bios() blk_steal_bios() transfers bios from a request to a bio_list when the request is requeued to a different queue. The NVMe multipath failover path (nvme_failover_req) currently open-codes clearing of REQ_POLLED, bi_cookie, and REQ_NOWAIT on each bio before calling blk_steal_bios(). Move these fixups into blk_steal_bios() itself so that any caller automatically gets correct flag state when bios cross queue boundaries. Simplify nvme_failover_req() accordingly. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260226031243.87200-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-10 07:11:09 -06:00
Jens Axboe	89d10b7803	Merge branch 'for-7.1/block-integrity' into for-7.1/block Merge in integrity changes which are also landing in the VFS tree as dependencies for fs related changes. * for-7.1/block-integrity: block: pass a maxlen argument to bio_iov_iter_bounce block: add fs_bio_integrity helpers block: make max_integrity_io_size public block: prepare generation / verification helpers for fs usage block: add a bdev_has_integrity_csum helper block: factor out a bio_integrity_setup_default helper block: factor out a bio_integrity_action helper	2026-03-09 14:30:14 -06:00
Damien Le Moal	ecd92cfec5	block: remove bdev_nonrot() bdev_nonrot() is simply the logical negation of the return value of bdev_rot(). So replace all call sites of bdev_nonrot() with negated calls to bdev_rot() and remove bdev_nonrot(). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
John Garry	d0e5fc7062	block: Correct comments on bio_alloc_clone() and bio_init_clone() Correct the comments that the cloned bio must be freed before the memory pointed to by @bio_src->bi_io_vecs (is freed). Christoph Hellwig contributed most of the updated wording. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	b0e497db68	Documentation: ABI: stable: document the zoned_qd1_writes attribute Update the documentation file Documentation/ABI/stable/sysfs-block to describe the zoned_qd1_writes sysfs queue attribute file. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	3d9782f62f	block: default to QD=1 writes for blk-mq rotational zoned devices For blk-mq rotational zoned block devices (e.g. SMR HDDs), default to having zone write plugging limit write operations to a maximum queue depth of 1 for all zones. This significantly reduces write seek overhead and improves SMR HDD write throughput. For remotely connected disks with a very high network latency this feature might not be useful. However, remotely connected zoned devices are rare at the moment, and we cannot know the round trip latency to pick a good default for network attached devices. System administrators can however disable this feature in that case. For BIO based (non blk-mq) rotational zoned block devices, the device driver (e.g. a DM target driver) can directly set an appropriate default. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	1365b6904f	block: allow submitting all zone writes from a single context In order to maintain sequential write patterns per zone with zoned block devices, zone write plugging issues only a single write BIO per zone at any time. This works well but has the side effect that when large sequential write streams are issued by the user and these streams cross zone boundaries, the device ends up receiving a discontiguous set of write commands for different zones. The same also happens when a user writes simultaneously at high queue depth multiple zones: the device does not see all sequential writes per zone and receives discontiguous writes to different zones. While this does not affect the performance of solid state zoned block devices, when using an SMR HDD, this pattern change from sequential writes to discontiguous writes to different zones significantly increases head seek which results in degraded write throughput. In order to reduce this seek overhead for rotational media devices, introduce a per disk zone write plugs kernel thread to issue all write BIOs to zones. This single zone write issuing context is enabled for any zoned block device that has a request queue flagged with the new QUEUE_ZONED_QD1_WRITES flag. The flag QUEUE_ZONED_QD1_WRITES is visible as the sysfs queue attribute zoned_qd1_writes for zoned devices. For regular block devices, this attribute is not visible. For zoned block devices, a user can override the default value set to force the global write maximum queue depth of 1 for a zoned block device, or clear this attribute to fallback to the default behavior of zone write plugging which limits writes to QD=1 per sequential zone. Writing to a zoned block device flagged with QUEUE_ZONED_QD1_WRITES is implemented using a list of zone write plugs that have a non-empty BIO list. 
Listed zone write plugs are processed by the disk zone write plugs worker kthread in FIFO order, and all BIOs of a zone write plug are processed before switching to the next listed zone write plug. A newly submitted BIO for a non-FULL zone write plug that is not yet listed causes the addition of the zone write plug at the end of the disk list of zone write plugs. Since the write BIOs queued in a zone write plug BIO list are necessarily sequential, for rotational media, using the single zone write plugs kthread to issue all BIOs maintains a sequential write pattern and thus reduces seek overhead and improves write throughput. This processing essentially results in always writing to HDDs at QD=1, which is not an issue for HDDs operating with write caching enabled. Performance with write cache disabled is also not degraded thanks to the efficient write handling of modern SMR HDDs. A disk list of zone write plugs is defined using the new struct gendisk zone_wplugs_list, and access to this list is protected using the zone_wplugs_list_lock spinlock. The per disk kthread (zone_wplugs_worker) code is implemented by the function disk_zone_wplugs_worker(). A reference on listed zone write plugs is always held until all BIOs of the zone write plug are processed by the worker kthread. BIO issuing at QD=1 is driven using a completion structure (zone_wplugs_worker_bio_done) and calls to blk_io_wait(). 
With this change, performance when sequentially writing the zones of a 30 TB SMR SATA HDD connected to an AHCI adapter changes as follows (1MiB direct I/Os, results in MB/s unit):

                   +--------------------+
                   | Write BW (MB/s)    |
+------------------+----------+---------+
| Sequential write | Baseline | Patched |
| Queue Depth      | 6.19-rc8 |         |
+------------------+----------+---------+
| 1                | 244      | 245     |
| 2                | 244      | 245     |
| 4                | 245      | 245     |
| 8                | 242      | 245     |
| 16               | 222      | 246     |
| 32               | 211      | 245     |
| 64               | 193      | 244     |
| 128              | 112      | 246     |
+------------------+----------+---------+

With the current code (baseline), as the sequential write stream crosses a zone boundary, higher queue depth creates a gap between the last IO to the previous zone and the first IOs to the following zones, causing head seeks and degrading performance. Using the disk zone write plugs worker thread, this pattern disappears and the maximum throughput of the drive is maintained, leading to over 100% improvements in throughput for high queue depth write. Using 16 fio jobs all writing to randomly chosen zones at QD=32 with 1 MiB direct IOs, write throughput also increases significantly.

                   +--------------------+
                   | Write BW (MB/s)    |
+------------------+----------+---------+
| Random write     | Baseline | Patched |
| Number of zones  | 6.19-rc7 |         |
+------------------+----------+---------+
| 1                | 191      | 192     |
| 2                | 101      | 128     |
| 4                | 115      | 123     |
| 8                | 90       | 120     |
| 16               | 64       | 115     |
| 32               | 58       | 105     |
| 64               | 56       | 101     |
| 128              | 55       | 99      |
+------------------+----------+---------+

Tests using XFS show that buffered write speed with 8 jobs writing files increases by 12% to 35% depending on the workload. 
                   +--------------------+
                   | Write BW (MB/s)    |
+------------------+----------+---------+
| Workload         | Baseline | Patched |
|                  | 6.19-rc7 |         |
+------------------+----------+---------+
| 256MiB file size | 212      | 238     |
+------------------+----------+---------+
| 4MiB .. 128 MiB  | 213      | 243     |
| random file size |          |         |
+------------------+----------+---------+
| 2MiB .. 8 MiB    | 179      | 242     |
| random file size |          |         |
+------------------+----------+---------+

Performance gains are even more significant when using an HBA that limits the maximum size of commands to a small value, e.g. HBAs controlled with the mpi3mr driver limit commands to a maximum of 1 MiB. In such case, the write throughput gains are over 40%.

                   +--------------------+
                   | Write BW (MB/s)    |
+------------------+----------+---------+
| Workload         | Baseline | Patched |
|                  | 6.19-rc7 |         |
+------------------+----------+---------+
| 256MiB file size | 175      | 245     |
+------------------+----------+---------+
| 4MiB .. 128 MiB  | 174      | 244     |
| random file size |          |         |
+------------------+----------+---------+
| 2MiB .. 8 MiB    | 171      | 243     |
| random file size |          |         |
+------------------+----------+---------+

Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	b7cbc30e93	block: rename struct gendisk zone_wplugs_lock field Rename struct gendisk zone_wplugs_lock field to zone_wplugs_hash_lock to clearly indicate that this is the spinlock used for manipulating the hash table of zone write plugs. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	c30e8c4bb0	block: remove disk_zone_is_full() The helper function disk_zone_is_full() is only used in disk_zone_wplug_is_full(). So remove it and open code it directly in this single caller. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	1084e41dee	block: rename and simplify disk_get_and_lock_zone_wplug() disk_get_and_lock_zone_wplug() always returns a zone write plug with the plug lock held. This is unnecessary since this function does not look at the fields of existing plugs, and new plugs need to be locked only after their insertion in the disk hash table, when they are being used. Remove the zone write plug locking from disk_get_and_lock_zone_wplug() and rename this function disk_get_or_alloc_zone_wplug(). blk_zone_wplug_handle_write() is modified to add locking of the zone write plug after calling disk_get_or_alloc_zone_wplug() and before starting to use the plug. This change also simplifies blk_revalidate_seq_zone() as unlocking the plug becomes unnecessary. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:30:00 -06:00
Damien Le Moal	0a8b8af896	block: fix zone write plugs refcount handling in disk_zone_wplug_schedule_bio_work() The function disk_zone_wplug_schedule_bio_work() always takes a reference on the zone write plug of the BIO work being scheduled. This ensures that the zone write plug cannot be freed while the BIO work is being scheduled but has not run yet. However, this unconditional reference taking is fragile since the reference taken is released by the BIO work blk_zone_wplug_bio_work() function, which implies that there always must be a 1:1 relation between the work being scheduled and the work running. Make sure to drop the reference taken when scheduling the BIO work if the work is already scheduled, that is, when queue_work() returns false. Fixes: `9e78c38ab3` ("block: Hold a reference on zone write plugs to schedule submission") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Damien Le Moal	b7d4ffb510	block: fix zone write plug removal Commit `7b29518728` ("block: Do not remove zone write plugs still in use") modified disk_should_remove_zone_wplug() to add a check on the reference count of a zone write plug to prevent removing zone write plugs from a disk hash table when the plugs are still being referenced by BIOs or requests in-flight. However, this check does not take into account that a BIO completion may happen right after its submission by a zone write plug BIO work, and before the zone write plug BIO work releases the zone write plug reference count. This situation leads to disk_should_remove_zone_wplug() returning false as in this case the zone write plug reference count is at least equal to 3. If the BIO that completes in such manner transitioned the zone to the FULL condition, the zone write plug for the FULL zone will remain in the disk hash table. Furthermore, relying on a particular value of a zone write plug reference count to set the BLK_ZONE_WPLUG_UNHASHED flag is fragile as reading the atomic reference count and doing a comparison with some value is not overall atomic at all. Address these issues by reworking the reference counting of zone write plugs so that removing plugs from a disk hash table can be done directly from disk_put_zone_wplug() when the last reference on a plug is dropped. To do so, replace the function disk_remove_zone_wplug() with disk_mark_zone_wplug_dead(). This new function sets the zone write plug flag BLK_ZONE_WPLUG_DEAD (which replaces BLK_ZONE_WPLUG_UNHASHED) and drops the initial reference on the zone write plug taken when the plug was added to the disk hash table. This function is called either for zones that are empty or full, or directly in the case of a forced plug removal (e.g. when the disk hash table is being destroyed on disk removal). With this change, disk_should_remove_zone_wplug() is also removed. 
disk_put_zone_wplug() is modified to call the function disk_free_zone_wplug() to remove a zone write plug from a disk hash table and free the plug structure (with a call_rcu()), when the last reference on a zone write plug is dropped. disk_free_zone_wplug() always checks that the BLK_ZONE_WPLUG_DEAD flag is set. In order to avoid having multiple zone write plugs for the same zone in the disk hash table, disk_get_and_lock_zone_wplug() checked for the BLK_ZONE_WPLUG_UNHASHED flag. This check is removed and a check for the new BLK_ZONE_WPLUG_DEAD flag is added to blk_zone_wplug_handle_write(). With this change, we continue preventing adding multiple zone write plugs for the same zone and at the same time re-inforce checks on the user behavior by failing new incoming write BIOs targeting a zone that is marked as dead. This case can happen only if the user erroneously issues write BIOs to zones that are full, or to zones that are currently being reset or finished. Fixes: `7b29518728` ("block: Do not remove zone write plugs still in use") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Bill Wendling	0ee8ab5d4d	block: annotate struct request_queue with __counted_by_ptr The queue_hw_ctx field in struct request_queue is an array of pointers to struct blk_mq_hw_ctx. The number of elements in this array is tracked by the nr_hw_queues field. The array is allocated in __blk_mq_realloc_hw_ctxs() using kcalloc_node() with set->nr_hw_queues elements. q->nr_hw_queues is subsequently updated to set->nr_hw_queues. When growing the array, the new array is assigned to queue_hw_ctx before nr_hw_queues is updated. This is safe because nr_hw_queues (the old smaller count) is used for bounds checking, which is within the new larger allocation. When shrinking the array, nr_hw_queues is updated to the smaller value, while queue_hw_ctx retains the larger allocation. This is also safe as the count is within the allocation bounds. Annotating queue_hw_ctx with __counted_by_ptr(nr_hw_queues) allows the compiler (with kSAN) to verify that accesses to queue_hw_ctx are within the valid range defined by nr_hw_queues. This patch was generated by CodeMender and reviewed by Bill Wendling. Tested by running blktests. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Bill Wendling <morbo@google.com> [axboe: massage commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	0cc9293bcc	sed-opal: add IOC_OPAL_GET_SUM_STATUS ioctl. This adds a function for retrieving the set of Locking objects enabled for Single User Mode (SUM) and the value of the RangeStartRangeLengthPolicy parameter. It retrieves data from the LockingInfo table, specifically the columns SingleUserModeRanges and RangeStartLengthPolicy, which were added according to the TCG Opal Feature Set: Single User Mode, as described in chapters 4.4.3.1 and 4.4.3.2. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	661025cdbc	sed-opal: increase column attribute type size to 64 bits. Change the column parameter in response_get_column() from u8 to u64 to support the full range of column identifiers. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	a441a9d224	sed-opal: add IOC_OPAL_ENABLE_DISABLE_LR. This ioctl is used to set up RLE (read lock enabled) and WLE (write lock enabled) parameters of the Locking object. In Single User Mode (SUM), if the RangeStartRangeLengthPolicy parameter is set in the 'Reactivate' method, only Admin authority maintains the locking range length and start (offset) attributes of Locking objects set up for SUM. All other attributes from struct opal_user_lr_setup (RLE - read locking enabled, WLE - write locking enabled) shall remain in possession of the User authority associated with the Locking object set for SUM. With the IOC_OPAL_ENABLE_DISABLE_LR ioctl, the opal_user_lr_setup members 'range_start' and 'range_length' of the ioctl argument are ignored. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	8e3d34a7ce	sed-opal: add IOC_OPAL_LR_SET_START_LEN ioctl. This ioctl is used to set up locking range start (offset) and locking range length attributes only. In Single User Mode (SUM), if the RangeStartRangeLengthPolicy parameter is set in the 'Reactivate' method, only Admin authority maintains the locking range length and start (offset) attributes of Locking objects set up for SUM. All other attributes from struct opal_user_lr_setup (RLE - read locking enabled, WLE - write locking enabled) shall remain in possession of the User authority associated with the Locking object set for SUM. Therefore, we need a separate function for setting up locking range start and locking range length because it may require two different authorities (and sessions) if the RangeStartRangeLengthPolicy attribute is set. With the IOC_OPAL_LR_SET_START_LEN ioctl, the opal_user_lr_setup members 'RLE' and 'WLE' of the ioctl argument are ignored. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	8ff71e6b96	sed-opal: refactor (split) IOC_OPAL_LR_SETUP internals. IOC_OPAL_LR_SETUP is used to set up a locking range entirely under a single authority (usually Admin1), but for Single User Mode (SUM), the permissions for attributes (RangeStart, RangeLength) and (ReadLockEnable, WriteLockEnable, ReadLocked, WriteLocked) may be split between two different authorities. Typically, it is Admin1 for the former and the User associated with the LockingRange in SUM for the latter. This commit only splits the internals in preparation for the introduction of separate ioctls for setting RangeStart, RangeLength and the rest using new ioctl calls. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	aca086ff27	sed-opal: add IOC_OPAL_REACTIVATE_LSP. This adds the 'Reactivate' method as described in the "TCG Storage Opal SSC Feature Set: Single User Mode" document (ch. 3.1.1.1). The method enables switching an already active SED OPAL2 device, with appropriate firmware support for Single User Mode (SUM), to or from SUM. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	c6c9dc91cb	sed-opal: add Admin1PIN parameter. As described in ch. 3.1.1.1.1.3 of TCG Storage Opal SSC Feature Set: Single User Mode document. To be used later in Reactivate method implementation. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	a184058fb4	sed-opal: add RangeStartRangeLengthPolicy parameter. As described in ch. 3.1.1.1.1.2 of TCG Storage Opal SSC Feature Set: Single User Mode document. To be used later in Reactivate method implementation and in function for retrieving SUM device status. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Ondrej Kozina	b26f29b669	sed-opal: add UID of Locking Table. As described in ch. 6.3, Table 240 in TCG Storage Architecture Core Specification document. It's also referenced in TCG Storage Opal SSC Feature Set: Single User Mode document, ch. 3.1.1.1 Reactivate method. It will be used later in Reactivate method implementation for sed-opal interface. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 14:29:59 -06:00
Christoph Hellwig	a9aa6045ab	block: pass a maxlen argument to bio_iov_iter_bounce Allow the file system to limit the size processed in a single bounce operation. This is needed when generating integrity data so that the size of a single integrity segment can't overflow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	0bde8a12b5	block: add fs_bio_integrity helpers Add a set of helpers for file system initiated integrity information. These include mempool backed allocations and verifying based on a passed in sector and size which is often available from file system completion routines. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	8c56ef1015	block: make max_integrity_io_size public File systems that generate integrity will need this, so move it out of the block private or blk-mq specific headers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	3f00626832	block: prepare generation / verification helpers for fs usage Return the status from verify instead of directly stashing it in the bio, and rename the helpers to use the usual bio_ prefix for things operating on a bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	7afe93946d	block: add a bdev_has_integrity_csum helper Factor out a helper to see if the block device has an integrity checksum from bdev_stable_writes so that it can be reused for other checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	a936655697	block: factor out a bio_integrity_setup_default helper Add a helper to set the seed and check flag based on useful defaults from the profile. Note that this includes a small behavior change, as we now only set the seed if any action is set, which is fine as nothing will look at it otherwise. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Christoph Hellwig	7ea25eaad5	block: factor out a bio_integrity_action helper Split the logic to see if a bio needs integrity metadata from bio_integrity_prep into a reusable helper than can be called from file system code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-09 07:47:02 -06:00
Linus Torvalds	1f318b96cc	Linux 7.0-rc3	2026-03-08 16:56:54 -07:00
Linus Torvalds	fc9f248d8c	EFI fixes for v7.0 #2 Fix for the x86 EFI workaround keeping boot services code and data regions reserved until after SetVirtualAddressMap() completes: deferred struct page initialization may result in some of this memory to be lost permanently. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCaaxHtgAKCRAwbglWLn0t XIbNAPwNjw/TSgVD+Ur//yqY7TxZSBari8aheEkXNaYHFCPImwD6A1CzNGn6rcka JzeP+6HeOO9c0xCBudcR0aRfSma3cQI= =a+XF -----END PGP SIGNATURE----- Merge tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "Fix for the x86 EFI workaround keeping boot services code and data regions reserved until after SetVirtualAddressMap() completes: deferred struct page initialization may result in some of this memory being lost permanently" * tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efi: defer freeing of boot services memory	2026-03-08 12:13:09 -07:00
Linus Torvalds	014441d1e4	i2c-for-7.0-rc3 A revert for the i801 driver restoring old locking behaviour. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmmtKb8ACgkQFA3kzBSg Kba5dRAAnoNtQC2qd3PneEUZs/pVK3kE+wEJ57iWPyxnSUEW0O7RcbQHoETb039N O0aSyAL/x2pI4+nMYnLOkJUwwaDjcpSdCFPpeUsmIhzZo/k19hyPaW3VmdpIF+uR K6snCwNzH4AbCh0ka9XOUH4YXINse4C2n7ZP9r5z5WZ6ANK3x8oKGC/QRM6UPaZw jXPl962lb9LQARqvG6YnUHjn+x3teHW/sD3/48IHfNeuvhKstzG9Bc+XDZD+Uc7X EGNAwI7/4tkm/0vRZXDWkuupJleqZSIUXVlb5awv0p50IqREjEnl2fdQdoR90vux oooTKv4inWw0W79VBwQeScGCHKFPV00HQkexiyePmtCGwSU3/k3BWalD+jY9CF8s W6yDR7M3gmIeNbQQXGZx6/04KVFugtQEQm9v9O7bmB7oEW01K4tAAGhfcFgmwOeN qKsmF1Dt+KefYQdtWPCZpMT/zdUTjFJs69J8omxtyo5SdU8RWaLGMegYfEwUrakH r9pt/nASAPcMTb31KAlgro2QmvHWzRVx6+Sir41tLFB5Ls4jxC/a/cH3DWIqgq8V PqZF5dvfxsa/KoXrotpQHnS9Nma3KqEJnjLwg/7LhSPxjCqhKvlTjIqDv1IP5R0e N56KYy2MRRfdWUCovnXmTViFc5fsmJk1agjjXtxHHdp8GO/njAg= =Nc8b -----END PGP SIGNATURE----- Merge tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A revert for the i801 driver restoring old locking behaviour" * tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"	2026-03-08 10:17:05 -07:00
Linus Torvalds	c23719abc3	Miscellaneous x86 fixes: - Fix SEV guest boot failures in certain circumstances, due to very early code relying on a BSS-zeroed variable that isn't actually zeroed yet and may contain non-zero bootup values. Move the variable into the .data section to gain even earlier zeroing. - Expose & allow the IBPB-on-Entry feature on SNP guests, which was not properly exposed to guests due to initial implementational caution. - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative file paths. - 4 commits to fix the various SNC (Sub-NUMA Clustering) topology enumeration bugs/artifacts (sched-domain build errors mostly). SNC enumeration data got more complicated with Granite Rapids X (GNR) and Clearwater Forest X (CWF), which exposed these bugs and made their effects more serious. - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs. - Work around a historic libgcc unwinder bug in the vdso32 sigreturn code (again), which regressed during an overly aggressive recent cleanup of DWARF annotations.
Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsy9wRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ieiQ/7B2Rfm5vR5rQLlAv26iEMypIwoCiCMgzA YD3nOMFl6aGhphKryiU0b4MDhAIASN9X6mZloryUKyol1oKP0evkWXSk/0J+k+V9 lS7uIVL+8nPTSl3gQE7ARzJ9jakFN49VzDheZjsjIHC0+n+yvCJU6xSx8IKeiTSW axpX8R33M3Fj+u5anF3m37OdFTgiYxFO0t5VNFgWP4H9367yC/wnHPuDyidAdJ/N B7PL1L3rG3+w/4np81Xwi/rThwgsSWarVLNuMJuGM5wujMr8mQGhuWaeLiPgTx7G wze1iarWvp5uqamGztpy/4WMD1x0yBX9CCSocnwF48Fh1yTww5+uwOZn5e5fZxYr vDhCH6+DB8Rt3Wj+/3RBzHSFe7rNq+f86U84uxTwyOs5eC5sGUuyH15lCt4dP9ZO uQfW0dQRwvUXCGXJxxZdIR0nq/vEJUmQ+DLLL6zkCj24t9ND5IPAkBLVn7P5PO5s qv8dPpldSq57V4comqW8oDAqLL0OeS1qgggxlHzqAdrMmt+IVKWvteRXrkgy1m9Y Bt0EbdghUTZkn9+FcUTorVA/pZHL5sYCiuGQxNbaaLmMWrcX4I3XnEtpzgukHh8e BL1blJWAm/4cuhGXb4RF7AZMQgTU56greOU385Afc1Qz2lzohGO4lqgGOH8L0ZEh KqEX1IS0ZbI= =KlDX -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix SEV guest boot failures in certain circumstances, due to very early code relying on a BSS-zeroed variable that isn't actually zeroed yet an may contain non-zero bootup values Move the variable into the .data section go gain even earlier zeroing - Expose & allow the IBPB-on-Entry feature on SNP guests, which was not properly exposed to guests due to initial implementational caution - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative file paths - Fix the various SNC (Sub-NUMA Clustering) topology enumeration bugs/artifacts (sched-domain build errors mostly). 
SNC enumeration data got more complicated with Granite Rapids X (GNR) and Clearwater Forest X (CWF), which exposed these bugs and made their effects more serious - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs - Work around a historic libgcc unwinder bug in the vdso32 sigreturn code (again), which regressed during an overly aggressive recent cleanup of DWARF annotations * tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/vdso32: Work around libgcc unwinder bug x86/resctrl: Fix SNC detection x86/topo: Fix SNC topology mess x86/topo: Replace x86_has_numa_in_package x86/topo: Add topology_num_nodes_per_package() x86/numa: Store extra copy of numa_nodes_parsed x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths x86/sev: Allow IBPB-on-Entry feature for SNP guests x86/boot/sev: Move SEV decompressor variables into the .data section	2026-03-07 17:12:06 -08:00
Linus Torvalds	6ff1020c2f	Make clock_adjtime() syscall timex validation slightly more permissive for auxiliary clocks, to not reject syscalls based on the status field that do not try to modify the status field. This makes ABI behavior in clock_adjtime() consistent with CLOCK_REALTIME. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsxzkRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hq8g//fRTp9p2pVfmRWUoxWELrT/bMK1r+D6F3 6BYkwp68peRhchVrFxkI/Y37rjAIC8CXZSPuvkubqIROrH3gA7SCCQYCcZKdss+t i3lbpQF8IbagPIS5btpOAN2KRCu2S7aqjDdH0rWb9VhQdlW7fI71Z72Uz07YEA+q TWpy3gE531P/dgAqcvIAyMHnFZDCb1S6z8wZvT3SV4r4GkczfXpTFyNHHtETSu0V 7isuOBfloM4HpDU50oUotlqBiwigH27J2Ad6aIrnCA7iaQPrzREysG+8E96ShhaB g6+qaQS5gTgFryA1bggA6LzGveLOI8bjy2kZ2SnZWuFPj46OReGIuwK4kyY07jz2 xk0sd37alN16ETKhGVLfAgjmzVGoKVNnp4ak9J3VmMbxWEmXeObuOC8SmF9VImc1 4bRaG9+Tlfd4DtOOz2+E4VcPE1D9A2tMw4esgUaXRrrp4GlEcKOJ5PRlWj0uGvrh xLPLbL0XIiWsjMsHdVs4Gq9Z0MvfRHc4VLOviIqLFtHox2DscZypPkyjKAv5inp0 /VWyUYJkkr07RMQQ3nqHnP+lzAfO2aSeZ72D9NnHStL3RPbGC4jYvpoi8dnH0/TT PKJgj2jb7u3h+1cxKBi1RM0JbxUYD5+4N8zfJISa9uqkHZ3XY3VyuuT+2RHO6CQp d1BdX0V4oDA= =zjov -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Make clock_adjtime() syscall timex validation slightly more permissive for auxiliary clocks, to not reject syscalls based on the status field that do not try to modify the status field. This makes the ABI behavior in clock_adjtime() consistent with CLOCK_REALTIME" * tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix timex status validation for auxiliary clocks	2026-03-07 17:09:15 -08:00
Linus Torvalds	b1b9a9d0b5	Fix a DL scheduler bug that may corrupt internal metrics during PI and setscheduler() syscalls, resulting in kernel warnings and misbehavior. Found during stress-testing. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsxW0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jzXRAAjqcTwaC72cd+6cnh+tE9/fcjXf1JtK5e TxdTygsgBAbXh63rD4y4cRPueqBR1ne52TAV0lI8Z1pBM/XthnaF4MJBue6B8EdX SQIE7hpOh6R81I6hnuhNoNsAy95jQvYXN5SFaKMuNacWNVX8k3vPzN5XPxa7yHLN MVUL+O9c7Xwg4v30Nz/QIv0mFoPosbh4PIdeVpD/ghJAXtXhsCg7EYOivEk9UsSy TAcq3qRnfDyroIOc5/dnSglEwX12LQqVFBba97nI/TCjaH23PsUIt2Dg2rpJbJ+k bLh4hGpOoyQvgE/PSEdoMl1F9pXw3XiUOzAGrFJdqn0iKL+7WzuTEQH+vAToGZQv 4hF5BtMjLrAYY/MVsD8qJGm/pne5nTIo2gSsG7LZPwCmMj0rDUGXfO4G8N8LHhT7 ExQ/t2+z0BczsKdvF3VKX+RweT51AOYOWcmLIdA9h1jdAy858GVmTzSWDveAEJ0L yToPQ0UMCz985g9il6Rdb5cIphD7DjuUeFNnYTCm63cVpZdA4j8Da74r4KfP2jNY tRcbiUy+A7MwqW5aERgwBtI6XCz6QZqW3svJW9yYghf40lgNGAcDCTTdf2r7g0Ho Q0pQVxEk9mXD5N1otjzSS4piLbzoMaPH1L4W6ceHN1RzBjfSJED3tmfGUHZUDqNE w33GhhQAFpA= =vP5l -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a DL scheduler bug that may corrupt internal metrics during PI and setscheduler() syscalls, resulting in kernel warnings and misbehavior. Found during stress-testing" * tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting	2026-03-07 17:07:13 -08:00
Eric Dumazet	1954c4f012	eventpoll: Convert epoll_put_uevent() to scoped user access Saves two function calls, and one stac/clac pair. stac/clac is rather expensive on older cpus like Zen 2. A synthetic network stress test gives a ~1.5% increase of pps on AMD Zen 2. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-03-07 15:03:14 -08:00
Linus Torvalds	3b5d535c63	SCSI fixes on 20260307 Two core changes and the rest in drivers, one core change to quirk the behaviour of the Iomega Zip drive and one to fix a hang caused by tag reallocation problems, which has mostly been seen by the iscsi client. Note the latter fixes the problem but still has a slight sysfs memory leak, so will be amended in the next pull request (once we've run the fix for the fix through our testing). Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> -----BEGIN PGP SIGNATURE----- iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaaxT0hsUgAAAAAAEAA5t YW51MiwyLjUrMS4xMiwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy c2hpcC5jb20ACgkQ50LJTO6YrIVmDwD+P17JCAk+Ju0aNSnjEmIjUC2oI1S+9GdO thbkK99vClABAOOkDvHopBBhfsilTpHBYjWFM34vC/iiaO/xfgd9YH2A =kIDx -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two core changes and the rest in drivers, one core change to quirk the behaviour of the Iomega Zip drive and one to fix a hang caused by tag reallocation problems, which has mostly been seen by the iscsi client. Note the latter fixes the problem but still has a slight sysfs memory leak, so will be amended in the next pull request (once we've run the fix for the fix through our testing)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: Fix recursive locking in __configfs_open_file() scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP scsi: mpi3mr: Clear reset history on ready and recheck state after timeout scsi: core: Fix refcount leak for tagset_refcnt	2026-03-07 14:04:50 -08:00
Linus Torvalds	fb07430e6f	fbdev fixes for kernel v7.0-rc3: Silence build error in au1100fb driver found by kernel test robot -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaayLiwAKCRD3ErUQojoP X1pVAP4/j6LjBX862nFgtxS5XC4YBkpGRLYwO2WJMec+4sO5fQD/ThrowpuzZfPl FhD/6WtMS4zPCDfNeqIKAo/JySez+w8= =2Tha -----END PGP SIGNATURE----- Merge tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fix from Helge Deller: "Silence build error in au1100fb driver found by kernel test robot" * tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: au1100fb: Fix build on MIPS64	2026-03-07 13:21:43 -08:00
Linus Torvalds	6deccafcb4	parisc architecture fixes for kernel v7.0-rc3: Three initial kernel mapping fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaayE4AAKCRD3ErUQojoP X4U4AQDtHPc9nlM3areu5yTQnOcPTExuEoIpvBm9ktwNCdrwCgEAt4tqv3hhxCvG /lwb6XBCHfyw3d/AsTRbOIH1MGCnaQQ= =itGt -----END PGP SIGNATURE----- Merge tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "While testing Sasha Levin's 'kallsyms: embed source file:line info in kernel stack traces' patch series, which increases the typical kernel image size, I found some issues with the parisc initial kernel mapping which may prevent the kernel to boot. The three small patches here fix this" * tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix initial page table creation for boot parisc: Check kernel mapping earlier at bootup parisc: Increase initial mapping to 64 MB with KALLSYMS	2026-03-07 12:38:16 -08:00
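
The ordering argument in commit 0ee8ab5d4d above (publish the larger array before raising the count when growing; lower the count and keep the allocation when shrinking) can be modelled in user space. The following is only a minimal sketch under stated assumptions: `struct toy_queue`, `toy_get()` and `toy_resize()` are hypothetical stand-ins invented for illustration, not the kernel's `struct request_queue` or `__blk_mq_realloc_hw_ctxs()`, and the sketch ignores locking, RCU and NUMA-aware allocation entirely.

```c
/* Hypothetical user-space model; names are illustrative, not kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_queue {
	void **hw_ctx;     /* models queue_hw_ctx */
	unsigned int nr;   /* models nr_hw_queues: the only bound consulted */
	unsigned int cap;  /* real allocation size, tracked only for the asserts */
};

/* Bounds-checked lookup: an index is valid only if it is below nr. */
static void *toy_get(const struct toy_queue *q, unsigned int i)
{
	assert(q->nr <= q->cap);  /* the count never exceeds the allocation */
	return i < q->nr ? q->hw_ctx[i] : NULL;
}

/* Resize following the order described in the commit message. */
static int toy_resize(struct toy_queue *q, unsigned int new_nr)
{
	if (new_nr > q->cap) {
		/* Grow: allocate and publish the larger array first. */
		void **new_arr = calloc(new_nr, sizeof(*new_arr));

		if (!new_arr)
			return -1;
		if (q->nr)
			memcpy(new_arr, q->hw_ctx, q->nr * sizeof(*new_arr));
		free(q->hw_ctx);
		q->hw_ctx = new_arr;
		q->cap = new_nr;
		/* The old, smaller count is still within the new allocation. */
		assert(q->nr <= q->cap);
	}
	/* Shrink (or final step of grow): only the count changes here. */
	q->nr = new_nr;
	return 0;
}
```

At every intermediate step the count used for bounds checking is no larger than the backing allocation, which is the property the commit message relies on when it calls both the growing and the shrinking case safe.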

1427908 Commits (3141e0e536b43ab3555737cb2ee6ea1ed0aff69f)