check_mft_mirror() still computes the number of bytes to validate in each
mirrored MFT record, but the actual comparison against $MFTMirr was dropped
when the superblock code was updated.
As a result, mount misses a stale or inconsistent $MFTMirr as long as both
records pass the structural baad-record checks. Restore the comparison and
log an error when the primary $MFT record differs from its mirror copy.
Returning false lets the existing mount error handling mark the volume as
having NTFS errors and, with on_errors=remount-ro, continue read-only. The
default on_errors=continue mount policy still allows the mount to proceed.
Fixes: 6251f0b0de ("ntfs: update super block operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_empty_logfile() has three related allocator bugs around the
@empty_buf and @ra buffers it uses inside the per-cluster loop.
When the loop encounters a runlist entry with LCN_RL_NOT_MAPPED, the
function kvfrees @empty_buf and goes to map_vcn to remap. @empty_buf
is not cleared. If ntfs_map_runlist_nolock() fails on re-entry,
control jumps to the err label which kvfrees @empty_buf a second time.
In the same branch, @ra is left allocated. When the remap succeeds
the function falls through the @empty_buf re-allocation and the @ra
re-allocation, overwriting the previous @ra pointer and leaking it.
The success path frees @empty_buf with kfree() instead of kvfree().
kvzalloc() may fall back to vmalloc(), in which case kfree() does not
correctly release the memory.
A KASAN-enabled QEMU harness mirroring this control flow reports
"BUG: KASAN: double-free" when the second ntfs_map_runlist_nolock()
fails.
Clear both @empty_buf and @ra after the in-loop releases so the err
path is a no-op when the buffers have already been freed and so the
remap-success path does not leak the previous @ra. Switch the success
path to kvfree() to match the @empty_buf allocator.
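The free-then-clear rule can be sketched in userspace as below. This is a minimal model, not the kernel code: struct logfile_bufs, release_bufs() and alloc_bufs() are hypothetical names, and free()/calloc() stand in for kvfree()/kvzalloc().

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for the two per-loop buffers. */
struct logfile_bufs {
	void *empty_buf;
	void *ra;
};

/* Release both buffers and clear the pointers so a repeated call is a no-op. */
static void release_bufs(struct logfile_bufs *b)
{
	free(b->empty_buf);	/* free(NULL) does nothing, like kvfree(NULL) */
	b->empty_buf = NULL;
	free(b->ra);
	b->ra = NULL;
}

/* Returns 0 on success, -1 if an allocation failed (nothing stays allocated). */
static int alloc_bufs(struct logfile_bufs *b, size_t size)
{
	b->empty_buf = calloc(1, size);
	b->ra = calloc(1, size);
	if (!b->empty_buf || !b->ra) {
		release_bufs(b);
		return -1;
	}
	return 0;
}
```

Because the pointers are cleared on release, a shared error path may call release_bufs() again without a double free, and a re-allocation never overwrites a live pointer.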
Fixes: 5218cd102a ("ntfs: update misc operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_attr_find() validates a named attribute before comparing it with the
requested name, but that check is currently after the AT_UNUSED handling.
When callers enumerate attributes with AT_UNUSED, ntfs_attr_find() can
return a malformed named attribute before checking whether name_offset
and name_length stay within the attribute record.
Some enumeration callers use the returned attribute name pointer
directly. For example, one path passes (attr + name_offset, name_length)
to ntfs_attr_iget(), where the name can later be copied according to
name_length. A malformed on-disk name_offset/name_length pair should not
be exposed to those callers.
Move the existing name bounds validation before returning attributes
during AT_UNUSED enumeration, and write it as an offset/remaining-size
check so the subtraction cannot underflow. Extract the converted values
into local variables (name_offset, attr_len, name_size) to make the
intent explicit and avoid repeating the endian conversions inside the
bounds check. This keeps matching attributes on the same checked path
while also covering attribute enumeration.
A small userspace ASAN model with attr length=32, name_offset=124 and
name_length=8 reproduces a heap-buffer-overflow read in the old
enumeration path. With this change the same malformed attribute is
rejected before the name pointer is returned to the caller.
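A minimal sketch of an offset/remaining-size check of this kind follows. The helper name attr_name_in_bounds() is hypothetical and the kernel code may be written differently; the inputs are assumed to be already converted to CPU endianness, with name_size being the name length in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset/remaining-size form of the bounds check. name_size is the
 * name length in bytes (two bytes per UTF-16 code unit).
 */
static bool attr_name_in_bounds(uint32_t attr_len, uint32_t name_offset,
				uint32_t name_size)
{
	if (name_offset > attr_len)
		return false;
	/* attr_len - name_offset cannot underflow after the check above. */
	return name_size <= attr_len - name_offset;
}
```

With the values from the model above (attr length 32, name_offset 124, name_length 8, so 16 bytes), the first comparison already rejects the attribute.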
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
NTFS MFT record numbers are limited to the 32-bit range, and
ntfs_mft_record_layout() rejects mft_no >= 2^32. The free-MFT-record
bitmap scan in ntfs_mft_bitmap_find_and_alloc_free_rec_nolock() also
guards against this overflow but uses a strict greater-than comparison,
allowing record number 2^32 itself through this earlier check.
Every other 2^32 boundary check in fs/ntfs/mft.c uses '>=', so the
strict greater-than here is both a real off-by-one and an internal
inconsistency. A model with ll == 2^32 confirms the current check
accepts the value while the corrected check rejects it.
Use '>=' so the boundary matches the layout-time rejection and the
surrounding bitmap-scan checks.
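The two comparisons can be modelled side by side as below. MFT_NO_LIMIT and both function names are hypothetical and only illustrate the boundary behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

#define MFT_NO_LIMIT (1ULL << 32)	/* first record number that is out of range */

/* Strict comparison: lets ll == 2^32 through. */
static bool rejects_strict(uint64_t ll)
{
	return ll > MFT_NO_LIMIT;
}

/* Inclusive comparison: rejects ll == 2^32 as well. */
static bool rejects_inclusive(uint64_t ll)
{
	return ll >= MFT_NO_LIMIT;
}
```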
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_mft_record_check() verifies that attrs_offset is aligned and that
the resulting pointer stays within the allocated MFT record buffer, but
it does not check that the first attribute header starts within the
bytes_in_use area.
A malformed record with attrs_offset greater than bytes_in_use can pass
this check as long as attrs_offset is still within bytes_allocated. The
attribute parser then computes the remaining record space by subtracting
the attribute pointer from bytes_in_use. Because that value is unsigned,
the subtraction can underflow and allow bytes after bytes_in_use to be
interpreted as an attribute.
Reject records where attrs_offset is outside bytes_in_use or where the
used area does not even contain the four-byte attribute type/AT_END
terminator at attrs_offset.
A small userspace model with attrs_offset=128 and bytes_in_use=64 shows
the current check accepts the record and the parser space calculation
underflows to 0xffffffc0. With this change the same malformed record is
rejected before the attribute walker is entered.
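The arithmetic of that model can be sketched as follows. Both helpers are hypothetical userspace stand-ins, and the exact form of the kernel check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Remaining space as an unsigned parser would compute it. */
static uint32_t remaining_space(uint32_t bytes_in_use, uint32_t attrs_offset)
{
	return bytes_in_use - attrs_offset;	/* wraps if attrs_offset > bytes_in_use */
}

/*
 * attrs_offset must lie inside the used area and leave room for the
 * four-byte attribute type/AT_END terminator.
 */
static bool attrs_offset_valid(uint32_t bytes_in_use, uint32_t attrs_offset)
{
	return attrs_offset <= bytes_in_use &&
	       bytes_in_use - attrs_offset >= 4;
}
```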
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_write_volume_label() does not check the return value of
kstrdup(). If the allocation fails, vol->volume_label is set to
NULL while the function returns success. A subsequent
FS_IOC_GETFSLABEL then returns an empty string even though the
on-disk label was updated correctly.
Fix by allocating the new label before taking vol_ni->mrec_lock and
updating any on-disk metadata, so an -ENOMEM from kstrdup() leaves
both the in-memory and on-disk labels untouched and consistent. On
success the preallocated copy replaces the old vol->volume_label.
Also move mark_inode_dirty_sync() into the success path so that it
is not called when no metadata was actually modified.
Fixes: 6251f0b0de ("ntfs: update super block operations")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_sd_add_everyone() builds the on-disk security descriptor for a
newly created file by kmalloc()'ing a buffer and then partially
filling it in:
    sd = kmalloc(sd_len, GFP_NOFS);
    ...
    sd->revision = 1;
    sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
    ...
The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record. The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:
  - struct security_descriptor_relative
      @alignment                        (1 byte)
      @sacl                             (4 bytes; SE_SACL_PRESENT is not set
                                         but the offset still reaches disk)
  - struct ntfs_sid (3 instances: owner, group, ACE.sid)
      identifier_authority.value[0..4]  (5 bytes per SID, 15 total;
                                         only value[5] is set)
  - struct ntfs_acl
      @alignment1                       (1 byte)
      @alignment2                       (2 bytes)
That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates. The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.
Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes. The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.
Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.
Found by inspection while auditing fs/ntfs new-inode paths.
Fixes: af0db57d42 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.
Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.
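A minimal model of the check-before-increment ordering is shown below. struct icx_model, the helper names and the -1 return value are hypothetical; the limit of 32 is assumed from the "over 32 level deep" message.

```c
#define MODEL_MAX_PARENT_VCN 32	/* assumed depth limit */

/* Hypothetical stand-in for the traversal state in the index context. */
struct icx_model {
	int pindex;
	int parent_pos[MODEL_MAX_PARENT_VCN];
	long long parent_vcn[MODEL_MAX_PARENT_VCN];
};

/* Check the limit first so a failed call leaves pindex unchanged. */
static int icx_parent_inc(struct icx_model *icx)
{
	if (icx->pindex + 1 >= MODEL_MAX_PARENT_VCN)
		return -1;
	icx->pindex++;
	return 0;
}

/* One walk-down step: the arrays are only written after the depth check. */
static int walk_down_one(struct icx_model *icx, long long vcn)
{
	int err = icx_parent_inc(icx);

	if (err)
		return err;
	icx->parent_vcn[icx->pindex] = vcn;
	icx->parent_pos[icx->pindex] = 0;
	return 0;
}
```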
This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.
A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.
Fixes: 0a8ac0c1fa ("ntfs: update directory operations")
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_rl_collapse_range() merges the run on the left of the collapsed
region with the run on its right when they are contiguous. The contiguous
check chooses a clamped index when @new_1st_cnt is 0:
    i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
    if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {
but the merge itself uses the unclamped value:
    s_rl = &new_rl[new_1st_cnt - 1];
    s_rl->length += s_rl[1].length;
When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes
before the kvcalloc() runlist buffer. The path is reachable through
fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an
attribute whose first run after the collapsed region and the following
run are holes. In that case ntfs_rle_lcn_contiguous() returns true
because both checked entries are LCN_HOLE, so the merge path is entered
with @new_1st_cnt still 0. Such consecutive holes do not occur on a
well-formed runlist (NTFS keeps runlists coalesced in memory), so this
OOB path is only reachable from a crafted volume.
A normal runlist has no element to the left of vcn 0, so the left/right
merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be
positive before checking or performing the merge. This skips the merge
entirely in that case instead of clamping the merge target.
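The guarded merge can be sketched roughly as below. This is a simplified userspace model: struct run, runs_contiguous() and try_merge_left() are hypothetical, LCN_HOLE is assumed to be -1, and removing the merged right-hand entry from the array is left out.

```c
#define LCN_HOLE (-1LL)	/* assumed hole marker */

struct run {
	long long vcn;
	long long lcn;
	long long length;
};

/* Two holes count as contiguous; mapped runs must be adjacent on disk. */
static int runs_contiguous(const struct run *l, const struct run *r)
{
	if (l->lcn == LCN_HOLE && r->lcn == LCN_HOLE)
		return 1;
	return l->lcn >= 0 && r->lcn >= 0 && l->lcn + l->length == r->lcn;
}

/* Returns 1 if rl[new_1st_cnt] was folded into its left neighbour. */
static int try_merge_left(struct run *rl, int new_1st_cnt)
{
	/* No element exists to the left of index 0, so skip the merge. */
	if (new_1st_cnt <= 0)
		return 0;
	if (!runs_contiguous(&rl[new_1st_cnt - 1], &rl[new_1st_cnt]))
		return 0;
	rl[new_1st_cnt - 1].length += rl[new_1st_cnt].length;
	return 1;
}
```

The same index is used for the check and for the merge, so rl[-1] is never formed.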
The out-of-bounds write can corrupt an adjacent slab object. On a
non-KASAN kernel, it is reachable after a crafted NTFS volume has been
mounted read-write with the legacy fs/ntfs driver, by a local user that
has write access to the crafted file.
Fixes: 11ccc9107d ("ntfs: update runlist handling and cluster allocator")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch warnings:
ntfs_attr_open() warn: variable dereferenced before check 'ni'
Move the ntfs_debug() call after the NULL pointer checks to ensure safe
access to the structure members.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_init_fs_context() allocates a fresh ntfs_volume with vol->upcase
left as NULL. ntfs_free_fs_context() unconditionally calls
ntfs_volume_free() during fs_context teardown, even when ntfs_fill_super()
never ran or already cleaned up. ntfs_volume_free() then executes:
    mutex_lock(&ntfs_lock);
    if (vol->upcase == default_upcase) {
        ntfs_nr_upcase_users--;
        vol->upcase = NULL;
    }
When the global default_upcase is also NULL (very first mount attempt,
or all prior mounts have released the table), the comparison is
NULL == NULL, and ntfs_nr_upcase_users is decremented even though this
volume never claimed a reference. ntfs_nr_upcase_users is unsigned long,
so the decrement wraps to ULONG_MAX.
A subsequent successful mount can then free the shared table while the
mounted volume still points at it:
  1. ntfs_fill_super() does the temporary ntfs_nr_upcase_users++ at the
     "Generate the global default upcase table if necessary" block. With
     the prior wraparound this brings the counter back to 0.
  2. If the volume's $UpCase matches the default, the match path does
     ntfs_nr_upcase_users++ and sets vol->upcase = default_upcase. The
     counter is now 1.
  3. On the success path, !--ntfs_nr_upcase_users evaluates true and
     default_upcase is kvfree()'d while vol->upcase still points at it.
     Subsequent upcase comparisons through that mount touch freed
     memory.
This was reproduced with KASAN by closing a fresh fsopen("ntfs") context,
then mounting an NTFS image whose $UpCase table matches
generate_default_upcase(), and finally doing a case-insensitive lookup.
KASAN reports the dangling vol->upcase access:
BUG: KASAN: use-after-free in ntfs_collate_names+0x3b4/0x420
Read of size 2 at addr ffff888008d40048 by task init/1
ntfs_collate_names+0x3b4/0x420
ntfs_lookup_inode_by_name+0x1921/0x3130
ntfs_lookup+0x193/0xc40
vfs_statx+0xc7/0x190
vfs_fstatat+0x4b/0xa0
__do_sys_newfstatat+0x92/0xf0
The same QEMU reproducer was rerun after this change with KASAN
enabled. It reached "reproducer finished", and the log contained no
KASAN, use-after-free, Oops, or panic signatures.
Guard each comparison with an explicit vol->upcase non-NULL check so a
volume that never took a reference cannot decrement the global users
counter. Apply the same guard to the other default_upcase release sites
so all cleanup paths follow the same ownership rule: only volumes that
actually hold a default_upcase reference may drop one.
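The counter wraparound and the guard can be modelled in userspace as below. All names are hypothetical stand-ins for the kernel globals, and locking is omitted.

```c
#include <stddef.h>

static unsigned long nr_upcase_users;		/* models ntfs_nr_upcase_users */
static const unsigned short *default_upcase_tbl;	/* models default_upcase */

struct vol_model {
	const unsigned short *upcase;
};

/* Old behaviour: NULL == NULL matches and the counter is decremented. */
static void release_unguarded(struct vol_model *vol)
{
	if (vol->upcase == default_upcase_tbl) {
		nr_upcase_users--;
		vol->upcase = NULL;
	}
}

/* Guarded behaviour: only a volume that holds a reference may drop one. */
static void release_guarded(struct vol_model *vol)
{
	if (vol->upcase && vol->upcase == default_upcase_tbl) {
		nr_upcase_users--;
		vol->upcase = NULL;
	}
}
```

With both pointers NULL and the counter at 0, the unguarded variant wraps the counter to the largest unsigned long value while the guarded variant leaves it at 0.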
Fixes: 1e9ea7e044 ("Revert "fs: Remove NTFS classic"")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_inode_sync_filename() walks every FILE_NAME attribute and, for
each one that points at a different parent, opens the parent index
inode with ntfs_iget() and locks index_ni->mrec_lock. All three error
branches (NInoBeingDeleted, ntfs_index_ctx_get failure, ntfs_index_lookup
failure) drop the parent reference before unlocking:
    iput(index_vi);
    mutex_unlock(&index_ni->mrec_lock);
    continue;
index_ni is NTFS_I(index_vi), so the ntfs_inode (and its mrec_lock) is
embedded in the inode allocation. If the parent directory is not held
outside the icache - no open dentry, recently evicted from dcache, no
other concurrent lookup - ntfs_iget() returns with i_count == 1 and
our iput() drops the last reference. evict_inode() then runs and
destroy_inode() schedules the slab object for RCU free, while
mutex_unlock() on the next line is still touching index_ni->mrec_lock.
Swap the order so the mutex is dropped while index_vi is still alive,
matching the success path at the bottom of the loop which already
unlocks before iput().
Reproduced under KASAN with a debug build that forces
ntfs_index_ctx_get() to fail when the parent index inode has been
opened with i_count == 1. KASAN reports a slab-use-after-free read
on the parent's mrec_lock from mutex_unlock() on the writeback worker:
BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xb5/0x970
Read of size 8 at addr ffff8880014b7598 by task kworker/u8:0/12
Workqueue: writeback wb_workfn (flush-253:0)
Call Trace:
mutex_unlock
ntfs_inode_sync_filename
__ntfs_write_inode
ntfs_write_inode
__writeback_single_inode
Allocated by task 103:
ntfs_alloc_big_inode
ntfs_iget
ntfs_lookup
__x64_sys_mkdir
Freed by task 12:
ntfs_free_big_inode
i_callback
rcu_do_batch
Last potentially related work creation:
call_rcu
destroy_inode
evict
dispose_list
evict_inodes
ntfs_inode_sync_filename
__ntfs_write_inode
The buggy address belongs to the object at ffff8880014b7440
which belongs to the cache ntfs_big_inode_cache of size 1800
The freed object is the parent directory inode itself: allocated by
mkdir(2) via ntfs_iget(), then released through call_rcu(i_callback)
that destroy_inode() scheduled when evict_inodes() ran from inside
ntfs_inode_sync_filename(). Re-running the same workload with
mutex_unlock() moved before iput() runs cleanly under KASAN.
Fixes: af0db57d42 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_bdev_write() is not a normal data I/O hot path. The single in-tree caller is
the $LogFile emptying path used during read-write mount/remount, and
the bug only becomes visible on NTFS volumes whose cluster_size is
strictly smaller than the kernel's PAGE_SIZE (typically 4 KiB on
x86_64). Per Microsoft's format command documentation, NTFS supports
allocation unit sizes starting at 512 bytes, so 512 B, 1 KiB and 2 KiB
clusters are uncommon but valid on-disk configurations. When
cluster_size >= PAGE_SIZE every "start" passed in is page-aligned and
the buggy "from != 0" path is never taken.
ntfs_bdev_write() splits the write across one or more block-device
folios. Inside the loop, "to" is computed as the *end byte offset*
within the current page (0..PAGE_SIZE), and "from" is the start byte
offset within the page (reset to 0 from the second iteration onward).
The copy length should therefore be "to - from", but the current code
uses "to" directly:
    to = min_t(u32, end - offset, PAGE_SIZE);
    memcpy_to_folio(folio, from, buf + buf_off, to);
    buf_off += to;
When "from != 0" (i.e. "start" is not page-aligned) memcpy_to_folio()
copies "from" extra bytes:
  - it reads "from" bytes past the source buffer into kernel heap;
  - it writes "from" bytes past the requested range into the next part
    of the block-device page (or, if "from + to > PAGE_SIZE", past the
    folio boundary entirely, which trips the VM_BUG_ON inside
    memcpy_to_folio() on CONFIG_DEBUG_VM=y kernels).
"buf_off" is then advanced by the wrong amount, so every subsequent
iteration also reads the source buffer at the wrong offset and writes
the wrong content to disk.
ntfs_empty_logfile() calls
    ntfs_bdev_write(sb, empty_buf, NTFS_CLU_TO_B(vol, lcn),
                    vol->cluster_size);
with empty_buf sized to vol->cluster_size. On a sub-PAGE_SIZE-cluster
volume, any $LogFile run whose LCN is not aligned to
PAGE_SIZE / cluster_size reaches the non-page-aligned path. The
over-copy can read beyond empty_buf and overwrite the sectors following
the requested cluster in the block-device page with unrelated kernel
heap contents while $LogFile is being emptied.
A userspace reducer of the same arithmetic and copy loop confirms the
bug under AddressSanitizer: ASan reports a heap-buffer-overflow read
past the source buffer for the buggy length, and the fixed version is
ASan-clean.
Compute the copy length as "to - from" and advance buf_off by the same
amount.
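The corrected loop arithmetic can be sketched in userspace as below. This is a model under stated assumptions, not the kernel function: model_bdev_write() and MODEL_PAGE_SIZE are hypothetical, the block device is modelled as flat memory, and memcpy() stands in for memcpy_to_folio().

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Copies size bytes from buf to dev at start; returns bytes consumed from buf. */
static size_t model_bdev_write(unsigned char *dev, const unsigned char *buf,
			       size_t start, size_t size)
{
	size_t end = start + size;
	size_t offset = start & ~(size_t)(MODEL_PAGE_SIZE - 1);
	size_t from = start - offset;	/* start offset within the first page */
	size_t buf_off = 0;

	while (offset < end) {
		/* End offset within the current page, 0..MODEL_PAGE_SIZE. */
		size_t to = end - offset;

		if (to > MODEL_PAGE_SIZE)
			to = MODEL_PAGE_SIZE;
		/* Copy exactly the bytes between from and to. */
		memcpy(dev + offset + from, buf + buf_off, to - from);
		buf_off += to - from;
		from = 0;
		offset += MODEL_PAGE_SIZE;
	}
	return buf_off;
}
```

For a 1 KiB write starting at byte 3584, the first iteration copies 512 bytes and the second copies the remaining 512, so exactly size bytes are read from the source buffer.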
Fixes: 5218cd102a ("ntfs: update misc operations")
Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/format
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set
are both documented as synchronous, but neither actually waits for
the bio they submit nor inspects bi_status. write_inode() can
return success while dirty mft record bytes are still in flight, and
bio errors are silently dropped: the volume is not marked with
errors and the inode is not redirtied. This breaks fsync()/sync
metadata durability.
Switch ntfs_sync_mft_mirror() and the @sync path of
write_mft_record_nolock() to submit_bio_wait() and propagate the
returned error to the caller. Capture ntfs_sync_mft_mirror()'s
return value at its call sites in write_mft_record_nolock() so a
mirror write failure surfaces too.
The @sync parameter only controls the main MFT bio. The !@sync main
submission is therefore unchanged and still uses ntfs_bio_end_io() to
drop the folio reference taken before submission. The mirror call
has always been documented as performing synchronous I/O regardless
of @sync, so making it actually block restores the originally
intended contract for both @sync and !@sync callers.
Note this only fixes the synchronous mirror/main paths reachable
from write_mft_record_nolock(). The main MFT write submitted from
ntfs_write_mft_block() (the .writepages path) still does not wait
for completion or check bi_status; that requires a larger
restructuring and is left to a follow-up patch.
Fixes: 115380f9a2 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
After ntfs_sync_mft_mirror() became able to return real I/O errors,
ntfs_write_mft_block() still discards its return value at the call
site inside the per-record loop. A failed $MFTMirr write therefore
leaves the volume looking clean from the writeback path even though
the on-disk mirror is now stale.
Capture the return value and feed it into the function's existing
@err variable using the same "first error wins" pattern already used
on other failure paths. The error is propagated to the caller and,
via the existing tail of the function, sets NVolErrors so umount and
chkdsk see the volume as inconsistent.
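The "first error wins" pattern amounts to the following. record_first_error() is a hypothetical helper used only for illustration; the kernel code may open-code the same test.

```c
/* Keep the first non-zero error; later failures do not overwrite it. */
static void record_first_error(int *err, int ret)
{
	if (ret && !*err)
		*err = ret;
}
```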
Fixes: 115380f9a2 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_write_mft_block() is called by writeback_iter() with the folio
locked. When the per-call allocations for @locked_nis or @ref_inos
fail, the function returns -ENOMEM directly without unlocking the
folio. Any later task that needs the folio's lock then stalls, and
the folio's dirty state is silently lost from the writeback
iterator's point of view.
Use folio_redirty_for_writepage() so the folio remains dirty for a
subsequent writeback pass, unlock it, and only then return -ENOMEM
so the caller can propagate the error to fsync()/sync_filesystem().
Fixes: f462fdf3d6 ("ntfs: reduce stack usage in ntfs_write_mft_block()")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
When the mft record is an extent record, ntfs_may_write_mft_record()
looks up its base inode in the icache. The hash key passed to
find_inode_nowait() must be the base inode's mft number (na.mft_no,
set just above to MREF_LE(m->base_mft_record)), but the code passes
@mft_no, the extent record's own number.
find_inode_nowait() uses its second argument as the hashval, so the
lookup lands in the wrong bucket and almost always returns NULL.
ntfs_may_write_mft_record() then returns false and the writeback
path (ntfs_write_mft_block()) skips that extent record, leaving the
on-disk copy permanently out of sync with the in-memory one.
The original ilookup5_nowait() call this conversion replaced used
na.mft_no. Restore that.
Fixes: 115380f9a2 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch warnings:
ntfs_attrlist_entry_add() warn: variable dereferenced before check 'ni'
ntfs_attrlist_entry_add() warn: variable dereferenced before check 'attr'
Move the ntfs_debug() call after the NULL pointer checks to ensure safe
access to the structure members.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Clang warns (or errors with CONFIG_WERROR=y / W=e):
fs/ntfs/runlist.c:755:6: error: variable 'rl' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
755 | if (overflows_type(lowest_vcn, vcn)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
fs/ntfs/runlist.c:971:9: note: uninitialized use occurs here
971 | kvfree(rl);
| ^~
...
rl has not been allocated at this point so the 'goto err_out' should
really just be a return of the error pointer -EIO.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
NTFS can store a directory's filename as a pair of WIN32 and DOS
$FILE_NAME attributes. When unlinking such a directory, ntfs_delete()
deleted both attributes but also called drop_nlink() for each of them.
This could trigger warnings when unlinking directories.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch reported a warning in __ntfs_bitmap_set_bits_in_run():
"warn: passing a valid pointer to 'PTR_ERR'"
This occurs because the 'folio' variable might contain a valid pointer
when jumping to the 'rollback' label, specifically when 'cnt <= 0' is
detected during the subsequent page mapping loop. In such cases,
calling PTR_ERR(folio) is incorrect as it does not contain an error
code.
Fix this by introducing an explicit 'err' variable to track the error
status. This ensures that the rollback logic and the return value
consistently use a proper error code regardless of the state of the
folio pointer.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
When ntfs_attr_get_search_ctx() fails and returns NULL, the function
returned early without calling put_page(ipage).
Fix this by jumping to err_out label on error. The err_out path now
properly releases the page and the mutex, with a NULL check for
the search context.
Reported-by: DaeMyung Kang <charsyam@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
In ntfs_mapping_pairs_decompress(), lowest_vcn is read from
on-disk metadata and used as the initial vcn without validation.
A malformed value can introduce an invalid (e.g. negative) vcn,
corrupting the runlist from the start.
Additionally, the accumulation
vcn += deltaxcn
does not check for s64 overflow. A crafted mapping pairs array
can wrap vcn to a negative value, breaking the monotonically-
increasing invariant relied upon by ntfs_rl_vcn_to_lcn() and
related helpers.
Fix this by validating lowest_vcn and using check_add_overflow()
for vcn accumulation.
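A userspace sketch of the overflow-checked accumulation follows. vcn_advance() is a hypothetical helper; it uses the GCC/Clang __builtin_add_overflow() builtin, on which the kernel's check_add_overflow() is based.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds deltaxcn to *vcn; returns false and leaves *vcn unchanged on s64 overflow. */
static bool vcn_advance(int64_t *vcn, int64_t deltaxcn)
{
	int64_t next;

	if (__builtin_add_overflow(*vcn, deltaxcn, &next))
		return false;
	*vcn = next;
	return true;
}
```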
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_reparse_set_wsl_symlink() converts the symlink target into an
allocated NLS string and transfers ownership to ni->target only after
ntfs_set_ntfs_reparse_data() succeeds. If setting the reparse data fails,
the converted target is left unreferenced and leaks.
Free the converted target on the reparse update failure path. Use kfree()
for the other local failure path as well, matching the ntfs_ucstonls()
allocation contract.
Fixes: fc053f05ca ("ntfs: add reparse and ea operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_index_walk_down() allocates ictx->ib when descending from the root
into an index allocation block. If that allocation fails, the old code
still passes the NULL buffer to ntfs_ib_read(), which can write through
it via ntfs_inode_attr_pread().
Allocate the index block into a temporary pointer and return -ENOMEM
before changing the index context on allocation failure. Also propagate
ERR_PTR() through ntfs_index_next() and ntfs_readdir() so walk-down
allocation or index block read failures are not mistaken for normal
index iteration inside the filesystem.
ntfs_readdir() keeps the existing userspace-visible behavior of
suppressing readdir errors after marking end_in_iterate; this change only
prevents the walk-down failure path from dereferencing NULL internally.
The failure was reproduced with failslab fail-nth injection on getdents64;
the original module hits a NULL pointer dereference in memcpy_orig through
ntfs_ib_read(), while the patched module reaches the same
ntfs_index_walk_down() allocation failure without crashing.
Fixes: 0a8ac0c1fa ("ntfs: update directory operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
The current kmemdup() based allocation for IOMAP_INLINE can result in
inline_data pointer having a non-zero page offset. This causes
iomap_inline_data_valid() to fail the check:
iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data)
and triggers the kernel BUG at fs/iomap/buffered-io.c:1061.
This particularly affects workloads with frequent small file access
(e.g. Firefox Nightly profile on NTFS with bind mount) when using the
new ntfs. Fix this by allocating a full page with alloc_page() so that
page_address() always returns a page-aligned address.
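The effect of the page offset on the check quoted above can be modelled as below. inline_data_fits() and MODEL_PAGE_SIZE are hypothetical; the function only mirrors the quoted inequality on a plain address value.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* length must fit between the address and the end of its page. */
static bool inline_data_fits(uintptr_t inline_data, uint64_t length)
{
	uintptr_t off = inline_data & (MODEL_PAGE_SIZE - 1);	/* offset_in_page() */

	return length <= MODEL_PAGE_SIZE - off;
}
```

A page-aligned address accepts any length up to the page size, while an address with a non-zero page offset reduces the permitted length by that offset.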
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Linus pointed out that checking only VMA_WRITE_BIT is incorrect.
Private writable mappings (MAP_PRIVATE) set VM_WRITE but do not
write back to the filesystem. Also, mappings that can become
writable via mprotect() (VM_MAYWRITE) must be handled.
Use vma_desc_test_all(VMA_SHARED_BIT, VMA_MAYWRITE_BIT) instead,
which matches what other filesystems do.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch warned that the bitwise negation in ntfs_write_cb() might lead to
unintended truncation. Casting the block size to loff_t before bitwise
negation prevents the upper 32 bits of pos from being incorrectly zeroed
out during the calculation of new_vcn.
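The truncation can be reproduced in userspace as below. Both function names are hypothetical, int64_t stands in for loff_t, and block_size is assumed to be a 32-bit power of two.

```c
#include <stdint.h>

/* ~(block_size - 1) is computed in 32 bits, then zero-extended: upper bits of pos are lost. */
static int64_t round_down_truncating(int64_t pos, uint32_t block_size)
{
	return pos & ~(block_size - 1);
}

/* Widening before the negation keeps the upper 32 bits of the mask set. */
static int64_t round_down_widened(int64_t pos, uint32_t block_size)
{
	return pos & ~((int64_t)block_size - 1);
}
```

For pos = 0x100001234 and a 4096-byte block, the first variant yields 0x1000 while the second yields 0x100001000.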
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch reported that ctx_needs_reset could be used uninitialized if
ntfs_map_runlist_nolock() fails early when a search context is provided.
Specifically, if the function returns -EIO because the attribute is
resident, the code jumps to err_out. Initialize ctx_needs_reset to
false to satisfy the static checker.
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
We know "ret2" is zero so there is no need to check. Delete the
if statement.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch reported uninitialized symbol warnings in ntfs_ea_set_wsl_inode()
and __ntfs_create(). In ntfs_ea_set_wsl_inode(), the err variable could be
returned without initialization if no flags are set and rdev is zero.
Additionally, ea_size might remain uninitialized from the caller's
perspective if no EA operations are performed. While these cases might not
be triggered under current logic, we initialize them to zero to satisfy
the static checker.
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch reported that the variable rl could be used uninitialized in
ntfs_write_mft_block(). Analysis of the code shows this is not expected
at runtime: when vol->cluster_size == NTFS_BLOCK_SIZE (512), it is smaller
than folio_size, so rl is guaranteed to be initialized, and when
vol->cluster_size is larger, the condition guarding the access to rl is
false. However, to make the static checker happy, initialize rl to NULL
and add an explicit check before its use.
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Smatch reported that err could be used uninitialized if the code path
does not enter the first ntfs_zero_range() block.
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Since commit a2ad63daa8 ("VFS: add FMODE_CAN_ODIRECT file flag"),
noop_direct_io is not required.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Check the attribute size before memory allocation, and reject the
attribute if its size exceeds the maximum size.
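
The check-then-allocate order can be sketched in userspace C as follows.
The function name, the cap value and the error codes are assumptions made
for the example; the text does not state the limit the driver uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap, chosen only for this example. */
#define EXAMPLE_MAX_ATTR_SIZE (256u * 1024u)

/*
 * Hypothetical helper: allocate a zeroed buffer for an attribute value.
 * Returns 0 on success or a negative errno-style value on failure.
 */
static int example_alloc_attr_buf(uint64_t attr_size, void **out)
{
	assert(out != NULL);
	*out = NULL;

	/* Reject first, so a corrupt on-disk size never reaches the allocator. */
	if (attr_size > EXAMPLE_MAX_ATTR_SIZE)
		return -ERANGE;
	if (attr_size == 0)
		return 0;	/* nothing to allocate */

	*out = calloc(1, (size_t)attr_size);
	return *out ? 0 : -ENOMEM;
}
```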
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
The area beyond initialized_size is read as zeros, so there is no need
to zero out that region.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs_read_iomap_begin_non_resident() rounds up MAPPED extents
to the block boundary of initialized_size. This ensures that
any subsequent blocks are treated as IOMAP_UNWRITTEN, but
it also causes the "straddle block" containing initialized_size
to be read from disk. The disk data beyond initialized_size in
this block is stale and must be zeroed to prevent data leakage.
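
A minimal sketch of zeroing the stale tail of a straddle block, assuming
the block has already been read into a plain byte buffer. The helper name
and its parameters are hypothetical and do not mirror the iomap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper. @block holds one block read from disk and
 * @block_start is the file offset of its first byte. Bytes at or beyond
 * @initialized_size are cleared. Returns the number of bytes zeroed.
 */
static size_t example_zero_stale_tail(uint8_t *block, size_t block_size,
				      uint64_t block_start,
				      uint64_t initialized_size)
{
	size_t keep;

	assert(block != NULL && block_size > 0);

	if (initialized_size <= block_start)
		keep = 0;		/* whole block lies beyond initialized_size */
	else if (initialized_size - block_start >= block_size)
		return 0;		/* block is fully initialized */
	else
		keep = (size_t)(initialized_size - block_start);

	memset(block + keep, 0, block_size - keep);
	return block_size - keep;
}
```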
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Merge tag 'ntfs-for-7.1-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs
Pull ntfs resurrection from Namjae Jeon:
"Ever since Kari Argillander’s 2022 report [1] regarding the state of
the ntfs3 driver, I have spent the last 4 years working to provide
full write support and current trends (iomap, no buffer head, folio),
enhanced performance, stable maintenance, utility support including
fsck for NTFS in Linux.
This new implementation is built upon the clean foundation of the
original read-only NTFS driver, adding:
- Write support:
Implemented full write support based on the classic read-only NTFS
driver. Added delayed allocation to improve write performance
through multi-cluster allocation and reduced fragmentation of the
cluster bitmap.
- iomap conversion:
Switched buffered IO (reads/writes), direct IO, file extent
mapping, readpages, and writepages to use iomap.
- Remove buffer_head:
Completely removed buffer_head usage by converting to folios. As a
result, the dependency on CONFIG_BUFFER_HEAD has been removed from
Kconfig.
- Stability improvements:
The new ntfs driver passes 326 xfstests, compared to 273 for ntfs3.
All tests passed by ntfs3 are a complete subset of the tests passed
by this implementation. Added support for fallocate, idmapped
mounts, permissions, and more.
xfstests Results report:
Total tests run: 787
Passed : 326
Failed : 38
Skipped : 423
Failed tests breakdown:
- 34 tests require metadata journaling
- 4 other tests:
094: No unwritten extent concept in NTFS on-disk format
563: cgroup v2 aware writeback accounting not supported
631: RENAME_WHITEOUT support required
787: NFS delegation test"
Link: https://lore.kernel.org/all/da20d32b-5185-f40b-48b8-2986922d8b25@stargateuniverse.net/ [1]
[ Let's see if this undead filesystem ends up being of the "Easter
miracle" kind, or the "Nosferatu of filesystems" kind... ]
* tag 'ntfs-for-7.1-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: (46 commits)
ntfs: remove redundant out-of-bound checks
ntfs: add bound checking to ntfs_external_attr_find
ntfs: add bound checking to ntfs_attr_find
ntfs: fix ignoring unreachable code warnings
ntfs: fix inconsistent indenting warnings
ntfs: fix variable dereferenced before check warnings
ntfs: prefer IS_ERR_OR_NULL() over manual NULL check
ntfs: harden ntfs_listxattr against EA entries
ntfs: harden ntfs_ea_lookup against malformed EA entries
ntfs: check $EA query-length in ntfs_ea_get
ntfs: validate WSL EA payload sizes
ntfs: fix WSL ea restore condition
ntfs: add missing newlines to pr_err() messages
ntfs: fix pointer/integer casting warnings
ntfs: use ->mft_no instead of ->i_ino in prints
ntfs: change mft_no type to u64
ntfs: select FS_IOMAP in Kconfig
ntfs: add MODULE_ALIAS_FS
ntfs: reduce stack usage in ntfs_write_mft_block()
ntfs: fix sysctl table registration and path
...
Remove redundant out-of-bounds validations.
Since ntfs_attr_find and ntfs_external_attr_find
now validate the attribute value offsets and
lengths against the bounds of the MFT record block,
performing subsequent bounds checking in caller
functions like ntfs_attr_lookup is no longer necessary.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Add bound validation in ntfs_external_attr_find to
prevent out-of-bounds memory accesses. This ensures
that the attribute record's length, name offset, and
both resident and non-resident value offsets strictly
fall within the safe boundaries of the MFT record.
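
The kind of check described here can be sketched with plain offsets and
lengths. This is an illustration under assumed parameter names, not the
ntfs_external_attr_find code, and it covers only the record length and a
single value range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper. @rec_size is the size of the MFT record, @attr_off
 * and @attr_len locate the attribute record inside it, and @value_off and
 * @value_len locate the value inside the attribute record.
 */
static bool example_attr_in_bounds(uint32_t rec_size, uint32_t attr_off,
				   uint32_t attr_len, uint32_t value_off,
				   uint32_t value_len)
{
	/* Compare by subtraction so that a huge length cannot wrap around. */
	if (attr_off > rec_size || attr_len > rec_size - attr_off)
		return false;
	if (value_off > attr_len || value_len > attr_len - value_off)
		return false;

	/* Both checks together imply the value ends inside the MFT record. */
	assert((uint64_t)attr_off + value_off + value_len <= rec_size);
	return true;
}
```

Writing the comparison as "len > size - off" rather than
"off + len > size" keeps the test correct even when the on-disk length is
close to the maximum of its type.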
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Add bound validations in ntfs_attr_find to ensure
attribute value offsets and lengths are safe to
access. It verifies that resident attributes meet
type-specific minimum length requirements and
checks the mapping_pairs_offset boundaries for
non-resident attributes.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Detected by Smatch.
lcnalloc.c:736 ntfs_cluster_alloc() error:
we previously assumed 'rl' could be null (see line 719)
inode.c:3275 ntfs_inode_close() warn:
variable dereferenced before check 'tmp_nis' (see line 3255)
attrib.c:4952 ntfs_attr_remove() warn:
variable dereferenced before check 'ni' (see line 4951)
dir.c:1035 ntfs_readdir() error:
we previously assumed 'private' could be null (see line 850)
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Use IS_ERR_OR_NULL() instead of manual NULL and IS_ERR() checks.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Validate every EA entry only if the buffer length is required to prevent
large memory allocation.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Strictly validate p_ea->ea_name_length and the used entry size
of every EA.
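
A minimal sketch of such a per-entry walk over a raw EA buffer. It assumes
the usual on-disk EA entry layout (32-bit next-entry offset, flags byte,
name-length byte, 16-bit value length, NUL-terminated name, then value) and
assumes the list ends exactly at the buffer length. The helper names are
hypothetical and this is not the ntfs_ea_lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_EA_HDR 8u	/* fixed header bytes before the name */

static uint32_t example_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t example_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: true if every entry fits inside @buf. */
static bool example_ea_list_valid(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off < len) {
		size_t avail = len - off;
		uint32_t next, used;
		uint8_t name_len;

		if (avail < EXAMPLE_EA_HDR)
			return false;	/* header itself is cut off */
		next = example_le32(buf + off);
		name_len = buf[off + 5];
		/* header + name + NUL terminator + value */
		used = EXAMPLE_EA_HDR + name_len + 1u + example_le16(buf + off + 6);
		if (used > avail)
			return false;	/* name or value runs past the buffer */
		if (buf[off + EXAMPLE_EA_HDR + name_len] != 0)
			return false;	/* name is not NUL-terminated */
		if (next < used || next > avail)
			return false;	/* next entry overlaps or is out of range */
		off += next;
	}
	return true;
}
```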
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
If ea_info_qlen exceeds all_ea_size, an out-of-bounds access can happen.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>