Add new funtion svcauth_map_clnt_to_svc_cred_local which maps a
generic cred to a svc_cred suitable for use in nfsd.
This is needed by the localio code to map nfs client creds to nfs
server credentials.
Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that
->fsuid and ->fsgid must be used (rather than ->uid and ->gid). In
addition, these uid and gid must be translated with from_kuid_munged()
so local client uses correct uid and gid when acting as local server.
Jeff Layton noted:
This is where the magic happens. Since we're working in
kuid_t/kgid_t, we don't need to worry about further idmapping.
Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal() Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There is really no value in triggering a BUG_ON in response to either
of these previously unsupported conditions.
NeilBrown would like the entire 'if (proc->p_proc != 0)' branch
removed (not just the one BUG_ON that must be removed for LOCALIO's
immediate needs of returning void).
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Mike Snitzer [Thu, 5 Sep 2024 19:09:44 +0000 (15:09 -0400)]
nfsd: add nfsd_serv_try_get and nfsd_serv_put
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
A percpu_ref is used to implement the interlock between
nfsd_destroy_serv and any caller of nfsd_serv_try_get.
This interlock is needed to properly wait for the completion of client
initiated localio calls to nfsd (that are _not_ in the context of nfsd).
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfsd_file_acquire_local() can be used to look up a file by filehandle
without having a struct svc_rqst. This can be used by NFS LOCALIO to
allow the NFS client to bypass the NFS protocol to directly access a
file provided by the NFS server which is running in the same kernel.
In nfsd_file_do_acquire() care is taken to always use fh_verify() if
rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise
the non-LOCALIO callers will not supply the correct and required
arguments to __fh_verify (e.g. gssclient isn't passed).
Introduce fh_verify_local() wrapper around __fh_verify to make it
clear that LOCALIO is intended caller.
Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC
offers performance improvements if/when a file is reopened before
launderette cleans it from the filecache's LRU.
Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfsd: factor out __fh_verify to allow NULL rqstp to be passed
__fh_verify() offers an interface like fh_verify() but doesn't require
a struct svc_rqst *, instead it also takes the specific parts as
explicit required arguments. So it is safe to call __fh_verify() with
a NULL rqstp, but the net, cred, and client args must not be NULL.
__fh_verify() does not use SVC_NET(), nor does the functions it calls.
Rather than using rqstp->rq_client pass the client and gssclient
explicitly to __fh_verify and then to nfsd_set_fh_dentry().
Lastly, it should be noted that the previous commit prepared for 4
associated tracepoints to only be used if rqstp is not NULL (this is a
stop-gap that should be properly fixed so localio also benefits from
the utility these tracepoints provide when debugging fh_verify
issues).
Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Chuck Lever [Thu, 5 Sep 2024 19:09:41 +0000 (15:09 -0400)]
NFSD: Short-circuit fh_verify tracepoints for LOCALIO
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
case, the existing trace points need to be skipped because they
want to dereference the address fields in the passed-in rqstp.
Temporarily make these trace points conditional to avoid a seg
fault in this case. Putting the "rqstp != NULL" check in the trace
points themselves makes the check more efficient.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Chuck Lever [Thu, 5 Sep 2024 19:09:40 +0000 (15:09 -0400)]
NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.
Instead, examine the passed-in file handle. It's .max_size field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.
The file handle used by lockd and the one created by write_filehandle
never need any of the version-specific fields (which affect things
like write and getattr requests and pre/post attributes).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure. They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.
Prepare these to always succeed when rqstp is NULL.
Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
NFSD: Handle @rqstp == NULL in check_nfsd_access()
LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.
Signed-off-by: NeilBrown <neilb@suse.de> Co-developed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Mike Snitzer [Thu, 5 Sep 2024 19:09:37 +0000 (15:09 -0400)]
nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
Eliminates duplicate functions in various files to allow for
additional callers.
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Mike Snitzer [Thu, 5 Sep 2024 19:09:36 +0000 (15:09 -0400)]
nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Mike Snitzer [Thu, 5 Sep 2024 19:09:35 +0000 (15:09 -0400)]
nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c
Will also be used by fs/nfsd/localio.c
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Dan Aloni [Wed, 24 Jul 2024 11:07:12 +0000 (14:07 +0300)]
nfs: add 'noalignwrite' option for lock-less 'lost writes' prevention
There are some applications that write to predefined non-overlapping
file offsets from multiple clients and therefore don't need to rely on
file locking. However, if these applications want non-aligned offsets
and sizes they need to either use locks or risk data corruption, as the
NFS client defaults to extending writes to whole pages.
This commit adds a new mount option `noalignwrite`, which allows to turn
that off and avoid the need of locking, as long as these applications
don't overlap on offsets.
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Li Lingfeng [Sat, 24 Aug 2024 01:43:35 +0000 (09:43 +0800)]
nfs: fix the comment of nfs_get_root
The comment for nfs_get_root() needs to be updated as it would also be
used by NFS4 as follows:
@x[
nfs_get_root+1
nfs_get_tree_common+1819
nfs_get_tree+2594
vfs_get_tree+73
fc_mount+23
do_nfs4_mount+498
nfs4_try_get_tree+134
nfs_get_tree+2562
vfs_get_tree+73
path_mount+2776
do_mount+226
__se_sys_mount+343
__x64_sys_mount+106
do_syscall_64+69
entry_SYSCALL_64_after_hwframe+97
, mount.nfs4]: 1
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Roi Azarzar [Sun, 15 Sep 2024 10:27:35 +0000 (10:27 +0000)]
NFSv4.2: Fix detection of "Proxying of Times" server support
According to draft-ietf-nfsv4-delstid-07:
If a server informs the client via the fattr4_open_arguments
attribute that it supports
OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid
delegation stateid for an OPEN operation which sets the
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the
client via a CB_GETATTR for the fattr4_time_deleg_access (see
Section 5.2) attribute and fattr4_time_deleg_modify attribute (see
Section 5.2).
Thus, we should look that the server supports proxying of times via
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS.
We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS
and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the
client to correctly detect "Proxying of Times" support.
Signed-off-by: Roi Azarzar <roi.azarzar@vastdata.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: dcb3c20f7419 ("NFSv4: Add a capability for delegated attributes") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If the server is down when the client is trying to mount, so that the
calls to exchange_id or create_session fail, then we should allow the
mount system call to fail rather than hang and block other mount/umount
calls.
Zhaoyang Huang [Fri, 30 Aug 2024 03:27:47 +0000 (11:27 +0800)]
fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private
This patch is inspired by a code review of fs codes which aims at
folio's extra refcnt that could introduce unwanted behavious when
judging refcnt, such as[1].That is, the folio passed to
mapping_evict_folio carries the refcnts from find_lock_entries,
page_cache, corresponding to PTEs and folio's private if has. However,
current code doesn't take the refcnt for folio's private which could
have mapping_evict_folio miss the one to only PTE and lead to
call filemap_release_folio wrongly.
[1]
long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
{
...
//current code will misjudge here if there is one pte on the folio which
is be deemed as the one as folio's private
if (folio_ref_count(folio) >
folio_nr_pages(folio) + folio_has_private(folio) + 1)
return 0;
if (!filemap_release_folio(folio, 0))
return 0;
return remove_mapping(mapping, folio);
}
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Gaosheng Cui [Mon, 26 Aug 2024 03:21:57 +0000 (11:21 +0800)]
nfs: Remove obsoleted declaration for nfs_read_prepare
The nfs_read_prepare() have been removed since
commit a4cdda59111f ("NFS: Create a common pgio_rpc_prepare function"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
destroy_wait doesn't store all RPC clients. There was a list named
"all_clients" above it, which got moved to struct sunrpc_net in 2012,
but the comment was never removed.
Fixes: 70abc49b4f4a ("SUNRPC: make SUNPRC clients list per network namespace context") Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stephen Brennan [Mon, 19 Aug 2024 15:58:59 +0000 (08:58 -0700)]
SUNRPC: convert RPC_TASK_* constants to enum
The RPC_TASK_* constants are defined as macros, which means that most
kernel builds will not contain their definitions in the debuginfo.
However, it's quite useful for debuggers to be able to view the task
state constant and interpret it correctly. Conversion to an enum will
ensure the constants are present in debuginfo and can be interpreted by
debuggers without needing to hard-code them and track their changes.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Kunwu Chan [Fri, 16 Aug 2024 02:07:40 +0000 (10:07 +0800)]
SUNRPC: Fix -Wformat-truncation warning
Increase size of the servername array to avoid truncated output warning.
net/sunrpc/clnt.c:582:75: error:‘%s’ directive output may be truncated
writing up to 107 bytes into a region of size 48
[-Werror=format-truncation=]
582 | snprintf(servername, sizeof(servername), "%s",
| ^~
net/sunrpc/clnt.c:582:33: note:‘snprintf’ output
between 1 and 108 bytes into a destination of size 48
582 | snprintf(servername, sizeof(servername), "%s",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
583 | sun->sun_path);
Thorsten Blum [Wed, 14 Aug 2024 10:01:28 +0000 (12:01 +0200)]
nfs: Annotate struct nfs_cache_array with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
array to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Increment size before adding a new struct to the array.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
I have evidence of an Linux NFS client getting NFS4ERR_BAD_SEQID to a
v4.0 LOCK request to a Linux server (which had fixed the problem with
RELEASE_LOCKOWNER bug fixed).
The LOCK request presented a "new" lock owner so there are two seq ids
in the request: that for the open file, and that for the new lock.
Given the context I am confident that the new lock owner was reported to
have the wrong seqid. As lock owner identifiers are reused, the server
must still have a lock owner active which the client thinks is no longer
active.
I wasn't able to determine a root-cause but the simplest fix seems to be
to ensure lock owners are always unique much as open owners are (thanks
to a time stamp). The easiest way to ensure uniqueness is with a 64bit
counter for each server. That will never cycle (if updated once a
nanosecond the last 584 years. A single NFS server would not handle
open/lock requests nearly that fast, and a Linux node is unlikely to
have an uptime approaching that).
This patch removes the 2 ida and instead uses a per-server
atomic64_t to provide uniqueness.
Note that the lock owner already encodes the id as 64 bits even though
it is a 32bit value. So changing to a 64bit value does not change the
encoding of the lock owner. The open owner encoding is now 4 bytes
larger.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Li Lingfeng [Wed, 4 Sep 2024 12:34:57 +0000 (20:34 +0800)]
nfs: fix memory leak in error path of nfs4_do_reclaim
Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in
nfs4_do_reclaim()") separate out the freeing of the state owners from
nfs4_purge_state_owners() and finish it outside the rcu lock.
However, the error path is omitted. As a result, the state owners in
"freeme" will not be released.
Fix it by adding freeing in the error path.
Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Anna Schumaker [Mon, 23 Sep 2024 19:00:07 +0000 (15:00 -0400)]
Merge tag 'nfsd-6.12' into linux-next-with-localio
NFSD 6.12 Release Notes
Notable features of this release include:
- Pre-requisites for automatically determining the RPC server thread
count
- Clean-up and preparation for supporting LOCALIO, which will be
merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding
and decoding functions, added as an aid for prototyping features
in protocols based on the Linux kernel's SunRPC implementation.
As always I am grateful to the NFSD contributors, reviewers,
testers, and bug reporters who participated during this cycle.
Chuck Lever [Fri, 13 Sep 2024 17:50:56 +0000 (13:50 -0400)]
xdrgen: Prevent reordering of encoder and decoder functions
I noticed that "xdrgen source" reorders the procedure encoder and
decoder functions every time it is run. I would prefer that the
generated code be more deterministic: it enables a reader to better
see exactly what has changed between runs of the tool.
The problem is that Python sets are not ordered. I use a Python set
to ensure that, when multiple procedures use a particular argument or
result type, the encoder/decoder for that type is emitted only once.
Sets aren't ordered, but I can use Python dictionaries for this
purpose to ensure the procedure functions are always emitted in the
same order if the .x file does not change.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 13 Sep 2024 18:08:13 +0000 (14:08 -0400)]
tools: Add xdrgen
Add a Python-based tool for translating XDR specifications into XDR
encoder and decoder functions written in the Linux kernel's C coding
style. The generator attempts to match the usual C coding style of
the Linux kernel's SunRPC consumers.
This approach is similar to the netlink code generator in
tools/net/ynl .
The maintainability benefits of machine-generated XDR code include:
- Stronger type checking
- Reduces the number of bugs introduced by human error
- Makes the XDR code easier to audit and analyze
- Enables rapid prototyping of new RPC-based protocols
- Hardens the layering between protocol logic and marshaling
- Makes it easier to add observability on demand
- Unit tests might be built for both the tool and (automatically)
for the generated code
In addition, converting the XDR layer to use memory-safe languages
such as Rust will be easier if much of the code can be converted
automatically.
Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds. A
new filehandle would be recorded in the "new" bit set. That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.
Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set. This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.
This patch updates bd->new before clearing the set with that index,
instead of afterwards.
Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Mon, 9 Sep 2024 14:40:53 +0000 (10:40 -0400)]
nfsd: fix initial getattr on write delegation
At this point in compound processing, currentfh refers to the parent of
the file, not the file itself. Get the correct dentry from the delegation
stateid instead.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Thu, 29 Aug 2024 13:26:40 +0000 (09:26 -0400)]
nfsd: untangle code in nfsd4_deleg_getattr_conflict()
The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.
With this patch we:
- properly handle non-nfsd leases. We must not assume flc_owner is a
delegation unless fl_lmops == &nfsd_lease_mng_ops
- move the main code out of the for loop
- have a single exit which calls nfs4_put_stid()
(and other exits which don't need to call that)
[ jlayton: refactored on top of Neil's other patch: nfsd: fix
nfsd4_deleg_getattr_conflict in presence of third party lease ]
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Scott Mayhew [Mon, 9 Sep 2024 20:28:54 +0000 (16:28 -0400)]
nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
This patch is intended to go on top of "nfsd: return -EINVAL when
namelen is 0" from Li Lingfeng. Li's patch checks for 0, but we should
be enforcing an upper bound as well.
Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its
database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the
downcall anyway.
Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Li Lingfeng [Tue, 3 Sep 2024 11:14:46 +0000 (19:14 +0800)]
nfsd: return -EINVAL when namelen is 0
When we have a corrupted main.sqlite in /var/lib/nfs/nfsdcld/, it may
result in namelen being 0, which will cause memdup_user() to return
ZERO_SIZE_PTR.
When we access the name.data that has been assigned the value of
ZERO_SIZE_PTR in nfs4_client_to_reclaim(), null pointer dereference is
triggered.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 74725959c33c ("nfsd: un-deprecate nfsdcld") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 28 Aug 2024 17:40:04 +0000 (13:40 -0400)]
NFSD: Limit the number of concurrent async COPY operations
Nothing appears to limit the number of concurrent async COPY
operations that clients can start. In addition, AFAICT each async
COPY can copy an unlimited number of 4MB chunks, so can run for a
long time. Thus IMO async COPY can become a DoS vector.
Add a restriction mechanism that bounds the number of concurrent
background COPY operations. Start simple and try to be fair -- this
patch implements a per-namespace limit.
An async COPY request that occurs while this limit is exceeded gets
NFS4ERR_DELAY. The requesting client can choose to send the request
again after a delay or fall back to a traditional read/write style
copy.
If there is need to make the mechanism more sophisticated, we can
visit that in future patches.
Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Wed, 28 Aug 2024 17:40:03 +0000 (13:40 -0400)]
NFSD: Async COPY result needs to return a write verifier
Currently, when NFSD handles an asynchronous COPY, it returns a
zero write verifier, relying on the subsequent CB_OFFLOAD callback
to pass the write verifier and a stable_how4 value to the client.
However, if the CB_OFFLOAD never arrives at the client (for example,
if a network partition occurs just as the server sends the
CB_OFFLOAD operation), the client will never receive this verifier.
Thus, if the client sends a follow-up COMMIT, there is no way for
the client to assess the COMMIT result.
The usual recovery for a missing CB_OFFLOAD is for the client to
send an OFFLOAD_STATUS operation, but that operation does not carry
a write verifier in its result. Neither does it carry a stable_how4
value, so the client /must/ send a COMMIT in this case -- which will
always fail because currently there's still no write verifier in the
COPY result.
Thus the server needs to return a normal write verifier in its COPY
result even if the COPY operation is to be performed asynchronously.
If the server recognizes the callback stateid in subsequent
OFFLOAD_STATUS operations, then obviously it has not restarted, and
the write verifier the client received in the COPY result is still
valid and can be used to assess a COMMIT of the copied data, if one
is needed.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Fri, 30 Aug 2024 07:03:17 +0000 (17:03 +1000)]
nfsd: avoid races with wake_up_var()
wake_up_var() needs a barrier after the important change is made in the
var and before wake_up_var() is called, else it is possible that a wake
up won't be sent when it should.
In each case here the var is changed in an "atomic" manner, so
smb_mb__after_atomic() is sufficient.
In one case the important change (removing the lease) is performed
*after* the wake_up, which is backwards. The code survives in part
because the wait_var_event is given a timeout.
This patch adds the required barriers and calls destroy_delegation()
*before* waking any threads waiting for the delegation to be destroyed.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Thorsten Blum [Wed, 28 Aug 2024 21:42:55 +0000 (23:42 +0200)]
NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Use struct_size() instead of manually calculating the number of bytes to
allocate for a pnfs_block_deviceaddr with a single volume.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Guoqing Jiang [Wed, 21 Aug 2024 14:03:18 +0000 (22:03 +0800)]
nfsd: call cache_put if xdr_reserve_space returns NULL
If not enough buffer space available, but idmap_lookup has triggered
lookup_fn which calls cache_get and returns successfully. Then we
missed to call cache_put here which pairs with cache_get.
Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviwed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Mon, 26 Aug 2024 12:50:12 +0000 (08:50 -0400)]
nfsd: track the main opcode for callbacks
Keep track of the "main" opcode for the callback, and display it in the
tracepoint. This makes it simpler to discern what's happening when there
is more than one callback in flight.
The one special case is the CB_NULL RPC. That's not a CB_COMPOUND
opcode, so designate the value 0 for that.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Li Lingfeng [Fri, 23 Aug 2024 07:00:49 +0000 (15:00 +0800)]
nfsd: remove unused parameter of nfsd_file_mark_find_or_create
Commit 427f5f83a319 ("NFSD: Ensure nf_inode is never dereferenced") passes
inode directly to nfsd_file_mark_find_or_create instead of getting it from
nf, so there is no need to pass nf.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Li Lingfeng [Wed, 14 Aug 2024 11:29:07 +0000 (19:29 +0800)]
NFSD: remove redundant assignment operation
Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and
write delegations") added a new assignment statement to add
RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the
old one should be removed.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Sun, 11 Aug 2024 17:11:07 +0000 (13:11 -0400)]
NFSD: Fix NFSv4's PUTPUBFH operation
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.
Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).
PUTPUBFH should now behave just like PUTROOTFH.
Reported-by: Cedric Blancher <cedric.blancher@gmail.com> Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton@gmail.com> Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Mark Grimes [Wed, 7 Aug 2024 01:58:34 +0000 (18:58 -0700)]
nfsd: Add quotes to client info 'callback address'
The 'callback address' in client_info_show is output without quotes
causing yaml parsers to fail on processing IPv6 addresses.
Adding quotes to 'callback address' also matches that used by
the 'address' field.
Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Mon, 29 Jul 2024 20:52:32 +0000 (16:52 -0400)]
svcrdma: Handle device removal outside of the CM event handler
Synchronously wait for all disconnects to complete to ensure the
transports have divested all hardware resources before the
underlying RDMA device can safely be removed.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NeilBrown [Wed, 14 Aug 2024 13:21:01 +0000 (09:21 -0400)]
nfsd: move error choice for incorrect object types to version-specific code.
If an NFS operation expects a particular sort of object (file, dir, link,
etc) but gets a file handle for a different sort of object, it must
return an error. The actual error varies among NFS versions in non-trivial
ways.
For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
INVAL is suitable.
For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
was found when not expected. This take precedence over NOTDIR.
For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
preference to EINVAL when none of the specific error codes apply.
When nfsd_mode_check() finds a symlink where it expected a directory it
needs to return an error code that can be converted to NOTDIR for v2 or
v3 but will be SYMLINK for v4. It must be different from the error
code returns when it finds a symlink but expects a regular file - that
must be converted to EINVAL or SYMLINK.
So we introduce an internal error code nfserr_symlink_not_dir which each
version converts as appropriate.
nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable
error if the object is the wrong time. For v4.0 we use nfserr_symlink
for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type.
We handle this difference in-place in nfsd_check_obj_isreg() as there is
nothing to be gained by delaying the choice to nfsd4_map_status().
As a result of these changes, nfsd_mode_check() doesn't need an rqstp
arg any more.
Note that NFSv4 operations are actually performed in the xdr code(!!!)
so to the only place that we can map the status code successfully is in
nfsd4_encode_operation().
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: be more systematic about selecting error codes for internal use.
Rather than using ad hoc values for internal errors (30000, 11000, ...)
use 'enum' to sequentially allocate numbers starting from the first
known available number - now visible as NFS4ERR_FIRST_FREE.
The goal is values that are distinct from all be32 error codes. To get
those we must first select integers that are not already used, then
convert them with cpu_to_be32().
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: Move error code mapping to per-version proc code.
There is code scattered around nfsd which chooses an error status based
on the particular version of nfs being used. It is cleaner to have the
version specific choices in version specific code.
With this patch common code returns the most specific error code
possible and the version specific code maps that if necessary.
Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
function which is called to map the resp->status before each non-trivial
nfsd_proc_* or nfsd3_proc_* function returns.
NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
are left for a later patch.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
With this patch the only places that test ->rq_vers against a specific
version are nfsd_v4client() and nfsd_set_fh_dentry().
The latter sets some flags in the svc_fh, which now includes:
fh_64bit_cookies
fh_use_wgather
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()
nfsd_breaker_owns_lease() currently open-codes the same test that
nfsd_v4client() performs.
With this patch we use nfsd_v4client() instead.
Also as i_am_nfsd() is only used in combination with kthread_data(),
replace it with nfsd_current_rqst() which combines the two and returns a
valid svc_rqst, or NULL.
The test for NULL is moved into nfsd_v4client() for code clarity.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nfsd: Pass 'cred' instead of 'rqstp' to some functions.
nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags()
only ever need the cred out of rqstp, so pass it explicitly instead of
the whole rqstp.
This makes the interfaces cleaner.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
sunrpc: allow svc threads to fail initialisation cleanly
If an svc thread needs to perform some initialisation that might fail,
it has no good way to handle the failure.
Before the thread can exit it must call svc_exit_thread(), but that
requires the service mutex to be held. The thread cannot simply take
the mutex as that could deadlock if there is a concurrent attempt to
shut down all threads (which is unlikely, but not impossible).
nfsd currently call svc_exit_thread() unprotected in the unlikely event
that unshare_fs_struct() fails.
We can clean this up by introducing svc_thread_init_status() by which an
svc thread can report whether initialisation has succeeded. If it has,
it continues normally into the action loop. If it has not,
svc_thread_init_status() immediately aborts the thread.
svc_start_kthread() waits for either of these to happen, and calls
svc_exit_thread() (under the mutex) if the thread aborted.
Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.
As documented in svc_xprt.c, sv_nrthreads is protected by the service
mutex, and it does not need ->sv_lock.
(->sv_lock is needed only for sv_permsocks, sv_tempsocks, and
sv_tmpcnt).
So remove the unnecessary locking.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Merge tag 'timers_urgent_for_v6.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
- Remove percpu irq related code in the timer-of initialization routine
as it is broken but also unused (Daniel Lezcano)
- Fix return -ETIME when delta exceeds INT_MAX and the next event not
taking effect sometimes (Jacky Bai)
* tag 'timers_urgent_for_v6.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/imx-tpm: Fix next event not taking effect sometime
clocksource/drivers/imx-tpm: Fix return -ETIME when delta exceeds INT_MAX
clocksource/drivers/timer-of: Remove percpu irq related code
Merge tag 'perf_urgent_for_v6.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Fix perf's AUX buffer serialization
- Prevent uninitialized struct members in perf's uprobes handling
* tag 'perf_urgent_for_v6.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Fix AUX buffer serialization
uprobes: Use kzalloc to allocate xol area
Merge tag 'char-misc-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/other driver fixes for 6.11-rc7. It's
nothing huge, just a bunch of small fixes of reported problems,
including:
- lots of tiny iio driver fixes
- nvmem driver fixex
- binder UAF bugfix
- uio driver crash fix
- other small fixes
All of these have been in linux-next this week with no reported
problems"
* tag 'char-misc-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
VMCI: Fix use-after-free when removing resource in vmci_resource_remove()
Drivers: hv: vmbus: Fix rescind handling in uio_hv_generic
uio_hv_generic: Fix kernel NULL pointer dereference in hv_uio_rescind
misc: keba: Fix sysfs group creation
dt-bindings: nvmem: Use soc-nvmem node name instead of nvmem
nvmem: Fix return type of devm_nvmem_device_get() in kerneldoc
nvmem: u-boot-env: error if NVMEM device is too small
misc: fastrpc: Fix double free of 'buf' in error path
binder: fix UAF caused by offsets overwrite
iio: imu: inv_mpu6050: fix interrupt status read for old buggy chips
iio: adc: ad7173: fix GPIO device info
iio: adc: ad7124: fix DT configuration parsing
iio: adc: ad_sigma_delta: fix irq_flags on irq request
iio: adc: ads1119: Fix IRQ flags
iio: fix scale application in iio_convert_raw_to_processed_unlocked
iio: adc: ad7124: fix config comparison
iio: adc: ad7124: fix chip ID mismatch
iio: adc: ad7173: Fix incorrect compatible string
iio: buffer-dmaengine: fix releasing dma channel on error
iio: adc: ad7606: remove frstdata check for serial mode
...
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A pile of Qualcomm clk driver fixes with two main themes: the alpha
PLL driver and shared RCGs, and one fix for the Starfive JH7110 SoC.
- The Alpha PLL clk_ops had multiple problems around setting rates.
There are a handful of patches here that fix masks and skip
enabling the clk from set_rate() when the PLL is disabled. The PLLs
are crucial to operation of the system as almost all frequencies in
the system are derived from them.
- Parking shared RCGs at a slow always on clk at registration time
breaks stuff.
USB host mode can't handle such a slow frequency and the serial
console gets all garbled when the UART clk is handed over to the
kernel. There's a few patches that don't use the shared clk_ops for
the UART clks and another one to skip parking the USB clk at
registration time.
- The Starfive PLL driver used for the CPU was busted causing cpufreq
to fail because the clk didn't change to a safe parent during
set_rate().
The fix is to register a notifier and switch to a safe parent so
the PLL can change rate in a glitch free manner"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc-sc8280xp: don't use parking clk_ops for QUPs
clk: starfive: jh7110-sys: Add notifier for PLL0 clock
clk: qcom: gcc-sm8650: Don't use shared clk_ops for QUPs
clk: qcom: gcc-sm8550: Don't park the USB RCG at registration time
clk: qcom: gcc-sm8550: Don't use parking clk_ops for QUPs
clk: qcom: gcc-x1e80100: Don't use parking clk_ops for QUPs
clk: qcom: ipq9574: Update the alpha PLL type for GPLLs
clk: qcom: gcc-x1e80100: Fix USB 0 and 1 PHY GDSC pwrsts flags
clk: qcom: clk-alpha-pll: Update set_rate for Zonda PLL
clk: qcom: clk-alpha-pll: Fix zonda set_rate failure when PLL is disabled
clk: qcom: clk-alpha-pll: Fix the trion pll postdiv set rate API
clk: qcom: clk-alpha-pll: Fix the pll post div mask
Merge tag 'pinctrl-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"A single fix for Qualcomm laptops that are affected by
missing wakeup IRQs"
* tag 'pinctrl-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: x1e80100: Bypass PDC wakeup parent for now
Merge tag 'linux_kselftest-kunit-fixes-6.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
PullKUnit fix from Shuah Khan:
"Fix to a missing function parameter warning found during documentation
build in linux-next"
* tag 'linux_kselftest-kunit-fixes-6.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Fix missing kerneldoc comment
Merge tag 'pci-v6.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Unregister platform devices for child nodes when stopping a PCI
device, even if the PCI core has already cleared the OF_POPULATED bit
and of_platform_depopulate() doesn't do anything (Bartosz
Golaszewski)
- Rescan the bus from a separate thread so we don't deadlock when
triggering rescan from sysfs (Bartosz Golaszewski)
* tag 'pci-v6.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/pwrctl: Rescan bus on a separate thread
PCI: Don't rely on of_platform_depopulate() for reused OF-nodes
Merge tag 'v6.11-rc6-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix potential mount hang
- fix retry problem in two types of compound operations
- important netfs integration fix in SMB1 read paths
- fix potential uninitialized zero point of inode
- minor patch to improve debugging for potential crediting problems
* tag 'v6.11-rc6-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
netfs, cifs: Improve some debugging bits
cifs: Fix SMB1 readv/writev callback in the same way as SMB2/3
cifs: Fix zero_point init on inode initialisation
smb: client: fix double put of @cfile in smb2_set_path_size()
smb: client: fix double put of @cfile in smb2_rename_path()
smb: client: fix hang in wait_for_response() for negproto
KVM: x86: don't fall through case statements without annotations
clang warns on this because it has an unannotated fall-through between
cases:
arch/x86/kvm/x86.c:4819:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
and while we could annotate it as a fallthrough, the proper fix is to
just add the break for this case, instead of falling through to the
default case and the break there.
gcc also has that warning, but it looks like gcc only warns for the
cases where they fall through to "real code", rather than to just a
break. Odd.
Fixes: d30d9ee94cc0 ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix the arm64 usage of ftrace_graph_ret_addr() to pass the
&state->graph_idx pointer instead of NULL, otherwise this function
just returns early"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: stacktrace: fix the usage of ftrace_graph_ret_addr()
Merge tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A revert for the mmap() change that ties the allocation range to the
hint adress, as what we tried to do ended up regressing on other
userspace workloads.
- A fix to avoid a kernel memory leak when emulating misaligned
accesses from userspace.
- A Kconfig fix for toolchain vector detection, which now correctly
detects vector support on toolchains where the V extension depends on
the M extension.
- A fix to avoid failing the linear mapping bootmem bounds check on
NOMMU systems.
- A fix for early alternatives on relocatable kernels.
* tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix RISCV_ALTERNATIVE_EARLY
riscv: Do not restrict memory size because of linear mapping on nommu
riscv: Fix toolchain vector detection
riscv: misaligned: Restrict user access to kernel memory
riscv: mm: Do not restrict mmap address based on hint
riscv: selftests: Remove mmap hint address checks
Revert "RISC-V: mm: Document mmap changes"
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm fixes from Paolo Bonzini:
"Many small fixes that accumulated while I was on vacation...
- Fixup missed comments from the REMOVED_SPTE => FROZEN_SPTE rename
- Ensure a root is successfully loaded when pre-faulting SPTEs
- Grab kvm->srcu when handling KVM_SET_VCPU_EVENTS to guard against
accessing memslots if toggling SMM happens to force a VM-Exit
- Emulate MSR_{FS,GS}_BASE on SVM even though interception is always
disabled, so that KVM does the right thing if KVM's emulator
encounters {RD,WR}MSR
- Explicitly clear BUS_LOCK_DETECT from KVM's caps on AMD, as KVM
doesn't yet virtualize BUS_LOCK_DETECT on AMD
- Cleanup the help message for CONFIG_KVM_AMD_SEV, and call out that
KVM now supports SEV-SNP too
- Specialize return value of
KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM), based on VM type
- Remove unnecessary dependency on CONFIG_HIGH_RES_TIMERS
- Note an RCU quiescent state on guest exit. This avoids a call to
rcu_core() if there was a grace period request while guest was
running"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Remove HIGH_RES_TIMERS dependency
kvm: Note an RCU quiescent state on guest exit
KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM
KVM: SEV: Update KVM_AMD_SEV Kconfig entry and mention SEV-SNP
KVM: SVM: Don't advertise Bus Lock Detect to guest if SVM support is missing
KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE
KVM: x86: Acquire kvm->srcu when handling KVM_SET_VCPU_EVENTS
KVM: x86/mmu: Check that root is valid/loaded when pre-faulting SPTEs
KVM: x86/mmu: Fixup comments missed by the REMOVED_SPTE=>FROZEN_SPTE rename
Merge tag 'pm-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix an incorrect warning emitted by the amd-pstate driver on
processors that don't support X86_FEATURE_CPPC (Gautham Shenoy)"
* tag 'pm-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Remove warning for X86_FEATURE_CPPC on certain Zen models
Merge tag 'block-6.11-20240906' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Mostly just some fixlets for NVMe, but also a bug fix for the ublk
driver and an integrity fix"
* tag 'block-6.11-20240906' of git://git.kernel.dk/linux:
bio-integrity: don't restrict the size of integrity metadata
ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery()
nvmet: Identify-Active Namespace ID List command should reject invalid nsid
nvme: set BLK_FEAT_ZONED for ZNS multipath disks
nvme-pci: Add sleep quirk for Samsung 990 Evo
nvme-pci: allocate tagset on reset if necessary
nvmet-tcp: fix kernel crash if commands allocation fails
nvme: use better description for async reset reason
nvmet: Make nvmet_debugfs static
Merge tag 'sound-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 6.11, at least for this level of amount.
In addition to the usual HD-audio quirks, there are more changes in
ASoC, but all look small and device-specific fixes, and nothing stands
out. The only slightly big change is sunxi I2S fix, which looks quite
safe to apply, too"
* tag 'sound-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek - Fix inactive headset mic jack for ASUS Vivobook 15 X1504VAP
ALSA: hda/realtek: Support mute LED on HP Laptop 14-dq2xxx
ALSA: hda/realtek: Enable Mute Led for HP Victus 15-fb1xxx
ALSA: hda/realtek: extend quirks for Clevo V5[46]0
ASoC: codecs: lpass-va-macro: set the default codec version for sm8250
ALSA: hda: add HDMI codec ID for Intel PTL
ALSA: hda/realtek: add patch for internal mic in Lenovo V145
ASoC: sunxi: sun4i-i2s: fix LRCLK polarity in i2s mode
ASoC: amd: yc: Add a quirk for MSI Bravo 17 (D7VEK)
ASoC: mediatek: mt8188-mt6359: Modify key
ASoc: SOF: topology: Clear SOF link platform name upon unload
ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices
ASoC: SOF: ipc: replace "enum sof_comp_type" field with "uint32_t"
ASoC: fix module autoloading
ASoC: tda7419: fix module autoloading
ASoC: google: fix module autoloading
ASoC: intel: fix module autoloading
ASoC: tegra: Fix CBB error during probe()
ASoC: dapm: Fix UAF for snd_soc_pcm_runtime object
ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict
...
Merge tag 'mmc-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Apply SD quirks earlier during probe so they become relevant
MMC host:
- cqhci: Fix checking of CQHCI_HALT state
- dw_mmc: Fix IDMAC operation with pages bigger than 4K
- sdhci-of-aspeed: Fix module autoloading"
* tag 'mmc-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: cqhci: Fix checking of CQHCI_HALT state
mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K
mmc: sdhci-of-aspeed: fix module autoloading
mmc: core: apply SD quirks earlier during probe
Merge tag 'gpio-fixes-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an OF node reference leak in gpio-rockchip
- add the missing module device table to gpio-modepin
* tag 'gpio-fixes-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: modepin: Enable module autoloading
gpio: rockchip: fix OF node leak in probe()
Merge tag 'pmdomain-v6.11-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fix from Ulf Hansson:
- Fix support for required OPPs for multiple PM domains
* tag 'pmdomain-v6.11-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
OPP: Fix support for required OPPs for multiple PM domains
Merge tag 'pwm/for-6.11-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"Fix an off-by-one in the stm32 driver.
Hardware engineers tend to start counting at 1 while the software guys
usually start with 0. This isn't so nice because that results in
drivers where pwm device #2 needs to use the hardware registers with
index 3.
This was noticed by Fabrice Gasnier.
A small patch fixing that mismatch is the only change included here"
* tag 'pwm/for-6.11-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: stm32: Use the right CCxNP bit in stm32_pwm_enable()
Merge tag 'drm-fixes-2024-09-06' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"This has a fair few patches in it, but I reviewed them all and they
seem like real things, amdgpu, i915 and xe each have a bunch of fixes
for various things, then there is a some bridge suspend/resume
ordering fixes for a recent rework, and then some single driver
changes in a few others.
Nothing looks too serious, hopefully next week is quiet.
amdgpu:
- IPS workaround
- Fix compatibility with older MES firmware
- Fix CPU spikes when clearing VRAM
- Backlight fix
- PMO fix
- Revert SWSMU change to fix regression
i915:
- Do not attempt to load the GSC multiple times
- Fix readout degamma_lut mismatch on ilk/snb
- Mark debug_fence_init_onstack() with __maybe_unused
- fence: Mark debug_fence_free() with __maybe_unused
- display: Add mechanism to use sink model when applying quirk
- display: Increase Fast Wake Sync length as a quirk
komeda:
- zpos normalization fix
nouveau:
- incorrect register fix
imagination:
- memory leak fix
bridge:
- hdmi/bridge rework fixes
panthor:
- cache coherency fix
- hi priority access fix
panel:
- change of compatible string
fbdev:
- deferred-io init with no struct page fix"
* tag 'drm-fixes-2024-09-06' of https://gitlab.freedesktop.org/drm/kernel: (29 commits)
Revert "drm/amdgpu: align pp_power_profile_mode with kernel docs"
drm/fbdev-dma: Only install deferred I/O if necessary
drm/panthor: flush FW AS caches in slow reset path
drm: panel: nv3052c: Correct WL-355608-A8 panel compatible
dt-bindings: display: panel: Rename WL-355608-A8 panel to rg35xx-*-panel
drm/panthor: Restrict high priorities on group_create
drm/xe/display: Avoid encoder_suspend at runtime suspend
drm/xe: Suspend/resume user access only during system s/r
drm/xe/display: Match i915 driver suspend/resume sequences better
drm/xe: Add missing runtime reference to wedged upon gt_reset
drm/xe/pcode: Treat pcode as per-tile rather than per-GT
drm/xe/gsc: Do not attempt to load the GSC multiple times
drm/bridge-connector: reset the HDMI connector state
drm/bridge-connector: move to DRM_DISPLAY_HELPER module
drm/display: stop depending on DRM_DISPLAY_HELPER
drm/i915/display: Increase Fast Wake Sync length as a quirk
drm/i915/display: Add mechanism to use sink model when applying quirk
drm/amd/display: Block timing sync for different signals in PMO
drm/amd/display: Lock DC and exit IPS when changing backlight
drm/amdgpu: always allocate cleared VRAM for GEM allocations
...
get_stashed_dentry() tries to optimistically retrieve a stashed dentry
from a provided location. It needs to ensure to hold rcu lock before it
dereference the stashed location to prevent UAF issues. Use
rcu_dereference() instead of READ_ONCE() it's effectively equivalent
with some lockdep bells and whistles and it communicates clearly that
this expects rcu protection.
Merge tag 'asoc-fix-v6.11-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.11
A larger set of fixes than I'd like at this point, but mainly due to
people working on fixing module autoloading by adding missing exports of
ID tables rather than anything particularly concerning. There are some
other runtime fixes and quirks, and a tweak to the ABI definition for
SOF which ensures that a struct layout doesn't vary depending on the
architecture of the host.