fkie_cve-2025-40007
Vulnerability from fkie_nvd
Published
2025-10-20 16:15
Modified
2025-10-21 19:31
Severity
Summary
In the Linux kernel, the following vulnerability has been resolved:
netfs: fix reference leak
Commit 20d72b00ca81 ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1. The rationale was that the
request's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()). That works most of the
time if all goes well.
However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.
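
The leak described above can be reproduced in a small user-space model. This is only a sketch: `model_request`, `model_alloc_request()`, `model_put_request()` and the `simulate_*` helpers are hypothetical names invented for illustration, not the kernel API, and a plain `int` stands in for the kernel's reference counter.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space model only; names are hypothetical, not the kernel API. */
struct model_request {
    int ref;
};

static int live_requests; /* allocated but not yet freed requests */

static struct model_request *model_alloc_request(void)
{
    struct model_request *rreq = malloc(sizeof(*rreq));

    if (!rreq)
        abort();
    rreq->ref = 2; /* one for the caller, one for the work item */
    live_requests++;
    return rreq;
}

/* Drop one reference and free the request on the last one. */
static void model_put_request(struct model_request *rreq)
{
    if (--rreq->ref > 0)
        return;
    free(rreq);
    live_requests--;
}

/* Success path: I/O was submitted, so the collection worker runs. */
int simulate_success(void)
{
    struct model_request *rreq = model_alloc_request();

    model_put_request(rreq); /* worker drops the work item's reference */
    model_put_request(rreq); /* caller drops its own reference */
    return live_requests;
}

/* Early failure: nothing was submitted, so the worker never runs. */
int simulate_early_failure(void)
{
    struct model_request *rreq = model_alloc_request();
    int leaked;

    model_put_request(rreq); /* the error path's only put: ref 2 -> 1 */
    leaked = live_requests;  /* the request is still alive at this point */
    free(rreq);              /* tidy up so the model itself does not leak */
    live_requests--;
    return leaked;
}
```

In this model `simulate_success()` leaves no live request behind, while `simulate_early_failure()` reports one request that no remaining code path would ever release.
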
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.
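
Why a single leaked request can block waiters indefinitely can be sketched with a toy counter. The model below assumes that the counter is raised when a request is created and lowered only when the request is torn down; that assumption, and all of the `model_*` names, are illustrative and not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode's outstanding-request counter. */
struct model_inode {
    int io_count;
};

/* Assumed behaviour: creating a request raises the counter... */
static void model_request_created(struct model_inode *ino)
{
    ino->io_count++;
}

/* ...and only tearing the request down lowers it again. */
static void model_request_destroyed(struct model_inode *ino)
{
    ino->io_count--;
}

/* A waiter may proceed only once no requests are outstanding. */
static bool model_wait_would_block(const struct model_inode *ino)
{
    return ino->io_count > 0;
}

/* Reports whether a waiter is still blocked after one request's lifetime. */
bool model_waiter_blocked(bool request_leaked)
{
    struct model_inode ino = { .io_count = 0 };

    model_request_created(&ino);
    if (!request_leaked)
        model_request_destroyed(&ino); /* never happens for a leaked request */
    return model_wait_would_block(&ino);
}
```

With `request_leaked` set to `true` the counter never returns to zero, so the waiter in this model would block forever.
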
All of this is super-fragile code. It is hard to see which code paths
will lead to an eventual completion and which will not:
- Some functions like netfs_create_write_req() allocate a request, but
will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
and then netfs_put_request(); however, netfs_unbuffered_read() can
also fail early before submitting the I/O request, therefore another
netfs_put_request() call must be added there.
A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.
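
The caller/callee situation from the second bullet can be sketched as follows. This is a simplified user-space model under stated assumptions: the `model_*` names are hypothetical, the worker's put is collapsed into the callee's success branch, and the `with_fix` flag exists only to contrast the behaviour before and after the extra put is added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a bare counter instead of a real request object. */
struct model_request {
    int ref;
    bool freed;
};

static void model_put(struct model_request *rreq)
{
    if (--rreq->ref == 0)
        rreq->freed = true; /* stands in for the real teardown */
}

/* Callee: may fail before any I/O has been submitted. */
static int model_unbuffered_read(struct model_request *rreq,
                                 bool fail_early, bool with_fix)
{
    if (fail_early) {
        /* No completion will ever run, so the extra put must happen here. */
        if (with_fix)
            model_put(rreq);
        return -1;
    }
    /* I/O submitted: this put stands in for the completion worker's. */
    model_put(rreq);
    return 0;
}

/* Caller: calls the callee, then always drops its own reference. */
bool model_read_iter(bool fail_early, bool with_fix)
{
    struct model_request rreq = { .ref = 2, .freed = false };

    model_unbuffered_read(&rreq, fail_early, with_fix);
    model_put(&rreq); /* the caller's unconditional put */
    return rreq.freed;
}
```

Here `model_read_iter(true, false)` leaves the request unfreed, whereas `model_read_iter(true, true)` frees it, which mirrors the reason given above for adding a put inside the callee.
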
For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure. But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.
I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further. Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE(). It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).
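
The idea behind such a helper can be illustrated with a user-space sketch. It is not the kernel implementation: all `model_*` names are hypothetical, a one-shot `fprintf()` stands in for `WARN_ON_ONCE()`, and plain `free()` replaces the RCU-protected free mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space model only; every name here is hypothetical. */
struct model_request {
    int ref;
    char *buffer; /* stands in for resources owned by the request */
};

static int warnings_emitted;

static struct model_request *model_alloc_request(void)
{
    struct model_request *rreq = malloc(sizeof(*rreq));

    if (!rreq)
        abort();
    rreq->ref = 2;             /* caller + work item */
    rreq->buffer = malloc(16); /* may be NULL; free(NULL) is harmless */
    return rreq;
}

/* Free a request that failed before any I/O was submitted. */
bool model_put_failed_request(struct model_request *rreq)
{
    bool untouched = (rreq->ref == 2);

    /* Analogue of a one-shot warning: report the first violation only. */
    if (!untouched && warnings_emitted++ == 0)
        fprintf(stderr, "model: unexpected refcount %d\n", rreq->ref);

    free(rreq->buffer); /* deinitialize directly, no deferred cleanup work */
    free(rreq);         /* the model uses plain free() instead of RCU */
    return untouched;
}

/* Allocate a request and pretend setup failed before submission. */
bool model_fail_early(void)
{
    struct model_request *rreq = model_alloc_request();

    return model_put_failed_request(rreq);
}

int model_warning_count(void)
{
    return warnings_emitted;
}
```

Because the helper releases everything in one call, an early-failure path needs a single call instead of two puts, which is the duplication the text above set out to avoid.
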
All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
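References
- https://git.kernel.org/stable/c/4d428dca252c858bfac691c31fa95d26cd008706
- https://git.kernel.org/stable/c/8df142e93098b4531fadb5dfcf93087649f570b3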
Impacted products
| Vendor | Product | Version |
|---|---|---|
{
"cveTags": [],
"descriptions": [
{
"lang": "en",
"value": "In the Linux kernel, the following vulnerability has been resolved:\n\nnetfs: fix reference leak\n\nCommit 20d72b00ca81 (\"netfs: Fix the request\u0027s work item to not\nrequire a ref\") modified netfs_alloc_request() to initialize the\nreference counter to 2 instead of 1. The rationale was that the\nrequest\u0027s \"work\" would release the second reference after completion\n(via netfs_{read,write}_collection_worker()). That works most of the\ntime if all goes well.\n\nHowever, it leaks this additional reference if the request is released\nbefore the I/O operation has been submitted: the error code path only\ndecrements the reference counter once and the work item will never be\nqueued because there will never be a completion.\n\nThis has caused outages of our whole server cluster today because\ntasks were blocked in netfs_wait_for_outstanding_io(), leading to\ndeadlocks in Ceph (another bug that I will address soon in another\npatch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call\nwhich failed in fscache_begin_write_operation(). The leaked\nnetfs_io_request was never completed, leaving `netfs_inode.io_count`\nwith a positive value forever.\n\nAll of this is super-fragile code. Finding out which code paths will\nlead to an eventual completion and which do not is hard to see:\n\n- Some functions like netfs_create_write_req() allocate a request, but\n will never submit any I/O.\n\n- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()\n and then netfs_put_request(); however, netfs_unbuffered_read() can\n also fail early before submitting the I/O request, therefore another\n netfs_put_request() call must be added there.\n\nA rule of thumb is that functions that return a `netfs_io_request` do\nnot submit I/O, and all of their callers must be checked.\n\nFor my taste, the whole netfs code needs an overhaul to make reference\ncounting easier to understand and less fragile \u0026 obscure. But to fix\nthis bug here and now and produce a patch that is adequate for a\nstable backport, I tried a minimal approach that quickly frees the\nrequest object upon early failure.\n\nI decided against adding a second netfs_put_request() each time\nbecause that would cause code duplication which obscures the code\nfurther. Instead, I added the function netfs_put_failed_request()\nwhich frees such a failed request synchronously under the assumption\nthat the reference count is exactly 2 (as initially set by\nnetfs_alloc_request() and never touched), verified by a\nWARN_ON_ONCE(). It then deinitializes the request object (without\ngoing through the \"cleanup_work\" indirection) and frees the allocation\n(with RCU protection to protect against concurrent access by\nnetfs_requests_seq_start()).\n\nAll code paths that fail early have been changed to call\nnetfs_put_failed_request() instead of netfs_put_request().\nAdditionally, I have added a netfs_put_request() call to\nnetfs_unbuffered_read() as explained above because the\nnetfs_put_failed_request() approach does not work there."
}
],
"id": "CVE-2025-40007",
"lastModified": "2025-10-21T19:31:25.450",
"metrics": {},
"published": "2025-10-20T16:15:37.357",
"references": [
{
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"url": "https://git.kernel.org/stable/c/4d428dca252c858bfac691c31fa95d26cd008706"
},
{
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"url": "https://git.kernel.org/stable/c/8df142e93098b4531fadb5dfcf93087649f570b3"
}
],
"sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"vulnStatus": "Awaiting Analysis"
}