fkie_cve-2025-40007
Vulnerability from fkie_nvd
Published
2025-10-20 16:15
Modified
2025-10-21 19:31
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved: netfs: fix reference leak Commit 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") modified netfs_alloc_request() to initialize the reference counter to 2 instead of 1. The rationale was that the requet's "work" would release the second reference after completion (via netfs_{read,write}_collection_worker()). That works most of the time if all goes well. However, it leaks this additional reference if the request is released before the I/O operation has been submitted: the error code path only decrements the reference counter once and the work item will never be queued because there will never be a completion. This has caused outages of our whole server cluster today because tasks were blocked in netfs_wait_for_outstanding_io(), leading to deadlocks in Ceph (another bug that I will address soon in another patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call which failed in fscache_begin_write_operation(). The leaked netfs_io_request was never completed, leaving `netfs_inode.io_count` with a positive value forever. All of this is super-fragile code. Finding out which code paths will lead to an eventual completion and which do not is hard to see: - Some functions like netfs_create_write_req() allocate a request, but will never submit any I/O. - netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read() and then netfs_put_request(); however, netfs_unbuffered_read() can also fail early before submitting the I/O request, therefore another netfs_put_request() call must be added there. A rule of thumb is that functions that return a `netfs_io_request` do not submit I/O, and all of their callers must be checked. For my taste, the whole netfs code needs an overhaul to make reference counting easier to understand and less fragile & obscure. But to fix this bug here and now and produce a patch that is adequate for a stable backport, I tried a minimal approach that quickly frees the request object upon early failure. I decided against adding a second netfs_put_request() each time because that would cause code duplication which obscures the code further. Instead, I added the function netfs_put_failed_request() which frees such a failed request synchronously under the assumption that the reference count is exactly 2 (as initially set by netfs_alloc_request() and never touched), verified by a WARN_ON_ONCE(). It then deinitializes the request object (without going through the "cleanup_work" indirection) and frees the allocation (with RCU protection to protect against concurrent access by netfs_requests_seq_start()). All code paths that fail early have been changed to call netfs_put_failed_request() instead of netfs_put_request(). Additionally, I have added a netfs_put_request() call to netfs_unbuffered_read() as explained above because the netfs_put_failed_request() approach does not work there.
Impacted products
Vendor Product Version



{
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nnetfs: fix reference leak\n\nCommit 20d72b00ca81 (\"netfs: Fix the request\u0027s work item to not\nrequire a ref\") modified netfs_alloc_request() to initialize the\nreference counter to 2 instead of 1.  The rationale was that the\nrequet\u0027s \"work\" would release the second reference after completion\n(via netfs_{read,write}_collection_worker()).  That works most of the\ntime if all goes well.\n\nHowever, it leaks this additional reference if the request is released\nbefore the I/O operation has been submitted: the error code path only\ndecrements the reference counter once and the work item will never be\nqueued because there will never be a completion.\n\nThis has caused outages of our whole server cluster today because\ntasks were blocked in netfs_wait_for_outstanding_io(), leading to\ndeadlocks in Ceph (another bug that I will address soon in another\npatch).  This was caused by a netfs_pgpriv2_begin_copy_to_cache() call\nwhich failed in fscache_begin_write_operation().  The leaked\nnetfs_io_request was never completed, leaving `netfs_inode.io_count`\nwith a positive value forever.\n\nAll of this is super-fragile code.  Finding out which code paths will\nlead to an eventual completion and which do not is hard to see:\n\n- Some functions like netfs_create_write_req() allocate a request, but\n  will never submit any I/O.\n\n- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()\n  and then netfs_put_request(); however, netfs_unbuffered_read() can\n  also fail early before submitting the I/O request, therefore another\n  netfs_put_request() call must be added there.\n\nA rule of thumb is that functions that return a `netfs_io_request` do\nnot submit I/O, and all of their callers must be checked.\n\nFor my taste, the whole netfs code needs an overhaul to make reference\ncounting easier to understand and less fragile \u0026 obscure.  But to fix\nthis bug here and now and produce a patch that is adequate for a\nstable backport, I tried a minimal approach that quickly frees the\nrequest object upon early failure.\n\nI decided against adding a second netfs_put_request() each time\nbecause that would cause code duplication which obscures the code\nfurther.  Instead, I added the function netfs_put_failed_request()\nwhich frees such a failed request synchronously under the assumption\nthat the reference count is exactly 2 (as initially set by\nnetfs_alloc_request() and never touched), verified by a\nWARN_ON_ONCE().  It then deinitializes the request object (without\ngoing through the \"cleanup_work\" indirection) and frees the allocation\n(with RCU protection to protect against concurrent access by\nnetfs_requests_seq_start()).\n\nAll code paths that fail early have been changed to call\nnetfs_put_failed_request() instead of netfs_put_request().\nAdditionally, I have added a netfs_put_request() call to\nnetfs_unbuffered_read() as explained above because the\nnetfs_put_failed_request() approach does not work there."
    }
  ],
  "id": "CVE-2025-40007",
  "lastModified": "2025-10-21T19:31:25.450",
  "metrics": {},
  "published": "2025-10-20T16:15:37.357",
  "references": [
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/4d428dca252c858bfac691c31fa95d26cd008706"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/8df142e93098b4531fadb5dfcf93087649f570b3"
    }
  ],
  "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
  "vulnStatus": "Awaiting Analysis"
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Published Proof of Concept: A public proof of concept is available for this vulnerability.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.


Loading…

Loading…