Vulnerability-Lookup

GHSA-59XX-49PJ-8GX8

Vulnerability from github – Published: 2024-11-26 00:33 – Updated: 2024-11-26 00:33

Details

In the Linux kernel, the following vulnerability has been resolved:

nvme: make keep-alive synchronous operation

The nvme keep-alive operation, which executes at a periodic interval, can sneak in while a fabric controller is shutting down. This may lead to a race between the fabric controller admin queue destroy code path (invoked while shutting down the controller) and the hw/hctx queue dispatcher called from the nvme keep-alive async request queuing operation. This race can lead to the kernel crash shown below:

Call Trace:
    autoremove_wake_function+0x0/0xbc (unreliable)
    __blk_mq_sched_dispatch_requests+0x114/0x24c
    blk_mq_sched_dispatch_requests+0x44/0x84
    blk_mq_run_hw_queue+0x140/0x220
    nvme_keep_alive_work+0xc8/0x19c [nvme_core]
    process_one_work+0x200/0x4e0
    worker_thread+0x340/0x504
    kthread+0x138/0x140
    start_kernel_thread+0x14/0x18

If an nvme keep-alive request sneaks in while a fabric controller is shutting down, the request is flushed. The nvme_keep_alive_end_io function is then invoked to handle the end of the keep-alive operation, and it decrements admin->q_usage_counter. If this is the last (or only) request in the admin queue, admin->q_usage_counter becomes zero. When that happens, the blk-mq destroy queue operation (blk_mq_destroy_queue()), which may be running simultaneously on another CPU because this is the controller shutdown code path, makes forward progress and deletes the admin queue. From this point onward the admin queue resources must not be accessed. However, the nvme keep-alive thread running the hw/hctx queue dispatch operation has not yet finished its work, so it can still access admin queue resources after the admin queue has been deleted, and that causes the crash above.
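The interleaving described above can be replayed in a small userspace toy model. This is only a sketch, not kernel code: the `AdminQueue` class, its methods, and `replay_async` are hypothetical names introduced for illustration, the counter merely stands in for admin->q_usage_counter, and the "two CPUs" are simulated by running the steps in a fixed single-threaded order.

```python
class AdminQueue:
    """Toy stand-in for the admin request queue (not the kernel structure)."""

    def __init__(self):
        self.usage_counter = 0   # models admin->q_usage_counter
        self.destroyed = False

    def enter(self):
        # A request takes a reference on the queue.
        self.usage_counter += 1

    def exit(self):
        # A request drops its reference on the queue.
        self.usage_counter -= 1

    def try_destroy(self):
        # The shutdown path only makes progress once the counter is zero.
        if self.usage_counter == 0:
            self.destroyed = True
        return self.destroyed

    def touch(self):
        # Any access after destruction models the invalid access.
        if self.destroyed:
            raise RuntimeError("admin queue accessed after destroy")


def replay_async():
    """Replay the problematic ordering: reference dropped before dispatch ends."""
    queue = AdminQueue()
    queue.enter()            # keep-alive request is queued asynchronously
    queue.exit()             # request flushed, end_io drops the reference early
    queue.try_destroy()      # shutdown path sees zero and deletes the queue
    try:
        queue.touch()        # dispatcher in the keep-alive work is still running
    except RuntimeError as exc:
        return str(exc)
    return "ok"


print(replay_async())
```

In this model the shutdown path is allowed to proceed as soon as the counter reaches zero, which is exactly the window in which the still-running dispatcher touches the deleted queue.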

This fix avoids the observed crash by implementing keep-alive as a synchronous operation, so that admin->q_usage_counter is decremented only after the keep-alive command has finished executing and has returned the command status to its caller (blk_execute_rq()). This ensures that the fabric shutdown code path does not destroy the fabric admin queue until the keep-alive request has finished executing and the keep-alive thread is no longer running the hw/hctx queue dispatch operation.
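The effect of the synchronous ordering can be illustrated with a userspace toy model. This is a sketch under stated assumptions, not the kernel patch: `AdminQueue`, its methods, and `replay_sync` are hypothetical names, the counter only stands in for admin->q_usage_counter, and concurrency is simulated by a fixed single-threaded sequence of steps.

```python
class AdminQueue:
    """Toy stand-in for the admin request queue (not the kernel structure)."""

    def __init__(self):
        self.usage_counter = 0   # models admin->q_usage_counter
        self.destroyed = False

    def enter(self):
        self.usage_counter += 1

    def exit(self):
        self.usage_counter -= 1

    def try_destroy(self):
        # The shutdown path only makes progress once the counter is zero.
        if self.usage_counter == 0:
            self.destroyed = True
        return self.destroyed

    def touch(self):
        if self.destroyed:
            raise RuntimeError("admin queue accessed after destroy")


def replay_sync():
    """Replay the synchronous ordering: reference held until dispatch is done."""
    queue = AdminQueue()
    queue.enter()                              # keep-alive request issued
    destroyed_during = queue.try_destroy()     # shutdown attempt is held off
    queue.touch()                              # dispatch completes on a live queue
    queue.exit()                               # reference dropped after completion
    destroyed_after = queue.try_destroy()      # shutdown can now proceed
    return destroyed_during, destroyed_after


print(replay_sync())
```

Because the reference is held for the whole lifetime of the command in this model, the destroy attempt made while the request is in flight does not succeed, and the queue is deleted only after the last access.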


JSON


{
  "affected": [],
  "aliases": [
    "CVE-2024-53102"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-11-25T22:15:17Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nnvme: make keep-alive synchronous operation\n\nThe nvme keep-alive operation, which executes at a periodic interval,\ncould potentially sneak in while shutting down a fabric controller.\nThis may lead to a race between the fabric controller admin queue\ndestroy code path (invoked while shutting down controller) and hw/hctx\nqueue dispatcher called from the nvme keep-alive async request queuing\noperation. This race could lead to the kernel crash shown below:\n\nCall Trace:\n    autoremove_wake_function+0x0/0xbc (unreliable)\n    __blk_mq_sched_dispatch_requests+0x114/0x24c\n    blk_mq_sched_dispatch_requests+0x44/0x84\n    blk_mq_run_hw_queue+0x140/0x220\n    nvme_keep_alive_work+0xc8/0x19c [nvme_core]\n    process_one_work+0x200/0x4e0\n    worker_thread+0x340/0x504\n    kthread+0x138/0x140\n    start_kernel_thread+0x14/0x18\n\nWhile shutting down fabric controller, if nvme keep-alive request sneaks\nin then it would be flushed off. The nvme_keep_alive_end_io function is\nthen invoked to handle the end of the keep-alive operation which\ndecrements the admin-\u003eq_usage_counter and assuming this is the last/only\nrequest in the admin queue then the admin-\u003eq_usage_counter becomes zero.\nIf that happens then blk-mq destroy queue operation (blk_mq_destroy_\nqueue()) which could be potentially running simultaneously on another\ncpu (as this is the controller shutdown code path) would forward\nprogress and deletes the admin queue. So, now from this point onward\nwe are not supposed to access the admin queue resources. However the\nissue here\u0027s that the nvme keep-alive thread running hw/hctx queue\ndispatch operation hasn\u0027t yet finished its work and so it could still\npotentially access the admin queue resource while the admin queue had\nbeen already deleted and that causes the above crash.\n\nThis fix helps avoid the observed crash by implementing keep-alive as a\nsynchronous operation so that we decrement admin-\u003eq_usage_counter only\nafter keep-alive command finished its execution and returns the command\nstatus back up to its caller (blk_execute_rq()). This would ensure that\nfabric shutdown code path doesn\u0027t destroy the fabric admin queue until\nkeep-alive request finished execution and also keep-alive thread is not\nrunning hw/hctx queue dispatch operation.",
  "id": "GHSA-59xx-49pj-8gx8",
  "modified": "2024-11-26T00:33:31Z",
  "published": "2024-11-26T00:33:31Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-53102"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/1a1bcca5c9efd2c72c8d2fcbadf2d673cceb2ea7"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/afa229465399f89d3af9d72ced865144c9748846"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/ccc1d82dfaad0ad27d21139da22e57add73d2a5e"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d06923670b5a5f609603d4a9fee4dec02d38de9c"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}
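A record like the one above can be summarized with a few lines of Python. The following is a minimal sketch: the `RECORD` string is a trimmed copy of the JSON above (the long "details" field is omitted), and `summarize` and the `stable_commits` key are hypothetical names chosen for illustration, not part of any advisory tooling.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = """
{
  "id": "GHSA-59xx-49pj-8gx8",
  "aliases": ["CVE-2024-53102"],
  "references": [
    {"type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-53102"},
    {"type": "WEB", "url": "https://git.kernel.org/stable/c/1a1bcca5c9efd2c72c8d2fcbadf2d673cceb2ea7"},
    {"type": "WEB", "url": "https://git.kernel.org/stable/c/afa229465399f89d3af9d72ced865144c9748846"},
    {"type": "WEB", "url": "https://git.kernel.org/stable/c/ccc1d82dfaad0ad27d21139da22e57add73d2a5e"},
    {"type": "WEB", "url": "https://git.kernel.org/stable/c/d06923670b5a5f609603d4a9fee4dec02d38de9c"}
  ]
}
"""

STABLE_PREFIX = "https://git.kernel.org/stable/c/"


def summarize(record_text):
    """Return the advisory id, its aliases, and the referenced stable commit hashes."""
    record = json.loads(record_text)
    commits = [
        ref["url"][len(STABLE_PREFIX):]
        for ref in record.get("references", [])
        if ref.get("url", "").startswith(STABLE_PREFIX)
    ]
    return {
        "id": record["id"],
        "aliases": record.get("aliases", []),
        "stable_commits": commits,
    }


summary = summarize(RECORD)
print(summary["id"], summary["aliases"], len(summary["stable_commits"]))
```

The commit hashes extracted this way can then be checked against a local kernel tree to see whether a given stable branch already carries the change.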

Sightings

Author	Source	Type	Date

Nomenclature

Seen: The vulnerability was mentioned, discussed, or observed by the user.
Confirmed: The vulnerability has been validated from an analyst's perspective.
Published Proof of Concept: A public proof of concept is available for this vulnerability.
Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
Not confirmed: The user expressed doubt about the validity of the vulnerability.
Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

Detection rules are retrieved from Rulezet.

CVE-2024-53102 (GCVE-0-2024-53102)