CVE-2024-53169
Vulnerability from cvelistv5
Published: 2024-12-27 13:49
Modified: 2024-12-27 13:49
Summary
In the Linux kernel, the following vulnerability has been resolved:
nvme-fabrics: fix kernel crash while shutting down controller
The NVMe keep-alive operation, which runs at a periodic interval, can
sneak in while a fabrics controller is being shut down. This opens a race
between the code path that destroys the fabrics controller's admin queue
(invoked during controller shutdown) and the hw/hctx queue dispatcher
called when the keep-alive work queues its asynchronous request. The race
can crash the kernel as shown below:
Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18
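The periodic nature of keep-alive is what makes the race reachable: unless the work is cancelled, a firing can land anywhere inside the shutdown window. The following is a toy virtual-clock sketch of that point, not kernel code. It assumes strictly periodic firing and ignores how the kernel actually schedules and re-arms the keep-alive work; the function name and the millisecond parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy virtual-clock model, not kernel code. The keep-alive work is assumed
 * to fire at first_ms, first_ms + period_ms, first_ms + 2 * period_ms, ...
 * Returns true if at least one firing lands in the window [start_ms, end_ms),
 * i.e. while a (hypothetical) shutdown is in progress.
 */
static bool keep_alive_fires_in(long first_ms, long period_ms,
                                long start_ms, long end_ms)
{
    assert(period_ms > 0 && start_ms <= end_ms);
    if (end_ms <= first_ms)
        return false;            /* window closes before the first firing */
    if (start_ms <= first_ms)
        return true;             /* the first firing itself is in the window */
    /* Index of the first firing at or after start_ms (ceiling division). */
    long k = (start_ms - first_ms + period_ms - 1) / period_ms;
    return first_ms + k * period_ms < end_ms;
}
```

For example, with a 5000 ms period and the first firing at 0 ms, a shutdown window of [12000, 16000) contains the firing at 15000 ms. Any window at least one period long that starts at or after the first firing always contains a firing, so in this model the only reliable protection is to cancel the work before tearing down the queue it uses.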
If a keep-alive request sneaks in while the fabrics controller is shutting
down, it is flushed. nvme_keep_alive_end_io() is then invoked to complete
the keep-alive operation, and it decrements the admin queue's usage counter
(admin->q_usage_counter). If this was the last (or only) request in the
admin queue, the counter drops to zero. At that point the blk-mq
queue-destroy operation (blk_mq_destroy_queue()), which may be running
concurrently on another CPU as part of the controller shutdown path, makes
forward progress and deletes the admin queue. From then on, nothing may
access the admin queue's resources. However, the keep-alive thread running
the hw/hctx queue dispatch has not yet finished its work, so it can still
access admin queue resources after the queue has been deleted, which
causes the crash above.
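The interleaving can be replayed deterministically in a small userspace model. This is a hypothetical sketch, not kernel code: a plain int stands in for q_usage_counter (blk-mq actually uses a per-CPU reference and a queue freeze), the destroy path is reduced to "refuse new entries, wait for the counter to reach zero, free", and the keep-alive work is reduced to "enter, end_io drops the reference, dispatch keeps running". All names below are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: stands in for the admin request queue. */
struct toy_admin_q {
    int usage;       /* plays the role of q_usage_counter */
    bool frozen;     /* destroy path has started; new entries are refused */
    bool destroyed;  /* queue resources have been freed */
    bool uaf;        /* someone touched the queue after it was freed */
};

static void touch(struct toy_admin_q *q)
{
    if (q->destroyed)
        q->uaf = true;  /* would be a use-after-free in the kernel */
}

/* Replays one interleaving: each 'K' advances the keep-alive work by one
 * step, each 'S' advances the shutdown (destroy) path by one step.
 * Returns true if the keep-alive side touched a destroyed queue. */
static bool replay_interleaving(const char *schedule)
{
    struct toy_admin_q q = {0};
    int ka = 0;  /* 0 enter, 1 end_io, 2 dispatch tail, 3 done */
    int sd = 0;  /* 0 freeze, 1 wait for usage == 0, 2 destroy, 3 done */

    for (const char *p = schedule; *p; p++) {
        if (*p == 'K') {
            switch (ka) {
            case 0:  /* allocate the request: enter the queue */
                if (q.frozen) { ka = 3; break; }  /* entry refused */
                q.usage++; ka = 1; break;
            case 1:  /* request flushed; end_io drops the reference */
                touch(&q); q.usage--; ka = 2; break;
            case 2:  /* hctx dispatch is still running */
                touch(&q); ka = 3; break;
            default: break;
            }
        } else if (*p == 'S') {
            switch (sd) {
            case 0: q.frozen = true; sd = 1; break;
            case 1: if (q.usage == 0) sd = 2; break;  /* else keep waiting */
            case 2: q.destroyed = true; sd = 3; break;
            default: break;
            }
        }
    }
    assert(q.usage >= 0);
    return q.uaf;
}
```

The schedules "KKKSSS" (keep-alive finishes before shutdown starts) and "SKSS" (shutdown refuses the request up front) are safe, whereas "KSKSSK" reproduces the ordering described above: the reference is dropped, the queue is destroyed, and the dispatch tail then touches freed memory.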
This crash is a regression introduced by commit a54a93d0e359 ("nvme:
move stopping keep-alive into nvme_uninit_ctrl()"). Keep-alive should be
stopped before the admin queue is destroyed and the admin tagset is freed,
so that it cannot sneak in during the shutdown operation. However, commit
a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()")
removed the keep-alive stop from the beginning of the controller shutdown
code path and placed it in nvme_uninit_ctrl(), which runs very late in the
shutdown path, after the admin queue has been destroyed and its tagset
removed. This change made it possible for keep-alive to sneak in,
interfere with the shutdown operation, and cause the observed kernel
crash.
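The ordering problem can be stated as a simple check over the teardown sequence: keep-alive must be stopped before the admin queue is destroyed or its tagset freed. The sketch below is illustrative only. The step labels are hypothetical and collapse a much longer kernel call chain into four categories, with the "regressed" order taken from the description above (keep-alive stopped in nvme_uninit_ctrl() after the admin queue and tagset are gone).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the teardown steps named in the text. */
enum teardown_step {
    STOP_KEEP_ALIVE,      /* nvme_stop_keep_alive() */
    DESTROY_ADMIN_QUEUE,  /* blk_mq_destroy_queue() on the admin queue */
    FREE_ADMIN_TAGSET,    /* admin tagset released */
    OTHER_STEP,           /* anything else in the shutdown path */
};

/* Returns true if the admin queue or its tagset is torn down while the
 * keep-alive work may still run, i.e. the race window exists. */
static bool keep_alive_outlives_admin_queue(const enum teardown_step *seq,
                                            size_t n)
{
    bool keep_alive_armed = true;  /* armed until explicitly stopped */
    for (size_t i = 0; i < n; i++) {
        switch (seq[i]) {
        case STOP_KEEP_ALIVE:
            keep_alive_armed = false;
            break;
        case DESTROY_ADMIN_QUEUE:
        case FREE_ADMIN_TAGSET:
            if (keep_alive_armed)
                return true;
            break;
        default:
            break;
        }
    }
    return false;
}
```

Applied to the regressed order {DESTROY_ADMIN_QUEUE, FREE_ADMIN_TAGSET, STOP_KEEP_ALIVE}, the check reports a window; with STOP_KEEP_ALIVE moved ahead of DESTROY_ADMIN_QUEUE, it does not.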
To fix the observed crash, we move nvme_stop_keep_alive() from
nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This ensures that the
shutdown path does not make forward progress and delete the admin queue
until the keep-alive operation has finished (if it is in flight) or has
been cancelled, which closes the race window explained above and so avoids
the crash.
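"Finished (if in flight) or cancelled" describes cancel-and-wait semantics, which the kernel's workqueue helper cancel_delayed_work_sync() provides. The sketch below assumes nvme_stop_keep_alive() behaves that way and models it in a single thread, where "waiting" for in-flight work means letting it run to completion. toy_stop_keep_alive() and toy_remove_admin_tag_set() are hypothetical stand-ins for the real functions, not their implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; not the kernel's workqueue or blk-mq APIs. */
enum ka_state { KA_IDLE, KA_PENDING, KA_RUNNING };

struct toy_ctrl {
    enum ka_state ka;        /* state of the keep-alive work item */
    int ka_steps_left;       /* remaining steps of an in-flight keep-alive */
    bool admin_q_destroyed;  /* admin queue has been torn down */
    bool uaf;                /* keep-alive touched a destroyed admin queue */
};

/* One step of in-flight keep-alive work; every step touches the admin queue. */
static void ka_step(struct toy_ctrl *c)
{
    if (c->ka != KA_RUNNING)
        return;
    assert(c->ka_steps_left > 0);
    if (c->admin_q_destroyed)
        c->uaf = true;
    if (--c->ka_steps_left == 0)
        c->ka = KA_IDLE;
}

/* Cancel-and-wait: pending work is cancelled, running work is waited for. */
static void toy_stop_keep_alive(struct toy_ctrl *c)
{
    if (c->ka == KA_PENDING)
        c->ka = KA_IDLE;   /* cancelled before it ever ran */
    while (c->ka == KA_RUNNING)
        ka_step(c);        /* "wait": let the work run to completion */
}

/* Fixed ordering: stop keep-alive first, then destroy the admin queue. */
static void toy_remove_admin_tag_set(struct toy_ctrl *c)
{
    toy_stop_keep_alive(c);
    c->admin_q_destroyed = true;
}
```

Because the stop happens inside the function that destroys the admin queue, every teardown path in this model that reaches the destroy step stops keep-alive first: an in-flight request finishes while the queue is still alive, and a merely pending one never runs.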
Moving nvme_stop_keep_alive() into nvme_remove_admin_tag_set(), rather than
adding it back at the beginning of the controller shutdown code path in
nvme_stop_ctrl() (where it was before commit a54a93d0e359, "nvme: move
stopping keep-alive into nvme_uninit_ctrl()"), saves one callsite of
nvme_stop_keep_alive().
References
- https://git.kernel.org/stable/c/30794f4952decb2ec8efa42f704cac5304499a41
- https://git.kernel.org/stable/c/5416b76a8156c1b8491f78f8a728f422104bb919
- https://git.kernel.org/stable/c/e9869c85c81168a1275f909d5972a3fc435304be
Impacted products
Vendor: Linux. Product: Linux. Affected file: drivers/nvme/host/core.c. Repository: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
By git commit (default status: unaffected):
- affected from a54a93d0e3599b05856971734e15418ac551a14c before 30794f4952decb2ec8efa42f704cac5304499a41
- affected from a54a93d0e3599b05856971734e15418ac551a14c before 5416b76a8156c1b8491f78f8a728f422104bb919
- affected from a54a93d0e3599b05856971734e15418ac551a14c before e9869c85c81168a1275f909d5972a3fc435304be
By version (default status: affected):
- 6.11: affected
- versions before 6.11: unaffected
- 6.11.11 and later 6.11.x: unaffected
- 6.12.2 and later 6.12.x: unaffected
- 6.13-rc1 and later (original fix commit): unaffected
Record metadata
- CVE ID: CVE-2024-53169 (state: PUBLISHED, CVE record data version 5.1)
- Title: nvme-fabrics: fix kernel crash while shutting down controller
- Assigner: Linux (org ID 416baaa9-dc9f-4396-8d5f-8c081fb06d67)
- Reserved: 2024-11-19T17:17:25.005Z
- Published and last updated: 2024-12-27T13:49:14.925Z
- NVD: published and last modified 2024-12-27T14:15:24.057, status "Received", no metrics
- Generator: bippy-5f407fcff5a0
Sightings
| Author | Source | Type | Date |
|---|---|---|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.