GHSA-3j9f-4r3c-964g
Vulnerability from GitHub
Published: 2024-09-18 09:30
Modified: 2024-09-20 18:32
Details

In the Linux kernel, the following vulnerability has been resolved:

powerpc/qspinlock: Fix deadlock in MCS queue

If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until its "next" pointer is set by its successor in the queue.

Running stress-ng on a 16-core (16EC/16VP) shared LPAR results in occasional lockups similar to the following:

   $ stress-ng --all 128 --vm-bytes 80% --aggressive \
               --maximize --oomable --verify  --syslog \
               --metrics  --times  --timeout 5m

   watchdog: CPU 15 Hard LOCKUP
   ......
   NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
   --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
   ......
   NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
   --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function.

        CPU0                                   CPU1
        ----                                   ----
  spin_lock_irqsave(A)                          |
  spin_unlock_irqrestore(A)                     |
    spin_lock(B)                                |
         |                                      |
         ▼                                      |
   id = qnodesp->count++;                       |
  (Note that nodes[0].lock == A)                |
         |                                      |
         ▼                                      |
      Interrupt                                 |
  (happens before "nodes[0].lock = B")          |
         |                                      |
         ▼                                      |
  spin_lock_irqsave(A)                          |
         |                                      |
         ▼                                      |
   id = qnodesp->count++                        |
   nodes[1].lock = A                            |
         |                                      |
         ▼                                      |
  Tail of MCS queue                             |
         |                             spin_lock_irqsave(A)
         ▼                                      |
  Head of MCS queue                             ▼
         |                             CPU0 is previous tail
         ▼                                      |
   Spin indefinitely                            ▼
  (until "nodes[1].next != NULL")      prev = get_tail_qnode(A, CPU0)
                                                |
                                                ▼
                                       prev == &qnodes[CPU0].nodes[0]
                                     (as qnodes
---truncated---



{
  "affected": [],
  "aliases": [
    "CVE-2024-46797"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-667"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-09-18T08:15:06Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\npowerpc/qspinlock: Fix deadlock in MCS queue\n\nIf an interrupt occurs in queued_spin_lock_slowpath() after we increment\nqnodesp-\u003ecount and before node-\u003elock is initialized, another CPU might\nsee stale lock values in get_tail_qnode(). If the stale lock value happens\nto match the lock on that CPU, then we write to the \"next\" pointer of\nthe wrong qnode. This causes a deadlock as the former CPU, once it becomes\nthe head of the MCS queue, will spin indefinitely until it\u0027s \"next\" pointer\nis set by its successor in the queue.\n\nRunning stress-ng on a 16 core (16EC/16VP) shared LPAR, results in\noccasional lockups similar to the following:\n\n   $ stress-ng --all 128 --vm-bytes 80% --aggressive \\\n               --maximize --oomable --verify  --syslog \\\n               --metrics  --times  --timeout 5m\n\n   watchdog: CPU 15 Hard LOCKUP\n   ......\n   NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490\n   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90\n   Call Trace:\n    0xc000002cfffa3bf0 (unreliable)\n    _raw_spin_lock+0x6c/0x90\n    raw_spin_rq_lock_nested.part.135+0x4c/0xd0\n    sched_ttwu_pending+0x60/0x1f0\n    __flush_smp_call_function_queue+0x1dc/0x670\n    smp_ipi_demux_relaxed+0xa4/0x100\n    xive_muxed_ipi_action+0x20/0x40\n    __handle_irq_event_percpu+0x80/0x240\n    handle_irq_event_percpu+0x2c/0x80\n    handle_percpu_irq+0x84/0xd0\n    generic_handle_irq+0x54/0x80\n    __do_irq+0xac/0x210\n    __do_IRQ+0x74/0xd0\n    0x0\n    do_IRQ+0x8c/0x170\n    hardware_interrupt_common_virt+0x29c/0x2a0\n   --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490\n   ......\n   NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490\n   LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90\n   --- interrupt: 500\n    0xc0000029c1a41d00 (unreliable)\n    _raw_spin_lock+0x6c/0x90\n    futex_wake+0x100/0x260\n    do_futex+0x21c/0x2a0\n    
sys_futex+0x98/0x270\n    system_call_exception+0x14c/0x2f0\n    system_call_vectored_common+0x15c/0x2ec\n\nThe following code flow illustrates how the deadlock occurs.\nFor the sake of brevity, assume that both locks (A and B) are\ncontended and we call the queued_spin_lock_slowpath() function.\n\n        CPU0                                   CPU1\n        ----                                   ----\n  spin_lock_irqsave(A)                          |\n  spin_unlock_irqrestore(A)                     |\n    spin_lock(B)                                |\n         |                                      |\n         \u25bc                                      |\n   id = qnodesp-\u003ecount++;                       |\n  (Note that nodes[0].lock == A)                |\n         |                                      |\n         \u25bc                                      |\n      Interrupt                                 |\n  (happens before \"nodes[0].lock = B\")          |\n         |                                      |\n         \u25bc                                      |\n  spin_lock_irqsave(A)                          |\n         |                                      |\n         \u25bc                                      |\n   id = qnodesp-\u003ecount++                        |\n   nodes[1].lock = A                            |\n         |                                      |\n         \u25bc                                      |\n  Tail of MCS queue                             |\n         |                             spin_lock_irqsave(A)\n         \u25bc                                      |\n  Head of MCS queue                             \u25bc\n         |                             CPU0 is previous tail\n         \u25bc                                      |\n   Spin indefinitely                            \u25bc\n  (until \"nodes[1].next != NULL\")      prev = get_tail_qnode(A, CPU0)\n                                                |\n               
                                 \u25bc\n                                       prev == \u0026qnodes[CPU0].nodes[0]\n                                     (as qnodes\n---truncated---",
  "id": "GHSA-3j9f-4r3c-964g",
  "modified": "2024-09-20T18:32:25Z",
  "published": "2024-09-18T09:30:38Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-46797"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/734ad0af3609464f8f93e00b6c0de1e112f44559"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d84ab6661e8d09092de9b034b016515ef9b66085"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/f06af737e4be28c0e926dc25d5f0a111da4e2987"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}

