ghsa-cgvj-gwgx-p5f7
Vulnerability from GitHub
Published
2024-07-16 12:30
Modified
2024-08-21 18:31
Details

In the Linux kernel, the following vulnerability has been resolved:

mm: vmscan: remove deadlock due to throttling failing to make progress

A soft lockup bug in kcompactd was reported in a private bugzilla, with the following visible in dmesg:

  watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]
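When triaging a log for this symptom, it can help to extract the CPU, duration, and task from each report. The following is a minimal sketch, assuming the messages keep exactly the shape quoted above; the regular expression and the function name `parse_soft_lockups` are illustrative and not part of any kernel tooling.

```python
import re

# Matches the soft lockup lines in the form quoted in this advisory.
# Assumption: other kernel versions may format these messages differently.
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft lockup report found in text."""
    return [
        {
            "cpu": int(m["cpu"]),
            "seconds": int(m["secs"]),
            "comm": m["comm"],
            "pid": int(m["pid"]),
        }
        for m in SOFT_LOCKUP.finditer(text)
    ]


sample = """\
watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]
"""

reports = parse_soft_lockups(sample)
durations = [r["seconds"] for r in reports]
```

A steadily growing duration for the same CPU and task, as in `durations` here, indicates one continuous stall rather than several separate ones.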

The machine had 256G of RAM with no swap, and an earlier failed allocation indicated that node 0, where kcompactd was running, was potentially unreclaimable:

  Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
    inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
    mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
    0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
    kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes

Vlastimil Babka investigated a crash dump and found that a task migrating pages was trying to drain PCP lists:

  PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: "kworker/u128:3"
  Call Trace:
     __schedule
     schedule
     schedule_timeout
     wait_for_completion
     __flush_work
     __drain_all_pages
     __alloc_pages_slowpath.constprop.114
     __alloc_pages
     alloc_migration_target
     migrate_pages
     migrate_to_node
     do_migrate_pages
     cpuset_migrate_mm_workfn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

This failure is specific to CONFIG_PREEMPT=n builds. The root of the problem is that kcompactd0 is not rescheduling on a CPU while a task that has isolated a large number of pages from the LRU is waiting for kcompactd0 to reschedule so that the pages can be released. Although shrink_inactive_list() only loops once around too_many_isolated(), reclaim can continue without rescheduling if sc->skipped_deactivate == 1, which could happen if there was no file LRU and the inactive anon list was not low.
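The dependency described above can be illustrated with a toy model of cooperative (non-preemptive) scheduling on a single CPU. This is only a sketch under stated assumptions: it is not kernel code, Python generators stand in for tasks, and the names `reclaimer`, `holder`, `run_on_one_cpu`, and the spin `budget` are hypothetical. The budget exists only so that the example terminates.

```python
from collections import deque


def reclaimer(state, yields, budget=1000):
    """Toy reclaimer: retries while another task still holds isolated pages."""
    spins = 0
    while state["isolated"] > 0 and spins < budget:
        spins += 1
        if yields:
            yield  # hypothetical stand-in for a voluntary reschedule point
    state["spins"] = spins


def holder(state):
    """Toy task that releases its isolated pages once it gets CPU time."""
    state["isolated"] = 0
    yield


def run_on_one_cpu(yields):
    """Round-robin over tasks; a task keeps the CPU until it yields or ends."""
    state = {"isolated": 512, "spins": 0}
    queue = deque([reclaimer(state, yields), holder(state)])
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task until its next yield
            queue.append(task)  # it yielded, so it goes to the back of the queue
        except StopIteration:
            pass                # task finished
    return state["spins"]


spins_without_yield = run_on_one_cpu(yields=False)  # burns the whole budget
spins_with_yield = run_on_one_cpu(yields=True)      # one retry, then progress
```

Without a yield point the toy reclaimer exhausts its budget because the task that could release the pages never runs. With a yield point it retries once and then sees the pages released.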



{
  "affected": [],
  "aliases": [
    "CVE-2022-48800"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-667"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-07-16T12:15:04Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nmm: vmscan: remove deadlock due to throttling failing to make progress\n\nA soft lockup bug in kcompactd was reported in a private bugzilla with\nthe following visible in dmesg;\n\n  watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]\n  watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]\n  watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]\n  watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]\n\nThe machine had 256G of RAM with no swap and an earlier failed\nallocation indicated that node 0 where kcompactd was run was potentially\nunreclaimable;\n\n  Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB\n    inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB\n    mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:\n    0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB\n    kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes\n\nVlastimil Babka investigated a crash dump and found that a task\nmigrating pages was trying to drain PCP lists;\n\n  PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: \"kworker/u128:3\"\n  Call Trace:\n     __schedule\n     schedule\n     schedule_timeout\n     wait_for_completion\n     __flush_work\n     __drain_all_pages\n     __alloc_pages_slowpath.constprop.114\n     __alloc_pages\n     alloc_migration_target\n     migrate_pages\n     migrate_to_node\n     do_migrate_pages\n     cpuset_migrate_mm_workfn\n     process_one_work\n     worker_thread\n     kthread\n     ret_from_fork\n\nThis failure is specific to CONFIG_PREEMPT=n builds.  The root of the\nproblem is that kcompact0 is not rescheduling on a CPU while a task that\nhas isolated a large number of the pages from the LRU is waiting on\nkcompact0 to reschedule so the pages can be released.  While\nshrink_inactive_list() only loops once around too_many_isolated, reclaim\ncan continue without rescheduling if sc-\u003eskipped_deactivate == 1 which\ncould happen if there was no file LRU and the inactive anon list was not\nlow.",
  "id": "GHSA-cgvj-gwgx-p5f7",
  "modified": "2024-08-21T18:31:27Z",
  "published": "2024-07-16T12:30:40Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2022-48800"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/3980cff6349687f73d5109f156f23cb261c24164"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/b485c6f1f9f54b81443efda5f3d8a5036ba2cd91"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}
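The `score` field in the record is a CVSS v3.1 vector string. As a rough illustration of how to read it, the sketch below splits the vector into its metrics; the helper name `parse_cvss_vector` is hypothetical, and the code does not validate metric values or compute a numeric score.

```python
def parse_cvss_vector(vector):
    """Split a CVSS v3.x vector string into (version, {metric: value})."""
    prefix, *metrics = vector.split("/")
    scheme, _, version = prefix.partition(":")
    if scheme != "CVSS" or not version:
        raise ValueError(f"not a CVSS vector: {vector!r}")
    # Each remaining component has the form "<metric>:<value>", e.g. "AV:L".
    return version, dict(item.split(":", 1) for item in metrics)


version, metrics = parse_cvss_vector(
    "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"
)
# Metrics whose value is "N" (None) in this vector.
unaffected = sorted(k for k, v in metrics.items() if v == "N")
```

Read against the CVSS v3.1 metric definitions, this vector describes a local attack vector (AV:L) that needs low privileges (PR:L) and no user interaction (UI:N). It has high availability impact (A:H) and no confidentiality or integrity impact (C:N, I:N), which fits a soft lockup.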

