GHSA-r8fr-h8q4-2f98
Vulnerability from GitHub
In the Linux kernel, the following vulnerability has been resolved:
userfaultfd: fix checks for huge PMDs
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit the right two race windows) - I've tested this in a kernel build with some extra mdelay() calls. See the commit message for a description of the race scenario. On older kernels (before 6.5), I think the same bug can even theoretically lead to accessing transhuge page contents as a page table if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for detecting PMDs that don't point to page tables. On older kernels (before 6.5), you'd just have to win a single fairly wide race to hit this. I've tested this on 6.1 stable by racing migration (with a mdelay() patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86 VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed to yank page tables out from under us (though I haven't tested that), so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one fix for bug 3), so that the first fix can be backported to kernels affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic                    other thread
============                    ============
                                <zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
                                <pagefault creates transhuge zeropage>
__pte_alloc [no-op]
                                <zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls; the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail"), this can't lead to anything worse than a BUG_ON(), since the page table access helpers are actually designed to deal with page tables concurrently disappearing; but on older kernels (<=6.4), I think we could probably theoretically race past the two BUG_ON() checks and end up treating a hugepage as a page table.
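To make the ordering concrete, here is a simplified C sketch of the pre-fix check sequence in mfill_atomic(), with the two race windows marked. It is reconstructed from the commit message for illustration only and is not a verbatim copy of the kernel source; error handling and the surrounding copy loop are omitted.

```c
#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Simplified sketch of the pre-fix PMD handling in mfill_atomic(),
 * reconstructed from the commit message for illustration only.
 */
static int mfill_atomic_pmd_checks_sketch(struct mm_struct *dst_mm,
					  pmd_t *dst_pmd)
{
	/* Lockless snapshot; another thread may already have zapped the PMD. */
	pmd_t dst_pmdval = pmdp_get_lockless(dst_pmd);

	if (unlikely(pmd_trans_huge(dst_pmdval)))
		return -EEXIST;

	/*
	 * RACE WINDOW 1: a page fault in another thread can install a
	 * transhuge zeropage here, so the "none" value read above is
	 * stale and __pte_alloc() below becomes a no-op.
	 */
	if (unlikely(pmd_none(dst_pmdval)) &&
	    unlikely(__pte_alloc(dst_mm, dst_pmd)))
		return -ENOMEM;

	/*
	 * RACE WINDOW 2: the other thread zaps the PMD again, so the
	 * recheck below passes even though the PMD is now none.
	 */
	if (unlikely(pmd_trans_huge(*dst_pmd)))
		return -EEXIST;

	BUG_ON(pmd_none(*dst_pmd));	/* fires in the race shown above */

	/* ... pte_offset_map_lock() and the actual PTE install follow ... */
	return 0;
}
```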
The second issue is that, as Qi Zheng pointed out, there are other types of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs (in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a PMD that contains a migration entry (which just requires winning a single, fairly wide race), it will pass the PMD to pte_offset_map_lock(), which assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will crash or maybe worse if there is no "struct page" for the address bits in the migration entry PMD - I think at least on X86 there usually is no corresponding "struct page" thanks to the PTE inversion mitigation, amd64 looks different).
If that didn't crash, the kernel would next try to write a PTE into what it wrongly thinks is a page table.
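As a reference for this second issue, here is a hypothetical helper (not something that exists in the kernel tree) spelling out the PMD states that do not point to a page table; the old code only tested pmd_trans_huge(), which misses non-present swap/migration PMDs and devmap PMDs.

```c
#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Hypothetical helper, for illustration only: enumerate the PMD states
 * that do NOT point to a page table. pmd_trans_huge() alone covers only
 * the present-transhuge case.
 */
static bool pmd_points_to_page_table(pmd_t pmdval)
{
	if (pmd_none(pmdval))
		return false;	/* nothing installed yet */
	if (!pmd_present(pmdval))
		return false;	/* swap PMD, e.g. a migration entry */
	if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval))
		return false;	/* huge mapping, no page table underneath */
	if (pmd_bad(pmdval))
		return false;	/* unexpected/corrupt entry */
	return true;
}
```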
As part of fixing these issues, get rid of the check for pmd_trans_huge() before __pte_alloc() - that's redundant, we're going to have to check for that after the __pte_alloc() anyway.
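A rough sketch of the reordered checks this describes - based only on the commit message, so it may not match the final patch line for line: run __pte_alloc() first for a none PMD, then re-read the PMD and treat anything that is not a plain page-table pointer as an error rather than a BUG_ON().

```c
#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Rough sketch of the reordered check sequence described above;
 * illustration only, not the actual upstream patch.
 */
static int mfill_atomic_pmd_checks_fixed_sketch(struct mm_struct *dst_mm,
						pmd_t *dst_pmd)
{
	pmd_t dst_pmdval = pmdp_get_lockless(dst_pmd);

	/* Allocate a page table first if the PMD is currently none. */
	if (unlikely(pmd_none(dst_pmdval)) &&
	    unlikely(__pte_alloc(dst_mm, dst_pmd)))
		return -ENOMEM;

	/*
	 * Re-read after __pte_alloc() and reject every PMD state that does
	 * not point to a page table: a none or swap/migration PMD fails
	 * pmd_present(), THP and devmap are checked explicitly. Concurrent
	 * changes become errors instead of BUG_ON()s.
	 */
	dst_pmdval = pmdp_get_lockless(dst_pmd);
	if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
		     pmd_devmap(dst_pmdval)))
		return -EEXIST;
	if (unlikely(pmd_bad(dst_pmdval)))
		return -EFAULT;

	/* ... pte_offset_map_lock() can now revalidate and take the PTE lock ... */
	return 0;
}
```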
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
{ "affected": [], "aliases": [ "CVE-2024-46787" ], "database_specific": { "cwe_ids": [], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-09-18T08:15:05Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nuserfaultfd: fix checks for huge PMDs\n\nPatch series \"userfaultfd: fix races around pmd_trans_huge() check\", v2.\n\nThe pmd_trans_huge() code in mfill_atomic() is wrong in three different\nways depending on kernel version:\n\n1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit\n the right two race windows) - I\u0027ve tested this in a kernel build with\n some extra mdelay() calls. See the commit message for a description\n of the race scenario.\n On older kernels (before 6.5), I think the same bug can even\n theoretically lead to accessing transhuge page contents as a page table\n if you hit the right 5 narrow race windows (I haven\u0027t tested this case).\n2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for\n detecting PMDs that don\u0027t point to page tables.\n On older kernels (before 6.5), you\u0027d just have to win a single fairly\n wide race to hit this.\n I\u0027ve tested this on 6.1 stable by racing migration (with a mdelay()\n patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86\n VM, that causes a kernel oops in ptlock_ptr().\n3. On newer kernels (\u003e=6.5), for shmem mappings, khugepaged is allowed\n to yank page tables out from under us (though I haven\u0027t tested that),\n so I think the BUG_ON() checks in mfill_atomic() are just wrong.\n\nI decided to write two separate fixes for these (one fix for bugs 1+2, one\nfix for bug 3), so that the first fix can be backported to kernels\naffected by bugs 1+2.\n\n\nThis patch (of 2):\n\nThis fixes two issues.\n\nI discovered that the following race can occur:\n\n mfill_atomic other thread\n ============ ============\n \u003czap PMD\u003e\n pmdp_get_lockless() [reads none pmd]\n \u003cbail if trans_huge\u003e\n \u003cif none:\u003e\n \u003cpagefault creates transhuge zeropage\u003e\n __pte_alloc [no-op]\n \u003czap PMD\u003e\n \u003cbail if pmd_trans_huge(*dst_pmd)\u003e\n BUG_ON(pmd_none(*dst_pmd))\n\nI have experimentally verified this in a kernel with extra mdelay() calls;\nthe BUG_ON(pmd_none(*dst_pmd)) triggers.\n\nOn kernels newer than commit 0d940a9b270b (\"mm/pgtable: allow\npte_offset_map[_lock]() to fail\"), this can\u0027t lead to anything worse than\na BUG_ON(), since the page table access helpers are actually designed to\ndeal with page tables concurrently disappearing; but on older kernels\n(\u003c=6.4), I think we could probably theoretically race past the two\nBUG_ON() checks and end up treating a hugepage as a page table.\n\nThe second issue is that, as Qi Zheng pointed out, there are other types\nof huge PMDs that pmd_trans_huge() can\u0027t catch: devmap PMDs and swap PMDs\n(in particular, migration PMDs).\n\nOn \u003c=6.4, this is worse than the first issue: If mfill_atomic() runs on a\nPMD that contains a migration entry (which just requires winning a single,\nfairly wide race), it will pass the PMD to pte_offset_map_lock(), which\nassumes that the PMD points to a page table.\n\nBreakage follows: First, the kernel tries to take the PTE lock (which will\ncrash or maybe worse if there is no \"struct page\" for the address bits in\nthe migration entry PMD - I think at least on X86 there usually is no\ncorresponding \"struct page\" thanks to the PTE inversion 
mitigation, amd64\nlooks different).\n\nIf that didn\u0027t crash, the kernel would next try to write a PTE into what\nit wrongly thinks is a page table.\n\nAs part of fixing these issues, get rid of the check for pmd_trans_huge()\nbefore __pte_alloc() - that\u0027s redundant, we\u0027re going to have to check for\nthat after the __pte_alloc() anyway.\n\nBackport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.", "id": "GHSA-r8fr-h8q4-2f98", "modified": "2024-11-20T18:32:15Z", "published": "2024-09-18T09:30:37Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-46787" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/3c6b4bcf37845c9359aed926324bed66bdd2448d" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/98cc18b1b71e23fe81a5194ed432b20c2d81a01a" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
Sightings
| Author | Source | Type | Date |
|---|---|---|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.