github - ghsa-xxw2-vx44-7cv6

ghsa-xxw2-vx44-7cv6

Vulnerability from github

Published: 2025-05-20 18:30

Modified: 2025-11-03 21:33

Details

In the Linux kernel, the following vulnerability has been resolved:

x86/mm: Eliminate window where TLB flushes may be inadvertently skipped

tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

    // Turn on IPIs for this CPU/mm combination, but only
    // if should_flush_tlb() agrees:
    cpumask_set_cpu(cpu, mm_cpumask(next));

    next_tlb_gen = atomic64_read(&next->context.tlb_gen);
    choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
    load_new_mm_cr3(need_flush);
    // ^ After 'need_flush' is set to false, IPIs *MUST*
    // be sent to this CPU and not be ignored.

    this_cpu_write(cpu_tlbstate.loaded_mm, next);
    // ^ Not until this point does should_flush_tlb()
    // become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops.
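A small userspace model can make the window concrete. This is a hypothetical, single-threaded sketch, not kernel code: `struct cpu_model`, `old_should_flush()`, `count_skipped_flushes()` and `MODEL_MM_SWITCHING` are invented names, and the predicate is reduced to the `loaded_mm` comparison described above (the other checks in the real should_flush_tlb() are omitted). The sketch assumes, following the Solution section, that the switching marker is written to `loaded_mm` just before CR3 is loaded.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one CPU's mm-switch state.
 * The names echo the advisory, but this is NOT kernel code. */
struct mm { int id; };

/* Stand-in for the kernel's LOADED_MM_SWITCHING marker. */
static struct mm switching_marker;
#define MODEL_MM_SWITCHING (&switching_marker)

struct cpu_model {
    const struct mm *loaded_mm; /* what the flush predicate inspects */
    const struct mm *cr3_mm;    /* which mm the hardware is really using */
    bool is_lazy;
};

typedef bool (*flush_pred)(const struct cpu_model *, const struct mm *);

/* Pre-fix behaviour, reduced to the comparison the advisory describes:
 * a flush for 'target' is sent only once loaded_mm == target. */
static bool old_should_flush(const struct cpu_model *c, const struct mm *target)
{
    if (c->is_lazy)
        return false;
    return c->loaded_mm == target;
}

/* A flush counts as "skipped" if CR3 already points at 'next' but the
 * predicate declines to send the IPI. */
static int skipped(flush_pred pred, const struct cpu_model *c, const struct mm *next)
{
    return (c->cr3_mm == next && !pred(c, next)) ? 1 : 0;
}

/* Replay the switch from prev to next one write at a time and count
 * the intermediate states in which a needed flush would be suppressed. */
static int count_skipped_flushes(flush_pred pred,
                                 const struct mm *prev, const struct mm *next)
{
    struct cpu_model c = { .loaded_mm = prev, .cr3_mm = prev, .is_lazy = false };
    int total = 0;

    c.loaded_mm = MODEL_MM_SWITCHING;  /* "just about to write CR3" */
    total += skipped(pred, &c, next);

    c.cr3_mm = next;                   /* load_new_mm_cr3() */
    total += skipped(pred, &c, next);

    c.loaded_mm = next;                /* this_cpu_write(cpu_tlbstate.loaded_mm, next) */
    total += skipped(pred, &c, next);

    return total;
}
```

With `old_should_flush()` the count is 1: the state right after load_new_mm_cr3(), where CR3 already points at `next` but `loaded_mm` still holds the switching marker.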

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI.
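The fix amounts to one extra case in the predicate. The following is a hedged, simplified sketch rather than the upstream code: `fixed_should_flush()` and `MODEL_MM_SWITCHING` are hypothetical names, and checks the real should_flush_tlb() performs (for example for kernel-only flushes or cpumask trimming) are left out. A CPU in the switching state may still hold TLB entries for either the previous or the next mm, so it is always sent the IPI.

```c
#include <stdbool.h>

/* Hypothetical model types; NOT kernel code. */
struct mm { int id; };

/* Stand-in for the kernel's LOADED_MM_SWITCHING marker. */
static struct mm switching_marker;
#define MODEL_MM_SWITCHING (&switching_marker)

/* Simplified post-fix predicate: should a flush of 'target' be sent to a
 * CPU whose recorded state is (loaded_mm, is_lazy)? */
static bool fixed_should_flush(const struct mm *loaded_mm, bool is_lazy,
                               const struct mm *target)
{
    if (is_lazy)
        return false;               /* lazy CPUs flush at their next switch */
    if (loaded_mm == MODEL_MM_SWITCHING)
        return true;                /* mid-switch: assume the worst, flush */
    return loaded_mm == target;     /* target mm is the one loaded */
}
```

Any CPU caught between writing the marker and writing the final `loaded_mm` now receives the IPI, which is the small increase in flush traffic the next paragraph refers to.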

This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written.
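The advisory does not say which barrier primitive is used, so the following is only an analogue of the ordering concern, written with C11 release/acquire atomics and POSIX threads instead of kernel primitives. It assumes a writer that updates `is_lazy` and then publishes `loaded_mm`, and a reader that loads them in the opposite order. The generation counters `is_lazy_gen` and `loaded_mm_gen` are hypothetical stand-ins for the two per-CPU fields; a higher value means a newer write. Without the acquire/release pairing, the reader could see a new `loaded_mm` together with a stale `is_lazy`.

```c
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical generation counters for the two per-CPU fields. */
static atomic_ulong is_lazy_gen;
static atomic_ulong loaded_mm_gen;

enum { ROUNDS = 100000 };

/* Writer: update is_lazy, then publish loaded_mm with release semantics,
 * so a reader that sees the new loaded_mm also sees the is_lazy write. */
static void *writer(void *arg)
{
    (void)arg;
    for (unsigned long g = 1; g <= ROUNDS; g++) {
        atomic_store_explicit(&is_lazy_gen, g, memory_order_relaxed);
        atomic_store_explicit(&loaded_mm_gen, g, memory_order_release);
    }
    return NULL;
}

/* Reader: load loaded_mm first (acquire), then is_lazy, and count how often
 * is_lazy looks older than loaded_mm. The pairing keeps the count at 0. */
static void *reader(void *arg)
{
    unsigned long *violations = arg;
    for (unsigned long i = 0; i < ROUNDS; i++) {
        unsigned long mm = atomic_load_explicit(&loaded_mm_gen, memory_order_acquire);
        unsigned long lazy = atomic_load_explicit(&is_lazy_gen, memory_order_relaxed);
        if (lazy < mm)
            (*violations)++;
    }
    return NULL;
}

/* Run both threads once; returns the number of ordering violations,
 * or ULONG_MAX if a thread could not be created. */
static unsigned long run_ordering_check(void)
{
    pthread_t w, r;
    unsigned long violations = 0;

    atomic_store(&is_lazy_gen, 0);
    atomic_store(&loaded_mm_gen, 0);
    if (pthread_create(&r, NULL, reader, &violations) != 0)
        return ULONG_MAX;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        pthread_join(r, NULL);
        return ULONG_MAX;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return violations;
}
```

Dropping the release/acquire orders to relaxed removes the guarantee. On strongly ordered hardware such as x86 the violation may still never show up in practice, which is part of why this kind of bug is easy to miss.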

JSON

{
  "affected": [],
  "aliases": [
    "CVE-2025-37964"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-05-20T16:15:34Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nx86/mm: Eliminate window where TLB flushes may be inadvertently skipped\n\ntl;dr: There is a window in the mm switching code where the new CR3 is\nset and the CPU should be getting TLB flushes for the new mm.  But\nshould_flush_tlb() has a bug and suppresses the flush.  Fix it by\nwidening the window where should_flush_tlb() sends an IPI.\n\nLong Version:\n\n=== History ===\n\nThere were a few things leading up to this.\n\nFirst, updating mm_cpumask() was observed to be too expensive, so it was\nmade lazier.  But being lazy caused too many unnecessary IPIs to CPUs\ndue to the now-lazy mm_cpumask().  So code was added to cull\nmm_cpumask() periodically[2].  But that culling was a bit too aggressive\nand skipped sending TLB flushes to CPUs that need them.  So here we are\nagain.\n\n=== Problem ===\n\nThe too-aggressive code in should_flush_tlb() strikes in this window:\n\n\t// Turn on IPIs for this CPU/mm combination, but only\n\t// if should_flush_tlb() agrees:\n\tcpumask_set_cpu(cpu, mm_cpumask(next));\n\n\tnext_tlb_gen = atomic64_read(\u0026next-\u003econtext.tlb_gen);\n\tchoose_new_asid(next, next_tlb_gen, \u0026new_asid, \u0026need_flush);\n\tload_new_mm_cr3(need_flush);\n\t// ^ After \u0027need_flush\u0027 is set to false, IPIs *MUST*\n\t// be sent to this CPU and not be ignored.\n\n        this_cpu_write(cpu_tlbstate.loaded_mm, next);\n\t// ^ Not until this point does should_flush_tlb()\n\t// become true!\n\nshould_flush_tlb() will suppress TLB flushes between load_new_mm_cr3()\nand writing to \u0027loaded_mm\u0027, which is a window where they should not be\nsuppressed.  Whoops.\n\n=== Solution ===\n\nThankfully, the fuzzy \"just about to write CR3\" window is already marked\nwith loaded_mm==LOADED_MM_SWITCHING.  Simply checking for that state in\nshould_flush_tlb() is sufficient to ensure that the CPU is targeted with\nan IPI.\n\nThis will cause more TLB flush IPIs.  But the window is relatively small\nand I do not expect this to cause any kind of measurable performance\nimpact.\n\nUpdate the comment where LOADED_MM_SWITCHING is written since it grew\nyet another user.\n\nPeter Z also raised a concern that should_flush_tlb() might not observe\n\u0027loaded_mm\u0027 and \u0027is_lazy\u0027 in the same order that switch_mm_irqs_off()\nwrites them.  Add a barrier to ensure that they are observed in the\norder they are written.",
  "id": "GHSA-xxw2-vx44-7cv6",
  "modified": "2025-11-03T21:33:56Z",
  "published": "2025-05-20T18:30:57Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-37964"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/02ad4ce144bd27f71f583f667fdf3b3ba0753477"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/12f703811af043d32b1c8a30001b2fa04d5cd0ac"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/399ec9ca8fc4999e676ff89a90184ec40031cf59"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d41072906abec8bb8e01ed16afefbaa558908c89"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d87392094f96e162fa5fa5a8640d70cc0952806f"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a"
    },
    {
      "type": "WEB",
      "url": "https://lists.debian.org/debian-lts-announce/2025/08/msg00010.html"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}

CVE-2025-37964 (GCVE-0-2025-37964)

Vulnerability from cvelistv5

Published: 2025-05-20 16:01

Modified: 2025-11-03 19:57

References

https://git.kernel.org/stable/c/12f703811af043d32b1c8a30001b2fa04d5cd0ac
https://git.kernel.org/stable/c/02ad4ce144bd27f71f583f667fdf3b3ba0753477
https://git.kernel.org/stable/c/d41072906abec8bb8e01ed16afefbaa558908c89
https://git.kernel.org/stable/c/d87392094f96e162fa5fa5a8640d70cc0952806f
https://git.kernel.org/stable/c/399ec9ca8fc4999e676ff89a90184ec40031cf59
https://git.kernel.org/stable/c/fea4e317f9e7e1f449ce90dedc27a2d2a95bee5a

Impacted products

Vendor: Linux, product: Linux
Version: 848b5815177582de0e1d0118725378e0fbadca20
Version: b47002ed65ade940839b7f439ff4a194e7d5ec28
Version: a04fe3bfc71e28009e20357b79df1e8ef7c9d600
Version: 3dbe889a1b829b4c07e0836ff853fe649e51ce4f
Version: 6db2526c1d694c91c6e05e2f186c085e9460f202
Version: 6db2526c1d694c91c6e05e2f186c085e9460f202
Version: d1347977661342cb09a304a17701eb2d4aa21dec

Vendor: Linux, product: Linux
Version: 6.14