fkie_cve-2025-68310
Vulnerability from fkie_nvd
Published
2025-12-16 16:16
Modified
2025-12-18 15:08
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved: s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump Do not block PCI config accesses through pci_cfg_access_lock() when executing the s390 variant of PCI error recovery: Acquire just device_lock() instead of pci_dev_lock() as powerpc's EEH and generig PCI AER processing do. During error recovery testing a pair of tasks was reported to be hung: mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working INFO: task kmcheck:72 blocked for more than 122 seconds. Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000 Call Trace: [<000000065256f030>] __schedule+0x2a0/0x590 [<000000065256f356>] schedule+0x36/0xe0 [<000000065256f572>] schedule_preempt_disabled+0x22/0x30 [<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8 [<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core] [<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core] [<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398 [<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0 INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds. Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000 Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: [<000000065256f030>] __schedule+0x2a0/0x590 [<000000065256f356>] schedule+0x36/0xe0 [<0000000652172e28>] pci_wait_cfg+0x80/0xe8 [<0000000652172f94>] pci_cfg_access_lock+0x74/0x88 [<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core] [<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core] [<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core] [<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168 [<0000000652513212>] devlink_health_report+0x19a/0x230 [<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core] No kernel log of the exact same error with an upstream kernel is available - but the very same deadlock situation can be constructed there, too: - task: kmcheck mlx5_unload_one() tries to acquire devlink lock while the PCI error recovery code has set pdev->block_cfg_access by way of pci_cfg_access_lock() - task: kworker mlx5_crdump_collect() tries to set block_cfg_access through pci_cfg_access_lock() while devlink_health_report() had acquired the devlink lock. A similar deadlock situation can be reproduced by requesting a crdump with > devlink health dump show pci/<BDF> reporter fw_fatal while PCI error recovery is executed on the same <BDF> physical function by mlx5_core's pci_error_handlers. On s390 this can be injected with > zpcictl --reset-fw <BDF> Tests with this patch failed to reproduce that second deadlock situation, the devlink command is rejected with "kernel answers: Permission denied" - and we get a kernel log message of: mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5 because the config read of VSC_SEMAPHORE is rejected by the underlying hardware. Two prior attempts to address this issue have been discussed and ultimately rejected [see link], with the primary argument that s390's implementation of PCI error recovery is imposing restrictions that neither powerpc's EEH nor PCI AER handling need. Tests show that PCI error recovery on s390 is running to completion even without blocking access to PCI config space.
Impacted products
Vendor Product Version



{
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "In the Linux kernel, the following vulnerability has been resolved:\n\ns390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump\n\nDo not block PCI config accesses through pci_cfg_access_lock() when\nexecuting the s390 variant of PCI error recovery: Acquire just\ndevice_lock() instead of pci_dev_lock() as powerpc\u0027s EEH and\ngenerig PCI AER processing do.\n\nDuring error recovery testing a pair of tasks was reported to be hung:\n\nmlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working\nINFO: task kmcheck:72 blocked for more than 122 seconds.\n      Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1\n\"echo 0 \u003e /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\ntask:kmcheck         state:D stack:0     pid:72    tgid:72    ppid:2      flags:0x00000000\nCall Trace:\n [\u003c000000065256f030\u003e] __schedule+0x2a0/0x590\n [\u003c000000065256f356\u003e] schedule+0x36/0xe0\n [\u003c000000065256f572\u003e] schedule_preempt_disabled+0x22/0x30\n [\u003c0000000652570a94\u003e] __mutex_lock.constprop.0+0x484/0x8a8\n [\u003c000003ff800673a4\u003e] mlx5_unload_one+0x34/0x58 [mlx5_core]\n [\u003c000003ff8006745c\u003e] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]\n [\u003c0000000652556c5a\u003e] zpci_event_attempt_error_recovery+0xf2/0x398\n [\u003c0000000651b9184a\u003e] __zpci_event_error+0x23a/0x2c0\nINFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.\n      Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1\n\"echo 0 \u003e /proc/sys/kernel/hung_task_timeout_secs\" disables this message.\ntask:kworker/u1664:6 state:D stack:0     pid:1514  tgid:1514  ppid:2      flags:0x00000000\nWorkqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]\nCall Trace:\n [\u003c000000065256f030\u003e] __schedule+0x2a0/0x590\n [\u003c000000065256f356\u003e] schedule+0x36/0xe0\n [\u003c0000000652172e28\u003e] pci_wait_cfg+0x80/0xe8\n [\u003c0000000652172f94\u003e] pci_cfg_access_lock+0x74/0x88\n [\u003c000003ff800916b6\u003e] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]\n [\u003c000003ff80098824\u003e] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]\n [\u003c000003ff80074b62\u003e] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]\n [\u003c0000000652512242\u003e] devlink_health_do_dump.part.0+0x82/0x168\n [\u003c0000000652513212\u003e] devlink_health_report+0x19a/0x230\n [\u003c000003ff80075a12\u003e] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]\n\nNo kernel log of the exact same error with an upstream kernel is\navailable - but the very same deadlock situation can be constructed there,\ntoo:\n\n- task: kmcheck\n  mlx5_unload_one() tries to acquire devlink lock while the PCI error\n  recovery code has set pdev-\u003eblock_cfg_access by way of\n  pci_cfg_access_lock()\n- task: kworker\n  mlx5_crdump_collect() tries to set block_cfg_access through\n  pci_cfg_access_lock() while devlink_health_report() had acquired\n  the devlink lock.\n\nA similar deadlock situation can be reproduced by requesting a\ncrdump with\n  \u003e devlink health dump show pci/\u003cBDF\u003e reporter fw_fatal\n\nwhile PCI error recovery is executed on the same \u003cBDF\u003e physical function\nby mlx5_core\u0027s pci_error_handlers. On s390 this can be injected with\n  \u003e zpcictl --reset-fw \u003cBDF\u003e\n\nTests with this patch failed to reproduce that second deadlock situation,\nthe devlink command is rejected with \"kernel answers: Permission denied\" -\nand we get a kernel log message of:\n\nmlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5\n\nbecause the config read of VSC_SEMAPHORE is rejected by the underlying\nhardware.\n\nTwo prior attempts to address this issue have been discussed and\nultimately rejected [see link], with the primary argument that s390\u0027s\nimplementation of PCI error recovery is imposing restrictions that\nneither powerpc\u0027s EEH nor PCI AER handling need. Tests show that PCI\nerror recovery on s390 is running to completion even without blocking\naccess to PCI config space."
    }
  ],
  "id": "CVE-2025-68310",
  "lastModified": "2025-12-18T15:08:06.237",
  "metrics": {},
  "published": "2025-12-16T16:16:10.547",
  "references": [
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/0fd20f65df6aa430454a0deed8f43efa91c54835"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/3591d56ea9bfd3e7fbbe70f749bdeed689d415f9"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/54f938d9f5693af8ed586a08db4af5d9da1f0f2d"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/b63c061be622b17b495cbf78a6d5f2d4c3147f8e"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/d0df2503bc3c2be385ca2fd96585daad1870c7c5"
    }
  ],
  "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
  "vulnStatus": "Awaiting Analysis"
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Published Proof of Concept: A public proof of concept is available for this vulnerability.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.


Loading…

Loading…