ghsa-78wr-fq6m-qpff
Vulnerability from github
Published
2024-09-18 09:30
Modified
2024-09-26 15:30
Details

In the Linux kernel, the following vulnerability has been resolved:

ice: protect XDP configuration with a mutex

The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler.

XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked

With an unfortunate timing, such accesses can result in a crash such as the one below:

[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604]

The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway.

There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock().

Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open

The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion.

Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_x ---truncated---

Show details on source website


{
  "affected": [],
  "aliases": [
    "CVE-2024-46765"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-476"
    ],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2024-09-18T08:15:04Z",
    "severity": "MODERATE"
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nice: protect XDP configuration with a mutex\n\nThe main threat to data consistency in ice_xdp() is a possible asynchronous\nPF reset. It can be triggered by a user or by TX timeout handler.\n\nXDP setup and PF reset code access the same resources in the following\nsections:\n* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked\n* ice_vsi_rebuild() for the PF VSI - not protected\n* ice_vsi_open() - already rtnl-locked\n\nWith an unfortunate timing, such accesses can result in a crash such as the\none below:\n\n[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14\n[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18\n[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms\n[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001\n[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14\n[ +0.394718] ice 0000:b1:00.0: PTP reset successful\n[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098\n[ +0.000045] #PF: supervisor read access in kernel mode\n[ +0.000023] #PF: error_code(0x0000) - not-present page\n[ +0.000023] PGD 0 P4D 0\n[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI\n[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1\n[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021\n[ +0.000036] Workqueue: ice ice_service_task [ice]\n[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]\n[...]\n[ +0.000013] Call Trace:\n[ +0.000016] \u003cTASK\u003e\n[ +0.000014] ? __die+0x1f/0x70\n[ +0.000029] ? page_fault_oops+0x171/0x4f0\n[ +0.000029] ? schedule+0x3b/0xd0\n[ +0.000027] ? exc_page_fault+0x7b/0x180\n[ +0.000022] ? asm_exc_page_fault+0x22/0x30\n[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]\n[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]\n[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]\n[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]\n[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]\n[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]\n[ +0.000145] ice_rebuild+0x18c/0x840 [ice]\n[ +0.000145] ? delay_tsc+0x4a/0xc0\n[ +0.000022] ? delay_tsc+0x92/0xc0\n[ +0.000020] ice_do_reset+0x140/0x180 [ice]\n[ +0.000886] ice_service_task+0x404/0x1030 [ice]\n[ +0.000824] process_one_work+0x171/0x340\n[ +0.000685] worker_thread+0x277/0x3a0\n[ +0.000675] ? preempt_count_add+0x6a/0xa0\n[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50\n[ +0.000679] ? __pfx_worker_thread+0x10/0x10\n[ +0.000653] kthread+0xf0/0x120\n[ +0.000635] ? __pfx_kthread+0x10/0x10\n[ +0.000616] ret_from_fork+0x2d/0x50\n[ +0.000612] ? __pfx_kthread+0x10/0x10\n[ +0.000604] ret_from_fork_asm+0x1b/0x30\n[ +0.000604] \u003c/TASK\u003e\n\nThe previous way of handling this through returning -EBUSY is not viable,\nparticularly when destroying AF_XDP socket, because the kernel proceeds\nwith removal anyway.\n\nThere is plenty of code between those calls and there is no need to create\na large critical section that covers all of them, same as there is no need\nto protect ice_vsi_rebuild() with rtnl_lock().\n\nAdd xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().\n\nLeaving unprotected sections in between would result in two states that\nhave to be considered:\n1. when the VSI is closed, but not yet rebuild\n2. when VSI is already rebuild, but not yet open\n\nThe latter case is actually already handled through !netif_running() case,\nwe just need to adjust flag checking a little. The former one is not as\ntrivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of\nhardware interaction happens, this can make adding/deleting rings exit\nwith an error. Luckily, VSI rebuild is pending and can apply new\nconfiguration for us in a managed fashion.\n\nTherefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to\nindicate that ice_x\n---truncated---",
  "id": "GHSA-78wr-fq6m-qpff",
  "modified": "2024-09-26T15:30:34Z",
  "published": "2024-09-18T09:30:37Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-46765"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/2504b8405768a57a71e660dbfd5abd59f679a03f"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/2f057db2fb29bc209c103050647562e60554d3d3"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/391f7dae3d836891fc6cfbde38add2d0e10c6b7f"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.