ghsa-crmq-c99h-f986
Vulnerability from GitHub
Published
2025-12-24 15:30
Modified
2025-12-24 15:30
Details

In the Linux kernel, the following vulnerability has been resolved:

btrfs: don't free qgroup space unless specified

Boris noticed in his simple quotas testing that he was getting a leak with Sweet Tea's change to subvol create that stopped doing a transaction commit. This was just a side effect of that change.

In the delayed inode code we have an optimization that will free extra reservations if we think we can pack a dir item into an already modified leaf. Previously this wouldn't be triggered in the subvolume create case because we'd commit the transaction; it was still possible, but much harder to trigger. It could actually be triggered if we did a mkdir && subvol create with qgroups enabled.

This occurs because in btrfs_insert_delayed_dir_index(), which gets called when we're adding the dir item, we do the following:

btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);

if we're able to skip reserving space.

The problem here is that trans->block_rsv points at the temporary block rsv for the subvolume create, which carries qgroup reservations.

This is a problem because btrfs_block_rsv_release() will do the following:

if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
    qgroup_to_release = block_rsv->qgroup_rsv_reserved -
                        block_rsv->qgroup_rsv_size;
    block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
}

The temporary block rsv has only ->qgroup_rsv_reserved set; ->qgroup_rsv_size is 0. The optimization in btrfs_insert_delayed_dir_index() therefore sets ->qgroup_rsv_reserved = 0. Later, we call btrfs_subvolume_release_metadata(), which does

btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release);
btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);

qgroup_to_release is set to 0, and we do not convert the reserved metadata space.
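The sequence described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct leak_model_rsv, leak_model_release() and leak_model_subvol_create() are hypothetical names, and only the two qgroup fields of the block rsv are modelled. It shows how a release with a NULL out-pointer drops the reservation silently, so the later release has nothing left to report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the qgroup fields of a block rsv. */
struct leak_model_rsv {
	uint64_t qgroup_rsv_size;
	uint64_t qgroup_rsv_reserved;
};

/*
 * Models only the qgroup part of the release path as described in the
 * advisory (before the fix): the excess is trimmed even when the caller
 * passes NULL and therefore never learns how much was dropped.
 */
void leak_model_release(struct leak_model_rsv *rsv,
			uint64_t *qgroup_to_release_ret)
{
	uint64_t qgroup_to_release = 0;

	assert(rsv != NULL);
	if (rsv->qgroup_rsv_reserved >= rsv->qgroup_rsv_size) {
		qgroup_to_release = rsv->qgroup_rsv_reserved -
				    rsv->qgroup_rsv_size;
		rsv->qgroup_rsv_reserved = rsv->qgroup_rsv_size;
	}
	if (qgroup_to_release_ret)
		*qgroup_to_release_ret = qgroup_to_release;
}

/*
 * Models the subvolume create flow and returns the amount the final
 * release reports for conversion.
 */
uint64_t leak_model_subvol_create(uint64_t reserved)
{
	/* Temporary rsv: only ->qgroup_rsv_reserved set, size is 0. */
	struct leak_model_rsv rsv = {
		.qgroup_rsv_size = 0,
		.qgroup_rsv_reserved = reserved,
	};
	uint64_t qgroup_to_release = 0;

	/* Delayed dir index optimization: release with a NULL out-pointer. */
	leak_model_release(&rsv, NULL);

	/* Final release: the reservation is already gone from the rsv. */
	leak_model_release(&rsv, &qgroup_to_release);

	return qgroup_to_release;
}
```

In this model, calling leak_model_subvol_create() with any reserved amount returns 0, which corresponds to the reserved metadata space never being converted.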

The problem here is that the block rsv code has been unconditionally messing with ->qgroup_rsv_reserved, because the main place this is used is delalloc, and any time we call btrfs_block_rsv_release() we do it with qgroup_to_release set, and thus do the proper accounting.

The subvolume code is the only other code that uses the qgroup reservation stuff, but it's intermingled with the above optimization, so its reservation was being freed out from underneath it, leaking the reserved space.

The solution is to simply not mess with the qgroup reservations if we don't have qgroup_to_release set. This works with the existing code as anything that messes with the delalloc reservations always has qgroup_to_release set. This fixes the leak that Boris was observing.
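A minimal user-space sketch of that rule follows, assuming the check is expressed as an extra non-NULL test on the out-pointer; the actual patch is in the kernel commits listed in the references. struct fixed_model_rsv, fixed_model_release() and fixed_model_subvol_create() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the qgroup fields of a block rsv. */
struct fixed_model_rsv {
	uint64_t qgroup_rsv_size;
	uint64_t qgroup_rsv_reserved;
};

/*
 * Models only the qgroup part of the release path with the rule from
 * the advisory applied: leave the qgroup reservation alone unless the
 * caller asked to be told how much to release.
 */
void fixed_model_release(struct fixed_model_rsv *rsv,
			 uint64_t *qgroup_to_release_ret)
{
	uint64_t qgroup_to_release = 0;

	assert(rsv != NULL);
	if (qgroup_to_release_ret &&
	    rsv->qgroup_rsv_reserved >= rsv->qgroup_rsv_size) {
		qgroup_to_release = rsv->qgroup_rsv_reserved -
				    rsv->qgroup_rsv_size;
		rsv->qgroup_rsv_reserved = rsv->qgroup_rsv_size;
	}
	if (qgroup_to_release_ret)
		*qgroup_to_release_ret = qgroup_to_release;
}

/*
 * Models the subvolume create flow and returns the amount the final
 * release reports for conversion.
 */
uint64_t fixed_model_subvol_create(uint64_t reserved)
{
	/* Temporary rsv: only ->qgroup_rsv_reserved set, size is 0. */
	struct fixed_model_rsv rsv = {
		.qgroup_rsv_size = 0,
		.qgroup_rsv_reserved = reserved,
	};
	uint64_t qgroup_to_release = 0;

	/* Delayed dir index optimization: NULL out-pointer, no qgroup change. */
	fixed_model_release(&rsv, NULL);

	/* Final release: the full reservation is still there to report. */
	fixed_model_release(&rsv, &qgroup_to_release);

	return qgroup_to_release;
}
```

With the extra check, the release that passes NULL no longer touches ->qgroup_rsv_reserved, so in this model the final release reports the full reserved amount and the caller can convert it.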


{
  "affected": [],
  "aliases": [
    "CVE-2023-54158"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-12-24T13:16:17Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: don\u0027t free qgroup space unless specified\n\nBoris noticed in his simple quotas testing that he was getting a leak\nwith Sweet Tea\u0027s change to subvol create that stopped doing a\ntransaction commit.  This was just a side effect of that change.\n\nIn the delayed inode code we have an optimization that will free extra\nreservations if we think we can pack a dir item into an already modified\nleaf.  Previously this wouldn\u0027t be triggered in the subvolume create\ncase because we\u0027d commit the transaction, it was still possible but\nmuch harder to trigger.  It could actually be triggered if we did a\nmkdir \u0026\u0026 subvol create with qgroups enabled.\n\nThis occurs because in btrfs_insert_delayed_dir_index(), which gets\ncalled when we\u0027re adding the dir item, we do the following:\n\n  btrfs_block_rsv_release(fs_info, trans-\u003eblock_rsv, bytes, NULL);\n\nif we\u0027re able to skip reserving space.\n\nThe problem here is that trans-\u003eblock_rsv points at the temporary block\nrsv for the subvolume create, which has qgroup reservations in the block\nrsv.\n\nThis is a problem because btrfs_block_rsv_release() will do the\nfollowing:\n\n  if (block_rsv-\u003eqgroup_rsv_reserved \u003e= block_rsv-\u003eqgroup_rsv_size) {\n\t  qgroup_to_release = block_rsv-\u003eqgroup_rsv_reserved -\n\t\t  block_rsv-\u003eqgroup_rsv_size;\n\t  block_rsv-\u003eqgroup_rsv_reserved = block_rsv-\u003eqgroup_rsv_size;\n  }\n\nThe temporary block rsv just has -\u003eqgroup_rsv_reserved set,\n-\u003eqgroup_rsv_size == 0.  The optimization in\nbtrfs_insert_delayed_dir_index() sets -\u003eqgroup_rsv_reserved = 0.  Then\nlater on when we call btrfs_subvolume_release_metadata() which has\n\n  btrfs_block_rsv_release(fs_info, rsv, (u64)-1, \u0026qgroup_to_release);\n  btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);\n\nqgroup_to_release is set to 0, and we do not convert the reserved\nmetadata space.\n\nThe problem here is that the block rsv code has been unconditionally\nmessing with -\u003eqgroup_rsv_reserved, because the main place this is used\nis delalloc, and any time we call btrfs_block_rsv_release() we do it\nwith qgroup_to_release set, and thus do the proper accounting.\n\nThe subvolume code is the only other code that uses the qgroup\nreservation stuff, but it\u0027s intermingled with the above optimization,\nand thus was getting its reservation freed out from underneath it and\nthus leaking the reserved space.\n\nThe solution is to simply not mess with the qgroup reservations if we\ndon\u0027t have qgroup_to_release set.  This works with the existing code as\nanything that messes with the delalloc reservations always have\nqgroup_to_release set.  This fixes the leak that Boris was observing.",
  "id": "GHSA-crmq-c99h-f986",
  "modified": "2025-12-24T15:30:40Z",
  "published": "2025-12-24T15:30:40Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2023-54158"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/04ff6bd0317735791ef3e443c7c89f3c0dda548d"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/148b16cd30b202999ec5b534e3e5d8ab4b766f21"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/15e877e5923ec6d6caa5e447dcc4b79a8ff7cc53"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/1e05bf5e80bb1161b7294c9ce5292b26232ab853"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/478bd15f46b6e3aae78aac4f3788697f1546eea6"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/d246331b78cbef86237f9c22389205bc9b4e1cc1"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/f264be24146bee2d652010a18ae2517df5856261"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}

