CVE-2026-53923 (GCVE-0-2026-53923)

Vulnerability from cvelistv5 – Published: 2026-06-22 21:55 – Updated: 2026-06-23 15:05
VLAI
Title
vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Summary
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
SSVC
Exploitation: none Automatable: no Technical Impact: partial
CISA Coordinator (v2.0.3)
CWE
  • CWE-681 - Incorrect Conversion between Numeric Types
  • CWE-200 - Exposure of Sensitive Information to an Unauthorized Actor
Assigner
Impacted products
Vendor Product Version
vllm-project vllm Affected: >= 0.5.5, < 0.23.1rc0
Create a notification for this product.
Show details on NVD website

{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2026-53923",
                "options": [
                  {
                    "Exploitation": "none"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2026-06-23T15:04:15.555317Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2026-06-23T15:05:21.711Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003e= 0.5.5, \u003c 0.23.1rc0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM\u0027s GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users\u0027 inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0."
        }
      ],
      "metrics": [
        {
          "cvssV4_0": {
            "attackComplexity": "LOW",
            "attackRequirements": "NONE",
            "attackVector": "NETWORK",
            "baseScore": 5.3,
            "baseSeverity": "MEDIUM",
            "privilegesRequired": "NONE",
            "subAvailabilityImpact": "NONE",
            "subConfidentialityImpact": "NONE",
            "subIntegrityImpact": "NONE",
            "userInteraction": "PASSIVE",
            "vectorString": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N",
            "version": "4.0",
            "vulnAvailabilityImpact": "NONE",
            "vulnConfidentialityImpact": "LOW",
            "vulnIntegrityImpact": "LOW"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-681",
              "description": "CWE-681: Incorrect Conversion between Numeric Types",
              "lang": "en",
              "type": "CWE"
            }
          ]
        },
        {
          "descriptions": [
            {
              "cweId": "CWE-200",
              "description": "CWE-200: Exposure of Sensitive Information to an Unauthorized Actor",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2026-06-22T21:55:42.001Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4"
        },
        {
          "name": "https://github.com/vllm-project/vllm/pull/44971",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/pull/44971"
        },
        {
          "name": "https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e"
        }
      ],
      "source": {
        "advisory": "GHSA-5jv2-g5wq-cmr4",
        "discovery": "UNKNOWN"
      },
      "title": "vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2026-53923",
    "datePublished": "2026-06-22T21:55:42.001Z",
    "dateReserved": "2026-06-11T15:46:12.316Z",
    "dateUpdated": "2026-06-23T15:05:21.711Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.2",
  "vulnerability-lookup:meta": {
    "epss": {
      "cve": "CVE-2026-53923",
      "date": "2026-06-24",
      "epss": "0.00321",
      "percentile": "0.23715"
    },
    "vulnrichment": {
      "containers": "{\"adp\": [{\"title\": \"CISA ADP Vulnrichment\", \"metrics\": [{\"other\": {\"type\": \"ssvc\", \"content\": {\"id\": \"CVE-2026-53923\", \"role\": \"CISA Coordinator\", \"options\": [{\"Exploitation\": \"none\"}, {\"Automatable\": \"no\"}, {\"Technical Impact\": \"partial\"}], \"version\": \"2.0.3\", \"timestamp\": \"2026-06-23T15:04:15.555317Z\"}}}], \"providerMetadata\": {\"orgId\": \"134c704f-9b21-4f2e-91b3-4a467353bcc0\", \"shortName\": \"CISA-ADP\", \"dateUpdated\": \"2026-06-23T15:04:19.969Z\"}}], \"cna\": {\"title\": \"vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow\", \"source\": {\"advisory\": \"GHSA-5jv2-g5wq-cmr4\", \"discovery\": \"UNKNOWN\"}, \"metrics\": [{\"cvssV4_0\": {\"version\": \"4.0\", \"baseScore\": 5.3, \"attackVector\": \"NETWORK\", \"baseSeverity\": \"MEDIUM\", \"vectorString\": \"CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N\", \"userInteraction\": \"PASSIVE\", \"attackComplexity\": \"LOW\", \"attackRequirements\": \"NONE\", \"privilegesRequired\": \"NONE\", \"subIntegrityImpact\": \"NONE\", \"vulnIntegrityImpact\": \"LOW\", \"subAvailabilityImpact\": \"NONE\", \"vulnAvailabilityImpact\": \"NONE\", \"subConfidentialityImpact\": \"NONE\", \"vulnConfidentialityImpact\": \"LOW\"}}], \"affected\": [{\"vendor\": \"vllm-project\", \"product\": \"vllm\", \"versions\": [{\"status\": \"affected\", \"version\": \"\u003e= 0.5.5, \u003c 0.23.1rc0\"}]}], \"references\": [{\"url\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4\", \"name\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4\", \"tags\": [\"x_refsource_CONFIRM\"]}, {\"url\": \"https://github.com/vllm-project/vllm/pull/44971\", \"name\": \"https://github.com/vllm-project/vllm/pull/44971\", \"tags\": [\"x_refsource_MISC\"]}, {\"url\": \"https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e\", \"name\": \"https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e\", \"tags\": [\"x_refsource_MISC\"]}], \"descriptions\": [{\"lang\": \"en\", \"value\": \"vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM\u0027s GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users\u0027 inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.\"}], \"problemTypes\": [{\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-681\", \"description\": \"CWE-681: Incorrect Conversion between Numeric Types\"}]}, {\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-200\", \"description\": \"CWE-200: Exposure of Sensitive Information to an Unauthorized Actor\"}]}], \"providerMetadata\": {\"orgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"shortName\": \"GitHub_M\", \"dateUpdated\": \"2026-06-22T21:55:42.001Z\"}}}",
      "cveMetadata": "{\"cveId\": \"CVE-2026-53923\", \"state\": \"PUBLISHED\", \"dateUpdated\": \"2026-06-23T15:05:21.711Z\", \"dateReserved\": \"2026-06-11T15:46:12.316Z\", \"assignerOrgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"datePublished\": \"2026-06-22T21:55:42.001Z\", \"assignerShortName\": \"GitHub_M\"}",
      "dataType": "CVE_RECORD",
      "dataVersion": "5.2"
    }
  }
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Forecast uses a logistic model when the trend is rising, or an exponential decay model when the trend is falling. Fitted via linearized least squares.

Sightings

Author Source Type Date Other

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or observed by the user.
  • Confirmed: The vulnerability has been validated from an analyst's perspective.
  • Published Proof of Concept: A public proof of concept is available for this vulnerability.
  • Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
  • Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
  • Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
  • Not confirmed: The user expressed doubt about the validity of the vulnerability.
  • Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

Loading…

Detection rules are retrieved from Rulezet.

Loading…

Loading…