ghsa-4hrh-9vmp-2jgg
Vulnerability from github
Published
2021-05-21 14:23
Modified
2024-10-31 19:58
Summary
Heap buffer overflow in `StringNGrams`
Details

Impact

An attacker can cause a heap buffer overflow by passing crafted inputs to tf.raw_ops.StringNGrams:

```python import tensorflow as tf

separator = b'\x02\x00'
ngram_widths = [7, 6, 11] left_pad = b'\x7f\x7f\x7f\x7f\x7f' right_pad = b'\x7f\x7f\x25\x5d\x53\x74' pad_width = 50 preserve_short_sequences = True

l = ['', '', '', '', '', '', '', '', '', '', '']

data = tf.constant(l, shape=[11], dtype=tf.string)

l2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3] data_splits = tf.constant(l2, shape=[116], dtype=tf.int64)

out = tf.raw_ops.StringNGrams(data=data, data_splits=data_splits, separator=separator, ngram_widths=ngram_widths, left_pad=left_pad, right_pad=right_pad, pad_width=pad_width, preserve_short_sequences=preserve_short_sequences) ```

This is because the implementation fails to consider corner cases where input would be split in such a way that the generated tokens should only contain padding elements:

cc for (int ngram_index = 0; ngram_index < num_ngrams; ++ngram_index) { int pad_width = get_pad_width(ngram_width); int left_padding = std::max(0, pad_width - ngram_index); int right_padding = std::max(0, pad_width - (num_ngrams - (ngram_index + 1))); int num_tokens = ngram_width - (left_padding + right_padding); int data_start_index = left_padding > 0 ? 0 : ngram_index - pad_width; ... tstring* ngram = &output[ngram_index]; ngram->reserve(ngram_size); for (int n = 0; n < left_padding; ++n) { ngram->append(left_pad_); ngram->append(separator_); } for (int n = 0; n < num_tokens - 1; ++n) { ngram->append(data[data_start_index + n]); ngram->append(separator_); } ngram->append(data[data_start_index + num_tokens - 1]); // <<< for (int n = 0; n < right_padding; ++n) { ngram->append(separator_); ngram->append(right_pad_); } ... }

If input is such that num_tokens is 0, then, for data_start_index=0 (when left padding is present), the marked line would result in reading data[-1].

Patches

We have patched the issue in GitHub commit ba424dd8f16f7110eea526a8086f1a155f14f22b.

The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

For more information

Please consult our security guide for more information regarding the security model and how to contact us with issues and questions.

Attribution

This vulnerability has been reported by Yakun Zhang and Ying Wang of Baidu X-Team.

Show details on source website


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "2.1.4"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.2.0"
            },
            {
              "fixed": "2.2.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.3.0"
            },
            {
              "fixed": "2.3.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.4.0"
            },
            {
              "fixed": "2.4.2"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-cpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "2.1.4"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-cpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.2.0"
            },
            {
              "fixed": "2.2.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-cpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.3.0"
            },
            {
              "fixed": "2.3.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-cpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.4.0"
            },
            {
              "fixed": "2.4.2"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-gpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "2.1.4"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-gpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.2.0"
            },
            {
              "fixed": "2.2.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-gpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.3.0"
            },
            {
              "fixed": "2.3.3"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "tensorflow-gpu"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "2.4.0"
            },
            {
              "fixed": "2.4.2"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2021-29542"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-131",
      "CWE-787"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2021-05-18T21:54:20Z",
    "nvd_published_at": "2021-05-14T20:15:00Z",
    "severity": "LOW"
  },
  "details": "### Impact\nAn attacker can cause a heap buffer overflow by passing crafted inputs to `tf.raw_ops.StringNGrams`:\n\n```python\nimport tensorflow as tf\n\nseparator = b\u0027\\x02\\x00\u0027    \nngram_widths = [7, 6, 11]\nleft_pad = b\u0027\\x7f\\x7f\\x7f\\x7f\\x7f\u0027\nright_pad = b\u0027\\x7f\\x7f\\x25\\x5d\\x53\\x74\u0027\npad_width = 50\npreserve_short_sequences = True\n  \nl = [\u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027, \u0027\u0027]\n  \ndata = tf.constant(l, shape=[11], dtype=tf.string)\n  \nl2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n     0, 0, 3]\ndata_splits = tf.constant(l2, shape=[116], dtype=tf.int64)\n\nout = tf.raw_ops.StringNGrams(data=data,\n    data_splits=data_splits, separator=separator,\n    ngram_widths=ngram_widths, left_pad=left_pad,\n    right_pad=right_pad, pad_width=pad_width,\n    preserve_short_sequences=preserve_short_sequences)\n```\n\nThis is because the [implementation](https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L171-L185) fails to consider corner cases where input would be split in such a way that the generated tokens should only contain padding elements:\n  \n```cc\nfor (int ngram_index = 0; ngram_index \u003c num_ngrams; ++ngram_index) {\n  int pad_width = get_pad_width(ngram_width);\n  int left_padding = std::max(0, pad_width - ngram_index);\n  int right_padding = std::max(0, pad_width - (num_ngrams - (ngram_index + 1)));\n  int num_tokens = ngram_width - (left_padding + right_padding);\n  int data_start_index = left_padding \u003e 0 ? 0 : ngram_index - pad_width;\n  ...\n  tstring* ngram = \u0026output[ngram_index];\n  ngram-\u003ereserve(ngram_size);\n  for (int n = 0; n \u003c left_padding; ++n) {\n    ngram-\u003eappend(left_pad_);\n    ngram-\u003eappend(separator_);\n  }\n  for (int n = 0; n \u003c num_tokens - 1; ++n) {\n    ngram-\u003eappend(data[data_start_index + n]);\n    ngram-\u003eappend(separator_);\n  }\n  ngram-\u003eappend(data[data_start_index + num_tokens - 1]); // \u003c\u003c\u003c\n  for (int n = 0; n \u003c right_padding; ++n) {\n    ngram-\u003eappend(separator_);\n    ngram-\u003eappend(right_pad_);\n  }\n  ...\n}\n```\n\nIf input is such that `num_tokens` is 0, then, for `data_start_index=0` (when left padding is present), the marked line would result in reading `data[-1]`.\n\n### Patches\nWe have patched the issue in GitHub commit [ba424dd8f16f7110eea526a8086f1a155f14f22b](https://github.com/tensorflow/tensorflow/commit/ba424dd8f16f7110eea526a8086f1a155f14f22b).\n\nThe fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.\n\n### For more information\nPlease consult [our security guide](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for more information regarding the security model and how to contact us with issues and questions.\n\n### Attribution\nThis vulnerability has been reported by Yakun Zhang and Ying Wang of Baidu X-Team.",
  "id": "GHSA-4hrh-9vmp-2jgg",
  "modified": "2024-10-31T19:58:52Z",
  "published": "2021-05-21T14:23:15Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/tensorflow/tensorflow/security/advisories/GHSA-4hrh-9vmp-2jgg"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-29542"
    },
    {
      "type": "WEB",
      "url": "https://github.com/tensorflow/tensorflow/commit/ba424dd8f16f7110eea526a8086f1a155f14f22b"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/tensorflow-cpu/PYSEC-2021-470.yaml"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/tensorflow-gpu/PYSEC-2021-668.yaml"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/tensorflow/PYSEC-2021-179.yaml"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/tensorflow/tensorflow"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:L",
      "type": "CVSS_V3"
    },
    {
      "score": "CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:N/VC:N/VI:N/VA:L/SC:N/SI:N/SA:N",
      "type": "CVSS_V4"
    }
  ],
  "summary": "Heap buffer overflow in `StringNGrams`"
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.