A critical XML External Entity (XXE) vulnerability in Apache Tika, tracked as CVE-2025-66516, exposes users to attacks through specially crafted PDF files containing XFA content.
Disclosed on December 4, 2025, by Apache security team member Tim Allison, this flaw affects core parsing modules across multiple versions.
Attackers can exploit it to read sensitive files, trigger denial-of-service conditions, or perform server-side request forgery (SSRF) against vulnerable systems.
Apache Tika is a popular open-source toolkit for content analysis and extraction, widely used in document processing pipelines, search engines, and security tools.
The vulnerability stems from improper handling of XML in PDF-embedded XFA forms, allowing external entity expansion.
The new CVE expands on the earlier CVE-2025-54988, which focused on the PDF parser module. Users who patched only that module remain at risk if tika-core is still on an affected version.
Vulnerability Details and Affected Components
The XXE issue resides deep in Tika’s XML processing. When Tika parses a malicious PDF containing an XFA (XML Forms Architecture) template, it resolves external entities declared in the embedded XML.
For example, an attacker embeds a payload like <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]> inside the PDF’s XFA stream.
Tika’s core then resolves the entity, potentially leaking local files or probing internal networks through URI schemes such as file:// or http://.
This affects all platforms (Windows, Linux, and macOS), since Tika is Java-based. The CVE broadens CVE-2025-54988 in two key ways:
- The root fix is in tika-core, not just the PDF module. Upgrading only tika-parser-pdf-module leaves systems exposed.
- In the Tika 1.x series, the PDF parser lived in tika-parsers, which was overlooked in the prior report.
Here’s a breakdown of impacted artifacts:
| Component | Affected Versions | Fixed In |
|---|---|---|
| tika-core | 1.13 – 3.2.1 | 3.2.2+ |
| tika-parsers | 1.13 – 1.28.5 (pre-2.0.0) | 2.0.0+ |
| tika-parser-pdf-module | 2.0.0 – 3.2.1 | 3.2.2+ |
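As a rough illustration of how the table could be applied in an audit script, the sketch below checks an artifact and version against the listed ranges. The class and method names are hypothetical, the ranges are copied from the table above, and the version parsing is a simplification that ignores qualifiers such as -SNAPSHOT; it is not a substitute for a real dependency scanner.

```java
import java.util.Arrays;

/** Hypothetical helper: checks a Tika artifact version against the ranges in the table. */
final class TikaVersionCheck {

    /** Parses "3.2.1" into {3, 2, 1}. Anything after a '-' (e.g. "-SNAPSHOT") is dropped;
     *  non-numeric segments will throw NumberFormatException. */
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        return Arrays.stream(numeric.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Compares two versions numerically, treating missing parts as 0 (so 1.13 equals 1.13.0). */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if low <= version <= high (both bounds inclusive). */
    static boolean inRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) <= 0;
    }

    /** True if the artifact/version pair falls inside an affected range from the table. */
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return inRange(version, "1.13", "3.2.1");
            case "tika-parsers":
                return inRange(version, "1.13", "1.28.5");
            case "tika-parser-pdf-module":
                return inRange(version, "2.0.0", "3.2.1");
            default:
                return false; // artifact is not listed in the table
        }
    }

    public static void main(String[] args) {
        System.out.println(isAffected("tika-core", "3.2.1")); // true
        System.out.println(isAffected("tika-core", "3.2.2")); // false
    }
}
```

Note that the check has to be run per artifact: a project can report a fixed tika-parser-pdf-module while still pulling in an affected tika-core.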
Severity is rated critical because exploitation is easy (no authentication is required) and because, in misconfigured setups, the flaw could be a stepping stone toward remote code execution.
The CVSS v3.1 score isn’t finalized, but unauthenticated, network-exploitable XXE flaws often score as high as 9.8 out of 10.
Mitigation and Recommendations
To fix this, upgrade immediately: tika-core and tika-parser-pdf-module to 3.2.2 or later; tika-parsers to 2.0.0+. Verify via Maven coordinates: org.apache.tika:tika-core:3.2.2.
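If upgrades aren’t feasible, disable external entity processing in Tika’s configuration, for example by setting TikaConfig with feature("xml-external-entities", false); confirm the exact setting against the documentation for your Tika version.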
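The general idea of disabling external entity processing can be sketched with the JDK's built-in JAXP parser. This is not Tika's configuration API and not the actual fix in tika-core; the class and method names are hypothetical, and the example only shows how a parser configured to refuse DOCTYPE declarations rejects a payload of the kind described earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

/** Hypothetical demo: a JAXP parser hardened against XXE (generic Java, not Tika-specific). */
final class HardenedXmlDemo {

    /** Builds a DOM parser that refuses any document containing a DOCTYPE declaration. */
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Without a DOCTYPE there is nowhere to declare an external entity.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the hardened parser refuses the document.
     *  Note: malformed XML is also reported as rejected, since both raise SAXException. */
    static boolean isRejected(String xml) {
        try {
            hardenedBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (Exception e) {
            // Configuration or I/O problems are not a verdict on the document itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<foo>&xxe;</foo>";
        System.out.println("payload rejected: " + isRejected(payload));
        System.out.println("benign rejected:  " + isRejected("<foo>ok</foo>"));
    }
}
```

With this configuration the payload should be refused before any entity is resolved, while the benign document parses normally; the JDK's default error handler may also print a "[Fatal Error]" line to stderr for the rejected input. Refusing DOCTYPE outright is the bluntest option and would also reject legitimate documents that rely on a DTD.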
Organizations scanning uploads or indexing docs should audit Tika deployments. Tools like OWASP Dependency-Check can flag vulnerable versions.
No exploits in the wild are known yet, but the PDF-based vector makes this flaw well suited to phishing campaigns.





