Skip to content

[CALCITE-7601] Harden ST_GeomFromGML against external entity expansion#5006

Open
alhudz wants to merge 2 commits into
apache:mainfrom
alhudz:gml-xxe-harden
Open

[CALCITE-7601] Harden ST_GeomFromGML against external entity expansion#5006
alhudz wants to merge 2 commits into
apache:mainfrom
alhudz:gml-xxe-harden

Conversation

@alhudz

@alhudz alhudz commented Jun 9, 2026

Copy link
Copy Markdown

Jira Link

A Jira can be filed for this if preferred; raising the patch first since the change is small and self-contained.

Changes Proposed

Repro: SELECT ST_GeomFromGML(g) where g is a GML string carrying a DOCTYPE with an external entity, e.g. <!DOCTYPE x [ <!ENTITY e SYSTEM "file:///etc/passwd"> ]> referenced from <gml:coordinates>&e;,0</gml:coordinates>.

Expected: the entity is not resolved.

Actual: fromGml builds a JTS GMLReader, whose internal SAXParserFactory leaves DOCTYPE and external general/parameter entities enabled, so the parser fetches the entity target and inlines it into the geometry. That is local file read / SSRF (XXE) from row data, since the GML argument crosses the trust boundary at the ST_GeomFromGML SQL function.

Fix: parse with a SAXParser configured with disallow-doctype-decl and external entities off, feeding JTS's own GMLHandler. Same hardening already used in XmlFunctions and DiffRepository.

Test: SpatialTypeUtilsTest gets a regression that points an external entity at a temp file holding a valid coordinate, so an unguarded parser would return POINT (7 8) while the guarded one rejects the document.

Comment on lines +147 to +149
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting all three features is overkill, since disallow-doctype-decl will throw on a DOCTYPE declaration and will therefore not parse any external subset (non-validating/load-external-dtd) nor entities.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, dropped both external-entity lines. disallow-doctype-decl throws on the DOCTYPE so nothing reaches the entity-resolution path anyway. Test still rejects the payload.

@xuzifu666

Copy link
Copy Markdown
Member

Thank you for your contribution @alhudz . Please refer to other PRs. We need Jira to illustrate the issue, and we strictly require that the submitted information be consistent with Jira's.

@ppkarwasz

Copy link
Copy Markdown
Member

Disclaimer: some shameless self-promotion 😉

Since changes like this one have been landing in many OSS projects lately, a month ago I wrote a small reusable library that hands you a hardened JAXP factory with a single method call:

https://javadoc.io/static/eu.copernik/copernik-xml-factory/0.1.0/eu/copernik/xml/factory/XmlFactories.html

The library code itself wouldn't add much to a PR like this. What might be worth it is everything around the code: a continuously evolving test suite that runs against every JAXP implementation still in the wild, and the fact that a single dependency can act as a "scapegoat" for SAST tools and reviewers. Instead of every project re-deriving the full set of (often cargo-culted) properties and tripping XXE/SSRF warnings whenever one is missing, the hardening lives in one audited place.

Note

The library is incubating, with the hope of landing in either Apache Commons or Xerces. I haven't had definitive feedback from either project yet, so I'll keep maintaining it for the foreseeable future.

@alhudz alhudz changed the title harden ST_GeomFromGML against external entity expansion [CALCITE-7601] Harden ST_GeomFromGML against external entity expansion Jun 11, 2026
@alhudz

alhudz commented Jun 11, 2026

Copy link
Copy Markdown
Author

I have created CALCITE-7601 to track this issue and have updated the PR information to reference the Jira so that the descriptions remain consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants