Add security threat model (THREAT_MODEL.md) + SECURITY.md/AGENTS.md discoverability by potiuk · Pull Request #6535 · apache/hive

potiuk · 2026-06-11T15:54:55Z

This adds a v0 security threat model + discoverability wiring to apache/hive, produced by the ASF Security team for the Hive PMC to review and own — the pre-flight step for the Glasswing security scan the PMC opted into.

What's here

THREAT_MODEL.md — a v0 model (Michael Scovetta rubric, run with Claude Opus) covering the HiveServer2 SQL front door, the Metastore, and the UDF / SerDe / execution layer: trust boundaries, in/out-of-scope adversaries, what Hive upholds vs. what it leaves to the operator (TLS, authorization-model choice, network isolation, UDF vetting), known non-findings, and triage dispositions. Every non-trivial claim is provenance-tagged (documented) / (maintainer) / (inferred); the (inferred) ones are our hypotheses.
SECURITY.md — private reporting via security@hive.apache.org + a pointer to the model.
AGENTS.md — wires AGENTS.md → SECURITY.md → THREAT_MODEL.md so the scan agent (and researchers) can mechanically find the model.
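
The AGENTS.md -> SECURITY.md -> THREAT_MODEL.md chain lends itself to a mechanical check. A minimal sketch, assuming each file simply mentions the next one by file name; the helper `follow_chain` and the in-memory `files` mapping are hypothetical illustrations, not how the scan agent is known to resolve the chain:

```python
from typing import Mapping, Sequence

# Chain order as described in the PR text.
CHAIN = ("AGENTS.md", "SECURITY.md", "THREAT_MODEL.md")


def follow_chain(files: Mapping[str, str], chain: Sequence[str] = CHAIN) -> list[str]:
    """Return the prefix of `chain` reachable by following file-name mentions.

    `files` maps a file name to its text. A link counts as present when a
    file's text contains the name of the next file in the chain (an
    assumption made for this sketch).
    """
    reached: list[str] = []
    for index, name in enumerate(chain):
        if name not in files:
            break  # the file itself is missing
        reached.append(name)
        is_last = index == len(chain) - 1
        if not is_last and chain[index + 1] not in files[name]:
            break  # this file does not point at the next one
    return reached
```

If the returned list is shorter than the chain, its last element is the file where discoverability breaks.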

How to engage — this is a draft to react to, not a finished artifact. THREAT_MODEL.md §14 collects open questions in waves; answer inline a few at a time, correct anything wrong, and the model becomes the PMC's. Once you're happy, we queue the scan in OSS-criticality order. No deadline pressure with the Mythos 5 window being extended.

Generated-by: Claude Opus 4.8 (1M context)

… discoverability v0 threat model produced by the ASF Security team via threat-model-producer (Michael Scovetta rubric, run with Claude Opus) for the PMC to review, correct, and own. Wires the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md discoverability chain the scan agent follows. Every non-trivial claim is provenance-tagged; open questions for the PMC are collected in THREAT_MODEL.md section 14. Generated-by: Claude Opus 4.8 (1M context)

okumin · 2026-06-12T09:19:18Z

Thank you! I will check the draft

potiuk · 2026-06-14T01:40:31Z

Thanks @okumin — no rush. The most useful read is the §14 "Open questions" section at the end; those are where I inferred a position and would value your confirmation or correction.

okumin

I answered some of the obvious points. I'm still checking the remaining ones.

okumin · 2026-06-14T08:50:10Z

+*(inferred)* tag to *(maintainer)* once confirmed.
+
+**Wave 1 — scope & intended use**
+1. Is the in-scope surface "HiveServer2 (the SQL front door) + the artifacts it


Some external services, such as Apache Spark, can directly access Hive Metastore. The direct metastore access should be in scope.
https://gist.github.com/okumin/f33574e96efd014fa6c1f5d4e13531ef

Off-topic: Should we have a separate THREAT_MODEL.md for Hive Metastore if it is more convenient for AI? HiveServer2 and Hive Metastore have different security models and different parameters.

okumin · 2026-06-14T09:27:04Z

+   compiles/executes", with the Metastore treated as an intra-cluster trusted
+   service? Or is direct Metastore access in scope?
+2. Are UDFs / SerDes / custom InputFormats / `TRANSFORM` scripts in scope as
+   *code-execution-by-design* (not a sandbox), per §7?


Yes, they are, but we may elaborate on details.

Prerequisite

We assume authentication and authorization are configured.

Built-in UDF

Built-in UDFs should generally be safe. As an exception, Apache Hive ships some insecure UDFs (e.g., reflect, reflect2, java_method, in_file) that allow arbitrary code execution. A Hive administrator must configure hive.server2.builtin.udf.blacklist to block such UDFs, either through their authorization plugin or directly. We know that major plugins, such as the Ranger authorizer, configure hive.server2.builtin.udf.blacklist properly.
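
An administrator could verify the blacklist with a small script. The sketch below assumes the usual hive-site.xml layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`) and that the blacklist value is a comma-separated list of UDF names; the helper names are hypothetical, and the set of risky UDFs is only the four examples named above, not a complete list:

```python
import xml.etree.ElementTree as ET

# The four examples from the review comment; not an exhaustive list.
RISKY_UDFS = {"reflect", "reflect2", "java_method", "in_file"}


def read_hive_property(hive_site_xml: str, name: str) -> str:
    """Return the value of a property from hive-site.xml text, or "" if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return ""


def unblocked_risky_udfs(hive_site_xml: str) -> set[str]:
    """Return the risky built-in UDFs that the blacklist does not cover."""
    raw = read_hive_property(hive_site_xml, "hive.server2.builtin.udf.blacklist")
    # Assumption: the value is a comma-separated list of UDF names.
    blocked = {item.strip().lower() for item in raw.split(",") if item.strip()}
    return RISKY_UDFS - blocked
```

An empty result means all four UDFs are listed. The check says nothing about blocking done by an authorization plugin at runtime, which may set the property without it appearing in the file.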

Custom UDF

A Hive administrator must configure access policies (e.g., via Ranger) to allow only trusted users to add UDF jars or register UDFs. Those trusted users are responsible for implementing safe UDFs, and Hive systems trust them. Hive can't guarantee safety when a trusted user adds compromised UDFs.

SerDe/InputFormat/OutputFormat

Only Hive administrators can deploy jars. They are responsible for deploying secure SerDe/InputFormat/OutputFormat implementations, and Hive systems trust them. Hive can't guarantee safety when a Hive administrator adds a compromised SerDe/InputFormat/OutputFormat.

TRANSFORM

In a secure deployment, TRANSFORM must be prohibited. Major authorization plugins add org.apache.hadoop.hive.ql.security.authorization.plugin.DisallowTransformHook to hive.exec.pre.hooks, which prohibits TRANSFORM. Hive administrators are responsible for either using proper authorization plugins or configuring the hook themselves.
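
The hook requirement can be checked the same way. A minimal sketch, assuming the effective configuration is available as a plain dict and that hive.exec.pre.hooks holds a comma-separated list of class names; `transform_is_disallowed` is a hypothetical helper, not part of Hive:

```python
# Fully qualified hook class name, as given in the comment above.
DISALLOW_TRANSFORM_HOOK = (
    "org.apache.hadoop.hive.ql.security.authorization.plugin.DisallowTransformHook"
)


def transform_is_disallowed(conf: dict[str, str]) -> bool:
    """Return True when DisallowTransformHook is among the configured pre-hooks."""
    # Assumption: the property value is a comma-separated list of class names.
    hooks = conf.get("hive.exec.pre.hooks", "")
    return DISALLOW_TRANSFORM_HOOK in {hook.strip() for hook in hooks.split(",")}
```

The comparison uses the full class name on purpose, so a hook with a similar short name in another package does not pass the check.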

okumin · 2026-06-15T07:36:59Z

+6. At the client→HS2 boundary, are SQL text, JDBC connection properties, and
+   session-config overrides all treated as untrusted (subject to the conf
+   whitelist)?
+7. Which `doAs` posture is the supported/recommended one, and how does it


When authentication and authorization are configured, I expect hive.server2.enable.doAs to be false. That's because HiveServer2 can enforce policies on its own. I would like someone else to double-check this point.

okumin · 2026-06-15T07:52:10Z

+   scoping, others)?
+
+**Wave 3 — disclaimed properties & defaults**
+9. Confirm the operator-owned list in §6 (TLS, authz-model choice, network


As stated above, Hive Metastore should be protected at the application level, not at the network level.
I also wonder if we should accept only Ranger as an authorization system. I'm still considering that point, though.

okumin · 2026-06-15T08:53:55Z

+11. Is super-linear resource use / a hang on a pathological query a bug, or is
+    bounding it the operator's job (YARN queues / HS2 limits)?
+12. Confirm the real names + shipped defaults of the §8 levers (especially
+    `hive.server2.authentication` and the default authorization model).


I'm checking them. We have parameters to configure TLS on the Hive side.

Incorporates okumin's PR apache#6535 review:
- direct Hive Metastore access in scope (HMS enforces caller authz at the application level; Spark et al. connect directly) -> §3.3/§4/§11a
- UDF/SerDe/TRANSFORM code-execution detail: built-in UDF blacklist (reflect/reflect2/java_method/in_file), custom UDF/SerDe admin trust, TRANSFORM disable via DisallowTransformHook -> §7/§8/§11a
- §14 Q1/Q2 promoted to maintainer; Q7/Q9/Q12 annotated PMC-reviewing

Generated-by: Claude Opus 4.8

potiuk · 2026-06-17T01:36:03Z

Thanks okumin — this is exactly the kind of detail that makes the model useful. Folded your review in and pushed (THREAT_MODEL.md, +75/-23):

Direct Metastore access (your L186): added as in-scope adversary §3.3 — HMS enforces caller authorization at the application level (since Spark and similar talk to it directly), and §4 now frames network isolation as defense-in-depth rather than the primary control. Correspondingly flipped the §11a "Metastore Thrift port has no authorization" entry from out-of-scope to VALID/in-model.
UDF / SerDe / TRANSFORM (your L190): folded the whole breakdown into §7, with the config levers in §8 — built-in code-exec UDFs (reflect, reflect2, java_method, in_file) blocked via hive.server2.builtin.udf.blacklist; custom UDF/SerDe/InputFormat/OutputFormat as admin-trusted jar installs; TRANSFORM disabled via DisallowTransformHook in hive.exec.pre.hooks. Added a §11a non-finding for the built-in-UDF case. (Your gist was very helpful — thanks for the link.)

I've left these as "PMC reviewing" in §14 pending your follow-up, so nothing's prematurely locked:

doAs (L203): I noted hive.server2.enable.doAs=false as the expected posture but flagged it for the second-pair-of-eyes check you asked for.
Ranger-only authz (L209): left §6/§9 open while you weigh whether to treat Ranger as the only supported authorization system.
TLS params (L215): §8 TLS lever left unnamed until you confirm the exact Hive-side parameter names.

On your off-topic question — separate THREAT_MODEL.md for the Metastore: my lean is to keep a single THREAT_MODEL.md but with clearly separated HS2 vs HMS scope/boundary/property subsections, since it's one repo and one discoverability chain (AGENTS.md -> SECURITY.md -> THREAT_MODEL.md) and the triage dispositions are shared. If you'd rather split them, that works too — we'd just point SECURITY.md at both files. Your call; happy to restructure either way.

sonarqubecloud · 2026-06-17T02:39:12Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added the "tests pending" and "tests unstable" labels and removed the "tests pending" label · Jun 11, 2026

okumin reviewed · Jun 15, 2026

asf-ci-hive added the "tests pending" label and removed the "tests unstable" label · Jun 17, 2026

asf-ci-hive added the "tests passed" label and removed the "tests pending" label · Jun 17, 2026

potiuk wants to merge 2 commits into apache:master from potiuk:asf-security/threat-model-2026-06-11


3 participants
