From 58603056d400e946de867d5f806607b39ccbe28c Mon Sep 17 00:00:00 2001
From: Jarek Potiuk <jarek@potiuk.com>
Date: Thu, 11 Jun 2026 17:54:13 +0200
Subject: [PATCH 1/2] Add security threat model (THREAT_MODEL.md) + SECURITY.md
 / AGENTS.md discoverability

v0 threat model produced by the ASF Security team via threat-model-producer
(Michael Scovetta rubric, run with Claude Opus) for the PMC to review, correct,
and own. Wires the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md discoverability
chain the scan agent follows. Every non-trivial claim is provenance-tagged;
open questions for the PMC are collected in THREAT_MODEL.md section 14.

Generated-by: Claude Opus 4.8 (1M context)
---
 AGENTS.md       |  52 +++++++++++
 SECURITY.md     |  40 +++++++++
 THREAT_MODEL.md | 223 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 315 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 SECURITY.md
 create mode 100644 THREAT_MODEL.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000000..1bf11597d57f
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,52 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to you under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# AGENTS.md
+
+## Security
+
+You are helping a security researcher find and report vulnerabilities in
+Apache Hive. Before drafting any report or reaching any conclusion, complete
+these steps.
+
+### Step 1 — Read the threat model
+Read **[THREAT_MODEL.md](THREAT_MODEL.md)**: the trust boundaries (the
+HiveServer2 SQL front door, the Metastore, the query/UDF execution layer), the
+adversaries in and out of scope, and what Hive upholds versus what it leaves to
+the operator.
+
+### Step 2 — Read the security policy
+Read **[SECURITY.md](SECURITY.md)** for how to report.
+
+### Key scoping facts (see THREAT_MODEL.md)
+- The **HiveServer2** SQL front door is the primary untrusted boundary; the
+  Metastore and execution cluster are assumed to run inside an
+  operator-controlled perimeter.
+- **UDFs, SerDes, custom InputFormats, and `TRANSFORM` scripts are
+  code-execution by design**, not a sandbox — running authorized code is a
+  feature, not a vulnerability.
+- Transport security (TLS), the choice of authorization model (Ranger /
+  SQL-standard / storage-based), and network isolation are **operator**
+  responsibilities, not engine invariants.
+- Hive does **not** defend against an operator with `root`, the Hadoop
+  superuser, or direct HDFS / metastore-DB access.
+
+### Step 3 — Route the finding
+Route the finding to exactly one disposition in **THREAT_MODEL.md §13**
+(VALID, or one of the `OUT-OF-MODEL` / `BY-DESIGN` dispositions) and cite the
+section that justifies the call. This model is **v0** — open questions for the
+PMC are in §14.
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 000000000000..bbb3a013e15b
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,40 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to you under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# Security Policy
+
+## Reporting a Vulnerability
+
+Please report suspected security vulnerabilities in Apache Hive **privately**
+to the Hive security list at `security@hive.apache.org`, following the
+[Apache Software Foundation security process](https://www.apache.org/security/).
+Do **not** open public GitHub issues or pull requests for security reports — a
+private report lets the issue be investigated and fixed before disclosure.
+
+## Threat Model
+
+A threat model for Apache Hive is maintained in
+[THREAT_MODEL.md](THREAT_MODEL.md). It describes the trust boundaries (the
+HiveServer2 SQL front door, the Metastore, the query/UDF execution layer), the
+adversaries in and out of scope, the security properties Hive upholds given its
+deployment assumptions versus those left to the operator (transport security,
+authorization-model choice, network isolation, UDF vetting), and the recurring
+non-findings. Triagers of scanner, fuzzer, or AI-generated findings should
+route each through `THREAT_MODEL.md` §13.
+
+This file is **v0** and carries open questions for the Hive PMC in
+`THREAT_MODEL.md` §14.
diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
new file mode 100644
index 000000000000..65a70b227894
--- /dev/null
+++ b/THREAT_MODEL.md
@@ -0,0 +1,223 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to you under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# Apache Hive — Threat Model (v0 draft)
+
+> **Status:** v0 draft produced by the ASF Security team (Michael Scovetta
+> rubric, run with Claude Opus) for the Apache Hive PMC to review, correct,
+> and own. Every non-trivial claim is provenance-tagged
+> *(documented)* / *(maintainer)* / *(inferred)*; the *(inferred)* tags are
+> the producer's hypotheses and each has a matching question in §14. Written
+> against `master`; revise on a new public-facing surface (a new server
+> endpoint, auth mechanism, file format, or execution engine), not on
+> internal refactors.
+
+## §1 — Purpose and consumers
+
+This document describes the **implicit security contract** between Apache
+Hive and its downstream operators: what Hive assumes about its environment,
+what it upholds given those assumptions, what it leaves to the operator, and
+which "syntactically possible" misuses fall outside the intended design. It
+serves the **integrator/operator** (which threats they own) and the
+**triager** (classifying a scanner/AI/CVE-style finding as valid, out of
+model, or disclaimed by design — cite the section).
+
+## §2 — What Hive is
+
+Apache Hive is a **SQL data-warehouse layer over Apache Hadoop** *(documented:
+README)*. The in-scope component families:
+
+| Component | Role | Primary surface |
+| --- | --- | --- |
+| **HiveServer2 (HS2)** | the SQL front door — accepts queries over a Thrift/binary or HTTP transport, authenticates the session, compiles + authorizes + executes | network (the highest-value untrusted boundary) *(inferred — §14 Q1)* |
+| **Hive Metastore (HMS)** | a Thrift service holding table/partition/schema metadata + storage locations | network (intra-cluster) *(inferred — §14 Q1)* |
+| **Query compiler + execution** | parse → plan → run on Tez / MapReduce / Spark; reads/writes HDFS, HBase, object stores | depends on the configured engine *(documented: README)* |
+| **UDF / SerDe / file-format layer** | user-supplied or built-in functions and (de)serializers invoked during execution | in-JVM code execution *(inferred — §14 Q2)* |
+| **JDBC/ODBC drivers + Beeline** | client-side connectors | client trust domain |
+
+Hive is **not** a standalone secured appliance: it is a clustered service
+that depends on Hadoop (HDFS, YARN), a metastore RDBMS, and typically an
+external authorization service (Apache Ranger or SQL-standard authorization)
+and a KDC for Kerberos *(inferred — §14 Q3)*.
+
+## §3 — Adversaries in and out of scope
+
+**In scope** *(inferred — §14 Q4)*:
+
+1. A **SQL client** connecting to HiveServer2 with valid or attempted-invalid
+   credentials, trying to read/modify data outside their authorization, or to
+   reach the host through query features.
+2. A **network adversary** between client and HS2 / between HS2 and HMS, where
+   transport security is not configured.
+
+**Out of scope** *(inferred — §14 Q5)*:
+
+3. **An operator with `root` / the Hadoop superuser / direct HDFS or metastore-DB
+   access.** Anyone who already controls the storage layer or the cluster
+   processes is not an adversary Hive defends against → `OUT-OF-MODEL:
+   adversary-not-in-scope`.
+4. **A trusted authenticated admin** performing an authorized action (creating a
+   function, changing config, granting a role). A new path to a privilege the
+   principal already holds is `OUT-OF-MODEL: equivalent-harm`.
+5. **Bugs in the dependencies Hive orchestrates** — Hadoop/HDFS, YARN, Tez,
+   the metastore RDBMS, Ranger, the KDC, the JVM. Report upstream →
+   `OUT-OF-MODEL: unsupported-component`.
+
+## §4 — Trust boundaries
+
+- **Client → HiveServer2** is the primary boundary. The session is
+  authenticated (Kerberos / LDAP / PAM / custom / none) and every statement is
+  authorization-checked before execution *(inferred — §14 Q6)*. Whether the SQL
+  text, JDBC connection properties, and session-configuration overrides are
+  treated as untrusted at this boundary is the load-bearing question for
+  triage.
+- **HiveServer2 → Metastore** and **HS2 → execution engine / HDFS** are
+  intra-cluster boundaries assumed to run inside an operator-controlled,
+  network-isolated perimeter *(inferred — §14 Q3)*.
+- **`doAs` impersonation:** when enabled, HS2 executes work as the connected
+  end user against HDFS rather than as the Hive service principal; when
+  disabled, all access runs as the Hive principal and authorization is fully
+  delegated to the SQL-layer authorizer *(inferred — §14 Q7)*. The two modes
+  have materially different blast radii and which is "the" supported posture
+  is a §14 question.
+
+## §5 — What Hive upholds (given §3/§4 assumptions)
+
+*(all inferred — §14 Q8)*
+
+- **Authentication** of the HS2 session via the configured mechanism before any
+  statement runs.
+- **Authorization** of each statement against the configured model (Ranger /
+  SQL-standard / storage-based), scoped to the principal's granted privileges
+  on the named objects.
+- **Memory safety on well-formed input** to the extent the JVM provides it;
+  Hive is Java, so classic memory-corruption is out of the language model.
+
+## §6 — What Hive leaves to the operator
+
+*(inferred — §14 Q9)*
+
+- **Transport security (TLS)** on the HS2 and Metastore endpoints, and the KDC
+  / LDAP server's own security.
+- **Choosing and configuring an authorization model.** Storage-based and
+  SQL-standard and Ranger give materially different guarantees; the default
+  posture (and whether "no authorization configured" is a supported production
+  mode) is the operator's call.
+- **Network isolation** of the Metastore, the metastore RDBMS, and the
+  execution cluster from untrusted networks.
+- **Vetting UDFs / SerDes / aux JARs.** Adding a function or SerDe is adding
+  code to the execution JVM (see §9).
+
+## §7 — Properties Hive does *not* uphold (by design)
+
+*(inferred — §14 Q10)*
+
+- **A sandbox around UDFs, SerDes, custom InputFormats, or `TRANSFORM`/script
+  operators.** Code a principal is authorized to register or invoke runs with
+  the privileges of the execution process; this is a feature, not a
+  containment boundary. `BY-DESIGN: property-disclaimed`.
+- **Protection against an operator who controls the underlying storage,
+  metastore DB, or cluster processes** (see §3 item 3).
+- **Resource fairness / DoS protection as a hard guarantee.** A sufficiently
+  expensive query can exhaust cluster resources; per-pool/queue limits
+  (YARN, HS2 query limits) are the operator's lever, not an engine invariant
+  *(inferred — §14 Q11)*.
+
+## §8 — Key configuration levers (load-bearing)
+
+*(all inferred — §14 Q12; confirm names + defaults)*
+
+| Lever | Why it matters |
+| --- | --- |
+| Authentication mode (`hive.server2.authentication`) | `NONE` vs `KERBEROS`/`LDAP` decides whether the front door is open. |
+| Authorization model (Ranger / SQL-std / storage-based / none) | decides whether statements are access-controlled at all. |
+| `doAs` (`hive.server2.enable.doAs`) | decides whether HDFS access runs as the end user or the Hive principal. |
+| TLS on HS2 / HMS transports | decides whether sessions + metadata are on the wire in clear. |
+| UDF/SerDe allow-listing, `hive.security.authorization.sqlstd.confwhitelist` | decides which session config + functions an untrusted client may set/use. |
+
+## §11a — Known non-findings (seed for scanner/AI triage)
+
+These are the dispositions the PMC most likely wants pre-recorded; confirm and
+extend *(inferred — §14 Q13)*:
+
+- **"A UDF / `TRANSFORM` script / custom SerDe can run arbitrary code."**
+  By design — registering or invoking code is an authorized operation, not a
+  sandbox escape. `BY-DESIGN`.
+- **"HiveServer2 with `authentication=NONE` accepts anyone."** That is a
+  non-default / operator-chosen insecure configuration, not a Hive defect.
+  `OUT-OF-MODEL: non-default-build` (confirm the shipped default).
+- **"The Metastore Thrift port has no authorization."** In-model only if the
+  PMC asserts HMS is meant to enforce caller authorization; if HMS is an
+  intra-cluster trusted service behind the perimeter, reports against direct
+  HMS access are `OUT-OF-MODEL: adversary-not-in-scope`. (§14 Q1/Q3.)
+- **Dependency-tail CVEs** (Hadoop, Log4j, a transitive JAR) surfaced by an
+  SCA scanner against Hive's build — triage upstream unless Hive's own code
+  reaches the vulnerable path with untrusted input.
+
+## §13 — Triage dispositions
+
+A finding is **VALID** only when all hold: the violated property is one Hive
+claims (§5), the attacker is in scope (§3), and the affected code is on an
+in-model surface (§2/§4) reached by untrusted input. Otherwise route to one
+of: `OUT-OF-MODEL: adversary-not-in-scope` · `OUT-OF-MODEL: equivalent-harm` ·
+`OUT-OF-MODEL: unsupported-component` · `OUT-OF-MODEL: non-default-build` ·
+`BY-DESIGN: property-disclaimed`.
+
+## §14 — Open questions for the Hive PMC
+
+Grouped in waves; answer inline (a few at a time is fine). Each promotes an
+*(inferred)* tag to *(maintainer)* once confirmed.
+
+**Wave 1 — scope & intended use**
+1. Is the in-scope surface "HiveServer2 (the SQL front door) + the artifacts it
+   compiles/executes", with the Metastore treated as an intra-cluster trusted
+   service? Or is direct Metastore access in scope?
+2. Are UDFs / SerDes / custom InputFormats / `TRANSFORM` scripts in scope as
+   *code-execution-by-design* (not a sandbox), per §7?
+3. Confirm the assumed deployment: clustered, behind an operator-controlled
+   perimeter, with Hadoop + a metastore RDBMS + (Ranger or SQL-std auth) + KDC
+   as trusted dependencies.
+4. Is the in-scope adversary "a SQL client at the HS2 boundary" (+ a network
+   MITM where TLS is off)? Anything to add?
+5. Confirm operators with storage/metastore-DB/cluster-process access, and
+   trusted admins doing authorized actions, are out of model.
+
+**Wave 2 — trust boundaries & auth**
+6. At the client→HS2 boundary, are SQL text, JDBC connection properties, and
+   session-config overrides all treated as untrusted (subject to the conf
+   whitelist)?
+7. Which `doAs` posture is the supported/recommended one, and how does it
+   change the authorization story?
+8. What properties does Hive claim to uphold given valid input (auth, authz
+   scoping, others)?
+
+**Wave 3 — disclaimed properties & defaults**
+9. Confirm the operator-owned list in §6 (TLS, authz-model choice, network
+   isolation, UDF vetting). Anything mis-assigned?
+10. Confirm the by-design non-guarantees in §7.
+11. Is super-linear resource use / a hang on a pathological query a bug, or is
+    bounding it the operator's job (YARN queues / HS2 limits)?
+12. Confirm the real names + shipped defaults of the §8 levers (especially
+    `hive.server2.authentication` and the default authorization model).
+13. What do scanners/fuzzers/researchers most often report that you consider a
+    non-finding? (Feeds §11a.)
+
+**Wave 4 — meta**
+14. Hive has no in-repo `SECURITY.md`/`THREAT_MODEL.md` today; this PR adds
+    them and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm this
+    in-repo model is canonical (vs the cwiki security pages), how it should
+    reference those pages, and who owns revisions.

From f5a9ebb91b055e5a95d08523da15d0db8814a149 Mon Sep 17 00:00:00 2001
From: Jarek Potiuk <jarek@potiuk.com>
Date: Tue, 16 Jun 2026 21:35:45 -0400
Subject: [PATCH 2/2] THREAT_MODEL.md: fold in Hive PMC review answers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Incorporates okumin's PR #6535 review:
- direct Hive Metastore access in scope (HMS enforces caller authz at the
  application level; Spark et al. connect directly) -> §3.3/§4/§11a
- UDF/SerDe/TRANSFORM code-execution detail: built-in UDF blacklist
  (reflect/reflect2/java_method/in_file), custom UDF/SerDe admin trust,
  TRANSFORM disable via DisallowTransformHook -> §7/§8/§11a
- §14 Q1/Q2 promoted to maintainer; Q7/Q9/Q12 annotated PMC-reviewing

Generated-by: Claude Opus 4.8
---
 THREAT_MODEL.md | 98 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 75 insertions(+), 23 deletions(-)

diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
index 65a70b227894..fc264547d2ff 100644
--- a/THREAT_MODEL.md
+++ b/THREAT_MODEL.md
@@ -56,24 +56,29 @@ and a KDC for Kerberos *(inferred — §14 Q3)*.
 
 ## §3 — Adversaries in and out of scope
 
-**In scope** *(inferred — §14 Q4)*:
+**In scope**:
 
 1. A **SQL client** connecting to HiveServer2 with valid or attempted-invalid
    credentials, trying to read/modify data outside their authorization, or to
-   reach the host through query features.
+   reach the host through query features. *(inferred — §14 Q4)*
 2. A **network adversary** between client and HS2 / between HS2 and HMS, where
-   transport security is not configured.
+   transport security is not configured. *(inferred — §14 Q4)*
+3. A **direct Metastore (HMS) client.** Some external services (e.g. Apache
+   Spark) connect to the Hive Metastore directly, so HMS is expected to enforce
+   caller authorization at the **application level** — it is not merely an
+   intra-cluster service shielded by a network perimeter. A client reaching HMS
+   outside its authorization is in-model. *(maintainer — okumin, §14 Q1.)*
 
 **Out of scope** *(inferred — §14 Q5)*:
 
-3. **An operator with `root` / the Hadoop superuser / direct HDFS or metastore-DB
+4. **An operator with `root` / the Hadoop superuser / direct HDFS or metastore-DB
    access.** Anyone who already controls the storage layer or the cluster
    processes is not an adversary Hive defends against → `OUT-OF-MODEL:
    adversary-not-in-scope`.
-4. **A trusted authenticated admin** performing an authorized action (creating a
+5. **A trusted authenticated admin** performing an authorized action (creating a
    function, changing config, granting a role). A new path to a privilege the
    principal already holds is `OUT-OF-MODEL: equivalent-harm`.
-5. **Bugs in the dependencies Hive orchestrates** — Hadoop/HDFS, YARN, Tez,
+6. **Bugs in the dependencies Hive orchestrates** — Hadoop/HDFS, YARN, Tez,
    the metastore RDBMS, Ranger, the KDC, the JVM. Report upstream →
    `OUT-OF-MODEL: unsupported-component`.
 
@@ -86,8 +91,11 @@ and a KDC for Kerberos *(inferred — §14 Q3)*.
   treated as untrusted at this boundary is the load-bearing question for
   triage.
 - **HiveServer2 → Metastore** and **HS2 → execution engine / HDFS** are
-  intra-cluster boundaries assumed to run inside an operator-controlled,
-  network-isolated perimeter *(inferred — §14 Q3)*.
+  intra-cluster boundaries. The Metastore is protected at the **application
+  level** (it enforces caller authorization), because external services such as
+  Spark talk to HMS directly (§3.3); network isolation is defense-in-depth, not
+  the primary control *(maintainer — okumin, §14 Q1)*. The HS2 → engine / HDFS
+  path is assumed inside an operator-controlled perimeter *(inferred — §14 Q3)*.
 - **`doAs` impersonation:** when enabled, HS2 executes work as the connected
   end user against HDFS rather than as the Hive service principal; when
   disabled, all access runs as the Hive principal and authorization is fully
@@ -129,9 +137,30 @@ and a KDC for Kerberos *(inferred — §14 Q3)*.
 - **A sandbox around UDFs, SerDes, custom InputFormats, or `TRANSFORM`/script
   operators.** Code a principal is authorized to register or invoke runs with
   the privileges of the execution process; this is a feature, not a
-  containment boundary. `BY-DESIGN: property-disclaimed`.
+  containment boundary. `BY-DESIGN: property-disclaimed`. The model assumes
+  authentication and authorization are configured; against that baseline
+  *(maintainer — okumin)*:
+    - **Built-in UDFs** are generally safe, but Hive ships a few that allow
+      arbitrary code execution (`reflect`, `reflect2`, `java_method`,
+      `in_file`). A Hive administrator must block these via
+      `hive.server2.builtin.udf.blacklist` (directly or through the
+      authorization plugin); major plugins such as the Ranger authorizer
+      configure this blacklist properly.
+    - **Custom UDFs** — the administrator must restrict, via access policies
+      (e.g. Ranger), who may add UDF jars or register UDFs. Those trusted users
+      are responsible for safe UDFs; Hive cannot guarantee safety when a
+      trusted user adds a compromised UDF.
+    - **SerDe / InputFormat / OutputFormat** — only administrators can install
+      the jars; they are responsible for deploying safe implementations, and
+      Hive trusts them. Hive cannot guarantee safety against a compromised one
+      an admin installs.
+    - **`TRANSFORM`** — in a secure deployment it must be prohibited. Major
+      authorization plugins add
+      `org.apache.hadoop.hive.ql.security.authorization.plugin.DisallowTransformHook`
+      to `hive.exec.pre.hooks`; administrators are responsible for using such a
+      plugin or configuring the hook themselves.
 - **Protection against an operator who controls the underlying storage,
-  metastore DB, or cluster processes** (see §3 item 3).
+  metastore DB, or cluster processes** (see §3 item 4).
 - **Resource fairness / DoS protection as a hard guarantee.** A sufficiently
   expensive query can exhaust cluster resources; per-pool/queue limits
   (YARN, HS2 query limits) are the operator's lever, not an engine invariant
@@ -148,6 +177,8 @@ and a KDC for Kerberos *(inferred — §14 Q3)*.
 | `doAs` (`hive.server2.enable.doAs`) | decides whether HDFS access runs as the end user or the Hive principal. |
 | TLS on HS2 / HMS transports | decides whether sessions + metadata are on the wire in clear. |
 | UDF/SerDe allow-listing, `hive.security.authorization.sqlstd.confwhitelist` | decides which session config + functions an untrusted client may set/use. |
+| Built-in UDF blacklist (`hive.server2.builtin.udf.blacklist`) | blocks code-exec built-ins (`reflect`, `reflect2`, `java_method`, `in_file`); Ranger configures it *(maintainer — okumin)*. |
+| `TRANSFORM` disable (`DisallowTransformHook` in `hive.exec.pre.hooks`) | prohibits `TRANSFORM`/script operators in secure deployments *(maintainer — okumin)*. |
 
 ## §11a — Known non-findings (seed for scanner/AI triage)
 
@@ -160,10 +191,17 @@ extend *(inferred — §14 Q13)*:
 - **"HiveServer2 with `authentication=NONE` accepts anyone."** That is a
   non-default / operator-chosen insecure configuration, not a Hive defect.
   `OUT-OF-MODEL: non-default-build` (confirm the shipped default).
-- **"The Metastore Thrift port has no authorization."** In-model only if the
-  PMC asserts HMS is meant to enforce caller authorization; if HMS is an
-  intra-cluster trusted service behind the perimeter, reports against direct
-  HMS access are `OUT-OF-MODEL: adversary-not-in-scope`. (§14 Q1/Q3.)
+- **"The Metastore Thrift port has no authorization."** **In-model.** The PMC
+  confirms HMS is expected to enforce caller authorization at the application
+  level (external services such as Spark access HMS directly — §3.3), so a
+  genuine missing-authorization path on HMS is a **VALID** finding, not
+  out-of-scope. *(maintainer — okumin, §14 Q1.)*
+- **"A built-in UDF (`reflect`/`reflect2`/`java_method`/`in_file`) enables code
+  execution."** By design — these are blocked by the administrator via
+  `hive.server2.builtin.udf.blacklist` (Ranger does this). Reachable only when
+  the operator has not configured the blacklist → `OUT-OF-MODEL:
+  non-default-build` / operator responsibility, unless the report shows a
+  *bypass* of a configured blacklist. *(maintainer — okumin.)*
 - **Dependency-tail CVEs** (Hadoop, Log4j, a transitive JAR) surfaced by an
   SCA scanner against Hive's build — triage upstream unless Hive's own code
   reaches the vulnerable path with untrusted input.
@@ -183,11 +221,17 @@ Grouped in waves; answer inline (a few at a time is fine). Each promotes an
 *(inferred)* tag to *(maintainer)* once confirmed.
 
 **Wave 1 — scope & intended use**
-1. Is the in-scope surface "HiveServer2 (the SQL front door) + the artifacts it
-   compiles/executes", with the Metastore treated as an intra-cluster trusted
-   service? Or is direct Metastore access in scope?
-2. Are UDFs / SerDes / custom InputFormats / `TRANSFORM` scripts in scope as
-   *code-execution-by-design* (not a sandbox), per §7?
+1. *(Answered — okumin: direct Metastore access **is** in scope; HMS enforces
+   caller authorization at the application level, since external services like
+   Spark talk to it directly. Folded into §3.3 / §4 / §11a.)* ~~Is the in-scope
+   surface "HiveServer2 + the artifacts it compiles/executes", with the
+   Metastore treated as intra-cluster trusted? Or is direct Metastore access in
+   scope?~~
+2. *(Answered — okumin: yes, in scope as code-execution-by-design, with detail
+   on built-in UDF blacklist, custom UDF/SerDe admin trust, and `TRANSFORM`
+   disable via `DisallowTransformHook`. Folded into §7 / §8 / §11a.)* ~~Are
+   UDFs / SerDes / custom InputFormats / `TRANSFORM` scripts in scope as
+   code-execution-by-design (not a sandbox), per §7?~~
 3. Confirm the assumed deployment: clustered, behind an operator-controlled
    perimeter, with Hadoop + a metastore RDBMS + (Ranger or SQL-std auth) + KDC
    as trusted dependencies.
@@ -200,18 +244,26 @@ Grouped in waves; answer inline (a few at a time is fine). Each promotes an
 6. At the client→HS2 boundary, are SQL text, JDBC connection properties, and
    session-config overrides all treated as untrusted (subject to the conf
    whitelist)?
-7. Which `doAs` posture is the supported/recommended one, and how does it
-   change the authorization story?
+7. *(PMC reviewing — okumin expects `hive.server2.enable.doAs=false` is the
+   intended posture, since HiveServer2 can enforce policies itself; asked for a
+   second PMC member to double-check before we finalize.)* Which `doAs` posture
+   is the supported/recommended one, and how does it change the authorization
+   story?
 8. What properties does Hive claim to uphold given valid input (auth, authz
    scoping, others)?
 
 **Wave 3 — disclaimed properties & defaults**
-9. Confirm the operator-owned list in §6 (TLS, authz-model choice, network
+9. *(Partially — okumin notes HMS is protected at the application level rather
+   than the network level (reflected in §3.3/§4), and is still considering
+   whether to treat Ranger as the only supported authorization system.)*
+   Confirm the operator-owned list in §6 (TLS, authz-model choice, network
    isolation, UDF vetting). Anything mis-assigned?
 10. Confirm the by-design non-guarantees in §7.
 11. Is super-linear resource use / a hang on a pathological query a bug, or is
     bounding it the operator's job (YARN queues / HS2 limits)?
-12. Confirm the real names + shipped defaults of the §8 levers (especially
+12. *(PMC reviewing — okumin is checking the exact TLS configuration parameter
+    names on the Hive side; §8 TLS lever left unnamed pending that.)* Confirm
+    the real names + shipped defaults of the §8 levers (especially
     `hive.server2.authentication` and the default authorization model).
 13. What do scanners/fuzzers/researchers most often report that you consider a
     non-finding? (Feeds §11a.)