From 6f7dc8a3b457b6dbc14d70fe3d9a304a0b479069 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Sun, 8 Mar 2026 22:36:39 -0500
Subject: [PATCH 01/13] docs: add end-to-end installation guide and fill
 customer-facing gaps

- Add book/src/manuals/installation-guide.md: 10-step deployment guide
  stitching together existing docs and filling gaps (Vault commands,
  Temporal setup, admin-cli build, Elektra OTP bootstrap, verification)
- Update building_nico_containers.md: add image summary table, REST image
  build steps, fix typo "perfrom" and stray backtick in tar command
- Add book/src/manuals/pushing_containers.md: tagging/pushing section
  (auth before tag/push)
- Update site-setup.md: replace nvcr.io/nvidian internal image refs with
  <YOUR_REGISTRY> placeholders and build-from-source links (fixes #476)
- Update helm/PREREQUISITES.md: add Vault PKI engine/role/auth/policy
  commands, explicit carbide DB/user requirements, pg extensions, and
  new Temporal section (optional for core, required for REST)
- Update book/src/SUMMARY.md: add installation guide entry, fix broken
  faqs.md link (file is faq.md)
- Update README.md: add installation guide link in Getting Started

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 README.md                                    |   1 +
 book/src/SUMMARY.md                          |   1 +
 book/src/manuals/building_nico_containers.md |  78 ++-
 book/src/manuals/installation-guide.md       | 478 +++++++++++++++++++
 book/src/manuals/pushing_containers.md       |  39 ++
 book/src/manuals/site-setup.md               |  18 +-
 helm/PREREQUISITES.md                        |  94 +++-
 7 files changed, 688 insertions(+), 21 deletions(-)
 create mode 100644 book/src/manuals/installation-guide.md
 create mode 100644 book/src/manuals/pushing_containers.md

diff --git a/README.md b/README.md
index 4be89996d4..25f7e1db99 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ of the bare-metal lifecycle to fast-track building next generation AI Cloud offe
 ## Getting Started
 
 - Go to the [NCX Infra Controller overview](https://nvidia.github.io/ncx-infra-controller-core/) to get an overview of NICo architecture and capabilities.
+- Follow the [End-to-End Installation Guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/installation-guide.html) for a complete walkthrough from cluster setup to first provisioned host.
 - Or jump to the [Site Setup guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/site-setup.html) to start setting up your site for NICo.
 - Or jump to the [Building Containers guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/building_nico_containers.html) to see an overview for building the containers.
 - Check out [Local Development with DevSpace](dev/deployment/devspace/README.md) to run NICo locally with mock systems.
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 6afafd1e5c..ec4852b535 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -25,6 +25,7 @@
 
 # Manuals
 
+- [End-to-End Installation Guide](manuals/installation-guide.md)
 - [Site Setup](manuals/site-setup.md)
     - [Site Reference Architecture](manuals/site-reference-arch.md)
     - [Networking Requirements](manuals/networking_requirements.md)
diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index b3684966ea..ef2bb31d88 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -1,12 +1,34 @@
 # Building NICo Containers
 
 This section provides instructions for building the containers for NCX Infra Controller (NICo).
+For the complete deployment workflow, see the [End-to-End Installation Guide](installation-guide.md).
+
+## Container Image Summary
+
+The following table lists all container images produced by this build process:
+
+| Image Name | Dockerfile | Purpose | Architecture |
+|------------|-----------|---------|-------------|
+| `nico-buildcontainer-x86_64` | `dev/docker/Dockerfile.build-container-x86_64` | Intermediate build container (Rust toolchain, libraries) | x86_64 |
+| `nico-runtime-container-x86_64` | `dev/docker/Dockerfile.runtime-container-x86_64` | Intermediate runtime base image | x86_64 |
+| `nico` (nvmetal-carbide) | `dev/docker/Dockerfile.release-container-sa-x86_64` | Carbide API, DHCP, DNS, PXE, hardware health, SSH console | x86_64 |
+| `boot-artifacts-x86_64` | `dev/docker/Dockerfile.release-artifacts-x86_64` | PXE boot artifacts for x86 hosts | x86_64 |
+| `boot-artifacts-aarch64` | `dev/docker/Dockerfile.release-artifacts-aarch64` | PXE boot artifacts for DPU BFB provisioning | x86_64 (bundles aarch64 binaries) |
+| `machine-validation-runner` | `dev/docker/Dockerfile.machine-validation-runner` | Machine validation / burn-in test runner | x86_64 |
+| `machine-validation-config` | `dev/docker/Dockerfile.machine-validation-config` | Machine validation config (bundles runner tar) | x86_64 |
+| `build-artifacts-container-cross-aarch64` | `dev/docker/Dockerfile.build-artifacts-container-cross-aarch64` | Intermediate cross-compile container for aarch64 | x86_64 |
+
+The intermediate images (`nico-buildcontainer-x86_64`, `nico-runtime-container-x86_64`,
+`build-artifacts-container-cross-aarch64`) are used only during the build and do not need
+to be pushed. `machine-validation-runner` ships inside `machine-validation-config` as a tar,
+so it is not pushed separately. The remaining images must be pushed to a registry accessible by your Kubernetes cluster.
 
 ## Installing Prerequisite Software
 
 Before you begin, ensure you have the following prerequisites:
 
 * An Ubuntu 24.04 Host or VM with 150GB+ of disk space (MacOS is not supported)
+* For REST containers: Go 1.25.4 or later, Docker 20.10+ with BuildKit enabled
 
 Use the following steps to install the prerequisite software on the Ubuntu Host or VM. These instructions
 assume an `apt`-based distribution such as Ubuntu 24.04.
@@ -55,27 +77,34 @@ cargo make --cwd pxe --env SA_ENABLEMENT=1 build-boot-artifacts-x86-host-sa
 docker build --build-arg "CONTAINER_RUNTIME_X86_64=alpine:latest" -t boot-artifacts-x86_64 -f dev/docker/Dockerfile.release-artifacts-x86_64 .
 ```
 
-## Building the Machine Validation images
+## Building the Machine Validation Images
 
 ```sh
-docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 -t machine-validation-runner -f dev/docker/Dockerfile.machine-validation-runner .
+docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 \
+  -t machine-validation-runner -f dev/docker/Dockerfile.machine-validation-runner .
 
-docker save --output crates/machine-validation/images/machine-validation-runner.tar machine-validation-runner:latest 
-
-// This copies `machine-validation-runner.tar` into the `/images` directory on the `machine-validation-config` container.  When using a kubernetes deployment model
-// this is the only `machine-validation` container you need to configure on the `carbide-pxe` pod.
-
-docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 -t machine-validation-config -f dev/docker/Dockerfile.machine-validation-config .
+docker save --output crates/machine-validation/images/machine-validation-runner.tar \
+  machine-validation-runner:latest
 
+docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 \
+  -t machine-validation-config -f dev/docker/Dockerfile.machine-validation-config .
 ```
 
-## Building nico-core container
+The `machine-validation-config` container bundles `machine-validation-runner.tar` into its
+`/images` directory. In a Kubernetes deployment, this is the only machine-validation
+container you need to configure on the `carbide-pxe` pod.
+
+## Building nico-core Container
 
 ```sh
-docker build --build-arg "CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64" --build-arg "CONTAINER_BUILD_X86_64=nico-buildcontainer-x86_64" -f dev/docker/Dockerfile.release-container-sa-x86_64 -t nico .
+docker build \
+  --build-arg "CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64" \
+  --build-arg "CONTAINER_BUILD_X86_64=nico-buildcontainer-x86_64" \
+  -f dev/docker/Dockerfile.release-container-sa-x86_64 \
+  -t nico .
 ```
 
-## Building the AARCH64 Containers and artifacts
+## Building the AARCH64 Containers and Artifacts
 
 ### Building the Cross-compile container
 
@@ -101,3 +130,30 @@ docker build --build-arg "CONTAINER_RUNTIME_AARCH64=alpine:latest" -t boot-artif
 ```
 
 **NOTE**: The `CONTAINER_RUNTIME_AARCH64=alpine:latest` build argument must be included. The aarch64 binaries are bundled into an x86 container.
+
+## Building REST Containers
+
+The REST components (cloud-api, cloud-workflow, site-manager, site-agent,
+db migrations, cert-manager) are built from the
+[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) repository.
+
+```sh
+cd bare-metal-manager-rest
+make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<your-version-tag>
+```
+
+### REST Image Summary
+
+| Image | Purpose |
+|-------|---------|
+| `carbide-rest-api` | REST API server (port 8388) |
+| `carbide-rest-workflow` | Temporal workflow worker (cloud-worker, site-worker) |
+| `carbide-rest-site-manager` | Site management / registry service |
+| `carbide-rest-site-agent` | On-site agent (elektra) |
+| `carbide-rest-db` | Database migration job (runs once per upgrade) |
+| `carbide-rest-cert-manager` | Native PKI certificate manager (credsmgr) |
+
+## Next Steps
+
+After building all images, tag and push them to your private registry.
+See [Tagging and Pushing Containers](pushing_containers.md).
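
Not part of the patch: a minimal sketch for checking that the REST build above produced every image in the REST Image Summary. `rest_image_refs` is a hypothetical helper name, and the `<registry>/<image>:<tag>` layout is an assumption taken from the `IMAGE_REGISTRY` and `IMAGE_TAG` arguments and the push loop in the installation guide.

```sh
# Hypothetical helper: print one fully qualified reference per REST image.
# Assumes images are named <registry>/<image>:<tag>.
rest_image_refs() {
    registry=$1
    tag=$2
    for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
                 carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager; do
        printf '%s/%s:%s\n' "$registry" "$image" "$tag"
    done
}
```

For example, `rest_image_refs "$REGISTRY" "$TAG" | xargs -n 1 docker image inspect > /dev/null` should exit non-zero if any of the six images is missing locally.
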
diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
new file mode 100644
index 0000000000..85bb0e503d
--- /dev/null
+++ b/book/src/manuals/installation-guide.md
@@ -0,0 +1,478 @@
+# End-to-End Installation Guide
+
+This guide ties together the build, deploy, and configuration steps needed to go from
+a ready Kubernetes cluster to your first provisioned bare-metal host. It links to
+existing documentation for each major step and fills the gaps between them.
+
+The order of operations below follows the sequence validated by NVIDIA engineering
+and SA teams during production deployments.
+
+## Order of Operations
+
+| Step | What | Where to find details |
+|------|------|----------------------|
+| 1 | [Build and push all container images](#1-build-and-push-containers) | [Building NICo Containers](building_nico_containers.md), [REST README](https://github.com/NVIDIA/bare-metal-manager-rest#building-docker-images) |
+| 2 | [Provision site controller OS and Kubernetes](#2-site-controller-and-kubernetes) | [Site Reference Architecture](site-reference-arch.md) |
+| 3 | [Deploy foundation services](#3-foundation-services) | [Site Setup](site-setup.md), [helm/PREREQUISITES.md](../../../helm/PREREQUISITES.md) |
+| 4 | [Deploy site CA, credsmgr, and Temporal](#4-site-ca-credsmgr-and-temporal) | This guide |
+| 5 | [Deploy Carbide REST / cloud components](#5-deploy-carbide-rest-components) | This guide, [REST repo](https://github.com/NVIDIA/bare-metal-manager-rest) |
+| 6 | [Deploy Carbide core](#6-deploy-carbide-core) | [Helm README](../../../helm/README.md), [deploy/README.md](../../../deploy/README.md) |
+| 7 | [Install admin-cli](#7-install-admin-cli) | This guide |
+| 8 | [Deploy Elektra site agent](#8-deploy-elektra-site-agent) | This guide |
+| 9 | [Ingest managed hosts](#9-ingest-hosts) | [Ingesting Hosts](ingesting_machines.md) |
+| 10 | [Verify end-to-end](#10-verification) | This guide |
+
+---
+
+## 1. Build and Push Containers
+
+All container images must be built from source and pushed to a registry your cluster
+can access. There are no pre-built public images available.
+
+```{note}
+If you encounter `nvcr.io/nvidian/...` image references in documentation or manifests,
+those are NVIDIA-internal paths not accessible externally. Replace them with your own
+registry paths after building from source.
+```
+
+### BMM Core
+
+Follow the [Building NICo Containers](building_nico_containers.md) guide, which covers
+prerequisites, build steps for x86_64 and aarch64, and a summary table of all images
+produced. Then follow [Tagging and Pushing Containers](pushing_containers.md) to
+authenticate, tag the images, and push them to your private
+registry.
+
+### BMM REST
+
+Clone [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest)
+and build with:
+
+```bash
+REGISTRY=<your-registry.example.com/carbide>
+TAG=<your-version-tag>
+
+make docker-build IMAGE_REGISTRY=$REGISTRY IMAGE_TAG=$TAG
+
+for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
+             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager; do
+    docker push "$REGISTRY/$image:$TAG"
+done
+```
+
+See the [bare-metal-manager-rest README](https://github.com/NVIDIA/bare-metal-manager-rest#building-docker-images)
+for the full list of images and build options.
+
+---
+
+## 2. Site Controller and Kubernetes
+
+Customers are expected to provision their own site controller OS and Kubernetes cluster.
+
+See the [Site Reference Architecture](site-reference-arch.md) for hardware requirements,
+Kubernetes versions, networking best practices, and IP pool sizing.
+
+In summary, you need:
+
+* 3 or 5 site controller nodes running Ubuntu 24.04 LTS with Kubernetes v1.30.x
+* CNI (Calico v3.28.1 validated), ingress controller (Contour), load balancer (MetalLB)
+* OOB switch VLANs with DHCP relay pointing at the Carbide DHCP service VIP
+* In-band ToR switches with BGP unnumbered on DPU-facing ports, EVPN enabled
+* IP pools allocated per the reference architecture
+
+---
+
+## 3. Foundation Services
+
+Deploy the following services before any Carbide components. The order within this
+step matters.
+
+**For baselines and versions**, see [Site Setup](site-setup.md).
+
+**For the Secrets, ConfigMaps, and ClusterIssuer** that the Helm chart expects, see
+[helm/PREREQUISITES.md](../../../helm/PREREQUISITES.md) -- it provides `kubectl create`
+commands for every required resource.
+
+Deploy in this order:
+
+1. **External Secrets Operator (ESO)** -- optional, but simplifies secret management.
+   If you skip ESO, create all Kubernetes Secrets manually.
+
+2. **cert-manager** (v1.11.1+) with approver-policy (v0.6.3). Create the
+   `vault-forge-issuer` ClusterIssuer as described in
+   [helm/PREREQUISITES.md](../../../helm/PREREQUISITES.md#5-clusterissuer).
+
+3. **PostgreSQL** -- SSL-enabled, with required extensions:
+
+```bash
+psql "postgres://<USER>:<PASS>@<HOST>:<PORT>/<DB>?sslmode=require" \
+  -c 'CREATE EXTENSION IF NOT EXISTS btree_gin;' \
+  -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;'
+```
+
+4. **Vault** -- deployed and unsealed, with:
+   * PKI secrets engine at mount path **`forgeca`**
+   * PKI role named **`forge-cluster`**
+   * Kubernetes auth enabled with a role for the cert-manager service account
+   * Vault policy granting sign/issue capabilities
+
+These Vault configuration steps are documented in detail in
+[helm/PREREQUISITES.md](../../../helm/PREREQUISITES.md#hashicorp-vault).
+
+---
+
+## 4. Site CA, credsmgr, and Temporal
+
+This step sets up the certificate infrastructure that both the REST / cloud components
+and Temporal depend on.
+
+### 4.1 Create Site CA Secrets
+
+Create root CA secrets in the `cert-manager` namespace:
+
+```bash
+kubectl -n cert-manager create secret generic vault-root-ca-certificate \
+  --from-file=certificate=./cacert.pem
+kubectl -n cert-manager create secret generic vault-root-ca-private-key \
+  --from-file=privatekey=./ca.key
+```
+
+If you need to generate a self-signed root CA for testing:
+
+```bash
+openssl req -x509 -newkey rsa:4096 -keyout ca.key -out cacert.pem \
+  -sha256 -days 3650 -nodes -subj "/CN=Carbide Root CA"
+```
+
+### 4.2 Deploy cloud-cert-manager (credsmgr)
+
+credsmgr runs an embedded Vault process and creates the `vault-issuer` ClusterIssuer
+used for Temporal TLS certificates and cloud component mTLS.
+
+From the `bare-metal-manager-rest` repository, update images in
+`deploy/kustomize/base/cert-manager/kustomization.yaml` to point at your registry,
+then:
+
+```bash
+kubectl apply -k deploy/kustomize/base/cert-manager
+kubectl get clusterissuer vault-issuer
+```
+
+Verify the `vault-issuer` shows `Ready=True` before proceeding.
+
+### 4.3 Provision Temporal TLS Certificates
+
+Apply Temporal certificate manifests (client certs for `cloud-api` and `cloud-workflow`,
+server certs for the `temporal` namespace). These manifests are in the
+`bare-metal-manager-rest` repository under `deploy/kustomize/base/temporal-certs`:
+
+```bash
+kubectl apply -k deploy/kustomize/base/temporal-certs
+```
+
+Verify:
+
+```bash
+kubectl -n cloud-api      get certificate temporal-client-cloud-certs
+kubectl -n cloud-workflow  get certificate temporal-client-cloud-certs
+kubectl -n temporal        get secret server-cloud-certs server-interservice-certs server-site-certs
+```
+
+### 4.4 Deploy Temporal
+
+Deploy Temporal server v1.22.6 with Elasticsearch 7.17.3 for visibility.
+Use the TLS certificates provisioned above for mTLS.
+
+After all Temporal pods are `Running`, register the required namespaces:
+
+```bash
+tctl --ns cloud namespace register
+tctl --ns site namespace register
+```
+
+```{note}
+If Temporal pods are stuck in `Init:0/1`, the Elasticsearch index may not be ready.
+Check `kubectl -n temporal logs elasticsearch-master-0` and wait for ES to become
+healthy, or create the index manually.
+```
+
+---
+
+## 5. Deploy Carbide REST Components
+
+The REST / cloud layer provides the customer-facing API, workflow orchestration, and
+site management. Deploy from the
+[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) repository.
+
+For each component below, update the image reference in `kustomization.yaml` to
+your registry and adjust ConfigMaps for your Postgres, Temporal, and Vault endpoints.
+
+### 5.1 Database Migrations (cloud-db)
+
+Initializes the cloud database schema. The job runs to completion once per deployment or upgrade:
+
+```bash
+kubectl apply -k deploy/kustomize/base/db
+kubectl -n cloud-db get jobs -w
+```
+
+Wait for the job to complete before proceeding.
+
+### 5.2 cloud-workflow
+
+Deploys `cloud-worker` and `site-worker` Temporal workers:
+
+```bash
+kubectl apply -k deploy/kustomize/base/cloud-workflow
+kubectl -n cloud-workflow get pods
+```
+
+Pods from both deployments should reach `Running`.
+
+### 5.3 cloud-api
+
+The customer-facing REST API:
+
+```bash
+kubectl apply -k deploy/kustomize/base/cloud-api
+kubectl -n cloud-api get pods
+```
+
+### 5.4 cloud-site-manager
+
+The site registry service:
+
+```bash
+kubectl apply -k deploy/kustomize/base/site-manager
+```
+
+```{note}
+If `carbide-rest-site-manager` fails with `unable to start container process`, the
+entrypoint in `deployment.yaml` does not match the production Dockerfile. Update
+`deployment.yaml` to use the correct binary path.
+```
+
+---
+
+## 6. Deploy Carbide Core
+
+This deploys the on-site gRPC API and all supporting services (DHCP, DNS, PXE,
+hardware health, SSH console, and optionally Unbound) into the `forge-system` namespace.
+
+There are two deployment methods: **Helm** (recommended) and **Kustomize** (legacy).
+
+### Helm (Recommended)
+
+See the [Helm chart README](../../../helm/README.md) for full documentation and
+[helm/PREREQUISITES.md](../../../helm/PREREQUISITES.md) for the Secrets and ConfigMaps
+that must exist before install.
+
+1. Copy `helm/examples/values-minimal.yaml` (or `values-full.yaml`) and customize:
+   * `global.image.repository` and `global.image.tag` -- your built core image
+   * `global.imagePullSecrets` -- if using a private registry
+   * `carbide-api.hostname` -- your API FQDN
+   * `carbide-api.siteConfig.carbideApiSiteConfig` -- site-specific TOML overrides
+   * MetalLB `externalService` annotations for each service VIP
+   * Kea DHCP configuration under `carbide-dhcp.config`
+
+2. Install:
+
+```bash
+helm upgrade --install carbide ./helm \
+  --namespace forge-system --create-namespace \
+  -f values-mysite.yaml
+```
+
+3. Verify:
+
+```bash
+kubectl -n forge-system get pods
+kubectl -n forge-system get certificates
+```
+
+The migration job runs automatically. Pods may briefly restart until the database is ready.
+
+### Kustomize (Alternative)
+
+See [deploy/README.md](../../../deploy/README.md) for the full list of inputs.
+Populate `deploy/kustomization.yaml` and `deploy/files/`, then:
+
+```bash
+cd deploy
+kustomize build . --enable-helm --enable-alpha-plugins --enable-exec | kubectl apply -f -
+```
+
+### Verify the API
+
+```bash
+curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/healthz
+```
+
+If the API VIP is not externally reachable:
+
+```bash
+kubectl port-forward svc/carbide-api 1079:1079 -n forge-system
+curl -k https://localhost:1079/healthz
+```
+
+---
+
+## 7. Install admin-cli
+
+Build from source in the `bare-metal-manager-core` repository:
+
+```bash
+cargo make build-cli
+```
+
+The binary is at `target/release/admin-cli`. Point it at your API:
+
+```bash
+admin-cli -c https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME> site info
+```
+
+If the API is not externally reachable:
+
+```bash
+kubectl port-forward svc/carbide-api 1079:1079 -n forge-system &
+admin-cli -c https://localhost:1079 site info
+```
+
+---
+
+## 8. Deploy Elektra Site Agent
+
+Elektra bridges the on-site Carbide core to the cloud REST layer via Temporal.
+
+1. Register a site through cloud-api or cloud-site-manager to get a `<SITE_UUID>`.
+
+2. Register the per-site Temporal namespace:
+
+```bash
+tctl --ns <SITE_UUID> namespace register
+```
+
+3. Generate an OTP for the site agent and create the bootstrap secret. The OTP is
+   issued by `cloud-site-manager` and stored as a Kubernetes secret in the
+   `elektra-site-agent` namespace:
+
+```bash
+# Issue a one-time password for the site
+OTP=$(curl -s -X POST https://<CLOUD_API_HOST>/api/v1/sites/<SITE_UUID>/otp \
+  -H "Authorization: Bearer <TOKEN>" | jq -r '.otp')
+
+kubectl -n elektra-site-agent create secret generic site-agent-bootstrap \
+  --from-literal=SITE_UUID=<SITE_UUID> \
+  --from-literal=OTP="$OTP" \
+  --from-literal=CLOUD_API_ENDPOINT=https://<CLOUD_API_HOST>
+```
+
+4. Update the image and site config in the site-agent manifests, then apply:
+
+```bash
+kubectl apply -k deploy/kustomize/base/site-agent
+```
+
+5. Verify:
+
+```bash
+kubectl -n elektra-site-agent get pods
+kubectl -n elektra-site-agent logs -l app=elektra --tail=50
+```
+
+---
+
+## 9. Ingest Hosts
+
+See [Ingesting Hosts](ingesting_machines.md) for the complete procedure.
+
+For each managed host, you need the **BMC MAC address**, **chassis serial number**, and
+**factory BMC username/password** (from your asset management system or server vendor).
+
+```bash
+# Set desired credentials BMM will apply to all hosts
+admin-cli -c <api-url> credential add-bmc --kind=site-wide-root --password='<PASSWORD>'
+admin-cli -c <api-url> credential add-uefi --kind=host --password='<PASSWORD>'
+
+# Upload expected machines manifest
+admin-cli -c <api-url> credential em replace-all --filename expected_machines.json
+
+# Approve for measured boot ingestion
+admin-cli -c <api-url> mb site trusted-machine approve \* persist --pcr-registers="0,3,5,6"
+```
+
+BMM then automatically: assigns IPs via DHCP, discovers BMCs via Redfish, rotates
+credentials, provisions DPUs, PXE-boots hosts into Scout for hardware discovery, and
+moves machines to the `Available` pool.
+
+Monitor progress:
+
+```bash
+admin-cli -c <api-url> machine list
+```
+
+---
+
+## 10. Verification
+
+Once hosts are `Available`, verify the full deployment:
+
+```bash
+# All core pods running
+kubectl -n forge-system get pods
+
+# API healthy
+curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/healthz
+
+# Machines discovered and available
+admin-cli -c <api-url> machine list
+
+# Admin UI accessible
+# https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME>/admin
+# Or via port-forward: kubectl port-forward svc/carbide-api 1079:1079 -n forge-system
+```
+
+To complete the hello-world test, create an instance to provision Ubuntu on a managed
+host, then SSH to verify:
+
+```bash
+ssh -p 22 <instance-id>@<CARBIDE_SSH_CONSOLE_EXTERNAL_IP>
+```
+
+---
+
+## Troubleshooting
+
+### Temporal Pods Stuck in Init
+
+Pods stuck in `Init:0/1` usually indicate that the Elasticsearch index is not ready.
+Check `kubectl -n temporal logs elasticsearch-master-0`.
+
+### kubectl Connection Refused
+
+When accessing the cluster through a jump host, forward the API server port: `ssh -L 6443:localhost:6443 <jump-host>`
+
+### External API Access Blocked
+
+Use port-forwarding: `kubectl port-forward svc/carbide-api 1079:1079 -n forge-system`
+
+### carbide-rest-site-manager Fails to Start
+
+`unable to start container process` indicates an entrypoint mismatch between
+`deployment.yaml` and the Dockerfile. Update `deployment.yaml` to the correct binary path.
+
+### Pods Stuck in ImagePullBackOff
+
+Missing `imagePullSecrets`. Verify: `kubectl -n <ns> get secret imagepullsecret`
+
+### nvcr.io/nvidian Image References
+
+Internal NVIDIA paths. Build from source (Step 1) and replace with your registry URL.
+
+### Machines Not Progressing
+
+Check state controller logs:
+`kubectl -n forge-system logs -l app=carbide-api --tail=100 | grep state_controller`
+
+Common causes: DHCP relay not configured on OOB switch, BMC MACs not matching the
+expected machines table, network boot not first in boot order.
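
Not part of the patch: for the `nvcr.io/nvidian` troubleshooting entry above, a minimal sketch of a scan that could catch leftover internal image references before applying manifests. `find_internal_refs` is a hypothetical helper name, and the sketch assumes a `grep` that supports `-r` (GNU and BSD grep both do).

```bash
# Hypothetical helper: list files under a directory (default: current directory)
# that still reference NVIDIA-internal image paths.
# Exits non-zero when nothing matches, following grep's own convention.
find_internal_refs() {
    grep -rl 'nvcr\.io/nvidian' "${1:-.}"
}
```

Running `find_internal_refs deploy/` from a repository checkout before `kubectl apply -k` would flag manifests that still need a registry update.
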
diff --git a/book/src/manuals/pushing_containers.md b/book/src/manuals/pushing_containers.md
new file mode 100644
index 0000000000..7e610e3738
--- /dev/null
+++ b/book/src/manuals/pushing_containers.md
@@ -0,0 +1,39 @@
+# Tagging and Pushing Containers to a Private Registry
+
+After building all NICo container images (see [Building NICo Containers](building_nico_containers.md)),
+tag them for your private registry and push. Set your registry URL and version tag as
+environment variables:
+
+```sh
+REGISTRY=<your-registry.example.com/carbide>
+TAG=<your-version-tag>
+```
+
+## Authenticate with Your Registry
+
+```sh
+docker login <your-registry.example.com>
+```
+
+## Tag and Push NICo Core Images
+
+```sh
+docker tag nico $REGISTRY/nvmetal-carbide:$TAG
+docker tag boot-artifacts-x86_64 $REGISTRY/boot-artifacts-x86_64:$TAG
+docker tag boot-artifacts-aarch64 $REGISTRY/boot-artifacts-aarch64:$TAG
+docker tag machine-validation-config $REGISTRY/machine-validation-config:$TAG
+
+docker push $REGISTRY/nvmetal-carbide:$TAG
+docker push $REGISTRY/boot-artifacts-x86_64:$TAG
+docker push $REGISTRY/boot-artifacts-aarch64:$TAG
+docker push $REGISTRY/machine-validation-config:$TAG
+```
+
+## Push BMM REST Images
+
+```sh
+for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
+             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager; do
+    docker push "$REGISTRY/$image:$TAG"
+done
+```
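
Not part of the patch: a minimal sketch of a dry-run helper for the core tag and push commands above. It prints the commands instead of running them, which makes the one rename (`nico` to `nvmetal-carbide`) easy to review. `plan_core_push` is a hypothetical name; the image pairs are copied from the "Tag and Push NICo Core Images" section.

```sh
# Hypothetical helper: print (do not run) the tag and push commands for the
# four core images. Each entry is local-name:remote-name.
plan_core_push() {
    registry=$1
    tag=$2
    for pair in nico:nvmetal-carbide \
                boot-artifacts-x86_64:boot-artifacts-x86_64 \
                boot-artifacts-aarch64:boot-artifacts-aarch64 \
                machine-validation-config:machine-validation-config; do
        local_name=${pair%%:*}    # text before the first colon
        remote_name=${pair#*:}    # text after the first colon
        echo "docker tag $local_name $registry/$remote_name:$tag"
        echo "docker push $registry/$remote_name:$tag"
    done
}
```

Once the printed commands look right, `plan_core_push "$REGISTRY" "$TAG" | sh` would execute them in order.
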
diff --git a/book/src/manuals/site-setup.md b/book/src/manuals/site-setup.md
index aa32e3c4ec..8932a6e207 100644
--- a/book/src/manuals/site-setup.md
+++ b/book/src/manuals/site-setup.md
@@ -74,17 +74,23 @@ These components are not required for NICo setup, but are recommended site metri
 
 The following services are installed during the NICo installation process.
 
-- **NICo core (forge‑system)**
+- **NICo core (forge-system)**
 
-  - nvmetal-carbide:v2025.07.04-rc2-0-8-g077781771 (primary carbide-api, plus supporting workloads)
+  - `<YOUR_REGISTRY>/nvmetal-carbide:<TAG>` (primary carbide-api, plus supporting workloads).
+    Build from [bare-metal-manager-core](https://github.com/NVIDIA/bare-metal-manager-core).
+    See [Building NICo Containers](building_nico_containers.md).
 
-- **cloud‑api**: cloud-api:v0.2.72 (two replicas)
+- **cloud-api**: `<YOUR_REGISTRY>/carbide-rest-api:<TAG>` (two replicas).
+  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
 
-- **cloud‑workflow**: cloud-workflow:v0.2.30 (cloud‑worker, site‑worker)
+- **cloud-workflow**: `<YOUR_REGISTRY>/carbide-rest-workflow:<TAG>` (cloud-worker, site-worker).
+  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
 
-- **cloud‑cert‑manager (credsmgr)**: cloud-cert-manager:v0.1.16
+- **cloud-cert-manager (credsmgr)**: `<YOUR_REGISTRY>/carbide-rest-cert-manager:<TAG>`.
+  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
 
-- **elektra-site-agent**: forge-elektra:v2025.06.20-rc1-0
+- **elektra-site-agent**: `<YOUR_REGISTRY>/carbide-rest-site-agent:<TAG>`.
+  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
 
 ## Order of Operations
 
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index 9955a80638..ccec86f1ab 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -26,9 +26,53 @@ helm install cert-manager jetstack/cert-manager \
 Required for PKI (certificate signing) and secret storage. Vault serves as the backend for the cert-manager issuer and provides secrets to various Carbide components.
 
 - Vault must be deployed and unsealed.
-- A PKI secrets engine must be configured for certificate signing.
+- A **PKI secrets engine** must be enabled at mount path **`forgeca`**:
+
+```bash
+vault secrets enable -path=forgeca pki
+vault secrets tune -max-lease-ttl=87600h forgeca
+```
+
+- A **PKI role** named **`forge-cluster`** must be created under the `forgeca` mount. This role name is referenced by `carbide-api` via the `VAULT_PKI_ROLE_NAME` environment variable:
+
+```bash
+vault write forgeca/roles/forge-cluster \
+  allowed_domains="forge.local,svc.cluster.local" \
+  allow_subdomains=true \
+  allow_bare_domains=true \
+  max_ttl=8760h
+```
+
+- **Kubernetes auth** must be enabled with a role for the **cert-manager service account**, so the `vault-forge-issuer` ClusterIssuer (Section 5) can authenticate to Vault:
+
+```bash
+vault auth enable kubernetes
+vault write auth/kubernetes/config \
+  kubernetes_host="https://kubernetes.default.svc:443"
+vault write auth/kubernetes/role/cert-manager \
+  bound_service_account_names=cert-manager \
+  bound_service_account_namespaces=cert-manager \
+  policies=forge-pki-policy \
+  ttl=1h
+```
+
+- A **Vault policy** must grant the cert-manager role permission to sign certificates:
+
+```bash
+vault policy write forge-pki-policy - <<EOF
+path "forgeca/sign/forge-cluster" {
+  capabilities = ["create", "update"]
+}
+path "forgeca/issue/forge-cluster" {
+  capabilities = ["create"]
+}
+EOF
+```
+
 - The `VAULT_SERVICE` URL must be provided to the cluster via a ConfigMap (see Section 4).
 
+For additional Vault configuration details, see the [Site Setup guide](../book/src/manuals/site-setup.md#vault-pki-and-secrets).
+
 ### External Secrets Operator (Optional)
 
 Can be used to synchronize secrets from Vault into Kubernetes automatically. This is not required if you create all necessary secrets manually (see Section 3).
@@ -46,11 +90,53 @@ If you want Prometheus metrics collection, install the [Prometheus Operator](htt
 
 An SSL-enabled PostgreSQL instance is required by `carbide-api` for persistent storage.
 
-- **Recommended:** Use a PostgreSQL operator such as Crunchy PGO or Zalando Postgres Operator to manage the database lifecycle.
-- **Database name:** `carbide` (configurable via values).
-- **Schema creation:** The migration job included in the `carbide-api` subchart handles schema creation and migrations automatically. You do not need to run migrations manually.
+- **Recommended:** Use a PostgreSQL operator such as Crunchy PGO or Zalando Postgres Operator (v1.10.1 with Spilo-15 image 3.0-p1 validated) to manage the database lifecycle.
+- **Database and user:** Create a dedicated database named `carbide` with a dedicated user named `carbide`. Do not use the default `postgres` superuser for Carbide services.
+- **Required extensions:** The following PostgreSQL extensions must be created before the migration job runs:
+
+```bash
+psql "postgres://<POSTGRES_USER>:<POSTGRES_PASSWORD>@<POSTGRES_HOST>:<POSTGRES_PORT>/<POSTGRES_DB>?sslmode=require" \
+  -c 'CREATE EXTENSION IF NOT EXISTS btree_gin;' \
+  -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;'
+```
+
+- **Schema creation:** The migration job included in the `carbide-api` subchart handles schema creation and migrations automatically after extensions are in place. You do not need to run migrations manually.
 - **Connection details:** Provided to the chart via a ConfigMap and a Secret (see Sections 3 and 4 below).
 
+For additional PostgreSQL configuration details (TLS, ESO integration, per-namespace credentials), see the [Site Setup guide](book/src/manuals/site-setup.md#postgresql-db).
+
+---
+
+## 2a. Temporal (Required for bare-metal-manager-rest only)
+
+Temporal is **not required** by the Carbide core Helm chart. You can operate Carbide core
+standalone using `admin-cli` with direct gRPC commands.
+
+Temporal **is required** if you deploy the
+[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) layer
+(cloud-api, cloud-workflow, site-manager, elektra-site-agent). The REST components use
+Temporal for workflow orchestration between the cloud control plane and site agents.
+
+If you plan to deploy bare-metal-manager-rest:
+
+- **Reference version:** Temporal server v1.22.6, admin tools v1.22.4, UI v2.16.2
+- **Visibility store:** Elasticsearch 7.17.3
+- **Persistence:** PostgreSQL (can share the same cluster as Carbide, with separate
+  databases `temporal` and `temporal_visibility`)
+- **Frontend endpoint:** `temporal-frontend.temporal.svc:7233` (cluster-internal)
+- **Required namespaces:** Register `cloud` and `site` after Temporal is running:
+
+```bash
+tctl --ns cloud namespace register
+tctl --ns site namespace register
+```
+
+- **mTLS:** The REST components expect Temporal client TLS certificates. These are
+  issued by the `vault-issuer` ClusterIssuer created by cloud-cert-manager (credsmgr),
+  which is part of bare-metal-manager-rest. See the
+  [End-to-End Installation Guide](book/src/manuals/installation-guide.md) for the
+  full deployment order.
+
 ---
 
 ## 3. Kubernetes Secrets

From 24d7f406a9761987779bed2389056dfd5020652f Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Tue, 10 Mar 2026 12:08:09 -0500
Subject: [PATCH 02/13] docs: add Vault AppRole/token generation steps to
 PREREQUISITES.md

Add step-by-step instructions for obtaining VAULT_ROLE_ID,
VAULT_SECRET_ID, and VAULT_TOKEN from Vault. These values were
previously listed as required but with no explanation of how to
generate them -- customers were blocked at this step.

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 helm/PREREQUISITES.md |  54 ++++++++++++++++--
 work.md               | 130 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+), 4 deletions(-)
 create mode 100644 work.md
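
The AppRole steps this patch adds to PREREQUISITES.md can be sanity-checked before the Kubernetes secret is created. A minimal sketch, assuming the `vault` CLI is on the PATH with `VAULT_ADDR` set and that the `carbide` AppRole from this patch already exists; `approle_login_check` is a hypothetical helper name, not something in the repository:

```bash
# Hypothetical helper: attempt an AppRole login with the given role ID and
# secret ID and print the resulting client token. A non-zero exit status
# means the arguments were missing or Vault rejected the credentials.
approle_login_check() {
  local role_id="$1" secret_id="$2"
  if [ -z "$role_id" ] || [ -z "$secret_id" ]; then
    echo "usage: approle_login_check <role-id> <secret-id>" >&2
    return 2
  fi
  vault write -field=token auth/approle/login \
    role_id="$role_id" secret_id="$secret_id"
}
```

Calling `approle_login_check '<role-id>' '<secret-id>'` with the values returned by the two commands in the patch should print a token if the role and its policies are in place.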

diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index ccec86f1ab..b8708a35ea 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -166,23 +166,69 @@ Vault AppRole credentials for automated secret access by Carbide services.
 
 **Required keys:** `VAULT_ROLE_ID`, `VAULT_SECRET_ID`
 
+To obtain these values, enable AppRole auth in Vault and create a role for Carbide:
+
+```bash
+vault auth enable approle
+
+vault write auth/approle/role/carbide \
+  token_policies="forge-pki-policy,forge-kv-policy" \
+  token_ttl=1h \
+  token_max_ttl=4h \
+  secret_id_ttl=0
+```
+
+Then read the role ID and generate a secret ID:
+
+```bash
+vault read -field=role_id auth/approle/role/carbide/role-id
+vault write -field=secret_id -f auth/approle/role/carbide/secret-id
+```
+
+Create the Kubernetes secret with the values returned above:
+
 ```bash
 kubectl create secret generic carbide-vault-approle-tokens \
   --namespace forge-system \
-  --from-literal=VAULT_ROLE_ID='<role-id>' \
-  --from-literal=VAULT_SECRET_ID='<secret-id>'
+  --from-literal=VAULT_ROLE_ID='<role-id-from-above>' \
+  --from-literal=VAULT_SECRET_ID='<secret-id-from-above>'
 ```
 
 ### `carbide-vault-token`
 
-Vault token for direct API access.
+Vault token for direct API access. This token is used by Carbide services that
+authenticate to Vault directly rather than via AppRole.
 
 **Required keys:** `VAULT_TOKEN`
 
+Generate a token with the policies Carbide needs:
+
+```bash
+vault token create \
+  -policy=forge-pki-policy \
+  -policy=forge-kv-policy \
+  -ttl=768h \
+  -display-name=carbide-api
+```
+
+The `token` field in the output is your `VAULT_TOKEN`. Create the Kubernetes secret:
+
 ```bash
 kubectl create secret generic carbide-vault-token \
   --namespace forge-system \
-  --from-literal=VAULT_TOKEN='<vault-token>'
+  --from-literal=VAULT_TOKEN='<token-from-above>'
+```
+
+**Note:** The policies referenced above (`forge-pki-policy`, `forge-kv-policy`) must
+be created first. See the [Vault section](#hashicorp-vault) above for the PKI policy.
+For the KV policy:
+
+```bash
+vault policy write forge-kv-policy - <<EOF
+path "secrets/*" {
+  capabilities = ["create", "read", "update", "delete", "list"]
+}
+EOF
 ```
 
 ### `ssh-host-key` (for carbide-ssh-console-rs)
diff --git a/work.md b/work.md
new file mode 100644
index 0000000000..c6ad6b3462
--- /dev/null
+++ b/work.md
@@ -0,0 +1,130 @@
+# PR Review Context: docs/end-to-end-installation-guide
+
+## What this PR does
+
+This PR improves the deployment documentation for NVIDIA Bare Metal Manager (Carbide)
+so external customers can actually deploy it end-to-end without needing internal NVIDIA
+access, Slack channels, or video recordings.
+
+## The problem
+
+Customers following the public repos (bare-metal-manager-core and bare-metal-manager-rest)
+hit three blockers:
+
+1. **nvcr.io/nvidian references**: `site-setup.md` listed container images from NVIDIA's
+   internal registry (nvcr.io/nvidian/nvforge-devel/...) with no instructions on how to
+   build them from source. This was filed as GitHub issue #476.
+
+2. **Vault/PostgreSQL gaps in helm/PREREQUISITES.md**: Customers asked (verbatim):
+   - "A PKI secrets engine is required for Vault. Is there any specific setting also?"
+   - "Do we have to create another one called 'carbide'?"
+   - "I don't see Temporal in prerequisites. Do carbide still need it?"
+   The answers were discoverable only by cross-referencing site-setup.md, Helm values
+   files, and the ClusterIssuer example.
+
+3. **No end-to-end guide**: The 10-step deployment flow (validated by SA teams at TLV01)
+   was only documented internally. Externally, customers had to piece together:
+   - building_bmm_containers.md (how to build core images)
+   - bare-metal-manager-rest README (how to build REST images)
+   - site-setup.md (foundation service baselines)
+   - helm/PREREQUISITES.md (secrets and configmaps)
+   - helm/README.md (Helm chart config)
+   - deploy/README.md (Kustomize alternative)
+   - ingesting_machines.md (host onboarding)
+   with no document explaining the ordering or the gaps between them.
+
+## Files changed (6 files, +685 -29)
+
+### New: book/src/manuals/installation-guide.md
+
+A lean "stitching" document that links to existing docs and fills gaps. Follows the
+exact 10-step sequence SA teams used during production deployments:
+
+1. Build and push containers (links to building_bmm_containers.md + REST README)
+2. Site controller and Kubernetes (links to site-reference-arch.md)
+3. Foundation services (links to site-setup.md + PREREQUISITES.md)
+4. Site CA, credsmgr, and Temporal (GAP FILLED: deployment order, vault commands)
+5. Deploy Carbide REST components (GAP FILLED: cloud-db, workflow, api, site-manager)
+6. Deploy Carbide core (links to helm/README.md, with Kustomize alternative)
+7. Install admin-cli (GAP FILLED: build from source, port-forward workaround)
+8. Deploy Elektra site agent (GAP FILLED: site registration, Temporal namespace, OTP)
+9. Ingest hosts (links to ingesting_machines.md)
+10. Verification (GAP FILLED: healthz, admin UI, hello-world test)
+
+Plus a troubleshooting section with real issues from SA deployments.
+
+### Updated: book/src/manuals/building_bmm_containers.md
+
+Added:
+- Container image summary table (image name, Dockerfile, purpose, architecture)
+- Which images are intermediate (don't push) vs deployable (must push)
+- "Tagging and Pushing to a Private Registry" section with docker tag/push commands
+- "Building BMM REST Containers" section with make docker-build + push loop
+- REST image summary table
+- Fixed typo: "perfrom" -> "perform", removed stray backtick in tar command
+
+### Updated: book/src/manuals/site-setup.md
+
+Replaced 5 lines referencing nvcr.io/nvidian/nvforge-devel/... with:
+- `<YOUR_REGISTRY>/image-name:<TAG>` placeholder format
+- "Build from [repo-name](github-link)" for each component
+This directly fixes GitHub issue #476.
+
+### Updated: helm/PREREQUISITES.md
+
+Added under "HashiCorp Vault":
+- PKI secrets engine must be at mount path `forgeca` (with vault commands)
+- PKI role must be named `forge-cluster` (with vault write command)
+- Kubernetes auth must have a role for cert-manager SA (with vault commands)
+- Vault policy for PKI signing (with vault policy write command)
+- Link to site-setup.md for additional details
+
+Added new section "2a. Temporal":
+- Temporal is NOT required for carbide-core (can use admin-cli with gRPC directly)
+- Temporal IS required for bare-metal-manager-rest
+- Reference versions, frontend endpoint, required namespaces, mTLS note
+
+Updated under "PostgreSQL Database":
+- Explicit: "Create a dedicated database named carbide with a dedicated user named
+  carbide. Do not use the default postgres superuser."
+- Added required extensions (btree_gin, pg_trgm) with psql command
+- Link to site-setup.md for additional details
+
+### Updated: book/src/SUMMARY.md
+
+Added "End-to-End Installation Guide" as the first entry under Manuals.
+
+### Updated: README.md
+
+Added installation guide link in Getting Started section. Tweaked existing link
+descriptions to be more specific.
+
+## Overlap with PR #479
+
+Larry Chen's PR #479 ("docs: remove private repos") also modifies site-setup.md to
+strip nvcr.io/nvidian prefixes. His change is narrower (just removes the prefix,
+leaving bare image names). Our change is broader (replaces with YOUR_REGISTRY
+placeholders and adds build-from-source links). Whoever merges second resolves the
+conflict on that file.
+
+## How to review
+
+1. Start with `book/src/manuals/installation-guide.md` -- does the 10-step flow make
+   sense? Does it match what you'd actually do deploying Carbide?
+
+2. Check `helm/PREREQUISITES.md` -- are the Vault commands correct? Is the Temporal
+   section accurate (optional for core, required for REST)?
+
+3. Check `book/src/manuals/building_bmm_containers.md` -- is the image summary table
+   complete? Are the tag/push commands right?
+
+4. Check `book/src/manuals/site-setup.md` -- are the replacement image names and repo
+   links correct?
+
+## Source material
+
+- SA deployment notes for TLV01 (Chelsea Isaac's 10-step guide)
+- Carbide Installation Walkthrough BYO K8s (~4000 line internal doc, sections 7.0-7.14)
+- Customer questions from SMC/Rafay partner deployments
+- GitHub issue #476 (nvcr.io references blocking customers)
+- Slack threads from #carbide-sa-enablement and ext-rafay channels

From 97b00ec0c14a1e84c639705ebea872dbedcf3003 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Tue, 10 Mar 2026 15:56:29 -0500
Subject: [PATCH 03/13] docs: fix stale BMM references after NICo rename

- Update site-setup.md, SUMMARY.md, installation-guide.md, and
  pushing_containers.md to use NICo naming and correct repo links
- Remove work.md from PR

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/SUMMARY.md                    |  14 ++-
 book/src/manuals/installation-guide.md |  10 +-
 book/src/manuals/pushing_containers.md |   2 +-
 book/src/manuals/site-setup.md         |  16 +--
 work.md                                | 130 -------------------------
 5 files changed, 23 insertions(+), 149 deletions(-)
 delete mode 100644 work.md

diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index ec4852b535..d996d29c8d 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -1,4 +1,4 @@
-# NCX Infra Controller
+# NVIDIA Bare Metal Manager
 
 - [Introduction](README.md)
 - [Hardware Compatbility List](hcl.md)
@@ -11,7 +11,6 @@
 - [Redfish Workflow](architecture/redfish_workflow.md)
     - [Redfish Endpoints Reference](architecture/redfish/endpoints_reference.md)
 - [Reliable state handling](architecture/state_handling.md)
-- [Networking integrations](architecture/networking_integrations.md)
 - [DPU configuration](architecture/dpu_configuration.md)
 - [Health checks and health aggregation](architecture/health_aggregation.md)
     - [Health probe IDs](architecture/health/health_probe_ids.md)
@@ -30,6 +29,7 @@
     - [Site Reference Architecture](manuals/site-reference-arch.md)
     - [Networking Requirements](manuals/networking_requirements.md)
 - [Building NICo Containers](manuals/building_nico_containers.md)
+- [Tagging and Pushing Containers](manuals/pushing_containers.md)
 - [Ingesting Hosts](manuals/ingesting_machines.md)
 - [Updating Expected Hosts Manifest](manuals/expected_machine_update.md)
 - [Host Validation](manuals/machine_validation.md)
@@ -40,13 +40,17 @@
   - [VPC Routing Profiles](manuals/vpc/vpc_routing_profiles.md)
   - [VPC Peering](manuals/vpc/vpc_peering_management.md)
 - [Metrics]()
-    - [Core metrics](manuals/metrics/core_metrics.md)
+    - [Core metrics](manuals/metrics/carbide_core_metrics.md)
+
+# Sites and site access
+
+- [carbide-admin-cli access](sites/forge_admin_cli.md)
 
 <!-- TODO: Add "Updating Hosts" and "Removing Hosts" pages. -->
 
 # Design
 
-- [SPIFFE SVID Design](design/machine-identity/spiffe-svid-sdd.md)
+- [SPIFFE SVID Design](design/spiffe-svid-sdd.md)
 
 # Development
 
@@ -68,7 +72,7 @@
 
 # Playbooks
 
-- [Azure OIDC for NCX Infra Controller-Web UI](playbooks/carbide_web_oauth2.md)
+- [Azure OIDC for NVIDIA Bare Metal Manager-Web UI](playbooks/carbide_web_oauth2.md)
 - [Force deleting and rebuilding Forge hosts](playbooks/force_delete.md)
 - [Rebooting a machine](playbooks/machine_reboot.md)
 - [Instance/Subnet/etc is stuck in a state]()
diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index 85bb0e503d..bb87c1f062 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -35,7 +35,7 @@ those are NVIDIA-internal paths not accessible externally. Replace them with you
 registry paths after building from source.
 ```
 
-### BMM Core
+### NICo Core
 
 Follow the [Building NICo Containers](building_nico_containers.md) guide for build steps,
 then [Tagging and Pushing Containers](pushing_containers.md) to push images to your
@@ -43,7 +43,7 @@ private registry. It covers
 prerequisites, build steps for x86_64 and aarch64, tagging, pushing to a private
 registry, and a summary table of all images produced.
 
-### BMM REST
+### NICo REST
 
 Clone [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest)
 and build with:
@@ -319,7 +319,7 @@ curl -k https://localhost:1079/healthz
 
 ## 7. Install admin-cli
 
-Build from source in the `bare-metal-manager-core` repository:
+Build from source in the `ncx-infra-controller-core` repository:
 
 ```bash
 cargo make build-cli
@@ -390,7 +390,7 @@ For each managed host, you need the **BMC MAC address**, **chassis serial number
 **factory BMC username/password** (from your asset management system or server vendor).
 
 ```bash
-# Set desired credentials BMM will apply to all hosts
+# Set desired credentials NICo will apply to all hosts
 admin-cli -c <api-url> credential add-bmc --kind=site-wide-root --password='<PASSWORD>'
 admin-cli -c <api-url> credential add-uefi --kind=host --password='<PASSWORD>'
 
@@ -401,7 +401,7 @@ admin-cli -c <api-url> credential em replace-all --filename expected_machines.js
 admin-cli -c <api-url> mb site trusted-machine approve \* persist --pcr-registers="0,3,5,6"
 ```
 
-BMM then automatically: assigns IPs via DHCP, discovers BMCs via Redfish, rotates
+NICo then automatically: assigns IPs via DHCP, discovers BMCs via Redfish, rotates
 credentials, provisions DPUs, PXE-boots hosts into Scout for hardware discovery, and
 moves machines to the `Available` pool.
 
diff --git a/book/src/manuals/pushing_containers.md b/book/src/manuals/pushing_containers.md
index 7e610e3738..9a4893f4eb 100644
--- a/book/src/manuals/pushing_containers.md
+++ b/book/src/manuals/pushing_containers.md
@@ -29,7 +29,7 @@ docker push $REGISTRY/boot-artifacts-aarch64:$TAG
 docker push $REGISTRY/machine-validation-config:$TAG
 ```
 
-## Tag and Push BMM REST Images
+## Tag and Push REST Images
 
 ```sh
 for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
diff --git a/book/src/manuals/site-setup.md b/book/src/manuals/site-setup.md
index 8932a6e207..3d8faaeed1 100644
--- a/book/src/manuals/site-setup.md
+++ b/book/src/manuals/site-setup.md
@@ -1,6 +1,6 @@
 # Site Setup Guide
 
-This page outlines the software dependencies for a Kubernetes-based install of NCX Infra Controller (NICo). It includes the *validated baseline* of software dependencies,
+This page outlines the software dependencies for a Kubernetes-based install of NVIDIA Bare Metal Manager (BMM). It includes the *validated baseline* of software dependencies,
 as well as the *order of operations* for site bringup, including what you must configure if you already operate some of the common services yourself.
 
 **Important Notes**
@@ -16,7 +16,7 @@ as well as the *order of operations* for site bringup, including what you must c
 
 ## Validated Baseline
 
-This section lists all software dependencies, including the versions validated for this release of NICo.
+This section lists all software dependencies, including the versions validated for this release of BMM.
 
 ### Kubernetes and Node Runtime
 
@@ -58,7 +58,7 @@ This section lists all software dependencies, including the versions validated f
 
 ### Monitoring and Telemetry (OPTIONAL)
 
-These components are not required for NICo setup, but are recommended site metrics.
+These components are not required for BMM setup, but are recommended for collecting site metrics.
 
 - **Monitoring System**:  Prometheus Operator v0.68.0; Prometheus v2.47.0; Alertmanager v0.26.0
 
@@ -70,15 +70,15 @@ These components are not required for NICo setup, but are recommended site metri
 
 - **Host Monitoring** Node exporter v1.6.1
 
-### NICo Components
+### BMM Components
 
-The following services are installed during the NICo installation process.
+The following services are installed during the BMM installation process.
 
-- **NICo core (forge-system)**
+- **NICo core (forge-system)**
 
   - `<YOUR_REGISTRY>/nvmetal-carbide:<TAG>` (primary carbide-api, plus supporting workloads).
-    Build from [bare-metal-manager-core](https://github.com/NVIDIA/bare-metal-manager-core).
-    See [Building BMM Containers](building_bmm_containers.md).
+    Build from [ncx-infra-controller-core](https://github.com/NVIDIA/ncx-infra-controller-core).
+    See [Building NICo Containers](building_nico_containers.md).
 
 - **cloud-api**: `<YOUR_REGISTRY>/carbide-rest-api:<TAG>` (two replicas).
   Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
diff --git a/work.md b/work.md
deleted file mode 100644
index c6ad6b3462..0000000000
--- a/work.md
+++ /dev/null
@@ -1,130 +0,0 @@
-# PR Review Context: docs/end-to-end-installation-guide
-
-## What this PR does
-
-This PR improves the deployment documentation for NVIDIA Bare Metal Manager (Carbide)
-so external customers can actually deploy it end-to-end without needing internal NVIDIA
-access, Slack channels, or video recordings.
-
-## The problem
-
-Customers following the public repos (bare-metal-manager-core and bare-metal-manager-rest)
-hit three blockers:
-
-1. **nvcr.io/nvidian references**: `site-setup.md` listed container images from NVIDIA's
-   internal registry (nvcr.io/nvidian/nvforge-devel/...) with no instructions on how to
-   build them from source. This was filed as GitHub issue #476.
-
-2. **Vault/PostgreSQL gaps in helm/PREREQUISITES.md**: Customers asked (verbatim):
-   - "A PKI secrets engine is required for Vault. Is there any specific setting also?"
-   - "Do we have to create another one called 'carbide'?"
-   - "I don't see Temporal in prerequisites. Do carbide still need it?"
-   The answers were discoverable only by cross-referencing site-setup.md, Helm values
-   files, and the ClusterIssuer example.
-
-3. **No end-to-end guide**: The 10-step deployment flow (validated by SA teams at TLV01)
-   was only documented internally. Externally, customers had to piece together:
-   - building_bmm_containers.md (how to build core images)
-   - bare-metal-manager-rest README (how to build REST images)
-   - site-setup.md (foundation service baselines)
-   - helm/PREREQUISITES.md (secrets and configmaps)
-   - helm/README.md (Helm chart config)
-   - deploy/README.md (Kustomize alternative)
-   - ingesting_machines.md (host onboarding)
-   with no document explaining the ordering or the gaps between them.
-
-## Files changed (6 files, +685 -29)
-
-### New: book/src/manuals/installation-guide.md
-
-A lean "stitching" document that links to existing docs and fills gaps. Follows the
-exact 10-step sequence SA teams used during production deployments:
-
-1. Build and push containers (links to building_bmm_containers.md + REST README)
-2. Site controller and Kubernetes (links to site-reference-arch.md)
-3. Foundation services (links to site-setup.md + PREREQUISITES.md)
-4. Site CA, credsmgr, and Temporal (GAP FILLED: deployment order, vault commands)
-5. Deploy Carbide REST components (GAP FILLED: cloud-db, workflow, api, site-manager)
-6. Deploy Carbide core (links to helm/README.md, with Kustomize alternative)
-7. Install admin-cli (GAP FILLED: build from source, port-forward workaround)
-8. Deploy Elektra site agent (GAP FILLED: site registration, Temporal namespace, OTP)
-9. Ingest hosts (links to ingesting_machines.md)
-10. Verification (GAP FILLED: healthz, admin UI, hello-world test)
-
-Plus a troubleshooting section with real issues from SA deployments.
-
-### Updated: book/src/manuals/building_bmm_containers.md
-
-Added:
-- Container image summary table (image name, Dockerfile, purpose, architecture)
-- Which images are intermediate (don't push) vs deployable (must push)
-- "Tagging and Pushing to a Private Registry" section with docker tag/push commands
-- "Building BMM REST Containers" section with make docker-build + push loop
-- REST image summary table
-- Fixed typo: "perfrom" -> "perform", removed stray backtick in tar command
-
-### Updated: book/src/manuals/site-setup.md
-
-Replaced 5 lines referencing nvcr.io/nvidian/nvforge-devel/... with:
-- `<YOUR_REGISTRY>/image-name:<TAG>` placeholder format
-- "Build from [repo-name](github-link)" for each component
-This directly fixes GitHub issue #476.
-
-### Updated: helm/PREREQUISITES.md
-
-Added under "HashiCorp Vault":
-- PKI secrets engine must be at mount path `forgeca` (with vault commands)
-- PKI role must be named `forge-cluster` (with vault write command)
-- Kubernetes auth must have a role for cert-manager SA (with vault commands)
-- Vault policy for PKI signing (with vault policy write command)
-- Link to site-setup.md for additional details
-
-Added new section "2a. Temporal":
-- Temporal is NOT required for carbide-core (can use admin-cli with gRPC directly)
-- Temporal IS required for bare-metal-manager-rest
-- Reference versions, frontend endpoint, required namespaces, mTLS note
-
-Updated under "PostgreSQL Database":
-- Explicit: "Create a dedicated database named carbide with a dedicated user named
-  carbide. Do not use the default postgres superuser."
-- Added required extensions (btree_gin, pg_trgm) with psql command
-- Link to site-setup.md for additional details
-
-### Updated: book/src/SUMMARY.md
-
-Added "End-to-End Installation Guide" as the first entry under Manuals.
-
-### Updated: README.md
-
-Added installation guide link in Getting Started section. Tweaked existing link
-descriptions to be more specific.
-
-## Overlap with PR #479
-
-Larry Chen's PR #479 ("docs: remove private repos") also modifies site-setup.md to
-strip nvcr.io/nvidian prefixes. His change is narrower (just removes the prefix,
-leaving bare image names). Our change is broader (replaces with YOUR_REGISTRY
-placeholders and adds build-from-source links). Whoever merges second resolves the
-conflict on that file.
-
-## How to review
-
-1. Start with `book/src/manuals/installation-guide.md` -- does the 10-step flow make
-   sense? Does it match what you'd actually do deploying Carbide?
-
-2. Check `helm/PREREQUISITES.md` -- are the Vault commands correct? Is the Temporal
-   section accurate (optional for core, required for REST)?
-
-3. Check `book/src/manuals/building_bmm_containers.md` -- is the image summary table
-   complete? Are the tag/push commands right?
-
-4. Check `book/src/manuals/site-setup.md` -- are the replacement image names and repo
-   links correct?
-
-## Source material
-
-- SA deployment notes for TLV01 (Chelsea Isaac's 10-step guide)
-- Carbide Installation Walkthrough BYO K8s (~4000 line internal doc, sections 7.0-7.14)
-- Customer questions from SMC/Rafay partner deployments
-- GitHub issue #476 (nvcr.io references blocking customers)
-- Slack threads from #carbide-sa-enablement and ext-rafay channels

From f5672e0f230b3248eb25eb4de7075a0750495546 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Sun, 19 Apr 2026 22:31:10 -0500
Subject: [PATCH 04/13] docs: add NGC account prerequisite for container builds

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/building_nico_containers.md | 8 ++++++++
 1 file changed, 8 insertions(+)
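
The NGC login step can also be run without putting the API key on the command line. A minimal sketch, assuming the key is held in an environment variable named `NGC_API_KEY` (an assumed name, not one the guide defines); `ngc_login` is a hypothetical helper:

```bash
# Hypothetical helper: log in to nvcr.io by piping the NGC API key to
# `docker login --password-stdin`, so the key does not appear in the
# process argument list. The username is the literal string $oauthtoken.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
}
```

Run `ngc_login` once before the `cargo make ... build-boot-artifacts-bfb-sa` step. Docker stores the login in its own configuration, so later pulls from `nvcr.io` reuse it.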

diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index ef2bb31d88..ba0439851a 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -29,6 +29,7 @@ Before you begin, ensure you have the following prerequisites:
 
 * An Ubuntu 24.04 Host or VM with 150GB+ of disk space (MacOS is not supported)
 * For REST containers: Go 1.25.4 or later, Docker 20.10+ with BuildKit enabled
+* An [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/) account (free), required for pulling base images such as the DOCA HBN container used in the aarch64 / DPU BFB build. Sign up at [ngc.nvidia.com](https://ngc.nvidia.com) and generate an API key under **API Keys** > **Generate Personal Key**.
 
 Use the following steps to install the prerequisite software on the Ubuntu Host or VM. These instructions
 assume an `apt`-based distribution such as Ubuntu 24.04.
@@ -123,6 +124,13 @@ BUILD_CONTAINER_X86_URL="nico-buildcontainer-x86_64" cargo make build-cli
 
 ### Building the DPU BFB
 
+The BFB build automatically pulls the HBN container from `nvcr.io`. You must
+authenticate with NGC before building:
+
+```sh
+docker login nvcr.io -u '$oauthtoken' -p '<NGC_API_KEY>'
+```
+
 ```sh
 cargo make --cwd pxe --env SA_ENABLEMENT=1 build-boot-artifacts-bfb-sa
 

From 6088910e6c490a58aa6d9a7b9fd8a8c61ceb1d76 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 07:47:47 -0500
Subject: [PATCH 05/13] docs: add missing REST images and repo link

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/installation-guide.md |  3 ++-
 book/src/manuals/pushing_containers.md | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index bb87c1f062..5998284471 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -55,7 +55,8 @@ TAG=<your-version-tag>
 make docker-build IMAGE_REGISTRY=$REGISTRY IMAGE_TAG=$TAG
 
 for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
-             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager; do
+             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager \
+             carbide-rla carbide-psm carbide-nsm; do
     docker push "$REGISTRY/$image:$TAG"
 done
 ```
diff --git a/book/src/manuals/pushing_containers.md b/book/src/manuals/pushing_containers.md
index 9a4893f4eb..0e2bab80d8 100644
--- a/book/src/manuals/pushing_containers.md
+++ b/book/src/manuals/pushing_containers.md
@@ -31,9 +31,22 @@ docker push $REGISTRY/machine-validation-config:$TAG
 
 ## Tag and Push REST Images
 
+REST images are built from the
+[ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest)
+repository. The `make docker-build` command tags images at build time when you pass
+`IMAGE_REGISTRY` and `IMAGE_TAG`:
+
+```sh
+cd /path/to/ncx-infra-controller-rest
+make docker-build IMAGE_REGISTRY=$REGISTRY IMAGE_TAG=$TAG
+```
+
+Then push all REST images:
+
 ```sh
 for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
-             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager; do
+             carbide-rest-site-agent carbide-rest-db carbide-rest-cert-manager \
+             carbide-rla carbide-psm carbide-nsm; do
     docker push "$REGISTRY/$image:$TAG"
 done
 ```

From 89552b2a6779907685c8f0fe7cbd8dd68fabfa15 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:01:58 -0500
Subject: [PATCH 06/13] docs: fix PKI role config and add missing REST images

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/building_nico_containers.md |  3 +++
 helm/PREREQUISITES.md                        | 12 ++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)
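
The revised PKI role settings can be read back from Vault to confirm they took effect. A minimal sketch, assuming an authenticated `vault` CLI and the `forgeca` mount and `forge-cluster` role named in PREREQUISITES.md; `pki_role_field` and `check_pki_role` are hypothetical helper names:

```bash
# Hypothetical helper: print a single field of the forge-cluster PKI role.
pki_role_field() {
  vault read -field="$1" forgeca/roles/forge-cluster
}

# Hypothetical helper: succeed only if the role matches the key settings
# written by this patch (EC P-256 keys, any name allowed).
check_pki_role() {
  [ "$(pki_role_field key_type)" = "ec" ] &&
  [ "$(pki_role_field key_bits)" = "256" ] &&
  [ "$(pki_role_field allow_any_name)" = "true" ]
}
```

For example, `check_pki_role && echo "PKI role OK"` prints the message only when all three fields match. The check covers a subset of the role options, so it is a quick smoke test rather than a full comparison.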

diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index ba0439851a..371455c31d 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -160,6 +160,9 @@ make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<
 | `carbide-rest-site-agent` | On-site agent (elektra) |
 | `carbide-rest-db` | Database migration job (runs once per upgrade) |
 | `carbide-rest-cert-manager` | Native PKI certificate manager (credsmgr) |
+| `carbide-rla` | Rack Level Abstraction service |
+| `carbide-psm` | Power Shelf Manager service |
+| `carbide-nsm` | NVSwitch Manager service |
 
 ## Next Steps
 
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index b8708a35ea..6a769676a5 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -37,10 +37,14 @@ vault secrets tune -max-lease-ttl=87600h forgeca
 
 ```bash
 vault write forgeca/roles/forge-cluster \
-  allowed_domains="forge.local,svc.cluster.local" \
-  allow_subdomains=true \
-  allow_bare_domains=true \
-  max_ttl=8760h
+  allow_any_name=true \
+  allowed_uri_sans="spiffe://*" \
+  max_ttl=720h \
+  ttl=720h \
+  key_type=ec \
+  key_bits=256 \
+  require_cn=false \
+  use_csr_common_name=true
 ```
 
 - **Kubernetes auth** must be enabled with a role for the **cert-manager service account**, so the `vault-forge-issuer` ClusterIssuer (Section 5) can authenticate to Vault:

From 6d0d64755210145efb268ce72f33cb8c806b010f Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:10:54 -0500
Subject: [PATCH 07/13] docs: fix REST repo references and installation guide
 accuracy

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/building_nico_containers.md |   4 +-
 book/src/manuals/installation-guide.md       | 198 +++++++++----------
 helm/PREREQUISITES.md                        |   8 +-
 3 files changed, 105 insertions(+), 105 deletions(-)

diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index 371455c31d..0c270db485 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -143,10 +143,10 @@ docker build --build-arg "CONTAINER_RUNTIME_AARCH64=alpine:latest" -t boot-artif
 
 The REST components (cloud-api, cloud-workflow, site-manager, site-agent,
 db migrations, cert-manager) are built from the
-[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) repository.
+[ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repository.
 
 ```sh
-cd bare-metal-manager-rest
+cd ncx-infra-controller-rest
 make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<your-version-tag>
 ```
 
diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index 5998284471..1d0f836db3 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -11,14 +11,14 @@ and SA teams during production deployments.
 
 | Step | What | Where to find details |
 |------|------|----------------------|
-| 1 | [Build and push all container images](#1-build-and-push-containers) | [Building NICo Containers](building_nico_containers.md), [REST README](https://github.com/NVIDIA/bare-metal-manager-rest#building-docker-images) |
+| 1 | [Build and push all container images](#1-build-and-push-containers) | [Building NICo Containers](building_nico_containers.md), [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) |
 | 2 | [Provision site controller OS and Kubernetes](#2-site-controller-and-kubernetes) | [Site Reference Architecture](site-reference-arch.md) |
 | 3 | [Deploy foundation services](#3-foundation-services) | [Site Setup](site-setup.md), [helm/PREREQUISITES.md](../../helm/PREREQUISITES.md) |
-| 4 | [Deploy site CA, credsmgr, and Temporal](#4-site-ca-credsmgr-and-temporal) | This guide |
-| 5 | [Deploy Carbide REST / cloud components](#5-deploy-carbide-rest-components) | This guide, [REST repo](https://github.com/NVIDIA/bare-metal-manager-rest) |
+| 4 | [Deploy site CA, credsmgr, and Temporal](#4-site-ca-credsmgr-and-temporal) | This guide, [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) |
+| 5 | [Deploy Carbide REST / cloud components](#5-deploy-carbide-rest-components) | This guide, [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) |
 | 6 | [Deploy Carbide core](#6-deploy-carbide-core) | [Helm README](../../helm/README.md), [deploy/README.md](../../deploy/README.md) |
 | 7 | [Install admin-cli](#7-install-admin-cli) | This guide |
-| 8 | [Deploy Elektra site agent](#8-deploy-elektra-site-agent) | This guide |
+| 8 | [Deploy Elektra site agent](#8-deploy-elektra-site-agent) | This guide, [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) |
 | 9 | [Ingest managed hosts](#9-ingest-hosts) | [Ingesting Hosts](ingesting_machines.md) |
 | 10 | [Verify end-to-end](#10-verification) | This guide |
 
@@ -45,7 +45,7 @@ registry, and a summary table of all images produced.
 
 ### NICo REST
 
-Clone [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest)
+Clone [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest)
 and build with:
 
 ```bash
@@ -61,7 +61,7 @@ for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
 done
 ```
 
-See the [bare-metal-manager-rest README](https://github.com/NVIDIA/bare-metal-manager-rest#building-docker-images)
+See the [ncx-infra-controller-rest README](https://github.com/NVIDIA/ncx-infra-controller-rest#building-docker-images)
 for the full list of images and build options.
 
 ---
@@ -127,56 +127,50 @@ These Vault configuration steps are documented in detail in
 This step sets up the certificate infrastructure that both the REST / cloud components
 and Temporal depend on.
 
-### 4.1 Create Site CA Secrets
+### 4.1 Create Site CA Secret
 
-Create root CA secrets in the `cert-manager` namespace:
+Generate a root CA and create the `ca-signing-secret` used by the
+`carbide-rest-ca-issuer` ClusterIssuer and credsmgr. From the
+`ncx-infra-controller-rest` repository:
 
 ```bash
-kubectl -n cert-manager create secret generic vault-root-ca-certificate \
-  --from-file=certificate=./cacert.pem
-kubectl -n cert-manager create secret generic vault-root-ca-private-key \
-  --from-file=privatekey=./ca.key
+./scripts/gen-site-ca.sh
 ```
 
-If you need to generate a self-signed root CA for testing:
+This creates a `kubernetes.io/tls` secret named `ca-signing-secret` in both the
+`carbide-rest` and `cert-manager` namespaces. Run `./scripts/gen-site-ca.sh --help`
+for options (custom CN, output to disk, dry-run).
 
-```bash
-openssl req -x509 -newkey rsa:4096 -keyout ca.key -out cacert.pem \
-  -sha256 -days 3650 -nodes -subj "/CN=Carbide Root CA"
-```
-
-### 4.2 Deploy cloud-cert-manager (credsmgr)
+### 4.2 Create carbide-rest-ca-issuer and deploy credsmgr
 
-credsmgr runs an embedded Vault process and creates the `vault-issuer` ClusterIssuer
-used for Temporal TLS certificates and cloud component mTLS.
-
-From the `bare-metal-manager-rest` repository, update images in
-`deploy/kustomize/base/cert-manager/kustomization.yaml` to point at your registry,
-then:
+Create the `carbide-rest-ca-issuer` ClusterIssuer (backed by `ca-signing-secret`
+from Step 4.1) and deploy credsmgr. From the `ncx-infra-controller-rest` repository:
 
 ```bash
+kubectl apply -k deploy/kustomize/base/cert-manager-io
 kubectl apply -k deploy/kustomize/base/cert-manager
-kubectl get clusterissuer vault-issuer
+kubectl get clusterissuer carbide-rest-ca-issuer
 ```
 
-Verify the `vault-issuer` shows `Ready=True` before proceeding.
+Verify `carbide-rest-ca-issuer` shows `Ready=True` before proceeding.
 
 ### 4.3 Provision Temporal TLS Certificates
 
-Apply Temporal certificate manifests (client certs for `cloud-api` and `cloud-workflow`,
-server certs for the `temporal` namespace). These manifests are in the
-`bare-metal-manager-rest` repository under `deploy/kustomize/base/temporal-certs`:
+Apply the Temporal namespace, database credentials, and mTLS certificate manifests.
+From the `ncx-infra-controller-rest` repository:
 
 ```bash
-kubectl apply -k deploy/kustomize/base/temporal-certs
+kubectl apply -f deploy/kustomize/base/temporal-helm/namespace.yaml
+kubectl apply -f deploy/kustomize/base/temporal-helm/db-creds.yaml
+kubectl apply -k deploy/kustomize/base/common
 ```
 
-Verify:
+Verify the mTLS certificates are issued:
 
 ```bash
-kubectl -n cloud-api      get certificate temporal-client-cloud-certs
-kubectl -n cloud-workflow  get certificate temporal-client-cloud-certs
-kubectl -n temporal        get secret server-cloud-certs server-interservice-certs server-site-certs
+kubectl wait --for=condition=Ready certificate/server-interservice-cert -n temporal --timeout=120s
+kubectl wait --for=condition=Ready certificate/server-cloud-cert -n temporal --timeout=120s
+kubectl wait --for=condition=Ready certificate/server-site-cert -n temporal --timeout=120s
 ```
 
 ### 4.4 Deploy Temporal
@@ -184,11 +178,25 @@ kubectl -n temporal        get secret server-cloud-certs server-interservice-cer
 Deploy Temporal server v1.22.6 with Elasticsearch 7.17.3 for visibility.
 Use the TLS certificates provisioned above for mTLS.
 
-After all Temporal pods are `Running`, register the required namespaces:
+After all Temporal pods are `Running`, register the required namespaces via
+`temporal-admintools`:
 
 ```bash
-tctl --ns cloud namespace register
-tctl --ns site namespace register
+kubectl exec -n temporal deploy/temporal-admintools -- \
+  temporal operator namespace create --namespace cloud \
+    --address temporal-frontend.temporal:7233 \
+    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
+    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
+    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
+    --tls-server-name interservice.server.temporal.local
+
+kubectl exec -n temporal deploy/temporal-admintools -- \
+  temporal operator namespace create --namespace site \
+    --address temporal-frontend.temporal:7233 \
+    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
+    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
+    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
+    --tls-server-name interservice.server.temporal.local
 ```
 
 ```{note}
@@ -203,55 +211,32 @@ healthy, or create the index manually.
 
 The REST / cloud layer provides the customer-facing API, workflow orchestration, and
 site management. Deploy from the
-[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) repository.
-
-For each component below, update the image reference in `kustomization.yaml` to
-your registry and adjust ConfigMaps for your Postgres, Temporal, and Vault endpoints.
+[ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repository.
 
-### 5.1 Database Migrations (cloud-db)
-
-Initializes the cloud database schema. This is a one-time job:
+All REST components deploy into the `carbide-rest` namespace via a single Helm
+umbrella chart:
 
 ```bash
-kubectl apply -k deploy/kustomize/base/db
-kubectl -n cloud-db get jobs -w
+helm upgrade --install carbide-rest helm/charts/carbide-rest \
+  --namespace carbide-rest --create-namespace \
+  -f <your-ncx-rest-values.yaml> \
+  --set global.image.repository=<your-registry> \
+  --set global.image.tag=<your-rest-tag> \
+  --timeout 600s --wait
 ```
 
-Wait for the job to complete before proceeding.
-
-### 5.2 cloud-workflow
-
-Deploys `cloud-worker` and `site-worker` Temporal workers:
-
-```bash
-kubectl apply -k deploy/kustomize/base/cloud-workflow
-kubectl -n cloud-workflow get pods
-```
+This deploys: `carbide-rest-api`, `carbide-rest-workflow` (cloud-worker and
+site-worker), `carbide-rest-site-manager`, `carbide-rest-db` (migration job),
+`carbide-rest-cert-manager` (credsmgr), and Keycloak (dev IdP).
 
-Both deployments should reach `Running`.
-
-### 5.3 cloud-api
-
-The customer-facing REST API:
-
-```bash
-kubectl apply -k deploy/kustomize/base/cloud-api
-kubectl -n cloud-api get pods
-```
-
-### 5.4 cloud-site-manager
-
-The site registry service:
+Verify:
 
 ```bash
-kubectl apply -k deploy/kustomize/base/site-manager
+kubectl get pods -n carbide-rest
 ```
 
-```{note}
-If `carbide-rest-site-manager` fails with `unable to start container process`, the
-entrypoint in `deployment.yaml` does not match the production Dockerfile. Update
-`deployment.yaml` to use the correct binary path.
-```
+All pods should reach `Running`, and the db-migration job pod should show
+`Completed`.
 
 ---
 
@@ -344,41 +329,55 @@ admin-cli -c https://localhost:1079 site info
 ## 8. Deploy Elektra Site Agent
 
 Elektra bridges the on-site Carbide core to the cloud REST layer via Temporal.
+It deploys as a StatefulSet in the `carbide-rest` namespace.
 
-1. Register a site through cloud-api or cloud-site-manager to get a `<SITE_UUID>`.
-
-2. Register the per-site Temporal namespace:
+1. Pre-apply the gRPC client certificate so it exists before the pod starts:
 
 ```bash
-tctl --ns <SITE_UUID> namespace register
+helm template carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
+  --namespace carbide-rest \
+  -f <your-site-agent-values.yaml> \
+  --set global.image.repository=<your-registry> \
+  --set global.image.tag=<your-rest-tag> \
+  --show-only templates/certificate.yaml | kubectl apply -f -
+
+kubectl wait --for=condition=Ready certificate/core-grpc-client-site-agent-certs \
+  -n carbide-rest --timeout=120s
 ```
 
-3. Generate an OTP for the site agent and create the bootstrap secret. The OTP is
-   issued by `cloud-site-manager` and stored as a Kubernetes secret in the
-   `elektra-site-agent` namespace:
+2. Create the per-site Temporal namespace (the site-agent panics without it):
 
 ```bash
-# Issue a one-time password for the site
-OTP=$(curl -s -X POST https://<CLOUD_API_HOST>/api/v1/sites/<SITE_UUID>/otp \
-  -H "Authorization: Bearer <TOKEN>" | jq -r '.otp')
-
-kubectl -n elektra-site-agent create secret generic site-agent-bootstrap \
-  --from-literal=SITE_UUID=<SITE_UUID> \
-  --from-literal=OTP="$OTP" \
-  --from-literal=CLOUD_API_ENDPOINT=https://<CLOUD_API_HOST>
+SITE_UUID=<your-site-uuid>
+
+kubectl exec -n temporal deploy/temporal-admintools -- \
+  temporal operator namespace create --namespace "$SITE_UUID" \
+    --address temporal-frontend.temporal:7233 \
+    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
+    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
+    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
+    --tls-server-name interservice.server.temporal.local
 ```
 
-4. Update the image and site config in the site-agent manifests, then apply:
+3. Install the site-agent Helm chart (the pre-install hook registers the site
+   and creates the `site-registration` secret):
 
 ```bash
-kubectl apply -k deploy/kustomize/base/site-agent
+helm upgrade --install carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
+  --namespace carbide-rest \
+  -f <your-site-agent-values.yaml> \
+  --set global.image.repository=<your-registry> \
+  --set global.image.tag=<your-rest-tag> \
+  --set "envConfig.CLUSTER_ID=$SITE_UUID" \
+  --set "envConfig.TEMPORAL_SUBSCRIBE_NAMESPACE=$SITE_UUID" \
+  --timeout 300s --wait
 ```
 
-5. Verify:
+4. Verify:
 
 ```bash
-kubectl -n elektra-site-agent get pods
-kubectl -n elektra-site-agent logs -l app=elektra --tail=50
+kubectl get pods -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent
+kubectl logs -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent --tail=20
 ```
 
 ---
@@ -459,8 +458,9 @@ Use port-forwarding: `kubectl port-forward svc/carbide-api 1079:1079 -n forge-sy
 
 ### carbide-rest-site-manager Fails to Start
 
-`unable to start container process` -- entrypoint mismatch between `deployment.yaml`
-and the Dockerfile. Update to the correct binary path.
+`unable to start container process` -- verify the image was built with the production
+Dockerfile (`docker/production/Dockerfile.carbide-rest-site-manager`), not the local
+dev Dockerfile.
 
 ### Pods Stuck in ImagePullBackOff
 
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index 6a769676a5..28b52303a9 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -111,17 +111,17 @@ For additional PostgreSQL configuration details (TLS, ESO integration, per-names
 
 ---
 
-## 2a. Temporal (Required for bare-metal-manager-rest only)
+## 2a. Temporal (Required for ncx-infra-controller-rest only)
 
 Temporal is **not required** by the Carbide core Helm chart. You can operate Carbide core
 standalone using `admin-cli` with direct gRPC commands.
 
 Temporal **is required** if you deploy the
-[bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest) layer
+[ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) layer
 (cloud-api, cloud-workflow, site-manager, elektra-site-agent). The REST components use
 Temporal for workflow orchestration between the cloud control plane and site agents.
 
-If you plan to deploy bare-metal-manager-rest:
+If you plan to deploy ncx-infra-controller-rest:
 
 - **Reference version:** Temporal server v1.22.6, admin tools v1.22.4, UI v2.16.2
 - **Visibility store:** Elasticsearch 7.17.3
@@ -137,7 +137,7 @@ tctl --ns site namespace register
 
 - **mTLS:** The REST components expect Temporal client TLS certificates. These are
   issued by the `vault-issuer` ClusterIssuer created by cloud-cert-manager (credsmgr),
-  which is part of bare-metal-manager-rest. See the
+  which is part of ncx-infra-controller-rest. See the
   [End-to-End Installation Guide](book/src/manuals/installation-guide.md) for the
   full deployment order.
 

From 2b163afa47fb255581c0949cf63aa930fb155125 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:14:15 -0500
Subject: [PATCH 08/13] docs: fix Temporal cert source and Keycloak deployment
 in install guide

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/installation-guide.md | 27 ++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index 1d0f836db3..0af5442521 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -156,16 +156,24 @@ Verify `carbide-rest-ca-issuer` shows `Ready=True` before proceeding.
 
 ### 4.3 Provision Temporal TLS Certificates
 
-Apply the Temporal namespace, database credentials, and mTLS certificate manifests.
-From the `ncx-infra-controller-rest` repository:
+Apply the Temporal namespace, database credentials, and mTLS server certificate
+manifests. From the `ncx-infra-controller-rest` repository:
+
+```bash
+kubectl apply -k deploy/kustomize/base/temporal-helm
+```
+
+This creates the `temporal` namespace, database credentials, and three server
+mTLS certificates (`server-interservice-cert`, `server-cloud-cert`,
+`server-site-cert`) issued by `carbide-rest-ca-issuer`.
+
+Then apply the common resources (Temporal client certs for the REST workers):
 
 ```bash
-kubectl apply -f deploy/kustomize/base/temporal-helm/namespace.yaml
-kubectl apply -f deploy/kustomize/base/temporal-helm/db-creds.yaml
 kubectl apply -k deploy/kustomize/base/common
 ```
 
-Verify the mTLS certificates are issued:
+Verify the server certificates are issued:
 
 ```bash
 kubectl wait --for=condition=Ready certificate/server-interservice-cert -n temporal --timeout=120s
@@ -227,7 +235,14 @@ helm upgrade --install carbide-rest helm/charts/carbide-rest \
 
 This deploys: `carbide-rest-api`, `carbide-rest-workflow` (cloud-worker and
 site-worker), `carbide-rest-site-manager`, `carbide-rest-db` (migration job),
-`carbide-rest-cert-manager` (credsmgr), and Keycloak (dev IdP).
+and `carbide-rest-cert-manager` (credsmgr).
+
+If you need a dev IdP, deploy Keycloak separately before the umbrella chart:
+
+```bash
+kubectl apply -k deploy/kustomize/base/keycloak -n carbide-rest
+kubectl rollout status deployment/keycloak -n carbide-rest --timeout=300s
+```
 
 Verify:
 

From eb3c5e7d26eb00cbb486455cdc5af37b26d46725 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:26:00 -0500
Subject: [PATCH 09/13] docs: address review findings across PR 512

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/building_nico_containers.md |  2 +-
 book/src/manuals/installation-guide.md       | 38 +++++++++-----------
 helm/PREREQUISITES.md                        | 12 +++----
 3 files changed, 23 insertions(+), 29 deletions(-)

diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index 0c270db485..5b6b5c6725 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -28,7 +28,7 @@ accessible by your Kubernetes cluster.
 Before you begin, ensure you have the following prerequisites:
 
 * An Ubuntu 24.04 Host or VM with 150GB+ of disk space (MacOS is not supported)
-* For REST containers: Go 1.25.4 or later, Docker 20.10+ with BuildKit enabled
+* For REST containers: Go (see `go.mod` in the REST repo for the required version), Docker 20.10+ with BuildKit enabled
 * An [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/) account (free). Required for pulling base images such as the DOCA HBN container used in the aarch64 / DPU BFB build. Sign up at [ngc.nvidia.com](https://ngc.nvidia.com) and generate an API key under **API Keys** > **Generate Personal Key**.
 
 Use the following steps to install the prerequisite software on the Ubuntu Host or VM. These instructions
diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index 0af5442521..aba251b1fa 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -191,22 +191,16 @@ After all Temporal pods are `Running`, register the required namespaces via
 
 ```bash
 kubectl exec -n temporal deploy/temporal-admintools -- \
-  temporal operator namespace create --namespace cloud \
-    --address temporal-frontend.temporal:7233 \
-    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
-    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
-    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
-    --tls-server-name interservice.server.temporal.local
+  temporal operator namespace create cloud --address temporal-frontend.temporal:7233
 
 kubectl exec -n temporal deploy/temporal-admintools -- \
-  temporal operator namespace create --namespace site \
-    --address temporal-frontend.temporal:7233 \
-    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
-    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
-    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
-    --tls-server-name interservice.server.temporal.local
+  temporal operator namespace create site --address temporal-frontend.temporal:7233
 ```
 
+If your Temporal deployment uses mTLS, add the TLS flags to each command:
+`--tls-cert-path`, `--tls-key-path`, `--tls-ca-path`, `--tls-server-name`.
+See `helm-prereqs/SETUP_PHASES.md` for the full mTLS example.
+
 ```{note}
 If Temporal pods are stuck in `Init:0/1`, the Elasticsearch index may not be ready.
 Check `kubectl -n temporal logs elasticsearch-master-0` and wait for ES to become
@@ -240,7 +234,7 @@ and `carbide-rest-cert-manager` (credsmgr).
 If you need a dev IdP, deploy Keycloak separately before the umbrella chart:
 
 ```bash
-kubectl apply -k deploy/kustomize/base/keycloak -n carbide-rest
+(cd <ncx-infra-controller-rest> && kubectl apply -k deploy/kustomize/base/keycloak)
 kubectl rollout status deployment/keycloak -n carbide-rest --timeout=300s
 ```
 
@@ -326,17 +320,17 @@ Build from source in the `ncx-infra-controller-core` repository:
 cargo make build-cli
 ```
 
-The binary is at `target/release/admin-cli`. Point it at your API:
+The binary is at `target/release/carbide-admin-cli`. Point it at your API:
 
 ```bash
-admin-cli -c https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME> site info
+carbide-admin-cli -c https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME> site info
 ```
 
 If the API is not externally reachable:
 
 ```bash
 kubectl port-forward svc/carbide-api 1079:1079 -n forge-system &
-admin-cli -c https://localhost:1079 site info
+carbide-admin-cli -c https://localhost:1079 site info
 ```
 
 ---
@@ -406,14 +400,14 @@ For each managed host, you need the **BMC MAC address**, **chassis serial number
 
 ```bash
 # Set desired credentials NICo will apply to all hosts
-admin-cli -c <api-url> credential add-bmc --kind=site-wide-root --password='<PASSWORD>'
-admin-cli -c <api-url> credential add-uefi --kind=host --password='<PASSWORD>'
+carbide-admin-cli -c <api-url> credential add-bmc --kind=site-wide-root --password='<PASSWORD>'
+carbide-admin-cli -c <api-url> credential add-uefi --kind=host --password='<PASSWORD>'
 
 # Upload expected machines manifest
-admin-cli -c <api-url> credential em replace-all --filename expected_machines.json
+carbide-admin-cli -c <api-url> expected-machine replace-all --filename expected_machines.json
 
 # Approve for measured boot ingestion
-admin-cli -c <api-url> mb site trusted-machine approve \* persist --pcr-registers="0,3,5,6"
+carbide-admin-cli -c <api-url> mb site trusted-machine approve \* persist --pcr-registers="0,3,5,6"
 ```
 
 NICo then automatically: assigns IPs via DHCP, discovers BMCs via Redfish, rotates
@@ -423,7 +417,7 @@ moves machines to the `Available` pool.
 Monitor progress:
 
 ```bash
-admin-cli -c <api-url> machine list
+carbide-admin-cli -c <api-url> machine list
 ```
 
 ---
@@ -440,7 +434,7 @@ kubectl -n forge-system get pods
 curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/healthz
 
 # Machines discovered and available
-admin-cli -c <api-url> machine list
+carbide-admin-cli -c <api-url> machine list
 
 # Admin UI accessible
 # https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME>/admin
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index 28b52303a9..1730a59977 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -75,7 +75,7 @@ EOF
 
 - The `VAULT_SERVICE` URL must be provided to the cluster via a ConfigMap (see Section 4).
 
-For additional Vault configuration details, see the [Site Setup guide](book/src/manuals/site-setup.md#vault-pki-and-secrets).
+For additional Vault configuration details, see the [Site Setup guide](../book/src/manuals/site-setup.md#vault-pki-and-secrets).
 
 ### External Secrets Operator (Optional)
 
@@ -107,7 +107,7 @@ psql "postgres://<POSTGRES_USER>:<POSTGRES_PASSWORD>@<POSTGRES_HOST>:<POSTGRES_P
 - **Schema creation:** The migration job included in the `carbide-api` subchart handles schema creation and migrations automatically after extensions are in place. You do not need to run migrations manually.
 - **Connection details:** Provided to the chart via a ConfigMap and a Secret (see Sections 3 and 4 below).
 
-For additional PostgreSQL configuration details (TLS, ESO integration, per-namespace credentials), see the [Site Setup guide](book/src/manuals/site-setup.md#postgresql-db).
+For additional PostgreSQL configuration details (TLS, ESO integration, per-namespace credentials), see the [Site Setup guide](../book/src/manuals/site-setup.md#postgresql-db).
 
 ---
 
@@ -131,14 +131,14 @@ If you plan to deploy ncx-infra-controller-rest:
 - **Required namespaces:** Register `cloud` and `site` after Temporal is running:
 
 ```bash
-tctl --ns cloud namespace register
-tctl --ns site namespace register
+temporal operator namespace create --namespace cloud --address temporal-frontend.temporal:7233
+temporal operator namespace create --namespace site --address temporal-frontend.temporal:7233
 ```
 
 - **mTLS:** The REST components expect Temporal client TLS certificates. These are
-  issued by the `vault-issuer` ClusterIssuer created by cloud-cert-manager (credsmgr),
+  issued by the `carbide-rest-ca-issuer` ClusterIssuer backed by `ca-signing-secret`,
   which is part of ncx-infra-controller-rest. See the
-  [End-to-End Installation Guide](book/src/manuals/installation-guide.md) for the
+  [End-to-End Installation Guide](../book/src/manuals/installation-guide.md) for the
   full deployment order.
 
 ---

From 342983ba82f2dde09a6491d7a8336db8ad042e7f Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:28:20 -0500
Subject: [PATCH 10/13] docs: align step 8 Temporal namespace with step 4.4
 pattern

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/installation-guide.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index aba251b1fa..31dd83c052 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -360,14 +360,11 @@ kubectl wait --for=condition=Ready certificate/core-grpc-client-site-agent-certs
 SITE_UUID=<your-site-uuid>
 
 kubectl exec -n temporal deploy/temporal-admintools -- \
-  temporal operator namespace create --namespace "$SITE_UUID" \
-    --address temporal-frontend.temporal:7233 \
-    --tls-cert-path /var/secrets/temporal/certs/server-interservice/tls.crt \
-    --tls-key-path /var/secrets/temporal/certs/server-interservice/tls.key \
-    --tls-ca-path /var/secrets/temporal/certs/server-interservice/ca.crt \
-    --tls-server-name interservice.server.temporal.local
+  temporal operator namespace create "$SITE_UUID" --address temporal-frontend.temporal:7233
 ```
 
+If your Temporal deployment uses mTLS, add the TLS flags as described in Step 4.4.
+
 3. Install the site-agent Helm chart (the pre-install hook registers the site
    and creates the `site-registration` secret):
 

From 82347d1c97c013438770a4c36b738e3b543771f1 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 08:37:56 -0500
Subject: [PATCH 11/13] docs: fix vault token key, ssh public key, healthz
 route, KV engine step

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/installation-guide.md |  6 +++---
 helm/PREREQUISITES.md                  | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index 31dd83c052..d3242830be 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -300,14 +300,14 @@ kustomize build . --enable-helm --enable-alpha-plugins --enable-exec | kubectl a
 ### Verify the API
 
 ```bash
-curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/healthz
+curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/
 ```
 
 If the API VIP is not externally reachable:
 
 ```bash
 kubectl port-forward svc/carbide-api 1079:1079 -n forge-system
-curl -k https://localhost:1079/healthz
+curl -k https://localhost:1079/
 ```
 
 ---
@@ -428,7 +428,7 @@ Once hosts are `Available`, verify the full deployment:
 kubectl -n forge-system get pods
 
 # API healthy
-curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/healthz
+curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/
 
 # Machines discovered and available
 carbide-admin-cli -c <api-url> machine list
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index 1730a59977..d10e406cf7 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -220,13 +220,22 @@ The `token` field in the output is your `VAULT_TOKEN`. Create the Kubernetes sec
 ```bash
 kubectl create secret generic carbide-vault-token \
   --namespace forge-system \
-  --from-literal=VAULT_TOKEN='<token-from-above>'
+  --from-literal=token='<token-from-above>'
 ```
 
 **Note:** The policies referenced above (`forge-pki-policy`, `forge-kv-policy`) must
 be created first. See the [Vault section](#hashicorp-vault) above for the PKI policy.
 For the KV policy:
 
+First, enable the KV v2 secrets engine at the `secrets` mount path (must match
+`FORGE_VAULT_MOUNT` in the `vault-cluster-info` ConfigMap):
+
+```bash
+vault secrets enable -version=2 -path=secrets kv
+```
+
+Then create the policy:
+
 ```bash
 vault policy write forge-kv-policy - <<EOF
 path "secrets/*" {
@@ -245,7 +254,8 @@ SSH host key used by the console proxy service. This key must be generated ahead
 ssh-keygen -t ed25519 -f /tmp/ssh_host_ed25519_key -N ""
 kubectl create secret generic ssh-host-key \
   --namespace forge-system \
-  --from-file=ssh_host_ed25519_key=/tmp/ssh_host_ed25519_key
+  --from-file=ssh_host_ed25519_key=/tmp/ssh_host_ed25519_key \
+  --from-file=ssh_host_ed25519_key_pub=/tmp/ssh_host_ed25519_key.pub
 ```
 
 ### `azure-sso-carbide-web-client-secret` (Optional -- only if using OAuth2)

From 098b0278e2a3956a8068ca2dd9316d9074622623 Mon Sep 17 00:00:00 2001
From: vigneshv <vigneshv@nvidia.com>
Date: Mon, 20 Apr 2026 09:29:29 -0500
Subject: [PATCH 12/13] docs: clean up REST image descriptions and DB secret
 key docs

Signed-off-by: vigneshv <vigneshv@nvidia.com>
---
 book/src/manuals/building_nico_containers.md | 8 ++++----
 helm/PREREQUISITES.md                        | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index 5b6b5c6725..b1e679f346 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -155,11 +155,11 @@ make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<
 | Image | Purpose |
 |-------|---------|
 | `carbide-rest-api` | REST API server (port 8388) |
-| `carbide-rest-workflow` | Temporal workflow worker (cloud-worker, site-worker) |
-| `carbide-rest-site-manager` | Site management / registry service |
-| `carbide-rest-site-agent` | On-site agent (elektra) |
+| `carbide-rest-workflow` | Temporal workflow worker |
+| `carbide-rest-site-manager` | Site management and registry service |
+| `carbide-rest-site-agent` | On-site Temporal agent |
 | `carbide-rest-db` | Database migration job (runs once per upgrade) |
-| `carbide-rest-cert-manager` | Native PKI certificate manager (credsmgr) |
+| `carbide-rest-cert-manager` | PKI certificate manager |
 | `carbide-rla` | Rack Level Abstraction service |
 | `carbide-psm` | Power Shelf Manager service |
 | `carbide-nsm` | NVSwitch Manager service |
diff --git a/helm/PREREQUISITES.md b/helm/PREREQUISITES.md
index d10e406cf7..1db7ad1990 100644
--- a/helm/PREREQUISITES.md
+++ b/helm/PREREQUISITES.md
@@ -151,7 +151,9 @@ All secrets should be created in the `forge-system` namespace (or whichever name
 
 Database credentials for `carbide-api`.
 
-**Required keys:** `username`, `password`, `host`, `port`, `dbname`, `uri`
+**Required keys:** `username`, `password`
+
+The Helm chart reads only `username` and `password` from this secret; connection host, port, and database name come from the `forge-system-carbide-database-config` ConfigMap (Section 4). The additional keys below (`host`, `port`, `dbname`, `uri`) are optional conveniences for manual `psql` access or ESO integration.
 
 ```bash
 kubectl create secret generic forge-system.carbide.forge-pg-cluster.credentials \

From 48225a1988094f78ed87d034cbf6f9b01644660d Mon Sep 17 00:00:00 2001
From: Peter Gambrill <pgambrill@nvidia.com>
Date: Mon, 20 Apr 2026 18:12:53 -0700
Subject: [PATCH 13/13] Copyedits and rebase fixes for end-to-end install
 changes

Signed-off-by: Peter Gambrill <pgambrill@nvidia.com>
---
 book/src/SUMMARY.md                          |  15 +-
 book/src/manuals/building_nico_containers.md |  10 +-
 book/src/manuals/installation-guide.md       | 244 +++++++++----------
 book/src/manuals/pushing_containers.md       |  15 +-
 book/src/manuals/site-setup.md               |  37 +--
 5 files changed, 162 insertions(+), 159 deletions(-)

diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index d996d29c8d..7c54009ddc 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -1,7 +1,7 @@
-# NVIDIA Bare Metal Manager
+# NCX Infra Controller
 
 - [Introduction](README.md)
-- [Hardware Compatbility List](hcl.md)
+- [Hardware Compatibility List](hcl.md)
 - [Release Notes](release-notes.md)
 - [FAQs](faq.md)
 
@@ -11,6 +11,7 @@
 - [Redfish Workflow](architecture/redfish_workflow.md)
     - [Redfish Endpoints Reference](architecture/redfish/endpoints_reference.md)
 - [Reliable state handling](architecture/state_handling.md)
+- [Networking integrations](architecture/networking_integrations.md)
 - [DPU configuration](architecture/dpu_configuration.md)
 - [Health checks and health aggregation](architecture/health_aggregation.md)
     - [Health probe IDs](architecture/health/health_probe_ids.md)
@@ -40,17 +41,13 @@
   - [VPC Routing Profiles](manuals/vpc/vpc_routing_profiles.md)
   - [VPC Peering](manuals/vpc/vpc_peering_management.md)
 - [Metrics]()
-    - [Core metrics](manuals/metrics/carbide_core_metrics.md)
-
-# Sites and site access
-
-- [carbide-admin-cli access](sites/forge_admin_cli.md)
+    - [Core metrics](manuals/metrics/core_metrics.md)
 
 <!-- TODO: Add "Updating Hosts" and "Removing Hosts" pages. -->
 
 # Design
 
-- [SPIFFE SVID Design](design/spiffe-svid-sdd.md)
+- [SPIFFE SVID Design](design/machine-identity/spiffe-svid-sdd.md)
 
 # Development
 
@@ -72,7 +69,7 @@
 
 # Playbooks
 
-- [Azure OIDC for NVIDIA Bare Metal Manager-Web UI](playbooks/carbide_web_oauth2.md)
+- [Azure OIDC for the NCX Infra Controller Web UI](playbooks/carbide_web_oauth2.md)
 - [Force deleting and rebuilding Forge hosts](playbooks/force_delete.md)
 - [Rebooting a machine](playbooks/machine_reboot.md)
 - [Instance/Subnet/etc is stuck in a state]()
diff --git a/book/src/manuals/building_nico_containers.md b/book/src/manuals/building_nico_containers.md
index b1e679f346..f5644b1206 100644
--- a/book/src/manuals/building_nico_containers.md
+++ b/book/src/manuals/building_nico_containers.md
@@ -1,7 +1,7 @@
 # Building NICo Containers
 
 This section provides instructions for building the containers for NCX Infra Controller (NICo).
-For the complete deployment workflow, see the [End-to-End Installation Guide](installation-guide.md).
+For the complete deployment workflow, refer to the [End-to-End Installation Guide](installation-guide.md).
 
 ## Container Image Summary
 
@@ -28,8 +28,8 @@ accessible by your Kubernetes cluster.
 Before you begin, ensure you have the following prerequisites:
 
 * An Ubuntu 24.04 Host or VM with 150GB+ of disk space (MacOS is not supported)
-* For REST containers: Go (see `go.mod` in the REST repo for the required version), Docker 20.10+ with BuildKit enabled
-* An [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/) account (free). Required for pulling base images such as the DOCA HBN container used in the aarch64 / DPU BFB build. Sign up at [ngc.nvidia.com](https://ngc.nvidia.com) and generate an API key under **API Keys** > **Generate Personal Key**.
+* For REST containers: Go (refer to the `go.mod` file in the [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) for the current required version), Docker 20.10+ with BuildKit enabled
+* An [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/) account (free). Required for pulling base images such as the DOCA HBN container used in the aarch64/DPU BFB build. Sign up at [ngc.nvidia.com](https://ngc.nvidia.com) and generate an API key under **API Keys** > **Generate Personal Key**.
 
 Use the following steps to install the prerequisite software on the Ubuntu Host or VM. These instructions
 assume an `apt`-based distribution such as Ubuntu 24.04.
@@ -166,5 +166,5 @@ make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<
 
 ## Next Steps
 
-After building all images, tag and push them to your private registry.
-See [Tagging and Pushing Containers](pushing_containers.md).
+After building all images, you will need to tag them and push them to your private registry.
+Refer to the [Tagging and Pushing Containers](pushing_containers.md) section for more details.
diff --git a/book/src/manuals/installation-guide.md b/book/src/manuals/installation-guide.md
index d3242830be..779b169fe9 100644
--- a/book/src/manuals/installation-guide.md
+++ b/book/src/manuals/installation-guide.md
@@ -4,8 +4,8 @@ This guide ties together the build, deploy, and configuration steps needed to go
 a ready Kubernetes cluster to your first provisioned bare-metal host. It links to
 existing documentation for each major step and fills the gaps between them.
 
-The order of operations below follows the sequence validated by NVIDIA engineering
-and SA teams during production deployments.
+The order of operations below has been validated by NVIDIA engineering
+and SA teams for production deployments.
 
 ## Order of Operations
 
@@ -26,7 +26,7 @@ and SA teams during production deployments.
 
 ## 1. Build and Push Containers
 
-All container images must be built from source and pushed to a registry your cluster
+All container images must be built from source and pushed to a registry that your cluster
 can access. There are no pre-built public images available.
 
 ```{note}
@@ -37,16 +37,15 @@ registry paths after building from source.
 
 ### NICo Core
 
-Follow the [Building NICo Containers](building_nico_containers.md) guide for build steps,
-then [Tagging and Pushing Containers](pushing_containers.md) to push images to your
-private registry. It covers
-prerequisites, build steps for x86_64 and aarch64, tagging, pushing to a private
+Follow the [Building NICo Containers](building_nico_containers.md) guide to build the container images,
+then follow the [Tagging and Pushing Containers](pushing_containers.md) guide to push the images to your
+private registry. These guides cover prerequisites, build steps for x86_64 and aarch64, tagging, pushing to a private
 registry, and a summary table of all images produced.
 
 ### NICo REST
 
-Clone [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest)
-and build with:
+Clone the [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repo and build the container images
+as follows:
 
 ```bash
 REGISTRY=<your-registry.example.com/carbide>
@@ -61,77 +60,73 @@ for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
 done
 ```
 
-See the [ncx-infra-controller-rest README](https://github.com/NVIDIA/ncx-infra-controller-rest#building-docker-images)
+Refer to the [ncx-infra-controller-rest README](https://github.com/NVIDIA/ncx-infra-controller-rest#building-docker-images)
 for the full list of images and build options.
 
 ---
 
 ## 2. Site Controller and Kubernetes
 
-Customers are expected to provision their own site controller OS and Kubernetes cluster.
+You will need to provision your own site controller OS and Kubernetes cluster.
 
-See the [Site Reference Architecture](site-reference-arch.md) for hardware requirements,
-Kubernetes versions, networking best practices, and IP pool sizing.
+Refer to the [Site Reference Architecture](site-reference-arch.md) section for hardware requirements,
+Kubernetes versions, networking best practices, and IP pool sizing recommendations.
 
-In summary, you need:
+In summary, you will need the following:
 
 * 3 or 5 site controller nodes running Ubuntu 24.04 LTS with Kubernetes v1.30.x
 * CNI (Calico v3.28.1 validated), ingress controller (Contour), load balancer (MetalLB)
 * OOB switch VLANs with DHCP relay pointing at the Carbide DHCP service VIP
-* In-band ToR switches with BGP unnumbered on DPU-facing ports, EVPN enabled
-* IP pools allocated per the reference architecture
+* In-band ToR switches with BGP unnumbered on DPU-facing ports, with EVPN enabled
+* IP pools allocated per the Site Reference Architecture recommendations
 
 ---
 
 ## 3. Foundation Services
 
-Deploy the following services before any Carbide components. The order within this
-step matters.
+Deploy the following services before any Carbide components.
 
-**For baselines and versions**, see [Site Setup](site-setup.md).
+* *For baselines and versions*, refer to the [Site Setup](site-setup.md) section.
 
-**For the Secrets, ConfigMaps, and ClusterIssuer** that the Helm chart expects, see
-[helm/PREREQUISITES.md](../../helm/PREREQUISITES.md) -- it provides `kubectl create`
-commands for every required resource.
+* *For the Secrets, ConfigMaps, and ClusterIssuer* that the Helm chart expects, refer to
+the [helm/PREREQUISITES.md](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/helm/PREREQUISITES.md)
+file, which provides the `kubectl create` commands for every required resource.
 
-Deploy in this order:
+Deploy the services in this order:
 
-1. **External Secrets Operator (ESO)** -- optional, but simplifies secret management.
-   If you skip ESO, create all Kubernetes Secrets manually.
+1. **External Secrets Operator (ESO)**: This service is optional, but simplifies secret management.
+   If you skip ESO, you will need to create all Kubernetes Secrets manually.
 
-2. **cert-manager** (v1.11.1+) with approver-policy (v0.6.3). Create the
-   `vault-forge-issuer` ClusterIssuer as described in
-   [helm/PREREQUISITES.md](../../helm/PREREQUISITES.md#5-clusterissuer).
+2. **cert-manager** (v1.11.1+) with approver-policy (v0.6.3): Create the
+   `vault-forge-issuer` ClusterIssuer as described in the
+   [helm/PREREQUISITES.md](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/helm/PREREQUISITES.md#5-clusterissuer) file.
 
-3. **PostgreSQL** -- SSL-enabled, with required extensions:
+3. **PostgreSQL**: SSL-enabled, with extensions. Create the required extensions using the following command:
 
-```bash
-psql "postgres://<USER>:<PASS>@<HOST>:<PORT>/<DB>?sslmode=require" \
-  -c 'CREATE EXTENSION IF NOT EXISTS btree_gin;' \
-  -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;'
-```
+   ```bash
+   psql "postgres://<USER>:<PASS>@<HOST>:<PORT>/<DB>?sslmode=require" \
+     -c 'CREATE EXTENSION IF NOT EXISTS btree_gin;' \
+     -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;'
+   ```
 
-4. **Vault** -- deployed and unsealed, with:
-   * PKI secrets engine at mount path **`forgeca`**
-   * PKI role named **`forge-cluster`**
+4. **Vault**: Deployed and unsealed, with the following configuration:
+   * PKI secrets engine at mount path `forgeca`
+   * PKI role named `forge-cluster`
    * Kubernetes auth enabled with a role for the cert-manager service account
-   * Vault policy granting sign/issue capabilities
-
-These Vault configuration steps are documented in detail in
-[helm/PREREQUISITES.md](../../helm/PREREQUISITES.md#hashicorp-vault).
+   * Vault policy granting sign/issue capabilities (refer to the [Site Setup](site-setup.md#vault-pki-and-secrets) section for more details)
 
 ---
 
 ## 4. Site CA, credsmgr, and Temporal
 
-This step sets up the certificate infrastructure that both the REST / cloud components
+Next, set up the certificate infrastructure that both the REST cloud components
 and Temporal depend on.
 
 ### 4.1 Create Site CA Secret
 
 Generate a root CA and create the `ca-signing-secret` used by the
-`carbide-rest-ca-issuer` ClusterIssuer and credsmgr. From the
-`ncx-infra-controller-rest` repository:
+`carbide-rest-ca-issuer` ClusterIssuer and credsmgr. Run the following command
+from the `ncx-infra-controller-rest` repository:
 
 ```bash
 ./scripts/gen-site-ca.sh
@@ -141,10 +136,11 @@ This creates a `kubernetes.io/tls` secret named `ca-signing-secret` in both the
 `carbide-rest` and `cert-manager` namespaces. Run `./scripts/gen-site-ca.sh --help`
 for options (custom CN, output to disk, dry-run).
 
-### 4.2 Create carbide-rest-ca-issuer and deploy credsmgr
+### 4.2 Create carbide-rest-ca-issuer and Deploy credsmgr
 
 Create the `carbide-rest-ca-issuer` ClusterIssuer (backed by `ca-signing-secret`
-from Step 4.1) and deploy credsmgr. From the `ncx-infra-controller-rest` repository:
+from Step 4.1) and deploy credsmgr. Run the following commands from the `ncx-infra-controller-rest`
+repository:
 
 ```bash
 kubectl apply -k deploy/kustomize/base/cert-manager-io
@@ -152,12 +148,14 @@ kubectl apply -k deploy/kustomize/base/cert-manager
 kubectl get clusterissuer carbide-rest-ca-issuer
 ```
 
-Verify `carbide-rest-ca-issuer` shows `Ready=True` before proceeding.
+Verify that `carbide-rest-ca-issuer` shows `Ready=True` before proceeding.
 
 ### 4.3 Provision Temporal TLS Certificates
 
 Apply the Temporal namespace, database credentials, and mTLS server certificate
-manifests. From the `ncx-infra-controller-rest` repository:
+manifests.
+
+First, run the following command from the `ncx-infra-controller-rest` repository:
 
 ```bash
 kubectl apply -k deploy/kustomize/base/temporal-helm
@@ -167,13 +165,13 @@ This creates the `temporal` namespace, database credentials, and three server
 mTLS certificates (`server-interservice-cert`, `server-cloud-cert`,
 `server-site-cert`) issued by `carbide-rest-ca-issuer`.
 
-Then apply the common resources (Temporal client certs for the REST workers):
+Next, apply the common resources (Temporal client certs for the REST workers):
 
 ```bash
 kubectl apply -k deploy/kustomize/base/common
 ```
 
-Verify the server certificates are issued:
+Verify that the server certificates have been issued:
 
 ```bash
 kubectl wait --for=condition=Ready certificate/server-interservice-cert -n temporal --timeout=120s
@@ -199,20 +197,20 @@ kubectl exec -n temporal deploy/temporal-admintools -- \
 
 If your Temporal deployment uses mTLS, add the TLS flags to each command:
 `--tls-cert-path`, `--tls-key-path`, `--tls-ca-path`, `--tls-server-name`.
-See `helm-prereqs/SETUP_PHASES.md` for the full mTLS example.
+Refer to `helm-prereqs/SETUP_PHASES.md` for the full mTLS example.
 
 ```{note}
 If Temporal pods are stuck in `Init:0/1`, the Elasticsearch index may not be ready.
-Check `kubectl -n temporal logs elasticsearch-master-0` and wait for ES to become
-healthy, or create the index manually.
+Check the logs using `kubectl -n temporal logs elasticsearch-master-0` and wait for
+Elasticsearch to become healthy, or create the index manually.
 ```
 
 ---
 
 ## 5. Deploy Carbide REST Components
 
-The REST / cloud layer provides the customer-facing API, workflow orchestration, and
-site management. Deploy from the
+The REST cloud layer provides the customer-facing API, along with workflow orchestration and
+site management. The components are built from the
 [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repository.
 
 All REST components deploy into the `carbide-rest` namespace via a single Helm
@@ -227,7 +225,7 @@ helm upgrade --install carbide-rest helm/charts/carbide-rest \
   --timeout 600s --wait
 ```
 
-This deploys: `carbide-rest-api`, `carbide-rest-workflow` (cloud-worker and
+This deploys the following: `carbide-rest-api`, `carbide-rest-workflow` (cloud-worker and
 site-worker), `carbide-rest-site-manager`, `carbide-rest-db` (migration job),
 and `carbide-rest-cert-manager` (credsmgr).
 
@@ -238,7 +236,7 @@ If you need a dev IdP, deploy Keycloak separately before the umbrella chart:
 kubectl rollout status deployment/keycloak -n carbide-rest --timeout=300s
 ```
 
-Verify:
+Verify the deployment as follows:
 
 ```bash
 kubectl get pods -n carbide-rest
@@ -258,19 +256,19 @@ There are two deployment methods: **Helm** (recommended) and **Kustomize** (lega
 
 ### Helm (Recommended)
 
-See the [Helm chart README](../../helm/README.md) for full documentation and
-[helm/PREREQUISITES.md](../../helm/PREREQUISITES.md) for the Secrets and ConfigMaps
+Refer to the [Helm chart README](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/helm/README.md) for full documentation and
+[helm/PREREQUISITES.md](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/helm/PREREQUISITES.md) for the Secrets and ConfigMaps
 that must exist before install.
 
-1. Copy `helm/examples/values-minimal.yaml` (or `values-full.yaml`) and customize:
-   * `global.image.repository` and `global.image.tag` -- your built core image
-   * `global.imagePullSecrets` -- if using a private registry
-   * `carbide-api.hostname` -- your API FQDN
-   * `carbide-api.siteConfig.carbideApiSiteConfig` -- site-specific TOML overrides
-   * MetalLB `externalService` annotations for each service VIP
-   * Kea DHCP configuration under `carbide-dhcp.config`
+1. Copy `helm/examples/values-minimal.yaml` (or `values-full.yaml`) and customize the following values:
+   * `global.image.repository` and `global.image.tag`: Your built core image
+   * `global.imagePullSecrets`: If using a private registry, add the secret name here
+   * `carbide-api.hostname`: Your API FQDN
+   * `carbide-api.siteConfig.carbideApiSiteConfig`: Site-specific TOML overrides
+   * `externalService`: MetalLB annotations for each service VIP
+   * `carbide-dhcp.config`: Add your Kea DHCP configuration in this section
 
-2. Install:
+2. Install the Helm chart:
 
 ```bash
 helm upgrade --install carbide ./helm \
@@ -278,7 +276,7 @@ helm upgrade --install carbide ./helm \
   -f values-mysite.yaml
 ```
 
-3. Verify:
+3. Verify the deployment as follows:
 
 ```bash
 kubectl -n forge-system get pods
@@ -289,8 +287,8 @@ The migration job runs automatically. Pods may briefly restart until the databas
 
 ### Kustomize (Alternative)
 
-See [deploy/README.md](../../deploy/README.md) for the full list of inputs.
-Populate `deploy/kustomization.yaml` and `deploy/files/`, then:
+Refer to [deploy/README.md](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/deploy/README.md) for the full list of inputs.
+Populate `deploy/kustomization.yaml` and `deploy/files/`, then run the following command:
 
 ```bash
 cd deploy
@@ -303,7 +301,7 @@ kustomize build . --enable-helm --enable-alpha-plugins --enable-exec | kubectl a
 curl -k https://<CARBIDE_API_EXTERNAL_IP>:1079/
 ```
 
-If the API VIP is not externally reachable:
+If the API VIP is not externally reachable, you can use port-forwarding to access it locally:
 
 ```bash
 kubectl port-forward svc/carbide-api 1079:1079 -n forge-system
@@ -314,19 +312,19 @@ curl -k https://localhost:1079/
 
 ## 7. Install admin-cli
 
-Build from source in the `ncx-infra-controller-core` repository:
+Build the admin-cli from source in the `ncx-infra-controller-core` repository:
 
 ```bash
 cargo make build-cli
 ```
 
-The binary is at `target/release/carbide-admin-cli`. Point it at your API:
+The binary is located at `target/release/carbide-admin-cli`. Point it to your API as follows:
 
 ```bash
 carbide-admin-cli -c https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME> site info
 ```
 
-If the API is not externally reachable:
+If the API is not externally reachable, you can use port-forwarding to access it locally:
 
 ```bash
 kubectl port-forward svc/carbide-api 1079:1079 -n forge-system &
@@ -342,58 +340,58 @@ It deploys as a StatefulSet in the `carbide-rest` namespace.
 
 1. Pre-apply the gRPC client certificate so it exists before the pod starts:
 
-```bash
-helm template carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
-  --namespace carbide-rest \
-  -f <your-site-agent-values.yaml> \
-  --set global.image.repository=<your-registry> \
-  --set global.image.tag=<your-rest-tag> \
-  --show-only templates/certificate.yaml | kubectl apply -f -
+   ```bash
+   helm template carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
+     --namespace carbide-rest \
+     -f <your-site-agent-values.yaml> \
+     --set global.image.repository=<your-registry> \
+     --set global.image.tag=<your-rest-tag> \
+     --show-only templates/certificate.yaml | kubectl apply -f -
 
-kubectl wait --for=condition=Ready certificate/core-grpc-client-site-agent-certs \
-  -n carbide-rest --timeout=120s
-```
+   kubectl wait --for=condition=Ready certificate/core-grpc-client-site-agent-certs \
+     -n carbide-rest --timeout=120s
+   ```
 
 2. Create the per-site Temporal namespace (the site-agent panics without it):
 
-```bash
-SITE_UUID=<your-site-uuid>
+   ```bash
+   SITE_UUID=<your-site-uuid>
 
-kubectl exec -n temporal deploy/temporal-admintools -- \
-  temporal operator namespace create "$SITE_UUID" --address temporal-frontend.temporal:7233
-```
+   kubectl exec -n temporal deploy/temporal-admintools -- \
+     temporal operator namespace create "$SITE_UUID" --address temporal-frontend.temporal:7233
+   ```
 
-If your Temporal deployment uses mTLS, add the TLS flags as described in Step 4.4.
+   If your Temporal deployment uses mTLS, add the TLS flags as described in Step 4.4.
 
 3. Install the site-agent Helm chart (the pre-install hook registers the site
    and creates the `site-registration` secret):
 
-```bash
-helm upgrade --install carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
-  --namespace carbide-rest \
-  -f <your-site-agent-values.yaml> \
-  --set global.image.repository=<your-registry> \
-  --set global.image.tag=<your-rest-tag> \
-  --set "envConfig.CLUSTER_ID=$SITE_UUID" \
-  --set "envConfig.TEMPORAL_SUBSCRIBE_NAMESPACE=$SITE_UUID" \
-  --timeout 300s --wait
-```
+   ```bash
+   helm upgrade --install carbide-rest-site-agent helm/charts/carbide-rest-site-agent \
+     --namespace carbide-rest \
+     -f <your-site-agent-values.yaml> \
+     --set global.image.repository=<your-registry> \
+     --set global.image.tag=<your-rest-tag> \
+     --set "envConfig.CLUSTER_ID=$SITE_UUID" \
+     --set "envConfig.TEMPORAL_SUBSCRIBE_NAMESPACE=$SITE_UUID" \
+     --timeout 300s --wait
+   ```
 
-4. Verify:
+4. Verify the deployment as follows:
 
-```bash
-kubectl get pods -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent
-kubectl logs -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent --tail=20
-```
+   ```bash
+   kubectl get pods -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent
+   kubectl logs -n carbide-rest -l app.kubernetes.io/name=carbide-rest-site-agent --tail=20
+   ```
 
 ---
 
 ## 9. Ingest Hosts
 
-See [Ingesting Hosts](ingesting_machines.md) for the complete procedure.
+Refer to the [Ingesting Hosts](ingesting_machines.md) section for the complete ingestion procedure.
 
-For each managed host, you need the **BMC MAC address**, **chassis serial number**, and
-**factory BMC username/password** (from your asset management system or server vendor).
+For each managed host, you need the BMC MAC address, chassis serial number, and
+factory BMC username/password (from your asset management system or server vendor).
 
 ```bash
 # Set desired credentials NICo will apply to all hosts
@@ -407,11 +405,11 @@ carbide-admin-cli -c <api-url> expected-machine replace-all --filename expected_
 carbide-admin-cli -c <api-url> mb site trusted-machine approve \* persist --pcr-registers="0,3,5,6"
 ```
 
-NICo then automatically: assigns IPs via DHCP, discovers BMCs via Redfish, rotates
-credentials, provisions DPUs, PXE-boots hosts into Scout for hardware discovery, and
+NICo then automatically assigns IPs via DHCP, discovers BMCs via Redfish, rotates
+credentials, provisions DPUs, PXE-boots hosts into Scout for hardware discovery, and then
 moves machines to the `Available` pool.
 
-Monitor progress:
+Monitor progress as follows:
 
 ```bash
 carbide-admin-cli -c <api-url> machine list
@@ -439,7 +437,7 @@ carbide-admin-cli -c <api-url> machine list
 ```
 
 To complete the hello-world test, create an instance to provision Ubuntu on a managed
-host, then SSH to verify:
+host, then use SSH to verify:
 
 ```bash
 ssh -p 22 <instance-id>@<CARBIDE_SSH_CONSOLE_EXTERNAL_IP>
@@ -451,34 +449,36 @@ ssh -p 22 <instance-id>@<CARBIDE_SSH_CONSOLE_EXTERNAL_IP>
 
 ### Temporal Pods Stuck in Init
 
-Pods stuck in `Init:0/1` -- usually Elasticsearch index not ready.
-Check `kubectl -n temporal logs elasticsearch-master-0`.
+If Temporal pods are stuck in `Init:0/1`, the Elasticsearch index may not be ready.
+Check the logs using `kubectl -n temporal logs elasticsearch-master-0` and wait for
+Elasticsearch to become healthy, or create the index manually.
 
 ### kubectl Connection Refused
 
-When accessing through a jump host: `ssh -L 6443:localhost:6443 <jump-host>`
+When accessing the cluster through a jump host, forward the Kubernetes API port as follows: `ssh -L 6443:localhost:6443 <jump-host>`
 
 ### External API Access Blocked
 
-Use port-forwarding: `kubectl port-forward svc/carbide-api 1079:1079 -n forge-system`
+Use port-forwarding as follows: `kubectl port-forward svc/carbide-api 1079:1079 -n forge-system`
 
 ### carbide-rest-site-manager Fails to Start
 
-`unable to start container process` -- verify the image was built with the production
-Dockerfile (`docker/production/Dockerfile.carbide-rest-site-manager`), not the local
-dev Dockerfile.
+If the `carbide-rest-site-manager` pod fails with `unable to start container process`, verify that the image was built with the production
+Dockerfile (`docker/production/Dockerfile.carbide-rest-site-manager`), not with the local dev Dockerfile.
 
 ### Pods Stuck in ImagePullBackOff
 
-Missing `imagePullSecrets`. Verify: `kubectl -n <ns> get secret imagepullsecret`
+If pods are stuck in `ImagePullBackOff`, verify that the `imagePullSecrets` are present. Run the following command to check: `kubectl -n <ns> get secret imagepullsecret`
 
 ### nvcr.io/nvidian Image References
 
-Internal NVIDIA paths. Build from source (Step 1) and replace with your registry URL.
+If you encounter `nvcr.io/nvidian/...` image references in documentation or manifests,
+those are NVIDIA-internal paths not accessible externally. Replace them with your own
+registry paths after building from source.
 
 ### Machines Not Progressing
 
-Check state controller logs:
+Check the state controller logs as follows:
 `kubectl -n forge-system logs -l app=carbide-api --tail=100 | grep state_controller`
 
 Common causes: DHCP relay not configured on OOB switch, BMC MACs not matching the
diff --git a/book/src/manuals/pushing_containers.md b/book/src/manuals/pushing_containers.md
index 0e2bab80d8..2926d76fb5 100644
--- a/book/src/manuals/pushing_containers.md
+++ b/book/src/manuals/pushing_containers.md
@@ -1,8 +1,11 @@
 # Tagging and Pushing Containers to a Private Registry
 
-After building all NICo container images (see [Building NICo Containers](building_nico_containers.md)),
-tag them for your private registry and push. Set your registry URL and version tag as
-environment variables:
+After building all NICo container images (refer to the [Building NICo Containers](building_nico_containers.md) section),
+you will need to tag them and push them to your private registry.
+
+## Setting Environment Variables
+
+Set your registry URL and version tag as environment variables:
 
 ```sh
 REGISTRY=<your-registry.example.com/carbide>
@@ -33,15 +36,15 @@ docker push $REGISTRY/machine-validation-config:$TAG
 
 REST images are built from the
 [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest)
-repository. The `make docker-build` command tags images at build time when you pass
-`IMAGE_REGISTRY` and `IMAGE_TAG`:
+repository. The `make docker-build` command tags images at build time when you pass the
+`IMAGE_REGISTRY` and `IMAGE_TAG` variables on the command line:
 
 ```sh
 cd /path/to/ncx-infra-controller-rest
 make docker-build IMAGE_REGISTRY=$REGISTRY IMAGE_TAG=$TAG
 ```
 
-Then push all REST images:
+Then, push all REST images to your private registry:
 
 ```sh
 for image in carbide-rest-api carbide-rest-workflow carbide-rest-site-manager \
diff --git a/book/src/manuals/site-setup.md b/book/src/manuals/site-setup.md
index 3d8faaeed1..edef23061f 100644
--- a/book/src/manuals/site-setup.md
+++ b/book/src/manuals/site-setup.md
@@ -1,6 +1,6 @@
 # Site Setup Guide
 
-This page outlines the software dependencies for a Kubernetes-based install of NVIDIA Bare Metal Manager (BMM). It includes the *validated baseline* of software dependencies,
+This page outlines the software dependencies for a Kubernetes-based install of NVIDIA NCX Infra Controller (NICo). It includes the *validated baseline* of software dependencies,
 as well as the *order of operations* for site bringup, including what you must configure if you already operate some of the common services yourself.
 
 **Important Notes**
@@ -16,7 +16,7 @@ as well as the *order of operations* for site bringup, including what you must c
 
 ## Validated Baseline
 
-This section lists all software dependencies, including the versions validated for this release of BMM.
+This section lists all software dependencies, including the versions validated for this release of NICo.
 
 ### Kubernetes and Node Runtime
 
@@ -58,7 +58,7 @@ This section lists all software dependencies, including the versions validated f
 
 ### Monitoring and Telemetry (OPTIONAL)
 
-These components are not required for BMM setup, but are recommended site metrics.
+These components are not required for NICo setup, but are recommended for collecting site metrics.
 
 - **Monitoring System**:  Prometheus Operator v0.68.0; Prometheus v2.47.0; Alertmanager v0.26.0
 
@@ -70,27 +70,30 @@ These components are not required for BMM setup, but are recommended site metric
 
 - **Host Monitoring** Node exporter v1.6.1
 
-### BMM Components
+### NICo Components
 
-The following services are installed during the BMM installation process.
+The following services are installed during the NICo installation process.
 
-- **NICo core (forge-system)**
+- **NICo core (forge-system)**: `<YOUR_REGISTRY>/nvmetal-carbide:<TAG>` (primary carbide-api, plus supporting workloads)
+
+  - Build from the [ncx-infra-controller-core](https://github.com/NVIDIA/ncx-infra-controller-core) repo.
+    Refer to the [Building NICo Containers](building_nico_containers.md) section for more details.
 
-  - `<YOUR_REGISTRY>/nvmetal-carbide:<TAG>` (primary carbide-api, plus supporting workloads).
-    Build from [ncx-infra-controller-core](https://github.com/NVIDIA/ncx-infra-controller-core).
-    See [Building NICo Containers](building_nico_containers.md).
+- **cloud-api**: `<YOUR_REGISTRY>/carbide-rest-api:<TAG>` (two replicas)
+
+  - Build from the [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repo.
 
-- **cloud-api**: `<YOUR_REGISTRY>/carbide-rest-api:<TAG>` (two replicas).
-  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
+- **cloud-workflow**: `<YOUR_REGISTRY>/carbide-rest-workflow:<TAG>` (cloud-worker, site-worker)
+
+  - Build from the [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repo.
 
-- **cloud-workflow**: `<YOUR_REGISTRY>/carbide-rest-workflow:<TAG>` (cloud-worker, site-worker).
-  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
-
-- **cloud-cert-manager (credsmgr)**: `<YOUR_REGISTRY>/carbide-rest-cert-manager:<TAG>`.
-  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
+- **cloud-cert-manager (credsmgr)**: `<YOUR_REGISTRY>/carbide-rest-cert-manager:<TAG>`
+
+  - Build from the [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repo.
 
 - **elektra-site-agent**: `<YOUR_REGISTRY>/carbide-rest-site-agent:<TAG>`.
-  Build from [bare-metal-manager-rest](https://github.com/NVIDIA/bare-metal-manager-rest).
+
+  - Build from the [ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repo.
 
 ## Order of Operations