Multi-cluster management in air-gap environments 2/2: leveraging RHACM Policies to automate the import and disconnection of ManagedClusters
How can we leverage ACM Policies to automate the import, and the later disconnection, of many ManagedClusters from a Hub cluster?
Introduction #
In Part 1 of this series we discussed how to adopt a multi-cluster architecture by importing multiple Kubernetes clusters into an ACM Hub cluster. In that scenario the Hub cluster was disconnected, so a second step was needed: configuring the imported cluster for an air-gap environment so that it matched the Hub cluster's reachability.
In this post, we’ll focus on leveraging RHACM Policies to provide automation. This time, we’ll manage both the import and the disconnection processes from the Hub cluster through the Open Cluster Management policy-driven governance, without the need to interact with the spoke clusters at all.
Let’s quickly refresh the scenario:
- A compact (3-node) disconnected OpenShift cluster was installed to be the ACM Hub cluster.
$ oc --kubeconfig=hub-kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
hub-ctlplane-0.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
hub-ctlplane-1.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
hub-ctlplane-2.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
$ oc --kubeconfig=hub-kubeconfig get csv
NAME DISPLAY VERSION REPLACES PHASE
advanced-cluster-management.v2.9.2 Advanced Cluster Management for Kubernetes 2.9.2 advanced-cluster-management.v2.9.1 Succeeded
openshift-gitops-operator.v1.11.1 Red Hat OpenShift GitOps 1.11.1 openshift-gitops-operator.v1.11.0 Succeeded
topology-aware-lifecycle-manager.v4.14.3 Topology Aware Lifecycle Manager 4.14.3 topology-aware-lifecycle-manager.v4.14.2 Succeeded
multicluster-engine.v2.4.3 multicluster engine for Kubernetes 2.4.3 multicluster-engine.v2.4.2 Succeeded
- An online/connected Single-Node OpenShift (SNO) cluster was deployed.
$ oc --kubeconfig=sno-kubeconfig get node
NAME STATUS ROLES AGE VERSION
ocp-sno Ready control-plane,master,worker 3d22h v1.27.6+f67aeb3
- An offline registry was pre-populated before installing the Hub cluster.
Offline registry initial catalog.
curl -X GET -u <user>:<passwd> https://infra.rhacm-demo.lab:8443/v2/_catalog | jq
{
  "repositories": [
    "lvms4/lvms-must-gather-rhel9",
    "lvms4/lvms-operator-bundle",
    "lvms4/lvms-rhel9-operator",
    "lvms4/topolvm-rhel9",
    "multicluster-engine/addon-manager-rhel8",
    "multicluster-engine/agent-service-rhel8",
    "multicluster-engine/apiserver-network-proxy-rhel8",
    ........ redacted .....
    "oc-mirror",
    "openshift/graph-image",
    "openshift/origin-must-gather",
    "openshift/release",
    "openshift/release/metadata",
    "openshift/release-images",
    "openshift-gitops-1/argo-rollouts-rhel8",
    "openshift-gitops-1/argocd-rhel8",
    "openshift-gitops-1/console-plugin-rhel8",
    "openshift-gitops-1/dex-rhel8",
    "openshift-gitops-1/gitops-operator-bundle",
    "openshift-gitops-1/gitops-rhel8",
    "openshift-gitops-1/gitops-rhel8-operator",
    "openshift-gitops-1/kam-delivery-rhel8",
    "openshift-gitops-1/must-gather-rhel8",
    "openshift4/ose-configmap-reloader",
    "openshift4/ose-csi-external-provisioner",
    "openshift4/ose-csi-external-resizer",
    "openshift4/ose-csi-external-snapshotter",
    "openshift4/ose-csi-livenessprobe",
    "openshift4/ose-csi-node-driver-registrar",
    "openshift4/ose-haproxy-router",
    "openshift4/ose-kube-rbac-proxy",
    "openshift4/ose-oauth-proxy",
    "openshift4/topology-aware-lifecycle-manager-operator-bundle",
    "openshift4/topology-aware-lifecycle-manager-precache-rhel8",
    "openshift4/topology-aware-lifecycle-manager-recovery-rhel8"
  ]
}
RHACM Governance vs. Zero Touch Provisioning #
On the one hand, ACM natively provides three different CRs to create or delete objects in its ManagedClusters (a minimal skeleton is sketched right after this list):
- Policy CR: defines the 'musthave' / 'mustnothave' objectDefinition we want to create or delete in a cluster.
- PlacementRule CR: targets specific groups of clusters through different labels.
- PlacementBinding CR: binds a PlacementRule to a Policy.
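To make the relationship concrete, below is a minimal, generic skeleton of how a PlacementRule and a PlacementBinding tie a Policy to a set of clusters. All names, namespaces and label selectors are illustrative assumptions, not values taken from this scenario:
# Skeleton only: names, namespace and labels are illustrative assumptions
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: example-placement
  namespace: example-policies
spec:
  clusterSelector:
    matchExpressions:
      - key: name
        operator: In
        values:
          - sno
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: example-binding
  namespace: example-policies
placementRef:
  name: example-placement
  apiGroup: apps.open-cluster-management.io
  kind: PlacementRule
subjects:
  - name: example-policy
    apiGroup: policy.open-cluster-management.io
    kind: Policy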
On the other hand, Zero Touch Provisioning (ZTP) seeks to simplify this process further and unify all the information in a single PolicyGenTemplate (PGT) CR, providing typical Telco RAN policies as templates. This way, applying a PGT with the required fields overridden generates the Policy, PlacementRule, and PlacementBinding in an automated manner.
Considering the two steps in this post, the import and the disconnection of ManagedClusters in RHACM: the former is better aligned with plain ACM Policies, since no ZTP templates are available for it, while the latter can be implemented straightforwardly with ZTP, leveraging the templates it already provides for disconnected/air-gap environments.
In this post, both steps will be covered with each implementation, to thoroughly understand the potential of ZTP in common Telco RAN configurations and its advantage over the plain ACM Policies framework. However, the best approach would be to import the cluster with an ACM Policy and then disconnect it with a ZTP PolicyGenTemplate resource.
Automating the ManagedCluster import in ACM #
As we saw in Part 1, to import a cluster we simply need to create three manifests in the Hub cluster: the ManagedCluster CR, the KlusterletAddonConfig CR, and the AutoImportSecret containing the kubeconfig file.
With Policies, we’ll now tell the Hub cluster to apply these three manifests to itself.
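A hedged sketch of such a Policy is shown below: it wraps the import manifests as musthave object-templates inside a ConfigurationPolicy, and would be bound to the Hub itself with a PlacementRule/PlacementBinding pair like the skeleton above, selecting the local-cluster: "true" label instead. The cluster name sno, the namespace acm-policies and the object names are assumptions for illustration, and only two of the four import manifests are spelled out.
# Sketch only: cluster name "sno", namespace "acm-policies" and object names
# are illustrative assumptions; two of the four import manifests are shown
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: import-sno
  namespace: acm-policies
spec:
  remediationAction: enforce
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: import-sno-config
        spec:
          remediationAction: enforce
          severity: low
          object-templates:
            # Namespace that will host the import resources for the spoke
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: sno
            # ManagedCluster CR registering the spoke in the Hub
            - complianceType: musthave
              objectDefinition:
                apiVersion: cluster.open-cluster-management.io/v1
                kind: ManagedCluster
                metadata:
                  name: sno
                  labels:
                    name: sno
                spec:
                  hubAcceptsClient: true
            # The AutoImportSecret and KlusterletAddonConfig manifests would be
            # added as two more musthave entries following the same pattern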
ZTP PolicyGenTemplate CR #
Let’s now see how ZTP automatically generates the Policy, PlacementRule and PlacementBinding resources given a PGT definition. As mentioned before, the import resources are not among the templates natively provided by ZTP, so they must be placed inside the GitOps source repository, in the source-crs folder.
site-policies/
├── 1-import-cluster-pgt.yaml
├── kustomization.yaml
└── source-crs
└── import
├── AutoImportSecret.yaml
├── KlusterletAddonConfig.yaml
├── ManagedCluster.yaml
├── Namespace.yaml
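These custom files are just the plain manifests we created by hand in Part 1. As a hedged sketch, assuming the spoke cluster is named sno, two of them might look like this (the kubeconfig content is elided):
# source-crs/import/AutoImportSecret.yaml (sketch; "sno" is an assumption)
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: sno
type: Opaque
stringData:
  autoImportRetry: "5"
  kubeconfig: |
    # <spoke cluster kubeconfig goes here>
---
# source-crs/import/KlusterletAddonConfig.yaml (sketch): enables the ACM add-ons
apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
  name: sno
  namespace: sno
spec:
  applicationManager:
    enabled: true
  policyController:
    enabled: true
  searchCollector:
    enabled: true
  certPolicyController:
    enabled: true
  iamPolicyController:
    enabled: true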
Each file added to source-crs is then referenced in a PolicyGenTemplate resource through the fileName field, like in the example below:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: import-cluster
namespace: ztp-policies
spec:
bindingRules:
local-cluster: true
remediationAction: "inform"
sourceFiles:
- fileName: import/Namespace.yaml
policyName: "import-policy"
- fileName: import/ManagedCluster.yaml
policyName: "import-policy"
evaluationInterval:
compliant: never
noncompliant: 1s
- fileName: import/AutoImportSecret.yaml
policyName: "import-policy"
evaluationInterval:
compliant: never
- fileName: import/KlusterletAddonConfig.yaml
policyName: "import-policy"
By applying this PGT, which references each import resource, to the Hub cluster, the {Policy, PlacementBinding, PlacementRule} tuple is generated and the ManagedCluster is imported.
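For the PGT to be rendered by the ArgoCD/ZTP pipeline, it also needs to be listed as a generator in the kustomization.yaml shown in the tree above. A minimal sketch, assuming the standard ZTP policy-generator kustomize plugin is configured in the GitOps application:
# site-policies/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - 1-import-cluster-pgt.yaml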
Automating the ManagedCluster disconnection in ACM #
As we saw in Part 1, to disconnect a cluster an ImageContentSourcePolicy CR is required to configure the registry mirror configuration, and a new CatalogSource needs to be created pointing to the disconnected index image.
To give the cluster access to the offline registry, the credentials and the CA certificate also need to be applied. Let’s see how both steps are implemented through the policy-driven governance framework.
This time, ZTP does provide the necessary PGT templates to configure a disconnected cluster in an air-gap environment. However, the credentials and certificates will still need to be added to the source-crs folder.
ZTP PolicyGenTemplate CR #
Let’s leverage the RAN templates available in ZTP for air-gap spoke clusters and simply override the specific fields with the values of this scenario.
To grant the spoke cluster access to the offline registry, the extra manifests with the access details need to be placed in the source-crs folder:
site-policies/
├── 2-total-disconnect-cluster-pgt.yaml
├── kustomization.yaml
└── source-crs
├── disconnect
│ ├── CACM.yaml
│ ├── DisconnectedPullSecret.yaml
│ └── ImageConfigCluster.yaml
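As with the import step, these are plain manifests. A hedged sketch of what they might contain is shown below; the ConfigMap name registry-ca, the hostname key and the secret layout are illustrative assumptions, and the actual credentials and CA certificate come from the offline registry prepared in Part 1:
# source-crs/disconnect/CACM.yaml (sketch): ConfigMap holding the registry CA.
# additionalTrustedCA expects keys in the "hostname..port" format
apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-ca
  namespace: openshift-config
data:
  infra.rhacm-demo.lab..8443: |
    -----BEGIN CERTIFICATE-----
    # <offline registry CA certificate goes here>
    -----END CERTIFICATE-----
---
# source-crs/disconnect/ImageConfigCluster.yaml (sketch): points the cluster-wide
# image configuration at the CA ConfigMap above
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  additionalTrustedCA:
    name: registry-ca
---
# source-crs/disconnect/DisconnectedPullSecret.yaml (sketch): global pull secret
# including the offline registry credentials (content elided)
apiVersion: v1
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded auths including infra.rhacm-demo.lab:8443>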
To apply them, they will be referenced from the PolicyGenTemplate resource shown below.
Once the credentials and CA Certificate are provided, the following RAN templates for ZTP Policies will be used:
- DisconnectedICSP.yaml: creates a new ICSP with the registry mirror configuration, which is rendered into /etc/containers/registries.conf on the cluster nodes.
- DefaultCatsrc.yaml: creates a new CatalogSource referencing the index image in the offline registry.
- OperatorHub.yaml: disables the rest of the catalogs in the cluster.
We apply a single PolicyGenTemplate (PGT) including all these resources to the Hub cluster. Based on its bindingRules, the Hub cluster will create the corresponding objects in the imported spoke cluster.
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: disconnect-cluster
namespace: ztp-policies
spec:
bindingRules:
name: sno
remediationAction: inform
sourceFiles:
- fileName: disconnect/DisconnectedPullSecret.yaml
policyName: "disconnect-policy"
- fileName: disconnect/ImageConfigCluster.yaml
policyName: "disconnect-policy"
- fileName: disconnect/CACM.yaml
policyName: "disconnect-policy"
- fileName: DisconnectedICSP.yaml
policyName: disconnect-policy
spec:
repositoryDigestMirrors:
- mirrors:
- infra.rhacm-demo.lab:8443/openshift
source: quay.io/openshift
- mirrors:
- infra.rhacm-demo.lab:8443/openshift/release
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
... redacted ...
- fileName: DefaultCatsrc.yaml
policyName: "disconnect-policy"
metadata:
name: cs-redhat-operator-index
spec:
image: infra.rhacm-demo.lab:8443/redhat/redhat-operator-index:v4.14
sourceType: grpc
- fileName: OperatorHub.yaml
policyName: "disconnect-policy"
Once it is applied, the imported cluster will automatically be disconnected and will only have access to the mirror registry.
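Both PGTs in this post are generated with remediationAction: inform, so the Hub only reports compliance. To actually push the configuration out, the Topology Aware Lifecycle Manager installed on the Hub (see the csv list above) can enforce the policies through a ClusterGroupUpgrade CR. A minimal sketch, where the namespace and the generated policy name (following the <pgt-name>-<policyName> convention) are assumptions:
# Sketch of a TALM ClusterGroupUpgrade enforcing the disconnection policy on
# the imported spoke; namespace and policy name are illustrative assumptions
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: disconnect-sno
  namespace: ztp-policies
spec:
  clusters:
    - sno
  managedPolicies:
    - disconnect-cluster-disconnect-policy
  enable: true
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240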
Wrap Up #
Even though ZTP is meant to provision bare-metal infrastructure, deploying OpenShift clusters from a central Hub cluster, in this series of posts we have tweaked the ZTP resources to import already-running infrastructure, so that not only fresh bare-metal clusters can be installed but also existing clusters can be brought under the Hub cluster’s management.