Multi-cluster management in air-gap environments 2/2: leveraging RHACM Policies to automate the import and disconnection of ManagedClusters
How can we leverage ACM Policies to automate the import, and the later disconnection, of many ManagedClusters from a Hub cluster?
Introduction #
In Part 1 of this series we discussed how to adopt a multi-cluster architecture by importing multiple Kubernetes clusters into an ACM Hub cluster. In that scenario the Hub cluster was disconnected, so a second step was needed: configuring the imported cluster for an air-gap environment so that it matched the Hub cluster's reachability.
In this post, we’ll focus on leveraging RHACM Policies to provide automation. This time, we’ll manage both the import and the disconnection processes from the Hub cluster through the Open Cluster Management policy-driven governance, without the need to interact with the spoke clusters at all.
Let’s quickly refresh the scenario:
- A compact (3-node) disconnected OpenShift cluster was installed to be the ACM Hub cluster.
$ oc --kubeconfig=hub-kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
hub-ctlplane-0.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
hub-ctlplane-1.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
hub-ctlplane-2.rhacm-demo.lab Ready control-plane,master,worker 85d v1.27.6+f67aeb3
$ oc --kubeconfig=hub-kubeconfig get csv
NAME DISPLAY VERSION REPLACES PHASE
advanced-cluster-management.v2.9.2 Advanced Cluster Management for Kubernetes 2.9.2 advanced-cluster-management.v2.9.1 Succeeded
openshift-gitops-operator.v1.11.1 Red Hat OpenShift GitOps 1.11.1 openshift-gitops-operator.v1.11.0 Succeeded
topology-aware-lifecycle-manager.v4.14.3 Topology Aware Lifecycle Manager 4.14.3 topology-aware-lifecycle-manager.v4.14.2 Succeeded
multicluster-engine.v2.4.3 multicluster engine for Kubernetes 2.4.3 multicluster-engine.v2.4.2 Succeeded
- An online/connected Single-Node OpenShift (SNO) cluster was deployed.
$ oc --kubeconfig=sno-kubeconfig get node
NAME STATUS ROLES AGE VERSION
ocp-sno Ready control-plane,master,worker 3d22h v1.27.6+f67aeb3
- An offline registry was pre-populated before installing the Hub cluster.
Offline registry initial catalog.
curl -X GET -u <user>:<passwd> https://infra.rhacm-demo.lab:8443/v2/_catalog | jq
{
  "repositories": [
    "lvms4/lvms-must-gather-rhel9",
    "lvms4/lvms-operator-bundle",
    "lvms4/lvms-rhel9-operator",
    "lvms4/topolvm-rhel9",
    "multicluster-engine/addon-manager-rhel8",
    "multicluster-engine/agent-service-rhel8",
    "multicluster-engine/apiserver-network-proxy-rhel8",
    ........ redacted .....
    "oc-mirror",
    "openshift/graph-image",
    "openshift/origin-must-gather",
    "openshift/release",
    "openshift/release/metadata",
    "openshift/release-images",
    "openshift-gitops-1/argo-rollouts-rhel8",
    "openshift-gitops-1/argocd-rhel8",
    "openshift-gitops-1/console-plugin-rhel8",
    "openshift-gitops-1/dex-rhel8",
    "openshift-gitops-1/gitops-operator-bundle",
    "openshift-gitops-1/gitops-rhel8",
    "openshift-gitops-1/gitops-rhel8-operator",
    "openshift-gitops-1/kam-delivery-rhel8",
    "openshift-gitops-1/must-gather-rhel8",
    "openshift4/ose-configmap-reloader",
    "openshift4/ose-csi-external-provisioner",
    "openshift4/ose-csi-external-resizer",
    "openshift4/ose-csi-external-snapshotter",
    "openshift4/ose-csi-livenessprobe",
    "openshift4/ose-csi-node-driver-registrar",
    "openshift4/ose-haproxy-router",
    "openshift4/ose-kube-rbac-proxy",
    "openshift4/ose-oauth-proxy",
    "openshift4/topology-aware-lifecycle-manager-operator-bundle",
    "openshift4/topology-aware-lifecycle-manager-precache-rhel8",
    "openshift4/topology-aware-lifecycle-manager-recovery-rhel8"
  ]
}
RHACM Governance vs. Zero Touch Provisioning #
On the one hand, ACM natively provides three different CRs to create or delete objects in its ManagedClusters (a minimal skeleton is sketched right after this list):
- Policy CR: defines the 'musthave' / 'mustnothave' objectDefinition we want to create or delete in a cluster.
- PlacementRule CR: targets specific groups of clusters through different labels.
- PlacementBinding CR: binds a PlacementRule to a Policy.
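To make the relationship concrete, below is a minimal, generic skeleton of how a PlacementRule and a PlacementBinding tie a Policy to a set of clusters. All names, namespaces and label selectors are illustrative assumptions, not values taken from this scenario:
# Skeleton only: names, namespace and labels are illustrative assumptions
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: example-placement
  namespace: example-policies
spec:
  clusterSelector:
    matchExpressions:
      - key: name
        operator: In
        values:
          - sno
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: example-binding
  namespace: example-policies
placementRef:
  name: example-placement
  apiGroup: apps.open-cluster-management.io
  kind: PlacementRule
subjects:
  - name: example-policy
    apiGroup: policy.open-cluster-management.io
    kind: Policy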
On the other hand, Zero Touch Provisioning (ZTP) seeks to simplify this process further and unify all the information in a single PolicyGenTemplate (PGT) CR, providing typical Telco RAN policies as templates. This way, applying a PGT with the required fields overridden generates the Policy, PlacementRule, and PlacementBinding in an automated manner.
Considering the two steps in this post, the import and the disconnection of ManagedClusters in RHACM: the former is better aligned with plain ACM Policies, since no ZTP templates are available for it, while the latter can be implemented straightforwardly with ZTP, leveraging the templates it already provides for disconnected/air-gap environments.
In this post, both steps will be covered with each implementation, to thoroughly understand the potential of ZTP in common Telco RAN configurations and its advantage over the plain ACM Policies framework. However, the best approach would be to import the cluster with an ACM Policy and then disconnect it with a ZTP PolicyGenTemplate resource.
Automating the ManagedCluster import in ACM #
As we saw in Part 1, to import a cluster we simply need to create three manifests in the Hub cluster: the ManagedCluster CR, the KlusterletAddonConfig CR, and the AutoImportSecret containing the kubeconfig file.
With Policies, we’ll now tell the Hub cluster to apply these three manifests to itself.
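A hedged sketch of such a Policy is shown below: it wraps the import manifests as musthave object-templates inside a ConfigurationPolicy, and would be bound to the Hub itself with a PlacementRule/PlacementBinding pair like the skeleton above, selecting the local-cluster: "true" label instead. The cluster name sno, the namespace acm-policies and the object names are assumptions for illustration, and only two of the four import manifests are spelled out.
# Sketch only: cluster name "sno", namespace "acm-policies" and object names
# are illustrative assumptions; two of the four import manifests are shown
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: import-sno
  namespace: acm-policies
spec:
  remediationAction: enforce
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: import-sno-config
        spec:
          remediationAction: enforce
          severity: low
          object-templates:
            # Namespace that will host the import resources for the spoke
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: sno
            # ManagedCluster CR registering the spoke in the Hub
            - complianceType: musthave
              objectDefinition:
                apiVersion: cluster.open-cluster-management.io/v1
                kind: ManagedCluster
                metadata:
                  name: sno
                  labels:
                    name: sno
                spec:
                  hubAcceptsClient: true
            # The AutoImportSecret and KlusterletAddonConfig manifests would be
            # added as two more musthave entries following the same pattern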
ZTP PolicyGenTemplate CR #
Let’s now see how ZTP automatically generates the Policy, PlacementRule and PlacementBinding resources given a PGT definition. As mentioned before, the import resources are not among the templates natively provided by ZTP, so they must be placed inside the GitOps source repository, in the source-crs folder.
site-policies/
├── 1-import-cluster-pgt.yaml
├── kustomization.yaml
└── source-crs
└── import
├── AutoImportSecret.yaml
├── KlusterletAddonConfig.yaml
├── ManagedCluster.yaml
├── Namespace.yaml
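These custom files are just the plain manifests we created by hand in Part 1. As a hedged sketch, assuming the spoke cluster is named sno, two of them might look like this (the kubeconfig content is elided):
# source-crs/import/AutoImportSecret.yaml (sketch; "sno" is an assumption)
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: sno
type: Opaque
stringData:
  autoImportRetry: "5"
  kubeconfig: |
    # <spoke cluster kubeconfig goes here>
---
# source-crs/import/KlusterletAddonConfig.yaml (sketch): enables the ACM add-ons
apiVersion: agent.open-cluster-management.io/v1
kind: KlusterletAddonConfig
metadata:
  name: sno
  namespace: sno
spec:
  applicationManager:
    enabled: true
  policyController:
    enabled: true
  searchCollector:
    enabled: true
  certPolicyController:
    enabled: true
  iamPolicyController:
    enabled: true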
Each file added to source-crs is then referenced in a PolicyGenTemplate resource through the fileName field, like in the example below:
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: import-cluster
namespace: ztp-policies
spec:
bindingRules:
local-cluster: true
remediationAction: "inform"
sourceFiles:
- fileName: import/Namespace.yaml
policyName: "import-policy"
- fileName: import/ManagedCluster.yaml
policyName: "import-policy"
evaluationInterval:
compliant: never
noncompliant: 1s
- fileName: import/AutoImportSecret.yaml
policyName: "import-policy"
evaluationInterval:
compliant: never
- fileName: import/KlusterletAddonConfig.yaml
policyName: "import-policy"
By applying this PGT, which references each import resource, to the Hub cluster, the {Policy, PlacementBinding, PlacementRule} tuple is generated and the ManagedCluster is imported.
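For the PGT to be rendered by the ArgoCD/ZTP pipeline, it also needs to be listed as a generator in the kustomization.yaml shown in the tree above. A minimal sketch, assuming the standard ZTP policy-generator kustomize plugin is configured in the GitOps application:
# site-policies/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - 1-import-cluster-pgt.yaml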
Automating the ManagedCluster disconnection in ACM #
As we saw in Part 1, to disconnect a cluster an ImageContentSourcePolicy CR is required to configure the registry mirror configuration, and a new CatalogSource needs to be created pointing to the disconnected index image.
To give the cluster access to the offline registry, the credentials and the CA certificate also need to be applied. Let’s see how both steps are implemented through the policy-driven governance framework.
This time, ZTP does provide the necessary PGT templates to configure a disconnected cluster in an air-gap environment. However, the credentials and certificates will still need to be added to the source-crs folder.
ZTP PolicyGenTemplate CR #
Let’s leverage the RAN templates available in ZTP for air-gap spoke clusters and simply override the specific fields with the values of this scenario.
To grant the spoke cluster access to the offline registry, the extra manifests with the access details need to be placed in the source-crs folder:
site-policies/
├── 2-total-disconnect-cluster-pgt.yaml
├── kustomization.yaml
└── source-crs
├── disconnect
│ ├── CACM.yaml
│ ├── DisconnectedPullSecret.yaml
│ └── ImageConfigCluster.yaml
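As with the import step, these are plain manifests. A hedged sketch of what they might contain is shown below; the ConfigMap name registry-ca, the hostname key and the secret layout are illustrative assumptions, and the actual credentials and CA certificate come from the offline registry prepared in Part 1:
# source-crs/disconnect/CACM.yaml (sketch): ConfigMap holding the registry CA.
# additionalTrustedCA expects keys in the "hostname..port" format
apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-ca
  namespace: openshift-config
data:
  infra.rhacm-demo.lab..8443: |
    -----BEGIN CERTIFICATE-----
    # <offline registry CA certificate goes here>
    -----END CERTIFICATE-----
---
# source-crs/disconnect/ImageConfigCluster.yaml (sketch): points the cluster-wide
# image configuration at the CA ConfigMap above
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  additionalTrustedCA:
    name: registry-ca
---
# source-crs/disconnect/DisconnectedPullSecret.yaml (sketch): global pull secret
# including the offline registry credentials (content elided)
apiVersion: v1
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded auths including infra.rhacm-demo.lab:8443>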
To apply them, they will be referenced from the PolicyGenTemplate resource shown below.
Once the credentials and CA Certificate are provided, the following RAN templates for ZTP Policies will be used:
- DisconnectedICSP.yaml: creates a new ICSP with the registry mirror configuration, which is rendered into /etc/containers/registries.conf on the cluster nodes.
- DefaultCatsrc.yaml: creates a new CatalogSource referencing the index image in the offline registry.
- OperatorHub.yaml: disables the rest of the catalogs in the cluster.
We apply a single PolicyGenTemplate (PGT) including all these resources to the Hub cluster. Based on its bindingRules, the Hub cluster will create the corresponding objects in the imported spoke cluster.
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
name: disconnect-cluster
namespace: ztp-policies
spec:
bindingRules:
name: sno
remediationAction: inform
sourceFiles:
- fileName: disconnect/DisconnectedPullSecret.yaml
policyName: "disconnect-policy"
- fileName: disconnect/ImageConfigCluster.yaml
policyName: "disconnect-policy"
- fileName: disconnect/CACM.yaml
policyName: "disconnect-policy"
- fileName: DisconnectedICSP.yaml
policyName: disconnect-policy
spec:
repositoryDigestMirrors:
- mirrors:
- infra.rhacm-demo.lab:8443/openshift
source: quay.io/openshift
- mirrors:
- infra.rhacm-demo.lab:8443/openshift/release
source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
... redacted ...
- fileName: DefaultCatsrc.yaml
policyName: "disconnect-policy"
metadata:
name: cs-redhat-operator-index
spec:
image: infra.rhacm-demo.lab:8443/redhat/redhat-operator-index:v4.14
sourceType: grpc
- fileName: OperatorHub.yaml
policyName: "disconnect-policy"
Once it is applied, the imported cluster will automatically be disconnected and will only have access to the mirror registry.
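Both PGTs in this post are generated with remediationAction: inform, so the Hub only reports compliance. To actually push the configuration out, the Topology Aware Lifecycle Manager installed on the Hub (see the csv list above) can enforce the policies through a ClusterGroupUpgrade CR. A minimal sketch, where the namespace and the generated policy name (following the <pgt-name>-<policyName> convention) are assumptions:
# Sketch of a TALM ClusterGroupUpgrade enforcing the disconnection policy on
# the imported spoke; namespace and policy name are illustrative assumptions
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: disconnect-sno
  namespace: ztp-policies
spec:
  clusters:
    - sno
  managedPolicies:
    - disconnect-cluster-disconnect-policy
  enable: true
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240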
Wrap Up #
Even though ZTP is meant to provision bare-metal infrastructure, deploying OpenShift clusters from a central Hub cluster, in this series of posts we have tweaked the ZTP resources to import already-running infrastructure, so that not only fresh bare-metal clusters can be installed but also existing clusters can be brought under the Hub cluster’s management.