Monitoring ZTP policies deployment on ManagedClusters
Zero Touch Provisioning (ZTP) for OpenShift at the edge means implementing both cluster provisioning and Day-2 operations in a fully automated way through GitOps practices in Red Hat ACM. While troubleshooting the managed cluster installation may be child's play, monitoring the policy creation that follows can be very confusing. Here is a simple checklist for tracking the policies of a sample Policy Generator deployment.
Policies Workflow in ZTP 4.10
Let’s assume that both a SiteConfig and a PolicyGenTemplate were pushed to the GitOps source and that the spoke cluster installation has completed. We will use the ptnr5 cluster as an example.
- The new ptnr5 cluster is Ready
- A ClusterGroupUpgrade (CGU) resource is automatically created in the ztp-install namespace
[~]$ oc get clustergroupupgrades -A
NAMESPACE NAME AGE
ztp-install ptnr5 8s
- The CGU first labels the ManagedCluster with ztp-running
[~]$ oc get managedcluster ptnr5 -o yaml | grep labels -A11
  labels:
    app.kubernetes.io/instance: clusters-ptnr5
    cluster.open-cluster-management.io/clusterset: default
    clusterID: ca7549b5-9086-49f9-8347-bc4a418a5742
    feature.open-cluster-management.io/addon-cluster-proxy: available
    feature.open-cluster-management.io/addon-config-policy-controller: available
    feature.open-cluster-management.io/addon-governance-policy-framework: available
    feature.open-cluster-management.io/addon-work-manager: available
    name: ptnr5
    openshiftVersion: 4.10.15
    site: ptnr5-site
    ztp-running: ""
- At this point, the policies are deployed in the ManagedCluster namespace ptnr5 and are enforced according to their wave annotation.
[tginer@tginer ~]$ oc get policies -A
NAMESPACE     NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
ptnr5         ztp-install.ptnr5-common-policy-r5n6d   enforce              NonCompliant       7s
ptnr5         ztp-ptnr5.common-policy                 inform               NonCompliant       7s
ztp-install   ptnr5-common-policy-r5n6d               enforce              NonCompliant       68s
ztp-ptnr5     common-policy                           inform               NonCompliant       32m
- If we check inside the spoke cluster, the Operator starts installing. Wait until it reaches the Succeeded state.
- Finally, once the policies are compliant, the ManagedCluster gets the ztp-done label
[tginer@tginer ~]$ oc get managedcluster ptnr5 -o yaml | grep labels -A11
  labels:
    app.kubernetes.io/instance: clusters-ptnr5
    cluster.open-cluster-management.io/clusterset: default
    clusterID: ca7549b5-9086-49f9-8347-bc4a418a5742
    feature.open-cluster-management.io/addon-cluster-proxy: available
    feature.open-cluster-management.io/addon-config-policy-controller: available
    feature.open-cluster-management.io/addon-governance-policy-framework: available
    feature.open-cluster-management.io/addon-work-manager: available
    name: ptnr5
    openshiftVersion: 4.10.15
    site: ptnr5-site
    ztp-done: ""
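The two label checks in this workflow can be folded into one small helper. This is only a sketch: it assumes the ztp-running and ztp-done labels appear exactly as in the outputs shown in this post, and ztp_phase is a hypothetical function name, not something shipped with oc.

```shell
# ztp_phase: read ManagedCluster YAML (or just its labels) on stdin and
# print "done", "running" or "not-started" depending on the ZTP labels.
# Assumption: the label keys are ztp-done / ztp-running, as in this post.
ztp_phase() {
  awk '
    /^[ \t]*ztp-done:/    { is_done = 1 }
    /^[ \t]*ztp-running:/ { is_running = 1 }
    END {
      if (is_done)         print "done"
      else if (is_running) print "running"
      else                 print "not-started"
    }'
}

# Usage (needs a logged-in oc session on the hub):
#   oc get managedcluster ptnr5 -o yaml | ztp_phase
```

Since it only reads text on stdin, it can also be run against a saved copy of the ManagedCluster YAML.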
Troubleshooting policies
If things do not go as smoothly as presented in this article, here is a quick reminder of where to look for the logs.
First, find the cluster-group-upgrades controller pod, then follow the logs of its manager container:
[tginer@tginer ~]$ oc get pods -A | grep cluster-group
openshift-operators cluster-group-upgrades-controller-manager-644f45bc59-gxgsw 2/2 Running 0 23h
[tginer@tginer ~]$ oc logs -f -c manager cluster-group-upgrades-controller-manager-644f45bc59-gxgsw -n openshift-operators
Container manager logs
ter.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: *v1alpha1.ClusterGroupUpgrade"}
2023-09-28T12:12:39.117Z INFO controller.managedclusterForCGU Starting Controller {"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster"}
2023-09-28T12:12:39.117Z INFO controller.clustergroupupgrade Starting Controller {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade"}
2023-09-28T12:12:39.218Z INFO controller.clustergroupupgrade Starting workers {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "worker count": 5}
2023-09-28T12:12:39.218Z INFO controller.managedclusterForCGU Starting workers {"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "worker count": 1}
2023-09-28T12:12:39.219Z INFO controllers.ManagedClusterForCGU cluster is ready {"Name": "local-cluster"}
2023-09-28T12:12:39.320Z INFO controllers.ManagedClusterForCGU WARN: No child policies found for cluster{"Name": "local-cluster"}
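The controller log is noisy, so it can help to keep only the lines that mention one cluster or that look like warnings or errors. A minimal sketch, assuming the log format shown above (a "Name": "<cluster>" field and WARN/ERROR markers in the text); cgu_log_filter is a hypothetical helper name.

```shell
# cgu_log_filter <cluster>: filter controller logs read from stdin.
# Keeps lines that contain WARN or ERROR, or that reference the cluster
# through a "Name": "<cluster>" field (format assumed from the sample log).
cgu_log_filter() {
  local cluster="$1"
  grep -e 'WARN' -e 'ERROR' -e "\"Name\": \"${cluster}\""
}

# Usage (pod name taken from the `oc get pods` output above):
#   oc logs -c manager cluster-group-upgrades-controller-manager-644f45bc59-gxgsw \
#     -n openshift-operators | cgu_log_filter ptnr5
```

Warnings about other clusters are deliberately kept, since a failure elsewhere is often relevant while debugging.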
Apply policies to existing clusters
What if we want to apply new policies to existing clusters? How can we trigger just the policies workflow without creating another site?
Manually create ClusterGroupUpgrade in ztp-install namespace
Manually create a CGU resource in the ztp-install namespace so that the new policies under policygentemplate/ in Git are taken into account.
For example, to apply a new policy to local-cluster (i.e., the Hub cluster) just for testing purposes:
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: local-cluster
  namespace: ztp-install
spec:
  clusters:
  - local-cluster
  managedPolicies:
  - common-ptp-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 300
  enable: true
This will automatically trigger the first policies in the wave.
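Once the CGU is applied, progress can be followed by polling the policies copied into the cluster namespace. The sketch below assumes the policies expose their state in .status.compliant (the field behind the COMPLIANCE STATE column); wait_policies_compliant and the file name in the usage comment are hypothetical.

```shell
# wait_policies_compliant <namespace> [tries] [delay_seconds]
# Polls the policies in a namespace until all of them report Compliant.
# Returns 0 on success, 1 on timeout, 2 if the oc call itself fails.
wait_policies_compliant() {
  local ns="$1" tries="${2:-30}" delay="${3:-10}" i=0 rows pending
  while [ "$i" -lt "$tries" ]; do
    # One "name<TAB>state" row per policy; state is empty until first evaluated.
    rows=$(oc get policies -n "$ns" -o \
      jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.compliant}{"\n"}{end}') || return 2
    pending=$(printf '%s\n' "$rows" | awk -F'\t' 'NF && $2 != "Compliant" { print $1 }')
    # Require at least one policy, so an empty namespace does not count as success.
    if [ -n "$rows" ] && [ -z "$pending" ]; then
      echo "all policies in $ns are Compliant"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for policies in $ns" >&2
  return 1
}

# Usage (file name is an assumption; save the CGU above under any name):
#   oc apply -f cgu-local-cluster.yaml
#   wait_policies_compliant local-cluster 60 10
```

Polling the policies rather than the CGU keeps the check independent of the CGU condition names, which may differ between releases.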
Wrap up
The Policy Generator offers RAN solutions a full list of templated CRs that can later be easily customized and grouped for each target cluster or group of clusters.
Monitoring how the policy hierarchy and waves work is a step-by-step process. In this post, I tried to show how the policies are enforced sequentially until the final state of the spoke cluster matches the one defined in the GitOps source. Hope it helped!