Monitoring ZTP policies deployment on ManagedClusters

·3 mins
ztp openshift kubernetes
Teresa Giner Blog
Author
Teresa Giner Blog
Open Source enthusiast

Zero Touch Provisioning (ZTP) in OpenShift at the edge means automating both cluster provisioning and Day-2 operations through GitOps practices in Red Hat ACM. While troubleshooting the managed cluster installation may be child’s play, monitoring the policy creation that follows can be very confusing. Here is a simple checklist for tracking the policies produced by a sample Policy Generator.

Policies Workflow in ZTP 4.10

Let’s assume that both a SiteConfig and a PolicyGenTemplate have been pushed to the GitOps source and that the spoke cluster installation is now complete. We will use the ptnr5 cluster as an example.

  1. The new ptnr5 cluster is Ready

cluster-ready.png

  2. A ClusterGroupUpgrade (CGU) resource is automatically created in the ztp-install namespace
[~]$ oc get clustergroupupgrades -A
NAMESPACE      NAME      AGE
ztp-install    ptnr5     8s
  3. The CGU first labels the ManagedCluster as ztp-running
[~]$ oc get managedcluster ptnr5 -o yaml | grep labels -A11
labels:
    app.kubernetes.io/instance: clusters-ptnr5
    cluster.open-cluster-management.io/clusterset: default
    clusterID: ca7549b5-9086-49f9-8347-bc4a418a5742
    feature.open-cluster-management.io/addon-cluster-proxy: available
    feature.open-cluster-management.io/addon-config-policy-controller: available
    feature.open-cluster-management.io/addon-governance-policy-framework: available
    feature.open-cluster-management.io/addon-work-manager: available
    name: ptnr5
    openshiftVersion: 4.10.15
    site: ptnr5-site
    ztp-running: ""
  4. At this point, the policies are deployed in the ManagedCluster namespace ptnr5 and are enforced according to their wave annotation.
[tginer@tginer ~]$ oc get policies -A
NAMESPACE      NAME                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
ptnr5          ztp-install.ptnr5-common-policy-r5n6d   enforce              NonCompliant       7s
ptnr5          ztp-ptnr5.common-policy                 inform               NonCompliant       7s
ztp-install    ptnr5-common-policy-r5n6d               enforce              NonCompliant       68s
ztp-ptnr5      common-policy                           inform               NonCompliant       32m
  5. Inside the spoke cluster, the Operator starts to install. Wait until it reaches the Succeeded state.

succeeded.png

  6. Finally, once the policies are compliant, the ManagedCluster gets the ztp-done label

[tginer@tginer ~]$ oc get managedcluster ptnr5 -o yaml | grep labels -A11
labels:
    app.kubernetes.io/instance: clusters-ptnr5
    cluster.open-cluster-management.io/clusterset: default
    clusterID: ca7549b5-9086-49f9-8347-bc4a418a5742
    feature.open-cluster-management.io/addon-cluster-proxy: available
    feature.open-cluster-management.io/addon-config-policy-controller: available
    feature.open-cluster-management.io/addon-governance-policy-framework: available
    feature.open-cluster-management.io/addon-work-manager: available
    name: ptnr5
    openshiftVersion: 4.10.15
    site: ptnr5-site
    ztp-done: ""
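
The two label checks in the workflow can be folded into one small helper. This is only a sketch: the function name `ztp_state` is made up for illustration and is not part of `oc` or ZTP. It reads ManagedCluster YAML on stdin and looks for the `ztp-running` and `ztp-done` labels seen in the outputs above.

```shell
# ztp_state: hypothetical helper, not part of oc or ZTP.
# Reads ManagedCluster YAML on stdin and prints done, running, or unknown.
ztp_state() {
  # Read stdin once so it can be searched twice.
  _yaml=$(cat)
  if printf '%s\n' "$_yaml" | grep -Eq '^[[:space:]]+ztp-done:'; then
    echo "done"
  elif printf '%s\n' "$_yaml" | grep -Eq '^[[:space:]]+ztp-running:'; then
    echo "running"
  else
    echo "unknown"
  fi
}
```

It could be used as `oc get managedcluster ptnr5 -o yaml | ztp_state`, for example inside a loop that sleeps until the result is `done`.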

policy.png
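
To see from the command line which policies are still pending, the `oc get policies -A` table can be filtered. The helper below is a rough sketch with a made-up name (`noncompliant_policies`). It assumes the column layout shown in step 4, where the compliance state is the fourth whitespace-separated field of each data row.

```shell
# noncompliant_policies: hypothetical helper, not part of oc.
# Reads `oc get policies -A` output on stdin, skips the header row and
# prints namespace/name for each policy reported as NonCompliant.
noncompliant_policies() {
  awk 'NR > 1 && $4 == "NonCompliant" { print $1 "/" $2 }'
}
```

Usage would look like `oc get policies -A | noncompliant_policies`. Empty output means no policy in the table is reported as NonCompliant.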

Troubleshooting policies

If things do not go as neatly as presented in this article, here is a quick reminder of where to look for the logs.

First, look for the cluster-group-upgrades controller pod:

[tginer@tginer ~]$ oc get pods -A | grep cluster-group
openshift-operators  cluster-group-upgrades-controller-manager-644f45bc59-gxgsw        2/2     Running     0      23h

[tginer@tginer ~]$ oc logs -f -c manager cluster-group-upgrades-controller-manager-644f45bc59-gxgsw -n openshift-operators
  • Container manager logs

    ter.open-cluster-management.io", "reconciler kind": "ManagedCluster", "source": "kind source: *v1alpha1.ClusterGroupUpgrade"}
    2023-09-28T12:12:39.117Z	INFO	controller.managedclusterForCGU	Starting Controller	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster"}
    2023-09-28T12:12:39.117Z	INFO	controller.clustergroupupgrade	Starting Controller	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade"}
    2023-09-28T12:12:39.218Z	INFO	controller.clustergroupupgrade	Starting workers	{"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "worker count": 5}
    2023-09-28T12:12:39.218Z	INFO	controller.managedclusterForCGU	Starting workers	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ManagedCluster", "worker count": 1}
    2023-09-28T12:12:39.219Z	INFO	controllers.ManagedClusterForCGU	cluster is ready	{"Name": "local-cluster"}
    2023-09-28T12:12:39.320Z	INFO	controllers.ManagedClusterForCGU	WARN: No child policies found for cluster{"Name": "local-cluster"}
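
The manager log is verbose, so it can help to narrow it down to one cluster. The filter below is a minimal sketch. The name `cgu_log_issues` is hypothetical, and it assumes that problems show up as lines containing `WARN` or `ERROR` together with a `"Name": "<cluster>"` field, as in the last log line above.

```shell
# cgu_log_issues: hypothetical helper, not part of oc or the CGU controller.
# Reads controller logs on stdin and keeps only the lines that mention
# the given cluster name and contain WARN or ERROR.
cgu_log_issues() {
  _cluster=$1
  grep -F "\"Name\": \"$_cluster\"" | grep -E 'WARN|ERROR'
}
```

For example: `oc logs -c manager cluster-group-upgrades-controller-manager-644f45bc59-gxgsw -n openshift-operators | cgu_log_issues local-cluster`.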
    

Apply policies to existing clusters

What if we want to apply new policies to existing clusters? How can we trigger just the policies workflow without creating another site?

Manually create a ClusterGroupUpgrade in the ztp-install namespace

Manually create a CGU resource in the ztp-install namespace so that the new policies under policygentemplate/ in Git are taken into account.

Here is an example that applies a new policy to local-cluster, i.e., the Hub cluster, just for testing purposes:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: local-cluster
  namespace: ztp-install
spec:
  clusters:
  - local-cluster
  managedPolicies:
  - common-ptp-policy
  remediationStrategy:
    maxConcurrency: 1   # remediate one cluster at a time
    timeout: 300        # in minutes
  enable: true          # start remediation as soon as the CGU exists

Creating this resource automatically triggers the first policies in the wave.
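
If the same manifest is needed for several clusters or policies, it can be generated rather than edited by hand. The function below is a sketch under assumptions: `render_cgu` is a made-up name, and the `maxConcurrency` and `timeout` values are simply copied from the example above rather than recommended settings.

```shell
# render_cgu: hypothetical helper that prints a ClusterGroupUpgrade manifest.
# Usage: render_cgu <cluster> <policy> [<policy> ...]
render_cgu() {
  _cluster=$1
  shift
  cat <<EOF
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: ${_cluster}
  namespace: ztp-install
spec:
  clusters:
  - ${_cluster}
  managedPolicies:
EOF
  # One list entry per remaining argument.
  for _policy in "$@"; do
    printf '  - %s\n' "$_policy"
  done
  cat <<EOF
  remediationStrategy:
    maxConcurrency: 1
    timeout: 300
  enable: true
EOF
}
```

The output can be reviewed first and then piped to the cluster, for instance `render_cgu local-cluster common-ptp-policy | oc apply -f -`. Progress can then be followed with `oc get clustergroupupgrades -n ztp-install`.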

Wrap up

The Policy Generator offers RAN solutions a full list of templated CRs that can later be easily customized and grouped for each target cluster or group of clusters.

Monitoring how the policy hierarchy and waves work is a step-by-step process. In this post, I tried to follow how the policies are enforced one after another until the final state of the spoke cluster matches the one defined in the GitOps source. Hope it helped!
