What's new in Policy Reporter

Foreword

A lot has happened since I first developed Policy Reporter over a year ago and released it as an open-source project. With the donation to the Kyverno organization, adoption grew continuously and the project steadily received important feedback.

Policy Reporter has grown with the help of this feedback and has seen important improvements. In this blog post, I'd like to share the latest and most important of these enhancements and features, together with specific use cases, to give an overview of the functionality and its potential to improve your security monitoring.

Introduction

Policy Reporter adds observability and monitoring capabilities to your cluster security, based on the PolicyReport CRD introduced by the Kubernetes Policy Working Group and used by security tools like Kyverno.

It processes each (Cluster)PolicyReport resource in the cluster and checks each update for new violations, sending them to configured targets like Grafana Loki, Slack, MS Teams and more. It also provides metrics and a flexible REST API for your PolicyReportResults.
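
For example, a minimal target configuration in the Helm chart's values.yaml might look like the following sketch (the Loki address is an assumption for your environment):

loki:
  host: http://loki:3100   # push new results to this Loki instance
  minimumPriority: warning # only send results with priority warning or higher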

The REST API is used by the optional Policy Reporter UI, a standalone NuxtJS-based dashboard that provides different views of the current status of your PolicyReportResults. Different filters and groupings help you find the information you are interested in.

Policy Engine Support

Although Policy Reporter is in most cases used together with Kyverno, it is not limited to it. Since the PolicyReport CRD is a generic CRD for different kinds of policy engines, Policy Reporter works with any tool that supports the CRD, directly or indirectly. Over the last year, not only has Policy Reporter evolved, but support for the PolicyReport CRD has also grown, so it is now possible to visualize PolicyReportResults from tools like Falco, Kube Bench, Trivy (Operator), Tracee, jsPolicy and, of course, Kyverno in the same dashboard.

The integration of these tools varies and depends on the type of PolicyReport support. Some have native support and some require additional tools/adapters and setup. I will provide some links to check the different integrations.
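
Regardless of the producing tool, all results end up in the same generic structure. A minimal, purely illustrative PolicyReport, as an adapter for a tool like Trivy might create it, could look like this sketch (names and values are assumptions):

apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: trivy-vuln-polr-nginx # illustrative name
  namespace: default
results:
- policy: CVE-2022-1234 # illustrative vulnerability ID
  message: package xyz is affected by a known vulnerability
  result: warn
  severity: high
  source: Trivy Vulnerability # the source separates results per tool in the UI
  resources:
  - kind: Pod
    name: nginx
    namespace: default
summary:
  warn: 1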

Example

The following screenshot shows a Policy Reporter UI setup with integration of Kyverno and the Trivy Operator CRDs.

Policy Reporter UI with Dashboards for Kyverno and Trivy Operator

Multi Cloud & Multi Tenancy Features

Today, it is very common for organizations to run multiple clusters and have different teams responsible for specific namespaces or projects within a cluster. Policy Reporter provides a number of new features to accommodate these different responsibilities and reduce the effort required to switch between multiple clusters.

Target Channels and Filters

One of the first features of Policy Reporter was the aggregation of various notifications to push to tools like Slack, Discord or Grafana Loki. Initially, it was only possible to configure a single message channel or instance, with no restrictions on the notifications sent. This led to several challenges in a multi-tenancy cluster, such as unexpected notifications about violations that you shouldn't see or don't care about.

Version 2.5.0 of Policy Reporter introduced the channel and filter features for targets to solve this problem. Channels make it possible to configure multiple push destinations for the same type of target. Filters, in addition, allow you to restrict the notifications sent to a channel based on namespace, policy, priority or source, with either exclusion or inclusion rules.

Example

A common example, also shown in the documentation, is Slack notifications. Imagine you have a product with a microservice architecture running in your cluster, and different teams have dedicated responsibilities for a subset of the microservices. You don't want a single Slack channel for all new violations, because each team is only interested in the violations related to the services it owns. High-frequency pushes about other parts of the product may annoy the developers until they start to ignore them altogether.

With dedicated channels, restricted to the team namespaces, teams only get pushes they are interested in, and the frequency of pushes is reduced to a minimum. You can also play with the priority filter to further reduce the pushes to really critical violations.

slack:
webhook: "https://hooks.slack.com/services/T0..."
  minimumPriority: "warning"
  skipExistingOnStartup: true
  channels:
  - name: Team A
    webhook: "https://hooks.slack.com/services/T1..."
    source: [Kyverno]
    filter:
      namespaces:
        include: ["finance", "subscriptions"]
  - name: Team B
    webhook: "https://hooks.slack.com/services/T2..."
    source: [Kyverno]
    filter:
      namespaces:
        include: ["email", "translation"]

The configuration above shows a central root channel for the infrastructure team that will receive all violations with a priority of warning or higher. For our product teams, we added two channels. The first channel, for Team A, is restricted to violations from Kyverno in the namespaces finance and subscriptions. For Team B, we have a second channel with Kyverno violations for the namespaces email and translation.

In this setup, we reduced the notifications for the teams by selecting the team-specific namespaces. They will also only get notifications with a priority of warning or higher, because this setting is inherited from the root configuration.
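
If a team only wants to be notified about the most severe findings, the same mechanism can be tightened further. A sketch, assuming the filter supports priority rules in the same include/exclude form as namespaces:

slack:
  webhook: "https://hooks.slack.com/services/T0..."
  channels:
  - name: Team A (critical only)
    webhook: "https://hooks.slack.com/services/T1..."
    filter:
      namespaces:
        include: ["finance", "subscriptions"]
      priorities:
        include: ["critical"] # drop everything below critical for this channel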

E-Mail Reports

In addition to improving the multi-tenant user experience, email reports were introduced to simplify the use of Policy Reporter in a multi-cluster environment. The intent was to provide the operator with a daily overview of PolicyReportResults from all clusters through a different notification method.

You can choose between a summary report, with the number of results per status and source, or a violations summary with additional details about the failed results, like resource, policy and rule. Both are implemented as CronJobs with a customizable schedule and can be sent as often as needed to a list of configured emails.
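
A sketch of the summary variant, assuming it is configured analogously to the violations report shown in the example below (SMTP settings omitted):

emailReports:
  clusterName: Production
  summary:
    enabled: true
    schedule: "0 8 * * *" # daily at 08:00 AM
    to: ['operator@email.com']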

Example

As the operator of three clusters, I want to receive daily updates on the current status and violations of my Kyverno validation results for existing resources. To this end, I have installed Policy Reporter on each cluster and configured email reports, each with a unique cluster name, to the same email address, scheduled for 08:00 AM.

Production Cluster Configuration

emailReports:
  clusterName: Production
  smtp:
    host: smtp.server.com
    port: 465
    username: policy-reporter@company.org
    password: password
    from: policy-reporter@company.org
    encryption: ssl/tls
  violations:
    enabled: true
    schedule: "0 8 * * *"
    to: ['operator@email.com']

Development Cluster Configuration

emailReports:
  clusterName: Development
  smtp:
    host: smtp.server.com
    port: 465
    username: policy-reporter@company.org
    password: password
    from: policy-reporter@company.org
    encryption: ssl/tls
  violations:
    enabled: true
    schedule: "0 8 * * *"
    to: ['operator@email.com']

Playground Cluster Configuration

emailReports:
  clusterName: Playground Cluster
  smtp:
    host: smtp.server.com
    port: 465
    username: policy-reporter@company.org
    password: password
    from: policy-reporter@company.org
    encryption: ssl/tls
  violations:
    enabled: true
    schedule: "0 8 * * *"
    to: ['operator@email.com']

An example email looks like this and gives me, as the operator, an overview of the current status and where I have to take a closer look.

Policy Reporter Violation Report for Playground Cluster

Policy Reporter UI - Multi Cluster Support

While Policy Reporter UI was intended as a simple and low-cost alternative to monitoring solutions like Prometheus and Grafana setups, it is used more and more often in production environments. These environments have different requirements than small playground environments.

One of these requirements was the ability to use a single Policy Reporter UI in a multi-cluster environment instead of switching between one dashboard per cluster. This was also perceived as more useful than email reports because of the additional details and near real-time feedback on changes.

This user request was addressed with version 1.6.0 of the Policy Reporter UI, which added the ability to configure additional Policy Reporter REST APIs from external clusters and switch between them in a single Policy Reporter UI. The results are still strictly divided by cluster, but you can easily and quickly change the selected API.

Important Note

Make sure that your APIs are not accessible from the outside world! Use tools like VPNs, private networks or internal network load balancers to expose your APIs to the UI in a secure way.

Example

In this scenario we have two clusters: our Development cluster, named Default, and our K3S Cluster. To get feedback about the current status of our Kyverno validation at any time, we decided to use the Policy Reporter UI instead of email reports. We want to use a single UI instance instead of switching between two completely different installations.

K3S Cluster

Because we will use our Development cluster to set up the UI, we only need Policy Reporter with the REST API enabled on the K3S Cluster. We also expose the REST API on an internal-only IP, so that it can be accessed from within the Development cluster.

If we also want information about our Kyverno policies, we can enable the KyvernoPlugin with its REST API and configure it as well.

rest:
  enabled: true # expose the Policy Reporter REST API

kyvernoPlugin:
  enabled: true # deploy the KyvernoPlugin, including its own REST API
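
To expose the API internally, one option is an internal load balancer. A sketch, assuming the chart exposes service.type and service.annotations values and that your provider supports internal load balancers (the annotation shown is GKE-specific and an assumption; use your provider's equivalent):

service:
  type: LoadBalancer
  annotations:
    networking.gke.io/load-balancer-type: "Internal" # provider-specific, assumption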

Development Cluster

Our Development Cluster will host the Policy Reporter UI. So we need to enable it and configure the K3S Cluster as an additional external API.

ui:
  enabled: true

  clusterName: Default # standard name for the API of the current cluster.

  clusters:
  - name: K3S Cluster
    api: http://10.0.0.1 # refers to service/policy-reporter on port 8080 of our K3S Cluster
    kyvernoApi: http://10.0.0.2 # refers to service/policy-reporter-kyverno-plugin on port 8080 of our K3S Cluster

Metric Customization

The metrics endpoint was designed to provide as much information about PolicyReportResults as possible. This has the downside of high cardinality, since it provides one metric per result. To make this more flexible and reduce the metrics to a bare minimum, Policy Reporter version 2.7.0 introduced filters and label customization.

Filters

Filters make it possible to define exclude or include rules to filter results by status, severity, namespace, source and policy.

Label Customization

To reduce the cardinality, it is possible to reduce the number of labels and details. This can be achieved by setting metrics.mode to custom and defining the labels you want under metrics.customLabels. Supported labels are namespace, rule, policy, report, kind, name, status, severity, category and source.

Example

Because we are using the Policy Reporter UI for details about our PolicyReportResults, we only want to use metrics for alerting and basic information about fail and error results. So, we define filters for the statuses we are interested in and reduce the labels to namespace, status, source and policy.

metrics:
  enabled: true
  mode: custom
  customLabels: ["namespace", "policy", "status", "source"]
  filter:
    status:
      include: ['fail', 'error']

Metric Example

Basic metrics with the provided configuration would look like this:

# HELP cluster_policy_report_result Gauge of Results by Policy
# TYPE cluster_policy_report_result gauge
cluster_policy_report_result{policy="require-ns-labels",source="Kyverno",status="fail"} 2
# HELP policy_report_result Gauge of Results by Policy
# TYPE policy_report_result gauge
policy_report_result{namespace="argo-cd",policy="disallow-capabilities-strict",source="Kyverno",status="fail"} 6
policy_report_result{namespace="argo-cd",policy="disallow-privilege-escalation",source="Kyverno",status="fail"} 6
policy_report_result{namespace="argo-cd",policy="require-run-as-nonroot",source="Kyverno",status="fail"} 4
policy_report_result{namespace="argo-cd",policy="restrict-seccomp-strict",source="Kyverno",status="fail"} 6

Observe blocked requests

By default, Kyverno provides (Cluster)PolicyReports based on audit validation results. This is useful to gain initial insight into existing resources or to validate common best practices, but for actual security risks you really want to enforce your policies.

While you can get PolicyReports about policy enforcement for resources that are already created, you cannot get PolicyReports about blocked requests. In this case, you need to check your Kubernetes events. To simplify this, the KyvernoPlugin of Policy Reporter provides the ability to create (Cluster)PolicyReports based on the mentioned Kubernetes events. This makes it possible to use all the features of Policy Reporter, such as notifications, as soon as a request has been blocked by a validation policy. By default, these PolicyReports use Kyverno Event as their source, to keep them separate from the audit validation results. This can be customized.

Example

We are using an enforce policy to block resources created in the default namespace. To observe how often this happens and whether anything breaks with this change, we want to receive Slack notifications as soon as a request is blocked in the default namespace.
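
A simplified sketch of such an enforce policy, modeled on the disallow-default-namespace policy visible in the generated report below:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace
spec:
  validationFailureAction: enforce # block non-compliant requests instead of auditing
  rules:
  - name: validate-namespace
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: Using 'default' namespace is not allowed.
      pattern:
        metadata:
          namespace: "!default"

The corresponding Policy Reporter configuration enables block reports and a dedicated Slack channel for the default namespace: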

kyvernoPlugin:
  blockReports:
    enabled: true
    source: 'Kyverno Event'

slack:
  webhook: 'https://hooks.slack.com/services/T0...'
  minimumPriority: 'warning'
  skipExistingOnStartup: true
  source: ['Kyverno Event']
  filter:
    namespaces:
      include: ['default']

Example Report

PolicyReport generated by the KyvernoPlugin

apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  labels:
    managed-by: policy-reporter-kyverno-plugin
  name: polr-ns-default-blocked
  namespace: default
results:
- category: Multi-Tenancy
  message: Using 'default' namespace is not allowed.
  policy: disallow-default-namespace
  properties:
    eventName: disallow-default-namespace.171689c29fa22a8a
    resultID: 917e39a77a23dbdc8a2e5f3d372548bf688ce535
  resources:
  - kind: Pod
    name: nginx
    namespace: default
  result: fail
  rule: validate-namespace
  severity: medium
  source: Kyverno Event
  timestamp:
    nanos: 0
    seconds: 1663668581
summary:
  error: 0
  fail: 1
  pass: 0
  skip: 0
  warn: 0

High Availability Setup

To enable scaling and support HA setups in production clusters, Policy Reporter uses leader election to ensure that only one pod sends target notifications. Other features, like metrics and the REST API, are load balanced across all pods.

The same goes for the KyvernoPlugin, which uses leader election only for the block reports feature, to ensure that only one instance is responsible for creating and updating PolicyReports for blocked requests.

The Policy Reporter UI does not need leader election because it is stateless, except for the Logs page. If you use this feature, it is recommended to use Redis as a centralized log store to share the pushed logs between all Policy Reporter UI instances.

Example

To install Policy Reporter via Helm in an HA setup, you simply need to configure a replicaCount higher than one for each component. If you use the Logs page feature, you can also configure a shared Redis cache as mentioned above.

replicaCount: 3

kyvernoPlugin:
  enabled: true
  replicaCount: 3
  blockReports:
    enabled: true

ui:
  enabled: true
  replicaCount: 3
  redis:
    enabled: true
    address: redis:6379

Policy Reporter CLI

The CLI is currently under development and is an alternative to the Policy Reporter UI. You can list or search your PolicyReportResults with different filter and grouping options. You can use it as a kubectl plugin, which makes it possible to switch easily between your clusters.

fjogeleit/policy-reporter-cli - GitHub

Example

To check our failing results against the Pod Security Standards, we can list our results with a filter for result and category. We can list them in the current namespace, in selected namespaces, or across all namespaces.

kubectl polr results list --category 'Pod Security Standards (Restricted)' --result fail -n argo-cd
NAMESPACE KIND        NAME                                  POLICY                        RULE                         RESULT
argo-cd   StatefulSet argo-cd-argocd-application-controller disallow-capabilities-strict  autogen-require-drop-all     fail
argo-cd   StatefulSet argo-cd-argocd-application-controller disallow-privilege-escalation autogen-privilege-escalation fail
argo-cd   StatefulSet argo-cd-argocd-application-controller require-run-as-nonroot        autogen-run-as-non-root      fail
argo-cd   StatefulSet argo-cd-argocd-application-controller restrict-seccomp-strict       autogen-check-seccomp-strict fail
argo-cd   Deployment  argo-cd-argocd-dex-server             disallow-capabilities-strict  autogen-require-drop-all     fail
argo-cd   Deployment  argo-cd-argocd-dex-server             disallow-privilege-escalation autogen-privilege-escalation fail
argo-cd   Deployment  argo-cd-argocd-dex-server             require-run-as-nonroot        autogen-run-as-non-root      fail
argo-cd   Deployment  argo-cd-argocd-dex-server             restrict-seccomp-strict       autogen-check-seccomp-strict fail
...

Security Improvements

To use features like push notifications or email reports, it is necessary to configure the related credentials. By default, these are persisted together with the whole configuration as a single Secret. This makes it very inconvenient to share your configuration or to follow GitOps principles, where you want to push your configuration into your Git repositories.

Target Credentials

This can now be achieved with the new secretRef configuration for each target, including channels. This property can reference an already existing secret within the same namespace that contains the required credentials, instead of configuring them directly.

Depending on the related target, it supports values for host, webhook, username, password, accessKeyID, secretAccessKey and token. Each key is mapped to the related configuration of the target; the only exception is token, which is used as the Authorization header for webhook targets to support authenticated API access.

Example

Instead of configuring our Slack webhook directly in our values.yaml, we want to use a secretRef to avoid exposing it in our GitHub repository.

Configuration

slack:
  secretRef: 'slack-webhook'
  minimumPriority: 'warning'
  skipExistingOnStartup: true

Referenced Secret

apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
type: Opaque
data:
  webhook: aHR0cHM6Ly9ob29rcy5zbGFjay5jb20vc2VydmljZXMvVDAuLi4= # https://hooks.slack.com/services/T0...
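
The token key works analogously for webhook targets; a sketch, assuming a custom API behind an illustrative URL:

webhook:
  secretRef: 'webhook-credentials'
  minimumPriority: 'warning'

Using stringData instead of data in the referenced Secret spares the manual base64 encoding; Kubernetes encodes the values on write:

apiVersion: v1
kind: Secret
metadata:
  name: webhook-credentials
type: Opaque
stringData:
  host: https://api.company.org/policy-reporter # illustrative endpoint
  token: Bearer xyz                             # sent as Authorization header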

E-Mail SMTP Credentials

A similar feature is also provided for the SMTP configuration of the email reports. Instead of configuring it directly in the values.yaml, you can reference an already existing secret via the smtp.secret property. It supports all possible emailReports.smtp keys, and it is also possible to mix secret configuration and direct configuration, where secret values have a higher priority than the directly provided ones.

Example

As with our Slack webhook, we want to take a similar approach for our SMTP configuration.

Configuration

emailReports:
  clusterName: Playground Cluster
  smtp:
    secret: smtp-config
  violations:
    enabled: true
    schedule: "0 0 * * *"
    to: ['devOps@organization.com']

SMTP Secret

apiVersion: v1
kind: Secret
metadata:
  name: smtp-config
type: Opaque
data:
  encryption: c3NsL3Rscw==                    # ssl/tls
  from: cG9saWN5LXJlcG9ydGVyQGdtYWlsLmRl      # policy-reporter@gmail.de
  host: c210cC5zZXJ2ZXIuY29t                  # smtp.server.com
  password: cGFzc3dvcmQ=                      # password
  port: NDY1                                  # 465
  username: cG9saWN5LXJlcG9ydGVyQGdtYWlsLmRl  # policy-reporter@gmail.de

Policy Reporter in action

If you want to see most of the presented features in action, you can also watch Episode 58 | CloudNativeFM with Saim on YouTube.

Conclusion

With amazing community feedback and adoption, Policy Reporter quickly became a great addition to the Kyverno ecosystem. Compared to its early days, it now has a wide range of features and customization options for many different use cases. It also runs in a wide variety of clusters these days and works in both small and large production environments.

There is always room for improvement, and I am happy to hear about your use cases and feedback.