The Problem

After installing the Rook operator and getting a Ceph cluster up and running on my favourite Kubernetes distro (OpenShift/OKD), I was unable to access the Ceph dashboard.

Here’s the error that curl reports:

nanibot@NaniBots-Mac-mini ~ % curl -vvv https://rook-ceph-dashboard-rook-ceph.apps.openshift.internal.nanibot.net/
* Host rook-ceph-dashboard-rook-ceph.apps.openshift.internal.nanibot.net:443 was resolved.
* IPv6: (none)
* IPv4: 192.168.1.103
*   Trying 192.168.1.103:443...
* Connected to rook-ceph-dashboard-rook-ceph.apps.openshift.internal.nanibot.net (192.168.1.103) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to rook-ceph-dashboard-rook-ceph.apps.openshift.internal.nanibot.net:443
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to rook-ceph-dashboard-rook-ceph.apps.openshift.internal.nanibot.net:443

Looking into the logs

The Rook operator has the following message related to the Ceph dashboard module:

error: 2024-07-20 09:19:20.017024 E | op-mgr: failed modules: "dashboard". failed to initialize dashboard: failed to create a self signed cert for the ceph dashboard: failed to create self signed cert on mgr: exec timeout waiting for the command ceph to return

Clearly, the Rook operator has failed to initialize the dashboard module. But why does this happen?
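
If you want to pull that line out of the operator log yourself, a small filter like the one below is a rough sketch. The namespace rook-ceph and the deployment name rook-ceph-operator are assumptions based on a default Rook install, so adjust them (or swap kubectl for oc) to match your cluster.

```shell
# Keep only the operator log lines that report failed mgr modules.
# Reads the log on stdin, so it works with any way of fetching the log.
failed_mgr_modules() {
  grep -F 'op-mgr: failed modules'
}

# Usage against a live cluster (namespace and deployment name are assumptions):
#   kubectl -n rook-ceph logs deploy/rook-ceph-operator | failed_mgr_modules
```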

Debugging

File: pkg/operator/ceph/cluster/mgr/dashboard.go

Function: initializeSecureDashboard

This is where the relevant error string, “failed to create a self signed cert for the ceph dashboard”, comes from.

It looks like the operator times out if it’s unable to initialize the dashboard module. It’s also possible that the admin user for the Ceph dashboard never gets created (it was, in my case).

Since the operator cannot wait indefinitely, it carries on with other tasks and marks the dashboard module initialization stage as failed.
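
To make the "give up after a deadline and carry on" behaviour concrete, here is the same pattern as a shell sketch. This is not Rook's code (Rook is written in Go, and I haven't checked what deadline it actually uses). The helper name run_with_deadline is hypothetical, and it assumes the GNU coreutils timeout command, which exits with status 124 when the deadline is hit.

```shell
# Run a command with a deadline instead of blocking forever.
# Returns the command's exit status, or 124 if the deadline was hit.
run_with_deadline() {
  seconds="$1"
  shift
  status=0
  timeout "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # Message modelled on the operator log line, for illustration only.
    echo "exec timeout waiting for the command $1 to return" >&2
  fi
  return "$status"
}

# A command that takes too long is abandoned, and the caller moves on.
run_with_deadline 1 sleep 5 || echo "step marked as failed, carrying on"
```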

Solution

No problem! We’re going to create the cert ourselves using the rook-ceph-tools pod (enable it in the Helm chart if you haven’t already… it’s a lifesaver).

Here are the steps to follow (they can all be run from the rook-ceph-tools pod):

  1. Run ‘ceph dashboard create-self-signed-cert’ to generate the certificate.

  2. Create a user with the administrator role using ‘ceph dashboard ac-user-create <user-name> -i <file-with-password> administrator’.
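
If you’d rather run both steps from your workstation than open a shell in the pod, something like the sketch below should work. It makes several assumptions that are not from my setup notes: the namespace is rook-ceph, the toolbox deployment is called rook-ceph-tools, the user name admin and the path /tmp/dashboard-password are placeholders, and the password file already exists inside the tools pod. By default it only prints the commands; set KUBECTL=kubectl (or KUBECTL=oc) to actually run them.

```shell
# Dry run by default: the commands are echoed instead of executed.
KUBECTL="${KUBECTL:-echo kubectl}"
NAMESPACE="${NAMESPACE:-rook-ceph}"   # assumption: default Rook namespace

# Run a ceph command inside the rook-ceph-tools pod.
ceph_in_toolbox() {
  # $KUBECTL is left unquoted on purpose so "echo kubectl" splits into two words.
  $KUBECTL -n "$NAMESPACE" exec deploy/rook-ceph-tools -- ceph "$@"
}

# Step 1: generate the certificate. Step 2: create the administrator user.
fix_dashboard() {
  user="$1"
  password_file="$2"   # path inside the tools pod, not on your machine
  ceph_in_toolbox dashboard create-self-signed-cert &&
    ceph_in_toolbox dashboard ac-user-create "$user" -i "$password_file" administrator
}

fix_dashboard admin /tmp/dashboard-password
```

If the admin user already exists (as it did in my case), the second command is unnecessary and can be skipped.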

After following the above steps, you should (hopefully) be able to get into the Ceph dashboard!

Or… you know… just restart the rook-operator pod… Wait, we don’t really have to perform all the above steps? Restarting the rook-operator pod should fix it? I don’t know… I didn’t try it out…
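
For anyone who wants to test the restart theory, this is roughly what it would look like. Again, untested by me, and the namespace rook-ceph and deployment name rook-ceph-operator are assumptions from a default Rook install. It prints the command by default; set KUBECTL=kubectl (or KUBECTL=oc) to run it for real.

```shell
# Dry run by default: the command is echoed instead of executed.
KUBECTL="${KUBECTL:-echo kubectl}"
NAMESPACE="${NAMESPACE:-rook-ceph}"   # assumption: default Rook namespace

# Restart the operator so it goes through the dashboard setup again.
restart_operator() {
  $KUBECTL -n "$NAMESPACE" rollout restart deploy/rook-ceph-operator
}

restart_operator
```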