Unavailability of cloud services for customers running on one cluster in North Europe
Updates
Dear valued Jedox customers,
As communicated earlier, and as part of our standard postmortem process, we are sharing the root cause of the issue experienced by several customers with environments running in the cloud region North Europe.
From 2025-01-17 at 01:28 UTC until 2025-01-17 at 11:00 UTC, the containers were continuously restarting and attempting to pull their images again on each restart, which caused one cluster in the affected cloud region to reach its image pull limits.
Due to this error, all affected customers experienced unavailability of their Jedox services as their instances were not responsive.
Root cause: during the investigation after the incident, our engineers found that the image pull limits were reached after a regular maintenance activity. As a consequence, Jedox workloads were also not scheduled, which triggered the incident.
Jedox engineers took immediate remediation action during the incident by retagging all workload images, which restored services for the affected customers.
Additional corrective and preventive actions will be taken:
- Image pull policies on all workloads will be adjusted to new values to ensure that such outages do not recur and directly affect our customers.
- Stricter auditing of workloads deployed into the production environments will be enforced.
- The communication via https://status.jedox.cloud/ and the incident response process will also be reviewed to improve the incident communication time.
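To illustrate why the image pull policy matters in a restart loop, here is a minimal sketch in Python. It is a toy simulation and not Jedox's actual configuration: the `Registry` and `Node` classes, the policy names `"Always"` and `"IfNotPresent"`, the image name `app:1.0` and the limit values are all assumptions chosen for illustration, since this notice does not state the platform or the real limits.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry that rejects pulls once a fixed quota is used up."""
    pull_limit: int  # hypothetical quota, not a real provider value
    pulls: int = 0

    def pull(self, image: str) -> bool:
        if self.pulls >= self.pull_limit:
            return False  # pull limit reached, request rejected
        self.pulls += 1
        return True


@dataclass
class Node:
    """Toy node that starts containers under a given pull policy."""
    registry: Registry
    policy: str  # "Always" or "IfNotPresent" (assumed policy names)
    cache: set = field(default_factory=set)

    def start_container(self, image: str) -> bool:
        # With "IfNotPresent", a locally cached image is reused without a pull.
        if self.policy == "IfNotPresent" and image in self.cache:
            return True
        if not self.registry.pull(image):
            return False  # container cannot start without its image
        self.cache.add(image)
        return True


def simulate(policy: str, restarts: int, pull_limit: int) -> tuple[int, int]:
    """Return (successful starts, registry pulls) for a restart loop."""
    registry = Registry(pull_limit)
    node = Node(registry, policy)
    ok = sum(node.start_container("app:1.0") for _ in range(restarts))
    return ok, registry.pulls


if __name__ == "__main__":
    for policy in ("Always", "IfNotPresent"):
        starts, pulls = simulate(policy, restarts=10, pull_limit=3)
        print(f"{policy}: {starts} successful starts, {pulls} pulls")
```

In this sketch, pulling on every restart uses up the quota after three starts, while reusing the cached image needs only a single pull for all ten.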
We apologize again for any inconvenience caused to your business. Jedox remains committed to driving corrective actions to improve the stability and availability of our customers’ services.
If you have any further questions or concerns, please do not hesitate to contact our Support Team via the Jedox Customer Portal.
Thank you for your continued partnership and trust!
Dear valued customers,
Since our previous communication earlier today, no further reports of the problem have reached our support team, so we consider it completely resolved.
We will share the details of the Postmortem via this communication in 7 days.
We apologize again for any inconvenience caused. Jedox remains committed to driving corrective actions to avoid future recurrences.
Thank you for your continued partnership and patience!
Dear valued customers,
We would like to inform you that all affected instances have been restored by our engineers, and services should now be up and running.
Our engineers will monitor the situation for the next 4 hours, and if it stays stable, we will close this communication afterwards.
A postmortem will be prepared, and its details will be shared here within 7 days.
We apologize again for any inconvenience caused. Jedox remains committed to driving corrective actions to avoid future recurrences.
If you have any further questions or concerns, please do not hesitate to contact our Support Team via the Jedox Customer Portal.
Thank you for your continued support.
Dear valued customers,
We are experiencing availability issues for customers running on one cluster in the cloud region North Europe. Not all customers in this region are affected.
The outage appears to have started on 2025-01-17 at 01:00 UTC.
Affected users are currently unable to use their instances, which are unresponsive.
We are in touch with the engineers from our Cloud Service Provider to identify the root cause of the outage and apply the fix.
We want to assure you that our engineers are handling this incident with the highest urgency.
We apologize for any inconvenience this may cause and appreciate your understanding as we are working to improve our services.
If you have any further questions or concerns, please do not hesitate to contact our Support Team via the Jedox Customer Portal.
Thank you for your continued support.