Login Issues Affecting App Access

Incident Report for Kameleoon

Postmortem

Summary

Between June 10, 2026 at 8:45 PM CEST and June 11, 2026 at 1:00 AM CEST, Kameleoon experienced a production incident that caused significant instability and temporarily prevented some customers from accessing the application interface.

The incident was caused by an unexpected overload on our production database. Our investigation identified that the overload originated from a cache invalidation and request amplification issue in one of the services responsible for rebuilding customer configuration data.

The incident has been resolved, the root cause has been identified, and mitigations have been deployed.

Impact

Affected customers may have experienced:

  • Temporary inability to access the Kameleoon application interface
  • Intermittent platform errors
  • Delays or failures when loading parts of the application

The issue affected the interface used to manage experiments, feature flags, results, and platform configuration.

Experience delivery, visitor tracking, data collection, feature flag evaluation, and experiment execution were not impacted.

Timeline

June 10, 2026, 8:45 PM CEST

  • The incident began following a cache expiration and service restart sequence.
  • The affected service started rebuilding configuration data, generating an unexpectedly high number of backend requests.

During Incident Investigation

  • Engineering teams identified abnormal database load and service instability.
  • Emergency mitigation actions were applied to reduce database pressure and restore platform stability.

June 11, 2026, 1:00 AM CEST

  • The affected service successfully rebuilt and cached the required configuration.
  • Database load returned to normal levels and the platform stabilized.

Root Cause Analysis

The incident was caused by a rare combination of events where multiple cache layers became unavailable at the same time.

This forced a service responsible for rebuilding customer configuration data to repeatedly request large configuration payloads from backend systems. Due to inefficient pagination and insufficient protection against repeated retries, this created a request amplification loop that generated excessive database load.

This database overload caused instability across parts of the platform that depend on healthy database connectivity.

Resolution

The incident was resolved by reducing database pressure and allowing the affected configuration data to be successfully rebuilt and cached.

As part of the mitigation, we:

  • Reduced backend database pressure
  • Added missing database indexes
  • Increased cache duration
  • Improved pagination behavior
  • Re-enabled additional cache protection

After these changes, the platform stabilized and normal access was restored.

Preventive & Remediation Measures

To reduce the likelihood and impact of similar incidents in the future, we have initiated the following actions:

Immediate Actions

  • Improved database query performance through additional indexes
  • Increased cache duration to reduce unnecessary rebuilds
  • Increased pagination size to reduce backend request volume
  • Re-enabled additional caching protection

Planned Improvements

  • Strengthen circuit breaker behavior to better handle partial failure scenarios
  • Add alerting on abnormal configuration rebuild traffic
  • Improve cache warm-up strategy during deployments
  • Add protection against request amplification loops
  • Stress-test configuration rebuild flows with large customer configurations

Ongoing Monitoring

We are continuing to closely monitor the platform to confirm long-term stability and validate the effectiveness of the remediation measures.

Conclusion

We sincerely apologize for the disruption caused by this incident.

We understand that access to the Kameleoon application is critical for our customers to manage experiments, feature flags, reporting, and configuration. Platform availability and reliability remain our top priority, and we are taking immediate actions to strengthen the resilience of our systems and prevent similar incidents from happening again.

Posted Jun 12, 2026 - 18:55 CEST

Resolved

We have released a patch to address the issue that was preventing customers from accessing the application.

Our team is closely monitoring the situation to ensure the service remains stable. The root cause of the unexpected database load has not yet been fully identified, and our investigation is ongoing.

We sincerely apologize for the impact caused by this incident. We recognize that we have experienced several incidents recently, and we are taking immediate action to improve platform availability and prevent similar issues from happening again.

We will share a post-mortem as soon as possible.
Posted Jun 11, 2026 - 01:14 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 11, 2026 - 01:01 CEST

Identified

We are continuing to investigate the issue. Initial findings indicate the issue may be related to elevated database load impacting authentication services. Our engineering team is actively working to reduce the impact, identify the root cause, and restore normal access as quickly as possible.

We will provide another update as soon as we have more information.

Thank you for your patience.
Posted Jun 11, 2026 - 00:18 CEST

Investigating

We’re currently experiencing an issue that prevent users from logging in to the app. Our team is investigating the cause and working to restore access as quickly as possible. We’ll provide updates as soon as more information is available.
Posted Jun 10, 2026 - 21:22 CEST
This incident affected: Campaign Management (core service) (app.kameleoon.com (back-office)).