Post-Mortem: Temporary Image Unavailability After Storage Migration
Start Time: 19 Oct 2025, 14:30 CEST
End Time: 20 Oct 2025, 11:00 CEST
Overview
During a planned migration to a new storage system, a small number of newly uploaded customer images were temporarily unavailable. Core application functionality remained operational throughout. Access to all affected files was restored through a fallback mechanism that reads from the previous storage.
Root Cause Analysis
- The storage migration involved copying existing files to the new system.
- At the time of the switch to the new storage, some recently uploaded files had not yet been synchronized to it.
- The content cache did not serve these files as expected.
- As a result, requests for these recent images occasionally failed until the files could be retrieved from the previous storage system.
Resolution Steps
- Confirmed that the missing files existed on the previous storage.
- Connected both storage systems to the application layer.
- Implemented a fallback rule: if a file is not found on the new storage, automatically read it from the previous storage.
- This restored full access to all images.
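
The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the code that was deployed: the Storage interface, the DictStorage stand-in, and the FallbackStorage class are all hypothetical names, and the real storage clients will differ.

```python
from typing import Optional, Protocol


class Storage(Protocol):
    """Hypothetical minimal storage interface (assumption, not the real API)."""

    def get(self, key: str) -> Optional[bytes]:
        ...


class DictStorage:
    """In-memory stand-in used only to make this sketch runnable."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = dict(files)

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is not present.
        return self._files.get(key)


class FallbackStorage:
    """Read from the new storage first, then from the previous storage."""

    def __init__(self, primary: Storage, fallback: Storage) -> None:
        self.primary = primary
        self.fallback = fallback

    def get(self, key: str) -> Optional[bytes]:
        data = self.primary.get(key)
        if data is not None:
            return data
        # Not found on the new storage: read it from the previous storage.
        return self.fallback.get(key)


# Example: "b.jpg" was uploaded recently and has not been synchronized yet.
old = DictStorage({"a.jpg": b"old-a", "b.jpg": b"old-b"})
new = DictStorage({"a.jpg": b"new-a"})
store = FallbackStorage(primary=new, fallback=old)
```

In this sketch a file present on both systems is always served from the new storage, and a file missing from both still yields None, so a genuine "not found" remains distinguishable from a synchronization gap.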
Impact
- Some customer images were temporarily unavailable.
- Core platform functionality and other data were not affected.
- No data loss occurred.
Preventive Actions
- Use incremental synchronization during migrations where new files continue to be uploaded.
- Validate caching and fallback behavior ahead of cutover.
- Add verification steps to confirm full data availability before switching primary storage.
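
As an illustration of the incremental synchronization and pre-cutover verification listed above, the following sketch compares the two systems and copies whatever is still missing in repeated passes. It assumes, purely for demonstration, that each storage can be represented as a dictionary of key to content; find_unsynced and sync_until_complete are hypothetical helper names, not part of any existing tooling.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare file contents."""
    return hashlib.sha256(data).hexdigest()


def find_unsynced(old_files: dict[str, bytes], new_files: dict[str, bytes]) -> list[str]:
    """Return keys missing from the new storage or stored with different content."""
    return sorted(
        key
        for key, data in old_files.items()
        if key not in new_files or digest(new_files[key]) != digest(data)
    )


def sync_until_complete(
    old_files: dict[str, bytes],
    new_files: dict[str, bytes],
    max_passes: int = 5,
) -> bool:
    """Copy unsynced files in repeated passes; return True once none remain."""
    for _ in range(max_passes):
        pending = find_unsynced(old_files, new_files)
        if not pending:
            return True
        for key in pending:
            new_files[key] = old_files[key]
    # Final check after the last pass, in case uploads continued meanwhile.
    return not find_unsynced(old_files, new_files)


# Example: "b.jpg" was uploaded after the initial bulk copy.
old = {"a.jpg": b"a", "b.jpg": b"b"}
new = {"a.jpg": b"a"}
missing_before = find_unsynced(old, new)
ready_for_cutover = sync_until_complete(old, new)
```

The cutover would proceed only when sync_until_complete returns True. In a live system uploads may continue between the last pass and the switch, so a check like this narrows the gap but does not replace a tested fallback.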
Next Steps
- Complete final synchronization from old to new storage.
- Remove the fallback rule only after confirming data integrity on the new storage.
- Update storage migration procedures to include incremental sync, cache validation, and fallback readiness.