I wouldn’t use rook if you solely want S3. It is a massively complex system whic...

breakingcups · 2025-12-19T21:48:46 1766180926

Is there a better solution for self-healing S3 storage that you could recommend? I'm also curious what makes a Rook cluster croak after some time, and what kind of maintenance is required in your experience.

__turbobrew__ · 2025-12-20T05:17:37 1766207857

I have unfortunately gotten a Ceph cluster into a bad enough state that I just had to delete the pools and start from scratch. It was due to improper sequencing when removing OSDs, but that is kind of the point: you have to know what you are doing to do things safely. For the most part I have learned by blundering and taking hard lessons. Ceph clusters, when mistreated, can get into death spirals that only an experienced practitioner can avert by very carefully modifying cluster state through things like upmaps. You also need to understand your failure domains and how to spread mons and OSDs across them to properly handle failure. Lots of people don’t think about this, and then one day a rack goes poof, you didn’t replicate your data across racks, and you have data loss. Same thing with mons: you should be deploying mons across at least 3 failure domains (ideally 3 different datacenters) to maintain quorum during an outage.
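
To make the mon quorum point concrete, here is a minimal Python sketch. It is not Ceph tooling; the function name and the domain labels are made up for illustration. It assumes monitors need a strict majority of the full mon set to keep quorum, and it checks whether losing the single most heavily loaded failure domain still leaves that majority.

```python
from collections import Counter


def survives_single_domain_loss(mon_domains):
    """Return True if quorum survives the loss of any one failure domain.

    mon_domains: one failure-domain label per monitor, e.g. ["rack-a", "rack-b"].
    The labels are hypothetical; use whatever your own topology calls them.
    """
    if not mon_domains:
        return False

    total = len(mon_domains)
    quorum = total // 2 + 1  # strict majority of all monitors

    # Losing a domain takes out every monitor placed in it,
    # so the worst case is the domain holding the most monitors.
    worst_loss = max(Counter(mon_domains).values())
    return total - worst_loss >= quorum


if __name__ == "__main__":
    print(survives_single_domain_loss(["rack-a", "rack-b", "rack-c"]))
    print(survives_single_domain_loss(["rack-a", "rack-a", "rack-b"]))
```

Three mons in three domains pass the check, while three mons packed into two domains do not, because one outage can remove two of the three.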

adamcharnock · 2025-12-19T22:11:41 1766182301

I haven't used it yet, but RustFS sounds like it has self-healing:

https://docs.rustfs.com/troubleshooting/healing.html

adastra22 · 2025-12-19T22:02:28 1766181748

Ceph?

yupyupyups · 2025-12-20T03:24:57 1766201097

Rook is Ceph.