Your production systems have frequent outages. Each outage triggers an alert for an unhealthy system, and that system is automatically restarted within a minute. How can you implement a process that reduces staff burnout while adhering to Site Reliability Engineering (SRE) best practices?