I want to address the recent downtime, the reasons behind it, and what we’re doing to fix it.
We were down from 2am MST until roughly 6:30am MST this morning, a total of 4 hours and 32 minutes of downtime. Since July 11th we’ve also had smaller pockets of downtime, which add up to another 29 minutes. If you are a SocialEngine Cloud customer and were negatively affected by this downtime, please write to firstname.lastname@example.org from the email you use to log into the dashboard. We will give you 1 day of credit or, if you’re on a trial, an extra day to continue trying the site.
The Cause and What We’re Doing to Fix It
As you may remember, back in May we had an incident that brought down our site and our cloud sites for a few hours. While the underlying cause was AWS-related, our web instance never came back online, which forced us to rebuild it on the spot. As part of the rebuild, we upgraded PHP to 5.5 and Apache to 2.4. A few days later, we experienced similar downtime: for reasons we had not yet determined, our Apache worker processes would hang one by one until Apache ran out of available workers and stopped serving requests.
We spent time addressing the immediate issues that surfaced after the upgrade, and after one particular change, the hangs stopped. We thought we were out of the woods. Then, 3 days ago, the problem returned with a vengeance. We’ve been using it as an opportunity to track down the underlying cause, and we’re close to identifying it. Last night, however, it happened in the middle of the night while everyone with the access to fix it was in deep sleep.
We’re committed to getting to the bottom of this, and we intend to have it resolved within the next 2 days. We appreciate your patience and are very sorry for any trouble this has caused. Please contact email@example.com to get a credit for this downtime. If you had trouble purchasing PHP, send an email to firstname.lastname@example.org and we’ll be able to assist you.
Update on the Recent Downtime