Start Time (UTC): October 24, 2019, 22:20 hours UTC
End Time (UTC): October 25, 2019, 22:25 hours UTC
Duration: 24hrs 5mins
Incident Summary:
VMware Cloud on AWS (VMC) experienced an operational issue that caused us to inadvertently remediate hosts that had not actually failed. Customers may have noticed an unusual amount of host replacement activity due to this error.
Impact Summary:
Users may have seen multiple hosts added to their SDDCs, followed by an equal number of hosts being removed, because hosts are always added and removed in pairs when remediation is performed. The net effect is that each SDDC eventually returned to its original size, and existing workloads should not have been impacted. All hosts were placed into maintenance mode prior to their removal, and no hosts with running VMs were removed from the SDDC. However, the SDDC may have experienced higher than normal levels of vMotion traffic as workloads were rebalanced across the new hosts. As per our normal policy, customers are not billed for host maintenance, and this activity will not affect customer bills.
Root Cause:
VMC has a monitoring system that monitors the health of each SDDC and sends health events to a host remediation service in VMC. The remediation service decides how to react to each event based on the health of the underlying host. On 10/24/2019, a monitoring agent service update was rolled out to the fleet. This resulted in a very large number of events being sent to the remediation service in a short span of time. With the high volume of events, the service in some cases could not determine the host health in time, so it erred on the side of caution and added a new host to maintain the customer SLA. In the majority of cases, the service was then able to correctly determine the host health and removed the new host, leaving the original host intact.
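The decision flow described above can be illustrated with a minimal sketch. This is not VMware's implementation: the function names, the `Health` type, and the action strings are all hypothetical, and the handling of a genuinely failed host is an assumption based on the statement that hosts are added and removed in pairs. It only shows why an undetermined health check leads to a temporary extra host that is later removed.

```python
from enum import Enum
from typing import Optional, Tuple


class Health(Enum):
    """Hypothetical health states for a host."""
    HEALTHY = "healthy"
    FAILED = "failed"


def on_health_event(host_count: int, health: Optional[Health]) -> Tuple[int, str]:
    """React to one monitoring event (hypothetical logic).

    `health` is None when the health check did not finish in time.
    Returns the new host count and the action taken.
    """
    if health is Health.HEALTHY:
        return host_count, "no_action"
    # Failed or undetermined: add a host first to protect capacity and the SLA.
    return host_count + 1, "add_host"


def on_health_resolved(host_count: int, health: Health) -> Tuple[int, str]:
    """Complete a remediation once the original host's health is known."""
    if health is Health.HEALTHY:
        # False alarm: remove the newly added host and keep the original.
        return host_count - 1, "remove_new_host"
    # Assumed behavior for a real failure: remove the original, keep the new host.
    return host_count - 1, "remove_original_host"


# A health check that timed out on a 4-host SDDC, later resolved as healthy.
count, action = on_health_event(4, None)
count, action = on_health_resolved(count, Health.HEALTHY)
```

In either branch the add is paired with a remove, so the SDDC ends at its original host count.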
VMware engineering has completed all false Host Remediation activities and the incident has been resolved.
Impact: None
Start Time: October 24, 2019, 22:20 hours UTC End Time: October 25, 2019, 22:25 hours UTC
We are continuing to work on a fix for this issue.
Impact: Users may have seen multiple hosts added to their SDDCs unnecessarily. Existing workloads are not impacted.
Start Time: October 24, 2019, 22:20 hours UTC End Time: N/A
VMware Engineering teams are in the process of removing inadvertently added hosts.
Impact: Users may have seen multiple hosts added to their SDDCs unnecessarily. Existing workloads are not impacted.
Start Time: October 24, 2019, 22:20 hours UTC End Time: N/A
Please be aware that we have experienced an operational issue that caused us to inadvertently remediate hosts that had not actually failed. A small subset of SDDCs may have seen hosts added unnecessarily. We are aware of this issue and have stopped it from occurring. As per our normal policy, customers are not billed for host maintenance, and this activity will not affect customer bills.
Impact: Users may have seen multiple hosts added to their SDDCs unnecessarily. Existing workloads are not impacted.
Start Time: October 24, 2019, 22:20 hours UTC End Time: N/A