This communication provides the Reason for Outage (RFO) for the issue that affected Verizon MRS Services in Miami (MIA1) on September 23-24, 2019.
Please be advised that services were affected by a dual failure:
1. Processor card failure in the network router (primary and backup)
2. Maximum CPU utilization on the gateway router, which required a full reboot
o The Verizon NOC identified what appeared to be a failing route processor. A remote switchover was attempted but failed.
o A dispatch was pending to restore access and/or reboot the equipment, as we had lost access to it.
o This case has been escalated to higher management within Verizon.
o At this time, a technician is on site working with an engineer to restore access to the equipment.
o An Equinix technician is on site now and should be calling in shortly. The plan of action is to reboot the BR1 device to restore access to the chassis and routing table. The Verizon NOC is joining a crisis bridge now to discuss alternative plans as well. We will continue to update you as this unfolds.
o Troubleshooting is still ongoing with the Equinix technician on site. In the meantime, the NOC is also seeking assistance from the vendor, Cisco.
o Isolation efforts are still in progress with the Equinix technician and our engineers.
o At this time, we are preparing to have the on-site technician attempt a reboot of the device.
o The device has been rebooted and we now have access to it. We are continuing to troubleshoot to further isolate the issue. At this time, we are seeing several BGP sessions up. We will continue to provide updates.
o We are still on a crisis bridge with multiple internal groups and are now working with Cisco TAC on the issue. There is no ETTR at this time, but we will continue to update you with progress.
o We are making progress with Cisco: after one of the cards was pulled completely out, the equipment stopped rebooting and began to load. It will take up to 20 minutes to fully load. We will update the network ticket with the progress and will continue to update this thread as well.
o After the equipment reboot, some customer services came back up, but after a while service went down again. Since service did not remain up, engineers are now working with third-party providers to isolate the issue. This case is being treated as high priority. No ETR is available at the moment.
o Troubleshooting continues on the crisis bridge with multiple internal partners. At this time, we are seeing packets leaving BR2 but not reaching GW3. The next step is to shut down GW3 and force traffic to GW4 in an attempt to isolate the issue. We will continue to update you with progress.