Measuring Device Recovery Time
Figuring out how long it takes a device to recover from a reboot or power outage sounds trivial, unless you are troubleshooting remotely or managing a monitoring system. The same approach also helps you measure how long it takes a redundant link to fail over, routing to reconverge, and so on.
When I bring up this simple exercise with clients, their usual response is, of course, “How do I do that?”
The obvious method is to start a stopwatch and power on the device. But what do you do if it's remote, and how do you keep people from starting the clock too early or stopping it too late?
Over the years, I have figured out a pretty simple way to do this using my favorite free utility, hrping (https://www.cfos.de/en/ping/ping.htm).
Let me start with the basic disclaimer that there are a ton of ways to do this, even with hrping, so feel free to share how you do it. Before I get to the exercise, let me explain why I chose hrping. By default, hrping pings every 500 milliseconds, but its interval option (-s) lets me define how often it pings, in milliseconds. Microsoft's ping sends one ping per second, and you can't change that.
Before we start, note any switch configuration details that affect how quickly a device gets back online. One example is Cisco's spanning-tree portfast command, which skips the STP listening and learning states so the port starts forwarding sooner. Also note whether the device is wired or wireless.
I simply ping the device with hrping, reboot it, and watch for the timeout messages. In this case I used the following command: hrping 10.44.10.1 -t -s 100 -T
· -t continuous ping
· -s 100 ping every 100ms
· -T display date and time in results
You can optionally append >log.txt to send the output to a file, which is handy when there is a lot of output. Just note that you will not see any output on screen. You will have to break out of the command (Ctrl+C) once the device is back up, so use ping in a separate command prompt to tell when that is.
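If you went the log-file route, reading through log.txt by eye gets tedious on a long outage. The following is a minimal Python sketch that groups consecutive timeouts in such a log into outage windows. It is not part of hrping. It assumes the log was captured with -T, so that each line starts with a timestamp in the format of the example output below, and that missed replies contain the text "Timeout waiting". The function name outages is my own.

```python
import re
from datetime import datetime

# Assumed hrping -T line layout, e.g.
# "2022-10-25 15:44:17.370: Timeout waiting for seq=004c"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}): (.*)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def outages(lines):
    """Group consecutive timeout lines into (first_timeout, last_timeout, count) tuples."""
    result = []
    start = end = None
    count = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip banners, statistics and blank lines
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        if match.group(2).startswith("Timeout waiting"):
            if start is None:
                start = stamp  # first missed reply of a new outage
            end = stamp
            count += 1
        elif start is not None:
            # A successful reply closes the current outage window.
            result.append((start, end, count))
            start = end = None
            count = 0
    if start is not None:
        result.append((start, end, count))  # log ended while the device was still down
    return result
```

To use it, pass in the opened file: with open("log.txt", errors="replace") as log: print(outages(log)). The sketch splits outages at successful replies, so a log covering several reboots or failovers reports each outage separately.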
Here is an example of a successful ping and a failed one:
2022-10-25 15:44:17.261: From 10.44.10.1: bytes=60 seq=005f TTL=64 ID=da27 time=1.152ms
2022-10-25 15:44:17.370: Timeout waiting for seq=004c
Here's a trick that saves some time: I pipe the output through find by appending | find "Timeout waiting" to the hrping command.
This displays only the failed pings, which makes it easier to document how long the device was down. Here is an example of a wireless camera rebooting:
hrping 10.44.10.39 -t -s 100 -T | find "Timeout"
2022-10-25 17:09:03.671: Timeout waiting for seq=03e4
2022-10-25 17:09:03.671: Timeout waiting for seq=03e5
2022-10-25 17:09:03.671: Timeout waiting for seq=03e6
2022-10-25 17:10:01.179: Timeout waiting for seq=0625
2022-10-25 17:10:01.179: Timeout waiting for seq=0626
2022-10-25 17:10:01.179: Timeout waiting for seq=0627
In this example, the device was unreachable from 17:09:03 to 17:10:01, or about 58 seconds. Because I used a 100 ms ping interval, the measurement has a resolution of roughly 100 ms, which is good enough for me.
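You can cross-check the 58-second figure from the sequence numbers instead of the timestamps. This assumes that hrping's seq field is a 16-bit hexadecimal counter that goes up by one per probe, which the four-digit values above suggest. I have not confirmed its wrap-around behavior. Under that assumption, the outage spans 0x0627 - 0x03E4 + 1 = 580 missed probes, and 580 x 100 ms = 58.0 s. The gap between the first and last timeout timestamps is 57.508 s, so both methods land at roughly 58 seconds. The function names below are illustrative.

```python
from datetime import datetime

# Assumption: seq is a 16-bit counter that wraps from ffff to 0000, so this
# only holds for outages shorter than 65536 probes (~109 minutes at 100 ms).
SEQ_MODULUS = 0x10000
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def missed_probes(first_seq_hex, last_seq_hex):
    """Count probes from the first to the last missed seq, inclusive, allowing one wrap."""
    first = int(first_seq_hex, 16)
    last = int(last_seq_hex, 16)
    return (last - first) % SEQ_MODULUS + 1


def downtime_from_seq_ms(first_seq_hex, last_seq_hex, interval_ms):
    """Estimated outage length in milliseconds: missed probes x send interval."""
    return missed_probes(first_seq_hex, last_seq_hex) * interval_ms


def downtime_from_timestamps_s(first_ts, last_ts):
    """Seconds between the first and last timeout lines printed by hrping -T."""
    start = datetime.strptime(first_ts, TS_FORMAT)
    end = datetime.strptime(last_ts, TS_FORMAT)
    return (end - start).total_seconds()


# Values from the wireless camera example above.
print(missed_probes("03e4", "0627"))               # 580
print(downtime_from_seq_ms("03e4", "0627", 100))   # 58000
print(downtime_from_timestamps_s("2022-10-25 17:09:03.671",
                                 "2022-10-25 17:10:01.179"))  # 57.508
```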
Here is the table of results from devices in my office:
Recovery Time (secs)
Wireless Access Point (Stewie)