I'm wondering if there is any tool that will accurately measure failover times in the lab, possibly even down to the millisecond. Anything that's more accurate than a ping would be great... :)
What about debug logging? Or are the timestamps not precise down to the millisecond? Would syslog be the answer?
What constitutes failover time? Therein lies the problem: is it when the packets make it to the redundant server, or when the service is responding?
Quote from: ristau5741 on January 24, 2018, 09:03:09 PM
What constitutes failover time? Therein lies the problem: is it when the packets make it to the redundant server, or when the service is responding?
When full end-to-end connectivity has been restored.
Something like iperf, with a node at each end of the topology that you're testing, keeping a steady stream of timestamped traffic open and reporting back exactly when end-to-end connectivity has been restored. Thinking about it, it might be possible to use BFD for this.
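To make the "steady stream of timestamped traffic" idea concrete, here is a rough Python sketch of the bookkeeping only, not of iperf or BFD themselves. The probe layout (a sequence number plus a send time) and the names `make_probe`, `parse_probe` and `find_outages` are made up for illustration. The receiver is assumed to log a (sequence, arrival time) pair for every probe it sees.

```python
import struct

# Hypothetical probe payload: 4-byte sequence number + 8-byte send time in ns,
# both in network byte order. This layout is an assumption, not any tool's format.
PROBE = struct.Struct("!IQ")


def make_probe(seq, send_ns):
    """Build the payload the sender would put in each UDP datagram."""
    return PROBE.pack(seq, send_ns)


def parse_probe(payload):
    """Return (seq, send_ns) from a received payload."""
    return PROBE.unpack(payload[:PROBE.size])


def find_outages(received, threshold_s):
    """Find gaps in the receiver log that are longer than threshold_s.

    received: list of (seq, arrival_time_seconds) in arrival order.
    Returns a list of (last_seq_before, first_seq_after, lost_count, gap_seconds).
    """
    outages = []
    for (seq_a, t_a), (seq_b, t_b) in zip(received, received[1:]):
        gap = t_b - t_a
        if gap > threshold_s:
            lost = max(seq_b - seq_a - 1, 0)  # probes that never arrived
            outages.append((seq_a, seq_b, lost, gap))
    return outages
```

The resolution of this kind of measurement is limited by the send interval: with one probe per millisecond, the reported gap can only be trusted to roughly a millisecond.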
Ixia and Spirent STC (L4) / Avalanche (L7) are two big-name commercial traffic generator options.
Here's an open source one: https://trex-tgn.cisco.com/
This is a VERY deep field once you get into the weeds... be warned. The exact nature of the traffic will have a bearing on what you measure and what you're actually making the device do - this is especially true for security platforms. Example: I'm driving 2k concurrent TCP connections (RENO) @ ~70k connections with total throughput ~4Gb (lots of tiny 8kb HTTP GETs). It tells me I lost 15000 packets and 500 connections. What's the convergence time? LOLOLOLOLOL
For simple RS, something like a Spirent that generates a simple stream of packets will do: count the lost packets and you're done. But as soon as you get into stateful devices or real-world traffic patterns, hmmmm
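For the simple case, counting lost packets only turns into a time if the stream is sent at a constant, known rate. A minimal sketch of that arithmetic, assuming a fixed packets-per-second rate and a single outage (the function name `outage_ms` and the example numbers are made up):

```python
def outage_ms(lost_packets, packets_per_second):
    """Estimate outage duration from packet loss on a constant-rate stream.

    Assumes every lost packet was lost during one continuous outage and that
    the sender really held the configured rate.
    """
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return lost_packets * 1000.0 / packets_per_second


# Example: 1500 packets lost on a 10,000 pps stream.
print(outage_ms(1500, 10_000))  # 150.0 (milliseconds)
```

This is exactly the calculation that stops making sense for the stateful example above: with thousands of TCP connections backing off and retransmitting, there is no constant rate to divide by.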
I have used Jperf/Wireshark for this. Just send a stream of UDP traffic as fast as possible to a destination PC, and have that PC capture it in Wireshark. Cause the failure, and once the failover is done, stop the capture. Filter so that only traffic from the Jperf box is visible.
Go to View > Time Display Format > Seconds Since Previous Displayed Packet, then sort by the Time column. The largest value there should be your failover time.
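If you would rather script that last step than sort in the GUI, the same "largest gap between consecutive packets" check is easy to do on a list of arrival timestamps. A small sketch, assuming you have already exported the filtered packets' timestamps as seconds (how you export them is up to you; `largest_gap` is just an illustrative name):

```python
def largest_gap(timestamps):
    """Return (gap_seconds, gap_start_time) for the biggest inter-arrival gap.

    timestamps: arrival times in seconds for the filtered stream.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two timestamps")
    # Pair each timestamp with the next one and keep the widest gap.
    gap, start = max((b - a, a) for a, b in zip(ordered, ordered[1:]))
    return gap, start
```

As with the manual method, the answer is only as fine-grained as the spacing between packets in the stream, so send fast enough that the normal inter-packet gap is well below the failover time you expect.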