I suggest 25% redundancy or better.

For every four active servers, I have a backup server.
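As a quick sanity check on the arithmetic, here is a minimal sketch of that ratio. The function name and the `servers_per_backup` parameter are my own, purely for illustration; the one assumption is that a partial group of servers still gets a full backup machine (rounding up).

```python
def backups_needed(active_servers: int, servers_per_backup: int = 4) -> int:
    """Number of backup machines for a rig, rounding up.

    servers_per_backup=4 corresponds to 25% redundancy (one backup per
    four active servers); servers_per_backup=1 is 100% redundancy.
    """
    if active_servers < 0 or servers_per_backup < 1:
        raise ValueError("need a non-negative server count and a ratio of at least 1")
    # Ceiling division: a partial group of servers still gets a backup.
    return -(-active_servers // servers_per_backup)


if __name__ == "__main__":
    for n in (4, 5, 8):
        print(n, "active servers ->", backups_needed(n), "backup(s)")
```

So a rig of five active servers would carry two backups under this rule, not one.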

I use identical machines and copy the Catalyst show folder across all of them. If I am just using cues and presets, then the preset file is enough. However, if I'm using DMX (sometimes as many as three universes per machine), then I'll copy the configuration file as well.

I also test each machine's files on every other machine to verify that they work. This is especially necessary because even a subtle difference between machines can cause one machine's config file to crash another machine on launch.
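To keep track of that cross-testing, a checklist of every "files from X, launched on Y" combination helps. This is only a sketch of how I'd generate one. The machine names are hypothetical, and the actual test (launching Catalyst with the copied files) is still done by hand.

```python
from itertools import permutations


def cross_test_plan(machines):
    """Every (files_from, run_on) pair where the two machines differ."""
    return list(permutations(machines, 2))


if __name__ == "__main__":
    # Hypothetical machine names, for illustration only.
    for files_from, run_on in cross_test_plan(["server1", "server2", "backup1"]):
        print(f"[ ] launch {files_from}'s files on {run_on}")
```

Note that the list grows quickly: n machines give n × (n − 1) combinations, so five machines already mean twenty launches.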

I typically have the servers plugged into a router. If one fails, I run an AppleScript that loads the failed server's show file on the backup machine, then I punch up a router preset that changes the route. This solution tends to have only about 30 seconds of downtime.

To cut that downtime further, I will typically run my backup machines live and have them mirror the most critical of my servers. That way, a primary server fails over to its backup in 5 seconds or less. The less critical servers still take the average 30 seconds to launch the correct show file, but I have never had anyone notice.
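Here is a rough sketch of how the mirroring decision plays out on paper. The server and backup names are hypothetical, and the two timing constants are just the figures quoted above (5 seconds as an upper bound for a mirrored server, 30 seconds as the average otherwise), not measurements from any particular rig.

```python
MIRRORED_FAILOVER_S = 5   # upper bound quoted above for a mirrored server
COLD_FAILOVER_S = 30      # average quoted above when the show file must be loaded


def plan_mirrors(servers_by_priority, backups):
    """Pair each backup with the next most critical server.

    servers_by_priority is ordered most critical first. Servers beyond
    the number of available backups are left unmirrored.
    """
    return dict(zip(servers_by_priority, backups))


def expected_downtime(server, mirrors):
    """Rough downtime in seconds if `server` fails."""
    return MIRRORED_FAILOVER_S if server in mirrors else COLD_FAILOVER_S


if __name__ == "__main__":
    # Hypothetical rig: four active servers, one live backup.
    mirrors = plan_mirrors(["main", "left", "right", "floor"], ["backup1"])
    for name in ("main", "left", "right", "floor"):
        print(name, expected_downtime(name, mirrors), "s")
```

The point of ordering by priority is that the one screen everyone is looking at gets the short failover, and the rest take the slower path.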

When using DMX, I run almost exclusively Art-Net from a DP-8000 so that one Ethernet stream is sent to all machines. Obviously, if you're running a DMX cable, you'll have to swap it to the backup machine when a server fails.
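For anyone wondering why Art-Net makes this easy: the DMX data travels as UDP packets, so a single broadcast reaches every machine on the network, backups included. Below is a minimal sketch of an ArtDmx packet as I understand the Art-Net specification (UDP port 6454, OpCode 0x5000). It only illustrates what the stream looks like. It is not how the DP-8000 is configured, and the broadcast address is an assumption you would replace with your own network's.

```python
import socket
import struct

ARTNET_PORT = 6454  # UDP port used by Art-Net


def build_artdmx(universe: int, dmx: bytes, sequence: int = 0) -> bytes:
    """Build one ArtDmx packet carrying up to 512 DMX channel values."""
    if not 0 <= universe < 32768:
        raise ValueError("universe must fit in 15 bits")
    if not 1 <= len(dmx) <= 512:
        raise ValueError("DMX payload must be 1 to 512 bytes")
    if len(dmx) % 2:
        dmx += b"\x00"                       # payload length must be even
    header = b"Art-Net\x00"                  # fixed 8-byte ID
    header += struct.pack("<H", 0x5000)      # OpCode ArtDmx, low byte first
    header += struct.pack(">H", 14)          # protocol version, high byte first
    header += bytes([sequence & 0xFF, 0])    # sequence, physical port
    header += struct.pack("<H", universe)    # SubUni (low byte), then Net
    header += struct.pack(">H", len(dmx))    # data length, high byte first
    return header + dmx


def send_broadcast(packet: bytes, broadcast_addr: str) -> None:
    """Send one packet to every host on the subnet.

    broadcast_addr is an assumption: use your own network's broadcast
    address (for example 2.255.255.255 on a 2.x.x.x Art-Net network).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_addr, ARTNET_PORT))
```

Because the primary and the backup both receive the same packets, the backup is already getting the right DMX the moment it takes over, with no cable to move.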

I am absolutely confident in this solution and have developed it for others as well. If you're new to Catalyst or a little unsure about the show files, Art-Net, or networking, then I would strongly suggest 100% redundancy, where you do in fact use a KVM switch.

If you're using TH2G boxes, then you're more or less committed to redundant everything and as many KVMs as you have outputs. This may seem a little impractical, but consider that a KVM means instant failover.

On the other hand, manually swapping the DMX and DVI (or VGA) cables, dongle, and SSD from one machine to another is also a possibility, but you face as much as 5 minutes of downtime (or more) in the event of a failure.