What is a failover cluster?

Rahul Awati
Ben Rubenstein, Senior Manager, Social Media and Online Community

In computing, a failover cluster refers to a group of independent servers that work together to maintain high availability of applications and services. If one of the servers fails, another node in the cluster can take over its workload with little or no downtime. This process is known as failover.

In a failover cluster, the clustered servers can collectively improve the availability and scalability of applications and services. The servers work together to provide either continuous availability or at least high availability (HA) through the failover process. Failover clusters can comprise either physical servers or virtual machines (VMs).

Each server is called a node and is connected to the other servers in the cluster by physical cables and software. A failover cluster consists of at least two nodes, the network connections that carry data between them and the clustering software that coordinates them. In addition, a cluster uses one or more technologies for load balancing, storage, parallel processing and other functions.

The applications and services in a failover cluster are sometimes known as clustered roles. The cluster keeps these roles operational if one server fails. At the same time, each role is proactively monitored to ensure that it is working properly. If a role is not working properly, the cluster might restart it or move it to another node.
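The restart-or-move behavior can be illustrated with a short Python sketch. The names `Role` and `handle_health_check`, and the `max_restarts` threshold, are hypothetical and chosen only for illustration. Real cluster software defines its own policies and interfaces.

```python
from dataclasses import dataclass


@dataclass
class Role:
    """A clustered role (application or service) hosted on one node."""
    name: str
    node: str
    restarts: int = 0


def handle_health_check(role: Role, healthy: bool, nodes: list[str],
                        max_restarts: int = 2) -> str:
    """Decide what to do after one health probe of a role.

    Illustrative policy (an assumption, not a vendor default):
    restart in place up to max_restarts times, then move the
    role to the first other node in the list.
    """
    if healthy:
        role.restarts = 0
        return "ok"
    if role.restarts < max_restarts:
        role.restarts += 1
        return "restart"
    # Restart budget exhausted: fail the role over to another node.
    candidates = [n for n in nodes if n != role.node]
    if not candidates:
        return "failed"
    role.node = candidates[0]
    role.restarts = 0
    return "move"
```

In this sketch, a role that keeps failing its probe is restarted twice and then moved, which mirrors the idea that a restart is cheaper than a failover and is usually tried first.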

The need for failover clusters

Server failure can lead to application downtime, which can result in operational disruptions for users. By providing continuous availability or high availability, failover clusters enable users to keep using the applications and services they need without experiencing outages, even if a server fails. There might be a brief service interruption with high-availability clusters during the failover process. However, the system usually recovers quickly with little or no data loss and minimum downtime.

Failover clusters play a vital role in ensuring the ongoing availability of mission-critical applications and systems, such as online transaction processing (OLTP) systems that demand very high -- near 100% -- availability. Database replication and disaster recovery (DR) setups also commonly rely on failover clusters. Clusters that span multiple sites can provide geographic replication, so if a server in one location goes down, data is still available on failover servers at other sites.

How failover clusters work

In a high-availability failover cluster, groups of independent servers share data and resources, including storage. At any time, at least one node is active and at least one is passive. These clusters include a monitoring connection that allows each server to check the health of the other servers. In a two-node cluster (the simplest possible configuration), if one node fails, the other node recognizes the failure via the monitoring connection and configures itself as the active node. Larger configurations usually use dedicated servers that determine whether any nodes are failing and then direct another node to assume the load and participate in the failover process.
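The two-node case can be sketched in Python as a heartbeat check with a timeout. This is a minimal model under stated assumptions: the class name `TwoNodeCluster`, its methods and the five-second timeout are hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class TwoNodeCluster:
    """Active/passive pair that fails over on missed heartbeats."""
    active: str
    passive: str
    timeout: float = 5.0  # seconds of silence before a node counts as failed (assumed value)
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from node arrived at time now."""
        self.last_seen[node] = now

    def _alive(self, node: str, now: float) -> bool:
        # A node that has never sent a heartbeat is treated as not alive.
        return now - self.last_seen.get(node, float("-inf")) <= self.timeout

    def check(self, now: float) -> bool:
        """Promote the passive node if the active one has gone silent.

        Returns True when a failover happened.
        """
        if self._alive(self.active, now):
            return False
        if not self._alive(self.passive, now):
            return False  # no healthy node to promote
        # The silent node becomes the passive one until it recovers.
        self.active, self.passive = self.passive, self.active
        return True
```

The sketch leaves out what production clusters add on top of heartbeats, such as a quorum or witness mechanism that prevents both nodes from becoming active when only the monitoring connection fails.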

In high-availability failover clusters using VMs, the VMs are pooled into a cluster along with the physical servers they reside on. If there is a failure, the VMs on the failed host will be restarted on alternate hosts.

In a continuous-availability failover cluster -- also known as a fault-tolerant cluster -- multiple systems share one copy of a computer's operating system (OS). Thus, the commands issued by one system are simultaneously executed on the other systems as well. The cluster requires a continuously available and near-exact copy of a machine running the service (physical or virtual). This redundancy model is known as 2N. It can automatically detect failures of hard drives, power supplies, network components and CPUs. When the cluster identifies a failure point, a backup component -- or procedure -- takes its place instantaneously without service interruption.
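The idea that every command runs on all copies at once can be shown with a toy Python model. It is a sketch only: `Replica` and `MirroredPair` are hypothetical names, and the "service" is a simple in-memory key-value store rather than a real fault-tolerant system.

```python
class Replica:
    """One copy of the service state (a simple key-value store here)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, int] = {}
        self.alive = True

    def apply(self, key: str, value: int) -> None:
        self.state[key] = value


class MirroredPair:
    """Applies every command to all live replicas (2N redundancy)."""

    def __init__(self) -> None:
        self.replicas = [Replica("primary"), Replica("secondary")]

    def execute(self, key: str, value: int) -> None:
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no replica available")
        for replica in live:
            replica.apply(key, value)

    def read(self, key: str) -> int:
        # Serve from the first live replica. No switchover step is needed
        # because every live replica already holds the same state.
        for replica in self.replicas:
            if replica.alive:
                return replica.state[key]
        raise RuntimeError("no replica available")
```

Because the secondary already holds the same state as the primary, marking the primary as failed does not interrupt reads or writes in this model. That is the property that separates continuous availability from a restart-based high-availability design.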

High-availability failover clusters vs. continuous-availability failover clusters

High-availability clusters aim for 99.999% availability (five 9s availability). These clusters are generally adequate for most applications and services. However, for services or applications that require 100% availability, continuous-availability failover clusters are required. These include mission-critical applications such as electronic stock trading, ATM banking, order management, employee time clock systems and reservation systems. Many applications in manufacturing, logistics and e-commerce also require continuous-availability failover clusters.
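To put an availability percentage in concrete terms, it can be converted into the downtime it permits per year. The following Python sketch does that arithmetic. The function name is hypothetical, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime permitted by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Under this assumption, five 9s leaves a budget of roughly 5.26 minutes of downtime per year, while three 9s (99.9%) allows about 525.6 minutes, or close to nine hours.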

Types of failover clusters

Failover clustering is a popular feature in Windows Server and Azure Stack HCI. With those OSes, organizations can create highly available or continuously available file share storage for applications such as Microsoft SQL Server and Hyper-V VMs. Another approach is to create highly available clustered roles on physical servers or VMs that are installed on servers running Hyper-V.

Failover clusters in Windows provide Cluster Shared Volume (CSV) functionality. CSV provides a general-purpose, clustered file system that's layered above NTFS or ReFS. Multiple clustered nodes can read from or write to the same LUN (part of a drive or collection of drives) that is provisioned as an NTFS volume. CSVs provide a consistent distributed namespace that can be used by clustered roles to access shared storage from all nodes and simplify the management of a large number of LUNs in a failover cluster.

Failover clusters are also available in VMware, SQL Server and Red Hat Enterprise Linux. VMware offers many virtualization tools for VM failover clusters. For example, vSphere Fault Tolerance keeps a live secondary copy of a VMware VM to provide continuous availability, while vSphere vMotion live-migrates running VMs between hosts. vSphere High Availability pools VMs and their hosts into a cluster for automatic failover and high availability.

SQL Server 2017 comes with a high-availability failover clustering solution called Always On that registers SQL Server components as cluster resources and enables automatic failover to a different node if one node fails. Red Hat Enterprise Linux also provides a failover cluster mechanism, which allows users to create high-availability failover clusters with the Red Hat Global File System (GFS/GFS2).

What is cluster affinity?

Affinity refers to a rule that a user sets up to establish a relationship between two or more roles to keep them together. These roles could be VMs, resource groups or other entities. Anti-affinity is also a rule, albeit one that is used to keep the specified roles apart.

These rules are required because a failover cluster can hold many roles that can move and run between nodes. However, when certain roles should not run on the same node, they must be kept apart. Affinity and anti-affinity rules help avoid performance problems, which might happen if roles running on the same node together use more CPU, memory or other resources than the node can comfortably provide.
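A simple way to picture these rules is as a check over a role-to-node placement. The Python sketch below is illustrative only. The function name `violations` and the role and node names in the usage note are hypothetical, and real cluster managers enforce such rules during placement rather than only reporting them afterward.

```python
def violations(placement: dict[str, str],
               affinity: list[set[str]],
               anti_affinity: list[set[str]]) -> list[str]:
    """Return human-readable rule violations for a role-to-node placement."""
    problems = []
    for group in affinity:
        # Roles in an affinity group must all sit on one node.
        nodes = {placement[role] for role in group}
        if len(nodes) > 1:
            problems.append(
                f"affinity broken: {sorted(group)} span {sorted(nodes)}")
    for group in anti_affinity:
        # Roles in an anti-affinity group must all sit on different nodes.
        nodes = [placement[role] for role in group]
        if len(set(nodes)) < len(nodes):
            problems.append(
                f"anti-affinity broken: {sorted(group)} share a node")
    return problems
```

For example, with two database roles placed on the same node and an anti-affinity rule covering both, the function reports one violation. Moving either role to a different node clears it.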

This was last updated in February 2023

FAQs

What is the definition of a failover?

Failover is the ability to switch automatically and seamlessly to a reliable backup system. When a component or primary system fails, either a standby operational mode or redundancy should achieve failover and lessen or eliminate negative impact on users.

What are the main features of a failover cluster?

The key components of a failover cluster typically include multiple servers (or nodes), shared storage resources, and cluster management software that monitors the health and status of the nodes. When a failure like a hardware malfunction is detected, the cluster management software orchestrates the failover process.

What is the basic operation of a failover cluster?

An active/passive configuration reserves one or more servers that sit idle until the active server fails. If the active node fails, a passive node becomes active and carries on all the activities the previous active node performed.

What is the benefit of a failover cluster?

A key benefit of Failover Clusters is high availability and optimal performance. An organization, such as a banking enterprise or an e-commerce enterprise, can more efficiently run an application or a website to store data. For example, let's think about an application running in a Microsoft SQL database.

What is an example of a failover?

When the power goes out in a building or home, a backup generator temporarily restores electricity. Similarly, in server failover, a secondary server takes over when the primary server fails.

What are the different types of failover?

In SQL Server Always On availability groups, three forms of failover exist: automatic failover (without data loss), planned manual failover (without data loss) and forced manual failover (with possible data loss), typically called forced failover.

What is the difference between failover cluster and always on?

Always On availability groups do not depend on any form of shared storage. However, if you use a SQL Server failover cluster instance (FCI) to host one or more availability replicas, each of those FCIs will require shared storage as per standard SQL Server failover cluster instance installation.

What is the disadvantage of using a failover cluster?

Essentially, the cost of using a cluster to host an application will typically be at least double over hosting on a single server. Another disadvantage is requiring a certain level of IT infrastructure to be present. Most of the time, a Failover Cluster will be sharing disks between the two (or more) cluster nodes.

What is the failover cluster service name?

On Windows, the service is Windows Server Failover Clustering (WSFC), formerly known as Microsoft Cluster Service (MSCS).

Why is failover important?

Failover and recovery testing help determine if the system can handle and power an extra CPU or multiple servers once its performance threshold is reached, which typically occurs during critical failures. These tests help to understand the crucial relationship between security, resilience and failover testing.

Why do we need failover?

Failover can eliminate or reduce breaks in continuity, helping an organization remain solvent, even during a significant disaster. Specifically, failover allows you to keep your database protected when the system needs to undergo maintenance or if it fails, and to run maintenance tasks automatically.

What is the difference between cluster and failover?

The clustered servers (called nodes) are connected by physical cables and by software. If one or more of the cluster nodes fail, other nodes begin to provide service (a process known as failover).

What is the difference between backup and failover?

A backup takes a copy of your existing system and stores it for later recovery. Failover means that if the server your learning management system runs on fails, or is overextended, it fails over to another server automatically.

What happens when a failover occurs?

Failover occurs when users cannot access the server that contains the database they want or they cannot access the database itself. When a user tries to open a database that is not available, the Cluster Manager looks in the Cluster Database Directory for a replica of that database.

What is the difference between redundancy and failover?

Redundancy is having extra components available in case a component fails. Failover is the mechanism, be it automatic or manual, for bringing up a contingent operational plan. Availability is a characteristic of a system that describes uptime, typically expressed as a percentage (e.g., 99.99%).
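The link between redundancy and availability can be made concrete with a short calculation. The Python sketch below estimates the availability of a service that stays up as long as at least one of several redundant nodes is up. The function name is hypothetical, and the formula assumes that node failures are independent and that failover is instant and always succeeds, which real clusters only approximate.

```python
def combined_availability(node_availability: float, nodes: int) -> float:
    """Availability of a service that stays up while any one node is up.

    node_availability is a fraction between 0 and 1 (0.99 means 99%).
    Assumes independent node failures and a perfect failover mechanism.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes
```

Under these assumptions, two nodes that are each available 99% of the time give a combined availability of about 99.99%, which shows why adding a redundant node with a working failover mechanism raises uptime so sharply.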
