Overview of Running vSphere IaaS Control Plane on vSAN Stretched Cluster

Learn about the general topologies and guidelines for deploying a Supervisor on a vSAN stretched cluster. A vSAN stretched cluster makes it possible to run VMs with high availability across a stretched data center environment. Starting with the vSphere 8 Update 3 release, you can also run TKG workloads on a single vSAN stretched cluster that has an equal number of hosts at each of two geographically separated sites. This way, you provide distributed high availability for TKG workloads across a stretched data center environment.

A vSAN stretched cluster is a vSAN cluster that spans two data sites for a higher level of availability and inter-site load balancing. Both sites have an equal number of ESXi hosts and are part of the same vSphere cluster. Typically, the sites that are part of a vSAN stretched cluster are geographically separated locations and are referred to as vSAN fault domains. In most cases, you deploy vSAN stretched clusters in environments where the distance between data centers is limited, such as metropolitan or campus environments. In a vSAN stretched cluster configuration, both data sites are active sites. If a site fails, workloads are restarted on the site that is still active. Each vSAN stretched cluster also has a witness node that serves as a tiebreaker when a decision must be made about the availability of datastore components after the network connection between the two sites is lost.
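
The tiebreaker role of the witness can be illustrated with a simplified majority-vote model. The sketch below is not how vSAN is implemented. It assumes one vote per fault domain (two data sites and a witness), and all names in it are hypothetical.

```python
# Simplified, hypothetical model of witness tiebreaking.
# Assumption: each data site and the witness holds exactly one vote.
FAULT_DOMAINS = frozenset({"site-a", "site-b", "witness"})
WITNESS = "witness"


def has_quorum(partition):
    """Return True if the fault domains in `partition` hold a strict majority of votes."""
    members = set(partition) & FAULT_DOMAINS
    return len(members) * 2 > len(FAULT_DOMAINS)


def surviving_data_sites(partitions):
    """Return the data sites that sit in a partition with quorum.

    `partitions` is an iterable of groups of fault domains that can still
    reach each other over the network. Failed domains are simply omitted.
    """
    survivors = set()
    for partition in partitions:
        if has_quorum(partition):
            survivors |= (set(partition) & FAULT_DOMAINS) - {WITNESS}
    return sorted(survivors)


if __name__ == "__main__":
    # Inter-site link lost; only site-a can still reach the witness.
    print(surviving_data_sites([{"site-a", "witness"}, {"site-b"}]))
    # Site B has failed entirely.
    print(surviving_data_sites([{"site-a", "witness"}]))
    # Witness unreachable, but the two data sites still see each other.
    print(surviving_data_sites([{"site-a", "site-b"}, {"witness"}]))
```

In this model, a data site that is cut off from both the other site and the witness never holds a majority, which corresponds to workloads being restarted on the site that remains connected to the witness.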

For more information on vSAN stretched clusters, see the VMware vSAN Documentation and the vSAN Stretched Cluster Guide.

You can deploy a Supervisor on an existing vSAN stretched cluster in active/active mode. When a Supervisor is deployed on a vSAN stretched cluster and settings are applied to provide HA to the Supervisor workloads, this configuration is referred to as an active/active deployment mode.

The supported Supervisor deployment on a vSAN stretched cluster is a single-zone Supervisor, where the underlying vSphere cluster is a vSAN stretched cluster.

Note: Starting with the vSphere 8 Update 3 release, a Supervisor running on a vSAN stretched cluster is supported only as a greenfield deployment. A greenfield deployment in this case means a Supervisor that is freshly deployed on a vSAN stretched cluster. If the Supervisor is already deployed on a different storage solution or on a non-stretched vSAN cluster, you cannot convert the Supervisor to run on a vSAN stretched cluster.

To deploy a vSAN stretched cluster, follow the recommendations and instructions provided in the vSAN Stretched Cluster Guide and the VMware vSAN Documentation. To activate and configure a Supervisor running on a vSAN stretched cluster, follow the instructions in the current guide. This way, you can ensure that:

A single-host failure does not bring down all the Supervisor control plane VMs or all the TKG cluster worker and control plane nodes.
A single-site failure or isolation does not prevent recovery. If one of the vSAN stretched cluster sites fails or becomes network isolated from the other site and the witness, all the Supervisor workloads can be fully recovered and brought back to a running state on the other site, which is still functioning and connected to the witness node. This includes all Supervisor control plane VMs, TKG cluster control plane and worker nodes, and all the pods inside TKG clusters.
A loss of the inter-site link between the two sites over the vSAN network still allows all the workloads, Supervisor control plane VMs, and TKG worker and control plane nodes to recover and return to a running state.
All the Supervisor workloads can access the persistent volume claims (PVCs) that they were accessing before the failure event, whether that event is a single-host failure, an entire site failure or isolation, or an inter-site link failure.
All the Supervisor and TKG load balancer services continue to be reachable from outside the Supervisor after a failure event.

In the following example deployment, the vSAN stretched cluster is running in an active/active topology. The Supervisor is accordingly configured in an active/active deployment mode. The Supervisor and TKG cluster control plane nodes are collocated. The worker nodes of TKG clusters are distributed between the two sites. The location of the Supervisor and TKG cluster VMs is determined by using site-affinity rules. A witness host is deployed outside of the vSAN stretched cluster.

A deployment of a Supervisor on top of a vSAN stretched cluster with VMs distributed between the two sites
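
The placement in the example deployment can be pictured with a small planning sketch. It is only an illustration under stated assumptions: the site names, node names, and the plan_placement helper are hypothetical, collocation is modeled as placing all control plane nodes on one site, and worker nodes are alternated between sites. It does not create or query any actual site-affinity rules.

```python
# Hypothetical illustration of the example topology; not a vSphere API.
def plan_placement(control_plane_nodes, worker_nodes,
                   sites=("site-a", "site-b"),
                   control_plane_site="site-a"):
    """Map each node name to a site.

    Control plane nodes are collocated on `control_plane_site`.
    Worker nodes are alternated across `sites` so that both sites run workers.
    """
    if control_plane_site not in sites:
        raise ValueError("control_plane_site must be one of the data sites")

    # Collocate every control plane node on the chosen site.
    placement = {node: control_plane_site for node in control_plane_nodes}

    # Distribute worker nodes between the sites in round-robin order.
    for index, node in enumerate(worker_nodes):
        placement[node] = sites[index % len(sites)]
    return placement


if __name__ == "__main__":
    plan = plan_placement(
        control_plane_nodes=["cp-1", "cp-2", "cp-3"],
        worker_nodes=["worker-1", "worker-2", "worker-3", "worker-4"],
    )
    for node, site in plan.items():
        print(f"{node}: {site}")
```

A plan like this only records intent. In an actual deployment, the site-affinity rules mentioned above are what determine where the VMs run.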