Prepare to Deploy Management Clusters to vSphere

Before you can use the Tanzu CLI or installer interface to deploy a management cluster, you must prepare your vSphere environment. You must make sure that vSphere meets the general requirements, and import the base image templates from which Tanzu Kubernetes Grid creates cluster node VMs. Each base image template contains a version of a machine OS and a version of Kubernetes.

Important

For Tanzu Kubernetes Grid deployments to vSphere, VMware recommends that you use the vSphere IaaS control plane (formerly known as vSphere with Tanzu) Supervisor. Using TKG with a standalone management cluster is only recommended for the use cases listed in When to Use a Standalone Management Cluster in About TKG.

From v2.5.1 onwards, Tanzu Kubernetes Grid does not support creating management clusters or workload clusters on vSphere 6.7. For more information, see End of Support for TKG Management and Workload Clusters on vSphere 6.7.

General Requirements

A machine with the Tanzu CLI, Docker, and kubectl installed. See Install the Tanzu CLI and Kubernetes CLI for Use with Standalone Management Clusters.
- This is the bootstrap machine from which you run tanzu, kubectl and other commands.
- The bootstrap machine can be a local physical machine or a VM that you access via a console window or client shell.
A vSphere 8, vSphere 7, VMware Cloud on AWS, or Azure VMware Solution account with:
- vSphere 7 or 8: Possible when vSphere IaaS control plane (formerly known as vSphere with Tanzu) Supervisor is not enabled.
- VMware Cloud on AWS: Deployed SDDC version is compatible with this version of Tanzu Kubernetes Grid. See the VMware Product Interoperability Matrix.
- At least the permissions described in Required Permissions for the vSphere Account.
Your vSphere instance has the following objects in place:
- Either a standalone host or a vSphere cluster with at least two hosts
  - If you are deploying to a vSphere cluster, ideally vSphere DRS is enabled.
- Optionally, a resource pool in which to deploy the Tanzu Kubernetes Grid Instance
- A VM folder in which to collect the Tanzu Kubernetes Grid VMs
- A datastore with sufficient capacity for the control plane and worker node VM files
- To deploy multiple Tanzu Kubernetes Grid instances to the same vSphere instance, create a dedicated resource pool, VM folder, and network for each instance that you deploy.
- To run the management cluster or its workload clusters across multiple availability zones, either now or later, create a vsphere-zones.yaml file that defines the zones’ FailureDomain and DeploymentZone objects.
  - Create this file as described in Create FailureDomain and DeploymentZone Objects in Kubernetes under Running Clusters Across Multiple Availability Zones.
You have done the following to prepare your vSphere environment:
- Created a base image template that matches the management cluster’s Kubernetes version. See Import the Base Image Template into vSphere.
- Created a vSphere account for Tanzu Kubernetes Grid, with a role and permissions that let it manipulate vSphere objects as needed. See Required Permissions for the vSphere Account.
- If you are using NSX Advanced Load Balancer to load-balance workloads, you have deployed it to vSphere as described in Install NSX Advanced Load Balancer. See Kube-Vip and NSX Advanced Load Balancer for vSphere below.
A vSphere network* with:
- The ability to allocate IP addresses to VMs created for your cluster, either via DHCP or by allowing VMs to select addresses when they boot up. In vSphere, the network type defaults to VSS, but you may prefer VDS or NSX in production environments.
  - In VDS and NSX, create a custom VM Network that can allocate IP addresses to the Kubernetes nodes in TKG.
- A DNS nameserver.
  - VSS includes this. For VDS and NSX, or if you are using Node IPAM, you need to know and configure the DNS nameserver addresses.
- A DHCP server configured with Option 3 (Router) and Option 6 (DNS) with which to connect the cluster node VMs that Tanzu Kubernetes Grid deploys. The node VMs must be able to connect to vSphere.
- A set of available static virtual IP addresses for all of the clusters that you create, including both management and workload clusters.
  - Every cluster that you deploy to vSphere requires a static IP address or FQDN for its control plane endpoint.
    - You configure this value as VSPHERE_CONTROL_PLANE_ENDPOINT, or if you are using NSX Advanced Load Balancer for your control plane endpoint, let the address be set automatically from an address pool.
    - After you create a management or workload cluster, you must configure its node DHCP reservations and endpoint DNS as described in Configure Node DHCP Reservations and Endpoint DNS Record. For instructions on how to configure DHCP reservations, see your DHCP server documentation.
- Traffic allowed out to vCenter Server from the network on which clusters will run.
- Traffic allowed between your local bootstrap machine and port 6443 of all VMs in the clusters you create. Port 6443 is where the Kubernetes API is exposed by default. To change this port for a management or workload cluster, set the CLUSTER_API_SERVER_PORT variable or, for environments with NSX Advanced Load Balancer, the VSPHERE_CONTROL_PLANE_ENDPOINT_PORT variable when deploying the cluster.
- Traffic allowed between port 443 of all VMs in the clusters you create and vCenter Server. Port 443 is where the vCenter Server API is exposed.
- Traffic allowed from your local bootstrap machine out to the image repositories listed in the management cluster Bill of Materials (BoM) file, over TCP port 443. The BoM file is under ~/.config/tanzu/tkg/bom/ and its name includes the Tanzu Kubernetes Grid version, for example tkg-bom-v2.5.2+vmware.1.yaml for v2.5.2.
- The Network Time Protocol (NTP) service running on all hosts, and the hosts running on UTC. To check the time settings on hosts:
  1. Use SSH to log in to the ESXi host.
  2. Run the date command to see the timezone settings.
  3. If the timezone is incorrect, run esxcli system time set.
- The NTP server is accessible from all VMs. You can configure this using DHCP Option 42, or else follow Configuring NTP without DHCP Option 42.
NSX Advanced Load Balancer (ALB) installed in your vSphere instance, if you want to use NSX ALB as the load balancer and the endpoint provider for control plane HA. See Install NSX Advanced Load Balancer.
If your vSphere environment runs VMware NSX, you can use the NSX interfaces when you deploy management clusters. Make sure that your NSX setup includes a segment on which DHCP is enabled. Make sure that NTP is configured on all ESXi hosts, on vCenter Server, and on the bootstrap machine.

*Or see Prepare an Internet-Restricted Environment for installing without external network access.
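As a rough illustration of the BoM file naming described above, the following sketch looks up the BoM file for a given version on the bootstrap machine. The function name find_bom_file and its optional directory argument are hypothetical helpers introduced here, not part of the Tanzu CLI. The sketch assumes only the directory and naming pattern given in the requirements list.

```shell
# Print the path of the BoM file for a given TKG version, or fail if none exists.
# $1: version string, for example v2.5.2
# $2: optional BoM directory (defaults to the location named in the requirements)
# find_bom_file is a hypothetical helper, not a Tanzu CLI command.
find_bom_file() {
  version=$1
  bom_dir=${2:-"$HOME/.config/tanzu/tkg/bom"}
  # The glob matches names such as tkg-bom-v2.5.2+vmware.1.yaml.
  for f in "$bom_dir"/tkg-bom-"$version"*.yaml; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Usage: report the BoM file for v2.5.2, if one is present on this machine.
if bom=$(find_bom_file v2.5.2); then
  echo "BoM file: $bom"
else
  echo "No BoM file found for v2.5.2"
fi
```

The glob is deliberately loose, so a version prefix such as v2.5.2 would also match a longer version string that starts the same way. Tighten the pattern if that matters in your environment.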

Management Cluster Sizing Examples

The table below describes sizing examples for management clusters on vSphere. Use this data as guidance to ensure your management cluster is scaled to handle the number of workload clusters that you plan to deploy. The Workload cluster VM size column lists the VM sizes that were used for the examples in the Can manage… column.

Management cluster plan	Management cluster VM size	Can manage…	Workload cluster VM size
3 control plane nodes and 3 worker nodes	Control plane nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB; Worker nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB	Examples: 5 workload clusters, each cluster deployed with 3 control plane and 200 worker nodes; or 10 workload clusters, each cluster deployed with 3 control plane and 50 worker nodes	Control plane nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB; Worker nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB
3 control plane nodes and 3 worker nodes	Control plane nodes: CPU: 4, Memory: 16 GB, Disk: 40 GB; Worker nodes: CPU: 4, Memory: 16 GB, Disk: 40 GB	Example: One workload cluster, deployed with 3 control plane and 500 worker nodes	Control plane nodes: CPU: 16, Memory: 64 GB, Disk: 100 GB; Worker nodes: CPU: 8, Memory: 8 GB, Disk: 20 GB
3 control plane nodes and 3 worker nodes	Control plane nodes: CPU: 4, Memory: 16 GB, Disk: 40 GB; Worker nodes: CPU: 4, Memory: 16 GB, Disk: 40 GB	Example: 200 workload clusters, each cluster deployed with 3 control plane and 5 worker nodes	Control plane nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB; Worker nodes: CPU: 2, Memory: 4 GB, Disk: 20 GB

Also, see Minimum VM Sizes for Cluster Nodes below.
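To compare the sizing examples, it can help to work out how many workload node VMs each one represents. The sketch below does that arithmetic for the four examples in the table. The total_nodes helper is hypothetical, and the node total is only one way to compare the rows, not a sizing formula from this documentation.

```shell
# Total node VMs across workload clusters:
#   clusters * (control plane nodes + worker nodes)
# total_nodes is a hypothetical helper for illustration only.
total_nodes() {
  echo $(( $1 * ($2 + $3) ))
}

total_nodes 5 3 200    # small management cluster, first example:  1015
total_nodes 10 3 50    # small management cluster, second example: 530
total_nodes 1 3 500    # one very large workload cluster:          503
total_nodes 200 3 5    # many small workload clusters:             1600
```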

Kube-Vip and NSX Advanced Load Balancer for vSphere

Each management cluster and workload cluster that you deploy to vSphere requires one static virtual IP address for external requests to the cluster’s API server. You must be able to assign this IP address, so it cannot be within your DHCP range, but it must be in the same subnet as the DHCP range.

The cluster control plane’s Kube-Vip pod uses this static virtual IP address to serve API requests, and the API server certificate includes the address to enable secure TLS communication. In workload clusters, Kube-Vip runs in a basic, Layer-2 failover mode, assigning the virtual IP address to one control plane node at a time. In this mode, Kube-Vip does not function as a true load balancer for control plane traffic.
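The constraint on the static virtual IP address (inside the DHCP subnet, outside the DHCP range) can be checked before you deploy. The following is a minimal sketch, assuming IPv4 and a subnet given as a network address plus prefix length. The helper names and all addresses are hypothetical examples.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# Runs in a subshell so that changing IFS does not leak to the caller.
# Assumes octets are written without leading zeros.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

# Succeed if the candidate address is a host address in the subnet
# and lies outside the DHCP range.
# Arguments: candidate network prefix_length dhcp_start dhcp_end
vip_is_usable() {
  vip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  size=$(( 1 << (32 - $3) ))
  dhcp_start=$(ip_to_int "$4")
  dhcp_end=$(ip_to_int "$5")
  # Inside the subnet, excluding the network and broadcast addresses.
  [ "$vip" -gt "$net" ] && [ "$vip" -lt $(( net + size - 1 )) ] || return 1
  # Outside the DHCP range.
  [ "$vip" -lt "$dhcp_start" ] || [ "$vip" -gt "$dhcp_end" ]
}

# Usage with hypothetical values: subnet 10.0.0.0/24, DHCP range .100 to .200.
if vip_is_usable 10.0.0.10 10.0.0.0 24 10.0.0.100 10.0.0.200; then
  echo "10.0.0.10 can be used as a control plane endpoint"
fi
```

The check does not verify that the address is actually unused on the network. It only confirms the subnet and DHCP range conditions.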

Tanzu Kubernetes Grid can use Kube-Vip as a load balancer for workloads in workload clusters (Technical Preview). You cannot use Kube-VIP as a LoadBalancer service on Windows-based clusters. For more information, see Kube-VIP Load Balancer.

To load-balance workloads on vSphere, use NSX Advanced Load Balancer, also known as Avi Load Balancer, Essentials Edition.

Important
On vSphere 8, to use NSX Advanced Load Balancer with a TKG standalone management cluster and its workload clusters, you need NSX ALB v22.1.2 or later and TKG v2.1.1 or later.

Import the Base Image Template into vSphere

Before you can deploy a cluster to vSphere, you must import into vSphere a base image template containing the OS and Kubernetes versions that the cluster nodes run on. For each supported pair of OS and Kubernetes versions, VMware publishes a base image template in OVA format, for deploying clusters to vSphere. After you import the OVA into vSphere, you must convert the resulting VM into a VM template.

Supported base images for cluster nodes depend on the type of cluster, as follows:

Management Cluster: OVA must have Kubernetes v1.28.11, the default version for Tanzu Kubernetes Grid v2.5.2. So it must be one of the following:
- Ubuntu v22.04 Kubernetes v1.28.11 OVA
  
  Note
  In Tanzu Kubernetes Grid v2.5.2, the Ubuntu OVA image uses the Unified Extensible Firmware Interface (UEFI) booting mode.
- Photon v5 Kubernetes v1.28.11 OVA
- A custom OVA with a custom Tanzu Kubernetes release (TKr), as described in Build Machine Images.
Workload Clusters: OVA can have any supported combination of OS and Kubernetes version, as packaged in a Tanzu Kubernetes release. See Multiple Kubernetes Versions.

To import a base image template into vSphere:

Go to the Broadcom Support Portal and log in with your VMware customer credentials.
Go to the Tanzu Kubernetes Grid downloads page.
Download a Tanzu Kubernetes Grid OVA for the cluster nodes.

Important
For the management cluster, you must use one of the Kubernetes v1.28.11 OVA downloads. Make sure you download the most recent OVA base image templates in the event of security patch releases.

You can find updated base image templates that include security patches on the Tanzu Kubernetes Grid product download page.
In the vSphere Client, right-click an object in the vCenter Server inventory and select Deploy OVF template.
Select Local file, click the button to upload files, and navigate to the downloaded OVA file on your local machine.
Follow the installer prompts to deploy a VM from the OVA.
- Accept or modify the appliance name
- Select the destination datacenter or folder
- Select the destination host, cluster, or resource pool
- Accept the end user license agreements (EULA)
- Select the disk format and destination datastore
- Select the network for the VM to connect to
Note
If you select thick provisioning as the disk format, when Tanzu Kubernetes Grid creates cluster node VMs from the template, the full size of each node’s disk will be reserved. This can rapidly consume storage if you deploy many clusters or clusters with many nodes. However, if you select thin provisioning, as you deploy clusters this can give a false impression of the amount of storage that is available. If you select thin provisioning, there might be enough storage available at the time that you deploy clusters, but storage might run out as the clusters run and accumulate data.
Click Finish to deploy the VM.
When the OVA deployment finishes, right-click the VM and select Template > Convert to Template.

Important
Do not power on the VM before you convert it to a template.
In the VMs and Templates view, right-click the new template, select Add Permission, and assign the tkg-user to the template with the TKG role.

For information about how to create the user and role for Tanzu Kubernetes Grid, see Required Permissions for the vSphere Account below.

Repeat the procedure for each of the Kubernetes versions for which you downloaded the OVA file.

Required Permissions for the vSphere Account

The vCenter Single Sign On account that you provide to Tanzu Kubernetes Grid when you deploy a management cluster must have the correct permissions in order to perform the required operations in vSphere.

It is not recommended to provide a vSphere administrator account to Tanzu Kubernetes Grid, because this provides Tanzu Kubernetes Grid with far greater permissions than it needs. The best way to assign permissions to Tanzu Kubernetes Grid is to create a role and a user account, and then to grant that user account that role on vSphere objects.

Note
If you intend to use Velero to back up and restore workload clusters, you must also set the permissions listed in Credentials and Privileges for VMDK Access in the Virtual Disk Development Kit Programming Guide.

The procedure below describes the role and user account to create in vCenter Server. For details about how to create roles and user accounts, see Using vCenter Server Roles to Assign Privileges in the vSphere 8 docs.

In the vSphere Client, create a new role, for example TKG, with the following permissions.

vSphere Object	Required Permission
Cns	Searchable
Datastore	Allocate space; Browse datastore; Low level file operations
Global (if using Velero for backup and restore)	Disable methods; Enable methods; Licenses
Network	Assign network
Profile-driven storage	Profile-driven storage view
Resource	Assign virtual machine to resource pool
Sessions	Message; Validate session
Virtual machine	Change Configuration > Add existing disk; Change Configuration > Add new disk; Change Configuration > Add or remove device; Change Configuration > Advanced configuration; Change Configuration > Change CPU count; Change Configuration > Change Memory; Change Configuration > Change Settings; Change Configuration > Configure Raw device; Change Configuration > Extend virtual disk; Change Configuration > Modify device settings; Change Configuration > Remove disk; Change Configuration > Toggle disk change tracking*; Edit Inventory > Create from existing; Edit Inventory > Remove; Interaction > Power On; Interaction > Power Off; Provisioning > Allow read-only disk access*; Provisioning > Allow virtual machine download*; Provisioning > Deploy template; Snapshot Management > Create snapshot*; Snapshot Management > Remove snapshot*
vApp	Import

*Required to enable the Velero plugin, as described in Back Up and Restore Management and Workload Cluster Infrastructure. You can add these permissions when needed later.

Create a new user account in the appropriate domain, for example tkg-user.
Assign the tkg-user with the TKG role to each object that your Tanzu Kubernetes Grid deployment will use.
- Hosts and Clusters
  - The root vCenter Server object
  - The Datacenter and all of the Host and Cluster folders, from the Datacenter object down to the cluster that manages the Tanzu Kubernetes Grid deployment
  - Target hosts and clusters
  - Target resource pools, with propagate to children enabled
- VMs and Templates
  - The deployed Tanzu Kubernetes Grid base image templates
  - Target VM and Template folders, with propagate to children enabled
- Storage
  - Datastores and all storage folders, from the Datacenter object down to the datastores that will be used for Tanzu Kubernetes Grid deployments
- Networking
  - Networks or distributed port groups to which clusters will be assigned
  - Distributed switches

Minimum VM Sizes for Cluster Nodes

Configure the sizes of your management and workload cluster nodes depending on cluster complexity and expected demand. You can set them to small, medium, large, or extra-large as defined in Predefined Node Sizes.

For all clusters on vSphere, you configure these sizes with the SIZE, CONTROLPLANE_SIZE, and WORKER_SIZE cluster configuration variables. Or, for greater granularity, you can use the VSPHERE_*_DISK_GIB, VSPHERE_*_NUM_CPUS, and VSPHERE_*_MEM_MIB configuration variables.

For management clusters, the installer interface Instance Type field also configures node VM sizes.

For single-worker management and workload clusters running sample applications, use the following minimum VM sizes:

No services installed: small
Basic services installed (Wavefront, Fluent Bit, Contour, Envoy, and TMC agent): medium
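As an illustration of the size variables described in this section, the following sketch writes a small configuration fragment and prints the size settings it contains. The file path is a hypothetical example, and the fragment assumes the flat KEY: value layout of a cluster configuration file. The values shown are the minimum for a cluster with basic services installed.

```shell
# Hypothetical path for an illustrative configuration fragment.
config_file="${TMPDIR:-/tmp}/node-sizes-example.yaml"

# Write the fragment. In a real deployment these lines would be part of
# the full cluster configuration file.
cat > "$config_file" <<'EOF'
# Node VM sizes for a cluster with basic services installed.
CONTROLPLANE_SIZE: medium
WORKER_SIZE: medium
EOF

# Show the size settings that the file now contains.
grep -E '^(SIZE|CONTROLPLANE_SIZE|WORKER_SIZE):' "$config_file"
```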

Create an SSH Key Pair

In order for the Tanzu CLI to connect to vSphere from the machine on which you run it, you must provide the public key part of an SSH key pair to Tanzu Kubernetes Grid when you deploy the management cluster. If you do not already have one on the machine on which you run the CLI, you can use a tool such as ssh-keygen to generate a key pair.

On the machine on which you will run the Tanzu CLI, run the following ssh-keygen command.
```
ssh-keygen -t rsa -b 4096 -C "email@example.com"
```
At the prompt Enter file in which to save the key (/root/.ssh/id_rsa): press Enter to accept the default.
Enter and repeat a password for the key pair.
Add the private key to the SSH agent running on your machine, and enter the password you created in the previous step.
```
ssh-add ~/.ssh/id_rsa
```
Open the file .ssh/id_rsa.pub in a text editor so that you can easily copy and paste it when you deploy a management cluster.

Obtain vSphere Certificate Thumbprints

If your vSphere environment uses untrusted, self-signed certificates to authenticate connections, you must verify the thumbprint of the vCenter Server when you deploy a management cluster. If your vSphere environment uses trusted certificates that are signed by a known Certificate Authority (CA), you do not need to verify the thumbprint.

You can use your Web browser’s certificate viewer to obtain the vSphere certificate thumbprint.

Log in to the vSphere Client in a Web browser.
Access the certificate viewer by clicking on the Secure (padlock) icon to the left of the Web address in the URL field.

The next steps depend on which browser you use. For example, in Google Chrome, you select Connection is secure > Certificate is valid to see the certificate details, including the thumbprint.
Record the SHA-1 Fingerprint value from the browser. If it contains spaces between the hex pairs, substitute a : character for each space, for example 6D:4A:DC:6C:C4:43:73:BB:DF:9A:32:68:67:56:F9:96:02:08:64:F4.

Use this thumbprint string to verify the certificate when you deploy a management cluster from the installer interface, or provide it as the VSPHERE_TLS_THUMBPRINT option when you deploy clusters from a configuration file.
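The substitution of : characters for spaces can be scripted. The following is a minimal sketch, assuming the fingerprint was copied from the browser as hex pairs with or without separators. The format_thumbprint helper is hypothetical.

```shell
# Normalize a copied SHA-1 fingerprint to colon-separated uppercase hex pairs.
# format_thumbprint is a hypothetical helper for illustration only.
format_thumbprint() {
  printf '%s' "$1" |
    tr -d -c '0-9A-Fa-f' |      # drop spaces, colons, and any other separators
    tr 'a-f' 'A-F' |            # uppercase the hex digits
    sed 's/../&:/g; s/:$//'     # insert ':' after each pair, trim the last one
}

# Usage with the example thumbprint from this section, copied with spaces.
echo "$(format_thumbprint "6D 4A DC 6C C4 43 73 BB DF 9A 32 68 67 56 F9 96 02 08 64 F4")"
```

The function only reformats the string. It does not check that the result has the 20 hex pairs of a SHA-1 fingerprint.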

Settings and Rules for IPv6

To deploy a management cluster that supports IPv6 in an IPv6 networking environment:

Configure Linux to accept router advertisements to ensure the default IPv6 route is not removed from the routing table when the Docker service starts. For more information, see Docker CE deletes IPv6 Default route.
  sudo sysctl net.ipv6.conf.eth0.accept_ra=2
Create a masquerade rule so that the bootstrap cluster can send outgoing traffic. For more information about masquerade rules, see MASQUERADE.
  sudo ip6tables -t nat -A POSTROUTING -s fc00:f853:ccd:e793::/64 ! -o docker0 -j MASQUERADE
Deploy the management cluster by running tanzu mc create, as described in Deploy Management Clusters from a Configuration File.
- When you create the configuration file for the management cluster, set TKG_IP_FAMILY and other variables as described in Configure for IPv6.
- For IPv6 support, you must deploy the management cluster from a configuration file, not the installer interface.

Prepare Availability Zones

To deploy standalone management and workload clusters to run across multiple availability zones (AZs) in vSphere, you need to:

Create or identify either of the following sets of objects in vSphere:
- A vSphere datacenter and its clusters
- A vSphere cluster and its host groups
Tag the objects to associate them with a region and its AZs in Kubernetes, as described in Prepare Regions and AZs in vSphere.

What to Do Next

For production deployments, it is strongly recommended to enable identity management for your clusters:

For information about the preparatory steps to perform before you deploy a management cluster, see Obtain Your Identity Provider Details in Configure Identity Management.
For conceptual information about identity management and access control in Tanzu Kubernetes Grid, see About Identity and Access Management.

If you are using Tanzu Kubernetes Grid in an environment with an external internet connection, once you have set up identity management, you are ready to deploy management clusters to vSphere.

Deploy Management Clusters with the Installer Interface. This is the preferred option for first deployments.
Deploy Management Clusters from a Configuration File. This method is more complicated but allows greater flexibility of configuration and automation.
If you are using Tanzu Kubernetes Grid in an internet-restricted environment, see Prepare an Internet-Restricted Environment for the additional steps to perform.