Introduction
VMware vSphere comprises one or more vCenter Server instances and multiple ESXi hosts. Virtual networking (distributed switches and NSX) and virtual storage (vSAN) can all be managed through vCenter Server, and VMware Update Manager is also integrated. vSphere 6.5 comes with many features that help customers manage their private, public and hybrid cloud environments.
Customer Background and Issues
During a recent engagement for a renowned healthcare provider, we encountered a cluster issue in a vSphere 6.5 environment running the vCenter Server Appliance (VCSA) with an external Platform Services Controller (PSC).
The customer’s environment is built on Cisco UCS FlexPods with all-flash storage over FCoE (Fibre Channel over Ethernet). There are two tier 3 datacentres and two vCenter Server Appliances (one at each site) running vSphere 6.5 in Enhanced Linked Mode. The Platform Services Controllers are external, which is a requirement for Enhanced Linked Mode. There are six clusters of three blades each, with three clusters at each site.
The customer was seeing numerous issues with one of the six clusters including:
- No management of VMs possible (power on/off, create or delete snapshots and edit settings were greyed out on every VM)
- vMotion failing
- DRS non-functional due to vMotion failure
- HA wasn’t enabling/disabling correctly (one of the three hosts in the cluster would function, two would not)
- Unable to disconnect/connect a host from the vCenter using the GUI
- Tasks showing as ‘running’ in tasks on the vCenter server, but not showing as running on the host when directly connected
During troubleshooting with VMware GSS, it was discovered that the database entries for the hosts/cluster were corrupt.
The Solution
After further investigation, we identified that the problem lay with the vCenter and not the hosts, as all of the commands could be run when connected directly to the host console. VMware GSS were engaged, and they agreed with the initial finding that the vCenter vPostgres entries were corrupt. The decision was made to disconnect the hosts from the vCenter and then reconnect them, with the aim of refreshing the agents and forcing the database to update. Because the hosts could not be disconnected or reconnected from the web interface, we forced them to disconnect from the vCenter by updating the vPostgres database directly. This was completed during a pre-planned, out-of-hours change window, where the fix below was applied:
- SSH to VCSA
- Run /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres and press <Enter>
- The VCDB=# prompt should appear
- Select id,dns_name,enabled from vpx_host;
- Will list all hosts and the id number of the host.
- Update vpx_host set enabled = 0 where id = <Host ID from table above>;
- Response should be UPDATE 1
- Select id,dns_name,enabled from vpx_host;
- Should show the host removed with a 0 under enabled
- Repeat for any other hosts
- \q
- Stop vpxd: run service-control --stop vmware-vpxd and press <Enter>
- Start vpxd: run service-control --start vmware-vpxd and press <Enter>
- Refresh or log in to the vSphere web client and reconnect the disconnected hosts by right-clicking each host and selecting Connection > Connect
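For repeat use, the steps above could be wrapped in a small shell script run on the VCSA. The following is only a sketch under stated assumptions, not the procedure as VMware GSS supplied it: the function names build_disable_sql and disable_hosts are hypothetical, while the psql path, the VCDB database, the vpx_host table and the service-control commands are taken from the steps above. The first function only prints SQL, so its output can be reviewed before anything touches the database.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical helpers for illustration.
# The psql path, database, table and service names come from the manual steps.

PSQL=/opt/vmware/vpostgres/current/bin/psql

# Print one UPDATE statement per host ID.
# Anything that is not a plain integer is rejected before any SQL is emitted for it.
build_disable_sql() {
    for id in "$@"; do
        case "$id" in
            ''|*[!0-9]*)
                echo "invalid host id: $id" >&2
                return 1
                ;;
        esac
        printf 'UPDATE vpx_host SET enabled = 0 WHERE id = %s;\n' "$id"
    done
}

# Apply the updates, list the hosts again to confirm, then restart vpxd.
disable_hosts() {
    sql=$(build_disable_sql "$@") || return 1
    printf '%s\nSELECT id, dns_name, enabled FROM vpx_host;\n' "$sql" |
        "$PSQL" -d VCDB -U postgres || return 1
    service-control --stop vmware-vpxd &&
        service-control --start vmware-vpxd
}
```

With host IDs taken from the vpx_host listing (12 and 15 here are purely illustrative), build_disable_sql 12 15 prints the two UPDATE statements for review, and disable_hosts 12 15 applies them and restarts vpxd. The hosts still need to be reconnected from the vSphere web client afterwards, and a direct database change of this kind is best made with VMware GSS guidance and a current backup.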
The Results
Once the fix had been applied, the cluster within the customer’s environment returned to a manageable state, with zero downtime for the customer’s virtual machines and zero impact on services. The functions that had previously failed now worked without fault.
Xtravirt is a leading VMware specialist and has the ultimate combination of deep experience and agility to design and deliver IT transformations. If you need assistance in getting the best from your VMware vSphere estate, please contact us, and we’d be happy to use our wealth of knowledge and experience to assist you.