Server Health Checks - 2
Check Network Connections
Here are some other checks you should perform to ensure proper network connectivity:
1. ipconfig /all displays all your TCP/IP settings, including your MAC address.
2. ipconfig /flushdns flushes your DNS resolver cache.
3. ipconfig /displaydns displays the contents of your DNS resolver cache.
4. netstat -an shows all the connections and listening ports on a machine.
5. nbtstat shows NetBIOS over TCP/IP connection statistics.
6. tracert <IP or DNS Name> shows the path the packet takes, the routers along the way, and the response time for each hop.
7. pathping <IP or DNS Name> combines ping and tracert to the 100th degree. By default it pings each hop 100 times, which makes it great for testing WAN connectivity.
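If you want a quick summary instead of scrolling through netstat output, a small script can do the counting. The following is a minimal sketch, assuming you have saved or captured the text of netstat -an; the function name count_tcp_states and the SAMPLE text are made up for illustration, and the sample addresses are not from a real machine.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections by state in the text printed by `netstat -an`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have four columns: Proto, Local Address, Foreign Address, State.
        # UDP rows have no State column, and the header row splits into six words,
        # so both are skipped by the length check.
        if len(fields) == 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Hypothetical sample laid out the way Windows netstat prints its table.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    10.0.0.5:3389          10.0.0.20:50112        ESTABLISHED
  TCP    10.0.0.5:49712         10.0.0.30:445          ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

A sudden pile-up of connections in one state is usually easier to spot in a summary like this than in the raw listing.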
Disk Space
All kinds of bad stuff can happen when your disk space is filling up. The best way to avoid this is to write a script that notifies you when you reach a certain threshold. In a future post I'll share a method for doing just that. However, if there is a problem and you need to perform a health check, here is how you check the space the old-fashioned way.
To check disk space manually:
1. Right-click on My Computer
2. Select Manage
3. Select Disk Management
4. Verify that each disk has more than 10 percent free space
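The 10 percent check in step 4 can also be done from a script. This is only a rough sketch using Python's standard shutil.disk_usage, not the notification method promised for the future post; the function names and the choice of the root path are assumptions for illustration.

```python
import os
import shutil


def percent_free(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes


def is_low_on_space(path: str, threshold_percent: float = 10.0) -> bool:
    """True when the volume holding `path` has threshold_percent free or less."""
    usage = shutil.disk_usage(path)
    return percent_free(usage.total, usage.free) <= threshold_percent


if __name__ == "__main__":
    # Assumption: check the root of the current drive; on a server you would
    # pass each volume you care about, for example "C:\\" or "D:\\".
    root = os.path.abspath(os.sep)
    usage = shutil.disk_usage(root)
    status = "LOW" if is_low_on_space(root) else "ok"
    print(f"{root}: {percent_free(usage.total, usage.free):.1f}% free ({status})")
```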
Event Logs
Event logs can give a more historical perspective on what is going on with the system and applications. When troubleshooting, query the System or Application log and look for events with a timestamp near the time of the issue you are investigating.
Events have 3 categories in the Event Viewer:
· Informational: Noted with a white icon and the letter 'i'. Successful operations are logged as informational. These are usually not used in troubleshooting problems or failures.
· Warning: Noted with a yellow icon and an exclamation point. These are usually worth looking up because they can predict future failures, such as disk space running low, DHCP IP address lease renewal failures, etc.
· Error: Noted with a red circle icon and an 'x'. These indicate that something has failed outright and are a good starting point for troubleshooting.
When looking at event logs, use the information to determine the following:
· Is the incident tied to a particular time or outage incident?
· Is this a one-off, or has this particular error occurred multiple times in the past?
· Does this error appear on other systems or is it unique to the system that has failed?
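The first two questions above come down to filtering by time and counting repeats. Here is a minimal sketch of that logic, assuming the events have already been exported into Python objects; the Event class, the 15-minute window, and every value in SAMPLE_EVENTS are hypothetical and not taken from a real log.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    timestamp: datetime
    level: str      # "Informational", "Warning", or "Error", as named in this post
    event_id: int
    source: str


def near_incident(events, incident_time, window=timedelta(minutes=15)):
    """Warnings and errors logged within `window` of the incident time."""
    return [
        e for e in events
        if e.level in ("Warning", "Error")
        and abs(e.timestamp - incident_time) <= window
    ]


def repeat_counts(events):
    """How often each (source, event ID) pair was logged as an error."""
    return Counter((e.source, e.event_id) for e in events if e.level == "Error")


# Hypothetical data for illustration only.
INCIDENT = datetime(2024, 1, 15, 14, 30)
SAMPLE_EVENTS = [
    Event(datetime(2024, 1, 15, 9, 0), "Error", 1001, "ExampleApp"),
    Event(datetime(2024, 1, 15, 14, 22), "Warning", 2001, "ExampleDisk"),
    Event(datetime(2024, 1, 15, 14, 28), "Error", 1001, "ExampleApp"),
    Event(datetime(2024, 1, 15, 14, 29), "Informational", 3001, "ExampleSvc"),
]

if __name__ == "__main__":
    for e in near_incident(SAMPLE_EVENTS, INCIDENT):
        print(e.timestamp, e.level, e.source, e.event_id)
    for (source, event_id), count in repeat_counts(SAMPLE_EVENTS).items():
        print(f"{source} {event_id}: logged {count} time(s)")
```

An error that shows up both near the incident and earlier in the day, like the ExampleApp entry here, is the kind of repeat worth chasing first.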
Also make sure you take a look at EventCombMT from Microsoft. This tool allows you to search the event logs of multiple machines at once. The benefit is that you can see whether a specific error or warning message is also occurring on other systems, which helps you rule out issues that are specific to a single server.
Services
Troubleshooting services should be limited to the specific service that is affected by the problem you are troubleshooting. Each server will have its own set of services depending on the types of applications it runs. You should document how your servers' services are configured and compare that to the server in question to see if anything is not configured correctly.
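Comparing a server against its documented configuration is easy to automate once both are written down in the same shape. The sketch below assumes each service is recorded as a (startup type, state) pair keyed by service name; the diff_services function and both dictionaries are made up for illustration, and how you collect the actual values is left to your own tooling.

```python
def diff_services(baseline: dict, actual: dict) -> dict:
    """Return every baseline service whose actual settings differ or are missing."""
    drift = {}
    for name, expected in baseline.items():
        found = actual.get(name)  # None when the service is not present at all
        if found != expected:
            drift[name] = {"expected": expected, "found": found}
    return drift


# Hypothetical documented configuration: name -> (startup type, state).
BASELINE = {
    "W3SVC": ("Automatic", "Running"),
    "Spooler": ("Disabled", "Stopped"),
    "Dnscache": ("Automatic", "Running"),
}

# Hypothetical values read from the server in question.
ACTUAL = {
    "W3SVC": ("Automatic", "Stopped"),
    "Spooler": ("Disabled", "Stopped"),
}

if __name__ == "__main__":
    for name, detail in diff_services(BASELINE, ACTUAL).items():
        print(f"{name}: expected {detail['expected']}, found {detail['found']}")
```

Only the services listed in the baseline are checked, which matches the advice to keep the investigation limited to what you have documented.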
Cluster
Servers that host applications and services that require high availability should be clustered so that if one node fails the other can pick up the workload. Clustered servers need the same health checks as stand-alone systems, plus a check on the health of the cluster itself.
Check Cluster Resource Status
1. Open Cluster Administrator: Log onto the server, select Start -> Run -> cluadmin
2. Check the Resources and ensure all are Online
3. If Cluster Administrator does not open, ensure that the Cluster Service is running on the node.
4. Cluster resource status can also be checked from a remote server. From a command prompt, just type: cluster /cluster:<cluster name> res
Client Side Health
1. Right-click on My Computer, select Manage
2. Open Device Manager
3. Drill down to SCSI and RAID Controllers and verify that the HBA hardware is visible and does not show any errors
4. If it does not show up in Device Manager, you may need to re-scan for the hardware, re-seat the fiber card, or re-install the driver.
5. If the HBA is showing healthy in Device Manager, open the tool that you use to view configuration and settings for the fiber card and verify there aren't any transmit/receive errors in the link statistics or counters
Switch Health
1. Make sure the fiber is properly connected to each switch
2. Make sure the switch reports no errors
3. If you're using zoning, verify it is properly configured
Check Fiber and SAN Connectivity
1. Log onto the SAN appliance and verify that the SAN is in good general health and that no major errors are present for the controllers, loops, switches, or ports.
2. Ensure that the LUNs are presented to the servers in the cluster
NLBS
Some applications will require you to spread the load across multiple servers. Web servers are a very popular candidate for network load balancing. As with clusters, we need to check the status of the load balancing.
Check NLBS Status CMD Line
1. From a command prompt on the local system, run 'wlbs query'. This will give you the convergence status of the local node with the NLBS cluster.
2. Other useful NLBS commands: wlbs stop (stops NLBS), wlbs start (starts NLBS), wlbs drainstop (drains the node, letting existing connections finish before it stops)