Chassis Troubleshooting Guide

Introduction:

This document covers the troubleshooting steps for the majority of chassis issues that you are likely to face in the field, including all of the steps that Technical Support Operations would ask you to complete before submitting a chassis RMA. Becoming familiar with these procedures will enable you to diagnose server problems more quickly, and may also help reduce the number of site visits needed to fix a chassis-related issue.


List of tools, hardware and software your technicians should have with them when troubleshooting a down chassis:
We recommend bringing the same set of tools, software and hardware to troubleshoot an existing chassis as you would use when installing a new chassis. Most of the time you will not need most of these items, but having them with you at all times will ultimately save you many hours of extra work. It is quite common for a client to call and report a "down system" when in reality they have a simple wiring issue. It is a big waste of your time, and the client's, if you must turn around and go back to the office to pick up a different set of tools or software. This may seem like common-sense advice, but it is worth documenting, as Technical Support deals with many down-system cases that get delayed for lack of basic tools and materials.

1. Basic tools to have with you: screwdrivers, pliers, flashlight, anti-static wrist strap, phone tools (butt set, toner, punch down tool), RJ-45 and RJ-11 crimpers, wire cutters/strippers, a banjo adapter, snips, USB flash drive, spare Ethernet and phone cabling, and spare grounding wire.

2. Software. Get an inexpensive CD wallet and burn CDs of your most commonly used operating systems, AltiWare versions, drivers and utilities. Keep this software with your other tools. While you may be able to download most of it at the client site if needed, having it ahead of time on CD is cheap insurance. Every minute you spend searching for and downloading common software at the client site is time you could have spent troubleshooting the client's actual problem. Here is a good starter set of CDs to keep: Windows XP Professional installation media, Windows 2003 Server installation media, bootable CDs and floppies, AltiWare install CDs to cover your most common installations, a CD with common AltiWare updates, a CD with documentation, and a driver/utility CD. Finally, restore CDs: any time you sell a new chassis, save a copy of the system restore CDs it comes with. Don't count on your clients to keep track of them over the years after you install their phone system. It takes only a few minutes to burn a set of restore CDs from your company's file server, but it takes a minimum of one business day for AltiGen to ship you a new set.

3. Spare Parts. It can be a huge time saver to have just a few inexpensive spare parts with you at all times. Here is our recommendation for spares that each field tech would benefit from carrying: hard drives (we recommend both a SATA and an IDE drive), CT-BUS cables, a Triton extension board power cable, a T1 loopback plug, a harmonica, a spare amphenol cable and 66-block, and spare RAM (carry a few different types). Add to this list anything you find yourself using frequently. We also strongly recommend keeping a spare chassis, an assortment of telephony boards, a spare SBC and a spare PSU at your office.

4. Anything else you find helpful. One partner I spoke to always keeps a stack of our quick reference cards with him when visiting a client site, whether it's for post-install training or to troubleshoot a down system. He said this has saved him untold hours on-site, because he can hand out cards to end users with basic functionality questions instead of stopping his troubleshooting to show them how to transfer a call or set up their voicemail. Clients certainly appreciate the personal touch, but often a quick answer to an easy question is just as valuable. Another dealer keeps a few of his company's sales brochures with him, and drops them off at other businesses near the client he is visiting.


Troubleshooting steps by type of failure.
Note: Always try to capture a backup of the client's configuration before beginning any troubleshooting, if at all possible. If a piece of server hardware is beginning to fail, you may have limited time to work with the system before you can no longer access AltiWare or the operating system. Also, start with the obvious. Has the server been rebooted since the problem started? Are there any error messages displayed? Have there been any recent hardware, software or network changes? Is the server properly grounded and connected via an adequate UPS? Is the chassis properly installed in a cool, dry environment? Does the server function normally with all AltiGen boards removed from the chassis?

Warning! Improper handling of chassis components or telephony boards may void your hardware warranty. Always wear an ESD wrist strap when installing, removing or moving boards or other chassis components.

System crash, unexpected reboot, blue screen, software/application hangs:
1. Check the event log for any Application or System errors that occurred around the time of the crash. Specifically, check for any disk errors, application hangs or unexpected service stops. If you're not sure what an error means, search the AltiGen Knowledge Base (http://kb.altigen.com), Google (http://www.google.com), and Microsoft's TechNet (http://support.microsoft.com) for information.
 

2. If AV software is installed, check that the Altiserv, Postoffice and AltiDB folders and their subfolders are excluded from real-time and scheduled scans. It may be helpful to disable AV completely while troubleshooting the issue. This is very important! Incorrect AntiVirus configuration is the cause of many problems.
 

3. Use Adaptec Storage Manager to check the health of the RAID array.
 

4. Use Task Manager to check the CPU and memory utilization. Look for any applications or processes that are consuming unusually high amounts of CPU or memory.
 

5. If the system is blue screening, look in the C:\windows folder to see if there are any memory dumps (files ending in .dmp). If so, open a case and upload the memory dump file.
 

6. Make sure the OS is up to date with supported service packs (currently SP2 for XP and 2003) and Windows updates (www.windowsupdate.com). Confirm that automatic updates are off.
 

7. Are there any third-party applications installed? If so, disable them while troubleshooting.
 

8. Confirm that all fans, including CPU, Chassis and PSU fans are in working order. Overheating can cause premature drive failure, system lockups, blue screens and a host of other issues.
 

9. Is the chassis generating any audible alarms, or are any LEDs red, amber or flashing?
 

10. If the server has redundant power supplies, make sure that both PSUs are running.
 

11. What boards are installed? Are they all within the supported revision levels for the version of AltiWare the client is running? Check that all boards are installed according to the Quick Install Guide.
 

12. If all of the above checks out and you are still experiencing problems, create a backup of the client's configuration and reimage the chassis. Reimaging the chassis is a relatively quick way to rule out software issues; relatively, because trying to track down the source of a software problem can be a long, painful process. Back up your AltiGen configuration, reimage, reapply the same AltiWare update you were running before, and restore.
 

13. For blue screens or crashes that generate invalid page faults or other memory-related errors, try replacing the RAM in the system and see if the problem is still reproducible.

 

14. Contact Technical Support if you cannot determine the cause of the failure after following these steps. Please have remote access available.
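
The memory dump check in step 5 can be partly scripted. The following is a minimal Python sketch, assuming Python is available on the server or on another machine where the drive has been mounted. The function name find_memory_dumps is illustrative only and is not part of AltiWare or any AltiGen tool. In addition to the top level of C:\windows named in step 5, the sketch also looks in the Minidump subfolder, where Windows writes small memory dumps by default; that extra location is an addition to the guide's instruction.

```python
from datetime import datetime
from pathlib import Path


def find_memory_dumps(windows_dir):
    """Return .dmp files found in windows_dir and its Minidump subfolder, newest first."""
    root = Path(windows_dir)
    # Assumption: only these two folders are checked, not the whole directory tree.
    candidates = [root, root / "Minidump"]
    dumps = []
    for folder in candidates:
        if not folder.is_dir():
            continue
        for entry in folder.iterdir():
            # Compare the extension case-insensitively (MEMORY.DMP, Mini*.dmp).
            if entry.is_file() and entry.suffix.lower() == ".dmp":
                dumps.append(entry)
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in find_memory_dumps(r"C:\windows"):
        info = dump.stat()
        stamp = datetime.fromtimestamp(info.st_mtime).strftime("%Y-%m-%d %H:%M")
        print(f"{stamp}  {info.st_size:>12,} bytes  {dump}")
```

Listing the newest dump first makes it easier to match a dump file to the crash time recorded in the event log before uploading it to a case.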
 


Chassis hard lock:
Symptom: you can see the desktop, but cannot move the mouse, and no calls are being processed.
1. Check the DSP boards first (T1/PRI, VoIP, Resource, MeetMe). Run multiple passes of the AltiGen Board Test and CT-BUS tool on the DSP boards to see if you can replicate the lockup or if any boards generate errors.

 

2. Any board that consistently fails the Board Test should be replaced. If the chassis hard locks while running the Board Test, run the test on the DSP boards individually to determine which board is causing the problem.
 

3. If the CT-BUS test fails, check that all boards are installed according to the Quick Install Guide. If they are, replace the MVIP cable. Rerun the CT-BUS test to confirm the problem has been resolved.
 

4. Create a backup of the client's configuration and reimage the chassis. Reimaging the chassis is a relatively quick way to rule out software issues; relatively, because trying to track down the source of a software problem can be a long, painful process. Back up your AltiGen configuration, reimage, reapply the same AltiWare update you were running before, and restore.

 

5. If all of the boards check out okay and a reimage does not resolve the issue, check other chassis hardware (fans, PSU, CPU, memory, RAID health, etc.) and contact Technical Support.



System powers on, but will not POST or no video:
1. Does the system play any beep codes when powered on? If so, record the pattern and contact AltiGen Technical Support. Also confirm that all fans, including CPU, chassis and PSU fans, are in working order.
 

2. Is the chassis generating any audible alarms, or are any LEDs red, amber or flashing?
 

3. If the server has redundant power supplies, make sure that both PSUs are running.
 

4. Reseat the SBC (Single Board Computer), or move it to a different slot.
 

5. Remove all boards except the SBC and try booting again. If the system boots up and can load Windows, shut the server down and begin adding the boards back one at a time until you identify the faulty hardware.

 

6. If you cannot determine the cause of the failure after completing these steps, please contact Technical Support.



System POSTs, but will not load OS:
1. Where does the boot process fail? Can you boot into Safe Mode? (This applies to the MAX1000-R and standard Office chassis; the MAX1000 does not have a Safe Mode boot option.)
 

2. If you can boot into Safe Mode, but can't start Windows normally, check the event logs and look for any hardware or software that was recently added to the system.
 

3. Enter the RAID BIOS (usually + while booting). Check the health of the array, replace drives and rebuild or reimage as needed.
 

4. If all hardware checks out and the RAID array and drives are in good condition, and you cannot find the cause of the bootup failure, get a backup of the client's configuration and reimage the chassis. You may be able to get a backup by booting into Safe Mode with Networking, or if the chassis will not load Windows even in Safe Mode, you may disconnect one of the drives and connect it to another server/computer as a slave, then manually copy off the data.


5. If you still have issues loading the OS after correcting any Hard Drive/RAID issues, and have already reimaged the chassis, contact Technical Support.



Chassis and Component RMAs:

If you are unable to resolve the issues with the chassis after following these steps and contacting Technical Support, Technical Support may ask you to RMA the chassis or one of its components. Please be aware that only clients that have purchased Premier support may enter an Advance Replacement RMA. Non-Premier clients may enter a Depot Repair RMA or pay a $1,000 expedite fee for Advance Replacement. Many clients may be able to upgrade their support to Premier. Contact the Market Development Support Team at 888-258-4436, Option 2, if you have questions on Service Plans or need help upgrading a client to Premier Service.


1. Contact Technical Support before submitting a chassis or system component RMA.


2. Make sure you have followed the steps in this guide to try to resolve the issue before requesting an RMA. Technical Support may ask you to perform additional testing depending on the nature of the failure.


3. If the problem can be isolated to a specific component, the RMA must be for that component. We will only RMA an entire chassis as a last resort if the source of the issue cannot be positively identified.


4. Fully document your RMA request! We will need the Chassis Serial Number, a detailed description of the failure, how to reproduce the failure and the steps taken to try and resolve the issue. Incomplete or inaccurate RMA descriptions can result in delays in processing the RMA or NTF (No Trouble Found) fees if we are unable to reproduce the problem. If Technical Support determines that all necessary troubleshooting steps have not been tried, additional troubleshooting/testing may be required before processing your RMA.
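
The documentation requirements in step 4 can be treated as a checklist to run through before submitting. A minimal Python sketch follows. The RmaRequest class, its field names and the sample values are hypothetical, chosen only to mirror the four items listed in step 4; they do not correspond to any AltiGen form or system.

```python
from dataclasses import dataclass, fields


@dataclass
class RmaRequest:
    """Hypothetical record of the four items step 4 asks for."""
    chassis_serial_number: str = ""
    failure_description: str = ""
    steps_to_reproduce: str = ""
    troubleshooting_steps_taken: str = ""


def missing_items(request):
    """Return the names of required items that are blank or whitespace only."""
    return [f.name for f in fields(request) if not getattr(request, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    draft = RmaRequest(
        chassis_serial_number="EXAMPLE-SERIAL",
        failure_description="Chassis hard locks during Board Test",
    )
    for name in missing_items(draft):
        print(f"Missing before submitting RMA: {name}")
```

An empty list from missing_items only shows that every item was filled in; it says nothing about whether the description is detailed enough for Technical Support to reproduce the failure.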


Please refer to Field Alert 236 for detailed information on the new Service Plans, Support Process Changes and RMA details.



Last Updated
5th of April, 2010
