Robert’s Trouble Shooting Guide

  The following is offered for your assistance, without prejudice.


INDEX:
	Safety
	Overall
	Gather the History
	Check It Out
	Closing in on the Problem
	Isolating the Problem
	Effecting a Solution
	Before you Consider the Job Done
	When the Problem Resists Solution

Safety:

  By the general nature of this article, safety cannot be adequately addressed. Work carefully, and with due care and attention to what you are doing and to what is going on around you. Apply common sense in thinking and action, for your personal safety, for the safety of others, and for the preservation of equipment, tools and environment.

  A system, especially in failure mode, can be dangerous. Although dangers are always present, one existing failure could indicate, or generate, a weakness in the system. An existing failure could also allow the compounding of a second failure, whether that second failure was dependent on the first, or independent of it. Such additional failures could precipitate further damage to the equipment. Also possible is injury to personnel, and damage to other equipment in the vicinity. These consequences must be considered before any further running of the equipment, including test running, is performed.

  With guards removed, as may be necessary to troubleshoot, the protection the guards provided must be replaced by additional diligence, attention and care.


Overall:

  Troubleshooting may turn out to be an iterative process. Always be prepared to go back to a previous step and pursue new ideas. When collecting data, keep this in mind, and collect enough that ‘going back’ means returning to data you already have, not collecting more. This will save time – your time and that of others.

  If the ‘data’ you collect is not internally consistent, you may have collected conclusions rather than data, your ‘data’ may be wrong, or both. Try to resolve any discrepancies before continuing to another step, or make resolving them a goal of the next step.


Gather the History:

  Before fixing something, you should know what is wrong. Knowing what has happened can help you look for the problem.


Check It Out:

  Verify that the problem, as defined, exists.

  Obtain first-hand experience with the problem, and reduce it to a simple form. This can help you make sense of what you were told.

  Consider these questions, as you hypothesize where the problem may be:

  There may be a quick path to rectification, as suggested in the answer to the following:

  If there is need to look further, consider these:


Closing in on the Problem:

  From what you have been told, what is the likely nature of the problem? Try to define two points: “Everything up to here is working”; and “Beyond here, things would function if the inputs were right.” The input of simulated signals may help in defining this latter point.

  Select a spot between the “OK-to-here” point and the “OK-from-here” point, and determine, if possible, whether the process is functioning up to and/or beyond this new point. This should allow one of the earlier points (“OK-to-here” or “OK-from-here”) to be moved closer to the other. The spot could be chosen for the convenience with which the test can be done, for the convenience of repairing or replacing a component, or on a suspicion as to where the problem is. Failing any other logic, the new point should be roughly ‘half way’ between the other two. (‘Half way’ could be based on the count of components modifying or otherwise changing the signal, on the distance the signal travels, or on something else.)

  If the suspect signal splits, and the failure is seen only at the end of one branch, then the system is probably OK up to the split. If the system is not working on any of the branches beyond the split, there are three possibilities. A single failure upstream of the split could have happened, and may be considered likely. But multiple simultaneous failures downstream of the split could have been caused by the likes of a voltage spike or pressure surge. And, in a bad-case situation, a surge may have damaged both the signal source and equipment downstream of the split. Consider the possibilities, their likelihood, and the testing options before choosing where and how to look next, and keep in mind that one may be working on a hunch, at best. The option not chosen may be hiding the solution.

  Reiterate the foregoing while practical.
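
  The narrowing procedure described above, when no other logic suggests a test point, amounts to a bisection search. The following is a minimal sketch only, under stated assumptions: the stage names and the `ok_up_to` test are hypothetical stand-ins for whatever measurement you can actually make, the signal path is treated as a single unbranched chain, and a single fault is assumed.

```python
from typing import Callable, Sequence, Tuple


def narrow_fault(stages: Sequence[str],
                 ok_up_to: Callable[[int], bool]) -> Tuple[int, int]:
    """Bracket a single fault between two adjacent stages.

    stages   -- ordered test points along the signal path (hypothetical names)
    ok_up_to -- a test you supply: True if everything is working up to and
                including stage i
    Assumes stage 0 is the "OK-to-here" point and the last stage is the
    "OK-from-here" point, i.e. the fault lies somewhere between them.
    """
    good, bad = 0, len(stages) - 1
    while bad - good > 1:
        mid = (good + bad) // 2      # roughly 'half way', by component count
        if ok_up_to(mid):
            good = mid               # move "OK-to-here" downstream
        else:
            bad = mid                # move "OK-from-here" upstream
    return good, bad


# Illustrative chain in which the signal is lost at the filter (index 3).
stages = ["sensor", "cable", "amplifier", "filter", "controller", "actuator"]
good, bad = narrow_fault(stages, lambda i: i < 3)
print(f"fault lies between {stages[good]} and {stages[bad]}")
```

  In practice the mid-point would often be overridden by convenience or suspicion, as noted above; the sketch covers only the 'failing any other logic' case.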

  Sometimes finding the problem can be done by actually implicating a specific component or module. At other times – especially when dealing with a system in development (a system that has never worked) – there can be a subtle difference between ‘finding and resolving a problem’, and ‘resolving a problem, thereby identifying it so it can be avoided in the future’. In this case, if applying a solution to a hypothesized problem resolves it, then one has likely identified the problem. Use this approach with prudence when working on a system which has previously been fully and properly commissioned and functioning.


Isolating the Problem:

  Whether to perform a simple test on a component that is likely to be good, or a complicated diagnostic test on a component that is likely to have failed, is a judgement decision. How simple versus complicated? How unlikely versus likely? How strongly do the indicators point to the component involved in the complicated test? What else would you learn in doing the simple test?
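
  One rough way to weigh ‘how likely’ against ‘how complicated’ is to rank candidate tests by likelihood per unit of effort. The sketch below is an illustration only, not a rule from this guide: the test names, probabilities and times are invented, and the ranking ignores the last question above (what else a test would teach you).

```python
def rank_tests(tests):
    """Sort candidate tests by (chance the fault is here) / (cost of the test).

    tests -- list of (name, probability_fault_here, cost_in_minutes);
             all three values are your own estimates.
    """
    return sorted(tests, key=lambda t: t[1] / t[2], reverse=True)


# Hypothetical estimates: a quick test on a part that is probably good,
# versus a long diagnostic on a part that has probably failed.
candidates = [
    ("bench-test controller board", 0.60, 90),
    ("check fuse", 0.10, 2),
]
for name, probability, minutes in rank_tests(candidates):
    print(f"{name}: {probability:.0%} likely, about {minutes} min")
```

  With these made-up numbers the two-minute fuse check ranks first, even though the controller board is the more likely culprit.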

  Can you swap components in the suspect area with believed-good components from a functioning system, or from available spares? (Note that there may be special settings or adjustments which will have to be implemented on a replacement part – and undone, if the part is to be returned to its previous location, or at a minimum documented if the part is to be returned to storage.)

  Does the replacement component make the system work, or does it leave the system in the same failure mode? If the same failure mode is exhibited, can you verify that the replacement part is not defective?

  Does the suspect component work where the replacement was taken from, or does it replicate the failure mode in that system?

  When components are returned to their original locations, does the problem move back? (This is known as A-B-A testing, and can help identify problems that are simply from loose connections and/or improper adjustment.)

  If swapping components introduces a new failure mode in either system: It is time to slow down and think!

  If a previously working system exhibits a failure mode when components are swapped back into their original positions: It is time to slow way down and have a long think!
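
  The swap questions above can be read as a small decision table. The sketch below is one hedged reading of them, not a definitive procedure: the function name, its three yes/no inputs and the wording of the verdicts are all assumptions made for illustration.

```python
def interpret_swap(fixed_by_replacement: bool,
                   suspect_fails_in_donor: bool,
                   fault_returns_on_swap_back: bool) -> str:
    """Suggest a reading of an A-B-A swap test from three observations."""
    if fixed_by_replacement and suspect_fails_in_donor and fault_returns_on_swap_back:
        # The fault follows the part in every position.
        return "suspect component implicated"
    if fixed_by_replacement and not suspect_fails_in_donor and not fault_returns_on_swap_back:
        # The fault vanished and stayed away: handling the part may have
        # reseated a loose connection or disturbed an adjustment.
        return "suspect a connection or adjustment"
    if not fixed_by_replacement and not suspect_fails_in_donor:
        # The fault stayed with the system, not the part.
        return "look elsewhere, after verifying the replacement is good"
    # Anything else is not self-consistent under a single-fault assumption.
    return "inconsistent: slow down and think"


print(interpret_swap(True, True, True))
print(interpret_swap(True, False, False))
print(interpret_swap(False, True, True))
```

  Any combination that falls through to the last line is a prompt to recheck the observations themselves before drawing a conclusion.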


Effecting a Solution:

  Resist the temptation to make adjustments before knowing that an adjustment is needed. This is especially true where the problem could be as simple as an electronic component needing a reset.

  Resist the temptation to make adjustments before knowing where an adjustment is needed. If there is an alignment problem between two components, determine which should be moved to minimize the necessity of any subsequent realignments between the balance of the system and the adjusted component.

  If you understand what caused the problem, you can probably understand what the problem is, and how it should be addressed. If you figure there is no way the failure could have happened, you may not truly know what has failed.

  Is the economic/expedient route to repair, to replace, or to troubleshoot further? Should you instead make a ‘work-around’?

  Even if you are sure of what adjustments are to be made, consider generating enough information to allow an “undo.” This information can also be used to judge the magnitude of the change being made, which is perhaps useful in other ways.
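
  A simple before-and-after log is one way to keep enough information for an “undo.” The sketch below is illustrative only: the setting names, values and units are invented, and the `delta` field assumes numeric settings.

```python
from datetime import datetime, timezone


def record_change(log, setting, before, after, unit=""):
    """Append one adjustment to the log and return the entry."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "setting": setting,
        "before": before,
        "after": after,
        "unit": unit,
        "delta": after - before,   # magnitude of the change being made
    }
    log.append(entry)
    return entry


def undo_plan(log):
    """List (setting, value) pairs that restore the original state,
    most recent change first."""
    return [(e["setting"], e["before"]) for e in reversed(log)]


# Hypothetical adjustments made during a repair.
log = []
record_change(log, "belt_tension", 120.0, 135.0, "N")
record_change(log, "sensor_offset", 0.5, 0.2, "mm")
for setting, value in undo_plan(log):
    print(f"restore {setting} to {value}")
```

  Undoing in reverse order matters when the same setting was adjusted more than once, since the last value restored is then the original one.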

  Ensure the repairs make the system functional.


Before you Consider the Job Done:

  Remove tools, undo special settings, close the machine, replace guards, return borrowed tools, etc.

  If you have cured the symptom but not the cause, what plans should be made to find and cure the problem? If the solution was a ‘work-around’, is it to be permanent?

  If the problem was the result of abuse, is re-training (of operator, maintenance, or other crews) required? Should the manual be revised?

  If the solution was a ‘work-around’, is re-training required?

  Re-ensure, after replacing all guards and covers, that all operations are functional. Report to the customer that the equipment is available for use. (At the risk of embarrassing yourself, the steps of this paragraph may, in some situations, be interchanged.)

  If troubleshooting turned into a ghost-hunting exercise, and no significant repairs were made, then make enough notes so that you, or someone else, can pick up from where you left off when [‘when’ not ‘if’] the ghost re-appears.

  Document and share your experience, so others may know what areas are prone to failure.

  Especially if you implemented a work-around: Leave documentation with the system so others may operate and maintain what is now non-standard equipment; Share with colleagues, and file documentation, so others may re-use the work-around you developed.

  Consider the utility of further examinations, which would come from a “root cause failure analysis.”


When the Problem Resists Solution:

  On a rare occasion, you may encounter a problem which seems to resist sensible logic; when the simplest solution is not a solution. One or more of the following may help:


Copyright

This material is Copyright (© 1998, through 2016), Robert W. C. Stevens. Reproduction, with this copyright notice intact, is permitted – but sharing the URL would save a tree, and probably make more sense.


 The latest version of this page may be accessed at
http://www.wendygamble.com/RwcS/Guides/TroubleShooting.html
 Pleased To Be Of Service, RwcS