
NI-Week Session: Advanced Error Handling in LabVIEW


crelf


Brian Gapske (V I Engineering, Inc Test Software and Integration Group) and I will be co-presenting a session called "Advanced Error Handling in LabVIEW" at NI-Week 2009. We've got some interesting stuff to show you, but I'd like to open the floor to see if there's anything anyone would like to hear specifically. Please reply to this thread and let's get some brainstorming going. Of course, I can't guarantee that we'll be able to answer all of your questions in the presentation (there's only so much time in a day :( ) but if you'd like us to skew it to something in particular, this is the place to discuss your ideas.

Link to comment

I would like to see error handling for multiloop architectures with a dedicated error handling loop. The overall idea is presented in the 'LabVIEW Style book' by P. Blume, but there are no details on the error handling specifically.

One really important thing is how to report errors to users. Using the normal error dialog works nicely for me, but users click it away most of the time and then give me a call: 'your software is not working'. This is more about psychology than SE.

Felix

Link to comment

QUOTE (Black Pearl @ May 20 2009, 06:57 PM)

One really important thing is how to report errors to users. Using the normal error dialog works nicely for me, but users click it away most of the time and then give me a call: 'your software is not working'. This is more about psychology than SE.

I agree that usability is key. I'd sure like to know who at NI thought it was a grand idea to share error 7 between File I/O and GPIB. "Why does my system have a GPIB error? ... Because your configuration is pointing to a file that doesn't exist." I have NEVER in 16 years had a user tell me that there was a File I/O error when an error 7 occurred.

I used to be bad at error handling. (I hope Brian isn't using me as the bad example.) I now use a centralized error handler with an error passing vi that makes it simple. With a central error handler, I can log errors and command things to shutdown if severe errors occur.

Link to comment

QUOTE (gleichman @ May 21 2009, 03:27 AM)

I agree that usability is key. I'd sure like to know who at NI thought it was a grand idea to share error 7 between File I/O and GPIB. "Why does my system have a GPIB error? ... Because your configuration is pointing to a file that doesn't exist." I have NEVER in 16 years had a user tell me that there was a File I/O error when an error 7 occurred.

This is a historical flaw; I don't remember why it happened. Maybe they had no central repository for error codes back then. But yes, every user reports a GPIB error and never, ever a file error.

Felix

Link to comment

QUOTE (Black Pearl @ May 20 2009, 08:38 PM)

This is a historical flaw; I don't remember why it happened. Maybe they had no central repository for error codes back then. But yes, every user reports a GPIB error and never, ever a file error.

Weird... error code 7 is one of the ones I deal with on a daily basis (just today, in fact, I fixed a bug involving a function returning it and the next function in line not handling it correctly) and I've never had the GPIB error... this could be because, except for a week of training 9 years ago, I've never had reason to use GPIB. :-)

The common NI Error Code Database came into being in LV 6.1. Prior to that, every group had its own error codes, and of course, every group started counting at 1... except for one group (I forget which) that actually used error code zero as an error. Ugh. Nowadays we have reserved error code ranges for different products, and we offset the errors coming from the operating system into their own region. There are even two ranges reserved for our customers. :-)

Link to comment

QUOTE (Aristos Queue @ May 20 2009, 09:40 PM)

There are even two ranges reserved for our customers. :-)

With a measly 6000 error codes. We've used a good portion of them, and undoubtedly they would conflict with other users' codes if we were ever to share code. I don't suppose we could have a few more of the 4 billion codes available? Whom do we have to waterboard to make this happen?

Link to comment

I would like your thoughts on creating the custom error text file. NI seems to promote that in the Intermediate classes. When I create custom errors I just use a state machine in my error handling routine. I tried using the file but I didn't like switching back and forth; I prefer to see them in my block diagram. I would be curious what you think about using a text file to create your custom errors.

Link to comment

QUOTE (ASTDan @ May 21 2009, 02:29 PM)

I would like your thoughts on creating the custom error text file. NI seems to promote that in the Intermediate classes. When I create custom errors I just use a state machine in my error handling routine. I tried using the file but I didn't like switching back and forth; I prefer to see them in my block diagram. I would be curious what you think about using a text file to create your custom errors.

I use the <err> and <append> tags. This also avoids collisions with other errors I might define with the same code. (I have made two VIs: Overwrite error and Add Details to Error).

Moving the custom error messages around from project to project was too much of a hassle.

Felix
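To make the tag idea concrete, here is a minimal Python sketch (LabVIEW itself is graphical, so this only models the string handling). It assumes the tag sits in the source string and that everything after it is the custom text; `split_source` and `explain` are hypothetical helper names, and the real parsing rules of LabVIEW's error handler may differ.

```python
import re

def split_source(source: str):
    """Split a source string into (call chain, tag, custom text).

    Assumes at most one <err> or <append> tag (case-insensitive) and that
    everything after the tag is the custom text. tag is None if absent.
    """
    match = re.search(r"<(err|append)>", source, flags=re.IGNORECASE)
    if match is None:
        return source, None, ""
    where = source[:match.start()].rstrip()
    return where, match.group(1).lower(), source[match.end():].strip()

def explain(code: int, source: str, default_text: str) -> str:
    """Build the message a handler might display for this error."""
    where, tag, custom = split_source(source)
    if tag == "err":          # custom text replaces the default explanation
        text = custom
    elif tag == "append":     # custom text is added after the default explanation
        text = f"{default_text} {custom}"
    else:
        text = default_text
    return f"Error {code} at {where}: {text}"
```

Because the custom text travels with the error instead of living in a separate error file, nothing has to be copied between projects, which is the hassle mentioned above.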

Link to comment

Quick note to everyone: great ideas - and even better to hear that some of you are implementing your own layers to better handle errors in LabVIEW. I encourage you all to upload any code that you think appropriate to better facilitate the discussion. So, upload your ideas: we might even feature them in our presentation!

Link to comment

I'll try and add my :2cents: to show the concept of how I handle error logging and visualisation in my applications.

I basically have the whole error/message logging encapsulated in a by-ref class. This class:

  • handles logging of the errors/messages to disk
  • rotates logs every N days
  • has an active thread whose UI can optionally be shown as a (floating) window to view the real-time log
  • publishes log events to interested subscribers through dynamic events

Each parallel loop (including dynamically spawned processes in active objects etc.) takes a reference to the logger object. To make sure all errors are caught, all execution chains should end with the AddError method.

post-906-1242946857.png?width=400

In the example here, on error, the default log window would be shown (which shows all messages since application start). Whether or not you want that depends on the type of application and where you are in the development cycle. I usually use the caught event to determine what error occurred and decide what to do with it (e.g. ignore it, show it in a nice UI, quit the app, etc.)
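The shape of such a logger can be sketched in Python. This is not the poster's class: `ErrorLogger`, `add_error`, `subscribe` and `log_name` are hypothetical names, entries are kept in memory rather than written to disk, and the active thread with its log window is left out.

```python
import datetime
from typing import Callable, List, NamedTuple

class LogEntry(NamedTuple):
    timestamp: datetime.datetime
    code: int
    source: str
    message: str

class ErrorLogger:
    """Shared logger: records entries and notifies interested subscribers."""

    def __init__(self, rotate_days: int = 7):
        self.rotate_days = rotate_days
        self.entries: List[LogEntry] = []
        self._subscribers: List[Callable[[LogEntry], None]] = []

    def subscribe(self, callback: Callable[[LogEntry], None]) -> None:
        """Register a callback, standing in for a dynamic event registration."""
        self._subscribers.append(callback)

    def log_name(self, when: datetime.date) -> str:
        """File name for the N-day rotation period that contains `when`."""
        period = when.toordinal() // self.rotate_days
        return f"errors_{period}.log"

    def add_error(self, code: int, source: str, message: str, now=None) -> LogEntry:
        """End of every execution chain: record the error and publish it."""
        now = now or datetime.datetime.now()
        entry = LogEntry(now, code, source, message)
        # A real logger would append this entry to log_name(now.date()) on disk.
        self.entries.append(entry)
        for callback in self._subscribers:
            callback(entry)
        return entry
```

A subscriber decides what to do with each entry (ignore it, show a UI, quit), which keeps that policy out of the logger itself.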

Link to comment

This is my take on error handling

post-8614-1242953725.jpg?width=400

I use an FGV to store a user event. On an error I generate a user event and pass the error to my event loop. In the dynamic error event case I have a state machine that can handle any error based on its error code, and then perform a custom action (e.g. shut down hardware, perform a certain action to correct the error, etc). I also log all errors generated to a text file.

I have found this works for me because the error dialogs are handled in the event loop and I can still call other sections of code while the error dialog is displayed.
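The pattern described here (post the error from anywhere, handle it in one event loop, pick the action by error code) can be sketched in Python as follows. A `queue.Queue` stands in for the user event held in the FGV; the codes and action names in `ACTIONS` are made up for illustration.

```python
import queue

# Stands in for the user event stored in the FGV.
error_events = queue.Queue()

def report_error(code: int, source: str) -> None:
    """Called from any loop: post the error instead of handling it locally."""
    error_events.put((code, source))

# Hypothetical per-code actions, playing the role of the state machine cases.
ACTIONS = {
    7: "prompt for config file",
    5001: "shut down hardware",
}

def handle_pending(log: list) -> list:
    """Event-loop side: drain the queue, log each error, choose an action."""
    taken = []
    while True:
        try:
            code, source = error_events.get_nowait()
        except queue.Empty:
            return taken
        log.append(f"{code}\t{source}")   # every error is logged
        taken.append(ACTIONS.get(code, "show error dialog"))
```

Since the reporting side only enqueues, the loop that raised the error never blocks on a dialog.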

I am very interested in what other people do to handle their errors.

Dan

P.S. I hope NI gives you the big room for this presentation and doesn't have it at 4:00 on Thurs.

Link to comment

This is how I handle errors in my error case.

QUOTE (ASTDan @ May 21 2009, 08:13 PM)

This is my take on error handling

post-8614-1242953725.jpg?width=400

I use an FGV to store a user event. On an error I generate a user event and pass the error to my event loop. In the dynamic error event case I have a state machine that can handle any error based on its error code, and then perform a custom action (e.g. shut down hardware, perform a certain action to correct the error, etc). I also log all errors generated to a text file.

I have found this works for me because the error dialogs are handled in the event loop and I can still call other sections of code while the error dialog is displayed.

I am very interested in what other people do to handle their errors.

Dan

P.S. I hope NI gives you the big room for this presentation and doesn't have it at 4:00 on Thurs.

Link to comment

QUOTE (jdunham @ May 20 2009, 11:53 PM)

With a measly 6000 error codes. We've used a good portion of them, and undoubtedly they would conflict with other users' codes if we were ever to share code. I don't suppose we could have a few more of the 4 billion codes available? Whom do we have to waterboard to make this happen?

No need for waterboarding. No one had ever asked for a larger range.

As of this morning, the range 500000 to 599999 is now reserved for users.
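A quick range check can catch custom codes that stray outside the user ranges. The Python sketch below assumes the two older ranges are -8999 to -8000 and 5000 to 9999 (which adds up to the 6000 codes mentioned above; verify against the LabVIEW documentation), plus the range announced in this post. `is_user_code` and `old_range_size` are hypothetical helpers.

```python
# User-reserved error code ranges (inclusive). The first two are assumed to
# be the older ranges; the third is the range announced in this thread.
USER_RANGES = [(-8999, -8000), (5000, 9999), (500000, 599999)]

def is_user_code(code: int) -> bool:
    """Return True if code falls inside any user-reserved range."""
    return any(lo <= code <= hi for lo, hi in USER_RANGES)

def old_range_size() -> int:
    """Count the codes available in the two older ranges."""
    return sum(hi - lo + 1 for lo, hi in USER_RANGES[:2])
```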

Link to comment

QUOTE (Aristos Queue @ May 22 2009, 09:12 AM)

As of this morning, the range 500000 to 599999 is now reserved for users.

:thumbup:

I think everyone should move over to the new range and use it exclusively. I hereby claim the old ranges for VIE!

I am, of course, joking.

Link to comment

QUOTE (crelf @ May 22 2009, 02:57 PM)

I think everyone should move over to the new range and use it exclusively. I hereby claim the old ranges for VIE!

Ah, but you hit on a very real problem: People cannot move to a new range. If we could do that, we would make the GPIB error codes no longer overlap with the LV error codes. But there are a lot of VIs in the world that check for specific error codes as returned values. Changing the error code for a given error can wreak havoc.

Once an error code is allocated, it stays allocated, even if the product is end-of-life, because someone might still be using that product out in the world.

Link to comment

QUOTE (Aristos Queue @ May 22 2009, 04:45 PM)

Ah, but you hit on a very real problem: People cannot move to a new range. If we could do that, we would make the GPIB error codes no longer overlap with the LV error codes. But there are a lot of VIs in the world that check for specific error codes as returned values. Changing the error code for a given error can wreak havoc.

Once an error code is allocated, it stays allocated, even if the product is end-of-life, because someone might still be using that product out in the world.

Oh totally! It would be crazy to decommission error codes.

Link to comment

QUOTE (Aristos Queue @ May 22 2009, 06:12 AM)

No need for waterboarding. No one had ever asked for a larger range.

As of this morning, the range 500000 to 599999 is now reserved for users.

Thanks! :worship:

At the risk of pushing my luck, how about a range specifically reserved for OpenG? :ninja:

Link to comment

QUOTE (Black Pearl @ May 25 2009, 08:40 AM)

As far as I can tell from the other posts, we all use events to pass the error data. Are there other ways of doing it?

Felix

That's probably because everyone here uses the same topology (centralised error handling). I use local error handling, since a lot of different stuff has to happen if there is an error (not just telling the user), and that would make a centralised error handler a bit of a pig. The only common denominator is that I have to put a dialogue on screen and halt other processes' execution ("Launch Error Dialogue.vi") while the operator decides what to do. In the meantime the process that threw the error tries to recover to a safe/stable state. "Launch Error Dialogue.vi" loads and runs (yup, you guessed it) "Error Dialogue.vi", which logs to a file and filters the error to provide different options to the user (if required). It can be called from anywhere in the code and can remain on-screen, not show at all (i.e. just log) or time out after n seconds (depending on the error level). It also does other things like set off a siren, change traffic light indicators etc. Nice and simple: just plonk it in the error case of your state machine.

One thing that hasn't been discussed so far is error levels.

In my system(s), I have severity/priority levels for errors (Information, System, Critical, Recoverable, Process and Maintenance). What do other people do to prioritise errors (if anything)?
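One way to tie levels to dialog behaviour is a lookup table, sketched here in Python. The level names come from the post; the policy assigned to each level (whether a dialog shows, its timeout, whether a siren sounds) is purely hypothetical, as are the names `DialogPolicy` and `policy_for`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Level(Enum):
    INFORMATION = "Information"
    SYSTEM = "System"
    CRITICAL = "Critical"
    RECOVERABLE = "Recoverable"
    PROCESS = "Process"
    MAINTENANCE = "Maintenance"

@dataclass(frozen=True)
class DialogPolicy:
    show_dialog: bool            # False means log only
    timeout_s: Optional[float]   # None means stay on screen until dismissed
    siren: bool

# Hypothetical mapping; every error is logged regardless of its policy.
POLICIES = {
    Level.INFORMATION: DialogPolicy(False, None, False),
    Level.SYSTEM:      DialogPolicy(True, None, False),
    Level.CRITICAL:    DialogPolicy(True, None, True),
    Level.RECOVERABLE: DialogPolicy(True, 10.0, False),
    Level.PROCESS:     DialogPolicy(True, 30.0, False),
    Level.MAINTENANCE: DialogPolicy(False, None, False),
}

def policy_for(level: Level) -> DialogPolicy:
    """Look up how the error dialog should behave for this level."""
    return POLICIES[level]
```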

Link to comment

QUOTE (Aristos Queue @ May 22 2009, 10:45 PM)

Ah, but you hit on a very real problem: People cannot move to a new range. If we could do that, we would make the GPIB error codes no longer overlap with the LV error codes. But there's a lot of VIs in the world that check for specific error codes as returned values. Changing the error code for a given error can wreck havoc.

Once an error code is allocated, it stays allocated, even if the product is end-of-life, because someone might still be using that product out in the world.

Would it not be possible to append a totally new errornum series to the end of the source field, and introduce a new error cluster definition side by side with the old one, along with a new error table?

Or even better, introduce a new error wire with "fields" status, Errcode, code, (time[optional]), source and Errsource, and have conversions between the new and old error handling. Then we could have better error messages in the new series while still keeping the old one in place.
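The second suggestion can be modelled as a record plus two conversions. This Python sketch reads `code` and `source` as the legacy fields and `Errcode` and `Errsource` as their new-series counterparts, which is an interpretation of the post rather than something it spells out; `ExtendedError` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The classic cluster modelled as (status, code, source).
LegacyError = Tuple[bool, int, str]

@dataclass
class ExtendedError:
    status: bool
    code: int                      # legacy code, kept unchanged
    source: str                    # legacy source string, kept unchanged
    err_code: int = 0              # code from a hypothetical new series
    err_source: str = ""           # origin named in the new scheme
    time: Optional[float] = None   # optional timestamp in seconds

    def to_legacy(self) -> LegacyError:
        """Drop the new fields so existing code sees the usual cluster."""
        return (self.status, self.code, self.source)

    @classmethod
    def from_legacy(cls, legacy: LegacyError, err_code: int = 0,
                    err_source: str = "") -> "ExtendedError":
        """Wrap a classic cluster, optionally adding new-series details."""
        status, code, source = legacy
        return cls(status, code, source, err_code, err_source)
```

Because the legacy fields are never rewritten, a round trip through `from_legacy` and `to_legacy` returns the original cluster.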

Link to comment

QUOTE (Anders Björk @ May 25 2009, 11:51 AM)

Would it not be possible to append a totally new errornum series to the end of the source field, and introduce a new error cluster definition side by side with the old one, along with a new error table?

Or even better, introduce a new error wire with "fields" status, Errcode, code, (time[optional]), source and Errsource, and have conversions between the new and old error handling. Then we could have better error messages in the new series while still keeping the old one in place.

Great ideas Anders - That's very close to what we'll be presenting at NI-Week.

Link to comment

QUOTE (crelf @ May 25 2009, 06:53 PM)

Great ideas Anders - That's very close to what we'll be presenting at NI-Week.

Wouldn't an OOP approach be best for that? The base class contains only the traditional error cluster, so the object can be cast to that base class to be compatible with the standard error.

Felix
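In textual OOP terms the idea looks roughly like the Python sketch below: code written against the base class keeps working when handed a richer child object. `Error`, `CategorizedError` and `legacy_report` are hypothetical names, and the extra fields on the child class are only examples.

```python
class Error:
    """Base class: holds only what the traditional error cluster holds."""

    def __init__(self, status: bool, code: int, source: str):
        self.status = status
        self.code = code
        self.source = source

    def describe(self) -> str:
        return f"Error {self.code} at {self.source}"

class CategorizedError(Error):
    """Hypothetical child class that adds a category and a timestamp."""

    def __init__(self, status: bool, code: int, source: str,
                 category: str, time: float):
        super().__init__(status, code, source)
        self.category = category
        self.time = time

    def describe(self) -> str:
        return f"[{self.category}] {super().describe()}"

def legacy_report(err: Error) -> str:
    """Code written against the base class; it reads only the base fields."""
    return f"{err.code}: {err.source}" if err.status else "no error"
```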

Link to comment

This is a topic I've been experimenting with for a while. Basically, I've found it's insufficient to have just a central error handler or just a local error handler. I think you need to have strategies for both. If you do just central error handling, it becomes difficult to do things like retry an operation, because it requires a lot of code for the central error handler to communicate with the specific section of code that threw the error. You also have to deal with the behavior of other code as you pass the error around, which is difficult, because different VIs and APIs treat incoming errors in different ways (which means that for any sufficiently complex section of code, the behavior on an incoming error is essentially undefined). If you do just local error handling (I use the term "specific"), you end up calling dialogs or accessing files from loops you probably shouldn't be accessing them from (I do a lot of RT programming).

My strategy has been to create a specific error handler which you call after each functional segment of code (which can be a loop iteration, subVI, or something more granular), and which can take actions based on specific error codes that happen in that segment (much like exception handling in other languages like Java). The specific error handler can take actions like retrying code, ignoring the error, converting it to a warning, or categorizing it. Ideally, the concept is that at the end of any functional segment of code, the errors from that segment have been handled if possible and categorized if not. You can then avoid passing them to other segments of code, which gets around the problem with undefined behavior that I mentioned before. For usability's sake, my specific error handler is an express VI that lets you configure a list of error codes or ranges and actions for each. I categorize errors by using the <append> tag in the source field, which keeps them fully compatible with all of the normal error handling functions (one drawback is that this requires string manipulation, which is kind of a no-no in time-critical RT code; I haven't yet come up with an alternative I'm comfortable with, though).

A categorized error feeds into the central error handler (you can pass them with queues, events, FGs, or whatever you like), which can take actions based on categories of error. Each error category can take multiple actions; examples of actions are notifying the user, logging, placing outputs in a safe state, and system shutdown/reboot. Of course, there is always the case of an error code you've never seen before, which I usually treat as a critical error that puts the system in a safe state, logs, and notifies the user.
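The two-tier strategy can be sketched in Python. This is not the express VI described above: the rule table, the codes in it, and the names `run_segment` and `central_handler` are hypothetical, conversion to a warning is omitted, and categories are passed as plain strings rather than encoded in the source field with an <append> tag.

```python
from typing import Callable, List, Optional, Tuple

# (first code, last code, action, category); all values are made up.
Rule = Tuple[int, int, str, Optional[str]]
RULES: List[Rule] = [
    (56, 56, "retry", None),              # a code worth retrying
    (4, 4, "ignore", None),               # a code treated as normal
    (5000, 5999, "categorize", "minor"),
]

CATEGORY_ACTIONS = {
    "minor": ["log"],
    "critical": ["safe state", "log", "notify user"],
}

def find_rule(code: int) -> Optional[Rule]:
    """Return the first rule whose code range contains `code`."""
    for rule in RULES:
        if rule[0] <= code <= rule[1]:
            return rule
    return None

def run_segment(segment: Callable[[], int], max_retries: int = 2) -> Optional[str]:
    """Specific handler: run one segment (returns 0 on success).

    Returns None if the error was handled here, else a category
    for the central handler.
    """
    attempts = 0
    while True:
        code = segment()
        if code == 0:
            return None
        rule = find_rule(code)
        if rule is None:
            return "critical"             # never-seen codes count as critical
        _, _, action, category = rule
        if action == "ignore":
            return None
        if action == "retry" and attempts < max_retries:
            attempts += 1
            continue
        return category or "critical"     # categorized, or retries exhausted

def central_handler(category: str) -> List[str]:
    """Central handler: choose actions per category, not per error code."""
    return CATEGORY_ACTIONS.get(category, CATEGORY_ACTIONS["critical"])
```

Only categories cross the boundary between segments, so the central handler never needs to know which segment raised an error or how to retry it.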

At some point I'll get the kinks ironed out of my code to the point where I feel comfortable posting it (at that point it will probably show up as a reference design on ni.com), but I think the concepts are solid no matter what implementation you use.

Regards,

Ryan K.

Link to comment

QUOTE (Anders Björk @ May 25 2009, 10:51 AM)

Not generally. You have errors that arise from code where the only returned value is a number. No strings, no clusters, no booleans, just a number. They may be C built DLLs like the GPIB drivers. They may be DLLs built with LabVIEW VIs that are then returning just the error code. They could be log files where people just wrote down the number and are now reading it back in. There are tons of places where the only bit of data that is preserved is the integer. Because of the way that error codes get used by many users -- both internal and external to NI -- there's no way to have any migration path. Back in 2001, I spent a year working with CVI and TestStand and driver groups to find a migration path, and we ultimately determined that there couldn't be one, which meant that we needed to strengthen the protections of the NI Error Code Database to make sure that overlaps never occurred. That's when we started reserving error code ranges, so that two products simultaneously in development wouldn't accidentally grab the same number, or something like that.

QUOTE (Black Pearl @ May 26 2009, 02:17 PM)

Wouldn't an OOP approach be best for that? The base class contains only the traditional error cluster, so the object can be cast to that base class to be compatible with the standard error.

And something very much like that is what I prototyped and posted to LAVA last year.

Link to comment

QUOTE (Black Pearl @ May 26 2009, 03:17 PM)

QUOTE (Aristos Queue @ May 26 2009, 05:47 PM)

And something very much like that is what I prototyped and posted to LAVA last year.

One of the questions that has existed since man first used a LabVIEW error cluster is how to make it scalable so that more than one error can be carried simultaneously. The next question was where to put all this data. I've seen many different implementations: some concatenated their error structure to the "source" text field in the standard error cluster, some implemented a completely new system outside of the traditional error cluster (with appropriate converters to and from each system). The question is: which one is best? The former breaks less, requires less retrofitting of existing code, and can benefit from some of the existing error handling VIs that ship with LabVIEW, but it sometimes breaks when VIs misbehave and wipe the error cluster clean (yes, some primitives have been known to do it under certain circumstances). The latter is tempting because we can start from scratch and design whatever we want, but integrating it into existing components that, by default, use the existing format is challenging. Thoughts?

Link to comment