
Boxcar (running) average for 1D array



LabVIEW 8.01, Full Development, no added toolkits

I need a simple, fast, adjustable method of filtering out intermittent, short peaks (noise) in incoming data. The method we have come up with is to perform a Boxcar (or moving) average on the data. To explain, say you have an array of 100 elements. Take the first 10 elements and average them; this is your new first element. Now take elements 2 through 11 and average them; this is your new second element, and so on. You can do this with an incoming data stream; you just lose the first 9 points.
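In text form, the windowed average described above looks roughly like this (a minimal Python sketch of the algorithm, for illustration only; the actual routine is a LabVIEW VI):

```python
def boxcar(data, width=10):
    """Moving (boxcar) average: output element i is the mean of
    data[i : i + width], so the result is width - 1 elements shorter."""
    out = []
    window_sum = sum(data[:width])            # sum of the first window
    out.append(window_sum / width)
    for i in range(width, len(data)):
        # Slide the window: add the newest point, drop the oldest.
        window_sum += data[i] - data[i - width]
        out.append(window_sum / width)
    return out
```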

I have what I think is a very efficient, fast method for doing this. If anyone can come up with a better way, I would appreciate knowing about it. Also, I would like to know how fast this will execute. Of course, this will depend on how many elements are 'boxcarred' and the speed of the processor. I have included a thrown-together test I created that I think tells me it will execute at about 6 nanoseconds per element, with a per-run overhead of about 60 nanoseconds.

How did I get this? In the test routine I input a file of a little over 12K elements (attached). Before and after each run of the boxcar I capture the Tick Count, subtract the two, and output the values. Looking at the graph you see periodic peaks of 1 millisecond. I believe these occur where the tick happens to increment while the boxcar is running. So, if you count the number of runs between peaks and divide that into 1 millisecond, the result should be close to the run time of the boxcar (yes, I know it takes some time to get the second Tick Count).

Running this on a 1.69 GHz, Pentium M, Gateway laptop, I calculated (all times in seconds) 9.5e-7 for 1 element, 1.14e-6 for 10, 1.69e-6 for 100, 6.7e-6 for 1000, and 3.03e-5 for 5000.

Dividing the 5000-element time by 5000 gave me the 6 nanosecond result, and the 60 ns overhead is an EWAG (Educated Wild A** Guess) based on the 100- and 1000-element numbers.
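As a quick sanity check on that arithmetic (a Python sketch using only the numbers quoted above):

```python
# Measured boxcar run times from the post, in seconds, keyed by element count.
times = {1: 9.5e-7, 10: 1.14e-6, 100: 1.69e-6, 1000: 6.7e-6, 5000: 3.03e-5}

# Per-element cost from the largest run, as in the post:
print(times[5000] / 5000)                  # ~6.1e-9 s, i.e. about 6 ns/element

# The slope between the 100- and 1000-element runs gives a similar figure:
print((times[1000] - times[100]) / 900)    # ~5.6e-9 s/element
```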

Better/faster method? All opinions appreciated.

BTW. I give the Boxcar routine freely to the forum. Anyone may use it for any reason (though I wouldn't mind credit).

Sorry to be so long winded.

Roy


QUOTE(rkesmodel @ Mar 17 2007, 01:11 PM)

Better/faster method? All opinions appreciated.

You'll get less data movement if, instead of using Rotate 1D Array, you track which element needs to be replaced on each call. Start the index at zero, increment it on each call, and reset it to zero when it equals the size of your array. That way all you do is replace one value in the array instead of moving all of its elements. Attached is an example VI (saved in LV 8.0).
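In text form, the replace-instead-of-rotate idea looks roughly like this (a Python sketch of the pattern, not the attached VI; in LabVIEW the buffer, index, and running sum would live in shift registers):

```python
class BoxcarFilter:
    """Moving average over the last `width` points, using a circular buffer:
    each call replaces one element instead of rotating the whole array."""

    def __init__(self, width):
        self.buf = [0.0] * width   # circular buffer (a shift register in LabVIEW)
        self.idx = 0               # element to replace on the next call
        self.total = 0.0           # running sum of the buffer contents

    def add(self, x):
        # O(1) update: swap out the oldest sample and adjust the running sum.
        self.total += x - self.buf[self.idx]
        self.buf[self.idx] = x
        self.idx += 1
        if self.idx == len(self.buf):   # wrap around, as described above
            self.idx = 0
        # Note: the first width-1 outputs still reflect the initial zeros.
        return self.total / len(self.buf)
```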


QUOTE(rkesmodel @ Mar 17 2007, 11:11 AM)

I need a simple, fast, adjustable method of filtering out intermittent, short peaks (noise) in incoming data. [...] Better/faster method? All opinions appreciated.

Couldn't you just use the Pt by Pt function Mean Pt by Pt.vi?

(under Signal Processing>Pt by Pt> Probability & Statistics>Mean).

It seems to be doing the same thing.

Neville.


QUOTE(Neville D @ Mar 19 2007, 02:37 PM)

Couldn't you just use the Pt by Pt function Mean Pt by Pt.vi?

(under Signal Processing>Pt by Pt> Probability & Statistics>Mean).

I didn't know about this VI -- I tend to be familiar with the language and not so much with the libraries of VIs written in the language. But I opened it up... seems that, yes, the functionality is the same as that requested, but it's a pretty inefficient implementation. I played hide-the-dots with that VI for a while (if you're unfamiliar with that game, it's where you use Tools>>Profile>>Show Buffer Allocations to optimize LV code; it's called hide-the-dots because you try to get down to as few dots on the terminals as possible), and I found a lot of places that could be improved.

If he's looking for speed, I think that the VI I posted will be substantially better. The Mean Pt by Pt.vi has more functionality than mine, but for what rkesmodel is attempting, I don't think he needs any further functionality.


QUOTE(Aristos Queue @ Mar 20 2007, 02:45 AM)

...But I opened it up... seems that, yes, the functionality is the same as that requested, but it's a pretty inefficient implementation...

The dark side... :rolleyes:

I have spoken to NI about this issue several times.

It seems like almost all of the Pt by Pt functions use Build Array until the buffer is filled, then switch to a mode where the data is rotated on each call, making them more or less useless in an RT environment.

/J

  • 1 year later...

QUOTE (Aristos Queue @ Mar 17 2007, 05:37 PM)

You'll get less data movement if, instead of using Rotate 1D Array, you track which element needs to be replaced on each call. Start the index at zero, increment it on each call, and reset it to zero when it equals the size of your array. [...]

Hi, I don't quite understand how this is initialized. I have it within an if block that only executes once every few minutes. How do I ensure that it gets initialized when the program begins?

-Pat


QUOTE (irpotential @ Jun 20 2008, 01:04 PM)

Hi, I don't quite understand how this is initialized. I have it within an if block that only executes once every few minutes. How do I ensure that it gets initialized when the program begins?

-Pat

Use the "Is first call?" function under the Data Communication>>Synchronization sub-pallet

Neville.


The stated purpose of the routine is to remove intermittent spikes in the data. Median would be better than mean for this purpose. It takes the middle value of the set rather than the average, so a couple of high values are dropped entirely and do not shift the median the way they shift the average.

This, of course, raises the question of how to compute the median: the items have to be put in order and the middle one picked. Doing this on a continuous basis seems like a good opportunity for ingenuity, since when you move on to the next point, n-1 points in the buffer are already in order (see the sketch below).
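A minimal Python sketch of that idea (hypothetical, not a posted VI): keep a sorted copy of the window alongside the arrival-order buffer, so each new point costs one removal and one insertion instead of a full sort.

```python
from bisect import insort, bisect_left
from collections import deque

def running_median(data, width=9):
    """Median filter; width should be odd so the middle element is the median."""
    window = deque(data[:width])    # points in arrival order
    ordered = sorted(window)        # the same points, kept sorted between steps
    out = [ordered[width // 2]]
    for x in data[width:]:
        old = window.popleft()      # oldest point leaves the window
        window.append(x)
        del ordered[bisect_left(ordered, old)]   # n-1 points remain in order
        insort(ordered, x)                       # slot the new point in
        out.append(ordered[width // 2])
    return out
```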

John


QUOTE (jbrohan @ Jun 21 2008, 02:47 AM)

The stated purpose of the routine is to remove intermittent spikes in the data. Median would be better than mean for this purpose. [...]

There are already a Median Filter VI and a Median Filter Pt-by-Pt VI (for use in a loop).

I think they are in the Pro package of LV though..

Neville.


I completely avoid the Filter Pt-by-Pt VI because it is not included in the base package (which some of my users use) and it is horribly inefficient.

I prefer to use the attached method. It is polymorphic and works for filtering single or multiple channels. You only need two single-point buffers for each channel you are filtering, and the size of the average does not affect performance. I believe you could experience some jitter when using this method, but it has not been an issue for my applications. I don't think it can get any better than this, especially for averages spanning thousands of points.

Note that whenever filtering data you need to make sure that a +/-Inf or NaN cannot get "caught" in your shift registers. In this example I chose to ignore those values and just pass the last valid data. You can use this to filter out invalid data points: just convert them to NaN before passing them to the filter. I like to use this method to filter out bad thermocouple values that are caused by broken wires, etc.
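The guard described above might look like this in text form (a Python sketch of the idea only; math.isfinite rejects both NaN and +/-Inf):

```python
import math

last_valid = 0.0   # the extra single-point buffer: the last good sample

def guard(x):
    """Pass x through, but hold the last valid value when x is NaN or +/-Inf,
    so a bad sample never gets caught in the filter's state."""
    global last_valid
    if math.isfinite(x):
        last_valid = x
    return last_valid

# Usage: feed guard(sample) into the averaging filter instead of the raw sample.
```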

Thoughts? Comments?


Excellent, bazookazuz.

I personally prefer to use exponentially weighted moving averages for most applications. The formula is EWMA(i+1) = EWMA(i)*(1-lambda) + latestDataPt*lambda. It's even simpler to compute and does a nice job of smoothing. The only problem case I've ever seen where you wouldn't want EWMA is where machine error occasionally gives you wild numbers in the (usually negative, for some reason) billions while your real data is of order 1. In that case a boxcar moving average is better, because it comes back to sanity faster.
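In text form the whole filter is one line of state (a minimal Python sketch of the formula above; the spike in the usage example shows why a wild value lingers):

```python
def make_ewma(lam):
    """EWMA(i+1) = EWMA(i)*(1 - lam) + x*lam; smaller lam = heavier smoothing."""
    state = None
    def step(x):
        nonlocal state
        state = x if state is None else state * (1.0 - lam) + x * lam
        return state
    return step

smooth = make_ewma(0.1)
for x in [1.0, 1.2, 50.0, 1.1]:   # the 50.0 is a wild spike
    print(smooth(x))              # the spike's influence decays only gradually
```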


QUOTE (torekp @ Jul 1 2008, 08:12 AM)

I personally prefer to use exponentially weighted moving averages for most applications. [...] In that case a boxcar moving average is better, because it comes back to sanity faster.

Thanks!

I used to use EWMA for all general filtering because it used minimal memory and was easy to use. Once I found out that you could do a boxcar average with minimal memory usage as well (see above post) I switched to it because it has better response time. I noticed the most improvement in response time when measuring the fill-rate on a high-noise system. In my case we would only fill for a few minutes, so response time was critical or we would over-fill.

As for the jitter issue I raised above, I did some tests over the weekend and did not see any signs of jitter errors. I ran noisy sine waves into the filter for the equivalent of one week of run time at 100 Hz.

Cheers!

  • 2 weeks later...

QUOTE (bazookazuz @ Jul 7 2008, 10:24 PM)

Once I found out that you could do a boxcar average with minimal memory usage as well (see above post) I switched to it because it has better response time. [...]

Never mind, the code I posted above does not work. I thought it did, but it turned out to just be a slightly different version of the EWMA method. Apparently, there is no free lunch. Back to the drawing board...

