alukindo Posted October 2, 2012
Hi: This link points to a great utility for curve smoothing: http://zone.ni.com/d...a/epd/p/id/4499
The challenge, though, is that the VI is really slow when smoothing a long curve. I am using it on geological logs that happen to be ~45 pages long.
Question: Would anyone be willing to optimize the code to make the VI run faster? I need it to run more than 10 times faster. Let me know how much the consulting effort would cost.
NOTE: While I will or can pay for the effort to optimize the code, the VI will still be available for others to use and the original author will be acknowledged.
Regards
Anthony
drjdpowell Posted October 3, 2012
I would not be able to help you for a few weeks as I'm off on vacation. I can see why your smoothing function is so slow, and I'm sure someone could easily improve its performance by orders of magnitude on large data sets. However, are you sure you would not be better served by one of the many "Filter" VIs in LabVIEW? I tend to use the Savitzky-Golay filter, but there are many others that can be used for smoothing. They'll be much, much faster.
asbo Posted October 3, 2012
Another vote for the Savitzky-Golay; it's almost beautiful sometimes.
alukindo Posted October 4, 2012
drjdpowell: I checked the Savitzky-Golay filtering algorithm. I like the fact that it leaves in all the 'peaks' and 'valleys' while smoothing, but how can I use it with arrays of X and Y values where the X values are not collected at a constant X-interval? The VI connector pane shows that it accepts only one X value at a time. Note that the Lowess VI in the example I linked above already has array inputs for pairs of 'X' and 'Y' values. Or am I missing something?
Anthony L.
GregSands Posted October 4, 2012
I also like the Savitzky-Golay, but it only works for uniformly spaced data, whereas the above utility is also good for non-uniform X spacing. I had already rewritten this utility for my own use, and checking it against the original I see a ~15x speedup on a single core and ~25-30x on my 2-core laptop. It should be even greater with more cores. Here are the main things I changed:
- passing individual X and Y arrays rather than a cluster
- replacing the Power function in the weighting routine with a multiply (this makes the most difference, about 8x)
- turning off debugging (gives almost another doubling in speed)
- moving some functions outside the loops, and sometimes removing loops altogether
- using parallel loops and sharing clones for subVIs inside parallel loops
If you can get away with SGLs rather than DBLs, you'll get a further speedup. If your data is evenly spaced but you still want to use this algorithm, then you shouldn't need to recompute the weighting function throughout your data, since it only changes towards the start and end.
Attachment: SmoothCurveFit.zip
You're welcome to use this rewritten code.
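The Power-to-multiply change above can be illustrated in text form. This is a minimal Python sketch, not the VI itself; it assumes the weighting routine is the standard LOWESS tricube kernel, which the thread does not confirm, and the function names are made up for illustration.

```python
def tricube_pow(u):
    """Tricube weight (1 - |u|^3)^3 using the general power operator."""
    a = abs(u)
    if a >= 1.0:
        return 0.0  # outside the window: zero weight
    return (1.0 - a ** 3.0) ** 3.0


def tricube_mul(u):
    """Same weight with both cubes expanded into plain multiplications."""
    a = abs(u)
    if a >= 1.0:
        return 0.0  # outside the window: zero weight
    t = 1.0 - a * a * a
    return t * t * t


if __name__ == "__main__":
    for u in (0.0, 0.25, 0.5, 0.9, 1.5):
        print(u, tricube_pow(u), tricube_mul(u))
```

Both functions return the same weights; only the arithmetic differs. The speed ratio in Python will not match the ~8x reported for LabVIEW, so treat this as a picture of the idea rather than a benchmark.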
drjdpowell Posted October 4, 2012
Ah, non-uniform spacing, I see.
Greg, my first thought would have been to truncate the weighting calculation and fitting to only a region around the point where the weights are non-negligible. Currently, the algorithm uses the entire data set in the calculation of each point even though most of the data has near-zero weighting. For very large datasets this will be very significant.
James
BTW: Sometimes it can be worth using interpolation to produce a uniform spacing of non-uniform data in order to be able to use more powerful analysis tools. Savitzky-Golay, for example, can be used to determine the first and higher-order derivatives of the data for use in things like peak identification.
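The interpolate-then-filter idea can be sketched as follows, with Python standing in for LabVIEW. The sketch assumes strictly increasing X values and uses the classic 5-point quadratic Savitzky-Golay smoothing coefficients (-3, 12, 17, 12, -3)/35; the function names and the choice to leave the two points at each end unsmoothed are illustrative assumptions, not anything taken from the VIs discussed here.

```python
def resample_uniform(x, y, n):
    """Linearly interpolate (x, y) onto n evenly spaced X values.

    Assumes x is strictly increasing and n >= 2.
    """
    step = (x[-1] - x[0]) / (n - 1)
    xs = [x[0] + i * step for i in range(n)]
    xs[-1] = x[-1]  # avoid a rounding overshoot at the last sample
    ys = []
    j = 0
    for xi in xs:
        # Advance to the segment [x[j], x[j+1]] that contains xi.
        while j < len(x) - 2 and x[j + 1] < xi:
            j += 1
        t = (xi - x[j]) / (x[j + 1] - x[j])
        ys.append(y[j] + t * (y[j + 1] - y[j]))
    return xs, ys


SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)  # 5-point quadratic smoothing kernel


def savgol5(y):
    """Savitzky-Golay smoothing; the two points at each end are left as-is."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5)) / 35.0
    return out


if __name__ == "__main__":
    x = [0.0, 1.0, 3.0, 4.0, 7.0, 10.0]
    y = [1.0, 2.5, 2.0, 4.0, 3.5, 6.0]
    xs, ys = resample_uniform(x, y, 11)
    print(savgol5(ys))
```

Linear interpolation is the simplest choice and can flatten sharp peaks between samples, so the resampling density matters if peak shape is important.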
alukindo Posted October 4, 2012
Greg Sands: Wow! 15x faster. I owe you a beer, or a soda, just in case you don't drink beer. I have been in search of something more efficient. I really appreciate the wizardry that went into speeding up this routine. Will test and get back.
Thanks and Regards
Anthony
drjdpowell: I like the idea of interpolation to attain uniform X-spacing, because we have to do that anyway to conform to the LAS file export standard. I will try the Savitzky-Golay filter when going through the interpolation algorithm. The only thing is that the interpolation is only done during final reporting, to conform with Log ASCII file export requirements for reporting drilling info. In all other cases the data needs to be presented as-is but smoothed.
Thanks for all these great suggestions. Amazing what can be learned from these forums when minds from around the world come together to contribute solutions.
Anthony
asbo Posted October 4, 2012
Here's a VI I've used before to change a pair of arbitrary X and Y arrays into a waveform, if you need some help on the interpolation front.
Attachment: XY Data to Waveform.vi
GregSands Posted October 4, 2012
> Greg, My first thought would have been to truncate the weighting calculation and fitting to only a region around the point where the weights are non negligible. Currently, the algorithm uses the entire data set in the calculation of each point even though most of the data has near zero weighting. For very large datasets this will be very significant.
Yes, in fact the weighting calculation already truncates the values so that they are set to zero outside of a window around each point. However, with a variable X-spacing, the size of the window (in both samples and X-distance) can vary, so it would be a little more complicated to work out that size for each point.
Just had a quick go: with a fixed window size you get a further 10x speedup, but if you have to threshold to find the appropriate window, it's only another 2-3x. Still, that's around 40x overall, with further increases using multiple cores.
Attachment: SmoothCurveFit_subset.zip
I'd only ever used it for arrays up to about 5000 points, so it had been fast enough. Interestingly, the greatest speedup still comes from replacing the Power function with a Multiply. Never use x^y for integer powers!
Any further improvements?
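The windowing idea, fitting each point from only its nearest neighbours instead of the whole array, can be sketched in Python as below. This is not the code in SmoothCurveFit_subset.zip. It assumes strictly increasing X values, a tricube kernel, a local straight-line fit, and a single pass with no robustness iteration; `lowess_windowed` and `frac` are illustrative names.

```python
def lowess_windowed(x, y, frac=0.3):
    """Single-pass locally weighted linear fit using a sliding k-point window.

    Assumes x is strictly increasing. Each output value is computed from only
    the k nearest points, so the cost is O(n * k) instead of O(n * n).
    """
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # points per local fit
    lo = 0
    out = []
    for i in range(n):
        # Slide the window right while the next point on the right is
        # closer to x[i] than the current leftmost point.
        while lo + k < n and x[lo + k] - x[i] < x[i] - x[lo]:
            lo += 1
        hi = lo + k
        h = max(x[i] - x[lo], x[hi - 1] - x[i])  # distance to farthest neighbour
        sw = swx = swy = swxx = swxy = 0.0
        for j in range(lo, hi):
            dx = x[j] - x[i]
            u = abs(dx) / h if h > 0 else 0.0
            t = 1.0 - u * u * u
            w = t * t * t  # tricube weight, multiplications only
            sw += w
            swx += w * dx
            swy += w * y[j]
            swxx += w * dx * dx
            swxy += w * dx * y[j]
        det = sw * swxx - swx * swx
        if det > 1e-12 * sw * swxx:
            # Intercept of the weighted line, i.e. the fitted value at x[i].
            out.append((swxx * swy - swx * swxy) / det)
        else:
            # Degenerate window: fall back to the weighted mean.
            out.append(swy / sw)
    return out


if __name__ == "__main__":
    x = [0.0, 0.5, 1.5, 2.0, 4.0, 4.5, 6.0, 7.0, 9.0, 10.0]
    y = [1.0, 1.4, 0.9, 2.2, 3.1, 2.7, 4.0, 4.4, 5.2, 5.0]
    print(lowess_windowed(x, y, frac=0.5))
```

Because the window is a fixed number of samples, its X-width follows the local sample spacing, which is the complication mentioned above for unevenly spaced logs.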
ShaunR Posted October 5, 2012
> Any further improvements?
Just had a cursory glance, but it looks like you are calculating the coefficients and passing the XY params for the linear fit twice with the same data (it's only the weightings that change from the first "fit" to the second). You could pre-calculate them in a separate loop and just pass them into the other loops. Also, you might benefit from passing the x array through the coefficient VI.
(Edited October 5, 2012 by ShaunR)
GregSands Posted October 6, 2012
> Just has a cursory glance. But it looks like you are calculating the coefficients and passing the XY parms for the linear fit twice with the same data (it's only the weightings that change from the first "fit" to the second) . You could pre-calculate them in a separate loop and just pass them into the other loops.
I used a separate loop to start with, but the speed improvement was minimal, and the memory use would increase fairly significantly.
alukindo Posted October 14, 2012
Hi Greg Sands: I tried this on the logs and the smoothing assigned NaN values to some of the Y-points. Do you know what could be causing this? The final result is that the output curve has many broken regions. I have attached various screenshots that show this.
Regards
Anthony L.
GregSands Posted October 15, 2012
> I tried this on the logs and the smoothing assigned NaN values to some of the Y-Points. Do you know what could be causing this? The final result is that the output curve has many broken regions. I have attached various screen shots that shows this.
Hard to tell without seeing the data, but if your screenshot is correct, you have a "Weighting Fraction" equal to zero, so I wonder if that is causing the problem. I'm pretty sure that it should be greater than zero, since it's the fraction of the dataset to use for fitting at each point.
(Edited October 15, 2012 by GregSands)
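One way a zero or very small weighting fraction can turn into NaN outputs is sketched below in Python. This is a guess at the mechanism, not a diagnosis of the VI: if no neighbour carries weight, a weighted fit reduces to 0/0. The helper names and the `degree + 2` lower bound (which assumes the farthest point in a tricube window gets zero weight) are assumptions for illustration.

```python
import math


def window_size(n, frac, degree=1):
    """Points per local fit, clamped so the fit stays determined.

    With a tricube kernel the farthest point in the window gets zero weight,
    so a fit of the given degree needs at least degree + 2 points.
    """
    k = int(math.ceil(frac * n))
    return min(n, max(degree + 2, k))


def weighted_mean(values, weights):
    """Weighted mean that returns NaN when every weight is zero."""
    total = sum(weights)
    if total == 0.0:
        return float("nan")  # stands in for the floating-point 0/0 result
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    print(window_size(1000, 0.0))               # clamped up from 0
    print(weighted_mean([1.0, 2.0], [0.0, 0.0]))  # nan
```

Clamping the window keeps the fit defined, but a fraction that is merely too small for the data can still give a poorly conditioned fit, so the fraction itself is the parameter to tune.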
alukindo Posted October 16, 2012
Hi Greg: There is code in your routine which forces the ratio to something other than zero. I later found out that I need to input a higher coefficient than the ones I had been using with the original routine. So your revised routine does work; I just need to adjust the smoothing factor after trying it on several logs.
Thanks and Regards
Anthony L.