I also like the Savitzky-Golay filter, but it only works for uniformly spaced data, whereas the above utility also handles non-uniform X spacing.
I had already rewritten this utility for my own use. Checking it against the original, I see roughly a 15x speedup on a single core and about 25-30x on my 2-core laptop; it should be even greater with more cores. Here are the main things I changed:
passing individual X and Y arrays rather than a cluster
replacing the Power function in the weighting routine with a multiply - this makes the most difference, about 8x
turning off debugging - gives almost another doubling in speed
moving some functions outside the loops, and sometimes removing loops altogether
using parallel loops and sharing clones for subVIs inside parallel loops
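To illustrate the Power-to-multiply change in text form (LabVIEW being graphical), here is a rough Python sketch. It assumes the weighting routine is a tricube weight, which is common in this kind of local smoothing but is my assumption here, not something taken from the original VI. The function names are made up for the example.

```python
def tricube_pow(u):
    """Tricube weight written with a general power operation."""
    a = abs(u)
    if a >= 1.0:
        return 0.0
    return (1.0 - a ** 3) ** 3


def tricube_mul(u):
    """Same weight using only multiplications (no power call)."""
    a = abs(u)
    if a >= 1.0:
        return 0.0
    t = 1.0 - a * a * a
    return t * t * t


if __name__ == "__main__":
    # Both versions should agree to within rounding error.
    for u in (0.0, 0.25, 0.5, 0.9, 1.5):
        print(u, tricube_pow(u), tricube_mul(u))
```

The result is the same either way; the point is only that a couple of multiplies are cheaper than a general-purpose power function when the exponent is a small fixed integer.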
If you can get away with SGLs rather than DBLs, you'll get a further speedup. Also, if your data is evenly spaced but you still want to use this algorithm, you shouldn't need to recompute the weighting function at every point: it only changes towards the start and end of the data.
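A small Python sketch of why the weights repeat for evenly spaced data. It assumes a fixed-size window of neighbouring points that is shifted inwards at the edges, and tricube weights scaled by the farthest point in the window; both are assumptions for illustration and may not match the VI exactly. The name window_weights is hypothetical.

```python
def tricube(u):
    """Tricube weight (assumed weighting function)."""
    a = abs(u)
    if a >= 1.0:
        return 0.0
    t = 1.0 - a * a * a
    return t * t * t


def window_weights(x, i, half):
    """Weights for the window of 2*half+1 points around x[i].

    Near the ends the window is shifted inwards so it keeps the
    same number of points (an assumption about the algorithm).
    """
    width = 2 * half + 1
    lo = min(max(i - half, 0), len(x) - width)
    dist = [abs(xj - x[i]) for xj in x[lo:lo + width]]
    dmax = max(dist)
    return [tricube(d / dmax) for d in dist]


if __name__ == "__main__":
    x = [0.5 * k for k in range(10)]   # evenly spaced X
    print(window_weights(x, 3, 2))     # interior point
    print(window_weights(x, 6, 2))     # another interior point: same weights
    print(window_weights(x, 0, 2))     # edge point: different weights
```

With even spacing, every interior point sees the same set of relative distances, so the interior weights could be computed once and reused; only the points within half a window of either end need their own weights.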
SmoothCurveFit.zip
You're welcome to use this rewritten code.