bsvingen Posted May 25, 2006

Hello,

The speed of floating point operations is important to my application, but how well optimized is this in G? I have therefore made a small test program that compares several different methods to see which is fastest. The tests consist of doing 2D vertex rotations of the form:

x' = x*cos(theta) + y*sin(theta)
y' = -x*sin(theta) + y*cos(theta)

The different methods are: the included NI sub VI (LV8), an ordinary sub VI, an ordinary sub VI flagged as subroutine, a call-by-reference sub VI, "inlining" the diagram without any function call, a formula node with arrays, a formula node without arrays, a direct call to a DLL made in C, a DLL made in C that takes whole arrays, and the same two DLLs made in FORTRAN. I also tried a MathScript node, but that was several thousand times slower than the others, so I removed it altogether.

Some of the results are very obvious, but some are very surprising (to me at least). The version with the diagram "inlined" (directly in the main VI) is set to 100%, and the others are given as percentages of that one (smaller is better):

Method                          Relative time (%)
NI built-in sub VI              546
sub VI                          205
subroutine                      123
call by reference               508
"inline"                        100
FormulaN 1 (arrays inside)      177
FormulaN 2 (arrays outside)      91
DLL C                           177
DLL C Arr                        64
DLL F95                         179
DLL F95 Arr                      64

In general C or FORTRAN executes in 30-40% less time than a G diagram (this I knew: the array DLLs need only 64% of the "inline" time), but very surprisingly the formula node is faster than the diagram (about 10% faster).
However, when arrays are put into the formula node (FormulaN 1) it slows down, and it seems to me that the slowdown is directly proportional to the amount of array indexing that is done.

Using a subroutine is only about 25% slower than a direct diagram, but it already takes twice the time of C/FORTRAN. Using an ordinary sub VI takes twice the time of "inlining" the diagram, and execution now takes 3 times as long as C/FORTRAN. Using a call-by-reference node really bogs things down: a factor of 5 compared with the "inline" diagram and a factor of 8 compared with C/FORTRAN. Finally, the included VI from NI is dead slow; it is in fact very hard to build a simple routine like this that executes that slowly (I explain why below).

A direct call to a DLL is slower than "inlining" the code for this simple routine, but it is still faster than using an ordinary sub VI, and this is very strange. It means that the overhead associated with calling a DLL is less than the overhead associated with calling a sub VI. Why?

The included VI from NI is in fact calling a DLL, so why is it so slow? The reason is that it calls a DLL that is obviously optimized to take millions of vertices and rotate them by the same angle (for instance when rotating a picture). Using it with a different angle each time therefore means converting the doubles to one-element arrays, doing the same for the output, and using a routine that is optimized for something totally different from what you are trying to do. Used as intended, with millions of vertices and the same angle, it is extremely efficient, but when used with different angles it is unbelievably inefficient. One can only wonder why NI didn't take the extra 5 minutes to code this instance of the polymorphic VI using ordinary methods.

For the DLLs I have used the lcc C compiler and Salford FORTRAN 95; both are free for non-commercial use. I downloaded the FORTRAN compiler today and was extremely impressed with the ease of making DLLs.
It was just a matter of writing an ordinary subroutine, then compiling and linking (I used ordinary F77 code, as I have no idea how to write F95 code). All the hieroglyphic things associated with DLLs are completely hidden from the user.

Anyway, these tests indicate that the fastest possible G code uses formula nodes, but does any array indexing outside the node. This will be faster than any other method in LabVIEW. Earlier my impression was that formula nodes were slow, but that was probably because I did a lot of array indexing inside them. For pure speed a DLL is the way to go, and with the ease this can be done in Salford, hmmm.

The program is included in a zip file. Just unzip and run "Test 2D Rotate.vi". LV8 only.

Download File: post-4885-1148575817.zip
bsvingen Posted May 27, 2006 (Author)

These simple tests also show the importance of inlining small functions with regard to execution speed. While the basic execution speed in LabVIEW is a factor of 1.6 compared with C or FORTRAN (1.4 with formula nodes without array indexing), it will be practically impossible to maintain that speed in a readable and reasonably maintainable LabVIEW application. A factor of 3-4 is probably more representative. A simple inline construct would solve most of this.
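In C the effect of inlining can be sketched by contrasting a direct call to a small static inline helper with a call through a function pointer, which the compiler generally cannot inline. The comparison with LabVIEW's sub VI and call-by-reference overhead is only a loose analogy, and whether the compiler actually inlines anything depends on its optimization settings. All names (rotate_x(), rotate_x_direct(), rotate_x_indirect(), rot_fn) are hypothetical.

```c
#include <stddef.h>

/* Small helper: x component of the rotation, given s = sin(theta) and
 * c = cos(theta). Marked static inline so the compiler may inline it. */
static inline double rotate_x(double x, double y, double s, double c)
{
    return x * c + y * s;
}

/* Direct call: with optimization enabled, compilers typically inline
 * rotate_x() into this loop, so no call remains per element.          */
void rotate_x_direct(const double *x, const double *y, double s, double c,
                     double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = rotate_x(x[i], y[i], s, c);
}

/* Indirect call through a function pointer supplied at run time: the
 * compiler generally cannot inline it, so each element pays for a real
 * function call (loosely comparable to a call-by-reference node).     */
typedef double (*rot_fn)(double, double, double, double);

void rotate_x_indirect(rot_fn f, const double *x, const double *y,
                       double s, double c, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = f(x[i], y[i], s, c);
}
```

Both functions compute identical results; only the calling mechanism differs, which is what makes the pair useful for timing call overhead in isolation.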
Gary Rubin Posted May 27, 2006

I'm very interested in what you're doing, but unfortunately I do not have LV8. Would it be possible to post your test VI in an LV7.1 version?

Thanks,
Gary
bsvingen Posted May 27, 2006 (Author)

Here is a 7.1.1 version. There is no NI sub VI in this one, since that is only available in 8.0.

Download File: post-4885-1148747671.zip
ahlers01 Posted May 27, 2006

Hi bsvingen,

LV complains about a missing 'salflibc.dll'. Could you post that one, too?
bsvingen Posted May 28, 2006 (Author)

Strange. It seems Salford FORTRAN needs some kind of run-time library, which is very odd for this simple code. I found salflibc.dll in a "redist" folder in the Salford installation folder. Here it is... or not: the board refuses to accept a file "with that file extension", so I zipped it, and I guess you need to unzip it along with the others. If it complains about more, I think you can install the Salford FORTRAN package or just comment out the calls to those DLLs, since the C DLLs have almost exactly the same execution speed. Salford FORTRAN is freely downloadable from http://www.silverfrost.com/32/ftn95/ftn95_...nal_edition.asp

Download File: post-4885-1148773381.zip
ahlers01 Posted May 28, 2006

o.k., I got it running now. The execution speed results you obtained are indeed very interesting; I could confirm basically all of your timings.

In addition to your tests, I added one where the whole array is passed to the subroutine (which contains what you call 'inline' code). Interestingly, the subroutine performs slightly faster when you set the priority not to 'subroutine' but only to 'time critical'.

The fastest pure-LV solution I came up with was one where the formula node (with external indexing) is put in a 'time critical' sub-VI, and where the cos calculation is replaced by sqrt(1 - sin*sin). It is nearly as fast as the lcc array solution, as the green line in the following diagram shows (the diagram lists the msec used for an array length of 500000):

So the LV solution is only 10% slower than the lcc one (and faster than FORTRAN), unless you speed up the DLL code by avoiding the cos calculation there, too.

Remarkably, the timings of the two LV subroutine solutions (black and green lines in the diagram) are much more constant than the timings of the inline, lcc and FORTRAN solutions, which are sometimes longer due to background activity in the system. That becomes especially obvious when I do not turn off my PC's DSL network.
The following diagram shows the timings with the network on (in the diagram above it had been off):

The black and green curves are unaffected by the background activity; the others obviously are not. (I wonder whether they become more stable, or even faster, when 'wrapped' in a 'time critical' subVI.)

In conclusion, I would say that it is possible to get fast floating point math in LV that nearly reaches a DLL implementation. Inline code is not necessary, IMO, since the 'time critical' subVI method can be used.

BTW: I obtained my results with a beta version of LV 8.2, since I had uninstalled all older LVs. I attach a copy of my VIs saved for LV 8.0:

Download File: post-833-1148831147.zip
bsvingen Posted May 28, 2006 (Author)

It seems floating point is a strange beast to optimize. It is very strange that your FORTRAN DLL is slower than the C DLL, as they are identical on my PC. My results from your file are in the attachment. Right now I'm using my laptop, which is a Dell with a dual Centrino, and it could very well be that one of the compilers (or both) optimizes for dual cores and/or for Pentium (instead of AMD) etc., but this is only guessing, since I have only downloaded them and compiled the code without looking any further into the optimization switches, other than turning optimization on. Basically the C and FORTRAN DLLs take 60 ms while the others take about 90 ms on my system running LV8.0.1. It therefore looks like LV8.2 is much faster. I will try using the sqrt function for cos in the DLLs and see if that speeds things up.

Edit: I tried the sqrt function in the C code, but it was still 60 ms; no change that I could see from the graphs.
Gary Rubin Posted May 30, 2006

Thank you for providing your test code in LV7.1. I found a couple of other interesting things. When I replace your Sine and Cosine functions in the subVIs with the single function that calculates both, I get a noticeable speed increase (more so for the normal subVI than for the one with subroutine priority). When I do the same replacement in the inline case, however, I get no improvement.

Also, when I replace the indexed inline call with an array inline call (see attached image), the array version is slower. I believe this is due to the large number of array allocations needed because of all the branches in the wires.

I find it interesting that this discussion, along with a previous one, is making me realize that NI's optimization application notes are not the gospel I once thought they were. Instead, the appropriate optimization technique (i.e. indexing vs. complete arrays) depends a lot on the code itself.

Regarding the use of subroutine priority: I seem to remember that one shouldn't play around with priorities, as it tends to interfere with multithreading. This goes back a few years, though. Was that just something related to the way multithreading was done on non-HT processors and/or Win2k?

Gary
bsvingen Posted June 1, 2006 (Author)

This is interesting and very strange. I made another simple test case that calculates z = sqrt(x^2 + y^2), where the inputs and the output are arrays. Here the strange thing is that, comparing the diagram "inline" with the same diagram called as a sub VI, there seems to be absolutely no penalty for the sub VI call, even when the sub VI is a "normal" one. Another thing to note is that calculating x^2 as x*x is approximately 5 times faster than using x**2 in a formula node. This is for LV7.1.1.

Download File: post-4885-1149120243.zip
Gary Rubin Posted June 1, 2006

Quote (bsvingen): "Another thing to note is that calculating x^2 as x*x is approx 5 times faster than using x**2 in a formula node."

I know that "Power" is a very expensive operation, so this doesn't surprise me.
bsvingen Posted June 1, 2006 (Author)

I agree, but what is confusing here is that there is a pow(x,y) in the formula node and an equivalent diagram function. There is also (in LV8) a Square function on the diagram. It is therefore easy to believe that since pow(x,y) in the formula node is equivalent to pow(x,y) on the diagram, x**2 in the formula node is equivalent to the Square function on the diagram, but this is not the case. There is no square function in the formula node, only two ways of writing the power function.
mseb Posted August 29, 2006

Hello, this is my first post here; I hope not to be completely NaN for this first contribution.

I'm trying to find my way on the subject of both good programming techniques and good execution speed in LV. Since I read this discussion, I decided to use formula nodes when doing heavy computations, but I had not looked at the details of the test VI; I had just verified that using a formula node instead of "standard" functions was indeed much faster.

Looking at it today, I was surprised to see that Formula Node 2 (arrays outside the formula node) was using three array indexing operations. Am I the only one who uses auto-indexing? If you do, the results for this case improve by about 30% (on my machine the "Factor" drops from 97 to 67), which makes this solution rather competitive with the F95 or C DLLs, IMHO. BTW, using the auto-indexing trick does not have such an impact on all test cases (at least those I've tried).

I thought this was worth writing my first post, so here it is. If anyone has a generic solution that keeps both code readability and execution speed, I'd love to hear about it: I **hate** formula nodes, they're really too large for my screen and I can't type two lines of code without introducing three bugs!

Thank you for reading this far, and thank you to all the contributors to these forums; it's been fascinating and fruitful reading for me. I hope I'll be able to give something back.
EWR Posted January 22, 2008

Hello bsvingen,

first of all, thank you for your excellent software (Execution_Speed.zip). It is the first time in my life I have generated a DLL (many years of computing experience), using LabVIEW (only 2 months of experience) and a different compiler from the one you suggested (I use Borland C++ 5.02). When I compiled your unmodified C code, I only saw a short flash on the screen. I did not understand what had happened; my big surprise was to see the new DLL in the folder! Then I started your VI in LabVIEW 7.1, and the second big surprise was that it worked correctly! Thank you again for the software.

Maybe you can comment on two problems related to the DLLs:

- When I try to compile the C code a second time after minor modifications, the compiler warns that it cannot compile because the DLL is in use (in memory). The settings in the registry are made to unload the DLL after execution ends. I had to exit LabVIEW and restart it to get rid of the problem. How can this be avoided?
- I tried to use "printf" in the C code. The compiler worked without warnings, but nothing was printed when running the VI.

Kind regards,
Erich
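On the printf question: a DLL loaded into a GUI program such as LabVIEW normally has no console attached, so standard output is not visible anywhere. A simple workaround is to append debug messages to a text file; on Windows, OutputDebugString() viewed in a debugger or a debug-output viewer is another option. The sketch below assumes a hypothetical log file name (rotate_debug.log, relative to the host process's current directory) and a hypothetical exported function scale_logged(). The export macro assumes a compiler that accepts __declspec(dllexport), as MSVC does; other compilers may need a .def file or their own keyword.

```c
#include <stdio.h>

/* Export macro (assumption): __declspec(dllexport) on Windows,
 * nothing extra on other platforms.                                  */
#ifdef _WIN32
#define DLL_EXPORT __declspec(dllexport)
#else
#define DLL_EXPORT
#endif

/* Hypothetical log file, relative to the host process's current directory. */
#define DEBUG_LOG_PATH "rotate_debug.log"

/* Append one line to the log file. Opening and closing the file on every
 * call is slow, but the text is written out immediately, so it survives
 * even if the host application crashes later.                           */
static void debug_log(const char *msg, double value)
{
    FILE *f = fopen(DEBUG_LOG_PATH, "a");
    if (f == NULL)
        return;                      /* ignore logging failures */
    fprintf(f, "%s: %g\n", msg, value);
    fclose(f);                       /* fclose() flushes the buffer */
}

/* Hypothetical exported function that logs its result instead of printf(). */
DLL_EXPORT double scale_logged(double x, double factor)
{
    double result = x * factor;
    debug_log("scale_logged", result);
    return result;
}
```

Because the file is opened in append mode, repeated runs of the VI accumulate lines; deleting the file between runs keeps the log readable.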
CindyLong Posted October 23, 2008

It seems that the compiler matters a lot. I tried using MS Visual Studio 2008 to compile. The result for the C code is around 500 for DLL lcc and 58 for DLL Array lcc, whereas for the original DLL it is about 390 for DLL lcc and 68 for DLL Array lcc. It seems that the DLL function calls introduce significant overhead. Does anybody have an idea how to improve this? I attached the solution for building the DLL.

Thanks