
ShaunR

Members
  • Posts

    4,883
  • Joined

  • Days Won

    297

Everything posted by ShaunR

  1. I know. I'm running x64 at the moment, so that's why it's 8. Maybe I should have put in a conditional disable for people on 32-bit, but it was late at night and I had already pulled most of my hair out. One doc I found states: "LabVIEW stores variants as handles to a LabVIEW internal data structure. Variant data is made up of 4 bytes". And a handle is a **myhandle. I don't really care what the data structure is, only that I can point to the handle, so it "should" be just pointer copying. Works fine for the write and once for the read, but second time around the deallocation kills LV. So. Suggestions?
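The 8-versus-4 question above comes down to the native pointer width of the running process. As a rough illustration in Python rather than LabVIEW (the helper name `handle_size_bytes` is made up for this sketch, and nothing here touches the LabVIEW memory manager), the size can be queried at run time instead of being hard-coded:

```python
import ctypes


def handle_size_bytes():
    """Return the size of a native pointer in bytes.

    Hypothetical helper for illustration only: 4 on a 32-bit process,
    8 on a 64-bit process.
    """
    return ctypes.sizeof(ctypes.c_void_p)


size = handle_size_bytes()
print(f"pointer size in this process: {size} bytes")
```

A conditional disable structure plays the same role in LabVIEW: pick the 4-byte or 8-byte path depending on the bitness of the running environment.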
  2. I'm probably missing something fundamental about how variants are stored in memory. I want to create an array of pointers to variants. I seem to be able to write quite happily, but reading causes a problem. The following VI demonstrates what I am trying to do. It runs through once quite happily, but on the second execution it fails on the DSDisposePointer (even though the variant has been written and read correctly). If you disable the read, then it doesn't crash, so it must be something to do with the way I'm retrieving the data. Any help appreciated. varpointers.vi
  3. +1. It's not conclusive between resolution and loop jitter if you check with the Elapsed Time.vi (that is inside the loop), since it is difficult to separate them (occasionally you see 10s of nsec or psec readings). But as the Tick count2+ uses the same timing method (which yields ps resolution), I think we can assume it is mainly loop jitter that causes the spacing. I also think that whether the Elapsed Time.vi gets executed in the same slice or the next (or the next +1 etc., depending on the scheduler) is the reason we see several lines at x*loop jitter.
  4. It's a good job MJE noticed the bug where the dequeues aren't reading everything then. Still. It's rather counter-intuitive. Well remembered. That also explains the sudden increase (the curved lines around 1ms) in the old 1M data point plots. By the way, peeps. Feel free to write your own benchmarks/test software or abuse-ware. Especially if it shows a particular problem. The ones I supplied are really just test harnesses so that I could write the API and see some info on what was going wrong (you can probably see by the latest images that the ones I am using now are evolving as I probe different aspects). Not much time has been spent on them, so I expect there are lots of issues. Eventually, once all fingers have been inserted in all orifices, they will become usage examples and regression tests (if anyone uses the SQLite API For LabVIEW you will know what this means), so if you do knock something up, please post it, big or small. I think I should also request at this point, to avoid any confusion, that you only post code to this thread if you are happy for your submission to be public domain. By all means make your own proprietary stuff, but please post it somewhere else so those that come here can be sure that anything downloaded from this thread is covered by the previous public domain declaration. Eventually it will go in the CR and it will not be an issue, but until then please be aware that any submissions to this thread are public domain. Sorry to have to state it, but a little bit of garlic now will save the blood being sucked later.
  5. Indeed. See my post in reply to GregSands about processor hogging. For a practical implementation, then yes, I agree. It should yield. Adding the wait ms and changing to Normal degrades the performance on my machine by about 30%, and as I wanted to see what it "could do" and explore its behaviour rather than LabVIEW's, I wasn't particularly concerned about other software or parts of the same software. It makes it quite hard to optimise if the benchmarks are cluttered with task/thread switches et al. Better (to begin with) to run it as fast as possible with as much resource as required to eke out the last few drops; then you can make informed decisions as to what you are prepared to trade for how much performance. Of course, in 2011+ you can "inline". In 2009 you cannot, so subroutine is the only way. Ooooh. I've just thought of a new signature....... "I don't write code fast. I just write fast code."
  6. OK. I've managed to replicate this on a Core 2 Duo (the other PC is an i7). The periodic large deviations coincide with the buffer length. If you look at your 10K @ 1K buffer, each point is 1K apart. If you were to set the #iterations to 100 and, say, the buffer length to 10, then the separation would be every 10 and you would see 9 points [ (#iterations/buffer length) - 1 ]. Similarly, if you set the buffer length to 20, you would see 4. My initial thought (as seems to be the norm) was a collision or maybe a task switch on the modulo operator. But it smells more like processor hogging, as you can get rid of it by placing a wait with 0 ms in the write and read while loops. You have to set the read and write VIs to be "Normal" instead of "subroutine" to be able to use the wait, and therefore you lose a lot of the performance, so it's not ideal, but it does seem to cure it. I'm not sure of the exact mechanism; I'll have to chew it over. But it seems CPU architecture dependent. Look forward to it. Care to elaborate on for loop vs while loop? (Don't speed up the queues too much, eh?)
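The spike-count arithmetic in the post above can be checked with a few lines of Python. This is only a sketch of the counting rule stated in the post; the function names are invented for illustration and nothing here models the LabVIEW scheduler.

```python
def spike_positions(iterations, buffer_length):
    """Iteration indices where a periodic deviation is expected.

    Follows the rule from the post: one spike at every multiple of the
    buffer length, excluding the final iteration itself.
    """
    return list(range(buffer_length, iterations, buffer_length))


def expected_spikes(iterations, buffer_length):
    """(#iterations / buffer length) - 1, using integer division."""
    return iterations // buffer_length - 1


for n_iter, buf_len in [(100, 10), (100, 20), (10_000, 1_000)]:
    print(n_iter, buf_len, expected_spikes(n_iter, buf_len),
          spike_positions(n_iter, buf_len))
```

For 100 iterations this lists 9 positions with a buffer length of 10 and 4 positions with a buffer length of 20, matching the counts quoted in the post.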
  7. Calculating the MD5 Message-Digest of a String or File
  8. Addendum. I've just modified the test to a) ensure timers always execute at pre-determined points in the while loops (connected the error terminals) and b) pre-allocate the arrays. So it looks like a lot of what I was surmising is true. There is still one allocation in the 258 iteration image, which might be for the shift registers. But everything is a lot more stable and the STD and mean are now meaningful (if you'll excuse the pun). Does anyone want to put forward a theory why we get discrete lines exactly 0.493 usecs apart? (maybe a different number on your machine, but the separation is always constant)
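The pre-allocation idea translates to any language: reserve the results array before the timed loop so that growing it never lands inside a measurement. The following is a minimal Python sketch of that pattern, not the LabVIEW test harness itself; `benchmark` and the no-op workload are placeholders chosen for illustration.

```python
import statistics
import time


def benchmark(operation, iterations):
    """Time `operation` repeatedly, storing results in a pre-allocated list."""
    # Allocate the full result buffer up front so that no list growth
    # (and therefore no reallocation) happens inside the timed loop.
    samples_ns = [0] * iterations
    for i in range(iterations):
        start = time.perf_counter_ns()
        operation()
        samples_ns[i] = time.perf_counter_ns() - start
    return samples_ns


samples = benchmark(lambda: None, 1000)  # placeholder workload
print("median ns:", statistics.median(samples))
print("mean ns:  ", statistics.fmean(samples))
```

Reporting the median alongside the mean keeps the occasional large outlier from dominating the summary.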
  9. @ GregSands Whilst a median of 4..6 microseconds isn't fantastic, it's still not bad (slightly faster than the queues). In your images I am looking at the median and Max-Count Exec Times peak. The reason will follow. I'm not (at the moment) sure why you get the 300ms spikes (thread starvation?), but most of the other stuff can be explained (I think). I've been playing a bit more and have some theories to fit the facts. The following images are a little contrived, since I ran the tests multiple times and chose the ones that show the effects without extra spurii. But the data is always there in each run, just that you get more of them. The following is with a 258 data point buffer. We can clearly see two distinct levels (lines) for the write time and, believe me, each data point in each line is identical. There are a couple of data points above these two (5 in fact). These anomalous points intriguingly occur at 1, 33, 65, 129 and 257, or (2^n)+1. OK. So 17 is missing. You'll just have to take my word for it that it does sometimes show up. We can also notice that these points occur in the reader as well, at exactly the same locations with approximately the same magnitude. That is just too convenient. OK. So maybe we are getting collisions between the read and write. The following is again with 258 iterations, with a buffer size of 2 (the minimum). That will definitely cause collisions if any are to be had. Nope. Exactly the same positions and "roughly" the same magnitudes. I would expect something to change at least if that were the issue. So if they really do occur at (2^n)+1, then if we increase further we should see another appear at 513. Bingo! Indeed. There it is. Here is what I think is going on. The LabVIEW memory manager uses an exponential allocation method for creating memory (perhaps AQ can confirm). So, every time the "Build Array" on the edge of the while loop needs more space, we see the allocation take place, which causes these artifacts.
The more data points we build, the more impact the LabVIEW memory manager has on the results. The "real" results for the buffer itself are the straight lines in the plots, which are predictable and discrete. The spurious data points above these are LabVIEW messing with the test so that we can get pretty graphs. We can exacerbate the effect by going to a very high number. We can still clearly see our two lines, and you will notice that throughout all the images the Median and the Max Count-Exec Times have remained constant (scroll back up and check), which implies that the vast majority of the data points are the same. The results for the mean, STD and to our eyes are "confused" by the number of spurii. So I am postulating that most, if not everything, above those two lines in the images is due to the build arrays on the border of the while loops. Of course, I cannot prove this, since we are suffering from the "Observer Effect" and if I remove the build arrays on the border, we won't have any pretty graphs. Even running the profiler will cause massive increases in benchmark times and spread the data. I think we need a better benchmarking method (pre-allocated arrays?). Of course, it raises the question: why don't the queues exhibit this behaviour with the queue benchmark? Well. I think they do. But it is less obvious because, for small numbers of iterations, there is greater variance in the execution times and it is buried in the natural jitter. It only sticks out like a sore thumb with the buffer because it is so predictable that they are the only anomalies.
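The (2^n)+1 pattern is what a capacity-doubling growth strategy would produce. The sketch below simulates such a strategy in Python; the starting capacity of 16 and the doubling factor are assumptions picked to reproduce the positions reported in the post, not documented behaviour of the LabVIEW memory manager.

```python
def reallocation_points(n_appends, initial_capacity=16):
    """1-based append indices at which a doubling buffer must reallocate.

    Assumed model: the first append allocates `initial_capacity` slots and
    every later overflow doubles the capacity.
    """
    capacity = 0
    points = []
    for k in range(1, n_appends + 1):
        if k > capacity:
            # Buffer is full (or not yet allocated): grow before storing element k.
            capacity = initial_capacity if capacity == 0 else capacity * 2
            points.append(k)
    return points


print(reallocation_points(258))
print(reallocation_points(600))
```

Under these assumptions, 258 appends reallocate at 1, 17, 33, 65, 129 and 257, and going past 513 adds one more point, which is the same sequence as the anomalous data points described above.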
  10. Sweet! I've replaced the global variable with LabVIEW memory manager functions and, as surmised, the buffer size no longer affects performance, so you can make it as big as you want. You can grab version 2 here: Prototype Circular Buffer Benchmark V2 I've been asked to clarify/state licensing, so here goes.
  11. Contingent on buffer multiples is a little perplexing. But that it disappears when the buffer is bigger than the points suggests it is due to the write waiting for the readers to catch up and letting LabVIEW get involved in scheduling (readers "should" be faster than the writer because they are simpler, but the global array access is slow for the reasons MJE and AQ mentioned; I think I have a fix for that). Can you post the image of it not misbehaving? The graphs tell me a lot about what is happening. (BTW, I like the presentation as log graphs better than my +3 STDs; I will modify the tests)
  12. Interesting. You can see a lot of data points at about 300ms dragging the average up whereas most are clearly sub 10us. If you set the buffer size to 101 and do 100 data points, does it improve? (don't run the profiler at the same time, it will skew the results)
  13. OK. Tidied up the benchmark so you can play, scorn, abuse. If we get something viable, then I will stick it in the CR with the loosest licence (public domain, BSD or something). You can grab it here: Prototype Circular Buffer Benchmark It'll be interesting to see the performance on different platforms/versions/bitnesses. (The above image was on Windows 7 x64 using LV2009 x64) Compared alongside each other, this is what we see.
  14. It doesn't tell me when I upload (just says uploading is not allowed, with an upload error). When it first happened, I deleted 2 of my code repository submissions just in case that was the problem (deleted about 1.2 MB to upload a 46KB image). It didn't make a difference. There used to be a page in the profile that allowed you to view all the attachments and uploads and monitor your usage. That seems to have disappeared, so I can't be sure that deleting more from the CR will allow uploading.
  15. The images are just inserted images that have to reside on another server (using the insert image button in the bar). Usually I upload the images to lavag and insert them. Obviously I cannot do that at the moment so this way is a work-around. I can do something similar for the files I want to upload, but then they won't appear inline in the posts as attachments (and presumably I cannot put stuff in the code repository for people). You will have to be redirected to my download page to get them (not desirable).
  16. For a while now (ever since the last Lavag.org crash), I have not been able to post pictures or upload files. The upload section just states "Uploading is not allowed" and there is no real indication as to why this is so.
  17. You want it 40 usecs because 40+50 < 100? Put your acquisition and processing (50us) in a producer loop and the TX in a consumer loop. Then your total processing time will be just the worst of the two (70us) rather than the sum of both.
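The producer/consumer split can be sketched outside LabVIEW with two threads and a queue. This Python version shows the structure only: `run_pipeline`, `process` and `transmit` are illustrative names, and Python threads will not reproduce microsecond-level timing.

```python
import queue
import threading


def run_pipeline(raw_samples, process, transmit):
    """Run acquisition/processing and transmission in separate loops."""
    q = queue.Queue()
    sentinel = object()  # marks the end of the data stream

    def producer():
        # Producer loop: acquire and process, then hand the result over.
        for sample in raw_samples:
            q.put(process(sample))
        q.put(sentinel)

    def consumer():
        # Consumer loop: transmit results as they become available.
        while True:
            item = q.get()
            if item is sentinel:
                break
            transmit(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


sent = []
run_pipeline(range(5), lambda x: x * 2, sent.append)
print(sent)  # [0, 2, 4, 6, 8]
```

Because the two loops overlap, the steady-state period per sample is set by the slower stage rather than by the sum of both, which is the point made in the post.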
  18. It depends where your bottleneck is. 24xDouble precision numbers @ 10k is about 2MB/sec. Doesn't sound a lot to me. Are we talking PXI-RT or PXI-Windows7? How are you acquiring and how are you transferring (TCPIP, MXI?).
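The "about 2MB/sec" figure follows from simple arithmetic, reproduced here in Python. The sketch assumes that "10k" means 10,000 samples per second per channel and that each double-precision value occupies 8 bytes.

```python
CHANNELS = 24
BYTES_PER_DOUBLE = 8        # IEEE 754 double precision
SAMPLE_RATE_HZ = 10_000     # assumption: "10k" is samples per second per channel

bytes_per_second = CHANNELS * BYTES_PER_DOUBLE * SAMPLE_RATE_HZ
print(bytes_per_second)                 # 1920000
print(bytes_per_second / 1_000_000)     # 1.92 (MB/s, decimal megabytes)
```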
  19. Well. Another weekend wasted being a geek. So I wrote a circular buffer and took on board (but didn't implement exactly) the Disruptor pattern. I've written a test harness which I will post soon, once it is presentable, so that we can optimise with the full brain power of Lavag.org rather than my puny organ (fnarr, fnarr). In a nutshell, it looks good, assuming I'm not doing anything daft and getting overruns that I'm not detecting.
  20. It looks to me like the data is already processed. If you just plot the data directly (and change the graph scale to logarithmic) you will get: If you want to smooth it, use the Interpolate 1D.vi and select spline (ntimes = 10), and you will get:
  21. You can always just return an array with one element for single values (if a task returns a single value, just use the build array to convert it). Then all your compares are the same. If you really want to, you can wrap that into a single-value polymorphic VI to return just element 0. That way you won't get a run-time error.
  22. If I remember correctly, the Array Subset only keeps track of indexes (sub-array type). Would this avoid the copy? Not sure where "parallel" queues came into it. If we were to try parallel queues (I think you are saying a queue for each read process), then data still has to be copied (within the LabVIEW code) on to each queue? Would you not get a copy at the wire junction at least, if not on the queues themselves? This scenario is really slow in LabVIEW. I have to use TCPIP repeaters (one of the things I have my eye on for this implementation) and it is a huge bottleneck, since to cater for arbitrary numbers of queues you need to use a for loop to populate the queues, and copies are definitely made. I think it will be impossible to see the same sort of performance that they achieve without going to compiled code (there is a .NET and C++ implementation), and if our only interest is to benchmark it against the LV queue primitives, we aren't really comparing apples with apples (compiled code implementation of queues in the LV runtime vs LabVIEW code). However, the principle still stands and I think it may yield benefits for the aforementioned scenarios (queue-case for example), so I will certainly persevere. Of course, it'd be great if NI introduced a set of primitives in the LV kernel (Apache 2.0 licence, I believe).
  23. In general, I think it will not realise the performance improvements for the pointer reasons you have stated (we are ultimately constrained by the internal workings of LV, which we cannot circumvent). I'm sure if we tried to implement a queue in native LabVIEW, it wouldn't be anywhere near as fast as the primitives. That said... A lot of the code seems to be designed around ensuring atomicity. For example, in LabVIEW we can both read and write to a global variable without having to use mutexes (I believe this is why they discuss CAS). LabVIEW handles all that. Maybe there are some aspects of their software (I haven't got around to looking at their source yet) that are redundant due to LabVIEW's machinations........that's a big maybe with a capital "PROBABLY NOT". I'm not quite sure what you mean about "is going to have to copy data out of the buffer in order to leave the buffer intact for the next reader". Are you saying that merely using the index array primitive destroys the element? I'm currently stuck at the "back pressure" aspect to the writer, as I can't seem to get the logic right. Assuming I have the logic right (still not sure), then this is one instance when a class beats the pants off of classic LabVIEW. With a class I can read (2 readers) and write at about 50us, but don't quote me on that as I still don't have confidence in my logic (strange thing is, this slows down to about 1ms if you remove the error case structures). I'm not trying anything complex. Just an array of doubles as the buffer. DVRs just kill it. Not an option. So it makes classes a bit of a nightmare, since you need to share the buffer off-wire. To hack around this, I went for a global variable to store the buffer (harking back to my old "Data Pool" pattern), with the classes just being accessors (Dependency Barrier?) and storing the position (for the reader). I should just qualify that time claim in that the class VIs are all re-entrant subroutines (using 2009, so no in-place).
Not doing this, you can multiply by about 100. Which method did you use to create the ring buffer? I'm currently trying the size mod 2 with the test for 1 element gap. This is slower than checking for overflow and reset, but easier to read whilst I'm chopping things around.
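For readers unfamiliar with the "1 element gap" test, here is a minimal single-writer, single-reader ring buffer in Python. It is a sketch of the general technique, not the LabVIEW implementation discussed above: it wraps indices with the modulo operator, leaves one slot unused so that "full" and "empty" can be told apart, and refuses the write when full as a crude form of back pressure. The class and method names are invented for illustration, and the two-reader case from the post is not modelled.

```python
class RingBuffer:
    """Fixed-size ring buffer of doubles with a one-slot gap."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("size must be at least 2 (one slot stays empty)")
        self._data = [0.0] * size   # pre-allocated storage
        self._size = size
        self._write = 0             # next slot to write
        self._read = 0              # next slot to read

    def is_empty(self):
        return self._read == self._write

    def is_full(self):
        # Full when advancing the writer would land on the reader:
        # this is the "1 element gap" test.
        return (self._write + 1) % self._size == self._read

    def put(self, value):
        """Store a value; return False (back pressure) if the buffer is full."""
        if self.is_full():
            return False
        self._data[self._write] = value
        self._write = (self._write + 1) % self._size
        return True

    def get(self):
        """Return the oldest value, or None if the buffer is empty."""
        if self.is_empty():
            return None
        value = self._data[self._read]
        self._read = (self._read + 1) % self._size
        return value


rb = RingBuffer(4)
print([rb.put(float(v)) for v in (1, 2, 3, 4)])  # fourth write is refused
print(rb.get())
```

With a size of 4 only three values fit, which is the cost of the gap. If the size is a power of two, the modulo can be replaced by a bit mask (`index & (size - 1)`).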